[Yazlist] Fields ending in combining diacritics

Adam Dickmeiss adam at indexdata.dk
Thu Mar 8 23:07:29 CET 2007


Gary Anderson wrote:
> I am using the siconv interface.  I have a programmatic process that 
> deals with very large files of records.
> 
> Adam Dickmeiss wrote:
> 
>> Gary Anderson wrote:
>>
>>> I recently ran some tests using records from the National Library of 
>>> Canada.  Of the 600,000+ records in their name and subject authority 
>>> file, six records had 670 tags where the subfield a data ended in a 
>>> combining diacritic character with no following character.
>>>
>>> Submitting that data string 
>>> (indicators+subfieldmark+subfieldcode+data+fieldmark) to siconvert 
>>> resulted in an output string that did not contain the diacritic 
>>> character.  It was dropped.  The field mark character was retained.  
>>> Can you suggest a means for notifying the caller when this condition 
>>> occurs?  Byte counts don't really work because UTF8 is one side or 
>>> the other of the conversion transaction.
>>>
>>> The ending diacritic values were:  0xE2, 0xE5, 0xE8, 0xEA, and 0xF6.

I think what you need to do is "flush" the converter, i.e. reset it to 
the "initial state". The flush would take place after a field or 
subfield ends.

That's done by iconv and, hopefully, yaz_iconv by setting inbuf or 
*inbuf to NULL, but outbuf to non-NULL, i.e.

yaz_iconv(cd, 0, 0, &outbuf, &outbytesleft);

From 'man 3 iconv':
"
A different case is when inbuf is NULL or *inbuf is NULL, but outbuf is
not NULL and *outbuf is not NULL. In this case, the iconv() function
attempts to set cd's conversion state to the initial state and store a
corresponding shift sequence at *outbuf. At most *outbytesleft bytes,
starting at *outbuf, will be written. If the output buffer has no more
room for this reset sequence, it sets errno to E2BIG and returns
(size_t)(-1). Otherwise it increments *outbuf and decrements
*outbytesleft by the number of bytes written.
"

Use YAZ 2.1.48 or later for this to work.
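
For illustration, here is a minimal sketch of the convert-then-flush 
pattern per field. It uses plain POSIX iconv() rather than yaz_iconv(), 
since the flush call takes the same arguments as the one-liner above. 
The helper name convert_field and its return convention are made up for 
this example and are not part of YAZ or iconv.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert one field, then flush the converter so
 * that anything it is still holding back is written out and cd returns
 * to its initial state.
 * Returns the number of bytes written to out, or (size_t)(-1) on error
 * (errno is left as set by iconv). */
size_t convert_field(iconv_t cd, const char *in, size_t inlen,
                     char *out, size_t outcap)
{
    char *inbuf = (char *) in;   /* iconv() wants char **, input is not modified */
    size_t inleft = inlen;
    char *outbuf = out;
    size_t outleft = outcap;

    /* Normal conversion of the field data. */
    if (iconv(cd, &inbuf, &inleft, &outbuf, &outleft) == (size_t)(-1))
        return (size_t)(-1);

    /* Flush: inbuf is NULL, outbuf is non-NULL. */
    if (iconv(cd, NULL, NULL, &outbuf, &outleft) == (size_t)(-1))
        return (size_t)(-1);

    return outcap - outleft;
}
```

A caller would open cd once, call the helper for each field or 
subfield, and treat a return value of (size_t)(-1) as the notification 
that something went wrong (for example E2BIG if the output buffer is 
too small for the flushed bytes). With yaz_iconv the same two-step 
shape should apply, assuming the flush behaves as described above.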

/ Adam

>>>
>> Did you use yaz-marcdump for the conversion?
>>
>> Or did you do something else ? (such as programming towards the siconv 
>> interface)?
>>
>> / Adam
>>
>>> Thanks
>>> Gary
>>>
>>> _______________________________________________
>>> Yazlist mailing list
>>> Yazlist at lists.indexdata.dk
>>> http://lists.indexdata.dk/cgi-bin/mailman/listinfo/yazlist
>>



