[Yazlist] Fields ending in combining diacritics
adam at indexdata.dk
Thu Mar 8 23:07:29 CET 2007
Gary Anderson wrote:
> I am using the siconv interface. I have a programmatic process that
> deals with very large files of records.
> Adam Dickmeiss wrote:
>> Gary Anderson wrote:
>>> I recently ran some tests using records from the National Library of
>>> Canada. Of the 600,000+ records in their name and subject authority
>>> file, six records had 670 tags where the subfield a data ended in a
>>> combining diacritic character with no following character.
>>> Submitting that data string
>>> (indicators+subfieldmark+subfieldcode+data+fieldmark) to siconvert
>>> resulted in an output string that did not contain the diacritic
>>> character. It was dropped. The field mark character was retained.
>>> Can you suggest a means for notifying the caller when this condition
>>> occurs? Byte counts don't really work because UTF8 is one side or
>>> the other of the conversion transaction.
>>> The ending diacritic values were: 0xE2, 0xE5, 0xE8, 0xEA, and 0xF6.
I think what you need to do is "flush" the converter, i.e. reset it to
the "initial state". The flush would take place after a field or
subfield ends.
That's done by iconv and, hopefully, yaz_iconv by setting inbuf or
*inbuf to NULL, but outbuf to non-NULL, i.e.
yaz_iconv(cd, 0, 0, &outbuf, &outbytesleft);
From 'man 3 iconv':
A different case is when inbuf is NULL or *inbuf is NULL, but outbuf is
not NULL and *outbuf is not NULL. In this case, the iconv() function
attempts to set cd's conversion state to the initial state and store a
corresponding shift sequence at *outbuf. At most *outbytesleft bytes,
starting at *outbuf, will be written. If the output buffer has no more
room for this reset sequence, it sets errno to E2BIG and returns
(size_t)(-1). Otherwise it increments *outbuf and decrements
*outbytesleft by the number of bytes written.
Use YAZ 2.1.48 or later for this to work.
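
To make the per-field flush concrete, here is a minimal sketch using plain
POSIX iconv(3) rather than yaz_iconv. The helper name convert_field is
hypothetical, not part of YAZ or libc. It converts one field and then makes
the reset call with NULL input, so anything the converter is still holding
is written out before the next field starts. The yaz_iconv call shown above
has the same argument shape.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper (not a YAZ or libc function): convert one field,
 * then flush the converter back to its initial state.
 * Returns the number of bytes written to out, or (size_t)-1 on error
 * (errno is left as set by iconv, e.g. E2BIG or EILSEQ). */
static size_t convert_field(iconv_t cd, const char *in, size_t inlen,
                            char *out, size_t outsize)
{
    assert(in != NULL && out != NULL);

    char *inp = (char *)in;   /* iconv() wants char **, input is not modified */
    char *outp = out;
    size_t inleft = inlen;
    size_t outleft = outsize;

    /* Normal conversion of the field data. */
    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1)
        return (size_t)-1;

    /* Flush: NULL input, non-NULL output. Any pending output or reset
     * sequence is stored at outp and the state returns to initial. */
    if (iconv(cd, NULL, NULL, &outp, &outleft) == (size_t)-1)
        return (size_t)-1;

    return outsize - outleft;
}
```

Calling this once per field or subfield, with a descriptor from
iconv_open(), keeps state from one field from leaking into the next. For a
stateless pair such as UTF-8 to ISO-8859-1 the flush simply writes nothing.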
>> Did you use yaz-marcdump for the conversion?
>> Or did you do something else ? (such as programming towards the siconv
>> / Adam
>>> Yazlist mailing list
>>> Yazlist at lists.indexdata.dk