[Yazlist] Fields ending in combining diacritics

Gary Anderson ganderson at bslw.com
Thu Mar 8 23:25:51 CET 2007


I am not sure how this will help.  In the application, the last 2 bytes 
of the data string are 0xEA and 0x1E: the diacritic and the field mark.  
yaz_iconv seems to drop the diacritic because it has no following base 
character, but it does process the field mark.  What I need is something 
that tells me when this case has occurred.  As far as I can tell, yaz 
just drops the diacritic silently.
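One way to get the notification asked for here, without relying on the converter, is to inspect the raw MARC-8 bytes before handing them to yaz_iconv. The following is a minimal sketch; ends_with_dangling_diacritic is a hypothetical helper, not part of YAZ. It assumes the data is in the default ANSEL set, where combining diacritics occupy 0xE0 to 0xFE and precede their base character, and that 0x1E is the field mark. If an escape sequence has switched to another character set, high bytes are not necessarily diacritics and the check can misfire.

```c
#include <assert.h>
#include <stddef.h>

#define FIELD_MARK 0x1E  /* MARC field terminator */

/* Hypothetical helper, not part of YAZ.  Returns 1 if the last character
 * before an optional trailing field mark is a MARC-8 combining diacritic
 * (0xE0..0xFE in the default ANSEL set), i.e. a diacritic with no base
 * character after it; returns 0 otherwise.  Assumes no escape sequence
 * has switched to another character set. */
int ends_with_dangling_diacritic(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    if (len > 0 && buf[len - 1] == FIELD_MARK)
        len--;                       /* ignore the field mark itself */
    if (len == 0)
        return 0;
    return buf[len - 1] >= 0xE0 && buf[len - 1] <= 0xFE;
}
```

Calling it on each field before conversion would flag data like the case above, a field whose last two bytes are 0xEA 0x1E, while a field such as 0xEA 'a' 0x1E (diacritic followed by its base character) passes.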

My testing indicates that once the field mark has been converted, the 
yaz_iconv converter is left in its 'initial state', so the next string 
converts just fine.
Gary
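The flush Adam describes in the quoted reply follows the POSIX iconv(3) convention. The following is a minimal sketch of a per-field convert-then-flush routine, written against POSIX iconv so it builds without YAZ; convert_field is a hypothetical name. It also reports EINVAL, which is how POSIX iconv signals that the input ended partway through a character. The thread suggests yaz_iconv takes the same arguments (YAZ 2.1.48 or later for the flush), but whether YAZ's MARC-8 decoder reports a trailing diacritic this way, rather than dropping it as observed above, is an assumption to verify. As far as I know, YAZ reports conversion errors through yaz_iconv_error(cd) rather than errno.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert one field, then flush cd back to its
 * initial state.  Returns the number of bytes written to out, or -1 on
 * error.  *incomplete is set to 1 if the input ended partway through a
 * character (POSIX iconv reports this with errno == EINVAL); those
 * trailing input bytes are not converted. */
long convert_field(iconv_t cd, const char *in, size_t inlen,
                   char *out, size_t outlen, int *incomplete)
{
    char *inp = (char *) in;      /* iconv() takes char **, not const */
    char *outp = out;
    size_t inleft = inlen;
    size_t outleft = outlen;

    *incomplete = 0;
    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t) -1) {
        if (errno != EINVAL) {
            /* EILSEQ or E2BIG: reset the state without output and fail */
            iconv(cd, NULL, NULL, NULL, NULL);
            return -1;
        }
        *incomplete = 1;          /* inleft bytes at the end were left over */
    }

    /* The flush: NULL inbuf with a non-NULL outbuf resets cd to the
     * initial state and writes any pending shift sequence to out. */
    if (iconv(cd, NULL, NULL, &outp, &outleft) == (size_t) -1)
        return -1;                /* E2BIG: no room for the reset sequence */

    return (long) (outp - out);
}
```

Calling this once per field (or subfield) keeps state from one field from leaking into the next, and the incomplete flag gives the caller a per-field signal instead of relying on byte counts.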

Adam Dickmeiss wrote:

> Gary Anderson wrote:
>
>> I am using the siconv interface.  I have a programmatic process that 
>> deals with very large files of records.
>>
>> Adam Dickmeiss wrote:
>>
>>> Gary Anderson wrote:
>>>
>>>> I recently ran some tests using records from the National Library 
>>>> of Canada.  Of the 600,000+ records in their name and subject 
>>>> authority file, six records had 670 tags where the subfield a data 
>>>> ended in a combining diacritic character with no following character.
>>>>
>>>> Submitting that data string 
>>>> (indicators+subfieldmark+subfieldcode+data+fieldmark) to siconvert 
>>>> resulted in an output string that did not contain the diacritic 
>>>> character.  It was dropped.  The field mark character was 
>>>> retained.  Can you suggest a means for notifying the caller when 
>>>> this condition occurs?  Byte counts don't really work because UTF-8 
>>>> is on one side or the other of the conversion transaction.
>>>>
>>>> The ending diacritic values were:  0xE2, 0xE5, 0xE8, 0xEA, and 0xF6.
>>>
>
> I think what you need to do is "flush", i.e. reset to the "initial 
> state". The flush would take place after a field or subfield ends.
>
> With iconv (and, hopefully, yaz_iconv) that is done by setting inbuf or 
> *inbuf to NULL while outbuf is non-NULL, i.e.
>
> yaz_iconv(cd, 0, 0, &outbuf, &outbytesleft);
>
> From 'man 3 iconv':
> "
> A different case is when inbuf is NULL or *inbuf is NULL, but outbuf is
> not NULL and *outbuf is not NULL. In this case,  the  iconv()  function
> attempts  to set cd's conversion state to the initial state and store a
> corresponding shift sequence at *outbuf.  At most *outbytesleft  bytes,
> starting at *outbuf, will be written.  If the output buffer has no more
> room for this reset sequence,  it  sets  errno  to  E2BIG  and  returns
> (size_t)(-1).  Otherwise  it  increments  *outbuf  and decrements
> *outbytesleft by the number of bytes written.
> "
>
> Use YAZ 2.1.48 or later for this to work.
>
> / Adam
>
>>>>
>>> Did you use yaz-marcdump for the conversion?
>>>
>>> Or did you do something else (such as programming against the 
>>> siconv interface)?
>>>
>>> / Adam
>>>
>>>> Thanks
>>>> Gary
>>>>
>>>> _______________________________________________
>>>> Yazlist mailing list
>>>> Yazlist at lists.indexdata.dk
>>>> http://lists.indexdata.dk/cgi-bin/mailman/listinfo/yazlist
>>>
>>
>
