[Yazlist] Another MARC8 conversion problem.

Adam Dickmeiss adam at indexdata.dk
Wed Mar 21 18:47:50 CET 2007


Gary Anderson wrote:
> Can you point me to a URL for the CVS?

CVSROOT=:pserver:cvs@cvs.indexdata.dk:/cvs
password is anonymous
cvs co yaz

/ Adam

> Adam Dickmeiss wrote:
> 
>> Gary Anderson wrote:
>>
>>> I am passing the following UTF-8 string (Values are Han (CJK) characters 
>>> given in hex.  Ignore spaces) to the converter:
>>>
>>> E8 87 BA  E7 81 A3  E5 9C B0  E5 8D 80  E5 9C 8B  E6 B0 91  E6 89 80  
>>> E5 BE 97.
>>>
>>> YAZ correctly translates this string to (output in MARC8, hex, ignore 
>>> spaces):
>>>
>>> 1B  28  42  21 54 2B  21 49 43  21 37 79  21 34 55  21 37 6f  21 46 
>>> 4d  21 3F 75  21 30 6A
>>> esc  $    1
>>>
>>> Notice that the ending escape sequence (ESC ( B) was not appended to 
>>> this string.  It appeared at the beginning of my
>>> next string.
>>
>>
>> How did you test this? With yaz-iconv?
>>
>> A call to
>>  yaz_iconv(cd, 0, 0, &outp, &outbytesleft);
>>
>> will reset the conversion to the initial state and generate the ESC ( B.
>>
>> I can tell you this: yaz-iconv did not do that. And that's a mistake.
>>
>>> I'm thinking that the yaz_write_marc8_page_chr module you sent in the 
>>> patch isn't working, or it needs to be called from somewhere else.
>>
>>
>> Yesterday major changes to siconv.c were made. The new code is 
>> simpler, IMHO. I really suggest you check YAZ out via CVS. One thing 
>> you'll notice is that the last parameter is gone.
>>
>> yaz_flush_marc8 and yaz_flush_ISO8859_1 do the flushing. They are 
>> called when yaz_iconv(cd, 0, 0, &outp, &outbytesleft) is used.
>>
>> You may ask: why this flushing? And why get rid of the last parameter?
>>
>> The last parameter was set (to 1) for the last byte/character in a 
>> call to yaz_iconv (with inbuf != 0). The problem is that this may not be 
>> the last of the whole input byte sequence.
>>
>> That is problematic. Conversion of (large) files requires multiple 
>> calls to iconv anyway, with chunks of input that are not necessarily 
>> complete input sequences. We must therefore flush at the end anyway.
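
The chunking argument above can be sketched with plain POSIX iconv(3),
whose semantics yaz_iconv is meant to follow. This is only an
illustration, not YAZ code: convert_in_chunks is a hypothetical helper,
and the UTF-8 to ISO-8859-1 pair is an assumption chosen because most
iconv implementations support it. A chunk may end inside a multi-byte
character (iconv then reports EINVAL and leaves the tail unconsumed), so
no single call can know it holds the "last" character. The flush is one
call with a NULL input at the very end.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper, not part of YAZ. Converts UTF-8 to ISO-8859-1 while
   handing iconv at most `chunk` new bytes per call, as if reading a file in
   pieces. Returns the number of output bytes, or (size_t)-1 on error. */
static size_t convert_in_chunks(const char *in, size_t inlen, size_t chunk,
                                char *out, size_t outcap)
{
    iconv_t cd = iconv_open("ISO-8859-1", "UTF-8");
    const char *next = in;      /* first input byte not yet converted */
    size_t fed = 0;             /* input bytes made available so far */
    char *outp = out;
    size_t outleft = outcap;
    int ok = 1;

    assert(chunk > 0);
    if (cd == (iconv_t) -1)
        return (size_t) -1;

    while (ok && fed < inlen) {
        char *inp;
        size_t inleft;

        /* Make one more chunk available, on top of any unconverted tail. */
        fed = (inlen - fed < chunk) ? inlen : fed + chunk;
        inp = (char *) next;
        inleft = (size_t) (in + fed - next);

        /* EINVAL means the chunk ended inside a multi-byte character:
           keep the tail and retry once more input is available. */
        if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t) -1
            && errno != EINVAL)
            ok = 0;
        next = inp;
    }
    if (next != in + inlen)
        ok = 0;                 /* error, or input ended mid-character */

    /* Final flush: NULL input returns the converter to its initial state.
       For ISO-8859-1 this writes nothing; a stateful target may emit a
       closing escape sequence here. */
    if (ok && iconv(cd, NULL, NULL, &outp, &outleft) == (size_t) -1)
        ok = 0;

    iconv_close(cd);
    return ok ? outcap - outleft : (size_t) -1;
}
```

Calling it on the 5-byte UTF-8 string for "café" with a chunk size of 4
splits the final character across two calls, yet the result is still the
4-byte ISO-8859-1 string.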
>>
>> More importantly: we want yaz_iconv to have iconv semantics.
>>
>> See:
>> http://www.gnu.org/software/libc/manual/html_node/iconv-Examples.html#iconv-Examples 
>>
>>
>> In the case of MARC we want each field's data to be self-contained. To 
>> ensure this, we flush after each field's data. For YAZ's MARC utility 
>> that's done in marc_iconv_reset (src/marcdisp.c).
>>
>> / Adam
>>
>>> Gary
>>>
>>> _______________________________________________
>>> Yazlist mailing list
>>> Yazlist at lists.indexdata.dk
>>> http://lists.indexdata.dk/cgi-bin/mailman/listinfo/yazlist



