[Yazlist] Conversion MARC8 to UTF8 ?

David Beukers dave at bslw.com
Thu Sep 18 22:09:06 CEST 2008


According to the Library of Congress MARC8 to Unicode XML mapping file 
(Dec 2007), the single-character COMBINING DOUBLE INVERTED BREVE (U+0361) 
is the preferred form in UTF-8, although MARC8 itself has only the left- 
and right-half forms of this ligature.
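
To make the difference concrete, here is a minimal Python sketch that 
lists the code points of both forms quoted below and rewrites the 
half-mark form into the single-mark form. The helper names (describe, 
halves_to_double) are made up for illustration, and the simple 
"base, left half, base, right half" pattern is an assumption based on 
the strings in this thread; this is not how YAZ does the conversion.

```python
import re
import unicodedata

# The "ts" ligature in the two forms reported in this thread.
halves = "t\uFE20s\uFE21"  # half marks, as seen in the LoC web OPAC string
double = "t\u0361s"        # single double-width mark, as output by YAZ


def describe(text):
    """Return (code point, Unicode name) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


def halves_to_double(text):
    """Hypothetical helper: base + U+FE20 + base + U+FE21 -> base + U+0361 + base.

    Assumes each ligature spans exactly two base characters.
    """
    return re.sub(
        "(.)\uFE20(.)\uFE21",
        lambda m: m.group(1) + "\u0361" + m.group(2),
        text,
    )


if __name__ == "__main__":
    for label, text in (("halves", halves), ("double", double)):
        print(label)
        for code_point, name in describe(text):
            print(" ", code_point, name)
    print(halves_to_double(halves) == double)
```

The two strings render alike but are different code point sequences, 
so a plain string comparison between them fails unless one form is 
mapped to the other first.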

Dave Beukers

Adam Dickmeiss wrote:
> Lamon Jean-Pierre wrote:
>   
>> Hi all,
>>
>>  
>>
>> I have another MARC8 to UTF8 conversion problem, reported by the 
>> Library network of Western Switzerland (RERO).
>>
>> It concerns ligatures, and I see it with many yaz3.dll versions.
>>
>>  
>>
>>     
> I'm not an expert either, but could you post the original MARC-8 
> content, or explain how to retrieve the original MARC records (in MARC-8)?
>
> / Adam
>
>   
>> String coming from the LOC web OPAC:
>>
>>  
>>
>> Avtomatizat︠s︡ii︠a︡ issledovaniĭ i analiz dannykh
>>
>>  
>>
>> t︠s︡
>>
>> Sequence analysis:
>>
>>  
>>
>> U+0074 = LATIN SMALL LETTER T
>>
>> U+FE20 = COMBINING LIGATURE LEFT HALF
>>
>> U+0073 = LATIN SMALL LETTER S
>>
>> U+FE21 = COMBINING LIGATURE RIGHT HALF
>>
>>  
>>
>> String coming from YAZ:
>>
>>  
>>
>> Avtomatizat͡sii͡a issledovaniĭ i analiz dannykh
>>
>>  
>>
>> t͡sii͡
>>
>>  
>>
>> Sequence analysis:
>>
>>  
>>
>> U+0074 = LATIN SMALL LETTER T
>>
>> U+0361 = COMBINING DOUBLE INVERTED BREVE
>>
>> U+0073 = LATIN SMALL LETTER S
>>
>>  
>>
>> I really don’t know enough about the Russian language, transliteration, 
>> etc., but is this difference normal?
>>
>>  
>>
>> Regards
>>
>> JPL
>>
>>  
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> Yazlist mailing list
>> Yazlist at lists.indexdata.dk
>> http://lists.indexdata.dk/cgi-bin/mailman/listinfo/yazlist
>>   
>>     
>
>
>   


