[Oclist] Character set issues
Giannis Kosmas
kosmas at lib.uoc.gr
Tue Mar 10 12:48:50 CET 2009
Hi everybody!
We make heavy use of most of the opencontent databases provided by
Indexdata, as they are very useful in a metasearching environment. There
seem to be some character set issues though, at least with the
databases we have tried so far, i.e. dmoz, wikipedia and gutenberg.
More specifically, there are problems when someone searches with Greek
text: not all the records that come back from the server match the
search criteria. For example, when I search gutenberg for the term
"Αλκηστις", I expect to get records matching that Greek word, but I
also get records with Russian text that contain no Greek text
anywhere. I tried the reverse, searching for "История", the Russian word
for "history", and got back records with Greek text as well. So I believe
this happens with any query expressed in a script outside Latin-1. All of
my search queries were encoded in UTF-8.
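To illustrate the kind of failure I suspect, here is a minimal Python
sketch. It assumes, purely as a guess, that somewhere on the server side the
query is forced into Latin-1 and unencodable characters are replaced. I do
not know that this is what actually happens; the helper name
to_latin1_lossy is hypothetical and only serves the illustration.

```python
def to_latin1_lossy(query: str) -> bytes:
    """Encode a query as Latin-1, replacing unencodable characters with '?'.

    Hypothetical helper: it models one possible lossy conversion,
    not the server's actual behaviour.
    """
    return query.encode("latin-1", errors="replace")


greek = "Αλκηστις"
russian = "История"

# What the client actually sends: valid UTF-8, two bytes per letter here.
greek_utf8 = greek.encode("utf-8")
russian_utf8 = russian.encode("utf-8")

# After a lossy Latin-1 conversion the letters outside Latin-1 are gone,
# so the two queries can no longer be told apart by script.
greek_lossy = to_latin1_lossy(greek)
russian_lossy = to_latin1_lossy(russian)

print(greek_utf8, russian_utf8)
print(greek_lossy, russian_lossy)
```

If something like this were happening, a Greek query and a Russian query
would both degrade to runs of placeholder characters, which would be
consistent with each one matching records in the other script.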
Another problem appears when a record is requested in ISO 2709 format with
the MARC-8 character set: none of the accented Greek letters are shown.
The same records are displayed correctly when requested as MARCXML, though. I
hope this helps.
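As background for the MARC-8 problem: an accented Greek letter that is a
single precomposed character in Unicode has to be expressed as a base letter
plus a separate combining mark in MARC-8. The sketch below uses only Python's
standard unicodedata module to show that decomposition. It does not perform
any MARC-8 conversion, and the idea that the server's converter skips
precomposed letters it cannot map is only my assumption.

```python
import unicodedata


def decompose(text: str) -> list[str]:
    """Return code point and name for each character of the NFD form."""
    decomposed = unicodedata.normalize("NFD", text)
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '?')}" for ch in decomposed]


# U+03AC GREEK SMALL LETTER ALPHA WITH TONOS, a single precomposed character.
accented = "\u03ac"

# After canonical decomposition it is a base letter followed by a
# combining accent, i.e. two separate characters.
for entry in decompose(accented):
    print(entry)
```

A converter that only knows how to map the plain base letters, and silently
skips anything else, would drop exactly the accented letters, which matches
what I see in the iso2709 output.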
Giannis