[Zebralist] UTF-8 search issue in Zebra 2.0.26
marc at indexdata.dk
Wed Feb 20 09:20:15 CET 2008
Ata ur Rehman wrote:
> Dear All
> I have installed Zebra 2.0.26 on Windows. I added:
> encoding UTF-8 in my usmarc.abs file and
> encoding: UTF-8 C:\Program Files\Zebra\test\usmarc\zebra.cfg file. Now
> i added some Arabic records in C:\Program
> Files\Zebra\test\usmarc\records folder and run zebraidx. Indexing is
> alright. Then i started zebra at 210 port as zebrasrv @:210
> This process is also fine. Then i used MarcEdit z39.50 client to
> search these results. I can only search English records. If arabic is
> in those records that is showing correctly but i can not search arabic
> terms from my keyboard.
> What should i do?
> Note: I am running Zebra as windows service in XP
The default Zebra configurations, which use the older charmap config files
for string parsing and tokenization, like the instructions in
are not able to index the full Unicode range.
You need to use the newest Zebra 2.0.26 Windows exe, which is bundled
with ICU support, and you need to switch to Unicode ICU support, see
Zebra is shipped with a field types file icu.idx which is an ICU chain
version of default.idx. You might want to start there.
It is a good idea to check your rules using the yaz-icu test utility
found in the YAZ binaries; otherwise it can be hard to figure out how
the ICU rules work.
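
A minimal sketch of such a test run, written in Python. It assumes
yaz-icu is on your PATH and takes the chain file via -c with the text on
standard input. The file name test-chain.xml, the sample text, and the
rules in the chain are illustrative assumptions; they are not copied from
the icu.idx shipped with Zebra, and the element names accepted in a chain
file may differ between YAZ releases, so check them against your version.

```python
import shutil
import subprocess
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List, Optional

# Illustrative ICU chain (an assumption, not the rules shipped with Zebra):
# drop control characters, split into words, remove whitespace and
# punctuation tokens, then lowercase.
CHAIN_XML = """<icu_chain locale="ar">
  <transform rule="[:Control:] Any-Remove"/>
  <tokenize rule="w"/>
  <transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
  <casemap rule="l"/>
</icu_chain>
"""

# Hypothetical sample mixing Arabic and English words.
SAMPLE_TEXT = "كتاب العلوم Science Book"


def write_chain(path: Path) -> Path:
    """Write the chain file as UTF-8 after checking it is well-formed XML."""
    ET.fromstring(CHAIN_XML)  # raises ParseError if the XML is broken
    path.write_text(CHAIN_XML, encoding="utf-8")
    return path


def yaz_icu_command(chain_file: Path) -> List[str]:
    """Build the yaz-icu command line; -c names the chain configuration."""
    return ["yaz-icu", "-c", str(chain_file)]


def run_chain(chain_file: Path, text: str) -> Optional[str]:
    """Feed text through yaz-icu; return None if the tool is not installed."""
    if shutil.which("yaz-icu") is None:
        return None
    result = subprocess.run(
        yaz_icu_command(chain_file),
        input=text.encode("utf-8"),
        capture_output=True,
        check=False,
    )
    return result.stdout.decode("utf-8", errors="replace")


if __name__ == "__main__":
    chain = write_chain(Path("test-chain.xml"))
    output = run_chain(chain, SAMPLE_TEXT)
    print(output if output is not None else "yaz-icu not found on PATH")
```

If the Arabic words do not come out as separate tokens here, they will
not be searchable after indexing with the same chain either.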
In addition, after indexing, it's always a good idea to verify that the
Arabic terms/tokens really made it into the Zebra indexes. Try to look at your
data using the zebra::index element set.
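
One way to do that check, sketched in Python: generate a small command
file for yaz-client that finds a record through an English term you know
is searchable, then shows that record with the zebra::index element set
so you can see which tokens were stored. The target localhost:210 matches
the port mentioned above; the query "@attr 1=4 science", the file name
check-index.yaz, and the choice of XML record syntax are assumptions for
illustration and need adjusting to your data and setup.

```python
from pathlib import Path


def index_check_commands(target: str, query: str) -> str:
    """Return yaz-client commands that show the index terms of the first hit."""
    commands = [
        f"open {target}",
        f"find {query}",
        "format xml",             # assumed record syntax for the index view
        "elements zebra::index",  # ask Zebra for indexed terms, not the record
        "show 1",
        "quit",
    ]
    return "\n".join(commands) + "\n"


if __name__ == "__main__":
    # Hypothetical query: an English title word from a record that also
    # contains Arabic text.
    script = index_check_commands("localhost:210", "@attr 1=4 science")
    Path("check-index.yaz").write_text(script, encoding="utf-8")
    print("Run: yaz-client -f check-index.yaz")
```

If the output lists only the English tokens for that record, the Arabic
text never reached the index and the problem is on the indexing side
rather than in the client.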
Happy debugging!
Yours, Marc Cromme, Index Data
M.Sc and Ph.D in Mathematical Modeling and Computation
Senior Developer, Project Manager
Index Data Aps
Købmagergade 43, 2
1150 Copenhagen K.
tel: +45 3341 0100
fax: +45 3341 0101
INDEX DATA Means Business
for Open Source and Open Standards