[Zebralist] UTF-8 search issue in Zebra 2.0.26

Marc Cromme marc at indexdata.dk
Wed Feb 20 09:20:15 CET 2008


Ata ur Rehman wrote:
> Dear All
> 
> I have installed Zebra 2.0.26 on Windows. I added
> "encoding UTF-8" to my usmarc.abs file and
> "encoding: UTF-8" to C:\Program Files\Zebra\test\usmarc\zebra.cfg.  I then
> added some Arabic records to the C:\Program 
> Files\Zebra\test\usmarc\records folder and ran zebraidx.  Indexing 
> went fine.  Then I started Zebra on port 210 with zebrasrv @:210
> 
> This also worked.  Then I used the MarcEdit Z39.50 client to 
> search these records.  I can only search English records.  Arabic text 
> in the records is displayed correctly, but I cannot search for Arabic 
> terms typed from my keyboard.
> 
> What should I do?
> 
> Note: I am running Zebra as a Windows service on XP
> 
> 
> Ata
> 

Dear Ata

The default Zebra configuration uses the older charmap files for string 
parsing and tokenization, for example these directives in

  tab/default.idx
    charmap numeric.chr
    charmap urx.chr
    charmap string.chr

Charmap-based field types like these are not able to index the full 
Unicode range.

You need the newest Zebra 2.0.26 Windows exe, which is bundled with ICU 
support, and you need to switch your field types to the ICU-based 
Unicode support. See

http://www.indexdata.com/zebra/doc/fields-and-charsets.tkl
and especially
http://www.indexdata.com/zebra/doc/icuchain-files.tkl

Zebra is shipped with a field types file, icu.idx, which is an ICU chain 
version of default.idx. You might want to start there.
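
As a rough sketch (written from memory of the shipped tab/icu.idx, so 
please compare it with the file in your own installation), the word 
field type there replaces the charmap line with an icuchain line. The 
chain file name words-icu.xml is an assumption taken from the bundled 
example:

  tab/icu.idx
    # word index; the other directives are as in default.idx
    index w
    completeness 0
    position 1
    alwaysmatches 1
    firstinfield 1
    # assumed file name, verify against your installation
    icuchain words-icu.xml

The ICU chain file named on the icuchain line holds the actual 
normalization and tokenization rules, so that is the file to adjust 
for Arabic.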

It is a good idea to test your rules with the yaz-icu test utility 
found in the YAZ binaries; otherwise it can be hard to figure out how 
the ICU rules work.

See http://www.indexdata.dk/yaz/doc/yaz-icu.tkl

In addition, after indexing, it is always a good idea to verify that the 
Arabic terms/tokens really made it into the Zebra indexes. Try looking at 
your data using the Zebra::index element set.

See http://www.indexdata.com/zebra/doc/special-retrieval.tkl

Happy debugging!

Yours, Marc Cromme, Index Data


-- 

Marc Cromme
M.Sc and Ph.D in Mathematical Modeling and Computation
Senior Developer, Project Manager

Index Data Aps
Købmagergade 43, 2
1150 Copenhagen K.
Denmark

tel: +45 3341 0100
fax: +45 3341 0101

http://www.indexdata.com

INDEX DATA Means Business
for Open Source and Open Standards
