[Zebralist] Differences when using icu.idx

Adam Dickmeiss adam at indexdata.dk
Fri Apr 24 15:55:13 CEST 2009


Larry E. Dixson wrote:
> I have noted some functionality differences when I use icu.idx
> instead of default.idx for indexing.  I am able to index and search
> non-Latin script characters, which was the primary goal of switching
> to ICU.  However, I have noticed that some Relation and Structure
> attributes that were supported previously are no longer supported.  
> I am also not able to retrieve indexed data when requesting
> "zebra::index", "zebra::index::name", etc.  ("Zebra::data" and
> "zebra::snippet" work as expected, however.)
>
> Are these known issues when using ICU, or is this a problem 
> created by my configuration?
>
>   
It is a known bug. http://bugzilla.indexdata.dk/show_bug.cgi?id=2048

We'll make it "higher" priority and hope to have it fixed in the next release.
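
Until the fix lands, one option is to catch the affected queries on the client side before they reach the server. Below is a rough sketch in Python. It assumes the failing attributes are exactly the ones the quoted session reports for icu.idx (Relation 1, 2, 4, 5 with diagnostic 117, and Truncation 2, 3 with diagnostic 120). The names UNSUPPORTED_WITH_ICU and unsupported_attrs are made up for illustration and are not part of Zebra or YAZ.

```python
import re

# (attribute type, value) pairs that the quoted session shows failing
# against an icu.idx index. Transcribed from that session only; this is
# an assumption, not a list taken from Zebra itself.
UNSUPPORTED_WITH_ICU = {
    (2, 1): "Relation: less than",
    (2, 2): "Relation: less than or equal",
    (2, 4): "Relation: greater or equal",
    (2, 5): "Relation: greater than",
    (5, 2): "Truncation: left",
    (5, 3): "Truncation: left and right",
}

# Matches "@attr 2=1" and also "@attr bib1 2=1" (optional attribute set name).
ATTR_RE = re.compile(r"@attr\s+(?:\S+\s+)?(\d+)=(\d+)")


def unsupported_attrs(pqf):
    """Return descriptions of attributes in a PQF query that appear in the table above."""
    found = []
    for attr_type, value in ATTR_RE.findall(pqf):
        key = (int(attr_type), int(value))
        if key in UNSUPPORTED_WITH_ICU:
            found.append(UNSUPPORTED_WITH_ICU[key])
    return found


if __name__ == "__main__":
    for query in ("@attr 1=1016 @attr 2=1 2005",
                  "@attr 1=1016 @attr 5=1 wyomin"):
        print(query, "->", unsupported_attrs(query))
```

An empty list means the query uses none of the attributes seen failing in that session; it does not guarantee the search will succeed.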

/ Adam
> Thanks.
> Larry
>
> ---------------------------------------------------------------
> Indexed using default.idx -- not ICU
> ---------------------------------------
> Z> f @attr 1=1016 "<UTF-8 characters>"
> Number of hits: 6340, setno 4    [doesn't match]
> SearchResult-1: term=<UTF-8 characters> cnt=6340
>
> Z> f @attr 1=1016 fur
> Number of hits: 9, setno 5
> SearchResult-1: term=fur cnt=9
>
> Z> f @attr 1=1016 fu.jr
> Number of hits: 9, setno 6
> SearchResult-1: term=fu.jr cnt=9
>
> Z> f @attr 1=1016 @attr 2=1 2005    [Less than]
> Number of hits: 10866, setno 3
> SearchResult-1: term=2005 cnt=10866
>
> Z> f @attr 1=1016 @attr 2=2 2005    [Less than or equal]
> Number of hits: 10965, setno 4
> SearchResult-1: term=2005 cnt=10965
>
> Z> f @attr 1=1016 @attr 2=3 2005    [Equal]
> Number of hits: 310, setno 5
> SearchResult-1: term=2005 cnt=310
>
> Z> f @attr 1=1016 @attr 2=4 2005    [Greater or equal]
> Number of hits: 11129, setno 6
> Result Set Status: subset
> SearchResult-1: term=2005 cnt=11129
>
> Z> f @attr 1=1016 @attr 2=5 2005    [Greater than]
> Number of hits: 11067, setno 7
> Result Set Status: subset
> SearchResult-1: term=2005 cnt=11067
>
> Z> f @attr 1=1016 @attr 2=102 2005   [Relevance]
> Number of hits: 310, setno 8
> SearchResult-1: term=2005 cnt=310
>
> Z> f @attr 1=1016 @attr 2=103 2005    [AlwaysMatches]
> Number of hits: 12807, setno 9
> SearchResult-1: term= cnt=12807
>
> Z> f @attr 1=1016 @attr 5=1 wyomin    [Right truncation]
> Number of hits: 1, setno 16
> SearchResult-1: term=wyomin cnt=1
>
> Z> f @attr 1=1016 @attr 5=2 yoming    [Left truncation]
> Number of hits: 1, setno 17
> SearchResult-1: term=yoming cnt=1
>
> Z> f @attr 1=1016 @attr 5=3 yomin     [Left and right truncation]
> Number of hits: 2, setno 18
> SearchResult-1: term=yomin cnt=2
>
>
> Z> format xml
> Z> e zebra::index::name
> Z> s 1
> Record type: XML
> <record xmlns="http://www.indexdata.com/zebra/" sysno="10140" set="zebra::index::name/">
>   <index name="name" type="w" seq="20">@^</index>
>   <index name="name" type="w" seq="1"></index>
>   <index name="name" type="w" seq="21">fan</index>     [data is present]
>   <index name="name" type="w" seq="22">linbo</index>
>   <index name="name" type="p" seq="20">fan linbo</index>
>   <index name="name" type="w" seq="23">@^</index>
>   <index name="name" type="w" seq="24">@@@@@@@@@</index>
>   <index name="name" type="p" seq="23">@@@@@@@@@</index>
> </record>
>
>
> ----------------------------------------------------
> Indexed using icu.idx -- ICU
> ----------------------------------------------------
>
> Z> f @attr 1=1016 "<UTF-8 characters>"
> Number of hits: 1, setno 2
> SearchResult-1: term=<term1> cnt=4, term=<term2> cnt=51, term=<term3> cnt=32
>
> Z> f @attr 1=1016 fur
> Number of hits: 9, setno 3
> SearchResult-1: term=fur cnt=9
>
> Z> f @attr 1=1016 fu.jr
> Number of hits: 9, setno 4
> SearchResult-1: term=fur cnt=9
>
> Z> f @attr 1=1016 @attr 2=1 2005     [Less than]
> Search was a bloomin' failure.
> Number of hits: 0, setno 6
> Diagnostic message(s) from database:
>     [117] Unsupported Relation attribute -- v2 addinfo '1'
>
> Z> f @attr 1=1016 @attr 2=2 2005     [Less than or equal]
> Search was a bloomin' failure.
> Number of hits: 0, setno 7
> Diagnostic message(s) from database:
>     [117] Unsupported Relation attribute -- v2 addinfo '2'
>
> Z> f @attr 1=1016 @attr 2=3 2005 
> Number of hits: 310, setno 8
> SearchResult-1: term=music cnt=54
>
> Z> f @attr 1=1016 @attr 2=4 2005     [Greater or equal]
> Search was a bloomin' failure.
> Number of hits: 0, setno 9
> Diagnostic message(s) from database:
>     [117] Unsupported Relation attribute -- v2 addinfo '4'
>
> Z> f @attr 1=1016 @attr 2=5 2005     [Greater than]
> Search was a bloomin' failure.
> Number of hits: 0, setno 10
> Diagnostic message(s) from database:
>     [117] Unsupported Relation attribute -- v2 addinfo '5'
>
> Z> f @attr 1=1016 @attr 2=102 2005   [Relevance]
> Search was a success.
> Number of hits: 310, setno 11
> SearchResult-1: term=music cnt=54
>
> Z> f @attr 1=1016 @attr 2=103 2005   [AlwaysMatches]
> Number of hits: 17155, setno 12
> SearchResult-1: term= cnt=17155
>
> Z> f @attr 1=1016 @attr 5=1 wyomin    [Right truncation]
> Number of hits: 1, setno 13
> SearchResult-1: term=wyomin cnt=1
>
> Z> f @attr 1=1016 @attr 5=2 yoming    [Left truncation]
> Search was a bloomin' failure.
> Number of hits: 0, setno 14
> Diagnostic message(s) from database:
>     [120] Unsupported Truncation attribute -- v2 addinfo '2'
>
> Z> f @attr 1=1016 @attr 5=3 yomin     [Left and right truncation]
> Search was a bloomin' failure.
> Number of hits: 0, setno 15
> Diagnostic message(s) from database:
>     [120] Unsupported Truncation attribute -- v2 addinfo '3'
>
>
> Z> format xml
> Z> e zebra::index::name
> Z> s 1
> Record type: XML
> <record xmlns="http://www.indexdata.com/zebra/" sysno="10140" set="zebra::index::name/">
>   <index name="name" type="w" seq="11"></index> 
>   <index name="name" type="w" seq="1"></index> 
>   <index name="name" type="w" seq="12"></index>       [no data]
>   <index name="name" type="p" seq="11"></index>       [no data]
>   <index name="name" type="w" seq="13"></index>
>   <index name="name" type="w" seq="14"></index>
>   <index name="name" type="w" seq="15"></index>
>   <index name="name" type="p" seq="13"></index>
> </record>
>
> Z> e zebra::snippet
> Z> s 1
> Record type: XML
> <record xmlns="http://www.indexdata.com/zebra/">
>   <snippet name="any" type="w" fields="note"><s>Wyoming</s>forestindustrydirectory1999</snippet>
> </record>
>
> Z> e zebra::data
> Z> s 1
> Record type: XML
> <?xml version="1.0"?>
> <record xmlns="http://www.loc.gov/MARC21/slim">
>   <leader>00504cz  a2200169n 4500</leader>
>   <controlfield tag="001">no 99088913 </controlfield>
>   <controlfield tag="003">DLC</controlfield>
>   <controlfield tag="005">20081112052255.0</controlfield>
>   <controlfield tag="008">991213n| acannaabn          |b aaa c</controlfield>
>   <datafield tag="010" ind1=" " ind2=" "><subfield code="a">no 99088913</subfield></datafield>
>   <datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)oca05120188</subfield></datafield>
>   <datafield tag="040" ind1=" " ind2=" "><subfield code="a">WyU</subfield><subfield code="b">eng</subfield><subfield code="c">WyU</subfield><subfield code="d">OCoLC</subfield></datafield>
>   <datafield tag="100" ind1="1" ind2=" "><subfield code="a">Fan, Linbo</subfield></datafield>
>   <datafield tag="400" ind1="1" ind2=" "><subfield code="a">&#x6A0A;&#x6797;&#x6CE2;</subfield></datafield>
>   <datafield tag="667" ind1=" " ind2=" "><subfield code="a">Machine-derived non-Latin script reference project.</subfield></datafield>
>   <datafield tag="667" ind1=" " ind2=" "><subfield code="a">Non-Latin script reference not evaluated.</subfield></datafield>
>   <datafield tag="670" ind1=" " ind2=" "><subfield code="a">Wyoming forest industry directory, 1999:</subfield><subfield code="b">t.p. (Linbo Fan)</subfield></datafield>
> </record>
>
>
> ------------------------------------------------------------
> Larry E. Dixson                    Internet:    ldix at loc.gov
> Network Development and MARC
>    Standards Office, LA327
> Library of Congress                Telephone: (202) 707-5807
> Washington, D.C.  20540-4402       Fax:       (202) 707-0115
>
>



