<chapter id="querymodel">
<!-- $Id: querymodel.xml,v 1.6 2006-06-15 13:41:49 marc Exp $ -->
<title>Query Model</title>
<sect1 id="querymodel-overview">
<title>Query Model Overview</title>
<sect2 id="querymodel-query-languages">
<title>Query Languages</title>
Zebra was conceived as a networked Information Retrieval engine adhering
to the international standards
<ulink url="&url.z39.50;">Z39.50</ulink> and
<ulink url="&url.sru;">SRU</ulink>,
and it implements the query model defined there.
Unfortunately, the Z39.50 query model defines only a binary
encoded representation, which is used as transport packaging in
the Z39.50 protocol layer. This representation is not human
readable, nor does it define any convenient way to specify queries.
<!-- tell about RPN - include link to YAZ -->
<sect3 id="querymodel-query-languages-pqf">
<title>Prefix Query Format (PQF)</title>
Index Data has defined a textual representation in the
<literal>Prefix Query Format</literal>, or
<literal>PQF</literal> for short, which has since been adopted by other
parties developing Z39.50 software. It is also often referred to as
<literal>Prefix Query Notation</literal>, or
<literal>PQN</literal> for short, and is thoroughly explained in
<xref linkend="querymodel-pqf"/>.
<!-- PQF/RPN is natively supported. CQL is NOT. So we need a map -->
<sect3 id="querymodel-query-languages-cql">
<title>Common Query Language (CQL)</title>
In addition, Zebra can be configured to understand the
<literal>Common Query Language</literal>
(<ulink url="&url.cql;">CQL</ulink>)
and map it to PQF. See the introduction to the mapping to the internal query
representation in
<xref linkend="querymodel-cql-to-pqf"/>.
<sect2 id="querymodel-query-types">
<title>Query types</title>
<sect3 id="querymodel-query-type-explain">
<title>Explain Queries</title>
<sect3 id="querymodel-query-type-search">
<title>Search Queries</title>
<sect3 id="querymodel-query-type-scan">
<title>Scan Queries</title>
<sect1 id="querymodel-pqf">
<title>Prefix Query Format structure and syntax</title>
The <ulink url="&url.yaz.pqf;">PQF grammar</ulink>
is documented in the YAZ manual and is not
repeated here. During search, this textual PQF representation
is always mapped to the equivalent Zebra internal
representation.
<sect2 id="querymodel-pqf-tree">
<title>PQF tree structure</title>
The PQF parse tree, or the equivalent textual representation,
may start with one specification of the
<emphasis>attribute set</emphasis> used. It is followed by a query
tree, which
consists of <emphasis>atomic query parts</emphasis>, possibly
paired by <emphasis>boolean binary operators</emphasis>, and
finally <emphasis>recursively combined</emphasis> into
larger query trees.
<sect3 id="querymodel-attribute-sets">
<title>Attribute sets</title>
Attribute sets define the exact meaning and semantics of the queries
issued. Zebra comes with some predefined attribute set
definitions; others can easily be defined and added to the
configuration.
The Zebra internal query processing is modeled after
the <literal>Bib1</literal> attribute set, and the non-use
attributes type 2-6 are hard-wired in. It is therefore essential
to be familiar with <xref linkend="querymodel-bib1"/>.
<table id="querymodel-attribute-sets-table">
<caption>Attribute sets predefined in Zebra</caption>
<tr><td>one</td><td>two</td></tr>
<td><emphasis>exp-1</emphasis></td>
<td><literal>Explain</literal> attribute set</td>
<td>Special attribute set used on the special automagic
<literal>IR-Explain-1</literal> database to gain information on
server capabilities and database names.</td>
<td><emphasis>bib-1</emphasis></td>
<td><literal>Bib1</literal> attribute set</td>
<td>Standard PQF query language attribute set which defines the
semantics of Z39.50 searching. In addition, all of the
non-use attributes (type 2-9) define the Zebra internal query
processing.</td>
<td><emphasis>gils</emphasis></td>
<td><literal>GILS</literal> attribute set</td>
<td>Extension to the <literal>Bib1</literal> attribute set.</td>
<sect3 id="querymodel-boolean-operators">
<title>Boolean operators</title>
A pair of subquery trees, or of atomic queries, is combined
into a new query tree using the standard boolean operators.
<table id="querymodel-boolean-operators-table">
<caption>Boolean operators</caption>
<tr><td>one</td><td>two</td></tr>
<tr><td><emphasis>@and</emphasis></td>
<td>binary <literal>AND</literal> operator</td>
<td>Set intersection of two atomic queries' hit sets</td>
</tr>
<tr><td><emphasis>@or</emphasis></td>
<td>binary <literal>OR</literal> operator</td>
<td>Set union of two atomic queries' hit sets</td>
</tr>
<tr><td><emphasis>@not</emphasis></td>
<td>binary <literal>AND NOT</literal> operator</td>
<td>Set difference of two atomic queries' hit sets</td>
</tr>
<tr><td><emphasis>@prox</emphasis></td>
<td>binary <literal>PROXIMITY</literal> operator</td>
<td>Set intersection of two atomic queries' hit sets. In
addition, all documents which do not satisfy the requested
query term proximity are purged from the intersection set.
The result is usually a proper subset of the AND result.</td>
</tr>
For example, we can combine the terms
<emphasis>information</emphasis> and <emphasis>retrieval</emphasis>
into different searches in the default index of the default
attribute set as follows.
Querying for the union of all documents containing the
terms <emphasis>information</emphasis> OR
<emphasis>retrieval</emphasis>:
Z> find @or information retrieval
Querying for the intersection of all documents containing the
terms <emphasis>information</emphasis> AND
<emphasis>retrieval</emphasis>.
The hit set is a subset of the corresponding
OR query:
Z> find @and information retrieval
Querying for the intersection of all documents containing the
terms <emphasis>information</emphasis> AND
<emphasis>retrieval</emphasis>, taking proximity into account.
The hit set is a subset of the corresponding
AND query:
Z> find @prox information retrieval
Querying for the intersection of all documents containing the
terms <emphasis>information</emphasis> AND
<emphasis>retrieval</emphasis>, in the same order and near each
other as described in the term list.
The hit set is a subset of the corresponding
PROXIMITY query:
Z> find "information retrieval"
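<para>
The prefix notation used above can be illustrated with a small Python
sketch. It is not part of Zebra or YAZ; the class names
<literal>Term</literal> and <literal>Op</literal> are hypothetical and only
show how atomic queries and binary operators nest into a PQF string. The
quoting rule (quote empty terms and terms containing a space) is a
simplifying assumption, and <literal>@prox</literal> parameters are not
modeled.
</para>

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Term:
    """Atomic query part: a single term or a quoted term list."""
    text: str

    def to_pqf(self) -> str:
        # Assumption: quote the empty term and any term containing a space
        if self.text == "" or " " in self.text:
            return f'"{self.text}"'
        return self.text


@dataclass
class Op:
    """Binary boolean operator node, for example 'and', 'or' or 'not'."""
    name: str
    left: "Node"
    right: "Node"

    def to_pqf(self) -> str:
        # Prefix notation: the operator comes before both of its operands
        return f"@{self.name} {self.left.to_pqf()} {self.right.to_pqf()}"


Node = Union[Term, Op]

both = Op("and", Term("information"), Term("retrieval"))
print(both.to_pqf())  # @and information retrieval
```

<para>
Because each operator takes exactly two operands, nested trees need no
parentheses: the position of each <literal>@</literal> operator determines
the grouping.
</para>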
<sect3 id="querymodel-atomic-queries">
<title>Atomic queries</title>
Atomic queries are the query parts which work on one access point
only. They consist of an <literal>attribute list</literal>
followed by a <literal>single term</literal> or a
<literal>quoted term list</literal>.
Unsupplied non-use attributes type 2-9 are either inherited from
higher nodes in the query tree, or are set to Zebra's default values.
See <xref linkend="querymodel-bib1"/> for details.
<table id="querymodel-atomic-queries-table">
<caption>Atomic queries</caption>
<tr><td>one</td><td>two</td></tr>
<tr><td><emphasis>attribute list</emphasis></td>
<td>List of <literal>orthogonal</literal> attributes</td>
<td>Any of the orthogonal attribute types may be omitted;
these are inherited from higher query tree nodes or, if not
inherited, are set to the default Zebra configuration values.</td>
</tr>
<tr><td><emphasis>term</emphasis></td>
<td>single <literal>term</literal>
or <literal>quoted term list</literal></td>
<td>Here the search term or list of search terms is added
to the query.</td>
</tr>
Querying for the term <emphasis>information</emphasis> in the
default index using the default attribute set, the server's choice
of access point/index, and the default non-use attributes:
Z> find "information"
Equivalent query, fully specified:
Z> find @attrset bib-1 @attr 1=1017 @attr 2=3 @attr 3=3 @attr 4=1 @attr 5=100 @attr 6=1 "information"
Finding all documents which have empty titles. Notice that the
empty term must be quoted, but is otherwise legal.
<sect3 id="querymodel-use-string">
<title>Zebra's special use attribute type 1 of form 'string'</title>
The numeric <literal>use (type 1)</literal> attribute usually
refers to an access point defined in a given
attribute set. In addition, Zebra lets you use
<emphasis>any internal index
name defined in your configuration</emphasis>
as the use attribute value. This is a great feature for
debugging, and for cases where you do
not need the complexity of defined use attribute values. It is
the preferred way of accessing Zebra indexes directly.
Finding all documents which have the term list "information
retrieval" in a Zebra index, using its internal full string name:
Z> find @attr 1=sometext "information retrieval"
Searching the bib-1 use attribute 54 using its string name:
Z> find @attr 1=Code-language eng
Searching in any silly string index, provided it is defined in your
indexing rules and can be parsed by the PQF parser.
This is definitely not the recommended use of
this facility, as it may confuse your users:
Z> find @attr 1=silly/xpath/alike[@index]/name "information retrieval"
See <xref linkend="querymodel-bib1-mapping"/> for details, and
<xref linkend="server-sru"/>
for the SRU PQF query extension using string names as a fast
debugging facility.
<sect3 id="querymodel-use-xpath">
<title>Zebra's special use attribute type 1 of form 'XPath'
for GRS filters</title>
As we have seen above, it is possible (albeit seldom a great
idea) to emulate
<ulink url="http://www.w3.org/TR/xpath">XPath 1.0</ulink> based
search by defining <literal>use (type 1)</literal>
<emphasis>string</emphasis> attributes which in appearance
<emphasis>resemble XPath queries</emphasis>. There are two
problems with this approach: first, the XPath look-alike has to
be defined at indexing time, so no new, undefined
XPath queries can be entered at search time; and second, it may
confuse users that an XPath-like index name in fact
gets populated from a possibly entirely different XML element
than the one it pretends to access.
When using the <literal>GRS Record Model</literal>
(see <xref linkend="record-model-grs"/>), we have the
possibility to embed <emphasis>live</emphasis>
XPath expressions
in the PQF queries, which are here called
<literal>use (type 1)</literal> <emphasis>xpath</emphasis>
attributes. You must enable the
<literal>xpath enable</literal> directive in your
<literal>.abs</literal> config files.
Only a <emphasis>very</emphasis> restricted subset of the
<ulink url="http://www.w3.org/TR/xpath">XPath 1.0</ulink>
standard is supported, as the GRS record model is simpler than
a full XML DOM structure. See the following examples for
what is possible.
Finding all documents which have the term "content"
inside a text node found in a specific XML DOM
<emphasis>subtree</emphasis>, whose starting element is
addressed by the XPath:
Z> find @attr 1=/root content
Z> find @attr 1=/root/first content
<emphasis>Notice that the
XPath must be absolute, i.e., must start with '/', and that the
XPath <literal>descendant-or-self</literal> axis followed by a
text node selection <literal>text()</literal> is implicitly
appended to the stated XPath.</emphasis>
It follows that the above searches are interpreted as:
Z> find @attr 1=/root//text() content
Z> find @attr 1=/root/first//text() content
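<para>
The implicit rewriting described above can be mimicked in a few lines of
Python. This is only an illustration of the stated rule, not Zebra's
implementation; the function name <literal>expand_use_xpath</literal> is
hypothetical.
</para>

```python
def expand_use_xpath(xpath: str) -> str:
    """Return the XPath in the form the text says it is interpreted as."""
    # The XPath must be absolute, i.e. start with '/'
    if not xpath.startswith("/"):
        raise ValueError("XPath use attributes must be absolute")
    # The descendant-or-self axis and a text() selection are appended
    return xpath + "//text()"


print(expand_use_xpath("/root"))        # /root//text()
print(expand_use_xpath("/root/first"))  # /root/first//text()
```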
Filtering the addressing XPath by a predicate working on exact
string values in
attributes (in the XML sense) can be done: return all those documents which
have the term "english" contained in one of the text subnodes of
the subtree defined by the XPath
<literal>/record/title[@lang='en']</literal>:
Z> find @attr 1=/record/title[@lang='en'] english
Combining numeric indexes, boolean expressions,
and XPath based searches is possible:
Z> find @attr 1=/record/title @and foo bar
Z> find @and @attr 1=/record/title foo @attr 1=4 bar
Escaping PQF keywords and other non-parseable XPath constructs
with <literal>'{ }'</literal> to prevent syntax errors:
Z> find @attr {1=/root/first[@attr='danish']} content
Z> find @attr {1=/root/second[@attr='danish lake']}
Z> find @attr {1=/root/third[@attr='dansk s\xc3\xb8']}
It is worth mentioning that these dynamically performed XPath
queries are a performance bottleneck, as no optimized
specialized indexes can be used. Therefore, avoid the use of
this facility when speed is essential and the database content
size is medium to large.
<sect2 id="querymodel-exp1">
<title>Explain Attribute Set</title>
The Z39.50 standard defines the
<ulink url="&url.z39.50.explain;">Explain</ulink> attribute set
<literal>exp-1</literal>, which is used to discover information
about a server's search semantics and functional capabilities.
Zebra exposes a "classic"
Explain database by base name <literal>IR-Explain-1</literal>, which
is populated with system internal information.
The attribute set <literal>exp-1</literal> consists of a single
<literal>Use (type 1)</literal> attribute.
In addition, the non-Use
<literal>bib-1</literal> attributes, that is, the types
<literal>Relation</literal>, <literal>Position</literal>,
<literal>Structure</literal>, <literal>Truncation</literal>,
and <literal>Completeness</literal>, are imported from
the <literal>bib-1</literal> attribute set, and may be used
within any explain query.
<sect3 id="querymodel-exp1-use">
<title>Use Attributes (type = 1)</title>
The following Explain search attributes are supported:
<literal>ExplainCategory</literal> (@attr 1=1),
<literal>DatabaseName</literal> (@attr 1=3),
<literal>DateAdded</literal> (@attr 1=9),
<literal>DateChanged</literal> (@attr 1=10).
A search in the use attribute <literal>ExplainCategory</literal>
supports only these predefined values:
<literal>CategoryList</literal>, <literal>TargetInfo</literal>,
<literal>DatabaseInfo</literal>, <literal>AttributeDetails</literal>.
See <filename>tab/explain.att</filename> and the
<ulink url="&url.z39.50;">Z39.50</ulink> standard
for more information.
<title>Explain searches with yaz-client</title>
Classic Explain only defines retrieval of Explain information
via ASN.1. Practically no Z39.50 client supports this. Fortunately
they don't have to: Zebra allows retrieval of this information
in other formats:
<literal>SUTRS</literal>, <literal>XML</literal>,
<literal>GRS-1</literal> and <literal>ASN.1</literal> Explain.
List supported categories to find out which explain commands are
supported:
Z> find @attr exp1 1=1 categorylist
Get target info, that is, investigate which databases exist at
this server endpoint:
Z> find @attr exp1 1=1 targetinfo
List all supported databases. The number of hits
is the number of databases found, most commonly
the <literal>Default</literal> and the
<literal>IR-Explain-1</literal> databases.
Z> find @attr exp1 1=1 databaseinfo
Get database info record for database <literal>Default</literal>.
Z> find @and @attr exp1 1=1 databaseinfo @attr exp1 1=3 Default
Identical query with explicitly specified attribute set:
Z> find @attrset exp1 @and @attr 1=1 databaseinfo @attr 1=3 Default
Get attribute details record for database
<literal>Default</literal>.
This query is very useful for studying the internal Zebra indexes.
If records have been indexed using the <literal>alvis</literal>
XSLT filter, the string representation names of the known indexes can be
found this way.
Z> find @and @attr exp1 1=1 attributedetails @attr exp1 1=3 Default
Identical query with explicitly specified attribute set:
Z> find @attrset exp1 @and @attr 1=1 attributedetails @attr 1=3 Default
<sect2 id="querymodel-bib1">
<title>Bib1 Attribute Set</title>
Something about querying to be written ..
Most of the information contained in this section is an excerpt of
the <literal>ATTRIBUTE SET BIB-1 (Z39.50-1995) SEMANTICS</literal>
found at <ulink url="&url.z39.50.attset.bib1.1995;">The BIB-1
Attribute Set Semantics</ulink> from 1995, also available in an updated
<ulink url="&url.z39.50.attset.bib1;">Bib-1
Attribute Set</ulink>
version from 2003. Index Data is not the copyright holder of this
information.
<sect3 id="querymodel-bib1-use">
<title>Use Attributes (type 1)</title>
A use attribute specifies an access point for any atomic query.
These access points are highly dependent on the attribute set used
in the query, and are user configurable using the following
default configuration files:
<filename>tab/bib1.att</filename>,
<filename>tab/dan1.att</filename>,
<filename>tab/explain.att</filename>, and
<filename>tab/gils.att</filename>.
New attribute sets can be added by adding new
<filename>tab/*.att</filename> configuration files, which need to
be sourced in the main configuration <filename>zebra.cfg</filename>.
In addition, Zebra allows access to
<emphasis>internal index names</emphasis> and <emphasis>dynamic
XPath</emphasis> as use attributes.
See <xref linkend="querymodel-use-string"/> and
<xref linkend="querymodel-use-xpath"/> for
alternative access to the Zebra internal index names and XPath queries.
Phrase search for <emphasis>information retrieval</emphasis> in
the title register:
Z> find @attr 1=4 "information retrieval"
<sect3 id="querymodel-bib1-relation">
<title>Relation Attributes (type 2)</title>
Relation attributes describe the relationship of the access
point (left side
of the relation) to the search term as qualified by the attributes (right
side of the relation), e.g., Date-publication &lt;= 1975.
<table id="querymodel-bib1-relation-table">
<caption>Relation Attributes (type 2)</caption>
<td>Less than or equal</td>
<td>Greater or equal</td>
<td>Greater than</td>
<td>AlwaysMatches</td>
The relation attribute
<literal>relevance (102)</literal> is supported; see
<xref linkend="administration-ranking"/> for full information.
<!-- always-matches (103) not supported for all indexes -->
All ordering operations are based on a lexicographical ordering,
<emphasis>except</emphasis> when the
structure attribute <literal>numeric (109)</literal> is used. In
this case, ordering is numerical. See
<xref linkend="querymodel-bib1-structure"/>.
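<para>
The difference between the two orderings matters as soon as terms have
different lengths or signs. The following Python sketch only illustrates
the general effect; it uses Python's code point ordering as a stand-in for
lexicographical ordering, which is an assumption and may differ from the
ordering Zebra derives from its character maps.
</para>

```python
terms = ["9", "10", "100", "-114"]

# Lexicographical ordering compares character by character
lexicographic = sorted(terms)

# Numerical ordering compares the integer values
numeric = sorted(terms, key=int)

print(lexicographic)  # ['-114', '10', '100', '9']
print(numeric)        # ['-114', '9', '10', '100']

# A "less than or equal" relation therefore gives different answers
print("10" <= "9")    # True
print(10 <= 9)        # False
```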
Ranked search for <emphasis>information retrieval</emphasis> in
the title register
(see <xref linkend="administration-ranking"/> for the gory details):
Z> find @attr 1=4 @attr 2=102 "information retrieval"
<sect3 id="querymodel-bib1-position">
<title>Position Attributes (type 3)</title>
The position attribute specifies the location of the search term
within the field or subfield in which it appears.
<table id="querymodel-bib1-position-table">
<caption>Position Attributes (type 3)</caption>
<td>First in field</td>
<td>First in subfield</td>
<td>Any position in field</td>
The position attribute values <literal>first in field (1)</literal>
and <literal>first in subfield (2)</literal> are unsupported.
Using them does not trigger an error, but silently defaults to
<literal>any position in field (3)</literal>.
<sect3 id="querymodel-bib1-structure">
<title>Structure Attributes (type 4)</title>
The structure attribute specifies the type of search
term. This causes the search to be mapped onto
different Zebra internal indexes, which must have been defined
at index time.
The possible values of the
<literal>structure attribute (type 4)</literal> can be defined
using the configuration file
<filename>tab/default.idx</filename>.
The default configuration is summarized in this table.
<table id="querymodel-bib1-structure-table">
<caption>Structure Attributes (type 4)</caption>
<td>Date (normalized)</td>
<td>Date (un-normalized)</td>
<td>Name (normalized)</td>
<td>Name (un-normalized)</td>
<td>Free-form-text</td>
<td>Document-text</td>
<td>Local-number</td>
<td>Numeric string</td>
The structure attribute value <literal>local-number
(107)</literal>
is supported, and always maps to the Zebra internal document ID.
For example, in
the GILS schema (<literal>gils.abs</literal>), the
west-bounding-coordinate is indexed as type <literal>n</literal>,
and is therefore searched by specifying
<emphasis>structure</emphasis>=<emphasis>Numeric String</emphasis>.
To match all those records with west-bounding-coordinate greater
than -114 we use the following query:
Z> find @attr 4=109 @attr 2=5 @attr gils 1=2038 -114
<sect3 id="querymodel-bib1-truncation">
<title>Truncation Attributes (type = 5)</title>
The truncation attribute specifies whether variations of one or
more characters are allowed between search term and hit terms, or
not. Using non-default truncation attributes will broaden the
document hit set of a search query.
<table id="querymodel-bib1-truncation-table">
<caption>Truncation Attributes (type 5)</caption>
<td>Right truncation</td>
<td>Left truncation</td>
<td>Left and right truncation</td>
<td>Do not truncate</td>
<td>Process # in search term</td>
Truncation attribute value
<literal>Process # in search term (100)</literal> is a
poor man's regular expression search. It maps
each <literal>#</literal> to <literal>.*</literal>, and
then performs a <literal>Regexp-1 (102)</literal> regular
expression search.
Truncation attribute value
<literal>Regexp-1 (102)</literal> is a normal regular expression search;
see <xref linkend="querymodel-regular"/>.
Truncation attribute value
<literal>Regexp-2 (103)</literal> is a Zebra specific extension
which allows <emphasis>fuzzy</emphasis> matches. One single
error in the spelling of search terms is allowed, i.e., a document
is hit if it includes a term which can be mapped to the used
search term by a single character substitution, addition, or
deletion.
<!-- Special 104, 105, 106 are deprecated and will be removed! -->
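<para>
The two behaviours described above can be sketched in Python. This is an
illustration only and makes two assumptions: Python's
<literal>re</literal> module stands in for Zebra's own regular expression
engine, whose syntax differs, and the hypothetical helper
<literal>within_one_edit</literal> models the "one single error" rule as a
single character substitution, insertion, or deletion.
</para>

```python
import re


def process_hash(term: str) -> str:
    """Map each '#' in the search term to '.*', as truncation 100 does."""
    return term.replace("#", ".*")


def within_one_edit(a: str, b: str) -> bool:
    """True if a and b differ by at most one substitution, insertion
    or deletion of a single character."""
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a                    # make a the shorter (or equal) string
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1                         # skip the common prefix
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # at most one substitution
    return a[i:] == b[i + 1:]          # exactly one extra character in b


pattern = process_hash("inform#")
print(pattern)                                            # inform.*
print(re.fullmatch(pattern, "information") is not None)   # True
print(within_one_edit("retrievel", "retrieval"))          # True
print(within_one_edit("retreival", "retrieval"))          # False
```

<para>
The last line shows that, under this model, swapping two adjacent
characters counts as two errors and therefore does not match.
</para>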
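<sect3 id="querymodel-bib1-completeness">
<title>Completeness Attributes (type = 6)</title>
This attribute is only used when choosing between the register
types <literal>w</literal> and <literal>p</literal>; completeness is
ignored otherwise.
<literal>Incomplete field (1)</literal> is the default and makes Zebra use
register type <literal>w</literal>, whereas
<literal>complete subfield (2)</literal> and
<literal>complete field (3)</literal> both trigger
a search in register type <literal>p</literal>.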
<sect2 id="querymodel-zebra-attr-search">
<title>Zebra specific Search Extensions to all Attribute Sets</title>
Zebra extends the Bib1 attribute types, and these extensions are
recognized regardless of the attribute
set used in a <literal>search</literal> operation query.
<table id="querymodel-zebra-attr-search-table">
<caption>Zebra Search Attribute Extensions</caption>
<td>Zebra version</td>
<td>Embedded Sort</td>
<td>Rank Weight</td>
<td>Approx Limit</td>
<td>Term Reference</td>
<sect3 id="querymodel-zebra-attr-sorting">
<title>Zebra Extension Embedded Sort Attribute (type 7)</title>
The embedded sort is a way to specify sorting within a query, thus
removing the need to send a Sort Request separately. It is
faster and does not require clients to deal with the Sort
Facility.
The possible values after attribute <literal>type 7</literal> are
<literal>1</literal> (ascending) and
<literal>2</literal> (descending).
The attributes+term (APT) node is separate from the
rest of the query and must be <literal>@or</literal>'ed with it.
The term associated with the APT is the sorting level as an integer,
where <literal>0</literal> means primary sort,
<literal>1</literal> means secondary sort, and so forth.
See also <xref linkend="administration-ranking"/>.
For example, searching for water, sorted by title (ascending):
Z> find @or @attr 1=1016 water @attr 7=1 @attr 1=4 0
Or, searching for water, sorted by title ascending, then date descending:
Z> find @or @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 @attr 7=2 @attr 1=30 1
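<para>
Since every sort key adds one <literal>@or</literal> and one sort APT, such
queries are easy to generate. The Python helper below is a hypothetical
client-side convenience, not part of Zebra or YAZ; it simply follows the
pattern of the two example queries above.
</para>

```python
def with_sort(query: str, sort_keys: list[tuple[int, bool]]) -> str:
    """Wrap a PQF query with embedded sort APTs.

    sort_keys holds (use attribute, ascending) pairs, primary key first.
    """
    for level, (use, ascending) in enumerate(sort_keys):
        direction = 1 if ascending else 2   # type 7: 1 ascending, 2 descending
        # Each sort APT is @or'ed with the query built so far;
        # its term is the sort level (0 = primary, 1 = secondary, ...)
        query = f"@or {query} @attr 7={direction} @attr 1={use} {level}"
    return query


# Title (use attribute 4) ascending, then date (30) descending
print(with_sort("@attr 1=1016 water", [(4, True), (30, False)]))
```

<para>
With the two sort keys shown, the helper produces the second example query
above.
</para>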
<sect3 id="querymodel-zebra-attr-estimation">
<title>Zebra Extension Term Set Attribute (type 8)</title>
The Term Set feature is a facility that allows a search to store
hitting terms in a "pseudo" result set, thus combining a search (as usual)
with a scan-like facility. It requires a client that can handle named result
sets, since the search generates two result sets. The value for
attribute 8 is the name of a result set (string). The terms in
the named term set are returned as SUTRS records.
For example, searching for u in title, right truncated, and
storing the result in a term set named 'aset':
Z> find @attr 5=1 @attr 1=4 @attr 8=aset u
The model has one serious flaw: we don't know the size of the term
set. Experimental. Do not use in production code.
<sect3 id="querymodel-zebra-attr-weight">
<title>Zebra Extension Rank Weight Attribute (type 9)</title>
Rank weight is a way to pass a value to a ranking algorithm, so
that one APT has one value while another has a different one.
See also <xref linkend="administration-ranking"/>.
For example, searching for utah in title with weight 30 as well
as in any index with weight 20:
Z> find @attr 2=102 @or @attr 9=30 @attr 1=4 utah @attr 9=20 utah
<sect3 id="querymodel-zebra-attr-limit">
<title>Zebra Extension Approximative Limit Attribute (type 9)</title>
Newer Zebra versions normally estimate the hit count for every APT
(leaf) in the query tree. These hit counts are returned as part of
the searchResult-1 facility in the binary encoded Z39.50 search
response.
By setting a limit for the APT we can make Zebra switch to
approximate hit counts once a certain hit count limit is
reached. A value of zero means exact hit count.
For example, we might be interested in the exact hit count for a, but
for b we allow hit count estimates for 1000 and higher.
Z> find @and a @attr 9=1000 b
The estimated hit count facility makes searches faster, as one
only needs to process large hit lists partially.
This facility clashes with rank weight, because there all
documents in the hit lists need to be examined for scoring and
ranking.
It is an experimental
extension. Do not use in production code.
<sect3 id="querymodel-zebra-attr-termref">
<title>Zebra Extension Term Reference Attribute (type 10)</title>
Zebra supports the searchResult-1 facility. If attribute 10 is
given, it specifies a subqueryId value returned as part of the
search result. It is a way for a client to name an APT part of a
query.
Experimental. Do not use in production code.
<sect2 id="querymodel-zebra-attr-scan">
<title>Zebra specific Scan Extensions to all Attribute Sets</title>
Zebra extends the Bib1 attribute types, and these extensions are
recognized regardless of the attribute
set used in a <literal>scan</literal> operation query.
<table id="querymodel-zebra-attr-scan-table">
<caption>Zebra Scan Attribute Extensions</caption>
<td><emphasis>Name and Type</emphasis></td>
<td>Zebra version</td>
<td><emphasis>Result Set Narrow (type 8)</emphasis></td>
<td><emphasis>Approximative Limit (type 9)</emphasis></td>
<sect3 id="querymodel-zebra-attr-xyz">
<title>Zebra Extension Result Set Narrow (type 8)</title>
If attribute 8 is given for scan, the value is the name of a
result set. Each hit count in scan is @and'ed with the result set
given.
Experimental and buggy. Definitely not to be used in production code.
<sect3 id="querymodel-zebra-attr-xyz">
<title>Zebra Extension Approximative Limit (type 9)</title>
The approximative limit (as for search) is a way to enable approximate
hit counts for scan hit counts.
Experimental. Do not use in production code.
<sect2 id="querymodel-bib1-mapping">
<title>Mapping from Bib1 Attributes to Zebra internal
register indexes</title>
<!-- see in util/zebramap.c
if (completeness_value == 2 || completeness_value == 3)
*sort_flag = (sort_relation_value > 0) ? 1 : 0;
*search_type = "phrase";
strcpy(rank_type, "void");
if (relation_value == 102)
if (weight_value == -1)
sprintf(rank_type, "rank,w=%d,u=%d", weight_value, use_value);
if (relation_value == 103)
*search_type = "always";
switch (structure_value)
case 6: /* word list */
*search_type = "and-list";
case 105: /* free-form-text */
*search_type = "or-list";
case 106: /* document-text */
*search_type = "or-list";
case 1: /* phrase */
case 108: /* string */
*search_type = "phrase";
case 107: /* local-number */
*search_type = "local";
case 109: /* numeric string */
*search_type = "numeric";
*search_type = "phrase";
*search_type = "phrase";
*search_type = "phrase";
*search_type = "phrase";
-->
1336 <emphasis>Use</emphasis> attributes are interpreted according to the
1337 attribute sets which have been loaded in the
1338 <literal>zebra.cfg</literal> file, and are matched against specific
1339 fields as specified in the <literal>.abs</literal> file which
1340 describes the profile of the records which have been loaded.
1341 If no Use attribute is provided, a default of Bib-1 Any is assumed.
1345 If a <emphasis>Structure</emphasis> attribute of
1346 <emphasis>Phrase</emphasis> is used in conjunction with a
1347 <emphasis>Completeness</emphasis> attribute of
1348 <emphasis>Complete (Sub)field</emphasis>, the term is matched
1349 against the contents of the phrase (long word) register, if one
1350 exists for the given <emphasis>Use</emphasis> attribute.
1351 A phrase register is created for those fields in the
1352 <literal>.abs</literal> file that contains a
1353 <literal>p</literal>-specifier.
1354 <!-- ### whatever the hell _that_ is -->
1358 If <emphasis>Structure</emphasis>=<emphasis>Phrase</emphasis> is
1359 used in conjunction with <emphasis>Incomplete Field</emphasis> - the
1360 default value for <emphasis>Completeness</emphasis>, the
1361 search is directed against the normal word registers, but if the term
1362 contains multiple words, the term will only match if all of the words
1363 are found immediately adjacent, and in the given order.
1364 The word search is performed on those fields that are indexed as
1365 type <literal>w</literal> in the <literal>.abs</literal> file.
1369 If the <emphasis>Structure</emphasis> attribute is
1370 <emphasis>Word List</emphasis>,
1371 <emphasis>Free-form Text</emphasis>, or
1372 <emphasis>Document Text</emphasis>, the term is treated as a
1373 natural-language, relevance-ranked query.
1374 This search type uses the word register, i.e. those fields
1375 that are indexed as type <literal>w</literal> in the
1376 <literal>.abs</literal> file.
1380 If the <emphasis>Structure</emphasis> attribute is
1381 <emphasis>Numeric String</emphasis> the term is treated as an integer.
1382 The search is performed on those fields that are indexed
1383 as type <literal>n</literal> in the <literal>.abs</literal> file.
1387 If the <emphasis>Structure</emphasis> attribute is
1388 <emphasis>URx</emphasis> the term is treated as a URX (URL) entity.
1389 The search is performed on those fields that are indexed as type
1390 <literal>u</literal> in the <literal>.abs</literal> file.
1394 If the <emphasis>Structure</emphasis> attribute is
1395 <emphasis>Local Number</emphasis> the term is treated as
1396 native Zebra Record Identifier.
1400 If the <emphasis>Relation</emphasis> attribute is
1401 <emphasis>Equals</emphasis> (default), the term is matched
1402 in a normal fashion (modulo truncation and processing of
1403 individual words, if required).
1404 If <emphasis>Relation</emphasis> is <emphasis>Less Than</emphasis>,
1405 <emphasis>Less Than or Equal</emphasis>,
1406 <emphasis>Greater than</emphasis>, or <emphasis>Greater than or
1407 Equal</emphasis>, the term is assumed to be numerical, and a
1408 standard regular expression is constructed to match the given
1410 If <emphasis>Relation</emphasis> is <emphasis>Relevance</emphasis>,
1411 the standard natural-language query processor is invoked.
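<para>
 As a sketch of the relation attribute in use, the following
 <literal>yaz-client</literal> queries assume a database in which the
 Bib-1 use attributes 31 (Date of publication) and 4 (Title) have been
 indexed. The indexes and search terms are chosen purely for
 illustration:
</para>
<screen>
 Z> find @attr 1=31 @attr 2=4 1990
 Z> find @attr 1=4 @attr 2=102 "information retrieval"
</screen>
<para>
 The first query uses <emphasis>Greater than or Equal</emphasis>
 (<literal>@attr 2=4</literal>) and treats the term as numerical; the
 second selects <emphasis>Relevance</emphasis>
 (<literal>@attr 2=102</literal>).
</para>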
1415 For the <emphasis>Truncation</emphasis> attribute,
1416 <emphasis>No Truncation</emphasis> is the default.
1417 <emphasis>Left Truncation</emphasis> is not supported.
1418 <emphasis>Process # in search term</emphasis> is supported, as is
1419 <emphasis>Regxp-1</emphasis>.
<emphasis>Regxp-2</emphasis> enables fault-tolerant (fuzzy)
search. By default, a single error (deletion, insertion, or
replacement) is accepted when terms are matched against the register.
1427 <sect2 id="querymodel-regular">
1428 <title>Zebra Regular Expressions in Truncation Attribute (type = 5)</title>
1431 Each term in a query is interpreted as a regular expression if
1432 the truncation value is either <emphasis>Regxp-1 (@attr 5=102)</emphasis>
1433 or <emphasis>Regxp-2 (@attr 5=103)</emphasis>.
1434 Both query types follow the same syntax with the operands:
1437 <table id="querymodel-regular-operands-table">
1438 <caption>Regular Expression Operands</caption>
<tr><td>Operand</td><td>Meaning</td></tr>
1446 <td><emphasis>x</emphasis></td>
1447 <td>Matches the character <emphasis>x</emphasis>.</td>
1450 <td><emphasis>.</emphasis></td>
1451 <td>Matches any character.</td>
1454 <td><emphasis>[ .. ]</emphasis></td>
1455 <td>Matches the set of characters specified;
1456 such as <literal>[abc]</literal> or <literal>[a-c]</literal>.</td>
1462 The above operands can be combined with the following operators:
1466 <table id="querymodel-regular-operators-table">
1467 <caption>Regular Expression Operators</caption>
<tr><td>Operator</td><td>Meaning</td></tr>
1475 <td><emphasis>x*</emphasis></td>
1476 <td>Matches <emphasis>x</emphasis> zero or more times.
1477 Priority: high.</td>
1480 <td><emphasis>x+</emphasis></td>
1481 <td>Matches <emphasis>x</emphasis> one or more times.
1482 Priority: high.</td>
1485 <td><emphasis>x?</emphasis></td>
<td>Matches <emphasis>x</emphasis> zero or one time.
1487 Priority: high.</td>
1490 <td><emphasis>xy</emphasis></td>
1491 <td> Matches <emphasis>x</emphasis>, then <emphasis>y</emphasis>.
1492 Priority: medium.</td>
1495 <td><emphasis>x|y</emphasis></td>
1496 <td> Matches either <emphasis>x</emphasis> or <emphasis>y</emphasis>.
1500 <td><emphasis>( )</emphasis></td>
1501 <td>The order of evaluation may be changed by using parentheses.</td>
If the first character of the <emphasis>Regxp-2</emphasis> query
is a plus character (<literal>+</literal>), it marks the
beginning of a section with non-standard specifiers.
The next plus character marks the end of the section.
Currently Zebra supports only one specifier, the error tolerance,
which consists of one digit.
Since the plus operator is normally a suffix operator, this addition to
the query syntax does not violate the syntax of standard regular
expressions.
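<para>
 Following the specifier syntax described here, a fuzzy title search
 with the error tolerance raised from the default of one to two might
 be written as shown. The misspelled search term and the tolerance
 value are illustrative assumptions, not taken from a real session:
</para>
<screen>
 Z> find @attr 1=4 @attr 5=103 "+2+informaton"
</screen>
<para>
 The leading <literal>+2+</literal> is the specifier section; the
 remainder, <literal>informaton</literal>, is the expression that is
 matched with the given tolerance.
</para>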
1522 For example, a phrase search with regular expressions in
1523 the title-register is performed like this:
1525 Z> find @attr 1=4 @attr 5=102 "informat.* retrieval"
Combinations with other attributes are possible. For example, a
ranked search with a regular expression
(see <xref linkend="administration-ranking"/> for the gory details):
1534 Z> find @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval"
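<para>
 A few further sketches combine the operands and operators listed in
 this section. The search terms are hypothetical, and each query
 assumes that the title index (<literal>@attr 1=4</literal>) is
 available:
</para>
<screen>
 Z> find @attr 1=4 @attr 5=102 "colou?r"
 Z> find @attr 1=4 @attr 5=102 "(data|knowledge)base"
 Z> find @attr 1=4 @attr 5=102 "[a-c].*"
</screen>
<para>
 The first expression makes the <literal>u</literal> optional, so it
 covers both <literal>color</literal> and <literal>colour</literal>.
 The second uses parentheses to apply the alternation before the
 concatenation with <literal>base</literal>. The third covers terms
 beginning with <literal>a</literal>, <literal>b</literal>, or
 <literal>c</literal>.
</para>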
1542 The RecordType parameter in the <literal>zebra.cfg</literal> file, or
1543 the <literal>-t</literal> option to the indexer tells Zebra how to
1544 process input records.
Two basic types of processing are available: raw text and structured
1546 data. Raw text is just that, and it is selected by providing the
1547 argument <emphasis>text</emphasis> to Zebra. Structured records are
1548 all handled internally using the basic mechanisms described in the
1549 subsequent sections.
1550 Zebra can read structured records in many different formats.
1556 <sect1 id="querymodel-cql-to-pqf">
1557 <title>Server Side CQL to PQF Query Translation</title>
Using the
<literal>&lt;cql2rpn&gt;l2rpn.txt&lt;/cql2rpn&gt;</literal>
YAZ Frontend Virtual Hosts option, one can configure
the YAZ Frontend CQL-to-PQF converter, specifying the
interpretation of various
<ulink url="&url.cql;">CQL</ulink>
indexes, relations, etc. in terms of Type-1 query attributes.
1567 <!-- The yaz-client config file -->
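<para>
 As a rough sketch, the relevant part of a YAZ Frontend Virtual Hosts
 configuration file might look as follows. The server
 <literal>id</literal> is an arbitrary placeholder and the mapping
 file name is simply the one mentioned in this section; the YAZ manual
 remains the authoritative reference for this file format:
</para>
<screen><![CDATA[
 <yazgfs>
   <server id="main">
     <cql2rpn>l2rpn.txt</cql2rpn>
   </server>
 </yazgfs>
]]></screen>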
For example, using server-side CQL-to-PQF conversion, one might
query a Zebra server like this:
1574 yaz-client localhost:9999
1576 Z> find text=(plant and soil)
If properly configured, even static relevance ranking can
be performed using CQL query syntax:
1583 Z> find text = /relevant (plant and soil)
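<para>
 Whether such a query works depends on the CQL-to-PQF mapping file.
 As an illustrative fragment only, entries of the following kind map
 the CQL server-choice index and the <literal>relevant</literal>
 relation modifier to Bib-1 attributes. The attribute values shown are
 assumptions for a typical setup and should be checked against the
 actual mapping file in use:
</para>
<screen>
 index.cql.serverChoice = 1=1016
 relationModifier.relevant = 2=102
</screen>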
The same configuration can also be used to
search using client-side CQL-to-PQF conversion.
The only differences are that <literal>querytype cql2rpn</literal>
is used instead of
<literal>querytype cql</literal>, and that the call specifies a local
conversion file:
1597 yaz-client -q local/cql2pqf.txt localhost:9999
1598 Z> querytype cql2rpn
1599 Z> find text=(plant and soil)
Exhaustive information can be found in the
section "Specification of CQL to RPN mappings" in the YAZ manual,
<ulink url="http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql.map">
http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql.map</ulink>,
and is therefore not repeated here.
See
<ulink url="http://www.loc.gov/z3950/agency/zing/cql/dc-indexes.html">
http://www.loc.gov/z3950/agency/zing/cql/dc-indexes.html</ulink>
for the Maintenance Agency's work-in-progress mapping of Dublin Core
indexes to Attribute Architecture (util, XD and BIB-2) attributes.
1627 <!-- Keep this comment at the end of the file
1632 sgml-minimize-attributes:nil
1633 sgml-always-quote-attributes:t
1636 sgml-parent-document: "zebra.xml"
1637 sgml-local-catalogs: nil
1638 sgml-namecase-general:t