1 <chapter id="querymodel">
2 <!-- $Id: querymodel.xml,v 1.10 2006-06-21 13:32:33 marc Exp $ -->
3 <title>Query Model</title>
5 <sect1 id="querymodel-overview">
6 <title>Query Model Overview</title>
9 <sect2 id="querymodel-query-languages">
10 <title>Query Languages</title>
Zebra was designed as a networked Information Retrieval engine adhering
to the international standards
<ulink url="&url.z39.50;">Z39.50</ulink> and
<ulink url="&url.sru;">SRU</ulink>,
and implements the
<literal>type-1 Reverse Polish Notation (RPN)</literal> query
model defined there.
Unfortunately, this model defines only a binary-encoded
representation, which is used as transport packaging in
the Z39.50 protocol layer. This representation is not
human-readable, nor does it define any convenient way to specify queries.
26 Since the <literal>type-1 (RPN)</literal>
27 query structure has no direct, useful string
28 representation, every origin application needs to provide some
29 form of mapping from a local query notation or representation to it.
33 <sect3 id="querymodel-query-languages-pqf">
34 <title>Prefix Query Format (PQF)</title>
Index Data has defined a textual representation in the
<literal>Prefix Query Format</literal>, or
<literal>PQF</literal> for short, which maps
<literal>one-to-one</literal> to binary-encoded
<literal>type-1 RPN</literal> query packages.
It has been adopted by other
parties developing Z39.50 software, and is often referred to as
<literal>Prefix Query Notation</literal>, or
<literal>PQN</literal> for short. See
<xref linkend="querymodel-pqf"/> for further explanations and
descriptions of Zebra's capabilities.
51 <sect3 id="querymodel-query-languages-cql">
52 <title>Common Query Language (CQL)</title>
The <literal>type-1 RPN</literal> query model,
expressed in <literal>PQF/PQN</literal>, is natively supported.
On the other hand, the default <literal>SRU</literal>
web service <literal>Common Query Language</literal>
(<ulink url="&url.cql;">CQL</ulink>) is not natively supported.
61 Zebra can be configured to understand and map CQL to PQF. See
62 <xref linkend="querymodel-cql-to-pqf"/>.
68 <sect2 id="querymodel-operation-types">
69 <title>Operation types</title>
Zebra supports all three
<literal>Z39.50/SRU</literal> operations defined in the
standards: <literal>explain</literal>, <literal>search</literal>,
and <literal>scan</literal>. The functionality and purpose of
each is briefly described here.
78 <sect3 id="querymodel-operation-type-explain">
79 <title>Explain Operation</title>
The <emphasis>syntax</emphasis> of Z39.50/SRU queries is
well known to any client, but the specific
<emphasis>semantics</emphasis>, which depend on a
particular server's functionality and abilities, must be
discovered case by case. This is the purpose of the
<literal>explain</literal> operation, which provides the means
to learn which
<emphasis>fields</emphasis> (also called
<emphasis>indexes</emphasis> or <emphasis>access points</emphasis>)
are provided, which default parameters the server uses, which
document retrieval formats are defined, and which specific parts
of the general query model are supported.
Z39.50 embeds the <literal>explain</literal> operation
by performing a
<literal>search</literal> in the magic
<literal>IR-Explain-1</literal> database;
see <xref linkend="querymodel-exp1"/>.
In SRU, <literal>explain</literal> is an entirely separate
operation, which returns a <literal>ZeeRex
XML</literal> record according to the
structure defined by the protocol.
In both cases, the information gathered through
<literal>explain</literal> operations can be used to
auto-configure a client user interface to the server's
capabilities.
115 <sect3 id="querymodel-operation-type-search">
116 <title>Search Operation</title>
Search and retrieve interactions are the raison d'être of
Z39.50/SRU servers.
They are used to query the remote database and
return search result documents. Search queries range from
simple free-text searches to complex nested boolean queries
that target specific indexes and may be refined with many
query semantic specifications.
128 <sect3 id="querymodel-operation-type-scan">
129 <title>Scan Operation</title>
The <literal>scan</literal> operation is a helper function
which operates on one index or access point at a time.
It provides
the means to investigate the content of specific indexes.
Scanning an index returns a handful of terms actually found in
the index, and in addition the <literal>scan</literal>
operation returns the number of documents indexed by each term.
140 A search client can use this information to propose proper
141 spelling of search terms, to auto-fill search boxes, or to
142 display controlled vocabularies.
151 <sect1 id="querymodel-pqf">
152 <title>Prefix Query Format structure and syntax</title>
The <ulink url="&url.yaz.pqf;">PQF grammar</ulink>
is documented in the YAZ manual, and is not
repeated here. During search, this textual PQF representation
is always mapped to the equivalent Zebra internal
query parse tree.
161 <sect2 id="querymodel-pqf-tree">
162 <title>PQF tree structure</title>
The PQF parse tree, or the equivalent textual representation,
may start with one specification of the
<emphasis>attribute set</emphasis> used. It is followed by a query
tree, which
consists of <emphasis>atomic query parts (APT)</emphasis> or
<emphasis>named result sets</emphasis>, optionally
paired by <emphasis>binary boolean operators</emphasis>, and
finally <emphasis>recursively combined</emphasis> into
complex query trees.
175 <sect3 id="querymodel-attribute-sets">
176 <title>Attribute sets</title>
Attribute sets define the exact meaning and semantics of the queries
issued. Zebra comes with some predefined attribute set
definitions; others can easily be defined and added to the
configuration.
185 <table id="querymodel-attribute-sets-table"
186 frame="all" rowsep="1" colsep="1" align="center">
188 <caption>Attribute sets predefined in Zebra</caption>
192 <td>Attribute set</td>
201 <td><literal>Explain</literal></td>
202 <td><literal>exp-1</literal></td>
203 <td>Special attribute set used on the special automagic
204 <literal>IR-Explain-1</literal> database to gain information on
205 server capabilities, database names, and database
210 <td><literal>Bib1</literal></td>
211 <td><literal>bib-1</literal></td>
212 <td>Standard PQF query language attribute set which defines the
213 semantics of Z39.50 searching. In addition, all of the
214 non-use attributes (type 2-9) define the hard-wired
220 <td><literal>GILS</literal></td>
221 <td><literal>gils</literal></td>
<td>Extension to the <literal>Bib1</literal> attribute set.</td>
226 <td><literal>IDXPATH</literal></td>
227 <td><literal>idxpath</literal></td>
<td>Hardwired XPath-like attribute set, only available for
indexing with the GRS record model</td>
237 The use attributes (type 1) of the predefined attribute sets can
238 be reconfigured by tweaking the files
239 <filename>tab/*.att</filename>.
240 New attribute sets can be defined by adding similar files in the
241 configuration path of the server.
245 The Zebra internal query processing is modeled after
246 the <literal>Bib1</literal> attribute set, and the non-use
247 attributes type 2-6 are hard-wired in. It is therefore essential
248 to be familiar with <xref linkend="querymodel-bib1-nonuse"/>.
252 <sect3 id="querymodel-boolean-operators">
253 <title>Boolean operators</title>
255 A pair of subquery trees, or of atomic queries, is combined
256 using the standard boolean operators into new query trees.
259 <table id="querymodel-boolean-operators-table"
260 frame="all" rowsep="1" colsep="1" align="center">
262 <caption>Boolean operators</caption>
<tr><td>Operator</td><td>Name</td><td>Description</td></tr>
269 <tr><td><literal>@and</literal></td>
270 <td>binary <literal>AND</literal> operator</td>
<td>Set intersection of two atomic queries' hit sets</td>
273 <tr><td><literal>@or</literal></td>
274 <td>binary <literal>OR</literal> operator</td>
<td>Set union of two atomic queries' hit sets</td>
277 <tr><td><literal>@not</literal></td>
278 <td>binary <literal>AND NOT</literal> operator</td>
<td>Set difference (relative complement) of two atomic queries' hit sets</td>
281 <tr><td><literal>@prox</literal></td>
<td>binary <literal>PROXIMITY</literal> operator</td>
<td>Set intersection of two atomic queries' hit sets. In
addition, the intersection set is purged of all
documents which do not satisfy the requested query
term proximity. Usually a proper subset of the AND
293 For example, we can combine the terms
294 <emphasis>information</emphasis> and <emphasis>retrieval</emphasis>
295 into different searches in the default index of the default
296 attribute set as follows.
297 Querying for the union of all documents containing the
298 terms <emphasis>information</emphasis> OR
299 <emphasis>retrieval</emphasis>:
301 Z> find @or information retrieval
305 Querying for the intersection of all documents containing the
306 terms <emphasis>information</emphasis> AND
307 <emphasis>retrieval</emphasis>:
The hit set is a subset of the corresponding OR query.
311 Z> find @and information retrieval
315 Querying for the intersection of all documents containing the
316 terms <emphasis>information</emphasis> AND
317 <emphasis>retrieval</emphasis>, taking proximity into account:
The hit set is a subset of the corresponding AND query.
321 Z> find @prox information retrieval
325 Querying for the intersection of all documents containing the
326 terms <emphasis>information</emphasis> AND
327 <emphasis>retrieval</emphasis>, in the same order and near each
other as described in the term list.
The hit set is a subset of the corresponding PROXIMITY query.
332 Z> find "information retrieval"
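<para>
Because PQF is a plain prefix notation, query trees like the ones above
can be assembled with ordinary string handling. The following Python
sketch is an illustration only: the helper names
<literal>quote</literal> and <literal>combine</literal> are hypothetical
and not part of Zebra or YAZ, and only <literal>@and</literal>,
<literal>@or</literal> and <literal>@not</literal> are covered.
</para>

```python
# Sketch only: compose PQF boolean query trees as plain strings.
# The helper names are hypothetical, not a Zebra or YAZ API.

def quote(term):
    """Quote a term list (or an empty term) so it stays one operand."""
    if " " in term or term == "":
        return '"%s"' % term
    return term


def combine(operator, left, right):
    """Prefix notation: the operator precedes both of its operands."""
    if operator not in ("@and", "@or", "@not"):
        raise ValueError("unsupported operator: %s" % operator)
    return "%s %s %s" % (operator, left, right)


union = combine("@or", "information", "retrieval")
both = combine("@and", "information", "retrieval")
# An operand may itself be a subquery tree, which gives nested queries.
nested = combine("@and", union, quote("query model"))

print(union)
print(both)
print(nested)
```

<para>
Each resulting string can be pasted after <literal>find</literal> in a
<literal>yaz-client</literal> session.
</para>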
338 <sect3 id="querymodel-atomic-queries">
339 <title>Atomic queries (APT)</title>
Atomic queries are the query parts which work on one access point
342 only. These consist of <literal>an attribute list</literal>
343 followed by a <literal>single term</literal> or a
344 <literal>quoted term list</literal>, and are often called
345 <emphasis>Attributes-Plus-Terms (APT)</emphasis> queries.
348 Unsupplied non-use attributes type 2-9 are either inherited from
349 higher nodes in the query tree, or are set to Zebra's default values.
350 See <xref linkend="querymodel-bib1"/> for details.
353 <table id="querymodel-atomic-queries-table"
354 frame="all" rowsep="1" colsep="1" align="center">
356 <caption>Atomic queries</caption>
<tr><td>Part</td><td>Content</td><td>Description</td></tr>
363 <tr><td><emphasis>attribute list</emphasis></td>
364 <td>List of <literal>orthogonal</literal> attributes</td>
<td>Any of the orthogonal attribute types may be omitted.
Omitted types are inherited from higher query tree nodes or, if not
inherited, are set to the default Zebra configuration values.
370 <tr><td><emphasis>term</emphasis></td>
371 <td>single <literal>term</literal>
372 or <literal>quoted term list</literal> </td>
373 <td>Here the search terms or list of search terms is added
Querying for the term <emphasis>information</emphasis> in the
default index using the default attribute set, the server's choice
of access point/index, and the default non-use attributes:
383 Z> find "information"
387 Equivalent query fully specified including all default values:
389 Z> find @attrset bib-1 @attr 1=1017 @attr 2=3 @attr 3=3 @attr 4=1 @attr 5=100 @attr 6=1 "information"
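<para>
The effect of the defaults can be mimicked on the client side. The
Python sketch below is a hedged illustration: the default values are
copied from the fully specified query above, the function name
<literal>fully_specified</literal> is hypothetical, and inheritance from
higher query tree nodes is not modeled. Zebra itself applies the
defaults on the server.
</para>

```python
# Sketch only: expand an atomic query with the default attribute values
# taken from the fully specified example query in this section.

DEFAULTS = {1: "1017", 2: "3", 3: "3", 4: "1", 5: "100", 6: "1"}


def fully_specified(term, attrs=None, attrset="bib-1"):
    """Return the PQF string with every attribute type 1-6 spelled out."""
    merged = dict(DEFAULTS)
    merged.update(attrs or {})  # explicitly supplied attributes win
    parts = ["@attrset %s" % attrset]
    for attr_type in sorted(merged):
        parts.append("@attr %d=%s" % (attr_type, merged[attr_type]))
    parts.append('"%s"' % term)
    return " ".join(parts)


print(fully_specified("information"))
# Overriding only the use attribute (type 1) keeps the other defaults.
print(fully_specified("information", {1: "4"}))
```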
394 Finding all documents which have empty titles. Notice that the
395 empty term must be quoted, but is otherwise legal.
404 <sect3 id="querymodel-resultset">
405 <title>Named Result Sets</title>
407 Named result sets are supported in Zebra, and result sets can be
408 used as operands without limitations.
After the execution of a search, the result set is available at
the server, so that the client can use it for subsequent
searches or retrieval requests. The Z39.50 standard, however,
stresses that result sets are volatile: a result set may cease
to exist at any time after the search, in which case the server
sends a diagnostic to the effect that the requested
result set no longer exists.
421 Defining a named result set and re-using it in the next query,
422 using <literal>yaz-client</literal>.
424 Z> f @attr 1=4 mozart
426 Number of hits: 43, setno 1
428 Z> f @and @set 1 @attr 1=4 amadeus
430 Number of hits: 14, setno 2
432 Z> f @attr 1=1016 beethoven
434 Number of hits: 26, setno 3
Named result sets are only supported by the Z39.50 protocol.
The SRU web service is stateless, and therefore the notion of
named result sets does not exist when accessing a Zebra server by
the SRU protocol.
448 <sect3 id="querymodel-use-string">
449 <title>Zebra's special use attribute type 1 of form 'string'</title>
The numeric <literal>use (type 1)</literal> attribute usually
refers to a value defined in a given
attribute set. In addition, Zebra lets you use
<emphasis>any internal index
name defined in your configuration</emphasis>
as the use attribute value. This is a great feature for
debugging, and for cases where you do
not need the complexity of defined use attribute values. It is
the preferred way of accessing Zebra indexes directly.
Finding all documents which have the term list "information
retrieval" in a Zebra index, using its internal full string name:
465 Z> find @attr 1=sometext "information retrieval"
Searching the bib-1 use attribute 54 using its string name:
471 Z> find @attr 1=Code-language eng
Searching in any silly string index, provided it is defined in your
indexing rules and can be parsed by the PQF parser.
This is definitely not the recommended use of
this facility, as it might confuse your users:
481 Z> find @attr 1=silly/xpath/alike[@index]/name "information retrieval"
See <xref linkend="querymodel-bib1-mapping"/> for details, and
<xref linkend="server-sru"/>
for the SRU PQF query extension using string names.
492 <sect3 id="querymodel-use-xpath">
493 <title>Zebra's special use attribute type 1 of form 'XPath'
494 for GRS filters</title>
As we have seen above, it is possible (albeit seldom a great
idea) to emulate
<ulink url="http://www.w3.org/TR/xpath">XPath 1.0</ulink> based
search by defining <literal>use (type 1)</literal>
<emphasis>string</emphasis> attributes which in appearance
<emphasis>resemble XPath queries</emphasis>. There are two
problems with this approach. First, the XPath look-alike has to
be defined at indexing time, so no new, undefined
XPath queries can be entered at search time. Second, it might
confuse users that an XPath-like index name may in fact
be populated from an entirely different XML element
than the one it pretends to access.
When using the <literal>GRS Record Model</literal>
(see <xref linkend="record-model-grs"/>), we have the
possibility to embed <emphasis>live</emphasis>
XPath expressions
in the PQF queries, which are here called
<literal>use (type 1)</literal> <emphasis>xpath</emphasis>
attributes. You must enable the
<literal>xpath enable</literal> directive in your
<literal>.abs</literal> config files.
Only a <emphasis>very</emphasis> restricted subset of the
<ulink url="http://www.w3.org/TR/xpath">XPath 1.0</ulink>
standard is supported, as the GRS record model is simpler than
a full XML DOM structure. The following examples show what is possible.
Finding all documents which have the term "content"
inside a text node found in a specific XML DOM
<emphasis>subtree</emphasis>, whose starting element is
addressed by XPath:
533 Z> find @attr 1=/root content
534 Z> find @attr 1=/root/first content
<emphasis>Notice that the
XPath must be absolute, i.e., must start with '/', and that the
XPath <literal>descendant-or-self</literal> axis followed by a
text node selection <literal>text()</literal> is implicitly
appended to the stated XPath.</emphasis>
542 It follows that the above searches are interpreted as:
544 Z> find @attr 1=/root//text() content
545 Z> find @attr 1=/root/first//text() content
The addressing XPath can be filtered by a predicate working on exact
string values of
attributes (in the XML sense). The following query returns all documents which
have the term "english" contained in any text subnode of
the subtree defined by the XPath
<literal>/record/title[@lang='en']</literal>:
557 Z> find @attr 1=/record/title[@lang='en'] english
562 Combining numeric indexes, boolean expressions,
563 and xpath based searches is possible:
565 Z> find @attr 1=/record/title @and foo bar
566 Z> find @and @attr 1=/record/title foo @attr 1=4 bar
570 Escaping PQF keywords and other non-parseable XPath constructs
571 with <literal>'{ }'</literal> to prevent syntax errors:
573 Z> find @attr {1=/root/first[@attr='danish']} content
574 Z> find @attr {1=/root/second[@attr='danish lake']}
575 Z> find @attr {1=/root/third[@attr='dansk s\xc3\xb8']}
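<para>
The two rules stated in this section, namely that the XPath must be
absolute and that awkward constructs are wrapped in
<literal>'{ }'</literal>, can be captured in a small helper. This
Python sketch is an assumption-laden illustration: the function name
<literal>xpath_apt</literal> is hypothetical, and the choice to add
braces whenever the attribute contains a space or an
<literal>@</literal> character is a conservative guess, not a rule
taken from the PQF grammar.
</para>

```python
# Sketch only: build an XPath use attribute (type 1) for a PQF query.
# xpath_apt is a hypothetical helper, not part of Zebra or YAZ.

def xpath_apt(xpath, term):
    """Return '@attr 1=XPATH TERM', braced when the XPath looks risky."""
    if not xpath.startswith("/"):
        raise ValueError("XPath use attributes must be absolute: %r" % xpath)
    spec = "1=%s" % xpath
    # Assumption: bracing is harmless, so brace on spaces and on '@',
    # which would otherwise look like the start of a PQF keyword.
    if " " in spec or "@" in spec:
        spec = "{%s}" % spec
    return "@attr %s %s" % (spec, term)


print(xpath_apt("/root/first", "content"))
print(xpath_apt("/root/first[@attr='danish']", "content"))
```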
It is worth mentioning that these dynamically performed XPath
queries are a performance bottleneck, as no optimized
specialized indexes can be used. Therefore, avoid the use of
this facility when speed is essential and the database content
size is medium to large.
590 <sect2 id="querymodel-exp1">
591 <title>Explain Attribute Set</title>
The Z39.50 standard defines the
<ulink url="&url.z39.50.explain;">Explain</ulink> attribute set
<literal>exp-1</literal>, which is used to discover information
about a server's search semantics and functional capabilities.
Zebra exposes a "classic"
Explain database under the base name <literal>IR-Explain-1</literal>, which
is populated with system internal information.
602 The attribute-set <literal>exp-1</literal> consists of a single
603 <literal>Use (type 1)</literal> attribute.
606 In addition, the non-Use
607 <literal>bib-1</literal> attributes, that is, the types
608 <literal>Relation</literal>, <literal>Position</literal>,
609 <literal>Structure</literal>, <literal>Truncation</literal>,
610 and <literal>Completeness</literal> are imported from
611 the <literal>bib-1</literal> attribute set, and may be used
612 within any explain query.
615 <sect3 id="querymodel-exp1-use">
<title>Use Attributes (type 1)</title>
The following Explain search attributes are supported:
<literal>ExplainCategory</literal> (@attr 1=1),
<literal>DatabaseName</literal> (@attr 1=3),
<literal>DateAdded</literal> (@attr 1=9),
<literal>DateChanged</literal> (@attr 1=10).
625 A search in the use attribute <literal>ExplainCategory</literal>
626 supports only these predefined values:
627 <literal>CategoryList</literal>, <literal>TargetInfo</literal>,
628 <literal>DatabaseInfo</literal>, <literal>AttributeDetails</literal>.
631 See <filename>tab/explain.att</filename> and the
632 <ulink url="&url.z39.50;">Z39.50</ulink> standard
633 for more information.
638 <title>Explain searches with yaz-client</title>
Classic Explain only defines retrieval of Explain information
via ASN.1. Practically no Z39.50 clients support this. Fortunately
they don't have to: Zebra allows retrieval of this information
in other formats, namely
<literal>SUTRS</literal>, <literal>XML</literal>,
<literal>GRS-1</literal> and <literal>ASN.1</literal> Explain.
List supported categories to find out which explain commands are
supported:
653 Z> find @attr exp1 1=1 categorylist
660 Get target info, that is, investigate which databases exist at
661 this server endpoint:
664 Z> find @attr exp1 1=1 targetinfo
List all supported databases. The number of hits
is the number of databases found, which most commonly are
the <literal>Default</literal> and the
<literal>IR-Explain-1</literal> databases.
682 Z> find @attr exp1 1=1 databaseinfo
689 Get database info record for database <literal>Default</literal>.
692 Z> find @and @attr exp1 1=1 databaseinfo @attr exp1 1=3 Default
694 Identical query with explicitly specified attribute set:
697 Z> find @attrset exp1 @and @attr 1=1 databaseinfo @attr 1=3 Default
702 Get attribute details record for database
703 <literal>Default</literal>.
704 This query is very useful to study the internal Zebra indexes.
If records have been indexed using the <literal>alvis</literal>
XSLT filter, the string representation names of the known indexes can be
found here.
710 Z> find @and @attr exp1 1=1 attributedetails @attr exp1 1=3 Default
712 Identical query with explicitly specified attribute set:
715 Z> find @attrset exp1 @and @attr 1=1 attributedetails @attr 1=3 Default
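<para>
The explain queries above all follow one pattern: a category search on
use attribute 1, optionally combined with a database name on use
attribute 3. The Python sketch below generates them in the form with an
explicitly specified attribute set. It is only an illustration; the
function name <literal>explain_query</literal> is hypothetical, and the
list of categories is the one given in this section.
</para>

```python
# Sketch only: generate the exp1 explain queries shown in this section.
# explain_query is a hypothetical helper, not part of Zebra or YAZ.

CATEGORIES = ("CategoryList", "TargetInfo", "DatabaseInfo", "AttributeDetails")


def explain_query(category, database=None):
    """Return a PQF explain query, optionally limited to one database."""
    wanted = category.lower()
    if wanted not in [name.lower() for name in CATEGORIES]:
        raise ValueError("unsupported ExplainCategory: %s" % category)
    query = "@attr 1=1 %s" % wanted            # ExplainCategory
    if database is not None:
        # DatabaseName is use attribute 3 in the exp1 attribute set.
        query = "@and %s @attr 1=3 %s" % (query, database)
    return "@attrset exp1 %s" % query


print(explain_query("CategoryList"))
print(explain_query("DatabaseInfo", "Default"))
print(explain_query("AttributeDetails", "Default"))
```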
722 <sect2 id="querymodel-bib1">
723 <title>Bib1 Attribute Set</title>
Most of the information contained in this section is an excerpt of
the <literal>ATTRIBUTE SET BIB-1 (Z39.50-1995)
SEMANTICS</literal>
found at <ulink url="&url.z39.50.attset.bib1.1995;">The BIB-1
Attribute Set Semantics</ulink> from 1995, which is also available in an updated
<ulink url="&url.z39.50.attset.bib1;">Bib-1
Attribute Set</ulink>
version from 2003. Index Data is not the copyright holder of this
733 information, except for the configuration details, the listing of
734 Zebra's capabilities, and the example queries.
738 <sect3 id="querymodel-bib1-use">
739 <title>Use Attributes (type 1)</title>
742 A use attribute specifies an access point for any atomic query.
These access points are highly dependent on the attribute set used
744 in the query, and are user configurable using the following
745 default configuration files:
746 <filename>tab/bib1.att</filename>,
747 <filename>tab/dan1.att</filename>,
748 <filename>tab/explain.att</filename>, and
749 <filename>tab/gils.att</filename>.
750 New attribute sets can be added by adding new
751 <filename>tab/*.att</filename> configuration files, which need to
752 be sourced in the main configuration <filename>zebra.cfg</filename>.
In addition, Zebra allows access to
<emphasis>internal index names</emphasis> and <emphasis>dynamic
XPath</emphasis> as use attributes;
see <xref linkend="querymodel-use-string"/> and
<xref linkend="querymodel-use-xpath"/>.
Phrase search for <emphasis>information retrieval</emphasis> in
the title register:
768 Z> find @attr 1=4 "information retrieval"
776 <sect2 id="querymodel-bib1-nonuse">
777 <title>Zebra general Bib1 Non-Use Attributes (type 2-6)</title>
779 <sect3 id="querymodel-bib1-relation">
780 <title>Relation Attributes (type 2)</title>
Relation attributes describe the relationship of the access
point (left side
of the relation) to the search term as qualified by the attributes (right
side of the relation), e.g., Date-publication &lt;= 1975.
789 <table id="querymodel-bib1-relation-table"
790 frame="all" rowsep="1" colsep="1" align="center">
792 <caption>Relation Attributes (type 2)</caption>
807 <td>Less than or equal</td>
817 <td>Greater or equal</td>
822 <td>Greater than</td>
847 <td>AlwaysMatches</td>
855 The relation attribute
856 <literal>relevance (102)</literal> is supported, see
857 <xref linkend="administration-ranking"/> for full information.
858 <!-- always-matches (103) not supported for all indexes -->
862 All ordering operations are based on a lexicographical ordering,
<emphasis>except</emphasis> when the
864 <literal>structure attribute numeric (109)</literal> is used. In
865 this case, ordering is numerical. See
866 <xref linkend="querymodel-bib1-structure"/>.
Ranked search for <emphasis>information retrieval</emphasis> in
the title register:
873 Z> find @attr 1=4 @attr 2=102 "information retrieval"
878 <sect3 id="querymodel-bib1-position">
879 <title>Position Attributes (type 3)</title>
882 The position attribute specifies the location of the search term
883 within the field or subfield in which it appears.
886 <table id="querymodel-bib1-position-table"
887 frame="all" rowsep="1" colsep="1" align="center">
889 <caption>Position Attributes (type 3)</caption>
899 <td>First in field </td>
904 <td>First in subfield</td>
909 <td>Any position in field</td>
The position attribute values <literal>first in field (1)</literal>
and <literal>first in subfield (2)</literal> are unsupported.
Using them does not trigger an error, but silently defaults to
<literal>any position in field (3)</literal>.
925 <sect3 id="querymodel-bib1-structure">
926 <title>Structure Attributes (type 4)</title>
The structure attribute specifies the type of search
term. This causes the search to be mapped to
different Zebra internal indexes, which must have been defined
at index time.
The possible values of the
<literal>structure attribute (type 4)</literal> can be defined
using the configuration file
<filename>tab/default.idx</filename>.
The default configuration is summarized in this table.
943 <table id="querymodel-bib1-structure-table"
944 frame="all" rowsep="1" colsep="1" align="center">
946 <caption>Structure Attributes (type 4)</caption>
976 <td>Date (normalized)</td>
986 <td>Date (un-normalized)</td>
991 <td>Name (normalized) </td>
996 <td>Name (un-normalized) </td>
1003 <td>unsupported</td>
1011 <td>Free-form-text</td>
1016 <td>Document-text</td>
1021 <td>Local-number</td>
1028 <td>unsupported</td>
1031 <td>Numeric string</td>
The structure attribute value <literal>local-number
(107)</literal>
is supported, and always maps to the Zebra internal document ID.
For example, in
the GILS schema (<literal>gils.abs</literal>), the
1048 west-bounding-coordinate is indexed as type <literal>n</literal>,
1049 and is therefore searched by specifying
1050 <emphasis>structure</emphasis>=<emphasis>Numeric String</emphasis>.
1051 To match all those records with west-bounding-coordinate greater
1052 than -114 we use the following query:
1054 Z> find @attr 4=109 @attr 2=5 @attr gils 1=2038 -114
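<para>
Numeric comparisons of this kind combine the structure attribute
<literal>numeric string (109)</literal> with a relation attribute. The
Python sketch below shows one way to generate such queries. The helper
name <literal>numeric_compare</literal> and the short relation
mnemonics are hypothetical; the relation codes 1 to 5 are the Bib-1
values for less than, less than or equal, equal, greater or equal, and
greater than.
</para>

```python
# Sketch only: build a numeric comparison query in PQF.
# numeric_compare and the mnemonic keys are hypothetical names.

RELATION = {"lt": 1, "le": 2, "eq": 3, "ge": 4, "gt": 5}
NUMERIC_STRING = 109  # structure attribute (type 4) value


def numeric_compare(use, relation, value, attrset=None):
    """Return a PQF query comparing a numeric index against a value."""
    if attrset:
        use_attr = "@attr %s 1=%d" % (attrset, use)
    else:
        use_attr = "@attr 1=%d" % use
    return "@attr 4=%d @attr 2=%d %s %s" % (
        NUMERIC_STRING, RELATION[relation], use_attr, value)


# Reproduces the west-bounding-coordinate example from this section.
print(numeric_compare(2038, "gt", -114, attrset="gils"))
```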
1058 <sect3 id="querymodel-bib1-truncation">
<title>Truncation Attributes (type 5)</title>
1062 The truncation attribute specifies whether variations of one or
more characters are allowed between search term and hit terms, or
1064 not. Using non-default truncation attributes will broaden the
1065 document hit set of a search query.
1068 <table id="querymodel-bib1-truncation-table"
1069 frame="all" rowsep="1" colsep="1" align="center">
1071 <caption>Truncation Attributes (type 5)</caption>
1081 <td>Right truncation </td>
1086 <td>Left truncation</td>
1091 <td>Left and right truncation</td>
1096 <td>Do not truncate</td>
1101 <td>Process # in search term</td>
Truncation attribute value
<literal>Process # in search term (100)</literal> is a
poor man's regular expression search. It maps
each <literal>#</literal> to <literal>.*</literal>, and
then performs a <literal>Regexp-1 (102)</literal> regular
expression search.
Truncation attribute value
<literal>Regexp-1 (102)</literal> is a normal regular expression search.
Truncation attribute value
<literal>Regexp-2 (103)</literal> is a Zebra-specific extension
which allows <emphasis>fuzzy</emphasis> matches. A single
spelling error in a search term is allowed, i.e., a document
is hit if it includes a term which can be mapped to the
search term by one character substitution, addition, deletion, or
change of position.
1141 Special 104, 105, 106 are deprecated and will be removed! -->
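<para>
The rewriting described for truncation value 100, where each
<literal>#</literal> becomes <literal>.*</literal>, can be tried out
locally. In the Python sketch below the helper names are hypothetical,
and Python's <literal>re</literal> module merely stands in for Zebra's
own regular expression engine, whose syntax may differ in other
respects.
</para>

```python
# Sketch only: emulate 'Process # in search term (100)' on the client.
# Python's re module is a stand-in for Zebra's regular expression engine.
import re


def hash_to_regexp(term):
    """Rewrite each '#' as '.*', as described for truncation value 100."""
    return term.replace("#", ".*")


def truncation_100_query(use, term):
    """Return the PQF query that asks the server to process '#'."""
    return '@attr 1=%d @attr 5=100 "%s"' % (use, term)


pattern = hash_to_regexp("wom#n")
print(pattern)
for candidate in ("woman", "women", "wombat"):
    # fullmatch: the whole index term has to match the pattern
    print(candidate, bool(re.fullmatch(pattern, candidate)))
print(truncation_100_query(4, "wom#n"))
```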
1144 <sect3 id="querymodel-bib1-completeness">
<title>Completeness Attributes (type 6)</title>
This attribute is only used if structure w or p is to be
chosen; completeness is ignored otherwise.
Incomplete field (1) is the default and makes Zebra use
search field type w, whereas
complete subfield (2) and complete field (3) both trigger
search field type p.
1159 <sect2 id="querymodel-zebra-attr-search">
<title>Zebra specific Search Extensions to all Attribute Sets</title>
Zebra extends the Bib1 attribute types, and these extensions are
recognized regardless of the attribute
set used in a <literal>search</literal> operation query.
1167 <table id="querymodel-zebra-attr-search-table"
1168 frame="all" rowsep="1" colsep="1" align="center">
<caption>Zebra Search Attribute Extensions</caption>
1176 <td>Zebra version</td>
1181 <td>Embedded Sort</td>
1193 <td>Rank Weight</td>
1199 <td>Approx Limit</td>
1205 <td>Term Reference</td>
1213 <sect3 id="querymodel-zebra-attr-sorting">
<title>Zebra Extension Embedded Sort Attribute (type 7)</title>
The embedded sort is a way to specify sorting within a query, thus
removing the need to send a Sort Request separately. It is both
faster and does not require clients to deal with the Sort
Facility.
The possible values of attribute <literal>type 7</literal> are
<literal>1</literal> (ascending) and
<literal>2</literal> (descending).
The attributes+term (APT) node carrying the sort specification is separate from the
rest of the query and must be <literal>@or</literal>'ed with it.
The term associated with the APT is the sorting level as an integer,
1229 where <literal>0</literal> means primary sort,
1230 <literal>1</literal> means secondary sort, and so forth.
1231 See also <xref linkend="administration-ranking"/>.
1234 For example, searching for water, sort by title (ascending)
1236 Z> find @or @attr 1=1016 water @attr 7=1 @attr 1=4 0
1240 Or, searching for water, sort by title ascending, then date descending
1242 Z> find @or @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 @attr 7=2 @attr 1=30 1
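<para>
The pattern behind both examples is that each sort key adds one more
<literal>@or</literal> in front of the query and one sort APT after it.
The following Python sketch shows this under the assumption that sort
keys are given as numeric use attributes; the function name
<literal>with_sort</literal> is hypothetical.
</para>

```python
# Sketch only: append embedded sort (type 7) APTs to a PQF query.
# with_sort is a hypothetical helper, not part of Zebra or YAZ.

DIRECTIONS = {"ascending": 1, "descending": 2}


def with_sort(query, sort_keys):
    """sort_keys: list of (use_attribute, direction), primary key first."""
    for level, (use, direction) in enumerate(sort_keys):
        # The term of the sort APT is the sort level: 0 primary, 1 secondary.
        sort_apt = "@attr 7=%d @attr 1=%d %d" % (
            DIRECTIONS[direction], use, level)
        query = "@or %s %s" % (query, sort_apt)
    return query


print(with_sort("@attr 1=1016 water", [(4, "ascending")]))
print(with_sort("@attr 1=1016 water", [(4, "ascending"), (30, "descending")]))
```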
1246 <sect3 id="querymodel-zebra-attr-estimation">
<title>Zebra Extension Term Set Attribute (type 8)</title>
The Term Set feature is a facility that allows a search to store
the hitting terms in a "pseudo" result set; it thus combines a
search (as usual) with a scan-like facility. It requires a client that
can handle named result sets, since the search generates two result
sets. The value for
attribute 8 is the name of a result set (string). The terms in
the named term set are returned as SUTRS records.
For example, searching for u in title, right truncated, and
storing the result in a term set named 'aset':
1261 Z> find @attr 5=1 @attr 1=4 @attr 8=aset u
The model has one serious flaw: we don't know the size of the term
set. Experimental. Do not use in production code.
1269 <sect3 id="querymodel-zebra-attr-weight">
<title>Zebra Extension Rank Weight Attribute (type 9)</title>
Rank weight is a way to pass a value to a ranking algorithm, so
that one APT has one value while another has a different one.
1275 See also <xref linkend="administration-ranking"/>.
1278 For example, searching for utah in title with weight 30 as well
1279 as any with weight 20:
1281 Z> find @attr 2=102 @or @attr 9=30 @attr 1=4 utah @attr 9=20 utah
1285 <sect3 id="querymodel-zebra-attr-limit">
<title>Zebra Extension Approximative Limit Attribute (type 9)</title>
Newer Zebra versions normally estimate the hit count for every APT
(leaf) in the query tree. These hit counts are returned as part of
the searchResult-1 facility in the binary encoded Z39.50 search
response packages.
By setting a limit for the APT we can make Zebra switch to
approximate hit counts when a certain hit count limit is
reached. A value of zero means exact hit count.
For example, we might be interested in the exact hit count for a, but
for b we allow hit count estimates for 1000 and higher:
1303 Z> find @and a @attr 9=1000 b
The estimated hit count facility makes searches faster, as one
only needs to process large hit lists partially.
This facility clashes with rank weight, because there all
documents in the hit lists need to be examined for scoring.
It is an experimental
extension. Do not use in production code.
1318 <sect3 id="querymodel-zebra-attr-termref">
<title>Zebra Extension Term Reference Attribute (type 10)</title>
Zebra supports the <literal>searchResult-1</literal> facility.
If the <literal>Term Reference Attribute (type 10)</literal> is
given, it specifies a subqueryId value returned as part of the
search result. It is a way for a client to name an APT part of a
query.
1335 Experimental. Do not use in production code.
1342 <sect2 id="querymodel-zebra-attr-scan">
1343 <title>Zebra specific Scan Extensions to all Attribute Sets</title>
1345 Zebra extends the Bib1 attribute types, and these extensions are
1346 recognized regardless of attribute
1347 set used in a <literal>scan</literal> operation query.
1349 <table id="querymodel-zebra-attr-scan-table"
1350 frame="all" rowsep="1" colsep="1" align="center">
1352 <caption>Zebra Scan Attribute Extensions</caption>
1358 <td>Zebra version</td>
1363 <td>Result Set Narrow</td>
1369 <td>Approximative Limit</td>
1377 <sect3 id="querymodel-zebra-attr-narrow">
1378 <title>Zebra Extension Result Set Narrow (type 8)</title>
1381 If attribute <literal>Result Set Narrow (type 8)</literal>
1382 is given for <literal>scan</literal>, the value is the name of a
1383 result set. Each hit count in <literal>scan</literal> is
1384 <literal>@and</literal>'ed with the result set given.
1387 Consider for example
1388 the case of scanning all title fields around the
1389 scanterm <emphasis>mozart</emphasis>, then refining the scan by
1390 issuing a filtering query for <emphasis>amadeus</emphasis> to
1391 restrict the scan to the result set of the query:
1393 Z> scan @attr 1=4 mozart
1396 mozartforskningen (1)
1400 Z> f @attr 1=4 amadeus
1402 Number of hits: 15, setno 2
1404 Z> scan @attr 1=4 @attr 8=2 mozart
1407 mozartforskningen (0)
1415 Experimental. Do not use in production code.
1418 <sect3 id="querymodel-zebra-attr-approx">
1419 <title>Zebra Extension Approximative Limit (type 9)</title>
1422 The <literal>Zebra Extension Approximative Limit (type
1423 9)</literal> is a way to enable approximate
1424 hit counts for <literal>scan</literal> hit counts, in the same
1425 way as for <literal>search</literal> hit counts.
1434 Experimental and buggy. Definitely not to be used in production code.
1441 <sect2 id="querymodel-idxpath">
1442 <title>Zebra special IDXPATH Attribute Set for GRS indexing</title>
1444 The attribute-set <literal>idxpath</literal> consists of a single
1445 <literal>Use (type 1)</literal> attribute. All non-use attributes
1449 This feature is enabled when defining the
1450 <literal>xpath enable</literal> option in the GRS filter
1451 <literal>*.abs</literal> configuration files. If one wants to use
1452 the special <literal>idxpath</literal> numeric attribute set, the
1453 main Zebra configuration file <filename>zebra.cfg</filename>
1454 directive <literal>attset: idxpath.att</literal> must be enabled.
1456 <warning>The <literal>idxpath</literal> is deprecated, may not be
1457 supported in future Zebra versions, and should definitely
1458 not be used in production code.
1461 <sect3 id="querymodel-idxpath-use">
1462 <title>IDXPATH Use Attributes (type = 1)</title>
1464 This attribute set allows one to search GRS filter indexed
1465 records by XPATH like structured index names. It is enabled by
1466 specifying the <literal></literal>
1470 <warning>The <literal>idxpath</literal> option defines hard-coded
1471 index names, which might clash with your own index names.
1474 <table id="querymodel-idxpath-use-table"
1475 frame="all" rowsep="1" colsep="1" align="center">
1477 <caption>Zebra specific IDXPATH Use Attributes (type 1)</caption>
1482 <td>String Index</td>
1488 <td>XPATH Begin</td>
1490 <td>_XPATH_BEGIN</td>
1491 <td>deprecated</td>
1497 <td>deprecated</td>
1500 <td>XPATH CData</td>
1502 <td>_XPATH_CDATA</td>
1503 <td>deprecated</td>
1506 <td>XPATH Attribute Name</td>
1508 <td>_XPATH_ATTR_NAME</td>
1509 <td>deprecated</td>
1512 <td>XPATH Attribute CData</td>
1514 <td>_XPATH_ATTR_CDATA</td>
1515 <td>deprecated</td>
1522 See <filename>tab/idxpath.att</filename> for more information.
1525 Search for all documents starting with root element
1526 <literal>/root</literal> (either using the numeric or the string
1529 Z> find @attrset idxpath @attr 1=1 @attr 4=3 root/
1530 Z> find @attr idxpath 1=1 @attr 4=3 root/
1531 Z> find @attr 1=_XPATH_BEGIN @attr 4=3 root/
1535 Search for all documents where a specific nested XPATH
1536 <literal>/c1/c2/../cn</literal> exists. Notice the
1537 counter-intuitive <emphasis>reverse</emphasis> notation!
1539 Z> find @attrset idxpath @attr 1=1 @attr 4=3 cn/cn-1/../c1/
1540 Z> find @attr 1=_XPATH_BEGIN @attr 4=3 cn/cn-1/../c1/
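The reversed notation shown in the queries above can be produced mechanically. The following Python fragment is only a sketch: the helper names <literal>xpath_to_idxpath_term</literal> and <literal>idxpath_begin_query</literal> are hypothetical and not part of Zebra or YAZ, and it assumes a simple absolute path made of plain element names (no predicates, wildcards or attributes).

```python
def xpath_to_idxpath_term(xpath: str) -> str:
    """Reverse the steps of a simple absolute path such as /c1/c2/cn.

    Hypothetical helper for illustration; handles plain element names only.
    """
    steps = [step for step in xpath.split("/") if step]
    if not steps:
        raise ValueError("expected at least one path step")
    # Innermost element first, each step followed by a slash.
    return "".join(step + "/" for step in reversed(steps))


def idxpath_begin_query(xpath: str) -> str:
    """Build a PQF query string on the _XPATH_BEGIN string index."""
    return "@attr 1=_XPATH_BEGIN @attr 4=3 " + xpath_to_idxpath_term(xpath)


if __name__ == "__main__":
    print(idxpath_begin_query("/root"))
    print(idxpath_begin_query("/c1/c2/c3"))
```

The resulting string would be typed after <literal>find</literal> in a yaz-client session, exactly like the hand-written queries above.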
1544 Search for CDATA string <emphasis>text</emphasis> in any element
1546 Z> find @attrset idxpath @attr 1=1016 text
1547 Z> find @attr 1=_XPATH_CDATA text
1551 Search for CDATA string <emphasis>anothertext</emphasis> in any
1554 Z> find @attrset idxpath @attr 1=1015 anothertext
1555 Z> find @attr 1=_XPATH_ATTR_CDATA anothertext
1559 Search for all documents which have an XML element node
1560 including an XML attribute named <emphasis>creator</emphasis>
1562 Z> find @attrset idxpath @attr 1=3 @attr 4=3 creator
1563 Z> find @attr 1=_XPATH_ATTR_NAME @attr 4=3 creator
1567 Combining usual <literal>bib-1</literal> attribute set searches
1568 with <literal>idxpath</literal> attribute set searches:
1570 Z> find @and @attr idxpath 1=1 @attr 4=3 link/ @attr 1=4 mozart
1571 Z> find @and @attr 1=_XPATH_BEGIN @attr 4=3 link/ @attr 1=_XPATH_CDATA mozart
1579 <sect2 id="querymodel-bib1-mapping">
1580 <title>Mapping from Bib1 Attributes to Zebra internal
1581 register indexes</title>
1587 <!-- see in util/zebramap.c
1590 if (completeness_value == 2 || completeness_value == 3)
1596 *sort_flag =(sort_relation_value > 0) ? 1 : 0;
1597 *search_type = "phrase";
1598 strcpy(rank_type, "void");
1599 if (relation_value == 102)
1601 if (weight_value == -1)
1603 sprintf(rank_type, "rank,w=%d,u=%d", weight_value, use_value);
1605 if (relation_value == 103)
1607 *search_type = "always";
1615 switch (structure_value)
1617 case 6: /* word list */
1618 *search_type = "and-list";
1620 case 105: /* free-form-text */
1621 *search_type = "or-list";
1623 case 106: /* document-text */
1624 *search_type = "or-list";
1627 case 1: /* phrase */
1629 case 108: /* string */
1630 *search_type = "phrase";
1632 case 107: /* local-number */
1633 *search_type = "local";
1636 case 109: /* numeric string */
1638 *search_type = "numeric";
1642 *search_type = "phrase";
1646 *search_type = "phrase";
1650 *search_type = "phrase";
1654 *search_type = "phrase";
1665 <emphasis>Use</emphasis> attributes are interpreted according to the
1666 attribute sets which have been loaded in the
1667 <literal>zebra.cfg</literal> file, and are matched against specific
1668 fields as specified in the <literal>.abs</literal> file which
1669 describes the profile of the records which have been loaded.
1670 If no Use attribute is provided, a default of Bib-1 Any is assumed.
1674 If a <emphasis>Structure</emphasis> attribute of
1675 <emphasis>Phrase</emphasis> is used in conjunction with a
1676 <emphasis>Completeness</emphasis> attribute of
1677 <emphasis>Complete (Sub)field</emphasis>, the term is matched
1678 against the contents of the phrase (long word) register, if one
1679 exists for the given <emphasis>Use</emphasis> attribute.
1680 A phrase register is created for those fields in the
1681 <literal>.abs</literal> file that contain a
1682 <literal>p</literal>-specifier.
1683 <!-- ### whatever the hell _that_ is -->
1687 If <emphasis>Structure</emphasis>=<emphasis>Phrase</emphasis> is
1688 used in conjunction with <emphasis>Incomplete Field</emphasis> (the
1689 default value for <emphasis>Completeness</emphasis>), the
1690 search is directed against the normal word registers, but if the term
1691 contains multiple words, the term will only match if all of the words
1692 are found immediately adjacent, and in the given order.
1693 The word search is performed on those fields that are indexed as
1694 type <literal>w</literal> in the <literal>.abs</literal> file.
1698 If the <emphasis>Structure</emphasis> attribute is
1699 <emphasis>Word List</emphasis>,
1700 <emphasis>Free-form Text</emphasis>, or
1701 <emphasis>Document Text</emphasis>, the term is treated as a
1702 natural-language, relevance-ranked query.
1703 This search type uses the word register, i.e. those fields
1704 that are indexed as type <literal>w</literal> in the
1705 <literal>.abs</literal> file.
1709 If the <emphasis>Structure</emphasis> attribute is
1710 <emphasis>Numeric String</emphasis>, the term is treated as an integer.
1711 The search is performed on those fields that are indexed
1712 as type <literal>n</literal> in the <literal>.abs</literal> file.
1716 If the <emphasis>Structure</emphasis> attribute is
1717 <emphasis>URx</emphasis>, the term is treated as a URX (URL) entity.
1718 The search is performed on those fields that are indexed as type
1719 <literal>u</literal> in the <literal>.abs</literal> file.
1723 If the <emphasis>Structure</emphasis> attribute is
1724 <emphasis>Local Number</emphasis>, the term is treated as a
1725 native Zebra Record Identifier.
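As a rough illustration of the Structure handling described above, the following Python sketch mirrors the structure-to-search-type cases listed in the source comment that refers to <filename>util/zebramap.c</filename>. The names <literal>SEARCH_TYPE_BY_STRUCTURE</literal> and <literal>search_type</literal> are hypothetical, and returning <literal>None</literal> for values not listed is an assumption of this sketch, not documented Zebra behaviour.

```python
from typing import Optional

# Structure attribute value (type 4) -> internal search type, copied from
# the cases shown in the comment referring to util/zebramap.c.
SEARCH_TYPE_BY_STRUCTURE = {
    1: "phrase",      # phrase
    6: "and-list",    # word list
    105: "or-list",   # free-form-text
    106: "or-list",   # document-text
    107: "local",     # local-number
    108: "phrase",    # string
    109: "numeric",   # numeric string
}


def search_type(structure_value: int) -> Optional[str]:
    """Look up the search type for a Structure attribute value.

    Assumption: values not listed above yield None in this sketch.
    """
    return SEARCH_TYPE_BY_STRUCTURE.get(structure_value)


if __name__ == "__main__":
    for value in (1, 6, 105, 107, 109):
        print(value, search_type(value))
```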
1729 If the <emphasis>Relation</emphasis> attribute is
1730 <emphasis>Equals</emphasis> (default), the term is matched
1731 in a normal fashion (modulo truncation and processing of
1732 individual words, if required).
1733 If <emphasis>Relation</emphasis> is <emphasis>Less Than</emphasis>,
1734 <emphasis>Less Than or Equal</emphasis>,
1735 <emphasis>Greater Than</emphasis>, or <emphasis>Greater Than or
1736 Equal</emphasis>, the term is assumed to be numerical, and a
1737 standard regular expression is constructed to match the given
1739 If <emphasis>Relation</emphasis> is <emphasis>Relevance</emphasis>,
1740 the standard natural-language query processor is invoked.
1744 For the <emphasis>Truncation</emphasis> attribute,
1745 <emphasis>No Truncation</emphasis> is the default.
1746 <emphasis>Left Truncation</emphasis> is not supported.
1747 <emphasis>Process # in search term</emphasis> is supported, as is
1748 <emphasis>Regxp-1</emphasis>.
1749 <emphasis>Regxp-2</emphasis> enables the fault-tolerant (fuzzy)
1750 search. By default, a single error (deletion, insertion,
1751 replacement) is accepted when terms are matched against the register
1756 <sect2 id="querymodel-regular">
1757 <title>Zebra Regular Expressions in Truncation Attribute (type = 5)</title>
1760 Each term in a query is interpreted as a regular expression if
1761 the truncation value is either <emphasis>Regxp-1 (@attr 5=102)</emphasis>
1762 or <emphasis>Regxp-2 (@attr 5=103)</emphasis>.
1763 Both query types follow the same syntax with the operands:
1766 <table id="querymodel-regular-operands-table"
1767 frame="all" rowsep="1" colsep="1" align="center">
1769 <caption>Regular Expression Operands</caption>
1772 <tr><td>one</td><td>two</td></tr>
1777 <td><literal>x</literal></td>
1778 <td>Matches the character <literal>x</literal>.</td>
1781 <td><literal>.</literal></td>
1782 <td>Matches any character.</td>
1785 <td><literal>[ .. ]</literal></td>
1786 <td>Matches the set of characters specified;
1787 such as <literal>[abc]</literal> or <literal>[a-c]</literal>.</td>
1793 The above operands can be combined with the following operators:
1796 <table id="querymodel-regular-operators-table"
1797 frame="all" rowsep="1" colsep="1" align="center">
1798 <caption>Regular Expression Operators</caption>
1801 <tr><td>one</td><td>two</td></tr>
1806 <td><literal>x*</literal></td>
1807 <td>Matches <literal>x</literal> zero or more times.
1808 Priority: high.</td>
1811 <td><literal>x+</literal></td>
1812 <td>Matches <literal>x</literal> one or more times.
1813 Priority: high.</td>
1816 <td><literal>x?</literal></td>
1817 <td>Matches <literal>x</literal> zero or one time.
1818 Priority: high.</td>
1821 <td><literal>xy</literal></td>
1822 <td> Matches <literal>x</literal>, then <literal>y</literal>.
1823 Priority: medium.</td>
1826 <td><literal>x|y</literal></td>
1827 <td> Matches either <literal>x</literal> or <literal>y</literal>.
1831 <td><literal>( )</literal></td>
1832 <td>The order of evaluation may be changed by using parentheses.</td>
1838 If the first character of the <literal>Regxp-2</literal> query
1839 is a plus character (<literal>+</literal>) it marks the
1840 beginning of a section with non-standard specifiers.
1841 The next plus character marks the end of the section.
1842 Currently Zebra only supports one specifier, the error tolerance,
1843 which consists of one digit.
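The specifier section described above can be illustrated with a small Python sketch. It is not Zebra's parser: the function name <literal>split_error_tolerance</literal> is hypothetical, and the fallback of one accepted error is taken from the default described for <emphasis>Regxp-2</emphasis> in this chapter.

```python
from typing import Tuple


def split_error_tolerance(term: str, default: int = 1) -> Tuple[int, str]:
    """Split a Regxp-2 term into (error tolerance, regular expression).

    Hypothetical helper: a leading "+" opens the specifier section, the
    next "+" closes it, and the section must hold exactly one digit.
    """
    if not term.startswith("+"):
        # No specifier section: keep the default tolerance.
        return default, term
    end = term.find("+", 1)
    spec = term[1:end] if end != -1 else ""
    if len(spec) == 1 and spec in "0123456789":
        return int(spec), term[end + 1:]
    raise ValueError("malformed specifier section: %r" % term)


if __name__ == "__main__":
    print(split_error_tolerance("+2+informat.*"))
    print(split_error_tolerance("informat.*"))
```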
1847 Since the plus operator is normally a suffix operator, the addition to
1848 the query syntax doesn't violate the syntax for standard regular
1853 For example, a phrase search with regular expressions in
1854 the title-register is performed like this:
1856 Z> find @attr 1=4 @attr 5=102 "informat.* retrieval"
1861 Combinations with other attributes are possible. For example, a
1862 ranked search with a regular expression:
1864 Z> find @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval"
1872 The RecordType parameter in the <literal>zebra.cfg</literal> file, or
1873 the <literal>-t</literal> option to the indexer tells Zebra how to
1874 process input records.
1875 Two basic types of processing are available - raw text and structured
1876 data. Raw text is just that, and it is selected by providing the
1877 argument <literal>text</literal> to Zebra. Structured records are
1878 all handled internally using the basic mechanisms described in the
1879 subsequent sections.
1880 Zebra can read structured records in many different formats.
1886 <sect1 id="querymodel-cql-to-pqf">
1887 <title>Server Side CQL to PQF Query Translation</title>
1890 <literal>&lt;cql2rpn&gt;l2rpn.txt&lt;/cql2rpn&gt;</literal>
1891 YAZ Frontend Virtual
1892 Hosts option, one can configure
1893 the YAZ Frontend CQL-to-PQF
1894 converter, specifying the interpretation of various
1895 <ulink url="&url.cql;">CQL</ulink>
1896 indexes, relations, etc. in terms of Type-1 query attributes.
1897 <!-- The yaz-client config file -->
1900 For example, using server-side CQL-to-PQF conversion, one might
1901 query a Zebra server like this:
1904 yaz-client localhost:9999
1906 Z> find text=(plant and soil)
1909 and, if properly configured, even static relevance ranking can
1910 be performed using CQL query syntax:
1913 Z> find text = /relevant (plant and soil)
1919 By the way, the same configuration can be used to
1920 search using client-side CQL-to-PQF conversion:
1921 (the only difference is <literal>querytype cql2rpn</literal>
1923 <literal>querytype cql</literal>, and the call specifying a local
1927 yaz-client -q local/cql2pqf.txt localhost:9999
1928 Z> querytype cql2rpn
1929 Z> find text=(plant and soil)
1935 Exhaustive information can be found in the
1936 section "Specification of CQL to RPN mappings" in the YAZ manual,
1937 <ulink url="http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql.map">
1938 http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql.map</ulink>,
1939 and shall therefore not be repeated here.
1944 <ulink url="http://www.loc.gov/z3950/agency/zing/cql/dc-indexes.html">
1945 http://www.loc.gov/z3950/agency/zing/cql/dc-indexes.html</ulink>
1946 for the Maintenance Agency's work-in-progress mapping of Dublin Core
1947 indexes to Attribute Architecture (util, XD and BIB-2)
1957 <!-- Keep this comment at the end of the file
1962 sgml-minimize-attributes:nil
1963 sgml-always-quote-attributes:t
1966 sgml-parent-document: "zebra.xml"
1967 sgml-local-catalogs: nil
1968 sgml-namecase-general:t