1 <chapter id="querymodel">
2 <!-- $Id: querymodel.xml,v 1.18 2006-06-29 16:02:12 heikki Exp $ -->
3 <title>Query Model</title>
5 <sect1 id="querymodel-overview">
6 <title>Query Model Overview</title>
8 <sect2 id="querymodel-query-languages">
9 <title>Query Languages</title>
Zebra was born as a networking Information Retrieval engine adhering
to the international standards
<ulink url="&url.z39.50;">Z39.50</ulink> and
<ulink url="&url.sru;">SRU</ulink>,
and implements the
<literal>type-1 Reverse Polish Notation (RPN)</literal> query
model defined there.
Unfortunately, this model only defines a binary
encoded representation, which is used as transport packaging in
the Z39.50 protocol layer. This representation is not human
readable, nor does it define any convenient way to specify queries.
25 Since the <literal>type-1 (RPN)</literal>
26 query structure has no direct, useful string
27 representation, every origin application needs to provide some
28 form of mapping from a local query notation or representation to it.
32 <sect3 id="querymodel-query-languages-pqf">
33 <title>Prefix Query Format (PQF)</title>
Index Data has defined a textual representation in the
<ulink url="&url.yaz.pqf;">Prefix Query Format</ulink>, or
<emphasis>PQF</emphasis> for short, which maps
38 one-to-one to binary encoded
39 <emphasis>type-1 RPN</emphasis> queries.
40 PQF has been adopted by other
41 parties developing Z39.50 software, and is often referred to as
42 <literal>Prefix Query Notation</literal>, or in short
43 <literal>PQN</literal>. See
44 <xref linkend="querymodel-pqf"/> for further explanations and
45 descriptions of Zebra's capabilities.
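To illustrate how the prefix notation maps onto a query tree, here is a minimal Python sketch of a PQF-like parser. It is not the YAZ PQF parser, and the function names are hypothetical: only @and, @or, @not, @set and @attr TYPE=VALUE are handled, while @prox, @attrset and attribute set names inside @attr are deliberately left out.

```python
import shlex

BOOLEANS = {"@and", "@or", "@not"}

def parse(tokens):
    """Consume tokens from the front of the list and return a query tree."""
    tok = tokens.pop(0)
    if tok in BOOLEANS:
        # Prefix notation: the operator comes first, then its two operands.
        left = parse(tokens)
        right = parse(tokens)
        return (tok[1:], left, right)
    if tok == "@set":
        return ("set", tokens.pop(0))        # named result set leaf
    attrs = {}
    while tok == "@attr":                    # attribute list, e.g. 1=4
        key, value = tokens.pop(0).split("=", 1)
        attrs[key] = value
        tok = tokens.pop(0)
    if tok in BOOLEANS:
        # Attributes in front of an operator apply to the whole subtree.
        tokens.insert(0, tok)
        return ("attrs", attrs, parse(tokens))
    return ("apt", attrs, tok)               # atomic query (APT) leaf

def parse_pqf(query):
    # shlex keeps a double-quoted term list together as one token.
    return parse(shlex.split(query))

print(parse_pqf("@or information retrieval"))
```

Each boolean operator becomes an internal node and each attribute list plus term becomes a leaf, mirroring the tree structure described for PQF in this chapter.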
49 <sect3 id="querymodel-query-languages-cql">
50 <title>Common Query Language (CQL)</title>
Zebra natively supports the query model of the type-1 RPN,
expressed in PQF/PQN.
On the other hand, the default query language of the SRU
web service, the <emphasis>Common Query Language</emphasis>
(<ulink url="&url.cql;">CQL</ulink>), is not natively supported.
59 Zebra can be configured to understand and map CQL to PQF. See
60 <xref linkend="querymodel-cql-to-pqf"/>.
66 <sect2 id="querymodel-operation-types">
67 <title>Operation types</title>
Zebra supports all three
<literal>Z39.50/SRU</literal> operations defined in the
standards: <literal>explain</literal>, <literal>search</literal>,
and <literal>scan</literal>. A short description of the
functionality and purpose of each follows.
76 <sect3 id="querymodel-operation-type-explain">
77 <title>Explain Operation</title>
The <emphasis>syntax</emphasis> of Z39.50/SRU queries is
well known to any client, but the specific
<emphasis>semantics</emphasis>, which depend on a
particular server's functionality and abilities, must be
discovered from case to case. Enter the
<literal>explain</literal> operation, which provides the means
to discover which
<emphasis>fields</emphasis> (also called
<emphasis>indexes</emphasis> or <emphasis>access points</emphasis>)
are provided, which default parameters the server uses, which
retrieval document formats are defined, and which specific parts
of the general query model are supported.
Z39.50 embeds the <literal>explain</literal> operation
by performing a
<literal>search</literal> in the magic
96 <literal>IR-Explain-1</literal> database;
97 see <xref linkend="querymodel-exp1"/>.
100 In SRU, <literal>explain</literal> is an entirely separate
operation, which returns a <literal>ZeeRex
102 XML</literal> record according to the
103 structure defined by the protocol.
106 In both cases, the information gathered through
107 <literal>explain</literal> operations can be used to
auto-configure a client user interface to the server's capabilities.
113 <sect3 id="querymodel-operation-type-search">
114 <title>Search Operation</title>
Search and retrieve interactions are the raison d'être.
117 They are used to query the remote database and
118 return search result documents. Search queries span from
119 simple free text searches to nested complex boolean queries,
120 targeting specific indexes, and possibly enhanced with many
121 query semantic specifications. Search interactions are the heart
122 and soul of Z39.50/SRU servers.
126 <sect3 id="querymodel-operation-type-scan">
127 <title>Scan Operation</title>
The <literal>scan</literal> operation is a helper functionality,
which operates on one index or access point at a time.
It provides
the means to investigate the content of specific indexes.
Scanning an index returns a handful of terms actually found in
the index, and in addition the <literal>scan</literal>
137 operation returns the number of documents indexed by each term.
138 A search client can use this information to propose proper
139 spelling of search terms, to auto-fill search boxes, or to
140 display controlled vocabularies.
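The following Python sketch illustrates the idea of scanning: given a start term, return it (or the next term in lexicographic order) together with the following terms and their document counts. The index contents and counts are made up, and the function is a hypothetical simplification, not Zebra's implementation.

```python
from bisect import bisect_left

# Hypothetical title index: term -> number of documents containing it.
INDEX = {"bach": 12, "debussy": 7, "delius": 2, "dvorak": 4, "elgar": 3}

def scan(index, start, number=3):
    """Return up to `number` (term, count) pairs, beginning at `start` or
    at the first term that sorts after it."""
    terms = sorted(index)                  # lexicographic term order
    pos = bisect_left(terms, start)        # position of `start` in the order
    return [(t, index[t]) for t in terms[pos:pos + number]]

print(scan(INDEX, "debussy"))   # [('debussy', 7), ('delius', 2), ('dvorak', 4)]
```

A client could, for instance, offer the returned terms as spelling suggestions and show each count next to its term.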
149 <sect1 id="querymodel-pqf">
150 <title>Prefix Query Format syntax and semantics</title>
152 The <ulink url="&url.yaz.pqf;">PQF grammar</ulink>
153 is documented in the YAZ manual, and shall not be
repeated here. During search, this textual PQF representation
is always mapped to the equivalent Zebra internal query parse tree.
159 <sect2 id="querymodel-pqf-tree">
160 <title>PQF tree structure</title>
The PQF parse tree, or the equivalent textual representation,
may start with one specification of the
<emphasis>attribute set</emphasis> used. Following is a query
tree, which
consists of <emphasis>atomic query parts (APT)</emphasis> or
<emphasis>named result sets</emphasis>, possibly
paired by <emphasis>boolean binary operators</emphasis>, and
finally <emphasis>recursively combined</emphasis> into
complex query trees.
173 <sect3 id="querymodel-attribute-sets">
174 <title>Attribute sets</title>
176 Attribute sets define the exact meaning and semantics of queries
issued. Zebra comes with some predefined attribute set
definitions; others can easily be defined and added to the
configuration.
183 <table id="querymodel-attribute-sets-table"
184 frame="all" rowsep="1" colsep="1" align="center">
186 <caption>Attribute sets predefined in Zebra</caption>
190 <td>Attribute set</td>
199 <td><literal>Explain</literal></td>
200 <td><literal>exp-1</literal></td>
201 <td>Special attribute set used on the special automagic
202 <literal>IR-Explain-1</literal> database to gain information on
203 server capabilities, database names, and database
208 <td><literal>Bib1</literal></td>
209 <td><literal>bib-1</literal></td>
210 <td>Standard PQF query language attribute set which defines the
211 semantics of Z39.50 searching. In addition, all of the
212 non-use attributes (type 2-9) define the hard-wired
218 <td><literal>GILS</literal></td>
219 <td><literal>gils</literal></td>
220 <td>Extension to the <literal>Bib1</literal> attribute set.</td>
225 <td><literal>IDXPATH</literal></td>
226 <td><literal>idxpath</literal></td>
<td>Hard-wired XPath-like attribute set, only available for
228 indexing with the GRS record model</td>
The <literal>use attribute (type 1)</literal> mappings of the
238 predefined attribute sets are found in the
239 attribute set configuration files <filename>tab/*.att</filename>.
243 The Zebra internal query processing is modeled after
244 the <literal>Bib1</literal> attribute set, and the non-use
245 attributes type 2-6 are hard-wired in. It is therefore essential
246 to be familiar with <xref linkend="querymodel-bib1-nonuse"/>.
250 <sect3 id="querymodel-boolean-operators">
251 <title>Boolean operators</title>
253 A pair of sub query trees, or of atomic queries, is combined
254 using the standard boolean operators into new query trees.
255 Thus, boolean operators are always internal nodes in the query tree.
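The hit-set semantics of @and, @or and @not can be sketched with plain Python sets. The document IDs are hypothetical, and @prox is left out because it additionally needs term positions.

```python
# Hypothetical hit sets: term -> IDs of the documents containing it.
hits = {
    "information": {1, 2, 3, 5},
    "retrieval": {2, 3, 4},
}

def evaluate(op, left, right):
    """Combine two hit sets the way the PQF boolean operators do."""
    if op == "@and":
        return left.intersection(right)    # documents in both sets
    if op == "@or":
        return left.union(right)           # documents in either set
    if op == "@not":
        return left.difference(right)      # in the left set only
    raise ValueError("unsupported operator: " + op)

for op in ("@and", "@or", "@not"):
    print(op, sorted(evaluate(op, hits["information"], hits["retrieval"])))
# @and [2, 3]
# @or [1, 2, 3, 4, 5]
# @not [1, 5]
```

Note that @not is not symmetric: swapping its operands gives a different result set.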
258 <table id="querymodel-boolean-operators-table"
259 frame="all" rowsep="1" colsep="1" align="center">
261 <caption>Boolean operators</caption>
270 <tr><td><literal>@and</literal></td>
271 <td>binary <literal>AND</literal> operator</td>
272 <td>Set intersection of two atomic queries hit sets</td>
274 <tr><td><literal>@or</literal></td>
275 <td>binary <literal>OR</literal> operator</td>
276 <td>Set union of two atomic queries hit sets</td>
278 <tr><td><literal>@not</literal></td>
279 <td>binary <literal>AND NOT</literal> operator</td>
<td>Set difference of two atomic queries hit sets: the hits of the first operand which are not hits of the second</td>
282 <tr><td><literal>@prox</literal></td>
283 <td>binary <literal>PROXIMITY</literal> operator</td>
284 <td>Set intersection of two atomic queries hit sets. In
285 addition, the intersection set is purged for all
286 documents which do not satisfy the requested query
287 term proximity. Usually a proper subset of the AND
294 For example, we can combine the terms
295 <emphasis>information</emphasis> and <emphasis>retrieval</emphasis>
296 into different searches in the default index of the default
297 attribute set as follows.
298 Querying for the union of all documents containing the
299 terms <emphasis>information</emphasis> OR
300 <emphasis>retrieval</emphasis>:
302 Z> find @or information retrieval
306 Querying for the intersection of all documents containing the
307 terms <emphasis>information</emphasis> AND
<emphasis>retrieval</emphasis>.
The hit set is a subset of the corresponding OR query:
312 Z> find @and information retrieval
316 Querying for the intersection of all documents containing the
317 terms <emphasis>information</emphasis> AND
<emphasis>retrieval</emphasis>, taking proximity into account.
The hit set is a subset of the corresponding AND query
321 (see the <ulink url="&url.yaz.pqf;">PQF grammar</ulink> for
322 details on the proximity operator):
324 Z> find @prox 0 3 0 2 k 2 information retrieval
328 Querying for the intersection of all documents containing the
329 terms <emphasis>information</emphasis> AND
330 <emphasis>retrieval</emphasis>, in the same order and near each
331 other as described in the term list.
The hit set is a subset of the corresponding proximity query:
335 Z> find "information retrieval"
341 <sect3 id="querymodel-atomic-queries">
342 <title>Atomic queries (APT)</title>
344 Atomic queries are the query parts which work on one access point
345 only. These consist of <literal>an attribute list</literal>
346 followed by a <literal>single term</literal> or a
347 <literal>quoted term list</literal>, and are often called
348 <emphasis>Attributes-Plus-Terms (APT)</emphasis> queries.
351 Atomic (APT) queries are always leaf nodes in the PQF query tree.
Unsupplied non-use attributes (types 2-9) are either inherited from
353 higher nodes in the query tree, or are set to Zebra's default values.
354 See <xref linkend="querymodel-bib1"/> for details.
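A minimal sketch of how the attributes of one APT leaf might be resolved: explicitly given attributes override inherited ones, which in turn override the defaults. The default values are copied from the fully specified example query in this section; the helper names and the exact precedence are assumptions for illustration.

```python
# Default values copied from the fully specified example query.
DEFAULTS = {"1": "1017", "2": "3", "3": "3", "4": "1", "5": "100", "6": "1"}

def effective_attributes(explicit, inherited=None):
    """Explicit attributes win over inherited ones, which win over defaults."""
    merged = dict(DEFAULTS)
    merged.update(inherited or {})
    merged.update(explicit)
    return merged

def to_pqf(attrs, term):
    """Render an attribute dict and a term as a PQF APT (bib-1 assumed)."""
    parts = ["@attr %s=%s" % (k, attrs[k]) for k in sorted(attrs, key=int)]
    return " ".join(parts + [term])

print(to_pqf(effective_attributes({"1": "4"}), "debussy"))
```

Rendering an empty attribute list this way reproduces the fully specified query for the term information, apart from the leading @attrset bib-1.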
357 <table id="querymodel-atomic-queries-table"
358 frame="all" rowsep="1" colsep="1" align="center">
360 <caption>Atomic queries (APT)</caption>
370 <td><emphasis>attribute list</emphasis></td>
371 <td>List of <literal>orthogonal</literal> attributes</td>
<td>Any of the orthogonal attribute types may be omitted;
373 these are inherited from higher query tree nodes, or if not
374 inherited, are set to the default Zebra configuration values.
378 <td><emphasis>term</emphasis></td>
379 <td>single <literal>term</literal>
380 or <literal>quoted term list</literal> </td>
381 <td>Here the search terms or list of search terms is added
387 Querying for the term <emphasis>information</emphasis> in the
388 default index using the default attribute set, the server choice
389 of access point/index, and the default non-use attributes.
395 Equivalent query fully specified including all default values:
397 Z> find @attrset bib-1 @attr 1=1017 @attr 2=3 @attr 3=3 @attr 4=1 @attr 5=100 @attr 6=1 information
402 Finding all documents which have the term
403 <emphasis>debussy</emphasis> in the title field.
405 Z> find @attr 1=4 debussy
410 The <literal>scan</literal> operation is only supported with
411 atomic APT queries, as it is bound to one access point at a
412 time. Boolean query trees are not allowed during
413 <literal>scan</literal>.
For example, we might want to scan the title index, starting with
the term
<emphasis>debussy</emphasis>, and displaying this and the
following terms in lexicographic order:
422 Z> scan @attr 1=4 debussy
428 <sect3 id="querymodel-resultset">
429 <title>Named Result Sets</title>
431 Named result sets are supported in Zebra, and result sets can be
432 used as operands without limitations. It follows that named
433 result sets are leaf nodes in the PQF query tree, exactly as
434 atomic APT queries are.
437 After the execution of a search, the result set is available at
438 the server, such that the client can use it for subsequent
searches or retrieval requests. The Z39.50 standard actually
stresses the fact that result sets are volatile. They may cease
to exist at any time after a search, and the server will then
send a diagnostic to the effect that the requested
result set does not exist any more.
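A hypothetical sketch of a per-session result-set store that mirrors this behaviour: each search gets the next set number, later queries can reuse a set via @set, and a set that has disappeared produces an error instead of hits. The class and method names are illustrative only, and the document IDs are made up.

```python
class ResultSets:
    """Hypothetical per-session store of numbered result sets."""

    def __init__(self):
        self.sets = {}
        self.next_setno = 1

    def store(self, hits):
        """Keep a search result and return its set number (setno)."""
        setno = self.next_setno
        self.sets[setno] = set(hits)
        self.next_setno += 1
        return setno

    def get(self, setno):
        """Return a stored set; a vanished set yields an error, not hits."""
        if setno not in self.sets:
            raise KeyError("result set %d does not exist any more" % setno)
        return self.sets[setno]

    def drop(self, setno):
        """Simulate the server discarding a volatile result set."""
        self.sets.pop(setno, None)

session = ResultSets()
s1 = session.store({10, 11, 12})                     # like: f @attr 1=4 mozart
s2 = session.store(session.get(s1).intersection({11, 12, 99}))  # @and @set 1 ...
print(s1, s2, sorted(session.get(s2)))               # 1 2 [11, 12]
```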
447 Defining a named result set and re-using it in the next query,
448 using <literal>yaz-client</literal>.
450 Z> f @attr 1=4 mozart
452 Number of hits: 43, setno 1
454 Z> f @and @set 1 @attr 1=4 amadeus
456 Number of hits: 14, setno 2
458 Z> f @attr 1=1016 beethoven
460 Number of hits: 26, setno 3
466 Named result sets are only supported by the Z39.50 protocol.
467 The SRU web service is stateless, and therefore the notion of
named result sets does not exist when accessing a Zebra server via SRU.
474 <sect3 id="querymodel-use-string">
475 <title>Zebra's special access point of type 'string'</title>
477 The numeric <literal>use (type 1)</literal> attribute is usually
478 referred to from a given
attribute set. In addition, Zebra lets you use
<emphasis>any internal index
name defined in your configuration</emphasis>
as a use attribute value. This is a great feature for
483 debugging, and when you do
484 not need the complexity of defined use attribute values. It is
485 the preferred way of accessing Zebra indexes directly.
488 Finding all documents which have the term list "information
retrieval" in a Zebra index, using its internal full string
490 name. Scanning the same index.
492 Z> find @attr 1=sometext "information retrieval"
493 Z> scan @attr 1=sometext aterm
497 Searching or scanning
the bib-1 use attribute 54 using its string name:
500 Z> find @attr 1=Code-language eng
501 Z> scan @attr 1=Code-language ""
It is possible to search
in any silly string index, provided it is defined in your
indexing rules and can be parsed by the PQF parser.
508 This is definitely not the recommended use of
509 this facility, as it might confuse your users with some very
512 Z> find @attr 1=silly/xpath/alike[@index]/name "information retrieval"
516 See also <xref linkend="querymodel-pqf-apt-mapping"/> for details, and
517 <xref linkend="server-sru"/>
518 for the SRU PQF query extension using string names as a fast
523 <sect3 id="querymodel-use-xpath">
524 <title>Zebra's special access point of type 'XPath'
525 for GRS filters</title>
As we have seen above, it is possible (albeit seldom a great
idea) to emulate
<ulink url="http://www.w3.org/TR/xpath">XPath 1.0</ulink> based
search by defining <literal>use (type 1)</literal>
<emphasis>string</emphasis> attributes which in appearance
<emphasis>resemble XPath queries</emphasis>. There are two
problems with this approach: first, the XPath look-alike has to
be defined at indexing time, so no new, undefined
XPath queries can be entered at search time; and second, it might
confuse users very much that an XPath-like index name in fact
gets populated from a possibly entirely different XML element
than the one it pretends to access.
541 When using the <literal>GRS Record Model</literal>
542 (see <xref linkend="record-model-grs"/>), we have the
possibility to embed <emphasis>live</emphasis>
XPath expressions
in the PQF queries, which are here called
546 <literal>use (type 1)</literal> <emphasis>xpath</emphasis>
547 attributes. You must enable the
548 <literal>xpath enable</literal> directive in your
549 <literal>.abs</literal> configuration files.
552 Only a <emphasis>very</emphasis> restricted subset of the
553 <ulink url="http://www.w3.org/TR/xpath">XPath 1.0</ulink>
standard is supported, as the GRS record model is simpler than
555 a full XML DOM structure. See the following examples for
559 Finding all documents which have the term "content"
560 inside a text node found in a specific XML DOM
561 <emphasis>subtree</emphasis>, whose starting element is
564 Z> find @attr 1=/root content
565 Z> find @attr 1=/root/first content
567 <emphasis>Notice that the
568 XPath must be absolute, i.e., must start with '/', and that the
569 XPath <literal>descendant-or-self</literal> axis followed by a
570 text node selection <literal>text()</literal> is implicitly
571 appended to the stated XPath.
573 It follows that the above searches are interpreted as:
575 Z> find @attr 1=/root//text() content
576 Z> find @attr 1=/root/first//text() content
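The implicit rewrite can be sketched as a small Python function. Leaving attribute paths such as /link/@creator unchanged is an assumption based on the attribute example that follows, and the function name is hypothetical.

```python
def expand_xpath_use(xpath):
    """Return the XPath that an xpath use attribute is interpreted as."""
    if not xpath.startswith("/"):
        raise ValueError("XPath use attributes must be absolute: " + xpath)
    last_step = xpath.rsplit("/", 1)[-1]
    if last_step.startswith("@"):
        return xpath        # assumption: attribute paths are not rewritten
    return xpath + "//text()"

print(expand_xpath_use("/root"))           # /root//text()
print(expand_xpath_use("/root/first"))     # /root/first//text()
```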
581 Searching inside attribute strings is possible:
583 Z> find @attr 1=/link/@creator morten
Filtering the addressing XPath by a predicate working on exact
string values in
attributes (in the XML sense) can be done: return all those docs which
have the term "english" contained in one of the text subnodes of
592 the subtree defined by the XPath
593 <literal>/record/title[@lang='en']</literal>. And similar
596 Z> find @attr 1=/record/title[@lang='en'] english
597 Z> find @attr 1=/link[@creator='sisse'] sibelius
598 Z> find @attr 1=/link[@creator='sisse']/description[@xml:lang='da'] sibelius
603 Combining numeric indexes, boolean expressions,
and XPath-based searches is possible:
606 Z> find @attr 1=/record/title @and foo bar
607 Z> find @and @attr 1=/record/title foo @attr 1=4 bar
611 Escaping PQF keywords and other non-parseable XPath constructs
612 with <literal>'{ }'</literal> to prevent syntax errors:
614 Z> find @attr {1=/root/first[@attr='danish']} content
615 Z> find @attr {1=/record/@set} oai
It is worth mentioning that these dynamically performed XPath
620 queries are a performance bottleneck, as no optimized
621 specialized indexes can be used. Therefore, avoid the use of
622 this facility when speed is essential, and the database content
623 size is medium to large.
630 <sect2 id="querymodel-exp1">
631 <title>Explain Attribute Set</title>
633 The Z39.50 standard defines the
634 <ulink url="&url.z39.50.explain;">Explain</ulink> attribute set
635 <literal>Exp-1</literal>, which is used to discover information
about a server's search semantics and functional capabilities.
637 Zebra exposes a "classic"
638 Explain database by base name <literal>IR-Explain-1</literal>, which
639 is populated with system internal information.
642 The attribute-set <literal>exp-1</literal> consists of a single
643 <literal>use attribute (type 1)</literal>.
646 In addition, the non-Use
647 <literal>bib-1</literal> attributes, that is, the types
648 <literal>Relation</literal>, <literal>Position</literal>,
649 <literal>Structure</literal>, <literal>Truncation</literal>,
650 and <literal>Completeness</literal> are imported from
651 the <literal>bib-1</literal> attribute set, and may be used
652 within any explain query.
655 <sect3 id="querymodel-exp1-use">
656 <title>Use Attributes (type = 1)</title>
658 The following Explain search attributes are supported:
659 <literal>ExplainCategory</literal> (@attr 1=1),
660 <literal>DatabaseName</literal> (@attr 1=3),
661 <literal>DateAdded</literal> (@attr 1=9),
<literal>DateChanged</literal> (@attr 1=10).
665 A search in the use attribute <literal>ExplainCategory</literal>
666 supports only these predefined values:
667 <literal>CategoryList</literal>, <literal>TargetInfo</literal>,
668 <literal>DatabaseInfo</literal>, <literal>AttributeDetails</literal>.
671 See <filename>tab/explain.att</filename> and the
672 <ulink url="&url.z39.50;">Z39.50</ulink> standard
673 for more information.
678 <title>Explain searches with yaz-client</title>
680 Classic Explain only defines retrieval of Explain information
via ASN.1. Practically no Z39.50 clients support this. Fortunately
they don't have to: Zebra allows retrieval of this information
in other formats:
<literal>SUTRS</literal>, <literal>XML</literal>,
<literal>GRS-1</literal> and <literal>ASN.1</literal> Explain.
List supported categories to find out which explain commands are supported:
693 Z> find @attr exp1 1=1 categorylist
700 Get target info, that is, investigate which databases exist at
701 this server endpoint:
704 Z> find @attr exp1 1=1 targetinfo
List all supported databases; the number of hits
is the number of databases found, which most commonly are the
<literal>Default</literal> and the
<literal>IR-Explain-1</literal> databases.
722 Z> find @attr exp1 1=1 databaseinfo
729 Get database info record for database <literal>Default</literal>.
732 Z> find @and @attr exp1 1=1 databaseinfo @attr exp1 1=3 Default
734 Identical query with explicitly specified attribute set:
737 Z> find @attrset exp1 @and @attr 1=1 databaseinfo @attr 1=3 Default
742 Get attribute details record for database
743 <literal>Default</literal>.
744 This query is very useful to study the internal Zebra indexes.
745 If records have been indexed using the <literal>alvis</literal>
746 XSLT filter, the string representation names of the known indexes can be
750 Z> find @and @attr exp1 1=1 attributedetails @attr exp1 1=3 Default
752 Identical query with explicitly specified attribute set:
755 Z> find @attrset exp1 @and @attr 1=1 attributedetails @attr 1=3 Default
762 <sect2 id="querymodel-bib1">
763 <title>Bib1 Attribute Set</title>
765 Most of the information contained in this section is an excerpt of
766 the <literal>ATTRIBUTE SET BIB-1 (Z39.50-1995)
found at <ulink url="&url.z39.50.attset.bib1.1995;">The BIB-1
Attribute Set Semantics</ulink> from 1995, and also in an updated
770 <ulink url="&url.z39.50.attset.bib1;">Bib-1
771 Attribute Set</ulink>
772 version from 2003. Index Data is not the copyright holder of this
773 information, except for the configuration details, the listing of
774 Zebra's capabilities, and the example queries.
778 <sect3 id="querymodel-bib1-use">
779 <title>Use Attributes (type 1)</title>
782 A use attribute specifies an access point for any atomic query.
783 These access points are highly dependent on the attribute set used
784 in the query, and are user configurable using the following
785 default configuration files:
786 <filename>tab/bib1.att</filename>,
787 <filename>tab/dan1.att</filename>,
788 <filename>tab/explain.att</filename>, and
789 <filename>tab/gils.att</filename>.
790 New attribute sets can be added by adding new
791 <filename>tab/*.att</filename> configuration files, which need to
792 be sourced in the main configuration <filename>zebra.cfg</filename>.
In addition, Zebra allows access to
797 <emphasis>internal index names</emphasis> and <emphasis>dynamic
798 XPath</emphasis> as use attributes; see
799 <xref linkend="querymodel-use-string"/> and
800 <xref linkend="querymodel-use-xpath"/>.
804 Phrase search for <emphasis>information retrieval</emphasis> in
805 the title-register, scanning the same register afterwards:
807 Z> find @attr 1=4 "information retrieval"
808 Z> scan @attr 1=4 information
816 <sect2 id="querymodel-bib1-nonuse">
817 <title>Zebra general Bib1 Non-Use Attributes (type 2-6)</title>
819 <sect3 id="querymodel-bib1-relation">
820 <title>Relation Attributes (type 2)</title>
Relation attributes describe the relationship of the access
point (left side
of the relation) to the search term as qualified by the attributes (right
side of the relation), e.g., Date-publication &lt;= 1975.
829 <table id="querymodel-bib1-relation-table"
830 frame="all" rowsep="1" colsep="1" align="center">
832 <caption>Relation Attributes (type 2)</caption>
847 <td>Less than or equal</td>
<td>Greater than or equal</td>
862 <td>Greater than</td>
887 <td>AlwaysMatches</td>
895 The relation attributes
<literal>1-5</literal> are supported and work exactly as expected.
898 All ordering operations are based on a lexicographical ordering,
<emphasis>except</emphasis> when the
900 <literal>structure attribute numeric (109)</literal> is used. In
901 this case, ordering is numerical. See
902 <xref linkend="querymodel-bib1-structure"/>.
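The difference between the two orderings matters for the relation attributes. This Python sketch compares string and integer ordering on a few hypothetical terms and applies relation 5 (greater than) under both; it illustrates the ordering only and is not Zebra code.

```python
# Hypothetical index terms, stored as strings as in a word index.
terms = ["-114", "-20", "5", "10", "100"]

lexicographic = sorted(terms)          # character-by-character comparison
numeric = sorted(terms, key=int)       # comparison of the numeric values

def greater_than(values, bound, key):
    """Relation 5 (greater than) under the ordering defined by `key`."""
    return [v for v in values if key(v) > key(bound)]

print(lexicographic)                   # ['-114', '-20', '10', '100', '5']
print(numeric)                         # ['-114', '-20', '5', '10', '100']
print(greater_than(terms, "5", str))   # []
print(greater_than(terms, "5", int))   # ['10', '100']
```

Under lexicographic ordering, "10" sorts before "5", which is why numeric comparisons need the numeric structure attribute.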
904 Z> find @attr 1=Title @attr 2=1 music
906 Number of hits: 11745, setno 1
908 Z> find @attr 1=Title @attr 2=2 music
910 Number of hits: 11771, setno 2
912 Z> find @attr 1=Title @attr 2=3 music
914 Number of hits: 532, setno 3
916 Z> find @attr 1=Title @attr 2=4 music
918 Number of hits: 11463, setno 4
920 Z> find @attr 1=Title @attr 2=5 music
922 Number of hits: 11419, setno 5
927 The relation attribute
928 <literal>Relevance (102)</literal> is supported, see
929 <xref linkend="administration-ranking"/> for full information.
Ranked search for <emphasis>information retrieval</emphasis> in the title-register:
936 Z> find @attr 1=4 @attr 2=102 "information retrieval"
The relation attribute
<literal>AlwaysMatches (103)</literal> is, in the default
configuration, supported in conjunction with structure attribute
<literal>Phrase (1)</literal> (which may be omitted by
default).
947 It can be configured to work with other structure attributes,
948 see the configuration file
949 <filename>tab/default.idx</filename> and
950 <xref linkend="querymodel-pqf-apt-mapping"/>.
953 <literal>AlwaysMatches (103)</literal> is a
954 great way to discover how many documents have been indexed in a
955 given field. The search term is ignored, but needed for correct
956 PQF syntax. An empty search term may be supplied.
958 Z> find @attr 1=Title @attr 2=103 ""
959 Z> find @attr 1=Title @attr 2=103 @attr 4=1 ""
966 <sect3 id="querymodel-bib1-position">
967 <title>Position Attributes (type 3)</title>
970 The position attribute specifies the location of the search term
971 within the field or subfield in which it appears.
974 <table id="querymodel-bib1-position-table"
975 frame="all" rowsep="1" colsep="1" align="center">
977 <caption>Position Attributes (type 3)</caption>
987 <td>First in field </td>
992 <td>First in subfield</td>
997 <td>Any position in field</td>
The position attribute values <literal>first in field (1)</literal>
and <literal>first in subfield (2)</literal> are unsupported.
Using them does not trigger an error, but silently defaults to
<literal>any position in field (3)</literal>.
1013 <sect3 id="querymodel-bib1-structure">
1014 <title>Structure Attributes (type 4)</title>
1017 The structure attribute specifies the type of search
term. This causes the search to be mapped onto
different Zebra internal indexes, which must have been defined
at indexing time.
1024 The possible values of the
1025 <literal>structure attribute (type 4)</literal> can be defined
using the configuration file <filename>tab/default.idx</filename>.
1028 The default configuration is summarized in this table.
1031 <table id="querymodel-bib1-structure-table"
1032 frame="all" rowsep="1" colsep="1" align="center">
1034 <caption>Structure Attributes (type 4)</caption>
1064 <td>Date (normalized)</td>
1074 <td>Date (un-normalized)</td>
1076 <td>unsupported</td>
1079 <td>Name (normalized) </td>
1081 <td>unsupported</td>
1084 <td>Name (un-normalized) </td>
1086 <td>unsupported</td>
1091 <td>unsupported</td>
1099 <td>Free-form-text</td>
1104 <td>Document-text</td>
1109 <td>Local-number</td>
1116 <td>unsupported</td>
1119 <td>Numeric string</td>
The structure attribute value
<literal>Word list (6)</literal>
is supported, and maps to the boolean <literal>AND</literal>
combination of words supplied. The word list is useful when
Google-like bag-of-words queries need to be translated from a GUI
query language to PQF. For example, the following queries
are equivalent:
1136 Z> find @attr 1=Title @attr 4=6 "mozart amadeus"
1137 Z> find @attr 1=Title @and mozart amadeus
The structure attribute values
<literal>Free-form-text (105)</literal> and
<literal>Document-text (106)</literal>
are supported, and both map to the boolean <literal>OR</literal>
combination of words supplied. The following queries
are equivalent:
1149 Z> find @attr 1=Body-of-text @attr 4=105 "bach salieri teleman"
1150 Z> find @attr 1=Body-of-text @attr 4=106 "bach salieri teleman"
1151 Z> find @attr 1=Body-of-text @or bach @or salieri teleman
1153 This <literal>OR</literal> list of terms is very useful in
1154 combination with relevance ranking:
1156 Z> find @attr 1=Body-of-text @attr 2=102 @attr 4=105 "bach salieri teleman"
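The equivalences above can be reproduced with a small Python helper that expands a quoted term list into right-nested prefix boolean operators. The function name is hypothetical; it only demonstrates the mapping and does not query a server.

```python
def expand_terms(term_list, operator):
    """Expand "a b c" into "OP a OP b c", the right-nested prefix form."""
    words = term_list.split()
    if not words:
        raise ValueError("empty term list")
    query = words[-1]
    for word in reversed(words[:-1]):
        query = "%s %s %s" % (operator, word, query)
    return query

# Word list (4=6) maps to @and; 4=105 and 4=106 map to @or.
print(expand_terms("mozart amadeus", "@and"))        # @and mozart amadeus
print(expand_terms("bach salieri teleman", "@or"))   # @or bach @or salieri teleman
```

Right nesting is chosen here only because it matches the explicit @or query shown above; for AND and OR the nesting order does not change the hit set.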
1161 The structure attribute value
1162 <literal>Local number (107)</literal>
is supported, and always maps to the Zebra internal document ID,
irrespective of which use attribute is specified. The following queries
1165 have exactly the same unique record in the hit set:
1167 Z> find @attr 4=107 10
1168 Z> find @attr 1=4 @attr 4=107 10
1169 Z> find @attr 1=1010 @attr 4=107 10
In the GILS schema (<literal>gils.abs</literal>), the
1176 west-bounding-coordinate is indexed as type <literal>n</literal>,
1177 and is therefore searched by specifying
1178 <emphasis>structure</emphasis>=<emphasis>Numeric String</emphasis>.
1179 To match all those records with west-bounding-coordinate greater
1180 than -114 we use the following query:
1182 Z> find @attr 4=109 @attr 2=5 @attr gils 1=2038 -114
1186 The exact mapping between PQF queries and Zebra internal indexes
1187 and index types is explained in
1188 <xref linkend="querymodel-pqf-apt-mapping"/>.
1193 <sect3 id="querymodel-bib1-truncation">
1194 <title>Truncation Attributes (type = 5)</title>
1197 The truncation attribute specifies whether variations of one or
1198 more characters are allowed between search term and hit terms, or
1199 not. Using non-default truncation attributes will broaden the
1200 document hit set of a search query.
1203 <table id="querymodel-bib1-truncation-table"
1204 frame="all" rowsep="1" colsep="1" align="center">
1206 <caption>Truncation Attributes (type 5)</caption>
1216 <td>Right truncation </td>
1221 <td>Left truncation</td>
1226 <td>Left and right truncation</td>
1231 <td>Do not truncate</td>
1236 <td>Process # in search term</td>
The truncation attribute values 1-3 behave in the obvious way:
1256 Z> scan @attr 1=Body-of-text schnittke
1262 Z> find @attr 1=Body-of-text @attr 5=1 schnittke
1264 Number of hits: 95, setno 7
1266 Z> find @attr 1=Body-of-text @attr 5=2 schnittke
1268 Number of hits: 81, setno 6
1270 Z> find @attr 1=Body-of-text @attr 5=3 schnittke
1272 Number of hits: 95, setno 8
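The matching rules behind truncation values 1-3 and 100 can be sketched as predicates over index terms. The terms are made up and do not reproduce the hit counts above; real matching happens after Zebra's character normalization, which this sketch ignores.

```python
# Truncation value -> predicate(index_term, search_term); bib-1 numbering.
TRUNCATION = {
    1: lambda term, q: term.startswith(q),   # right truncation
    2: lambda term, q: term.endswith(q),     # left truncation
    3: lambda term, q: q in term,            # left and right truncation
    100: lambda term, q: term == q,          # do not truncate
}

def matching_terms(index_terms, query, truncation=100):
    """Return the index terms hit by `query` under a truncation value."""
    rule = TRUNCATION[truncation]
    return [t for t in index_terms if rule(t, query)]

terms = ["schnittke", "schnittkes", "alfredschnittke", "schnitzler"]
for value in (1, 2, 3, 100):
    print(value, matching_terms(terms, "schnittke", value))
```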
1277 The truncation attribute value
1278 <literal>Process # in search term (101)</literal> is a
1279 poor-man's regular expression search. It maps
1280 each <literal>#</literal> to <literal>.*</literal>, and
then performs a <literal>Regexp-1 (102)</literal> regular
1282 expression search. The following two queries are equivalent:
1284 Z> find @attr 1=Body-of-text @attr 5=101 schnit#ke
1285 Z> find @attr 1=Body-of-text @attr 5=102 schnit.*ke
1287 Number of hits: 89, setno 10
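A sketch of the # mapping in Python, using the re module as a stand-in for Zebra's regular expression engine. Requiring the pattern to match the whole index term (re.fullmatch) is an assumption of this sketch.

```python
import re

def hash_to_regex(term):
    """Map each '#' to '.*'; all other characters are kept as regex syntax."""
    return term.replace("#", ".*")

def regex_match(index_term, pattern):
    # Assumption: the pattern has to match the complete index term.
    return re.fullmatch(pattern, index_term) is not None

pattern = hash_to_regex("schnit#ke")
print(pattern)                                  # schnit.*ke
print(regex_match("schnittke", pattern))        # True
print(regex_match("schnittkes", pattern))       # False
```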
1292 The truncation attribute value
<literal>Regexp-1 (102)</literal> is a normal regular expression search,
1294 see <xref linkend="querymodel-regular"/> for details.
1296 Z> find @attr 1=Body-of-text @attr 5=102 schnit+ke
1297 Z> find @attr 1=Body-of-text @attr 5=102 schni[a-t]+ke
1302 The truncation attribute value
<literal>Regexp-2 (103)</literal> is a Zebra-specific extension
which allows <emphasis>fuzzy</emphasis> matches. One single
error in spelling of search terms is allowed, i.e., a document
is hit if it includes a term which can be mapped to the used
search term by one character substitution, addition, deletion or
change of position.
1312 Number of hits: 81, setno 14
1314 Z> find @attr 1=Body-of-text @attr 5=103 schnittke
1316 Number of hits: 103, setno 15
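The one-error rule can be sketched as an edit-distance check in Python. Treating a swap of two adjacent characters as a single error is an assumption of this sketch, and the function name is hypothetical.

```python
def within_one_edit(a, b):
    """True if b can be obtained from a by at most one substitution,
    insertion, deletion or swap of two adjacent characters."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        diffs = [i for i in range(len(a)) if a[i] != b[i]]
        if len(diffs) == 1:
            return True                          # one substitution
        if len(diffs) == 2:
            i, j = diffs                         # adjacent swap?
            return j == i + 1 and a[i] == b[j] and a[j] == b[i]
        return False
    if len(b) > len(a):                          # make `a` the longer term
        a, b = b, a
    # One deletion from the longer term (an insertion seen from the other).
    return any(a[:i] + a[i + 1:] == b for i in range(len(a)))

print(within_one_edit("schnittke", "schnitke"))    # True
print(within_one_edit("schnittke", "schnitkte"))   # True
print(within_one_edit("schnittke", "schubert"))    # False
```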
1322 <sect3 id="querymodel-bib1-completeness">
1323 <title>Completeness Attributes (type = 6)</title>
1327 The <literal>Completeness Attributes (type = 6)</literal>
1328 are used to specify whether a given search term or term list
1329 matches only part of the contents of a given index/field
1330 (<literal>Incomplete subfield (1)</literal>), or must match
1331 literally what is found in the entire field's index
1332 (<literal>Complete field (3)</literal>).
1335 <table id="querymodel-bib1-completeness-table"
1336 frame="all" rowsep="1" colsep="1" align="center">
1337 <caption>Completeness Attributes (type = 6)</caption>
1340 <td>Completeness</td>
1347 <td>Incomplete subfield</td>
1352 <td>Complete subfield</td>
1354 <td>deprecated</td>
1357 <td>Complete field</td>
1365 The <literal>Completeness Attributes (type = 6)</literal>
1366 are only partially and conditionally
1367 supported in the sense that it is ignored if the hit index is
1368 not of structure <literal>type="w"</literal> or
1369 <literal>type="p"</literal>.
1372 <literal>Incomplete subfield (1)</literal> is the default, and
1374 register <literal>type="w"</literal>, whereas
1375 <literal>Complete field (3)</literal> triggers
1376 search and scan in index <literal>type="p"</literal>.
1379 The <literal>Complete subfield (2)</literal> is a reminiscence
1380 of the happy <literal>MARC</literal>
1381 binary format days. Zebra does not support it, but silently maps it
1382 to <literal>Complete field (3)</literal>.
1386 The exact mapping between PQF queries and Zebra internal indexes
1387 and index types is explained in
1388 <xref linkend="querymodel-pqf-apt-mapping"/>.
1396 <sect1 id="querymodel-zebra">
1397 <title>Advanced Zebra PQF Features</title>
1399 The Zebra internal query engine has been extended to cover specific needs
1400 not covered by the <literal>bib-1</literal> attribute set query
1401 model. These extensions are <emphasis>non-standard</emphasis>
1402 and <emphasis>non-portable</emphasis>: most functional extensions
1403 are modeled on the <literal>bib-1</literal> attribute set,
1404 defining additional attributes of type 7 and higher.
1405 There are also the special
1406 <literal>string</literal> type index names for the
1407 <literal>idxpath</literal> attribute set.
1410 <sect2 id="querymodel-zebra-attr-allrecords">
1411 <title>Zebra specific retrieval of all records</title>
1413 Zebra defines a hardwired <literal>string</literal> index name
1414 called <literal>_ALLRECORDS</literal>. It matches any record
1415 contained in the database, if used in conjunction with
1416 the relation attribute
1417 <literal>AlwaysMatches (103)</literal>.
1420 The <literal>_ALLRECORDS</literal> index name is used for total database
1421 export. The search term is ignored and may be empty.
1423 Z> find @attr 1=_ALLRECORDS @attr 2=103 ""
1427 Combination with other index types can be made. For example, to
1428 find all records which are <emphasis>not</emphasis> indexed in
1429 the <literal>Title</literal> register, issue one of the two
1432 Z> find @not @attr 1=_ALLRECORDS @attr 2=103 "" @attr 1=Title @attr 2=103 ""
1433 Z> find @not @attr 1=_ALLRECORDS @attr 2=103 "" @attr 1=4 @attr 2=103 ""
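<para>
In set terms, <literal>@not</literal> keeps the records matched by its first operand that are not matched by its second. The Python sketch below models both operands as sets of record ids from a hypothetical five-record database; the ids are made up for illustration.
</para>

```python
# Hypothetical record ids for a small database.
all_records = {1, 2, 3, 4, 5}  # @attr 1=_ALLRECORDS @attr 2=103 ""
has_title = {1, 2, 4}          # @attr 1=Title @attr 2=103 ""

# "@not A B" keeps the records matched by A that are not matched by B.
without_title = all_records.difference(has_title)
print(sorted(without_title))  # [3, 5]
```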
1437 The special string index <literal>_ALLRECORDS</literal> is
1438 experimental, and the provided functionality and syntax may very
1439 well change in future releases of Zebra.
1444 <sect2 id="querymodel-zebra-attr-search">
1445 <title>Zebra specific Search Extensions to all Attribute Sets</title>
1447 Zebra extends the Bib1 attribute types, and these extensions are
1448 recognized regardless of the attribute
1449 set used in a <literal>search</literal> operation query.
1452 <table id="querymodel-zebra-attr-search-table"
1453 frame="all" rowsep="1" colsep="1" align="center">
1455 <caption>Zebra Search Attribute Extensions</caption>
1461 <td>Zebra version</td>
1466 <td>Embedded Sort</td>
1478 <td>Rank Weight</td>
1484 <td>Approx Limit</td>
1490 <td>Term Reference</td>
1498 <sect3 id="querymodel-zebra-attr-sorting">
1499 <title>Zebra Extension Embedded Sort Attribute (type 7)</title>
1502 The embedded sort is a way to specify sorting within a query, thus
1503 removing the need to send a separate Sort Request. It is both
1504 faster and does not require clients to deal with the Sort
1509 All ordering operations are based on a lexicographical ordering,
1510 <emphasis>except</emphasis> when the
1511 <literal>structure attribute numeric (109)</literal> is used. In
1512 this case, ordering is numerical. See
1513 <xref linkend="querymodel-bib1-structure"/>.
1517 The possible values after attribute <literal>type 7</literal> are
1518 <literal>1</literal> ascending and
1519 <literal>2</literal> descending.
1520 The attributes+term (APT) node is separate from the
1521 rest and must be <literal>@or</literal>'ed.
1522 The term associated with APT is the sorting level in integers,
1523 where <literal>0</literal> means primary sort,
1524 <literal>1</literal> means secondary sort, and so forth.
1525 See also <xref linkend="administration-ranking"/>.
1528 For example, to search for water and sort by title (ascending):
1530 Z> find @or @attr 1=1016 water @attr 7=1 @attr 1=4 0
1534 Or, to search for water and sort by title ascending, then by date descending:
1536 Z> find @or @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 @attr 7=2 @attr 1=30 1
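<para>
The effect of several sort levels can be pictured with the Python sketch below, which orders a few hypothetical records by title ascending (level 0) and then by date descending (level 1), as in the second query above. The record data and the names <literal>SORT_SPEC</literal> and <literal>embedded_sort</literal> are made up; string keys are compared lexicographically, matching the default ordering described above.
</para>

```python
# Hypothetical records; in Zebra the sort keys would come from the index.
records = [
    {"title": "water quality", "date": "1999"},
    {"title": "ground water", "date": "2004"},
    {"title": "water quality", "date": "2003"},
]

# (sort level, field, direction) mirroring the second query above:
# level 0 = title (1=4) ascending (7=1),
# level 1 = date (1=30) descending (7=2).
SORT_SPEC = [(0, "title", 1), (1, "date", 2)]


def embedded_sort(recs, spec):
    """Order records by several sort levels, level 0 being primary."""
    result = list(recs)
    # Python's sort is stable, so sorting by the least significant level
    # first and by the primary level last gives a multi-level ordering.
    for _level, field, direction in sorted(spec, reverse=True):
        result.sort(key=lambda r: r[field], reverse=(direction == 2))
    return result


for rec in embedded_sort(records, SORT_SPEC):
    print(rec["title"], rec["date"])
# ground water 2004
# water quality 2003
# water quality 1999
```

With <literal>structure attribute numeric (109)</literal>, the key function would instead have to compare numbers, for example by converting the key with <literal>int()</literal>.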
1540 <sect3 id="querymodel-zebra-attr-estimation">
1541 <title>Zebra Extension Term Set Attribute (type 8)</title>
1544 The Term Set feature is a facility that allows a search to store
1545 the matching terms in a "pseudo" result set, thus combining a normal search
1546 with a scan-like facility. It requires a client that can handle named result
1547 sets, since the search generates two result sets. The value for
1548 attribute 8 is the name of a result set (string). The terms in
1549 the named term set are returned as SUTRS records.
1552 For example, to search for u in title, right truncated, and
1553 store the result in a term set named 'aset':
1555 Z> find @attr 5=1 @attr 1=4 @attr 8=aset u
1559 The model has one serious flaw: the size of the term set is not
1560 known. Experimental. Do not use in production code.
1563 <sect3 id="querymodel-zebra-attr-weight">
1564 <title>Zebra Extension Rank Weight Attribute (type 9)</title>
1567 Rank weight is a way to pass a value to a ranking algorithm, so
1568 that one APT can have one weight while another has a different one.
1569 See also <xref linkend="administration-ranking"/>.
1572 For example, to search for utah in title with weight 30, as well
1573 as in any field with weight 20:
1575 Z> find @attr 2=102 @or @attr 9=30 @attr 1=4 utah @attr 9=20 utah
1579 <sect3 id="querymodel-zebra-attr-limit">
1580 <title>Zebra Extension Approximative Limit Attribute (type 9)</title>
1583 Newer Zebra versions normally compute the hit count for every APT
1584 (leaf) in the query tree. These hit counts are returned as part of
1585 the searchResult-1 facility in the binary encoded Z39.50 search
1589 By setting a limit for the APT, Zebra can be made to switch to an
1590 approximate hit count once a certain hit count limit is
1591 reached. A value of zero means an exact hit count.
1594 For example, we might be interested in an exact hit count for a, but
1595 for b we allow hit count estimates from 1000 hits upwards.
1597 Z> find @and a @attr 9=1000 b
1601 The estimated hit count facility makes searches faster, as one
1602 only needs to process large hit lists partially.
1605 This facility clashes with rank weight, because there all
1606 documents in the hit lists need to be examined for scoring and
1608 It is an experimental
1609 extension. Do not use in production code.
1612 <sect3 id="querymodel-zebra-attr-termref">
1613 <title>Zebra Extension Term Reference Attribute (type 10)</title>
1616 Zebra supports the <literal>searchResult-1</literal> facility.
1617 If the <literal>Term Reference Attribute (type 10)</literal> is
1618 given, that specifies a subqueryId value returned as part of the
1619 search result. It is a way for a client to name an APT part of a
1629 Experimental. Do not use in production code.
1636 <sect2 id="querymodel-zebra-attr-scan">
1637 <title>Zebra specific Scan Extensions to all Attribute Sets</title>
1639 Zebra extends the Bib1 attribute types, and these extensions are
1640 recognized regardless of the attribute
1641 set used in a <literal>scan</literal> operation query.
1643 <table id="querymodel-zebra-attr-scan-table"
1644 frame="all" rowsep="1" colsep="1" align="center">
1646 <caption>Zebra Scan Attribute Extensions</caption>
1652 <td>Zebra version</td>
1657 <td>Result Set Narrow</td>
1663 <td>Approximative Limit</td>
1671 <sect3 id="querymodel-zebra-attr-narrow">
1672 <title>Zebra Extension Result Set Narrow (type 8)</title>
1675 If attribute <literal>Result Set Narrow (type 8)</literal>
1676 is given for <literal>scan</literal>, the value is the name of a
1677 result set. Each hit count in <literal>scan</literal> is
1678 <literal>@and</literal>'ed with the result set given.
1681 Consider for example
1682 the case of scanning all title fields around the
1683 scanterm <emphasis>mozart</emphasis>, then refining the scan by
1684 issuing a filtering query for <emphasis>amadeus</emphasis> to
1685 restrict the scan to the result set of the query:
1687 Z> scan @attr 1=4 mozart
1690 mozartforskningen (1)
1694 Z> f @attr 1=4 amadeus
1696 Number of hits: 15, setno 2
1698 Z> scan @attr 1=4 @attr 8=2 mozart
1701 mozartforskningen (0)
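<para>
The narrowing step can be sketched as a set intersection per scanned term. In the Python sketch below, the title register and the result set are hypothetical sets of record ids, chosen so that <literal>mozartforskningen</literal> drops to a count of zero as in the session above; the function name <literal>scan_counts</literal> is made up.
</para>

```python
# Hypothetical title register: index term -> set of record ids.
title_index = {
    "mozart": {1, 2, 3, 4},
    "mozartforskningen": {5},
    "mozarts": {2, 6},
}

# Record ids of a result set, standing in for setno 2 ("amadeus").
result_set_2 = {1, 2, 3}


def scan_counts(index, narrow=None):
    """Hit count per scanned term, optionally ANDed with a result set."""
    counts = {}
    for term in sorted(index):
        postings = index[term]
        if narrow is not None:
            # Result Set Narrow: keep only records that are also in the set.
            postings = postings.intersection(narrow)
        counts[term] = len(postings)
    return counts


print(scan_counts(title_index))
# {'mozart': 4, 'mozartforskningen': 1, 'mozarts': 2}
print(scan_counts(title_index, narrow=result_set_2))
# {'mozart': 3, 'mozartforskningen': 0, 'mozarts': 1}
```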
1709 Experimental. Do not use in production code.
1712 <sect3 id="querymodel-zebra-attr-approx">
1713 <title>Zebra Extension Approximative Limit (type 9)</title>
1716 The <literal>Zebra Extension Approximative Limit (type
1717 9)</literal> is a way to enable approximate
1718 hit counts for <literal>scan</literal> hit counts, in the same
1719 way as for <literal>search</literal> hit counts.
1728 Experimental and buggy. Definitely not to be used in production code.
1735 <sect2 id="querymodel-idxpath">
1736 <title>Zebra special IDXPATH Attribute Set for GRS indexing</title>
1738 The attribute-set <literal>idxpath</literal> consists of a single
1739 <literal>Use (type 1)</literal> attribute. All non-use attributes
1743 This feature is enabled when defining the
1744 <literal>xpath enable</literal> option in the GRS filter
1745 <filename>*.abs</filename> configuration files. If one wants to use
1746 the special <literal>idxpath</literal> numeric attribute set, the
1747 main Zebra configuration file <filename>zebra.cfg</filename>
1748 directive <literal>attset: idxpath.att</literal> must be enabled.
1750 <warning>The <literal>idxpath</literal> is deprecated, may not be
1751 supported in future Zebra versions, and should definitely
1752 not be used in production code.
1755 <sect3 id="querymodel-idxpath-use">
1756 <title>IDXPATH Use Attributes (type = 1)</title>
1758 This attribute set allows one to search GRS filter indexed
1759 records by XPATH-like structured index names.
1762 <warning>The <literal>idxpath</literal> option defines hard-coded
1763 index names, which might clash with your own index names.
1766 <table id="querymodel-idxpath-use-table"
1767 frame="all" rowsep="1" colsep="1" align="center">
1769 <caption>Zebra specific IDXPATH Use Attributes (type 1)</caption>
1774 <td>String Index</td>
1780 <td>XPATH Begin</td>
1782 <td>_XPATH_BEGIN</td>
1783 <td>deprecated</td>
1789 <td>deprecated</td>
1792 <td>XPATH CData</td>
1794 <td>_XPATH_CDATA</td>
1795 <td>deprecated</td>
1798 <td>XPATH Attribute Name</td>
1800 <td>_XPATH_ATTR_NAME</td>
1801 <td>deprecated</td>
1804 <td>XPATH Attribute CData</td>
1806 <td>_XPATH_ATTR_CDATA</td>
1807 <td>deprecated</td>
1814 See <filename>tab/idxpath.att</filename> for more information.
1817 Search for all documents starting with root element
1818 <literal>/root</literal> (either using the numeric or the string
1821 Z> find @attrset idxpath @attr 1=1 @attr 4=3 root/
1822 Z> find @attr idxpath 1=1 @attr 4=3 root/
1823 Z> find @attr 1=_XPATH_BEGIN @attr 4=3 root/
1827 Search for all documents where specific nested XPATH
1828 <literal>/c1/c2/../cn</literal> exists. Notice the very
1829 counter-intuitive <emphasis>reverse</emphasis> notation!
1831 Z> find @attrset idxpath @attr 1=1 @attr 4=3 cn/cn-1/../c1/
1832 Z> find @attr 1=_XPATH_BEGIN @attr 4=3 cn/cn-1/../c1/
1836 Search for CDATA string <emphasis>text</emphasis> in any element
1838 Z> find @attrset idxpath @attr 1=1016 text
1839 Z> find @attr 1=_XPATH_CDATA text
1843 Search for CDATA string <emphasis>anothertext</emphasis> in any
1846 Z> find @attrset idxpath @attr 1=1015 anothertext
1847 Z> find @attr 1=_XPATH_ATTR_CDATA anothertext
1851 Search for all documents which have an XML element node
1852 including an XML attribute named <emphasis>creator</emphasis>
1854 Z> find @attrset idxpath @attr 1=3 @attr 4=3 creator
1855 Z> find @attr 1=_XPATH_ATTR_NAME @attr 4=3 creator
1859 Combining usual <literal>bib-1</literal> attribute set searches
1860 with <literal>idxpath</literal> attribute set searches:
1862 Z> find @and @attr idxpath 1=1 @attr 4=3 link/ @attr 1=4 mozart
1863 Z> find @and @attr 1=_XPATH_BEGIN @attr 4=3 link/ @attr 1=_XPATH_CDATA mozart
1867 Scanning is supported on all <literal>idxpath</literal>
1868 indexes, both specified as numeric use attributes, or as string
1871 Z> scan @attrset idxpath @attr 1=1016 text
1872 Z> scan @attr 1=_XPATH_ATTR_CDATA anothertext
1873 Z> scan @attrset idxpath @attr 1=3 @attr 4=3 ''
1881 <sect2 id="querymodel-pqf-apt-mapping">
1882 <title>Mapping from PQF atomic APT queries to Zebra internal
1883 register indexes</title>
1885 The rules for PQF APT mapping are rather tricky to grasp at
1886 first. We deal first with the rules for deciding which
1887 internal register or string index to use, according to the use
1888 attribute or access point specified in the query. Thereafter we
1889 deal with the rules for determining the correct structure type of
1893 <sect3 id="querymodel-pqf-apt-mapping-accesspoint">
1894 <title>Mapping of PQF APT access points</title>
1896 Zebra understands four fundamentally different types of access
1897 points, of which only the
1898 <emphasis>numeric use attribute</emphasis> type access points
1899 are defined by the <ulink url="&url.z39.50;">Z39.50</ulink>
1901 All other access point types are Zebra specific, and non-portable.
1904 <table id="querymodel-zebra-mapping-accesspoint-types"
1905 frame="all" rowsep="1" colsep="1" align="center">
1907 <caption>Access point name mapping</caption>
1910 <td>Access Point</td>
1918 <td>Use attribute</td>
1920 <td>[1-9][0-9]*</td>
1921 <td>mapped to string index name via the attribute set definition</td>
1924 <td>String index name</td>
1926 <td>[a-zA-Z](\-?[a-zA-Z0-9])*</td>
1927 <td>normalized name is used as internal string index name</td>
1930 <td>Zebra internal index name</td>
1932 <td>_[a-zA-Z](_?[a-zA-Z0-9])*</td>
1933 <td>hardwired internal string index name</td>
1936 <td>XPATH special index</td>
1939 <td>special xpath search for GRS indexed records</td>
1945 <literal>Attribute set names</literal> and
1946 <literal>string index names</literal> are normalized
1947 according to the following rules: all <emphasis>single</emphasis>
1948 hyphens <literal>'-'</literal> are stripped, and all upper case
1949 letters are folded to lower case.
1953 <emphasis>Numeric use attributes</emphasis> are mapped
1954 to the Zebra internal
1955 string index according to the attribute set definition in use.
1956 The default attribute set is <literal>Bib-1</literal>, and may be
1957 omitted in the PQF query.
1961 According to normalization and numeric
1962 use attribute mapping, it follows that the following
1963 PQF queries are considered equivalent (assuming the default
1964 configuration has not been altered):
1966 Z> find @attr 1=Body-of-text serenade
1967 Z> find @attr 1=bodyoftext serenade
1968 Z> find @attr 1=BodyOfText serenade
1969 Z> find @attr 1=bO-d-Y-of-tE-x-t serenade
1970 Z> find @attr 1=1010 serenade
1971 Z> find @attrset Bib-1 @attr 1=1010 serenade
1972 Z> find @attrset bib1 @attr 1=1010 serenade
1973 Z> find @attrset Bib1 @attr 1=1010 serenade
1974 Z> find @attrset b-I-b-1 @attr 1=1010 serenade
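<para>
The normalization rule can be sketched in a few lines of Python and checked against the equivalent names above. The function name <literal>normalize_name</literal> is hypothetical, and the handling of runs of two or more hyphens is an assumption, since the rule only mentions single hyphens.
</para>

```python
def normalize_name(name):
    """Normalize an attribute set or string index name.

    Single hyphens are dropped and letters are folded to lower case.
    Hyphens that are part of a run of two or more are kept, which is an
    assumption: the rule above only mentions single hyphens.
    """
    out = []
    for i, ch in enumerate(name):
        if ch == "-":
            prev_is_dash = i != 0 and name[i - 1] == "-"
            next_is_dash = i + 1 != len(name) and name[i + 1] == "-"
            if not (prev_is_dash or next_is_dash):
                continue  # a single hyphen: strip it
        out.append(ch.lower())
    return "".join(out)


for name in ["Body-of-text", "BodyOfText", "bO-d-Y-of-tE-x-t", "Bib-1", "b-I-b-1"]:
    print(name, "->", normalize_name(name))
# Body-of-text -> bodyoftext
# BodyOfText -> bodyoftext
# bO-d-Y-of-tE-x-t -> bodyoftext
# Bib-1 -> bib1
# b-I-b-1 -> bib1
```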
1979 The <emphasis>numerical</emphasis>
1980 <literal>use attributes (type 1)</literal>
1981 are interpreted according to the
1982 attribute sets which have been loaded in the
1983 <literal>zebra.cfg</literal> file, and are matched against specific
1984 fields as specified in the <literal>.abs</literal> file which
1985 describes the profile of the records which have been loaded.
1986 If no use attribute is provided, a default of
1987 <literal>Bib-1 Use Any (1016)</literal> is
1989 The predefined <literal>use attribute sets</literal>
1990 can be reconfigured by tweaking the configuration files
1991 <filename>tab/*.att</filename>, and
1992 new attribute sets can be defined by adding similar files in the
1993 configuration path <literal>profilePath</literal> of the server.
1997 <literal>String indexes</literal> can be accessed directly,
1998 independently of which attribute set is in use; the attribute set is
1999 simply ignored. The above-mentioned name normalization applies.
2000 <literal>String index names</literal> are defined in the
2001 used indexing filter configuration files, for example in the
2002 <literal>GRS</literal>
2003 <filename>*.abs</filename> configuration files, or in the
2004 <literal>alvis</literal> filter XSLT indexing stylesheets.
2008 <literal>Zebra internal indexes</literal> can be accessed directly,
2009 according to the same rules as the user defined
2010 <literal>string indexes</literal>. The only difference is that
2011 <literal>Zebra internal index names</literal> are hardwired,
2013 must start with the character <literal>'_'</literal>.
2017 Finally, <literal>XPATH</literal> access points are only
2018 available using the <literal>GRS</literal> filter for indexing.
2019 These access point names must start with the character
2020 <literal>'/'</literal>; they are <emphasis>not
2021 normalized</emphasis>, but are passed unaltered to the Zebra internal
2022 XPATH engine. See <xref linkend="querymodel-use-xpath"/>.
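<para>
A client-side sanity check of which kind of access point a name is can be sketched with the grammars from the access point table. In the Python sketch below, names starting with <literal>'/'</literal> are treated as XPATH access points, following the paragraph above, and the use attribute pattern is written as <literal>[1-9][0-9]*</literal> so that values such as 1010 are accepted. The function name <literal>access_point_type</literal> is hypothetical; Zebra performs its own classification on the server.
</para>

```python
import re

# Grammars from the access point table (use attribute: [1-9][0-9]*).
USE_ATTRIBUTE = re.compile(r"[1-9][0-9]*")
STRING_INDEX = re.compile(r"[a-zA-Z](-?[a-zA-Z0-9])*")
INTERNAL_INDEX = re.compile(r"_[a-zA-Z](_?[a-zA-Z0-9])*")


def access_point_type(name):
    """Classify a PQF access point name by the grammars above."""
    if name.startswith("/"):
        return "xpath"  # passed unaltered to the XPATH engine
    if USE_ATTRIBUTE.fullmatch(name):
        return "use attribute"
    if INTERNAL_INDEX.fullmatch(name):
        return "internal index"
    if STRING_INDEX.fullmatch(name):
        return "string index"
    raise ValueError("not a valid access point: " + name)


for ap in ["1010", "Body-of-text", "_ALLRECORDS", "/root/title"]:
    print(ap, "->", access_point_type(ap))
# 1010 -> use attribute
# Body-of-text -> string index
# _ALLRECORDS -> internal index
# /root/title -> xpath
```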
2030 <sect3 id="querymodel-pqf-apt-mapping-structuretype">
2031 <title>Mapping of PQF APT structure and completeness to
2032 register type</title>
2034 Internally, Zebra has in its default configuration several
2035 different types of registers or indexes, whose tokenization and
2036 character normalization rules differ. This reflects the fact that
2037 searching fundamentally different tokens like dates, numbers,
2038 bitfields and string based text needs different rule sets.
2041 <table id="querymodel-zebra-mapping-structure-types"
2042 frame="all" rowsep="1" colsep="1" align="center">
2044 <caption>Structure and completeness mapping to register types</caption>
2048 <td>Completeness</td>
2049 <td>Register type</td>
2056 phrase (@attr 4=1), word (@attr 4=2),
2057 word-list (@attr 4=6),
2058 free-form-text (@attr 4=105), or document-text (@attr 4=106)
2060 <td>Incomplete subfield (@attr 6=1)</td>
2062 <td>Traditional tokenized and character normalized word index</td>
2066 phrase (@attr 4=1), word (@attr 4=2),
2067 word-list (@attr 4=6),
2068 free-form-text (@attr 4=105), or document-text (@attr 4=106)
2070 <td>Complete field (@attr 6=3)</td>
2071 <td>Phrase ('p')</td>
2072 <td>Character normalized, but not tokenized index for phrase
2077 <td>urx (@attr 4=104)</td>
2079 <td>URX/URL ('u')</td>
2080 <td>Special index for URL web addresses</td>
2083 <td>numeric (@attr 4=109)</td>
2085 <td>Numeric ('n')</td>
2086 <td>Special index for integer numbers</td>
2089 <td>key (@attr 4=3)</td>
2091 <td>Null bitmap ('0')</td>
2092 <td>Used for non-tokenized and non-normalized bit sequences</td>
2095 <td>year (@attr 4=4)</td>
2098 <td>Non-tokenized and non-normalized 4 digit numbers</td>
2101 <td>date (@attr 4=5)</td>
2104 <td>Non-tokenized and non-normalized ISO date strings</td>
2110 <td>Used with special sort attribute set (@attr 7=1, @attr 7=2)</td>
2116 <td>Internal record ID register, used whenever
2117 Relation Always Matches (@attr 2=103) is specified</td>
2122 <!-- see in util/zebramap.c -->
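<para>
The routing rules in the table above can be summarized as a small lookup function. The Python sketch below covers only the rows whose register letter is listed, takes the numeric register letter <literal>n</literal> from the Numeric String paragraph of this section, and folds Complete subfield (2) into Complete field (3) as described for the completeness attribute. The names <literal>TEXT_STRUCTURES</literal>, <literal>SPECIAL_REGISTERS</literal> and <literal>register_type</literal> are hypothetical and do not mirror <filename>util/zebramap.c</filename>.
</para>

```python
# Structure attribute values (type 4) that are routed to the word ('w')
# or phrase ('p') register depending on completeness (type 6):
# phrase 1, word 2, word-list 6, free-form-text 105, document-text 106.
TEXT_STRUCTURES = {1, 2, 6, 105, 106}

# Structures with their own register, whatever the completeness:
# urx 104 -> 'u', numeric 109 -> 'n', key 3 -> '0'.
SPECIAL_REGISTERS = {104: "u", 109: "n", 3: "0"}


def register_type(structure, completeness=1):
    """Return the register type letter for a structure/completeness pair.

    Covers only the table rows whose register letter is listed; year,
    date, sort and record-id registers are deliberately left out.
    """
    if structure in SPECIAL_REGISTERS:
        return SPECIAL_REGISTERS[structure]
    if structure in TEXT_STRUCTURES:
        # Complete subfield (2) is silently treated as complete field (3).
        if completeness in (2, 3):
            return "p"
        return "w"
    raise ValueError("structure %d is not covered by this sketch" % structure)


print(register_type(1))     # w  (phrase, incomplete subfield)
print(register_type(1, 3))  # p  (phrase, complete field)
print(register_type(2, 2))  # p  (complete subfield treated as complete field)
print(register_type(109))   # n
```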
2125 If a <emphasis>Structure</emphasis> attribute of
2126 <emphasis>Phrase</emphasis> is used in conjunction with a
2127 <emphasis>Completeness</emphasis> attribute of
2128 <emphasis>Complete (Sub)field</emphasis>, the term is matched
2129 against the contents of the phrase (long word) register, if one
2130 exists for the given <emphasis>Use</emphasis> attribute.
2131 A phrase register is created for those fields in the
2132 GRS <filename>*.abs</filename> file that contain a
2133 <literal>p</literal>-specifier.
2135 Z> scan @attr 1=Title @attr 4=1 @attr 6=3 beethoven
2137 bayreuther festspiele (1)
2138 * beethoven bibliography database (1)
2141 Z> find @attr 1=Title @attr 4=1 @attr 6=3 "beethoven bibliography"
2143 Number of hits: 0, setno 5
2145 Z> find @attr 1=Title @attr 4=1 @attr 6=3 "beethoven bibliography database"
2147 Number of hits: 1, setno 6
2152 If <emphasis>Structure</emphasis>=<emphasis>Phrase</emphasis> is
2153 used in conjunction with <emphasis>Incomplete Field</emphasis> (the
2154 default value for <emphasis>Completeness</emphasis>), the
2155 search is directed against the normal word registers, but if the term
2156 contains multiple words, the term will only match if all of the words
2157 are found immediately adjacent, and in the given order.
2158 The word search is performed on those fields that are indexed as
2159 type <literal>w</literal> in the GRS <filename>*.abs</filename> file.
2161 Z> scan @attr 1=Title @attr 4=1 @attr 6=1 beethoven
2167 Z> find @attr 1=Title @attr 4=1 @attr 6=1 beethoven
2169 Number of hits: 18, setno 1
2171 Z> find @attr 1=Title @attr 4=1 @attr 6=1 "beethoven bibliography"
2173 Number of hits: 2, setno 2
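<para>
The adjacency rule for phrase searches against the word register can be sketched as a sliding window over the tokens of a field. The Python sketch below uses a naive whitespace split with lower-casing as its tokenizer, which is an assumption: Zebra's real tokenization and character normalization are governed by its configuration. The function name <literal>phrase_match</literal> is hypothetical.
</para>

```python
def phrase_match(phrase, text):
    """True if the words of phrase occur in text adjacently and in order.

    Tokenization here is a naive split on whitespace plus lower-casing,
    standing in for Zebra's configurable tokenization.
    """
    words = phrase.lower().split()
    tokens = text.lower().split()
    n = len(words)
    return any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1))


title = "Beethoven bibliography database"
print(phrase_match("beethoven bibliography", title))  # True
print(phrase_match("beethoven database", title))      # False (not adjacent)
print(phrase_match("bibliography beethoven", title))  # False (wrong order)
```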
2179 If the <emphasis>Structure</emphasis> attribute is
2180 <emphasis>Word List</emphasis>,
2181 <emphasis>Free-form Text</emphasis>, or
2182 <emphasis>Document Text</emphasis>, the term is treated as a
2183 natural-language, relevance-ranked query.
2184 This search type uses the word register, i.e., those fields
2185 that are indexed as type <literal>w</literal> in the
2186 GRS <filename>*.abs</filename> file.
2190 If the <emphasis>Structure</emphasis> attribute is
2191 <emphasis>Numeric String</emphasis> the term is treated as an integer.
2192 The search is performed on those fields that are indexed
2193 as type <literal>n</literal> in the GRS
2194 <filename>*.abs</filename> file.
2198 If the <emphasis>Structure</emphasis> attribute is
2199 <emphasis>URX</emphasis> the term is treated as a URX (URL) entity.
2200 The search is performed on those fields that are indexed as type
2201 <literal>u</literal> in the <filename>*.abs</filename> file.
2205 If the <emphasis>Structure</emphasis> attribute is
2206 <emphasis>Local Number</emphasis>, the term is treated as a
2207 native Zebra record identifier.
2211 If the <emphasis>Relation</emphasis> attribute is
2212 <emphasis>Equals</emphasis> (default), the term is matched
2213 in a normal fashion (modulo truncation and processing of
2214 individual words, if required).
2215 If <emphasis>Relation</emphasis> is <emphasis>Less Than</emphasis>,
2216 <emphasis>Less Than or Equal</emphasis>,
2217 <emphasis>Greater than</emphasis>, or <emphasis>Greater than or
2218 Equal</emphasis>, the term is assumed to be numerical, and a
2219 standard regular expression is constructed to match the given
2221 If <emphasis>Relation</emphasis> is <emphasis>Relevance</emphasis>,
2222 the standard natural-language query processor is invoked.
2226 For the <emphasis>Truncation</emphasis> attribute,
2227 <emphasis>No Truncation</emphasis> is the default.
2228 <emphasis>Left Truncation</emphasis> is not supported.
2229 <emphasis>Process # in search term</emphasis> is supported, as is
2230 <emphasis>Regexp-1</emphasis>.
2231 <emphasis>Regexp-2</emphasis> enables the fault-tolerant (fuzzy)
2232 search. As a default, a single error (deletion, insertion,
2233 replacement) is accepted when terms are matched against the register
2240 <sect2 id="querymodel-regular">
2241 <title>Zebra Regular Expressions in Truncation Attribute (type = 5)</title>
2244 Each term in a query is interpreted as a regular expression if
2245 the truncation value is either <emphasis>Regexp-1 (@attr 5=102)</emphasis>
2246 or <emphasis>Regexp-2 (@attr 5=103)</emphasis>.
2247 Both query types follow the same syntax with the operands:
2250 <table id="querymodel-regular-operands-table"
2251 frame="all" rowsep="1" colsep="1" align="center">
2253 <caption>Regular Expression Operands</caption>
2256 <tr><td>Operand</td><td>Meaning</td></tr>
2261 <td><literal>x</literal></td>
2262 <td>Matches the character <literal>x</literal>.</td>
2265 <td><literal>.</literal></td>
2266 <td>Matches any character.</td>
2269 <td><literal>[ .. ]</literal></td>
2270 <td>Matches the set of characters specified;
2271 such as <literal>[abc]</literal> or <literal>[a-c]</literal>.</td>
2277 The above operands can be combined with the following operators:
2280 <table id="querymodel-regular-operators-table"
2281 frame="all" rowsep="1" colsep="1" align="center">
2282 <caption>Regular Expression Operators</caption>
2285 <tr><td>Operator</td><td>Meaning</td></tr>
2290 <td><literal>x*</literal></td>
2291 <td>Matches <literal>x</literal> zero or more times.
2292 Priority: high.</td>
2295 <td><literal>x+</literal></td>
2296 <td>Matches <literal>x</literal> one or more times.
2297 Priority: high.</td>
2300 <td><literal>x?</literal></td>
2301 <td> Matches <literal>x</literal> zero times or once.
2302 Priority: high.</td>
2305 <td><literal>xy</literal></td>
2306 <td> Matches <literal>x</literal>, then <literal>y</literal>.
2307 Priority: medium.</td>
2310 <td><literal>x|y</literal></td>
2311 <td> Matches either <literal>x</literal> or <literal>y</literal>.
2315 <td><literal>( )</literal></td>
2316 <td>The order of evaluation may be changed by using parentheses.</td>
2322 If the first character of the <literal>Regexp-2</literal> query
2323 is a plus character (<literal>+</literal>) it marks the
2324 beginning of a section with non-standard specifiers.
2325 The next plus character marks the end of the section.
2326 Currently Zebra only supports one specifier, the error tolerance,
2327 which consists of one digit.
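<para>
A client that builds Regexp-2 terms may want to check the specifier section before sending the query. The Python sketch below splits a term such as <literal>+2+schnittke</literal> into its error tolerance and the remaining regular expression, defaulting to one error when no section is present, as stated for the fuzzy search earlier in this chapter. The exact term layout is inferred from the description above, and the function name <literal>parse_regexp2</literal> is hypothetical; it is not part of Zebra or YAZ.
</para>

```python
def parse_regexp2(query, default_errors=1):
    """Split a Regexp-2 term into (error tolerance, regular expression).

    A leading '+' opens a section of non-standard specifiers that the next
    '+' closes; the only specifier is one digit, the error tolerance.
    Without such a section, one error is allowed by default.
    """
    if query.startswith("+"):
        end = query.index("+", 1)  # raises ValueError if unterminated
        spec = query[1:end]
        if len(spec) != 1 or spec not in "0123456789":
            raise ValueError("expected exactly one digit between the '+' signs")
        return int(spec), query[end + 1:]
    return default_errors, query


print(parse_regexp2("+2+schnittke"))  # (2, 'schnittke')
print(parse_regexp2("schnittke"))     # (1, 'schnittke')
```

Because the section can only appear at the very start of the term, a later <literal>+</literal> keeps its usual meaning as a suffix operator.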
2331 Since the plus operator is normally a suffix operator the addition to
2332 the query syntax doesn't violate the syntax for standard regular
2337 For example, a phrase search with regular expressions in
2338 the title-register is performed like this:
2340 Z> find @attr 1=4 @attr 5=102 "informat.* retrieval"
2345 Combinations with other attributes are possible. For example, a
2346 ranked search with a regular expression:
2348 Z> find @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval"
2356 The RecordType parameter in the <literal>zebra.cfg</literal> file, or
2357 the <literal>-t</literal> option to the indexer tells Zebra how to
2358 process input records.
2359 Two basic types of processing are available: raw text and structured
2360 data. Raw text is just that, and it is selected by providing the
2361 argument <literal>text</literal> to Zebra. Structured records are
2362 all handled internally using the basic mechanisms described in the
2363 subsequent sections.
2364 Zebra can read structured records in many different formats.
2370 <sect1 id="querymodel-cql-to-pqf">
2371 <title>Server Side CQL to PQF Query Translation</title>
2374 <literal>&lt;cql2rpn&gt;l2rpn.txt&lt;/cql2rpn&gt;</literal>
2375 YAZ Frontend Virtual
2376 Hosts option, one can configure
2377 the YAZ Frontend CQL-to-PQF
2378 converter, specifying the interpretation of various
2379 <ulink url="&url.cql;">CQL</ulink>
2380 indexes, relations, etc. in terms of Type-1 query attributes.
2381 <!-- The yaz-client config file -->
2384 For example, using server-side CQL-to-PQF conversion, one might
2385 query a Zebra server like this:
2388 yaz-client localhost:9999
2390 Z> find text=(plant and soil)
2393 and, if properly configured, even static relevance ranking can
2394 be performed using CQL query syntax:
2397 Z> find text = /relevant (plant and soil)
2403 By the way, the same configuration can be used to
2404 search using client-side CQL-to-PQF conversion:
2405 (the only difference is <literal>querytype cql2rpn</literal>
2407 <literal>querytype cql</literal>, and the call specifying a local
2411 yaz-client -q local/cql2pqf.txt localhost:9999
2412 Z> querytype cql2rpn
2413 Z> find text=(plant and soil)
2419 Exhaustive information can be found in the
2420 section "Specification of CQL to RPN mappings" of the YAZ manual at
2421 <ulink url="http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql.map">
2422 http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql.map</ulink>,
2423 and is therefore not repeated here.
2428 <ulink url="http://www.loc.gov/z3950/agency/zing/cql/dc-indexes.html">
2429 http://www.loc.gov/z3950/agency/zing/cql/dc-indexes.html</ulink>
2430 for the Maintenance Agency's work-in-progress mapping of Dublin Core
2431 indexes to Attribute Architecture (util, XD and BIB-2)
2441 <!-- Keep this comment at the end of the file
2446 sgml-minimize-attributes:nil
2447 sgml-always-quote-attributes:t
2450 sgml-parent-document: "zebra.xml"
2451 sgml-local-catalogs: nil
2452 sgml-namecase-general:t