doc/querymodel.xml

   1  <chapter id="querymodel">
   2   <!-- $Id: querymodel.xml,v 1.18 2006-06-29 16:02:12 heikki Exp $ -->
   3   <title>Query Model</title>
   4
   5   <sect1 id="querymodel-overview">
   6    <title>Query Model Overview</title>
   7
   8    <sect2 id="querymodel-query-languages">
   9     <title>Query Languages</title>
  10
  11     <para>
  12      Zebra was born as a networked Information Retrieval engine adhering
  13      to the international standards
  14      <ulink url="&url.z39.50;">Z39.50</ulink> and
  15      <ulink url="&url.sru;">SRU</ulink>,
  16      and implements the
  17      <literal>type-1 Reverse Polish Notation (RPN)</literal> query
  18      model defined there.
  19      Unfortunately, this model defines only a binary
  20      encoded representation, which is used as transport packaging in
  21      the Z39.50 protocol layer. This representation is not human
  22      readable, nor does it offer any convenient way to specify queries.
  23     </para>
  24     <para>
  25      Since the <literal>type-1 (RPN)</literal>
  26      query structure has no direct, useful string
  27      representation, every origin application needs to provide some
  28      form of mapping from a local query notation or representation to it.
  29     </para>
  30
  31
  32     <sect3 id="querymodel-query-languages-pqf">
  33      <title>Prefix Query Format (PQF)</title>
  34      <para>
  35       Index Data has defined a textual representation in the
  36       <ulink url="&url.yaz.pqf;">Prefix Query Format</ulink>, or
  37       <emphasis>PQF</emphasis> for short, which maps
  38       one-to-one to binary encoded
  39       <emphasis>type-1 RPN</emphasis> queries.
  40       PQF has been adopted by other
  41       parties developing Z39.50 software, and is often referred to as
  42       <literal>Prefix Query Notation</literal>, or in short
  43       <literal>PQN</literal>. See
  44       <xref linkend="querymodel-pqf"/> for further explanations and
  45       descriptions of Zebra's capabilities.
  46      </para>
  47     </sect3>
  48
  49     <sect3 id="querymodel-query-languages-cql">
  50      <title>Common Query Language (CQL)</title>
  51      <para>
  52       The type-1 RPN query model,
  53       expressed in PQF/PQN, is natively supported.
  54       On the other hand, the default SRU
  55       web services <emphasis>Common Query Language</emphasis>
  56       <ulink url="&url.cql;">CQL</ulink> is not natively supported.
  57      </para>
  58      <para>
  59       Zebra can be configured to understand and map CQL to PQF. See
  60       <xref linkend="querymodel-cql-to-pqf"/>.
  61      </para>
  62     </sect3>
  63
  64    </sect2>
  65
  66    <sect2 id="querymodel-operation-types">
  67     <title>Operation types</title>
  68     <para>
  69      Zebra supports all of the three different
  70      <literal>Z39.50/SRU</literal> operations defined in the
  71      standards: <literal>explain</literal>, <literal>search</literal>,
  72      and <literal>scan</literal>. A short description of the
  73      functionality and purpose of each is quite in order here.
  74     </para>
  75
  76     <sect3 id="querymodel-operation-type-explain">
  77      <title>Explain Operation</title>
  78      <para>
  79       The <emphasis>syntax</emphasis> of Z39.50/SRU queries is
  80       well known to any client, but the specific
  81       <emphasis>semantics</emphasis> - taking into account a
  82       particular server's functionalities and abilities - must be
  83       discovered from case to case. This is the purpose of the
  84       <literal>explain</literal> operation, which provides the means
  85       for learning which
  86       <emphasis>fields</emphasis> (also called
  87       <emphasis>indexes</emphasis> or <emphasis>access points</emphasis>)
  88       are provided, which default parameters the server uses, which
  89       document retrieval formats are defined, and which specific parts
  90       of the general query model are supported.
  91      </para>
  92      <para>
  93       Z39.50 embeds the <literal>explain</literal> operation
  94       by performing a
  95       <literal>search</literal> in the magic
  96       <literal>IR-Explain-1</literal> database;
  97       see <xref linkend="querymodel-exp1"/>.
  98      </para>
  99      <para>
 100       In SRU, <literal>explain</literal> is an entirely separate
 101       operation, which returns a <literal>ZeeRex
 102       XML</literal> record according to the
 103       structure defined by the protocol.
 104      </para>
 105      <para>
 106       In both cases, the information gathered through
 107       <literal>explain</literal> operations can be used to
 108       auto-configure a client user interface to the server's
 109       capabilities.
 110      </para>
 111     </sect3>
 112
 113     <sect3 id="querymodel-operation-type-search">
 114      <title>Search Operation</title>
 115      <para>
 116       Search and retrieve interactions are the raison d'être.
 117       They are used to query the remote database and
 118       return search result documents.  Search queries span from
 119       simple free text searches to nested complex boolean queries,
 120       targeting specific indexes, and possibly enhanced with many
 121       query semantic specifications. Search interactions are the heart
 122       and soul of Z39.50/SRU servers.
 123      </para>
 124     </sect3>
 125
 126     <sect3 id="querymodel-operation-type-scan">
 127      <title>Scan Operation</title>
 128      <para>
 129       The <literal>scan</literal> operation is a helper function
 130       that operates on one index or access point at a time.
 131      </para>
 132      <para>
 133       It provides
 134       the means to investigate the content of specific indexes.
 135       Scanning an index returns a handful of terms actually found in
 136       the indexes, and in addition the <literal>scan</literal>
 137       operation returns the number of documents indexed by each term.
 138       A search client can use this information to propose proper
 139       spelling of search terms, to auto-fill search boxes, or to
 140       display  controlled vocabularies.
 141      </para>
 142     </sect3>
 143
 144    </sect2>
 145
 146  </sect1>
 147
 148
 149   <sect1 id="querymodel-pqf">
 150    <title>Prefix Query Format syntax and semantics</title>
 151    <para>
 152     The <ulink url="&url.yaz.pqf;">PQF grammar</ulink>
 153     is documented in the YAZ manual, and shall not be
 154     repeated here. During search, this textual PQF representation
 155     is always mapped to the equivalent Zebra internal
 156     query parse tree.
 157    </para>
 158
 159    <sect2 id="querymodel-pqf-tree">
 160     <title>PQF tree structure</title>
 161     <para>
 162      The PQF parse tree - or the equivalent textual representation -
 163      may start with one specification of the
 164      <emphasis>attribute set</emphasis> used. It is followed by a query
 165      tree, which
 166      consists of <emphasis>atomic query parts (APT)</emphasis> or
 167      <emphasis>named result sets</emphasis>, optionally
 168      paired by <emphasis>boolean binary operators</emphasis>, and
 169      finally <emphasis>recursively combined</emphasis> into
 170      complex query trees.
 171     </para>
 172
 173     <sect3 id="querymodel-attribute-sets">
 174      <title>Attribute sets</title>
 175      <para>
 176       Attribute sets define the exact meaning and semantics of queries
 177       issued. Zebra comes with some predefined attribute set
 178       definitions; others can easily be defined and added to the
 179       configuration.
 180      </para>
 181
 182
 183      <table id="querymodel-attribute-sets-table"
 184       frame="all" rowsep="1" colsep="1" align="center">
 185
 186       <caption>Attribute sets predefined in Zebra</caption>
 187
 188        <thead>
 189        <tr>
 190          <td>Attribute set</td>
 191          <td>Short hand</td>
 192          <td>Notes</td>
 193          <td>Status</td>
 194         </tr>
 195       </thead>
 196
 197        <tbody>
 198         <tr>
 199          <td><literal>Explain</literal></td>
 200          <td><literal>exp-1</literal></td>
 201          <td>Special attribute set used on the special automagic
 202           <literal>IR-Explain-1</literal> database to gain information on
 203           server capabilities, database names, and database
 204           semantics.</td>
 205          <td>predefined</td>
 206         </tr>
 207         <tr>
 208          <td><literal>Bib1</literal></td>
 209          <td><literal>bib-1</literal></td>
 210          <td>Standard PQF query language attribute set which defines the
 211           semantics of Z39.50 searching. In addition, all of the
 212           non-use attributes (type 2-9) define the hard-wired
 213           Zebra internal query
 214           processing.</td>
 215          <td>default</td>
 216         </tr>
 217         <tr>
 218          <td><literal>GILS</literal></td>
 219          <td><literal>gils</literal></td>
 220          <td>Extension to the <literal>Bib1</literal> attribute set.</td>
 221          <td>predefined</td>
 222         </tr>
 223         <!--
 224         <tr>
 225          <td><literal>IDXPATH</literal></td>
 226          <td><literal>idxpath</literal></td>
 227          <td>Hardwired XPATH like attribute set, only available for
 228              indexing with the GRS record model</td>
 229          <td>deprecated</td>
 230         </tr>
 231         -->
 232        </tbody>
 233      </table>
 234     </sect3>
 235
 236     <para>
 237      The <literal>use attributes (type 1)</literal> mappings for the
 238      predefined attribute sets are found in the
 239      attribute set configuration files <filename>tab/*.att</filename>.
 240     </para>
 241
 242     <note>
 243      The Zebra internal query processing is modeled after
 244      the <literal>Bib1</literal> attribute set, and the non-use
 245      attributes type 2-6 are hard-wired in. It is therefore essential
 246      to be familiar with <xref linkend="querymodel-bib1-nonuse"/>.
 247     </note>
 248
 249
 250     <sect3 id="querymodel-boolean-operators">
 251      <title>Boolean operators</title>
 252      <para>
 253       A pair of sub query trees, or of atomic queries, is combined
 254       using the standard boolean operators into new query trees.
 255       Thus, boolean operators are always internal nodes in the query tree.
 256      </para>
 257
 258      <table id="querymodel-boolean-operators-table"
 259       frame="all" rowsep="1" colsep="1" align="center">
 260
 261       <caption>Boolean operators</caption>
 262        <thead>
 263         <tr>
 264          <td>Keyword</td>
 265          <td>Operator</td>
 266          <td>Description</td>
 267         </tr>
 268       </thead>
 269        <tbody>
 270         <tr><td><literal>@and</literal></td>
 271          <td>binary <literal>AND</literal> operator</td>
 272          <td>Set intersection of the two operands' hit sets</td>
 273         </tr>
 274         <tr><td><literal>@or</literal></td>
 275          <td>binary <literal>OR</literal> operator</td>
 276          <td>Set union of the two operands' hit sets</td>
 277         </tr>
 278         <tr><td><literal>@not</literal></td>
 279          <td>binary <literal>AND NOT</literal> operator</td>
 280          <td>Set difference: the first operand's hit set minus the second operand's hit set</td>
 281         </tr>
 282         <tr><td><literal>@prox</literal></td>
 283          <td>binary <literal>PROXIMITY</literal> operator</td>
 284          <td>Set intersection of the two operands' hit sets. In
 285           addition, the intersection set is purged of all
 286           documents which do not satisfy the requested query
 287           term proximity. The result is usually a proper subset of
 288           the corresponding AND operation's hit set.</td>
 289         </tr>
 290        </tbody>
 291      </table>
 292
 293      <para>
 294       For example, we can combine the terms
 295       <emphasis>information</emphasis> and <emphasis>retrieval</emphasis>
 296       into different searches in the default index of the default
 297       attribute set as follows.
 298       Querying for the union of all documents containing the
 299       terms <emphasis>information</emphasis> OR
 300       <emphasis>retrieval</emphasis>:
 301       <screen>
 302        Z> find @or information retrieval
 303       </screen>
 304      </para>
 305      <para>
 306       Querying for the intersection of all documents containing the
 307       terms <emphasis>information</emphasis> AND
 308       <emphasis>retrieval</emphasis>.
 309       The hit set is a subset of the corresponding
 310       OR query:
 311       <screen>
 312        Z> find @and information retrieval
 313       </screen>
 314      </para>
 315      <para>
 316       Querying for the intersection of all documents containing the
 317       terms <emphasis>information</emphasis> AND
 318       <emphasis>retrieval</emphasis>, taking proximity into account.
 319       The hit set is a subset of the corresponding
 320       AND query
 321       (see the <ulink url="&url.yaz.pqf;">PQF grammar</ulink> for
 322       details on the proximity operator):
 323       <screen>
 324        Z> find @prox 0 3 0 2 k 2 information retrieval
 325       </screen>
 326      </para>
 327      <para>
 328       Querying for the intersection of all documents containing the
 329       terms <emphasis>information</emphasis> AND
 330       <emphasis>retrieval</emphasis>, in the same order and next to each
 331       other, as given in the quoted term list.
 332       The hit set is a subset of the corresponding
 333       PROXIMITY query:
 334       <screen>
 335        Z> find "information retrieval"
 336       </screen>
 337      </para>
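          <para>
           As a further illustration, the <literal>@not</literal>
           operator and the nesting of operators can be sketched as
           follows. The terms are arbitrary and the queries are written
           after the PQF grammar, not copied from a real session. The first
           query finds documents containing <emphasis>information</emphasis>
           but not <emphasis>retrieval</emphasis>. The second shows that
           each operand may itself be a query tree; in infix notation it
           reads (information OR data) AND retrieval:
           <screen>
            Z> find @not information retrieval
            Z> find @and @or information data retrieval
           </screen>
          </para>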
 338     </sect3>
 339
 340
 341     <sect3 id="querymodel-atomic-queries">
 342      <title>Atomic queries (APT)</title>
 343      <para>
 344       Atomic queries are the query parts which work on one access point
 345       only. These consist of <literal>an attribute list</literal>
 346       followed by a <literal>single term</literal> or a
 347       <literal>quoted term list</literal>, and are often called
 348       <emphasis>Attributes-Plus-Terms (APT)</emphasis> queries.
 349      </para>
 350      <para>
 351       Atomic (APT) queries are always leaf nodes in the PQF query tree.
 352       Unsupplied non-use attributes type 2-9 are either inherited from
 353       higher nodes in the query tree, or are set to Zebra's default values.
 354       See <xref linkend="querymodel-bib1"/> for details.
 355      </para>
 356
 357      <table id="querymodel-atomic-queries-table"
 358       frame="all" rowsep="1" colsep="1" align="center">
 359
 360       <caption>Atomic queries (APT)</caption>
 361        <thead>
 362         <tr>
 363          <td>Name</td>
 364          <td>Type</td>
 365          <td>Notes</td>
 366         </tr>
 367       </thead>
 368        <tbody>
 369         <tr>
 370          <td><emphasis>attribute list</emphasis></td>
 371          <td>List of <literal>orthogonal</literal> attributes</td>
 372          <td>Any of the orthogonal attribute types may be omitted;
 373           these are then inherited from higher query tree nodes, or if not
 374           inherited, are set to the default Zebra configuration values.
 375          </td>
 376         </tr>
 377         <tr>
 378          <td><emphasis>term</emphasis></td>
 379          <td>single <literal>term</literal>
 380           or <literal>quoted term list</literal>   </td>
 381          <td>Here the search terms or list of search terms is added
 382           to the query</td>
 383         </tr>
 384        </tbody>
 385      </table>
 386      <para>
 387       Querying for the term <emphasis>information</emphasis> in the
 388       default index using the default attribute set, the server choice
 389       of access point/index, and the default non-use attributes:
 390       <screen>
 391        Z> find information
 392       </screen>
 393      </para>
 394      <para>
 395       Equivalent query fully specified including all default values:
 396       <screen>
 397        Z> find @attrset bib-1 @attr 1=1017 @attr 2=3 @attr 3=3 @attr 4=1 @attr 5=100 @attr 6=1 information
 398       </screen>
 399      </para>
 400
 401      <para>
 402       Finding all documents which have the term
 403       <emphasis>debussy</emphasis> in the title field.
 404       <screen>
 405        Z> find @attr 1=4 debussy
 406       </screen>
 407      </para>
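          <para>
           Attribute inheritance from higher nodes can be illustrated with
           a small sketch. Assuming that the title index (use attribute 4)
           is configured, and with terms chosen for illustration only, an
           attribute placed in front of an operator should apply to both
           operands, so the following two queries are expected to be
           equivalent:
           <screen>
            Z> find @attr 1=4 @and debussy ravel
            Z> find @and @attr 1=4 debussy @attr 1=4 ravel
           </screen>
          </para>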
 408
 409      <para>
 410       The <literal>scan</literal> operation is only supported with
 411       atomic APT queries, as it is bound to one access point at a
 412       time. Boolean query trees are not allowed during
 413       <literal>scan</literal>.
 414       </para>
 415
 416      <para>
 417       For example, we might want to scan the title index, starting with
 418       the term
 419       <emphasis>debussy</emphasis>, and displaying this and the
 420       following terms in lexicographic order:
 421       <screen>
 422        Z> scan @attr 1=4 debussy
 423       </screen>
 424      </para>
 425     </sect3>
 426
 427
 428     <sect3 id="querymodel-resultset">
 429      <title>Named Result Sets</title>
 430      <para>
 431       Named result sets are supported in Zebra, and result sets can be
 432       used as operands without limitations. It follows that named
 433       result sets are leaf nodes in the PQF query tree, exactly as
 434       atomic APT queries are.
 435      </para>
 436      <para>
 437       After the execution of a search, the result set is available at
 438       the server, such that the client can use it for subsequent
 439       searches or retrieval requests. The Z39.50 standard actually
 440       stresses the fact that result sets are volatile. A result set may cease
 441       to exist at any point in time after the search, in which case the server will
 442       send a diagnostic to the effect that the requested
 443       result set does not exist any more.
 444      </para>
 445
 446      <para>
 447       Defining a named result set and re-using it in the next query,
 448       using <literal>yaz-client</literal>.
 449       <screen>
 450        Z> f @attr 1=4 mozart
 451        ...
 452        Number of hits: 43, setno 1
 453        ...
 454        Z> f @and @set 1 @attr 1=4 amadeus
 455        ...
 456        Number of hits: 14, setno 2
 457        ...
 458        Z> f @attr 1=1016 beethoven
 459        ...
 460        Number of hits: 26, setno 3
 461        ...
 462       </screen>
 463      </para>
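          <para>
           Since result sets are ordinary operands, two of them can also
           be combined directly. Continuing the session above, and assuming
           that the result sets 2 and 3 still exist at the server, a
           sketch of such a query is:
           <screen>
            Z> f @or @set 2 @set 3
           </screen>
          </para>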
 464
 465      <note>
 466       Named result sets are only supported by the Z39.50 protocol.
 467       The SRU web service is stateless, and therefore the notion of
 468       named result sets does not exist when accessing a Zebra server by
 469       the SRU protocol.
 470      </note>
 471     </sect3>
 472
 473
 474     <sect3 id="querymodel-use-string">
 475      <title>Zebra's special access point of type 'string'</title>
 476      <para>
 477       The numeric <literal>use (type 1)</literal> attribute is usually
 478       referred to from a given
 479       attribute set. In addition, Zebra lets you use
 480       <emphasis>any internal index
 481        name defined in your configuration</emphasis>
 482       as use attribute value. This is a great feature for
 483       debugging, and when you do
 484       not need the complexity of defined use attribute values. It is
 485       the preferred way of accessing Zebra indexes directly.
 486      </para>
 487      <para>
 488       Finding all documents which have the term list "information
 489       retrieval" in a Zebra index, using its internal full string
 490       name, and scanning the same index:
 491       <screen>
 492        Z> find @attr 1=sometext "information retrieval"
 493        Z> scan @attr 1=sometext aterm
 494       </screen>
 495      </para>
 496      <para>
 497       Searching or scanning
 498       the bib-1 use attribute 54 using its string name:
 499       <screen>
 500        Z> find @attr 1=Code-language eng
 501        Z> scan @attr 1=Code-language ""
 502       </screen>
 503      </para>
 504      <para>
 505       It is possible to search
 506       in any silly string index - if it's defined in your
 507       indexation rules and can be parsed by the PQF parser.
 508       This is definitely not the recommended use of
 509       this facility, as it might confuse your users with some very
 510       unexpected results.
 511       <screen>
 512        Z> find @attr 1=silly/xpath/alike[@index]/name "information retrieval"
 513       </screen>
 514      </para>
 515      <para>
 516       See also <xref linkend="querymodel-pqf-apt-mapping"/> for details, and
 517       <xref linkend="server-sru"/>
 518       for the SRU PQF query extension using string names as a fast
 519       debugging facility.
 520      </para>
 521     </sect3>
 522
 523     <sect3 id="querymodel-use-xpath">
 524      <title>Zebra's special access point of type 'XPath'
 525       for GRS filters</title>
 526      <para>
 527       As we have seen above, it is possible (albeit seldom a great
 528       idea) to emulate
 529       <ulink url="http://www.w3.org/TR/xpath">XPath 1.0</ulink> based
 530       search by defining <literal>use (type 1)</literal>
 531       <emphasis>string</emphasis> attributes which in appearance
 532       <emphasis>resemble XPath queries</emphasis>. There are two
 533       problems with this approach: first, the XPath-look-alike has to
 534       be defined at indexation time, so no new undefined
 535       XPath queries can be entered at search time, and second, it might
 536       confuse users very much that an XPath-alike index name in fact
 537       gets populated from a possibly entirely different XML element
 538       than it pretends to access.
 539      </para>
 540      <para>
 541       When using the <literal>GRS Record Model</literal>
 542       (see  <xref linkend="record-model-grs"/>), we have the
 543       possibility to embed <emphasis>live</emphasis>
 544       XPath expressions
 545       in the PQF queries, which are here called
 546       <literal>use (type 1)</literal> <emphasis>xpath</emphasis>
 547       attributes. You must enable the
 548       <literal>xpath enable</literal> directive in your
 549       <literal>.abs</literal> configuration files.
 550      </para>
 551      <note>
 552       Only a <emphasis>very</emphasis> restricted subset of the
 553       <ulink url="http://www.w3.org/TR/xpath">XPath 1.0</ulink>
 554       standard is supported as the GRS record model is simpler than
 555       a full XML DOM structure. See the following examples for
 556       possibilities.
 557      </note>
 558      <para>
 559       Finding all documents which have the term "content"
 560       inside a text node found in a specific XML DOM
 561       <emphasis>subtree</emphasis>, whose starting element is
 562       addressed by XPath.
 563       <screen>
 564        Z> find @attr 1=/root content
 565        Z> find @attr 1=/root/first content
 566       </screen>
 567       <emphasis>Notice that the
 568        XPath must be absolute, i.e., must start with '/', and that the
 569        XPath <literal>descendant-or-self</literal> axis followed by a
 570        text node selection <literal>text()</literal> is implicitly
 571        appended to the stated XPath.
 572       </emphasis>
 573       It follows that the above searches are interpreted as:
 574       <screen>
 575        Z> find @attr 1=/root//text() content
 576        Z> find @attr 1=/root/first//text() content
 577       </screen>
 578      </para>
 579
 580      <para>
 581       Searching inside attribute strings is possible:
 582       <screen>
 583        Z> find @attr 1=/link/@creator morten
 584       </screen>
 585       </para>
 586
 587      <para>
 588       Filtering the addressing XPath by a predicate working on exact
 589       string values in
 590       attributes (in the XML sense) is possible: return all those docs which
 591       have the term "english" contained in one of all text sub nodes of
 592       the subtree defined by the XPath
 593       <literal>/record/title[@lang='en']</literal>. Similar
 594       predicate filters can be written in the same way:
 595       <screen>
 596        Z> find @attr 1=/record/title[@lang='en'] english
 597        Z> find @attr 1=/link[@creator='sisse'] sibelius
 598        Z> find @attr 1=/link[@creator='sisse']/description[@xml:lang='da'] sibelius
 599       </screen>
 600      </para>
 601
 602      <para>
 603       Combining numeric indexes, boolean expressions,
 604       and xpath based searches is possible:
 605       <screen>
 606        Z> find @attr 1=/record/title @and foo bar
 607        Z> find @and @attr 1=/record/title foo @attr 1=4 bar
 608       </screen>
 609      </para>
 610      <para>
 611       Escaping PQF keywords and other non-parseable XPath constructs
 612       with <literal>'{ }'</literal> to prevent syntax errors:
 613       <screen>
 614        Z> find @attr {1=/root/first[@attr='danish']} content
 615        Z> find @attr {1=/record/@set} oai
 616       </screen>
 617      </para>
 618      <warning>
 619       It is worth mentioning that these dynamically performed XPath
 620       queries are a performance bottleneck, as no optimized
 621       specialized indexes can be used. Therefore, avoid the use of
 622       this facility when speed is essential, and the database content
 623       size is medium to large.
 624      </warning>
 625
 626     </sect3>
 627
 628    </sect2>
 629
 630    <sect2 id="querymodel-exp1">
 631     <title>Explain Attribute Set</title>
 632     <para>
 633      The Z39.50 standard defines the
 634      <ulink url="&url.z39.50.explain;">Explain</ulink> attribute set
 635      <literal>Exp-1</literal>, which is used to discover information
 636      about a server's search semantics and functional capabilities.
 637      Zebra exposes a "classic"
 638      Explain database under the base name <literal>IR-Explain-1</literal>, which
 639      is populated with system internal information.
 640     </para>
 641    <para>
 642      The attribute-set <literal>exp-1</literal> consists of a single
 643      <literal>use attribute (type 1)</literal>.
 644     </para>
 645     <para>
 646      In addition, the non-Use
 647      <literal>bib-1</literal> attributes, that is, the types
 648      <literal>Relation</literal>, <literal>Position</literal>,
 649      <literal>Structure</literal>, <literal>Truncation</literal>,
 650      and <literal>Completeness</literal> are imported from
 651      the <literal>bib-1</literal> attribute set, and may be used
 652      within any explain query.
 653     </para>
 654
 655     <sect3 id="querymodel-exp1-use">
 656     <title>Use Attributes (type = 1)</title>
 657      <para>
 658       The following Explain search attributes are supported:
 659       <literal>ExplainCategory</literal> (@attr 1=1),
 660       <literal>DatabaseName</literal> (@attr 1=3),
 661       <literal>DateAdded</literal> (@attr 1=9),
 662       <literal>DateChanged</literal> (@attr 1=10).
 663      </para>
 664      <para>
 665       A search in the use attribute  <literal>ExplainCategory</literal>
 666       supports only these predefined values:
 667       <literal>CategoryList</literal>, <literal>TargetInfo</literal>,
 668       <literal>DatabaseInfo</literal>, <literal>AttributeDetails</literal>.
 669      </para>
 670      <para>
 671       See <filename>tab/explain.att</filename> and the
 672       <ulink url="&url.z39.50;">Z39.50</ulink> standard
 673       for more information.
 674      </para>
 675     </sect3>
 676
 677     <sect3>
 678      <title>Explain searches with yaz-client</title>
 679      <para>
 680       Classic Explain only defines retrieval of Explain information
 681       via ASN.1. Practically no Z39.50 client supports this. Fortunately
 682       they don't have to - Zebra allows retrieval of this information
 683       in several formats:
 684       <literal>SUTRS</literal>, <literal>XML</literal>,
 685       <literal>GRS-1</literal> and  <literal>ASN.1</literal> Explain.
 686      </para>
 687
 688      <para>
 689       List supported categories to find out which explain commands are
 690       supported:
 691       <screen>
 692        Z> base IR-Explain-1
 693        Z> find @attr exp1 1=1 categorylist
 694        Z> form sutrs
 695        Z> show 1+2
 696       </screen>
 697      </para>
 698
 699      <para>
 700       Get target info, that is, investigate which databases exist at
 701       this server endpoint:
 702       <screen>
 703        Z> base IR-Explain-1
 704        Z> find @attr exp1 1=1 targetinfo
 705        Z> form xml
 706        Z> show 1+1
 707        Z> form grs-1
 708        Z> show 1+1
 709        Z> form sutrs
 710        Z> show 1+1
 711       </screen>
 712      </para>
 713
 714      <para>
 715       List all supported databases. The number of hits
 716       is the number of databases found, which most commonly are the
 717       following two:
 718       the <literal>Default</literal> and the
 719       <literal>IR-Explain-1</literal> databases.
 720       <screen>
 721        Z> base IR-Explain-1
 722        Z> find @attr exp1 1=1 databaseinfo
 723        Z> form sutrs
 724        Z> show 1+2
 725       </screen>
 726      </para>
 727
 728      <para>
 729       Get database info record for database <literal>Default</literal>.
 730       <screen>
 731        Z> base IR-Explain-1
 732        Z> find @and @attr exp1 1=1 databaseinfo @attr exp1 1=3 Default
 733       </screen>
 734       Identical query with explicitly specified attribute set:
 735       <screen>
 736        Z> base IR-Explain-1
 737        Z> find @attrset exp1 @and @attr 1=1 databaseinfo @attr 1=3 Default
 738       </screen>
 739      </para>
 740
 741      <para>
 742       Get attribute details record for database
 743       <literal>Default</literal>.
 744       This query is very useful to study the internal Zebra indexes.
 745       If records have been indexed using the <literal>alvis</literal>
 746       XSLT filter, the string representation names of the known indexes can be
 747       found.
 748       <screen>
 749        Z> base IR-Explain-1
 750        Z> find @and @attr exp1 1=1 attributedetails @attr exp1 1=3 Default
 751       </screen>
 752       Identical query with explicitly specified attribute set:
 753       <screen>
 754        Z> base IR-Explain-1
 755        Z> find @attrset exp1 @and @attr 1=1 attributedetails @attr 1=3 Default
 756       </screen>
 757      </para>
 758     </sect3>
 759
 760    </sect2>
 761
 762    <sect2 id="querymodel-bib1">
 763     <title>Bib1 Attribute Set</title>
 764     <para>
 765      Most of the information contained in this section is an excerpt of
 766      the <literal>ATTRIBUTE SET BIB-1 (Z39.50-1995)
 767       SEMANTICS</literal>,
 768      found at <ulink url="&url.z39.50.attset.bib1.1995;">The BIB-1
 769       Attribute Set Semantics</ulink> from 1995, also in an updated
 770      <ulink url="&url.z39.50.attset.bib1;">Bib-1
 771       Attribute Set</ulink>
 772      version from 2003. Index Data is not the copyright holder of this
 773      information, except for the configuration details, the listing of
 774      Zebra's capabilities, and the example queries.
 775     </para>
 776
 777
 778    <sect3 id="querymodel-bib1-use">
 779      <title>Use Attributes (type 1)</title>
 780
 781     <para>
 782      A use attribute specifies an access point for any atomic query.
 783      These access points are highly dependent on the attribute set used
 784      in the query, and are user configurable using the following
 785      default configuration files:
 786      <filename>tab/bib1.att</filename>,
 787      <filename>tab/dan1.att</filename>,
 788      <filename>tab/explain.att</filename>, and
 789      <filename>tab/gils.att</filename>.
 790      New attribute sets can be added by adding new
 791      <filename>tab/*.att</filename> configuration files, which need to
 792      be sourced in the main configuration <filename>zebra.cfg</filename>.
 793      </para>
 794
 795     <para>
     In addition, Zebra allows access to
 797      <emphasis>internal index names</emphasis> and <emphasis>dynamic
 798      XPath</emphasis> as use attributes; see
 799       <xref linkend="querymodel-use-string"/> and
 800      <xref linkend="querymodel-use-xpath"/>.
 801     </para>
 802
 803     <para>
 804      Phrase search for <emphasis>information retrieval</emphasis> in
 805      the title-register, scanning the same register afterwards:
 806      <screen>
 807       Z> find @attr 1=4 "information retrieval"
 808       Z> scan @attr 1=4 information
 809      </screen>
 810     </para>
 811     </sect3>
 812
 813    </sect2>
 814
 815
 816    <sect2 id="querymodel-bib1-nonuse">
 817      <title>Zebra general Bib1 Non-Use Attributes (type 2-6)</title>
 818
 819     <sect3 id="querymodel-bib1-relation">
 820      <title>Relation Attributes (type 2)</title>
 821
 822      <para>
 823       Relation attributes describe the relationship of the access
 824       point (left side
 825       of the relation) to the search term as qualified by the attributes (right
 826       side of the relation), e.g., Date-publication &lt;= 1975.
 827       </para>
 828
 829      <table id="querymodel-bib1-relation-table"
 830       frame="all" rowsep="1" colsep="1" align="center">
 831
 832       <caption>Relation Attributes (type 2)</caption>
 833       <thead>
 834         <tr>
 835          <td>Relation</td>
 836          <td>Value</td>
 837          <td>Notes</td>
 838         </tr>
 839        </thead>
 840        <tbody>
 841         <tr>
 842          <td> Less than</td>
 843          <td>1</td>
 844          <td>supported</td>
 845         </tr>
 846         <tr>
 847          <td>Less than or equal</td>
 848          <td>2</td>
 849          <td>supported</td>
 850         </tr>
 851         <tr>
 852          <td>Equal</td>
 853          <td>3</td>
 854          <td>default</td>
 855         </tr>
 856         <tr>
         <td>Greater than or equal</td>
 858          <td>4</td>
 859          <td>supported</td>
 860         </tr>
 861         <tr>
 862          <td>Greater than</td>
 863          <td>5</td>
 864          <td>supported</td>
 865         </tr>
 866         <tr>
 867          <td>Not equal</td>
 868          <td>6</td>
 869          <td>unsupported</td>
 870         </tr>
 871         <tr>
 872          <td>Phonetic</td>
 873          <td>100</td>
 874          <td>unsupported</td>
 875         </tr>
 876         <tr>
 877          <td>Stem</td>
 878          <td>101</td>
 879          <td>unsupported</td>
 880         </tr>
 881         <tr>
 882          <td>Relevance</td>
 883          <td>102</td>
 884          <td>supported</td>
 885         </tr>
 886         <tr>
 887          <td>AlwaysMatches</td>
 888          <td>103</td>
 889          <td>supported</td>
 890         </tr>
 891        </tbody>
 892      </table>
 893
 894      <para>
 895       The relation attributes
 896       <literal>1-5</literal> are supported and work exactly as
 897       expected.
      All ordering operations are based on a lexicographical ordering,
      <emphasis>except</emphasis> when the
 900       <literal>structure attribute numeric (109)</literal> is used. In
 901       this case, ordering is numerical. See
 902       <xref linkend="querymodel-bib1-structure"/>.
 903       <screen>
 904        Z>  find @attr 1=Title @attr 2=1 music
 905        ...
 906        Number of hits: 11745, setno 1
 907        ...
 908        Z>  find @attr 1=Title @attr 2=2 music
 909        ...
 910        Number of hits: 11771, setno 2
 911        ...
 912        Z>  find @attr 1=Title @attr 2=3 music
 913        ...
 914        Number of hits: 532, setno 3
 915        ...
 916        Z>  find @attr 1=Title @attr 2=4 music
 917        ...
 918        Number of hits: 11463, setno 4
 919        ...
 920        Z>  find @attr 1=Title @attr 2=5 music
 921        ...
 922        Number of hits: 11419, setno 5
 923       </screen>
 924      </para>
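     <para>
      The difference between lexicographical and numerical ordering
      matters as soon as terms contain digits. The following Python
      sketch is not Zebra code; it uses made-up sample terms only to
      illustrate why a numeric comparison needs the numeric structure
      attribute.
     </para>
<programlisting><![CDATA[
```python
# Illustration only: made-up sample terms, not taken from any Zebra index.
terms = ["9", "10", "100", "75"]

# Lexicographical ordering compares the strings character by character.
lexicographical = sorted(terms)

# Numerical ordering compares the values the strings stand for.
numerical = sorted(terms, key=int)

print(lexicographical)
print(numerical)
```
]]></programlisting>
     <para>
      Under lexicographical ordering, <literal>"10"</literal> sorts
      before <literal>"9"</literal>, so a relation such as
      <literal>@attr 2=5</literal> on digit strings only behaves
      numerically when the index and the query both use the numeric
      structure.
     </para>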
 925
 926      <para>
      The relation attribute
      <literal>Relevance (102)</literal> is supported; see
      <xref linkend="administration-ranking"/> for full information.
 930      </para>
 931
 932      <para>
 933       Ranked search for <emphasis>information retrieval</emphasis> in
 934       the title-register:
 935       <screen>
 936        Z> find @attr 1=4 @attr 2=102 "information retrieval"
 937       </screen>
 938      </para>
 939
 940      <para>
      In the default configuration, the relation attribute
      <literal>AlwaysMatches (103)</literal> is
      supported in conjunction with the structure attribute
      <literal>Phrase (1)</literal> (which may be omitted, since it is
      the default).
      It can be configured to work with other structure attributes;
      see the configuration file
      <filename>tab/default.idx</filename> and
       <xref linkend="querymodel-pqf-apt-mapping"/>.
 951      </para>
 952      <para>
 953       <literal>AlwaysMatches (103)</literal> is a
 954       great way to discover how many documents have been indexed in a
 955       given field. The search term is ignored, but needed for correct
 956       PQF syntax. An empty search term may be supplied.
 957       <screen>
 958        Z> find @attr 1=Title  @attr 2=103  ""
 959        Z> find @attr 1=Title  @attr 2=103  @attr 4=1 ""
 960       </screen>
 961      </para>
 962
 963
 964     </sect3>
 965
 966     <sect3 id="querymodel-bib1-position">
 967      <title>Position Attributes (type 3)</title>
 968
 969      <para>
 970       The position attribute specifies the location of the search term
 971       within the field or subfield in which it appears.
 972      </para>
 973
 974      <table id="querymodel-bib1-position-table"
 975       frame="all" rowsep="1" colsep="1" align="center">
 976
 977       <caption>Position Attributes (type 3)</caption>
 978       <thead>
 979         <tr>
 980          <td>Position</td>
 981          <td>Value</td>
 982          <td>Notes</td>
 983         </tr>
 984        </thead>
 985        <tbody>
 986         <tr>
 987          <td>First in field </td>
 988          <td>1</td>
 989          <td>unsupported</td>
 990         </tr>
 991         <tr>
 992          <td>First in subfield</td>
 993          <td>2</td>
 994          <td>unsupported</td>
 995         </tr>
 996         <tr>
 997          <td>Any position in field</td>
 998          <td>3</td>
 999          <td>default</td>
1000         </tr>
1001        </tbody>
1002      </table>
1003
1004     <para>
      The position attribute values <literal>first in field (1)</literal>
      and <literal>first in subfield (2)</literal> are unsupported.
      Using them does not trigger an error, but silently defaults to
      <literal>any position in field (3)</literal>.
1009       <!-- It should -->
1010       </para>
1011     </sect3>
1012
1013     <sect3 id="querymodel-bib1-structure">
1014      <title>Structure Attributes (type 4)</title>
1015
1016      <para>
      The structure attribute specifies the type of search
      term. This causes the search to be mapped to
      different Zebra internal indexes, which must have been defined
      at index time.
1021      </para>
1022
1023      <para>
      The possible values of the
      <literal>structure attribute (type 4)</literal> can be defined
      using the configuration file
      <filename>tab/default.idx</filename>.
      The default configuration is summarized in this table.
1029      </para>
1030
1031      <table id="querymodel-bib1-structure-table"
1032       frame="all" rowsep="1" colsep="1" align="center">
1033
1034       <caption>Structure Attributes (type 4)</caption>
1035       <thead>
1036         <tr>
1037          <td>Structure</td>
1038          <td>Value</td>
1039          <td>Notes</td>
1040         </tr>
1041        </thead>
1042        <tbody>
1043         <tr>
1044          <td>Phrase </td>
1045          <td>1</td>
1046          <td>default</td>
1047         </tr>
1048         <tr>
1049          <td>Word</td>
1050          <td>2</td>
1051          <td>supported</td>
1052         </tr>
1053         <tr>
1054          <td>Key</td>
1055          <td>3</td>
1056          <td>supported</td>
1057         </tr>
1058         <tr>
1059          <td>Year</td>
1060          <td>4</td>
1061          <td>supported</td>
1062         </tr>
1063         <tr>
1064          <td>Date (normalized)</td>
1065          <td>5</td>
1066          <td>supported</td>
1067         </tr>
1068         <tr>
1069          <td>Word list</td>
1070          <td>6</td>
1071          <td>supported</td>
1072         </tr>
1073         <tr>
1074          <td>Date (un-normalized)</td>
1075          <td>100</td>
1076          <td>unsupported</td>
1077         </tr>
1078         <tr>
1079          <td>Name (normalized) </td>
1080          <td>101</td>
1081          <td>unsupported</td>
1082         </tr>
1083         <tr>
1084          <td>Name (un-normalized) </td>
1085          <td>102</td>
1086          <td>unsupported</td>
1087         </tr>
1088         <tr>
1089          <td>Structure</td>
1090          <td>103</td>
1091          <td>unsupported</td>
1092         </tr>
1093         <tr>
1094          <td>Urx</td>
1095          <td>104</td>
1096          <td>supported</td>
1097         </tr>
1098         <tr>
1099          <td>Free-form-text</td>
1100          <td>105</td>
1101          <td>supported</td>
1102         </tr>
1103         <tr>
1104          <td>Document-text</td>
1105          <td>106</td>
1106          <td>supported</td>
1107         </tr>
1108         <tr>
1109          <td>Local-number</td>
1110          <td>107</td>
1111          <td>supported</td>
1112         </tr>
1113         <tr>
1114          <td>String</td>
1115          <td>108</td>
1116          <td>unsupported</td>
1117         </tr>
1118         <tr>
1119          <td>Numeric string</td>
1120          <td>109</td>
1121          <td>supported</td>
1122         </tr>
1123        </tbody>
1124      </table>
1125
1126
1127     <para>
     The structure attribute value
     <literal>Word list (6)</literal>
     is supported, and maps to the boolean <literal>AND</literal>
     combination of the words supplied. The word list is useful when
     Google-like bag-of-words queries need to be translated from a GUI
     query language to PQF.  For example, the following queries
     are equivalent:
1135      <screen>
1136       Z> find @attr 1=Title @attr 4=6 "mozart amadeus"
1137       Z> find @attr 1=Title  @and mozart amadeus
1138      </screen>
1139     </para>
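    <para>
     A client that cannot rely on the server handling
     <literal>Word list (6)</literal> can build the explicit
     <literal>@and</literal> tree itself. The Python sketch below
     shows one way to do so; the helper name
     <literal>word_list_to_pqf</literal> is hypothetical, and the
     words are assumed to need no PQF quoting.
    </para>
<programlisting><![CDATA[
```python
def word_list_to_pqf(index: str, text: str) -> str:
    """Expand a bag of words into an explicit PQF @and tree.

    Hypothetical helper: `index` is a use attribute value such as
    "Title" or "4"; words are assumed to need no PQF quoting.
    """
    words = text.split()
    if not words:
        raise ValueError("at least one word is required")
    # n words need n-1 binary @and operators in prefix notation:
    #   a b c  ->  @and a @and b c
    tree = words[-1]
    for word in reversed(words[:-1]):
        tree = f"@and {word} {tree}"
    # The use attribute placed before the operator applies to every
    # term in the subtree.
    return f"@attr 1={index} {tree}"


print(word_list_to_pqf("Title", "mozart amadeus"))
```
]]></programlisting>
    <para>
     For the two-word input used above, the helper produces the
     second of the two equivalent queries.
    </para>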
1140
1141     <para>
     The structure attribute values
     <literal>Free-form-text (105)</literal> and
     <literal>Document-text (106)</literal>
     are supported, and both map to the boolean <literal>OR</literal>
     combination of the words supplied. The following queries
     are equivalent:
1148      <screen>
1149       Z> find @attr 1=Body-of-text @attr 4=105 "bach salieri teleman"
1150       Z> find @attr 1=Body-of-text @attr 4=106 "bach salieri teleman"
1151       Z> find @attr 1=Body-of-text @or bach @or salieri teleman
1152      </screen>
1153      This <literal>OR</literal> list of terms is very useful in
1154      combination with relevance ranking:
1155      <screen>
1156       Z> find @attr 1=Body-of-text @attr 2=102 @attr 4=105 "bach salieri teleman"
1157      </screen>
1158     </para>
1159
1160     <para>
     The structure attribute value
     <literal>Local-number (107)</literal>
     is supported, and always maps to the Zebra internal document ID,
     irrespective of which use attribute is specified. The following queries
     return exactly the same unique record in the hit set:
1166      <screen>
1167       Z> find @attr 4=107 10
1168       Z> find @attr 1=4 @attr 4=107 10
1169       Z> find @attr 1=1010 @attr 4=107 10
1170      </screen>
1171     </para>
1172
1173     <para>
1174      In
1175      the GILS schema (<literal>gils.abs</literal>), the
1176      west-bounding-coordinate is indexed as type <literal>n</literal>,
1177      and is therefore searched by specifying
1178      <emphasis>structure</emphasis>=<emphasis>Numeric String</emphasis>.
1179      To match all those records with west-bounding-coordinate greater
1180      than -114 we use the following query:
1181      <screen>
1182       Z> find @attr 4=109 @attr 2=5 @attr gils 1=2038 -114
1183      </screen>
1184     </para>
1185      <note>
1186       The exact mapping between PQF queries and Zebra internal indexes
1187       and index types is explained in
1188        <xref linkend="querymodel-pqf-apt-mapping"/>.
1189       </note>
1190
1191    </sect3>
1192
1193     <sect3 id="querymodel-bib1-truncation">
1194      <title>Truncation Attributes (type = 5)</title>
1195
1196      <para>
1197       The truncation attribute specifies whether variations of one or
1198       more characters are allowed between search term and hit terms, or
1199       not. Using non-default truncation attributes will broaden the
1200       document hit set of a search query.
1201      </para>
1202
1203      <table id="querymodel-bib1-truncation-table"
1204       frame="all" rowsep="1" colsep="1" align="center">
1205
1206       <caption>Truncation Attributes (type 5)</caption>
1207       <thead>
1208         <tr>
1209          <td>Truncation</td>
1210          <td>Value</td>
1211          <td>Notes</td>
1212         </tr>
1213        </thead>
1214        <tbody>
1215         <tr>
1216          <td>Right truncation </td>
1217          <td>1</td>
1218          <td>supported</td>
1219         </tr>
1220         <tr>
1221          <td>Left truncation</td>
1222          <td>2</td>
1223          <td>supported</td>
1224         </tr>
1225         <tr>
1226          <td>Left and right truncation</td>
1227          <td>3</td>
1228          <td>supported</td>
1229         </tr>
1230         <tr>
1231          <td>Do not truncate</td>
1232          <td>100</td>
1233          <td>default</td>
1234         </tr>
1235         <tr>
1236          <td>Process # in search term</td>
1237          <td>101</td>
1238          <td>supported</td>
1239         </tr>
1240         <tr>
1241          <td>RegExpr-1 </td>
1242          <td>102</td>
1243          <td>supported</td>
1244         </tr>
1245         <tr>
1246          <td>RegExpr-2</td>
1247          <td>103</td>
1248          <td>supported</td>
1249         </tr>
1250        </tbody>
1251      </table>
1252
1253      <para>
      The truncation attribute values 1-3 behave as their names suggest:
1255       <screen>
1256        Z> scan @attr 1=Body-of-text  schnittke
1257        ...
1258        * schnittke (81)
1259        schnittkes (31)
1260        schnittstelle (1)
1261        ...
1262        Z> find @attr 1=Body-of-text  @attr 5=1 schnittke
1263        ...
1264        Number of hits: 95, setno 7
1265        ...
1266        Z> find @attr 1=Body-of-text  @attr 5=2 schnittke
1267        ...
1268        Number of hits: 81, setno 6
1269        ...
1270        Z> find @attr 1=Body-of-text  @attr 5=3 schnittke
1271        ...
1272        Number of hits: 95, setno 8
1273       </screen>
1274       </para>
1275
1276      <para>
      The truncation attribute value
      <literal>Process # in search term (101)</literal> is a
      poor man's regular expression search. It maps
      each <literal>#</literal> to <literal>.*</literal>, and
      then performs a <literal>RegExpr-1 (102)</literal> regular
      expression search. The following two queries are equivalent:
1283       <screen>
1284        Z> find @attr 1=Body-of-text  @attr 5=101 schnit#ke
1285        Z> find @attr 1=Body-of-text  @attr 5=102 schnit.*ke
1286        ...
1287        Number of hits: 89, setno 10
1288       </screen>
1289      </para>
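     <para>
      The rewriting step can be sketched in a few lines of Python.
      This is an illustration, not Zebra's implementation: the helper
      name <literal>hash_to_regexp</literal> is hypothetical, all
      characters other than <literal>#</literal> are assumed to pass
      through unchanged, and Python's <literal>re</literal> module
      merely stands in for the server-side matcher.
     </para>
<programlisting><![CDATA[
```python
import re


def hash_to_regexp(term: str) -> str:
    """Rewrite a 'Process # in search term' (101) term for RegExpr-1 (102).

    Sketch only: every '#' becomes '.*'; all other characters are
    passed through unchanged, which is an assumption of this example.
    """
    return term.replace("#", ".*")


pattern = hash_to_regexp("schnit#ke")
print(f"@attr 1=Body-of-text @attr 5=102 {pattern}")

# Python's re module stands in for the server-side matcher here;
# the candidates are the index terms from the scan output shown earlier.
for candidate in ["schnittke", "schnittkes", "schnittstelle"]:
    print(candidate, re.fullmatch(pattern, candidate) is not None)
```
]]></programlisting>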
1290
1291      <para>
      The truncation attribute value
       <literal>RegExpr-1 (102)</literal> is a normal regular expression search;
      see <xref linkend="querymodel-regular"/> for details.
1295       <screen>
1296        Z> find @attr 1=Body-of-text  @attr 5=102 schnit+ke
1297        Z> find @attr 1=Body-of-text  @attr 5=102 schni[a-t]+ke
1298       </screen>
1299      </para>
1300
1301      <para>
       The truncation attribute value
      <literal>RegExpr-2 (103)</literal> is a Zebra-specific extension
      which allows <emphasis>fuzzy</emphasis> matches. A single
      spelling error in the search term is allowed, i.e., a document
      is hit if it includes a term which can be mapped to the
      search term by one character substitution, addition, deletion, or
      change of position.
1309       <screen>
1310        Z> find @attr 1=Body-of-text  @attr 5=100 schnittke
1311        ...
1312        Number of hits: 81, setno 14
1313        ...
1314        Z> find @attr 1=Body-of-text  @attr 5=103 schnittke
1315        ...
1316        Number of hits: 103, setno 15
1317        ...
1318       </screen>
1319       </para>
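     <para>
      The matching criterion described above (at most one
      substitution, addition, deletion, or swap of two neighbouring
      characters) can be made concrete with a small Python check.
      This is only a sketch of the criterion as stated here, not
      Zebra's matching code, and the function name
      <literal>within_one_error</literal> is hypothetical.
     </para>
<programlisting><![CDATA[
```python
def within_one_error(term: str, candidate: str) -> bool:
    """True if candidate equals term or differs by one spelling error.

    One error means a single character substitution, addition,
    deletion, or a swap of two adjacent characters.
    """
    if term == candidate:
        return True
    len_t, len_c = len(term), len(candidate)
    if abs(len_t - len_c) > 1:
        return False

    # Find the first position where the two strings differ.
    i = 0
    while i < min(len_t, len_c) and term[i] == candidate[i]:
        i += 1

    if len_t == len_c:
        # Substitution: everything after the mismatch is identical.
        if term[i + 1:] == candidate[i + 1:]:
            return True
        # Change of position: two adjacent characters are swapped.
        return (i + 1 < len_t
                and term[i] == candidate[i + 1]
                and term[i + 1] == candidate[i]
                and term[i + 2:] == candidate[i + 2:])

    # Lengths differ by one: dropping one character from the longer
    # string must yield the shorter one (addition or deletion).
    longer, shorter = (term, candidate) if len_t > len_c else (candidate, term)
    return longer[i + 1:] == shorter[i:]


for candidate in ["schnittke", "schnitke", "schnittek", "schnittstelle"]:
    print(candidate, within_one_error("schnittke", candidate))
```
]]></programlisting>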
1320     </sect3>
1321
1322     <sect3 id="querymodel-bib1-completeness">
1323     <title>Completeness Attributes (type = 6)</title>
1324
1325
1326      <para>
      The <literal>Completeness Attributes (type = 6)</literal>
      are used to specify that a given search term or term list is either
      part of the terms of a given index/field
      (<literal>Incomplete subfield (1)</literal>), or is
      literally what is found in the entire field's index
      (<literal>Complete field (3)</literal>).
1333       </para>
1334
1335      <table id="querymodel-bib1-completeness-table"
1336       frame="all" rowsep="1" colsep="1" align="center">
1337       <caption>Completeness Attributes (type = 6)</caption>
1338       <thead>
1339         <tr>
1340          <td>Completeness</td>
1341          <td>Value</td>
1342          <td>Notes</td>
1343         </tr>
1344        </thead>
1345        <tbody>
1346         <tr>
1347          <td>Incomplete subfield</td>
1348          <td>1</td>
1349          <td>default</td>
1350         </tr>
1351         <tr>
1352          <td>Complete subfield</td>
1353          <td>2</td>
         <td>deprecated</td>
1355         </tr>
1356         <tr>
1357          <td>Complete field</td>
1358          <td>3</td>
1359          <td>supported</td>
1360         </tr>
1361        </tbody>
1362      </table>
1363
1364      <para>
      The <literal>Completeness Attributes (type = 6)</literal>
      are only partially and conditionally
      supported, in the sense that they are ignored if the hit index is
      not of structure <literal>type="w"</literal> or
      <literal>type="p"</literal>.
1370       </para>
1371      <para>
1372       <literal>Incomplete subfield (1)</literal> is the default, and
1373       makes Zebra use
1374       register <literal>type="w"</literal>, whereas
1375       <literal>Complete field (3)</literal> triggers
1376       search and scan in index <literal>type="p"</literal>.
1377      </para>
1378      <para>
      <literal>Complete subfield (2)</literal> is a remnant
      of the happy <literal>MARC</literal>
      binary format days. Zebra does not support it, but silently maps it
      to <literal>Complete field (3)</literal>.
1383      </para>
1384
1385      <note>
1386       The exact mapping between PQF queries and Zebra internal indexes
1387       and index types is explained in
1388        <xref linkend="querymodel-pqf-apt-mapping"/>.
1389       </note>
1390     </sect3>
1391    </sect2>
1392
1393    </sect1>
1394
1395
1396   <sect1 id="querymodel-zebra">
1397    <title>Advanced Zebra PQF Features</title>
1398    <para>
    The Zebra internal query engine has been extended to meet specific needs
    not covered by the <literal>bib-1</literal> attribute set query
1401     model. These extensions are <emphasis>non-standard</emphasis>
1402     and <emphasis>non-portable</emphasis>: most functional extensions
1403     are modeled over the <literal>bib-1</literal> attribute set,
1404     defining type 7-9 attributes.
1405     There are also the special
1406     <literal>string</literal> type index names for the
1407     <literal>idxpath</literal> attribute set.
1408    </para>
1409
1410    <sect2 id="querymodel-zebra-attr-allrecords">
1411     <title>Zebra specific retrieval of all records</title>
1412     <para>
1413      Zebra defines a hardwired <literal>string</literal> index name
1414      called <literal>_ALLRECORDS</literal>. It matches any record
1415      contained in the database, if used in conjunction with
1416      the relation attribute
1417      <literal>AlwaysMatches (103)</literal>.
1418      </para>
1419     <para>
     The <literal>_ALLRECORDS</literal> index name is used for total database
     export. The search term is ignored and may be empty.
1422      <screen>
1423       Z> find @attr 1=_ALLRECORDS @attr 2=103 ""
1424      </screen>
1425     </para>
1426     <para>
     It can be combined with other indexes. For example, to
     find all records which are <emphasis>not</emphasis> indexed in
     the <literal>Title</literal> register, issue one of the two
     equivalent queries:
1431      <screen>
1432       Z> find @not @attr 1=_ALLRECORDS @attr 2=103 "" @attr 1=Title @attr 2=103 ""
1433       Z> find @not @attr 1=_ALLRECORDS @attr 2=103 "" @attr 1=4 @attr 2=103 ""
1434      </screen>
1435     </para>
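    <para>
     The same pattern works for any register. As a minimal Python
     sketch, a client could generate such queries with a small helper;
     the name <literal>not_indexed_in</literal> is hypothetical, and
     the index name is inserted as given, without any PQF quoting.
    </para>
<programlisting><![CDATA[
```python
# PQF fragment matching every record in the database.
ALL_RECORDS = '@attr 1=_ALLRECORDS @attr 2=103 ""'


def not_indexed_in(index: str) -> str:
    """Build a PQF query for all records with no entry in `index`.

    Hypothetical helper: `index` is a use attribute value such as
    "Title" or "4" and is inserted without any quoting.
    """
    # @not A B selects the records in A that are not in B.
    return f'@not {ALL_RECORDS} @attr 1={index} @attr 2=103 ""'


print(not_indexed_in("Title"))
```
]]></programlisting>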
1436     <warning>
1437      The special string index <literal>_ALLRECORDS</literal> is
1438      experimental, and the provided functionality and syntax may very
1439      well change in future releases of Zebra.
1440     </warning>
1441
1442    </sect2>
1443
1444    <sect2 id="querymodel-zebra-attr-search">
1445     <title>Zebra specific Search Extensions to all Attribute Sets</title>
1446     <para>
1447      Zebra extends the Bib1 attribute types, and these extensions are
1448      recognized regardless of attribute
1449      set used in a <literal>search</literal> operation query.
1450     </para>
1451
1452      <table id="querymodel-zebra-attr-search-table"
1453       frame="all" rowsep="1" colsep="1" align="center">
1454
1455       <caption>Zebra Search Attribute Extensions</caption>
1456        <thead>
1457         <tr>
1458          <td>Name</td>
         <td>Type</td>
1460          <td>Operation</td>
1461          <td>Zebra version</td>
1462         </tr>
1463       </thead>
1464        <tbody>
1465         <tr>
1466          <td>Embedded Sort</td>
1467          <td>7</td>
1468          <td>search</td>
1469          <td>1.1</td>
1470         </tr>
1471         <tr>
1472          <td>Term Set</td>
1473          <td>8</td>
1474          <td>search</td>
1475          <td>1.1</td>
1476         </tr>
1477         <tr>
1478          <td>Rank Weight</td>
1479          <td>9</td>
1480          <td>search</td>
1481          <td>1.1</td>
1482         </tr>
1483         <tr>
1484          <td>Approx Limit</td>
1485          <td>9</td>
1486          <td>search</td>
1487          <td>1.4</td>
1488         </tr>
1489         <tr>
1490          <td>Term Reference</td>
1491          <td>10</td>
1492          <td>search</td>
1493          <td>1.4</td>
1494         </tr>
1495        </tbody>
1496       </table>
1497
1498     <sect3 id="querymodel-zebra-attr-sorting">
1499      <title>Zebra Extension Embedded Sort Attribute (type 7)</title>
1500     </sect3>
1501     <para>
     The embedded sort is a way to specify sorting within a query, thus
     removing the need to send a Sort Request separately. It is
     faster, and it does not require clients to deal with the Sort
     Facility.
1506     </para>
1507
1508     <para>
     All ordering operations are based on a lexicographical ordering,
     <emphasis>except</emphasis> when the
1511      <literal>structure attribute numeric (109)</literal> is used. In
1512      this case, ordering is numerical. See
1513       <xref linkend="querymodel-bib1-structure"/>.
1514     </para>
1515
1516     <para>
     The possible values of attribute <literal>type 7</literal> are
     <literal>1</literal> (ascending) and
     <literal>2</literal> (descending).
     The attributes+term (APT) node carrying the sort specification is
     separate from the rest of the query and must be
     <literal>@or</literal>'ed with it.
     The term associated with this APT is the sorting level as an integer,
     where <literal>0</literal> means primary sort,
     <literal>1</literal> means secondary sort, and so forth.
     See also <xref linkend="administration-ranking"/>.
1526     </para>
1527     <para>
     For example, to search for water and sort by title (ascending):
1529      <screen>
1530       Z> find @or @attr 1=1016 water @attr 7=1 @attr 1=4 0
1531      </screen>
1532     </para>
1533     <para>
     Or, to search for water and sort by title ascending, then by date descending:
1535      <screen>
1536       Z> find @or @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 @attr 7=2 @attr 1=30 1
1537      </screen>
1538     </para>
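    <para>
     Because every additional sort key wraps the query in one more
     <literal>@or</literal>, such queries are easy to generate. The
     Python sketch below is one possible client-side helper; the name
     <literal>with_embedded_sort</literal> and its argument layout are
     assumptions of this example.
    </para>
<programlisting><![CDATA[
```python
from typing import Sequence, Tuple


def with_embedded_sort(query: str,
                       sort_keys: Sequence[Tuple[str, bool]]) -> str:
    """Append embedded sort APTs (attribute type 7) to a PQF query.

    Hypothetical helper: each sort key is (use attribute, ascending).
    The position in `sort_keys` becomes the sorting level, so the
    first key is the primary sort (level 0).
    """
    result = query
    for level, (use_attr, ascending) in enumerate(sort_keys):
        direction = 1 if ascending else 2  # 1 = ascending, 2 = descending
        # Each sort APT is @or'ed with everything built so far.
        result = f"@or {result} @attr 7={direction} @attr 1={use_attr} {level}"
    return result


print(with_embedded_sort("@attr 1=1016 water", [("4", True)]))
print(with_embedded_sort("@attr 1=1016 water", [("4", True), ("30", False)]))
```
]]></programlisting>
    <para>
     The two calls reproduce the two example queries above.
    </para>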
1539
1540     <sect3 id="querymodel-zebra-attr-estimation">
1541      <title>Zebra Extension Term Set Attribute (type 8)</title>
1542     </sect3>
1543     <para>
     The Term Set feature allows a search to store the
     hitting terms in a "pseudo" result set; it thus combines a search (as usual) with
     a scan-like facility. It requires a client that can handle named result
     sets, since the search generates two result sets. The value of
     attribute 8 is the name of a result set (string). The terms in
     the named term set are returned as SUTRS records.
1550     </para>
1551     <para>
     For example, to search for u in title, right truncated, and
     store the result in a term set named 'aset':
1554      <screen>
1555       Z> find @attr 5=1 @attr 1=4 @attr 8=aset u
1556      </screen>
1557     </para>
1558     <warning>
     The model has one serious flaw: the size of the term set is not
     known. This feature is experimental. Do not use it in production code.
1561     </warning>
1562
1563     <sect3 id="querymodel-zebra-attr-weight">
1564      <title>Zebra Extension Rank Weight Attribute (type 9)</title>
1565     </sect3>
1566     <para>
     Rank weight is a way to pass a value to a ranking algorithm, so
     that one APT has one value while another has a different one.
1569      See also <xref linkend="administration-ranking"/>.
1570     </para>
1571     <para>
1572      For example, searching  for utah in title with weight 30 as well
1573      as any with weight 20:
1574      <screen>
1575       Z> find @attr 2=102 @or @attr 9=30 @attr 1=4 utah @attr 9=20 utah
1576      </screen>
1577     </para>
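    <para>
     A client that weights several access points for the same term
     can assemble such a query mechanically. The following Python
     sketch shows the idea; the helper name
     <literal>weighted_ranked_query</literal> and its argument layout
     are hypothetical.
    </para>
<programlisting><![CDATA[
```python
from typing import Optional, Sequence, Tuple


def weighted_ranked_query(term: str,
                          fields: Sequence[Tuple[Optional[str], int]]) -> str:
    """Build a relevance-ranked @or query with one rank weight per APT.

    Hypothetical helper: each entry is (use attribute or None, weight).
    None leaves the use attribute out of that APT.
    """
    if not fields:
        raise ValueError("at least one field is required")
    apts = []
    for use_attr, weight in fields:
        use = f"@attr 1={use_attr} " if use_attr is not None else ""
        apts.append(f"@attr 9={weight} {use}{term}")
    # Combine the APTs into a prefix-notation @or tree.
    tree = apts[-1]
    for apt in reversed(apts[:-1]):
        tree = f"@or {apt} {tree}"
    # The relevance relation (2=102) applies to the whole tree.
    return f"@attr 2=102 {tree}"


print(weighted_ranked_query("utah", [("4", 30), (None, 20)]))
```
]]></programlisting>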
1578
1579     <sect3 id="querymodel-zebra-attr-limit">
1580      <title>Zebra Extension Approximative Limit Attribute (type 9)</title>
1581     </sect3>
1582     <para>
1583      Newer Zebra versions normally estimate hit count for every APT
1584      (leaf) in the query tree. These hit counts are returned as part of
1585      the searchResult-1 facility in the binary encoded Z39.50 search
1586      response packages.
1587     </para>
1588     <para>
     By setting a limit for the APT, we can make Zebra switch to an
     approximate hit count once a certain hit count limit is
     reached. A value of zero means exact hit count.
1592     </para>
1593     <para>
     For example, we might be interested in an exact hit count for a, but
     for b we allow estimated hit counts of 1000 and higher.
1596      <screen>
1597       Z> find @and a @attr 9=1000 b
1598      </screen>
1599     </para>
1600     <note>
1601      The estimated hit count facility makes searches faster, as one
1602      only needs to process large hit lists partially.
1603     </note>
1604     <warning>
     This facility clashes with rank weight, because ranking requires
     all documents in the hit lists to be examined for scoring and
     re-sorting.
     It is an experimental
     extension. Do not use in production code.
1610     </warning>
1611
1612     <sect3 id="querymodel-zebra-attr-termref">
1613      <title>Zebra Extension Term Reference Attribute (type 10)</title>
1614     </sect3>
1615     <para>
     Zebra supports the <literal>searchResult-1</literal> facility.
     If the <literal>Term Reference Attribute (type 10)</literal> is
     given, its value specifies the subqueryId returned as part of the
     search result. It is a way for a client to name an APT part of a
     query.
1621     </para>
1622     <!--
1623     <para>
1624      <screen>
1625      </screen>
1626     </para>
1627     -->
1628     <warning>
1629      Experimental. Do not use in production code.
1630     </warning>
1631
1632
1633    </sect2>
1634
1635
1636    <sect2 id="querymodel-zebra-attr-scan">
1637     <title>Zebra specific Scan Extensions to all Attribute Sets</title>
1638     <para>
1639      Zebra extends the Bib1 attribute types, and these extensions are
1640      recognized regardless of attribute
1641      set used in a <literal>scan</literal> operation query.
1642     </para>
1643      <table id="querymodel-zebra-attr-scan-table"
1644       frame="all" rowsep="1" colsep="1" align="center">
1645
1646       <caption>Zebra Scan Attribute Extensions</caption>
1647        <thead>
1648         <tr>
1649          <td>Name</td>
1650          <td>Type</td>
1651          <td>Operation</td>
1652          <td>Zebra version</td>
1653         </tr>
1654       </thead>
1655        <tbody>
1656         <tr>
1657          <td>Result Set Narrow</td>
1658          <td>8</td>
1659          <td>scan</td>
1660          <td>1.3</td>
1661         </tr>
1662         <tr>
1663          <td>Approximative Limit</td>
1664          <td>9</td>
1665          <td>scan</td>
1666          <td>1.4</td>
1667         </tr>
1668        </tbody>
1669       </table>
1670
    <sect3 id="querymodel-zebra-attr-narrow">
     <title>Zebra Extension Result Set Narrow (type 8)</title>
     <para>
      If the attribute <literal>Result Set Narrow (type 8)</literal>
      is given for <literal>scan</literal>, its value is the name of a
      result set. The hit count of each scanned term is then computed
      after <literal>@and</literal>'ing the term with the given result set.
     </para>
     <para>
      Consider, for example, scanning all title fields around the
      scan term <emphasis>mozart</emphasis>, then issuing a query for
      <emphasis>amadeus</emphasis>, and finally repeating the scan
      restricted to the result set of that query:
      <screen>
       Z> scan @attr 1=4 mozart
       ...
       * mozart (43)
         mozartforskningen (1)
         mozartiana (1)
         mozarts (16)
       ...
       Z> f @attr 1=4 amadeus
       ...
       Number of hits: 15, setno 2
       ...
       Z> scan @attr 1=4 @attr 8=2 mozart
       ...
       * mozart (14)
         mozartforskningen (0)
         mozartiana (0)
         mozarts (1)
       ...
      </screen>
     </para>

     <warning>
      Experimental. Do not use in production code.
     </warning>
    </sect3>
1711
    <sect3 id="querymodel-zebra-attr-approx">
     <title>Zebra Extension Approximative Limit (type 9)</title>
     <para>
      The <literal>Zebra Extension Approximative Limit (type
       9)</literal> enables approximate hit counts for
      <literal>scan</literal>, in the same way as for
      <literal>search</literal> hit counts.
     </para>
     <!--
     <para>
      <screen>
      </screen>
     </para>
     -->
     <warning>
      Experimental and buggy. Definitely not to be used in production code.
     </warning>
    </sect3>
1730
1731
1732    </sect2>
1733
1734
1735    <sect2 id="querymodel-idxpath">
1736     <title>Zebra special IDXPATH Attribute Set for GRS indexing</title>
1737     <para>
1738      The attribute-set <literal>idxpath</literal> consists of a single
1739      <literal>Use (type 1)</literal> attribute. All non-use attributes
1740      behave as normal.
1741     </para>
1742     <para>
1743      This feature is enabled when defining the
1744      <literal>xpath enable</literal> option in the GRS filter
1745      <filename>*.abs</filename> configuration files. If one wants to use
1746      the special <literal>idxpath</literal> numeric attribute set, the
1747      main Zebra configuration file <filename>zebra.cfg</filename>
1748      directive <literal>attset: idxpath.att</literal> must be enabled.
1749     </para>
    <warning>The <literal>idxpath</literal> attribute set is deprecated,
     may not be supported in future Zebra versions, and should definitely
     not be used in production code.
    </warning>
1754
1755     <sect3 id="querymodel-idxpath-use">
1756     <title>IDXPATH Use Attributes (type = 1)</title>
1757      <para>
      This attribute set allows one to search records indexed by the GRS
      filter using XPATH-like structured index names.
1760      </para>
1761
1762      <warning>The <literal>idxpath</literal> option defines hard-coded
1763       index names, which might clash with your own index names.
1764      </warning>
1765
1766      <table id="querymodel-idxpath-use-table"
1767       frame="all" rowsep="1" colsep="1" align="center">
1768
1769       <caption>Zebra specific IDXPATH Use Attributes (type 1)</caption>
1770       <thead>
1771         <tr>
1772          <td>IDXPATH</td>
1773          <td>Value</td>
1774          <td>String Index</td>
1775          <td>Notes</td>
1776         </tr>
1777        </thead>
1778        <tbody>
1779         <tr>
1780          <td>XPATH Begin</td>
1781          <td>1</td>
1782          <td>_XPATH_BEGIN</td>
         <td>deprecated</td>
1784         </tr>
1785         <tr>
1786          <td>XPATH End</td>
1787          <td>2</td>
1788          <td>_XPATH_END</td>
         <td>deprecated</td>
1790         </tr>
1791         <tr>
1792          <td>XPATH CData</td>
1793          <td>1016</td>
1794          <td>_XPATH_CDATA</td>
         <td>deprecated</td>
1796         </tr>
1797         <tr>
1798          <td>XPATH Attribute Name</td>
1799          <td>3</td>
1800          <td>_XPATH_ATTR_NAME</td>
         <td>deprecated</td>
1802         </tr>
1803         <tr>
1804          <td>XPATH Attribute CData</td>
1805          <td>1015</td>
1806          <td>_XPATH_ATTR_CDATA</td>
         <td>deprecated</td>
1808         </tr>
1809        </tbody>
1810      </table>
1811
1812
1813      <para>
1814       See <filename>tab/idxpath.att</filename> for more information.
1815      </para>
1816      <para>
1817       Search for all documents starting with root element
1818       <literal>/root</literal> (either using the numeric or the string
1819       use attributes):
1820       <screen>
1821        Z> find @attrset idxpath @attr 1=1 @attr 4=3 root/
1822        Z> find @attr idxpath 1=1 @attr 4=3 root/
1823        Z> find @attr 1=_XPATH_BEGIN @attr 4=3 root/
1824       </screen>
1825      </para>
1826      <para>
      Search for all documents in which a specific nested XPATH
      <literal>/c1/c2/../cn</literal> exists. Notice the very
      counter-intuitive <emphasis>reverse</emphasis> notation: the path
      is written from the innermost element outwards.
1830       <screen>
1831        Z> find @attrset idxpath @attr 1=1 @attr 4=3 cn/cn-1/../c1/
1832        Z> find @attr 1=_XPATH_BEGIN @attr 4=3 cn/cn-1/../c1/
1833       </screen>
1834      </para>
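     <para>
      The reversed notation can be derived mechanically from an ordinary
      path. The following Python sketch is only an illustration: the helper
      names <literal>idxpath_term</literal> and
      <literal>idxpath_begin_query</literal> are hypothetical and not part
      of Zebra or YAZ. It assumes a plain path of element names and
      produces the term form used in the examples above:
      <screen><![CDATA[
```python
def idxpath_term(xpath):
    """Reverse '/c1/c2/cn' into the 'cn/c2/c1/' term form shown above."""
    # Drop empty steps produced by leading or trailing slashes.
    steps = [step for step in xpath.split("/") if step]
    # Innermost element first, each step followed by a slash.
    return "".join(step + "/" for step in reversed(steps))


def idxpath_begin_query(xpath):
    """Build a PQF query string against the _XPATH_BEGIN string index."""
    return "@attr 1=_XPATH_BEGIN @attr 4=3 " + idxpath_term(xpath)


print(idxpath_begin_query("/root"))
print(idxpath_begin_query("/c1/c2/c3"))
```
]]></screen>
      The resulting strings can be pasted after <literal>find</literal>
      in a client session such as the ones shown above.
     </para>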
1835      <para>
      Search for CDATA string <emphasis>text</emphasis> in any element:
1837       <screen>
1838        Z> find @attrset idxpath @attr 1=1016 text
1839        Z> find @attr 1=_XPATH_CDATA text
1840       </screen>
1841      </para>
1842      <para>
1843        Search for CDATA string <emphasis>anothertext</emphasis> in any
1844        attribute:
1845       <screen>
1846        Z> find @attrset idxpath @attr 1=1015 anothertext
1847        Z> find @attr 1=_XPATH_ATTR_CDATA anothertext
1848       </screen>
1849      </para>
1850      <para>
       Search for all documents which have an XML element node
       including an XML attribute named <emphasis>creator</emphasis>:
1853       <screen>
1854        Z> find @attrset idxpath @attr 1=3 @attr 4=3 creator
1855        Z> find @attr 1=_XPATH_ATTR_NAME @attr 4=3 creator
1856       </screen>
1857      </para>
1858      <para>
1859       Combining usual <literal>bib-1</literal> attribute set searches
1860       with <literal>idxpath</literal> attribute set searches:
1861       <screen>
1862        Z> find @and @attr idxpath 1=1 @attr 4=3 link/ @attr 1=4 mozart
1863        Z> find @and @attr 1=_XPATH_BEGIN @attr 4=3 link/ @attr 1=_XPATH_CDATA mozart
1864       </screen>
1865      </para>
1866      <para>
      Scanning is supported on all <literal>idxpath</literal>
      indexes, whether they are specified as numeric use attributes or
      as string index names:
1870       <screen>
1871        Z> scan  @attrset idxpath @attr 1=1016 text
1872        Z> scan  @attr 1=_XPATH_ATTR_CDATA anothertext
1873        Z> scan  @attrset idxpath @attr 1=3 @attr 4=3 ''
1874       </screen>
1875      </para>
1876
1877     </sect3>
1878    </sect2>
1879
1880
1881    <sect2 id="querymodel-pqf-apt-mapping">
1882     <title>Mapping from PQF atomic APT queries to Zebra internal
1883      register indexes</title>
1884     <para>
1885      The rules for PQF APT mapping are rather tricky to grasp in the
1886      first place. We deal first with the rules for deciding which
1887      internal register or string index to use, according to the use
1888      attribute or access point specified in the query. Thereafter we
1889      deal with the rules for determining the correct structure type of
1890      the named register.
1891     </para>
1892
1893    <sect3 id="querymodel-pqf-apt-mapping-accesspoint">
1894     <title>Mapping of PQF APT access points</title>
1895     <para>
      Zebra understands four fundamentally different types of access
      points, of which only the
      <emphasis>numeric use attribute</emphasis> type access points
      are defined by the <ulink url="&url.z39.50;">Z39.50</ulink>
      standard.
      All other access point types are Zebra-specific and non-portable.
1902     </para>
1903
1904      <table id="querymodel-zebra-mapping-accesspoint-types"
1905       frame="all" rowsep="1" colsep="1" align="center">
1906
1907       <caption>Access point name mapping</caption>
1908        <thead>
1909         <tr>
1910          <td>Access Point</td>
1911          <td>Type</td>
1912          <td>Grammar</td>
1913          <td>Notes</td>
1914         </tr>
1915       </thead>
1916       <tbody>
1917        <tr>
1918         <td>Use attribute</td>
1919         <td>numeric</td>
        <td>[1-9][0-9]*</td>
1921         <td>directly mapped to string index name</td>
1922        </tr>
1923        <tr>
1924         <td>String index name</td>
1925         <td>string</td>
1926         <td>[a-zA-Z](\-?[a-zA-Z0-9])*</td>
1927         <td>normalized name is used as internal string index name</td>
1928        </tr>
1929        <tr>
1930         <td>Zebra internal index name</td>
1931         <td>zebra</td>
1932         <td>_[a-zA-Z](_?[a-zA-Z0-9])*</td>
1933         <td>hardwired internal string index name</td>
1934        </tr>
1935        <tr>
1936         <td>XPATH special index</td>
1937         <td>XPath</td>
1938         <td>/.*</td>
1939         <td>special xpath search for GRS indexed records</td>
1940        </tr>
1941       </tbody>
1942     </table>
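     <para>
      As an illustration of how the four grammars partition access point
      names, the following Python sketch classifies a use attribute value.
      It is not Zebra's own code: the function name
      <literal>classify_access_point</literal> is hypothetical, and the
      numeric pattern is written as <literal>[1-9][0-9]*</literal> so that
      values containing zeros, such as 1010, are accepted:
      <screen><![CDATA[
```python
import re

# Patterns follow the Grammar column of the table; the numeric pattern is
# written as [1-9][0-9]* so that values such as 1010 match.
ACCESS_POINT_PATTERNS = [
    ("numeric", re.compile(r"[1-9][0-9]*")),
    ("string", re.compile(r"[a-zA-Z](-?[a-zA-Z0-9])*")),
    ("zebra", re.compile(r"_[a-zA-Z](_?[a-zA-Z0-9])*")),
    ("XPath", re.compile(r"/.*")),
]


def classify_access_point(value):
    """Return the access point type for a use attribute value, or None."""
    for kind, pattern in ACCESS_POINT_PATTERNS:
        if pattern.fullmatch(value):
            return kind
    return None


for value in ["1010", "Body-of-text", "_XPATH_BEGIN", "/root/title"]:
    print(value, "->", classify_access_point(value))
```
]]></screen>
      Because the four patterns start with different character classes,
      at most one of them can match a given value.
     </para>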
1943
1944     <para>
1945      <literal>Attribute set names</literal> and
     <literal>string index names</literal> are normalized
1947      according to the following rules: all <emphasis>single</emphasis>
1948      hyphens <literal>'-'</literal> are stripped, and all upper case
1949      letters are folded to lower case.
1950      </para>
1951
1952      <para>
1953       <emphasis>Numeric use attributes</emphasis> are mapped
1954       to the Zebra internal
1955       string index according to the attribute set definition in use.
1956       The default attribute set is <literal>Bib-1</literal>, and may be
1957       omitted in the PQF query.
1958      </para>
1959
1960      <para>
1961       According to normalization and numeric
1962       use attribute mapping, it follows that the following
1963       PQF queries are considered equivalent (assuming the default
1964       configuration has not been altered):
1965       <screen>
1966       Z> find  @attr 1=Body-of-text serenade
1967       Z> find  @attr 1=bodyoftext serenade
1968       Z> find  @attr 1=BodyOfText serenade
1969       Z> find  @attr 1=bO-d-Y-of-tE-x-t serenade
1970       Z> find  @attr 1=1010 serenade
1971       Z> find  @attrset Bib-1 @attr 1=1010 serenade
1972       Z> find  @attrset bib1 @attr 1=1010 serenade
1973       Z> find  @attrset Bib1 @attr 1=1010 serenade
1974       Z> find  @attrset b-I-b-1 @attr 1=1010 serenade
1975      </screen>
1976     </para>
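     <para>
      The normalization rule is simple enough to state as code. The
      following Python sketch is an illustration only, with a hypothetical
      function name; it assumes the input already follows the string index
      name grammar, so every hyphen is a single hyphen:
      <screen><![CDATA[
```python
def normalize_name(name):
    """Strip hyphens and fold upper case letters to lower case."""
    return name.replace("-", "").lower()


index_names = ["Body-of-text", "bodyoftext", "BodyOfText", "bO-d-Y-of-tE-x-t"]
attset_names = ["Bib-1", "bib1", "Bib1", "b-I-b-1"]

# Each list collapses to a single normalized name.
print({normalize_name(name) for name in index_names})
print({normalize_name(name) for name in attset_names})
```
]]></screen>
      All spellings in a list normalize to the same name, which is why
      the queries above are treated as equivalent.
     </para>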
1977
1978     <para>
1979       The <emphasis>numerical</emphasis>
1980       <literal>use attributes (type 1)</literal>
1981       are interpreted according to the
1982       attribute sets which have been loaded in the
1983       <literal>zebra.cfg</literal> file, and are matched against specific
1984       fields as specified in the <literal>.abs</literal> file which
1985       describes the profile of the records which have been loaded.
1986       If no use attribute is provided, a default of
1987       <literal>Bib-1 Use Any (1016)</literal> is
1988       assumed.
1989       The predefined <literal>use attribute sets</literal>
1990       can be reconfigured by  tweaking the configuration files
1991       <filename>tab/*.att</filename>, and
1992       new attribute sets can be defined by adding similar files in the
1993       configuration path <literal>profilePath</literal> of the server.
1994     </para>
1995
1996      <para>
      <literal>String indexes</literal> can be accessed directly,
      independently of which attribute set is in use; any attribute set
      given in the query is simply ignored. The name normalization
      described above applies.
      <literal>String index names</literal> are defined in the
      configuration files of the indexing filter in use, for example in the
      <literal>GRS</literal>
      <filename>*.abs</filename> configuration files, or in the
      <literal>alvis</literal> filter XSLT indexing stylesheets.
2005      </para>
2006
2007      <para>
2008       <literal>Zebra internal indexes</literal> can be accessed directly,
2009       according to the same rules as the user defined
2010       <literal>string indexes</literal>. The only difference is that
2011       <literal>Zebra internal index names</literal> are hardwired,
2012       all uppercase and
2013       must start with the character <literal>'_'</literal>.
2014      </para>
2015
2016      <para>
2017       Finally, <literal>XPATH</literal> access points are only
2018       available using the <literal>GRS</literal> filter for indexing.
2019       These access point names must start with the character
2020       <literal>'/'</literal>, they are <emphasis>not
2021       normalized</emphasis>, but passed unaltered to the Zebra internal
2022       XPATH engine. See <xref linkend="querymodel-use-xpath"/>.
2023
2024      </para>
2025
2026
2027     </sect3>
2028
2029
2030    <sect3 id="querymodel-pqf-apt-mapping-structuretype">
2031      <title>Mapping of PQF APT structure and completeness to
2032       register type</title>
2033     <para>
      Internally, Zebra has in its default configuration several
      different types of registers or indexes, whose tokenization and
      character normalization rules differ. This reflects the fact that
      searching fundamentally different tokens like dates, numbers,
      bitfields and string-based text needs different rule sets.
2039      </para>
2040
2041      <table id="querymodel-zebra-mapping-structure-types"
2042       frame="all" rowsep="1" colsep="1" align="center">
2043
2044       <caption>Structure and completeness mapping to register types</caption>
2045        <thead>
2046         <tr>
2047          <td>Structure</td>
2048          <td>Completeness</td>
2049          <td>Register type</td>
2050          <td>Notes</td>
2051         </tr>
2052       </thead>
2053       <tbody>
2054        <tr>
2055         <td>
2056           phrase (@attr 4=1), word (@attr 4=2),
2057           word-list (@attr 4=6),
2058           free-form-text  (@attr 4=105), or document-text (@attr 4=106)
2059          </td>
2060         <td>Incomplete field (@attr 6=1)</td>
2061         <td>Word ('w')</td>
2062         <td>Traditional tokenized and character normalized word index</td>
2063        </tr>
2064        <tr>
2065         <td>
2066           phrase (@attr 4=1), word (@attr 4=2),
2067           word-list (@attr 4=6),
2068           free-form-text  (@attr 4=105), or document-text (@attr 4=106)
2069          </td>
        <td>Complete field (@attr 6=3)</td>
2071         <td>Phrase ('p')</td>
2072         <td>Character normalized, but not tokenized index for phrase
2073           matches
2074          </td>
2075        </tr>
2076        <tr>
2077         <td>urx (@attr 4=104)</td>
2078         <td>ignored</td>
2079         <td>URX/URL ('u')</td>
2080         <td>Special index for URL web addresses</td>
2081        </tr>
2082        <tr>
2083         <td>numeric (@attr 4=109)</td>
2084         <td>ignored</td>
        <td>Numeric ('n')</td>
2086         <td>Special index for digital numbers</td>
2087        </tr>
2088        <tr>
2089         <td>key (@attr 4=3)</td>
2090         <td>ignored</td>
2091         <td>Null bitmap ('0')</td>
        <td>Used for non-tokenized and non-normalized bit sequences</td>
2093        </tr>
2094        <tr>
2095         <td>year (@attr 4=4)</td>
2096         <td>ignored</td>
2097         <td>Year ('y')</td>
        <td>Non-tokenized and non-normalized 4 digit numbers</td>
2099        </tr>
2100        <tr>
2101         <td>date (@attr 4=5)</td>
2102         <td>ignored</td>
2103         <td>Date ('d')</td>
        <td>Non-tokenized and non-normalized ISO date strings</td>
2105        </tr>
2106        <tr>
2107         <td>ignored</td>
2108         <td>ignored</td>
2109         <td>Sort ('s')</td>
2110         <td>Used with special sort attribute set (@attr 7=1, @attr 7=2)</td>
2111        </tr>
2112        <tr>
2113         <td>overruled</td>
2114         <td>overruled</td>
2115         <td>special</td>
2116         <td>Internal record ID register, used whenever
2117          Relation Always Matches (@attr 2=103) is specified</td>
2118        </tr>
2119       </tbody>
2120     </table>
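     <para>
      The table can be read as a lookup from structure and completeness
      to a register letter. The following Python sketch is a simplified
      illustration and not Zebra's implementation: the function name is
      hypothetical, only the attribute values listed in the table are
      handled, the sort and record ID rows are left out, and the numeric
      register is taken to be type <literal>n</literal>:
      <screen><![CDATA[
```python
# Structure attribute values (type 4) that share the word/phrase registers.
TEXT_STRUCTURES = {1, 2, 6, 105, 106}

# Structures whose register does not depend on the completeness attribute.
FIXED_REGISTERS = {104: "u", 109: "n", 3: "0", 4: "y", 5: "d"}


def register_type(structure, completeness=1):
    """Return the register letter for a structure/completeness pair."""
    if structure in FIXED_REGISTERS:
        return FIXED_REGISTERS[structure]
    if structure in TEXT_STRUCTURES:
        if completeness == 1:   # incomplete field: word register
            return "w"
        if completeness == 3:   # complete field: phrase register
            return "p"
        raise ValueError("completeness %d is not in the table" % completeness)
    raise ValueError("structure %d is not in the table" % structure)


print(register_type(1, 3))   # phrase structure, complete field
print(register_type(6))      # word-list, default completeness
print(register_type(109))    # numeric
```
]]></screen>
     </para>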
2121
2122      <!-- see in util/zebramap.c -->
2123
2124     <para>
2125      If a <emphasis>Structure</emphasis> attribute of
2126      <emphasis>Phrase</emphasis> is used in conjunction with a
2127      <emphasis>Completeness</emphasis> attribute of
2128      <emphasis>Complete (Sub)field</emphasis>, the term is matched
2129      against the contents of the phrase (long word) register, if one
2130      exists for the given <emphasis>Use</emphasis> attribute.
2131      A phrase register is created for those fields in the
     GRS <filename>*.abs</filename> file that contain a
2133      <literal>p</literal>-specifier.
2134       <screen>
2135        Z>  scan @attr 1=Title @attr 4=1 @attr 6=3 beethoven
2136        ...
2137        bayreuther festspiele (1)
2138        * beethoven bibliography database (1)
2139        benny carter (1)
2140        ...
2141        Z> find @attr 1=Title @attr 4=1 @attr 6=3 "beethoven bibliography"
2142        ...
2143        Number of hits: 0, setno 5
2144        ...
2145        Z> find @attr 1=Title @attr 4=1 @attr 6=3 "beethoven bibliography database"
2146        ...
2147        Number of hits: 1, setno 6
2148        </screen>
2149     </para>
2150
2151     <para>
     If <emphasis>Structure</emphasis>=<emphasis>Phrase</emphasis> is
     used in conjunction with <emphasis>Incomplete Field</emphasis> (the
     default value for <emphasis>Completeness</emphasis>), the
     search is directed against the normal word registers, but if the term
     contains multiple words, the term will only match if all of the words
     are found immediately adjacent, and in the given order.
2158      The word search is performed on those fields that are indexed as
2159      type <literal>w</literal> in the GRS <filename>*.abs</filename> file.
2160       <screen>
2161        Z>  scan @attr 1=Title @attr 4=1 @attr 6=1 beethoven
2162        ...
2163          beefheart (1)
2164        * beethoven (18)
2165          beethovens (7)
2166        ...
2167        Z> find @attr 1=Title @attr 4=1 @attr 6=1 beethoven
2168        ...
2169        Number of hits: 18, setno 1
2170        ...
2171        Z> find @attr 1=Title @attr 4=1 @attr 6=1 "beethoven  bibliography"
2172        ...
2173        Number of hits: 2, setno 2
2174        ...
2175      </screen>
2176     </para>
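     <para>
      The difference between the two phrase searches can be pictured with
      a toy model. The following Python sketch is not how Zebra stores or
      searches its registers; the function names are hypothetical, and
      lower-casing plus whitespace splitting stands in for the real
      tokenization and character normalization:
      <screen><![CDATA[
```python
def tokens(text):
    """Lower-case and split on whitespace, a stand-in for tokenization."""
    return text.lower().split()


def complete_field_match(term, field):
    """Phrase register: the whole normalized field must equal the term."""
    return tokens(term) == tokens(field)


def incomplete_field_phrase_match(term, field):
    """Word register: the term words must occur adjacent and in order."""
    wanted, words = tokens(term), tokens(field)
    if not wanted:
        return False
    return any(words[i:i + len(wanted)] == wanted
               for i in range(len(words) - len(wanted) + 1))


title = "Beethoven Bibliography Database"
print(complete_field_match("beethoven bibliography", title))           # False
print(complete_field_match("beethoven bibliography database", title))  # True
print(incomplete_field_phrase_match("beethoven bibliography", title))  # True
```
]]></screen>
      This mirrors the sessions above: the two-word term finds nothing in
      the phrase register, but matches in the word register.
     </para>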
2177
2178     <para>
2179      If the <emphasis>Structure</emphasis> attribute is
2180      <emphasis>Word List</emphasis>,
2181      <emphasis>Free-form Text</emphasis>, or
2182      <emphasis>Document Text</emphasis>, the term is treated as a
2183      natural-language, relevance-ranked query.
2184      This search type uses the word register, i.e. those fields
2185      that are indexed as type <literal>w</literal> in the
2186      GRS <filename>*.abs</filename> file.
2187     </para>
2188
2189     <para>
2190      If the <emphasis>Structure</emphasis> attribute is
     <emphasis>Numeric String</emphasis>, the term is treated as an integer.
2192      The search is performed on those fields that are indexed
2193      as type <literal>n</literal> in the GRS
2194       <filename>*.abs</filename> file.
2195     </para>
2196
2197     <para>
2198      If the <emphasis>Structure</emphasis> attribute is
     <emphasis>URX</emphasis>, the term is treated as a URX (URL) entity.
2200      The search is performed on those fields that are indexed as type
2201      <literal>u</literal> in the <filename>*.abs</filename> file.
2202     </para>
2203
2204     <para>
2205      If the <emphasis>Structure</emphasis> attribute is
     <emphasis>Local Number</emphasis>, the term is treated as
     a native Zebra Record Identifier.
2208     </para>
2209
2210     <para>
2211      If the <emphasis>Relation</emphasis> attribute is
2212      <emphasis>Equals</emphasis> (default), the term is matched
2213      in a normal fashion (modulo truncation and processing of
2214      individual words, if required).
     If <emphasis>Relation</emphasis> is <emphasis>Less Than</emphasis>,
     <emphasis>Less Than or Equal</emphasis>,
     <emphasis>Greater Than</emphasis>, or <emphasis>Greater Than or
      Equal</emphasis>, the term is assumed to be numerical, and a
2219      standard regular expression is constructed to match the given
2220      expression.
2221      If <emphasis>Relation</emphasis> is <emphasis>Relevance</emphasis>,
2222      the standard natural-language query processor is invoked.
2223     </para>
2224
2225     <para>
2226      For the <emphasis>Truncation</emphasis> attribute,
2227      <emphasis>No Truncation</emphasis> is the default.
2228      <emphasis>Left Truncation</emphasis> is not supported.
2229      <emphasis>Process # in search term</emphasis> is supported, as is
2230      <emphasis>Regxp-1</emphasis>.
     <emphasis>Regxp-2</emphasis> enables fault-tolerant (fuzzy)
     search. By default, a single error (deletion, insertion, or
     replacement) is accepted when terms are matched against the register
     contents.
2235     </para>
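     <para>
      For a plain term without regular expression operators, accepting
      a single deletion, insertion or replacement amounts to an edit
      distance of at most one. The following Python sketch illustrates
      that notion only; it is not Zebra's matching algorithm, and the
      function names are hypothetical:
      <screen><![CDATA[
```python
def edit_distance(a, b):
    """Levenshtein distance: deletions, insertions and replacements."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # replacement
        previous = current
    return previous[-1]


def fuzzy_match(term, indexed, tolerance=1):
    """True if the indexed word is within the error tolerance of the term."""
    return edit_distance(term, indexed) <= tolerance


for word in ["mozart", "mozrt", "mozzart", "mosart", "bach"]:
    print(word, fuzzy_match("mozart", word))
```
]]></screen>
     </para>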
2236
2237      </sect3>
2238    </sect2>
2239
2240    <sect2  id="querymodel-regular">
2241     <title>Zebra Regular Expressions in Truncation Attribute (type = 5)</title>
2242
2243     <para>
2244      Each term in a query is interpreted as a regular expression if
2245      the truncation value is either <emphasis>Regxp-1 (@attr 5=102)</emphasis>
2246      or <emphasis>Regxp-2 (@attr 5=103)</emphasis>.
2247      Both query types follow the same syntax with the operands:
2248     </para>
2249
2250      <table id="querymodel-regular-operands-table"
2251       frame="all" rowsep="1" colsep="1" align="center">
2252
2253       <caption>Regular Expression Operands</caption>
2254        <!--
2255        <thead>
2256        <tr><td>one</td><td>two</td></tr>
2257       </thead>
2258        -->
2259        <tbody>
2260         <tr>
2261          <td><literal>x</literal></td>
2262          <td>Matches the character <literal>x</literal>.</td>
2263         </tr>
2264         <tr>
2265          <td><literal>.</literal></td>
2266          <td>Matches any character.</td>
2267         </tr>
2268         <tr>
2269          <td><literal>[ .. ]</literal></td>
2270          <td>Matches the set of characters specified;
2271          such as <literal>[abc]</literal> or <literal>[a-c]</literal>.</td>
2272         </tr>
2273        </tbody>
2274       </table>
2275
2276     <para>
2277      The above operands can be combined with the following operators:
2278     </para>
2279
2280      <table id="querymodel-regular-operators-table"
2281       frame="all" rowsep="1" colsep="1" align="center">
2282       <caption>Regular Expression Operators</caption>
2283        <!--
2284        <thead>
2285        <tr><td>one</td><td>two</td></tr>
2286       </thead>
2287        -->
2288        <tbody>
2289         <tr>
2290          <td><literal>x*</literal></td>
2291          <td>Matches <literal>x</literal> zero or more times.
2292           Priority: high.</td>
2293         </tr>
2294         <tr>
2295          <td><literal>x+</literal></td>
2296          <td>Matches <literal>x</literal> one or more times.
2297           Priority: high.</td>
2298         </tr>
2299         <tr>
2300          <td><literal>x?</literal></td>
         <td> Matches <literal>x</literal> zero times or once.
2302           Priority: high.</td>
2303         </tr>
2304         <tr>
2305          <td><literal>xy</literal></td>
2306          <td> Matches <literal>x</literal>, then <literal>y</literal>.
2307          Priority: medium.</td>
2308         </tr>
2309         <tr>
2310          <td><literal>x|y</literal></td>
2311          <td> Matches either <literal>x</literal> or <literal>y</literal>.
2312          Priority: low.</td>
2313         </tr>
2314         <tr>
2315          <td><literal>( )</literal></td>
2316          <td>The order of evaluation may be changed by using parentheses.</td>
2317         </tr>
2318        </tbody>
2319       </table>
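     <para>
      The operator priorities can be tried out with any regular expression
      engine that uses the same conventions. The following sketch uses
      Python's <literal>re</literal> module, which is not Zebra's engine
      but gives these operators the same relative priorities; the helper
      <literal>matches</literal> is defined here for illustration only:
      <screen><![CDATA[
```python
import re


def matches(pattern, text):
    """True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# High priority: * applies to y only, so xy* is x(y*), not (xy)*.
print(matches("xy*", "xyyy"), matches("xy*", "xyxy"))   # True False
# Parentheses change the order of evaluation.
print(matches("(xy)*", "xyxy"))                         # True
# Low priority: | separates whole concatenations, so ab|cd is (ab)|(cd).
print(matches("ab|cd", "ab"), matches("ab|cd", "abd"))  # True False
print(matches("a(b|c)d", "abd"))                        # True
```
]]></screen>
     </para>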
2320
2321     <para>
2322      If the first character of the <literal>Regxp-2</literal> query
2323      is a plus character (<literal>+</literal>) it marks the
2324      beginning of a section with non-standard specifiers.
2325      The next plus character marks the end of the section.
2326      Currently Zebra only supports one specifier, the error tolerance,
     which consists of one digit.
2328     </para>
2329
2330     <para>
2331      Since the plus operator is normally a suffix operator the addition to
2332      the query syntax doesn't violate the syntax for standard regular
2333      expressions.
2334     </para>
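     <para>
      The specifier section described above can be separated from the
      expression with a few lines of code. The following Python sketch is
      an illustration under the rules just stated, not Zebra's parser; the
      function name is hypothetical, and the default tolerance of one
      error is taken from the description of <literal>Regxp-2</literal>:
      <screen><![CDATA[
```python
import re

# Optional leading section "+<digit>+" followed by the regular expression.
SPECIFIER = re.compile(r"\+([0-9])\+(.*)", re.DOTALL)


def split_regxp2_term(term, default_tolerance=1):
    """Return (error_tolerance, expression) for a Regxp-2 term."""
    found = SPECIFIER.fullmatch(term)
    if found:
        return int(found.group(1)), found.group(2)
    # No specifier section: the whole term is the expression.
    return default_tolerance, term


print(split_regxp2_term("+2+informat.*"))  # (2, 'informat.*')
print(split_regxp2_term("informat.*"))     # (1, 'informat.*')
```
]]></screen>
      A term such as <literal>x+y+</literal> does not start with a plus
      character, so it is left untouched and keeps its ordinary meaning.
     </para>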
2335
2336     <para>
2337      For example, a phrase search with regular expressions  in
2338      the title-register is performed like this:
2339      <screen>
2340       Z> find @attr 1=4 @attr 5=102 "informat.* retrieval"
2341      </screen>
2342     </para>
2343
2344     <para>
2345      Combinations with other attributes are possible. For example, a
2346      ranked search with a regular expression:
2347      <screen>
2348       Z> find @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval"
2349      </screen>
2350     </para>
2351    </sect2>
2352
2353
2354    <!--
2355    <para>
2356     The RecordType parameter in the <literal>zebra.cfg</literal> file, or
2357     the <literal>-t</literal> option to the indexer tells Zebra how to
2358     process input records.
2359     Two basic types of processing are available - raw text and structured
2360     data. Raw text is just that, and it is selected by providing the
2361     argument <literal>text</literal> to Zebra. Structured records are
2362     all handled internally using the basic mechanisms described in the
2363     subsequent sections.
2364     Zebra can read structured records in many different formats.
2365    </para>
2366    -->
2367   </sect1>
2368
2369
2370   <sect1 id="querymodel-cql-to-pqf">
2371    <title>Server Side CQL to PQF Query Translation</title>
2372    <para>
2373     Using the
2374     <literal>&lt;cql2rpn&gt;l2rpn.txt&lt;/cql2rpn&gt;</literal>
2375       YAZ Frontend Virtual
2376     Hosts option, one can configure
2377     the YAZ Frontend CQL-to-PQF
2378     converter, specifying the interpretation of various
2379     <ulink url="&url.cql;">CQL</ulink>
2380     indexes, relations, etc. in terms of Type-1 query attributes.
2381     <!-- The  yaz-client config file -->
2382    </para>
2383    <para>
2384     For example, using server-side CQL-to-PQF conversion, one might
    query a Zebra server like this:
2386     <screen>
2387     <![CDATA[
2388      yaz-client localhost:9999
2389      Z> querytype cql
2390      Z> find text=(plant and soil)
2391      ]]>
2392     </screen>
2393      and - if properly configured - even static relevance ranking can
2394      be performed using CQL query syntax:
2395     <screen>
2396     <![CDATA[
2397      Z> find text = /relevant (plant and soil)
2398      ]]>
2399      </screen>
2400    </para>
2401
2402    <para>
    The same configuration can also be used to
    search using client-side CQL-to-PQF conversion.
    The only differences are <literal>querytype cql2rpn</literal>
    instead of
    <literal>querytype cql</literal>, and the call specifying a local
    conversion file:
2409     <screen>
2410     <![CDATA[
2411      yaz-client -q local/cql2pqf.txt localhost:9999
2412      Z> querytype cql2rpn
2413      Z> find text=(plant and soil)
2414      ]]>
2415      </screen>
2416    </para>
2417
2418    <para>
    Exhaustive information can be found in the section
    "Specification of CQL to RPN mappings" of the YAZ manual,
    <ulink url="http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql.map">
     http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql.map</ulink>,
    and is therefore not repeated here.
2424    </para>
2425   <!--
2426   <para>
2427     See
2428       <ulink url="http://www.loc.gov/z3950/agency/zing/cql/dc-indexes.html">
2429       http://www.loc.gov/z3950/agency/zing/cql/dc-indexes.html</ulink>
2430     for the Maintenance Agency's work-in-progress mapping of Dublin Core
2431     indexes to Attribute Architecture (util, XD and BIB-2)
2432     attributes.
2433    </para>
2434    -->
2435  </sect1>
2436
2437
2438
2439 </chapter>
2440
2441  <!-- Keep this comment at the end of the file
2442  Local variables:
2443  mode: sgml
2444  sgml-omittag:t
2445  sgml-shorttag:t
2446  sgml-minimize-attributes:nil
2447  sgml-always-quote-attributes:t
2448  sgml-indent-step:1
2449  sgml-indent-data:t
2450  sgml-parent-document: "zebra.xml"
2451  sgml-local-catalogs: nil
2452  sgml-namecase-general:t
2453  End:
2454  -->