doc/querymodel.xml

   1  <chapter id="querymodel">
   2   <!-- $Id: querymodel.xml,v 1.10 2006-06-21 13:32:33 marc Exp $ -->
   3   <title>Query Model</title>
   4
   5   <sect1 id="querymodel-overview">
   6    <title>Query Model Overview</title>
   7
   8
   9    <sect2 id="querymodel-query-languages">
  10     <title>Query Languages</title>
  11
  12     <para>
  13      Zebra was born as a networked Information Retrieval engine adhering
  14      to the international standards
  15      <ulink url="&url.z39.50;">Z39.50</ulink> and
  16      <ulink url="&url.sru;">SRU</ulink>,
  17      and implements the
  18      <literal>type-1 Reverse Polish Notation (RPN)</literal> query
  19      model defined there.
  20      Unfortunately, this model defines only a binary
  21      encoded representation, which is used as transport packaging in
  22      the Z39.50 protocol layer. This representation is not human
  23      readable, nor does it define any convenient way to specify queries.
  24     </para>
  25     <para>
  26      Since the <literal>type-1 (RPN)</literal>
  27      query structure has no direct, useful string
  28      representation, every origin application needs to provide some
  29      form of mapping from a local query notation or representation to it.
  30      </para>
  31
  32
  33    <sect3 id="querymodel-query-languages-pqf">
  34     <title>Prefix Query Format (PQF)</title>
  35
  36    <para>
  37      Index Data has defined a textual representation, the
  38      <literal>Prefix Query Format</literal>, or
  39      <literal>PQF</literal> for short, which maps
  40       <literal>one-to-one</literal> to binary encoded
  41       <literal>type-1 RPN</literal> query packages.
  42       It has been adopted by other
  43       parties developing Z39.50 software, and is often referred to as
  44      <literal>Prefix Query Notation</literal>, or
  45      <literal>PQN</literal> for short. See
  46      <xref linkend="querymodel-pqf"/> for further explanations and
  47      descriptions of Zebra's capabilities.
  48     </para>
  49    </sect3>
  50
  51    <sect3 id="querymodel-query-languages-cql">
  52     <title>Common Query Language (CQL)</title>
  53      <para>
  54       The <literal>type-1 RPN</literal> query model,
  55       expressed in <literal>PQF/PQN</literal>, is natively supported.
  56       On the other hand, the default <literal>SRU</literal>
  57       web service <literal>Common Query Language</literal>
  58      <ulink url="&url.cql;">CQL</ulink> is not natively supported.
  59      </para>
  60      <para>
  61      Zebra can be configured to understand and map CQL to PQF. See
  62      <xref linkend="querymodel-cql-to-pqf"/>.
  63     </para>
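          <para>
           For illustration only, and assuming a mapping in which the CQL
           index <literal>dc.title</literal> is configured as Bib-1 use
           attribute 4 (the actual mapping depends entirely on your
           configuration), the CQL query
           <screen>
            dc.title = "information retrieval"
           </screen>
           would correspond to a PQF query along the lines of
           <screen>
            @attr 1=4 "information retrieval"
           </screen>
          </para>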
  64    </sect3>
  65
  66    </sect2>
  67
  68    <sect2 id="querymodel-operation-types">
  69     <title>Operation types</title>
  70     <para>
  71      Zebra supports all three
  72      <literal>Z39.50/SRU</literal> operations defined in the
  73      standards: <literal>explain</literal>, <literal>search</literal>,
  74      and <literal>scan</literal>. A short description of the
  75      functionality and purpose of each follows.
  76     </para>
  77
  78     <sect3 id="querymodel-operation-type-explain">
  79      <title>Explain Operation</title>
  80      <para>
  81       The <emphasis>syntax</emphasis> of Z39.50/SRU queries is
  82       well known to any client, but the specific
  83       <emphasis>semantics</emphasis> - taking into account a
  84       particular server's functionality and abilities - must be
  85       discovered case by case. This is the purpose of the
  86       <literal>explain</literal> operation, which provides the means
  87       for learning which
  88       <emphasis>fields</emphasis> (also called
  89       <emphasis>indexes</emphasis> or <emphasis>access points</emphasis>)
  90       are provided, which default parameters the server uses, which
  91       document retrieval formats are defined, and which specific parts
  92       of the general query model are supported.
  93      </para>
  94      <para>
  95       Z39.50 embeds the <literal>explain</literal> operation
  96       by performing a
  97       <literal>search</literal> in the magic
  98       <literal>IR-Explain-1</literal> database;
  99       see <xref linkend="querymodel-exp1"/>.
 100      </para>
 101      <para>
 102       In SRU, <literal>explain</literal> is an entirely separate
 103       operation, which returns a <literal>ZeeRex
 104       XML</literal> record according to the
 105       structure defined by the protocol.
 106      </para>
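          <para>
           As a minimal sketch, an SRU <literal>explain</literal> request
           is a plain HTTP GET. The host and port shown here are
           hypothetical and depend on how the server was started:
           <screen>
            http://localhost:9999/?version=1.1&amp;operation=explain
           </screen>
          </para>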
 107      <para>
 108       In both cases, the information gathered through
 109       <literal>explain</literal> operations can be used to
 110       auto-configure a client user interface to the server's
 111       capabilities.
 112      </para>
 113     </sect3>
 114
 115     <sect3 id="querymodel-operation-type-search">
 116      <title>Search Operation</title>
 117      <para>
 118       Search and retrieve interactions are the raison d'être.
 119       They are used to query the remote database and
 120       return search result documents.  Search queries span from
 121       simple free text searches to nested complex boolean queries,
 122       targeting specific indexes, and possibly enhanced with many
 123       query semantic specifications. Search interactions are the heart
 124       and soul of Z39.50/SRU servers.
 125      </para>
 126     </sect3>
 127
 128     <sect3 id="querymodel-operation-type-scan">
 129      <title>Scan Operation</title>
 130      <para>
 131       The <literal>scan</literal> operation is a helper functionality,
 132        which operates on one index or access point at a time.
 133      </para>
 134      <para>
 135       It provides
 136       the means to investigate the content of specific indexes.
 137       Scanning an index returns a handful of terms actually found in
 138       the index, and in addition the <literal>scan</literal>
 139       operation returns the number of documents indexed by each term.
 140       A search client can use this information to propose proper
 141       spelling of search terms, to auto-fill search boxes, or to
 142       display  controlled vocabularies.
 143      </para>
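          <para>
           A minimal sketch using <literal>yaz-client</literal>, assuming
           that a title index (<literal>@attr 1=4</literal>) exists in the
           database; the start term <emphasis>a</emphasis> is arbitrary:
           <screen>
            Z> scan @attr 1=4 a
           </screen>
          </para>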
 144     </sect3>
 145
 146    </sect2>
 147
 148  </sect1>
 149
 150
 151   <sect1 id="querymodel-pqf">
 152    <title>Prefix Query Format structure and syntax</title>
 153    <para>
 154     The <ulink url="&url.yaz.pqf;">PQF grammar</ulink>
 155     is documented in the YAZ manual, and is not
 156     repeated here. During search, this textual PQF representation
 157     is always mapped to the equivalent Zebra internal
 158     query parse tree.
 159    </para>
 160
 161    <sect2 id="querymodel-pqf-tree">
 162     <title>PQF tree structure</title>
 163     <para>
 164      The PQF parse tree - or the equivalent textual representation -
 165      may start with one specification of the
 166      <emphasis>attribute set</emphasis> used. It is followed by a query
 167      tree, which
 168      consists of <emphasis>atomic query parts (APT)</emphasis> or
 169      <emphasis>named result sets</emphasis>, optionally
 170      paired by <emphasis>boolean binary operators</emphasis>, and
 171      finally  <emphasis>recursively combined </emphasis> into
 172      complex query trees.
 173     </para>
 174
 175     <sect3 id="querymodel-attribute-sets">
 176      <title>Attribute sets</title>
 177      <para>
 178       Attribute sets define the exact meaning and semantics of queries
 179       issued. Zebra comes with some predefined attribute set
 180       definitions; others can easily be defined and added to the
 181       configuration.
 182      </para>
 183
 184
 185      <table id="querymodel-attribute-sets-table"
 186       frame="all" rowsep="1" colsep="1" align="center">
 187
 188       <caption>Attribute sets predefined in Zebra</caption>
 189
 190        <thead>
 191        <tr>
 192          <td>Attribute set</td>
 193          <td>Shorthand</td>
 194          <td>Notes</td>
 195          <td>Status</td>
 196         </tr>
 197       </thead>
 198
 199        <tbody>
 200         <tr>
 201          <td><literal>Explain</literal></td>
 202          <td><literal>exp-1</literal></td>
 203          <td>Special attribute set used on the special automagic
 204           <literal>IR-Explain-1</literal> database to gain information on
 205           server capabilities, database names, and database
 206           semantics.</td>
 207          <td>predefined</td>
 208         </tr>
 209         <tr>
 210          <td><literal>Bib1</literal></td>
 211          <td><literal>bib-1</literal></td>
 212          <td>Standard PQF query language attribute set which defines the
 213           semantics of Z39.50 searching. In addition, all of the
 214           non-use attributes (type 2-9) define the hard-wired
 215           Zebra internal query
 216           processing.</td>
 217          <td>default</td>
 218         </tr>
 219         <tr>
 220          <td><literal>GILS</literal></td>
 221          <td><literal>gils</literal></td>
 222          <td>Extension to the <literal>Bib1</literal> attribute set.</td>
 223          <td>predefined</td>
 224         </tr>
 225         <tr>
 226          <td><literal>IDXPATH</literal></td>
 227          <td><literal>idxpath</literal></td>
 228          <td>Hard-wired XPath-like attribute set, only available for
 229              indexing with the GRS record model</td>
 230          <td>deprecated</td>
 231         </tr>
 232        </tbody>
 233      </table>
 234     </sect3>
 235
 236     <para>
 237      The use attributes (type 1) of the predefined attribute sets can
 238      be reconfigured by editing the files
 239      <filename>tab/*.att</filename>.
 240      New attribute sets can be defined by adding similar files in the
 241      configuration path of the server.
 242     </para>
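         <para>
          As a rough sketch of the file format, an attribute set file
          names the set and then lists one use attribute per line. The
          excerpt below is illustrative only and may not match the files
          shipped with your version; consult
          <filename>tab/bib1.att</filename> itself for the real content:
          <screen>
           name bib1
           reference Bib-1
           att 1    Personal-name
           att 4    Title
          </screen>
         </para>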
 243
 244     <note>
 245      The Zebra internal query processing is modeled after
 246      the <literal>Bib1</literal> attribute set, and the non-use
 247      attributes type 2-6 are hard-wired in. It is therefore essential
 248      to be familiar with <xref linkend="querymodel-bib1-nonuse"/>.
 249     </note>
 250
 251
 252     <sect3 id="querymodel-boolean-operators">
 253      <title>Boolean operators</title>
 254      <para>
 255       A pair of subquery trees, or of atomic queries, is combined
 256       using the standard boolean operators into new query trees.
 257      </para>
 258
 259      <table id="querymodel-boolean-operators-table"
 260       frame="all" rowsep="1" colsep="1" align="center">
 261
 262       <caption>Boolean operators</caption>
 263        <!--
 264        <thead>
 265        <tr><td>one</td><td>two</td></tr>
 266       </thead>
 267        -->
 268        <tbody>
 269         <tr><td><literal>@and</literal></td>
 270          <td>binary <literal>AND</literal> operator</td>
 271          <td>Set intersection of the hit sets of two atomic queries</td>
 272         </tr>
 273         <tr><td><literal>@or</literal></td>
 274          <td>binary <literal>OR</literal> operator</td>
 275          <td>Set union of the hit sets of two atomic queries</td>
 276         </tr>
 277         <tr><td><literal>@not</literal></td>
 278          <td>binary <literal>AND NOT</literal> operator</td>
 279          <td>Set difference of the hit sets of two atomic queries</td>
 280         </tr>
 281         <tr><td><literal>@prox</literal></td>
 282          <td>binary <literal>PROXIMITY</literal> operator</td>
 283          <td>Set intersection of the hit sets of two atomic queries. In
 284           addition, the intersection set is purged of all
 285           documents which do not satisfy the requested query
 286           term proximity. Usually a proper subset of the AND
 287           operation.</td>
 288         </tr>
 289        </tbody>
 290      </table>
 291
 292      <para>
 293       For example, we can combine the terms
 294       <emphasis>information</emphasis> and <emphasis>retrieval</emphasis>
 295       into different searches in the default index of the default
 296       attribute set as follows.
 297       Querying for the union of all documents containing the
 298       terms <emphasis>information</emphasis> OR
 299       <emphasis>retrieval</emphasis>:
 300       <screen>
 301        Z> find @or information retrieval
 302       </screen>
 303      </para>
 304      <para>
 305       Querying for the intersection of all documents containing the
 306       terms <emphasis>information</emphasis> AND
 307       <emphasis>retrieval</emphasis>.
 308       The hit set is a subset of that of the corresponding
 309       OR query:
 310       <screen>
 311        Z> find @and information retrieval
 312       </screen>
 313      </para>
 314      <para>
 315       Querying for the intersection of all documents containing the
 316       terms <emphasis>information</emphasis> AND
 317       <emphasis>retrieval</emphasis>, taking proximity into account
 318       (here unordered and at most three words apart).
 319       The hit set is a subset of that of the corresponding AND query:
 320       <screen>
 321        Z> find @prox 0 3 0 2 k 2 information retrieval
 322       </screen>
 323      </para>
 324      <para>
 325       Querying for the intersection of all documents containing the
 326       terms <emphasis>information</emphasis> AND
 327       <emphasis>retrieval</emphasis>, in the same order and near each
 328       other as described in the term list.
 329       The hit set is a subset of that of the corresponding
 330       PROXIMITY query:
 331       <screen>
 332        Z> find "information retrieval"
 333       </screen>
 334      </para>
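          <para>
           As a further sketch along the same lines, the
           <literal>@not</literal> operator finds documents containing
           <emphasis>information</emphasis> but not
           <emphasis>retrieval</emphasis>. Operators can also be nested in
           prefix order; the second query below reads as
           (<emphasis>information</emphasis> OR <emphasis>data</emphasis>)
           AND <emphasis>retrieval</emphasis>. The term
           <emphasis>data</emphasis> is chosen here purely for illustration:
           <screen>
            Z> find @not information retrieval
            Z> find @and @or information data retrieval
           </screen>
          </para>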
 335     </sect3>
 336
 337
 338     <sect3 id="querymodel-atomic-queries">
 339      <title>Atomic queries (APT)</title>
 340      <para>
 341       Atomic queries are the query parts which work on one access point
 342       only. These consist of <literal>an attribute list</literal>
 343       followed by a <literal>single term</literal> or a
 344       <literal>quoted term list</literal>, and are often called
 345       <emphasis>Attributes-Plus-Terms (APT)</emphasis> queries.
 346      </para>
 347      <para>
 348       Unsupplied non-use attributes type 2-9 are either inherited from
 349       higher nodes in the query tree, or are set to Zebra's default values.
 350       See <xref linkend="querymodel-bib1"/> for details.
 351      </para>
 352
 353      <table id="querymodel-atomic-queries-table"
 354       frame="all" rowsep="1" colsep="1" align="center">
 355
 356       <caption>Atomic queries</caption>
 357        <!--
 358        <thead>
 359        <tr><td>one</td><td>two</td></tr>
 360       </thead>
 361        -->
 362        <tbody>
 363         <tr><td><emphasis>attribute list</emphasis></td>
 364          <td>List of <literal>orthogonal</literal> attributes</td>
 365          <td>Any of the orthogonal attribute types may be omitted;
 366           omitted types are inherited from higher query tree nodes, or if not
 367           inherited, are set to the default Zebra configuration values.
 368          </td>
 369         </tr>
 370         <tr><td><emphasis>term</emphasis></td>
 371          <td>single <literal>term</literal>
 372           or <literal>quoted term list</literal>   </td>
 373          <td>Here the search terms or list of search terms is added
 374           to the query</td>
 375         </tr>
 376        </tbody>
 377      </table>
 378      <para>
 379       Querying for the term <emphasis>information</emphasis> in the
 380       default index using the default attribute set, the server choice
 381       of access point/index, and the default non-use attributes.
 382       <screen>
 383        Z> find "information"
 384       </screen>
 385      </para>
 386      <para>
 387       Equivalent query fully specified including all default values:
 388       <screen>
 389        Z> find @attrset bib-1 @attr 1=1017 @attr 2=3 @attr 3=3 @attr 4=1 @attr 5=100 @attr 6=1 "information"
 390       </screen>
 391      </para>
 392
 393      <para>
 394       Finding all documents which have empty titles. Notice that the
 395       empty term must be quoted, but is otherwise legal.
 396       <screen>
 397        Z> find @attr 1=4 ""
 398       </screen>
 399      </para>
 400
 401     </sect3>
 402
 403
 404     <sect3 id="querymodel-resultset">
 405      <title>Named Result Sets</title>
 406      <para>
 407       Named result sets are supported in Zebra, and result sets can be
 408       used as operands without limitations.
 409      </para>
 410      <para>
 411       After the execution of a search, the result set is available at
 412       the server, such that the client can use it for subsequent
 413       searches or retrieval requests. The Z39.50 standard actually
 414       stresses the fact that result sets are volatile. A result set may cease
 415       to exist at any point in time after the search, in which case the server will
 416       send a diagnostic to the effect that the requested
 417       result set does not exist any more.
 418      </para>
 419
 420      <para>
 421       Defining a named result set and re-using it in the next query,
 422       using <literal>yaz-client</literal>.
 423       <screen>
 424        Z> f @attr 1=4 mozart
 425        ...
 426        Number of hits: 43, setno 1
 427        ...
 428        Z> f @and @set 1 @attr 1=4 amadeus
 429        ...
 430        Number of hits: 14, setno 2
 431        ...
 432        Z> f @attr 1=1016 beethoven
 433        ...
 434        Number of hits: 26, setno 3
 435        ...
 436       </screen>
 437      </para>
 438
 439      <note>
 440       Named result sets are only supported by the Z39.50 protocol.
 441       The SRU web service is stateless, and therefore the notion of
 442       named result sets does not exist when accessing a Zebra server by
 443       the SRU protocol.
 444      </note>
 445     </sect3>
 446
 447
 448     <sect3 id="querymodel-use-string">
 449      <title>Zebra's special use attribute type 1 of form 'string'</title>
 450      <para>
 451       The numeric <literal>use (type 1)</literal> attribute is usually
 452       referred to from a given
 453       attribute set. In addition, Zebra lets you use
 454       <emphasis>any internal index
 455        name defined in your configuration</emphasis>
 456       as use attribute value. This is a useful feature for
 457       debugging, and when you do
 458       not need the complexity of defined use attribute values. It is
 459       the preferred way of accessing Zebra indexes directly.
 460      </para>
 461      <para>
 462       Finding all documents which have the term list "information
 463       retrieval" in a Zebra index, using its internal full string name.
 464       <screen>
 465        Z> find @attr 1=sometext "information retrieval"
 466       </screen>
 467      </para>
 468      <para>
 469       Searching the bib-1 use attribute 54 using its string name:
 470       <screen>
 471        Z> find @attr 1=Code-language eng
 472       </screen>
 473      </para>
 474      <para>
 475       Searching in any silly string index - if it's defined in your
 476       indexation rules and can be parsed by the PQF parser.
 477       This is definitely not the recommended use of
 478       this facility, as it might confuse your users with some very
 479       unexpected results.
 480       <screen>
 481        Z> find @attr 1=silly/xpath/alike[@index]/name "information retrieval"
 482       </screen>
 483      </para>
 484      <para>
 485       See <xref linkend="querymodel-bib1-mapping"/> for details, and
 486       <xref linkend="server-sru"/>
 487       for the SRU PQF query extension using string names as a fast
 488       debugging facility.
 489      </para>
 490     </sect3>
 491
 492     <sect3 id="querymodel-use-xpath">
 493      <title>Zebra's special use attribute type 1 of form 'XPath'
 494       for GRS filters</title>
 495      <para>
 496       As we have seen above, it is possible (albeit seldom a great
 497       idea) to emulate
 498       <ulink url="http://www.w3.org/TR/xpath">XPath 1.0</ulink> based
 499       search by defining <literal>use (type 1)</literal>
 500       <emphasis>string</emphasis> attributes which in appearance
 501       <emphasis>resemble XPath queries</emphasis>. There are two
 502       problems with this approach: first, the XPath look-alike has to
 503       be defined at indexing time, so no new, undefined
 504       XPath queries can be entered at search time; and second, it might
 505       greatly confuse users that an XPath-like index name in fact
 506       gets populated from a possibly entirely different XML element
 507       than the one it pretends to access.
 508      </para>
 509      <para>
 510       When using the <literal>GRS Record Model</literal>
 511       (see  <xref linkend="record-model-grs"/>), we have the
 512       possibility to embed <emphasis>live</emphasis>
 513       XPath expressions
 514       in the PQF queries, which are here called
 515       <literal>use (type 1)</literal> <emphasis>xpath</emphasis>
 516       attributes. You must enable the
 517       <literal>xpath enable</literal> directive in your
 518       <literal>.abs</literal> config files.
 519      </para>
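          <para>
           A minimal sketch of the relevant part of an
           <literal>.abs</literal> file. The schema name
           <literal>myschema</literal> is hypothetical, and all other
           directives a real file would need are omitted:
           <screen>
            name myschema
            xpath enable
           </screen>
          </para>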
 520      <note>
 521       Only a <emphasis>very</emphasis> restricted subset of the
 522       <ulink url="http://www.w3.org/TR/xpath">XPath 1.0</ulink>
 523       standard is supported as the GRS record model is simpler than
 524       a full XML DOM structure. See the following examples for
 525       possibilities.
 526      </note>
 527      <para>
 528       Finding all documents which have the term "content"
 529       inside a text node found in a specific XML DOM
 530       <emphasis>subtree</emphasis>, whose starting element is
 531       addressed by XPath.
 532       <screen>
 533        Z> find @attr 1=/root content
 534        Z> find @attr 1=/root/first content
 535       </screen>
 536       <emphasis>Notice that the
 537        XPath must be absolute, i.e., must start with '/', and that the
 538        XPath <literal>descendant-or-self</literal> axis followed by a
 539        text node selection <literal>text()</literal> is implicitly
 540        appended to the stated XPath.
 541       </emphasis>
 542       It follows that the above searches are interpreted as:
 543       <screen>
 544        Z> find @attr 1=/root//text() content
 545        Z> find @attr 1=/root/first//text() content
 546       </screen>
 547      </para>
 548
 549      <para>
 550       Filtering the addressing XPath by a predicate working on exact
 551       string values in
 552       attributes (in the XML sense) is possible: return all those documents which
 553       have the term "english" contained in one of all text subnodes of
 554       the subtree defined by the XPath
 555       <literal>/record/title[@lang='en']</literal>
 556       <screen>
 557        Z> find @attr 1=/record/title[@lang='en'] english
 558       </screen>
 559      </para>
 560
 561      <para>
 562       Combining numeric indexes, boolean expressions,
 563       and xpath based searches is possible:
 564       <screen>
 565        Z> find @attr 1=/record/title @and foo bar
 566        Z> find @and @attr 1=/record/title foo @attr 1=4 bar
 567       </screen>
 568      </para>
 569      <para>
 570       Escaping PQF keywords and other non-parseable XPath constructs
 571       with <literal>'{ }'</literal> to prevent syntax errors:
 572       <screen>
 573        Z> find @attr {1=/root/first[@attr='danish']} content
 574        Z> find @attr {1=/root/second[@attr='danish lake']} content
 575        Z> find @attr {1=/root/third[@attr='dansk s\xc3\xb8']} content
 576       </screen>
 577      </para>
 578      <warning>
 579       It is worth mentioning that these dynamically performed XPath
 580       queries are a performance bottleneck, as no optimized
 581       specialized indexes can be used. Therefore, avoid the use of
 582       this facility when speed is essential, and the database content
 583       size is medium to large.
 584      </warning>
 585
 586     </sect3>
 587
 588    </sect2>
 589
 590    <sect2 id="querymodel-exp1">
 591     <title>Explain Attribute Set</title>
 592     <para>
 593      The Z39.50 standard defines the
 594      <ulink url="&url.z39.50.explain;">Explain</ulink> attribute set
 595      <literal>exp-1</literal>, which is used to discover information
 596      about a server's search semantics and functional capabilities.
 597      Zebra exposes a "classic"
 598      Explain database by base name <literal>IR-Explain-1</literal>, which
 599      is populated with system internal information.
 600     </para>
 601    <para>
 602      The attribute-set <literal>exp-1</literal> consists of a single
 603      <literal>Use (type 1)</literal> attribute.
 604     </para>
 605     <para>
 606      In addition, the non-Use
 607      <literal>bib-1</literal> attributes, that is, the types
 608      <literal>Relation</literal>, <literal>Position</literal>,
 609      <literal>Structure</literal>, <literal>Truncation</literal>,
 610      and <literal>Completeness</literal> are imported from
 611      the <literal>bib-1</literal> attribute set, and may be used
 612      within any explain query.
 613     </para>
 614
 615     <sect3 id="querymodel-exp1-use">
 616     <title>Use Attributes (type = 1)</title>
 617      <para>
 618       The following Explain search attributes are supported:
 619       <literal>ExplainCategory</literal> (@attr 1=1),
 620       <literal>DatabaseName</literal> (@attr 1=3),
 621       <literal>DateAdded</literal> (@attr 1=9),
 622       <literal>DateChanged</literal> (@attr 1=10).
 623      </para>
 624      <para>
 625       A search in the use attribute  <literal>ExplainCategory</literal>
 626       supports only these predefined values:
 627       <literal>CategoryList</literal>, <literal>TargetInfo</literal>,
 628       <literal>DatabaseInfo</literal>, <literal>AttributeDetails</literal>.
 629      </para>
 630      <para>
 631       See <filename>tab/explain.att</filename> and the
 632       <ulink url="&url.z39.50;">Z39.50</ulink> standard
 633       for more information.
 634      </para>
 635     </sect3>
 636
 637     <sect3>
 638      <title>Explain searches with yaz-client</title>
 639      <para>
 640       Classic Explain only defines retrieval of Explain information
 641       via ASN.1. Practically no Z39.50 client supports this. Fortunately
 642       they don't have to - Zebra allows retrieval of this information
 643       in other formats:
 644       <literal>SUTRS</literal>, <literal>XML</literal>,
 645       <literal>GRS-1</literal> and  <literal>ASN.1</literal> Explain.
 646      </para>
 647
 648      <para>
 649       List supported categories to find out which explain commands are
 650       supported:
 651       <screen>
 652        Z> base IR-Explain-1
 653        Z> find @attr exp1 1=1 categorylist
 654        Z> form sutrs
 655        Z> show 1+2
 656       </screen>
 657      </para>
 658
 659      <para>
 660       Get target info, that is, investigate which databases exist at
 661       this server endpoint:
 662       <screen>
 663        Z> base IR-Explain-1
 664        Z> find @attr exp1 1=1 targetinfo
 665        Z> form xml
 666        Z> show 1+1
 667        Z> form grs-1
 668        Z> show 1+1
 669        Z> form sutrs
 670        Z> show 1+1
 671       </screen>
 672      </para>
 673
 674      <para>
 675       List all supported databases. The number of hits
 676       is the number of databases found, which most commonly are the
 677       following two:
 678       the <literal>Default</literal> and the
 679       <literal>IR-Explain-1</literal> databases.
 680       <screen>
 681        Z> base IR-Explain-1
 682        Z> find @attr exp1 1=1 databaseinfo
 683        Z> form sutrs
 684        Z> show 1+2
 685       </screen>
 686      </para>
 687
 688      <para>
 689       Get database info record for database <literal>Default</literal>.
 690       <screen>
 691        Z> base IR-Explain-1
 692        Z> find @and @attr exp1 1=1 databaseinfo @attr exp1 1=3 Default
 693       </screen>
 694       Identical query with explicitly specified attribute set:
 695       <screen>
 696        Z> base IR-Explain-1
 697        Z> find @attrset exp1 @and @attr 1=1 databaseinfo @attr 1=3 Default
 698       </screen>
 699      </para>
 700
 701      <para>
 702       Get attribute details record for database
 703       <literal>Default</literal>.
 704       This query is very useful to study the internal Zebra indexes.
 705       If records have been indexed using the <literal>alvis</literal>
 706       XSLT filter, the string representation names of the known indexes can be
 707       found.
 708       <screen>
 709        Z> base IR-Explain-1
 710        Z> find @and @attr exp1 1=1 attributedetails @attr exp1 1=3 Default
 711       </screen>
 712       Identical query with explicitly specified attribute set:
 713       <screen>
 714        Z> base IR-Explain-1
 715        Z> find @attrset exp1 @and @attr 1=1 attributedetails @attr 1=3 Default
 716       </screen>
 717      </para>
 718     </sect3>
 719
 720    </sect2>
 721
 722    <sect2 id="querymodel-bib1">
 723     <title>Bib1 Attribute Set</title>
 724     <para>
 725      Most of the information contained in this section is an excerpt of
 726      the <literal>ATTRIBUTE SET BIB-1 (Z39.50-1995)
 727       SEMANTICS</literal>,
 728      found at <ulink url="&url.z39.50.attset.bib1.1995;">The BIB-1
 729       Attribute Set Semantics</ulink> from 1995, and also in an updated
 730      <ulink url="&url.z39.50.attset.bib1;">Bib-1
 731       Attribute Set</ulink>
 732      version from 2003. Index Data is not the copyright holder of this
 733      information, except for the configuration details, the listing of
 734      Zebra's capabilities, and the example queries.
 735     </para>
 736
 737
 738    <sect3 id="querymodel-bib1-use">
 739      <title>Use Attributes (type 1)</title>
 740
 741     <para>
 742      A use attribute specifies an access point for any atomic query.
 743      These access points are highly dependent on the attribute set used
 744      in the query, and are user configurable using the following
 745      default configuration files:
 746      <filename>tab/bib1.att</filename>,
 747      <filename>tab/dan1.att</filename>,
 748      <filename>tab/explain.att</filename>, and
 749      <filename>tab/gils.att</filename>.
 750      New attribute sets can be added by adding new
 751      <filename>tab/*.att</filename> configuration files, which need to
 752      be sourced in the main configuration <filename>zebra.cfg</filename>.
 753      </para>
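         <para>
          As a minimal sketch, and assuming the <literal>attset</literal>
          directive of <filename>zebra.cfg</filename>, sourcing an
          attribute set file could look like this (check the
          configuration reference of your version for the exact syntax):
          <screen>
           attset: bib1.att
          </screen>
         </para>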
 754
 755     <para>
 756      In addition, Zebra allows the use of
 757      <emphasis>internal index names</emphasis> and <emphasis>dynamic
 758      XPath</emphasis> as use attributes.
 759      See  <xref linkend="querymodel-use-string"/> and
 760      <xref linkend="querymodel-use-xpath"/> for
 761      alternative access to the Zebra internal index names and XPath queries.
 762     </para>
 763
 764     <para>
 765      Phrase search for <emphasis>information retrieval</emphasis> in
 766      the title-register:
 767      <screen>
 768       Z> find @attr 1=4 "information retrieval"
 769      </screen>
 770     </para>
 771     </sect3>
 772
 773    </sect2>
 774
 775
 776    <sect2 id="querymodel-bib1-nonuse">
 777      <title>Zebra general Bib1 Non-Use Attributes (type 2-6)</title>
 778
 779     <sect3 id="querymodel-bib1-relation">
 780      <title>Relation Attributes (type 2)</title>
 781
 782      <para>
 783       Relation attributes describe the relationship of the access
 784       point (left side
 785       of the relation) to the search term as qualified by the attributes (right
 786       side of the relation), e.g., Date-publication &lt;= 1975.
 787       </para>
 788
 789      <table id="querymodel-bib1-relation-table"
 790       frame="all" rowsep="1" colsep="1" align="center">
 791
 792       <caption>Relation Attributes (type 2)</caption>
 793       <thead>
 794         <tr>
 795          <td>Relation</td>
 796          <td>Value</td>
 797          <td>Notes</td>
 798         </tr>
 799        </thead>
 800        <tbody>
 801         <tr>
 802          <td> Less than</td>
 803          <td>1</td>
 804          <td>supported</td>
 805         </tr>
 806         <tr>
 807          <td>Less than or equal</td>
 808          <td>2</td>
 809          <td>supported</td>
 810         </tr>
 811         <tr>
 812          <td>Equal</td>
 813          <td>3</td>
 814          <td>default</td>
 815         </tr>
 816         <tr>
 817          <td>Greater or equal</td>
 818          <td>4</td>
 819          <td>supported</td>
 820         </tr>
 821         <tr>
 822          <td>Greater than</td>
 823          <td>5</td>
 824          <td>supported</td>
 825         </tr>
 826         <tr>
 827          <td>Not equal</td>
 828          <td>6</td>
 829          <td>unsupported</td>
 830         </tr>
 831         <tr>
 832          <td>Phonetic</td>
 833          <td>100</td>
 834          <td>unsupported</td>
 835         </tr>
 836         <tr>
 837          <td>Stem</td>
 838          <td>101</td>
 839          <td>unsupported</td>
 840         </tr>
 841         <tr>
 842          <td>Relevance</td>
 843          <td>102</td>
 844          <td>supported</td>
 845         </tr>
 846         <tr>
 847          <td>AlwaysMatches</td>
 848          <td>103</td>
 849          <td>unsupported</td>
 850         </tr>
 851        </tbody>
 852      </table>
 853
 854      <para>
 855       The relation attribute
 856       <literal>relevance (102)</literal> is supported, see
 857       <xref linkend="administration-ranking"/> for full information.
 858       <!-- always-matches (103) not supported for all indexes -->
 859      </para>
 860
 861     <para>
 862      All ordering operations are based on a lexicographical ordering,
 863      <emphasis>except</emphasis> when the
 864      <literal>structure attribute numeric (109)</literal> is used. In
 865      this case, ordering is numerical. See
 866       <xref linkend="querymodel-bib1-structure"/>.
 867     </para>
 868
 869      <para>
 870      Ranked search for <emphasis>information retrieval</emphasis> in
 871      the title-register:
 872      <screen>
 873       Z> find @attr 1=4 @attr 2=102 "information retrieval"
 874      </screen>
 875     </para>
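
           <para>
            As a sketch of the ordering rule above, and assuming that the
            title register exists as in the previous example and that the
            date field (Bib-1 use attribute 30) has additionally been
            indexed as type <literal>n</literal>, the first query below
            would be compared lexicographically, while the second would be
            compared numerically. The search terms are arbitrary
            illustrations:
            <screen>
             Z> find @attr 1=4 @attr 2=1 mozart
             Z> find @attr 1=30 @attr 4=109 @attr 2=4 1975
            </screen>
           </para>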
 876     </sect3>
 877
 878     <sect3 id="querymodel-bib1-position">
 879      <title>Position Attributes (type 3)</title>
 880
 881      <para>
 882       The position attribute specifies the location of the search term
 883       within the field or subfield in which it appears.
 884      </para>
 885
 886      <table id="querymodel-bib1-position-table"
 887       frame="all" rowsep="1" colsep="1" align="center">
 888
 889       <caption>Position Attributes (type 3)</caption>
 890       <thead>
 891         <tr>
 892          <td>Position</td>
 893          <td>Value</td>
 894          <td>Notes</td>
 895         </tr>
 896        </thead>
 897        <tbody>
 898         <tr>
 899          <td>First in field </td>
 900          <td>1</td>
 901          <td>unsupported</td>
 902         </tr>
 903         <tr>
 904          <td>First in subfield</td>
 905          <td>2</td>
 906          <td>unsupported</td>
 907         </tr>
 908         <tr>
 909          <td>Any position in field</td>
 910          <td>3</td>
 911          <td>default</td>
 912         </tr>
 913        </tbody>
 914      </table>
 915
 916     <para>
 917       The position attribute values <literal>first in field (1)</literal>
 918       and <literal>first in subfield (2)</literal> are unsupported.
 919       Using them does not trigger an error, but silently defaults to
 920       <literal>any position in field (3)</literal>.
 921       <!-- It should -->
 922       </para>
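
           <para>
            For instance, assuming the title register used in the earlier
            examples and an arbitrary search term, the following two
            queries are expected to return the same hit set, since the
            unsupported position value falls back to the default:
            <screen>
             Z> find @attr 1=4 @attr 3=1 mozart
             Z> find @attr 1=4 @attr 3=3 mozart
            </screen>
           </para>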
 923     </sect3>
 924
 925     <sect3 id="querymodel-bib1-structure">
 926      <title>Structure Attributes (type 4)</title>
 927
 928      <para>
 929       The structure attribute specifies the type of search
 930       term. This causes the search to be mapped on
 931       different Zebra internal indexes, which must have been defined
 932       at index time.
 933      </para>
 934
 935      <para>
 936       The possible values of the
 937       <literal>structure attribute (type 4)</literal> can be defined
 938       using the configuration file
 939       <filename>tab/default.idx</filename>.
 940       The default configuration is summarized in this table.
 941      </para>
 942
 943      <table id="querymodel-bib1-structure-table"
 944       frame="all" rowsep="1" colsep="1" align="center">
 945
 946       <caption>Structure Attributes (type 4)</caption>
 947       <thead>
 948         <tr>
 949          <td>Structure</td>
 950          <td>Value</td>
 951          <td>Notes</td>
 952         </tr>
 953        </thead>
 954        <tbody>
 955         <tr>
 956          <td>Phrase </td>
 957          <td>1</td>
 958          <td>default</td>
 959         </tr>
 960         <tr>
 961          <td>Word</td>
 962          <td>2</td>
 963          <td>supported</td>
 964         </tr>
 965         <tr>
 966          <td>Key</td>
 967          <td>3</td>
 968          <td>supported</td>
 969         </tr>
 970         <tr>
 971          <td>Year</td>
 972          <td>4</td>
 973          <td>supported</td>
 974         </tr>
 975         <tr>
 976          <td>Date (normalized)</td>
 977          <td>5</td>
 978          <td>supported</td>
 979         </tr>
 980         <tr>
 981          <td>Word list</td>
 982          <td>6</td>
 983          <td>supported</td>
 984         </tr>
 985         <tr>
 986          <td>Date (un-normalized)</td>
 987          <td>100</td>
 988          <td>unsupported</td>
 989         </tr>
 990         <tr>
 991          <td>Name (normalized) </td>
 992          <td>101</td>
 993          <td>unsupported</td>
 994         </tr>
 995         <tr>
 996          <td>Name (un-normalized) </td>
 997          <td>102</td>
 998          <td>unsupported</td>
 999         </tr>
1000         <tr>
1001          <td>Structure</td>
1002          <td>103</td>
1003          <td>unsupported</td>
1004         </tr>
1005         <tr>
1006          <td>Urx</td>
1007          <td>104</td>
1008          <td>supported</td>
1009         </tr>
1010         <tr>
1011          <td>Free-form-text</td>
1012          <td>105</td>
1013          <td>supported</td>
1014         </tr>
1015         <tr>
1016          <td>Document-text</td>
1017          <td>106</td>
1018          <td>supported</td>
1019         </tr>
1020         <tr>
1021          <td>Local-number</td>
1022          <td>107</td>
1023          <td>supported</td>
1024         </tr>
1025         <tr>
1026          <td>String</td>
1027          <td>108</td>
1028          <td>unsupported</td>
1029         </tr>
1030         <tr>
1031          <td>Numeric string</td>
1032          <td>109</td>
1033          <td>supported</td>
1034         </tr>
1035        </tbody>
1036      </table>
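
           <para>
            A minimal sketch of how some of the supported structure values
            might be used against the title register. The search terms are
            arbitrary, and the described behaviour is an assumption based
            on the default mapping, in which
            <literal>word list (6)</literal> is treated as an and-list and
            <literal>free-form-text (105)</literal> as an or-list of the
            individual words:
            <screen>
             Z> find @attr 1=4 @attr 4=2 mozart
             Z> find @attr 1=4 @attr 4=6 "mozart amadeus"
             Z> find @attr 1=4 @attr 4=105 "mozart amadeus"
            </screen>
           </para>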
1037
1038     <para>
1039      The structure attribute value <literal>local-number
1040       (107)</literal>
1041      is supported, and always maps to the Zebra internal document ID.
1042      </para>
1043
1044     <para>
1045      For example, in
1046      the GILS schema (<literal>gils.abs</literal>), the
1047      west-bounding-coordinate is indexed as type <literal>n</literal>,
1048      and is therefore searched by specifying
1049      <emphasis>structure</emphasis>=<emphasis>Numeric String</emphasis>.
1050      To match all those records with west-bounding-coordinate greater
1051      than -114 we use the following query:
1052      <screen>
1053       Z> find @attr 4=109 @attr 2=5 @attr gils 1=2038 -114
1054      </screen>
1055     </para>
1056     </sect3>
1057
1058     <sect3 id="querymodel-bib1-truncation">
1059      <title>Truncation Attributes (type 5)</title>
1060
1061      <para>
1062       The truncation attribute specifies whether variations of one or
1063       more characters are allowed between search term and hit terms, or
1064       not. Using non-default truncation attributes will broaden the
1065       document hit set of a search query.
1066      </para>
1067
1068      <table id="querymodel-bib1-truncation-table"
1069       frame="all" rowsep="1" colsep="1" align="center">
1070
1071       <caption>Truncation Attributes (type 5)</caption>
1072       <thead>
1073         <tr>
1074          <td>Truncation</td>
1075          <td>Value</td>
1076          <td>Notes</td>
1077         </tr>
1078        </thead>
1079        <tbody>
1080         <tr>
1081          <td>Right truncation </td>
1082          <td>1</td>
1083          <td>supported</td>
1084         </tr>
1085         <tr>
1086          <td>Left truncation</td>
1087          <td>2</td>
1088          <td>supported</td>
1089         </tr>
1090         <tr>
1091          <td>Left and right truncation</td>
1092          <td>3</td>
1093          <td>supported</td>
1094         </tr>
1095         <tr>
1096          <td>Do not truncate</td>
1097          <td>100</td>
1098          <td>default</td>
1099         </tr>
1100         <tr>
1101          <td>Process # in search term</td>
1102          <td>101</td>
1103          <td>supported</td>
1104         </tr>
1105         <tr>
1106          <td>RegExpr-1 </td>
1107          <td>102</td>
1108          <td>supported</td>
1109         </tr>
1110         <tr>
1111          <td>RegExpr-2</td>
1112          <td>103</td>
1113          <td>supported</td>
1114         </tr>
1115        </tbody>
1116      </table>
1117
1118      <para>
1119       Truncation attribute value
1120       <literal>Process # in search term (101)</literal> is a
1121       poor-man's regular expression search. It maps
1122       each <literal>#</literal> to <literal>.*</literal>, and
1123       then performs a <literal>Regexp-1 (102)</literal> regular
1124       expression search.
1125      </para>
1126      <para>
1127       Truncation attribute value
1128        <literal>Regexp-1 (102)</literal> is a normal regular expression search,
1129       see <xref linkend="querymodel-regular"/>.
1130      </para>
1131      <para>
1132        Truncation attribute value
1133       <literal>Regexp-2 (103)</literal> is a Zebra specific extension
1134       which allows <emphasis>fuzzy</emphasis> matches. One single
1135       error in spelling of search terms is allowed, i.e., a document
1136       is hit if it includes a term which can be mapped to the used
1137       search term by one character substitution, addition, deletion or
1138       change of position.
1139       </para>
1140       <!--
1141       Special 104, 105, 106 are deprecated and will be removed! -->
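
           <para>
            The truncation values above might be exercised as follows,
            assuming the title register from the earlier examples; the
            search terms are arbitrary. The queries show, in order, right
            truncation, left and right truncation,
            <literal>#</literal> processing, the corresponding Regexp-1
            expression, and a Regexp-2 search in which the misspelled term
            is expected to match <emphasis>information</emphasis> by
            tolerating one error:
            <screen>
             Z> find @attr 1=4 @attr 5=1 inform
             Z> find @attr 1=4 @attr 5=3 format
             Z> find @attr 1=4 @attr 5=101 "inf#tion"
             Z> find @attr 1=4 @attr 5=102 "inf.*tion"
             Z> find @attr 1=4 @attr 5=103 informaton
            </screen>
           </para>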
1142     </sect3>
1143
1144     <sect3 id="querymodel-bib1-completeness">
1145     <title>Completeness Attributes (type 6)</title>
1146      <para>
1147       The completeness attribute is only used when the structure
1148       attribute selects register type <literal>w</literal> or
1149       <literal>p</literal>; otherwise it is ignored.
1150       <literal>Incomplete field (1)</literal> is the default and makes
1151       Zebra use register type <literal>w</literal>, whereas
1152       <literal>complete subfield (2)</literal> and
1153       <literal>complete field (3)</literal> both trigger register type <literal>p</literal>.
1154      </para>
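
           <para>
            A minimal sketch, assuming that the title field has been
            indexed both as type <literal>w</literal> and as type
            <literal>p</literal>: the first query is directed at the word
            register, while the second is expected to match only against
            the complete title field held in the phrase register. The
            search terms are arbitrary:
            <screen>
             Z> find @attr 1=4 @attr 6=1 mozart
             Z> find @attr 1=4 @attr 6=3 "don giovanni"
            </screen>
           </para>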
1155     </sect3>
1156    </sect2>
1157
1158
1159    <sect2 id="querymodel-zebra-attr-search">
1160     <title>Zebra specific Search Extensions to all Attribute Sets</title>
1161     <para>
1162      Zebra extends the Bib1 attribute types, and these extensions are
1163      recognized regardless of attribute
1164      set used in a <literal>search</literal> operation query.
1165     </para>
1166
1167      <table id="querymodel-zebra-attr-search-table"
1168       frame="all" rowsep="1" colsep="1" align="center">
1169
1170       <caption>Zebra Search Attribute Extensions</caption>
1171        <thead>
1172         <tr>
1173          <td>Name</td>
1174          <td>Value</td>
1175          <td>Operation</td>
1176          <td>Zebra version</td>
1177         </tr>
1178       </thead>
1179        <tbody>
1180         <tr>
1181          <td>Embedded Sort</td>
1182          <td>7</td>
1183          <td>search</td>
1184          <td>1.1</td>
1185         </tr>
1186         <tr>
1187          <td>Term Set</td>
1188          <td>8</td>
1189          <td>search</td>
1190          <td>1.1</td>
1191         </tr>
1192         <tr>
1193          <td>Rank Weight</td>
1194          <td>9</td>
1195          <td>search</td>
1196          <td>1.1</td>
1197         </tr>
1198         <tr>
1199          <td>Approx Limit</td>
1200          <td>9</td>
1201          <td>search</td>
1202          <td>1.4</td>
1203         </tr>
1204         <tr>
1205          <td>Term Reference</td>
1206          <td>10</td>
1207          <td>search</td>
1208          <td>1.4</td>
1209         </tr>
1210        </tbody>
1211       </table>
1212
1213     <sect3 id="querymodel-zebra-attr-sorting">
1214      <title>Zebra Extension Embedded Sort Attribute (type 7)</title>
1215     </sect3>
1216     <para>
1217      The embedded sort is a way to specify sorting within a query, thus
1218      removing the need to send a Sort Request separately. It is
1219      faster and does not require clients to deal with the Sort
1220      Facility.
1221     </para>
1222     <para>
1223      The possible values after attribute <literal>type 7</literal> are
1224      <literal>1</literal> ascending and
1225      <literal>2</literal> descending.
1226      The attributes+term (APT) node is separate from the
1227      rest and must be <literal>@or</literal>'ed.
1228      The term associated with APT is the sorting level in integers,
1229      where <literal>0</literal> means primary sort,
1230      <literal>1</literal> means secondary sort, and so forth.
1231      See also <xref linkend="administration-ranking"/>.
1232     </para>
1233     <para>
1234      For example, searching for water, sort by title (ascending):
1235      <screen>
1236       Z> find @or @attr 1=1016 water @attr 7=1 @attr 1=4 0
1237      </screen>
1238     </para>
1239     <para>
1240      Or, searching for water, sort by title ascending, then date descending:
1241      <screen>
1242       Z> find @or @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 @attr 7=2 @attr 1=30 1
1243      </screen>
1244     </para>
1245
1246     <sect3 id="querymodel-zebra-attr-estimation">
1247      <title>Zebra Extension Term Set Attribute (type 8)</title>
1248     </sect3>
1249     <para>
1250      The Term Set feature allows a search to store the
1251      hitting terms in a "pseudo" result set, thus combining a regular
1252      search with a scan-like facility. It requires a client that can handle
1253      named result sets, since the search generates two result sets. The value for
1254      attribute 8 is the name of a result set (string). The terms in
1255      the named term set are returned as SUTRS records.
1256     </para>
1257     <para>
1258      For example, searching for u in title, right truncated, and
1259      storing the result in a term set named 'aset':
1260      <screen>
1261       Z> find @attr 5=1 @attr 1=4 @attr 8=aset u
1262      </screen>
1263     </para>
1264     <warning>
1265      The model has one serious flaw: the size of the term
1266      set is not known. Experimental. Do not use in production code.
1267     </warning>
1268
1269     <sect3 id="querymodel-zebra-attr-weight">
1270      <title>Zebra Extension Rank Weight Attribute (type 9)</title>
1271     </sect3>
1272     <para>
1273      Rank weight is a way to pass a value to a ranking algorithm, so
1274      that one APT has one value while another has a different one.
1275      See also <xref linkend="administration-ranking"/>.
1276     </para>
1277     <para>
1278      For example, searching for utah in title with weight 30, as well
1279      as in any index with weight 20:
1280      <screen>
1281       Z> find @attr 2=102 @or @attr 9=30 @attr 1=4 utah @attr 9=20 utah
1282      </screen>
1283     </para>
1284
1285     <sect3 id="querymodel-zebra-attr-limit">
1286      <title>Zebra Extension Approximative Limit Attribute (type 9)</title>
1287     </sect3>
1288     <para>
1289      Newer Zebra versions normally estimate the hit count for every APT
1290      (leaf) in the query tree. These hit counts are returned as part of
1291      the searchResult-1 facility in the binary encoded Z39.50 search
1292      response packages.
1293     </para>
1294     <para>
1295      By setting a limit for the APT we can make Zebra switch to an
1296      approximate hit count when a certain hit count limit is
1297      reached. A value of zero means exact hit count.
1298     </para>
1299     <para>
1300      For example, we might be interested in the exact hit count for a, but
1301      for b we allow hit count estimates for 1000 and higher:
1302      <screen>
1303       Z> find @and a @attr 9=1000 b
1304      </screen>
1305     </para>
1306     <note>
1307      The estimated hit count facility makes searches faster, as one
1308      only needs to process large hit lists partially.
1309     </note>
1310     <warning>
1311      This facility clashes with rank weight, because there all
1312      documents in the hit lists need to be examined for scoring and
1313      re-sorting.
1314      It is an experimental
1315      extension. Do not use in production code.
1316     </warning>
1317
1318     <sect3 id="querymodel-zebra-attr-termref">
1319      <title>Zebra Extension Term Reference Attribute (type 10)</title>
1320     </sect3>
1321     <para>
1322      Zebra supports the <literal>searchResult-1</literal> facility.
1323      If the <literal>Term Reference Attribute (type 10)</literal> is
1324      given, that specifies a subqueryId value returned as part of the
1325      search result. It is a way for a client to name an APT part of a
1326      query.
1327     </para>
1328     <!--
1329     <para>
1330      <screen>
1331      </screen>
1332     </para>
1333     -->
1334     <warning>
1335      Experimental. Do not use in production code.
1336     </warning>
1337
1338
1339    </sect2>
1340
1341
1342    <sect2 id="querymodel-zebra-attr-scan">
1343     <title>Zebra specific Scan Extensions to all Attribute Sets</title>
1344     <para>
1345      Zebra extends the Bib1 attribute types, and these extensions are
1346      recognized regardless of attribute
1347      set used in a <literal>scan</literal> operation query.
1348     </para>
1349      <table id="querymodel-zebra-attr-scan-table"
1350       frame="all" rowsep="1" colsep="1" align="center">
1351
1352       <caption>Zebra Scan Attribute Extensions</caption>
1353        <thead>
1354         <tr>
1355          <td>Name</td>
1356          <td>Type</td>
1357          <td>Operation</td>
1358          <td>Zebra version</td>
1359         </tr>
1360       </thead>
1361        <tbody>
1362         <tr>
1363          <td>Result Set Narrow</td>
1364          <td>8</td>
1365          <td>scan</td>
1366          <td>1.3</td>
1367         </tr>
1368         <tr>
1369          <td>Approximative Limit</td>
1370          <td>9</td>
1371          <td>scan</td>
1372          <td>1.4</td>
1373         </tr>
1374        </tbody>
1375       </table>
1376
1377     <sect3 id="querymodel-zebra-attr-narrow">
1378      <title>Zebra Extension Result Set Narrow (type 8)</title>
1379     </sect3>
1380     <para>
1381      If attribute <literal>Result Set Narrow (type 8)</literal>
1382      is given for <literal>scan</literal>, the value is the name of a
1383      result set. Each hit count in <literal>scan</literal> is
1384      <literal>@and</literal>'ed with the result set given.
1385     </para>
1386     <para>
1387      Consider for example
1388      the case of scanning all title fields around the
1389      scanterm <emphasis>mozart</emphasis>, then refining the scan by
1390      issuing a filtering query for <emphasis>amadeus</emphasis> to
1391      restrict the scan to the result set of the query:
1392      <screen>
1393       Z> scan @attr 1=4 mozart
1394       ...
1395       * mozart (43)
1396         mozartforskningen (1)
1397         mozartiana (1)
1398         mozarts (16)
1399       ...
1400       Z> f @attr 1=4 amadeus
1401       ...
1402       Number of hits: 15, setno 2
1403       ...
1404       Z> scan @attr 1=4 @attr 8=2 mozart
1405       ...
1406       * mozart (14)
1407         mozartforskningen (0)
1408         mozartiana (0)
1409         mozarts (1)
1410       ...
1411      </screen>
1412     </para>
1413
1414     <warning>
1415      Experimental. Do not use in production code.
1416     </warning>
1417
1418     <sect3 id="querymodel-zebra-attr-approx">
1419      <title>Zebra Extension Approximative Limit (type 9)</title>
1420     </sect3>
1421     <para>
1422      The <literal>Zebra Extension Approximative Limit (type
1423       9)</literal> is a way to enable approximate
1424      hit counts for <literal>scan</literal>, in the same
1425      way as for <literal>search</literal> hit counts.
1426     </para>
1427     <!--
1428     <para>
1429      <screen>
1430      </screen>
1431     </para>
1432     -->
1433     <warning>
1434      Experimental and buggy. Definitely not to be used in production code.
1435     </warning>
1436
1437
1438    </sect2>
1439
1440
1441    <sect2 id="querymodel-idxpath">
1442     <title>Zebra special IDXPATH Attribute Set for GRS indexing</title>
1443     <para>
1444      The attribute-set <literal>idxpath</literal> consists of a single
1445      <literal>Use (type 1)</literal> attribute. All non-use attributes
1446      behave as normal.
1447     </para>
1448     <para>
1449      This feature is enabled when defining the
1450      <literal>xpath enable</literal> option in the GRS filter
1451      <literal>*.abs</literal> configuration files. If one wants to use
1452      the special <literal>idxpath</literal> numeric attribute set, the
1453      main Zebra configuration file <filename>zebra.cfg</filename>
1454      directive <literal>attset: idxpath.att</literal> must be enabled.
1455     </para>
1456     <warning>The <literal>idxpath</literal> attribute set is deprecated, may not be
1457      supported in future Zebra versions, and should definitely
1458      not be used in production code.
1459     </warning>
1460
1461     <sect3 id="querymodel-idxpath-use">
1462     <title>IDXPATH Use Attributes (type 1)</title>
1463      <para>
1464       This attribute set allows one to search GRS filter indexed
1465       records by XPATH like structured index names. It is enabled by
1466       specifying the <literal></literal>
1467      </para>
1468
1469
1470      <warning>The <literal>idxpath</literal> option defines hard-coded
1471       index names, which might clash with your own index names.
1472      </warning>
1473
1474      <table id="querymodel-idxpath-use-table"
1475       frame="all" rowsep="1" colsep="1" align="center">
1476
1477       <caption>Zebra specific IDXPATH Use Attributes (type 1)</caption>
1478       <thead>
1479         <tr>
1480          <td>IDXPATH</td>
1481          <td>Value</td>
1482          <td>String Index</td>
1483          <td>Notes</td>
1484         </tr>
1485        </thead>
1486        <tbody>
1487         <tr>
1488          <td>XPATH Begin</td>
1489          <td>1</td>
1490          <td>_XPATH_BEGIN</td>
1491          <td>deprecated</td>
1492         </tr>
1493         <tr>
1494          <td>XPATH End</td>
1495          <td>2</td>
1496          <td>_XPATH_END</td>
1497          <td>deprecated</td>
1498         </tr>
1499         <tr>
1500          <td>XPATH CData</td>
1501          <td>1016</td>
1502          <td>_XPATH_CDATA</td>
1503          <td>deprecated</td>
1504         </tr>
1505         <tr>
1506          <td>XPATH Attribute Name</td>
1507          <td>3</td>
1508          <td>_XPATH_ATTR_NAME</td>
1509          <td>deprecated</td>
1510         </tr>
1511         <tr>
1512          <td>XPATH Attribute CData</td>
1513          <td>1015</td>
1514          <td>_XPATH_ATTR_CDATA</td>
1515          <td>deprecated</td>
1516         </tr>
1517        </tbody>
1518      </table>
1519
1520
1521      <para>
1522       See <filename>tab/idxpath.att</filename> for more information.
1523      </para>
1524      <para>
1525       Search for all documents starting with root element
1526       <literal>/root</literal> (either using the numeric or the string
1527       use attributes):
1528       <screen>
1529        Z> find @attrset idxpath @attr 1=1 @attr 4=3 root/
1530        Z> find @attr idxpath 1=1 @attr 4=3 root/
1531        Z> find @attr 1=_XPATH_BEGIN @attr 4=3 root/
1532       </screen>
1533      </para>
1534      <para>
1535       Search for all documents where specific nested XPATH
1536       <literal>/c1/c2/../cn</literal> exists. Notice the very
1537       counter-intuitive <emphasis>reverse</emphasis> notation!
1538       <screen>
1539        Z> find @attrset idxpath @attr 1=1 @attr 4=3 cn/cn-1/../c1/
1540        Z> find @attr 1=_XPATH_BEGIN @attr 4=3 cn/cn-1/../c1/
1541       </screen>
1542      </para>
1543      <para>
1544       Search for CDATA string <emphasis>text</emphasis> in any element:
1545       <screen>
1546        Z> find @attrset idxpath @attr 1=1016 text
1547        Z> find @attr 1=_XPATH_CDATA text
1548       </screen>
1549      </para>
1550      <para>
1551        Search for CDATA string <emphasis>anothertext</emphasis> in any
1552        attribute:
1553       <screen>
1554        Z> find @attrset idxpath @attr 1=1015 anothertext
1555        Z> find @attr 1=_XPATH_ATTR_CDATA anothertext
1556       </screen>
1557      </para>
1558      <para>
1559        Search for all documents which have an XML element node
1560        including an XML attribute named <emphasis>creator</emphasis>:
1561       <screen>
1562        Z> find @attrset idxpath @attr 1=3 @attr 4=3 creator
1563        Z> find @attr 1=_XPATH_ATTR_NAME @attr 4=3 creator
1564       </screen>
1565      </para>
1566      <para>
1567       Combining usual <literal>bib-1</literal> attribute set searches
1568       with <literal>idxpath</literal> attribute set searches:
1569       <screen>
1570        Z> find @and @attr idxpath 1=1 @attr 4=3 link/ @attr 1=4 mozart
1571        Z> find @and @attr 1=_XPATH_BEGIN @attr 4=3 link/ @attr 1=_XPATH_CDATA mozart
1572       </screen>
1573      </para>
1574
1575     </sect3>
1576    </sect2>
1577
1578
1579    <sect2 id="querymodel-bib1-mapping">
1580     <title>Mapping from Bib1 Attributes to Zebra internal
1581      register indexes</title>
1582     <para>
1583      TO-DO
1584      </para>
1585
1586
1587      <!-- see in util/zebramap.c
1588       int zebra_maps_attr
1589
1590   if (completeness_value == 2 || completeness_value == 3)
1591         *complete_flag = 1;
1592     else
1593         *complete_flag = 0;
1594     *reg_id = 0;
1595
1596     *sort_flag =(sort_relation_value > 0) ? 1 : 0;
1597     *search_type = "phrase";
1598     strcpy(rank_type, "void");
1599     if (relation_value == 102)
1600     {
1601         if (weight_value == -1)
1602             weight_value = 34;
1603         sprintf(rank_type, "rank,w=%d,u=%d", weight_value, use_value);
1604     }
1605     if (relation_value == 103)
1606     {
1607         *search_type = "always";
1608         *reg_id = 'w';
1609         return 0;
1610     }
1611     if (*complete_flag)
1612         *reg_id = 'p';
1613     else
1614         *reg_id = 'w';
1615     switch (structure_value)
1616     {
1617     case 6:   /* word list */
1618         *search_type = "and-list";
1619         break;
1620     case 105: /* free-form-text */
1621         *search_type = "or-list";
1622         break;
1623     case 106: /* document-text */
1624         *search_type = "or-list";
1625         break;
1626     case -1:
1627     case 1:   /* phrase */
1628     case 2:   /* word */
1629     case 108: /* string */
1630         *search_type = "phrase";
1631         break;
1632    case 107: /* local-number */
1633         *search_type = "local";
1634         *reg_id = 0;
1635         break;
1636     case 109: /* numeric string */
1637         *reg_id = 'n';
1638         *search_type = "numeric";
1639         break;
1640     case 104: /* urx */
1641         *reg_id = 'u';
1642         *search_type = "phrase";
1643         break;
1644     case 3:   /* key */
1645         *reg_id = '0';
1646         *search_type = "phrase";
1647         break;
1648     case 4:  /* year */
1649         *reg_id = 'y';
1650         *search_type = "phrase";
1651         break;
1652     case 5:  /* date */
1653         *reg_id = 'd';
1654         *search_type = "phrase";
1655         break;
1656     default:
1657         return -1;
1658     }
1659     return 0;
1660
1661      -->
1662
1663
1664     <para>
1665      <emphasis>Use</emphasis> attributes are interpreted according to the
1666      attribute sets which have been loaded in the
1667     <literal>zebra.cfg</literal> file, and are matched against specific
1668      fields as specified in the <literal>.abs</literal> file which
1669      describes the profile of the records which have been loaded.
1670      If no Use attribute is provided, a default of Bib-1 Any is assumed.
1671     </para>
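
          <para>
           Given that default, the following two queries are expected to
           be equivalent; <literal>1016</literal> is the Bib-1 use
           attribute <emphasis>Any</emphasis>, and the search term is an
           arbitrary illustration:
           <screen>
            Z> find mozart
            Z> find @attr 1=1016 mozart
           </screen>
          </para>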
1672
1673     <para>
1674      If a <emphasis>Structure</emphasis> attribute of
1675      <emphasis>Phrase</emphasis> is used in conjunction with a
1676      <emphasis>Completeness</emphasis> attribute of
1677      <emphasis>Complete (Sub)field</emphasis>, the term is matched
1678      against the contents of the phrase (long word) register, if one
1679      exists for the given <emphasis>Use</emphasis> attribute.
1680      A phrase register is created for those fields in the
1681      <literal>.abs</literal> file that contain a
1682      <literal>p</literal>-specifier.
1683      <!-- ### whatever the hell _that_ is -->
1684     </para>
1685
1686     <para>
1687      If <emphasis>Structure</emphasis>=<emphasis>Phrase</emphasis> is
1688      used in conjunction with <emphasis>Incomplete Field</emphasis> (the
1689      default value for <emphasis>Completeness</emphasis>), the
1690      search is directed against the normal word registers, but if the term
1691      contains multiple words, the term will only match if all of the words
1692      are found immediately adjacent, and in the given order.
1693      The word search is performed on those fields that are indexed as
1694      type <literal>w</literal> in the <literal>.abs</literal> file.
1695     </para>
1696
1697     <para>
1698      If the <emphasis>Structure</emphasis> attribute is
1699      <emphasis>Word List</emphasis>,
1700      <emphasis>Free-form Text</emphasis>, or
1701      <emphasis>Document Text</emphasis>, the term is treated as a
1702      natural-language, relevance-ranked query.
1703      This search type uses the word register, i.e. those fields
1704      that are indexed as type <literal>w</literal> in the
1705      <literal>.abs</literal> file.
1706     </para>
1707
1708     <para>
1709      If the <emphasis>Structure</emphasis> attribute is
1710      <emphasis>Numeric String</emphasis>, the term is treated as an integer.
1711      The search is performed on those fields that are indexed
1712      as type <literal>n</literal> in the <literal>.abs</literal> file.
1713     </para>
1714
1715     <para>
1716      If the <emphasis>Structure</emphasis> attribute is
1717      <emphasis>URx</emphasis>, the term is treated as a URX (URL) entity.
1718      The search is performed on those fields that are indexed as type
1719      <literal>u</literal> in the <literal>.abs</literal> file.
1720     </para>
1721
1722     <para>
1723      If the <emphasis>Structure</emphasis> attribute is
1724      <emphasis>Local Number</emphasis>, the term is treated as a
1725      native Zebra Record Identifier.
1726     </para>
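
    <para>
     The structure values described above might be selected as in the
     following sketch. The numeric values are assumptions taken from the
     Bib-1 attribute set (<literal>4=6</literal> for
     <emphasis>Word List</emphasis>, <literal>4=109</literal> for
     <emphasis>Numeric String</emphasis>), and the use attribute
     <literal>1=31</literal> (date of publication) is only an example;
     whether a matching <literal>w</literal> or <literal>n</literal>
     register exists depends on the local <literal>.abs</literal> file.
     <screen>
      Z> find @attr 1=4 @attr 4=6 "information retrieval"
      Z> find @attr 1=31 @attr 4=109 2006
     </screen>
    </para>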
1727
1728     <para>
1729      If the <emphasis>Relation</emphasis> attribute is
1730      <emphasis>Equals</emphasis> (default), the term is matched
1731      in a normal fashion (modulo truncation and processing of
1732      individual words, if required).
     If <emphasis>Relation</emphasis> is <emphasis>Less Than</emphasis>,
     <emphasis>Less Than or Equal</emphasis>,
     <emphasis>Greater Than</emphasis>, or <emphasis>Greater Than or
      Equal</emphasis>, the term is assumed to be numerical, and a
     standard regular expression is constructed to match the given
     expression.
1739      If <emphasis>Relation</emphasis> is <emphasis>Relevance</emphasis>,
1740      the standard natural-language query processor is invoked.
1741     </para>
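
    <para>
     A sketch of how the relation attribute might be used from
     <literal>yaz-client</literal> follows. The values
     <literal>2=5</literal> (<emphasis>Greater Than</emphasis>) and
     <literal>2=102</literal> (<emphasis>Relevance</emphasis>) are
     assumptions taken from the Bib-1 attribute set, and the use
     attribute <literal>1=31</literal> (date of publication) is purely
     illustrative.
     <screen>
      Z> find @attr 1=31 @attr 2=5 1990
      Z> find @attr 1=4 @attr 2=102 "information retrieval"
     </screen>
     The first query treats the term as a number and looks for larger
     values; the second hands the term to the natural-language query
     processor.
    </para>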
1742
1743     <para>
1744      For the <emphasis>Truncation</emphasis> attribute,
1745      <emphasis>No Truncation</emphasis> is the default.
1746      <emphasis>Left Truncation</emphasis> is not supported.
1747      <emphasis>Process # in search term</emphasis> is supported, as is
1748      <emphasis>Regxp-1</emphasis>.
     <emphasis>Regxp-2</emphasis> enables fault-tolerant (fuzzy)
     search. By default, a single error (deletion, insertion, or
     replacement) is accepted when terms are matched against the register
     contents.
1753     </para>
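
    <para>
     The supported truncation values might be exercised as sketched
     below. The value <literal>5=101</literal> for
     <emphasis>Process # in search term</emphasis> is an assumption
     taken from the Bib-1 attribute set, and the search terms are
     made up for illustration.
     <screen>
      Z> find @attr 1=4 @attr 5=101 "wom#n"
      Z> find @attr 1=4 @attr 5=103 "informaton"
     </screen>
     In the first query the <literal>#</literal> acts as a masking
     character inside the term. In the second, the misspelled term is
     expected to still match <literal>information</literal>, since it
     differs from it by a single insertion.
    </para>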
1754    </sect2>
1755
1756    <sect2  id="querymodel-regular">
1757     <title>Zebra Regular Expressions in Truncation Attribute (type = 5)</title>
1758
1759     <para>
1760      Each term in a query is interpreted as a regular expression if
1761      the truncation value is either <emphasis>Regxp-1 (@attr 5=102)</emphasis>
1762      or <emphasis>Regxp-2 (@attr 5=103)</emphasis>.
     Both query types share the same syntax, built from the following operands:
1764     </para>
1765
1766      <table id="querymodel-regular-operands-table"
1767       frame="all" rowsep="1" colsep="1" align="center">
1768
1769       <caption>Regular Expression Operands</caption>
1770        <!--
1771        <thead>
1772        <tr><td>one</td><td>two</td></tr>
1773       </thead>
1774        -->
1775        <tbody>
1776         <tr>
1777          <td><literal>x</literal></td>
1778          <td>Matches the character <literal>x</literal>.</td>
1779         </tr>
1780         <tr>
1781          <td><literal>.</literal></td>
1782          <td>Matches any character.</td>
1783         </tr>
1784         <tr>
1785          <td><literal>[ .. ]</literal></td>
1786          <td>Matches the set of characters specified;
1787          such as <literal>[abc]</literal> or <literal>[a-c]</literal>.</td>
1788         </tr>
1789        </tbody>
1790       </table>
1791
1792     <para>
1793      The above operands can be combined with the following operators:
1794     </para>
1795
1796      <table id="querymodel-regular-operators-table"
1797       frame="all" rowsep="1" colsep="1" align="center">
1798       <caption>Regular Expression Operators</caption>
1799        <!--
1800        <thead>
1801        <tr><td>one</td><td>two</td></tr>
1802       </thead>
1803        -->
1804        <tbody>
1805         <tr>
1806          <td><literal>x*</literal></td>
1807          <td>Matches <literal>x</literal> zero or more times.
1808           Priority: high.</td>
1809         </tr>
1810         <tr>
1811          <td><literal>x+</literal></td>
1812          <td>Matches <literal>x</literal> one or more times.
1813           Priority: high.</td>
1814         </tr>
1815         <tr>
1816          <td><literal>x?</literal></td>
         <td>Matches <literal>x</literal> zero or one time.
1818           Priority: high.</td>
1819         </tr>
1820         <tr>
1821          <td><literal>xy</literal></td>
1822          <td> Matches <literal>x</literal>, then <literal>y</literal>.
1823          Priority: medium.</td>
1824         </tr>
1825         <tr>
1826          <td><literal>x|y</literal></td>
1827          <td> Matches either <literal>x</literal> or <literal>y</literal>.
1828          Priority: low.</td>
1829         </tr>
1830         <tr>
1831          <td><literal>( )</literal></td>
1832          <td>The order of evaluation may be changed by using parentheses.</td>
1833         </tr>
1834        </tbody>
1835       </table>
1836
1837     <para>
     If the first character of the <literal>Regxp-2</literal> query
     is a plus character (<literal>+</literal>), it marks the
     beginning of a section with non-standard specifiers.
     The next plus character marks the end of the section.
     Currently Zebra supports only one specifier, the error tolerance,
     which consists of one digit.
1844     </para>
1845
1846     <para>
     Since the plus operator is normally a suffix operator, this addition to
     the query syntax does not conflict with the syntax of standard regular
     expressions.
1850     </para>
1851
1852     <para>
     For example, a phrase search with regular expressions in
     the title register is performed like this:
1855      <screen>
1856       Z> find @attr 1=4 @attr 5=102 "informat.* retrieval"
1857      </screen>
1858     </para>
1859
1860     <para>
1861      Combinations with other attributes are possible. For example, a
1862      ranked search with a regular expression:
1863      <screen>
1864       Z> find @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval"
1865      </screen>
1866     </para>
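
    <para>
     Two further sketches, built only from the operand and operator
     tables and the specifier rule described above; the search terms
     are invented for illustration. The first combines
     <literal>?</literal>, <literal>|</literal>, parentheses and
     <literal>*</literal> in a <literal>Regxp-1</literal> query. The
     second assumes that a leading <literal>+2+</literal> section in a
     <literal>Regxp-2</literal> query raises the error tolerance to two.
     <screen>
      Z> find @attr 1=4 @attr 5=102 "(colou?r|hue)s*"
      Z> find @attr 1=4 @attr 5=103 "+2+informaton"
     </screen>
    </para>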
1867    </sect2>
1868
1869
1870    <!--
1871    <para>
1872     The RecordType parameter in the <literal>zebra.cfg</literal> file, or
1873     the <literal>-t</literal> option to the indexer tells Zebra how to
1874     process input records.
1875     Two basic types of processing are available - raw text and structured
1876     data. Raw text is just that, and it is selected by providing the
1877     argument <literal>text</literal> to Zebra. Structured records are
1878     all handled internally using the basic mechanisms described in the
1879     subsequent sections.
1880     Zebra can read structured records in many different formats.
1881    </para>
1882    -->
1883   </sect1>
1884
1885
1886   <sect1 id="querymodel-cql-to-pqf">
1887    <title>Server Side CQL to PQF Query Translation</title>
1888    <para>
1889     Using the
1890     <literal>&lt;cql2rpn&gt;l2rpn.txt&lt;/cql2rpn&gt;</literal>
1891       YAZ Frontend Virtual
1892     Hosts option, one can configure
1893     the YAZ Frontend CQL-to-PQF
1894     converter, specifying the interpretation of various
1895     <ulink url="&url.cql;">CQL</ulink>
1896     indexes, relations, etc. in terms of Type-1 query attributes.
1897     <!-- The  yaz-client config file -->
1898    </para>
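   <para>
    As a rough sketch, a mapping file of this kind consists of
    <literal>name = value</literal> lines that assign Type-1 attributes
    to CQL context sets, indexes, relations, and relation modifiers.
    The fragment below is an assumption modelled on the directive
    forms described in the YAZ manual; the particular indexes and
    attribute values are illustrative only and must be adapted to the
    local database.
    <screen>
    <![CDATA[
     # context sets
     set.cql = info:srw/cql-context-set/1/cql-v1.1
     set.dc  = info:srw/cql-context-set/1/dc-v1.1

     # indexes
     index.cql.serverChoice = 1=1016
     index.dc.title         = 1=4

     # relations and relation modifiers
     relation.eq               = 2=3
     relation.<                = 2=1
     relationModifier.relevant = 2=102

     # default structure
     structure.*               = 4=1
     ]]>
    </screen>
   </para>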
1899    <para>
    For example, using server-side CQL-to-PQF conversion, one might
    query a Zebra server like this:
1902     <screen>
1903     <![CDATA[
1904      yaz-client localhost:9999
1905      Z> querytype cql
1906      Z> find text=(plant and soil)
1907      ]]>
1908     </screen>
     and, if properly configured, even static relevance ranking can
     be performed using CQL query syntax:
1911     <screen>
1912     <![CDATA[
1913      Z> find text = /relevant (plant and soil)
1914      ]]>
1915      </screen>
1916    </para>
1917
1918    <para>
    The same configuration can also be used to
    search using client-side CQL-to-PQF conversion.
    The only differences are <literal>querytype cql2rpn</literal>
    instead of
    <literal>querytype cql</literal>, and the
    <literal>-q</literal> option specifying a local conversion file:
1925     <screen>
1926     <![CDATA[
1927      yaz-client -q local/cql2pqf.txt localhost:9999
1928      Z> querytype cql2rpn
1929      Z> find text=(plant and soil)
1930      ]]>
1931      </screen>
1932    </para>
1933
1934    <para>
    Exhaustive information can be found in the
    section "Specification of CQL to RPN mappings" of the YAZ manual,
    <ulink url="http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql.map">
     http://www.indexdata.dk/yaz/doc/tools.tkl#tools.cql.map</ulink>,
   and is therefore not repeated here.
1940    </para>
1941   <!--
1942   <para>
1943     See
1944       <ulink url="http://www.loc.gov/z3950/agency/zing/cql/dc-indexes.html">
1945       http://www.loc.gov/z3950/agency/zing/cql/dc-indexes.html</ulink>
1946     for the Maintenance Agency's work-in-progress mapping of Dublin Core
1947     indexes to Attribute Architecture (util, XD and BIB-2)
1948     attributes.
1949    </para>
1950    -->
1951  </sect1>
1952
1953
1954
1955 </chapter>
1956
1957  <!-- Keep this comment at the end of the file
1958  Local variables:
1959  mode: sgml
1960  sgml-omittag:t
1961  sgml-shorttag:t
1962  sgml-minimize-attributes:nil
1963  sgml-always-quote-attributes:t
1964  sgml-indent-step:1
1965  sgml-indent-data:t
1966  sgml-parent-document: "zebra.xml"
1967  sgml-local-catalogs: nil
1968  sgml-namecase-general:t
1969  End:
1970  -->