doc/tools.xml

   1 <!-- $Id: tools.xml,v 1.4 2001-08-08 19:33:21 adam Exp $ -->
   2  <chapter><title>Supporting Tools</title>
   3
   4   <para>
   5    In support of the service API (primarily the ASN module, which
   6    provides the programmatic interface to the Z39.50 APDUs), &yaz; contains
   7    a collection of tools that support the development of applications.
   8   </para>
   9
  10   <sect1><title>Query Syntax Parsers</title>
  11
  12    <para>
  13     Since the type-1 (RPN) query structure has no direct, useful string
  14     representation, every origin application needs to provide some form of
  15     mapping from a local query notation or representation to a
  16     <token>Z_RPNQuery</token> structure. Some programmers will prefer to
  17     construct the query manually, perhaps using
  18     <function>odr_malloc()</function> to simplify memory management.
  19     The &yaz; distribution includes two separate, query-generating tools
  20     that may be of use to you.
  21    </para>
  22
  23    <sect2><title id="PQF">Prefix Query Format</title>
  24
  25     <para>
  26      Since RPN or reverse polish notation is really just a fancy way of
  27      describing a suffix notation format (operator follows operands), it
  28      would seem that the confusion is total when we now introduce a prefix
  29      notation for RPN. The reason is one of simple laziness - it's somewhat
  30      simpler to interpret a prefix format, and this utility was designed
  31      for maximum simplicity, to provide a baseline representation for use
  32      in simple test applications and scripting environments (like Tcl). The
  33      demonstration client included with YAZ uses the PQF.
  34     </para>
  35     <para>
  36      The PQF is defined by the pquery module in the YAZ library. The
  37      <filename>pquery.h</filename> file provides the declaration of the
  38      functions
  39     </para>
  40     <screen>
  41 Z_RPNQuery *p_query_rpn (ODR o, oid_proto proto, const char *qbuf);
  42
  43 Z_AttributesPlusTerm *p_query_scan (ODR o, oid_proto proto,
  44           Odr_oid **attributeSetP, const char *qbuf);
  45
  46 int p_query_attset (const char *arg);
  47     </screen>
  48     <para>
  49      The function <function>p_query_rpn()</function> takes as arguments an
  50      &odr; stream (see section <link linkend="odr">The ODR Module</link>)
  51      to provide a memory source (the structure created is released on
  52      the next call to <function>odr_reset()</function> on the stream), a
  53      protocol identifier (one of the constants <token>PROTO_Z3950</token> and
  54      <token>PROTO_SR</token>), and finally a null-terminated string
  55      holding the query.
  56     </para>
  57     <para>
  58      If the parse went well, <function>p_query_rpn()</function> returns a
  59      pointer to a <literal>Z_RPNQuery</literal> structure which can be
  60      placed directly into a <literal>Z_SearchRequest</literal>.
  61     </para>
  62     <para>
  63
  64      The function <function>p_query_attset()</function> specifies which
  65      attribute set to use if the query doesn't specify one with the
  66      <literal>@attrset</literal> operator.
  67      It returns 0 if the argument is a
  68      valid attribute set specifier; otherwise it returns -1.
  69     </para>
  70
  71     <para>
  72      The grammar of the PQF is as follows:
  73     </para>
  74
  75     <screen>
  76      Query ::= &lsqb; '@attrset' AttSet &rsqb; QueryStruct.
  77
  78      AttSet ::= string.
  79
  80      QueryStruct ::= &lsqb; Attribute &rsqb; Simple | Complex.
  81
  82      Attribute ::= '@attr' &lsqb; AttSet &rsqb; AttributeType '=' AttributeValue.
  83
  84      AttributeType ::= integer.
  85
  86      AttributeValue ::= integer.
  87
  88      Complex ::= Operator QueryStruct QueryStruct.
  89
  90      Operator ::= '@and' | '@or' | '@not' | '@prox' Proximity.
  91
  92      Simple ::= ResultSet | Term.
  93
  94      ResultSet ::= '@set' string.
  95
  96      Term ::= string | '"' string '"'.
  97
  98      Proximity ::= Exclusion Distance Ordered Relation WhichCode UnitCode.
  99
 100      Exclusion ::= '1' | '0' | 'void'.
 101
 102      Distance ::= integer.
 103
 104      Ordered ::= '1' | '0'.
 105
 106      Relation ::= integer.
 107
 108      WhichCode ::= 'known' | 'private' | integer.
 109
 110      UnitCode ::= integer.
 111     </screen>
 112
 113     <para>
 114      You will note that the syntax above is a fairly faithful
 115      representation of RPN, except for the Attribute, which has been
 116      moved a step away from the term, allowing you to associate one or more
 117      attributes with an entire query structure. The parser will
 118      automatically apply the given attributes to each term as required.
 119     </para>
 120
 121     <para>
 122      The following are all examples of valid queries in the PQF.
 123     </para>
 124
 125     <screen>
 126      dylan
 127
 128      "bob dylan"
 129
 130      @or "dylan" "zimmerman"
 131
 132      @set Result-1
 133
 134      @or @and bob dylan @set Result-1
 135
 136      @attr 4=1 @and @attr 1=1 "bob dylan" @attr 1=4 "slow train coming"
 137
 138      @attr 4=1 @attr 1=4 "self portrait"
 139
 140      @prox 0 3 1 2 k 2 dylan zimmerman
 141
 142      @and @attr 2=4 @attr gils 1=2038 -114 @attr 2=2 @attr gils 1=2039 -109
 143     </screen>
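    <para>
     The claim that a prefix format is simple to interpret can be
     illustrated with a short recursive-descent sketch. This is not the
     pquery module: it covers only a subset of the grammar above (the
     <literal>@and</literal>, <literal>@or</literal> and
     <literal>@not</literal> operators with plain or quoted terms, and no
     attributes, result sets or proximity), and the name
     <literal>pqf_to_infix</literal> is hypothetical. It rewrites a prefix
     query in parenthesized infix form.
    </para>
    <screen><![CDATA[
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sketch only: a subset of PQF, not the YAZ pquery implementation. */

/* Fetch the next token. Returns 0 at end of input or on error,
   1 for a bare word, 2 for a "quoted string" (quotes removed). */
static int next_token(const char **pp, char *tok, size_t size)
{
    const char *p = *pp;
    size_t n = 0;
    int quoted;

    while (*p == ' ')
        p++;
    if (*p == '\0')
        return 0;
    quoted = (*p == '"');
    if (quoted)
        p++;
    while (*p != '\0' && *p != (quoted ? '"' : ' ')) {
        if (n + 1 >= size)
            return 0;               /* token too long for the buffer */
        tok[n++] = *p++;
    }
    if (quoted) {
        if (*p != '"')
            return 0;               /* unterminated quote */
        p++;
    }
    tok[n] = '\0';
    *pp = p;
    return quoted ? 2 : 1;
}

/* QueryStruct ::= Operator QueryStruct QueryStruct | Term */
static int parse_struct(const char **pp, char *out, size_t size)
{
    char tok[128], lhs[256], rhs[256];
    int kind = next_token(pp, tok, sizeof tok);
    int n;

    if (kind == 0)
        return -1;                  /* operand missing */
    if (kind == 1 && tok[0] == '@') {
        if (strcmp(tok, "@and") != 0 && strcmp(tok, "@or") != 0 &&
            strcmp(tok, "@not") != 0)
            return -1;              /* operator outside this subset */
        if (parse_struct(pp, lhs, sizeof lhs) != 0 ||
            parse_struct(pp, rhs, sizeof rhs) != 0)
            return -1;
        n = snprintf(out, size, "(%s %s %s)", lhs, tok + 1, rhs);
    } else if (kind == 2) {
        n = snprintf(out, size, "\"%s\"", tok);
    } else {
        n = snprintf(out, size, "%s", tok);
    }
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Returns 0 and fills out on success, -1 on a malformed query. */
int pqf_to_infix(const char *query, char *out, size_t size)
{
    const char *p = query;

    assert(query != NULL && out != NULL && size > 0);
    if (parse_struct(&p, out, size) != 0)
        return -1;
    while (*p == ' ')
        p++;
    return *p == '\0' ? 0 : -1;     /* reject trailing tokens */
}
```
]]></screen>
    <para>
     With this sketch, the query
     <literal>@or @and bob dylan zimmerman</literal> comes out as
     <literal>((bob and dylan) or zimmerman)</literal>, while a query with
     a missing operand or with trailing tokens is rejected with -1.
    </para>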
 144
 145    </sect2>
 146    <sect2><title id="CCL">Common Command Language</title>
 147
 148     <para>
 149      Not all users enjoy typing in prefix query structures and numerical
 150      attribute values, even in a minimalistic test client. In the library
 151      world, the more intuitive Common Command Language (or ISO 8777) has
 152      enjoyed some popularity - especially before the widespread
 153      availability of graphical interfaces. It is still useful in
 154      applications where, for whatever reason, you need to provide a
 155      symbolic language for expressing boolean query structures.
 156     </para>
 157
 158     <para>
 159      The EUROPAGATE research project working under the Libraries programme
 160      of the European Commission's DG XIII has, amongst other useful tools,
 161      implemented a general-purpose CCL parser which produces an output
 162      structure that can be trivially converted to the internal RPN
 163      representation of YAZ (the <literal>Z_RPNQuery</literal> structure).
 164      Since the CCL utility - along with the rest of the software
 165      produced by EUROPAGATE - is made freely available under a liberal license, it
 166      is included as a supplement to YAZ.
 167     </para>
 168
 169     <sect3><title>CCL Syntax</title>
 170
 171      <para>
 172       The CCL parser obeys the following grammar for the FIND argument.
 173       The syntax is annotated by the lines prefixed by
 174       <literal>&dash;&dash;</literal>.
 175      </para>
 176
 177      <screen>
 178       CCL-Find ::= CCL-Find Op Elements
 179                 | Elements.
 180
 181       Op ::= "and" | "or" | "not"
 182       -- The above means that Elements are separated by boolean operators.
 183
 184       Elements ::= '(' CCL-Find ')'
 185                 | Set
 186                 | Terms
 187                 | Qualifiers Relation Terms
 188                 | Qualifiers Relation '(' CCL-Find ')'
 189                 | Qualifiers '=' string '-' string
 190       -- Elements is either a recursive definition, a result set reference, a
 191       -- list of terms, qualifiers followed by terms, qualifiers followed
 192       -- by a recursive definition or qualifiers in a range (lower - upper).
 193
 194       Set ::= 'set' '=' string
 195       -- Reference to a result set
 196
 197       Terms ::= Terms Prox Term
 198              | Term
 199       -- Proximity of terms.
 200
 201       Term ::= Term string
 202             | string
 203       -- This basically means that a term may include a blank
 204
 205       Qualifiers ::= Qualifiers ',' string
 206                   | string
 207       -- Qualifiers is a list of strings separated by comma
 208
 209       Relation ::= '=' | '>=' | '<=' | '<>' | '>' | '<'
 210       -- Relational operators. This really doesn't follow the ISO8777
 211       -- standard.
 212
 213       Prox ::= '%' | '!'
 214       -- Proximity operator
 215
 216      </screen>
 217
 218      <para>
 219       The following queries are all valid:
 220      </para>
 221
 222      <screen>
 223       dylan
 224
 225       "bob dylan"
 226
 227       dylan or zimmerman
 228
 229       set=1
 230
 231       (dylan and bob) or set=1
 232
 233      </screen>
 234      <para>
 235       Assuming that the qualifiers <literal>ti</literal>, <literal>au</literal>
 236       and <literal>date</literal> are defined we may use:
 237      </para>
 238
 239      <screen>
 240       ti=self portrait
 241
 242       au=(bob dylan and slow train coming)
 243
 244       date>1980 and (ti=((self portrait)))
 245
 246      </screen>
 247
 248     </sect3>
 249     <sect3><title>CCL Qualifiers</title>
 250
 251      <para>
 252       Qualifiers are used to direct the search to a particular searchable
 253       index, such as title (ti) and author indexes (au). The CCL standard
 254       itself doesn't specify a particular set of qualifiers, but it does
 255       suggest a few short-hand notations. You can customize the CCL parser
 256       to support a particular set of qualifiers to reflect the current target
 257       profile. Traditionally, a qualifier would map to a particular
 258       use-attribute within the BIB-1 attribute set. However, you could also
 259       define qualifiers that would set, for example, the
 260       structure-attribute.
 261      </para>
 262
 263      <para>
 264       Consider a scenario where the target supports ranked searches in the
 265       title-index. In this case, the user could specify
 266      </para>
 267
 268      <screen>
 269       ti,ranked=knuth computer
 270      </screen>
 271      <para>
 272       and the <literal>ranked</literal> qualifier would map to relation=relevance
 273       (2=102) and the <literal>ti</literal> qualifier to title (1=4).
 274      </para>
 275
 276      <para>
 277       A "profile" with a set of predefined CCL qualifiers can be read from a
 278       file. The YAZ client reads its CCL qualifiers from a file named
 279       <filename>default.bib</filename>. Each line in the file has the form:
 280      </para>
 281
 282      <para>
 283       <replaceable>qualifier-name</replaceable>
 284       <replaceable>type</replaceable>=<replaceable>val</replaceable>
 285       <replaceable>type</replaceable>=<replaceable>val</replaceable> ...
 286      </para>
 287
 288      <para>
 289       where <replaceable>qualifier-name</replaceable> is the name of the
 290       qualifier to be used (e.g. <literal>ti</literal>),
 291       <replaceable>type</replaceable> is a BIB-1 category type and
 292       <replaceable>val</replaceable> is the corresponding BIB-1 attribute
 293       value.
 294       The <replaceable>type</replaceable> can be numeric, or it may be one of
 295       <literal>u</literal> (use), <literal>r</literal> (relation),
 296       <literal>p</literal> (position), <literal>s</literal> (structure),
 297       <literal>t</literal> (truncation) or <literal>c</literal> (completeness).
 298       The <replaceable>qualifier-name</replaceable> <literal>term</literal>
 299       has a special meaning.
 300       The types and values for this definition are used when
 301       <emphasis>no</emphasis> qualifiers are present.
 302      </para>
 303
 304      <para>
 305       Consider the following definition:
 306      </para>
 307
 308      <screen>
 309       ti       u=4 s=1
 310       au       u=1 s=1
 311       term     s=105
 312      </screen>
 313      <para>
 314       Two qualifiers are defined, <literal>ti</literal> and
 315       <literal>au</literal>.
 316       They both set the structure-attribute to phrase (1).
 317       <literal>ti</literal>
 318       sets the use-attribute to 4. <literal>au</literal> sets the
 319       use-attribute to 1.
 320       When no qualifiers are used in the query the structure-attribute is
 321       set to free-form-text (105).
 322      </para>
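     <para>
      The line format described above can be made concrete with a small
      parsing sketch. It is not the code used by
      <function>ccl_qual_file</function>; the names
      <literal>struct qual_def</literal> and
      <literal>parse_qual_line</literal> are hypothetical, and the sketch
      assumes plain integer values only. The letters are mapped to the
      BIB-1 attribute types 1 (use) through 6 (completeness), in line with
      the 1=4 and 2=102 examples given earlier.
     </para>
     <screen><![CDATA[
```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PAIRS 8                 /* arbitrary limit for this sketch */

struct qual_def {                   /* hypothetical container, not a YAZ type */
    char name[32];
    int npairs;
    int type[MAX_PAIRS];
    int value[MAX_PAIRS];
};

/* Map "u", "r", "p", "s", "t", "c" to 1..6; accept numeric types as given. */
static int attr_type(const char *s)
{
    static const char letters[] = "urpstc";
    const char *hit;

    if (isdigit((unsigned char)s[0]))
        return atoi(s);
    if (s[0] != '\0' && s[1] == '\0' && (hit = strchr(letters, s[0])) != NULL)
        return (int)(hit - letters) + 1;
    return -1;
}

/* Parse one line such as "ti u=4 s=1". Returns 0 on success, -1 on error. */
int parse_qual_line(const char *line, struct qual_def *def)
{
    char buf[256];
    char *tok;

    assert(line != NULL && def != NULL);
    if (strlen(line) >= sizeof buf)
        return -1;
    strcpy(buf, line);

    tok = strtok(buf, " \t\r\n");
    if (tok == NULL || strlen(tok) >= sizeof def->name)
        return -1;
    strcpy(def->name, tok);
    def->npairs = 0;

    while ((tok = strtok(NULL, " \t\r\n")) != NULL) {
        char *eq = strchr(tok, '=');
        int type;

        if (eq == NULL || def->npairs == MAX_PAIRS)
            return -1;
        *eq = '\0';                 /* split "type=val" in place */
        type = attr_type(tok);
        if (type < 0 || !isdigit((unsigned char)eq[1]))
            return -1;
        def->type[def->npairs] = type;
        def->value[def->npairs] = atoi(eq + 1);
        def->npairs++;
    }
    return 0;
}
```
]]></screen>
     <para>
      For the line <literal>ti u=4 s=1</literal> the sketch yields the
      pairs (1, 4) and (4, 1), that is use=4 and structure=1.
     </para>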
 323
 324     </sect3>
 325     <sect3><title>CCL API</title>
 326      <para>
 327       All public definitions can be found in the header file
 328       <filename>ccl.h</filename>. A profile identifier is of type
 329       <literal>CCL_bibset</literal>. A profile must be created with the call
 330       to the function <function>ccl_qual_mk</function> which returns a profile
 331       handle of type <literal>CCL_bibset</literal>.
 332      </para>
 333
 334      <para>
 335       To read a file containing qualifier definitions the function
 336       <function>ccl_qual_file</function> may be convenient. This function
 337       takes an already opened <literal>FILE</literal> handle pointer as
 338       argument along with a <literal>CCL_bibset</literal> handle.
 339      </para>
 340
 341      <para>
 342       To parse a simple string with a FIND query use the function
 343      </para>
 344      <screen>
 345 struct ccl_rpn_node *ccl_find_str (CCL_bibset bibset, const char *str,
 346                                    int *error, int *pos);
 347      </screen>
 348      <para>
 349       which takes the CCL profile (<literal>bibset</literal>) and query
 350       (<literal>str</literal>) as input. Upon successful completion the RPN
 351       tree is returned. If an error occurs, such as a syntax error, the integer
 352       pointed to by <literal>error</literal> holds the error code and the one
 353       pointed to by <literal>pos</literal> holds the offset within the query
 354       string at which the parsing failed.
 355      </para>
 356
 357      <para>
 358       An English representation of the error may be obtained by calling
 359       the <literal>ccl_err_msg</literal> function. The error codes are
 360       listed in <filename>ccl.h</filename>.
 361      </para>
 362
 363      <para>
 364       To convert the CCL RPN tree (type
 365       <literal>struct ccl_rpn_node *</literal>)
 366       to the Z_RPNQuery of YAZ the function <function>ccl_rpn_query</function>
 367       must be used. This function, which is part of YAZ, is implemented in
 368       <filename>yaz-ccl.c</filename>.
 369       After calling this function the CCL RPN tree is probably no longer
 370       needed. The function <function>ccl_rpn_delete</function> destroys the CCL RPN tree.
 371      </para>
 372
 373      <para>
 374       A CCL profile may be destroyed by calling the
 375       <function>ccl_qual_rm</function> function.
 376      </para>
 377
 378      <para>
 379       The token names for the CCL operators may be changed by setting the
 380       globals (all type <literal>char *</literal>)
 381       <literal>ccl_token_and</literal>, <literal>ccl_token_or</literal>,
 382       <literal>ccl_token_not</literal> and <literal>ccl_token_set</literal>.
 383       An operator may have aliases, i.e. there may be more than one name for
 384       the operator. To do this, separate each alias with a space character.
 385      </para>
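     <para>
      How a space-separated alias list might be matched against a word
      from the query can be sketched as follows. The function name
      <literal>token_matches</literal> is hypothetical, and the exact,
      case-sensitive comparison is an assumption made for this sketch, not
      a statement about how the CCL parser compares tokens.
     </para>
     <screen><![CDATA[
```c
#include <assert.h>
#include <string.h>

/* Return 1 if word equals one of the space-separated names in aliases,
   0 otherwise. Comparison is exact and case-sensitive in this sketch. */
int token_matches(const char *aliases, const char *word)
{
    size_t wlen;

    assert(aliases != NULL && word != NULL);
    wlen = strlen(word);
    if (wlen == 0)
        return 0;
    while (*aliases != '\0') {
        size_t len;

        while (*aliases == ' ')     /* skip separators */
            aliases++;
        len = strcspn(aliases, " ");
        if (len == wlen && memcmp(aliases, word, len) == 0)
            return 1;
        aliases += len;
    }
    return 0;
}
```
]]></screen>
     <para>
      With an alias list such as <literal>"and und"</literal>, both
      <literal>and</literal> and <literal>und</literal> would be accepted
      as the boolean operator, while <literal>an</literal> would not.
     </para>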
 386     </sect3>
 387    </sect2>
 388   </sect1>
 389   <sect1><title>Object Identifiers</title>
 390
 391    <para>
 392     The basic YAZ representation of an OID is an array of integers,
 393     terminated with the value -1. The &odr; module provides two
 394     utility functions to create and copy data elements of this type:
 395    </para>
 396
 397    <screen>
 398     Odr_oid *odr_getoidbystr(ODR o, char *str);
 399    </screen>
 400
 401    <para>
 402     Creates an OID based on a string-based representation using dots (.)
 403     to separate elements in the OID.
 404    </para>
 405
 406    <screen>
 407     Odr_oid *odr_oiddup(ODR odr, Odr_oid *o);
 408    </screen>
 409
 410    <para>
 411     Creates a copy of the OID referenced by the <emphasis>o</emphasis>
 412     parameter.
 413     Both functions take an &odr; stream as parameter. This stream is used to
 414     allocate memory for the data elements, which is released on a
 415     subsequent call to <function>odr_reset()</function> on that stream.
 416    </para>
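   <para>
    The relation between the dotted string form and the -1 terminated
    integer array can be shown with a stand-alone sketch. It writes into a
    caller-supplied buffer instead of allocating from an ODR stream, so it
    is not a substitute for <function>odr_getoidbystr()</function>; the
    name <literal>oid_from_dotted</literal> is hypothetical.
   </para>
   <screen><![CDATA[
```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

/* Parse a dotted OID string such as "1.2.840.10003" into dst, which has
   room for max integers including the terminating -1.
   Returns the number of arcs, or -1 on malformed input or overflow. */
int oid_from_dotted(const char *str, int *dst, int max)
{
    int n = 0;

    assert(str != NULL && dst != NULL && max > 0);
    for (;;) {
        int arc = 0;

        /* each arc must start with a digit, and we need room for
           this arc plus the terminator */
        if (!isdigit((unsigned char)*str) || n + 1 >= max)
            return -1;
        while (isdigit((unsigned char)*str)) {
            if (arc > (INT_MAX - 9) / 10)
                return -1;          /* arc does not fit in an int */
            arc = arc * 10 + (*str++ - '0');
        }
        dst[n++] = arc;
        if (*str == '\0')
            break;
        if (*str++ != '.')
            return -1;              /* only dots may separate arcs */
    }
    dst[n] = -1;
    return n;
}
```
]]></screen>
   <para>
    Given the string <literal>1.2.840.10003.5.10</literal>, the sketch
    stores the six arcs followed by -1 and returns 6; strings with an
    empty arc or a trailing dot are rejected.
   </para>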
 417
 418    <para>
 419     The OID module provides a higher-level representation of the
 420     family of object identifiers which describe the Z39.50 protocol and its
 421     related objects. The definition of the module interface is given in
 422     the <filename>oid.h</filename> file.
 423    </para>
 424
 425    <para>
 426     The interface is mainly based on the <literal>oident</literal> structure.
 427     The definition of this structure looks like this:
 428    </para>
 429
 430    <screen>
 431 typedef struct oident
 432 {
 433     oid_proto proto;
 434     oid_class oclass;
 435     oid_value value;
 436     int oidsuffix[OID_SIZE];
 437     char *desc;
 438 } oident;
 439    </screen>
 440
 441    <para>
 442     The proto field takes one of the values
 443    </para>
 444
 445    <screen>
 446     PROTO_Z3950
 447     PROTO_SR
 448    </screen>
 449
 450    <para>
 451     If you don't care about talking to SR-based implementations (few
 452     exist, and they may become fewer still if and when the ISO SR and ANSI
 453     Z39.50 documents are merged into a single standard), you can ignore
 454     this field on incoming packages, and always set it to PROTO_Z3950
 455     for outgoing packages.
 456    </para>
 457    <para>
 458
 459     The oclass field takes one of the values
 460    </para>
 461
 462    <screen>
 463     CLASS_APPCTX
 464     CLASS_ABSYN
 465     CLASS_ATTSET
 466     CLASS_TRANSYN
 467     CLASS_DIAGSET
 468     CLASS_RECSYN
 469     CLASS_RESFORM
 470     CLASS_ACCFORM
 471     CLASS_EXTSERV
 472     CLASS_USERINFO
 473     CLASS_ELEMSPEC
 474     CLASS_VARSET
 475     CLASS_SCHEMA
 476     CLASS_TAGSET
 477     CLASS_GENERAL
 478    </screen>
 479
 480    <para>
 481     corresponding to the OID classes defined by the Z39.50 standard.
 482
 483     Finally, the value field takes one of the values
 484    </para>
 485
 486    <screen>
 487     VAL_APDU
 488     VAL_BER
 489     VAL_BASIC_CTX
 490     VAL_BIB1
 491     VAL_EXP1
 492     VAL_EXT1
 493     VAL_CCL1
 494     VAL_GILS
 495     VAL_WAIS
 496     VAL_STAS
 497     VAL_DIAG1
 498     VAL_ISO2709
 499     VAL_UNIMARC
 500     VAL_INTERMARC
 501     VAL_CCF
 502     VAL_USMARC
 503     VAL_UKMARC
 504     VAL_NORMARC
 505     VAL_LIBRISMARC
 506     VAL_DANMARC
 507     VAL_FINMARC
 508     VAL_MAB
 509     VAL_CANMARC
 510     VAL_SBN
 511     VAL_PICAMARC
 512     VAL_AUSMARC
 513     VAL_IBERMARC
 514     VAL_EXPLAIN
 515     VAL_SUTRS
 516     VAL_OPAC
 517     VAL_SUMMARY
 518     VAL_GRS0
 519     VAL_GRS1
 520     VAL_EXTENDED
 521     VAL_RESOURCE1
 522     VAL_RESOURCE2
 523     VAL_PROMPT1
 524     VAL_DES1
 525     VAL_KRB1
 526     VAL_PRESSET
 527     VAL_PQUERY
 528     VAL_PCQUERY
 529     VAL_ITEMORDER
 530     VAL_DBUPDATE
 531     VAL_EXPORTSPEC
 532     VAL_EXPORTINV
 533     VAL_NONE
 534     VAL_SETM
 535     VAL_SETG
 536     VAL_VAR1
 537     VAL_ESPEC1
 538    </screen>
 539
 540    <para>
 541     again, corresponding to the specific OIDs defined by the standard.
 542    </para>
 543
 544    <para>
 545     The desc field contains a brief, mnemonic name for the OID in question.
 546    </para>
 547
 548    <para>
 549     The function
 550    </para>
 551
 552    <screen>
 553     struct oident *oid_getentbyoid(int *o);
 554    </screen>
 555
 556    <para>
 557     takes as argument an OID, and returns a pointer to a static area
 558     containing an <literal>oident</literal> structure. You typically use
 559     this function when you receive a PDU containing an OID, and you wish
 560     to branch out depending on the specific OID value.
 561    </para>
 562
 563    <para>
 564     The function
 565    </para>
 566
 567    <screen>
 568     int *oid_ent_to_oid(struct oident *ent, int *dst);
 569    </screen>
 570
 571    <para>
 572     takes as argument an <literal>oident</literal> structure - in which
 573     the <literal>proto</literal>, <literal>oclass</literal>, and
 574     <literal>value</literal> fields are assumed to be set correctly -
 575     and returns a pointer to the buffer given by <literal>dst</literal>,
 576     which then contains the base
 577     representation of the corresponding OID. The function returns
 578     NULL and the array dst is unchanged if a mapping couldn't take place.
 579     The array <literal>dst</literal> should be at least of size
 580     <literal>OID_SIZE</literal>.
 581    </para>
 582    <para>
 583
 584     The <function>oid_ent_to_oid()</function> function can be used whenever
 585     you need to prepare a PDU containing one or more OIDs. The separation of
 586     the <literal>proto</literal> element from the remainder of the
 587     OID-description makes it simple to write applications that can
 588     communicate with either Z39.50 or OSI SR-based applications.
 589    </para>
 590
 591    <para>
 592     The function
 593    </para>
 594
 595    <screen>
 596     oid_value oid_getvalbyname(const char *name);
 597    </screen>
 598
 599    <para>
 600     takes as argument a mnemonic OID name, and returns the
 601     <literal>value</literal> field of the first entry in the database that
 602     contains the given name in its <literal>desc</literal> field.
 603    </para>
 604
 605    <para>
 606     Finally, the module provides the following utility functions, whose
 607     meaning should be obvious:
 608    </para>
 609
 610    <screen>
 611     void oid_oidcpy(int *t, int *s);
 612     void oid_oidcat(int *t, int *s);
 613     int oid_oidcmp(int *o1, int *o2);
 614     int oid_oidlen(int *o);
 615    </screen>
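   <para>
    What such helpers have to do follows from the -1 terminated array
    representation. The sketch below shows one way they might be written;
    the <literal>ex_</literal> names are hypothetical, and details such as
    the exact return value of the comparison are assumptions rather than a
    description of the YAZ implementation.
   </para>
   <screen><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Number of arcs before the -1 terminator. */
int ex_oidlen(const int *o)
{
    int n = 0;

    assert(o != NULL);
    while (o[n] != -1)
        n++;
    return n;
}

/* Copy s to t, including the terminator. t must be large enough. */
void ex_oidcpy(int *t, const int *s)
{
    assert(t != NULL && s != NULL);
    while ((*t++ = *s++) != -1)
        ;
}

/* Append s to the OID already stored in t. t must be large enough. */
void ex_oidcat(int *t, const int *s)
{
    assert(t != NULL && s != NULL);
    ex_oidcpy(t + ex_oidlen(t), s);
}

/* Return 0 if the OIDs are equal, a negative or positive value otherwise. */
int ex_oidcmp(const int *o1, const int *o2)
{
    assert(o1 != NULL && o2 != NULL);
    while (*o1 == *o2 && *o1 != -1) {
        o1++;
        o2++;
    }
    return (*o1 > *o2) - (*o1 < *o2);
}
```
]]></screen>
   <para>
    In this sketch an OID that is a proper prefix of another compares as
    smaller, because the terminator -1 is lower than any valid arc.
   </para>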
 616
 617    <note>
 618     <para>
 619      The OID module has been criticized - and perhaps rightly so
 620      - for needlessly abstracting the
 621      representation of OIDs. Other toolkits use a simple
 622      string-representation of OIDs with good results. In practice, we have
 623      found the interface comfortable and quick to work with, and it is a
 624      simple matter (for what it's worth) to create applications compatible
 625      with both ISO SR and Z39.50. Finally, the use of the
 626      <literal>oident</literal> database is by no means mandatory.
 627      You can easily create your own system for representing OIDs, as long
 628      as it is compatible with the low-level integer-array representation
 629      of the ODR module.
 630     </para>
 631    </note>
 632
 633   </sect1>
 634
 635   <sect1><title>Nibble Memory</title>
 636
 637    <para>
 638     Sometimes when you need to allocate and construct a large,
 639     interconnected complex of structures, it can be a bit of a pain to
 640     release the associated memory again. For the structures describing the
 641     Z39.50 PDUs and related structures, it is convenient to use the
 642     memory-management system of the &odr; subsystem (see
 643     <link linkend="odr-use">Using ODR</link>). However, in some circumstances
 644     where you might otherwise benefit from using a simple nibble memory
 645     management system, it may be impractical to use
 646     <function>odr_malloc()</function> and <function>odr_reset()</function>.
 647     For this purpose, the memory manager which also supports the &odr;
 648     streams is made available in the NMEM module. The external interface
 649     to this module is given in the <filename>nmem.h</filename> file.
 650    </para>
 651
 652    <para>
 653     The following prototypes are given:
 654    </para>
 655
 656    <screen>
 657     NMEM nmem_create(void);
 658     void nmem_destroy(NMEM n);
 659     void *nmem_malloc(NMEM n, int size);
 660     void nmem_reset(NMEM n);
 661     int nmem_total(NMEM n);
 662     void nmem_init(void);
 663    </screen>
 664
 665    <para>
 666     The <function>nmem_create()</function> function returns a pointer to a
 667     memory control handle, which can be released again by
 668     <function>nmem_destroy()</function> when no longer needed.
 669     The function <function>nmem_malloc()</function> allocates a block of
 670     memory of the requested size. A call to <function>nmem_reset()</function>
 671     or <function>nmem_destroy()</function> will release all memory allocated
 672     on the handle since it was created (or since the last call to
 673     <function>nmem_reset()</function>). The function
 674     <function>nmem_total()</function> returns the number of bytes currently
 675     allocated on the handle.
 676    </para>
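   <para>
    The idea behind this style of allocator can be sketched in a few lines:
    memory is carved out of larger chunks, and everything is given back in
    one step. The code below is an illustration only, not the NMEM
    implementation; the <literal>arena_</literal> names, the chunk size
    and the rounding of sizes for alignment are assumptions, and the
    sketch does no locking.
   </para>
   <screen><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CHUNK_SIZE 4096             /* assumed default chunk payload */

union max_align { long l; double d; void *p; };
#define ALIGNMENT sizeof(union max_align)

struct chunk {
    struct chunk *next;
    size_t size;                    /* payload capacity in bytes */
    size_t used;                    /* bytes handed out so far */
    union max_align data[1];        /* payload starts here, aligned */
};

struct arena {
    struct chunk *chunks;           /* most recent chunk first */
    size_t total;                   /* bytes handed out since last reset */
};

struct arena *arena_create(void)
{
    struct arena *a = malloc(sizeof *a);

    if (a != NULL) {
        a->chunks = NULL;
        a->total = 0;
    }
    return a;
}

void *arena_malloc(struct arena *a, size_t size)
{
    struct chunk *c;
    void *p;

    assert(a != NULL);
    /* round up so that every returned pointer stays aligned */
    size = (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    c = a->chunks;
    if (c == NULL || c->size - c->used < size) {
        size_t want = size > CHUNK_SIZE ? size : CHUNK_SIZE;

        c = malloc(offsetof(struct chunk, data) + want);
        if (c == NULL)
            return NULL;
        c->size = want;
        c->used = 0;
        c->next = a->chunks;
        a->chunks = c;
    }
    p = (char *)c->data + c->used;
    c->used += size;
    a->total += size;
    return p;
}

/* Release every block allocated on the arena; the handle stays valid. */
void arena_reset(struct arena *a)
{
    assert(a != NULL);
    while (a->chunks != NULL) {
        struct chunk *next = a->chunks->next;

        free(a->chunks);
        a->chunks = next;
    }
    a->total = 0;
}

size_t arena_total(const struct arena *a)
{
    assert(a != NULL);
    return a->total;
}

void arena_destroy(struct arena *a)
{
    arena_reset(a);
    free(a);
}
```
]]></screen>
   <para>
    Individual blocks are never freed one by one in this scheme, which is
    what makes it convenient for large, interconnected structures: a
    single reset or destroy call releases them all.
   </para>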
 677
 678    <note>
 679     <para>
 680      The nibble memory pool is shared among threads. POSIX
 681      mutexes and WIN32 Critical Sections are introduced to keep the
 682      module thread safe. On WIN32, the function <function>nmem_init()</function>
 683      initialises the Critical Section handle and should be called once
 684      before any other nmem function is used.
 685     </para>
 686    </note>
 687
 688   </sect1>
 689  </chapter>
 690
 691  <!-- Keep this comment at the end of the file
 692  Local variables:
 693  mode: sgml
 694  sgml-omittag:t
 695  sgml-shorttag:t
 696  sgml-minimize-attributes:nil
 697  sgml-always-quote-attributes:t
 698  sgml-indent-step:1
 699  sgml-indent-data:t
 700  sgml-parent-document: "yaz.xml"
 701  sgml-local-catalogs: "../../docbook/docbook.cat"
 702  sgml-namecase-general:t
 703  End:
 704  -->