<!-- $Id: tools.xml,v 1.12 2002-09-03 09:50:34 adam Exp $ -->
<chapter id="tools"><title>Supporting Tools</title>

In support of the service API (primarily the ASN module, which
provides the programmatic interface to the Z39.50 APDUs), &yaz; contains
a collection of tools that support the development of applications.

<sect1 id="tools.query"><title>Query Syntax Parsers</title>

Since the type-1 (RPN) query structure has no direct, useful string
representation, every origin application needs to provide some form of
mapping from a local query notation or representation to a
<token>Z_RPNQuery</token> structure. Some programmers will prefer to
construct the query manually, perhaps using
<function>odr_malloc()</function> to simplify memory management.
The &yaz; distribution includes two separate, query-generating tools
that may be of use to you.

<sect2><title id="PQF">Prefix Query Format</title>

Since RPN, or reverse Polish notation, is really just a fancy way of
describing a suffix notation (operator follows operands), it
may seem confusing that we now introduce a prefix
notation for RPN. The reason is one of simple laziness: it is somewhat
simpler to interpret a prefix format, and this utility was designed
for maximum simplicity, to provide a baseline representation for use
in simple test applications and scripting environments (like Tcl). The
demonstration client included with YAZ uses the PQF.

The PQF has been adopted by other parties developing Z39.50
software. It is often referred to as Prefix Query Notation.

The PQF is defined by the pquery module in the YAZ library.
There are two sets of functions with similar behavior. The first
set operates on a PQF parser handle; the second set does not. The
first set is more flexible than the second. The second set
is obsolete and is provided only to ensure backwards compatibility.

The functions in the first set all operate on a PQF parser handle:

#include <yaz/pquery.h>

YAZ_PQF_Parser yaz_pqf_create (void);

void yaz_pqf_destroy (YAZ_PQF_Parser p);

Z_RPNQuery *yaz_pqf_parse (YAZ_PQF_Parser p, ODR o, const char *qbuf);

Z_AttributesPlusTerm *yaz_pqf_scan (YAZ_PQF_Parser p, ODR o,
                                    Odr_oid **attributeSetId, const char *qbuf);

int yaz_pqf_error (YAZ_PQF_Parser p, const char **msg, size_t *off);

A PQF parser is created and destroyed by the functions
<function>yaz_pqf_create</function> and
<function>yaz_pqf_destroy</function> respectively.
Function <function>yaz_pqf_parse</function> parses the query given
by the string <literal>qbuf</literal>. If parsing was successful,
a Z39.50 RPN Query is returned, which is created using ODR stream
<literal>o</literal>. If parsing failed, a NULL pointer is
returned.
Function <function>yaz_pqf_scan</function> takes a scan query in
<literal>qbuf</literal>. If parsing was successful, the function
returns a pointer to the attributes plus term and modifies
<literal>attributeSetId</literal> to hold the attribute set for the
scan request; both are allocated using ODR stream <literal>o</literal>.
If parsing failed, <function>yaz_pqf_scan</function> returns a NULL pointer.
Error information for bad queries can be obtained by a call to
<function>yaz_pqf_error</function>, which returns an error code,
modifies <literal>*msg</literal> to point to an error description,
and sets <literal>*off</literal> to the offset within the last
query where parsing failed.

The second set of functions is declared as follows:

#include <yaz/pquery.h>

Z_RPNQuery *p_query_rpn (ODR o, oid_proto proto, const char *qbuf);

Z_AttributesPlusTerm *p_query_scan (ODR o, oid_proto proto,
                                    Odr_oid **attributeSetP, const char *qbuf);

int p_query_attset (const char *arg);

The function <function>p_query_rpn()</function> takes as arguments an
&odr; stream (see section <link linkend="odr">The ODR Module</link>)
to provide a memory source (the structure created is released on
the next call to <function>odr_reset()</function> on the stream), a
protocol identifier (one of the constants <token>PROTO_Z3950</token> and
<token>PROTO_SR</token>), and
finally a null-terminated string holding the query string.

If the parse went well, <function>p_query_rpn()</function> returns a
pointer to a <literal>Z_RPNQuery</literal> structure which can be
placed directly into a <literal>Z_SearchRequest</literal>.
If parsing failed due to a syntax error, a NULL pointer is returned.

The function <literal>p_query_attset</literal> specifies which attribute set
to use if the query doesn't specify one with the
<literal>@attrset</literal> operator.
It returns 0 if the argument is a
valid attribute set specifier; otherwise it returns -1.

The grammar of the PQF is as follows:

query ::= top-set query-struct.

top-set ::= [ '@attrset' string ]

query-struct ::= attr-spec | simple | complex | '@term' term-type

attr-spec ::= '@attr' [ string ] string query-struct

complex ::= operator query-struct query-struct.

operator ::= '@and' | '@or' | '@not' | '@prox' proximity.

simple ::= result-set | term.

result-set ::= '@set' string.

proximity ::= exclusion distance ordered relation which-code unit-code.

exclusion ::= '1' | '0' | 'void'.

distance ::= integer.

ordered ::= '1' | '0'.

relation ::= integer.

which-code ::= 'known' | 'private' | integer.

unit-code ::= integer.

term-type ::= 'general' | 'numeric' | 'string' | 'oid' |

You will note that the syntax above is a fairly faithful
representation of RPN, except for the Attribute, which has been
moved a step away from the term, allowing you to associate one or more
attributes with an entire query structure. The parser will
automatically apply the given attributes to each term as required.

The @attr operator is followed by an attribute specification
(<literal>attr-spec</literal> above). The specification consists
of an optional attribute set, an attribute type-value pair and
a sub-query. The attribute type-value pair is packed in one string:
an attribute type, an equals sign, and an attribute value.
The type is always an integer, but the value may be either an
integer or a string (if it doesn't start with a digit character).
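
The packed type-value string can be taken apart with a few lines of C.
What follows is a minimal sketch and not the code used by the pquery
module: the names attr_pair and split_attr_pair are hypothetical, and the
sketch assumes that an equals sign separates type and value, as in the
query examples of this section (for instance 1=4).

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical type, not part of YAZ: the parsed form of "type=value". */
struct attr_pair {
    int type;               /* attribute type, always an integer */
    int is_numeric;         /* 1 if the value is an integer, 0 if a string */
    long num_value;         /* meaningful when is_numeric is 1 */
    const char *str_value;  /* points into the input when is_numeric is 0 */
};

/* Split "type=value". Returns 0 on success, -1 on malformed input. */
int split_attr_pair(const char *spec, struct attr_pair *out)
{
    char *end;
    long type;

    assert(spec != NULL && out != NULL);
    if (!isdigit((unsigned char) *spec))
        return -1;                          /* type must be an integer */
    type = strtol(spec, &end, 10);
    if (*end != '=' || end[1] == '\0')
        return -1;                          /* separator or value missing */
    out->type = (int) type;
    end++;                                  /* skip the '=' */
    if (isdigit((unsigned char) *end)) {
        /* a value starting with a digit is treated as an integer */
        out->is_numeric = 1;
        out->num_value = strtol(end, NULL, 10);
        out->str_value = NULL;
    } else {
        out->is_numeric = 0;
        out->num_value = 0;
        out->str_value = end;
    }
    return 0;
}
```

With this sketch, "1=4" yields type 1 with the integer value 4, while
"1=/book/title" yields type 1 with the string value /book/title.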

Z39.50 version 3 defines various encodings of terms.
Use the @term operator to indicate the encoding type:
<literal>general</literal>, <literal>numeric</literal>,
<literal>string</literal> (for InternationalString), and so on.
If no term type has been given, the <literal>general</literal> form
is used, which is the only encoding allowed in both version 2 and
version 3 of the Z39.50 standard.

The following are all examples of valid queries in the PQF.

@or "dylan" "zimmerman"

@or @and bob dylan @set Result-1

@attr 4=1 @and @attr 1=1 "bob dylan" @attr 1=4 "slow train coming"

@attr 4=1 @attr 1=4 "self portrait"

@prox 0 3 1 2 k 2 dylan zimmerman

@and @attr 2=4 @attr gils 1=2038 -114 @attr 2=2 @attr gils 1=2039 -109

@term string "a UTF-8 string, maybe?"

@attr 1=/book/title computer
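
To illustrate why a prefix format is simple to interpret, here is a
minimal sketch of a recursive reader for a small subset of the PQF. It is
not the pquery implementation. It assumes blank-separated, unquoted terms
and handles only @and, @or and @not; every other operator is rejected.
The function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Copy the next blank-separated token into tok (truncating long ones).
   Returns the position after the token, or NULL when input is exhausted. */
static const char *next_token(const char *s, char *tok, size_t max)
{
    size_t n = 0;

    while (*s == ' ')
        s++;
    if (*s == '\0')
        return NULL;
    while (*s != '\0' && *s != ' ') {
        if (n + 1 < max)
            tok[n++] = *s;
        s++;
    }
    tok[n] = '\0';
    return s;
}

/* Append src to out (capacity cap). Returns -1 if it does not fit. */
static int append(char *out, size_t cap, const char *src)
{
    if (strlen(out) + strlen(src) + 1 > cap)
        return -1;
    strcat(out, src);
    return 0;
}

/* Read one query structure at *pp and append its infix form to out. */
static int to_infix(const char **pp, char *out, size_t cap)
{
    char tok[64];
    const char *rest = next_token(*pp, tok, sizeof tok);

    if (rest == NULL)
        return -1;                      /* an operand is missing */
    *pp = rest;
    if (!strcmp(tok, "@and") || !strcmp(tok, "@or") || !strcmp(tok, "@not")) {
        /* the operator comes first, followed by exactly two operands */
        if (append(out, cap, "(") || to_infix(pp, out, cap)
            || append(out, cap, " ") || append(out, cap, tok + 1)
            || append(out, cap, " ") || to_infix(pp, out, cap)
            || append(out, cap, ")"))
            return -1;
        return 0;
    }
    if (tok[0] == '@')
        return -1;                      /* operator outside this subset */
    return append(out, cap, tok);
}

/* Convert a whole query. Returns 0 on success, -1 on any error,
   including tokens left over after the query structure. */
int pqf_subset_to_infix(const char *query, char *out, size_t cap)
{
    char tok[64];

    assert(query != NULL && out != NULL);
    if (cap == 0)
        return -1;
    out[0] = '\0';
    if (to_infix(&query, out, cap) != 0)
        return -1;
    return next_token(query, tok, sizeof tok) == NULL ? 0 : -1;
}
```

Because each operator announces itself before its operands, the reader
never needs to look ahead or keep an explicit stack; the recursion depth
mirrors the nesting of the query.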

<sect2><title id="CCL">Common Command Language</title>

Not all users enjoy typing in prefix query structures and numerical
attribute values, even in a minimalistic test client. In the library
world, the more intuitive Common Command Language (or ISO 8777) has
enjoyed some popularity, especially before the widespread
availability of graphical interfaces. It is still useful in
applications where, for one reason or another, you need to provide a
symbolic language for expressing boolean query structures.

The <ulink url="http://europagate.dtv.dk/">EUROPAGATE</ulink>
research project, working under the Libraries programme
of the European Commission's DG XIII, has, amongst other useful tools,
implemented a general-purpose CCL parser which produces an output
structure that can be trivially converted to the internal RPN
representation of &yaz; (the <literal>Z_RPNQuery</literal> structure).
Since the CCL utility, along with the rest of the software
produced by EUROPAGATE, is made freely available under a liberal
license, it is included as a supplement to &yaz;.

<sect3><title>CCL Syntax</title>

The CCL parser obeys the following grammar for the FIND argument.
The syntax is annotated with comments on the lines prefixed by
<literal>‐‐</literal>.

CCL-Find ::= CCL-Find Op Elements

Op ::= "and" | "or" | "not"
-- The above means that Elements are separated by boolean operators.

Elements ::= '(' CCL-Find ')'
          | Qualifiers Relation Terms
          | Qualifiers Relation '(' CCL-Find ')'
          | Qualifiers '=' string '-' string
-- Elements is either a recursive definition, a result set reference, a
-- list of terms, qualifiers followed by terms, qualifiers followed
-- by a recursive definition or qualifiers in a range (lower - upper).

Set ::= 'set' = string
-- Reference to a result set

Terms ::= Terms Prox Term
-- Proximity of terms.

-- This basically means that a term may include a blank

Qualifiers ::= Qualifiers ',' string
-- Qualifiers is a list of strings separated by comma

Relation ::= '=' | '>=' | '<=' | '<>' | '>' | '<'
-- Relational operators. This really doesn't follow the ISO8777

-- Proximity operator

The following queries are all valid:

(dylan and bob) or set=1

Assuming that the qualifiers <literal>ti</literal>, <literal>au</literal>
and <literal>date</literal> are defined, we may use:

au=(bob dylan and slow train coming)

date>1980 and (ti=((self portrait)))

<sect3><title>CCL Qualifiers</title>

Qualifiers are used to direct the search to a particular searchable
index, such as the title (ti) and author (au) indexes. The CCL standard
itself doesn't specify a particular set of qualifiers, but it does
suggest a few short-hand notations. You can customize the CCL parser
to support a particular set of qualifiers to reflect the current target
profile. Traditionally, a qualifier would map to a particular
use-attribute within the BIB-1 attribute set. However, you could also
define qualifiers that would set, for example, the

Consider a scenario where the target supports ranked searches in the
title index. In this case, the user could specify

ti,ranked=knuth computer

and <literal>ranked</literal> would map to relation=relevance
(2=102) and <literal>ti</literal> would map to title (1=4).

A "profile" with a set of predefined CCL qualifiers can be read from a
file. The YAZ client reads its CCL qualifiers from a file named
<filename>default.bib</filename>. Each line in the file has the form:

<replaceable>qualifier-name</replaceable>
<replaceable>type</replaceable>=<replaceable>val</replaceable>
<replaceable>type</replaceable>=<replaceable>val</replaceable> ...

where <replaceable>qualifier-name</replaceable> is the name of the
qualifier to be used (e.g. <literal>ti</literal>),
<replaceable>type</replaceable> is a BIB-1 category type and
<replaceable>val</replaceable> is the corresponding BIB-1 attribute
value.
The <replaceable>type</replaceable> can be either numeric or one of
the letters <literal>u</literal> (use), <literal>r</literal> (relation),
<literal>p</literal> (position), <literal>s</literal> (structure),
<literal>t</literal> (truncation) or <literal>c</literal> (completeness).
The <replaceable>qualifier-name</replaceable> <literal>term</literal>
has a special meaning.
The types and values for this definition are used when
<emphasis>no</emphasis> qualifiers are present.
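
The line format described above can be read with a short routine. The
following is a minimal sketch, not the parser behind
<function>ccl_qual_file</function>: the names qual_def and parse_qual_line
are hypothetical, values are assumed to be plain integers, and the letters
are assumed to stand for BIB-1 attribute types 1 (use) through 6
(completeness).

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ATTRS 8

/* Hypothetical container for one qualifier definition line. */
struct qual_def {
    char name[32];          /* qualifier name, e.g. "ti" */
    int n;                  /* number of type=val pairs found */
    int type[MAX_ATTRS];    /* BIB-1 attribute types */
    int value[MAX_ATTRS];   /* BIB-1 attribute values */
};

/* Map the type field (a letter or a number) to an attribute type.
   Returns 0 if the field is not recognized. */
static int attr_type_number(const char *s, size_t len)
{
    static const char letters[] = "urpstc";     /* types 1..6, in order */
    const char *hit;
    size_t i;
    int n = 0;

    if (len == 1 && s[0] != '\0' && (hit = strchr(letters, s[0])) != NULL)
        return (int) (hit - letters) + 1;
    for (i = 0; i < len; i++) {
        if (!isdigit((unsigned char) s[i]))
            return 0;
        n = n * 10 + (s[i] - '0');
    }
    return n;
}

/* Parse "name type=val type=val ...". Returns 0 on success, -1 on error. */
int parse_qual_line(const char *line, struct qual_def *def)
{
    char buf[256];
    char *tok;

    assert(line != NULL && def != NULL);
    if (strlen(line) >= sizeof buf)
        return -1;
    strcpy(buf, line);                      /* strtok modifies its input */
    tok = strtok(buf, " \t\r\n");
    if (tok == NULL || strlen(tok) >= sizeof def->name)
        return -1;
    strcpy(def->name, tok);
    def->n = 0;
    while ((tok = strtok(NULL, " \t\r\n")) != NULL) {
        char *eq = strchr(tok, '=');
        int type;

        if (eq == NULL || def->n == MAX_ATTRS)
            return -1;
        type = attr_type_number(tok, (size_t) (eq - tok));
        if (type == 0 || !isdigit((unsigned char) eq[1]))
            return -1;
        def->type[def->n] = type;
        def->value[def->n] = atoi(eq + 1);
        def->n++;
    }
    return 0;
}
```

Under these assumptions, the line "ti u=4 s=1" produces the pairs (1, 4)
and (4, 1), that is, use-attribute 4 and structure-attribute 1.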

Consider the following definition:

Two qualifiers are defined, <literal>ti</literal> and
<literal>au</literal>.
They both set the structure-attribute to phrase (1).
<literal>ti</literal>
sets the use-attribute to 4. <literal>au</literal> sets the
When no qualifiers are used in the query the structure-attribute is
set to free-form-text (105).

<sect3><title>CCL API</title>

All public definitions can be found in the header file
<filename>ccl.h</filename>. A profile identifier is of type
<literal>CCL_bibset</literal>. A profile must be created with a call
to the function <function>ccl_qual_mk</function>, which returns a profile
handle of type <literal>CCL_bibset</literal>.

To read a file containing qualifier definitions, the function
<function>ccl_qual_file</function> may be convenient. This function
takes an already opened <literal>FILE</literal> handle pointer as
argument along with a <literal>CCL_bibset</literal> handle.

To parse a simple string with a FIND query, use the function

struct ccl_rpn_node *ccl_find_str (CCL_bibset bibset, const char *str,
                                   int *error, int *pos);

which takes the CCL profile (<literal>bibset</literal>) and query
(<literal>str</literal>) as input. Upon successful completion the RPN
tree is returned. If an error occurs, such as a syntax error, the integer
pointed to by <literal>error</literal> holds the error code and
<literal>pos</literal> holds the offset inside the query string at which
parsing failed.

An English representation of the error may be obtained by calling
the <literal>ccl_err_msg</literal> function. The error codes are
listed in <filename>ccl.h</filename>.

To convert the CCL RPN tree (type
<literal>struct ccl_rpn_node *</literal>)
to the Z_RPNQuery of YAZ, the function <function>ccl_rpn_query</function>
must be used. This function, which is part of YAZ, is implemented in
<filename>yaz-ccl.c</filename>.
After calling this function the CCL RPN tree is probably no longer
needed. The function <literal>ccl_rpn_delete</literal> destroys the CCL RPN tree.

A CCL profile may be destroyed by calling the
<function>ccl_qual_rm</function> function.

The token names for the CCL operators may be changed by setting the
globals (all of type <literal>char *</literal>)
<literal>ccl_token_and</literal>, <literal>ccl_token_or</literal>,
<literal>ccl_token_not</literal> and <literal>ccl_token_set</literal>.
An operator may have aliases, i.e. there may be more than one name for
the operator. To do this, separate each alias with a space character.
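
As an illustration of the alias convention, the following minimal sketch
checks whether a word matches any name in a space-separated alias list.
The function matches_alias is hypothetical and is not the lookup used
inside the CCL parser; it also assumes that the comparison is
case-sensitive, which the text above does not state.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if word equals one of the space-separated names in aliases,
   0 otherwise. The comparison is case-sensitive (an assumption). */
int matches_alias(const char *aliases, const char *word)
{
    size_t wlen;

    assert(aliases != NULL && word != NULL);
    wlen = strlen(word);
    if (wlen == 0)
        return 0;
    while (*aliases != '\0') {
        size_t len = strcspn(aliases, " ");     /* length of this name */

        if (len == wlen && memcmp(aliases, word, len) == 0)
            return 1;
        aliases += len;
        aliases += strspn(aliases, " ");        /* skip separators */
    }
    return 0;
}
```

With an alias list such as "and og", both "and" and "og" would be
accepted as names for the same operator.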

<sect1 id="tools.oid"><title>Object Identifiers</title>

The basic YAZ representation of an OID is an array of integers,
terminated with the value -1. The &odr; module provides two
utility functions to create and copy this type of data element:

Odr_oid *odr_getoidbystr(ODR o, char *str);

Creates an OID based on a string-based representation using dots (.)
to separate elements in the OID.

Odr_oid *odr_oiddup(ODR odr, Odr_oid *o);

Creates a copy of the OID referenced by the <emphasis>o</emphasis>
parameter.
Both functions take an &odr; stream as parameter. This stream is used to
allocate memory for the data elements, which is released on a
subsequent call to <function>odr_reset()</function> on that stream.
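
To make the array representation concrete, here is a minimal sketch that
converts a dotted string into a -1 terminated integer array. It is not
the implementation of <function>odr_getoidbystr</function>: the function
dotted_to_oid is hypothetical, it writes into a caller-supplied buffer
instead of allocating from an &odr; stream, and it does not guard against
arithmetic overflow in very large elements.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Parse a dotted OID string into dst, which has room for cap ints
   including the -1 terminator. Returns the number of elements stored,
   or -1 on malformed input or insufficient space. */
int dotted_to_oid(const char *str, int *dst, size_t cap)
{
    size_t n = 0;

    assert(str != NULL && dst != NULL);
    for (;;) {
        int arc = 0;

        /* each element must start with a digit and leave room for -1 */
        if (!isdigit((unsigned char) *str) || n + 1 >= cap)
            return -1;
        while (isdigit((unsigned char) *str))
            arc = arc * 10 + (*str++ - '0');
        dst[n++] = arc;
        if (*str == '\0')
            break;
        if (*str++ != '.')
            return -1;
    }
    dst[n] = -1;
    return (int) n;
}
```

For the input "1.2.840.10003.5.10" the sketch stores six elements
followed by the terminating -1.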

The OID module provides a higher-level representation of the
family of object identifiers which describe the Z39.50 protocol and its
related objects. The definition of the module interface is given in
the <filename>oid.h</filename> file.

The interface is mainly based on the <literal>oident</literal> structure.
The definition of this structure looks like this:

typedef struct oident
    int oidsuffix[OID_SIZE];

The proto field takes one of the values

If you don't care about talking to SR-based implementations (few
exist, and they may become fewer still if and when the ISO SR and ANSI
Z39.50 documents are merged into a single standard), you can ignore
this field on incoming packages, and always set it to PROTO_Z3950
for outgoing packages.

The oclass field takes one of the values

corresponding to the OID classes defined by the Z39.50 standard.

Finally, the value field takes one of the values

again, corresponding to the specific OIDs defined by the standard.

The desc field contains a brief, mnemonic name for the OID in question.

struct oident *oid_getentbyoid(int *o);

takes as argument an OID, and returns a pointer to a static area
containing an <literal>oident</literal> structure. You typically use
this function when you receive a PDU containing an OID, and you wish
to branch out depending on the specific OID value.

int *oid_ent_to_oid(struct oident *ent, int *dst);

Takes as argument an <literal>oident</literal> structure, in which
the <literal>proto</literal>, <literal>oclass</literal>, and
<literal>value</literal> fields are assumed to be set correctly,
and returns a pointer to the buffer given by <literal>dst</literal>,
which has been filled in with the
representation of the corresponding OID. The function returns
NULL and the array dst is unchanged if the mapping could not take place.
The array <literal>dst</literal> should be at least of size
<literal>OID_SIZE</literal>.

The <function>oid_ent_to_oid()</function> function can be used whenever
you need to prepare a PDU containing one or more OIDs. The separation of
the <literal>protocol</literal> element from the remainder of the
OID-description makes it simple to write applications that can
communicate with either Z39.50 or OSI SR-based applications.

oid_value oid_getvalbyname(const char *name);

takes as argument a mnemonic OID name, and returns the
<literal>value</literal> field of the first entry in the database that
contains the given name in its <literal>desc</literal> field.

Finally, the module provides the following utility functions, whose
meaning should be obvious:

void oid_oidcpy(int *t, int *s);
void oid_oidcat(int *t, int *s);
int oid_oidcmp(int *o1, int *o2);
int oid_oidlen(int *o);
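
For readers who want the semantics spelled out, the following minimal
sketch shows one plausible behaviour of such helpers on -1 terminated
arrays. The sk_ prefixed functions are hypothetical stand-ins and not the
YAZ implementations; in particular, the ordering returned by sk_oidcmp
for unequal OIDs is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements before the -1 terminator. */
int sk_oidlen(const int *o)
{
    int len = 0;

    assert(o != NULL);
    while (o[len] >= 0)
        len++;
    return len;
}

/* Copy s, including its terminator, to t. */
void sk_oidcpy(int *t, const int *s)
{
    while (*s >= 0)
        *t++ = *s++;
    *t = -1;
}

/* Append s to the OID already stored in t; t must be large enough. */
void sk_oidcat(int *t, const int *s)
{
    sk_oidcpy(t + sk_oidlen(t), s);
}

/* Return 0 if equal, otherwise -1 or 1 by the first differing element.
   A proper prefix compares as smaller than the longer OID. */
int sk_oidcmp(const int *o1, const int *o2)
{
    while (*o1 == *o2 && *o1 >= 0) {
        o1++;
        o2++;
    }
    if (*o1 == *o2)
        return 0;                   /* both reached the terminator */
    return *o1 > *o2 ? 1 : -1;
}
```

Note that the caller is responsible for sizing the destination arrays;
none of these helpers can check the capacity of the buffer they write to.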

The OID module has been criticized, and perhaps rightly so,
for needlessly abstracting the
representation of OIDs. Other toolkits use a simple
string-representation of OIDs with good results. In practice, we have
found the interface comfortable and quick to work with, and it is a
simple matter (for what it's worth) to create applications compatible
with both ISO SR and Z39.50. Finally, the use of the
<literal>oident</literal> database is by no means mandatory.
You can easily create your own system for representing OIDs, as long
as it is compatible with the low-level integer-array representation

<sect1 id="tools.nmem"><title>Nibble Memory</title>

Sometimes when you need to allocate and construct a large,
interconnected complex of structures, it can be a bit of a pain to
release the associated memory again. For the structures describing the
Z39.50 PDUs and related structures, it is convenient to use the
memory-management system of the &odr; subsystem (see
<link linkend="odr-use">Using ODR</link>). However, in some circumstances
where you might otherwise benefit from using a simple nibble memory
management system, it may be impractical to use
<function>odr_malloc()</function> and <function>odr_reset()</function>.
For this purpose, the memory manager which also supports the &odr;
streams is made available in the NMEM module. The external interface
to this module is given in the <filename>nmem.h</filename> file.

The following prototypes are given:

NMEM nmem_create(void);
void nmem_destroy(NMEM n);
void *nmem_malloc(NMEM n, int size);
void nmem_reset(NMEM n);
int nmem_total(NMEM n);
void nmem_init(void);
void nmem_exit(void);

The <function>nmem_create()</function> function returns a pointer to a
memory control handle, which can be released again by
<function>nmem_destroy()</function> when no longer needed.
The function <function>nmem_malloc()</function> allocates a block of
memory of the requested size. A call to <function>nmem_reset()</function>
or <function>nmem_destroy()</function> will release all memory allocated
on the handle since it was created (or since the last call to
<function>nmem_reset()</function>). The function
<function>nmem_total()</function> returns the number of bytes currently
allocated on the handle.
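
The idea behind this style of allocation can be sketched in a few lines.
The code below is an illustration only and not the NMEM implementation:
all sk_ prefixed names are hypothetical, the block size of 4096 bytes is
an arbitrary choice, and the sketch makes no attempt at thread safety.
Small requests are carved out of larger blocks, and all blocks are
released together.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SK_BLOCK_SIZE 4096          /* arbitrary default block payload */

/* Alignment unit large enough for the common scalar types. */
union sk_align { long double ld; long long ll; void *p; };
#define SK_ALIGN sizeof(union sk_align)
#define SK_ROUND(x) (((x) + SK_ALIGN - 1) / SK_ALIGN * SK_ALIGN)

struct sk_block {
    struct sk_block *next;          /* previously allocated block */
    size_t used;                    /* bytes handed out from this block */
    size_t size;                    /* payload capacity of this block */
};

struct sk_nmem {
    struct sk_block *blocks;        /* most recent block first */
    size_t total;                   /* bytes requested by the caller */
};

struct sk_nmem *sk_nmem_create(void)
{
    struct sk_nmem *n = malloc(sizeof *n);

    if (n != NULL) {
        n->blocks = NULL;
        n->total = 0;
    }
    return n;
}

void *sk_nmem_malloc(struct sk_nmem *n, size_t size)
{
    size_t need = SK_ROUND(size);
    size_t hdr = SK_ROUND(sizeof(struct sk_block));
    struct sk_block *b;
    void *p;

    assert(n != NULL);
    b = n->blocks;
    if (b == NULL || b->size - b->used < need) {
        /* current block is full: start a new one, big enough for need */
        size_t cap = need > SK_BLOCK_SIZE ? need : SK_BLOCK_SIZE;

        b = malloc(hdr + cap);
        if (b == NULL)
            return NULL;
        b->next = n->blocks;
        b->used = 0;
        b->size = cap;
        n->blocks = b;
    }
    p = (char *) b + hdr + b->used;
    b->used += need;
    n->total += size;
    return p;
}

/* Release every block at once; individual frees are never needed. */
void sk_nmem_reset(struct sk_nmem *n)
{
    assert(n != NULL);
    while (n->blocks != NULL) {
        struct sk_block *next = n->blocks->next;

        free(n->blocks);
        n->blocks = next;
    }
    n->total = 0;
}

size_t sk_nmem_total(const struct sk_nmem *n)
{
    return n->total;
}

void sk_nmem_destroy(struct sk_nmem *n)
{
    if (n != NULL) {
        sk_nmem_reset(n);
        free(n);
    }
}
```

The design trades the ability to free single objects for a very cheap
allocation path and a single release point, which suits structures that
are built up together and discarded together.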

The nibble memory pool is shared amongst threads. POSIX
mutexes and WIN32 critical sections are used to keep the
module thread safe. Function <function>nmem_init()</function>
initializes the nibble memory library and is called automatically
the first time the <literal>YAZ.DLL</literal> is loaded. &yaz; uses
function <function>DllMain</function> to achieve this. You should
<emphasis>not</emphasis> call <function>nmem_init</function> or
<function>nmem_exit</function> unless you're absolutely sure what
you're doing. Note that in previous &yaz; versions you had to call
<function>nmem_init</function> yourself.

<!-- Keep this comment at the end of the file
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-parent-document: "yaz.xml"
sgml-local-catalogs: nil
sgml-namecase-general:t