<!-- $Id: tools.xml,v 1.4 2001-08-08 19:33:21 adam Exp $ -->
<chapter><title>Supporting Tools</title>
In support of the service API - primarily the ASN module, which
provides the programmatic interface to the Z39.50 APDUs - &yaz; contains
a collection of tools that support the development of applications.
<sect1><title>Query Syntax Parsers</title>
Since the type-1 (RPN) query structure has no direct, useful string
representation, every origin application needs to provide some form of
mapping from a local query notation or representation to a
<token>Z_RPNQuery</token> structure. Some programmers will prefer to
construct the query manually, perhaps using
<function>odr_malloc()</function> to simplify memory management.
The &yaz; distribution includes two separate, query-generating tools
that may be of use to you.
<sect2><title id="PQF">Prefix Query Format</title>
Since RPN, or Reverse Polish Notation, is really just a fancy way of
describing a suffix notation format (operator follows operands), it
would seem that the confusion is total when we now introduce a prefix
notation for RPN. The reason is one of simple laziness - it's somewhat
simpler to interpret a prefix format, and this utility was designed
for maximum simplicity, to provide a baseline representation for use
in simple test applications and scripting environments (like Tcl). The
demonstration client included with YAZ uses the PQF.
The PQF is defined by the pquery module in the YAZ library. The
<filename>pquery.h</filename> file provides the declarations of the
following functions:
<synopsis>
Z_RPNQuery *p_query_rpn (ODR o, oid_proto proto, const char *qbuf);

Z_AttributesPlusTerm *p_query_scan (ODR o, oid_proto proto,
                                    Odr_oid **attributeSetP, const char *qbuf);

int p_query_attset (const char *arg);
</synopsis>
The function <function>p_query_rpn()</function> takes three arguments: an
&odr; stream (see section <link linkend="odr">The ODR Module</link>)
that provides the memory source (the structure created is released on
the next call to <function>odr_reset()</function> on the stream), a
protocol identifier (one of the constants <token>PROTO_Z3950</token> and
<token>PROTO_SR</token>), and a null-terminated string holding the query.
If the parse succeeds, <function>p_query_rpn()</function> returns a
pointer to a <literal>Z_RPNQuery</literal> structure which can be
placed directly into a <literal>Z_SearchRequest</literal>.
The function <function>p_query_attset()</function> specifies which
attribute set to use if the query doesn't specify one with the
<literal>@attrset</literal> operator. It returns 0 if the argument is a
valid attribute set specifier; otherwise it returns -1.
The grammar of the PQF is as follows:
<screen>
Query ::= [ '@attrset' AttSet ] QueryStruct.

AttSet ::= string.

QueryStruct ::= [ Attribute ] Simple | Complex.

Attribute ::= '@attr' [ AttSet ] AttributeType '=' AttributeValue.

AttributeType ::= integer.

AttributeValue ::= integer.

Complex ::= Operator QueryStruct QueryStruct.

Operator ::= '@and' | '@or' | '@not' | '@prox' Proximity.

Simple ::= ResultSet | Term.

ResultSet ::= '@set' string.

Term ::= string | '"' string '"'.

Proximity ::= Exclusion Distance Ordered Relation WhichCode UnitCode.

Exclusion ::= '1' | '0' | 'void'.

Distance ::= integer.

Ordered ::= '1' | '0'.

Relation ::= integer.

WhichCode ::= 'known' | 'private' | integer.

UnitCode ::= integer.
</screen>
You will note that the syntax above is a fairly faithful
representation of RPN, except for the Attribute, which has been
moved a step away from the term, allowing you to associate one or more
attributes with an entire query structure. The parser will
automatically apply the given attributes to each term as required.
The following are all examples of valid queries in the PQF.
<screen>
@or "dylan" "zimmerman"

@or @and bob dylan @set Result-1

@attr 4=1 @and @attr 1=1 "bob dylan" @attr 1=4 "slow train coming"

@attr 4=1 @attr 1=4 "self portrait"

@prox 0 3 1 2 k 2 dylan zimmerman

@and @attr 2=4 @attr gils 1=2038 -114 @attr 2=2 @attr gils 1=2039 -109
</screen>
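To see why a prefix format is simple to interpret, here is a minimal
sketch in C of a recursive reader for a small subset of the grammar above
(the boolean operators and plain or quoted terms only). It is not the YAZ
parser: <literal>pqf_to_infix</literal>, <literal>parse_struct</literal>,
<literal>next_token</literal> and <literal>append</literal> are hypothetical
names made up for this illustration, and attributes,
<literal>@set</literal> and <literal>@prox</literal> are left out.

<screen><![CDATA[
```c
#include <string.h>

/* Hypothetical helpers for illustration; none of these names exist in YAZ. */

/* Copy the next token from *p into tok. Returns 0 at end of input,
   1 for a bare word and 2 for a double-quoted string. */
static int next_token(const char **p, char *tok, size_t max)
{
    size_t n = 0;
    int quoted;

    while (**p == ' ')
        (*p)++;
    if (**p == '\0')
        return 0;
    quoted = (**p == '"');
    if (quoted)
        (*p)++;
    while (**p != '\0' && **p != (quoted ? '"' : ' ')) {
        if (n + 1 < max)
            tok[n++] = **p;
        (*p)++;
    }
    if (quoted && **p == '"')
        (*p)++;
    tok[n] = '\0';
    return quoted ? 2 : 1;
}

/* Append s to out without overflowing a buffer of max bytes. */
static void append(char *out, size_t max, const char *s)
{
    strncat(out, s, max - strlen(out) - 1);
}

/* QueryStruct ::= Operator QueryStruct QueryStruct | Term
   (attributes, @set and @prox are left out of this sketch). */
static int parse_struct(const char **p, char *out, size_t max)
{
    char tok[64];
    int kind = next_token(p, tok, sizeof tok);

    if (kind == 0)
        return -1;                      /* ran out of input */
    if (kind == 2 || tok[0] != '@') {   /* a plain or quoted term */
        append(out, max, tok);
        return 0;
    }
    if (strcmp(tok, "@and") && strcmp(tok, "@or") && strcmp(tok, "@not"))
        return -1;                      /* operator not handled here */
    append(out, max, "(");
    if (parse_struct(p, out, max) < 0)  /* left operand */
        return -1;
    append(out, max, " ");
    append(out, max, tok + 1);          /* operator name without '@' */
    append(out, max, " ");
    if (parse_struct(p, out, max) < 0)  /* right operand */
        return -1;
    append(out, max, ")");
    return 0;
}

/* Convert a PQF subset to infix text. Returns 0 on success, -1 on error. */
int pqf_to_infix(const char *pqf, char *out, size_t max)
{
    char extra[64];

    if (max == 0)
        return -1;
    out[0] = '\0';
    if (parse_struct(&pqf, out, max) < 0)
        return -1;
    /* Anything left over means the query was not a single QueryStruct. */
    return next_token(&pqf, extra, sizeof extra) == 0 ? 0 : -1;
}
```
]]></screen>

Because the operator comes first, each call knows how many operands to
read before it returns, so no operator stack is needed. Given the query
<literal>@or @and bob dylan "slow train"</literal>, the sketch produces
<literal>((bob and dylan) or slow train)</literal>; an operator that lacks
an operand makes it return -1.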
<sect2><title id="CCL">Common Command Language</title>
Not all users enjoy typing in prefix query structures and numerical
attribute values, even in a minimalistic test client. In the library
world, the more intuitive Common Command Language (or ISO 8777) has
enjoyed some popularity - especially before the widespread
availability of graphical interfaces. It is still useful in
applications where you need to provide a symbolic language for
expressing boolean query structures.
The EUROPAGATE research project, working under the Libraries programme
of the European Commission's DG XIII, has, amongst other useful tools,
implemented a general-purpose CCL parser which produces an output
structure that can be trivially converted to the internal RPN
representation of YAZ (the <literal>Z_RPNQuery</literal> structure).
Since the CCL utility - along with the rest of the software
produced by EUROPAGATE - is made freely available under a liberal
license, it is included as a supplement to YAZ.
<sect3><title>CCL Syntax</title>
The CCL parser obeys the following grammar for the FIND argument.
The syntax is annotated by comments on the lines prefixed by
<literal>--</literal>.
<screen>
CCL-Find ::= CCL-Find Op Elements
          | Elements.

Op ::= "and" | "or" | "not"
-- The above means that Elements are separated by boolean operators.

Elements ::= '(' CCL-Find ')'
          | Set
          | Terms
          | Qualifiers Relation Terms
          | Qualifiers Relation '(' CCL-Find ')'
          | Qualifiers '=' string '-' string
-- Elements is either a recursive definition, a result set reference, a
-- list of terms, qualifiers followed by terms, qualifiers followed
-- by a recursive definition or qualifiers in a range (lower - upper).

Set ::= 'set' = string
-- Reference to a result set

Terms ::= Terms Prox Term
       | Term
-- Proximity of terms.

Term ::= Term string
      | string
-- This basically means that a term may include a blank

Qualifiers ::= Qualifiers ',' string
            | string
-- Qualifiers is a list of strings separated by comma

Relation ::= '=' | '&gt;=' | '&lt;=' | '&lt;&gt;' | '&gt;' | '&lt;'
-- Relational operators. This really doesn't follow the ISO8777
-- standard.

Prox ::= '%' | '!'
-- Proximity operator
</screen>
The following queries are all valid:
<screen>
(dylan and bob) or set=1
</screen>
Assuming that the qualifiers <literal>ti</literal>, <literal>au</literal>
and <literal>date</literal> are defined, we may use:
<screen>
au=(bob dylan and slow train coming)

date&gt;1980 and (ti=((self portrait)))
</screen>
<sect3><title>CCL Qualifiers</title>
Qualifiers are used to direct the search to a particular searchable
index, such as the title (ti) and author (au) indexes. The CCL standard
itself doesn't specify a particular set of qualifiers, but it does
suggest a few short-hand notations. You can customize the CCL parser
to support a particular set of qualifiers to reflect the current target
profile. Traditionally, a qualifier would map to a particular
use-attribute within the BIB-1 attribute set. However, you could also
define qualifiers that set other attribute types.
Consider a scenario where the target supports ranked searches in the
title index. In this case, the user could specify
<screen>
ti,ranked=knuth computer
</screen>
and <literal>ranked</literal> would map to relation=relevance
(2=102) while <literal>ti</literal> would map to title (1=4).
A "profile" with a set of predefined CCL qualifiers can be read from a
file. The YAZ client reads its CCL qualifiers from a file named
<filename>default.bib</filename>. Each line in the file has the form:
<screen>
<replaceable>qualifier-name</replaceable> <replaceable>type</replaceable>=<replaceable>val</replaceable> <replaceable>type</replaceable>=<replaceable>val</replaceable> ...
</screen>
where <replaceable>qualifier-name</replaceable> is the name of the
qualifier to be used (e.g. <literal>ti</literal>),
<replaceable>type</replaceable> is a BIB-1 category type and
<replaceable>val</replaceable> is the corresponding BIB-1 attribute
value.
The <replaceable>type</replaceable> can be either numeric or one of
the letters <literal>u</literal> (use), <literal>r</literal> (relation),
<literal>p</literal> (position), <literal>s</literal> (structure),
<literal>t</literal> (truncation) or <literal>c</literal> (completeness).
The <replaceable>qualifier-name</replaceable> <literal>term</literal>
has a special meaning: the types and values for this definition are
used when <emphasis>no</emphasis> qualifiers are present.
Consider a profile that defines two qualifiers, <literal>ti</literal> and
<literal>au</literal>, together with the special
<literal>term</literal> entry.
Both qualifiers set the structure-attribute to phrase (1).
<literal>ti</literal> sets the use-attribute to 4, and
<literal>au</literal> sets a use-attribute of its own.
When no qualifiers are used in the query, the structure-attribute is
set to free-form-text (105).
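The line format described above is easy to process by hand. The following
is a minimal sketch in C, not the code used by YAZ: the names
<literal>struct qual_def</literal>, <literal>type_code</literal> and
<literal>parse_qual_line</literal> are hypothetical. It assumes that the
letters map to the BIB-1 attribute types use=1, relation=2, position=3,
structure=4, truncation=5 and completeness=6, and that a line holds at most
eight <replaceable>type</replaceable>=<replaceable>val</replaceable> pairs.

<screen><![CDATA[
```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ATTRS 8   /* arbitrary limit chosen for this sketch */

/* Hypothetical structure: one parsed line of a qualifier profile. */
struct qual_def {
    char name[32];
    int type[MAX_ATTRS];    /* BIB-1 attribute type, e.g. 1 for use */
    int value[MAX_ATTRS];   /* BIB-1 attribute value */
    int count;
};

/* Map a type field to its number: either digits or one of u r p s t c. */
static int type_code(const char *s)
{
    if (isdigit((unsigned char) s[0]))
        return atoi(s);
    if (s[0] == '\0' || s[1] != '\0')
        return -1;
    switch (s[0]) {
    case 'u': return 1;   /* use */
    case 'r': return 2;   /* relation */
    case 'p': return 3;   /* position */
    case 's': return 4;   /* structure */
    case 't': return 5;   /* truncation */
    case 'c': return 6;   /* completeness */
    }
    return -1;
}

/* Parse "name type=val type=val ...". Returns 0 on success, -1 on error. */
int parse_qual_line(const char *line, struct qual_def *q)
{
    char buf[256];
    char *tok;

    if (strlen(line) >= sizeof buf)
        return -1;
    strcpy(buf, line);                  /* strtok() modifies its input */
    tok = strtok(buf, " \t\n");
    if (tok == NULL || strlen(tok) >= sizeof q->name)
        return -1;
    strcpy(q->name, tok);
    q->count = 0;
    while ((tok = strtok(NULL, " \t\n")) != NULL) {
        char *eq = strchr(tok, '=');
        int t;

        if (eq == NULL || q->count == MAX_ATTRS)
            return -1;
        *eq = '\0';                     /* split "type=val" in place */
        t = type_code(tok);
        if (t < 0)
            return -1;
        q->type[q->count] = t;
        q->value[q->count] = atoi(eq + 1);
        q->count++;
    }
    return 0;
}
```
]]></screen>

With this sketch, a line such as <literal>ti u=4 s=1</literal> yields the
pairs (1, 4) and (4, 1), which corresponds to the
<literal>ti</literal> qualifier described above, and
<literal>term s=105</literal> yields the single pair (4, 105).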
<sect3><title>CCL API</title>
All public definitions can be found in the header file
<filename>ccl.h</filename>. A profile is identified by a handle of type
<literal>CCL_bibset</literal>, which must be created with a call
to the function <function>ccl_qual_mk</function>.
To read a file containing qualifier definitions, the function
<function>ccl_qual_file</function> may be convenient. This function
takes an already opened <literal>FILE</literal> pointer as an
argument, along with a <literal>CCL_bibset</literal> handle.
To parse a simple string with a FIND query, use the function
<synopsis>
struct ccl_rpn_node *ccl_find_str (CCL_bibset bibset, const char *str,
                                   int *error, int *pos);
</synopsis>
which takes the CCL profile (<literal>bibset</literal>) and query
(<literal>str</literal>) as input. Upon successful completion, the RPN
tree is returned. If an error occurs, such as a syntax error, the integer
pointed to by <literal>error</literal> holds the error code and the
integer pointed to by <literal>pos</literal> holds the offset inside the
query string at which the error occurred.
An English representation of the error may be obtained by calling
the <literal>ccl_err_msg</literal> function. The error codes are
listed in <filename>ccl.h</filename>.
To convert the CCL RPN tree (type
<literal>struct ccl_rpn_node *</literal>)
to the Z_RPNQuery of YAZ, the function <function>ccl_rpn_query</function>
must be used. This function, which is part of YAZ, is implemented in
<filename>yaz-ccl.c</filename>.
After calling this function, the CCL RPN tree is probably no longer
needed. The function <literal>ccl_rpn_delete</literal> destroys the CCL
RPN tree.
A CCL profile may be destroyed by calling the
<function>ccl_qual_rm</function> function.
The token names for the CCL operators may be changed by setting the
globals (all type <literal>char *</literal>)
<literal>ccl_token_and</literal>, <literal>ccl_token_or</literal>,
<literal>ccl_token_not</literal> and <literal>ccl_token_set</literal>.
An operator may have aliases, i.e. there may be more than one name for
the operator. To do this, separate each alias with a space character.
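The alias convention can be illustrated with a small sketch in C. The
function <literal>alias_matches</literal> is hypothetical and is not part
of the CCL module; it only shows how a space-separated alias list can be
scanned, and it assumes an exact, case-sensitive comparison, which may
differ from what the parser itself does.

<screen><![CDATA[
```c
#include <string.h>

/* Hypothetical helper: returns 1 if word equals one of the
   space-separated aliases in list (exact comparison), else 0. */
int alias_matches(const char *list, const char *word)
{
    size_t wlen = strlen(word);

    while (*list != '\0') {
        size_t len = strcspn(list, " ");   /* length of the current alias */

        if (len > 0 && len == wlen && strncmp(list, word, len) == 0)
            return 1;
        list += len;
        while (*list == ' ')               /* skip the separator(s) */
            list++;
    }
    return 0;
}
```
]]></screen>

For an alias list of <literal>and og</literal>, both
<literal>and</literal> and <literal>og</literal> match, while a prefix
such as <literal>an</literal> does not.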
<sect1><title>Object Identifiers</title>
The basic YAZ representation of an OID is an array of integers,
terminated with the value -1. The &odr; module provides two
utility functions to create and copy data elements of this type:
<synopsis>
Odr_oid *odr_getoidbystr(ODR o, char *str);
</synopsis>
Creates an OID based on a string-based representation using dots (.)
to separate elements in the OID.
<synopsis>
Odr_oid *odr_oiddup(ODR odr, Odr_oid *o);
</synopsis>
Creates a copy of the OID referenced by the <emphasis>o</emphasis>
parameter.
Both functions take an &odr; stream as parameter. This stream is used to
allocate memory for the data elements, which is released on a
subsequent call to <function>odr_reset()</function> on that stream.
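To make the array representation concrete, the following sketch in C
converts a dotted string into an integer array terminated by -1. It is
only an illustration of the data format:
<literal>oid_from_dotted</literal> is a hypothetical name, and unlike
<function>odr_getoidbystr()</function> it writes into a buffer supplied by
the caller instead of allocating memory on an &odr; stream.

<screen><![CDATA[
```c
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse a dotted OID string such as "1.2.840.10003"
   into dst and terminate it with -1. dst must have room for max ints.
   Returns the number of elements, or -1 on bad input or lack of room. */
int oid_from_dotted(const char *str, int *dst, int max)
{
    int n = 0;

    for (;;) {
        char *end;
        long v;

        if (!isdigit((unsigned char) *str))
            return -1;                  /* each element must be a number */
        v = strtol(str, &end, 10);
        if (v > INT_MAX || n + 1 >= max)
            return -1;                  /* keep room for the terminator */
        dst[n++] = (int) v;
        if (*end == '\0')
            break;
        if (*end != '.')
            return -1;
        str = end + 1;
    }
    dst[n] = -1;
    return n;
}
```
]]></screen>

Parsing <literal>1.2.840.10003.5.10</literal> this way fills the array
with the six elements followed by the terminating -1.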
The OID module provides a higher-level representation of the
family of object identifiers which describe the Z39.50 protocol and its
related objects. The definition of the module interface is given in
the <filename>oid.h</filename> file.
The interface is mainly based on the <literal>oident</literal> structure.
The definition of this structure looks like this:
<synopsis>
typedef struct oident
{
    oid_proto proto;
    oid_class oclass;
    oid_value value;
    int oidsuffix[OID_SIZE];
    char *desc;
} oident;
</synopsis>
The proto field takes one of the values
<screen>
PROTO_Z3950
PROTO_SR
</screen>
If you don't care about talking to SR-based implementations (few
exist, and they may become fewer still if and when the ISO SR and ANSI
Z39.50 documents are merged into a single standard), you can ignore
this field on incoming packages, and always set it to PROTO_Z3950
for outgoing packages.
The oclass field takes one of the values, enumerated in
<filename>oid.h</filename>, corresponding to the OID classes defined by
the Z39.50 standard.
Finally, the value field takes one of the values, again enumerated in
<filename>oid.h</filename>, corresponding to the specific OIDs defined by
the standard.
The desc field contains a brief, mnemonic name for the OID in question.
The function
<synopsis>
struct oident *oid_getentbyoid(int *o);
</synopsis>
takes as argument an OID, and returns a pointer to a static area
containing an <literal>oident</literal> structure. You typically use
this function when you receive a PDU containing an OID, and you wish
to branch out depending on the specific OID value.
The function
<synopsis>
int *oid_ent_to_oid(struct oident *ent, int *dst);
</synopsis>
takes as argument an <literal>oident</literal> structure - in which
the <literal>proto</literal>, <literal>oclass</literal>, and
<literal>value</literal> fields are assumed to be set correctly -
and returns a pointer to the buffer given by <literal>dst</literal>,
which then holds the
representation of the corresponding OID. The function returns
NULL and leaves the array dst unchanged if the mapping could not take
place. The array <literal>dst</literal> should be at least of size
<literal>OID_SIZE</literal>.
The <function>oid_ent_to_oid()</function> function can be used whenever
you need to prepare a PDU containing one or more OIDs. The separation of
the <literal>protocol</literal> element from the remainder of the
OID-description makes it simple to write applications that can
communicate with either Z39.50 or OSI SR-based applications.
The function
<synopsis>
oid_value oid_getvalbyname(const char *name);
</synopsis>
takes as argument a mnemonic OID name, and returns the
<literal>value</literal> field of the first entry in the database that
contains the given name in its <literal>desc</literal> field.
Finally, the module provides the following utility functions, whose
meaning should be obvious:
<synopsis>
void oid_oidcpy(int *t, int *s);
void oid_oidcat(int *t, int *s);
int oid_oidcmp(int *o1, int *o2);
int oid_oidlen(int *o);
</synopsis>
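For readers who prefer to see the meaning spelled out, the following is a
sketch in C of how such helpers can work on arrays terminated by -1. These
are not the YAZ implementations: the <literal>my_</literal> names are
hypothetical, and the ordering returned by <literal>my_oidcmp</literal>
(negative, zero or positive, in the style of
<function>strcmp()</function>) is an assumption made for this example.

<screen><![CDATA[
```c
/* Hypothetical stand-ins for illustration; not the YAZ functions. */

/* Number of elements before the -1 terminator. */
int my_oidlen(const int *o)
{
    int n = 0;

    while (o[n] >= 0)
        n++;
    return n;
}

/* Copy s, including its terminator, to t. */
void my_oidcpy(int *t, const int *s)
{
    int i = 0;

    do {
        t[i] = s[i];
    } while (s[i++] >= 0);
}

/* Append s to the OID already stored in t. */
void my_oidcat(int *t, const int *s)
{
    my_oidcpy(t + my_oidlen(t), s);
}

/* Compare element by element; a shorter OID sorts before a longer one
   that starts with the same elements, since the terminator is -1. */
int my_oidcmp(const int *a, const int *b)
{
    int i = 0;

    while (a[i] >= 0 && a[i] == b[i])
        i++;
    return (a[i] > b[i]) - (a[i] < b[i]);
}
```
]]></screen>

In every case the caller must make sure that the target array is large
enough, terminator included.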
The OID module has been criticized - and perhaps rightly so
- for needlessly abstracting the
representation of OIDs. Other toolkits use a simple
string-representation of OIDs with good results. In practice, we have
found the interface comfortable and quick to work with, and it is a
simple matter (for what it's worth) to create applications compatible
with both ISO SR and Z39.50. Finally, the use of the
<literal>oident</literal> database is by no means mandatory.
You can easily create your own system for representing OIDs, as long
as it is compatible with the low-level integer-array representation.
<sect1><title>Nibble Memory</title>
Sometimes when you need to allocate and construct a large,
interconnected complex of structures, it can be a bit of a pain to
release the associated memory again. For the structures describing the
Z39.50 PDUs and related structures, it is convenient to use the
memory-management system of the &odr; subsystem (see
<link linkend="odr-use">Using ODR</link>). However, in some circumstances
where you might otherwise benefit from using a simple nibble memory
management system, it may be impractical to use
<function>odr_malloc()</function> and <function>odr_reset()</function>.
For this purpose, the memory manager, which also supports the &odr;
streams, is made available in the NMEM module. The external interface
to this module is given in the <filename>nmem.h</filename> file.
The following prototypes are given:
<synopsis>
NMEM nmem_create(void);
void nmem_destroy(NMEM n);
void *nmem_malloc(NMEM n, int size);
void nmem_reset(NMEM n);
int nmem_total(NMEM n);
void nmem_init(void);
</synopsis>
The <function>nmem_create()</function> function returns a pointer to a
memory control handle, which can be released again by
<function>nmem_destroy()</function> when no longer needed.
The function <function>nmem_malloc()</function> allocates a block of
memory of the requested size. A call to <function>nmem_reset()</function>
or <function>nmem_destroy()</function> will release all memory allocated
on the handle since it was created (or since the last call to
<function>nmem_reset()</function>). The function
<function>nmem_total()</function> returns the number of bytes currently
allocated on the handle.
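The idea behind this style of allocation can be sketched in a few lines
of C. The code below is not the NMEM implementation: the
<literal>arena_</literal> names, the 4096-byte block size and the 16-byte
rounding are all assumptions chosen for the illustration. It only shows
why releasing everything at once is cheap: individual allocations are
carved out of larger blocks, and a reset frees the blocks instead of the
individual pieces.

<screen><![CDATA[
```c
#include <stdlib.h>

#define ARENA_BLOCK 4096   /* default block size (an arbitrary choice) */
#define ARENA_ALIGN 16     /* requests are rounded up to this multiple */

struct arena_block {
    struct arena_block *next;
    char *buf;
    size_t used, size;
};

typedef struct arena {
    struct arena_block *blocks;   /* most recent block first */
    size_t total;                 /* bytes handed out since the last reset */
} *ARENA;

ARENA arena_create(void)
{
    ARENA a = malloc(sizeof *a);

    if (a != NULL) {
        a->blocks = NULL;
        a->total = 0;
    }
    return a;
}

void *arena_malloc(ARENA a, size_t size)
{
    struct arena_block *b = a->blocks;
    void *p;

    size = (size + ARENA_ALIGN - 1) / ARENA_ALIGN * ARENA_ALIGN;
    if (b == NULL || b->size - b->used < size) {
        /* The current block cannot hold the request: chain a new one. */
        size_t cap = size > ARENA_BLOCK ? size : ARENA_BLOCK;

        b = malloc(sizeof *b);
        if (b == NULL)
            return NULL;
        b->buf = malloc(cap);
        if (b->buf == NULL) {
            free(b);
            return NULL;
        }
        b->size = cap;
        b->used = 0;
        b->next = a->blocks;
        a->blocks = b;
    }
    p = b->buf + b->used;
    b->used += size;
    a->total += size;
    return p;
}

/* Release everything allocated on the handle, but keep the handle. */
void arena_reset(ARENA a)
{
    while (a->blocks != NULL) {
        struct arena_block *next = a->blocks->next;

        free(a->blocks->buf);
        free(a->blocks);
        a->blocks = next;
    }
    a->total = 0;
}

size_t arena_total(ARENA a)
{
    return a->total;
}

void arena_destroy(ARENA a)
{
    arena_reset(a);
    free(a);
}
```
]]></screen>

Note that there is deliberately no function for freeing a single
allocation; the caller builds a whole complex of structures and then
drops it with one call to <literal>arena_reset()</literal> or
<literal>arena_destroy()</literal>.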
The nibble memory pool is shared among threads. POSIX
mutexes and WIN32 critical sections are used to keep the
module thread-safe. On WIN32, the function <function>nmem_init()</function>
initialises the critical section handle and should be called once
before any other nmem function is used.
<!-- Keep this comment at the end of the file
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-parent-document: "yaz.xml"
sgml-local-catalogs: "../../docbook/docbook.cat"
sgml-namecase-general:t