1 # $Id: ZOOM.pod,v 1.35 2006-04-12 12:03:10 mike Exp $
8 ZOOM - Perl extension implementing the ZOOM API for Information Retrieval
 use ZOOM;

 eval {
     $conn = new ZOOM::Connection($host, $port,
                                  databaseName => "mydb");
     $conn->option(preferredRecordSyntax => "usmarc");
     $rs = $conn->search_pqf('@attr 1=4 dinosaur');
     print $rs->record(0)->render();
 };
 if ($@) {
     print "Error ", $@->code(), ": ", $@->message(), "\n";
 }

This module provides a nice, Perlish implementation of the ZOOM
Abstract API described and documented at http://zoom.z3950.org/api/

The ZOOM module is implemented as a set of thin classes on top of the
non-OO functions provided by this distribution's C<Net::Z3950::ZOOM>
module, which in
turn is a thin layer on top of the ZOOM-C code supplied as part of
Index Data's YAZ Toolkit. Because ZOOM-C is also the underlying code
that implements ZOOM bindings in C++, Visual Basic, Scheme, Ruby, .NET
(including C#) and other languages, this Perl module works compatibly
with those other implementations. (Of course, the point of a public
API such as ZOOM is that all implementations should be compatible
anyway; but knowing that the same code is running is reassuring.)

The ZOOM module provides two enumerations (C<ZOOM::Error> and
C<ZOOM::Event>), three utility functions C<diag_str()>, C<event_str()>
and C<event()> in the C<ZOOM> package itself, and eight classes:
C<ZOOM::Connection>, C<ZOOM::ResultSet>, C<ZOOM::Record>,
C<ZOOM::Exception>, C<ZOOM::ScanSet>, C<ZOOM::Package>,
C<ZOOM::Query> and C<ZOOM::Options>.
Of these, the Query class is abstract, and has three concrete
subclasses: C<ZOOM::Query::CQL>, C<ZOOM::Query::PQF> and
C<ZOOM::Query::CQL2RPN>.
Finally, it also provides a module which supplies a useful
general-purpose logging facility.

Many useful ZOOM applications can be built using only the Connection,
ResultSet, Record and Exception classes, as in the example above.

A typical application will begin by creating a Connection object,
then using that to execute searches that yield ResultSet objects, then
fetching records from the result-sets to yield Record objects. If an
error occurs, an Exception object is thrown and can be dealt with.

More sophisticated applications might also browse the server's indexes
to create a ScanSet, from which indexed terms may be retrieved; others
might send ``Extended Services'' Packages to the server, to achieve
non-standard tasks such as database creation and record update.
Searching using a query syntax other than PQF can be done using a
query object of one of the Query subclasses. Finally, sets of options
may be manipulated independently of the objects they are associated
with using an Options object.

80 In general, method calls throw an exception if anything goes wrong, so
81 you don't need to test for success after each call. See the section
82 below on the Exception class for details.
=head1 UTILITY FUNCTIONS

=head2 ZOOM::diag_str()

 $msg = ZOOM::diag_str(ZOOM::Error::INVALID_QUERY);

Returns a human-readable English-language string corresponding to the
error code that is its only parameter. This works for any error-code
returned from
C<ZOOM::Exception::code()>,
C<ZOOM::Connection::error_x()>
or
C<ZOOM::Connection::errcode()>,
irrespective of whether it is a member of the C<ZOOM::Error>
enumeration or drawn from the BIB-1 diagnostic set.

=head2 ZOOM::event_str()

 $msg = ZOOM::event_str(ZOOM::Event::RECV_APDU);

Returns a human-readable English-language string corresponding to the
event code that is its only parameter. This works for any value of the
C<ZOOM::Event> enumeration.

=head2 ZOOM::event()

 $connsRef = [ $conn1, $conn2, $conn3 ];
 $which = ZOOM::event($connsRef);
 $ev = $connsRef->[$which-1]->last_event()
     if ($which != 0);

115 Used only in complex asynchronous applications, this function takes a
116 reference to a list of Connection objects, waits until an event
117 occurs on any one of them, and returns an integer indicating which of
118 the connections it occurred on. The return value is a 1-based index
119 into the list; 0 is returned if no event occurs within the longest
120 timeout specified by the C<timeout> options of all the connections.
122 See the section below on asynchronous applications.
=head1 CLASSES

The eight ZOOM classes are described here in ``sensible order'':
first, the four commonly used classes, in the order that they will
tend to be used in most programs (Connection, ResultSet, Record,
Exception); then the four more esoteric classes in descending order of
how often they are needed.

132 With the exception of the Options class, which is an extension to the
133 ZOOM model, the introduction to each class includes a link to the
134 relevant section of the ZOOM Abstract API.
136 =head2 ZOOM::Connection

 $conn = new ZOOM::Connection("indexdata.dk:210/gils");
 print("server is '", $conn->option("serverImplementationName"), "'\n");
 $conn->option(preferredRecordSyntax => "usmarc");
 $rs = $conn->search_pqf('@attr 1=4 mineral');
 $ss = $conn->scan_pqf('@attr 1=1003 a');
 if ($conn->errcode() != 0) {
     die("something went wrong: " . $conn->errmsg())
 }

This class represents a connection to an information retrieval server,
using an IR protocol such as ANSI/NISO Z39.50, SRW (the
Search/Retrieve Webservice), SRU (the Search/Retrieve URL) or
OpenSearch. Not all of these protocols require a low-level connection
to be maintained, but the Connection object nevertheless provides a
location for the necessary cache of configuration and state
information, as well as a uniform API to the connection-oriented
facilities (searching, index browsing, etc.) provided by these
protocols.

See the description of the C<Connection> class in the ZOOM Abstract
API at
http://zoom.z3950.org/api/zoom-current.html#3.2

=head4 new()

 $conn = new ZOOM::Connection("indexdata.dk", 210);
 $conn = new ZOOM::Connection("indexdata.dk:210/gils");
 $conn = new ZOOM::Connection("tcp:indexdata.dk:210/gils");
 $conn = new ZOOM::Connection("http:indexdata.dk:210/gils");
 $conn = new ZOOM::Connection("indexdata.dk", 210,
                              databaseName => "mydb",
                              preferredRecordSyntax => "marc");

Creates a new Connection object, and immediately connects it to the
specified server. If you want to make a new Connection object but
delay forging the connection, use the C<create()> and C<connect()>
methods.

This constructor can be called with two arguments or a single
argument. In the former case, the arguments are the name and port
number of the Z39.50 server to connect to; in the latter case, the
single argument is a YAZ service-specifier string of the form
described below.

When the two-argument form is used (which may be done using a vacuous
second argument of zero), any number of additional argument pairs may
be provided, which are interpreted as key-value pairs to be set as
options after the Connection object is created but before it is
connected to the server. This is a convenient way to set options,
including those that must be set before connecting such as
authentication tokens.

The service-specifier string is of the form

[I<scheme>:]I<host>[:I<port>][/I<databaseName>]

in which the I<host> and I<port> parts are as in the two-argument
form, the I<databaseName>, if provided, specifies the name of the
database to be used in subsequent searches on this connection, and the
optional I<scheme> (default C<tcp>) indicates what protocol should be
used. At present, the following schemes are supported:

=over 4

=item tcp

Z39.50 connection over plain TCP (the default).

=item ssl

Z39.50 connection encrypted using SSL (Secure Sockets Layer). Not
many servers support this, but Index Data's Zebra is one that does.

=item unix

Z39.50 connection on a Unix-domain (local) socket, in which case the
I<host> portion of the string is instead used as a filename in the
local filesystem.

=item http

SRW connection using SOAP over HTTP.

=back

Support for SRU will follow in the fullness of time.

If an error occurs, an exception is thrown. This may indicate a
networking problem (e.g. the host is not found or unreachable), or a
protocol-level problem (e.g. a Z39.50 server rejected the Init
request).

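
To make the shape of the service-specifier concrete, here is a minimal sketch that splits such a string into its parts. The C<parse_service_spec()> helper is hypothetical: it is not part of ZOOM or YAZ, and the real parsing is done inside the YAZ toolkit and may differ in detail. It only follows the form and the scheme names given above.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of ZOOM or YAZ: split a service-specifier
# of the form [scheme:]host[:port][/databaseName] into its parts.
sub parse_service_spec {
    my ($spec) = @_;
    my %known = map { $_ => 1 } qw(tcp ssl unix http);

    # A leading word followed by a colon is a scheme only if it is one of
    # the schemes listed above; otherwise it is the host name.
    my $scheme = "tcp";
    if ($spec =~ /^([a-z]+):(.*)$/ && $known{$1}) {
        ($scheme, $spec) = ($1, $2);
    }

    # For unix sockets the rest of the string is a filename, so this
    # sketch does not split it any further.
    return { scheme => $scheme, host => $spec } if $scheme eq "unix";

    my ($hostport, $database) = split m{/}, $spec, 2;
    my ($host, $port) = split /:/, $hostport, 2;
    return { scheme => $scheme, host => $host, port => $port,
             database => $database };
}

my $parts = parse_service_spec("tcp:indexdata.dk:210/gils");
print join(", ", map { "$_=$parts->{$_}" } sort keys %$parts), "\n";
```

Parts that are absent from the string (port, database name) come back undefined, which mirrors the optional brackets in the form above.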
=head4 create() / connect()

 $options = new ZOOM::Options();
 $options->option(implementationName => "my client");
 $conn = create ZOOM::Connection($options);
 $conn->connect($host, 0);

The usual Connection constructor, C<new()>, brings a new object into
existence and forges the connection to the server all in one
operation, which is often what you want. For applications that need
more control, however, these two methods separate the two steps,
allowing additional steps in between such as the setting of options.

C<create()> creates and returns a new Connection object, which is
I<not> connected to any server. It may be passed an options block, of
type C<ZOOM::Options> (see below), into which options may be set
before or after the creation of the Connection. The connection to the
server may then be forged by the C<connect()> method, the arguments of
which are the same as those of the C<new()> constructor.

=head4 error_x() / errcode() / errmsg() / addinfo() / diagset()

 ($errcode, $errmsg, $addinfo, $diagset) = $conn->error_x();
 $errcode = $conn->errcode();
 $errmsg = $conn->errmsg();
 $addinfo = $conn->addinfo();
 $diagset = $conn->diagset();

These methods may be used to obtain information about the last error
to have occurred on a connection - although typically they will not
be used, as the same information is available through the
C<ZOOM::Exception> that is thrown when the error occurs. The
C<errcode()>,
C<errmsg()>,
C<addinfo()>
and
C<diagset()>
methods each return one element of the diagnostic, and
C<error_x()>
returns all four at once.

See the C<ZOOM::Exception> class for the interpretation of these elements.

=head4 option() / option_binary()

 print("server is '", $conn->option("serverImplementationName"), "'\n");
 $conn->option(preferredRecordSyntax => "usmarc");
 $conn->option_binary(iconBlob => "foo\0bar");
 die if length($conn->option_binary("iconBlob")) != 7;

286 Objects of the Connection, ResultSet, ScanSet and Package classes
287 carry with them a set of named options which affect their behaviour in
288 certain ways. See the ZOOM-C options documentation for details:
290 Connection options are listed at
291 http://indexdata.com/yaz/doc/zoom.tkl#zoom.connections
293 These options are set and fetched using the C<option()> method, which
294 may be called with either one or two arguments. In the two-argument
295 form, the option named by the first argument is set to the value of
296 the second argument, and its old value is returned. In the
297 one-argument form, the value of the specified option is returned.
299 For historical reasons, option values are not binary-clean, so that a
300 value containing a NUL byte will be returned in truncated form. The
301 C<option_binary()> method behaves identically to C<option()> except
302 that it is binary-clean, so that values containing NUL bytes are set
303 and returned correctly.
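
The difference between the two methods is easiest to see with a value that contains a NUL byte. The following sketch does not call ZOOM at all: it only models what reading a value as a NUL-terminated string does to it, using Perl's C<unpack("Z*", ...)>. The helper name C<as_c_string()> is made up for the illustration.

```perl
use strict;
use warnings;

# Model of a NUL-terminated (C-style) read: everything from the first
# NUL byte onwards is lost. unpack("Z*") stops at the first NUL.
sub as_c_string {
    my ($value) = @_;
    return unpack("Z*", $value);
}

my $blob = "foo\0bar";
printf "binary-clean value: %d bytes\n", length($blob);
printf "truncated value:    %d bytes\n", length(as_c_string($blob));
```

This is why the example above checks for a length of 7 after using C<option_binary()>: the three bytes of C<foo>, the NUL, and the three bytes of C<bar>.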
305 =head4 search() / search_pqf()
307 $rs = $conn->search(new ZOOM::Query::CQL('title=dinosaur'));
308 # The next two lines are equivalent
309 $rs = $conn->search(new ZOOM::Query::PQF('@attr 1=4 dinosaur'));
310 $rs = $conn->search_pqf('@attr 1=4 dinosaur');
312 The principal purpose of a search-and-retrieve protocol is searching
313 (and, er, retrieval), so the principal method used on a Connection
314 object is C<search()>. It accepts a single argument, a C<ZOOM::Query>
315 object (or, more precisely, an object of a subclass of this class);
316 and it creates and returns a new ResultSet object representing the set
317 of records resulting from the search.
319 Since queries using PQF (Prefix Query Format) are so common, we make
320 them a special case by providing a C<search_pqf()> method. This is
321 identical to C<search()> except that it accepts a string containing
322 the query rather than an object, thereby obviating the need to create
323 a C<ZOOM::Query::PQF> object. See the documentation of that class for
324 information about PQF.
=head4 scan() / scan_pqf()

 $ss = $conn->scan(new ZOOM::Query::CQL('title=dinosaur'));
 # The next two lines are equivalent
 $ss = $conn->scan(new ZOOM::Query::PQF('@attr 1=4 dinosaur'));
 $ss = $conn->scan_pqf('@attr 1=4 dinosaur');

Many Z39.50 servers allow you to browse their indexes to find terms to
search for. This is done using the C<scan()> method, which creates and
returns a new ScanSet object representing the set of terms resulting
from the scan.

C<scan()> takes a single argument, but it has to work hard: it
specifies both what index to scan for terms, and where in the index to
start scanning. What's more, the specification of what index to scan
includes multiple facets, such as what database fields it's an index
of (author, subject, title, etc.) and whether to scan for whole fields
or single words (e.g. the title ``I<The Empire Strikes Back>'', or the
four words ``Back'', ``Empire'', ``Strikes'' and ``The'', interleaved
with words from other titles in the same index).

All of this is done by using a Query object representing a query of a
single term as the C<scan()> argument. The attributes associated with
the term indicate which index is to be used, and the term itself
indicates the point in the index at which to start the scan. For
example, if the argument is the query C<@attr 1=4 fish>, then

=over 4

=item @attr 1=4

This is the BIB-1 attribute with type 1 (meaning access-point, which
specifies an index), and value 4 (which means ``title''). So the scan
is in the title index.

=item fish

Start the scan from the lexicographically earliest term that is equal
to or falls after ``fish''.

=back

The argument C<@attr 1=4 @attr 6=3 fish> would behave similarly; but
the BIB-1 attribute 6=3 means completeness=``complete field'', so the
scan would be for complete titles rather than for words occurring in
titles.

This takes a bit of getting used to.

The behaviour of C<scan()> is affected by the following options, which
may be set on the Connection through which the scan is done:

=over 4

=item number [default: 10]

Indicates how many terms should be returned in the ScanSet. The
number actually returned may be less, if the start-point is near the
end of the index, but will not be greater.

=item position [default: 1]

A 1-based index specifying where in the returned list of terms the
seed-term should appear. By default it should be the first term
returned, but C<position> may be set, for example, to zero (requesting
the next terms I<after> the seed-term), or to the same value as
C<number> (requesting the index terms I<before> the seed term).

=item stepSize [default: 0]

An integer indicating how many indexed terms are to be skipped between
each one returned in the ScanSet. By default, no terms are skipped,
but overriding this can be useful to get a high-level overview of the
index.

=back

401 Since scans using PQF (Prefix Query Format) are so common, we make
402 them a special case by providing a C<scan_pqf()> method. This is
403 identical to C<scan()> except that it accepts a string containing the
404 query rather than an object, thereby obviating the need to create a
405 C<ZOOM::Query::PQF> object.
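
The interaction between the C<number> and C<position> options is easier to see on a small worked example. The sketch below is a toy model, not ZOOM code: C<scan_window()> is a hypothetical function that picks terms out of an in-memory sorted list in the way the option descriptions above suggest. A real server may behave differently at the edges of its index, and C<stepSize> is left out for simplicity.

```perl
use strict;
use warnings;

# Toy model (not ZOOM code): choose the terms a scan would return from a
# sorted index, given the seed term and the "number" and "position"
# options described above.
sub scan_window {
    my ($index, $seed, $number, $position) = @_;

    # Find the earliest term that is equal to or sorts after the seed.
    my $at = 0;
    $at++ while $at < scalar(@$index) && $index->[$at] lt $seed;

    # The seed-term lands at 1-based slot $position of the result.
    my $first = $at - ($position - 1);
    my $last  = $first + $number - 1;
    $first = 0        if $first < 0;
    $last  = $#$index if $last > $#$index;

    return $first > $last ? () : @{$index}[$first .. $last];
}

my @index = qw(cod eel fish gar hake pike ray sole);
print join(" ", scan_window(\@index, "fish", 3, 1)), "\n";  # seed-term first
print join(" ", scan_window(\@index, "fish", 3, 0)), "\n";  # terms after the seed
print join(" ", scan_window(\@index, "fish", 3, 3)), "\n";  # seed-term last
```

With C<position> equal to C<number>, the seed-term is the last term returned, so the rest of the list is made up of the terms before it.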
=head4 package()

 $p = $conn->package();
 $o = new ZOOM::Options();
 $o->option(databaseName => "newdb");
 $p = $conn->package($o);

416 Creates and returns a new C<ZOOM::Package>, to be used in invoking an
417 Extended Service. An options block may optionally be passed in. See
418 the C<ZOOM::Package> documentation.
=head4 last_event()

 if ($conn->last_event() == ZOOM::Event::CONNECT) {
     print "Connected!\n";
 }

426 Returns a C<ZOOM::Event> enumerated value indicating the type of the
427 last event that occurred on the connection. This is used only in
428 complex asynchronous applications - see the sections below on the
429 C<ZOOM::Event> enumeration and asynchronous applications.
435 Destroys a Connection object, tearing down any low-level connection
436 associated with it and freeing its resources. It is an error to reuse
437 a Connection that has been C<destroy()>ed.
439 =head2 ZOOM::ResultSet

 $rs = $conn->search_pqf('@attr 1=4 mineral');
 $n = $rs->size();
 for $i (1 .. $n) {
     $rec = $rs->record($i-1);
     print $rec->render();
 }

A ResultSet object represents the set of zero or more records
resulting from a search, and is the means whereby these records can be
retrieved. A ResultSet object may maintain a client-side cache of
some, none or all of the server's records: in general, this is
supposed to be an implementation detail of no interest to a typical
application, although more sophisticated applications do have
facilities for messing with the cache. Most applications will only
need the C<size()>, C<record()> and C<sort()> methods.

457 There is no C<new()> method nor any other explicit constructor. The
458 only way to create a new ResultSet is by using C<search()> (or
459 C<search_pqf()>) on a Connection.
See the description of the C<Result Set> class in the ZOOM Abstract
API at
http://zoom.z3950.org/api/zoom-current.html#3.4

=head4 option()

 $rs->option(elementSetName => "f");

471 Allows options to be set into, and read from, a ResultSet, just like
472 the Connection class's C<option()> method. There is no
473 C<option_binary()> method for ResultSet objects.
475 ResultSet options are listed at
476 http://indexdata.com/yaz/doc/zoom.resultsets.tkl
=head4 size()

 print "Found ", $rs->size(), " records\n";

482 Returns the number of records in the result set.
484 =head4 record() / record_immediate()
486 $rec = $rs->record(0);
487 $rec2 = $rs->record_immediate(0);
488 $rec3 = $rs->record_immediate(1)
489 or print "second record wasn't in cache\n";
The C<record()> method returns a C<ZOOM::Record> object representing
a record from the result-set, whose position is indicated by the
argument passed in. This is a zero-based index, so that legitimate
values range from zero to C<< $rs->size()-1 >>.

496 The C<record_immediate()> API is identical, but it never invokes a
497 network operation, merely returning the record from the ResultSet's
498 cache if it's already there, or an undefined value otherwise. So if
499 you use this method, B<you must always check the return value>.
=head4 records()

 $rs->records(0, 10, 0);
 for $i (0 .. 9) {
     print $rs->record_immediate($i)->render();
 }

 @nextseven = $rs->records(10, 7, 1);

The C<record_immediate()> method only fetches records from the cache,
whereas C<record()> fetches them from the server if they have not
already been cached; but the ZOOM module has to guess what the most
efficient strategy for this is. It might fetch each record alone,
when asked for: that's optimal in an application that's only
interested in the top hit from each search, but pessimal for one that
wants to display a whole list of results. Conversely, the software's
strategy might be always to ask for blocks of twenty records:
that's great for assembling long lists of things, but wasteful when
only one record is wanted. The problem is that the ZOOM module can't
tell, when you call C<< $rs->record() >>, what your intention is.

522 But you can tell it. The C<records()> method fetches a sequence of
523 records, all in one go. It takes three arguments: the first is the
524 zero-based index of the first record in the sequence, the second is
525 the number of records to fetch, and the third is a boolean indication
526 of whether or not to return the retrieved records as well as adding
527 them to the cache. (You can always pass 1 for this if you like, and
528 Perl will discard the unused return value, but there is a small
529 efficiency gain to be had by passing 0.)
531 Once the records have been retrieved from the server
532 (i.e. C<records()> has completed without throwing an exception), they
533 can be fetched much more efficiently using C<record()> - or
534 C<record_immediate()>, which is then guaranteed to succeed.
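
The trade-off described above can be illustrated without a server. The sketch below uses a stand-in class, C<ToyResultSet>, which is hypothetical and is not the real C<ZOOM::ResultSet>: it keeps its "server" records in memory and counts simulated round-trips. It models the strategy of fetching each record alone on a cache miss, which is only one of the strategies the real module might choose.

```perl
use strict;
use warnings;

# Stand-in for a result set (hypothetical, not the ZOOM class): it counts
# simulated network round-trips so two access patterns can be compared.
package ToyResultSet;

sub new {
    my ($class, @recs) = @_;
    return bless { server => [@recs], cache => {}, trips => 0 }, $class;
}

# Fetch one record, going to the "server" only on a cache miss.
sub record {
    my ($self, $i) = @_;
    $self->records($i, 1, 0) unless exists $self->{cache}{$i};
    return $self->{cache}{$i};
}

# Cache-only lookup: undef when the record has not been fetched yet.
sub record_immediate {
    my ($self, $i) = @_;
    return $self->{cache}{$i};
}

# Fetch $count records starting at $start in a single round-trip.
sub records {
    my ($self, $start, $count, $return) = @_;
    $self->{trips}++;
    my @got;
    for my $i ($start .. $start + $count - 1) {
        last if $i > $#{ $self->{server} };
        push @got, $self->{cache}{$i} = $self->{server}[$i];
    }
    return $return ? @got : ();
}

package main;

my $one_by_one = ToyResultSet->new(map { "rec$_" } 0 .. 9);
$one_by_one->record($_) for 0 .. 9;

my $batched = ToyResultSet->new(map { "rec$_" } 0 .. 9);
$batched->records(0, 10, 0);
$batched->record_immediate($_) for 0 .. 9;

printf "one by one: %d round-trips, batched: %d\n",
    $one_by_one->{trips}, $batched->{trips};
```

In this model the batched pattern needs a single round-trip for ten records, which is the saving that calling C<records()> first is meant to buy.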
540 Resets the ResultSet's record cache, so that subsequent invocations of
541 C<record_immediate()> will fail. I struggle to imagine a real
542 scenario where you'd want to do this.
=head4 sort()

 if ($rs->sort("yaz", "1=4 >i 1=21 >s") < 0) {
     die "sort failed";
 }

550 Sorts the ResultSet in place (discarding any cached records, as they
551 will in general be sorted into a different position). There are two
552 arguments: the first is a string indicating the type of the
553 sort-specification, and the second is the specification itself.
555 The C<sort()> method returns 0 on success, or -1 if the
556 sort-specification is invalid.
558 At present, the only supported sort-specification type is C<yaz>.
559 Such a specification consists of a space-separated sequence of keys,
560 each of which itself consists of two space-separated words (so that
561 the total number of words in the sort-specification is even). The two
562 words making up each key are a field and a set of flags. The field
563 can take one of two forms: if it contains an C<=> sign, then it is a
564 BIB-1 I<type>=I<value> pair specifying which field to sort
565 (e.g. C<1=4> for a title sort); otherwise it is sent for the server to
interpret as best it can. The word of flags is made up from one or
more of the following: C<s> for case sensitive, C<i> for case
insensitive; C<E<lt>> for ascending order and C<E<gt>> for descending
order.

For example, the sort-specification in the code-fragment above will
sort the records in C<$rs> case-insensitively in descending order of
title, with records having equivalent titles sorted case-sensitively
in descending order of subject. (The BIB-1 access points 4 and 21
represent title and subject respectively.)

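
To make the structure of a C<yaz> sort-specification explicit, the following sketch breaks one into its keys. C<parse_yaz_sortspec()> is a hypothetical helper written for this illustration: the real interpretation happens inside YAZ and on the server. The sketch assumes that a key with no C<E<gt>> flag is ascending and one with no C<s> flag is case-insensitive, which the text above does not actually specify.

```perl
use strict;
use warnings;

# Hypothetical helper: break a "yaz" sort-specification into its keys.
# Each key is two space-separated words, a field and a word of flags.
sub parse_yaz_sortspec {
    my ($spec) = @_;
    my @words = split ' ', $spec;
    die "sort-specification needs an even number of words\n" if @words % 2;

    my @keys;
    while (my ($field, $flags) = splice(@words, 0, 2)) {
        push @keys, {
            field          => $field,
            is_bib1        => ($field =~ /=/ ? 1 : 0),   # type=value pair
            descending     => ($flags =~ />/ ? 1 : 0),   # assumed default: ascending
            case_sensitive => ($flags =~ /s/ ? 1 : 0),   # assumed default: insensitive
        };
    }
    return @keys;
}

for my $key (parse_yaz_sortspec("1=4 >i 1=21 >s")) {
    printf "%s: %s, %s\n", $key->{field},
        $key->{descending}     ? "descending"     : "ascending",
        $key->{case_sensitive} ? "case-sensitive" : "case-insensitive";
}
```

A specification with an odd number of words is rejected here, following the rule that the total number of words must be even.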
581 Destroys a ResultSet object, freeing its resources. It is an error to
582 reuse a ResultSet that has been C<destroy()>ed.
=head2 ZOOM::Record

 $rec = $rs->record($i);
 print $rec->render();
 $raw = $rec->raw();
 $marc = new_from_usmarc MARC::Record($raw);
 print "Record title is: ", $marc->title(), "\n";

A Record object represents a record that has been retrieved from the
server.

595 There is no C<new()> method nor any other explicit constructor. The
596 only way to create a new Record is by using C<record()> (or
597 C<record_immediate()>, or C<records()>) on a ResultSet.
In general, records are ``owned'' by the result-sets that they were
retrieved from, so they do not have to be explicitly memory-managed:
they are deallocated (and therefore can no longer be used) when the
result-set is destroyed.

See the description of the C<Record> class in the ZOOM Abstract
API at
http://zoom.z3950.org/api/zoom-current.html#3.5

=head4 render()

 print $rec->render();
 print $rec->render("charset=latin1,utf8");

615 Returns a human-readable representation of the record. Beyond that,
616 no promises are made: careful programs should not make assumptions
617 about the format of the returned string.
619 If the optional argument is provided, then it is interpreted as in the
620 C<get()> method (q.v.)
622 This method is useful mostly for debugging.
=head4 raw()

 $raw = $rec->raw();
 $marc = new_from_usmarc MARC::Record($raw);
 $trans = $rec->raw("charset=latin1,utf8");

Returns an opaque blob of data that is the raw form of the record.
Exactly what this is, and what you can do with it, varies depending on
the record-syntax. For example, XML records will be returned as,
well, XML; MARC records will be returned as ISO 2709-encoded blocks
that can be decoded by software such as the fine C<MARC::Record>
module; GRS-1 records will be ... gosh, what an interesting question.
But no-one uses GRS-1 any more, do they?

639 If the optional argument is provided, then it is interpreted as in the
640 C<get()> method (q.v.)
=head4 get()

 $raw = $rec->get("raw");
 $rendered = $rec->get("render");
 $trans = $rec->get("render;charset=latin1,utf8");
 $trans = $rec->get("render", "charset=latin1,utf8");

649 This is the underlying method used by C<render()> and C<raw()>, and
650 which in turn delegates to the C<ZOOM_record_get()> function of the
651 underlying ZOOM-C library. Most applications will find it more
652 natural to work with C<render()> and C<raw()>.
654 C<get()> may be called with either one or two arguments. The
655 two-argument form is syntactic sugar: the two arguments are simply
656 joined with a semi-colon to make a single argument, so the third and
657 fourth example invocations above are equivalent. The second argument
658 (or portion of the first argument following the semicolon) is used in
659 the C<type> argument of C<ZOOM_record_get()>, as described in
660 http://www.indexdata.com/yaz/doc/zoom.records.tkl
661 This is useful primarily for invoking the character-set transformation
662 - in the examples above, from ISO Latin-1 to UTF-8 Unicode.
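
The equivalence of the one-argument and two-argument forms comes down to a simple join. The sketch below shows only that argument handling; C<get_type_argument()> is a hypothetical name and is not a ZOOM function.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the argument handling described above: the
# optional second argument is appended to the first after a semicolon.
sub get_type_argument {
    my ($form, $args) = @_;
    return defined $args ? "$form;$args" : $form;
}

print get_type_argument("render", "charset=latin1,utf8"), "\n";
print get_type_argument("render;charset=latin1,utf8"), "\n";
print get_type_argument("raw"), "\n";
```

The first two calls produce the same string, which is why the third and fourth C<get()> invocations shown earlier behave identically.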
=head4 clone() / destroy()

 $rec = $rs->record($i);
 $newrec = $rec->clone();
 $rs->destroy();
 print $newrec->render();
 $newrec->destroy();

672 Usually, it's convenient that Record objects are owned by their
673 ResultSets and go away when the ResultSet is destroyed; but
674 occasionally you need a Record to outlive its parent and destroy it
675 later, explicitly. To do this, C<clone()> the record, keep the new
676 Record object that is returned, and C<destroy()> it when it's no
longer needed. This is the B<only> situation in which a Record needs to
be destroyed.

680 =head2 ZOOM::Exception
682 In general, method calls throw an exception (of class
683 C<ZOOM::Exception>) if anything goes wrong, so you don't need to test
684 for success after each call. Exceptions are caught by enclosing the
685 main code in an C<eval{}> block and checking C<$@> on exit from that
686 block, as in the code-sample above.
There are a small number of exceptions to this rule: the three
record-fetching methods in the C<ZOOM::ResultSet> class,
C<record()>,
C<record_immediate()>
and
C<records()>,
can all return undefined values for legitimate reasons, under
circumstances that do not merit throwing an exception. For this
reason, the return values of these methods should be checked. See the
individual methods' documentation for details.

An exception carries the following pieces of information:

=over 4

=item error-code

A numeric code that specifies the type of error. This can be checked
for equality with known values, so that intelligent applications can
take appropriate action.

=item error-message

A human-readable message corresponding to the code. This can be
shown to users, but its value should not be tested, as it could vary
in different versions or under different locales.

=item additional information [optional]

A string containing information specific to the error-code. For
example, when the error-code is the BIB-1 diagnostic 109 ("Database
unavailable"), the additional information is the name of the database
that the application tried to use. For some error-codes, there is no
additional information at all; for some others, the additional
information is undefined and may just be a human-readable string.

=item diagnostic set [optional]

A short string specifying the diagnostic set from which the error-code
was drawn: for example, C<ZOOM> for a ZOOM-specific error such as
C<ZOOM::Error::MEMORY> ("out of memory"), and C<BIB-1> for a Z39.50
error-code drawn from the BIB-1 diagnostic set.

=back

733 In theory, the error-code should be interpreted in the context of the
734 diagnostic set from which it is drawn; in practice, nearly all errors
735 are from either the ZOOM or BIB-1 diagnostic sets, and the codes in
736 those sets have been chosen so as not to overlap, so the diagnostic
737 set can usually be ignored.
See the description of the C<Exception> class in the ZOOM Abstract
API at
http://zoom.z3950.org/api/zoom-current.html#3.7

=head4 new()

 die new ZOOM::Exception($errcode, $errmsg, $addinfo, $diagset);

749 Creates and returns a new Exception object with the specified
750 error-code, error-message, additional information and diagnostic set.
751 Applications will not in general need to use this, but may find it
752 useful to simulate ZOOM exceptions. As is usual with Perl, exceptions
753 are thrown using C<die()>.
755 =head4 code() / message() / addinfo() / diagset()
757 print "Error ", $@->code(), ": ", $@->message(), "\n";
758 print "(addinfo '", $@->addinfo(), "', set '", $@->diagset(), "')\n";
760 These methods, of no arguments, return the exception's error-code,
761 error-message, additional information and diagnostic set respectively.
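
As an illustration of how these accessors are typically combined, here is a sketch that throws and catches an exception object and builds a report from it. So that it runs without a server, it uses a stand-in class, C<ToyException>, which is hypothetical and only mimics the four accessors described above; C<describe_failure()> is likewise a made-up helper. With the real module the object in C<$@> would be a C<ZOOM::Exception>.

```perl
use strict;
use warnings;

# Stand-in exception class (hypothetical), mimicking the four accessors
# described above so that the example runs without a server.
package ToyException;

sub new {
    my ($class, $code, $message, $addinfo, $diagset) = @_;
    return bless { code => $code, message => $message,
                   addinfo => $addinfo, diagset => $diagset }, $class;
}
sub code    { $_[0]{code} }
sub message { $_[0]{message} }
sub addinfo { $_[0]{addinfo} }
sub diagset { $_[0]{diagset} }

package main;

# Turn a caught exception into a one-line report. Decisions should be
# based on the code, never on the text of the message.
sub describe_failure {
    my ($e) = @_;
    my $text = sprintf "Error %d (%s): %s",
        $e->code(), $e->diagset(), $e->message();
    my $addinfo = $e->addinfo();
    $text .= " [$addinfo]" if defined $addinfo && $addinfo ne "";
    return $text;
}

eval {
    # 109 is the BIB-1 "Database unavailable" diagnostic mentioned above.
    die ToyException->new(109, "Database unavailable", "mydb", "BIB-1");
};
if (ref $@ && $@->isa("ToyException")) {
    print describe_failure($@), "\n";
}
```

Checking C<ref $@> first guards against plain string errors raised by other code inside the same C<eval{}> block.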
767 Returns a human-readable rendition of an exception. The C<"">
768 operator is overloaded on the Exception class, so that an Exception
769 used in a string context is automatically rendered. Among other
770 consequences, this has the useful result that a ZOOM application that
died due to an uncaught exception will emit an informative message.

=head2 ZOOM::ScanSet

 $ss = $conn->scan_pqf('@attr 1=1003 a');
 $n = $ss->size();
 ($term, $occ) = $ss->term($n-1);
 $rs = $conn->search_pqf('@attr 1=1003 "' . $term . '"');
 assert($rs->size() == $occ);

A ScanSet represents a set of candidate search-terms returned from an
index scan. Its sole purpose is to provide access to those terms, to
the corresponding display terms, and to the occurrence-counts of the
terms.

There is no C<new()> method nor any other explicit constructor. The
only way to create a new ScanSet is by using C<scan()> on a
Connection.

See the description of the C<Scan Set> class in the ZOOM Abstract
API at
http://zoom.z3950.org/api/zoom-current.html#3.6

=head4 size()

 print "Found ", $ss->size(), " terms\n";

801 Returns the number of terms in the scan set. In general, this will be
802 the scan-set size requested by the C<number> option in the Connection
803 on which the scan was performed [default 10], but it may be fewer if
804 the scan is close to the end of the index.
=head4 term() / display_term()

 $ss = $conn->scan_pqf('@attr 1=1004 whatever');
 ($term, $occurrences) = $ss->term(0);
 ($displayTerm, $occurrences2) = $ss->display_term(0);
 assert($occurrences == $occurrences2);
 if (user_likes_the_look_of($displayTerm)) {
     $rs = $conn->search_pqf('@attr 1=1004 "' . $term . '"');
     assert($rs->size() == $occurrences);
 }

These methods return the scanned terms themselves. C<term()> returns
the term in a form suitable for submitting as part of a query, whereas
C<display_term()> returns it in a form suitable for displaying to a
user. Both versions also return the number of occurrences of the term
in the index, i.e. the number of hits that will be found if the term
is subsequently used in a query.

In most cases, the term and display term will be identical; however,
they may be different in cases where punctuation or case is
normalised, or where identifiers rather than the original document
terms are indexed.

=head4 option()

 print "scan status is ", $ss->option("scanStatus");

Allows options to be set into, and read from, a ScanSet, just like
the Connection class's C<option()> method. There is no
C<option_binary()> method for ScanSet objects.

ScanSet options are also described, though not particularly
informatively, at
http://indexdata.com/yaz/doc/zoom.scan.tkl

845 Destroys a ScanSet object, freeing its resources. It is an error to
846 reuse a ScanSet that has been C<destroy()>ed.
=head2 ZOOM::Package

 $p = $conn->package();
 $p->option(action => "specialUpdate");
 $p->option(recordIdOpaque => 145);
 $p->option(record => content_of("/tmp/record.xml"));
 $p->send("update");
 $p->destroy();

857 This class represents an Extended Services Package: an instruction to
858 the server to do something not covered by the core parts of the Z39.50
859 standard (or the equivalent in SRW or SRU). Since the core protocols
860 are read-only, such requests are often used to make changes to the
861 database, such as in the record update example above.
Requesting an extended service is a four-step process: first, create a
package associated with the connection to the relevant database;
second, set options on the package to instruct the server on what to
do; third, send the package (which may result in an exception being
thrown if the server cannot execute the requested operations); and
finally, destroy the package.

870 Package options are listed at
871 http://indexdata.com/yaz/doc/zoom.ext.tkl
873 The particular options that have meaning are determined by the
874 top-level operation string specified as the argument to C<send()>.
875 For example, when the operation is C<update> (the most commonly used
876 extended service), the C<action> option may be set to any of
878 (add a new record, failing if that record already exists),
880 (delete a record, failing if it is not in the database),
882 (replace a record, failing if an old version is not already present)
885 (add a record, replacing any existing version that may be present).
887 For update, the C<record> option should be set to the full text of the
888 XML record to be added, deleted or replaced. Depending on how the server
889 is configured, it may extract the record's unique ID from the text
890 (i.e. from a known element such as the C<001> field of a MARCXML
891 record), or it may require the unique ID to be passed in explicitly using
892 the C<recordIdOpaque> option.
894 Extended services packages are B<not currently described> in the ZOOM
896 http://zoom.z3950.org/api/zoom-current.html
897 They will be added in a forthcoming version, and will function much
898 as those implemented in this module.
904 $p->option(recordIdOpaque => "46696f6e61");
906 Allows options to be set into, and read from, a Package, just like
907 the Connection class's C<option()> method. There is no
908 C<option_binary()> method for Package objects.
910 Package options are listed at
911 http://indexdata.com/yaz/doc/zoom.ext.tkl
917 Sends a package to the server associated with the Connection that
918 created it. Problems are reported by throwing an exception. The
919 single parameter indicates the operation that the server is being
920 requested to perform, and controls the interpretation of the package's
921 options. Valid operations include:
927 Request a copy of a nominated object, e.g. place an ILL request.
931 Create a new database, the name of which is specified by the
932 C<databaseName> option.
936 Drop an existing database, the name of which is specified by the
937 C<databaseName> option.
941 Commit changes made to the database within a transaction.
945 Modify the contents of the database by adding, deleting or replacing
946 records (as described above in the overview of the C<ZOOM::Package>
951 I have no idea what this does.
955 Although the module is capable of I<making> all these requests, not
956 all servers are capable of I<executing> them. Refusal is indicated by
957 throwing an exception. Problems may also be caused by lack of
958 privileges; so C<send()> must be used with caution, and is perhaps
959 best wrapped in a clause that checks for exceptions, like so:
961 eval { $p->send("create") };
962 if ($@ && $@->isa("ZOOM::Exception")) {
963 print "Oops! ", $@->message(), "\n";
971 Destroys a Package object, freeing its resources. It is an error to
972 reuse a Package that has been C<destroy()>ed.
976 $q = new ZOOM::Query::CQL("creator=pike and subject=unix");
977 $q->sortby("1=4 >i 1=21 >s");
978 $rs = $conn->search($q);
981 C<ZOOM::Query> is a virtual base class from which various concrete
982 subclasses can be derived. Different subclasses implement different
983 types of query. The sole purpose of a Query object is to be used in a
984 C<search()> on a Connection; because PQF is such a common special
985 case, the shortcut Connection method C<search_pqf()> is provided.
987 The following Query subclasses are provided, each providing the
988 same set of methods described below:
992 =item ZOOM::Query::PQF
994 Implements Prefix Query Format (PQF), also sometimes known as Prefix
995 Query Notation (PQN). This esoteric but rigorous and expressive
996 format is described in the YAZ Manual at
997 http://indexdata.com/yaz/doc/tools.tkl#PQF
999 =item ZOOM::Query::CQL
1001 Implements the Common Query Language (CQL) of SRU, the Search/Retrieve
1002 URL. CQL is a much friendlier notation than PQF, using a simple infix
1003 notation. The queries are passed ``as is'' to the server rather than
1004 being compiled into a Z39.50 Type-1 query, so only CQL-compliant
1005 servers can support such queries. CQL is described at
1006 http://www.loc.gov/standards/sru/cql/
1007 and in a slightly out-of-date but nevertheless useful tutorial at
1008 http://zing.z3950.org/cql/intro.html
1010 =item ZOOM::Query::CQL2RPN
1012 Implements CQL by compiling it on the client-side into a Z39.50
1013 Type-1 (RPN) query, and sending that. This provides essentially the
1014 same functionality as C<ZOOM::Query::CQL>, but it will work against
1015 any standard Z39.50 server rather than only against the small subset
1016 that support CQL natively. The drawback is that, because the
1017 compilation is done on the client side, a configuration file is
1018 required to direct the mapping of CQL constructs such as index names,
1019 relations and modifiers into Type-1 query attributes. An example CQL
1020 configuration file is included in the ZOOM-Perl distribution, in the
1021 file C<samples/cql/pqf.properties>.
1025 See the description of the C<Query> class in the ZOOM Abstract
1027 http://zoom.z3950.org/api/zoom-current.html#3.3
1033 $q = new ZOOM::Query::CQL('title=dinosaur');
1034 $q = new ZOOM::Query::PQF('@attr 1=4 dinosaur');
1036 Creates a new query object, compiling the query passed as its argument
1037 according to the rules of the particular query-type being
1038 instantiated. If compilation fails, an exception is thrown.
1039 Otherwise, the query may be passed to the C<Connection> method
1042 $conn->option(cqlfile => "samples/cql/pqf.properties");
1043 $q = new ZOOM::Query::CQL2RPN('title=dinosaur', $conn);
1045 Note that for the C<ZOOM::Query::CQL2RPN> subclass, the Connection
1046 must also be passed into the constructor. This is used for two
1047 purposes: first, its C<cqlfile> option is used to find the CQL
1048 configuration file that directs the translations into RPN; and second,
1049 if compilation fails, then diagnostic information is cached in the
1050 Connection and can be retrieved using C<$conn-E<gt>errcode()> and related
1055 $q->sortby("1=4 >i 1=21 >s");
1057 Sets a sort specification into the query, so that when a C<search()>
1058 is run on the query, the result is automatically sorted. The sort
1059 specification language is the same as the C<yaz> sort-specification
1060 type of the C<ResultSet> method C<sort()>, described above.
1066 Destroys a Query object, freeing its resources. It is an error to
1067 reuse a Query that has been C<destroy()>ed.
1069 =head2 ZOOM::Options
1071 $o1 = new ZOOM::Options();
1072 $o1->option(user => "alf");
1073 $o2 = new ZOOM::Options();
1074 $o2->option(password => "fruit");
1075 $opts = new ZOOM::Options($o1, $o2);
1076 $conn = create ZOOM::Connection($opts);
1077 $conn->connect($host); # Uses the specified username and password
1079 Several classes of ZOOM objects carry their own sets of options, which
1080 can be manipulated using their C<option()> method. Sometimes,
1081 however, it's useful to deal with the option sets directly, and the
1082 C<ZOOM::Options> class exists to enable this approach.
1084 Option sets are B<not currently described> in the ZOOM
1086 http://zoom.z3950.org/api/zoom-current.html
1087 They are an extension to that specification.
1093 $o1 = new ZOOM::Options();
1094 $o1and2 = new ZOOM::Options($o1);
1095 $o3 = new ZOOM::Options();
1096 $o1and3and4 = new ZOOM::Options($o1, $o3);
1098 Creates and returns a new option set. One or two (but no more)
1099 existing option sets may be passed as arguments, in which case they
1100 become ``parents'' of the new set, which thereby ``inherits'' their
1101 options, the values of the first parent overriding those of the second
1102 when both have a value for the same key. An option set that inherits
1103 from a parent that has its own parents also inherits the grandparent's
1106 =head4 option() / option_binary()
1108 $o->option(preferredRecordSyntax => "usmarc");
1109 $o->option_binary(iconBlob => "foo\0bar");
1110 die if length($o->option_binary("iconBlob")) != 7;
1112 These methods are used to get and set options within a set, and behave
1113 the same way as the same-named C<Connection> methods - see above. As
1114 with the C<Connection> methods, values passed to and retrieved using
1115 C<option()> are interpreted as NUL-terminated, while those passed to
1116 and retrieved from C<option_binary()> are binary-clean.
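To illustrate the distinction, the following pure-Perl sketch models what NUL-terminated interpretation does to a value. It does not call ZOOM, and C<as_c_string()> is a hypothetical helper written only for this illustration. The seven-byte value used in the example above keeps its full length only when it is treated as binary.

```perl
use strict;
use warnings;

# Hypothetical model of NUL-terminated interpretation: everything
# from the first NUL byte onwards is dropped.
sub as_c_string {
    my ($value) = @_;
    my $nul = index($value, "\0");
    return $nul < 0 ? $value : substr($value, 0, $nul);
}

my $blob = "foo\0bar";
printf "binary-clean length:   %d\n", length($blob);               # 7
printf "NUL-terminated length: %d\n", length(as_c_string($blob));  # 3
```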
1120 $o->option(x => "T");
1121 $o->option(y => "F");
1122 assert($o->bool("x", 1));
1123 assert(!$o->bool("y", 1));
1124 assert($o->bool("z", 1));
1126 The first argument is a key, and the second is a default value.
1127 Returns the value associated with the specified key as a boolean, or
1128 the default value if the key has not been set. The values C<T> (upper
1129 case) and C<1> are considered true; all other values (including C<t>
1130 (lower case) and non-zero integers other than one) are considered
1133 This method is provided in ZOOM-C because in a statically typed
1134 language it's convenient to have the result returned as an
1135 easy-to-test type. In a dynamically typed language such as Perl, this
1136 problem doesn't arise, so C<bool()> is nearly useless; but it is made
1137 available in case applications need to duplicate the idiosyncratic
1138 interpretation of truth and falsehood that ZOOM-C uses.
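For applications that need to reproduce this interpretation without going through an Options object, the rule as stated above can be modelled in plain Perl. This is a sketch of the documented behaviour only: C<zoom_style_bool()> is a hypothetical name, and the ZOOM-C source remains the authority on edge cases.

```perl
use strict;
use warnings;

# Hypothetical model of the documented rule: an unset value yields the
# default; "T" and "1" are true; every other value is false.
sub zoom_style_bool {
    my ($value, $default) = @_;
    return $default unless defined $value;
    return ($value eq "T" || $value eq "1") ? 1 : 0;
}

my %opts = (x => "T", y => "F");
print zoom_style_bool($opts{x}, 1), "\n";   # x is "T"
print zoom_style_bool($opts{y}, 1), "\n";   # y is "F"
print zoom_style_bool($opts{z}, 1), "\n";   # z is unset, so the default applies
```

Note how this differs from Perl's own notion of truth, under which C<"F">, C<"t"> and C<"2"> would all be true.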
1142 $o->option(x => "012");
1143 assert($o->int("x", 20) == 12);
1144 assert($o->int("y", 20) == 20);
1146 Returns the value associated with the specified key as an integer, or
1147 the default value if the key has not been set. See the description of
1148 C<bool()> for why you almost certainly don't want to use this.
1152 $o->set_int(x => "29");
1154 Sets the value of the specified option as an integer. Of course, Perl
1155 happily converts strings to integers on its own, so you can just use
1156 C<option()> for this, but C<set_int()> is guaranteed to use the same
1157 string-to-integer conversion as ZOOM-C does, which might occasionally
1158 be useful. Though I can't imagine how.
1160 =head4 set_callback()
1164 return "$udata-$key-$udata";
1166 $o->set_callback(\&cb, "xyz");
1167 assert($o->option("foo") eq "xyz-foo-xyz");
1169 This method allows a callback function to be installed in an option
1170 set, so that the values of options can be calculated algorithmically
1171 rather than, as usual, looked up in a table. Along with the callback
1172 function itself, an additional datum is provided: when an option is
1173 subsequently looked up, this datum is passed to the callback function
1174 along with the key; and its return value is returned to the caller as
1175 the value of the option.
1178 Although it ought to be possible to specify a callback function using
1179 the C<\&name> syntax above, or a literal C<sub { code }> code
1180 reference, the complexities of the Perl-internal memory management
1181 system mean that the function must currently be specified as a string
1182 containing the fully-qualified name, e.g. C<"main::cb">.
1185 The current implementation of this method leaks memory, not only
1186 when the callback is installed, but on every occasion that it is
1187 consulted to look up an option value.
1193 Destroys an Options object, freeing its resources. It is an error to
1194 reuse an Options object that has been C<destroy()>ed.
1198 The ZOOM module provides two enumerations that list possible return
1199 values from particular functions. They are described in the following
1204 if ($@->code() == ZOOM::Error::QUERY_PQF) {
1205 return "your query was not accepted";
1208 This class provides a set of manifest constants representing some of
1209 the possible error codes that can be raised by the ZOOM module. The
1210 methods that return error-codes are
1211 C<ZOOM::Exception::code()>,
1212 C<ZOOM::Connection::error_x()>
1214 C<ZOOM::Connection::errcode()>.
1216 The C<ZOOM::Error> class provides the constants
1226 C<UNSUPPORTED_PROTOCOL>,
1227 C<UNSUPPORTED_QUERY>,
1238 each of which specifies a client-side error. These codes constitute
1239 the C<ZOOM> diagnostic set.
1241 Since errors may also be diagnosed by the server, and returned to the
1242 client, error codes may also take values from the BIB-1 diagnostic set
1243 of Z39.50, listed at the Z39.50 Maintenance Agency's web-site at
1244 http://www.loc.gov/z3950/agency/defns/bib1diag.html
1246 All error-codes, whether client-side from the C<ZOOM::Error>
1247 enumeration or server-side from the BIB-1 diagnostic set, can be
1248 translated into human-readable messages by passing them to the
1249 C<ZOOM::diag_str()> utility function.
1253 if ($conn->last_event() == ZOOM::Event::CONNECT) {
1254 print "Connected!\n";
1257 In applications that need it - mostly complex multiplexing
1258 applications - the C<ZOOM::Connection::last_event()> method is used to
1259 return an indication of the last event that occurred on a particular
1260 connection. It always returns a value drawn from this enumeration,
1261 that is, one of C<NONE>, C<CONNECT>, C<SEND_DATA>, C<RECV_DATA>,
1262 C<TIMEOUT>, C<UNKNOWN>, C<SEND_APDU>, C<RECV_APDU>, C<RECV_RECORD>,
1263 C<RECV_SEARCH> or C<ZEND>.
1265 See the section below on asynchronous applications.
1269 ZOOM::Log::init_level(ZOOM::Log::mask_str("zoom,myapp,-warn"));
1270 ZOOM::Log::log("myapp", "starting up with pid ", $$);
1272 Logging facilities are provided by a set of functions in the
1273 C<ZOOM::Log> module. Note that C<ZOOM::Log> is not a class, and it
1274 is not possible to create C<ZOOM::Log> objects: the API is imperative,
1275 reflecting that of the underlying YAZ logging facilities. Although
1276 there are nine logging functions altogether, you can ignore nearly
1277 all of them: most applications that use logging will begin by calling
1278 C<mask_str()> and C<init_level()> once each, as above, and will then
1279 repeatedly call C<log()>.
1283 $level = ZOOM::Log::mask_str("zoom,myapp,-warn");
1285 Returns an integer corresponding to the log-level specified by the
1286 parameter. This is a string of zero or more comma-separated
1287 module-names, each indicating an individual module to be either added
1288 to the default log-level or removed from it (for those components
1289 prefixed by a minus-sign). The names may be those of either standard
1290 YAZ-logging modules such as C<fatal>, C<debug> and C<warn>, or custom
1291 modules such as C<myapp> in the example above. The module C<zoom>
1292 requests logging from the ZOOM module itself, which may be helpful for
1295 Note that calling this function does not in any way change the logging
1296 state: it merely returns a value. To change the state, this value
1297 must be passed to C<init_level()>.
1299 =head2 module_level()
1301 $level = ZOOM::Log::module_level("zoom");
1302 ZOOM::Log::log($level, "all systems clear: thrusters invigorated");
1304 Returns the integer corresponding to the single log-level specified as
1305 the parameter, or zero if that level has not been registered by a
1306 prior call to C<mask_str()>. Since C<log()> accepts either a numeric
1307 log-level or a string, there is no reason to call this function; but,
1308 what the heck, maybe you enjoy that kind of thing. Who are we to
1313 ZOOM::Log::init_level($level);
1315 Initialises the log-level to the specified integer, which is a bitmask
1316 of values, typically as returned from C<mask_str()>. All subsequent
1317 calls to C<log()> made with a log-level that matches one of the bits
1318 in this mask will result in a log-message being emitted. All logging
1319 can be turned off by calling C<init_level(0)>.
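The relationship between the mask and individual log-levels can be pictured with a small pure-Perl model. It does not call YAZ: the bit values in C<%bit> and the helper names are invented for illustration, and real level values should always be obtained from C<mask_str()> or C<module_level()>.

```perl
use strict;
use warnings;

# Invented bit assignments, for illustration only.
my %bit = (fatal => 1, debug => 2, warn => 4, zoom => 8, myapp => 16);

# Model of building a mask: start from a default, then add each named
# module, or remove it when the name is prefixed with a minus sign.
sub model_mask {
    my ($default, $spec) = @_;
    my $mask = $default;
    for my $name (split /,/, $spec) {
        if ($name =~ s/^-//) { $mask &= ~$bit{$name} }
        else                 { $mask |=  $bit{$name} }
    }
    return $mask;
}

# Model of the emit test: a message is logged only when its level
# shares at least one bit with the current mask.
sub model_emits {
    my ($mask, $level) = @_;
    return ($mask & $level) ? 1 : 0;
}

my $mask = model_mask($bit{fatal} | $bit{warn}, "zoom,myapp,-warn");
print "myapp logged? ", model_emits($mask, $bit{myapp}), "\n";
print "warn logged?  ", model_emits($mask, $bit{warn}),  "\n";
print "all off?      ", (model_emits(0, $bit{myapp}) ? "no" : "yes"), "\n";
```

Because a mask of zero shares no bits with any level, the model also shows why C<init_level(0)> silences all logging.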
1321 =head2 init_prefix()
1323 ZOOM::Log::init_prefix($0);
1325 Initialises a prefix string to be included in all log-messages.
1329 ZOOM::Log::init_file("/tmp/myapp.log");
1331 Initialises the output file to be used for logging: subsequent
1332 log-messages are written to the nominated file. If this function is
1333 not called, log-messages are written to the standard error stream.
1337 ZOOM::Log::init($level, $0, "/tmp/myapp.log");
1339 Initialises the log-level, the logging prefix and the logging output
1340 file in a single operation.
1342 =head2 time_format()
1344 ZOOM::Log::time_format("%Y-%m-%d %H:%M:%S");
1346 Sets the format in which log-messages' timestamps are emitted, by
1347 means of a format-string like that used in the C function
1348 C<strftime()>. The example above emits year, month, day, hours,
1349 minutes and seconds in big-endian order, such that timestamps can be
1350 sorted lexicographically.
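The effect of this format can be previewed without ZOOM by using the C<strftime()> function from Perl's core C<POSIX> module, which accepts the same kind of format-string. The sketch below formats two arbitrary epoch times in UTC and checks that sorting the resulting strings as text also sorts them chronologically. The helper name C<stamp()> is made up for this example.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

my $format = "%Y-%m-%d %H:%M:%S";

# Format an epoch time in UTC using the strftime-style format above.
sub stamp {
    my ($epoch) = @_;
    return strftime($format, gmtime($epoch));
}

my @epochs  = (1_144_843_390, 86_400);   # arbitrary, deliberately unordered
my @stamps  = map { stamp($_) } @epochs;
my @by_text = sort @stamps;                                  # string order
my @by_time = map { stamp($_) } sort { $a <=> $b } @epochs;  # time order

print "$_\n" for @by_text;
print "orders agree\n" if "@by_text" eq "@by_time";
```

A format that put the day or month first would break this property, which is why the big-endian order matters for log files that are sorted or merged later.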
1352 =head2 init_max_size()
1354 (This doesn't seem to work, so I won't bother describing it.)
1358 ZOOM::Log::log(8192, "reducing to warp-factor $wf");
1359 ZOOM::Log::log("myapp", "starting up with pid ", $$);
1361 Provided that the first argument, log-level, is among the modules
1362 previously established by C<init_level()>, this function emits a
1363 log-message made up of a timestamp, the prefix supplied to
1364 C<init_prefix()>, if any, and the concatenation of all arguments after
1365 the first. The message is written to the standard error stream, or
1366 to the file previously specified by C<init_file()> if this has been
1369 The log-level argument may be either a numeric value, as returned from
1370 C<module_level()>, or a string containing the module name.
1372 =head1 ASYNCHRONOUS APPLICATIONS
1374 Although asynchronous applications are conceptually complex, the ZOOM
1375 support for them is provided through a very simple interface,
1376 consisting of one option (C<async>), one function (C<ZOOM::event()>),
1377 one Connection method (C<last_event()>) and an enumeration
1380 The approach is as follows:
1384 =item Initialisation
1386 Create several connections to the various servers, each of them having
1387 the option C<async> set, and with whatever additional options are
1388 required - e.g. the piggyback retrieval record-count can be set so
1389 that records will be returned in search responses.
1393 Send searches to the connections, request records, etc.
1395 =item Event harvesting
1397 Repeatedly call C<ZOOM::event()> to discover what responses are being
1398 received from the servers. Each time this function returns, it
1399 indicates which of the connections has fired; this connection can then
1400 be interrogated with the C<last_event()> method to discover what event
1401 has occurred, and the return value - an element of the C<ZOOM::Event>
1402 enumeration - can be tested to determine what to do next. For
1403 example, the C<ZEND> event indicates that no further operations are
1404 outstanding on the connection, so any fetched records can now be
1405 immediately obtained.
1409 Here is a very short program (omitting all error-checking!) which
1410 demonstrates this process. It parallel-searches three servers (or more
1411 of you add them the list), displaying the first record in the
1412 result-set of each server as soon as it becomes available.
1415 @servers = ('z3950.loc.gov:7090/Voyager',
1416 'bagel.indexdata.com:210/gils',
1417 'agricola.nal.usda.gov:7190/Voyager');
1418 for ($i = 0; $i < @servers; $i++) {
1419 $z[$i] = new ZOOM::Connection($servers[$i], 0,
1420 async => 1, # asynchronous mode
1421 count => 1, # piggyback retrieval count
1422 preferredRecordSyntax => "usmarc");
1423 $r[$i] = $z[$i]->search_pqf("mineral");
1425 while (($i = ZOOM::event(\@z)) != 0) {
1426 $ev = $z[$i-1]->last_event();
1427 print("connection ", $i-1, ": ", ZOOM::event_str($ev), "\n");
1428 if ($ev == ZOOM::Event::ZEND) {
1429 $size = $r[$i-1]->size();
1430 print "connection ", $i-1, ": $size hits\n";
1431 print $r[$i-1]->record(0)->render()
1438 The ZOOM abstract API,
1439 http://zoom.z3950.org/api/zoom-current.html
1441 The C<Net::Z3950::ZOOM> module, included in the same distribution as this one.
1443 The C<Net::Z3950> module, which this one supersedes.
1444 http://perl.z3950.org/
1446 The documentation for the ZOOM-C module of the YAZ Toolkit, which this
1447 module is built on. Specifically, its lists of options are useful.
1448 http://indexdata.com/yaz/doc/zoom.tkl
1450 The BIB-1 diagnostic set of Z39.50,
1451 http://www.loc.gov/z3950/agency/defns/bib1diag.html
1455 Mike Taylor, E<lt>mike@indexdata.comE<gt>
1457 =head1 COPYRIGHT AND LICENCE
1459 Copyright (C) 2005 by Index Data.
1461 This library is free software; you can redistribute it and/or modify
1462 it under the same terms as Perl itself, either Perl version 5.8.4 or,
1463 at your option, any later version of Perl 5 you may have available.