1 <chapter id="examples">
2 <!-- $Id: examples.xml,v 1.11 2002-10-17 21:54:22 mike Exp $ -->
3 <title>Example Configurations</title>
6 <title>Overview</title>
9 <literal>zebraidx</literal> and <literal>zebrasrv</literal> are both
10 driven by a master configuration file, which may refer to other
11 subsidiary configuration files. By default, they try to use
12 <filename>zebra.cfg</filename> in the working directory as the
master file; but this can be changed using the <literal>-c</literal>
option to specify an alternative master configuration file.
17 The master configuration file tells Zebra:
22 Where to find subsidiary configuration files, including
23 <literal>default.idx</literal>
24 which specifies the default indexing rules.
30 What attribute sets to recognise in searches.
36 Policy details such as what record type to expect, what
37 low-level indexing algorithm to use, how to identify potential
38 duplicate records, etc.
45 Now let's see what goes in the <literal>zebra.cfg</literal> file
46 for some example configurations.
51 <title>Example 1: XML Indexing And Searching</title>
54 This example shows how Zebra can be used with absolutely minimal
55 configuration to index a body of
56 <ulink url="http://www.w3.org/xml/###">XML</ulink>
57 documents, and search them using
58 <ulink url="http://www.w3.org/xpath/###">XPath</ulink>
59 expressions to specify access points.
62 Go to the <literal>examples/dinosauricon</literal> subdirectory
63 of the distribution archive.
64 There you will find a <literal>records</literal> subdirectory,
65 which contains some raw XML data to be added to the database: in
this case, a single file, <literal>genera.xml</literal>,
which contains information about all the known dinosaur genera as of
71 Now we need to create the Zebra database, which we do with the
72 Zebra indexer, <literal>zebraidx</literal>, which is
73 driven by the <literal>zebra.cfg</literal> configuration file.
74 For our purposes, we don't need any
75 special behaviour - we can use the defaults - so we start with a
76 minimal file that just tells <literal>zebraidx</literal> where to
77 find the default indexing rules, and how to parse the records:
79 profilePath: .:../../tab:../../../yaz/tab
84 That's all you need for a minimal Zebra configuration. Now you can
85 roll the XML records into the database and build the indexes:
87 zebraidx update records
Now start the server. Like the indexer, its behaviour is
controlled by the
<literal>zebra.cfg</literal> file; and like the indexer, it works
just fine with this minimal configuration.
By default, the server listens on TCP port 9999, although
99 this can easily be changed - see
100 <xref linkend="zebrasrv"/>.
Now you can use the Z39.50 client program of your choice to execute
XPath-based boolean queries and fetch the XML records that satisfy
them:
107 $ yaz-client tcp:@:9999
109 Z> find @attr 1=/GENUS/SPECIES/AUTHOR/@name Wedel
113 <GENUS name="Sauroposeidon" type="with">
114 <MEANING>lizard Poseidon <LOW>(Greek god of, among other things, earthquakes)</LOW></MEANING>
115 <SPECIES name="proteles">
116 <AUTHOR type="vide" name="Franklin" year="2000"></AUTHOR>
117 <AUTHOR name="Wedel, Cifelli, Sanders"></AUTHOR>
119 <PLACE name="Oklahoma"></PLACE>
120 <TIME value="Albian"></TIME>
121 <LENGTH value="30" q="1"></LENGTH>
122 <REMAINS content="rib, cervical vertebrae"></REMAINS>
124 <P> This new <NOMEN name="Brachiosaurus"></NOMEN>-like <LINK content="dinosaur"></LINK>
125 was perhaps the tallest. With its head raised, it stood 60 feet (nearly
126 20 m) tall. </P>
129 <idzebra xmlns="http://www.indexdata.dk/zebra/">
130 <size>593</size>
131 <localnumber>891</localnumber>
132 <filename>records/genera.xml</filename>
138 Now wasn't that easy?
143 <sect1 id="example2">
144 <title>Example 2: Supporting Interoperable Searches</title>
147 The problem with the previous example is that you need to know the
148 structure of the documents in order to find them. For example,
when we wanted to know the genera for which Matt Wedel is an author
(<phrase role="taxon">Sauroposeidon proteles</phrase>),
we had to formulate a complex XPath
<literal>1=/GENUS/SPECIES/AUTHOR/@name</literal>
which embodies the knowledge that author names are specified in the
<literal>name</literal> attribute of the
<literal>&lt;AUTHOR&gt;</literal> element,
which is inside a
<literal>&lt;SPECIES&gt;</literal> element,
which in turn is inside the top-level
<literal>&lt;GENUS&gt;</literal> element.
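To make concrete what that path addresses, here is a minimal sketch in Python using only the standard library. It walks a trimmed-down copy of the Sauroposeidon record and collects the values found at <literal>/GENUS/SPECIES/AUTHOR/@name</literal>. The helper names <literal>author_names</literal> and <literal>matches</literal> are invented for this illustration, and the whole-word, case-insensitive matching is an assumption; none of this is how Zebra itself builds or consults its indexes.

```python
import re
import xml.etree.ElementTree as ET

# Trimmed-down copy of the record returned by the search in Example 1.
RECORD = """
<GENUS name="Sauroposeidon">
  <SPECIES name="proteles">
    <AUTHOR type="vide" name="Franklin" year="2000"></AUTHOR>
    <AUTHOR name="Wedel, Cifelli, Sanders"></AUTHOR>
  </SPECIES>
</GENUS>
"""


def author_names(genus):
    """Return the values addressed by /GENUS/SPECIES/AUTHOR/@name.

    `genus` is the <GENUS> root element, so the path is relative to it.
    """
    return [a.get("name") for a in genus.findall("SPECIES/AUTHOR")
            if a.get("name") is not None]


def matches(genus, term):
    """True if `term` occurs as a whole word in any author name.

    Whole-word, case-insensitive matching is an assumption made for
    this sketch only.
    """
    wanted = term.lower()
    return any(wanted in re.findall(r"\w+", name.lower())
               for name in author_names(genus))


genus = ET.fromstring(RECORD.strip())
print(author_names(genus))
print(matches(genus, "Wedel"))
```

The point of the sketch is that the caller has to know the nesting of <literal>GENUS</literal>, <literal>SPECIES</literal> and <literal>AUTHOR</literal> before it can ask the question at all.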
163 This is bad not just because it requires a lot of typing, but more
164 significantly because it ties searching semantics to the physical
165 structure of the searched records. You can't use the same search
166 specification to search two databases if their internal
167 representations are different. Consider an alternative dinosaur
168 database in which the records have author names specified
inside an <literal>&lt;authorName&gt;</literal> element directly
inside a top-level <literal>&lt;taxon&gt;</literal> element: then
you'd need to search for them using
<literal>1=/taxon/authorName</literal>.
175 How, then, can we build broadcasting Information Retrieval
176 applications that look for records in many different databases?
177 The Z39.50 protocol offers a powerful and general solution to this:
abstract <quote>access points</quote>. In the Z39.50 model, an access point
179 is simply a point at which searches can be directed. Nothing is
180 said about implementation: in a given database, an access point
181 might be implemented as an index, a path into physical records, an
182 algorithm for interrogating relational tables or whatever works.
183 The key point is that the semantics of an access point are fixed
187 For convenience, access points are gathered into <firstterm>attribute
188 sets</firstterm>. For example, the BIB-1 attribute set is supposed to
189 contain bibliographic access points such as author, title, subject
190 and ISBN; the GEO attribute set contains access points pertaining
191 to geospatial information (bounding box, ###, etc.); the CIMI
192 attribute set contains access points to do with museum collections
(provenance, inscriptions, etc.).
196 In practice, the BIB-1 attribute set has tended to be a dumping
197 ground for all sorts of access points, so that, for example, it
198 includes some geospatial access points as well as strictly
199 bibliographic ones. Nevertheless, the key point is that this model
200 allows a layer of abstraction over the physical representation of
201 records in databases.
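The layer of abstraction can be pictured as a lookup table per database. The following Python sketch is purely illustrative: the table <literal>ACCESS_POINTS</literal>, the database names <literal>dinosauricon</literal> and <literal>taxa</literal>, and the functions <literal>resolve</literal> and <literal>broadcast</literal> are all hypothetical, and the assumption that every database resolves an access point to a path is a simplification (an access point could equally be backed by an index or an algorithm). It shows one abstract author search being turned into a different physical query for each database.

```python
# (attribute set, access point number) -> physical path, one table per
# database. "dinosauricon" uses the record layout from Example 1; "taxa"
# is the hypothetical alternative database with <taxon>/<authorName>.
ACCESS_POINTS = {
    "dinosauricon": {("BIB-1", 1003): "/GENUS/SPECIES/AUTHOR/@name"},
    "taxa": {("BIB-1", 1003): "/taxon/authorName"},
}


def resolve(database, attribute_set, access_point):
    """Return the path that implements an access point in one database."""
    try:
        return ACCESS_POINTS[database][(attribute_set, access_point)]
    except KeyError:
        raise LookupError(
            f"{database} does not support {attribute_set} "
            f"access point {access_point}"
        ) from None


def broadcast(attribute_set, access_point, term):
    """Express one abstract search as a path-based query per database."""
    return {
        db: f"@attr 1={resolve(db, attribute_set, access_point)} {term}"
        for db in ACCESS_POINTS
    }


for db, query in broadcast("BIB-1", 1003, "Wedel").items():
    print(db, "->", query)
```

The client only ever names the attribute set and the access point; the knowledge of where author names live stays with each database.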
204 In the BIB-1 attribute set, an author search is represented by
205 access point 1003. (See
206 <ulink url="###bib1-semantics"/>)
So we need to configure our dinosaur database so that searches for
BIB-1 access point 1003 look inside the
<literal>name</literal> attribute of the
<literal>&lt;AUTHOR&gt;</literal> element,
inside the
<literal>&lt;SPECIES&gt;</literal> element,
inside the
<literal>&lt;GENUS&gt;</literal> element.
217 This is a two-step process. First, we need to tell Zebra that we
218 want to support the BIB-1 attribute set. Then we need to tell it
which elements of its records pertain to access point 1003.
You may have noticed as <literal>zebraidx</literal> was building
the database that it issued a warning, which we ignored at the
time:
242 $ zebraidx update records
243 00:45:46-08/10: ../../index/zebraidx(5016) [warn] records/genera.xml:0 Couldn't open GENUS.abs [No such file or directory]
245 FIXME ### This needs more text
253 The master configuration file, <literal>zebra.cfg</literal>,
254 which is as short and simple as it can be:
256 # $Header: /home/cvsroot/idis/doc/examples.xml,v 1.11 2002-10-17 21:54:22 mike Exp $
257 # Bare-bones master configuration file for Zebra
258 profilePath: .:../../tab:../../../yaz/tab
260 Apart from the comments, which are ignored, all this specifies is
261 that the server should recognise the attribute set described in
263 <literal>bib1.att</literal>.
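As a rough illustration of how such a file is laid out, the Python sketch below reads the configuration text shown above, skips comment lines, and splits each remaining line into a directive name and a value. The function <literal>parse_cfg</literal> is invented for this example and is not part of Zebra. Treating the <literal>profilePath</literal> value as a colon-separated list of directories to search is an assumption based on the form of the value.

```python
# The comment and profilePath lines are copied from the listing above.
CFG = """\
# Bare-bones master configuration file for Zebra
profilePath: .:../../tab:../../../yaz/tab
"""


def parse_cfg(text):
    """Parse 'name: value' lines, ignoring blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only: the value may contain colons.
        name, _, value = line.partition(":")
        settings[name.strip()] = value.strip()
    return settings


settings = parse_cfg(CFG)

# Assumption: profilePath is a colon-separated directory search path.
search_dirs = settings["profilePath"].split(":")
print(search_dirs)
```

Splitting on the first colon only matters here, because the path value itself uses colons as separators.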
264 ### What is an attribute set?
270 The simplest hello-world example could go like this:
275 <title>The art of motorcycle maintenance</title>
276 <subject scheme="Dewey">zen</subject>
281 f @attr 1=/book/title motorcycle
283 f @attr 1=/book/subject[@scheme=Dewey] zen
285 If you suddenly decide you want broader interop, you can add
286 an abs file (more or less like this):
291 elm (2,1) title title
292 elm (2,21) subject subject
296 How to include images:
300 <imagedata fileref="system.eps" format="eps">
303 <imagedata fileref="system.gif" format="gif">
306 <phrase>The Multi-Lingual Search System Architecture</phrase>
310 <emphasis role="strong">
311 The Multi-Lingual Search System Architecture.
314 Network connections across local area networks are
315 represented by straight lines, and those over the
316 internet by jagged lines.
Where the three <*object> thingies inside the top-level <mediaobject>
are decreasingly preferred versions to include depending on what the
322 rendering engine can handle. I generated the EPS version of the image
323 by exporting a line-drawing done in TGIF, then converted that to the
324 GIF using a shell-script called "epstogif" which used an appallingly
325 baroque sequence of conversions, which I would prefer not to pollute
326 the Zebra build environment with:
330 # Yes, what follows is stupidly convoluted, but I can't find a
331 # more straightforward path from the EPS generated by tgif's
332 # "Print" command into a browser-friendly format.
file=`echo "$1" | sed 's/\.eps$//'`
335 ps2pdf "$1" "$file".pdf
336 pdftopbm "$file".pdf "$file"
337 pnmscale 0.50 < "$file"-000001.pbm | pnmcrop | ppmtogif
338 rm -f "$file".pdf "$file"-000001.pbm
342 <!-- Keep this comment at the end of the file
347 sgml-minimize-attributes:nil
348 sgml-always-quote-attributes:t
351 sgml-parent-document: "zebra.xml"
352 sgml-local-catalogs: nil
353 sgml-namecase-general:t