<chapter id="tutorial">
<title>Tutorial</title>
<sect1 id="tutorial-oai">
<title>A first &acro.oai; indexing example</title>
In this section, we will test the system by indexing a small set of
sample &acro.oai; records that are included with the &zebra; distribution,
running a &zebra; server against the newly created database, and
searching the indexes with a client that connects to that server.
Go to the <literal>examples/oai-pmh</literal> subdirectory of the
distribution archive, or make a deep copy of the Debian installation
directory
<literal>/usr/share/idzebra-2.0-examples/oai-pmh</literal>.
An &acro.xml; file containing multiple &acro.oai;
records is located in the subdirectory
<literal>examples/oai-pmh/data</literal>.
Additional &acro.oai; test records can be downloaded by running a shell
script (you may want to abort the script once it has run
longer than it takes your coffee to brew).
To index these &acro.oai; records, type:
zebraidx-2.0 -c conf/zebra.cfg init
zebraidx-2.0 -c conf/zebra.cfg update data
zebraidx-2.0 -c conf/zebra.cfg commit
If you have not installed &zebra; yet but have compiled the
binaries from this tarball, use the following command form:
../../index/zebraidx -c conf/zebra.cfg this and that
On some systems the &zebra; binaries are installed under their
generic names; in that case, use the following command form:
zebraidx -c conf/zebra.cfg this and that
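Because the binary name differs between installations, a small script can probe the candidates before indexing. The following Python sketch is one possible way to do this. The helper names <literal>find_zebraidx</literal> and <literal>index_commands</literal> and the lookup order are illustrative assumptions and not part of the &zebra; distribution.

```python
import shutil

# Binary names taken from the text above; the lookup order is an assumption.
CANDIDATES = ("zebraidx-2.0", "zebraidx", "../../index/zebraidx")


def find_zebraidx(candidates=CANDIDATES, which=shutil.which):
    """Return the first candidate that resolves to an executable, else None."""
    for name in candidates:
        path = which(name)
        if path:
            return path
    return None


def index_commands(binary, config="conf/zebra.cfg", data_dir="data"):
    """Build the init/update/commit command lines from the text as argv lists."""
    return [
        [binary, "-c", config, "init"],
        [binary, "-c", config, "update", data_dir],
        [binary, "-c", config, "commit"],
    ]


if __name__ == "__main__":
    binary = find_zebraidx()
    if binary is None:
        print("no zebraidx binary found")
    else:
        # Print the commands instead of running them; pass each list to
        # subprocess.run(argv, check=True) to actually execute it.
        for argv in index_commands(binary):
            print(" ".join(argv))
```

The commands are only printed here, so the sketch is safe to run in any directory.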
In this command, the word <literal>update</literal> is followed
by the name of a directory: <literal>zebraidx</literal> updates all
files in the hierarchy rooted at <literal>data</literal>.
The option <literal>-c conf/zebra.cfg</literal> points to the proper
configuration file.
You might wonder how &acro.xml; content is indexed using &acro.xslt;
stylesheets. To satisfy your curiosity, run the
indexing transformation on the example debugging &acro.oai; record:
xsltproc conf/oai2index.xsl data/debug-record.xml
Here you see the &acro.oai; record transformed into the indexing
&acro.xml; format. &zebra; creates several inverted indexes,
whose names and types are clearly visible in that format.
If your indexing command was successful, you are now ready to
fire up a server. To start a server on port 9999, type:
zebrasrv-2.0 -c conf/zebra.cfg @:9999
The &zebra; index that you have just created has a single database
named <literal>Default</literal>.
The database contains several &acro.oai; records, and the server will
return records in the &acro.xml; format only. The indexing machinery
split the input file into individual records behind the scenes.
<sect1 id="tutorial-oai-sru-pqf">
<title>Searching the &acro.oai; database by web service</title>
&zebra; has a built-in web service, which is close to the
&acro.sru; standard web service. We use it to access our new
database using any &acro.xml;-enabled web browser.
This service uses the &acro.pqf; query language.
A later section shows how to run a fully compliant &acro.sru; server,
including support for the &acro.cql; query language.
Searching and retrieving &acro.xml; records is easy. For example,
to search for the term <literal>the</literal>, point your
browser at this link:
<ulink
url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the</ulink>
These URLs won't work unless you have indexed the example data
and started a &zebra; server as outlined in the previous section.
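When such URLs are built by a program rather than typed into a browser, the query has to be percent-encoded. The following Python sketch shows one way to assemble the search URL used above. The function name <literal>sru_pqf_url</literal> is a hypothetical helper; only the parameter names <literal>version</literal>, <literal>operation</literal> and <literal>x-pquery</literal> and the address <literal>localhost:9999</literal> are taken from this tutorial.

```python
from urllib.parse import quote, urlencode

# Address of the server started in the previous section.
BASE_URL = "http://localhost:9999/"


def sru_pqf_url(pquery, base_url=BASE_URL):
    """Return a searchRetrieve URL carrying a PQF query in x-pquery."""
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "x-pquery": pquery,
    }
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return base_url + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    print(sru_pqf_url("the"))
```

For a single-word query the result is identical to the link shown above; the encoding only matters once the query contains spaces or other reserved characters.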
To actually retrieve a record, we add the
<literal>startRecord</literal>, <literal>maximumRecords</literal> and
<literal>recordSchema</literal> parameters to our URL:
<ulink url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=dc">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=dc</ulink>
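A client that consumes the response programmatically usually wants the hit count and the schema of each returned record. The Python sketch below parses a hand-written sample response, which is not real server output. It assumes that the server answers with the &acro.sru; 1.1 <literal>searchRetrieveResponse</literal> layout in the <literal>http://www.loc.gov/zing/srw/</literal> namespace. The function name <literal>summarize_response</literal> is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: SRU 1.1 responses use this namespace.
SRW_NS = {"zs": "http://www.loc.gov/zing/srw/"}

# Hand-written placeholder response, NOT captured from a real server.
SAMPLE = """<zs:searchRetrieveResponse xmlns:zs="http://www.loc.gov/zing/srw/">
  <zs:version>1.1</zs:version>
  <zs:numberOfRecords>42</zs:numberOfRecords>
  <zs:records>
    <zs:record>
      <zs:recordSchema>dc</zs:recordSchema>
      <zs:recordPacking>xml</zs:recordPacking>
      <zs:recordData><title>placeholder</title></zs:recordData>
      <zs:recordPosition>1</zs:recordPosition>
    </zs:record>
  </zs:records>
</zs:searchRetrieveResponse>"""


def summarize_response(response_xml):
    """Return (hit count, list of record schemas) for an SRU response string."""
    root = ET.fromstring(response_xml)
    hits = int(root.findtext("zs:numberOfRecords", default="0", namespaces=SRW_NS))
    schemas = [
        record.findtext("zs:recordSchema", namespaces=SRW_NS)
        for record in root.iterfind("zs:records/zs:record", SRW_NS)
    ]
    return hits, schemas


if __name__ == "__main__":
    print(summarize_response(SAMPLE))
```

Compare the element names against an actual response from your server before relying on this sketch.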
This way we can page through our result set in chunks of records.
For example, we access the 6th to the 10th record using the URL
<ulink url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=6&amp;maximumRecords=5&amp;recordSchema=dc">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=6&amp;maximumRecords=5&amp;recordSchema=dc</ulink>
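The paging arithmetic can be sketched in a few lines of Python. The generator name <literal>page_windows</literal> is a hypothetical helper. It relies only on the fact, visible in the URLs above, that <literal>startRecord</literal> counts from 1.

```python
def page_windows(total_records, page_size):
    """Yield (startRecord, maximumRecords) pairs that cover all hits."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    for start in range(1, total_records + 1, page_size):
        # The last page may be shorter than page_size.
        yield start, min(page_size, total_records - start + 1)


if __name__ == "__main__":
    # For 12 hits in chunks of 5, the second pair (6, 5) corresponds to
    # the "6th to the 10th record" request above.
    for start, count in page_windows(12, 5):
        print(f"startRecord={start}&maximumRecords={count}")
```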
http://localhost:9999/?version=1.1&operation=searchRetrieve
&x-pquery=title%3Cthe
<sect1 id="tutorial-oai-sru-present">
<title>Presenting search results in different formats</title>
&zebra; uses &acro.xslt; stylesheets for both &acro.xml; record
indexing and
display retrieval. In this example installation, there are two
retrieval schemas defined in
<literal>conf/dom-conf.xml</literal>:
the <literal>dc</literal> schema implemented in
<literal>conf/oai2dc.xsl</literal>, and
the <literal>zebra</literal> schema implemented in
<literal>conf/oai2zebra.xsl</literal>.
The URLs for accessing both are the same, except for the different
value of the <literal>recordSchema</literal> parameter:
<ulink url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=dc">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=dc</ulink>
and
<ulink url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra</ulink>
The curious can run the &acro.xslt; transformations directly
from the command line:
xsltproc conf/oai2dc.xsl data/debug-record.xml
xsltproc conf/oai2zebra.xsl data/debug-record.xml
Notice also that the &zebra;-specific parameters are injected by
the engine when retrieving data; therefore, some of the attributes
in the <literal>zebra</literal> retrieval schema are not filled
when running the transformation from the command line.
In addition to the user-defined retrieval schemas, one can always
choose from many built-in schemas. If one is only
interested in the &zebra; internal metadata about a certain
record, one uses the <literal>zebra::meta</literal> schema:
<ulink url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra::meta">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra::meta</ulink>
The <literal>zebra::data</literal> schema is used to retrieve the
original stored &acro.oai; &acro.xml; record.
<ulink url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra::data">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra::data</ulink>
<sect1 id="tutorial-oai-sru-searches">
<title>More interesting searches</title>
The &acro.oai; indexing example defines many different index
names. A look at the <literal>conf/oai2index.xsl</literal>
stylesheet reveals the following word-type indexes (that is, those
with the suffix <literal>:w</literal>):
By default, searches access the <literal>any:w</literal> index,
but we can direct searches to any access point by constructing the
correct &acro.pqf; query. For example, to search in titles only,
we use the query <literal>@attr 1=title the</literal>:
<ulink
url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=@attr%201=title%20the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=dc">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=@attr 1=title the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=dc</ulink>
Similarly, we can direct searches to the other indexes defined, or
create Boolean combinations of searches on different
indexes. In this case we search for <literal>the</literal> in
<literal>title</literal> and for <literal>fish</literal> in
<literal>description</literal> using the query
<literal>@and @attr 1=title the @attr 1=description fish</literal>.
<ulink
url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=@and%20@attr%201=title%20the%20@attr%201=description%20fish&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=dc">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=@and @attr 1=title the @attr 1=description fish&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=dc</ulink>
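Since &acro.pqf; is a prefix notation, such queries are easy to assemble mechanically. The following Python sketch builds the Boolean query from the text and wraps it in a percent-encoded URL. The helper names <literal>pqf_attr</literal>, <literal>pqf_and</literal> and <literal>search_url</literal> are hypothetical, and the sketch assumes single-word search terms that need no quoting.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def pqf_attr(index, term):
    """Direct a single-word term at a named index via the use attribute (1=)."""
    return f"@attr 1={index} {term}"


def pqf_and(*clauses):
    """Combine clauses with @and, a binary prefix operator, nesting to the right."""
    if not clauses:
        raise ValueError("at least one clause is required")
    query = clauses[-1]
    for clause in reversed(clauses[:-1]):
        query = f"@and {clause} {query}"
    return query


def search_url(pquery, base_url="http://localhost:9999/"):
    """Wrap a PQF query in the searchRetrieve URL used in this section."""
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "x-pquery": pquery,
        "startRecord": 1,
        "maximumRecords": 1,
        "recordSchema": "dc",
    }
    return base_url + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    query = pqf_and(pqf_attr("title", "the"), pqf_attr("description", "fish"))
    print(query)
    url = search_url(query)
    print(url)
    # Decoding the URL again yields the original query string.
    print(parse_qs(urlsplit(url).query)["x-pquery"][0] == query)
```

With more than two clauses, <literal>pqf_and</literal> nests the operators to the right, so three clauses yield <literal>@and a @and b c</literal>.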
<sect1 id="tutorial-oai-sru-zebra-indexes">
<title>Investigating the content of the indexes</title>
How does the magic work? What is inside the indexes? Why is a certain
record found by a search, and another not? The answer is in the
inverted indexes. You can easily investigate them using the
special &zebra; schema
<literal>zebra::index::fieldname</literal>. In this example you
can see that the <literal>title</literal> index has both word
(type <literal>:w</literal>) and phrase (type
<literal>:p</literal>)
indexes:
<ulink url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra::index::title">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra::index::title</ulink>
But where in the indexes did the term match for the query occur?
This is easily answered with the special &zebra; schema
<literal>zebra::snippet</literal>. The matching terms are
enclosed in <literal>&lt;s&gt;</literal> tags.
<ulink url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra::snippet">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra::snippet</ulink>
How can I refine my search? Which interesting search terms are
found inside my hit set? Try the special &zebra; schema
<literal>zebra::facet::fieldname:type</literal>. In this case, we
investigate additional search terms for the
<literal>title:w</literal> index.
<ulink url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra::facet::title:w">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra::facet::title:w</ulink>
One can ask for multiple facets. Here, we want them from phrase
indexes of type
<literal>:p</literal>.
<ulink url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra::facet::publisher:p,title:p">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;x-pquery=the&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra::facet::publisher:p,title:p</ulink>
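The facet schema name follows a regular pattern: the prefix <literal>zebra::facet::</literal> followed by a comma-separated list of <literal>fieldname:type</literal> pairs. A minimal Python sketch of that pattern follows. The function name <literal>facet_schema</literal> is a hypothetical helper, and the pattern is inferred only from the two examples above.

```python
def facet_schema(*fields):
    """Build a zebra::facet:: schema name from (index name, index type) pairs."""
    if not fields:
        raise ValueError("at least one (name, type) pair is required")
    return "zebra::facet::" + ",".join(f"{name}:{kind}" for name, kind in fields)


if __name__ == "__main__":
    print(facet_schema(("title", "w")))
    print(facet_schema(("publisher", "p"), ("title", "p")))
```

The resulting string is what goes into the <literal>recordSchema</literal> parameter.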
<sect1 id="tutorial-oai-sru-yazfrontend">
<title>Setting up a correct &acro.sru; web service</title>
The &acro.sru; specification mandates that the &acro.cql; query
language is supported and properly configured. Also, the server
needs to be able to emit a proper &acro.explain; &acro.xml;
record, which is used to determine the capabilities of the
specific server instance.
In this example configuration we exploit the similarities between
the &acro.explain; record and the &acro.cql; query language
configuration: we generate the latter from the former using an
&acro.xslt; transformation.
xsltproc conf/explain2cqlpqftxt.xsl conf/explain.xml > conf/cql2pqf.txt
We are all set to start the &acro.sru;/&acro.z3950; server including
&acro.pqf; and &acro.cql; query configuration. It uses the &yaz; frontend
server configuration; just type:
zebrasrv -f conf/yazserver.xml
First, we'd like to be sure that we can see the &acro.explain;
&acro.xml; response correctly. You might use either of these equivalent
requests:
<ulink url="http://localhost:9999">http://localhost:9999</ulink>
or
<ulink
url="http://localhost:9999/?version=1.1&amp;operation=explain">
http://localhost:9999/?version=1.1&amp;operation=explain</ulink>
Now we can issue true &acro.sru; requests. For example,
<literal>dc.title=the
and dc.description=fish</literal> results in the following page:
<ulink
url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;query=dc.title=the%20and%20dc.description=fish&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=dc">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;query=dc.title=the and dc.description=fish&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=dc</ulink>
Scanning indexes is part of the &acro.sru; server's business. For example,
scanning the <literal>dc.title</literal> index gives us an idea
of which search terms are found there:
<ulink
url="http://localhost:9999/?version=1.1&amp;operation=scan&amp;scanClause=dc.title=fish">
http://localhost:9999/?version=1.1&amp;operation=scan&amp;scanClause=dc.title=fish</ulink>,
whereas
<ulink
url="http://localhost:9999/?version=1.1&amp;operation=scan&amp;scanClause=dc.identifier=fish">
http://localhost:9999/?version=1.1&amp;operation=scan&amp;scanClause=dc.identifier=fish</ulink>
accesses the indexed identifiers.
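The &acro.cql; search and scan requests above differ from the earlier &acro.pqf; requests only in their parameter names. The following Python sketch assembles both kinds of URL with percent-encoding. The function names <literal>sru_search_url</literal> and <literal>sru_scan_url</literal> are hypothetical, while the parameter names and default values are copied from the links in this section.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# Address of the server started with conf/yazserver.xml above.
BASE_URL = "http://localhost:9999/"


def sru_search_url(cql, start_record=1, maximum_records=1,
                   record_schema="dc", base_url=BASE_URL):
    """Return a searchRetrieve URL carrying a CQL query."""
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": cql,
        "startRecord": start_record,
        "maximumRecords": maximum_records,
        "recordSchema": record_schema,
    }
    return base_url + "?" + urlencode(params, quote_via=quote)


def sru_scan_url(scan_clause, base_url=BASE_URL):
    """Return a scan URL for the given CQL scan clause."""
    params = {"version": "1.1", "operation": "scan", "scanClause": scan_clause}
    return base_url + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    url = sru_search_url("dc.title=the and dc.description=fish")
    print(url)
    # Decoding shows that the query survives the round trip unchanged.
    print(parse_qs(urlsplit(url).query)["query"][0])
    print(sru_scan_url("dc.title=fish"))
```

Encoding the <literal>=</literal> inside the query value as <literal>%3D</literal> is stricter than a browser would be, but it decodes to the same query on the server side.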
In addition, all &zebra; internal special element sets or record
schemas with the prefix
<literal>zebra::</literal> work right out of the box:
<ulink
url="http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;query=dc.title=the%20and%20dc.description=fish&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra::snippet">
http://localhost:9999/?version=1.1&amp;operation=searchRetrieve&amp;query=dc.title=the and dc.description=fish&amp;startRecord=1&amp;maximumRecords=1&amp;recordSchema=zebra::snippet</ulink>
<sect1 id="tutorial-oai-z3950">
<title>Searching the &acro.oai; database by &acro.z3950; protocol</title>
In this section we repeat the searches and presents we have done so
far, this time using the binary &acro.z3950; protocol. You can use any
&acro.z3950; client.
For instance, you can use the demo command-line client that comes
with &yaz;.
Connect to the server with the command:
yaz-client localhost:9999
When the client has connected, you can type:
&acro.z3950; presents using presentation stylesheets:
&acro.z3950; built-in &zebra; presents (in this configuration, available
only if the server is started without the &yaz; frontend server):
Z> elements zebra::meta
Z> elements zebra::meta::sysno
Z> elements zebra::index
Z> elements zebra::snippet
Z> elements zebra::facet::any:w
Z> elements zebra::facet::publisher:p,title:p
&acro.z3950; searches targeted at specific indexes and Boolean
combinations of these can be issued as well.
Z> find @attr 1=oai_identifier @attr 4=3 oai:caltechcstr.library.caltech.edu:4
Z> find @attr 1=oai_datestamp @attr 4=3 2001-04-20
Z> find @attr 1=oai_setspec @attr 4=3 7374617475733D756E707562
Z> find @attr 1=title communication
Z> find @attr 1=identifier @attr 4=3 http://resolver.caltech.edu/CaltechCSTR:1986.5228-tr-86
yaz-client localhost:9999
Z> scan @attr 1=oai_identifier @attr 4=3 oai
Z> scan @attr 1=oai_datestamp @attr 4=3 1
Z> scan @attr 1=oai_setspec @attr 4=3 2000
Z> scan @attr 1=title communication
Z> scan @attr 1=identifier @attr 4=3 a
&acro.z3950; search using server-side &acro.cql; conversion:
Z> find dc.creator = the
Z> find dc.title = the
Z> find dc.description &lt; the
Z> find dc.title > some
Z> find dc.identifier="http://resolver.caltech.edu/CaltechCSTR:1978.2276-tr-78"
Z> find dc.relation = something
And so on. Notice that all indexes defined by <literal>type="0"</literal> in the
indexing stylesheet must be searched using the <literal>eq</literal>
relation.
&acro.z3950; scan using server-side &acro.cql; conversion will
unfortunately never work, as it is not supported by the
&acro.z3950; standard.
If you want to scan using server-side &acro.cql; conversion, you need to
make an SRW connection using yaz-client, or an
&acro.sru; connection using REST web services; any browser will do.
All indexes defined by <literal>type="0"</literal> in the
indexing stylesheet must be searched using the <literal>@attr 4=3</literal>
structure attribute instruction.
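This rule can be captured in a small helper that adds the structure attribute only where it is needed. In the Python sketch below, the set <literal>TYPE0_INDEXES</literal> is an assumption inferred from the <literal>find</literal> examples in this section. Check <literal>conf/oai2index.xsl</literal> for the authoritative list. The function name <literal>pqf_for</literal> is hypothetical.

```python
# Assumption: these are the indexes shown with "@attr 4=3" in the examples
# above; the authoritative list is in conf/oai2index.xsl.
TYPE0_INDEXES = frozenset(
    {"oai_identifier", "oai_datestamp", "oai_setspec", "identifier"}
)


def pqf_for(index, term, type0_indexes=TYPE0_INDEXES):
    """Return a PQF clause, adding '@attr 4=3' for type="0" indexes."""
    structure = "@attr 4=3 " if index in type0_indexes else ""
    return f"@attr 1={index} {structure}{term}"


if __name__ == "__main__":
    print("find " + pqf_for("oai_datestamp", "2001-04-20"))
    print("find " + pqf_for("title", "communication"))
```

The two printed commands reproduce the corresponding <literal>find</literal> examples shown earlier in this section.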
Notice that searching and scanning on the indexes
<literal>contributor</literal>, <literal>language</literal>,
<literal>rights</literal>, and <literal>source</literal>
might fail, simply because none of the records in the small example set
have these fields set, and consequently, these indexes might not
exist.
<!-- Keep this comment at the end of the file
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-parent-document: "idzebra.xml"
sgml-local-catalogs: nil
sgml-namecase-general:t