1 <!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook V4.1//EN"
2 "http://www.oasis-open.org/docbook/xml/4.1/docbookx.dtd"
4 <!ENTITY % local SYSTEM "local.ent">
6 <!ENTITY % entities SYSTEM "entities.ent">
8 <!ENTITY % common SYSTEM "common/common.ent">
11 <!-- $Id: pazpar2_conf.xml,v 1.21 2007-04-20 14:05:23 quinn Exp $ -->
12 <refentry id="pazpar2_conf">
14 <productname>Pazpar2</productname>
15 <productnumber>&version;</productnumber>
18 <refentrytitle>Pazpar2 conf</refentrytitle>
19 <manvolnum>5</manvolnum>
23 <refname>pazpar2_conf</refname>
24 <refpurpose>Pazpar2 Configuration</refpurpose>
29 <command>pazpar2.conf</command>
33 <refsect1><title>DESCRIPTION</title>
35 The pazpar2 configuration file, together with any referenced XSLT files,
36 governs pazpar2's behavior as a client and controls the normalization and
37 extraction of data elements from incoming result records, for the
38 purposes of merging, sorting, facet analysis, and display.
42 The file is specified using the option -f on the pazpar2 command line.
43 There is not presently a way to reload the configuration file without
44 restarting pazpar2, although this will most likely be added some time
49 <refsect1><title>FORMAT</title>
51 The configuration file is XML-structured. It must be valid XML. All
52 elements specific to pazpar2 should belong to the namespace
53 "http://www.indexdata.com/pazpar2/1.0" (this is assumed in the
54 following examples). The root element is named 'pazpar2'. Under the
55 root element are a number of elements which group categories of
56 information. The categories are described below.
59 <refsect2 id="config-server"><title>server</title>
61 This section governs overall behavior of the client. The data
62 elements are described below.
64 <variablelist> <!-- level 1 -->
69 Configures the webservice -- this controls how you can connect
70 to pazpar2 from your browser or server-side code. The
71 attributes 'host' and 'port' control the binding of the
72 server. The 'host' attribute can be used to bind the server to
73 a secondary IP address of your system, enabling you to run
74 pazpar2 on port 80 alongside a conventional web server. You
75 can override this setting on the command line using the option -h.
84 If this item is given, pazpar2 will forward all incoming HTTP
85 requests that do not contain the filename 'search.pz2' to the
86 host and port specified using the 'host' and 'port'
87 attributes. The 'myurl' attribute is required, and should provide
88 the base URL of the server, which is generally the HTTP URL for the
89 host specified in the 'listen' parameter. This functionality is
90 crucial if you wish to use
91 pazpar2 in conjunction with browser-based code (JS, Flash,
92 applets, etc.) which operates in a security sandbox. Such code
93 can only connect to the same server from which the enclosing
94 HTML page originated. Pazpar2's proxy functionality enables you
95 to host all of the main pages (plus images, CSS, etc.) of your
96 application on a conventional webserver, while efficiently
97 processing webservice requests for metasearch status, results,
107 If this item is given, pazpar2 will send all Z39.50
108 packages through this Z39.50 proxy server.
109 At least one of the 'host' and 'port' attributes is required.
110 The 'host' attribute may contain both host name and port
111 number, separated by a colon ':', or only the host name.
112 An empty 'host' attribute sets the Z39.50 host address
122 This nested element controls the behavior of pazpar2 with
123 respect to your data model. In pazpar2, incoming records are
124 normalized, using XSLT, into an internal representation.
125 The 'service' section controls the further processing and
126 extraction of data from the internal representation, primarily
127 through the 'metadata' sub-element.
130 <variablelist> <!-- Level 2 -->
131 <varlistentry><term>metadata</term>
134 One of these elements is required for every data element in
135 the internal representation of the record (see
136 <xref linkend="data_model"/>). It governs
137 subsequent processing as pertains to sorting, relevance
138 ranking, merging, and display of data elements. It supports
139 the following attributes:
142 <variablelist> <!-- level 3 -->
143 <varlistentry><term>name</term>
146 This is the name of the data element. It is matched
147 against the 'type' attribute of the 'metadata' element
148 in the normalized record. A warning is produced if
149 metadata elements with an unknown name are found in the
150 normalized record. This name is also used to represent
151 data elements in the records returned by the
152 webservice API, and to name sort lists and browse
158 <varlistentry><term>type</term>
161 The type of data element. This value governs any
162 normalization or special processing that might take
163 place on an element. Possible values are 'generic'
164 (basic string) and 'year' (a range is computed if
165 multiple years are found in the record). Note: This
166 list is likely to grow in the future.
171 <varlistentry><term>brief</term>
174 If this is set to 'yes', then the data element is
175 included in brief records in the webservice API. Note
176 that this only makes sense for metadata elements that
177 are merged (see below). The default value is 'no'.
182 <varlistentry><term>sortkey</term>
185 Specifies that this data element is to be used for
186 sorting. The possible values are 'numeric' (numeric
187 value), 'skiparticle' (string; skip common, leading
188 articles), and 'no' (no sorting). The default value is
194 <varlistentry><term>rank</term>
197 Specifies that this element is to be used to help rank
198 records against the user's query (when ranking is
199 requested). The value is an integer, used as a
200 multiplier against the basic TF*IDF score. A value of
201 1 is the base, higher values give additional weight to
202 elements of this type. The default is '0', which
203 excludes this element from the rank calculation.
208 <varlistentry><term>termlist</term>
211 Specifies that this element is to be used as a
212 termlist, or browse facet. Values are tabulated from
213 incoming records, and a highscore of values (with
214 their associated frequency) is made available to the
215 client through the webservice API. The possible values
216 are 'yes' and 'no' (default).
221 <varlistentry><term>merge</term>
224 This governs whether, and how, elements are extracted
225 from individual records and merged into cluster
226 records. The possible values are: 'unique' (include
227 all unique elements), 'longest' (include only the
228 longest element, by strlen), 'range' (calculate a range
229 of values across all matching records), 'all' (include
230 all elements), or 'no' (don't merge; this is the
235 </variablelist> <!-- attributes to metadata -->
239 </variablelist> <!-- Data elements in service directive -->
242 </variablelist> <!-- Data elements in server directive -->
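<para>
As a rough illustration of how the 'name' attribute lines up with the
normalized record, the sketch below pairs one 'metadata' configuration
element with a fragment of a normalized record. The 'pz:record' wrapper
element and the sample title are assumptions made for illustration only;
the data model section remains the authoritative description of the
record format.
</para>
<screen><![CDATA[
<!-- In the 'service' section of the configuration file -->
<metadata name="title" brief="yes" sortkey="skiparticle" merge="longest" rank="6"/>

<!-- In a normalized record (assumed shape): 'type' matches 'name' above -->
<pz:record xmlns:pz="http://www.indexdata.com/pazpar2/1.0">
  <pz:metadata type="title">An example title</pz:metadata>
</pz:record>
]]></screen>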
247 <refsect1><title>EXAMPLE</title>
248 <para>Below is a working example configuration:
250 <?xml version="1.0" encoding="UTF-8"?>
251 <pazpar2 xmlns="http://www.indexdata.com/pazpar2/1.0">
254 <listen port="9004"/>
255 <proxy host="us1.indexdata.com" myurl="us1.indexdata.com"/>
257 <!-- <zproxy host="localhost" port="9000"/> -->
258 <!-- <zproxy host="localhost:9000"/> -->
259 <!-- <zproxy port="9000"/> -->
262 <metadata name="title" brief="yes" sortkey="skiparticle" merge="longest" rank="6"/>
263 <metadata name="isbn" merge="unique"/>
264 <metadata name="date" brief="yes" sortkey="numeric" type="year" merge="range"
266 <metadata name="author" brief="yes" termlist="yes" merge="longest" rank="2"/>
267 <metadata name="subject" merge="unique" termlist="yes" rank="3"/>
268 <metadata name="url" merge="unique"/>
277 <refsect1 id="target_settings"><title>TARGET SETTINGS</title>
279 Pazpar2 features a cunning scheme by which you can associate various
280 kinds of attributes, or settings, with search targets. This is done
281 through XML files; each file can associate one or more settings
282 with one or more targets. The file format is generic in nature,
283 designed to support a wide range of application requirements. The
284 settings can be purely technical, such as how to perform a title
285 search against a given target, or they can associate arbitrary name=value
286 pairs with groups of targets -- for instance, if you would like to
287 place all commercial full-text bases in one group for selection
288 purposes, or you would like to control what targets are accessible
293 During startup, pazpar2 will recursively read a specified directory
294 (can be identified in the pazpar2.cfg file or on the command line), and
295 process any settings files found therein.
299 Clients of the pazpar2 webservice interface can selectively override
300 settings for individual targets within the scope of one session. This
301 can be used in conjunction with an external authentication system to
302 determine which resources are to be accessible to which users. Pazpar2
303 itself has no notion of end-users, and so can be used in conjunction
304 with any type of authentication system. Similarly, the authentication
305 tokens submitted to access-controlled search targets can be
306 overridden, to allow use of pazpar2 in a consortial or multi-library
307 environment, where different end-users may need to be represented to
308 some search targets in different ways. This, again, can be managed
309 using an external database or other lookup mechanism.
312 <refsect2><title>SETTINGS FILE FORMAT</title>
314 Each file contains a root element named &lt;settings&gt;. It may
315 contain one or more &lt;set&gt; elements. The settings and set
316 elements may contain the following attributes. Attributes in the set node
317 override those in the settings root element. Each set node must
318 specify (directly, or inherited from the parent node) at least a
319 target, name, and value.
327 This specifies the search target to which this setting should be
328 applied. Targets are identified by their Z39.50 URL, generally
329 including the host, port, and database name (e.g.
330 bagel.indexdata.com:210/marc). Two wildcard forms are accepted:
331 * (asterisk) matches all known targets;
332 bagel.indexdata.com:210/* matches all known databases on the given
336 A precedence system determines what happens if there are
337 overlapping values for the same setting name for the same
338 target. A setting for a specific target name overrides a
339 setting which specifies the target using a wildcard. This makes it
340 easy to set defaults for all targets, and then override them
341 for specific targets or hosts. If there are
342 multiple overlapping settings with the same name and target
343 value, the 'precedence' attribute determines what happens.
351 The name of the setting. This can be anything you like.
352 However, pazpar2 reserves a number of setting names for
353 specific purposes, all starting with 'pz:', and it is a good
354 idea to avoid that prefix if you make up your own setting
355 names. See below for a list of reserved variables.
363 The value of the setting. Generally, this can be anything you
364 want -- however, some of the reserved settings may expect
365 specific kinds of values.
370 <term>precedence</term>
373 This should be an integer. If not provided, the default value
374 is 0. If two (or more) settings have the same content for
375 target and name, the precedence value determines the outcome.
376 If both settings have the same precedence value, they are both
377 applied to the target(s). If one has a higher value, then the
378 value of that setting is applied, and the other one is ignored.
385 By setting defaults for target, name, or value in the root
386 settings node, you can use the settings files in many different
387 ways. For instance, you can use a single file to set defaults for
388 many different settings, like search fields, retrieval syntaxes,
389 etc. You can have one file per server, which groups settings for
390 that server or target. You could also have one file which associates
391 a number of targets with a given setting, for instance, to associate
392 many databases with a given category or class that makes sense
393 within your application.
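<para>
The interaction between target matching and the 'precedence' attribute
can be sketched as follows. This is a hypothetical file, not one shipped
with pazpar2; the values are arbitrary, and the outcome noted in the
comments is what the rules described above would suggest.
</para>
<screen><![CDATA[
<settings name="pz:maxrecs">
  <!-- Default for every known target -->
  <set target="*" value="100"/>

  <!-- A specific target name overrides the wildcard default -->
  <set target="bagel.indexdata.com:210/marc" value="50"/>

  <!-- Same target and name as the previous line; the higher
       precedence value should win, so 20 would be applied -->
  <set target="bagel.indexdata.com:210/marc" value="20" precedence="1"/>
</settings>
]]></screen>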
397 The following examples illustrate uses of the settings system to
398 associate settings with targets to meet different requirements.
402 The example below associates a set of default values that can be
403 used across many targets. Note the wildcard for targets.
404 This associates the given settings with all targets for which no
405 other information is provided.
407 <settings target="*">
409 <!-- This file introduces default settings for pazpar2 -->
410 <!-- $Id: pazpar2_conf.xml,v 1.21 2007-04-20 14:05:23 quinn Exp $ -->
412 <!-- mapping for unqualified search -->
413 <set name="pz:cclmap:term" value="u=1016 t=l,r s=al"/>
415 <!-- field-specific mappings -->
416 <set name="pz:cclmap:ti" value="u=4 s=al"/>
417 <set name="pz:cclmap:su" value="u=21 s=al"/>
418 <set name="pz:cclmap:isbn" value="u=7"/>
419 <set name="pz:cclmap:issn" value="u=8"/>
420 <set name="pz:cclmap:date" value="u=30 r=r"/>
422 <!-- Retrieval settings -->
424 <set name="pz:requestsyntax" value="marc21"/>
425 <!-- <set name="pz:elements" value="F"/> NOT YET IMPLEMENTED -->
427 <!-- Result normalization settings -->
429 <set name="pz:nativesyntax" value="iso2709"/>
430 <set name="pz:xslt" value="../etc/marc21.xsl"/>
438 The next example shows certain settings overridden for one target,
439 one which returns XML records containing DublinCore elements, and
440 which furthermore requires a username/password.
442 <settings target="funkytarget.com:210/db1">
443 <set name="pz:requestsyntax" value="xml"/>
444 <set name="pz:nativesyntax" value="xml"/>
445 <set name="pz:xslt" value="../etc/dublincore.xsl"/>
447 <set name="pz:authentication" value="myuser/password"/>
453 The following example associates a specific name/value combination
454 with a number of targets. The targets below are access-restricted,
455 and can only be used by users with special credentials.
457 <settings name="pz:allow" value="0">
458 <set target="funkytarget.com:210/*"/>
459 <set target="commercial.com:2100/expensiveDb"/>
466 <refsect2><title>RESERVED SETTING NAMES</title>
468 The following setting names are reserved by pazpar2 to control the
469 behavior of the client function.
474 <term>pz:cclmap:xxx</term>
477 This establishes a CCL field definition or other setting, for
478 the purpose of mapping end-user queries. XXX is the field or
479 setting name, and the value of the setting provides parameters
480 (e.g. parameters to send to the server, etc.). Please consult
481 the YAZ manual for a full overview of the many capabilities of
482 the powerful and flexible CCL parser.
485 Note that it is easy to establish a set of default parameters,
486 and then override them individually for a given target.
491 <term>pz:requestsyntax</term>
494 This specifies the record syntax to use when requesting
495 records from a given server. The value can be a symbolic name like
496 marc21 or xml, or it can be a Z39.50-style dot-separated OID.
501 <term>pz:elements</term>
504 The element set name to be used when retrieving records from a
510 <term>pz:piggyback</term>
513 Piggybacking enables pazpar2 to retrieve records from the
514 server as part of the search response in Z39.50. Almost all
515 servers support this (or fail it gracefully), but a few
516 servers will produce undesirable results.
517 Set to '1' to enable piggybacking, '0' to disable it. Default
518 is 1 (piggybacking enabled).
523 <term>pz:nativesyntax</term>
526 The representation (syntax) of the retrieval records. Currently
527 recognized values are iso2709 and xml.
530 For iso2709, you can also specify a native character set, e.g. "iso2709;latin-1".
531 If no character set is provided, MARC-8 is assumed.
539 Provides the path of an XSLT stylesheet which will be used to
540 map incoming records to the internal representation.
545 <term>pz:authentication</term>
548 Sets an authentication string for a given server. See the section on
549 authorization and authentication for discussion.
554 <term>pz:allow</term>
557 Allows or denies access to the resources it is applied to. Possible
558 values are '0' and '1'. The default is '1' (allow access to this resource).
559 See the manual section on authorization and authentication for discussion
560 about how to use this setting.
565 <term>pz:maxrecs</term>
568 Controls the maximum number of records to be retrieved from a
569 server. The default is 100.
577 This setting can't be 'set' -- it contains the ID (normally
578 ZURL) for a given target, and is useful for filtering --
579 specifically when you want to select one or more specific
580 targets in the search command.
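<para>
To show several of the reserved settings working together, here is a
minimal sketch of a settings file for a single target that returns
Latin-1 encoded MARC records and does not handle piggybacking well. The
target address 'legacy.example.org:210/catalog' is hypothetical, and the
values are chosen only for illustration.
</para>
<screen><![CDATA[
<settings target="legacy.example.org:210/catalog">
  <!-- Ask the server for MARC21 records -->
  <set name="pz:requestsyntax" value="marc21"/>

  <!-- Records arrive as ISO2709 in Latin-1 rather than the assumed MARC-8 -->
  <set name="pz:nativesyntax" value="iso2709;latin-1"/>
  <set name="pz:xslt" value="../etc/marc21.xsl"/>

  <!-- Disable piggybacking and fetch fewer records than the default of 100 -->
  <set name="pz:piggyback" value="0"/>
  <set name="pz:maxrecs" value="50"/>
</settings>
]]></screen>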
589 <!-- Keep this comment at the end of the file
594 sgml-minimize-attributes:nil
595 sgml-always-quote-attributes:t
598 sgml-parent-document:nil
599 sgml-local-catalogs: nil
600 sgml-namecase-general:t