1 <?xml version="1.0" standalone="no"?>
2 <!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook V4.1//EN"
3 "http://www.oasis-open.org/docbook/xml/4.1/docbookx.dtd"
5 <!ENTITY % local SYSTEM "local.ent">
7 <!ENTITY % entities SYSTEM "entities.ent">
9 <!ENTITY % idcommon SYSTEM "common/common.ent">
12 <refentry id="pazpar2_conf">
14 <productname>Pazpar2</productname>
15 <productnumber>&version;</productnumber>
18 <refentrytitle>Pazpar2 conf</refentrytitle>
19 <manvolnum>5</manvolnum>
23 <refname>pazpar2_conf</refname>
24 <refpurpose>Pazpar2 Configuration</refpurpose>
29 <command>pazpar2.conf</command>
33 <refsect1><title>DESCRIPTION</title>
The Pazpar2 configuration file, together with any referenced XSLT files,
governs Pazpar2's behavior as a client, and controls the normalization and
extraction of data elements from incoming result records, for the
purposes of merging, sorting, facet analysis, and display.
The file is specified using the option -f on the Pazpar2 command line.
There is presently no way to reload the configuration file without
restarting Pazpar2, although this will most likely be added some time
in the future.
49 <refsect1><title>FORMAT</title>
The configuration file is an XML document and must be well-formed. All
elements specific to Pazpar2 should belong to the namespace
<literal>http://www.indexdata.com/pazpar2/1.0</literal>
(this is assumed in the
following examples). The root element is named <literal>pazpar2</literal>.
Under the root element are a number of elements which group categories of
information. The categories are described below.
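<para>
As a minimal sketch of the layout described above, the skeleton below
shows the root element in the Pazpar2 namespace with a single
<literal>server</literal> section. The comment is only a placeholder for
the directives described in the following sections.
</para>
<screen><![CDATA[
<?xml version="1.0" encoding="UTF-8"?>
<pazpar2 xmlns="http://www.indexdata.com/pazpar2/1.0">
  <server>
    <!-- server-level directives (described below) go here -->
  </server>
</pazpar2>
]]></screen>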
60 <refsect2 id="config-server"><title>server</title>
62 This section governs overall behavior of the client. The data
63 elements are described below.
65 <variablelist> <!-- level 1 -->
<term>listen</term>
Configures the webservice -- this controls how you can connect
to Pazpar2 from your browser or server-side code. The
attributes 'host' and 'port' control the binding of the
server. The 'host' attribute can be used to bind the server to
a secondary IP address of your system, enabling you to run
Pazpar2 on port 80 alongside a conventional web server. You
can override this setting on the command line using the option -h.
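<para>
For illustration only, a 'listen' element that binds the webservice to a
secondary address on port 80 might look like the sketch below. The IP
address is a hypothetical value from the documentation address range;
substitute an address that is actually configured on your system.
</para>
<screen><![CDATA[
<!-- bind Pazpar2 to one specific (hypothetical) address, port 80 -->
<listen host="192.0.2.10" port="80"/>
]]></screen>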
<term>proxy</term>
If this item is given, Pazpar2 will forward all incoming HTTP
requests that do not contain the filename 'search.pz2' to the
host and port specified using the 'host' and 'port'
attributes. The 'myurl' attribute is required, and should provide
the base URL of the server. Generally, this is the HTTP URL of the host
specified in the 'listen' parameter. This functionality is
crucial if you wish to use
Pazpar2 in conjunction with browser-based code (JS, Flash,
applets, etc.) which operates in a security sandbox. Such code
can only connect to the same server from which the enclosing
HTML page originated. Pazpar2's proxy functionality enables you
to host all of the main pages (plus images, CSS, etc.) of your
application on a conventional webserver, while efficiently
processing webservice requests for metasearch status, results,
etc.
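<para>
A proxy setup along these lines could be sketched as follows. Both host
names are hypothetical: <literal>www.example.org</literal> stands for the
conventional webserver that receives the forwarded requests, and
<literal>search.example.org</literal> for the host on which Pazpar2
itself listens. The bare host name form of 'myurl' mirrors the one used
in the EXAMPLE section.
</para>
<screen><![CDATA[
<listen port="9004"/>
<!-- requests without 'search.pz2' are forwarded to www.example.org:80 -->
<proxy host="www.example.org" port="80" myurl="search.example.org"/>
]]></screen>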
105 <term>relevance</term>
108 Specifies ICU tokenization and transformation rules
109 for tokens that are used in Pazpar2's relevance ranking. The 'id'
110 attribute is currently not used, and the 'locale'
111 attribute must be set to one of the locale strings
112 defined in ICU. The child elements listed below can be
113 in any order, except the 'index' element which logically
114 belongs to the end of the list. The stated tokenization,
115 transformation and charmapping instructions are performed
116 in order from top to bottom.
118 <variablelist> <!-- Level 2 -->
119 <varlistentry><term>casemap</term>
The attribute 'rule' defines the direction of the
per-character casemapping. Allowed values are "l"
(lower), "u" (upper), and "t" (title).
128 <varlistentry><term>transform</term>
131 Normalization and transformation of tokens follows
132 the rules defined in the 'rule' attribute. For
133 possible values we refer to the extensive ICU
134 documentation found at the
135 <ulink url="&url.icu.transform;">ICU
136 transformation</ulink> home page. Set filtering
137 principles are explained at the
138 <ulink url="&url.icu.unicode.set;">ICU set and
139 filtering</ulink> page.
143 <varlistentry><term>tokenize</term>
Tokenization is the only rule in the ICU chain
which splits one token into multiple tokens. The
'rule' attribute may have the following values:
"s" (sentence), "l" (line-break), "w" (word), and
"c" (character), the latter probably not being
very useful in a running Pazpar2 installation.
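<para>
The rules above might be combined as in the following sketch. It assumes
that the chain is wrapped in an <literal>icu_chain</literal> element, as
in the EXAMPLE section, and that the 'index' element is written as an
empty element; the 'id' and 'locale' values are illustrative only.
</para>
<screen><![CDATA[
<relevance>
  <icu_chain id="en:word" locale="en">
    <!-- split the input into word tokens -->
    <tokenize rule="w"/>
    <!-- lowercase every token -->
    <casemap rule="l"/>
    <!-- 'index' logically belongs at the end of the list -->
    <index/>
  </icu_chain>
</relevance>
]]></screen>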
163 Specifies ICU tokenization and transformation rules
for tokens that are used in Pazpar2's sorting. The content
is similar to that of <literal>relevance</literal>.
171 <term>mergekey</term>
174 Specifies ICU tokenization and transformation rules
for tokens that are used in Pazpar2's mergekey. The content
is similar to that of <literal>relevance</literal>.
<term>service</term>
This nested element controls the behavior of Pazpar2 with
respect to your data model. In Pazpar2, incoming records are
normalized, using XSLT, into an internal representation.
The 'service' section controls the further processing and
extraction of data from the internal representation, primarily
through the 'metadata' sub-element.
193 <variablelist> <!-- Level 2 -->
194 <varlistentry><term>metadata</term>
One of these elements is required for every data element in
the internal representation of the record (see
<xref linkend="data_model"/>). It governs
subsequent processing as pertains to sorting, relevance
ranking, merging, and display of data elements. It supports
the following attributes:
205 <variablelist> <!-- level 3 -->
206 <varlistentry><term>name</term>
This is the name of the data element. It is matched
against the 'type' attribute of the 'metadata' element
in the normalized record. A warning is produced if
metadata elements with an unknown name are found in the
normalized record. This name is also used to represent
data elements in the records returned by the
webservice API, and to name sort lists and browse
facets.
224 <varlistentry><term>type</term>
The type of data element. This value governs any
normalization or special processing that might take
place on an element. Possible values are 'generic'
(basic string) and 'year' (a range is computed if
multiple years are found in the record). Note: This
list is likely to grow in the future.
237 <varlistentry><term>brief</term>
If this is set to 'yes', then the data element is
included in brief records in the webservice API. Note
that this only makes sense for metadata elements that
are merged (see below). The default value is 'no'.
248 <varlistentry><term>sortkey</term>
Specifies that this data element is to be used for
sorting. The possible values are 'numeric' (numeric
value), 'skiparticle' (string; skip common, leading
articles), and 'no' (no sorting). The default value is
'no'.
260 <varlistentry><term>rank</term>
Specifies that this element is to be used to help rank
records against the user's query (when ranking is
requested). The value is an integer, used as a
multiplier against the basic TF*IDF score. A value of
1 is the base; higher values give additional weight to
elements of this type. The default is '0', which
excludes this element from the rank calculation.
276 <varlistentry><term>termlist</term>
Specifies that this element is to be used as a
termlist, or browse facet. Values are tabulated from
incoming records, and a list of the most frequent values (with
their associated frequency) is made available to the
client through the webservice API. The possible values
are 'yes' and 'no' (default).
290 <varlistentry><term>merge</term>
This governs whether, and how, elements are extracted
from individual records and merged into cluster
records. The possible values are: 'unique' (include
all unique elements), 'longest' (include only the
longest element, by string length), 'range' (calculate a range
of values across all matching records), 'all' (include
all elements), or 'no' (don't merge; this is the
default).
305 <varlistentry><term>setting</term>
308 This attribute allows you to make use of static database
309 settings in the processing of records. Three possible values
310 are allowed. 'no' is the default and doesn't do anything.
311 'postproc' copies the value of a setting with the same name
312 into the output of the normalization stylesheet(s). 'parameter'
313 makes the value of a setting with the same name available
314 as a parameter to the normalization stylesheet, so you
315 can further process the value inside of the stylesheet, or use
316 the value to decide how to deal with other data values.
The purpose of using settings in this way can either be to
control the behavior of the normalization stylesheet in a
database-dependent way, or to easily make database-dependent values
available to display logic in your user interface, without having
to implement complicated interactions between the user interface
and your configuration system.
328 </variablelist> <!-- attributes to metadata -->
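<para>
As a sketch of how the 'setting' attribute ties a metadata element to a
database setting, the fragment below declares a metadata element and a
setting of the same name. The name <literal>medium</literal> and its
value are hypothetical and chosen only for illustration. The settings
file format is described under TARGET SETTINGS.
</para>
<screen><![CDATA[
<!-- in the service section of the Pazpar2 configuration -->
<metadata name="medium" brief="yes" merge="longest" setting="postproc"/>

<!-- in a settings file: the value that 'postproc' copies into the
     output of the normalization stylesheet -->
<settings target="*">
  <set name="medium" value="book"/>
</settings>
]]></screen>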
332 </variablelist> <!-- Data elements in service directive -->
335 </variablelist> <!-- Data elements in server directive -->
340 <refsect1><title>EXAMPLE</title>
341 <para>Below is a working example configuration:
<?xml version="1.0" encoding="UTF-8"?>
<pazpar2 xmlns="http://www.indexdata.com/pazpar2/1.0">
  <server>
    <listen port="9004"/>
    <proxy host="us1.indexdata.com" myurl="us1.indexdata.com"/>

    <!-- optional ICU ranking configuration example -->
    <relevance>
      <icu_chain id="el:word" locale="el">
        <normalize rule="[:Control:] Any-Remove"/>
        <normalize rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
      </icu_chain>
    </relevance>

    <service>
      <metadata name="title" brief="yes" sortkey="skiparticle" merge="longest" rank="6"/>
      <metadata name="isbn" merge="unique"/>
      <metadata name="date" brief="yes" sortkey="numeric" type="year" merge="range"/>
      <metadata name="author" brief="yes" termlist="yes" merge="longest" rank="2"/>
      <metadata name="subject" merge="unique" termlist="yes" rank="3"/>
      <metadata name="url" merge="unique"/>
    </service>
  </server>
</pazpar2>
377 <refsect1 id="target_settings"><title>TARGET SETTINGS</title>
Pazpar2 features a cunning scheme by which you can associate various
kinds of attributes, or settings, with search targets. This can be done
through XML files which are read at startup; each file can associate
one or more settings with one or more targets. The file format is generic
in nature, designed to support a wide range of application requirements. The
settings can be purely technical things, like how to perform a title
search against a given target, or they can associate arbitrary name=value
pairs with groups of targets -- for instance, if you would like to
place all commercial full-text databases in one group for selection
purposes, or you would like to control what targets are accessible
to users by default. Per-database settings values can even be used
to drive sorting, facet/termlist generation, or end-user interface display
logic.
During startup, Pazpar2 will recursively read a specified directory
(which can be identified in the pazpar2.cfg file or on the command line), and
process any settings files found therein.
401 Clients of the Pazpar2 webservice interface can selectively override
402 settings for individual targets within the scope of one session. This
403 can be used in conjunction with an external authentication system to
404 determine which resources are to be accessible to which users. Pazpar2
405 itself has no notion of end-users, and so can be used in conjunction
with any type of authentication system. Similarly, the authentication
tokens submitted to access-controlled search targets can be
overridden, to allow use of Pazpar2 in a consortial or multi-library
environment, where different end-users may need to be represented to
some search targets in different ways. This, again, can be managed
using an external database or other lookup mechanism. Setting overrides
can be performed using either the 'init' or the 'settings' webservice
command.
417 In fact, every setting that applies to a database (except pz:id, which
418 can only be used for filtering targets to use for a search) can be overridden
419 on a per-session basis. This allows the client to override specific CCL fields
420 for searching, etc., to meet the needs of a session or user.
424 Finally, as an extreme case of this, the webservice client can
425 introduce entirely new targets, on the fly, as part of the init or
426 settings command. This is useful if you desire to manage information
427 about your search targets in a separate application such as a database.
428 You do not need any static settings file whatsoever to run Pazpar2 -- as
429 long as the webservice client is prepared to supply the necessary
430 information at the beginning of every session.
The following discussion of practical issues related to session and settings
management is cast in terms of a user interface based on Ajax/Javascript
technology. It would apply equally well to many other kinds of browser-based logic.
Typically, a Javascript client is not allowed to directly alter the parameters
of a session. There are two reasons for this. One has to do with access
to information: typically, information about a user will be stored in a
system on the server side, or it will be accessible in some way from the server.
The other has to do with trust: since the Javascript client cannot be entirely
trusted (some hostile agent might in fact 'pretend' to be a regular webservice
client), it is more robust to control session settings from scripting that you
run as part of your webserver. Typically, this can be handled during the session
initialization, as follows:
Step 1: The Javascript client loads, and asks the webserver for a new Pazpar2
session ID. This can be done using a Javascript call, for instance. Note that
it is possible to submit Ajax XMLHttpRequest calls either to Pazpar2 or to the
webserver that Pazpar2 is proxying for. See the pazpar2_protocol manual page.
Step 2: Code on the webserver authenticates the user, by database lookup,
LDAP access, NCIP, etc. It then determines which resources the user has access to,
and any user-specific parameters that are to be applied during this session.
Step 3: The webserver initializes a new Pazpar2 session, and sets user-specific
parameters as necessary, using the init webservice command. A new session ID is
returned.
473 Step 4: The webserver returns this session ID to the Javascript client, which then
474 uses the session ID to submit searches, show results, etc.
478 Step 5: When the Javascript client ceases to use the session, Pazpar2 destroys
479 any session-specific information.
482 <refsect2><title>SETTINGS FILE FORMAT</title>
Each file contains a root element named <literal>settings</literal>. It may
contain one or more <literal>set</literal> elements. The settings and set
elements may contain the following attributes. Attributes in the set node
override those in the settings root element. Each set node must
specify (directly, or inherited from the parent node) at least a
target, name, and value.
<term>target</term>
This specifies the search target to which this setting should be
applied. Targets are identified by their Z39.50 URL, generally
including the host, port, and database name (e.g.
<literal>bagel.indexdata.com:210/marc</literal>).
Two wildcard forms are accepted:
<literal>*</literal> (asterisk) matches all known targets;
<literal>bagel.indexdata.com:210/*</literal> matches all
known databases on the given host.
507 A precedence system determines what happens if there are
508 overlapping values for the same setting name for the same
509 target. A setting for a specific target name overrides a
510 setting which specifies target using a wildcard. This makes it
511 easy to set defaults for all targets, and then override them
512 for specific targets or hosts. If there are
513 multiple overlapping settings with the same name and target
514 value, the 'precedence' attribute determines what happens.
<term>name</term>
The name of the setting. This can be anything you like.
However, Pazpar2 reserves a number of setting names for
specific purposes, all starting with 'pz:', and it is a good
idea to avoid that prefix if you make up your own setting
names. See below for a list of reserved setting names.
<term>value</term>
The value of the setting. Generally, this can be anything you
want -- however, some of the reserved settings may expect
specific kinds of values.
541 <term>precedence</term>
544 This should be an integer. If not provided, the default value
545 is 0. If two (or more) settings have the same content for
546 target and name, the precedence value determines the outcome.
547 If both settings have the same precedence value, they are both
548 applied to the target(s). If one has a higher value, then the
549 value of that setting is applied, and the other one is ignored.
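<para>
The interplay of wildcards and precedence described above can be
sketched as follows. The values are illustrative only; the outcome noted
in each comment is what the rules above imply.
</para>
<screen><![CDATA[
<settings name="pz:maxrecs">
  <!-- default for every known target -->
  <set target="*" value="100"/>
  <!-- all known databases on one host -->
  <set target="bagel.indexdata.com:210/*" value="50"/>
  <!-- one specific database; overrides the wildcard settings -->
  <set target="bagel.indexdata.com:210/marc" value="20"/>
  <!-- same target and name again; the higher precedence is applied -->
  <set target="bagel.indexdata.com:210/marc" value="10" precedence="1"/>
</settings>
]]></screen>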
556 By setting defaults for target, name, or value in the root
557 settings node, you can use the settings files in many different
558 ways. For instance, you can use a single file to set defaults for
559 many different settings, like search fields, retrieval syntaxes,
560 etc. You can have one file per server, which groups settings for
561 that server or target. You could also have one file which associates
562 a number of targets with a given setting, for instance, to associate
563 many databases with a given category or class that makes sense
564 within your application.
568 The following examples illustrate uses of the settings system to
569 associate settings with targets to meet different requirements.
573 The example below associates a set of default values that can be
574 used across many targets. Note the wildcard for targets.
575 This associates the given settings with all targets for which no
576 other information is provided.
<settings target="*">

  <!-- This file introduces default settings for pazpar2 -->

  <!-- mapping for unqualified search -->
  <set name="pz:cclmap:term" value="u=1016 t=l,r s=al"/>

  <!-- field-specific mappings -->
  <set name="pz:cclmap:ti" value="u=4 s=al"/>
  <set name="pz:cclmap:su" value="u=21 s=al"/>
  <set name="pz:cclmap:isbn" value="u=7"/>
  <set name="pz:cclmap:issn" value="u=8"/>
  <set name="pz:cclmap:date" value="u=30 r=r"/>

  <!-- Retrieval settings -->
  <set name="pz:requestsyntax" value="marc21"/>
  <set name="pz:elements" value="F"/>

  <!-- Query encoding -->
  <set name="pz:queryencoding" value="iso-8859-1"/>

  <!-- Result normalization settings -->
  <set name="pz:nativesyntax" value="iso2709"/>
  <set name="pz:xslt" value="../etc/marc21.xsl"/>

</settings>
611 The next example shows certain settings overridden for one target,
612 one which returns XML records containing DublinCore elements, and
613 which furthermore requires a username/password.
<settings target="funkytarget.com:210/db1">
  <set name="pz:requestsyntax" value="xml"/>
  <set name="pz:nativesyntax" value="xml"/>
  <set name="pz:xslt" value="../etc/dublincore.xsl"/>

  <set name="pz:authentication" value="myuser/password"/>
</settings>
626 The following example associates a specific name/value combination
627 with a number of targets. The targets below are access-restricted,
628 and can only be used by users with special credentials.
<settings name="pz:allow" value="0">
  <set target="funkytarget.com:210/*"/>
  <set target="commercial.com:2100/expensiveDb"/>
</settings>
639 <refsect2><title>RESERVED SETTING NAMES</title>
641 The following setting names are reserved by Pazpar2 to control the
642 behavior of the client function.
647 <term>pz:cclmap:xxx</term>
650 This establishes a CCL field definition or other setting, for
651 the purpose of mapping end-user queries. XXX is the field or
652 setting name, and the value of the setting provides parameters
653 (e.g. parameters to send to the server, etc.). Please consult
654 the YAZ manual for a full overview of the many capabilities of
655 the powerful and flexible CCL parser.
658 Note that it is easy to establish a set of default parameters,
659 and then override them individually for a given target.
664 <term>pz:requestsyntax</term>
667 This specifies the record syntax to use when requesting
668 records from a given server. The value can be a symbolic name like
669 marc21 or xml, or it can be a Z39.50-style dot-separated OID.
674 <term>pz:elements</term>
The element set name to be used when retrieving records from a
server.
683 <term>pz:piggyback</term>
Piggybacking enables Pazpar2 to retrieve records from the
server as part of the search response in Z39.50. Almost all
servers support this (or fail it gracefully), but a few
servers will produce undesirable results.
Set to '1' to enable piggybacking, '0' to disable it. Default
is 1 (piggybacking enabled).
696 <term>pz:nativesyntax</term>
699 The representation (syntax) of the retrieval records. Currently
700 recognized values are iso2709 and xml.
For iso2709, you can also specify a native character set, e.g. "iso2709;latin-1".
If no character set is provided, MARC-8 is assumed.

If pz:nativesyntax is not specified, Pazpar2 will attempt to determine
the value based on the response from the server.
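<para>
For instance, a target that returns ISO 2709 records encoded in Latin-1
rather than MARC-8 might be configured along these lines. The target
address is hypothetical.
</para>
<screen><![CDATA[
<settings target="legacy.example.org:210/catalog">
  <set name="pz:requestsyntax" value="marc21"/>
  <!-- ISO 2709 records in Latin-1 instead of the assumed MARC-8 -->
  <set name="pz:nativesyntax" value="iso2709;latin-1"/>
</settings>
]]></screen>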
714 <term>pz:queryencoding</term>
The encoding of the search terms that a target accepts. Most
targets do not honor UTF-8, in which case this needs to be specified.
Each term in a query will be converted if this setting is given.
<term>pz:xslt</term>
Provides the path of an XSLT stylesheet which will be used to
map incoming records to the internal representation.
734 <term>pz:authentication</term>
737 Sets an authentication string for a given server. See the section on
738 authorization and authentication for discussion.
743 <term>pz:allow</term>
746 Allows or denies access to the resources it is applied to. Possible
747 values are '0' and '1'. The default is '1' (allow access to this resource).
748 See the manual section on authorization and authentication for discussion
749 about how to use this setting.
754 <term>pz:maxrecs</term>
757 Controls the maximum number of records to be retrieved from a
758 server. The default is 100.
<term>pz:id</term>
This setting can't be 'set' -- it contains the ID (normally
ZURL) for a given target, and is useful for filtering --
specifically when you want to select one or more specific
targets in the search command.
774 <term>pz:zproxy</term>
The 'pz:zproxy' setting has the value syntax
'host.internet.address:port'. It is used to tunnel Z39.50
requests through the named Z39.50 proxy.
785 <term>pz:apdulog</term>
If the 'pz:apdulog' setting is defined and has a value other than 0,
then Z39.50 APDUs are written to the log.
This setting enables SRU/SRW support. It has three possible values:
'get' enables SRU access through GET requests; 'post' enables SRU/POST
support, which is less commonly supported but useful if very large requests are
to be submitted; 'srw' enables the SRW variation of the protocol.
807 <term>pz:sru_version</term>
This allows the SRU version to be specified. If unset, Pazpar2
will use the default of YAZ (currently 1.2). Should be set
818 <term>pz:pqf_prefix</term>
Allows you to specify an arbitrary PQF query language substring. The provided
string is prefixed to the user's query after it has been normalized to PQF
internally in Pazpar2. This allows you to attach complex 'filters' to
queries for a given target, sometimes necessary to select sub-catalogs
in union catalog systems, etc.
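<para>
A few of the reserved settings above might be combined for a single
target as in the sketch below. All host names and the publisher term are
hypothetical. The PQF prefix assumes that the target supports Bib-1 use
attribute 1018 (publisher): prefixed to the user's query, it yields a
boolean AND of the publisher filter and that query.
</para>
<screen><![CDATA[
<settings target="union.example.org:210/catalog">
  <!-- tunnel Z39.50 requests through a (hypothetical) proxy -->
  <set name="pz:zproxy" value="zproxy.example.org:9000"/>
  <!-- write Z39.50 APDUs to the log -->
  <set name="pz:apdulog" value="1"/>
  <!-- restrict every query to one sub-catalog by publisher -->
  <set name="pz:pqf_prefix" value="@and @attr 1=1018 examplepress"/>
</settings>
]]></screen>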
833 <refsect1><title>SEE ALSO</title>
836 <refentrytitle>pazpar2</refentrytitle>
837 <manvolnum>8</manvolnum>
840 <refentrytitle>yaz-icu</refentrytitle>
841 <manvolnum>1</manvolnum>
844 <refentrytitle>pazpar2_protocol</refentrytitle>
845 <manvolnum>7</manvolnum>
850 <!-- Keep this comment at the end of the file
855 sgml-minimize-attributes:nil
856 sgml-always-quote-attributes:t
859 sgml-parent-document:nil
860 sgml-local-catalogs: nil
861 sgml-namecase-general:t