1 <?xml version="1.0" standalone="no"?>
2 <!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook V4.4//EN"
3 "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"
5 <!ENTITY % local SYSTEM "local.ent">
7 <!ENTITY % entities SYSTEM "entities.ent">
9 <!ENTITY % idcommon SYSTEM "common/common.ent">
12 <refentry id="pazpar2_conf">
14 <productname>Pazpar2</productname>
15 <productnumber>&version;</productnumber>
16 <orgname>Index Data</orgname>
20 <refentrytitle>Pazpar2 conf</refentrytitle>
21 <manvolnum>5</manvolnum>
22 <refmiscinfo class="manual">File formats and conventions</refmiscinfo>
26 <refname>pazpar2_conf</refname>
27 <refpurpose>Pazpar2 Configuration</refpurpose>
32 <command>pazpar2.conf</command>
37 <title>DESCRIPTION</title>
39 The Pazpar2 configuration file, together with any referenced XSLT files,
40 governs Pazpar2's behavior as a client, and controls the normalization and
41 extraction of data elements from incoming result records, for the
42 purposes of merging, sorting, facet analysis, and display.
46 The file is specified using the option -f on the Pazpar2 command line.
47 There is not presently a way to reload the configuration file without
48 restarting Pazpar2, although this will most likely be added some time
56 The configuration file is XML-structured. It must be well-formed XML. All
57 elements specific to Pazpar2 should belong to the namespace
58 <literal>http://www.indexdata.com/pazpar2/1.0</literal>
59 (this is assumed in the
60 following examples). The root element is named "<literal>pazpar2</literal>".
61 Under the root element are a number of elements which group categories of
62 information. The categories are described below.
65 <refsect2 id="config-threads">
66 <title>threads</title>
68 This section is optional and is supported for Pazpar2 version 1.3.1 and
69 later. It is identified by the element "<literal>threads</literal>", which
70 may include one attribute "<literal>number</literal>" which specifies
71 the number of worker-threads that the Pazpar2 instance is to use.
72 A value of 0 (zero) disables worker-threads (all work is carried out
76 <refsect2 id="config-sockets">
77 <title>sockets</title>
79 This section is optional and is supported for Pazpar2 version 1.13.0 and
80 later. It is identified by the element "<literal>sockets</literal>", which
81 may include one attribute "<literal>max</literal>" which specifies
82 the maximum number of sockets to be used by Pazpar2.
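As a minimal sketch, the two optional sections described above might appear together near the top of the configuration file. The values 10 and 1024 are illustrative assumptions, not recommended defaults.

```xml
<pazpar2 xmlns="http://www.indexdata.com/pazpar2/1.0">
  <!-- Hypothetical values: 10 worker threads, at most 1024 sockets -->
  <threads number="10"/>
  <sockets max="1024"/>
</pazpar2>
```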
85 <refsect2 id="config-file">
88 This configuration takes one attribute <literal>path</literal> which
89 specifies a path to search for local files, such as XSLTs and settings.
90 The path is a colon-separated list of directories. Its default value
91 is "<literal>.</literal>" which is equivalent to the location of the
92 main configuration file (where indeed the file element is given).
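For illustration, a file element with a two-directory search path could look like the sketch below. The second directory, /usr/local/share/pazpar2/xsl, is a hypothetical location chosen for the example.

```xml
<!-- Search path with two directories: the configuration file's own
     directory (".") and a shared directory (hypothetical path) -->
<file path=".:/usr/local/share/pazpar2/xsl"/>
```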
95 <refsect2 id="config-server">
98 This section governs overall behavior of a server endpoint. It is identified
99 by the element "server" which takes an optional attribute, "id", which
100 identifies this particular Pazpar2 server. Any string value for "id"
105 elements are described below. From Pazpar2 version 1.2 this is
106 a repeatable element.
108 <variablelist> <!-- level 1 -->
113 Configures the webservice -- this controls how you can connect
114 to Pazpar2 from your browser or server-side code. The
115 attributes 'host' and 'port' control the binding of the
116 server. The 'host' attribute can be used to bind the server to
117 a secondary IP address of your system, enabling you to run
118 Pazpar2 on port 80 alongside a conventional web server. You
119 can override this setting on the command line using the option -h.
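A sketch of the binding described above, assuming the element is named listen as in the EXAMPLE section of this page. The address 192.0.2.10 is a placeholder for a secondary IP address on your system.

```xml
<server>
  <!-- Bind the webservice to one specific address and port.
       192.0.2.10 is a placeholder, not a real host. -->
  <listen host="192.0.2.10" port="80"/>
</server>
```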
128 If this item is given, Pazpar2 will forward all incoming HTTP
129 requests that do not contain the filename 'search.pz2' to the
130 host and port specified using the 'host' and 'port'
131 attributes. The 'myurl' attribute is required, and should provide
132 the base URL of the server. Generally, this is the HTTP URL for the host
133 specified in the 'listen' parameter. This functionality is
134 crucial if you wish to use
135 Pazpar2 in conjunction with browser-based code (JS, Flash,
136 applets, etc.) which operates in a security sandbox. Such code
137 can only connect to the same server from which the enclosing
138 HTML page originated. Pazpar2's proxy functionality enables you
139 to host all of the main pages (plus images, CSS, etc.) of your
140 application on a conventional webserver, while efficiently
141 processing webservice requests for metasearch status, results,
148 <term>icu_chain</term>
151 Specifies character set normalization for relevancy / sorting /
152 mergekey and facets for the server. These definitions serve as
153 defaults for services that don't have these given. For the meaning
154 of these settings refer to the
155 <xref linkend="icuchain"/> element inside service.
161 <term>relevance / sort / mergekey / facet</term>
164 Obsolete. Use element icu_chain instead.
170 <term>settings</term>
173 Specifies target settings for the server. These settings serve
174 as defaults for all services which don't have these given.
175 The settings element requires one attribute 'src' which specifies
176 a settings file or a directory. If a directory is given, all
177 files with suffix <filename>.xml</filename> are read from this
179 <xref linkend="target_settings"/> for more information.
185 <term id="service_conf">service</term>
188 This nested element controls the behavior of Pazpar2 with
189 respect to your data model. In Pazpar2, incoming records are
190 normalized, using XSLT, into an internal representation.
191 The 'service' section controls the further processing and
192 extraction of data from the internal representation, primarily
193 through the 'metadata' sub-element.
196 Pazpar2 version 1.2 and later allows multiple service elements.
197 Multiple services must be given a unique ID by specifying
198 attribute <literal>id</literal>.
199 A single service may be unnamed (service ID omitted). The
200 service ID is referred to in the
201 <link linkend="command-init"><literal>init</literal></link> webservice
202 command's <literal>service</literal> parameter.
205 <variablelist> <!-- Level 2 -->
207 <term>metadata</term>
210 One of these elements is required for every data element in
211 the internal representation of the record (see
212 <xref linkend="data_model"/>). It governs
213 subsequent processing as pertains to sorting, relevance
214 ranking, merging, and display of data elements. It supports
215 the following attributes:
218 <variablelist> <!-- level 3 -->
223 This is the name of the data element. It is matched
224 against the 'type' attribute of the
226 in the normalized record. A warning is produced if
227 metadata elements with an unknown name are
229 normalized record. This name is also used to
231 data elements in the records returned by the
232 webservice API, and to name sort lists and browse
242 The type of data element. This value governs any
243 normalization or special processing that might take
244 place on an element. Possible values are 'generic'
245 (basic string), 'year' (a range is computed if
246 multiple years are found in the record). Note: This
247 list is likely to increase in the future.
256 If this is set to 'yes', then the data element is
257 included in brief records in the webservice API. Note
258 that this only makes sense for metadata elements that
259 are merged (see below). The default value is 'no'.
268 Specifies that this data element is to be used for
269 sorting. The possible values are 'numeric' (numeric
270 value), 'skiparticle' (string; skip common, leading
271 articles), and 'no' (no sorting). The default value is
275 When 'skiparticle' is used, some common articles from the
276 English and German languages are ignored. At present the
277 list is: 'the', 'den', 'der', 'die', 'des', 'an', 'a'.
283 <term id="metadata-rank">rank</term>
286 Specifies that this element is to be used to
288 records against the user's query (when ranking is
290 The value is of the form
294 where M is an integer, used as a
295 weight against the basic TF*IDF score. A value of
296 1 is the base, higher values give additional weight to
297 elements of this type. The default is '0', which
298 excludes this element from the rank calculation.
301 F is a CCL field and N is the multiplier for terms
302 that match that part of the CCL field in search.
303 The F+N combo allows the system to use a different
304 multiplier for a certain field. For example, a rank value of
305 "<literal>1 au 3</literal>" gives a multiplier of 3 for
306 all terms that are part of the au(thor) field and 1 for everything else.
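A sketch of how such a rank value might be attached to a metadata element. It assumes that the normalized record carries an 'author' data element and that 'au' is a CCL field defined for the targets; both names are illustrative.

```xml
<!-- Base weight 1 for all terms; terms searched in the
     CCL field "au" get a multiplier of 3 instead -->
<metadata name="author" rank="1 au 3"/>
```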
309 For Pazpar2 1.6.13 and later, the rank may also be defined
310 "per-document", by the normalization stylesheet.
313 The per field rank was introduced in Pazpar2 1.6.15. Earlier
314 releases only allowed a rank value M (simple integer).
316 See <xref linkend="relevance_ranking"/> for more
322 <term>termlist</term>
325 Specifies that this element is to be used as a
326 termlist, or browse facet. Values are tabulated from
327 incoming records, and a highscore of values (with
328 their associated frequency) is made available to the
329 client through the webservice API.
331 are 'yes' and 'no' (default).
340 This governs whether, and how, elements are extracted
341 from individual records and merged into cluster
342 records. The possible values are: 'unique' (include
343 all unique elements), 'longest' (include only the
344 longest element (strlen)), 'range' (calculate a range
345 of values across all matching records), 'all' (include
346 all elements), or 'no' (don't merge; this is the
350 Pazpar2 1.6.24 and later also offer the merge value 'first', which
351 is like 'all' but only takes elements from the first database that returns
352 the particular metadata field.
358 <term>mergekey</term>
361 If set to '<literal>required</literal>', the value of this
362 metadata element is appended to the resulting mergekey if
363 the metadata is present in a record instance.
364 If the metadata element is not present, a unique mergekey
365 will be generated instead.
368 If set to '<literal>optional</literal>', the value of this
369 metadata element is appended to the resulting mergekey if
370 the metadata is present in a record instance. If the metadata
371 is not present, it will be empty.
374 If set to '<literal>no</literal>' or the mergekey attribute is
375 omitted, the metadata will not be used in the creation of a
382 <term id="facetrule">facetrule</term>
385 Specifies the ICU rule set to be used for normalizing
386 facets. If facetrule is omitted from metadata, the
387 rule set 'facet' is used.
393 <term id="limitcluster">limitcluster</term>
396 Allow a limit on merged metadata. The value of this attribute
397 is the name of actual metadata content to be used for matching
398 (most often same name as metadata name).
402 Requires Pazpar2 1.6.23 or later.
409 <term id="metadata_limitmap">limitmap</term>
412 Specifies a default limitmap for this field. This is to avoid mass
413 configuring of targets. However, it is important to review or set
414 this on a per-target basis, since it is usually target-specific.
415 See limitmap for format.
421 <term id="metadata_facetmap">facetmap</term>
424 Specifies a default facetmap for this field. This is to avoid mass
425 configuring of targets. However, it is important to review or set
426 this on a per-target basis, since it is usually target-specific.
427 See facetmap for format.
433 <term id="icurule">icurule</term>
436 Specifies the ICU rule set to be used for normalizing
437 metadata text. The "display" part of the rule is kept
438 in the returned metadata record (record+show commands), the
439 end result - normalized text - is used for performing
440 within-cluster merge (unique, longest, etc). If the icurule is
441 omitted, type generic (text) is converted as follows:
442 any of the characters "<literal> ,/.:([</literal>" are
443 chopped off the prefix and suffix of the text content
444 <emphasis>unless</emphasis> it includes the
445 characters "<literal>://</literal>" (URL).
449 Requires Pazpar2 1.9.0 or later.
459 This attribute allows you to make use of static database
460 settings in the processing of records. Three possible values
461 are allowed. 'no' is the default and doesn't do anything.
462 'postproc' copies the value of a setting with the same name
463 into the output of the normalization stylesheet(s). 'parameter'
464 makes the value of a setting with the same name available
465 as a parameter to the normalization stylesheet, so you
466 can further process the value inside of the stylesheet, or use
467 the value to decide how to deal with other data values.
470 The purpose of using settings in this way can either be to
471 control the behavior of normalization stylesheet in a database-
472 dependent way, or to easily make database-dependent values
473 available to display-logic in your user interface, without having
474 to implement complicated interactions between the user interface
475 and your configuration system.
480 </variablelist> <!-- attributes to metadata -->
486 <term id="servicexslt" xreflabel="xslt">xslt</term>
489 Defines an XSLT stylesheet. The <literal>xslt</literal>
490 element takes exactly one attribute <literal>id</literal>
491 which names the stylesheet. This can be referred to in target
492 settings <xref linkend="pzxslt"/>.
495 The content of the xslt element is the embedded stylesheet XML
500 <term id="icuchain" xreflabel="icu_chain">icu_chain</term>
503 Specifies a named ICU rule set. The icu_chain element must include
504 attribute 'id' which specifies the identifier (name) for the ICU
506 Pazpar2 uses the particular rule sets for particular purposes.
507 Rule set 'relevance' is used to normalize
508 terms for relevance ranking. Rule set 'sort' is used to
509 normalize terms for sorting. Rule set 'mergekey' is used to
510 normalize terms for making a mergekey. Finally, rule set 'facet'
511 is normally used to normalize facet terms, unless
512 <xref linkend="facetrule">facetrule</xref> is given for a
516 The icu_chain element must also include a 'locale'
517 attribute which must be set to one of the locale strings
518 defined in ICU. The child elements listed below can be
519 in any order, except the 'index' element which logically
520 belongs to the end of the list. The stated tokenization,
521 transformation and charmapping instructions are performed
522 in order from top to bottom.
524 <variablelist> <!-- Level 2 -->
529 The attribute 'rule' defines the direction of the
530 per-character casemapping, allowed values are "l"
531 (lower), "u" (upper), "t" (title).
536 <term>transform</term>
539 Normalization and transformation of tokens follows
540 the rules defined in the 'rule' attribute. For
541 possible values we refer to the extensive ICU
542 documentation found at the
543 <ulink url="&url.icu.transform;">ICU
544 transformation</ulink> home page. Set filtering
545 principles are explained at the
546 <ulink url="&url.icu.unicode.set;">ICU set and
547 filtering</ulink> page.
552 <term>tokenize</term>
555 Tokenization is the only rule in the ICU chain
556 which splits one token into multiple tokens. The
557 'rule' attribute may have the following values:
558 "s" (sentence), "l" (line-break), "w" (word), and
559 "c" (character), the latter probably not being
560 very useful in a pruning Pazpar2 installation.
566 From Pazpar2 version 1.1 the ICU wrapper from YAZ is used.
567 Refer to the <ulink url="&url.yaz.yaz-icu;">yaz-icu</ulink>
568 utility for more information.
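Putting the pieces together, a complete rule set might look like the sketch below. It assumes the case-mapping step is written as a casemap element (its term name is not visible in this section) and uses the English locale; the rules are applied from top to bottom.

```xml
<icu_chain id="relevance" locale="en">
  <!-- Drop control characters before tokenizing -->
  <transform rule="[:Control:] Any-Remove"/>
  <!-- Split into tokens at line-break boundaries -->
  <tokenize rule="l"/>
  <!-- Remove whitespace and punctuation from each token -->
  <transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
  <!-- Lowercase each token (element name assumed) -->
  <casemap rule="l"/>
</icu_chain>
```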
574 <term>relevance</term>
577 Specifies the ICU rule set used for relevance ranking.
578 The child element of 'relevance' must be 'icu_chain' and the
579 'id' attribute of the icu_chain is ignored. This
580 definition is obsolete and should be replaced by the equivalent
583 <icu_chain id="relevance" locale="en">..</icu_chain>
593 Specifies the ICU rule set used for sorting.
594 The child element of 'sort' must be 'icu_chain' and the
595 'id' attribute of the icu_chain is ignored. This
596 definition is obsolete and should be replaced by the equivalent
599 <icu_chain id="sort" locale="en">..</icu_chain>
606 <term>mergekey</term>
609 Specifies ICU tokenization and transformation rules
610 for tokens that are used in Pazpar2's mergekey.
611 The child element of 'mergekey' must be 'icu_chain' and the
612 'id' attribute of the icu_chain is ignored. This
613 definition is obsolete and should be replaced by the equivalent
616 <icu_chain id="mergekey" locale="en">..</icu_chain>
626 Specifies ICU tokenization and transformation rules
627 for tokens that are used in Pazpar2's facets.
628 The child element of 'facet' must be 'icu_chain' and the
629 'id' attribute of the icu_chain is ignored. This
630 definition is obsolete and should be replaced by the equivalent
633 <icu_chain id="facet" locale="en">..</icu_chain>
640 <term>ccldirective</term>
643 Customizes the CCL parsing (interpretation of query parameter
645 The name and value of the CCL directive are given by the attributes
646 'name' and 'value' respectively. Refer to the list of possible names
649 url="http://www.indexdata.com/yaz/doc/tools.html#ccl.directives.table">
656 <varlistentry id="service-rank">
660 Customizes the ranking (relevance) algorithm. Also known as
661 rank tweaks. The rank element
662 accepts the following attributes - all being optional:
669 Attribute 'cluster' is a boolean
670 that controls whether Pazpar2 should boost ranking for merged
671 records. Is 'yes' by default. A value of 'no' will make
672 Pazpar2 average ranking of each record in a cluster.
680 Attribute 'debug' is a boolean
681 that controls whether Pazpar2 should include details
682 about ranking for each document in the show command's
683 response. Enable by using value "yes", disable by using
684 value "no" (default).
692 Attribute 'follow' is a floating point number greater than
693 or equal to 0. A positive number will boost weight for terms
694 that occur close to each other (proximity, distance).
695 A value of 1 will double the weight if two terms are in
696 proximity distance of 1 (next to each other). The default
697 value of 'follow' is 0 (order will not affect weight).
705 Attribute 'lead' is a floating point number.
706 It controls if term weight should be reduced by position
707 from start in a metadata field. A positive value of 'lead'
708 will reduce weight as it appears further away from the lead
709 of the field. Default value is 0 (no reduction of weight by
718 Attribute 'length' determines how/if term weight should be
719 divided by length of metadata field. A value of "linear" will
720 divide by length. A value of "log" will divide by log2(length).
721 A value of "none" will leave term weight as is (no division).
722 Default value is "linear".
728 Refer to <xref linkend="relevance_ranking"/> to see how
729 these tweaks are used in computation of score.
732 Customization of ranking algorithm was introduced with
733 Pazpar2 1.6.18. The semantics of some of the fields changed
734 in versions up to 1.6.22.
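A sketch of a rank element that sets all five attributes. The values for follow and lead are illustrative assumptions; the other three are the defaults stated above.

```xml
<service>
  <!-- follow="1": boost terms that occur next to each other
       lead="0.5": reduce weight further from the start of a field
       cluster, debug and length are shown with their default values -->
  <rank cluster="yes" debug="no" follow="1" lead="0.5" length="linear"/>
</service>
```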
739 <varlistentry id="sort-default">
740 <term>sort-default</term>
743 Specifies the default sort criteria (default 'relevance'),
744 which previously was hard-coded as the default criteria in search.
745 This is a fix/work-around to avoid re-searching when using
746 target-based sorting. In order for this to work efficiently,
747 the search must also have the sort criteria parameter; otherwise
748 Pazpar2 will do re-searching on search criteria changes, if
749 changed between search and show command.
752 This configuration was added in pazpar2 1.6.20.
762 Specifies a variable that will be inherited by all targets defined in settings
764 <set name="test" value="en"/>
771 <term>settings</term>
774 Specifies target settings for this service. Refer to
775 <xref linkend="target_settings"/>.
780 <varlistentry id="service-timeout">
784 Specifies timeout parameters for this service.
785 The <literal>timeout</literal>
786 element supports the following attributes:
787 <literal>session</literal>, <literal>z3950_operation</literal>,
788 <literal>z3950_session</literal>, which specify
789 'session timeout', 'Z39.50 operation timeout',
790 'Z39.50 session timeout' respectively. The Z39.50 operation
791 timeout is the time Pazpar2 will wait for an active Z39.50/SRU
792 operation before it gives up (times out). The Z39.50 session
793 timeout is the time Pazpar2 will keep the session alive for
794 an idle session (no operation).
797 The following is recommended but not required:
798 z3950_operation (30) < session (60) < z3950_session (180).
799 The default values are given in parentheses.
802 The Z39.50 operation timeout may be set per database. Refer to
803 <xref linkend="pztimeout"/>.
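As a sketch, a timeout element that spells out all three attributes with the default values quoted above, which also satisfy the recommended ordering:

```xml
<service>
  <!-- z3950_operation (30) < session (60) < z3950_session (180) -->
  <timeout session="60" z3950_operation="30" z3950_session="180"/>
</service>
```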
807 </variablelist> <!-- Data elements in service directive -->
810 </variablelist> <!-- Data elements in server directive -->
815 <title>EXAMPLE</title>
817 Below is a working example configuration:
821 <?xml version="1.0" encoding="UTF-8"?>
822 <pazpar2 xmlns="http://www.indexdata.com/pazpar2/1.0">
824 <threads number="10"/>
826 <listen port="9004"/>
829 <metadata name="title" brief="yes" sortkey="skiparticle"
830 merge="longest" rank="6"/>
831 <metadata name="isbn" merge="unique"/>
832 <metadata name="date" brief="yes" sortkey="numeric"
833 type="year" merge="range" termlist="yes"/>
834 <metadata name="author" brief="yes" termlist="yes"
835 merge="longest" rank="2"/>
836 <metadata name="subject" merge="unique" termlist="yes" rank="3" limitmap="local:"/>
837 <metadata name="url" merge="unique"/>
838 <icu_chain id="relevance" locale="el">
839 <transform rule="[:Control:] Any-Remove"/>
841 <transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
844 <settings src="mysettings"/>
845 <timeout session="60"/>
853 <refsect1 id="config-include">
854 <title>INCLUDE FACILITY</title>
856 The XML configuration may be partitioned into multiple files by using
857 the <literal>include</literal> element which takes a single attribute,
858 <literal>src</literal>. The <literal>src</literal> attribute is a
859 regular shell-like glob pattern. For example,
861 <include src="/etc/pazpar2/conf.d/*.xml"/>
865 The include facility requires Pazpar2 version 1.2.
869 <refsect1 id="target_settings">
870 <title>TARGET SETTINGS</title>
872 Pazpar2 features a cunning scheme by which you can associate various
873 kinds of attributes, or settings, with search targets. This can be done
874 through XML files which are read at startup; each file can associate
875 one or more settings with one or more targets. The file format is generic
876 in nature, designed to support a wide range of application requirements.
877 The settings can be purely technical things, such as how to perform a title
878 search against a given target, or they can associate arbitrary name=value
879 pairs with groups of targets -- for instance, if you would like to
880 place all commercial full-text bases in one group for selection
881 purposes, or you would like to control what targets are accessible
882 to users by default. Per-database settings values can even be used
883 to drive sorting, facet/termlist generation, or end-user interface display
888 During startup, Pazpar2 will recursively read a specified directory
889 (can be identified in the pazpar2.cfg file or on the command line), and
890 process any settings files found therein.
894 Clients of the Pazpar2 webservice interface can selectively override
895 settings for individual targets within the scope of one session. This
896 can be used in conjunction with an external authentication system to
897 determine which resources are to be accessible to which users. Pazpar2
898 itself has no notion of end-users, and so can be used in conjunction
899 with any type of authentication system. Similarly, the authentication
900 tokens submitted to access-controlled search targets can be
901 overridden, to allow use of Pazpar2 in a consortial or multi-library
902 environment, where different end-users may need to be represented to
903 some search targets in different ways. This, again, can be managed
904 using an external database or other lookup mechanism. Setting overrides
905 can be performed either using the
906 <link linkend="command-init">init</link> or the
907 <link linkend="command-settings">settings</link> webservice
912 In fact, every setting that applies to a database (except pz:id, which
913 can only be used for filtering targets to use for a search) can be overridden
914 on a per-session basis.
915 This allows the client to override specific CCL fields for
916 searching, etc., to meet the needs of a session or user.
920 Finally, as an extreme case of this, the webservice client can
921 introduce entirely new targets, on the fly, as part of the
922 <link linkend="command-init">init</link> or
923 <link linkend="command-settings">settings</link> command.
924 This is useful if you desire to manage information
925 about your search targets in a separate application such as a database.
926 You do not need any static settings file whatsoever to run Pazpar2 -- as
927 long as the webservice client is prepared to supply the necessary
928 information at the beginning of every session.
933 The following discussion of practical issues related to session
934 and settings management is cast in terms of a user interface based on
935 Ajax/Javascript technology. It would apply equally well to many other
936 kinds of browser-based logic.
941 Typically, a Javascript client is not allowed to directly alter the
942 parameters of a session. There are two reasons for this. One has to do
943 with access to information; typically, information about a user will
944 be stored in a system on the server side, or it will be accessible in
945 some way from the server. However, since the Javascript client cannot
946 be entirely trusted (some hostile agent might in fact 'pretend' to be
947 a regular ws client), it is more robust to control session settings
948 from scripting that you run as part of your webserver. Typically, this
949 can be handled during the session initialization, as follows:
953 Step 1: The Javascript client loads, and asks the webserver for a
954 new Pazpar2 session ID. This can be done using a Javascript call, for
955 instance. Note that it is possible to submit Ajax XMLHttpRequest calls
956 either to Pazpar2 or to the webserver that Pazpar2 is proxying
957 for. See (XXX Insert link to Pazpar2 protocol).
961 Step 2: Code on the webserver authenticates the user, by database lookup,
962 LDAP access, NCIP, etc. Determines which resources the user has access to,
963 and any user-specific parameters that are to be applied during this session.
967 Step 3: The webserver initializes a new Pazpar2 session, and sets
968 user-specific parameters as necessary, using the init webservice
969 command. A new session ID is returned.
973 Step 4: The webserver returns this session ID to the Javascript
974 client, which then uses the session ID to submit searches, show
979 Step 5: When the Javascript client ceases to use the session,
980 Pazpar2 destroys any session-specific information.
984 <title>SETTINGS FILE FORMAT</title>
986 Each file contains a root element named <settings>. It may
987 contain one or more <set> elements. The settings and set
988 elements may contain the following attributes. Attributes in the set
989 node override those in the settings root element. Each set node must
990 specify (directly, or inherited from the parent node) at least a
991 target, name, and value.
999 This specifies the search target to which this setting should be
1000 applied. Targets are identified by their Z39.50 URL, generally
1001 including the host, port, and database name (e.g.
1002 <literal>bagel.indexdata.com:210/marc</literal>).
1003 Two wildcard forms are accepted:
1004 * (asterisk) matches all known targets;
1005 <literal>bagel.indexdata.com:210/*</literal> matches all
1006 known databases on the given host.
1009 A precedence system determines what happens if there are
1010 overlapping values for the same setting name for the same
1011 target. A setting for a specific target name overrides a
1012 setting which specifies target using a wildcard. This makes it
1013 easy to set defaults for all targets, and then override them
1014 for specific targets or hosts. If there are
1015 multiple overlapping settings with the same name and target
1016 value, the 'precedence' attribute determines what happens.
1019 For Pazpar2 1.6.4 or later, the target ID may be user-defined, in
1020 which case the actual host, port, etc. is given by the setting
1021 <xref linkend="pzurl"/>.
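A sketch of a user-defined target ID. It assumes the setting behind the cross-reference above is named pz:url; the ID "mylibrary" is a hypothetical name chosen for the example.

```xml
<settings target="mylibrary">
  <!-- "mylibrary" is an arbitrary ID; the actual host, port and
       database are supplied by the pz:url setting (name assumed) -->
  <set name="pz:url" value="bagel.indexdata.com:210/marc"/>
</settings>
```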
1029 The name of the setting. This can be anything you like.
1030 However, Pazpar2 reserves a number of setting names for
1031 specific purposes, all starting with 'pz:', and it is a good
1032 idea to avoid that prefix if you make up your own setting
1033 names. See below for a list of reserved variables.
1041 The value of the setting. Generally, this can be anything you
1042 want -- however, some of the reserved settings may expect
1043 specific kinds of values.
1048 <term>precedence</term>
1051 This should be an integer. If not provided, the default value
1052 is 0. If two (or more) settings have the same content for
1053 target and name, the precedence value determines the outcome.
1054 If both settings have the same precedence value, they are both
1055 applied to the target(s). If one has a higher value, then the
1056 value of that setting is applied, and the other one is ignored.
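A sketch of two overlapping settings for the same target and name. The second has the higher precedence, so by the rule above its value is applied and the first is ignored. The value "B" is an illustrative assumption.

```xml
<settings target="*" name="pz:elements">
  <!-- precedence defaults to 0 -->
  <set value="F"/>
  <!-- higher precedence: this value is the one applied -->
  <set value="B" precedence="1"/>
</settings>
```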
1063 By setting defaults for target, name, or value in the root
1064 settings node, you can use the settings files in many different
1065 ways. For instance, you can use a single file to set defaults for
1066 many different settings, like search fields, retrieval syntaxes,
1067 etc. You can have one file per server, which groups settings for
1068 that server or target. You could also have one file which associates
1069 a number of targets with a given setting, for instance, to associate
1070 many databases with a given category or class that makes sense
1071 within your application.
1075 The following examples illustrate uses of the settings system to
1076 associate settings with targets to meet different requirements.
1080 The example below associates a set of default values that can be
1081 used across many targets. Note the wildcard for targets.
1082 This associates the given settings with all targets for which no
1083 other information is provided.
1085 <settings target="*">
1087 <!-- This file introduces default settings for pazpar2 -->
1089 <!-- mapping for unqualified search -->
1090 <set name="pz:cclmap:term" value="u=1016 t=l,r s=al"/>
1092 <!-- field-specific mappings -->
1093 <set name="pz:cclmap:ti" value="u=4 s=al"/>
1094 <set name="pz:cclmap:su" value="u=21 s=al"/>
1095 <set name="pz:cclmap:isbn" value="u=7"/>
1096 <set name="pz:cclmap:issn" value="u=8"/>
1097 <set name="pz:cclmap:date" value="u=30 r=r"/>
1099 <set name="pz:limitmap:title" value="rpn:@attr 1=4 @attr 6=3"/>
1100 <set name="pz:limitmap:date" value="ccl:date"/>
1102 <!-- Retrieval settings -->
1104 <set name="pz:requestsyntax" value="marc21"/>
1105 <set name="pz:elements" value="F"/>
1107 <!-- Query encoding -->
1108 <set name="pz:queryencoding" value="iso-8859-1"/>
1110 <!-- Result normalization settings -->
1112 <set name="pz:nativesyntax" value="iso2709"/>
1113 <set name="pz:xslt" value="../etc/marc21.xsl"/>
1121 The next example shows certain settings overridden for one target,
1122 one which returns XML records containing DublinCore elements, and
1123 which furthermore requires a username/password.
1125 <settings target="funkytarget.com:210/db1">
1126 <set name="pz:requestsyntax" value="xml"/>
1127 <set name="pz:nativesyntax" value="xml"/>
1128 <set name="pz:xslt" value="../etc/dublincore.xsl"/>
1130 <set name="pz:authentication" value="myuser/password"/>
1136 The following example associates a specific name/value combination
1137 with a number of targets. The targets below are access-restricted,
1138 and can only be used by users with special credentials.
1140 <settings name="pz:allow" value="0">
1141 <set target="funkytarget.com:210/*"/>
1142 <set target="commercial.com:2100/expensiveDb"/>
1150 <title>RESERVED SETTING NAMES</title>
1152 The following setting names are reserved by Pazpar2 to control the
1153 behavior of the client function.
1159 <term>pz:allow</term>
1162 Allows or denies access to the resources it is applied to. Possible
1163 values are '0' and '1'.
1164 The default is '1' (allow access to this resource).
1170 <term>pz:apdulog</term>
1173 If the 'pz:apdulog' setting is defined and has a value other than 0,
1174 then Z39.50 APDUs are written to the log.
1180 <term>pz:authentication</term>
1183 Sets an authentication string for a given database. For Z39.50,
1184 this is carried as part of the Initialize Request. In order to carry
1185 the information in the "open" elements, separate
1186 username and password with a slash (In Z39.50 it is a VisibleString).
1187 In order to carry the information in the idPass elements, separate
1188 username term, password term and, optionally, a group term with a
1190 If three terms are given, the order is
1191 <emphasis>user, group, password</emphasis>.
1192 If only two terms are given, the order is
1193 <emphasis>user, password</emphasis>.
1196 For HTTP based protocols, such as SRU and Apache Solr, the
1197 authentication string includes a username term and, optionally, a password term.
1199 Each term is separated by a single blank. The
1200 authentication information is passed either by HTTP basic
1201 authentication or via URL parameters. The mode of operation is
1202 determined by <literal>pz:authentication_mode</literal> setting.
1208 <term>pz:authentication_mode</term>
1211 Determines how authentication is carried in HTTP based protocols.
1212 Value may be "<literal>basic</literal>" or "<literal>url</literal>".
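<para>
 As an illustrative sketch only (the target address and credentials below
 are hypothetical), an SRU target that expects HTTP basic authentication
 might be configured as follows, with the username and password terms
 separated by a single blank:
</para>
<screen><![CDATA[
<settings target="sru.example.org/db">
  <!-- hypothetical SRU target accessed through GET requests -->
  <set name="pz:sru" value="get"/>
  <!-- username term and password term, separated by a single blank -->
  <set name="pz:authentication" value="myuser mypassword"/>
  <!-- send the credentials using HTTP basic authentication -->
  <set name="pz:authentication_mode" value="basic"/>
</settings>
]]></screen>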
1218 <term>pz:block_timeout</term>
1221 (Not yet implemented).
1222 Specifies the time after which a block should be released anyway.
1228 <term>pz:cclmap:xxx</term>
1231 This establishes a CCL field definition or other setting, for
1232 the purpose of mapping end-user queries. XXX is the field or
1233 setting name, and the value of the setting provides parameters
1234 (e.g. parameters to send to the server, etc.). Please consult
1235 the YAZ manual for a full overview of the many capabilities of
1236 the powerful and flexible CCL parser.
1239 Note that it is easy to establish a set of default parameters,
1240 and then override them individually for a given target.
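<para>
 A minimal sketch of such an override, assuming a hypothetical target
 whose title index requires a different structure attribute than the
 default. The first fragment sets the default for all targets; the second
 replaces only the <literal>ti</literal> mapping for one target:
</para>
<screen><![CDATA[
<!-- defaults for all targets -->
<settings target="*">
  <set name="pz:cclmap:term" value="u=1016 t=l,r s=al"/>
  <set name="pz:cclmap:ti" value="u=4 s=al"/>
</settings>

<!-- override for one (hypothetical) target; other mappings still apply -->
<settings target="funkytarget.com:210/db1">
  <set name="pz:cclmap:ti" value="u=4 s=pw"/>
</settings>
]]></screen>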
1246 <term>pz:elements</term>
1249 The element set name to be used when retrieving records from a server.
1256 <term>pz:extendrecs</term>
1259 If a show command reaches the boundary of a result set for a
1260 database (which depends on sorting) and pz:extendrecs is set to a positive
1261 value, then Pazpar2 waits for show to fetch pz:extendrecs more
1262 records. This setting is best used if a database does native
1263 sorting, because the result set otherwise may be completely
1264 re-sorted during extended fetch.
1265 The default value of pz:extendrecs is 0 (no extended fetch).
1269 The pz:extendrecs setting appeared in Pazpar2 version 1.6.26,
1270 but the behavior changed with the release of Pazpar2 1.6.29.
1277 <term>pz:facetmap:<replaceable>name</replaceable></term>
1280 Specifies that for field <replaceable>name</replaceable>, the target
1281 supports (native) facets. The value is the name of the
1282 field on the target.
1288 <term>pz:facetmap:split:<replaceable>name</replaceable></term>
1291 Like pz:facetmap, but makes Pazpar2 inspect the term value, which consists
1292 of two items separated by a colon. The first item is the raw ID to be
1293 sent to the database if limitmap on the field
1294 <replaceable>name</replaceable> is used. The second item is
1298 This facility was added in Pazpar2 version 1.11.0.
1307 This setting can't be 'set' -- it contains the ID (normally
1308 ZURL) for a given target, and is useful for filtering --
1309 specifically when you want to select one or more specific
1310 targets in the search command.
1315 <varlistentry id="limitmap">
1316 <term>pz:limitmap:<replaceable>name</replaceable></term>
1319 Specifies attributes for limiting a search to a field - using
1320 the limit parameter for search. It can be used to filter locally
1321 or remotely (search in a target). In some cases the mapping of
1322 a field to a value is identical to an existing cclmap field; in
1323 other cases the field must be specified in a different way - for
1324 example to match a complete field (rather than parts of a subfield).
1327 The value of limitmap may have one of three forms: a reference to
1328 an existing CCL field, a raw PQF string, or a local limit. The leading string
1329 determines the type: <literal>ccl:</literal> for a CCL field,
1330 <literal>rpn:</literal> for PQF/RPN, or <literal>local:</literal>
1331 for filtering in Pazpar2. The local filtering prefix may be followed
1332 by a metadata field (default is to use the name of the
1336 For Pazpar2 version 1.6.23 and later the limitmap may include multiple
1337 specifications, separated by <literal>,</literal> (comma).
1339 <literal>ccl:title,local:ltitle,rpn:@attr 1=4</literal>.
1343 The limitmap facility is supported for Pazpar2 version 1.6.0.
1344 Local filtering is supported in Pazpar2 1.6.6.
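<para>
 The three forms might be combined as in the sketch below. The
 <literal>author</literal> limit is a hypothetical name and assumes that a
 metadata field of that name is defined for the service:
</para>
<screen><![CDATA[
<settings target="*">
  <!-- ccl: refers to the CCL field defined by pz:cclmap:date -->
  <set name="pz:cclmap:date" value="u=30 r=r"/>
  <set name="pz:limitmap:date" value="ccl:date"/>

  <!-- rpn: raw PQF attributes, here a complete-field match on title -->
  <set name="pz:limitmap:title" value="rpn:@attr 1=4 @attr 6=3"/>

  <!-- local: filter in Pazpar2 itself; no field given after the prefix -->
  <set name="pz:limitmap:author" value="local:"/>
</settings>
]]></screen>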
1351 <term>pz:maxrecs</term>
1354 Controls the maximum number of records to be retrieved from a
1355 server. The default is 100.
1361 <term>pz:memcached</term>
1364 If set and non-empty,
1365 <ulink url="&url.libmemcached;">libMemcached</ulink> will
1366 be configured and enabled for the target.
1367 The value of this setting is the same as the ZOOM option
1368 <literal>memcached</literal>, which in turn is the configuration
1369 string passed to the <function>memcached</function> function
1370 of <ulink url="&url.libmemcached;">libMemcached</ulink>.
1373 This setting is honored in Pazpar2 1.6.39 or later. Pazpar2 must
1374 be using YAZ version 5.0.13 or later.
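<para>
 For illustration, and assuming a memcached server listening on a
 hypothetical local address, the setting could look like this. The value
 is passed on unchanged, so it must follow the libMemcached
 configuration string syntax:
</para>
<screen><![CDATA[
<settings target="*">
  <!-- hypothetical memcached server; value is a libMemcached config string -->
  <set name="pz:memcached" value="--SERVER=localhost:11211"/>
</settings>
]]></screen>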
1380 <term>pz:redis</term>
1383 If set and non-empty,
1384 <ulink url="&url.redis;">redis</ulink> will
1385 be configured and enabled for the target.
1386 The value of this setting is exactly as the redis option for
1390 This setting is honored in Pazpar2 1.6.43 or later. Pazpar2 must
1391 be using YAZ version 5.2.0 or later.
1397 <term>pz:nativesyntax</term>
1400 Specifies how Pazpar2 should map retrieved records to XML. Currently
1401 supported values are <literal>xml</literal>,
1402 <literal>iso2709</literal> and <literal>txml</literal>.
1405 The value <literal>iso2709</literal> makes Pazpar2 convert retrieved
1406 MARC records to MARCXML. In order to convert to XML, the exact
1407 character set of the MARC must be known (if not, the resulting
1408 XML is probably not well-formed). The character set may be
1409 specified by adding:
1410 <literal>;</literal><replaceable>charset</replaceable> to
1411 <literal>iso2709</literal>. If omitted, a charset of
1412 MARC-8 is assumed. This is correct for most MARC21/USMARC records.
1415 The value <literal>txml</literal> is like <literal>iso2709</literal>
1416 except that records are converted to TurboMARC instead of MARCXML.
1419 The value <literal>xml</literal> is used if Pazpar2 retrieves
1420 records that are already XML (no conversion takes place).
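<para>
 As a sketch, a hypothetical target that returns MARC records encoded in
 ISO-8859-1 rather than MARC-8 could be configured as follows. The
 character set name is an assumption and must be one that the character
 set conversion in use recognizes:
</para>
<screen><![CDATA[
<settings target="latin1target.example.org:210/db">
  <set name="pz:requestsyntax" value="marc21"/>
  <!-- convert MARC to MARCXML, reading the records as ISO-8859-1 -->
  <set name="pz:nativesyntax" value="iso2709;iso-8859-1"/>
  <set name="pz:xslt" value="../etc/marc21.xsl"/>
</settings>
]]></screen>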
1426 <term>pz:negotiation_charset</term>
1429 Sets character set for Z39.50 negotiation. Most targets do not support
1430 this, and some will even close connection if set (crash on server
1431 side or similar). If set, you probably want to set it to
1432 <literal>UTF-8</literal>.
1438 <term>pz:piggyback</term>
1441 Piggybacking enables the client to retrieve records from the
1442 server as part of the search response in Z39.50. Almost all
1443 servers support this (or fail it gracefully), but a few
1444 servers will produce undesirable results.
1445 Set to '1' to enable piggybacking, '0' to disable it. Default
1446 is 1 (piggybacking enabled).
1451 <term>pz:pqf_prefix</term>
1454 Allows you to specify an arbitrary PQF query language substring.
1455 The provided string is prefixed to the user's query after it has been
1456 normalized to PQF internally in pazpar2.
1457 This allows you to attach complex 'filters' to queries for a given
1458 target, sometimes necessary to select sub-catalogs
1459 in union catalog systems, etc.
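<para>
 A minimal sketch, assuming a hypothetical union catalog in which use
 attribute 9999 selects a sub-catalog (both the attribute and the term
 <literal>branchA</literal> are invented for illustration):
</para>
<screen><![CDATA[
<settings target="unioncatalog.example.org:210/all">
  <!-- prefixed to the normalized PQF query of every search -->
  <set name="pz:pqf_prefix" value="@and @attr 1=9999 branchA"/>
</settings>
]]></screen>
<para>
 With this prefix, a user query normalized to
 <literal>@attr 1=4 water</literal> would be sent roughly as
 <literal>@and @attr 1=9999 branchA @attr 1=4 water</literal>.
</para>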
1465 <term>pz:pqf_strftime</term>
1468 Allows you to extend a query with dates and operators.
1469 The provided string allows certain substitutions and serves as a
1471 The special two character sequence '%%' gets converted to the
1472 original query. Other sequences beginning with a percent sign are
1473 conversions supported by strftime.
1474 All other characters are copied verbatim. For example, the string
1475 <literal>@and @attr 1=30 @attr 2=3 %Y %%</literal>
1476 would search for current year combined with the original PQF (%%).
1479 This setting can also be used as a more general alternative to
1480 pz:pqf_prefix -- a way of embedding the submitted query
1481 anywhere in the string rather than appending it to the prefix. For
1482 example, if it is desired to omit all records satisfying the
1483 query <literal>@attr 1=pica.bib 0007</literal> then this
1484 subquery can be combined with the submitted query as the second
1485 argument of the and-not operator (<literal>@not</literal>) by using the
1486 pz:pqf_strftime value <literal>@not %% @attr 1=pica.bib
1493 <term>pz:preferred</term>
1496 Specifies that a target is preferred, e.g. a possibly local, faster
1497 target. Using block=preferred on the <link linkend="command-show">
1498 show command</link> will wait for all these
1499 targets to return records before releasing the block.
1500 If no target is preferred, block=preferred is identical to
1501 block=1, which releases the block when one target has returned records.
1507 <term>pz:present_chunk</term>
1510 Controls the chunk size in present requests. Pazpar2 will
1511 make (maxrecs / chunk) request(s). The default is 20.
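<para>
 As a worked example of the formula above (the values are chosen only
 for illustration), the settings below would lead to at most
 200 / 50 = 4 present requests per target:
</para>
<screen><![CDATA[
<settings target="*">
  <!-- retrieve at most 200 records from each server -->
  <set name="pz:maxrecs" value="200"/>
  <!-- fetch them 50 at a time -->
  <set name="pz:present_chunk" value="50"/>
</settings>
]]></screen>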
1517 <term>pz:queryencoding</term>
1520 The encoding of the search terms that a target accepts. Most
1521 targets do not honor UTF-8 in which case this needs to be specified.
1522 Each term in a query will be converted if this setting is given.
1528 <term>pz:recordfilter</term>
1531 Specifies a filter which allows Pazpar2 to only include
1532 records that meet certain criteria in a result.
1533 Unmatched records will be ignored.
1534 The filter takes the form name, name~value, or name=value, which
1535 will include only records with metadata element (name) that has the
1536 substring (~value) given, or matches exactly (=value).
1537 If value is omitted all records with the named metadata element
1538 present will be included.
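<para>
 The three forms can be sketched as follows, assuming that metadata
 elements named <literal>title</literal> and <literal>medium</literal>
 exist in the service definition (hypothetical names). Only one form
 would be set for a given target; the alternatives are shown as comments:
</para>
<screen><![CDATA[
<settings target="funkytarget.com:210/db1">
  <!-- keep only records whose title contains the substring "water" -->
  <set name="pz:recordfilter" value="title~water"/>

  <!-- alternative: keep only records whose medium is exactly "book"
  <set name="pz:recordfilter" value="medium=book"/>
  -->

  <!-- alternative: keep only records that have a title at all
  <set name="pz:recordfilter" value="title"/>
  -->
</settings>
]]></screen>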
1543 <varlistentry id="requestsyntax">
1544 <term>pz:requestsyntax</term>
1547 This specifies the record syntax to use when requesting
1548 records from a given server. The value can be a symbolic name like
1549 marc21 or xml, or it can be a Z39.50-style dot-separated OID.
1555 <term>pz:sort</term>
1558 Specifies sort criteria to be applied to the result set.
1559 Only works for targets which support the sort service.
1564 <varlistentry id="pzsortmap">
1565 <term>pz:sortmap:<replaceable>field</replaceable></term>
1568 Specifies native sorting for a target where
1569 <replaceable>field</replaceable> is a sort criterion (see command
1570 show). The value has two components separated by a colon: strategy and
1571 native-field. Strategy is one of <literal>z3950</literal>,
1572 <literal>type7</literal>, <literal>cql</literal>,
1573 <literal>sru11</literal>, or <literal>embed</literal>.
1574 The second component, native-field, is the field that is recognized
1579 Only supported for Pazpar2 1.6.4 and later.
1589 This setting enables
1590 <ulink url="&url.sru;">SRU</ulink>/<ulink url="&url.solr;">Solr</ulink> support.
1592 It has four possible settings.
1593 'get', enables SRU access through GET requests. 'post' enables SRU/POST
1594 support, less commonly supported, but useful if very large requests are
1595 to be submitted. 'soap' enables the SRW (SRU over SOAP) variation of the protocol.
1599 A value of 'solr' enables Solr client support. This is supported
1600 for Pazpar2 version 1.5.0 and later.
1606 <term>pz:sru_version</term>
1609 This allows SRU version to be specified. If unset Pazpar2
1610 will use the default of YAZ (currently 1.2). Should be set
1611 to 1.1 or 1.2. For Solr, the current supported/tested version
1618 <term>pz:termlist_term_count</term>
1621 Specifies number of facet terms to be requested from the target.
1622 The default is unspecified, i.e. server-decided. Also see pz:facetmap.
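<para>
 A sketch of native facets for a hypothetical Solr target. The target
 address and the native field name <literal>subject_exact</literal> are
 assumptions; the field name must match what the target actually exposes:
</para>
<screen><![CDATA[
<settings target="solr.example.org:8983/solr/select">
  <set name="pz:sru" value="solr"/>
  <!-- facet "subject" is provided natively by the target field subject_exact -->
  <set name="pz:facetmap:subject" value="subject_exact"/>
  <!-- ask the target for 10 facet terms -->
  <set name="pz:termlist_term_count" value="10"/>
</settings>
]]></screen>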
1628 <term>pz:termlist_term_factor</term>
1631 Specifies whether to use a factor for pazpar2 generated facets (1) or not (0).
1633 When mixing locally generated facets (computed from the downloaded
1634 pz:maxrecs samples) with native (target-generated) facets, the latter will
1635 dominate the facet list, since they are generated
1636 based on the complete result set.
1637 By scaling up the facet count using the ratio between total hit
1638 count and the sample size,
1639 the total facet count can be approximated and thus better compared
1640 with native facets. This is not enabled by default.
1646 <varlistentry id="pztimeout">
1647 <term>pz:timeout</term>
1650 Specifies timeout for operation (e.g. search and fetch) for
1651 a database. This overrides the z3950_operation timeout
1652 that is given for a service. See <xref linkend="service-timeout"/>.
1656 The timeout facility is supported for Pazpar2 version 1.8.4 and later.
1662 <varlistentry id="pzurl">
1666 Specifies URL for the target and overrides the target ID.
1670 <literal>pz:url</literal> is only recognized for
1671 Pazpar2 1.6.4 and later.
1678 <term id="pzxslt" xreflabel="pz:xslt">pz:xslt</term>
1681 A comma-separated list of stylesheet names that specifies
1682 how to convert incoming records to the internal representation.
1685 For each name, the embedded stylesheets (XSL) that come with the
1686 service definition are consulted first and take precedence over
1687 external files (see <xref linkend="servicexslt"/>
1688 of service definition).
1689 If the name does not match an embedded stylesheet it is
1690 considered a filename.
1693 The suffix of each file specifies the kind of transformation.
1694 Suffix "<literal>.xsl</literal>" makes an XSL transform. Suffix
1695 "<literal>.mmap</literal>" will use the MMAP transform (described below).
1698 The special value "<literal>auto</literal>" will use a file
1699 which is the <link linkend="requestsyntax">pz:requestsyntax's</link>
1701 <literal>'.xsl'</literal>.
1704 When mapping MARC records, XSLT can be bypassed for increased
1705 performance with the alternate "MARC map" format. Provide the
1706 path of a file with extension ".mmap" containing on each line:
1708 <field> <subfield> <metadata element></programlisting>
1715 To map the field value specify a subfield of '$'. To store a
1716 concatenation of all subfields, specify a subfield of '*'.
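<para>
 As a sketch, a target could be pointed at a MARC map file as shown
 below. The file name, the MARC fields and the metadata element names are
 hypothetical; the comment shows what such a file might contain, one
 mapping per line:
</para>
<screen><![CDATA[
<settings target="funkytarget.com:210/db2">
  <!-- the .mmap suffix selects the MARC map transform instead of XSLT -->
  <set name="pz:xslt" value="../etc/mymarc.mmap"/>
  <!-- ../etc/mymarc.mmap (hypothetical) could contain:
       245 a title
       500 $ description
       650 * subject
  -->
</settings>
]]></screen>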
1722 <term>pz:zproxy</term>
1725 The 'pz:zproxy' setting has the value syntax
1726 'host.internet.address:port'; it is used to tunnel Z39.50
1727 requests through the named Z39.50 proxy.
1737 <title>SEE ALSO</title>
1740 <refentrytitle>pazpar2</refentrytitle>
1741 <manvolnum>8</manvolnum>
1744 <refentrytitle>yaz-icu</refentrytitle>
1745 <manvolnum>1</manvolnum>
1748 <refentrytitle>pazpar2_protocol</refentrytitle>
1749 <manvolnum>7</manvolnum>
1754 <!-- Keep this comment at the end of the file
1757 nxml-child-indent: 1