- <variablelist> <!-- Level 2 -->
- <varlistentry><term>metadata</term>
- <listitem>
- <para>
- One of these elements is required for every data element in
- the internal representation of the record (see
- <xref linkend="data_model"/>. It governs
- subsequent processing as pertains to sorting, relevance
- ranking, merging, and display of data elements. It supports
- the following attributes:
- </para>
-
- <variablelist> <!-- level 3 -->
- <varlistentry><term>name</term>
- <listitem>
- <para>
- This is the name of the data element. It is matched
- against the 'type' attribute of the
- 'metadata' element
- in the normalized record. A warning is produced if
- metdata elements with an unknown name are
- found in the
- normalized record. This name is also used to
- represent
- data elements in the records returned by the
- webservice API, and to name sort lists and browse
- facets.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry><term>type</term>
- <listitem>
- <para>
- The type of data element. This value governs any
- normalization or special processing that might take
- place on an element. Possible values are 'generic'
- (basic string), 'year' (a range is computed if
- multiple years are found in the record). Note: This
- list is likely to increase in the future.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry><term>brief</term>
- <listitem>
- <para>
- If this is set to 'yes', then the data element is
- includes in brief records in the webservice API. Note
- that this only makes sense for metadata elements that
- are merged (see below). The default value is 'no'.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry><term>sortkey</term>
- <listitem>
- <para>
- Specifies that this data element is to be used for
- sorting. The possible values are 'numeric' (numeric
- value), 'skiparticle' (string; skip common, leading
- articles), and 'no' (no sorting). The default value is
- 'no'.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry><term>rank</term>
- <listitem>
- <para>
- Specifies that this element is to be used to
- help rank
- records against the user's query (when ranking is
- requested). The value is an integer, used as a
- multiplier against the basic TF*IDF score. A value of
- 1 is the base, higher values give additional
- weight to
- elements of this type. The default is '0', which
- excludes this element from the rank calculation.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry><term>termlist</term>
- <listitem>
- <para>
- Specifies that this element is to be used as a
- termlist, or browse facet. Values are tabulated from
- incoming records, and a highscore of values (with
- their associated frequency) is made available to the
- client through the webservice API.
- The possible values
- are 'yes' and 'no' (default).
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry><term>merge</term>
- <listitem>
- <para>
- This governs whether, and how elements are extracted
- from individual records and merged into cluster
- records. The possible values are: 'unique' (include
- all unique elements), 'longest' (include only the
- longest element (strlen), 'range' (calculate a range
- of values across al matching records), 'all' (include
- all elements), or 'no' (don't merge; this is the
- default);
- </para>
- </listitem>
- </varlistentry>
- </variablelist> <!-- attributes to metadata -->
-
- </listitem>
- </varlistentry>
- </variablelist> <!-- Data elements in service directive -->
- </listitem>
- </varlistentry>
- </variablelist> <!-- Data elements in server directive -->
- </refsect2>
-
- </refsect1>
-
- <refsect1><title>EXAMPLE</title>
- <para>Below is a working example configuration:
- <screen><![CDATA[
+ <varlistentry>
+ <term>termlist</term>
+ <listitem>
+ <para>
+ Specifies that this element is to be used as a
+ termlist, or browse facet. Values are tabulated from
+ incoming records, and a highscore of values (with
+ their associated frequency) is made available to the
+ client through the webservice API.
+ The possible values
+ are 'yes' and 'no' (default).
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>merge</term>
+ <listitem>
+ <para>
+ This governs whether, and how, elements are extracted
+ from individual records and merged into cluster
+ records. The possible values are: 'unique' (include
+ all unique elements), 'longest' (include only the
+ longest element, by string length), 'range' (calculate a range
+ of values across all matching records), 'all' (include
+ all elements), or 'no' (do not merge; this is the
+ default).
+ </para>
+ <para>
+ Pazpar2 1.6.24 also offers a new value for merge, 'first', which
+ is like 'all' but takes elements only from the first database that
+ returns the particular metadata field.
+ </para>
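+ <para>
+ As a minimal sketch of how the metadata attributes combine, a
+ service might declare the elements below. The field names and
+ rank weights are illustrative assumptions, not requirements.
+ </para>
+ <screen>
+ <![CDATA[
+ <!-- Illustrative only: adapt names to your normalized records -->
+ <metadata name="title" brief="yes" sortkey="skiparticle" merge="longest" rank="6"/>
+ <metadata name="author" brief="yes" termlist="yes" merge="longest" rank="2"/>
+ <metadata name="date" brief="yes" sortkey="numeric" type="year" merge="range" termlist="yes"/>
+ <metadata name="subject" termlist="yes" merge="unique" rank="3"/>
+ ]]>
+ </screen>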
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>mergekey</term>
+ <listitem>
+ <para>
+ If set to '<literal>required</literal>', the value of this
+ metadata element is appended to the resulting mergekey if
+ the metadata is present in a record instance.
+ If the metadata element is not present, a unique mergekey
+ will be generated instead.
+ </para>
+ <para>
+ If set to '<literal>optional</literal>', the value of this
+ metadata element is appended to the resulting mergekey if
+ the metadata is present in a record instance. If the metadata
+ is not present, its contribution to the mergekey is empty.
+ </para>
+ <para>
+ If set to '<literal>no</literal>' or the mergekey attribute is
+ omitted, the metadata will not be used in the creation of a
+ mergekey.
+ </para>
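+ <para>
+ For illustration, assuming fields named title, author and
+ description (hypothetical names), the three settings could be
+ combined as follows. Under the rules above, a record without a
+ title would receive a unique mergekey, a missing author would
+ contribute nothing, and description would never take part.
+ </para>
+ <screen>
+ <![CDATA[
+ <!-- Field names are assumptions chosen for this sketch -->
+ <metadata name="title" merge="longest" mergekey="required"/>
+ <metadata name="author" merge="longest" mergekey="optional"/>
+ <metadata name="description" merge="longest"/>
+ ]]>
+ </screen>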
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term id="facetrule">facetrule</term>
+ <listitem>
+ <para>
+ Specifies the ICU rule set to be used for normalizing
+ facets. If facetrule is omitted from metadata, the
+ rule set 'facet' is used.
+ </para>
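+ <para>
+ A possible pairing of facetrule with a dedicated rule set is
+ sketched below. The rule set name 'subjectfacet' and the field
+ name 'subject' are assumptions made for this example.
+ </para>
+ <screen>
+ <![CDATA[
+ <!-- 'subjectfacet' is a hypothetical rule set name -->
+ <metadata name="subject" termlist="yes" facetrule="subjectfacet"/>
+
+ <icu_chain id="subjectfacet" locale="en">
+   <transform rule="[:Control:] Any-Remove"/>
+   <casemap rule="l"/>
+ </icu_chain>
+ ]]>
+ </screen>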
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term id="limitcluster">limitcluster</term>
+ <listitem>
+ <para>
+ Allows a limit to be applied to merged metadata. The value of this
+ attribute is the name of the metadata content to be used for
+ matching (most often the same as the metadata name).
+ </para>
+ <note>
+ <para>
+ Requires Pazpar2 1.6.23 or later.
+ </para>
+ </note>
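+ <para>
+ In the common case, where the limit matches against the element's
+ own content, the declaration might look like the sketch below.
+ The field name 'author' is an assumption.
+ </para>
+ <screen>
+ <![CDATA[
+ <!-- Hypothetical field; limitcluster names the content to match -->
+ <metadata name="author" termlist="yes" merge="unique" limitcluster="author"/>
+ ]]>
+ </screen>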
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term id="metadata_limitmap">limitmap</term>
+ <listitem>
+ <para>
+ Specifies a default limitmap for this field, which avoids having
+ to configure every target individually. It is still important to
+ review this setting on a per-target basis, since it is usually
+ target-specific. See limitmap for format.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term id="metadata_facetmap">facetmap</term>
+ <listitem>
+ <para>
+ Specifies a default facetmap for this field, which avoids having
+ to configure every target individually. It is still important to
+ review this setting on a per-target basis, since it is usually
+ target-specific. See facetmap for format.
+ </para>
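+ <para>
+ A rough sketch of service-wide defaults for both maps follows.
+ The field name and both values are assumptions: the limitmap
+ value assumes the 'ccl:' form used by the per-target limitmap
+ setting, and the facetmap value assumes a target-side facet
+ field called 'author'. Consult the limitmap and facetmap
+ descriptions for the authoritative formats.
+ </para>
+ <screen>
+ <![CDATA[
+ <!-- Values are assumed formats; verify against each target -->
+ <metadata name="author" termlist="yes" limitmap="ccl: au" facetmap="author"/>
+ ]]>
+ </screen>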
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>setting</term>
+ <listitem>
+ <para>
+ This attribute allows you to make use of static database
+ settings in the processing of records. Three possible values
+ are allowed. 'no' is the default and doesn't do anything.
+ 'postproc' copies the value of a setting with the same name
+ into the output of the normalization stylesheet(s). 'parameter'
+ makes the value of a setting with the same name available
+ as a parameter to the normalization stylesheet, so you
+ can further process the value inside of the stylesheet, or use
+ the value to decide how to deal with other data values.
+ </para>
+ <para>
+ The purpose of using settings in this way can either be to
+ control the behavior of the normalization stylesheet in a
+ database-dependent way, or to easily make database-dependent values
+ available to display logic in your user interface, without having
+ to implement complicated interactions between the user interface
+ and your configuration system.
+ </para>
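+ <para>
+ The two active values might be used as sketched below. Both
+ names are hypothetical and assume that database settings with
+ the same names exist in the target settings.
+ </para>
+ <screen>
+ <![CDATA[
+ <!-- Copied into the normalized record after the stylesheet runs -->
+ <metadata name="library_name" brief="yes" setting="postproc"/>
+
+ <!-- Passed to the normalization stylesheet as a parameter -->
+ <metadata name="holdings_mode" setting="parameter"/>
+ ]]>
+ </screen>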
+ </listitem>
+ </varlistentry>
+
+ </variablelist> <!-- attributes to metadata -->
+
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term id="servicexslt" xreflabel="xslt">xslt</term>
+ <listitem>
+ <para>
+ Defines an XSLT stylesheet. The <literal>xslt</literal>
+ element takes exactly one attribute, <literal>id</literal>,
+ which names the stylesheet. This name can be referred to in target
+ settings (see <xref linkend="pzxslt"/>).
+ </para>
+ <para>
+ The content of the xslt element is the embedded stylesheet XML.
+ </para>
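+ <para>
+ A minimal sketch of an embedded stylesheet is shown below. The
+ id 'simple_title' and the input path '//title' are assumptions
+ for illustration; a real stylesheet would map many more fields.
+ </para>
+ <screen>
+ <![CDATA[
+ <!-- Hypothetical stylesheet id and input structure -->
+ <xslt id="simple_title">
+   <xsl:stylesheet version="1.0"
+       xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
+       xmlns:pz="http://www.indexdata.com/pazpar2/1.0">
+     <xsl:template match="/">
+       <pz:record>
+         <pz:metadata type="title">
+           <xsl:value-of select="//title"/>
+         </pz:metadata>
+       </pz:record>
+     </xsl:template>
+   </xsl:stylesheet>
+ </xslt>
+ ]]>
+ </screen>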
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term id="icuchain" xreflabel="icu_chain">icu_chain</term>
+ <listitem>
+ <para>
+ Specifies a named ICU rule set. The icu_chain element must include
+ the attribute 'id', which specifies the identifier (name) of the ICU
+ rule set.
+ Pazpar2 uses particular rule sets for particular purposes.
+ Rule set 'relevance' is used to normalize
+ terms for relevance ranking. Rule set 'sort' is used to
+ normalize terms for sorting. Rule set 'mergekey' is used to
+ normalize terms for making a mergekey. Finally, rule set 'facet'
+ is normally used to normalize facet terms, unless
+ <xref linkend="facetrule">facetrule</xref> is given for a
+ metadata field.
+ </para>
+ <para>
+ The icu_chain element must also include a 'locale'
+ attribute which must be set to one of the locale strings
+ defined in ICU. The child elements listed below can be
+ in any order, except the 'index' element which logically
+ belongs to the end of the list. The stated tokenization,
+ transformation and charmapping instructions are performed
+ in order from top to bottom.
+ </para>
+ <variablelist> <!-- Level 2 -->
+ <varlistentry>
+ <term>casemap</term>
+ <listitem>
+ <para>
+ The attribute 'rule' defines the direction of the
+ per-character casemapping. Allowed values are "l"
+ (lower), "u" (upper), and "t" (title).
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>transform</term>
+ <listitem>
+ <para>
+ Normalization and transformation of tokens follows
+ the rules defined in the 'rule' attribute. For
+ possible values we refer to the extensive ICU
+ documentation found at the
+ <ulink url="&url.icu.transform;">ICU
+ transformation</ulink> home page. Set filtering
+ principles are explained at the
+ <ulink url="&url.icu.unicode.set;">ICU set and
+ filtering</ulink> page.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>tokenize</term>
+ <listitem>
+ <para>
+ Tokenization is the only rule in the ICU chain
+ which splits one token into multiple tokens. The
+ 'rule' attribute may have the following values:
+ "s" (sentence), "l" (line-break), "w" (word), and
+ "c" (character), the latter probably not being
+ very useful in a pruning Pazpar2 installation.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
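+ <para>
+ As a sketch, a rule set for relevance ranking could chain the
+ three child elements as follows. The specific rules are one
+ plausible choice rather than a required configuration; they are
+ applied from top to bottom.
+ </para>
+ <screen>
+ <![CDATA[
+ <icu_chain id="relevance" locale="en">
+   <!-- strip control characters -->
+   <transform rule="[:Control:] Any-Remove"/>
+   <!-- split on line-break boundaries -->
+   <tokenize rule="l"/>
+   <!-- drop whitespace and punctuation from each token -->
+   <transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
+   <!-- lowercase what is left -->
+   <casemap rule="l"/>
+ </icu_chain>
+ ]]>
+ </screen>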
+ <para>
+ From Pazpar2 version 1.1 the ICU wrapper from YAZ is used.
+ Refer to the <ulink url="&url.yaz.yaz-icu;">yaz-icu</ulink>
+ utility for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>relevance</term>
+ <listitem>
+ <para>
+ Specifies the ICU rule set used for relevance ranking.
+ The child element of 'relevance' must be 'icu_chain' and the
+ 'id' attribute of the icu_chain is ignored. This
+ definition is obsolete and should be replaced by the equivalent
+ construct:
+ <screen>
+ &lt;icu_chain id="relevance" locale="en"&gt;..&lt;/icu_chain&gt;
+ </screen>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>sort</term>
+ <listitem>
+ <para>
+ Specifies the ICU rule set used for sorting.
+ The child element of 'sort' must be 'icu_chain' and the
+ 'id' attribute of the icu_chain is ignored. This
+ definition is obsolete and should be replaced by the equivalent
+ construct:
+ <screen>
+ &lt;icu_chain id="sort" locale="en"&gt;..&lt;/icu_chain&gt;
+ </screen>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>mergekey</term>
+ <listitem>
+ <para>
+ Specifies ICU tokenization and transformation rules
+ for tokens that are used in Pazpar2's mergekey.
+ The child element of 'mergekey' must be 'icu_chain' and the
+ 'id' attribute of the icu_chain is ignored. This
+ definition is obsolete and should be replaced by the equivalent
+ construct:
+ <screen>
+ &lt;icu_chain id="mergekey" locale="en"&gt;..&lt;/icu_chain&gt;
+ </screen>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>facet</term>
+ <listitem>
+ <para>
+ Specifies ICU tokenization and transformation rules
+ for tokens that are used in Pazpar2's facets.
+ The child element of 'facet' must be 'icu_chain' and the
+ 'id' attribute of the icu_chain is ignored. This
+ definition is obsolete and should be replaced by the equivalent
+ construct:
+ <screen>
+ &lt;icu_chain id="facet" locale="en"&gt;..&lt;/icu_chain&gt;
+ </screen>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>ccldirective</term>
+ <listitem>
+ <para>
+ Customizes CCL parsing (the interpretation of the query parameter
+ in search).
+ The name and value of the CCL directive are given by the attributes
+ 'name' and 'value' respectively. Refer to the list of possible names
+ in the
+ <ulink
+ url="http://www.indexdata.com/yaz/doc/tools.html#ccl.directives.table">
+ YAZ manual
+ </ulink>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="service-rank">
+ <term>rank</term>
+ <listitem>
+ <para>
+ Customizes the ranking (relevance) algorithm; these customizations
+ are also known as rank tweaks. The rank element
+ accepts the following attributes, all of which are optional:
+ </para>
+ <variablelist>
+ <varlistentry>
+ <term>cluster</term>
+ <listitem>
+ <para>
+ Attribute 'cluster' is a boolean
+ that controls whether Pazpar2 should boost ranking for merged
+ records. The default is 'yes'. A value of 'no' will make
+ Pazpar2 average the ranking of the records in a cluster.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>debug</term>
+ <listitem>
+ <para>
+ Attribute 'debug' is a boolean
+ that controls whether Pazpar2 should include details
+ about ranking for each document in the show command's
+ response. Enable by using value "yes", disable by using
+ value "no" (default).
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>follow</term>
+ <listitem>
+ <para>
+ Attribute 'follow' is a floating point number greater than
+ or equal to 0. A positive number will boost weight for terms
+ that occur close to each other (proximity, distance).
+ A value of 1 will double the weight if two terms are in
+ proximity distance of 1 (next to each other). The default
+ value of 'follow' is 0 (order will not affect weight).
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>lead</term>
+ <listitem>
+ <para>
+ Attribute 'lead' is a floating point number.
+ It controls whether term weight should be reduced by position
+ from the start of a metadata field. A positive value of 'lead'
+ will reduce the weight of a term as it appears further away
+ from the lead of the field. The default value is 0 (no reduction
+ of weight by position).
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>length</term>
+ <listitem>
+ <para>
+ Attribute 'length' determines whether, and how, term weight is
+ divided by the length of the metadata field. A value of "linear"
+ divides by length. A value of "log" divides by log2(length).
+ A value of "none" leaves term weight as is (no division).
+ The default value is "linear".
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ <para>
+ Refer to <xref linkend="relevance_ranking"/> to see how
+ these tweaks are used in computation of score.
+ </para>
+ <para>
+ Customization of ranking algorithm was introduced with
+ Pazpar2 1.6.18. The semantics of some of the fields changed
+ in versions up to 1.6.22.
+ </para>
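+ <para>
+ Put together, a rank element using all five attributes might
+ read as follows. The numeric values are arbitrary and shown only
+ to illustrate the syntax, not as recommended settings.
+ </para>
+ <screen>
+ <![CDATA[
+ <!-- Arbitrary example values; tune against your own results -->
+ <rank cluster="yes" debug="no" follow="0.5" lead="0.1" length="log"/>
+ ]]>
+ </screen>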
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="sort-default">
+ <term>sort-default</term>
+ <listitem>
+ <para>
+ Specifies the default sort criteria (default 'relevance'),
+ which previously was hard-coded as the default criteria in search.
+ This is a fix/work-around to avoid re-searching when using
+ target-based sorting. For this to work efficiently,
+ the search must also have the sort criteria parameter; otherwise
+ Pazpar2 will re-search if the sort criteria change
+ between the search and show commands.
+ </para>
+ <para>
+ This configuration was added in Pazpar2 1.6.20.
+ </para>
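+ <para>
+ A sketch of the element is given below. The attribute name
+ 'field' is an assumption, since this section does not name the
+ attribute; verify it against your Pazpar2 version before use.
+ </para>
+ <screen>
+ <![CDATA[
+ <!-- Attribute name 'field' is assumed, not confirmed by this page -->
+ <sort-default field="relevance"/>
+ ]]>
+ </screen>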
+ </listitem>
+ </varlistentry>
+
+<!--
+ <varlistentry>
+ <term>set</term>
+ <listitem>
+ <para>
+ Specifies a variable that will be inherited by all targets defined in settings
+ <screen>
+ <set name="test" value="en"..<set>
+ </screen>
+ </para>
+ </listitem>
+ </varlistentry>
+-->
+ <varlistentry>
+ <term>settings</term>
+ <listitem>
+ <para>
+ Specifies target settings for this service. Refer to
+ <xref linkend="target_settings"/>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>timeout</term>
+ <listitem>
+ <para>
+ Specifies timeout parameters for this service.
+ The <literal>timeout</literal>
+ element supports the following attributes:
+ <literal>session</literal>, <literal>z3950_operation</literal>, and
+ <literal>z3950_session</literal>, which specify the
+ session timeout, the Z39.50 operation timeout, and the
+ Z39.50 session timeout respectively. The Z39.50 operation
+ timeout is the time Pazpar2 will wait for an active Z39.50/SRU
+ operation before it gives up (times out). The Z39.50 session
+ timeout is the time Pazpar2 will keep an idle session
+ (no operation) alive.
+ </para>
+ <para>
+ The following is recommended but not required:
+ z3950_operation (30) &lt; session (60) &lt; z3950_session (180).
+ The default values are given in parentheses.
+ </para>
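+ <para>
+ Written out explicitly, the default values stated above
+ correspond to the following element, which also satisfies the
+ recommended ordering:
+ </para>
+ <screen>
+ <![CDATA[
+ <!-- Same as the defaults: 30 < 60 < 180 -->
+ <timeout session="60" z3950_operation="30" z3950_session="180"/>
+ ]]>
+ </screen>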
+ </listitem>
+ </varlistentry>
+ </variablelist> <!-- Data elements in service directive -->
+ </listitem>
+ </varlistentry>
+ </variablelist> <!-- Data elements in server directive -->
+ </refsect2>
+ </refsect1>
+
+ <refsect1>
+ <title>EXAMPLE</title>
+ <para>
+ Below is a working example configuration:
+ </para>
+ <screen>
+ <![CDATA[