X-Git-Url: http://lists.indexdata.dk/cgi-bin?a=blobdiff_plain;ds=sidebyside;f=doc%2Fpazpar2_conf.xml;h=887e894dd46a28f9417c98f436b0c7b8655715fd;hb=0615829971e16e247d81517747886e8e3c3e1f02;hp=17578a9715e373ac86872ef7a14ec2677bebef4e;hpb=f110bc50c58b63a6fe4eaaddf40a4789c27b83bf;p=pazpar2-moved-to-github.git
diff --git a/doc/pazpar2_conf.xml b/doc/pazpar2_conf.xml
index 17578a9..887e894 100644
--- a/doc/pazpar2_conf.xml
+++ b/doc/pazpar2_conf.xml
@@ -1,6 +1,6 @@
-
%local;
@@ -13,10 +13,13 @@
Pazpar2
&version;
+ Index Data
+
Pazpar2 conf
5
+ File formats and conventions
@@ -30,7 +33,8 @@
- DESCRIPTION
+
+ DESCRIPTION
The Pazpar2 configuration file, together with any referenced XSLT files,
governs Pazpar2's behavior as a client, and controls the normalization and
@@ -46,7 +50,8 @@
- FORMAT
+
+ FORMAT
The configuration file is XML-structured. It must be well-formed XML. All
elements specific to Pazpar2 should belong to the namespace
@@ -57,24 +62,27 @@
information. The categories are described below.
- threads
-
- This section is optional and is supported for Pazpar2 version 1.3.1 and
- later . It is identified by element "threads" which
- may include one attribute "number" which specifies
- the number of worker-threads that the Pazpar2 instance is to use.
- A value of 0 (zero) disables worker-threads (all work is carried out
- in main thread).
-
+
+ threads
+
+ This section is optional and is supported for Pazpar2 version 1.3.1 and
+ later. It is identified by the element "threads", which
+ may include one attribute, "number", which specifies
+ the number of worker threads that the Pazpar2 instance is to use.
+ A value of 0 (zero) disables worker threads (all work is carried out
+ in the main thread).
+
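A minimal sketch of the threads element follows. The worker-thread count of 10 and the listen port 9004 are illustrative values only, not recommendations.

```xml
<pazpar2 xmlns="http://www.indexdata.com/pazpar2/1.0">
  <!-- Use 10 worker threads (illustrative value). number="0" would
       instead carry out all work in the main thread. -->
  <threads number="10"/>
  <server>
    <!-- Illustrative port only. -->
    <listen port="9004"/>
  </server>
</pazpar2>
```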
- server
+
+ server
This section governs overall behavior of a server endpoint. It is identified
by the element "server" which takes an optional attribute, "id", which
identifies this particular Pazpar2 server. Any string value for "id"
may be given.
- The data
+
+ The data
elements are described below. From Pazpar2 version 1.2 this is
a repeatable element.
@@ -118,13 +126,23 @@
- relevance / sort / mergekey
+ icu_chain
- Specifies character set normalization for relevancy / sorting
- and the mergekey - for the server. These definitions serves as
+ Specifies character set normalization for relevance ranking, sorting,
+ mergekey and facets for the server. These definitions serve as the
default for services that do not define them. For the meaning
+ of these settings refer to the icu_chain
+ element inside service.
+
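As a sketch, a server-level icu_chain can act as the default for a service that declares none of its own. The service id "books", the locale and the rule values shown are assumptions made for illustration.

```xml
<server>
  <listen port="9004"/>
  <!-- Server-level default: used by any service below that does not
       define its own 'relevance' rule set. -->
  <icu_chain id="relevance" locale="en">
    <transform rule="[:Control:] Any-Remove"/>
    <tokenize rule="l"/>
    <casemap rule="l"/>
  </icu_chain>
  <!-- Hypothetical service id. -->
  <service id="books">
    <!-- No icu_chain here, so the server default above applies. -->
    <metadata name="title" brief="yes" rank="6"/>
  </service>
</server>
```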
+
+
+
+
+ relevance / sort / mergekey / facet
+
+
+ Obsolete. Use element icu_chain instead.
@@ -166,19 +184,21 @@
- metadata
+
+ metadata
One of these elements is required for every data element in
the internal representation of the record (see
. It governs
- subsequent processing as pertains to sorting, relevance
- ranking, merging, and display of data elements. It supports
- the following attributes:
+ subsequent processing as it pertains to sorting, relevance
+ ranking, merging, and display of data elements. It supports
+ the following attributes:
- name
+
+ name
This is the name of the data element. It is matched
@@ -196,7 +216,8 @@
- type
+
+ type
The type of data element. This value governs any
@@ -209,7 +230,8 @@
- brief
+
+ brief
If this is set to 'yes', then the data element is
@@ -220,7 +242,8 @@
- sortkey
+
+ sortkey
Specifies that this data element is to be used for
@@ -232,7 +255,8 @@
- rank
+
+ rank
Specifies that this element is to be used to
@@ -248,7 +272,8 @@
- termlist
+
+ termlist
Specifies that this element is to be used as a
@@ -262,7 +287,8 @@
- merge
+
+ merge
This governs whether and how elements are extracted
@@ -276,8 +302,9 @@
-
- mergekey
+
+
+ mergekey
If set to 'required', the value of this
@@ -300,7 +327,19 @@
- setting
+
+ facetrule
+
+
+ Specifies the ICU rule set to be used for normalizing
+ facets. If facetrule is omitted from metadata, the
+ rule set 'facet' is used.
+
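The sketch below, using a hypothetical rule set name 'author_facet', shows how facetrule redirects facet normalization for a single metadata field while other fields keep using the rule set 'facet'. The rules inside the chain are example values.

```xml
<service>
  <!-- Facets for 'author' are normalized with the rule set named
       'author_facet' instead of the default rule set 'facet'. -->
  <metadata name="author" brief="yes" termlist="yes" merge="longest"
            facetrule="author_facet"/>
  <icu_chain id="author_facet" locale="en">
    <transform rule="[:Control:] Any-Remove"/>
    <casemap rule="l"/>
  </icu_chain>
</service>
```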
+
+
+
+
+ setting
This attribute allows you to make use of static database
@@ -328,15 +367,26 @@
-
+
- relevance
+ icu_chain
- Specifies ICU tokenization and transformation rules
- for tokens that are used in Pazpar2's relevance ranking.
- The 'id' attribute is currently not used, and the 'locale'
- attribute must be set to one of the locale strings
+ Specifies a named ICU rule set. The icu_chain element must include
+ the attribute 'id', which specifies the identifier (name) for the ICU
+ rule set.
+ Pazpar2 uses particular rule sets for particular purposes.
+ Rule set 'relevance' is used to normalize
+ terms for relevance ranking. Rule set 'sort' is used to
+ normalize terms for sorting. Rule set 'mergekey' is used to
+ normalize terms for making a mergekey. Finally, rule set 'facet'
+ is normally used to normalize facet terms, unless
+ facetrule is given for a
+ metadata field.
+
+
+ The icu_chain element must also include a 'locale'
+ attribute which must be set to one of the locale strings
defined in ICU. The child elements listed below can be
in any order, except the 'index' element which logically
belongs to the end of the list. The stated tokenization,
@@ -344,7 +394,8 @@
in order from top to bottom.
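For illustration, the following sketch declares the 'sort' and 'mergekey' rule sets inside a service. The particular transform, tokenize and casemap rules are example values; within each chain they are applied from top to bottom.

```xml
<service>
  <!-- Rule set used when sorting: strip control characters,
       whitespace and punctuation, then lowercase. -->
  <icu_chain id="sort" locale="en">
    <transform rule="[[:Control:][:WhiteSpace:][:Punctuation:]] Remove"/>
    <casemap rule="l"/>
  </icu_chain>
  <!-- Rule set used when building mergekeys. -->
  <icu_chain id="mergekey" locale="en">
    <tokenize rule="l"/>
    <transform rule="[[:Control:][:WhiteSpace:][:Punctuation:]] Remove"/>
    <casemap rule="l"/>
  </icu_chain>
</service>
```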
- casemap
+
+ casemap
The attribute 'rule' defines the direction of the
@@ -353,7 +404,8 @@
- transform
+
+ transform
Normalization and transformation of tokens follows
@@ -361,14 +413,15 @@
possible values we refer to the extensive ICU
documentation found at the
ICU
- transformation home page. Set filtering
+ transformation home page. Set filtering
principles are explained at the
ICU set and
- filtering page.
+ filtering page.
- tokenize
+
+ tokenize
Tokenization is the only rule in the ICU chain
@@ -390,12 +443,33 @@
+ relevance
+
+
+ Specifies the ICU rule set used for relevance ranking.
+ The child element of 'relevance' must be 'icu_chain' and the
+ 'id' attribute of the icu_chain is ignored. This
+ definition is obsolete and should be replaced by the equivalent
+ construct:
+
+ <icu_chain id="relevance" locale="en">..</icu_chain>
+
+
+
+
+
+
sort
- Specifies ICU tokenization and transformation rules
- for tokens that are used in Pazpar2's sorting. The contents
- is similar to that of relevance.
+ Specifies the ICU rule set used for sorting.
+ The child element of 'sort' must be 'icu_chain' and the
+ 'id' attribute of the icu_chain is ignored. This
+ definition is obsolete and should be replaced by the equivalent
+ construct:
+
+ <icu_chain id="sort" locale="en">..</icu_chain>
+
@@ -405,13 +479,36 @@
Specifies ICU tokenization and transformation rules
- for tokens that are used in Pazpar2's mergekey. The contents
- is similar to that of relevance.
+ for tokens that are used in Pazpar2's mergekey.
+ The child element of 'mergekey' must be 'icu_chain' and the
+ 'id' attribute of the icu_chain is ignored. This
+ definition is obsolete and should be replaced by the equivalent
+ construct:
+
+ <icu_chain id="mergekey" locale="en">..</icu_chain>
+
+ facet
+
+
+ Specifies ICU tokenization and transformation rules
+ for tokens that are used in Pazpar2's facets.
+ The child element of 'facet' must be 'icu_chain' and the
+ 'id' attribute of the icu_chain is ignored. This
+ definition is obsolete and should be replaced by the equivalent
+ construct:
+
+ <icu_chain id="facet" locale="en">..</icu_chain>
+
+
+
+
+
+
settings
@@ -444,68 +541,69 @@
-
-
-
-
- EXAMPLE
- Below is a working example configuration:
-
-
-
-
-
-
-
-
+ EXAMPLE
+
+ Below is a working example configuration:
+
+
+
+
+
+
+
+
+
+
-
-
+
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- ]]>
-
+
+
+
+
+
+
+
+
+
+
+
+
+
+ ]]>
+
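A minimal sketch of how the elements described above fit together follows. It is illustrative rather than tested, and the port, metadata fields and settings directory name are assumptions.

```xml
<pazpar2 xmlns="http://www.indexdata.com/pazpar2/1.0">
  <server>
    <listen port="9004"/>
    <service>
      <!-- Title: shown in brief records, sorted ignoring leading
           articles, longest value kept on merge. -->
      <metadata name="title" brief="yes" sortkey="skiparticle"
                merge="longest" rank="6"/>
      <!-- Date: treated as a year, merged into a range, usable as
           a term list. -->
      <metadata name="date" brief="yes" sortkey="numeric" type="year"
                merge="range" termlist="yes"/>
      <!-- Hypothetical directory holding target settings files. -->
      <settings src="settings"/>
    </service>
  </server>
</pazpar2>
```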
- INCLUDE FACILITY
+
+ INCLUDE FACILITY
The XML configuration may be partitioned into multiple files by using
the include element which takes a single attribute,
src. The value of the src attribute is a regular
shell-like glob pattern. For example,
- ]]>
+
+ ]]>
The include facility requires Pazpar2 version 1.2.
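A sketch of the include facility, assuming a hypothetical directory services/ in which each matching file holds a complete service element:

```xml
<pazpar2 xmlns="http://www.indexdata.com/pazpar2/1.0">
  <server>
    <listen port="9004"/>
    <!-- Every file matching the glob is read in at this point.
         "services/*.xml" is a hypothetical path. -->
    <include src="services/*.xml"/>
  </server>
</pazpar2>
```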
- TARGET SETTINGS
+
+ TARGET SETTINGS
Pazpar2 features a cunning scheme by which you can associate various
kinds of attributes, or settings with search targets. This can be done
@@ -552,7 +650,7 @@
on a per-session basis. This allows the client to override specific CCL fields
for searching, etc., to meet the needs of a session or user.
-
+
Finally, as an extreme case of this, the webservice client can
introduce entirely new targets, on the fly, as part of the
@@ -564,66 +662,70 @@
long as the webservice client is prepared to supply the necessary
information at the beginning of every session.
-
+
- The following discussion of practical issues related to session and settings
- management are cast in terms of a user interface based on Ajax/Javascript
- technology. It would apply equally well to many other kinds of browser-based logic.
+ The following discussion of practical issues related to session
+ and settings management is cast in terms of a user interface based on
+ Ajax/Javascript technology. It would apply equally well to many other
+ kinds of browser-based logic.
-
+
- Typically, a Javascript client is not allowed to directly alter the parameters
- of a session. There are two reasons for this. One has to do with access
- to information; typically, information about a user will be stored in a
- system on the server side, or it will be accessible in some way from the server.
- However, since the Javascript client cannot be entirely trusted (some hostile
- agent might in fact 'pretend' to be a regular ws client), it is more robust
- to control session settings from scripting that you run as part of your
- webserver. Typically, this can be handled during the session initialization,
- as follows:
+ Typically, a Javascript client is not allowed to directly alter the
+ parameters of a session. There are two reasons for this. One has to do
+ with access to information; typically, information about a user will
+ be stored in a system on the server side, or it will be accessible in
+ some way from the server. The other has to do with trust: since the
+ Javascript client cannot be entirely trusted (some hostile agent might
+ in fact 'pretend' to be a regular webservice client), it is more robust
+ to control session settings from scripting that you run as part of your
+ webserver. Typically, this can be handled during the session
+ initialization, as follows:
-
+
- Step 1: The Javascript client loads, and asks the webserver for a new Pazpar2
- session ID. This can be done using a Javascript call, for instance. Note that
- it is possible to submit Ajax HTTPXmlRequest calls either to Pazpar2 or to the
- webserver that Pazpar2 is proxying for. See (XXX Insert link to Pazpar2 protocol).
-
-
+ Step 1: The Javascript client loads and asks the webserver for a
+ new Pazpar2 session ID. This can be done using a Javascript call, for
+ instance. Note that it is possible to submit Ajax XMLHttpRequest calls
+ either to Pazpar2 or to the webserver that Pazpar2 is proxying
+ for. See (XXX Insert link to Pazpar2 protocol).
+
+
Step 2: Code on the webserver authenticates the user by database lookup,
LDAP access, NCIP, etc. It then determines which resources the user has access to,
and any user-specific parameters that are to be applied during this session.
-
+
- Step 3: The webserver initializes a new Pazpar2 settings, and sets user-specific
- parameters as necessary, using the init webservice command. A new session ID is
- returned.
+ Step 3: The webserver initializes a new Pazpar2 session, and sets
+ user-specific parameters as necessary, using the init webservice
+ command. A new session ID is returned.
-
+
- Step 4: The webserver returns this session ID to the Javascript client, which then
- uses the session ID to submit searches, show results, etc.
+ Step 4: The webserver returns this session ID to the Javascript
+ client, which then uses the session ID to submit searches, show
+ results, etc.
-
+
- Step 5: When the Javascript client ceases to use the session, Pazpar2 destroys
- any session-specific information.
+ Step 5: When the Javascript client ceases to use the session,
+ Pazpar2 destroys any session-specific information.
- SETTINGS FILE FORMAT
+
+ SETTINGS FILE FORMAT
Each file contains a root element named <settings>. It may
contain one or more <set> elements. The settings and set
- elements may contain the following attributes. Attributes in the set node
- overrides those in the setting root element. Each set node must
+ elements may contain the following attributes. Attributes in the set
+ node override those in the settings root element. Each set node must
specify (directly, or inherited from the parent node) at least a
target, name, and value.
-
+
target
@@ -686,7 +788,7 @@
-
+
By setting defaults for target, name, or value in the root
settings node, you can use the settings files in many different
@@ -698,80 +800,84 @@
many databases with a given category or class that makes sense
within your application.
-
+
The following examples illustrate uses of the settings system to
associate settings with targets to meet different requirements.
-
+
The example below associates a set of default values that can be
used across many targets. Note the wildcard for targets.
This associates the given settings with all targets for which no
other information is provided.
+
-
+
-
-
+
+
-
-
-
-
-
-
+
+
+
+
+
+
+
+
+
-
+
-
-
+
+
-
-
+
+
-
+
-
-
+
+
-
+
- ]]>
+ ]]>
-
+
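A sketch of such a wildcard settings file follows. The particular settings and values (MARC21 records converted with marc21.xsl, two CCL mappings) are illustrative assumptions; each set element inherits target="*" from the root.

```xml
<settings target="*">
  <!-- Defaults for every target without more specific settings. -->
  <set name="pz:requestsyntax" value="marc21"/>
  <set name="pz:elements" value="F"/>
  <set name="pz:xslt" value="marc21.xsl"/>
  <!-- Illustrative CCL mappings for a general term and for title. -->
  <set name="pz:cclmap:term" value="u=1016 t=l,r s=al"/>
  <set name="pz:cclmap:ti" value="u=4 s=al"/>
</settings>
```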
The next example shows certain settings overridden for one target,
one which returns XML records containing DublinCore elements, and
which furthermore requires a username/password.
-
-
-
+
+
+
+
-
-
- ]]>
+
+
+ ]]>
-
+
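A sketch of such an override, assuming a hypothetical target dc.example.org:210/Default, a hypothetical stylesheet name dublin-core.xsl, and placeholder credentials passed through pz:authentication:

```xml
<settings target="dc.example.org:210/Default">
  <!-- This target returns XML (DublinCore) records. -->
  <set name="pz:requestsyntax" value="xml"/>
  <set name="pz:xslt" value="dublin-core.xsl"/>
  <!-- Placeholder username/password. -->
  <set name="pz:authentication" value="myuser/mypassword"/>
</settings>
```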
The following example associates a specific name/value combination
with a number of targets. The targets below are access-restricted,
and can only be used by users with special credentials.
-
-
-
- ]]>
+
+
+
+
+ ]]>
-
+
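A sketch of this pattern, using hypothetical target addresses: the name and value are given once on the root element and inherited by every set node, which supplies only the target.

```xml
<settings name="pz:allow" value="0">
  <!-- Each set inherits name="pz:allow" value="0" from the root. -->
  <set target="restricted.example.org:210/db1"/>
  <set target="restricted.example.org:210/db2"/>
</settings>
```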
-
- RESERVED SETTING NAMES
+
+
+ RESERVED SETTING NAMES
The following setting names are reserved by Pazpar2 to control the
behavior of the client function.
@@ -860,9 +966,9 @@
pz:queryencoding
- The encoding of the search terms that a target accepts. Most
- targets do not honor UTF-8 in which case this needs to be specified.
- Each term in a query will be converted if this setting is given.
+ The encoding of the search terms that a target accepts. Most
+ targets do not honor UTF-8, in which case this needs to be specified.
+ Each term in a query will be converted if this setting is given.
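A sketch for a hypothetical target that expects Latin-1 encoded search terms:

```xml
<settings target="legacy.example.org:210/Default">
  <!-- Query terms are converted to ISO-8859-1 before being sent. -->
  <set name="pz:queryencoding" value="iso-8859-1"/>
</settings>
```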
@@ -902,12 +1008,13 @@
performance with the alternate "MARC map" format. Provide the
path of a file with extension ".mmap" containing on each line:
- <field> <subfield> <metadata element>
+ <field> <subfield> <metadata element>
For example:
- 245 a title
- 500 $ description
- 773 * citation
+ 245 a title
+ 500 $ description
+ 773 * citation
+
To map the field value, specify a subfield of '$'. To store a
concatenation of all subfields, specify a subfield of '*'.
@@ -927,9 +1034,10 @@
Allows or denies access to the resources it is applied to. Possible
- values are '0' and '1'. The default is '1' (allow access to this resource).
- See the manual section on authorization and authentication for discussion
- about how to use this setting.
+ values are '0' and '1'.
+ The default is '1' (allow access to this resource).
+ See the manual section on authorization and authentication for
+ discussion about how to use this setting.
@@ -988,8 +1096,8 @@
the protocol.
- A value of 'solr' anables SOLR client support. This is supported
- for Pazpar version 1.5.0 and later.
+ A value of 'solr' enables SOLR client support. This is supported
+ for Pazpar2 version 1.5.0 and later.
@@ -1000,7 +1108,7 @@
This allows the SRU version to be specified. If unset, Pazpar2
will use the default of YAZ (currently 1.2). It should be set
- to 1.1 or 1.2.
+ to 1.1 or 1.2. For SOLR, the currently supported/tested version is 1.4.
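A sketch of a SOLR target definition, assuming a hypothetical target address and assuming the protocol is selected with pz:sru and the version with pz:sru_version:

```xml
<settings target="solr.example.org:8983/solr/select">
  <!-- Enable SOLR client support for this target. -->
  <set name="pz:sru" value="solr"/>
  <!-- Version stated above as supported/tested for SOLR. -->
  <set name="pz:sru_version" value="1.4"/>
</settings>
```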
@@ -1051,20 +1159,88 @@
Specifies a filter which allows Pazpar2 to only include
- records that meet a certain criteria in a result. Unmatched records
- will be ignored. The filter takes the form name[~value] , which
+ records that meet certain criteria in a result.
+ Unmatched records will be ignored.
+ The filter takes the form name, name~value, or name=value, which
will include only records with metadata element (name) that has the
- substring (value) given. If value is omitted all records with the
- metadata present will be included.
+ substring (~value) given, or that matches the value exactly (=value).
+ If value is omitted, all records with the named metadata element
+ present will be included.
+
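A sketch of pz:recordfilter; the metadata element name 'date' and the value 2011 are illustrative:

```xml
<settings target="*">
  <!-- Keep only records whose 'date' element contains the substring
       "2011". "date=2011" would require an exact match instead, and
       plain "date" would keep any record that has a date at all. -->
  <set name="pz:recordfilter" value="date~2011"/>
</settings>
```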
+
+ pz:preferred
+
+
+ Specifies that a target is preferred, e.g. a local, possibly faster
+ target. Using block=pref on the show command will wait for all these
+ targets to return records before releasing the block.
+ If no target is preferred, block=pref will be identical to block=1,
+ which releases the block when one target has returned records.
+
+
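A sketch marking a single, hypothetical local target as preferred; the value 1 is assumed to enable the setting. A client would then issue the show command with block=pref.

```xml
<settings target="localhost:9999/Default">
  <!-- show with block=pref waits for this target's records. -->
  <set name="pz:preferred" value="1"/>
</settings>
```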
+
+
+
+ pz:block_timeout
+
+
+ (Not yet implemented).
+ Specifies the time after which a block should be released anyway.
+
+
+
+
+
+ pz:facetmap:name
+
+
+ Specifies that for field name, the target
+ supports (native) facets. The value is the name of the
+ field on the target.
+
+
+
+ At this point only SOLR targets have been tested with this
+ facility.
+
+
+
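A sketch for a SOLR target with a hypothetical address, where the Pazpar2 field author is mapped to a hypothetical native facet field author_exact:

```xml
<settings target="solr.example.org:8983/solr/select">
  <!-- Facets for Pazpar2 field 'author' are taken from the target's
       own facet field 'author_exact' (hypothetical name). -->
  <set name="pz:facetmap:author" value="author_exact"/>
</settings>
```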
+
+
+
+ pz:limitmap:name
+
+
+ Specifies attributes for limiting a search to a field, using
+ the limit parameter of the search command. In some cases the mapping of
+ a field to a value is identical to an existing cclmap field; in
+ other cases the field must be specified in a different way, for
+ example to match a complete field (rather than parts of a subfield).
+
+
+ The value of limitmap may have one of two forms: a reference to
+ an existing CCL field or a raw PQF string. A leading prefix
+ determines the type: either ccl: for a CCL field or
+ rpn: for PQF/RPN.
+
+
+
+ The limitmap facility is supported for Pazpar2 version 1.6.0.
+
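A sketch showing both forms of limitmap value. The CCL field au is assumed to be defined elsewhere (for instance via pz:cclmap:au), and the PQF string uses Bib-1 use attribute 4 (title) with completeness attribute 6=3 (complete field):

```xml
<settings target="*">
  <!-- Reuse the existing CCL field 'au' for limits on 'author'. -->
  <set name="pz:limitmap:author" value="ccl:au"/>
  <!-- Raw PQF: title (1=4) matched as a complete field (6=3). -->
  <set name="pz:limitmap:title" value="rpn:@attr 1=4 @attr 6=3"/>
</settings>
```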
+
+
+
+
-
+
- SEE ALSO
+
+ SEE ALSO
pazpar2
@@ -1083,15 +1259,7 @@