%local;
<!ENTITY % entities SYSTEM "entities.ent">
%entities;
- <!ENTITY % common SYSTEM "common/common.ent">
- %common;
+ <!ENTITY % idcommon SYSTEM "common/common.ent">
+ %idcommon;
]>
-<!-- $Id: pazpar2_conf.xml,v 1.20 2007-04-11 03:34:11 quinn Exp $ -->
+<!-- $Id: pazpar2_conf.xml,v 1.26 2007-06-06 12:02:48 marc Exp $ -->
<refentry id="pazpar2_conf">
<refentryinfo>
<productname>Pazpar2</productname>
</varlistentry>
<varlistentry>
- <term>zproxy</term>
+ <term>icu_chain</term>
<listitem>
<para>
- If this item is given, pazpar2 will send all Z39.50
- packages through this Z39.50 proxy server.
- At least one of the 'host' and 'post' attributes is required.
- The 'host' attribute may contain both host name and port
- number, seperated by a colon ':', or only the host name.
- An empty 'host' attribute sets the Z39.50 host address
- to 'localhost'.
+ Definition of the ICU tokenization and normalization rules,
+ which are used if ICU support is compiled in. The 'id'
+ attribute is currently not used, and the 'locale'
+ attribute must be set to one of the locale strings
+ defined in ICU. The child elements listed below can be
+ in any order, except the 'index' element, which logically
+ belongs at the end of the list. The stated tokenization,
+ normalization and charmapping instructions are performed
+ in order from top to bottom.
</para>
+ <variablelist> <!-- Level 2 -->
+ <varlistentry><term>casemap</term>
+ <listitem>
+ <para>
+ The attribute 'rule' defines the direction of the
+ per-character casemapping. Allowed values are "l"
+ (lower), "u" (upper), and "t" (title).
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry><term>normalize</term>
+ <listitem>
+ <para>
+ Normalization and transformation of tokens follows
+ the rules defined in the 'rule' attribute. For
+ possible values we refer to the extensive ICU
+ documentation found at the
+ <ulink url="&url.icu.transform;">ICU
+ transformation</ulink> home page. Set filtering
+ principles are explained at the
+ <ulink url="&url.icu.unicode.set;">ICU set and
+ filtering</ulink> page.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry><term>tokenize</term>
+ <listitem>
+ <para>
+ Tokenization is the only rule in the ICU chain
+ which splits one token into multiple tokens. The
+ 'rule' attribute may have the following values:
+ "s" (sentence), "l" (line-break), "w" (word), and
+ "c" (character), the latter probably not being
+ very useful in a running pazpar2 installation.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry><term>index</term>
+ <listitem>
+ <para>
+ Finally the 'index' element instruction - without
+ any 'rule' attribute - is used to store the tokens
+ after chain processing in the relevance ranking
+ unit of Pazpar2. It will always be the last
+ instruction in the chain.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
</listitem>
</varlistentry>
<listitem>
<para>
This is the name of the data element. It is matched
- against the 'type' attribute of the 'metadata' element
+ against the 'type' attribute of the
+ 'metadata' element
in the normalized record. A warning is produced if
- metdata elements with an unknown name are found in the
- normalized record. This name is also used to represent
+ metadata elements with an unknown name are
+ found in the
+ normalized record. This name is also used to
+ represent
data elements in the records returned by the
webservice API, and to name sort lists and browse
facets.
<varlistentry><term>rank</term>
<listitem>
<para>
- Specifies that this element is to be used to help rank
+ Specifies that this element is to be used to
+ help rank
records against the user's query (when ranking is
requested). The value is an integer, used as a
multiplier against the basic TF*IDF score. A value of
- 1 is the base, higher values give additional weight to
+ 1 is the base, higher values give additional
+ weight to
elements of this type. The default is '0', which
excludes this element from the rank calculation.
</para>
termlist, or browse facet. Values are tabulated from
incoming records, and a highscore of values (with
their associated frequency) is made available to the
- client through the webservice API. The possible values
+ client through the webservice API.
+ The possible values
are 'yes' and 'no' (default).
</para>
</listitem>
<listen port="9004"/>
<proxy host="us1.indexdata.com" myurl="us1.indexdata.com"/>
- <!-- <zproxy host="localhost" port="9000"/> -->
- <!-- <zproxy host="localhost:9000"/> -->
- <!-- <zproxy port="9000"/> -->
+ <!-- optional ICU ranking configuration example -->
+ <!--
+ <icu_chain id="el:word" locale="el">
+ <normalize rule="[:Control:] Any-Remove"/>
+ <tokenize rule="l"/>
+ <normalize rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
+ <casemap rule="l"/>
+ <index/>
+ </icu_chain>
+ -->
<service>
<metadata name="title" brief="yes" sortkey="skiparticle" merge="longest" rank="6"/>
<refsect1 id="target_settings"><title>TARGET SETTINGS</title>
<para>
Pazpar2 features a cunning scheme by which you can associate various
- kinds of attributes, or settings with search targets. This is done
- through XML files; each file can associate one or more settings
- with one or more targets. The file format is generic in nature,
- designed to support a wide range of application requirements. The
+ kinds of attributes, or settings, with search targets. This can be done
+ through XML files which are read at startup; each file can associate
+ one or more settings with one or more targets. The file format is generic
+ in nature, designed to support a wide range of application requirements. The
settings can be purely technical things, like, how to perform a title
search against a given target, or it can associate arbitrary name=value
pairs with groups of targets -- for instance, if you would like to
overriden, to allow use of pazpar2 in a consortial or multi-library
environment, where different end-users may need to be represented to
some search targets in different ways. This, again, can be managed
- using an external database or other lookup mechanism.
+ using an external database or other lookup mechanism. Setting overrides
+ can be performed using either the 'init' or the 'settings' webservice
+ command (see XXX ref to pazpar2 protocol).
+ </para>
+
+ <para>
+ In fact, every setting that applies to a database (except pz:id, which
+ can only be used for filtering targets to use for a search) can be overridden
+ on a per-session basis. This allows the client to override specific CCL fields
+ for searching, etc., to meet the needs of a session or user.
+ </para>
+
+ <para>
+ Finally, as an extreme case of this, the webservice client can
+ introduce entirely new targets, on the fly, as part of the init or
+ settings command. This is useful if you desire to manage information
+ about your search targets in a separate application such as a database.
+ You do not need any static settings file whatsoever to run pazpar2 -- as
+ long as the webservice client is prepared to supply the necessary
+ information at the beginning of every session.
+ </para>
+
+ <para>
+ NOTE: The following discussion of practical issues related to session and settings
+ management is cast in terms of a user interface based on Ajax/Javascript
+ technology. It would apply equally well to many other kinds of browser-based logic.
+ </para>
+
+ <para>
+ Typically, a Javascript client is not allowed to directly alter the parameters
+ of a session. There are two reasons for this. One has to do with access
+ to information: typically, information about a user will be stored in a
+ system on the server side, or it will be accessible in some way from the server.
+ The other has to do with trust: since the Javascript client cannot be entirely
+ trusted (some hostile agent might in fact 'pretend' to be a regular ws client),
+ it is more robust to control session settings from scripting that you run as
+ part of your webserver. Typically, this can be handled during the session
+ initialization, as follows:
+ </para>
+
+ <para>
+ Step 1: The Javascript client loads, and asks the webserver for a new pazpar2
+ session ID. This can be done using a Javascript call, for instance. Note that
+ it is possible to submit Ajax XMLHttpRequest calls either to pazpar2 or to the
+ webserver that pazpar2 is proxying for. See (XXX Insert link to pazpar2 protocol).
+ </para>
+
+ <para>
+ Step 2: Code on the webserver authenticates the user, by database lookup,
+ LDAP access, NCIP, etc. It then determines which resources the user has access
+ to, and any user-specific parameters that are to be applied during this session.
+ </para>
+
+ <para>
+ Step 3: The webserver initializes a new pazpar2 session, and sets user-specific
+ parameters as necessary, using the init webservice command. A new session ID is
+ returned.
+ </para>
+
+ <para>
+ Step 4: The webserver returns this session ID to the Javascript client, which then
+ uses the session ID to submit searches, show results, etc.
+ </para>
+
+ <para>
+ Step 5: When the Javascript client ceases to use the session, pazpar2 destroys
+ any session-specific information.
</para>
<refsect2><title>SETTINGS FILE FORMAT</title>
<settings target="*">
<!-- This file introduces default settings for pazpar2 -->
- <!-- $Id: pazpar2_conf.xml,v 1.20 2007-04-11 03:34:11 quinn Exp $ -->
+ <!-- $Id: pazpar2_conf.xml,v 1.26 2007-06-06 12:02:48 marc Exp $ -->
<!-- mapping for unqualified search -->
<set name="pz:cclmap:term" value="u=1016 t=l,r s=al"/>
<listitem>
<para>
The element set name to be used when retrieving records from a
- server.
+ server (not yet implemented).
</para>
</listitem>
</varlistentry>
The representation (syntax) of the retrieval records. Currently
recognized values are iso2709 and xml.
</para>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>pz:encoding</term>
- <listitem>
<para>
- The native encoding (character set) of retrieval records. Can be anything
- recognized by conv, but typical values are marc8 and latin1.
- The default is UTF-8.
+ For iso2709, a native character set can also be specified, e.g. "iso2709;latin-1".
+ If no character set is provided, MARC-8 is assumed.
</para>
</listitem>
</varlistentry>
<listitem>
<para>
Controls the maximum number of records to be retrieved from a
- server. The default is 100.
+ server. The default is 100 (not yet implemented).
</para>
</listitem>
</varlistentry>
</para>
</listitem>
</varlistentry>
+ <varlistentry>
+ <term>pz:zproxy</term>
+ <listitem>
+ <para>
+ The 'pz:zproxy' setting has the value syntax
+ 'host.internet.address:port'. It is used to tunnel Z39.50
+ requests through the named Z39.50 proxy.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
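+ <para>
+ As a minimal sketch of a settings file in this format, the following
+ would route Z39.50 requests for all targets through a proxy by means of
+ the 'pz:zproxy' setting. The address 'localhost:9000' is an assumption
+ chosen purely for illustration; substitute the host and port of your
+ own Z39.50 proxy.
+ </para>
+ <screen><![CDATA[
+ <settings target="*">
+ <!-- hypothetical proxy address, for illustration only -->
+ <set name="pz:zproxy" value="localhost:9000"/>
+ </settings>
+ ]]></screen>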
</refsect2>