+
+ <section id="record-model-domxml-canonical-index-pi">
+ <title>Processing-instruction governed indexing format</title>
+
+ <para>The output of the processing-instruction-driven
+ indexing &acro.xslt; stylesheets must contain
+ processing instructions named
+ <literal>zebra-2.0</literal>.
+ This output is then
+ parsed using &acro.dom; methods, and each contained instruction is
+ applied to the <emphasis>element and its
+ subtree directly following the processing instruction</emphasis>.
+ </para>
+ <para>
+ For example, the output of the command
+ <screen>
+ xsltproc dom-index-pi.xsl marc-one.xml
+ </screen>
+ might look like this:
+ <screen>
+ <![CDATA[
+ <?xml version="1.0" encoding="UTF-8"?>
+ <?zebra-2.0 record id=11224466 rank=42?>
+ <record>
+ <?zebra-2.0 index control:0?>
+ <control>11224466</control>
+ <?zebra-2.0 index any:w title:w title:p title:s?>
+ <title>How to program a computer</title>
+ </record>
+ ]]>
+ </screen>
+ </para>
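+ <para>
+ The stylesheet <filename>dom-index-pi.xsl</filename> is not
+ reproduced here. The following is only a hypothetical sketch of an
+ &acro.xslt; 1.0 stylesheet that would produce output like the above.
+ It assumes that <filename>marc-one.xml</filename> is MARCXML in the
+ <literal>http://www.loc.gov/MARC21/slim</literal> namespace, with the
+ record ID in control field <literal>001</literal> and the title in
+ subfield <literal>a</literal> of data field <literal>245</literal>;
+ the static rank <literal>42</literal> is a hard-coded placeholder.
+ <screen>
+ <![CDATA[
+ <?xml version="1.0" encoding="UTF-8"?>
+ <!-- Hypothetical sketch, not the dom-index-pi.xsl shipped with Zebra -->
+ <xsl:stylesheet version="1.0"
+     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
+     xmlns:marc="http://www.loc.gov/MARC21/slim"
+     exclude-result-prefixes="marc">
+
+   <xsl:output method="xml" indent="yes" encoding="UTF-8"/>
+
+   <!-- Assumption: only the first MARC record is indexed, because
+        exactly one record instruction is allowed per document -->
+   <xsl:template match="/">
+     <xsl:apply-templates select="(//marc:record)[1]"/>
+   </xsl:template>
+
+   <xsl:template match="marc:record">
+     <xsl:variable name="id"
+         select="normalize-space(marc:controlfield[@tag='001'])"/>
+     <!-- record instruction: opaque ID plus placeholder static rank -->
+     <xsl:processing-instruction name="zebra-2.0">
+       <xsl:value-of select="concat('record id=', $id, ' rank=42')"/>
+     </xsl:processing-instruction>
+     <record>
+       <!-- governs the control element emitted right after it -->
+       <xsl:processing-instruction name="zebra-2.0">index control:0</xsl:processing-instruction>
+       <control><xsl:value-of select="$id"/></control>
+       <!-- governs the title element emitted right after it -->
+       <xsl:processing-instruction name="zebra-2.0">index any:w title:w title:p title:s</xsl:processing-instruction>
+       <title>
+         <xsl:value-of select="marc:datafield[@tag='245']/marc:subfield[@code='a']"/>
+       </title>
+     </record>
+   </xsl:template>
+ </xsl:stylesheet>
+ ]]>
+ </screen>
+ Each <literal>index</literal> processing instruction is emitted
+ immediately before the element whose text it governs.
+ </para>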
+ </section>
+
+ <section id="record-model-domxml-canonical-index-element">
+ <title>Magic element governed indexing format</title>
+
+ <para>The output of the indexing &acro.xslt; stylesheets must contain
+ the elements <literal>z:record</literal> and
+ <literal>z:index</literal> in the magic
+ <literal>xmlns:z="http://indexdata.com/zebra-2.0"</literal>
+ namespace. This output is then
+ parsed using &acro.dom; methods, and the contained instructions are
+ performed on the <emphasis>magic elements and their
+ subtrees</emphasis>.
+ </para>
+ <para>
+ For example, the output of the command
+ <screen>
+ xsltproc dom-index-element.xsl marc-one.xml
+ </screen>
+ might look like this:
+ <screen>
+ <![CDATA[
+ <?xml version="1.0" encoding="UTF-8"?>
+ <z:record xmlns:z="http://indexdata.com/zebra-2.0"
+ z:id="11224466" z:rank="42">
+ <z:index name="control:0">11224466</z:index>
+ <z:index name="any:w title:w title:p title:s">
+ How to program a computer</z:index>
+ </z:record>
+ ]]>
+ </screen>
+ </para>
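+ <para>
+ The stylesheet <filename>dom-index-element.xsl</filename> is not
+ reproduced here. As a hypothetical sketch only, an &acro.xslt; 1.0
+ stylesheet producing output like the above could look as follows.
+ It assumes MARCXML input in the
+ <literal>http://www.loc.gov/MARC21/slim</literal> namespace, with the
+ record ID in control field <literal>001</literal> and the title in
+ subfield <literal>a</literal> of data field <literal>245</literal>;
+ the static rank <literal>42</literal> is a hard-coded placeholder.
+ <screen>
+ <![CDATA[
+ <?xml version="1.0" encoding="UTF-8"?>
+ <!-- Hypothetical sketch, not the dom-index-element.xsl shipped with Zebra -->
+ <xsl:stylesheet version="1.0"
+     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
+     xmlns:marc="http://www.loc.gov/MARC21/slim"
+     xmlns:z="http://indexdata.com/zebra-2.0"
+     exclude-result-prefixes="marc">
+
+   <xsl:output method="xml" indent="yes" encoding="UTF-8"/>
+
+   <!-- Assumption: only the first MARC record is indexed -->
+   <xsl:template match="/">
+     <xsl:apply-templates select="(//marc:record)[1]"/>
+   </xsl:template>
+
+   <xsl:template match="marc:record">
+     <xsl:variable name="id"
+         select="normalize-space(marc:controlfield[@tag='001'])"/>
+     <!-- z:rank="42" is a placeholder static rank -->
+     <z:record z:id="{$id}" z:rank="42">
+       <z:index name="control:0">
+         <xsl:value-of select="$id"/>
+       </z:index>
+       <z:index name="any:w title:w title:p title:s">
+         <xsl:value-of
+             select="marc:datafield[@tag='245']/marc:subfield[@code='a']"/>
+       </z:index>
+     </z:record>
+   </xsl:template>
+ </xsl:stylesheet>
+ ]]>
+ </screen>
+ Because the <literal>z</literal> prefix is not listed in
+ <literal>exclude-result-prefixes</literal>, the magic namespace is
+ declared on the output <literal>z:record</literal> element.
+ </para>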
+ </section>
+
+
+ <section id="record-model-domxml-canonical-index-semantics">
+ <title>Semantics of the indexing formats</title>
+
+ <para>
+ Both indexing formats have the same semantics and
+ behavior:
+ <itemizedlist>
+ <listitem>
+ <para>&zebra;-specific instructions are either
+ processing instructions named
+ <literal>zebra-2.0</literal> or
+ elements contained in the namespace
+ <literal>xmlns:z="http://indexdata.com/zebra-2.0"</literal>.
+ </para>
+ </listitem>
+ <listitem>
+ <para>There must be exactly one <literal>record</literal>
+ instruction, which sets the scope for the following,
+ possibly nested <literal>index</literal> instructions.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The unique <literal>record</literal> instruction
+ may have additional attributes <literal>id</literal>,
+ <literal>rank</literal> and <literal>type</literal>.
+ The <literal>id</literal> attribute holds the opaque record ID
+ and may be any string not containing the space character
+ <literal>' '</literal>.
+ The <literal>rank</literal> attribute value must be a
+ non-negative integer; see
+ <xref linkend="administration-ranking"/>.
+ The <literal>type</literal> attribute specifies how the record
+ is to be treated. The following values may be given for
+ <literal>type</literal>:
+ <variablelist>
+ <varlistentry>
+ <term><literal>insert</literal></term>
+ <listitem>
+ <para>
+ The record is inserted. If the record already exists, it is
+ skipped (i.e. not replaced).
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>replace</literal></term>
+ <listitem>
+ <para>
+ The record is replaced. If the record does not already exist,
+ it is skipped (i.e. not inserted).
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>delete</literal></term>
+ <listitem>
+ <para>
+ The record is deleted. If the record does not already exist,
+ a warning is issued and the rest of the records in the
+ input stream are skipped.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>update</literal></term>
+ <listitem>
+ <para>
+ The record is inserted if it does not already exist, and
+ replaced if it does. This is the default behavior, but it may
+ be overridden from outside the scope of the DOM filter by
+ <literal>zebraidx</literal> commands or Extended Services updates.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>adelete</literal></term>
+ <listitem>
+ <para>
+ The record is deleted. If the record does not already exist,
+ it is skipped (i.e. nothing is deleted).
+ </para>
+ <note>
+ <para>
+ Requires version 2.0.54 or later.
+ </para>
+ </note>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ Note that the value of <literal>type</literal> determines the
+ action only if the &zebra; indexer is running
+ in update mode (i.e. <literal>zebraidx update</literal>) or if the
+ <literal>specialUpdate</literal>
+ action of the
+ <link linkend="administration-extended-services-z3950">Extended
+ Service Update</link> is used.
+ For this reason, a <literal>specialUpdate</literal> may end up deleting records!
+ </para>
+ </listitem>
+ <listitem>
+ <para>Each of the possibly multiple and nested <literal>index</literal>
+ instructions must contain at least one
+ <literal>indexname:indextype</literal>
+ pair, and may contain several such pairs separated by the
+ space character <literal>' '</literal>. In each pair, the
+ index name and the index type are separated by a
+ colon character <literal>':'</literal>.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Any index name consisting of ASCII letters and following the
+ standard &zebra; rules will do; see
+ <xref linkend="querymodel-pqf-apt-mapping-accesspoint"/>.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Index types are restricted to the values defined in
+ the standard configuration
+ file <filename>default.idx</filename>, see
+ <xref linkend="querymodel-bib1"/> and
+ <xref linkend="fields-and-charsets"/> for details.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ &acro.dom; input documents that do not result in exactly one
+ valid
+ <literal>record</literal> instruction and at least one valid
+ <literal>index</literal> instruction cannot be searched and
+ found. Therefore,
+ processing of such invalid documents is aborted, any content of
+ the <literal>&lt;extract&gt;</literal> and
+ <literal>&lt;store&gt;</literal> pipelines is discarded,
+ and a warning is written to the log.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
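+ <para>
+ To illustrate the <literal>type</literal> attribute, the following
+ two alternative documents, one per format, are a hypothetical sketch
+ of an indexing record that requests deletion of the record with ID
+ <literal>11224466</literal>. The spellings
+ <literal>type=delete</literal> and <literal>z:type="delete"</literal>
+ are assumed by analogy with the <literal>id</literal> and
+ <literal>rank</literal> attributes, an <literal>index</literal>
+ instruction is kept so that the document satisfies the validity rule
+ above, and the action is only honored in update mode as described above.
+ <screen>
+ <![CDATA[
+ <!-- processing instruction format (assumed attribute spelling) -->
+ <?zebra-2.0 record id=11224466 type=delete?>
+ <record>
+ <?zebra-2.0 index control:0?>
+ <control>11224466</control>
+ </record>
+
+ <!-- magic element format (assumed attribute spelling) -->
+ <z:record xmlns:z="http://indexdata.com/zebra-2.0"
+   z:id="11224466" z:type="delete">
+ <z:index name="control:0">11224466</z:index>
+ </z:record>
+ ]]>
+ </screen>
+ </para>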
+
+ <para>The examples work as follows.
+ From the original &acro.xml; file
+ <literal>marc-one.xml</literal> (or from an &acro.xml; record &acro.dom; of the
+ same form coming from an <literal>&lt;input&gt;</literal>
+ pipeline),
+ the indexing
+ pipeline <literal>&lt;extract&gt;</literal>
+ produces an indexing &acro.xml; record, which is defined by
+ the <literal>record</literal> instruction.
+ &zebra; uses the content of
+ <literal>id=11224466</literal>
+ or
+ <literal>z:id="11224466"</literal>
+ as the internal
+ record ID, and, if static ranking is enabled, the content of
+ <literal>rank=42</literal>
+ or
+ <literal>z:rank="42"</literal>
+ as the static rank.
+ </para>
+
+
+ <para>In these examples, the following literal indexes are constructed:
+ <screen>
+ any:w
+ control:0
+ title:w
+ title:p
+ title:s
+ </screen>
+ where the index type follows the
+ <literal>':'</literal> character.
+ Any index type defined in the standard configuration
+ file <filename>default.idx</filename> will do.
+ In both formats, any
+ <literal>text()</literal> node content recursively contained
+ inside a <literal>&lt;z:index&gt;</literal> element, or inside the
+ element following an <literal>index</literal> processing instruction,
+ is filtered through the
+ appropriate character map for normalization, and is
+ inserted into the named indexes.
+ </para>
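+ <para>
+ Because text content is collected recursively,
+ <literal>index</literal> instructions can also be nested. In the
+ following hypothetical variant of the magic element example, the
+ outer <literal>any:w</literal> index receives the text of both inner
+ elements (here including the control number), while each inner
+ element additionally feeds its own indexes. This is a sketch that
+ assumes the recursive behavior described above, not a
+ recommended layout.
+ <screen>
+ <![CDATA[
+ <z:record xmlns:z="http://indexdata.com/zebra-2.0"
+   z:id="11224466" z:rank="42">
+ <!-- outer index: collects all text nested below it -->
+ <z:index name="any:w">
+   <z:index name="control:0">11224466</z:index>
+   <z:index name="title:w title:p title:s">How to program a computer</z:index>
+ </z:index>
+ </z:record>
+ ]]>
+ </screen>
+ </para>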
+ <para>
+ Finally, this example configuration can be queried using &acro.pqf;
+ queries, either transported by &acro.z3950; (here using <literal>yaz-client</literal>)
+ <screen>
+ <![CDATA[
+ Z> open localhost:9999
+ Z> elem dc
+ Z> form xml
+ Z>
+ Z> find @attr 1=control @attr 4=3 11224466
+ Z> scan @attr 1=control @attr 4=3 ""
+ Z>
+ Z> find @attr 1=title program
+ Z> scan @attr 1=title ""
+ Z>
+ Z> find @attr 1=title @attr 4=2 "How to program a computer"
+ Z> scan @attr 1=title @attr 4=2 ""
+ ]]>
+ </screen>
+ or via the proprietary
+ extensions <literal>x-pquery</literal> and
+ <literal>x-pScanClause</literal> to
+ &acro.sru; and &acro.srw;
+ <screen>
+ <![CDATA[
+ http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=@attr 1=title program
+ http://localhost:9999/?version=1.1&operation=scan&x-pScanClause=@attr 1=title ""
+ ]]>
+ </screen>
+ See <xref linkend="zebrasrv-sru"/> for more information on &acro.sru;/&acro.srw;
+ configuration, and <xref linkend="gfs-config"/> or the &yaz;
+ <ulink url="&url.yaz.cql;">&acro.cql; section</ulink>
+ for the details of the &yaz; frontend server.
+ </para>
+ <para>
+ Notice that there are no <filename>*.abs</filename>,
+ <filename>*.est</filename>, <filename>*.map</filename>, or other &acro.grs1;
+ filter configuration files involved in this process, and that the
+ literal index names are used during search and retrieval.
+ </para>
+ <para>
+ To support the usual
+ <literal>bib-1</literal> &acro.z3950; numeric access points, it is a
+ good idea to choose string index names defined in the default
+ configuration file <filename>tab/bib1.att</filename>; see
+ <xref linkend="attset-files"/>.
+ </para>
+
+ </section>
+