Allow to NSIS build without html doc (but warn about it)

diff --git a/doc/architecture.xml b/doc/architecture.xml

index 6ac71dc..ce39c22 100644
--- a/doc/architecture.xml
+++ b/doc/architecture.xml
@@ -1,41 +1,39 @@
   <chapter id="architecture">
-  <!-- $Id: architecture.xml,v 1.7 2006-04-24 12:53:03 marc Exp $ -->
-  <title>Overview of Zebra Architecture</title>
-  
+  <title>Overview of &zebra; Architecture</title>
  
-  <sect1 id="architecture-representation">
+  <section id="architecture-representation">
     <title>Local Representation</title>
  
     <para>
-    As mentioned earlier, Zebra places few restrictions on the type of
+    As mentioned earlier, &zebra; places few restrictions on the type of
      data that you can index and manage. Generally, whatever the form of
      the data, it is parsed by an input filter specific to that format, and
-    turned into an internal structure that Zebra knows how to handle. This
+    turned into an internal structure that &zebra; knows how to handle. This
      process takes place whenever the record is accessed - for indexing and
      retrieval.
     </para>
  
     <para>
      The RecordType parameter in the <literal>zebra.cfg</literal> file, or
-    the <literal>-t</literal> option to the indexer tells Zebra how to
+    the <literal>-t</literal> option to the indexer tells &zebra; how to
      process input records.
      Two basic types of processing are available - raw text and structured
      data. Raw text is just that, and it is selected by providing the
-    argument <emphasis>text</emphasis> to Zebra. Structured records are
+    argument <emphasis>text</emphasis> to &zebra;. Structured records are
      all handled internally using the basic mechanisms described in the
      subsequent sections.
-    Zebra can read structured records in many different formats.
+    &zebra; can read structured records in many different formats.
      <!--
      How this is done is governed by additional parameters after the
      "grs" keyword, separated by "." characters.
      -->
     </para>
-  </sect1>
+  </section>
  
-  <sect1 id="architecture-maincomponents">
+  <section id="architecture-maincomponents">
     <title>Main Components</title>
     <para>
-    The Zebra system is designed to support a wide range of data management
+    The &zebra; system is designed to support a wide range of data management
      applications. The system can be configured to handle virtually any
      kind of structured data. Each record in the system is associated with
      a <emphasis>record schema</emphasis> which lends context to the data
@@ -45,40 +43,40 @@
      one database, the system poses no such restrictions.
     </para>
     <para>
-    The Zebra indexer and information retrieval server consists of the
+    The &zebra; indexer and information retrieval server consists of the
      following main applications: the <command>zebraidx</command>
      indexing maintenance utility, and the <command>zebrasrv</command>
      information query and retrieval server. Both are using some of the
      same main components, which are presented here.
-   </para>    
-   <para>    
-    The virtual Debian package <literal>idzebra1.4</literal>
+   </para>
+   <para>
+    The virtual Debian package <literal>idzebra-2.0</literal>
      installs all the necessary packages to start
-    working with Zebra - including utility programs, development libraries,
-    documentation and modules. 
-  </para>    
-   
-   <sect2 id="componentcore">
-    <title>Core Zebra Libraries Containing Common Functionality</title>
+    working with &zebra; - including utility programs, development libraries,
+    documentation and modules.
+  </para>
+
+   <section id="componentcore">
+    <title>Core &zebra; Libraries Containing Common Functionality</title>
      <para>
-     The core Zebra module is the meat of the <command>zebraidx</command>
+     The core &zebra; module is the meat of the <command>zebraidx</command>
      indexing maintenance utility, and the <command>zebrasrv</command>
      information query and retrieval server binaries. Shortly, the core
-    libraries are responsible for  
+    libraries are responsible for
       <variablelist>
        <varlistentry>
         <term>Dynamic Loading</term>
         <listitem>
          <para>of external filter modules, in case the application is
          not compiled statically. These filter modules define indexing,
-        search and retrieval capabilities of the various input formats.  
+        search and retrieval capabilities of the various input formats.
          </para>
         </listitem>
        </varlistentry>
        <varlistentry>
         <term>Index Maintenance</term>
         <listitem>
-        <para> Zebra maintains Term Dictionaries and ISAM index
+        <para> &zebra; maintains Term Dictionaries and ISAM index
          entries in inverted index structures kept on disk. These are
         optimized for fast insert, update and delete, as well as good
          search performance.
@@ -88,13 +86,13 @@
        <varlistentry>
         <term>Search Evaluation</term>
         <listitem>
-        <para>by execution of search requests expressed in PQF/RPN
+        <para>by execution of search requests expressed in &acro.pqf;/&acro.rpn;
           data structures, which are handed over from
-         the YAZ server frontend API. Search evaluation includes
+         the &yaz; server frontend &acro.api;. Search evaluation includes
           construction of hit lists according to boolean combinations
           of simpler searches. Fast performance is achieved by careful
           use of index structures, and by evaluation specific index hit
-         lists in correct order. 
+         lists in correct order.
          </para>
         </listitem>
        </varlistentry>
@@ -105,7 +103,7 @@
           components call resorting/re-ranking algorithms on the hit
           sets. These might also be pre-sorted not only using the
           assigned document ID's, but also using assigned static rank
-         information. 
+         information.
          </para>
         </listitem>
        </varlistentry>
@@ -113,7 +111,7 @@
         <term>Record Presentation</term>
         <listitem>
          <para>returns - possibly ranked - result sets, hit
-         numbers, and the like internal data to the YAZ server backend API
+         numbers, and the like internal data to the &yaz; server backend &acro.api;
           for shipping to the client. Each individual filter module
           implements its own specific presentation formats.
          </para>
@@ -121,215 +119,266 @@
        </varlistentry>
       </variablelist>
       </para>
-    <para> 
-     The Debian package <literal>libidzebra1.4</literal> 
-     contains all run-time libraries for Zebra, the 
-     documentation in PDF and HTML is found in 
-     <literal>idzebra1.4-doc</literal>, and
-     <literal>idzebra1.4-common</literal>
-     includes common essential Zebra configuration files.
+    <para>
+     The Debian package <literal>libidzebra-2.0</literal>
+     contains all run-time libraries for &zebra;, the
+     documentation in PDF and HTML is found in
+     <literal>idzebra-2.0-doc</literal>, and
+     <literal>idzebra-2.0-common</literal>
+     includes common essential &zebra; configuration files.
      </para>
-   </sect2>
-   
+   </section>
+
  
-   <sect2 id="componentindexer">
-    <title>Zebra Indexer</title>
+   <section id="componentindexer">
+    <title>&zebra; Indexer</title>
      <para>
       The  <command>zebraidx</command>
-     indexing maintenance utility 
+     indexing maintenance utility
       loads external filter modules used for indexing data records of
       different type, and creates, updates and drops databases and
       indexes according to the rules defined in the filter modules.
-    </para>    
-    <para>    
-     The Debian  package <literal>idzebra1.4-utils</literal> contains
+    </para>
+    <para>
+     The Debian  package <literal>idzebra-2.0-utils</literal> contains
       the  <command>zebraidx</command> utility.
      </para>
-   </sect2>
+   </section>
  
-   <sect2 id="componentsearcher">
-    <title>Zebra Searcher/Retriever</title>
+   <section id="componentsearcher">
+    <title>&zebra; Searcher/Retriever</title>
      <para>
-     This is the executable which runs the Z39.50/SRU/SRW server and
+     This is the executable which runs the &acro.z3950;/&acro.sru;/&acro.srw; server and
       glues together the core libraries and the filter modules to one
-     great Information Retrieval server application. 
-    </para>    
-    <para>    
-     The Debian  package <literal>idzebra1.4-utils</literal> contains
+     great Information Retrieval server application.
+    </para>
+    <para>
+     The Debian  package <literal>idzebra-2.0-utils</literal> contains
       the  <command>zebrasrv</command> utility.
      </para>
-   </sect2>
+   </section>
  
-   <sect2 id="componentyazserver">
-    <title>YAZ Server Frontend</title>
+   <section id="componentyazserver">
+    <title>&yaz; Server Frontend</title>
      <para>
-     The YAZ server frontend is 
-     a full fledged stateful Z39.50 server taking client
-     connections, and forwarding search and scan requests to the 
-     Zebra core indexer.
+     The &yaz; server frontend is
+     a full fledged stateful &acro.z3950; server taking client
+     connections, and forwarding search and scan requests to the
+     &zebra; core indexer.
      </para>
      <para>
-     In addition to Z39.50 requests, the YAZ server frontend acts
+     In addition to &acro.z3950; requests, the &yaz; server frontend acts
       as HTTP server, honoring
-      <ulink url="http://www.loc.gov/standards/sru/srw/">SRW</ulink> 
-     SOAP requests, and  
-     <ulink url="http://www.loc.gov/standards/sru/">SRU</ulink> 
-     REST requests. Moreover, it can
-     translate incoming 
-     <ulink url="http://www.loc.gov/standards/sru/cql/">CQL</ulink>
+      <ulink url="&url.sru;">&acro.sru; &acro.soap;</ulink>
+     requests, and
+     &acro.sru; &acro.rest;
+     requests. Moreover, it can
+     translate incoming
+     <ulink url="&url.cql;">&acro.cql;</ulink>
       queries to
-     <ulink url="http://indexdata.com/yaz/doc/tools.tkl#PQF">PQF</ulink>
+     <ulink url="&url.yaz.pqf;">&acro.pqf;</ulink>
        queries, if
-     correctly configured. 
+     correctly configured.
      </para>
      <para>
-     <ulink url="http://www.indexdata.com/yaz">YAZ</ulink>
-     is an Open Source  
+     <ulink url="&url.yaz;">&yaz;</ulink>
+     is an Open Source
       toolkit that allows you to develop software using the
-     ANSI Z39.50/ISO23950 standard for information retrieval.
-     It is packaged in the Debian packages     
+     &acro.ansi; &acro.z3950;/ISO23950 standard for information retrieval.
+     It is packaged in the Debian packages
       <literal>yaz</literal> and <literal>libyaz</literal>.
      </para>
-   </sect2>
-   
-   <sect2 id="componentmodules">
+   </section>
+
+   <section id="componentmodules">
      <title>Record Models and Filter Modules</title>
      <para>
-     The hard work of knowing <emphasis>what</emphasis> to index, 
+     The hard work of knowing <emphasis>what</emphasis> to index,
       <emphasis>how</emphasis> to do it, and <emphasis>which</emphasis>
       part of the records to send in a search/retrieve response is
-     implemented in 
+     implemented in
       various filter modules. It is their responsibility to define the
       exact indexing and record display filtering rules.
       </para>
       <para>
       The virtual Debian package
-     <literal>libidzebra1.4-modules</literal> installs all base filter
-     modules. 
-    </para>
-
-   <sect3 id="componentmodulestext">
-    <title>TEXT Record Model and Filter Module</title>
-    <para>
-      Plain ASCII text filter. TODO: add information here.
-     <!--
-     <literal>text module missing as deb file<literal>
-     -->
+     <literal>libidzebra-2.0-modules</literal> installs all base filter
+     modules.
      </para>
-   </sect3>
  
-   <sect3 id="componentmodulesgrs">
-    <title>GRS Record Model and Filter Modules</title>
-    <para>
-    The GRS filter modules described in 
-    <xref linkend="record-model-grs"/>
-    are all based on the Z39.50 specifications, and it is absolutely
-    mandatory to have the reference pages on BIB-1 attribute sets on
-    you hand when configuring GRS filters. The GRS filters come in
-    different flavors, and a short introduction is needed here.
-    GRS filters of various kind have also been called ABS filters due
-    to the <filename>*.abs</filename> configuration file suffix.
+   <section id="componentmodulesdom">
+    <title>&acro.dom; &acro.xml; Record Model and Filter Module</title>
+     <para>
+      The &acro.dom; &acro.xml; filter uses a standard &acro.dom; &acro.xml; structure as
+      internal data model, and can thus parse, index, and display
+      any &acro.xml; document.
      </para>
      <para>
-     The <emphasis>grs.danbib</emphasis> filter is developed for 
-      DBC DanBib records.
-      DanBib is the Danish Union Catalogue hosted by DBC
-      (Danish Bibliographic Center). This filter is found in the
-      Debian package
-     <literal>libidzebra1.4-mod-grs-danbib</literal>.
+      A parser for binary &acro.marc; records based on the ISO2709 library
+      standard is provided, it transforms these to the internal
+      &acro.marcxml; &acro.dom; representation.
      </para>
      <para>
-      The <emphasis>grs.marc</emphasis> and 
-      <emphasis>grs.marcxml</emphasis> filters are suited to parse and
-      index binary and XML versions of traditional library MARC records 
-      based on the ISO2709 standard. The Debian package for both
-      filters is 
-     <literal>libidzebra1.4-mod-grs-marc</literal>.
+      The internal &acro.dom; &acro.xml; representation can be fed into four
+      different pipelines, consisting of arbitrarily many successive
+      &acro.xslt; transformations; these are for
+     <itemizedlist>
+       <listitem><para>input parsing and initial
+          transformations,</para></listitem>
+       <listitem><para>indexing term extraction
+          transformations</para></listitem>
+       <listitem><para>transformations before internal document
+          storage, and </para></listitem>
+       <listitem><para>retrieve transformations from storage to output
+          format</para></listitem>
+      </itemizedlist>
      </para>
      <para>
-      GRS TCL scriptable filters for extensive user configuration come
-     in two flavors: a regular expression filter 
-     <emphasis>grs.regx</emphasis> using TCL regular expressions, and
-     a general scriptable TCL filter called 
-     <emphasis>grs.tcl</emphasis>        
-     are both included in the 
-     <literal>libidzebra1.4-mod-grs-regx</literal> Debian package.
+      The &acro.dom; &acro.xml; filter pipelines use &acro.xslt; (and if  supported on
+      your platform, even &acro.exslt;), it brings thus full &acro.xpath;
+      support to the indexing, storage and display rules of not only
+      &acro.xml; documents, but also binary &acro.marc; records.
      </para>
      <para>
-      A general purpose SGML filter is called
-     <emphasis>grs.sgml</emphasis>. This filter is not yet packaged,
-     but planned to be in the  
-     <literal>libidzebra1.4-mod-grs-sgml</literal> Debian package.
+      Finally, the &acro.dom; &acro.xml; filter allows for static ranking at index
+      time, and to sort hit lists according to predefined
+      static ranks.
      </para>
      <para>
-      The Debian  package 
-      <literal>libidzebra1.4-mod-grs-xml</literal> includes the 
-      <emphasis>grs.xml</emphasis> filter which uses <ulink
-      url="http://expat.sourceforge.net/">Expat</ulink> to 
-      parse records in XML and turn them into IDZebra's internal GRS node
-      trees. Have also a look at the Alvis XML/XSLT filter described in
-      the next session.
-    </para>
-   </sect3>
+      Details on the experimental &acro.dom; &acro.xml; filter are found in
+      <xref linkend="record-model-domxml"/>.
+      </para>
+     <para>
+      The Debian package <literal>libidzebra-2.0-mod-dom</literal>
+      contains the &acro.dom; filter module.
+     </para>
+    </section>
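
The four pipelines listed in the new DOM XML section could be wired together in a filter configuration along the following lines. This is a minimal sketch and not part of this commit: the element names and the namespace URI are assumed from the DOM XML record model chapter and should be checked against it, and all stylesheet file names are hypothetical placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical DOM filter configuration; every file name is a placeholder. -->
<dom xmlns="http://indexdata.com/zebra-2.0">
  <!-- 1. input parsing and initial transformations -->
  <input>
    <xmlreader level="1"/>
  </input>
  <!-- 2. indexing term extraction transformations -->
  <extract>
    <xslt stylesheet="record2index.xsl"/>
  </extract>
  <!-- 3. transformations before internal document storage -->
  <store>
    <xslt stylesheet="record2store.xsl"/>
  </store>
  <!-- 4. retrieve transformations from storage to output format -->
  <retrieve name="dc">
    <xslt stylesheet="store2dc.xsl"/>
  </retrieve>
</dom>
```

Each pipeline element may hold several successive XSLT steps, which is what the text means by "arbitrarily many successive transformations".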
+
+   <section id="componentmodulesalvis">
+    <title>ALVIS &acro.xml; Record Model and Filter Module</title>
+     <note>
+      <para>
+        The functionality of this record model has been improved and
+        replaced by the &acro.dom; &acro.xml; record model. See
+        <xref linkend="componentmodulesdom"/>.
+      </para>
+     </note>
  
-   <sect3 id="componentmodulesalvis">
-    <title>ALVIS Record Model and Filter Module</title>
       <para>
-      The Alvis filter for XML files is an XSLT based input
-      filter. 
-      It indexes element and attribute content of any thinkable XML format
-      using full XPATH support, a feature which the standard Zebra
-      GRS SGML and XML filters lacked. The indexed documents are
-      parsed into a standard XML DOM tree, which restricts record size
+      The Alvis filter for &acro.xml; files is an &acro.xslt; based input
+      filter.
+      It indexes element and attribute content of any thinkable &acro.xml; format
+      using full &acro.xpath; support, a feature which the standard &zebra;
+      &acro.grs1; &acro.sgml; and &acro.xml; filters lacked. The indexed documents are
+      parsed into a standard &acro.xml; &acro.dom; tree, which restricts record size
        according to availability of memory.
      </para>
      <para>
-      The Alvis filter 
-      uses XSLT display stylesheets, which let
-      the Zebra DB administrator associate multiple, different views on
-      the same XML document type. These views are chosen on-the-fly in
+      The Alvis filter
+      uses &acro.xslt; display stylesheets, which let
+      the &zebra; DB administrator associate multiple, different views on
+      the same &acro.xml; document type. These views are chosen on-the-fly in
        search time.
       </para>
      <para>
        In addition, the Alvis filter configuration is not bound to the
-      arcane  BIB-1 Z39.50 library catalogue indexing traditions and
+      arcane  &acro.bib1; &acro.z3950; library catalogue indexing traditions and
        folklore, and is therefore easier to understand.
      </para>
      <para>
        Finally, the Alvis  filter allows for static ranking at index
        time, and to sort hit lists according to predefined
        static ranks. This imposes no overhead at all, both
-      search and indexing perform still 
+      search and indexing perform still
        <emphasis>O(1)</emphasis> irrespectively of document
-      collection size. This feature resembles Googles pre-ranking using
-      their Pagerank algorithm.
+      collection size. This feature resembles Google's pre-ranking using
+      their PageRank algorithm.
      </para>
      <para>
-      Details on the experimental Alvis XSLT filter are found in 
+      Details on the experimental Alvis &acro.xslt; filter are found in
        <xref linkend="record-model-alvisxslt"/>.
        </para>
       <para>
-      The Debian package <literal>libidzebra1.4-mod-alvis</literal>
+      The Debian package <literal>libidzebra-2.0-mod-alvis</literal>
        contains the Alvis filter module.
       </para>
-    </sect3>
+    </section>
+
+   <section id="componentmodulesgrs">
+    <title>&acro.grs1; Record Model and Filter Modules</title>
+     <note>
+      <para>
+        The functionality of this record model has been improved and
+        replaced by the &acro.dom; &acro.xml; record model. See
+        <xref linkend="componentmodulesdom"/>.
+      </para>
+     </note>
+    <para>
+    The &acro.grs1; filter modules described in
+    <xref linkend="grs"/>
+    are all based on the &acro.z3950; specifications, and it is absolutely
+    mandatory to have the reference pages on &acro.bib1; attribute sets at
+    hand when configuring &acro.grs1; filters. The GRS filters come in
+    different flavors, and a short introduction is needed here.
+    &acro.grs1; filters of various kind have also been called ABS filters due
+    to the <filename>*.abs</filename> configuration file suffix.
+    </para>
+    <para>
+      The <emphasis>grs.marc</emphasis> and
+      <emphasis>grs.marcxml</emphasis> filters are suited to parse and
+      index binary and &acro.xml; versions of traditional library &acro.marc; records
+      based on the ISO2709 standard. The Debian package for both
+      filters is
+     <literal>libidzebra-2.0-mod-grs-marc</literal>.
+    </para>
+    <para>
+      &acro.grs1; TCL scriptable filters for extensive user configuration come
+     in two flavors: a regular expression filter
+     <emphasis>grs.regx</emphasis> using TCL regular expressions, and
+     a general scriptable TCL filter called
+     <emphasis>grs.tcl</emphasis>
+     are both included in the
+     <literal>libidzebra-2.0-mod-grs-regx</literal> Debian package.
+    </para>
+    <para>
+      A general purpose &acro.sgml; filter is called
+     <emphasis>grs.sgml</emphasis>. This filter is not yet packaged,
+     but planned to be in the
+     <literal>libidzebra-2.0-mod-grs-sgml</literal> Debian package.
+    </para>
+    <para>
+      The Debian  package
+      <literal>libidzebra-2.0-mod-grs-xml</literal> includes the
+      <emphasis>grs.xml</emphasis> filter which uses <ulink
+      url="&url.expat;">Expat</ulink> to
+      parse records in &acro.xml; and turn them into ID&zebra;'s internal &acro.grs1; node
+      trees. Have also a look at the Alvis &acro.xml;/&acro.xslt; filter described in
+      the previous section.
+    </para>
+   </section>
  
-   <sect3 id="componentmodulessafari">
+   <section id="componentmodulestext">
+    <title>TEXT Record Model and Filter Module</title>
+    <para>
+      Plain ASCII text filter. TODO: add information here.
+    </para>
+   </section>
+
+    <!--
+   <section id="componentmodulessafari">
      <title>SAFARI Record Model and Filter Module</title>
      <para>
       SAFARI filter module TODO: add information here.
-     <!--
-     <literal>safari module missing as deb file<literal>
-     -->
      </para>
-   </sect3>
+   </section>
+    -->
  
-   </sect2>
+   </section>
  
-  </sect1>
+  </section>
  
  
-  <sect1 id="architecture-workflow">
+  <section id="architecture-workflow">
     <title>Indexing and Retrieval Workflow</title>
  
    <para>
@@ -341,11 +390,11 @@
  
     <itemizedlist>
      <listitem>
-     
+
       <para>
        When records are accessed by the system, they are represented
-      in their local, or native format. This might be SGML or HTML files,
-      News or Mail archives, MARC records. If the system doesn't already
+      in their local, or native format. This might be &acro.sgml; or HTML files,
+      News or Mail archives, &acro.marc; records. If the system doesn't already
        know how to read the type of data you need to store, you can set up an
        input filter by preparing conversion rules based on regular
        expressions and possibly augmented by a flexible scripting language
@@ -372,124 +421,209 @@
       <para>
        Before transmitting records to the client, they are first
        converted from the internal structure to a form suitable for exchange
-      over the network - according to the Z39.50 standard.
+      over the network - according to the &acro.z3950; standard.
       </para>
      </listitem>
  
     </itemizedlist>
  
    </para>
-  </sect1>
+  </section>
  
-
-<!--
-  <sect1 id="architecture-querylanguage">
-   <title>Query Languages</title>
-   
+  <section id="special-retrieval">
+   <title>Retrieval of &zebra; internal record data</title>
     <para>
-
-http://www.loc.gov/z3950/agency/document.html
-
-    PQF and BIB-1 stuff to be explained
-    <ulink url="http://www.loc.gov/z3950/agency/defns/bib1.html">
-     http://www.loc.gov/z3950/agency/defns/bib1.html</ulink> 
-
-     <ulink url="http://www.loc.gov/z3950/agency/bib1.html">
-     http://www.loc.gov/z3950/agency/bib1.html</ulink> 
-
-     http://www.loc.gov/z3950/agency/markup/13.html
-    
-  </para>
-  </sect1>
-
-
-These attribute types are recognized regardless of attribute set. Some are recognized for search, others for scan.
-
-Search
-
-Type   Name    Version
-7      Embedded Sort   1.1
-8      Term Set        1.1
-9      Rank weight     1.1
-9      Approx Limit    1.4
-10     Term Ref        1.4
-
-Embedded Sort
-
-The embedded sort is a way to specify sort within a query - thus removing the need to send a Sort Request separately. It is both faster and does not require clients that deal with the Sort Facility.
-
-The value after attribute type 7 is 1=ascending, 2=descending.. The attributes+term (APT) node is separate from the rest and must be @or'ed. The term associated with APT is the level .. 0=primary sort, 1=secondary sort etc.. Example:
-
-Search for water, sort by title (ascending):
-
-  @or @attr 1=1016 water @attr 7=1 @attr 1=4 0
-
-Search for water, sort by title ascending, then date descending:
-
-  @or @or @attr 1=1016 water @attr 7=1 @attr 1=4 0 @attr 7=2 @attr 1=30 1
-
-Term Set
-
-The Term Set feature is a facility that allows a search to store hitting terms in a "pseudo" resultset; thus a search (as usual) + a scan-like facility. Requires a client that can do named result sets since the search generates two result sets. The value for attribute 8 is the name of a result set (string). The terms in term set are returned as SUTRS records.
-
-Seach for u in title, right truncated.. Store result in result set named uset.
-
-  @attr 5=1 @attr 1=4 @attr 8=uset u
-
-The model as one serious flaw.. We don't know the size of term set.
-
-Rank weight
-
-Rank weight is a way to pass a value to a ranking algorithm - so that one APT has one value - while another as a different one.
-
-Search for utah in title with weight 30 as well as any with weight 20.
-
-  @attr 2=102 @or @attr 9=30 @attr 1=4 utah @attr 9=20 utah
-
-Approx Limit
-
-Newer Zebra versions normally estemiates hit count for every APT (leaf) in the query tree. These hit counts are returned as part of the searchResult-1 facility.
-
-By setting a limit for the APT we can make Zebra turn into approximate hit count when a certain hit count limit is reached. A value of zero means exact hit count.
-
-We are intersted in exact hit count for a, but for b we allow estimates for 1000 and higher..
-
-  @and a @attr 9=1000 b
-
-This facility clashes with rank weight! Fortunately this is a Zebra 1.4 thing so we can change this without upsetting anybody!
-
-Term Ref
-
-Zebra supports the searchResult-1 facility.
-
-If attribute 10 is given, that specifies a subqueryId value returned as part of the search result. It is a way for a client to name an APT part of a query.
-
-Scan
-
-Type   Name    Version
-8      Result set narrow       1.3
-9      Approx Limit    1.4
-
-Result set narrow
-
-If attribute 8 is given for scan, the value is the name of a result set. Each hit count in scan is @and'ed with the result set given.
-
-Approx limit
-
-The approx (as for search) is a way to enable approx hit counts for scan hit counts. However, it does NOT appear to work at the moment.
-
-
- AdamDickmeiss - 19 Dec 2005
-
-
--->
-
-
- </chapter> 
-
- <!-- Keep this Emacs mode comment at the end of the file
-Local variables:
-mode: nxml
-End:
--->
-
+    Starting with <literal>&zebra;</literal> version 2.0.5, it is
+    possible to use special element set names that have the prefix
+    <literal>zebra::</literal>.
+   </para>
+   <para>
+    Using such an element set will, regardless of record type, return
+    &zebra;'s internal index structure/data for a record.
+    In particular, the regular record filters are not invoked when
+    these are in use.
+    This can in some cases make retrieval faster than regular
+    retrieval operations (for &acro.marc;, &acro.xml; etc.).
+   </para>
+   <table id="special-retrieval-types">
+    <title>Special Retrieval Elements</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Element Set</entry>
+       <entry>Description</entry>
+       <entry>Syntax</entry>
+      </row>
+     </thead>
+     <tbody>
+      <row>
+       <entry><literal>zebra::meta::sysno</literal></entry>
+       <entry>Get &zebra; record system ID</entry>
+       <entry>&acro.xml; and &acro.sutrs;</entry>
+      </row>
+      <row>
+       <entry><literal>zebra::data</literal></entry>
+       <entry>Get raw record</entry>
+       <entry>all</entry>
+      </row>
+      <row>
+       <entry><literal>zebra::meta</literal></entry>
+       <entry>Get &zebra; record internal metadata</entry>
+       <entry>&acro.xml; and &acro.sutrs;</entry>
+      </row>
+      <row>
+       <entry><literal>zebra::index</literal></entry>
+       <entry>Get all indexed keys for record</entry>
+       <entry>&acro.xml; and &acro.sutrs;</entry>
+      </row>
+      <row>
+       <entry>
+       <literal>zebra::index::</literal><replaceable>f</replaceable>
+       </entry>
+       <entry>
+       Get indexed keys for field <replaceable>f</replaceable> for record
+       </entry>
+       <entry>&acro.xml; and &acro.sutrs;</entry>
+      </row>
+      <row>
+       <entry>
+       <literal>zebra::index::</literal><replaceable>f</replaceable>:<replaceable>t</replaceable>
+       </entry>
+       <entry>
+       Get indexed keys for field <replaceable>f</replaceable>
+         and type <replaceable>t</replaceable> for record
+       </entry>
+       <entry>&acro.xml; and &acro.sutrs;</entry>
+      </row>
+      <row>
+       <entry>
+       <literal>zebra::snippet</literal>
+       </entry>
+       <entry>
+       Get snippet for record for one or more indexes (f1,f2,..).
+       This includes a phrase from the original
+       record at the point where a match occurs (for a query). By default,
+       terms before and after the match are included in the snippet. The
+       matching terms are enclosed within element
+       <literal>&lt;s&gt;</literal>. The snippet facility requires
+       &zebra; 2.0.16 or later.
+       </entry>
+       <entry>&acro.xml; and &acro.sutrs;</entry>
+      </row>
+      <row>
+       <entry>
+       <literal>zebra::facet::</literal><replaceable>f1</replaceable>:<replaceable>t1</replaceable>,<replaceable>f2</replaceable>:<replaceable>t2</replaceable>,..
+       </entry>
+       <entry>
+       Get facet of a result set. The facet result is returned
+       as if it were a normal record, while in reality it is a
+       recap of the most "important" terms in a result set for the fields
+       given.
+       The facet facility first appeared in &zebra; 2.0.20.
+       </entry>
+       <entry>&acro.xml;</entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+   <para>
+    For example, to fetch the raw binary record data stored in
+    &zebra;'s internal storage, or on the filesystem, the following
+    commands can be issued:
+    <screen>
+      Z> f @attr 1=title my
+      Z> format xml
+      Z> elements zebra::data
+      Z> s 1+1
+      Z> format sutrs
+      Z> s 1+1
+      Z> format usmarc
+      Z> s 1+1
+    </screen>
+    </para>
+   <para>
+    The special
+    <literal>zebra::data</literal> element set name is
+    defined for any record syntax, and always fetches
+    the raw record data in exactly the original form. No
+    record-syntax-specific transformations are applied to the raw record data.
+   </para>
+   <para>
+    Also, &zebra; internal metadata about the record can be accessed:
+    <screen>
+      Z> f @attr 1=title my
+      Z> format xml
+      Z> elements zebra::meta::sysno
+      Z> s 1+1
+    </screen>
+    displays only the internal record system number, in
+    <literal>&acro.xml;</literal> record syntax, whereas
+    <screen>
+      Z> f @attr 1=title my
+      Z> format xml
+      Z> elements zebra::meta
+      Z> s 1+1
+    </screen>
+    displays all available metadata on the record. These include the system
+    number, database name, indexed filename, filter used for indexing,
+    score and static ranking information, and finally the byte size of the record.
+   </para>
+   <para>
+    Sometimes it is very hard to figure out exactly what has been
+    indexed, how, and in which indexes. Using the indexing stylesheet of
+    the Alvis filter, one can at least see which portion of the record
+    went into which index, but a similar aid does not exist for all
+    other indexing filters.
+   </para>
+   <para>
+    The special
+    <literal>zebra::index</literal> element set names are provided to
+    access information on per-record indexed fields. For example, the
+    commands
+    <screen>
+      Z> f @attr 1=title my
+      Z> format sutrs
+      Z> elements zebra::index
+      Z> s 1+1
+    </screen>
+    display all indexed tokens from all indexed fields of the
+    first record in <literal>&acro.sutrs;</literal>
+    record syntax, whereas
+    <screen>
+      Z> f @attr 1=title my
+      Z> format xml
+      Z> elements zebra::index::title
+      Z> s 1+1
+      Z> elements zebra::index::title:p
+      Z> s 1+1
+    </screen>
+    displays, in <literal>&acro.xml;</literal> record syntax, only the content
+    of the &zebra; string index <literal>title</literal>, or
+    even only the part of it indexed as type <literal>p</literal> (phrase).
+   </para>
+   <note>
+    <para>
+     Trying to access numeric <literal>&acro.bib1;</literal> use
+     attributes or trying to access non-existent &zebra; internal string
+     access points will result in Diagnostic 25: Specified element set
+     name not valid for specified database.
+    </para>
+   </note>
+  </section>
+
+ </chapter>
+
+ <!-- Keep this comment at the end of the file
+ Local variables:
+ mode: sgml
+ sgml-omittag:t
+ sgml-shorttag:t
+ sgml-minimize-attributes:nil
+ sgml-always-quote-attributes:t
+ sgml-indent-step:1
+ sgml-indent-data:t
+ sgml-parent-document: "idzebra.xml"
+ sgml-local-catalogs: nil
+ sgml-namecase-general:t
+ End:
+ -->