<para>
As mentioned earlier, &zebra; places few restrictions on the type of
data that you can index and manage. Generally, whatever the form of
indexing maintenance utility, and the <command>zebrasrv</command>
information query and retrieval server. Both use some of the
same main components, which are presented here.
The virtual Debian package <literal>idzebra-2.0</literal>
installs all the necessary packages to start
working with &zebra; - including utility programs, development libraries,
<section id="componentcore">
<title>Core &zebra; Libraries Containing Common Functionality</title>
<para>
The core &zebra; module is the meat of the <command>zebraidx</command>
indexing maintenance utility, and the <command>zebrasrv</command>
information query and retrieval server binaries. In short, the core
<variablelist>
<varlistentry>
<term>Dynamic Loading</term>
<listitem>
<para>of external filter modules, in case the application is
not compiled statically. These filter modules define indexing,
construction of hit lists according to boolean combinations
of simpler searches. Fast performance is achieved by careful
use of index structures, and by evaluation specific index hit
components call resorting/re-ranking algorithms on the hit
sets. These might also be pre-sorted not only using the
assigned document IDs, but also using assigned static rank
<para>
The Debian package <literal>libidzebra-2.0</literal>
contains all run-time libraries for &zebra;. The
documentation in PDF and HTML is found in
<literal>idzebra-2.0-doc</literal>, and
<literal>idzebra-2.0-common</literal>
includes common essential &zebra; configuration files.
</para>
</section>
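<para>
The boolean construction of hit lists mentioned for the core libraries
can be illustrated with a small sketch. It assumes that every hit list
is a sorted list of unique document IDs, which lets AND, OR and AND-NOT
be computed in a single linear merge pass. The function names are
hypothetical, and this is not &zebra;'s internal code.
</para>

```python
# Sketch only: boolean combination of hit lists, assuming each hit list
# is a sorted list of unique document IDs (hypothetical representation).


def hits_and(a, b):
    """Intersection of two sorted ID lists in one linear pass."""
    result = []
    i = j = 0
    while i != len(a) and j != len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif b[j] > a[i]:
            i += 1  # a[i] cannot occur in b, advance a
        else:
            j += 1  # b[j] cannot occur in a, advance b
    return result


def hits_or(a, b):
    """Union of two sorted ID lists, keeping the result sorted."""
    result = []
    i = j = 0
    while i != len(a) and j != len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif b[j] > a[i]:
            result.append(a[i])
            i += 1
        else:
            result.append(b[j])
            j += 1
    result.extend(a[i:])  # at most one of these tails is non-empty
    result.extend(b[j:])
    return result


def hits_not(a, b):
    """IDs in a that are not in b (a AND NOT b)."""
    result = []
    i = j = 0
    while i != len(a):
        if j == len(b) or b[j] > a[i]:
            result.append(a[i])
            i += 1
        elif a[i] == b[j]:
            i += 1
            j += 1
        else:
            j += 1
    return result


title_hits = [2, 5, 9, 14]
author_hits = [5, 7, 14, 20]
print(hits_and(title_hits, author_hits))  # [5, 14]
print(hits_or(title_hits, author_hits))   # [2, 5, 7, 9, 14, 20]
print(hits_not(title_hits, author_hits))  # [2, 9]
```

Nested queries compose naturally, for example
<literal>hits_and(hits_or(a, b), c)</literal>, because every operator
returns a list in the same sorted form it consumes.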
<section id="componentindexer">
<title>&zebra; Indexer</title>
<para>
The <command>zebraidx</command>
loads external filter modules used for indexing data records of
different types, and creates, updates and drops databases and
indexes according to the rules defined in the filter modules.
The Debian package <literal>idzebra-2.0-utils</literal> contains
the <command>zebraidx</command> utility.
</para>
<para>
This is the executable which runs the &acro.z3950;/&acro.sru;/&acro.srw; server and
glues together the core libraries and the filter modules to one
The Debian package <literal>idzebra-2.0-utils</literal> contains
the <command>zebrasrv</command> utility.
</para>
toolkit that allows you to develop software using the
&acro.ansi; &acro.z3950;/ISO23950 standard for information retrieval.
<emphasis>how</emphasis> to do it, and <emphasis>which</emphasis>
part of the records to send in a search/retrieve response is
various filter modules. It is their responsibility to define the
exact indexing and record display filtering rules.
</para>
<para>
The virtual Debian package
<literal>libidzebra-2.0-modules</literal> installs all base filter
</para>
<section id="componentmodulesdom">
<title>&acro.dom; &acro.xml; Record Model and Filter Module</title>
<para>
The &acro.dom; &acro.xml; filter uses a standard &acro.dom; &acro.xml; structure as
internal data model, and can thus parse, index, and display
any &acro.xml; document.
</para>
<para>
A parser for binary &acro.marc; records based on the ISO2709 library
standard is provided; it transforms these to the internal
</para>
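<para>
To make the ISO2709 layout concrete, here is a minimal reader sketch.
It assumes &acro.marc; 21 conventions (a 24-byte leader, 12-byte directory
entries holding a 3-character tag, a 4-digit field length and a 5-digit
start position, byte 0x1E as field terminator, 0x1F as subfield
delimiter, and UTF-8 data). It is an illustration only, not the parser
that ships with &zebra;; the <literal>build_iso2709</literal> helper is
hypothetical and exists only to produce a test record.
</para>

```python
# Minimal ISO 2709 reader sketch. Assumes MARC 21 conventions: 24-byte
# leader, 12-byte directory entries (3-char tag, 4-digit length, 5-digit
# start), UTF-8 data. Illustration only, not Zebra's own MARC parser.

FT = 0x1E  # field terminator
SF = 0x1F  # subfield delimiter
RT = 0x1D  # record terminator


def parse_iso2709(record):
    """Return (leader, fields) for one ISO 2709 record given as bytes."""
    leader = record[:24].decode("ascii")
    base = int(leader[12:17])  # base address of data
    fields = []
    pos = 24
    while record[pos] != FT:  # the directory ends with a field terminator
        entry = record[pos:pos + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        data = record[base + start:base + start + length - 1]  # strip FT
        if tag.startswith("00"):  # control field: no indicators/subfields
            fields.append((tag, data.decode("utf-8")))
        else:
            indicators = data[:2].decode("utf-8")
            subfields = [(chunk[:1].decode("utf-8"), chunk[1:].decode("utf-8"))
                         for chunk in data[2:].split(bytes([SF])) if chunk]
            fields.append((tag, indicators, subfields))
        pos += 12
    return leader, fields


def build_iso2709(fields):
    """Hypothetical test helper: assemble a record from (tag, body) pairs,
    where body already holds indicators and subfield delimiters."""
    directory, data = b"", b""
    for tag, body in fields:
        body += bytes([FT])
        directory += f"{tag}{len(body):04d}{len(data):05d}".encode("ascii")
        data += body
    directory += bytes([FT])
    base = 24 + len(directory)
    total = base + len(data) + 1  # +1 for the record terminator
    leader = f"{total:05d}nam a22{base:05d} a 4500".encode("ascii")
    return leader + directory + data + bytes([RT])


rec = build_iso2709([
    ("001", b"12345"),
    ("245", b"10" + bytes([SF]) + b"aZebra indexing" + bytes([SF]) + b"cIndex Data"),
])
leader, fields = parse_iso2709(rec)
print(fields)
# [('001', '12345'), ('245', '10', [('a', 'Zebra indexing'), ('c', 'Index Data')])]
```

The directory is what makes ISO2709 cheap to parse: each entry gives the
exact byte offset of its field relative to the base address stored in the
leader, so no scanning of the data area is needed.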
<para>
The internal &acro.dom; &acro.xml; representation can be fed into four
different pipelines, consisting of arbitrarily many successive
<itemizedlist>
<listitem><para>input parsing and initial
transformations,</para></listitem>
<xref linkend="componentmodulesdom"/>.
</para>
</note>
<para>
The Alvis filter for &acro.xml; files is an &acro.xslt; based input
It indexes element and attribute content of any conceivable &acro.xml; format
using full &acro.xpath; support, a feature which the standard &zebra;
&acro.grs1; &acro.sgml; and &acro.xml; filters lacked. The indexed documents are
uses &acro.xslt; display stylesheets, which let
the &zebra; DB administrator associate multiple, different views on
the same &acro.xml; document type. These views are chosen on-the-fly in
Finally, the Alvis filter allows for static ranking at index
time, and for sorting hit lists according to predefined
static ranks. This imposes no overhead at all, both
<emphasis>O(1)</emphasis> irrespective of document
collection size. This feature resembles Google's pre-ranking using
their PageRank algorithm.
</para>
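<para>
The effect of pre-sorting by static rank can be sketched as follows. The
sketch assumes that each document carries a numeric rank fixed at index
time (higher is better) and that posting lists are stored in descending
rank order; all names are hypothetical and do not describe the Alvis
filter's actual data structures. Because the sort happens once while
indexing, fetching the best k hits at query time is a slice whose cost
depends on k, not on the size of the collection.
</para>

```python
# Sketch of static-rank pre-sorting. Assumptions (not from Zebra): each
# document has a numeric static rank fixed at index time, higher is better,
# and a posting list maps a term to document IDs.


def build_postings(docs):
    """docs: dict doc_id -> (static_rank, list_of_terms).

    Builds term -> [doc_id, ...] with each list sorted once, at index time,
    by descending static rank (ties broken by ascending doc_id)."""
    postings = {}
    for doc_id, (rank, terms) in docs.items():
        for term in set(terms):
            postings.setdefault(term, []).append(doc_id)
    for ids in postings.values():
        ids.sort(key=lambda d: (-docs[d][0], d))
    return postings


def top_hits(postings, term, k):
    """Query time: the list is already in rank order, so the k best hits
    are its first k entries and no per-query sort is performed."""
    return postings.get(term, [])[:k]


docs = {
    1: (0.2, ["zebra", "index"]),
    2: (0.9, ["zebra"]),
    3: (0.5, ["zebra", "xslt"]),
}
postings = build_postings(docs)
print(top_hits(postings, "zebra", 2))  # [2, 3]
```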
<para>
<xref linkend="grs"/>
are all based on the &acro.z3950; specifications, and it is absolutely
mandatory to have the reference pages on &acro.bib1; attribute sets on
<literal>libidzebra-2.0-mod-grs-marc</literal>.
</para>
<para>
&acro.grs1; TCL scriptable filters for extensive user configuration come
a general scriptable TCL filter called
<emphasis>grs.tcl</emphasis>
are both included in the
<literal>libidzebra-2.0-mod-grs-regx</literal> Debian package.
</para>
<para>
A general purpose &acro.sgml; filter is called
<emphasis>grs.sgml</emphasis>. This filter is not yet packaged,
parse records in &acro.xml; and turn them into ID&zebra;'s internal &acro.grs1; node
trees. Also have a look at the Alvis &acro.xml;/&acro.xslt; filter described in
the next section.
</para>
</section>
<para>
When records are accessed by the system, they are represented
in their local, or native format. This might be &acro.sgml; or HTML files,
as if it were a normal record, while in reality it is a
recap of most "important" terms in a result set for the fields
The special
<literal>zebra::data</literal> element set name is
defined for any record syntax, but will always fetch
displays all available metadata on the record. These include system
number, database name, indexed filename, filter used for indexing,
score and static ranking information and finally bytesize of record.
indexed how and in which indexes. Using the indexing stylesheet of
the Alvis filter, one can at least see which portion of the record
went into which index, but a similar aid does not exist for all
</para>
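<para>
The result-set term recap described above can be approximated by counting
terms per field across the hits. This sketch assumes that "important"
means "most frequent" and that each record is available as a mapping from
field name to its extracted terms; both are assumptions for illustration,
not a description of how &zebra; actually computes the recap.
</para>

```python
from collections import Counter

# Hypothetical sketch: "important" is assumed to mean "most frequent",
# and each hit is assumed to be a dict mapping field name -> list of terms.


def term_recap(result_set, fields, n=3):
    """Return the n most frequent terms per field over the result set."""
    recap = {}
    for field in fields:
        counts = Counter()
        for record in result_set:
            counts.update(record.get(field, []))
        # most_common keeps first-encountered order for equal counts
        recap[field] = counts.most_common(n)
    return recap


hits = [
    {"subject": ["xml", "indexing"], "author": ["smith"]},
    {"subject": ["xml", "retrieval"], "author": ["jones"]},
    {"subject": ["xml", "indexing"], "author": ["smith"]},
]
print(term_recap(hits, ["subject", "author"], n=2))
# {'subject': [('xml', 3), ('indexing', 2)], 'author': [('smith', 2), ('jones', 1)]}
```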
<para>
The special
<literal>zebra::index</literal> element set names are provided to
access information on per-record indexed fields. For example, the
</screen>
will display all indexed tokens from all indexed fields of the
first record, and it will display in <literal>&acro.sutrs;</literal>
displays in <literal>&acro.xml;</literal> record syntax only the content
of the zebra string index <literal>title</literal>, or
even only the type <literal>p</literal> phrase indexed part of it.