<chapter id="introduction">
-<title>Introduction</title>
-
-<sect1>
-<title>Overview</title>
-
-<para>
-The Zebra system is a fielded free-text indexing and retrieval engine with a
-Z39.50 frontend. You can use any commercial or freeware Z39.50 client
-to access data stored in Zebra.
-</para>
-
-<para>
-The Zebra server is our first step towards the development of a fully
-configurable, open information system. Eventually, it will be paired
-off with a powerful Z39.50 client to support complex information
-management tasks within almost any application domain. We're making
-the server available now because it's no fun to be in the open
-information retrieval business all by yourself. We want to allow
-people with interesting data to make their things
-available in interesting ways, without having to start out
-by implementing yet another protocol stack from scratch.
-</para>
-
-<para>
-This document is an introduction to the Zebra system. It will tell you
-how to compile the software, and how to prepare your first database.
-It also explains how the server can be configured to give you the
-functionality that you need.
-</para>
-
-<para>
-If you find the software interesting, you should join the support
-mailing-list by sending email to <literal>zebra-request@indexdata.dk</literal>.
-</para>
-
-</sect1>
-
-<sect1 id="features">
-<title>Features</title>
-
-<para>
-This is a list of some of the most important features of the
-system.
-</para>
-
-<para>
-
-<itemizedlist>
-<listitem>
-
-<para>
-Supports updating - records can be added and deleted without
-rebuilding the index from scratch.
-The update procedure is tolerant to crashes or hard interrupts
-during register updating - registers can be reconstructed following a crash.
-Registers can be safely updated even while users are accessing the server.
-
-</para>
-</listitem>
-<listitem>
-
-<para>
-Supports large databases - files for indices, etc. can be
-automatically partitioned over multiple disks.
-
-</para>
-</listitem>
-<listitem>
-
-<para>
-Supports arbitrarily complex records - base input format is an
-SGML-like syntax which allows nested (structured) data elements, as
-well as variant forms of data.
-
-</para>
-</listitem>
-<listitem>
-
-<para>
-Supports random storage formats. A system of input filters driven by
-regular expressions allows you to easily process most ASCII-based
-data formats. SGML, ISO2709 (MARC), and raw text are also supported.
-
-</para>
-</listitem>
-<listitem>
-
-<para>
-Supports boolean queries as well as relevance-ranking (free-text)
-searching. Right truncation and masking in terms are supported, as
-well as full regular expressions.
-
-</para>
-</listitem>
-<listitem>
-
-<para>
-Supports multiple concrete syntaxes
-for record exchange (depending on the configuration): GRS-1, SUTRS,
-ISO2709 (*MARC). Records can be mapped between record syntaxes and
-schema on the fly.
-
-</para>
-</listitem>
-<listitem>
-
-<para>
-Supports approximate matching in registers (ie. spelling mistakes,
-etc).
-
-</para>
-</listitem>
-
-</itemizedlist>
-
-</para>
-
-<para>
-Protocol support:
-</para>
-
-<para>
-
-<itemizedlist>
-<listitem>
-
-<para>
-Protocol facilities: Init, Search, Retrieve, Browse and Sort.
-
-</para>
-</listitem>
-<listitem>
-
-<para>
-Piggy-backed presents are honored in the search-request.
-
-</para>
-</listitem>
-<listitem>
-
-<para>
-Named result sets are supported.
-
-</para>
-</listitem>
-<listitem>
-
-<para>
-Easily configured to support different application profiles, with
-tables for attribute sets, tag sets, and abstract syntaxes.
-Additional tables control facilities such as element mappings to
-different schema (eg., GILS-to-USMARC).
-
-</para>
-</listitem>
-<listitem>
-
-<para>
-Complex composition specifications using Espec-1 are partially
-supported (simple element requests only).
-
-</para>
-</listitem>
-<listitem>
-
-<para>
-Element Set Names are defined using the Espec-1 capability of the
-system, and are given in configuration files as simple element
-requests (and possibly variant requests).
-
-</para>
-</listitem>
-<listitem>
-
-<para>
-Some variant support (not fully implemented yet).
-
-</para>
-</listitem>
-<listitem>
-
-<para>
-Using the YAZ toolkit for the protocol implementation, the
-server can utilise a plug-in XTI/mOSI implementation (not included) to
-provide SR services over an OSI stack, as well as Z39.50 over TCP/IP.
-
-</para>
-</listitem>
-<listitem>
-
-<para>
-Zebra runs on most Unix-like systems as well as Windows NT - a binary
-distribution for Windows NT is forthcoming - so far, the installation
-requires MSVC++ to compile the system (we use version 5.0).
-
-</para>
-</listitem>
-
-</itemizedlist>
-
-</para>
-
-</sect1>
-
-<sect1 id="future">
-<title>Future Work</title>
-
-<para>
-This is a beta-release of the software, to allow you to look at
-it - try it out, and assess whether it can be of use to you.
-</para>
-
-<para>
-These are some of the plans that we have for the software in the near
-and far future, approximately ordered after their relative importance.
-Items marked with an
-asterisk will be implemented before the
-last beta release.
-</para>
-
-<para>
-
-<itemizedlist>
-<listitem>
-
-<para>
-*Complete the support for variants.
-
-</para>
-</listitem>
-<listitem>
-
-<para>
-*Finalize the data element <emphasis>include</emphasis> facility
-to support multimedia data elements in records.
-
-</para>
-</listitem>
-<listitem>
-
-<para>
-Add more sophisticated relevance ranking mechanisms. Add support for soundex
-and stemming. Add relevance <emphasis remap="it">feedback</emphasis> support.
-
-</para>
-</listitem>
-<listitem>
-
-<para>
-Complete EXPLAIN support.
-
-</para>
-</listitem>
-<listitem>
-
-<para>
-Add support for very large records by implementing segmentation and/or
-variant pieces.
-
-</para>
-</listitem>
-<listitem>
-
-<para>
-Support the Item Update extended service of the protocol.
-
-</para>
-</listitem>
-<listitem>
-
-<para>
-We want to add a management system that allows you to
-control your databases and configuration tables from a graphical
-interface. We'll probably use Tcl/Tk to stay platform-independent.
-
-</para>
-</listitem>
-
-</itemizedlist>
-
-</para>
-
-<para>
-Programmers thrive on user feedback. If you are interested in a facility that
-you don't see mentioned here, or if there's something you think we
-could do better, please drop us a mail. If you think it's all really
-neat, you're welcome to drop us a line saying that, too. You'll find
-contact info at the end of this file.
-</para>
-
-</sect1>
+ <!-- $Id: introduction.xml,v 1.43 2007-02-02 11:10:08 marc Exp $ -->
+ <title>Introduction</title>
+
+ <section id="overview">
+ <title>Overview</title>
+
+ <para>
+ &zebra; is a free, fast, friendly information management system. It can
+ index records in &xml;/&sgml;, &marc;, e-mail archives and many other
+ formats, and quickly find them using a combination of boolean
+ searching and relevance ranking. Search-and-retrieve applications can
+ be written using &api;s in a wide variety of languages, communicating
+ with the &zebra; server using industry-standard information-retrieval
+ protocols or web services.
+ </para>
+ <para>
+ &zebra; is Open Source software and can be
+ deployed by anyone for any purpose without license fees. The C source
+ code is open to anybody to read and change under the GPL.
+ </para>
+ <para>
+ &zebra; is a networked component which acts as a reliable &z3950; server
+ for record/document search, presentation, insert, update and
+ delete operations. In addition, it understands the &sru; family of
+ web services, which exist in &rest; &get;/&post; and &soap; flavors.
+ </para>
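+ <para>
+ As a minimal sketch of the &rest; &get; flavor, an &sru;
+ searchRetrieve request is an ordinary URL carrying the standard
+ &sru; 1.1 parameters <literal>version</literal>,
+ <literal>operation</literal>, <literal>query</literal> and
+ <literal>maximumRecords</literal>, with the &cql; query
+ URL-encoded. The example assumes a &zebra; server listening on
+ <literal>localhost:9999</literal>, a database named
+ <literal>Default</literal>, and a &cql; index called
+ <literal>title</literal>; all three are illustrative and depend on
+ the local configuration.
+ </para>
+ <screen>
+ # Hypothetical host, port, database name and CQL index; adjust to your setup.
+ curl 'http://localhost:9999/Default?version=1.1&amp;operation=searchRetrieve&amp;query=title%3Dzebra&amp;maximumRecords=10'
+ </screen>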
+ <para>
+ &zebra; is available as a self-extracting package for MS Windows 2003
+ Server (32 bit) and as precompiled packages for GNU/Debian Linux
+ (32 bit and 64 bit). It has been deployed successfully on other Unix
+ systems, including Sun SPARC, HP-UX, and many variants of Linux and
+ BSD based systems.
+ </para>
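+ <para>
+ <itemizedlist>
+ <listitem><para>&zebra; home page:
+ <ulink url="http://www.indexdata.com/zebra/">http://www.indexdata.com/zebra/</ulink></para></listitem>
+ <listitem><para>Windows packages:
+ <ulink url="http://ftp.indexdata.dk/pub/zebra/win32/">http://ftp.indexdata.dk/pub/zebra/win32/</ulink></para></listitem>
+ <listitem><para>Debian packages:
+ <ulink url="http://ftp.indexdata.dk/pub/zebra/debian/">http://ftp.indexdata.dk/pub/zebra/debian/</ulink></para></listitem>
+ </itemizedlist>
+ </para>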
+
+ <para>
+ <ulink url="http://indexdata.dk/zebra/">&zebra;</ulink>
+ is a high-performance, general-purpose structured text
+ indexing and retrieval engine. It reads records in a
+ variety of input formats (eg. email, &xml;, &marc;) and provides access
+ to them through a powerful combination of boolean search
+ expressions and relevance-ranked free-text queries.
+ </para>
+
+ <para>
+ &zebra; supports large databases (tens of millions of records,
+ tens of gigabytes of data). It allows safe, incremental
+ database updates on live systems. Because &zebra; supports
+ the industry-standard information retrieval protocol, &z3950;,
+ you can search &zebra; databases using an enormous variety of
+ programs and toolkits, both commercial and free, which understand
+ this protocol. Application libraries are available to allow
+ bespoke clients to be written in Perl, C, C++, Java, Tcl, Visual
+ Basic, Python, &php; and more - see the
+ <ulink url="&url.zoom;">&zoom; web site</ulink>
+ for more information on some of these client toolkits.
+ </para>
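+ <para>
+ For instance, the <literal>yaz-client</literal> command-line
+ program from Index Data's YAZ toolkit can be used for a quick
+ &z3950; search: <literal>find</literal> sends a &pqf; query and
+ <literal>show 1</literal> presents the first record of the result
+ set. The session below is a sketch that assumes a &zebra; server
+ listening on <literal>localhost:9999</literal> whose configuration
+ indexes titles under the Bib-1 use attribute 4 (Title); the host,
+ port and indexing setup are assumptions, and the server's responses
+ are omitted.
+ </para>
+ <screen>
+ $ yaz-client localhost:9999    # hypothetical host and port
+ Z> find @attr 1=4 zebra
+ Z> show 1
+ Z> quit
+ </screen>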
+
+ <para>
+ This document is an introduction to the &zebra; system. It explains
+ how to compile the software, how to prepare your first database,
+ and how to configure the server to give you the
+ functionality that you need.
+ </para>
+ </section>
+
+ <section id="features">
+ <title>&zebra; Features Overview</title>
+
+
+ <table id="table-features-overview" frame="top">
+ <title>&zebra; Features Overview</title>
+ <tgroup cols="4">
+ <thead>
+ <row>
+ <entry>Feature</entry>
+ <entry>Availability</entry>
+ <entry>Notes</entry>
+ <entry>Reference</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>Boolean query language</entry>
+ <entry>&cql; and &rpn;/&pqf;</entry>
+ <entry>The type-1 Reverse Polish Notation (&rpn;)
+ and its textual representation, the Prefix Query Format (&pqf;), are
+ supported. The Common Query Language (&cql;) can be configured as
+ a mapping from &cql; to &rpn;/&pqf;.</entry>
+ <entry><xref linkend="querymodel-query-languages-pqf"/>
+ <xref linkend="querymodel-cql-to-pqf"/></entry>
+ </row>
+ <row>
+ <entry>Operation types</entry>
+ <entry> &z3950;/&sru; explain, search, and scan</entry>
+ <entry></entry>
+ <entry><xref linkend="querymodel-operation-types"/></entry>
+ </row>
+ <row>
+ <entry>Recursive boolean query tree</entry>
+ <entry>&cql; and &rpn;/&pqf;</entry>
+ <entry>Both &cql; and &rpn;/&pqf; allow atomic query parts (&apt;) to
+ be combined into complex boolean query trees</entry>
+ <entry><xref linkend="querymodel-rpn-tree"/></entry>
+ </row>
+ <row>
+ <entry>Large databases</entry>
+ <entry>64-bit file pointers</entry>
+ <entry>64-bit file pointers ensure that register files can exceed
+ the 2 GB limit. Logical files can be
+ automatically partitioned over multiple disks, thus allowing for
+ large databases.</entry>
+ <entry></entry>
+ </row>
+ <row>
+ <entry>Complex semi-structured documents</entry>
+ <entry>&xml; and &grs1; documents</entry>
+ <entry>Both &xml; and &grs1; documents exhibit a &dom;-like internal
+ representation allowing for complex indexing and display rules.</entry>
+ <entry></entry>
+ </row>
+ <row>
+ <entry>Database updates</entry>
+ <entry>live, incremental updates</entry>
+ <entry>Robust updating - records can be added and deleted "on the fly"
+ without rebuilding the index from scratch.
+ Records can be safely updated even while users are accessing
+ the server.
+ The update procedure is tolerant to crashes or hard interrupts
+ during database updating - data can be reconstructed following
+ a crash.</entry>
+ <entry></entry>
+ </row>
+ <row>
+ <entry>Input document formats</entry>
+ <entry>&xml;, &sgml;, Text, ISO2709 (&marc;)</entry>
+ <entry>
+ A system of input filters driven by
+ regular expressions allows most ASCII-based
+ data formats to be easily processed.
+ &sgml;, &xml;, ISO2709 (&marc;), and raw text are also
+ supported.</entry>
+ <entry></entry>
+ </row>
+ <row>
+ <entry>Relevance ranking</entry>
+ <entry>TF-IDF-like</entry>
+ <entry>Relevance-ranking of free-text queries is supported
+ using a TF-IDF-like algorithm.</entry>
+ <entry></entry>
+ </row>
+ <row>
+ <entry>Document storage</entry>
+ <entry>Index-only, Key storage, Document storage</entry>
+ <entry>Data can be, and usually is, imported
+ into &zebra;'s own storage, but &zebra; can also refer to
+ external files, building and maintaining indexes of "live"
+ collections.</entry>
+ <entry></entry>
+ </row>
+ <row>
+ <entry>Regular expression matching</entry>
+ <entry>Regexp </entry>
+ <entry>Full regular expression matching and "approximate
+ matching" (eg. spelling mistake corrections) are handled.</entry>
+ <entry></entry>
+ </row>
+ <row>
+ <entry>Search truncation</entry>
+ <entry></entry>
+ <entry></entry>
+ <entry></entry>
+ </row>
+ <row>
+ <entry>Remote update</entry>
+ <entry>&z3950; extended services</entry>
+ <entry></entry>
+ <entry></entry>
+ </row>
+ <row>
+ <entry>Supported Platforms</entry>
+ <entry>UNIX, Linux, Windows (NT/2000/2003/XP)</entry>
+ <entry>&zebra; is written in portable C, so it runs on most
+ Unix-like systems as well as Windows (NT/2000/2003/XP). Binary
+ distributions are
+ available for GNU/Debian Linux and Windows.</entry>
+ <entry></entry>
+ </row>
+ <row>
+ <entry>&z3950;</entry>
+ <entry>&z3950; protocol support</entry>
+ <entry> Protocol facilities: Init, Search, Present (retrieval),
+ Segmentation (support for very large records), Delete, Scan
+ (index browsing), Sort, Close and support for the "update"
+ Extended Service to add a new &xml; record or replace an existing
+ one. Piggy-backed presents are honored in the search
+ request. Named result sets are supported.</entry>
+ <entry></entry>
+ </row>
+ <row>
+ <entry>Record Syntaxes</entry>
+ <entry></entry>
+ <entry> Multiple record syntaxes
+ for data retrieval: &grs1;, &sutrs;,
+ &xml;, ISO2709 (&marc;), etc. Records can be mapped between record syntaxes
+ and schemas on the fly.</entry>
+ <entry></entry>
+ </row>
+ <row>
+ <entry>Web Service support</entry>
+ <entry>&sru_gps;</entry>
+ <entry> The protocol operations <literal>explain</literal>,
+ <literal>searchRetrieve</literal> and <literal>scan</literal>
+ are supported. <ulink url="&url.cql;">&cql;</ulink> to internal
+ query model &rpn; conversion is supported. Extended RPN queries
+ for search/retrieve and scan are supported.</entry>
+ <entry></entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
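+ <para>
+ To illustrate the boolean query languages listed in the table, the
+ following sketch writes the same recursive query tree in &pqf; and
+ in &cql;: records whose title contains <literal>zebra</literal> and
+ whose author is either <literal>adams</literal> or
+ <literal>smith</literal>. The &pqf; form uses the Bib-1 use
+ attributes 4 (Title) and 1003 (Author). The &cql; form only works
+ if the &cql;-to-&pqf; mapping defines the indexes
+ <literal>title</literal> and <literal>author</literal>; these index
+ names and the search terms are assumptions made for illustration.
+ </para>
+ <!-- Hypothetical query terms; the CQL index names assume a local CQL-to-PQF mapping. -->
+ <screen>
+ @and @attr 1=4 zebra @or @attr 1=1003 adams @attr 1=1003 smith
+
+ title=zebra and (author=adams or author=smith)
+ </screen>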
+
+
+
+ </section>
+
+ <section id="introduction-apps">
+ <title>References and &zebra;-based Applications</title>
+ <para>
+ &zebra; has been deployed in numerous applications, in both the
+ academic and commercial worlds, in application domains as diverse
+ as bibliographic catalogues, geospatial information, structured
+ vocabulary browsing, government information locators, civic
+ information systems, environmental observations, museum information
+ and web indexes.
+ </para>
+ <para>
+ Notable applications include the following:
+ </para>
+
+
+ <section id="koha-ils">
+ <title>Koha free open-source ILS</title>
+ <para>
+ <ulink url="http://www.koha.org/">Koha</ulink> is a full-featured
+ open-source ILS, initially developed in
+ New Zealand by Katipo Communications Ltd, and first deployed in
+ January of 2000 for Horowhenua Library Trust. It is currently
+ maintained by a team of software providers and library technology
+ staff from around the globe.
+ </para>
+ <para>
+ <ulink url="http://liblime.com/">LibLime</ulink>,
+ a company that markets and supports Koha, is adding the &zebra;
+ database server in the new Koha 3.0 release to drive its
+ bibliographic database.
+ </para>
+ <para>
+ In early 2005, the Koha project development team began looking at
+ ways to improve &marc; support and overcome scalability limitations
+ in the Koha 2.x series. After extensive evaluations of the best
+ of the Open Source textual database engines - including MySQL
+ full-text searching, PostgreSQL, Lucene and Plucene - the team
+ selected &zebra;.
+ </para>
+ <para>
+ "&zebra; completely eliminates scalability limitations, because it
+ can support tens of millions of records," explained Joshua
+ Ferraro, LibLime's Technology President and Koha's Project
+ Release Manager. "Our performance tests showed search results in
+ under a second for databases with over 5 million records on a
+ modest i386 900MHz test server."
+ </para>
+ <para>
+ "&zebra; also includes support for true boolean search expressions
+ and relevance-ranked free-text queries, both of which the Koha
+ 2.x series lack. &zebra; also supports incremental and safe
+ database updates, which allow on-the-fly record
+ management. Finally, since &zebra; has at its heart the &z3950;
+ protocol, it greatly improves Koha's support for that critical
+ library standard."
+ </para>
+ <para>
+ Although the bibliographic database will be moved to &zebra;, Koha
+ 3.0 will continue to use a relational SQL-based database design
+ for the 'factual' database. "Relational database managers have
+ their strengths, in spite of their inability to handle large
+ numbers of bibliographic records efficiently," summed up Ferraro.
+ "We're taking the best from both worlds in our redesigned Koha
+ 3.0."
+ </para>
+ <para>
+ See also LibLime's newsletter article
+ <ulink url="http://www.liblime.com/newsletter/2006/01/features/koha-earns-its-stripes/">
+ Koha Earns its Stripes</ulink>.
+ </para>
+ </section>
+
+ <section id="emilda-ils">
+ <title>Emilda open source ILS</title>
+ <para>
+ <ulink url="http://www.emilda.org/">Emilda</ulink>
+ is a complete Integrated Library System, released under the
+ GNU General Public License. It has a
+ full-featured Web-OPAC, allowing comprehensive system management
+ from virtually any computer with an Internet connection, has a
+ template-based layout allowing anyone to alter the visual
+ appearance of Emilda, and uses an &xml;-based language system
+ for fast and easy translation into virtually any
+ language.
+ Currently, Emilda is used at three schools in Espoo, Finland.
+ </para>
+ <para>
+ As a bonus, 100% &marc; compatibility has been achieved using the
+ &zebra; Server from Index Data as backend server.
+ </para>
+ </section>
+
+ <section id="reindex-ils">
+ <title>ReIndex.Net web based ILS</title>
+ <para>
+ <ulink url="http://www.reindex.net/index.php?lang=en">Reindex.net</ulink>
+ is a web-based library service offering all
+ traditional functions at a very high level plus many new
+ services. Reindex.net is a comprehensive and powerful web system
+ based on standards such as &xml; and &z3950;.
+ Reindex supports &marc21;, dan&marc; or Dublin Core with
+ UTF8-encoding.
+ </para>
+ <para>
+ Reindex.net runs on GNU/Debian Linux with &zebra; and Simpleserver
+ from Index
+ Data for bibliographic data. The relational database system
+ Sybase 9 &xml; is used for
+ administrative data.
+ Internally &marcxml; is used for bibliographical records. Update
+ utilizes &z3950; extended services.
+ </para>
+ </section>
+
+ <section id="dads-article-database">
+ <title>DADS - the DTV Article Database
+ Service</title>
+ <para>
+ DADS is a huge database of more than ten million records, totalling
+ over ten gigabytes of data. The records are metadata about academic
+ journal articles, primarily scientific; about 10% of these
+ metadata records link to the full text of the articles they
+ describe, a body of about a terabyte of information (although the
+ full text is not indexed).
+ </para>
+ <para>
+ It allows students and researchers at DTU (Danmarks Tekniske
+ Universitet, the Technical University of Denmark) to find and order
+ articles from multiple databases in a single query. The database
+ contains literature on all engineering subjects. It's available
+ on-line through a web gateway, though currently only to registered
+ users.
+ </para>
+ <para>
+ More information can be found at
+ <ulink url="http://www.dtv.dk/"/> and
+ <ulink url="http://dads.dtv.dk"/>.
+ </para>
+ </section>
+
+ <section id="infonet-eprints">
+ <title>Infonet Eprints</title>
+ <para>
+ The InfoNet Eprints service from the
+ <ulink url="http://www.dtv.dk/">
+ Technical Knowledge Center of Denmark</ulink>
+ provides access to documents stored in
+ eprint/preprint servers and institutional research archives around
+ the world. The service is based on Open Archives Initiative metadata
+ harvesting of selected scientific archives around the world. These
+ open archives offer free and unrestricted access to their contents.
+ </para>
+ <para>
+ Infonet Eprints currently holds 1.4 million records from 16 archives.
+ The online search facility is found at
+ <ulink url="http://preprints.cvt.dk"/>.
+ </para>
+ </section>
+
+ <section id="alvis-project">
+ <title>Alvis</title>
+ <para>
+ The <ulink url="http://www.alvis.info/alvis/">Alvis</ulink> EU
+ project run under the 6th Framework (IST-1-002068-STP)
+ is building a semantic-based peer-to-peer search engine. A
+ consortium of eleven partners from six different European
+ Community countries plus Switzerland and China contribute
+ with expertise in a broad range of specialties including network
+ topologies, routing algorithms, linguistic analysis and
+ bioinformatics.
+ </para>
+ <para>
+ The &zebra; information retrieval indexing machine is used inside
+ the Alvis framework to
+ manage huge collections of natural language processed and
+ enhanced &xml; data, coming from a topic relevant web crawl.
+ In this application, &zebra; swallows and manages 37GB of &xml; data
+ in about 4 hours, resulting in search times of fractions of
+ a second.
+ </para>
+ </section>
+
+
+ <section id="uls">
+ <title>ULS (Union List of Serials)</title>
+ <para>
+ The M25 Systems Team
+ has created a union catalogue for the periodicals of the
+ twenty-one constituent libraries of the University of London and
+ the University of Westminster
+ (<ulink url="http://www.m25lib.ac.uk/ULS/"/>).
+ They have achieved this using an
+ unusual architecture, which they describe as a
+ "non-distributed virtual union catalogue".
+ </para>
+ <para>
+ The member libraries send in data files representing their
+ periodicals, including both brief bibliographic data and summary
+ holdings. Then 21 individual &z3950; targets are created, each
+ using &zebra;, and all mounted on the single hardware server.
+ The live service provides a web gateway allowing &z3950; searching
+ of all of the targets or a selection of them. &zebra;'s small
+ footprint allows a relatively modest system to comfortably host
+ the 21 servers.
+ </para>
+ <para>
+ More information can be found at
+ <ulink url="http://www.m25lib.ac.uk/ULS/"/>.
+ </para>
+ </section>
+
+ <section id="nli">
+ <title>NLI-&z3950; - a Natural Language Interface for Libraries</title>
+ <para>
+ Fernuniversität Hagen in Germany have developed a natural
+ language interface for access to library databases.
+ <!-- <ulink
+ url="http://ki212.fernuni-hagen.de/nli/NLIintro.html"/> -->
+ In order to evaluate this interface for recall and precision, they
+ chose &zebra; as the basis for retrieval effectiveness. The &zebra;
+ server contains a copy of the GIRT database, consisting of more
+ than 76000 records in &sgml; format (bibliographic records from
+ social science), which are mapped to &marc; for presentation.
+ </para>
+ <para>
+ (GIRT is the German Indexing and Retrieval Testdatabase. It is a
+ standard German-language test database for intelligent indexing
+ and retrieval systems. See
+ <ulink url="http://www.gesis.org/forschung/informationstechnologie/clef-delos.htm"/>)
+ </para>
+ <para>
+ Evaluation will take place as part of the TREC/CLEF campaign 2003
+ <ulink url="http://clef.iei.pi.cnr.it"/>.
+ <!-- or <ulink url="http://www4.eurospider.ch/CLEF/"/> -->
+ </para>
+ <para>
+ For more information, contact Johannes Leveling
+ <email>Johannes.Leveling@FernUni-Hagen.De</email>
+ </para>
+ </section>
+
+ <section id="various-web-indexes">
+ <title>Various web indexes</title>
+ <para>
+ &zebra; has been used by a variety of institutions to construct
+ indexes of large web sites, typically in the region of tens of
+ millions of pages. In this role, it functions somewhat similarly
+ to the engines of Google or AltaVista, but for a selected intranet
+ or a subset of the whole Web.
+ </para>
+ <para>
+ For example, Liverpool University's web-search facility (see
+ the home page at
+ <ulink url="http://www.liv.ac.uk/"/>
+ and many sub-pages) works by relevance-searching a &zebra; database
+ which is populated by the Harvest-NG web-crawling software.
+ </para>
+ <para>
+ For more information on Liverpool University's intranet search
+ architecture, contact John Gilbertson
+ <email>jgilbert@liverpool.ac.uk</email>
+ </para>
+ <para>
+ Kang-Jin Lee
+ has recently modified the Harvest web indexer to use &zebra; as
+ its native repository engine. His comments on the switch over
+ from the old engine are revealing:
+ <blockquote>
+ <para>
+ The first results after some testing with &zebra; are very
+ promising. The tests were done with around 220,000 SOIF files,
+ which occupies 1.6GB of disk space.
+ </para>
+ <para>
+ Building the index from scratch takes around one hour with &zebra;
+ where [old-engine] needs around five hours. While [old-engine]
+ blocks search requests when updating its index, &zebra; can still
+ answer search requests.
+ [...]
+ &zebra; supports incremental indexing which will speed up indexing
+ even further.
+ </para>
+ <para>
+ While the search time of [old-engine] varies from some seconds
+ to some minutes depending how expensive the query is, &zebra;
+ usually takes around one to three seconds, even for expensive
+ queries.
+ [...]
+ &zebra; can search more than 100 times faster than [old-engine]
+ and can process multiple search requests simultaneously
+ </para>
+ <para>
+ I am very happy to see such nice software available under GPL.
+ </para>
+ </blockquote>
+ </para>
+ </section>
+ </section>
+
+
+ <section id="introduction-support">
+ <title>Support</title>
+ <para>
+ You can get support for &zebra; from at least three sources.
+ </para>
+ <para>
+ First, there's the &zebra; web site at
+ <ulink url="&url.idzebra;"/>,
+ which always has the most recent version available for download.
+ If you have a problem with &zebra;, the first thing to do is see
+ whether it's fixed in the current release.
+ </para>
+ <para>
+ Second, there's the &zebra; mailing list. Its home page at
+ <ulink url="&url.idzebra.mailinglist;"/>
+ includes a complete archive of all messages that have ever been
+ posted on the list. The &zebra; mailing list is used both for
+ announcements from the authors (new
+ releases, bug fixes, etc.) and general discussion. You are welcome
+ to seek support there. Join by filling in the form on the list home page.
+ </para>
+ <para>
+ Third, it's possible to buy a commercial support contract, with
+ well defined service levels and response times, from Index Data.
+ See
+ <ulink url="&url.indexdata.support;"/>
+ for details.
+ </para>
+ </section>
+
+
+ <section id="future">
+ <title>Future Directions</title>
+
+ <para>
+ These are some of the plans that we have for the software in the near
+ and far future, ordered approximately as we expect to work on them.
+ </para>
+
+ <para>
+ <itemizedlist>
+
+ <listitem>
+ <para>
+ Improved support for &xml; in search and retrieval. Eventually,
+ the goal is for &zebra; to pull double duty as a flexible
+ information retrieval engine and high-performance &xml;
+ repository. The recent addition of XPath searching is one
+ example of the kind of enhancement we're working on.
+ </para>
+ <para>
+ There is also the experimental <literal>ALVIS &xslt;</literal>
+ &xml; input filter, which unleashes the full power of &dom; based
+ &xslt; transformations during indexing and record retrieval. Work
+ on this filter has been sponsored by the ALVIS EU project
+ <ulink url="http://www.alvis.info/alvis/"/>. We expect this filter to
+ mature soon, as it is planned to be included in the version 2.0
+ release of &zebra;.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Finalisation and documentation of &zebra;'s C programming
+ &api;, allowing updates, database management and other functions
+ not readily expressed in &z3950;. We will also consider
+ exposing the &api; through &soap;.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Improved free-text searching. We're first and foremost octet jockeys and
+ we're actively looking for organisations or people who'd like
+ to contribute experience in relevance ranking and text
+ searching.
+ </para>
+ </listitem>
+
+ </itemizedlist>
+ </para>
+
+ <para>
+ Programmers thrive on user feedback. If you are interested in a
+ facility that you don't see mentioned here, or if there's something
+ you think we could do better, please drop us a mail. Better still,
+ implement it and send us the patches.
+ </para>
+ <para>
+ If you think it's all really neat, you're welcome to drop us a line
+ saying that, too. You can email us on
+ <email>info@indexdata.dk</email>
+ or check the contact info at the end of this manual.
+ </para>
+
+ </section>
</chapter>
+ <!-- Keep this comment at the end of the file
+ Local variables:
+ mode: sgml
+ sgml-omittag:t
+ sgml-shorttag:t
+ sgml-minimize-attributes:nil
+ sgml-always-quote-attributes:t
+ sgml-indent-step:1
+ sgml-indent-data:t
+ sgml-parent-document: "zebra.xml"
+ sgml-local-catalogs: nil
+ sgml-namecase-general:t
+ End:
+ -->