[Zebralist] indexing streams
quinn at indexdata.com
Fri Jun 30 06:56:27 CEST 2006
This one gave me a good chuckle.
Eric, I'm not 100% sure how well this is described in the manual today
-- especially in whatever version you have, because this stuff is
changing all the time and the CVS version keeps acquiring both new
functionality and new documentation. Below is an overview I gathered
together a while back, trying to piece together the elements of how to
do remote updating using Z39.50 (and hence using the ZOOM API). The
example given is in PHP. Mike may be able to produce an example in Perl
if it isn't obvious.
PS: When it says the record update has to be XML formatted, in the case
of MARC, this means MARCXML. It's a temporary but persistent source of
annoyance for some that Zebra can't ingest good old ISO2709 via Z39.50
extended services -- only via the command line...
SH: This is an attempt to summarize what I can glean from various
emails, looking at code, NEWS-entries and documentation about using
extended services to support remote updating. If this can be validated,
it should probably be worked into the documentation and/or a new
whitepaper on embedding Zebra. PLEASE HELP ME CORRECT/FILL IN DETAILS
In the abstract (protocol level)
These are the main input parameters to an update
The extended service type. Should be set to 'update'.
The action to perform. According to the source code, Zebra recognizes
four actions:
* recordInsert. Will fail if the record already exists.
* recordReplace. Will fail if the record does not exist.
* recordDelete. Will fail if the record does not exist.
* specialUpdate. Will insert or update the record as needed.
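To make the difference between the four actions concrete, here is a toy in-memory model of the success and failure rules listed above. It is only a sketch: apply_action() and the $store array are hypothetical names invented for illustration, not part of Zebra or PHP/YAZ, and real Zebra behaviour may differ in details.

```php
<?php
// Toy in-memory model of the four update actions; NOT Zebra code.
// $store maps a record identifier to the record content.
function apply_action(array &$store, string $action, string $id, ?string $record = null): bool
{
    $exists = array_key_exists($id, $store);
    switch ($action) {
        case 'recordInsert':      // fails if the record already exists
            if ($exists) return false;
            $store[$id] = $record;
            return true;
        case 'recordReplace':     // fails if the record does not exist
            if (!$exists) return false;
            $store[$id] = $record;
            return true;
        case 'recordDelete':      // fails if the record does not exist
            if (!$exists) return false;
            unset($store[$id]);
            return true;
        case 'specialUpdate':     // inserts or replaces as needed
            $store[$id] = $record;
            return true;
        default:
            throw new InvalidArgumentException("Unknown action: $action");
    }
}
```

In this model, specialUpdate is the only action that succeeds whether or not the record is already present, which is why it is convenient when the client does not track what the server holds.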
The record itself. Presently, this has to be XML formatted.
The record syntax. Presently, this has to be set to XML. It would be cool to use this to
carry information about record types or similar to Zebra, but because it
is represented as an OID, we would need some form of proprietary mapping
scheme between record type strings and OIDs.
However, as a minimum, it would be extremely useful to enable people to
use MARC21, assuming grs.marcxml.marc21 as a record type.
This is a client-supplied, opaque record identifier. The client software
is responsible for assigning these to records. Providing one of these is
optional. If a record is inserted, it is given this identifier. XX ARE
THESE VISIBLE IN RETRIEVAL RECORDS?? ARE THEY SEARCHABLE??
This is Zebra's internal system number. It is an error to provide one of
these when creating a new record. When retrieving existing records, the
ID number is returned in the field /*/id:idzebra/localnumber,
xmlns:id="http://www.indexdata.dk/zebra/". You can search for internal
record numbers by setting @attr
(internal record IDs)
If neither of the two identifier types above is supplied, Zebra will
attempt to use record identifiers derived from the record itself; see
http://www.indexdata.dk/zebra/doc/generic-ids.tkl (XX IS THIS CORRECT?)
The name of the database to which this operation should be applied.
Other remote operations related to updating
Configuration options related to updating
(the documentation for the Zebra configuration file is at
Zebra will not, by default, allow just any client to update records
(wouldn't that be something). In order to support this, you need to
identify specific users (with associated access credentials) and
permission levels to support updating.
Use the setting
to *only* allow the user 'admin' to update Zebra. You can associate a
password with that user by creating a password file. Use the setting
to load a password file named 'passwords'. Use a tool like 'htpasswd' to
maintain the encrypted passwords.
You can still allow anonymous users to *search* your database by adding
To tell Zebra to store your records internally, set
The record type should be set to:
In order to support modifications/deletion of records, set
To enable shadow indexing (which ought to be extra important for this
type of update), set
shadow: directoryname: size (e.g. 1000M)
Doing it in PHP
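// Placeholder address; connect as a user with update permission.
$yaz = yaz_connect('localhost:9999');
$record = '<record><title>A fine specimen of a record</title></record>';
$options = array(
    'action' => 'recordInsert',
    'syntax' => 'xml',
    'record' => $record,
    'databaseName' => 'mydatabase'
);
// Queue the update and the commit, then wait for the server to respond.
yaz_es($yaz, 'update', $options);
yaz_es($yaz, 'commit', array());
yaz_wait();
if ($error = yaz_error($yaz))
    echo "$error\n";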
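Since a mistyped action name is easy to miss, one option is to build the options array through a small helper that checks the action against the four names listed earlier. This is a minimal sketch: build_update_options() is a hypothetical function made up for illustration, not part of PHP/YAZ, and it only uses the option keys that appear in the example above.

```php
<?php
// Hypothetical helper, not part of PHP/YAZ: builds the options array
// that the example above passes to yaz_es() for an 'update' request.
function build_update_options(string $action, string $record, string $database): array
{
    // The four actions Zebra recognizes, per the list earlier in this mail.
    $actions = ['recordInsert', 'recordReplace', 'recordDelete', 'specialUpdate'];
    if (!in_array($action, $actions, true)) {
        throw new InvalidArgumentException("Unknown action: $action");
    }
    return [
        'action'       => $action,
        'syntax'       => 'xml',   // presently the only accepted syntax
        'record'       => $record,
        'databaseName' => $database,
    ];
}
```

It would then be used as yaz_es($yaz, 'update', build_update_options('specialUpdate', $record, 'mydatabase')), followed by the commit and yaz_wait() as before.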
Doing it in Perl
Mike, would you be able to quickly provide some example code here?
Eric Lease Morgan wrote:
> On Jun 29, 2006, at 9:22 PM, Joshua Ferraro wrote:
>>> Does Zebra allow you to index streams of text, or does it just index
>> You're looking for Net::Z3950::ZOOM
> Oops! It looks as if I could use the ZOOM::Package class to create,
> drop, commit, and update databases/indexes. Fun! Z39.50 gets richer
> and richer.
Sebastian Hammer, Index Data
quinn at indexdata.com www.indexdata.com
Ph: (603) 209-6853