[Zebralist] indexing streams

Sebastian Hammer quinn at indexdata.com
Fri Jun 30 06:56:27 CEST 2006


This one gave me a good chuckle.

Eric, I'm not 100% sure how well this is described in the manual today 
-- especially in whatever version you have, because this stuff is 
changing all the time and the CVS version keeps acquiring both new 
functionality and new documentation. Below is an overview I put 
together a while back, trying to piece together the elements of remote 
updating using Z39.50 (and hence the ZOOM API). The example given is in 
PHP. Mike may be able to produce an example in Perl if it isn't obvious.

PS: When it says the record update has to be XML formatted, in the case 
of MARC, this means MARCXML. It's a temporary but persistent source of 
annoyance for some that Zebra can't ingest good old ISO2709 via Z39.50 
extended services -- only via the command line...

--------

SH: This is an attempt to summarize what I can glean from various 
emails, looking at code, NEWS-entries and documentation about using 
extended services to support remote updating. If this can be validated, 
it should probably be worked into the documentation and/or a new 
whitepaper on embedding Zebra. PLEASE HELP ME CORRECT/FILL IN DETAILS 
I'VE MISSED.


      In the abstract (protocol level)

These are the main input parameters to an update:


        Type

The extended service action type. Should be set to 'update'.


        Action

According to the source code, Zebra recognizes 4 actions:

    * recordInsert. Will fail if the record already exists.
    * recordReplace. Will fail if the record does not exist.
    * recordDelete. Will fail if the record does not exist.
    * specialUpdate. Will insert or update the record as needed.


        Record

Presently, this has to be XML formatted.


        Syntax

Presently, this has to be set to XML. It would be cool to use this to 
carry information about record types or similar to Zebra, but because it 
is represented as an OID, we would need some form of proprietary mapping 
scheme between record type strings and OIDs.

However, at a minimum, it would be extremely useful to enable people to 
use MARC21, assuming grs.marcxml.marc21 as a record type.


        recordIdOpaque

This is a client-supplied, opaque record identifier. The client software 
is responsible for assigning these to records. Providing one of these is 
optional. If a record is inserted, it is given this identifier. XX ARE 
THESE VISIBLE IN RETRIEVAL RECORDS?? ARE THEY SEARCHABLE??


        recordIdNumber

This is Zebra's internal system number. It is an error to provide one of 
these when creating a new record. When retrieving existing records, the 
ID number is returned in the field /*/id:idzebra/localnumber, where 
xmlns:id="http://www.indexdata.dk/zebra/". You can search for internal 
record numbers by setting @attr


        (internal record IDs)

If neither of the two identifier types above is supplied, Zebra will 
attempt to use record identifiers derived from the record; see 
http://www.indexdata.dk/zebra/doc/generic-ids.tkl (XX IS THIS CORRECT?)


        databaseName

The name of the database to which this operation should be applied.


      Other remote operations related to updating


      Configuration options related to updating

(the documentation for the Zebra configuration file is at 
http://www.indexdata.dk/zebra/doc/configuration-file.tkl).

Zebra will not, by default, allow just any client to update records 
(wouldn't that be something). To allow updates, you need to define 
specific users (with associated access credentials) and grant them the 
permission levels needed for updating.

Use the setting

perm.admin: rw

to *only* allow the user 'admin' to update Zebra. You can associate a 
password with that user by creating a password file. Use the setting

passwd.c: passwords

to load a password file named 'passwords'. Use a tool like 'htpasswd' to 
maintain the encrypted passwords.

You can still allow anonymous users to *search* your database by adding 
the setting:

perm.anonymous: r

To tell Zebra to store your records internally, set

storeData: 1

The record type should be set to:

recordType: grs.xml

In order to support modifications/deletion of records, set

storeKeys: 1

To enable shadow indexing (which ought to be extra important for this 
type of update), set

shadow: directoryname: size (e.g. 1000M)
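
Taken together, the settings above would look roughly like this in the 
Zebra configuration file. Treat it as a sketch, not a tested 
configuration: the password file name 'passwords' comes from the 
example above and is given here with passwd.c, which is my 
understanding of the option for crypt-encrypted password files. The 
shadow directory name 'shadow' and the 1000M size are placeholders to 
adapt, written in the directory:size form.

# who may do what
perm.admin: rw
perm.anonymous: r
passwd.c: passwords

# storage needed for remote updates
storeData: 1
recordType: grs.xml
storeKeys: 1

# shadow register (placeholder directory and size)
shadow: shadow:1000M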


      Doing it in PHP

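// Connect as the user granted write permission in the Zebra configuration.
// Host, port, user name and password are placeholders.
$yaz = yaz_connect('localhost:9999',
    array('user' => 'admin', 'password' => 'secret'));

$record = '<record><title>A fine specimen of a record</title></record>';

$options = array(
    'action' => 'recordInsert',
    'syntax' => 'xml',
    'record' => $record,
    'databaseName' => 'mydatabase'
);

// Queue the update and a commit, then wait for both to complete.
yaz_es($yaz, 'update', $options);
yaz_es($yaz, 'commit', array());
yaz_wait();

// Report any error raised on the connection.
if ($error = yaz_error($yaz))
    echo "$error";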
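
The same pattern should work for the other actions listed earlier. As 
an untested sketch, this is how a specialUpdate with a client-supplied 
identifier might look. It assumes $yaz is an open connection (as 
returned by yaz_connect) and that the option names are passed straight 
through to the update package. The identifier 'rec-0001' is made up for 
illustration.

$record = '<record><title>A revised specimen of a record</title></record>';

$options = array(
    'action' => 'specialUpdate',     // insert or update as needed
    'recordIdOpaque' => 'rec-0001',  // hypothetical client-assigned ID
    'syntax' => 'xml',
    'record' => $record,
    'databaseName' => 'mydatabase'
);

yaz_es($yaz, 'update', $options);
yaz_es($yaz, 'commit', array());
yaz_wait();

if ($error = yaz_error($yaz))
    echo "$error";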
    


      Doing it in Perl

Mike, would you be able to quickly provide some example code here?



--Seb

Eric Lease Morgan wrote:

>
> On Jun 29, 2006, at 9:22 PM, Joshua Ferraro wrote:
>
>>> Does Zebra allow you to index streams of text, or does it just index
>>> files?
>>
>>
>> You're looking for Net::Z3950::ZOOM
>
>
>
> Oops! It looks as if I could use the ZOOM::Package class to create,  
> drop, commit, and update databases/indexes. Fun! Z39.50 gets richer  
> and richer.
>

-- 
Sebastian Hammer, Index Data
quinn at indexdata.com   www.indexdata.com
Ph: (603) 209-6853



