[Ex-plain] Harvesting algorithm

Alan Kent ajk at mds.rmit.edu.au
Fri Apr 26 10:53:20 CEST 2002


On Thu, Apr 25, 2002 at 12:15:30PM +0100, Mike Taylor wrote:
> > So, a harvester should:
> > * Collect ZeeRex records from other databases
> > * If the authoritative flag is set,
> > 	Set the flag to false
> > 	Add the z3950r: URL into the aggregatedFrom element
> > * I should always set/update the dateAggregated element(?)
> > * Change the 'id' attribute to something locally unique(?)
> > * Save new record in my local database
> > * (If record is for IR-ZeeRex-1/IR-Explain---1 then add to list of
> >   databases to harvest from)
> 
> Sounds perfect so far.
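
For what it's worth, the steps quoted above could be sketched roughly as
follows. This is only an illustration: the ZeeRexRecord class, its field
names, the harvest_record function and the in-memory dict/list standing in
for the local database and harvest list are all hypothetical, and I am
assuming the z3950r: URL to record in aggregatedFrom is the one the record
was fetched from.

```python
import uuid
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Dict, List, Optional

# Database names that themselves serve ZeeRex/Explain records (from the list above).
HARVESTABLE_DATABASES = {"IR-ZeeRex-1", "IR-Explain---1"}


@dataclass(frozen=True)
class ZeeRexRecord:
    # Hypothetical in-memory model; fields mirror the elements named above.
    id: str
    database_url: str                      # z3950r: URL of the database described
    database: str                          # name of the database described
    authoritative: bool
    aggregated_from: Optional[str] = None  # aggregatedFrom element
    date_aggregated: Optional[str] = None  # dateAggregated element


def harvest_record(
    record: ZeeRexRecord,
    fetched_from: str,
    local_db: Dict[str, ZeeRexRecord],
    harvest_queue: List[str],
) -> ZeeRexRecord:
    """Apply the quoted steps to one harvested record and store the copy."""
    changes = {
        # Always set/update dateAggregated.
        "date_aggregated": datetime.now(timezone.utc).isoformat(),
        # Replace the remote id with something locally unique.
        "id": uuid.uuid4().hex,
    }
    if record.authoritative:
        # Our copy is no longer authoritative; remember where it came from.
        changes["authoritative"] = False
        changes["aggregated_from"] = fetched_from

    local_copy = replace(record, **changes)
    local_db[local_copy.id] = local_copy

    # Records describing another Explain database give us a new harvest target.
    if record.database in HARVESTABLE_DATABASES:
        harvest_queue.append(record.database_url)
    return local_copy
```

Note that a record which was already non-authoritative keeps whatever
aggregatedFrom value it arrived with; only the date and id change.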

I just realized one possible fault. If Index Data (for example) creates
synthesized records for all the sites it automatically probes, should it
set authoritative to true? It feels wrong. Even worse, if it is set to true
and the library (or whoever) then writes their own authoritative record,
which do I keep?

So I think I cannot assume that there will always be an authoritative
record for a database. This requires a bit of fiddling in the algorithm.
(Too tired to do it just now.)

Alan



