[Zebralist] help with pqf query with datetime on or between comparison

Merijn van den Kroonenberg merijn at web2all.nl
Thu Dec 4 09:27:54 CET 2008


This may well be the same problem I ran into myself.
See: http://bugzilla.indexdata.dk/show_bug.cgi?id=2128
You can test this easily by setting truncmax to a high value.
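
To see whether a change actually helps, one option is to rebuild the range query from the quoted message and compare its hit count with the hit count of the unrestricted queries. A range that spans the earliest and latest datestamps should not return fewer hits. Below is a minimal Python sketch of that check. The helper names (pqf_datetime, build_range_query, missing_hits) are hypothetical, the index numbers and hit counts are copied from the quoted session, and it only builds the query string; it does not talk to Zebra.

```python
from datetime import datetime

# Use attributes taken from the quoted message.
OAI_IDENTIFIER_INDEX = 12
OAI_DATESTAMP_INDEX = 1012


def pqf_datetime(value: datetime) -> str:
    """Format a datetime as in the quoted queries: YYYY-MM-DD HH:MM:SS."""
    return value.strftime("%Y-%m-%d %H:%M:%S")


def build_range_query(prefix: str, start: datetime, end: datetime) -> str:
    """Build the "on or between" PQF query shown in the quoted message.

    Bib-1 relation 4 is "greater than or equal", relation 2 is
    "less than or equal", and structure 5 is "date (normalized)".
    """
    return (
        f'@and @attr 1={OAI_IDENTIFIER_INDEX} "{prefix}" '
        f'@and @attr 2=4 @attr 1={OAI_DATESTAMP_INDEX} @attr 4=5 '
        f'"{pqf_datetime(start)}" '
        f'@attr 2=2 @attr 1={OAI_DATESTAMP_INDEX} @attr 4=5 '
        f'"{pqf_datetime(end)}"'
    )


def missing_hits(total_hits: int, range_hits: int) -> int:
    """Number of records the range query failed to return (never negative)."""
    return max(total_hits - range_hits, 0)


query = build_range_query(
    "oai:horowhenua.kete.net.nz:",
    datetime(2007, 2, 18, 13, 56, 38),
    datetime(2008, 12, 2, 23, 1, 37),
)
print(query)

# Hit counts reported in the quoted yaz-client session.
print(missing_hits(total_hits=17181, range_hits=10119))
```

If the difference drops to zero after raising truncmax, the limit was the cause; if not, the query itself needs another look.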

Merijn

----- Original Message ----- 
From: "Walter McGinnis" <walter at katipo.co.nz>
To: "Zebra Information Server" <zebralist at lists.indexdata.dk>
Sent: Wednesday, December 03, 2008 11:25 PM
Subject: [Zebralist] help with pqf query with datetime on or between
comparison


> Hi everyone,
>
> I thought I had this one sorted, but apparently not.
>
> In the past I set up an OAI PMH Repository based on the Ruby OAI gem
> and bolting on my own ZoomDbWrapper provider class for querying my
> Zebra instances with PQF.  After asking this list about how to do
> datetime comparisons and digging around in 
> http://github.com/kete/kete/tree/master/zebradb/conf/cql2pqf.txt#Relation
>  Info for reference, I thought I had concocted a pretty good "give me
> records on this beginning datetime or between it  and this end
> datetime or on the end datetime" query.
>
> Real world experience has proven me wrong.  Or at least I think my
> query is the problem.  Anyway, on to the troubleshooting...
>
> Yaz version:
>
> $ dpkg --list yaz
> ...
> Name            Version         Description
> +++-===========-===============-===============
> ii  yaz         2.1.48-1
>
>
> $ dpkg --list idzebra*
> ...
> Name            Version         Description
> +++-===========-===============-===============
> ii  idzebra-2.0 2.0.10-1
>
> Quite long in the tooth.  However, I haven't had the time to work
> through some incompatibilities between our existing Zebra configs and
> the later YAZ/Zebra versions yet (if anyone feels like helping with
> that process, I would love it).
>
> Now to the meat of the thing.  When an OAI PMH harvester makes a
> request where it doesn't specify a date range, three queries actually
> happen based on the OAI identifier index (12) and the OAI datestamp
> index (1012):
>
> * one query to get the earliest record from repository, for example:
>
> Z> f @or @not @attr 1=12 "oai:horowhenua.kete.net.nz:" @attr 1=12
> "Bootstrap" @attr 7=1 @attr 1=1012  0
> Sent searchRequest.
> Received SearchResponse.
> Search was a success.
> Number of hits: 17181, setno 2
> SearchResult-1: term=oai cnt=17100, term=horowhenua cnt=1, term=kete
> cnt=1, term=net cnt=1, term=nz cnt=1, term=Bootstrap cnt=1
> records returned: 0
> Elapsed: 0.214545
> Z> show 1
> Sent presentRequest (1+1).
> Records: 1
> [public]Record type: XML
> <?xml version="1.0" encoding="UTF-8"?>
> <record xmlns="http://www.openarchives.org/OAI/2.0/">
>   <header>
>     <identifier>oai:horowhenua.kete.net.nz:site:StillImage:7</identifier>
>     <datestamp>2007-02-18T13:56:38Z</datestamp>
>   </header>
>   <metadata>
> ...
>
>
> * one query to get the latest record from repository (these two could
> probably be done in one go in the future, but for now it's not a big
> performance hit), for example:
>
> Z> f @or @not @attr 1=12 "oai:horowhenua.kete.net.nz:" @attr 1=12
> "Bootstrap" @attr 7=2 @attr 1=1012  0
> Sent searchRequest.
> Received SearchResponse.
> Search was a success.
> Number of hits: 17181, setno 1
> SearchResult-1: term=oai cnt=17100, term=horowhenua cnt=1, term=kete
> cnt=1, term=net cnt=1, term=nz cnt=1, term=Bootstrap cnt=1
> records returned: 0
> Elapsed: 0.365962
> Z> show 1
> Sent presentRequest (1+1).
> Records: 1
> [public]Record type: XML
> <?xml version="1.0" encoding="UTF-8"?>
> <record xmlns="http://www.openarchives.org/OAI/2.0/">
>       <header>
>         <identifier>oai:horowhenua.kete.net.nz:site:Topic:2067</identifier>
>         <datestamp>2008-12-02T23:01:37Z</datestamp>
>       </header>
>       <metadata>
> ...
>
> Notice we exclude any "Bootstrap" records (not real content).  The
> number of hits is consistent between the two queries, since only the
> sorting order is changed.
>
> * a third query to get the records between the earliest datetime and
> the latest datetime; we set these so as to give a consistent
> resumption token
>
> Z> f @and @attr 1=12 "oai:horowhenua.kete.net.nz:" @and @attr 2=4
> @attr 1=1012 @attr 4=5 "2007-02-18 13:56:38" @attr 2=2 @attr 1=1012
> @attr 4=5 "2008-12-02 23:01:37"
> Sent searchRequest.
> Received SearchResponse.
> Search was a success.
> Number of hits: 10119, setno 3
> Result Set Status: subset
> SearchResult-1: term=oai cnt=15400, term=horowhenua cnt=1, term=kete
> cnt=1, term=net cnt=1, term=nz cnt=1, term=2007-02-18 13:56:38
> cnt=10100, term=2008-12-02 23:01:37 cnt=10100
> records returned: 0
> Elapsed: 6.635536
> Z> show 1
> Sent presentRequest (1+1).
> Records: 1
> [public]Record type: XML
> <?xml version="1.0" encoding="UTF-8"?>
> <record xmlns="http://www.openarchives.org/OAI/2.0/">
>   <header>
>     <identifier>oai:horowhenua.kete.net.nz:site:Topic:3</identifier>
>     <datestamp>2007-05-03T20:25:08Z</datestamp>
>   </header>
>   <metadata>
> ...
>
> Notice that the number of hits drops by over 7k.  Hmm, I should
> probably add some sorting to that query as well.
>
> Any ideas as to why I'm missing out on so many records?
>
> Cheers,
> Walter
>
>



