[Zebralist] help with pqf query with datetime on or between comparison

Walter McGinnis walter at katipo.co.nz
Thu Dec 4 22:50:05 CET 2008


On Dec 4, 2008, at 9:27 PM, Merijn van den Kroonenberg wrote:

> Quite possibly this may be the same problem I ran into myself.
> See: http://bugzilla.indexdata.dk/show_bug.cgi?id=2128
> You can test this easily by setting the truncmax value to a high  
> value.
>
> Merijn

Thanks for the tip, Merijn.  That seems to have done the trick for my
site with 17k+ records that was only returning 10k.  Interestingly, I
seem to have found another issue.  It may still have to do with my
query, but perhaps it is a bug.

To demonstrate, here are some queries against a fresh checkout of Kete
from the master branch, with yaz-2.1.56 and zebra 2.0.6 (might be
2.0.18):

Z> f @attr 13=1000000000 @attr 1=12 "oai:kete:"
f @attr 13=1000000000 @attr 1=12 "oai:kete:"
Sent searchRequest.
Received SearchResponse.
Search was a success.
Number of hits: 18, setno 2
SearchResult-1: term=oai cnt=18, term=kete cnt=18
records returned: 0
Elapsed: 0.000784

and

Z> f @attr 1=12 "oai:kete:"
f @attr 1=12 "oai:kete:"
Sent searchRequest.
Received SearchResponse.
Search was a success.
Number of hits: 18, setno 3
SearchResult-1: term=oai cnt=18, term=kete cnt=18
records returned: 0
Elapsed: 0.000728

These are the correct number of hits for all records.  With such a
small number of items, the @attr 13... is not necessary.

Now I do a query for "on or after the datetime of the earliest
record in the repository":

Z> f @or @and @attr 1=12 "oai:kete:" @attr 2=4 @attr 1=1012 @attr 4=5  
"2008-12-04 21:02:37+1200" @attr 7=1 @attr 1=1012  0
f @or @and @attr 1=12 "oai:kete:" @attr 2=4 @attr 1=1012 @attr 4=5  
"2008-12-04 21:02:37+1200" @attr 7=1 @attr 1=1012  0
Sent searchRequest.
Received SearchResponse.
Search was a success.
Number of hits: 17, setno 4
SearchResult-1: term=oai cnt=18, term=kete cnt=1, term=2008-12-04  
21:02:37+1200 cnt=17
records returned: 0
Elapsed: 0.001264

17 results.  This is also correct: it excludes the bootstrap record,
hence only 17.

So let's try "on or after the datetime of the earliest record but
before or on the datetime of the last record in the repository":

Z> f @or @and @attr 1=12 "oai:kete:" @and @attr 2=4 @attr 1=1012 @attr  
4=5 "2008-12-04 21:02:37+1200" @attr 2=2 @attr 1=1012 @attr 4=5  
"2008-12-04 21:07:03+1200" @attr 7=1 @attr 1=1012  0
f @or @and @attr 1=12 "oai:kete:" @and @attr 2=4 @attr 1=1012 @attr  
4=5 "2008-12-04 21:02:37+1200" @attr 2=2 @attr 1=1012 @attr 4=5  
"2008-12-04 21:07:03+1200" @attr 7=1 @attr 1=1012  0
Sent searchRequest.
Received SearchResponse.
Search was a success.
Number of hits: 0, setno 5
SearchResult-1: term=oai cnt=18, term=kete cnt=18, term=2008-12-04  
21:02:37+1200 cnt=6, term=2008-12-04 21:07:03+1200 cnt=1
records returned: 0
Elapsed: 0.005240

0 results.  Yoicks.  Looks like something is wrong with the "before or
on the datetime of the last record" part of the query.  Adding "@attr
13=1000000000" made no difference in this case.  One wrinkle of this
testing is that ALL records were added TODAY.  The datetimes passed to
Zebra are, I think, consistent with the UTC values it has stored for
the records, but that might be an issue.
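
To make the nesting of that query easier to see, here is a minimal
Python sketch that assembles the same PQF string from its parts.  The
function names (date_bound, between_query) are made up for
illustration and are not part of Kete, YAZ or Zebra.  The attribute
values are copied from the queries above: 2=4 is the "on or after"
leg and 2=2 is the "before or on" leg.

```python
def date_bound(relation: int, stamp: str) -> str:
    """One date bound, using the same attributes as the queries in this post."""
    return f'@attr 2={relation} @attr 1=1012 @attr 4=5 "{stamp}"'


def between_query(prefix: str, start: str, end: str) -> str:
    """Assemble the 'on or after start, before or on end' query (hypothetical helper)."""
    id_term = f'@attr 1=12 "{prefix}"'
    lower = date_bound(4, start)  # 2=4: on or after
    upper = date_bound(2, end)    # 2=2: before or on
    # Trailing operand of the outer @or, copied verbatim from the post.
    tail = "@attr 7=1 @attr 1=1012 0"
    # Prefix notation: @or( @and( id_term, @and(lower, upper) ), tail )
    return f"@or @and {id_term} @and {lower} {upper} {tail}"


query = between_query(
    "oai:kete:",
    "2008-12-04 21:02:37+1200",
    "2008-12-04 21:07:03+1200",
)
print("f " + query)
```

Written this way, the inner @and is the only place the two date
bounds meet, so that is the part to look at when the hit count drops
to 0.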

Let's do some datetime adjustments to see if that makes any
difference.  First I'll bump up the time by an hour:

Z> f @attr 13=1000000000 @or @and @attr 1=12 "oai:kete:" @and @attr  
2=4 @attr 1=1012 @attr 4=5 "2008-12-04 21:02:37+1200" @attr 2=2 @attr  
1=1012 @attr 4=5 "2008-12-04 22:07:03+1200" @attr 7=1 @attr 1=1012  0
f @attr 13=1000000000 @or @and @attr 1=12 "oai:kete:" @and @attr 2=4  
@attr 1=1012 @attr 4=5 "2008-12-04 21:02:37+1200" @attr 2=2 @attr  
1=1012 @attr 4=5 "2008-12-04 22:07:03+1200" @attr 7=1 @attr 1=1012  0
Sent searchRequest.
Received SearchResponse.
Search was a success.
Number of hits: 0, setno 7
SearchResult-1: term=oai cnt=18, term=kete cnt=18, term=2008-12-04  
21:02:37+1200 cnt=6, term=2008-12-04 22:07:03+1200 cnt=1
records returned: 0
Elapsed: 0.001724

No difference.  How about the next day:

Z> f @attr 13=1000000000 @or @and @attr 1=12 "oai:kete:" @and @attr  
2=4 @attr 1=1012 @attr 4=5 "2008-12-04 21:02:37+1200" @attr 2=2 @attr  
1=1012 @attr 4=5 "2008-12-05 21:07:03+1200" @attr 7=1 @attr 1=1012  0
f @attr 13=1000000000 @or @and @attr 1=12 "oai:kete:" @and @attr 2=4  
@attr 1=1012 @attr 4=5 "2008-12-04 21:02:37+1200" @attr 2=2 @attr  
1=1012 @attr 4=5 "2008-12-05 21:07:03+1200" @attr 7=1 @attr 1=1012  0
Sent searchRequest.
Received SearchResponse.
Search was a success.
Number of hits: 17, setno 8
SearchResult-1: term=oai cnt=18, term=kete cnt=1, term=2008-12-04  
21:02:37+1200 cnt=17, term=2008-12-05 21:07:03+1200 cnt=17
records returned: 0
Elapsed: 0.002688

There we go.  So that is 24 hours after the last record's datestamp.
That makes one suspicious of how the records' datetimes are synced.
Kete feeds Zebra UTC time, with no offset.  The "on or before"
datetime that we pass to Zebra, i.e. the latest datetime in the
repository, literally matches the one in our record,
2008-12-04 22:07:03+1200, if you take off the "+1200" offset.

So what if I add only 12 hours (i.e. what the offset is):

Z> f @attr 13=1000000000 @or @and @attr 1=12 "oai:kete:" @and @attr  
2=4 @attr 1=1012 @attr 4=5 "2008-12-04 21:02:37+1200" @attr 2=2 @attr  
1=1012 @attr 4=5 "2008-12-05 09:07:03+1200" @attr 7=1 @attr 1=1012  0
f @attr 13=1000000000 @or @and @attr 1=12 "oai:kete:" @and @attr 2=4  
@attr 1=1012 @attr 4=5 "2008-12-04 21:02:37+1200" @attr 2=2 @attr  
1=1012 @attr 4=5 "2008-12-05 09:07:03+1200" @attr 7=1 @attr 1=1012  0
Sent searchRequest.
Received SearchResponse.
Search was a success.
Number of hits: 17, setno 9
SearchResult-1: term=oai cnt=18, term=kete cnt=1, term=2008-12-04  
21:02:37+1200 cnt=17, term=2008-12-05 09:07:03+1200 cnt=17
records returned: 0
Elapsed: 0.002386

Ok, maybe that is it.  What if I test that theory by changing the end  
time to only 5 hours difference, something that shouldn't return  
records if that theory is correct:

Z> f @attr 13=1000000000 @or @and @attr 1=12 "oai:kete:" @and @attr  
2=4 @attr 1=1012 @attr 4=5 "2008-12-04 21:02:37+1200" @attr 2=2 @attr  
1=1012 @attr 4=5 "2008-12-05 02:07:03+1200" @attr 7=1 @attr 1=1012  0
f @attr 13=1000000000 @or @and @attr 1=12 "oai:kete:" @and @attr 2=4  
@attr 1=1012 @attr 4=5 "2008-12-04 21:02:37+1200" @attr 2=2 @attr  
1=1012 @attr 4=5 "2008-12-05 02:07:03+1200" @attr 7=1 @attr 1=1012  0
Sent searchRequest.
Received SearchResponse.
Search was a success.
Number of hits: 17, setno 10
SearchResult-1: term=oai cnt=18, term=kete cnt=1, term=2008-12-04  
21:02:37+1200 cnt=17, term=2008-12-05 02:07:03+1200 cnt=17
records returned: 0
Elapsed: 0.001856

Still returns 17 records even though, by the theory that the times are  
out of sync by the amount of the offset, they shouldn't.  Weird.
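
The arithmetic behind that theory can be checked with a short Python
sketch.  It is only an illustration, under a loudly stated
assumption: that the latest record is stored as 2008-12-04 21:07:03
and treated as UTC (taken from the upper bound of the first "between"
query above), and that the "+1200" in the query is honoured when
comparing.  Neither is confirmed Zebra behaviour.  The hit counts are
the ones from the transcripts above.

```python
from datetime import datetime, timezone

FMT = "%Y-%m-%d %H:%M:%S%z"


def to_utc(stamp: str) -> datetime:
    """Parse a stamp such as '2008-12-04 21:07:03+1200' and convert it to UTC."""
    return datetime.strptime(stamp, FMT).astimezone(timezone.utc)


# ASSUMPTION: stored value of the latest record, interpreted as UTC.
latest_stored = datetime(2008, 12, 4, 21, 7, 3, tzinfo=timezone.utc)

# (upper bound sent to Zebra, number of hits observed in the transcripts)
trials = [
    ("2008-12-04 21:07:03+1200", 0),
    ("2008-12-04 22:07:03+1200", 0),
    ("2008-12-05 21:07:03+1200", 17),
    ("2008-12-05 09:07:03+1200", 17),
    ("2008-12-05 02:07:03+1200", 17),
]


def theory_predicts_match(upper: str) -> bool:
    """Under the offset theory, the latest record matches only if the
    upper bound, converted to UTC, is at or after its stored time."""
    return to_utc(upper) >= latest_stored


for upper, hits in trials:
    predicted = theory_predicts_match(upper)
    verdict = "agrees" if predicted == (hits > 0) else "DISAGREES"
    print(f"{upper} = {to_utc(upper):%Y-%m-%d %H:%M:%S} UTC, "
          f"predicted match: {predicted}, observed hits: {hits}, {verdict}")
```

Under these assumptions the first four trials agree with the theory
and only the last one (the 5 hour case) disagrees, which is why the
offset alone does not explain the results.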

Actually, what a colleague and I have found is that it seems to be
moving the end date to the next day that solves things.

Is there anything about my query that makes you think I would have to
bump the end date up to the next day for it to work?

All help appreciated.

Cheers,
Walter





