[Yazlist] pazpar2, solr and truncation

Thomas Fox thomas.fox at seitenbau.com
Tue Jan 7 09:15:00 UTC 2014


I'm not sure what the current state in Solr is, but some time ago you could configure Solr to allow leading-wildcard searches, although with normal indexing such queries take a long time [1]. By default, however, queries starting with a wildcard were not allowed.
To improve performance, one can additionally index the terms in reversed form [2] at index time, which technically turns a query starting with a wildcard into a query ending with a wildcard. (This does not help for queries that both start and end with a wildcard, of course.)
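
To illustrate the idea only (this is not how Solr implements it internally, and the function names are hypothetical), here is a minimal Python sketch: the terms are stored reversed in a sorted list, so a query like *merica becomes a prefix lookup for "acirem" on the reversed terms.

```python
from bisect import bisect_left


def build_reversed_index(terms):
    """Return a sorted list of reversed terms.

    Hypothetical stand-in for an index that stores reversed tokens.
    """
    return sorted(term[::-1] for term in set(terms))


def leading_wildcard_search(reversed_index, pattern):
    """Answer a query such as '*merica' via a prefix scan over reversed terms."""
    if not pattern.startswith("*") or "*" in pattern[1:]:
        raise ValueError("this sketch handles a single leading wildcard only")
    prefix = pattern[1:][::-1]  # '*merica' -> 'acirem'
    start = bisect_left(reversed_index, prefix)  # first candidate position
    matches = []
    for reversed_term in reversed_index[start:]:
        if not reversed_term.startswith(prefix):
            break  # sorted order: no later entry can match the prefix
        matches.append(reversed_term[::-1])
    return sorted(matches)


terms = ["america", "american", "comerica", "africa"]
index = build_reversed_index(terms)
result = leading_wildcard_search(index, "*merica")
print(result)
```

A pattern with wildcards on both ends, such as *merica*, is rejected by this sketch, which mirrors the limitation mentioned above.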

So to sum up, left truncation is certainly possible with Solr, but I'm not sure how much configuration work it requires.

     Thomas

[1] http://lucene.472066.n3.nabble.com/Wildcards-at-the-Beginning-of-a-Search-td505007.html
[2] http://lucene.apache.org/solr/4_1_0/solr-core/org/apache/solr/analysis/ReversedWildcardFilterFactory.html
    and http://solr.pl/en/2011/10/10/%E2%80%9Ccar-sale-application%E2%80%9D-%E2%80%93-solr-reversedwildcardfilter-%E2%80%93-lets-optimize-wildcard-queries-part-8

Dennis Schafroth wrote:

> As far as I recall, Solr did not support left truncation in earlier versions, so we didn’t implement it.
> If Solr supports it now, it should be trivial to add.
>
> cheers, 
> :-Dennis

On 03 Jan 2014, at 10:32 , Thomas Fox <thomas.fox at seitenbau.com> wrote:

> Thanks, Adam, for your quick reply.
> 
> After more experimentation, I finally got it to work using t=z (this is the one setting I apparently left out in yesterday's testing :-( ).
> Just to document the other settings I have tried: Both t=l,r,b and t=x fail, because the PQF cannot be converted to SOLR syntax.
> 
> For t=l,r,b still only right truncation (america?) works, but not left truncation (?america) or both (?merica?):
> The latter two give HTTP 417: Malformed parameter value, and the log says (for ?america)
> 10:02:48-03/01 pazpar2 00b72a33de7f0000 [log] Client localhost:8080/solr/biblio/select: CCL query: ?america limit:
> 10:02:48-03/01 pazpar2 00b72a33de7f0000 [loglevel] returning NO log bit for 'odr'
> 10:02:48-03/01 pazpar2 00b72a33de7f0000 [log] Session 1: PQF for Client localhost:8080/solr/biblio/select: @attr 5=2 @attr 1=title america
> 10:02:48-03/01 pazpar2 00b72a33de7f0000 [warn] Failed to generate SOLR query, code=-1
> 
> For t=x and searching for *merica*, the logs say
> 
> 10:19:20-03/01 pazpar2 00d7006dd37f0000 [log] Session 1: PQF for Client localhost:8080/solr/biblio/select: @attr 5=102 @attr 1=title .*merica.*
> 10:19:20-03/01 pazpar2 00d7006dd37f0000 [warn] Failed to generate SOLR query, code=-1
> 
>  Thanks again for your help and sorry for bothering you,
> 
>      Thomas
> 
> ----- Original Message -----
> From: "Adam Dickmeiss" <adam at indexdata.dk>
> To: yazlist at lists.indexdata.dk
> Sent: Thursday, 2 January 2014 17:55:30
> Subject: Re: [Yazlist] pazpar2, solr and truncation
> 
> On 01/02/2014 11:56 AM, Thomas Fox wrote:
>> Dear pazpar2 experts,
>> 
>> I have a question regarding pazpar2 and truncation.
>> My pazpar2 installation is configured to query a solr index, and I'd like to allow truncation everywhere in the queries.
>> 
>> Upon reading [1], I have managed to get right truncation to work, e.g. using the configuration
>>   <set name="pz:cclmap:term" value="1=title t=r"/>
>> and issuing the pazpar2 command (%3F is the URL encoding for ?)
>>   http://.../search.pz2?session=8&command=search&query=america%3F
>> pazpar2 retrieves the titles which contain the right-truncated word america (e.g. america, american etc) (which is expected)
>> When not using a wildcard in the query, e.g.
>>   http://.../search.pz2?session=8&command=search&query=america
>> pazpar2 retrieves all titles which contain the word america, but not american etc... (which is also expected)
>> 
>> However, I do not get any other truncation mode from [2] to work.
>> In particular, if I set
>>   <set name="pz:cclmap:term" value="1=title t=b"/>
>> and re-issue the pazpar2 command
>>   http://.../search.pz2?session=8&command=search&query=america%3F
>> pazpar2 gives me a HTTP error 417 and says "Malformed parameter value" in the response.
>> All I can find in the pazpar2 logs is
>>   ...
>>   ...[warn] Session 1: Client localhost:8080/solr/biblio/select: Failed to parse CCL query 'america?'
>>   ...
>>   ...[warn] HTTP 417 Malformed parameter value: query
>>   ...[log] Response: 0.00074 7 /search.pz2?session=1&command=search&query=america%3F
>>   ...
>> 
>> Just experimenting, I get the same behaviour with t=l and querying for ?merica, in fact the only truncation mode I get working is t=r.
>> 
>> Is there anything I can do about this ?
> t=b
> would only allow ?america?
> 
> use
> 
> t=l,r,b
> 
> to support all three combos. See also 
> http://www.indexdata.com/yaz/doc/tools.html#ccl.special.attribute.combos
> 
> Guess the manual should have some examples of that.. :-)
> 
> See
>> 
>>    Thanks in advance,
>> 
>>       Thomas
>> 
>> [1] http://lists.indexdata.dk/pipermail/yazlist/2009-January/002638.html
>> [2] http://www.indexdata.com/yaz/doc/tools.html Table 7.1 and 7.2
>> 
>> 
>> _______________________________________________
>> Yazlist mailing list
>> Yazlist at lists.indexdata.dk
>> http://lists.indexdata.dk/cgi-bin/mailman/listinfo/yazlist
>> 
> 
> 

