[Yazlist] author and other merging in pazpar2
porst at sub.uni-goettingen.de
Mon Jun 4 19:04:05 CEST 2012
In many cases pazpar2’s merging feature works very well and greatly improves search results by creating fewer of them.
Recently, however, I've started to work with a broader range of databases which contain records created in different systems (and presumably using different cataloguing rules). As a result, I often find 'obviously' duplicate records which pazpar2 did not manage to merge.
On closer inspection, the software is right not to merge them, because that is how it is configured. I have now tried to make pazpar2 merge records more aggressively by
1) only using the first author (I found the $e 'aut' qualifier to be frequently unreliable in MARC 700 fields, so further authors cannot be detected reliably)
2) only using the author’s last name (i.e. the part before the comma)
3) using the content of the title-complete field (from tmarc.xsl) rather than the title and title-remainder fields (different cataloguing rules or MARC export conversions can distribute the text differently across the $a and $b subfields of 245)
So far I have been quite happy with the results, which are achieved by creating additional pz:metadata fields used just for merging.
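To make the three rules above concrete, here is a minimal Python sketch of the resulting merge key. It is only an illustration of the heuristic, not the XSL used in pazpar2: the record layout (a dict with an "authors" list and a "title-complete" string), the function names, and the normalisation step (lowercasing, dropping diacritics and punctuation) are all assumptions made for this example, standing in for whatever normalisation pazpar2 applies itself.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase, strip diacritics and punctuation, collapse whitespace.

    Assumption: a simple stand-in for pazpar2's own merge-key normalisation.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(re.findall(r"\w+", without_marks.lower()))


def merge_key(record):
    """Build a merge key from the first author's last name and the full title.

    `record` is a hypothetical dict with an "authors" list of
    "Last, First" strings and a "title-complete" string.
    """
    # Rule 1: only the first author is considered.
    authors = record.get("authors") or [""]
    # Rule 2: only the last name, i.e. the part before the comma.
    last_name = authors[0].split(",", 1)[0]
    # Rule 3: the complete title instead of separate title / remainder parts.
    title = record.get("title-complete", "")
    return (normalize(last_name), normalize(title))


# Two differently catalogued versions of the same (made-up) work.
a = {"authors": ["Müller, Hans", "Schmidt, Eva"],
     "title-complete": "Einführung in die Logik: Ein Lehrbuch"}
b = {"authors": ["Müller, H."],
     "title-complete": "Einführung in die Logik : ein Lehrbuch"}
print(merge_key(a) == merge_key(b))
```

Note that a key like this also merges records by different authors who share a last name and an identical title, which is one obvious case where the heuristic can fail.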
If you have tried similar approaches (or know cases where this heuristic is likely to fail), I’d be interested to discuss that!
I just created an additional XSL stylesheet, which I append to pazpar2’s XSL chain to create the new fields:
SUB Göttingen, Bibliotheksinformationssysteme
Zimmer 2.38 . Platz der Göttinger Sieben 1 . D-37073 Gö . +49-551-39x4255