X-Git-Url: http://lists.indexdata.dk/cgi-bin?a=blobdiff_plain;f=doc%2Fbook.xml;h=e223d7ef1b34cac5c8818709a750e8ebfcf66950;hb=3e762d9cf53e8ed8049d43879a32c0e72ad68dc5;hp=9f96d75a39d3f3fdad14e70eb20d3451c304428d;hpb=687d1431ba75a222ee963d3b9c54efa4ba4f1599;p=pazpar2-moved-to-github.git diff --git a/doc/book.xml b/doc/book.xml index 9f96d75..e223d7e 100644 --- a/doc/book.xml +++ b/doc/book.xml @@ -156,16 +156,25 @@
Connectors to non-standard databases - If you wish to connect to commercial or other databases which do not - support open standards, please contact Index Data on - info@indexdata.com. We have a - proprietary framework for building connectors that enable Pazpar2 - to access - thousands of online databases, in addition to the vast number of catalogs - and online services that support the Z39.50/SRU/SRW/SOLR protocols. If you need to access commercial or open access resources that don't support Z39.50 or SRU, one approach is to use a tool like SimpleServer to build a gateway. An easier option is to use Index Data's MasterKey Connect service, which exposes virtually any resource through Z39.50/SRU, making it easy to integrate with Pazpar2. The service is hosted, so all you have to do is let us know which resources you are interested in, and we operate the gateways, or connectors, for you for a low annual charge. Supported resource types include commercial databases, free online resources, and even local resources; almost anything that can be accessed through a web-facing user interface can be accessed in this way. Contact info@indexdata.com for more information. See for an example.
- +
A note on the name Pazpar2 @@ -648,76 +657,6 @@ &sect-ajaxdev; -
- Connecting to non-standard resources - - Pazpar2 uses Z39.50 as its switchboard language -- i.e. as far as it - is concerned, all resources speak Z39.50, its webservices derivatives, - SRU/SRW and SOLR servers exposing Lucene indexes. It is, however, equipped - to handle a broad range of different server behavior, through - configurable query mapping and record normalization. If you develop - configuration, stylesheets, etc., for a new type of resources, we - encourage you to share your work. But you can also use Pazpar2 to - connect to hundreds of resources that do not support standard - protocols. - - - - For a growing number of resources, Z39.50 is all you need. Over the - last few years, a number of commercial, full-text resources have - implemented Z39.50. These can be used through Pazpar2 with little or - no effort. Resources that use non-standard record formats will - require a bit of XSLT work, but that's all. - - - - But what about resources that don't support Z39.50 at all? - Some resources might support OpenSearch, private, XML/HTTP-based - protocols, or something else entirely. - Some databases exist only as web user interfaces and - will require screen-scraping. Still others exist only as static - files, or perhaps as databases supporting the OAI-PMH protocol. - There is hope! Read on. - - - - Index Data continues to advocate the support of open standards. We - work with database vendors to support standards, so you don't have - to worry about programming against non-standard services. We also - provide tools (see SimpleServer) - which make it comparatively easy to build gateways against servers - with non-standard behavior. Again, we encourage you to share any - work you do in this direction. - - - - But the bottom line is that working with non-standard resources in - metasearching is really, really hard. If you want to build a - project with Pazpar2, and you need access to resources with - non-standard interfaces, we can help. 
We run gateways to more than - 2,000 popular, commercial databases and other resources, - making it simple - to plug them directly into Pazpar2. For a small annual fee per - database, we can help you establish connections to your licensed - resources. Meanwhile, you can help! If you build your own - standards-compliant gateways, host them for others, or share the - code! And tell your vendors that they can save everybody money and - increase the appeal of their resources by supporting standards. - - - - There are those who will ask us why we are using Z39.50 as our - switchboard language rather than a different protocol. Basically, - we believe that Z39.50 is presently the most widely implemented - information retrieval protocol that has the level of functionality - required to support a good metasearching experience (structured - searching, structured, well-defined results). It is also compact and - efficient, and there is a very broad range of tools available to - implement it. - -
-
Unicode Compliance @@ -825,6 +764,111 @@
+
Relevance ranking
Pazpar2 uses a variant of the term frequency–inverse document frequency (tf-idf) ranking algorithm.
The tf-part is straightforward to calculate and is based on the documents that Pazpar2 fetches. The idf-part, however, is trickier, because Pazpar2 does not have the full corpus: the corpus at hand consists only of the documents that match a particular search, not the irrelevant ones.
Computation of the tf-part is based on the normalized documents, so the length, the positions and the terms are already normalized at this point. The computation is performed for each document received from a target, before merging takes place. The result of a tf-computation is added to the tf-total of the document's cluster. Thus, if a document occurs twice, its tf-part is doubled. This can be adjusted for, because the tf-part may be divided by the number of documents in the cluster.
The algorithm used by Pazpar2 has two phases. In phase one, Pazpar2 computes a tf-array. This is done as records are fetched from the database, using the rank weight w and the rank tweaks lead, follow and length.
0)
    w[i] += w[i] * follow / (1+log2(d))
// length: length of field (that is, number of terms)
if (length strategy is "linear")
    tf[i] += w[i] / length;
else if (length strategy is "log")
    tf[i] += w[i] / log2(length);
else if (length strategy is "none")
    tf[i] += w[i];
]]>
In phase two, the idf-array and the final score are computed. This is done for each cluster as part of each show command. The rank tweak cluster is used here.
0)
    idf[i] = log(1 + doctotal / dococcur[i])
else
    idf[i] = 0;

relevance = 0;
for i = 1, .., N: (each term)
    if (cluster is "yes")
        tf[i] = tf[i] / cluster_size;
    relevance += 100000 * tf[i] * idf[i];
]]>
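The two ranking phases can be sketched in Python as follows. This is a minimal, hypothetical illustration, not Pazpar2's implementation. The input shape (a list of `(i, d, length)` occurrences per record), the function names, and the meaning of `d` are assumptions, since the part of the phase-one pseudocode that defines `d` is not shown; the `lead` tweak is omitted for the same reason. `doctotal` is read here as the number of fetched documents and `dococcur[i]` as the number of those containing term `i`. The final score multiplies `tf[i]` by `idf[i]`, as in standard tf-idf.

```python
import math

# Minimal sketch of two-phase tf-idf ranking (hypothetical, not Pazpar2's code).
# Assumed input: for each record, a list of occurrences (i, d, length) where
# i is the query-term index, d the distance used by the "follow" tweak, and
# length the number of terms in the field where the occurrence was found.


def phase_one(occurrences, weights, follow=0.0, length="linear"):
    """Return the tf-array contribution of one fetched record."""
    tf = [0.0] * len(weights)
    for i, d, field_len in occurrences:
        w = weights[i]  # per-occurrence copy, so the rank weight is not compounded
        if d > 0:
            w += w * follow / (1 + math.log2(d))
        if length == "linear":
            tf[i] += w / field_len
        elif length == "log":
            # Assumption: log2(1) == 0, so one-term fields are left unnormalized.
            tf[i] += w / math.log2(field_len) if field_len > 1 else w
        elif length == "none":
            tf[i] += w
        else:
            raise ValueError(f"unknown length strategy: {length!r}")
    return tf


def add_to_cluster(cluster_tf, record_tf):
    """Add a record's tf-array to its cluster's running tf-total (in place)."""
    for i, value in enumerate(record_tf):
        cluster_tf[i] += value


def idf_array(doctotal, dococcur):
    """idf[i] = log(1 + doctotal / dococcur[i]), or 0 for terms never seen."""
    return [math.log(1 + doctotal / n) if n > 0 else 0.0 for n in dococcur]


def phase_two(cluster_tf, cluster_size, idf, cluster="yes"):
    """Final score of one cluster, computed from its tf-total and the idf-array."""
    relevance = 0.0
    for i, tf in enumerate(cluster_tf):
        if cluster == "yes":
            tf = tf / cluster_size  # average over the records merged into the cluster
        relevance += 100000 * tf * idf[i]
    return relevance


if __name__ == "__main__":
    weights = [1.0, 1.0]
    # One record: term 0 at d = 0 and term 1 at d = 2, both in a 4-term field.
    rec = phase_one([(0, 0, 4), (1, 2, 4)], weights, follow=0.5)
    print(rec)  # [0.25, 0.3125]

    # The same record arrives twice and is merged into one cluster of size 2.
    total = [0.0, 0.0]
    add_to_cluster(total, rec)
    add_to_cluster(total, rec)
    idf = idf_array(doctotal=4, dococcur=[2, 4])
    print(phase_two(total, 2, idf, cluster="yes"))
```

The demo shows the doubling effect described above: adding the same record twice doubles the cluster's tf-total. With `cluster="yes"`, dividing by the cluster size brings each term's contribution back to the single-record value.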
+ +
Pazpar2 and MasterKey Connect
MasterKey Connect is a hosted connector, or gateway, service that exposes whatever searchable resources you need. Since the service exposes all resources using Z39.50 (or SRU), it is easy to set up Pazpar2 to use it. Moreover, because all connectors expose essentially the same core behavior, the setup makes good use of Pazpar2's mechanism for managing default behavior across similar databases.
After installation of Pazpar2, the directory /etc/pazpar2/settings/mkc (the location may vary depending on installation preferences) contains an example setup that searches two different resources through a MasterKey Connect demo account. The file mkc.xml contains default parameters that work for all MasterKey Connect resources (if you decide to become a customer of the service, substitute your own account credentials for guest/guest). The other files contain information specific to a couple of demonstration resources.
To try out the demo, create a symlink from /etc/pazpar2/services-enabled/default.xml to /etc/pazpar2/services-available/mkc.xml and restart Pazpar2. You should now be able to search the two demo resources using JSDemo or any user interface of your choice. If you are interested in learning more about MasterKey Connect, or would like to try the service for free against your favorite online resource, contact us at info@indexdata.com.
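The symlink step can also be scripted. Below is a minimal Python sketch that assumes the default paths named above, which may differ per installation. The helper name `enable_service` and its policy of replacing an existing symlink but refusing to overwrite a regular file are illustrative choices, not part of Pazpar2. Restarting Pazpar2 is left to whatever service manager your installation uses.

```python
from pathlib import Path

# Default locations from the text above; they may vary per installation.
SERVICES_AVAILABLE = Path("/etc/pazpar2/services-available")
SERVICES_ENABLED = Path("/etc/pazpar2/services-enabled")


def enable_service(name, available=SERVICES_AVAILABLE,
                   enabled=SERVICES_ENABLED, link_name="default.xml"):
    """Make enabled/<link_name> a symlink to available/<name>.xml.

    Hypothetical helper: an existing symlink is replaced, but an existing
    regular file is left alone and reported as an error. Pazpar2 still has
    to be restarted afterwards for the change to take effect.
    """
    target = Path(available) / f"{name}.xml"
    if not target.is_file():
        raise FileNotFoundError(f"no such service definition: {target}")
    link = Path(enabled) / link_name
    if link.is_symlink():
        link.unlink()  # replace a previously enabled service
    elif link.exists():
        raise FileExistsError(f"{link} exists and is not a symlink")
    link.symlink_to(target)
    return link
```

For the demo, `enable_service("mkc")` creates the link described above (this requires write access to `/etc/pazpar2`). Pointing `available` and `enabled` at other directories makes the helper usable with non-default installation layouts.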