X-Git-Url: http://lists.indexdata.dk/cgi-bin?a=blobdiff_plain;f=doc%2Fbook.xml;h=e223d7ef1b34cac5c8818709a750e8ebfcf66950;hb=3e762d9cf53e8ed8049d43879a32c0e72ad68dc5;hp=9f96d75a39d3f3fdad14e70eb20d3451c304428d;hpb=687d1431ba75a222ee963d3b9c54efa4ba4f1599;p=pazpar2-moved-to-github.git
diff --git a/doc/book.xml b/doc/book.xml
index 9f96d75..e223d7e 100644
--- a/doc/book.xml
+++ b/doc/book.xml
@@ -156,16 +156,25 @@
Connectors to non-standard databases
- If you wish to connect to commercial or other databases which do not
- support open standards, please contact Index Data on
- info@indexdata.com. We have a
- proprietary framework for building connectors that enable Pazpar2
- to access
- thousands of online databases, in addition to the vast number of catalogs
- and online services that support the Z39.50/SRU/SRW/SOLR protocols.
+ If you need to access commercial or open access resources that don't support
+ Z39.50 or SRU, one approach would be to use a tool like SimpleServer to build a
+ gateway. An easier option is to use Index Data's MasterKey Connect
+ service, which exposes virtually any resource
+ through Z39.50/SRU, making it easy to integrate with Pazpar2.
+ The service is hosted, so all you have to do is let us
+ know which resources you are interested in, and we operate the gateways,
+ or Connectors, for you for a low annual charge.
+ Types of resources supported include
+ commercial databases, free online resources, and even local resources;
+ almost anything that can be accessed through a web-facing user
+ interface can be accessed in this way.
+ Contact info@indexdata.com for more information.
+ See for an example.
-
+
A note on the name Pazpar2
@@ -648,76 +657,6 @@
&sect-ajaxdev;
-
- Connecting to non-standard resources
-
- Pazpar2 uses Z39.50 as its switchboard language -- i.e. as far as it
- is concerned, all resources speak Z39.50, its webservices derivatives,
- SRU/SRW and SOLR servers exposing Lucene indexes. It is, however, equipped
- to handle a broad range of different server behavior, through
- configurable query mapping and record normalization. If you develop
- configuration, stylesheets, etc., for a new type of resources, we
- encourage you to share your work. But you can also use Pazpar2 to
- connect to hundreds of resources that do not support standard
- protocols.
-
-
-
- For a growing number of resources, Z39.50 is all you need. Over the
- last few years, a number of commercial, full-text resources have
- implemented Z39.50. These can be used through Pazpar2 with little or
- no effort. Resources that use non-standard record formats will
- require a bit of XSLT work, but that's all.
-
-
-
- But what about resources that don't support Z39.50 at all?
- Some resources might support OpenSearch, private, XML/HTTP-based
- protocols, or something else entirely.
- Some databases exist only as web user interfaces and
- will require screen-scraping. Still others exist only as static
- files, or perhaps as databases supporting the OAI-PMH protocol.
- There is hope! Read on.
-
-
-
- Index Data continues to advocate the support of open standards. We
- work with database vendors to support standards, so you don't have
- to worry about programming against non-standard services. We also
- provide tools (see SimpleServer)
- which make it comparatively easy to build gateways against servers
- with non-standard behavior. Again, we encourage you to share any
- work you do in this direction.
-
-
-
- But the bottom line is that working with non-standard resources in
- metasearching is really, really hard. If you want to build a
- project with Pazpar2, and you need access to resources with
- non-standard interfaces, we can help. We run gateways to more than
- 2,000 popular, commercial databases and other resources,
- making it simple
- to plug them directly into Pazpar2. For a small annual fee per
- database, we can help you establish connections to your licensed
- resources. Meanwhile, you can help! If you build your own
- standards-compliant gateways, host them for others, or share the
- code! And tell your vendors that they can save everybody money and
- increase the appeal of their resources by supporting standards.
-
-
-
- There are those who will ask us why we are using Z39.50 as our
- switchboard language rather than a different protocol. Basically,
- we believe that Z39.50 is presently the most widely implemented
- information retrieval protocol that has the level of functionality
- required to support a good metasearching experience (structured
- searching, structured, well-defined results). It is also compact and
- efficient, and there is a very broad range of tools available to
- implement it.
-
-
-
Unicode Compliance
@@ -825,6 +764,111 @@
+
+ Relevance ranking
+
+ Pazpar2 uses a variant of the term frequency-inverse document frequency
+ (Tf-idf) ranking algorithm.
+
+
+ The Tf-part is straightforward to calculate and is based on the
+ documents that Pazpar2 fetches. The idf-part, however, is trickier,
+ since the corpus at hand contains ONLY the relevant documents and not
+ the irrelevant ones. Pazpar2 does not have the full corpus -- only the
+ documents that match a particular search.
+
+
+ Computation of the Tf-part is based on the normalized documents.
+ The length, the position and terms are thus normalized at this point.
+ The computation is also performed for each document received from the
+ target, before merging takes place. The result of a TF-computation is
+ added to the TF-total of a cluster. Thus, if a document occurs twice,
+ then the TF-part is doubled. That, however, can be adjusted, because the
+ TF-part may be divided by the number of documents in a cluster.
+
+
+ The algorithm used by Pazpar2 has two phases. In phase one
+ Pazpar2 computes a tf-array. This is done as records are
+ fetched from the database. In this phase, the rank weight
+ w and the rank tweaks lead,
+ follow and length are in use.
+
+
+ 0)
w[i] += w[i] * follow / (1+log2(d))
+ // length: length of field (number of terms that is)
+ if (length strategy is "linear")
+ tf[i] += w[i] / length;
+ else if (length strategy is "log")
+ tf[i] += w[i] / log2(length);
+ else if (length strategy is "none")
+ tf[i] += w[i];
+ ]]>
+
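The follow adjustment and the length normalization from the phase-one pseudocode above can be sketched in Python roughly as follows. This is a minimal illustration, not Pazpar2's actual implementation: the function names are hypothetical, `d` is taken as an input because its definition is not given in this excerpt, and the per-position `lead` weighting is left out for the same reason.

```python
import math


def apply_follow(w_i, follow, d):
    """Boost one term weight when d > 0, as in the pseudocode line
    w[i] += w[i] * follow / (1 + log2(d)). The meaning of d is assumed
    to be supplied by the caller."""
    if d > 0:
        w_i += w_i * follow / (1 + math.log2(d))
    return w_i


def accumulate_tf(tf, w, length, strategy):
    """Add the per-field term weights w into the running tf list, in place.

    length is the number of terms in the field; strategy is one of
    "linear", "log" or "none" (the length rank tweak).
    """
    for i, w_i in enumerate(w):
        if strategy == "linear":
            tf[i] += w_i / length
        elif strategy == "log":
            # length == 1 gives log2(1) == 0; the pseudocode does not say
            # how that case is handled, so this sketch simply lets it raise.
            tf[i] += w_i / math.log2(length)
        elif strategy == "none":
            tf[i] += w_i
        else:
            raise ValueError("unknown length strategy: %r" % strategy)
    return tf
```

Calling `accumulate_tf` once per field of every document in a cluster yields the cluster's TF-total, so a record that occurs twice contributes twice.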
+ In phase two, the idf-array and the final score
+ are computed. This is done for each cluster as part of each show command.
+ The rank tweak cluster is in use here.
+
+ 0)
+ idf[i] = log(1 + doctotal / dococcur[i])
+ else
+ idf[i] = 0;
+
+ relevance = 0;
+ for i = 1, .., N: (each term)
+ if (cluster is "yes")
+ tf[i] = tf[i] / cluster_size;
relevance += 100000 * tf[i] * idf[i];
+ ]]>
+
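Phase two can be sketched in Python along the same lines. The names below are hypothetical, and several points are assumptions rather than statements about Pazpar2's code: `doctotal` is taken to be the total number of documents seen, `dococcur[i]` the number of those documents containing term i, and `log` the natural logarithm. The sketch combines tf and idf by multiplication, the usual tf-idf form, which also keeps the score defined for terms whose idf is 0.

```python
import math


def idf_array(doctotal, dococcur):
    """Return idf[i] = log(1 + doctotal / dococcur[i]), or 0 for terms
    that occur in no document (assumed meaning of dococcur)."""
    return [math.log(1 + doctotal / n) if n > 0 else 0.0 for n in dococcur]


def relevance(tf, idf, cluster_size, cluster="yes"):
    """Compute the final score for one cluster.

    With the cluster rank tweak set to "yes", each tf value is divided by
    the number of documents in the cluster, so duplicate records do not
    inflate the score.
    """
    score = 0.0
    for tf_i, idf_i in zip(tf, idf):
        if cluster == "yes":
            tf_i = tf_i / cluster_size
        score += 100000 * tf_i * idf_i
    return score
```

For a cluster of two identical records, `cluster="yes"` gives the same score a single record would get, while any other value keeps the doubled TF-part.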
+
+
+ Pazpar2 and MasterKey Connect
+
+ MasterKey Connect is a hosted connector, or gateway, service that exposes
+ whatever searchable resources you need. Since the service exposes all
+ resources using Z39.50 (or SRU), it is easy to set up Pazpar2 to use the
+ service. In particular, since all connectors expose basically the same core
+ behavior, it is a good use of Pazpar2's mechanism for managing default
+ behaviors across similar databases.
+
+
+ After installation of Pazpar2, the directory
+ /etc/pazpar2/settings/mkc (location may
+ vary depending on installation preferences) contains an example setup that
+ searches two different resources through a MasterKey Connect demo account.
+ The file mkc.xml contains default parameters that will work for all
+ MasterKey Connect resources (if you decide to become a customer of the
+ service, you will substitute your own account credentials for
+ the guest/guest). The other files contain specific information about
+ a couple of demonstration resources.
+
+
+
+ To play with the demo, just create a symlink from
+ /etc/pazpar2/services-enabled/default.xml
+ to /etc/pazpar2/services-available/mkc.xml
+ and restart Pazpar2. You should now be able to search the two demo
+ resources using JSDemo or any user interface of your choice.
+ If you are interested in learning more about MasterKey Connect, or would like to
+ try out the service for free against your favorite online resource, just
+ contact us at info@indexdata.com.
+
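The symlink step can also be scripted. The following is a small Python sketch using only the standard library; the helper name `enable_service` is hypothetical, and replacing an existing link is an added convenience that the text above does not ask for. Restarting Pazpar2 is left out because the command depends on how it was installed.

```python
import os


def enable_service(available, enabled):
    """Make `enabled` a symlink pointing at `available`.

    An existing symlink at `enabled` is replaced (an assumption of this
    sketch); a regular file there is left alone and os.symlink will raise.
    """
    if os.path.islink(enabled):
        os.remove(enabled)
    os.symlink(available, enabled)
    return os.readlink(enabled)


# Example call with the paths from the text (usually requires root):
# enable_service("/etc/pazpar2/services-available/mkc.xml",
#                "/etc/pazpar2/services-enabled/default.xml")
```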
+