From: Sebastian Hammer Date: Fri, 19 Jan 2007 18:28:08 +0000 (+0000) Subject: Updated documentation. This update may be unstable, as I can't presently test on... X-Git-Tag: stable.27032007~52 X-Git-Url: http://lists.indexdata.dk/?a=commitdiff_plain;h=8f48376798d4b43d962726ef68f547cbd471d670;p=pazpar2-moved-to-github.git Updated documentation. This update may be unstable, as I can't presently test on my laptop. --- diff --git a/doc/book.xml b/doc/book.xml index 7d28253..4ec781e 100644 --- a/doc/book.xml +++ b/doc/book.xml @@ -9,165 +9,369 @@ %common; ]> - + - - Pazpar2 - User's Guide and Reference - - SebastianHammer - - - ©right-year; - Index Data - - - - Pazpar2 - High-performance, user-interface independent, metasearching - middleware featuring record merging, relevance ranking, and faceted search - results. - - - This document is a guide and reference to Pazpar version &version;. - - - - - - - - - - - - - - - - Introduction - - Pazpar2 is a stand-alone package which implements - the best we know to do in terms of the core metasearching - functionality; that is, searching a number of databases in parallel, - merging, and analyzing the results. Additional functionality such as - user management, attractive displays are expected to be implemented by - applications that use pazpar2. Pazpar2 is user interface independent. - Its functionality is exposed through a simple REST-style webservice API, - designed to be simple to use from an Ajax-anbled browser, from a - higher-level server-side language like PHP or Java, or even from a Flash - application. - - - Once you launch a search in pazpar2, the operation continues behind the - scenes. Pazpar2 connects to servers, carries out searches, and - retrieves, deduplicates, and stores results internally. Your application - code may periodically inquire about the status of an ongoing operation, - and ask to see records or other result set facets. - - - Pazpar2 is designed to be highly configurable. 
Incoming records are - normalized to XML/UTF-8, and then further normalized using XSLT to a - simple internal representation that is suitable for analysis. By - providing XSLT stylesheets for different kinds of result records, you - can tune pazpar2 to work against different kinds of information - retrieval servers. Finally, metadata is extracted, in a configurable - way, from this internal record, to support display, merging, ranking, - result set facets, and sorting. Pazpar2 is not bound to a specific model - of metadata, such as DublinCore or MARC -- by providing the right - configuration, it can work with a number of different kinds of data in - support of many different applications. - - - Pazpar2 is designed to be efficient and scalable. You can set it up to - search several hundred targets in parallel, or you can use it to support - hundreds of concurrent users. It is implemented with the same attention - to performance and economy that we use in our indexing engines, so that - you can focus on building your application. You can devote all of your - attention to usability and let pazpar2 do what it does best -- search. - - - - - Pazpar2 License - To be decided and written. - - - - Installation - - Pazpar2 depends on the following tools/libraries: - - YAZ - - - The popular Z39.50 toolkit for the C language. YAZ must be - compiled with Libxml2/Libxslt support. - - - - - - - In order to compile Pazpar2 an ANSI C compiler is - required. The requirements should be the same as for YAZ. - - -
- Installation on Unix (from Source) + + Pazpar2 - User's Guide and Reference + + SebastianHammer + + + &copyright-year; + Index Data + + + + Pazpar2 is a high-performance, user interface-independent, data + model-independent metasearching + middleware featuring merging, relevance ranking, record sorting, + and faceted results. + + + This document is a guide and reference to Pazpar version &version;. + + + + + + + + + + + + + + + + Introduction - Here is a quick step-by-step guide on how to compile the - tools that Pazpar2 uses. Only few systems have none of the required - tools binary packages. If, for example, Libxml2/libxslt are already - installed as development packages use these. + Pazpar2 is a stand-alone metasearch client with a webservice API, designed + to be used either from a browser-based client (JavaScript, Flash, Java, + etc.), from server-side code, or any combination of the two. + Pazpar2 is a highly optimized client designed to + search many resources in parallel. It implements record merging, + relevance-ranking and sorting by arbitrary data content, and facet + analysis for browsing purposes. It is designed to be data model + independent, and is capable of working with MARC, DublinCore, or any + other XML-structured response format -- XSLT is used to normalize and extract + data from retrieval records for display and analysis. It can be used + against any server which supports the Z39.50 protocol. Proprietary + backend modules can be used to support a large number of other protocols + (please contact Index Data for further information about this). - - Ensure that the development libraries + header files are - available on your system before compiling Pazpar2. For installation - of YAZ, refer to the YAZ installation chapter. + Additional functionality, such as + user management and attractive displays, is expected to be implemented by + applications that use pazpar2. Pazpar2 is user interface independent.
+ Its functionality is exposed through a simple REST-style webservice API, + designed to be simple to use from an Ajax-enabled browser, Flash + animation, Java applet, etc., or from a higher-level server-side language + like PHP or Java. Because session information can be shared between + browser-based logic and your server-side scripting, there is tremendous + flexibility in how you implement your business logic on top of pazpar2. - - gunzip -c pazpar2-version.tar.gz|tar xf - - cd pazpar2-version - ./configure - make - su - make install - -
- -
- Installation on Debian GNU/Linux - All dependencies for Pazpar2 are available as - Debian - packages for the sarge (stable in 2005) and etch (testing in 2005) - distributions. + Once you launch a search in pazpar2, the operation continues behind the + scenes. Pazpar2 connects to servers, carries out searches, and + retrieves, deduplicates, and stores results internally. Your application + code may periodically inquire about the status of an ongoing operation, + and ask to see records or other result set facets. Results become + available immediately, and it is easy to build end-user interfaces which + feel extremely responsive, even when searching more than 100 servers + concurrently. - The procedures for Debian based systems, such as - Ubuntu is probably similar + Pazpar2 is designed to be highly configurable. Incoming records are + normalized to XML/UTF-8, and then further normalized using XSLT to a + simple internal representation that is suitable for analysis. By + providing XSLT stylesheets for different kinds of result records, you + can tune pazpar2 to work against different kinds of information + retrieval servers. Finally, metadata is extracted, in a configurable + way, from this internal record, to support display, merging, ranking, + result set facets, and sorting. Pazpar2 is not bound to a specific model + of metadata, such as DublinCore or MARC -- by providing the right + configuration, it can work with a number of different kinds of data in + support of many different applications. - - apt-get install libyaz-dev - - With these packages installed, the usual configure + make - procedure can be used for Pazpar2 as outlined in - . + Pazpar2 is designed to be efficient and scalable. You can set it up to + search several hundred targets in parallel, or you can use it to support + hundreds of concurrent users.
It is implemented with the same attention + to performance and economy that we use in our indexing engines, so that + you can focus on building your application, without worrying about the + details of metasearch logic. You can devote all of your attention to + usability and let pazpar2 do what it does best -- metasearch. + + + If you wish to connect to commercial or other databases which do not + support open standards, please contact Index Data. We have a licensing + agreement with a third-party vendor which will enable pazpar2 to access + thousands of online databases, in addition to the vast number of catalogs + and online services that support the Z39.50 protocol. + + + Pazpar2 is our attempt to re-think the traditional paradigms for + implementing and deploying metasearch logic, with an uncompromising + approach to performance, and attempting to make maximum use of the + capabilities of modern browsers. The demo user interface that + accompanies the distribution is but one example. If you think of new + ways of using pazpar2, we hope you'll share them with us, and if we + can provide assistance with regard to training, design, programming, + integration with different backends, hosting, or support, please don't + hesitate to contact us. If you'd like to see functionality in pazpar2 + that is not there today, please don't hesitate to contact us. It may + already be in our development pipeline, or there might be a + possibility for you to help out by sponsoring development time or + code. Either way, get in touch and we will give you straight answers. + + + Enjoy! + + + + + + Pazpar2 License + To be decided and written. + + + + Installation + + Pazpar2 depends on the following tools/libraries: + + YAZ + + + The popular Z39.50 toolkit for the C language. YAZ must be + compiled with Libxml2/Libxslt support. + + + + -
-
- - - Reference - - The material in this chapter is drawn directly from the individual - manual entries. + In order to compile Pazpar2 an ANSI C compiler is + required. The requirements should be the same as for YAZ. - - &manref; - + +
+ Installation on Unix (from Source) + + Here is a quick step-by-step guide on how to compile the + tools that Pazpar2 uses. Only a few systems lack binary packages for the required + tools. If, for example, Libxml2/libxslt are already + installed as development packages, use these. + + + + Ensure that the development libraries + header files are + available on your system before compiling Pazpar2. For installation + of YAZ, refer to the YAZ installation chapter. + + + gunzip -c pazpar2-version.tar.gz|tar xf - + cd pazpar2-version + ./configure + make + su + make install + +
+ +
+ Installation on Debian GNU/Linux + + All dependencies for Pazpar2 are available as + Debian + packages for the sarge (stable in 2005) and etch (testing in 2005) + distributions. + + + The procedure for Debian-based systems, such as + Ubuntu, is probably similar. + + + apt-get install libyaz-dev + + + With these packages installed, the usual configure + make + procedure can be used for Pazpar2 as outlined in + . + +
+ + + + Using pazpar2 + + This chapter provides a general introduction to the use and deployment of pazpar2. + + +
+ Pazpar2 and your systems architecture + + Pazpar2 is designed to provide asynchronous, behind-the-scenes + metasearching functionality to your application, exposing this + functionality using a simple webservice API that can be accessed + from any number of development environments. In particular, it is + possible to combine pazpar2 either with your server-side dynamic + website scripting, with scripting or code running in the browser, or + with any combination of the two. Pazpar2 is an excellent tool for + building advanced, Ajax-based user interfaces for metasearch + functionality, but it isn't a requirement -- you can choose to use + pazpar2 entirely as a backend to your regular server-side scripting. + When you do use pazpar2 in conjunction + with browser scripting (JavaScript/Ajax, Flash, applets, etc.), there are + special considerations. + + + + Pazpar2 implements a simple but efficient HTTP server, and it is + designed to interact directly with scripting running in the browser + for the best possible performance, and to limit overhead when + several browser clients generate numerous webservice requests. + However, it is still desirable to use a conventional webserver, + such as Apache, to serve up graphics, HTML documents, and + server-side scripting. Because the security sandbox environment of + most browser-side programming environments only allows communication + with the server from which the enclosing HTML page or object + originated, pazpar2 is designed so that it can act as a transparent + proxy in front of an existing webserver (see for details). In this mode, all regular + HTTP requests are transparently passed through to your webserver, + while pazpar2 only intercepts search-related webservice requests. + + + + If you want to expose your combined service on port 80, you can + either run your regular webserver on a different port, a different + server, or a different IP address associated with the same server. 
+ + + + Sometimes, it may be necessary to implement functionality on your + regular webserver that makes use of search results, for example to + implement data import functionality, emailing results, history + lists, personal citation lists, interlibrary loan functionality, + etc. Fortunately, it is simple to exchange information between + pazpar2, your browser scripting, and backend server-side scripting. + You can send a session ID and possibly a record ID from your browser + code to your server code, and from there use pazpar2's webservice API + to access result sets or individual records. You could even 'hide' + all of pazpar2's functionality behind your own API implemented on + the server side, and access that from the browser or elsewhere. The + possibilities are just about endless. + +
+ +
+ Your data model + + Pazpar2 does not have a preconceived notion of what makes up a data + model. There is no assumption that records have specific fields or + that they are organized in any particular way. The only assumption + is that data comes packaged in a form that the software can work + with (presently, that means XML or MARC), and that you can provide + the necessary information to massage it into pazpar2's internal + record abstraction. + + + + Handling retrieval records in pazpar2 is a two-step process. First, + you decide which data elements of the source record you are + interested in, and you specify any desired massaging or combining of + elements using an XSLT stylesheet (MARC records are automatically + normalized to MARCXML before this step). If desired, you can run + multiple XSLT stylesheets in series to accomplish this, but the + output of the last one should be a representation of the record in a + schema that pazpar2 understands. + + + + The intermediate, internal representation of the record looks like + this: + + + The Shining + + King, Stephen + + ebook + + + +]]> + + As you can see, there isn't much to it. There are really only a few + important elements to this file. + + + + Elements should belong to the namespace + http://www.indexdata.com/pazpar2/1.0. If the root node contains the + attribute 'mergekey', then every record that generates the same + merge key (normalized for case differences, white space, and + truncation) will be joined into a cluster. In other words, you + decide how records are merged. If you don't include a merge key, + records are never merged. The 'metadata' elements provide the meat + of the record -- the content. The 'type' attribute is used to + match each element against processing rules that determine what + happens to the data element next. + + + + The next processing step is the extraction of metadata from the + intermediate representation of the record.
This is governed by the + 'metadata' elements in the 'service' section of the configuration + file. See for details. The metadata + in the retrieval record ultimately drives merging, sorting, ranking, + the extraction of browse facets, and display, all configurable. + +
+ +
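To make the merge-key behavior described above concrete, here is a minimal Python sketch that parses one record in the intermediate representation and clusters records by merge key. It is an illustration under stated assumptions, not pazpar2's implementation: the sample record (its 'mergekey' value and the 'type' names title, author, and medium) is hypothetical, and the normalization shown only folds case and collapses white space, leaving out truncation.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

PZ_NS = "http://www.indexdata.com/pazpar2/1.0"

# Hypothetical record in the intermediate representation; the mergekey
# value and the 'type' names are assumptions made for illustration.
SAMPLE = """<record xmlns="http://www.indexdata.com/pazpar2/1.0"
        mergekey="title The Shining author King, Stephen">
  <metadata type="title">The Shining</metadata>
  <metadata type="author">King, Stephen</metadata>
  <metadata type="medium">ebook</metadata>
</record>"""


def normalize_key(key):
    """Approximate merge-key normalization: fold case, collapse white space."""
    return re.sub(r"\s+", " ", key).strip().lower()


def parse_record(xml_text):
    """Return (normalized merge key or None, list of (type, content) pairs)."""
    root = ET.fromstring(xml_text)
    key = root.get("mergekey")
    fields = [(m.get("type"), (m.text or "").strip())
              for m in root.findall("{%s}metadata" % PZ_NS)]
    return (normalize_key(key) if key else None), fields


def cluster(records):
    """Group parsed records by merge key; records without a key stay alone."""
    clusters = defaultdict(list)
    for n, (key, fields) in enumerate(records):
        # A record without a merge key gets a unique key, so it never merges.
        clusters[key if key is not None else ("unmerged", n)].append(fields)
    return list(clusters.values())
```

Two records whose merge keys differ only in case or spacing end up in one cluster, while records without a 'mergekey' attribute each form their own cluster, matching the rule that such records are never merged.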
+ Client development + + You can use pazpar2 from any environment that allows you to use + webservices. The initial goal of the software was to support + Ajax-based applications, but there literally are no limits to what + you can do. You can use pazpar2 from JavaScript, Flash, Java, etc., + on the browser side, and from any development environment on the + server side, and you can pass session tokens and record IDs freely + around between these environments to build sophisticated applications. + Use your imagination. + + + + The webservice API of pazpar2 is described in detail in . + + + + In brief, you use the 'init' command to create a session, a + temporary workspace which carries information about the current + search. You start a new search using the 'search' command. Once the + search has been started, you can follow its progress using the + 'stat', 'bytarget', 'termlist', or 'show' commands. Detailed records + can be fetched using the 'record' command. + +
+
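The command sequence outlined above (init, then search, then show) can be sketched as plain URL construction in Python. This is only a sketch: the host and port in BASE are hypothetical, the session ID is the one used in the protocol examples, and fetching the URLs and reading the session ID out of the 'init' response are left out.

```python
from urllib.parse import urlencode

# Hypothetical address of a running pazpar2 instance (an assumption).
BASE = "http://localhost:8004/search.pz2"


def command_url(command, session=None, **params):
    """Build a GET URL for one pazpar2 webservice command."""
    query = {}
    if session is not None:
        query["session"] = session
    query["command"] = command  # 'command' is always required
    query.update(params)
    return BASE + "?" + urlencode(query)


# 'init' creates a session; later commands carry the session ID it returns.
init_url = command_url("init")
search_url = command_url("search", session="2044502273", query="computer")
# block=1 makes 'show' wait until there are records ready to display.
show_url = command_url("show", session="2044502273", start=0, num=2, block=1)
```

Keeping URL construction in one helper means the same function can serve browser-facing server code and batch scripts alike, with the session ID passed between them as described above.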
+ + + Reference + + + The material in this chapter is drawn directly from the individual + manual entries. + + + &manref; +
+ Pazpar2 @@ -31,8 +31,284 @@ DESCRIPTION - + + The pazpar2 configuration file, together with any referenced XSLT files, + governs pazpar2's behavior as a client, and controls the normalization and + extraction of data elements from incoming result records, for the + purposes of merging, sorting, facet analysis, and display. + + + + The file is specified using the option -f on the pazpar2 command line. + There is not presently a way to reload the configuration file without + restarting pazpar2, although this will most likely be added some time + in the future. + + + FORMAT + + The configuration file is XML-structured. It must be valid XML. All + elements specific to pazpar2 should belong to the namespace + "http://www.indexdata.com/pazpar2/1.0" (this is assumed in the + following examples). The root element is named 'pazpar2'. Under the + root element are a number of elements which group categories of + information. The categories are described below. + + + server + + This section governs overall behavior of the client. The data + elements are described below. + + + + listen + + + Configures the webservice -- this controls how you can connect + to pazpar2 from your browser or server-side code. The + attributes 'host' and 'port' control the binding of the + server. The 'host' attribute can be used to bind the server to + a secondary IP address of your system, enabling you to run + pazpar2 on port 80 alongside a conventional web server. You + can override this setting on the command line using the option -h. + + + + + + proxy + + + If this item is given, pazpar2 will forward all incoming HTTP + requests that do not contain the filename 'search.pz2' to the + host and port specified using the 'host' and 'port' + attributes. This functionality is crucial if you wish to use + pazpar2 in conjunction with browser-based code (JS, Flash, + applets, etc.) which operates in a security sandbox.
Such code + can only connect to the same server from which the enclosing + HTML page originated. Pazpar2's proxy functionality enables you + to host all of the main pages (plus images, CSS, etc.) of your + application on a conventional webserver, while efficiently + processing webservice requests for metasearch status, results, + etc. + + + + + + service + + + This nested element controls the behavior of pazpar2 with + respect to your data model. In pazpar2, incoming records are + normalized, using XSLT, into an internal representation (see + the retrievalprofile section). + The 'service' section controls the further processing and + extraction of data from the internal representation, primarily + through the 'metadata' sub-element. + + + + metadata + + One of these elements is required for every data element in + the internal representation of the record (see + ). It governs + subsequent processing as pertains to sorting, relevance + ranking, merging, and display of data elements. It supports + the following attributes: + + + + name + + + This is the name of the data element. It is matched + against the 'type' attribute of the 'metadata' element + in the normalized record. A warning is produced if + metadata elements with an unknown name are found in the + normalized record. This name is also used to represent + data elements in the records returned by the + webservice API, and to name sort lists and browse + facets. + + + + + type + + + The type of data element. This value governs any + normalization or special processing that might take + place on an element. Possible values are 'generic' + (basic string) and 'year' (a range is computed if + multiple years are found in the record). Note: This + list is likely to grow in the future. + + + + + brief + + + If this is set to 'yes', then the data element is + included in brief records in the webservice API. Note + that this only makes sense for metadata elements that + are merged (see below). The default value is 'no'.
+ + + + + sortkey + + + Specifies that this data element is to be used for + sorting. The possible values are 'numeric' (numeric + value), 'skiparticle' (string; skip common, leading + articles), and 'no' (no sorting). The default value is + 'no'. + + + + + rank + + + Specifies that this element is to be used to help rank + records against the user's query (when ranking is + requested). The value is an integer, used as a + multiplier against the basic TF*IDF score. A value of + 1 is the base; higher values give additional weight to + elements of this type. The default is '0', which + excludes this element from the rank calculation. + + + + + termlist + + + Specifies that this element is to be used as a + termlist, or browse facet. Values are tabulated from + incoming records, and a highscore of values (with + their associated frequency) is made available to the + client through the webservice API. The possible values + are 'yes' and 'no' (default). + + + + + merge + + + This governs whether, and how, elements are extracted + from individual records and merged into cluster + records. The possible values are: 'unique' (include + all unique elements), 'longest' (include only the + longest element, by string length), 'range' (calculate a range + of values across all matching records), 'all' (include + all elements), or 'no' (don't merge; this is the + default). + + + + + + + + + + + + + + At the moment, this directive is ignored; there is one global + CCL-mapping file which governs the mapping of queries to Z39.50 + type-1. This file is located in etc/default.bib. This will change + shortly. + + + + + + Note: In the present version, there is a single retrieval + profile. However, in a future release, it will be possible to + associate unique retrieval profiles with different targets, or to + generate retrieval profiles using XSLT from the ZeeRex description of + a target.
+ + + + The following data elements are recognized for the retrievalprofile + directive: + + + + requestsyntax + + + This element specifies the request syntax to be used in queries. It only + makes sense for Z39.50-type targets. + + + + + nativesyntax + + + This element specifies the native syntax and encoding of the + result records. The default is XML. The following attributes + are defined: + + + name + + + The name of the syntax. Currently recognized values are + 'iso2709' (MARC), and 'xml'. + + + + + format + + + The format, or schema, to be expected. Default is + 'marc21'. + + + + + encoding + + + The encoding of the response record. Typical values for + MARC records are 'marc8' (general MARC-8), 'marc8s' + (MARC-8, but maps to precomposed UTF-8 characters, more + suitable for use in web browsers), 'latin1'. + + + + + mapto + + + Specifies the flavor of MARCXML to map results to. + Default is 'marcxml'. 'marcxchange' is also possible, and + useful for Danish DANMARC records. + + + + + + + + + + OPTIONS diff --git a/doc/pazpar2_protocol.xml b/doc/pazpar2_protocol.xml index 537d98d..404f6c3 100644 --- a/doc/pazpar2_protocol.xml +++ b/doc/pazpar2_protocol.xml @@ -8,7 +8,7 @@ %common; ]> - + Pazpar2 @@ -27,12 +27,13 @@ DESCRIPTION Webservice requests are any that refer to filename "search.pz2". Arguments - are GET-style parameters. Argument 'command' is required and specifies - command. Any request not recognized as a webservice request as described, - are forwarded to the HTTP server specified in configuration. - This way, the webserver can host the user interface (itself dynamic - or static HTML), and AJAX-style calls can be used from JS to interact - with the search logic. + are GET-style parameters. Argument 'command' is always required and specifies + the operation to perform. Any request not recognized as a webservice + request is forwarded to the HTTP server specified in the configuration + using the proxy setting. 
+ This way, a regular webserver can host the user interface (itself dynamic + or static HTML), and AJAX-style calls can be used from JS (or any other client-based + scripting environment) to interact with the search logic in pazpar2. Each command is described in sub sections to follow. @@ -108,7 +109,7 @@ Example: Response: @@ -123,7 +124,7 @@ search.pz2?session=2044502273&command=search&query=computer stat - Provides status of ongoing search. Parameters: + Provides status information about an ongoing search. Parameters: @@ -147,7 +148,7 @@ search.pz2?session=2044502273&command=stat 3 7 -- Total hitcount - 7 -- Total number of records fetched + 7 -- Total number of records fetched in last query 1 -- Total number of associated clients 0 -- Number of disconnected clients 0 -- Number of clients in connecting state @@ -180,7 +181,7 @@ search.pz2?session=2044502273&command=stat start First record to show - 0-indexed. - + @@ -196,33 +197,47 @@ search.pz2?session=2044502273&command=stat block - If block is set, the command will hang until there are records ready + If block is set to 1, the command will hang until there are records ready to display. Use this to show first records rapidly without requiring rapid polling. + + sort + + + Specifies sort criteria. The argument is a comma-separated list + (no whitespace allowed) of sort fields, with the highest-priority + field first. A sort field may be followed by a colon followed by + the number '0' or '1', indicating whether results should be sorted in + increasing or decreasing order according to that field. 0==Decreasing is + the default. 
+ + + + Example: Output: OK - 3 - 6 - 7 - 0 - 2 + 3 -- How many clients are still working + 6 -- Number of merged records + 7 -- Total of all hitcounts + 0 -- The start number you requested + 2 -- Number of records retrieved How to program a computer, by Jack Collins - 2 - 6 + 2 -- Number of merged records + 6 -- Record ID for this record @@ -243,6 +258,15 @@ search.pz2?session=2044502273&command=show&start=0&num=2 + session + + + Session ID + + + + + id @@ -326,14 +350,61 @@ Output: library2.mcmaster.ca - 11734 - Client_Idle - 0 + 11734 -- Number of hits + Client_Idle -- See the description of 'bytarget' below + 0 -- Z39.50 diagnostic codes ]]> + + + bytarget + + Returns information about the status of each active client. Parameters: + + + + session + + + Session ID. + + + + + + + Example: + + + Example output: + + + OK + + z3950.loc.gov/voyager/ + 10000 + 0 + 65 + Client_Presenting + + + + + + The following client states are defined: Client_Connecting, + Client_Connected, Client_Idle, Client_Initializing, Client_Searching, + Client_Presenting, Client_Error, Client_Failed, + Client_Disconnected, Client_Stopped. + + +
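Returning to the 'sort' argument of the 'show' command: its value is a comma-separated list without whitespace, with the highest-priority field first and an optional ':0' or ':1' after each field. The Python sketch below assembles such a value. The helper name and the field names in the usage note are hypothetical, and the valid field names depend on the 'metadata' elements configured as sort keys.

```python
def sort_argument(fields):
    """Build the value of the 'sort' parameter of the 'show' command.

    `fields` is a list of (name, flag) pairs, highest priority first.
    `flag` is 0 or 1, or None to omit it and rely on the default.
    """
    parts = []
    for name, flag in fields:
        # The list allows no whitespace; ',' and ':' are separators.
        if any(c.isspace() for c in name) or "," in name or ":" in name:
            raise ValueError("invalid sort field name: %r" % name)
        if flag is None:
            parts.append(name)
        elif flag in (0, 1):
            parts.append("%s:%d" % (name, flag))
        else:
            raise ValueError("sort flag must be 0 or 1")
    return ",".join(parts)
```

For example, `sort_argument([("relevance", None), ("title", 1)])` yields the string `relevance,title:1`, which can be passed as the 'sort' parameter alongside 'start' and 'num'.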