1 <chapter id="proxy-reference">
2 <title>Proxy Reference</title>
3 <section id="proxy-operation">
4 <title>Operating Environment</title>
6 The YAZ proxy is a single program. After startup it spawns
7 a child process (except on Windows or if option -X is given).
8 The child process is the core of the proxy and it handles all
9 communication with clients and servers. The parent process
10 will restart the child process if it dies unexpectedly and report
11 the reason. For the YAZ proxy command-line options,
12 see <xref linkend="proxy-usage"/>.
15 As an option the proxy may change user identity to a less privileged
19 <section id="proxy-target">
20 <title>Specifying the Backend Server</title>
22 When the proxy receives a Z39.50 Initialize Request from a Z39.50
23 client, it determines the backend server by the following rules:
26 <para>If the <literal>InitializeRequest</literal> PDU from the
28 <link linkend="otherinfo-encoding"><literal>otherInfo</literal></link>
30 <literal>1.2.840.10003.10.1000.81.1</literal>, then the
31 contents of that element specify the server to be used, in the
32 usual YAZ address format (typically
33 <literal>tcp:<parameter>hostname</parameter>:<parameter>port</parameter></literal>)
35 <ulink url="http://www.indexdata.dk/yaz/doc/comstack.addresses.tkl"
36 >the Addresses section of the YAZ manual</ulink>.
41 <para>Otherwise, the Proxy uses the default server, if one was
42 specified in the proxy configuration file. See
43 <xref linkend="proxy-config-target"/>.
48 <para>Otherwise, the Proxy uses the default server, if one was
49 specified on the command-line with the <literal>-t</literal>
54 <para>Otherwise, the proxy closes the connection with
61 <section id="proxy-keepalive">
62 <title>Keep-alive Facility</title>
64 Keep-alive is a facility where the proxy keeps the connection to the
65 backend server open, even if the client closes its connection to the proxy.
68 If a new or returning client connects to the proxy and requests the
69 same backend, it will be assigned to this backend. In this case, the
70 proxy sends an initialize response directly to the client and an
71 initialize handshake with the backend is omitted.
74 When a client reconnects, query and record caching work better if the
75 proxy assigns it to the same backend as before, and the result set
76 (if any) is reused. To achieve this, Index Data defined a session
77 cookie which identifies the backend session.
80 The cookie is defined by the client and is sent as part of the
81 Initialize Request and passed in an
82 <link linkend="otherinfo-encoding"><literal>otherInfo</literal></link>
83 element with OID <literal>1.2.840.10003.10.1000.81.2</literal>.
86 Clients that do not send a cookie as part of the initialize request
87 may still get better performance, since the init handshake is saved.
90 Refer to <xref linkend="proxy-config-keepalive"/> for how to set up
91 configuration parameters for keepalive.
95 <section id="proxy-config-file">
96 <title>Proxy Configuration File</title>
98 The Proxy may read a configuration file using option
99 <literal>-c</literal> followed by the filename of a config file.
102 The config file is XML based. The YAZ proxy must be compiled
103 with <ulink url="http://www.xmlsoft.org/">libxml2</ulink> and
104 <ulink url="http://xmlsoft.org/XSLT/">libXSLT</ulink> support in
105 order for the config file facility to be enabled.
108 <para>To check that a config file is well-formed, yazproxy may
109 be invoked without specifying a listening port, i.e.
111 yazproxy -c myconfig.xml
113 If this does not produce errors, the file is well-formed.
116 <section id="proxy-config-header">
117 <title>Proxy Configuration Header</title>
119 The proxy config file must have a root element called
120 <literal>proxy</literal>. All information except an optional XML
121 header must be stored within the <literal>proxy</literal> element.
124 <?xml version="1.0"?>
126 <!-- content here .. -->
130 <section id="proxy-config-target">
131 <title>target</title>
133 The element <literal>target</literal>, which may be repeated zero
134 or more times with parent element <literal>proxy</literal>, contains
135 information about each backend target.
136 The <literal>target</literal> element has two attributes:
137 <literal>name</literal> which holds the logical name of the backend
138 target (required) and <literal>default</literal> (optional) which
139 (when given) specifies that the backend target is the default target -
140 equivalent to command line option <literal>-t</literal>.
144 <?xml version="1.0"?>
146 <target name="server1" default="1">
147 <!-- description of server1 .. -->
149 <target name="server2">
150 <!-- description of server2 .. -->
156 <section id="proxy-config-url">
159 The <literal>url</literal> element, which may be repeated one or more times,
160 should be a child of the <literal>target</literal> element.
161 The CDATA of <literal>url</literal> is the Z-URL of the backend.
164 Multiple <literal>url</literal> elements may be used. In that case, when
165 a client initiates a session, the proxy chooses the URL with the lowest
166 number of active sessions, thereby distributing the load. It is
167 assumed that each URL represents the same database (data).
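<para>
A minimal sketch of a load-balanced target, assuming two hypothetical
backend hosts (<literal>z1.example.org</literal> and
<literal>z2.example.org</literal>) that serve the same data. The host
names, port and target name are illustrative only and do not come from
an actual installation:
</para>
<screen><![CDATA[
<proxy>
 <target name="server1" default="1">
  <!-- hypothetical backends; both are assumed to hold the same data -->
  <url>tcp:z1.example.org:210</url>
  <url>tcp:z2.example.org:210</url>
 </target>
</proxy>
]]></screen>
<para>
With a setup like this, a new session would be directed to whichever of
the two URLs currently has fewer active sessions.
</para>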
171 <section id="proxy-config-target-timeout">
172 <title>target-timeout</title>
174 The element <literal>target-timeout</literal> is a child of element
175 <literal>target</literal> and specifies the time in seconds before
176 a target session is shut down.
179 This can also be specified on the command line by using option
180 <literal>-T</literal>. Refer to OPTIONS.
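<para>
As an illustration, the fragment below would shut down a target session
after 300 seconds. The value and the target name are arbitrary examples
chosen for this sketch, not recommended settings:
</para>
<screen><![CDATA[
<target name="server1">
 <!-- example value only: 300 seconds -->
 <target-timeout>300</target-timeout>
</target>
]]></screen>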
184 <section id="proxy-config-client-timeout">
185 <title>client-timeout</title>
187 The element <literal>client-timeout</literal> is a child of element
188 <literal>target</literal> and specifies the time in seconds before
189 a client session is shut down.
192 This can also be specified on the command line by using option
193 <literal>-i</literal>. Refer to OPTIONS.
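<para>
A comparable sketch for the client side, again with an arbitrary example
value: the fragment below would shut down a client session after 120
seconds. The target name is hypothetical:
</para>
<screen><![CDATA[
<target name="server1">
 <!-- example value only: 120 seconds -->
 <client-timeout>120</client-timeout>
</target>
]]></screen>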
197 <section id="proxy-config-keepalive">
198 <title>keepalive</title>
199 <para>The <literal>keepalive</literal> element holds information about
200 the keepalive Z39.50 sessions. Keepalive sessions are proxy-to-backend
201 sessions that are no longer associated with a client session.
203 <para>The <literal>keepalive</literal> element, which is a child of
204 the <literal>target</literal> element, holds two elements:
205 <literal>bandwidth</literal> and <literal>pdu</literal>.
206 The <literal>bandwidth</literal> is the maximum total bytes
207 transferred to/from the target. If a target session exceeds this
208 limit, it is shut down (and no longer kept alive).
209 The <literal>pdu</literal> is the maximum number of requests sent
210 to the target. If a target session exceeds this limit, it is
211 shut down. The idea of these two limits is to avoid very long
212 sessions that use resources in a backend (that leaks!).
215 The following sets the maximum number of bytes transferred in a
216 target session to 1 MB and the maximum number of requests to 400.
219 <bandwidth>1048576</bandwidth>
220 <pdu>400</pdu>
225 <section id="proxy-config-limit">
228 The <literal>limit</literal> section specifies bandwidth and PDU request
229 limits for an active session.
230 The proxy records bandwidth and PDU requests during the last 60 seconds
231 (1 minute). The <literal>limit</literal> may include the
232 elements <literal>bandwidth</literal>, <literal>pdu</literal>,
233 and <literal>retrieve</literal>. The <literal>bandwidth</literal>
234 measures the number of bytes transferred within the last minute.
235 The <literal>pdu</literal> is the number of requests in the last
236 minute. The <literal>retrieve</literal> holds the maximum number of records
237 to be retrieved in one Present Request.
240 If a bandwidth/pdu limit is reached the proxy will postpone the
241 requests to the target and wait one or more seconds. The idea of the
242 limit is to ensure that clients that download hundreds or thousands of
243 records do not hurt other users.
246 The following sets the maximum number of bytes transferred per minute to
247 512 KB (524288 bytes) and the maximum number of requests to 40.
250 <bandwidth>524288</bandwidth>
251 <pdu>40</pdu>
257 Typically the limits for keepalive are much higher than
258 those for the per-minute session limits.
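<para>
To show how the two kinds of limits can sit side by side, here is a
sketch of a single target that carries both a <literal>keepalive</literal>
and a <literal>limit</literal> section. The target name and all figures
are illustrative assumptions. They are chosen only so that the keepalive
limits are higher than the per-minute limits:
</para>
<screen><![CDATA[
<target name="server1">
 <!-- lifetime limits for a kept-alive backend session -->
 <keepalive>
  <bandwidth>1048576</bandwidth>
  <pdu>400</pdu>
 </keepalive>
 <!-- per-minute limits for an active session -->
 <limit>
  <bandwidth>524288</bandwidth>
  <pdu>40</pdu>
  <!-- at most 100 records per Present Request (example value) -->
  <retrieve>100</retrieve>
 </limit>
</target>
]]></screen>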
263 <section id="proxy-config-attribute">
264 <title>attribute</title>
266 The <literal>attribute</literal> element specifies whether to accept or
267 reject a particular attribute type, value pair.
268 Well-behaving targets will reject unsupported attributes on their
269 own. This feature is useful for targets that do not gracefully
270 handle unsupported attributes.
273 Attribute elements may be repeated. The proxy inspects the attribute
274 specifications in the order specified in the configuration file.
275 When a given attribute specification matches a given attribute list
276 in a query, the proxy takes appropriate action (reject, accept).
279 If no attribute specification matches the attribute list in a query,
283 The <literal>attribute</literal> element has two required attributes:
284 <literal>type</literal> which is the Attribute Type-1 type, and
285 <literal>value</literal> which is the Attribute Type-1 value.
286 The special value/type <literal>*</literal> matches any attribute
287 type/value. A value may also be specified as a list with each
288 value separated by a comma, or as a range:
289 low value - dash - high value.
292 If attribute <literal>error</literal> is given, that holds a
293 Bib-1 diagnostic which is sent to the client if the particular
294 type, value is part of a query.
297 If attribute <literal>error</literal> is not given, the attribute
298 type, value is accepted and passed to the backend target.
301 A target that supports use attributes 1, 4, and 1000 through 1003 and
302 no other use attributes, could use the following rules:
304 <attribute type="1" value="1,4,1000-1003"/>
305 <attribute type="1" value="*" error="114"/>
313 <syntax type="xml" marcxml="1" stylesheet="MARC21slim2MODS.xsl"
314 identifier="http://www.loc.gov/mods"
316 <title>MODS v2</title>
322 <section id="proxy-config-syntax">
323 <title>syntax</title>
325 The <literal>syntax</literal> element specifies whether to accept or
326 reject a particular record syntax request from the client.
329 The <literal>syntax</literal> has one required attribute:
330 <literal>type</literal> which is the Preferred Record Syntax.
333 If attribute <literal>error</literal> is given, that holds a
334 Bib-1 diagnostic which is sent to the client if the particular
335 record syntax is part of a Present or Search request.
338 If attribute <literal>error</literal> is not given, the record syntax
339 is accepted and passed to the backend target.
342 If attribute <literal>marcxml</literal> is given, the proxy will
343 perform MARC21 to MARCXML conversion. In this case the
344 <literal>type</literal> should be XML. The proxy will use
345 preferred record syntax USMARC/MARC21 against the backend target.
347 <para>To accept USMARC and offer MARCXML records, but reject
348 all other requests, the following configuration could be used:
351 <target name="mytarget">
352 <syntax type="usmarc"/>
353 <syntax type="xml" marcxml="1"/>
354 <syntax type="*" error="238"/>
361 <section id="proxy-config-explain">
362 <title>explain</title>
364 The <literal>explain</literal> element includes Explain information
365 for SRW/SRU about the server in the target section. This
366 information must have a <literal>serverInfo</literal> element
367 with a database under which this target is made available (URL path).
370 <explain xmlns="http://explain.z3950.org/dtd/2.0/">
372 <host>myhost.org</host>
374 <database>mydatabase</database>
376 <!-- remaining Explain stuff -->
380 In the above case, the SRW/SRU service is available as
381 <literal>http://myhost.org:8000/mydatabase</literal>.
386 <section id="proxy-config-cql2rpn">
387 <title>cql2rpn</title>
389 The CDATA of <literal>cql2rpn</literal> refers to a CQL-to-RPN conversion
390 file for the server in the target section. This element
391 is required for SRW/SRU searches to operate against a Z39.50
392 server that doesn't support CQL. Most Z39.50 servers only support
393 Type-1/RPN so this is usually required.
394 See YAZ documentation for more information about the
395 <ulink url="http://indexdata.dk/yaz/doc/tools.tkl#tools.cql.pqf">CQL
396 to PQF</ulink> conversion. See also the
397 <filename>pqf.properties</filename> in the <filename>etc</filename>
398 (or <replaceable>prefix/share/yazproxy</replaceable>)
399 directory of the YAZ proxy.
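<para>
A sketch of how this might look for a target offered over SRW/SRU,
assuming the <filename>pqf.properties</filename> file mentioned above is
used. The path (here relative to where the proxy is started) and the
target name are assumptions made for this example:
</para>
<screen><![CDATA[
<target name="mytarget">
 <!-- path is an assumption; adjust to where pqf.properties is installed -->
 <cql2rpn>pqf.properties</cql2rpn>
</target>
]]></screen>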
403 <section id="proxy-config-preinit">
404 <title>preinit</title>
406 The element <literal>preinit</literal> is the child of element
407 <literal>target</literal> and specifies the number of spare
408 connections to a target. By default no spare connections are
409 created by the proxy. If the proxy uses a target exclusively or
410 heavily, preinit sessions ensure that target sessions
411 have been made before the client makes a connection and therefore
412 reduce the connect-init handshake time dramatically. Never set this to
417 <section id="proxy-config-max-clients">
418 <title>max-clients</title>
420 The element <literal>max-clients</literal> is the child of element
421 <literal>proxy</literal> and specifies the total number of
422 allowed connections to targets (all targets). If this limit
423 is reached the proxy will close the least recently used connection.
426 Note that many Unix systems impose a system limit on the number of
427 open files allowed in a single process, typically in the
428 range 256 (Solaris) to 1024 (Linux).
429 The proxy uses 2 sockets per session + a few files
430 for logging. As a rule of thumb, ensure that 2*max-clients + 5
431 can be opened by the proxy process.
435 Using the <ulink url="http://www.gnu.org/software/bash/bash.html">
436 bash</ulink> shell, you can set the limit with
437 <literal>ulimit -n</literal> <replaceable>no</replaceable>.
438 Use <literal>ulimit -a</literal> to display limits.
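<para>
A worked example of the rule of thumb, using an arbitrary value of 500:
the proxy would need roughly 2*500 + 5 = 1005 file descriptors, which
fits within a limit of 1024 open files but not within a limit of 256.
The value below is illustrative only:
</para>
<screen><![CDATA[
<proxy>
 <!-- example value; about 2*500 + 5 = 1005 descriptors needed -->
 <max-clients>500</max-clients>
 <!-- targets go here .. -->
</proxy>
]]></screen>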
443 <section id="proxy-config-log">
446 The element <literal>log</literal> is the child of element
447 <literal>proxy</literal> and specifies what is to be logged by the
451 Specify the log file with command-line option <literal>-l</literal>.
454 The text of the <literal>log</literal> element is a sequence of
455 options separated by white space. See the table below:
456 <table frame="top"><title>Logging options</title>
458 <colspec colwidth="1*"/>
459 <colspec colwidth="2*"/><thead>
461 <entry>Option</entry>
462 <entry>Description</entry>
467 <entry><literal>client-apdu</literal></entry>
469 Log APDUs as reported by YAZ for the
470 communication between the client and the proxy.
471 This facility is equivalent to the APDU logging that
472 happens when using option <literal>-a</literal>, however
473 this tells the proxy to log in the same file as given
474 by <literal>-l</literal>.
478 <entry><literal>server-apdu</literal></entry>
480 Log APDUs as reported by YAZ for the
481 communication between the proxy and the server (backend).
485 <entry><literal>client-requests</literal></entry>
487 Log a brief description about requests transferred between
488 the client and the proxy. The name of the request and the size
489 of the APDU are logged.
493 <entry><literal>server-requests</literal></entry>
495 Log a brief description about requests transferred between
496 the proxy and the server (backend). The name of the request
497 and the size of the APDU are logged.
505 To log communication in detail between the proxy and the backend, the
506 following configuration could be used:
508 <target name="mytarget">
509 <log>server-apdu server-requests</log>
517 <section id="query-cache">
518 <title>Query Caching</title>
520 Simple stateless clients often send identical Z39.50 searches
521 in a relatively short period of time (e.g. in order to produce a
522 results-list page, the next page,
523 a single full-record, etc). And for many targets, it's
524 much more expensive to produce a new result set than to
525 reuse an existing one.
528 The proxy tries to solve that by remembering the last query for each
529 backend target, so that if an identical query is received next, it
530 is turned into Present Requests rather than new Search Requests.
534 A future release will probably allow for
535 an arbitrary-sized cache for targets supporting named result sets.
539 You can enable/disable query caching using option -o.
543 <section id="record-cache">
544 <title>Record Caching</title>
546 As an option, the proxy may also cache result set records for the
548 The proxy takes into account the Record Syntax and CompSpec.
549 The CompSpec includes simple element set names as well.
550 By default the cache is 200000 bytes per session.
554 <section id="query-validation">
555 <title>Query Validation</title>
557 The Proxy may also be configured to trap particular attributes in
558 Type-1 queries and send Bib-1 diagnostics back to the client without
559 even consulting the backend target. This facility may be useful if
560 a target does not properly issue diagnostics when unsupported attributes
565 <section id="record-validation">
566 <title>Record Syntax Validation</title>
568 The proxy may be configured to accept, reject or convert records.
569 When accepted, the proxy passes search/present requests to the
570 backend target under the assumption that the target can honor the
571 request (in fact it may not). When a record is rejected because
572 the record syntax is "unsupported", the proxy returns a diagnostic to the
573 client. Finally, the proxy may convert records.
576 The proxy can convert from MARC to MARCXML and thereby offer an
577 XML version of any MARC record as long as it is ISO2709 encoded.
578 If the proxy is compiled with libXSLT support it can also
583 <section id="other-optimizations">
584 <title>Other Optimizations</title>
586 We've had some plans to support global caching of result set records,
587 but this has not yet been implemented.
591 <section id="proxy-usage">
592 <title>Proxy Usage (man page)</title>
593 <refentry id="yazproxy-man">
598 <section id="otherinfo-encoding">
599 <title>OtherInformation Encoding</title>
601 The proxy uses the OtherInformation definition to carry
602 information about the target address and cookie.
605 OtherInformation ::= [201] IMPLICIT SEQUENCE OF SEQUENCE{
606 category [1] IMPLICIT InfoCategory OPTIONAL,
608 characterInfo [2] IMPLICIT InternationalString,
609 binaryInfo [3] IMPLICIT OCTET STRING,
610 externallyDefinedInfo [4] IMPLICIT EXTERNAL,
611 oid [5] IMPLICIT OBJECT IDENTIFIER}}
613 InfoCategory ::= SEQUENCE{
614 categoryTypeId [1] IMPLICIT OBJECT IDENTIFIER OPTIONAL,
615 categoryValue [2] IMPLICIT INTEGER}
618 The <literal>categoryTypeId</literal> is either
619 OID 1.2.840.10003.10.1000.81.1 or 1.2.840.10003.10.1000.81.2,
620 for proxy target and proxy cookie respectively. The
621 integer element <literal>categoryValue</literal> is set to 0.
622 The value of the proxy target or cookie is stored in element
623 <literal>characterInfo</literal> of the <literal>information</literal>
629 <!-- Keep this comment at the end of the file
634 sgml-minimize-attributes:nil
635 sgml-always-quote-attributes:t
638 sgml-parent-document: "yazproxy.xml"
639 sgml-local-catalogs: nil
640 sgml-namecase-general:t