[Yazlist] yaz, XML and III's Z39.50 server
Mike Taylor
mike at miketaylor.org.uk
Wed Dec 14 16:31:33 CET 2005
> Date: Wed, 14 Dec 2005 09:26:57 -0500
> From: Godmar Back <godmar at gmail.com>
>
> I'm using yaz to access III's Millennium Z39.50 server, which,
> according to III's documentation, supports the retrieval of Z39.50
> in XML format.
"Smeagol lied."
The documentation may _say_ this; but it's evidently making promises
that the software can't keep.
> This XML is not well-formed, because:
> - the opening dublin-core-simple element is missing.
> - characters such as < > are not escaped as entities within say the
> <title> element
> - (you can't see this:) there are \037 characters inside the results.
Yes.
> - am I correct that III's Z39.50 server is to blame for these
> mistakes and that the yaz tools simply reflect their ill-formed XML?
Exactly.
> Or is it possible that yaz misparses some of their results?
No. YAZ doesn't parse XML at all, just as it doesn't parse HTML, MARC
or GIF images. All of these are, from the perspective of the Z39.50
protocol that YAZ implements, just opaque streams of bytes.
For some record-types, such as MARC, yaz-client (the application) does
go on to parse the opaque chunks that YAZ (the Z39.50 toolkit) has
given it: that's how yaz-client displays MARC records. But for XML
records, it doesn't do that; it just emits whatever it's been given.
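Since XML records are passed through untouched, any well-formedness
check has to happen on the receiving side. A rough sketch in Python,
assuming the record bytes have already been captured from the client;
the function name and the sample records are made up for illustration:

```python
import xml.etree.ElementTree as ET


def is_well_formed(record: bytes) -> bool:
    """Return True if the bytes parse as a well-formed XML document."""
    try:
        ET.fromstring(record)
        return True
    except ET.ParseError:
        return False


# Hypothetical records: one clean, one with an unescaped "<" and no
# opening root element, one containing a raw \037 control character.
good = b"<dublin-core-simple><title>A &lt; B</title></dublin-core-simple>"
unescaped = b"<title>A < B</title></dublin-core-simple>"
control = b"<dublin-core-simple><title>A\x1fB</title></dublin-core-simple>"
```

Only the first of these parses; the other two fail in the parser,
which is where the errors described above become visible.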
> - if III is to blame, what options does one have to extract XML from
> III's systems using yaz tools?
In the general case, it can't be done. This server is equivalent to a
shop that has a sign outside saying "We sell baked beans". You go
inside and buy a tin of beans, but when you get home and open it, you
find it's full of stewed prunes. No amount of post-processing will
reliably turn prunes into beans.
You _may_ be able to come up with some heuristic hacks that work, more
or less, most of the time, if you're lucky. But the only real
solution is to get III to fix their server. Sorry.
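For what it's worth, one such heuristic hack might look something like
the sketch below. It is Python, and it rests on assumptions that are
mine rather than anything III documents: that the only real markup is
a fixed list of Dublin Core element names with no attributes, that
there is no XML declaration, and that the closing root tag is present.
The names repair, ROOT and ELEMENTS are hypothetical. It strips the
forbidden control characters, escapes everything that is not one of
the expected tags, and puts back the missing opening root element:

```python
import re
import xml.etree.ElementTree as ET

# ASSUMPTION: root element name and the simple Dublin Core element set.
ROOT = "dublin-core-simple"
ELEMENTS = [ROOT, "title", "creator", "subject", "description",
            "publisher", "contributor", "date", "type", "format",
            "identifier", "source", "language", "relation",
            "coverage", "rights"]

# Characters XML 1.0 forbids: below 0x20, except tab, LF and CR.
_CONTROL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")
# Only attribute-free open/close tags for the expected names are markup.
_TAG = re.compile(r"</?(?:%s)>" % "|".join(ELEMENTS))
# An "&" that does not already start a predefined or numeric entity.
_BARE_AMP = re.compile(r"&(?!(?:amp|lt|gt|quot|apos|#\d+|#x[0-9A-Fa-f]+);)")


def _escape_text(text: str) -> str:
    """Escape &, < and > in a run of character data."""
    text = _BARE_AMP.sub("&amp;", text)
    return text.replace("<", "&lt;").replace(">", "&gt;")


def repair(raw: str) -> str:
    """Best-effort repair; it will guess wrong on some records."""
    text = _CONTROL.sub("", raw)
    parts, pos = [], 0
    for match in _TAG.finditer(text):
        parts.append(_escape_text(text[pos:match.start()]))
        parts.append(match.group(0))
        pos = match.end()
    parts.append(_escape_text(text[pos:]))
    fixed = "".join(parts)
    if not fixed.lstrip().startswith("<" + ROOT + ">"):
        fixed = "<" + ROOT + ">" + fixed
    return fixed


# Hypothetical broken record with all three problems at once.
sample = "<title>Foo <bar> & baz\x1f</title></dublin-core-simple>"
root = ET.fromstring(repair(sample))
```

This breaks as soon as a title legitimately contains text such as
"<title>", or the server emits an element that is not on the list,
which is exactly why it's a hack and not a solution.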
_/|_ ___________________________________________________________________
/o ) \/ Mike Taylor <mike at miketaylor.org.uk> http://www.miketaylor.org.uk
)_v__/\ Taylor's law of Programming Probability: the theoretical
possibility of a catastrophic occurrence in your program can be
ignored if it's less likely than the entire installation being
wiped out by meteor strike.