[Yazlist] yaz-marcdump option to convert MARC-8 to "combined UTF-8"
Larry E. Dixson
ldix at loc.gov
Thu Dec 13 19:36:24 CET 2007
I have a meeting in a few minutes, but attached is a more recent
version of the yaz-marcdump man page. This will help to answer
your question about option -l (in the case you cited, changing
Leader/09 to an "a", decimal 97). You will also see the possible
values for option -o.
Hope that's somewhat helpful.
On Thu, 13 Dec 2007, Tim Scott wrote:
> I'm wondering if there's a way to use yaz-marcdump to produce from a
> MARC-8 ISO2709 file a UTF-8 encoded MARC21 file without the diacritics
> simply becoming combining characters?
> As I wrote this, I thought maybe by using an intermediate XML file and
> then some other post-processor, and then reproducing the ISO2709 again.
> Off I hunted and found 'charlint.pl' and 'UnicodeData.txt'
> .. but found that charlint.pl complained about the UnicodeData.txt
> Reading data file, line 9000
> Reading data file, line 10000
> Problem with data file consistency, line 10478:
> 9FBB;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;.
> A quick search on Google was particularly fruitless, but I'm off to try
> harder tomorrow.
> I then wondered how I get the ISO2709 back again from the XML result, so
> I tried converting the marcxml that yaz-marcdump produced, eg:
> yaz-marcdump -f MARC-8 -t UTF-8 -o marcxml -l 9=97 iso2709file
> ... xmlfile looks OK
> yaz-marcdump -o marc -l 9=97 xmlfile >output
> I've no idea what the "-l 9=97" does, I got this command from another
> The best manual page I could find for yaz-marcdump was on a French site
> .. and it doesn't appear to give me the answer.
> Is there a man page or something that would give me the options for
> yaz-marcdump to achieve either the whole thing or just the last XML 2
> ISO2709 part ?
> <OT> Has anyone got a working UnicodeData.txt [link] ? </OT>
> cc: Data Exchange file
Larry E. Dixson Internet: ldix at loc.gov
Network Development and MARC
Standards Office, LA327
Library of Congress Telephone: (202) 707-5807
Washington, D.C. 20540-4402 Fax: (202) 707-0115
-------------- next part --------------
yaz-marcdump - MARC record dump utility
yaz-marcdump [-i format] [-o format] [-f from] [-t to] [-l spec] [-v] [-c cfile] [file...]
yaz-marcdump reads MARC records from one or more files. It parses each record and supports
output in line-format, ISO2709, MARCXML, MarcXchange as well as Hex output.
This utility parses records in ISO2709 (raw MARC) as well as XML if that is structured as
As of YAZ 2.1.18, OAI-MARC is no longer supported. OAI-MARC is deprecated. Use MARCXML instead.
By default, each record is written to standard output in a line format, with a newline for
each field and $x for each subfield x. The output format may be changed with option -o.
yaz-marcdump can also be requested to perform character set conversion of each record.
-i format
    Specifies input format. Must be one of marcxml, marc (ISO2709), line (line mode MARC).
-o format
    Specifies output format. Must be one of marcxml, marc (ISO2709), line (line mode MARC).
-f from
    Specify the character set of the input MARC record. Should be used in conjunction with option -t.
-t to
    Specify the character set of the output. Should be used in conjunction with option -f.
-l spec
    Specify a simple modification string for the MARC leader. The leaderspec is a list of
    pos=value pairs, where pos is an integer offset (0 - 23) into the leader. The value is either
    a quoted string or an integer (character value in decimal). Pairs are
    comma-separated. For example, to set the leader at offset 9 to a, use 9=a.
-v
    Writes more information about the parsing process. Useful if you have ill-formatted
    ISO2709 records as input.
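
To make the leader specification concrete, here is a minimal Python sketch of how a list of pos=value pairs could be applied to a 24-character leader. It is an illustration only, not how YAZ implements -l: the function name apply_leader_spec and the sample leader are hypothetical, and the parsing is simplified (a quoted value containing a comma is not handled).

```python
def apply_leader_spec(leader: str, spec: str) -> str:
    """Apply comma-separated pos=value pairs to a 24-character MARC leader.

    Hypothetical helper for illustration; it is not part of YAZ.
    """
    if len(leader) != 24:
        raise ValueError("a MARC leader is 24 characters long")
    chars = list(leader)
    for pair in spec.split(","):
        pos_text, value = pair.split("=", 1)
        pos = int(pos_text)
        if not 0 <= pos <= 23:
            raise ValueError(f"leader offset out of range: {pos}")
        if value.isdigit():
            # Integer form: a decimal character value, e.g. 97 -> 'a'.
            replacement = chr(int(value))
        else:
            # String form: drop optional surrounding quotes.
            replacement = value.strip("'\"")
        # Overwrite in place; the final slice keeps the leader at 24 characters.
        chars[pos:pos + len(replacement)] = replacement
    return "".join(chars)[:24]


# Made-up sample leader with a blank at offset 9.
sample = "00714cam  2200205 a 4500"
print(apply_leader_spec(sample, "9=97"))  # prints: 00714cam a2200205 a 4500
```

In this sketch 9=97 and 9=a give the same result, since decimal 97 is the character "a"; likewise 9=32 writes a space.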
The following command converts MARC21/USMARC in MARC-8 encoding to MARC21/USMARC in UTF-8 encoding. Leader offset 9 is set to 'a'. Both input and output records are ISO2709 encoded.
yaz-marcdump -f MARC-8 -t UTF-8 -o marc -l 9=97 marc21.raw >marc21.utf8.raw
The reverse conversion, from UTF-8 back to MARC-8, sets leader offset 9 to a space (decimal 32):
yaz-marcdump -f UTF-8 -t MARC-8 -o marc -l 9=32 utf8.raw >marc21.raw
The same records may instead be converted to MARCXML in UTF-8:
yaz-marcdump -f MARC-8 -t UTF-8 -o marcxml marc21.raw >marcxml.xml
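
The question quoted earlier in this message asks how to avoid diacritics ending up as separate combining characters. One possible post-processing step, sketched here in Python rather than taken from YAZ, is to normalize the MARCXML output to Unicode Normalization Form C (NFC), which composes a base letter and its combining marks wherever a precomposed character exists. The function names and file paths below are hypothetical.

```python
import unicodedata


def compose_diacritics(text: str) -> str:
    """Return text in Normalization Form C (precomposed where possible)."""
    return unicodedata.normalize("NFC", text)


def normalize_file(src_path: str, dst_path: str) -> None:
    """Read a UTF-8 text file (e.g. MARCXML) and write an NFC copy of it."""
    with open(src_path, encoding="utf-8") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(compose_diacritics(text))


# "Dvorak" with r + combining caron and a + combining acute.
decomposed = "Dvor\u030Ca\u0301k"
composed = compose_diacritics(decomposed)
print(len(decomposed), len(composed))  # prints: 8 6
```

Normalizing the MARCXML file rather than the raw ISO2709 file is a deliberate choice in this sketch: composing characters changes byte lengths, and an ISO2709 record stores field lengths and offsets in its directory. Going by the option list above, the normalized XML could then presumably be turned back into ISO2709 with -i marcxml -o marc, though that round trip has not been verified here.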