[Yazlist] yaz-marcdump option to convert MARC-8 to "combined UTF-8"

Larry E. Dixson ldix at loc.gov
Thu Dec 13 19:36:24 CET 2007

I have a meeting in a few minutes, but attached is a more recent
version of the yaz-marcdump man page.  This will help to answer
your question about option -l: in the case you cited, it changes
Leader/09 to "a" (decimal 97).  You will also see the possible
values for option -o.

Hope that's somewhat helpful.

On Thu, 13 Dec 2007, Tim Scott wrote:

> Hi,
> I'm wondering if there's a way to use yaz-marcdump to produce from a
> MARC-8 ISO2709 file a UTF-8 encoded MARC21 file without the diacritics
> simply becoming combining characters?
> As I wrote this, I thought maybe by using an intermediate XML file and
> then some other post-processor, and then reproducing the ISO2709 again.
> Off I hunted and found 'charlint.pl' and 'UnicodeData.txt'
>     http://dev.w3.org/cvsweb/charlint/
>     ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt
>         respectively...
> .. but found that charlint.pl complained about the UnicodeData.txt
> file:-
> [snip]
> Reading data file, line 9000
> Reading data file, line 10000
> Problem with data file consistency, line 10478:
>         9FBB;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;.
> ...
> A quick search on Google was particularly fruitless, but I'm off to try
> harder tomorrow.
> I then wondered how I get the ISO2709 back again from the XML result, so
> I tried converting the marcxml that yaz-marcdump produced, eg:
>     yaz-marcdump -f MARC-8 -t UTF-8 -o marcxml -l 9=97 iso2709file >xmlfile
>         ... xmlfile looks OK
>     yaz-marcdump -o marc -l 9=97 xmlfile >output
> I've no idea what the "-l 9=97" does, I got this command from another
> forum.
> The best manual page I could find for yaz-marcdump was on a French site
> at:
>     http://pwet.fr/man/linux/commandes/yaz_marcdump
> .. and it doesn't appear to give me the answer.
> Is there a man page or something that would give me the options for
> yaz-marcdump to achieve either the whole thing or just the last XML 2
> ISO2709 part ?
> <OT> Has anyone got a working UnicodeData.txt [link] ? </OT>
> Thanks,
> Tim
> cc: Data Exchange file

yaz-marcdump - MARC record dump utility




	yaz-marcdump [-i format] [-o format] [-f from] [-t to] [-l spec] [-v] [-c cfile] [file...]


	yaz-marcdump reads MARC records from one or more files.  It parses each record and supports

	output in line-format, ISO2709, MARCXML, MarcXchange as well as Hex output.

	This utility parses records ISO2709(raw MARC) as well as XML if that is structured as



	As of YAZ 2.1.18, OAI-MARC is no longer supported.  OAI-MARC is deprecated. Use MARCXML instead.  

	By default, each record is written to standard output in a line format with newline for 

	each field, $x for each subfield x.  The output format may be changed with option -o.

	yaz-marcdump can also be requested to perform character set conversion of each record.
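As a rough illustration of what reading ISO2709 input involves, the sketch below splits a byte stream into individual records using the record length held in leader positions 0-4. It is a Python sketch written for this note, not YAZ code; the function name `split_iso2709` is hypothetical.

```python
def split_iso2709(data: bytes) -> list[bytes]:
    """Split concatenated ISO2709 records into a list of single records.

    Hypothetical helper for illustration; yaz-marcdump does far more
    checking than this.
    """
    records = []
    pos = 0
    # A record cannot be shorter than its 24-byte leader; any trailing
    # fragment shorter than that is ignored.
    while pos + 24 <= len(data):
        # Leader positions 0-4: total record length as five ASCII digits.
        length = int(data[pos:pos + 5].decode("ascii"))
        if length < 24 or pos + length > len(data):
            raise ValueError(f"bad record length {length} at offset {pos}")
        records.append(data[pos:pos + length])
        pos += length
    return records
```

Because the lengths are byte counts, anything that changes the encoded size of the data (such as a character set conversion) has to rewrite them, which is one reason to let yaz-marcdump produce the output record rather than editing the bytes in place.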


	-i format

		Specifies input format. Must be one of marcxml, marc (ISO2709), line (line mode MARC).

	-o format

		Specifies output format. Must be one of marcxml, marc (ISO2709), line (line mode


	-f from 

		Specifies the character set (from) of the input MARC record.  Should be used in conjunction with option -t.

	-t to

		Specify the character set of the output.  Should be used in conjunction with option


	-l leaderspec

		Specify a simple modification string for MARC leader. The leaderspec is a list of

		pos=value pairs, where pos is an integer offset (0 - 23) for leader. Value is either

		a quoted string or an integer (character value in decimal).  Pairs are comma

		separated. For example, to set leader at offset 9 to a, use 9=a.
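To make the pos=value notation concrete, here is a small Python sketch of how a leaderspec such as 9=97 could be applied to a 24-character leader. It only illustrates the description above and is not YAZ's parser: the function name `apply_leaderspec` is hypothetical, and the handling of unquoted strings and of quotes is an assumption (commas inside quoted values are not supported).

```python
def apply_leaderspec(leader: str, spec: str) -> str:
    """Return a copy of `leader` with each pos=value pair of `spec` applied."""
    if len(leader) != 24:
        raise ValueError("a MARC leader is 24 characters long")
    chars = list(leader)
    for pair in spec.split(","):
        pos_text, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in {pair!r}")
        pos = int(pos_text)
        if value.isascii() and value.isdigit():
            # Integer form: the value is a character code in decimal.
            replacement = chr(int(value))
        else:
            # String form: assumed here to allow optional surrounding quotes.
            replacement = value.strip("'\"")
        if pos < 0 or pos + len(replacement) > 24:
            raise ValueError(f"offset {pos} is outside the leader (0 - 23)")
        chars[pos:pos + len(replacement)] = replacement
    return "".join(chars)
```

With this reading, 9=97 and 9=a both put "a" at offset 9, and 9=32 puts a blank there.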


	-v

		Writes more information about the parsing process.  Useful if you have ill-formatted

		ISO2709 records as input.


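	The first command below converts MARC21/USMARC in MARC-8 encoding to MARC21/USMARC in UTF-8 encoding, setting leader offset 9 to 'a' (decimal 97).  The second converts in the opposite direction and sets leader offset 9 to a blank (decimal 32).  In both cases, input and output records are ISO2709 encoded.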

		yaz-marcdump -f MARC-8 -t UTF-8 -o marc -l 9=97 marc21.raw >marc21.utf8.raw

		yaz-marcdump -f UTF-8 -t MARC-8 -o marc -l 9=32 utf8.raw >marc21.raw

	The same records may instead be converted to MARCXML in UTF-8:

		yaz-marcdump -f MARC-8 -t UTF-8 -o marcxml marc21.raw >marcxml.xml
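On the original question of getting precomposed characters instead of combining ones: one possible post-processing step, sketched here in Python rather than with charlint.pl, is to apply Unicode NFC normalization to the MARCXML output. This is only a sketch under assumptions: the function names are hypothetical, the file is assumed to be UTF-8, and it uses the Unicode data bundled with Python, so no separate UnicodeData.txt is needed. It is probably safer to do this on MARCXML than on ISO2709, because ISO2709 stores byte lengths that would change when characters are composed.

```python
import unicodedata


def to_precomposed(text: str) -> str:
    """Compose base letter + combining mark sequences where NFC allows it."""
    return unicodedata.normalize("NFC", text)


def normalize_file(src: str, dst: str) -> None:
    """Copy a UTF-8 text file (e.g. MARCXML) to `dst`, NFC-normalizing it."""
    # newline="" keeps the original line endings untouched.
    with open(src, encoding="utf-8", newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        for line in fin:
            fout.write(to_precomposed(line))


# "r" + combining caron, "a" + combining acute, as in a decomposed "Dvořák".
decomposed = "Dvor\u030ca\u0301k"
precomposed = to_precomposed(decomposed)
```

The normalized MARCXML could then be turned back into ISO2709 using the marcxml input format and marc output format listed under options -i and -o.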





