[Yazlist] yaz-marcdump option to convert MARC-8 to "combined UTF-8"

Tim Scott Tim.Scott at oclc.org
Fri Dec 14 17:31:34 CET 2007


Thank you for that, Larry. I can now construct the command line to
convert the MARCXML back to ISO2709, at least in theory, but when I tried
it with my 3-record file, it failed.

My commands are:-

  yaz-marcdump -f MARC-8 -t UTF-8 -o marcxml -l 9=97 !src! >>!xml! 2>!rpt!
  yaz-marcdump -v -i marcxml -o marc !xml! >!dst! 2>>!rpt!

Looking at the "XML", it is not well-formed XML because it has no single
enclosing root element; instead, the 'record' element simply repeats.
Looking at the schema*, it would appear that the records should be
enclosed in <collection>..</collection>.

So, I added commands to encapsulate the file accordingly:-
  echo ^<collection^>>!xml!
  yaz-marcdump -f MARC-8 -t UTF-8 -o marcxml -l 9=97 !src! >>!xml! 2>!rpt!
  echo ^</collection^>>!xml!

but all I get from yaz-marcdump is:
	yaz_marc_read_xml failed

It would appear that yaz-marcdump does not write or read the
<collection> element, but without it I get, quite rightly:-
	<filename>:<line>: parser error : Extra content at the end of the document
	<record xmlns="http://www.loc.gov/MARC21/slim">
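The diagnosis above can be reproduced outside yaz with a short Python sketch. It assumes, as the parser error suggests, that the yaz-marcdump output is a series of `<record xmlns="http://www.loc.gov/MARC21/slim">` elements written back to back; the helpers `wrap_in_collection` and `is_well_formed` and the sample leaders are hypothetical. It only shows why the bare concatenation is rejected and that the same records parse as one document once enclosed in a `<collection>` element. Note that the `echo` lines above write `<collection>` with no namespace, whereas in the MARC21 slim schema `collection` belongs to the same namespace as `record`; whether that is what `yaz_marc_read_xml` objects to is not established here.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"


def wrap_in_collection(records_xml: str) -> str:
    """Hypothetical helper: enclose concatenated <record> elements in a single
    <collection> root element in the MARC21 slim namespace."""
    return f'<collection xmlns="{MARC_NS}">\n{records_xml}\n</collection>\n'


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as a single XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Two records written back to back, like the output described above.
records = (
    f'<record xmlns="{MARC_NS}"><leader>00000nam a2200000 a 4500</leader></record>\n'
    f'<record xmlns="{MARC_NS}"><leader>00000nam a2200000 a 4500</leader></record>'
)

print(is_well_formed(records))  # False: more than one root element
wrapped = wrap_in_collection(records)
print(is_well_formed(wrapped))  # True

# The <record> elements are now direct children of the single root.
root = ET.fromstring(wrapped)
print(len(root.findall(f"{{{MARC_NS}}}record")))  # 2
```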

Can anyone offer advice?

Thanks very much in advance and Happy Holidays to everybody.

Regards,
Tim

* http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd

On Thu, 13 Dec 2007, Larry Dixson wrote:

Tim,
I have a meeting in a few minutes, but attached is a more recent version
of the yaz-marcdump man page. This will help to answer your question
about option -l (in the case you cited, it changes Leader/09 to an "a",
decimal 97). You will also see the possible values for option -o.
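To make the `-l 9=97` case concrete: as described above, the option puts the character with decimal code 97 ("a") at leader position 09. In MARC 21, Leader/09 is the character coding scheme (blank for MARC-8, "a" for UCS/Unicode), so after a MARC-8 to UTF-8 conversion it marks the record as Unicode. The Python sketch below only mimics that single substitution on a 24-character leader; `set_leader_position` and the sample leader are hypothetical and are not part of yaz.

```python
def set_leader_position(leader: str, position: int, code: int) -> str:
    """Hypothetical helper mimicking one '-l position=code' substitution:
    replace the character at `position` with chr(code)."""
    if len(leader) != 24:
        raise ValueError("a MARC 21 leader is 24 characters long")
    if not 0 <= position < 24:
        raise ValueError("leader positions run from 0 to 23")
    return leader[:position] + chr(code) + leader[position + 1:]


# Sample leader of a MARC-8 record: position 9 (character coding scheme) is blank.
marc8_leader = "00714cam  2200205 a 4500"
unicode_leader = set_leader_position(marc8_leader, 9, 97)

print(repr(marc8_leader[9]))    # ' '  (MARC-8)
print(repr(unicode_leader[9]))  # 'a'  (UCS/Unicode)
```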

Hope that's somewhat helpful.
Larry

On Thu, 13 Dec 2007, Tim Scott wrote:

> Hi,
>  
> I'm wondering if there's a way to use yaz-marcdump to produce, from a
> MARC-8 ISO2709 file, a UTF-8 encoded MARC21 file without the diacritics
> simply becoming combining characters?
>  
> As I wrote this, I thought maybe it could be done by using an
> intermediate XML file and then some other post-processor, and then
> reproducing the ISO2709 again.
>  
> Off I hunted and found 'charlint.pl' and 'UnicodeData.txt'
>     http://dev.w3.org/cvsweb/charlint/
>     ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt
>         respectively...
>  
> .. but found that charlint.pl complained about the UnicodeData.txt
> file:-
> [snip]
> Reading data file, line 9000
> Reading data file, line 10000
> Problem with data file consistency, line 10478:
>         9FBB;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;.
> ...
>  
> A quick search on Google was particularly fruitless, but I'm off to 
> try harder tomorrow.
>  
> I then wondered how to get the ISO2709 back again from the XML result,
> so I tried converting the marcxml that yaz-marcdump produced, e.g.:
>     yaz-marcdump -f MARC-8 -t UTF-8 -o marcxml -l 9=97 iso2709file >xmlfile
>         ... xmlfile looks OK
>     yaz-marcdump -o marc -l 9=97 xmlfile >output
>  
> I've no idea what the "-l 9=97" does, I got this command from another 
> forum.
>  
> The best manual page I could find for yaz-marcdump was on a French
> site at:
>     http://pwet.fr/man/linux/commandes/yaz_marcdump
>  
> .. and it doesn't appear to give me the answer.
>  
> Is there a man page or something that would give me the options for
> yaz-marcdump to achieve either the whole thing or just the last
> XML-to-ISO2709 part?
>  
> <OT> Has anyone got a working UnicodeData.txt [link]? </OT>
>  
> Thanks,
> Tim
>  
> cc: Data Exchange file
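The original question, getting precomposed ("combined") characters instead of a base letter followed by a combining diacritic, is what Unicode Normalization Form C (NFC) does, and NFC is what charlint.pl checks and produces. As a sketch of the post-processing step only, assuming the intermediate MARCXML is already UTF-8, Python's standard `unicodedata.normalize` performs the composition with its own built-in Unicode data, so no separate UnicodeData.txt is needed. The helpers `compose_text` and `compose_file` are hypothetical, and this is not a yaz-marcdump option. Because composition changes byte lengths, it belongs on the XML side, before the records are turned back into ISO2709.

```python
import unicodedata


def compose_text(text: str) -> str:
    """Hypothetical helper: replace base letter + combining mark sequences
    with precomposed characters wherever Unicode defines one (NFC)."""
    return unicodedata.normalize("NFC", text)


def compose_file(src_path: str, dst_path: str) -> None:
    """Hypothetical helper: NFC-normalize a whole UTF-8 MARCXML file.
    The ASCII markup itself is unchanged by NFC."""
    with open(src_path, encoding="utf-8") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(compose_text(text))


# Decomposed form, as described in the question:
# 'e' followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "Cafe\u0301"
composed = compose_text(decomposed)

print(len(decomposed), len(composed))  # 5 4
print(composed == "Caf\u00e9")         # True: U+00E9 LATIN SMALL LETTER E WITH ACUTE
```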


------------------------------------------------------------
Larry E. Dixson                    Internet:    ldix at loc.gov
Network Development and MARC
   Standards Office, LA327
Library of Congress                Telephone: (202) 707-5807
Washington, D.C.  20540-4402       Fax:       (202) 707-0115


