[Yazlist] Error in latest SICONV.C

Adam Dickmeiss adam at indexdata.dk
Thu Mar 22 19:10:45 CET 2007


Gary Anderson wrote:
> This is a different philosophy that I picked up from your earlier 
> emails.  Those indicated that I shouldn't be submitting subfield marks 
> and field marks to the converter.  The MARC-21 standard specifies that 
> fixed field and subfield strings are self-contained and ESC sequences in 
> MARC8 encoding don't cross field or subfield boundaries.  That is why I 
> suggested the fix the way I did.  Now you seem to be saying that we can 
> and should run the data in whatever size chunks we want until an EOF is 
EOF would be end of data field too.
> reached.  So, rather than relying on the library to give me back 
> self-contained data in conversions to and from MARC8, I have two 
> choices.  First, I can append a period to the end of every string I 
> convert and then kill it when the string comes back (that works, by the 
> way) or I can make a convert call then another convert call to flush the 
> buffers.  That seems a little more cumbersome and time consuming.
If two calls to iconv bothers you, make a wrapper or just change the 
source of YAZ.

/ Adam

> 
> Gary
> 
> Adam Dickmeiss wrote:
> 
>> Gary Anderson wrote:
>>
>>> Adam,
>>>
>>> I've run through the latest siconv.c from your cvs repository.  It 
>>> now has additional problems with converting UTF8 to MARC8.  I ran a 
>>> simplified test.  The input string was (hex bytes, ignore spaces):
>>>
>>> 61 E8 87 BA  E7 81 A3  E5 BE 97
>>>
>>> The output string was:
>>>
>>> 61 1B 24 31 21 54 2B 21 49 43
>>
>>
>> That might very well be with your test code, which I unfortunately 
>> don't know anything about.
>>
>> But I think I can guess what goes on: you did not flush the output 
>> buffers with yaz_iconv(cd, 0,0, ..) .
>>
>> Why is _separate_ flushing a good idea? What can't we call iconv once?
>>
>> Suppose we did not have it (implicit flushing for every call of iconv) 
>> and we want to convert a file. We read chunks from disk (say 2K at a 
>> time). We call iconv. iconv flushes every time. But when it flushes it 
>> assumes that the input is complete. Sometimes it ain't. There will be 
>> input sequences "crossing" block boundaries. And so we'll end up with 
>> errors. So we just read the whole in memory? No? Can mmap do it?
>>
>> The opposite is much more appealing . We read chunks of almost size we 
>> want. We don't care about the input.  We don't *know* the input. So we 
>> call iconv for each chunk . It's a streaming process. At EOF we flush.
>>
>> Of course, if it's a small and *complete* buffer in memory it might be 
>> ackward.. We'd have two calls instead of one. But I am sure you can 
>> make a wrapper yaz_iconv_all(cd, a, b, c, d); which calls 
>> yaz_iconv(cd, a, b, c, d) +     yaz_iconv(cd, 0, 0, c, d);
>> This is what your patch effectively does , I think.
>>
>> / Adam
>>
>>> Notice that only the first two hangul triples were converted and 
>>> output.  The ending ESC string was also not output.  This happened 
>>> because the normal flow of the logic does NOT call your flush 
>>> function.  In fact in siconv.c version $Id: siconv.c,v 1.37 
>>> 2007/03/20 21:37:32 adam Exp $, yaz_iconv exits at the point shown 
>>> below.
>>>
>>>    while (1)
>>>    {
>>>        unsigned long x;
>>>        size_t no_read;
>>>
>>>        if (cd->unget_x)
>>>        {
>>>            x = cd->unget_x;
>>>            no_read = cd->no_read_x;
>>>        }
>>>        else
>>>        {
>>>            if (*inbytesleft == 0)
>>>            {
>>>                r = *inbuf - inbuf0;
>>>                break;    <-------------------This is the exit point.
>>>            }
>>>
>>> I think what you really meant to do was this:
>>>    while (1)
>>>    {
>>>        unsigned long x;
>>>        size_t no_read;
>>>
>>>        if (cd->unget_x)
>>>        {
>>>            x = cd->unget_x;
>>>            no_read = cd->no_read_x;
>>>        }
>>>        else
>>>        {
>>>            if (*inbytesleft == 0)
>>>            {
>>>               if (cd->flush_handle)     // Proposed fix to correctly 
>>> clear the buffers
>>>                   r = (*cd->flush_handle)(cd, outbuf, outbytesleft);  
>>> // Proposed fix to correctly clear the buffers.
>>>                r = *inbuf - inbuf0;
>>>                break;    <-------------------This is the exit point.
>>>            }
>>>
>>> Anyway, I tried the code with this fix, and it works properly.  Is 
>>> this the correct fix, or should this problem be fixed another way?
>>>
>>> Gary
>>>
>>> _______________________________________________
>>> Yazlist mailing list
>>> Yazlist at lists.indexdata.dk
>>> http://lists.indexdata.dk/cgi-bin/mailman/listinfo/yazlist
>>
>>
>>
>> _______________________________________________
>> Yazlist mailing list
>> Yazlist at lists.indexdata.dk
>> http://lists.indexdata.dk/cgi-bin/mailman/listinfo/yazlist
>>
> 
> _______________________________________________
> Yazlist mailing list
> Yazlist at lists.indexdata.dk
> http://lists.indexdata.dk/cgi-bin/mailman/listinfo/yazlist




More information about the Yazlist mailing list