[Yazlist] Error in latest SICONV.C

Gary Anderson ganderson at bslw.com
Thu Mar 22 18:37:19 CET 2007


This is a different philosophy from the one I picked up from your earlier
emails.  Those indicated that I shouldn't be submitting subfield marks
and field marks to the converter.  The MARC-21 standard specifies that
fixed field and subfield strings are self-contained and that ESC sequences
in MARC8 encoding don't cross field or subfield boundaries.  That is why I
suggested the fix the way I did.  Now you seem to be saying that we can
and should run the data through in whatever size chunks we want until EOF
is reached.  So, rather than relying on the library to give me back
self-contained data in conversions to and from MARC8, I have two
choices.  First, I can append a period to the end of every string I
convert and then strip it when the string comes back (that works, by the
way).  Second, I can make a convert call followed by another convert call
to flush the buffers.  That seems a little more cumbersome and
time-consuming.
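
For illustration, the convert-then-flush option can be wrapped in a single
helper.  This is a minimal sketch that assumes POSIX iconv as a stand-in,
since it follows the same convention the thread describes for yaz_iconv
(a call with null input flushes pending output).  The name iconv_all is
hypothetical and is not part of YAZ or of iconv.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper (not part of YAZ or POSIX): convert one complete
 * buffer and then flush the converter, so the caller makes one call.
 * Returns 0 on success, -1 on error (errno is set by iconv). */
static int iconv_all(iconv_t cd, char **inbuf, size_t *inbytesleft,
                     char **outbuf, size_t *outbytesleft)
{
    /* Step 1: convert the input that was supplied. */
    if (iconv(cd, inbuf, inbytesleft, outbuf, outbytesleft) == (size_t)-1)
        return -1;

    /* Step 2: a null input buffer asks the converter to write whatever
     * is needed to return to its initial state (for a stateful encoding,
     * a closing escape sequence). */
    if (iconv(cd, NULL, NULL, outbuf, outbytesleft) == (size_t)-1)
        return -1;

    return 0;
}
```

A wrapper like this only suits buffers known to be complete, such as a
single field or subfield; for chunked input the flush still belongs at EOF.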

Gary

Adam Dickmeiss wrote:

> Gary Anderson wrote:
>
>> Adam,
>>
>> I've run through the latest siconv.c from your cvs repository.  It 
>> now has additional problems with converting UTF8 to MARC8.  I ran a 
>> simplified test.  The input string was (hex bytes, ignore spaces):
>>
>> 61 E8 87 BA  E7 81 A3  E5 BE 97
>>
>> The output string was:
>>
>> 61 1B 24 31 21 54 2B 21 49 43
>
>
> That might very well be down to your test code, which I unfortunately 
> don't know anything about.
>
> But I think I can guess what is going on: you did not flush the output 
> buffers with yaz_iconv(cd, 0, 0, ..).
>
> Why is _separate_ flushing a good idea? Why can't we call iconv once?
>
> Suppose we did not have it, i.e. iconv flushed implicitly on every 
> call, and we want to convert a file. We read chunks from disk (say 2K 
> at a time). We call iconv. iconv flushes every time. But when it 
> flushes it assumes that the input is complete. Sometimes it isn't. 
> There will be input sequences "crossing" block boundaries. And so we'll 
> end up with errors. So do we just read the whole file into memory? No? 
> Can mmap do it?
>
> The opposite is much more appealing. We read chunks of almost any size 
> we want. We don't care about the input.  We don't *know* the input. So 
> we call iconv for each chunk. It's a streaming process. At EOF we flush.
>
> Of course, if it's a small and *complete* buffer in memory it might be 
> awkward. We'd have two calls instead of one. But I am sure you can 
> make a wrapper yaz_iconv_all(cd, a, b, c, d) which calls 
> yaz_iconv(cd, a, b, c, d) followed by yaz_iconv(cd, 0, 0, c, d).
> This is what your patch effectively does, I think.
>
> / Adam
>
>> Notice that only the first two CJK triples were converted and 
>> output.  The ending ESC string was also not output.  This happened 
>> because the normal flow of the logic does NOT call your flush 
>> function.  In fact in siconv.c version $Id: siconv.c,v 1.37 
>> 2007/03/20 21:37:32 adam Exp $, yaz_iconv exits at the point shown 
>> below.
>>
>>    while (1)
>>    {
>>        unsigned long x;
>>        size_t no_read;
>>
>>        if (cd->unget_x)
>>        {
>>            x = cd->unget_x;
>>            no_read = cd->no_read_x;
>>        }
>>        else
>>        {
>>            if (*inbytesleft == 0)
>>            {
>>                r = *inbuf - inbuf0;
>>                break;    /* <-- This is the exit point. */
>>            }
>>
>> I think what you really meant to do was this:
>>    while (1)
>>    {
>>        unsigned long x;
>>        size_t no_read;
>>
>>        if (cd->unget_x)
>>        {
>>            x = cd->unget_x;
>>            no_read = cd->no_read_x;
>>        }
>>        else
>>        {
>>            if (*inbytesleft == 0)
>>            {
>>                /* Proposed fix to correctly clear the buffers. */
>>                if (cd->flush_handle)
>>                    r = (*cd->flush_handle)(cd, outbuf, outbytesleft);
>>                r = *inbuf - inbuf0;
>>                break;    /* <-- This is the exit point. */
>>            }
>>
>> Anyway, I tried the code with this fix, and it works properly.  Is 
>> this the correct fix, or should this problem be fixed another way?
>>
>> Gary
>>
>> _______________________________________________
>> Yazlist mailing list
>> Yazlist at lists.indexdata.dk
>> http://lists.indexdata.dk/cgi-bin/mailman/listinfo/yazlist
>
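
The streaming argument quoted above can be illustrated with a toy encoder.
This is only a sketch under stated assumptions, not the siconv.c code: it
assumes an encoder that holds back the most recent character until it sees
the next one, and it brackets upper-case letters with '<' and '>' as
stand-ins for the MARC8 escape sequences.  All names (toy_enc, toy_emit,
toy_convert, toy_flush) are hypothetical.

```c
#include <stddef.h>

/* Toy encoder state (illustration only, not YAZ's data structure). */
struct toy_enc {
    int held;     /* last character seen, not yet written; 0 = none */
    int shifted;  /* 1 while output is inside a '<' ... '>' run */
};

/* Write one character, adding '<' or '>' when the character set changes.
 * Upper-case letters play the role of characters needing an escape. */
static size_t toy_emit(struct toy_enc *e, int c, char *out)
{
    size_t n = 0;
    int wide = (c >= 'A' && c <= 'Z');

    if (wide && !e->shifted) { out[n++] = '<'; e->shifted = 1; }
    if (!wide && e->shifted) { out[n++] = '>'; e->shifted = 0; }
    out[n++] = (char)c;
    return n;
}

/* Convert one chunk.  The final character of the chunk stays in e->held,
 * so chunks can be split anywhere.  out needs room for 2 * len bytes. */
static size_t toy_convert(struct toy_enc *e, const char *in, size_t len,
                          char *out)
{
    size_t n = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        if (e->held)
            n += toy_emit(e, e->held, out + n);
        e->held = (unsigned char)in[i];
    }
    return n;
}

/* Flush at EOF: write the held character and close any open run.
 * out needs room for 3 bytes. */
static size_t toy_flush(struct toy_enc *e, char *out)
{
    size_t n = 0;

    if (e->held) { n += toy_emit(e, e->held, out + n); e->held = 0; }
    if (e->shifted) { out[n++] = '>'; e->shifted = 0; }
    return n;
}
```

In this model, feeding "aXYZ" in one call or as "aX" followed by "YZ"
produces the same bytes, "a<XY", and the trailing "Z>" appears only once
toy_flush is called.  That mirrors the report, where the last character
and the closing ESC string were missing until the flush.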
