[Yazlist] Error in latest SICONV.C
Adam Dickmeiss
adam at indexdata.dk
Thu Mar 22 19:10:45 CET 2007
Gary Anderson wrote:
> This is a different philosophy that I picked up from your earlier
> emails. Those indicated that I shouldn't be submitting subfield marks
> and field marks to the converter. The MARC-21 standard specifies that
> fixed field and subfield strings are self-contained and ESC sequences in
> MARC8 encoding don't cross field or subfield boundaries. That is why I
> suggested the fix the way I did. Now you seem to be saying that we can
> and should run the data in whatever size chunks we want until an EOF is
EOF would be end of data field too.
> reached. So, rather than relying on the library to give me back
> self-contained data in conversions to and from MARC8, I have two
> choices. First, I can append a period to the end of every string I
> convert and then kill it when the string comes back (that works, by the
> way) or I can make a convert call then another convert call to flush the
> buffers. That seems a little more cumbersome and time consuming.
If two calls to iconv bothers you, make a wrapper or just change the
source of YAZ.
/ Adam
>
> Gary
>
> Adam Dickmeiss wrote:
>
>> Gary Anderson wrote:
>>
>>> Adam,
>>>
>>> I've run through the latest siconv.c from your cvs repository. It
>>> now has additional problems with converting UTF8 to MARC8. I ran a
>>> simplified test. The input string was (hex bytes, ignore spaces):
>>>
>>> 61 E8 87 BA E7 81 A3 E5 BE 97
>>>
>>> The output string was:
>>>
>>> 61 1B 24 31 21 54 2B 21 49 43
>>
>>
>> That might very well be with your test code, which I unfortunately
>> don't know anything about.
>>
>> But I think I can guess what goes on: you did not flush the output
>> buffers with yaz_iconv(cd, 0,0, ..) .
>>
>> Why is _separate_ flushing a good idea? What can't we call iconv once?
>>
>> Suppose we did not have it (implicit flushing for every call of iconv)
>> and we want to convert a file. We read chunks from disk (say 2K at a
>> time). We call iconv. iconv flushes every time. But when it flushes it
>> assumes that the input is complete. Sometimes it ain't. There will be
>> input sequences "crossing" block boundaries. And so we'll end up with
>> errors. So we just read the whole in memory? No? Can mmap do it?
>>
>> The opposite is much more appealing . We read chunks of almost size we
>> want. We don't care about the input. We don't *know* the input. So we
>> call iconv for each chunk . It's a streaming process. At EOF we flush.
>>
>> Of course, if it's a small and *complete* buffer in memory it might be
>> ackward.. We'd have two calls instead of one. But I am sure you can
>> make a wrapper yaz_iconv_all(cd, a, b, c, d); which calls
>> yaz_iconv(cd, a, b, c, d) + yaz_iconv(cd, 0, 0, c, d);
>> This is what your patch effectively does , I think.
>>
>> / Adam
>>
>>> Notice that only the first two hangul triples were converted and
>>> output. The ending ESC string was also not output. This happened
>>> because the normal flow of the logic does NOT call your flush
>>> function. In fact in siconv.c version $Id: siconv.c,v 1.37
>>> 2007/03/20 21:37:32 adam Exp $, yaz_iconv exits at the point shown
>>> below.
>>>
>>> while (1)
>>> {
>>> unsigned long x;
>>> size_t no_read;
>>>
>>> if (cd->unget_x)
>>> {
>>> x = cd->unget_x;
>>> no_read = cd->no_read_x;
>>> }
>>> else
>>> {
>>> if (*inbytesleft == 0)
>>> {
>>> r = *inbuf - inbuf0;
>>> break; <-------------------This is the exit point.
>>> }
>>>
>>> I think what you really meant to do was this:
>>> while (1)
>>> {
>>> unsigned long x;
>>> size_t no_read;
>>>
>>> if (cd->unget_x)
>>> {
>>> x = cd->unget_x;
>>> no_read = cd->no_read_x;
>>> }
>>> else
>>> {
>>> if (*inbytesleft == 0)
>>> {
>>> if (cd->flush_handle) // Proposed fix to correctly
>>> clear the buffers
>>> r = (*cd->flush_handle)(cd, outbuf, outbytesleft);
>>> // Proposed fix to correctly clear the buffers.
>>> r = *inbuf - inbuf0;
>>> break; <-------------------This is the exit point.
>>> }
>>>
>>> Anyway, I tried the code with this fix, and it works properly. Is
>>> this the correct fix, or should this problem be fixed another way?
>>>
>>> Gary
>>>
>>> _______________________________________________
>>> Yazlist mailing list
>>> Yazlist at lists.indexdata.dk
>>> http://lists.indexdata.dk/cgi-bin/mailman/listinfo/yazlist
>>
>>
>>
>> _______________________________________________
>> Yazlist mailing list
>> Yazlist at lists.indexdata.dk
>> http://lists.indexdata.dk/cgi-bin/mailman/listinfo/yazlist
>>
>
> _______________________________________________
> Yazlist mailing list
> Yazlist at lists.indexdata.dk
> http://lists.indexdata.dk/cgi-bin/mailman/listinfo/yazlist
More information about the Yazlist
mailing list