[sylpheed:32471] [BUG] procmime_get_encoding_for_text_file returns wrong encoding

Nam Nguyen namn at bluemoon.com.vn
Mon Jun 30 01:55:47 JST 2008


Hello list

This snippet does not count 0x00 bytes (which occur very frequently in
UTF-16 text files):

while ((len = fread(buf, sizeof(guchar), sizeof(buf), fp)) > 0) {
	guchar *p;
	gint i;

	for (p = buf, i = 0; i < len; ++p, ++i) {
		if (*p & 0x80)
			++octet_chars;
	}
	total_len += len;
}

IMHO, it should count the number of unprintable characters (ASCII code
less than 32 or greater than 126, see
http://www.pcmag.com/encyclopedia_term/0,2542,t=hex+chart&i=44217,00.asp).

So, the if clause should be changed to:

if (*p < 32 || *p > 126)

The same applies to procmime_get_encoding_for_str.
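
To make the proposed check concrete, here is a minimal sketch of the
counting loop on its own, outside of Sylpheed. The function name
count_unprintable is made up for illustration, and it uses plain C types
instead of guchar/gint. Note that the check as written also counts tab,
LF and CR (all below 32), so ordinary line breaks add to the count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Sylpheed: counts bytes outside the
 * printable ASCII range 32..126. This covers 0x00 as well as every
 * byte with the high bit set (>= 0x80). */
static size_t count_unprintable(const unsigned char *buf, size_t len)
{
	size_t i, n = 0;

	assert(buf != NULL || len == 0);
	for (i = 0; i < len; ++i) {
		if (buf[i] < 32 || buf[i] > 126)
			++n;
	}
	return n;
}
```

For the UTF-16LE bytes of "hi" (0x68 0x00 0x69 0x00) this counts 2,
where the original (*p & 0x80) test counts 0.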

On a side note, for text files, the first 8K bytes are probably more
than enough to tell the encoding. These functions are at best
estimators anyway.
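
A rough sketch of what capping the scan could look like, assuming a
limit of 8192 bytes. SAMPLE_LIMIT and sample_unprintable are
hypothetical names, not existing Sylpheed code, and the byte test is
the one proposed above rather than the current (*p & 0x80) check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define SAMPLE_LIMIT 8192	/* assumed cap, taken from the "first 8K" guess */

/* Hypothetical helper, not part of Sylpheed: reads at most SAMPLE_LIMIT
 * bytes from fp, stores the number of bytes examined in *scanned, and
 * returns how many of them fall outside printable ASCII (32..126). */
static size_t sample_unprintable(FILE *fp, size_t *scanned)
{
	unsigned char buf[1024];
	size_t total = 0, n = 0;

	assert(fp != NULL && scanned != NULL);
	while (total < SAMPLE_LIMIT) {
		size_t want = SAMPLE_LIMIT - total;
		size_t len, i;

		if (want > sizeof(buf))
			want = sizeof(buf);
		len = fread(buf, 1, want, fp);
		if (len == 0)
			break;	/* EOF or read error */
		for (i = 0; i < len; ++i) {
			if (buf[i] < 32 || buf[i] > 126)
				++n;
		}
		total += len;
	}
	*scanned = total;
	return n;
}
```

The caller can then compare the return value against *scanned to get a
ratio, the same way octet_chars and total_len are accumulated in the
loop quoted earlier.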

Cheers
Nam

