Small fix for libnjb to transfer Chinese tags

Jul 17, 2007 gentoo cjk

I once mentioned that some Chinese characters go missing on the Creative ZenMicro when using Amarok + libnjb. I checked the libnjb-2.2.4 source code and found that the text conversion from UTF-8/ISO8859-1 to UCS-2 big-endian is home-brewed instead of using the standard libiconv. Maybe libiconv would be overkill, since only three encodings are really needed. According to the UTF-8 specification, code points are encoded as follows:

U-00000000 – U-0000007F:	0xxxxxxx
U-00000080 – U-000007FF:	110xxxxx 10xxxxxx
U-00000800 – U-0000FFFF:	1110xxxx 10xxxxxx 10xxxxxx
U-00010000 – U-001FFFFF:	11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
U-00200000 – U-03FFFFFF:	111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
U-04000000 – U-7FFFFFFF:	1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

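This is where the libnjb developers made a small mistake: a continuation byte has the form 10xxxxxx, so 0x80 itself is a valid value, but the code tests for > 0x80. As a result, every character whose encoding contains a 0x80 continuation byte is categorized as abnormal.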

if (numbytes == 2 && str[i+1] > 0x80) {
... ...
} else if (numbytes == 3 && str[i+1] > 0x80 && str[i+2] > 0x80) {
... ...
} else {
/* Abnormal string character, just skip */
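
To make the boundary case concrete, here is a minimal sketch of a UTF-8 to UCS-2 big-endian conversion that accepts the whole 0x80..0xBF continuation range. It is not the libnjb code and not the patch itself: the names is_continuation and utf8_to_ucs2be are made up for illustration, and the sketch does not check for overlong encodings or surrogates.

```c
#include <assert.h>
#include <stddef.h>

/* A UTF-8 continuation byte has the bit pattern 10xxxxxx, i.e. 0x80..0xBF.
 * Note that 0x80 itself is included. */
static int is_continuation(unsigned char c)
{
    return (c & 0xC0) == 0x80;
}

/* Hypothetical helper (not part of libnjb): convert a NUL-terminated UTF-8
 * string to UCS-2 big-endian. Only 1- to 3-byte sequences are handled,
 * because UCS-2 cannot represent code points above U+FFFF; any other byte
 * is skipped. Returns the number of bytes written to dst (always even). */
static size_t utf8_to_ucs2be(const unsigned char *src,
                             unsigned char *dst, size_t dst_size)
{
    size_t i = 0, out = 0;

    assert(src != NULL && dst != NULL);

    while (src[i] != 0 && out + 2 <= dst_size) {
        unsigned char c = src[i];
        unsigned int cp;

        if (c < 0x80) {
            /* 0xxxxxxx */
            cp = c;
            i += 1;
        } else if ((c & 0xE0) == 0xC0 && is_continuation(src[i + 1])) {
            /* 110xxxxx 10xxxxxx */
            cp = ((c & 0x1Fu) << 6) | (src[i + 1] & 0x3Fu);
            i += 2;
        } else if ((c & 0xF0) == 0xE0 && is_continuation(src[i + 1])
                   && is_continuation(src[i + 2])) {
            /* 1110xxxx 10xxxxxx 10xxxxxx */
            cp = ((c & 0x0Fu) << 12)
               | ((src[i + 1] & 0x3Fu) << 6)
               | (src[i + 2] & 0x3Fu);
            i += 3;
        } else {
            /* Abnormal or unsupported byte, just skip it */
            i += 1;
            continue;
        }

        dst[out++] = (unsigned char)(cp >> 8);   /* high byte first */
        dst[out++] = (unsigned char)(cp & 0xFF);
    }
    return out;
}
```

For example, the very common character 一 (U+4E00) is encoded in UTF-8 as E4 B8 80. Its last byte is exactly 0x80, so a test written as > 0x80 rejects it, while the mask test above converts it to the two bytes 4E 00.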

Here is a patch against libnjb-2.2.4; you may adapt it to libnjb-2.2.5 as well. For the convenience of Gentoo users, here is the ebuild.

I am just curious why this bug has been around for such a long time. How many Chinese Linux/BSD users use a UTF-8 locale and transfer music to a Creative ZenMicro? Is the combination of these three factors really so rare that it almost never happens?