Small fix for libnjb to transfer Chinese tags
I once mentioned that some Chinese characters are missing on the Creative ZenMicro when using Amarok + libnjb.
I checked the libnjb-2.2.4 source code and found that the text codec conversion from UTF-8/ISO8859-1 to UCS2 big-endian is home-brewed instead of using the standard libiconv. Maybe libiconv is overkill, since only three codecs are really needed. According to the specification of UTF-8:
U-00000000 – U-0000007F: | 0xxxxxxx |
U-00000080 – U-000007FF: | 110xxxxx 10xxxxxx |
U-00000800 – U-0000FFFF: | 1110xxxx 10xxxxxx 10xxxxxx |
U-00010000 – U-001FFFFF: | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx |
U-00200000 – U-03FFFFFF: | 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx |
U-04000000 – U-7FFFFFFF: | 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx |
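In this scheme every continuation byte has the form 10xxxxxx, so it lies between 0x80 and 0xBF inclusive. This is where the libnjb developers made a small mistake: the checks compare with > 0x80 instead of >= 0x80, so every character whose encoding contains a continuation byte of exactly 0x80 is categorized as abnormal and skipped.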
if (numbytes == 2 && str[i+1] > 0x80) {
    ... ...
} else if (numbytes == 3 && str[i+1] > 0x80 && str[i+2] > 0x80) {
    ... ...
} else {
    /* Abnormal string character, just skip */
}
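To make the effect concrete, here is a small sketch in C. The helper names (is_continuation_buggy, is_continuation, utf8_encode3) are made up for illustration and are not taken from libnjb. It encodes a code point from the three-byte range of the table above and applies both the comparison from the quoted code and the comparison the UTF-8 layout calls for.

```c
/* Hypothetical helpers for illustration only; not libnjb code. */

/* The comparison as it appears in the quoted code: 0x80 itself is rejected. */
static int is_continuation_buggy(unsigned char b)
{
    return b > 0x80;
}

/* A continuation byte has the bit pattern 10xxxxxx, i.e. 0x80..0xBF. */
static int is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Encode a code point in the range U+0800..U+FFFF as three UTF-8 bytes
 * (1110xxxx 10xxxxxx 10xxxxxx). No range checking is done here. */
static void utf8_encode3(unsigned int cp, unsigned char out[3])
{
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
}
```

For example, U+4E00 (the Chinese character for "one") encodes as E4 B8 80. Its last byte is exactly 0x80, so is_continuation_buggy rejects it while is_continuation accepts it. The same happens to any code point whose low six bits, or middle six bits, are all zero.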
Here is a patch against libnjb-2.2.4; you may adapt it to libnjb-2.2.5 as well. For Gentoo users’ convenience, here is the ebuild.
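For readers who only want the idea of the fix, the following is a minimal, self-contained sketch of a UTF-8 to UCS2 big-endian conversion with the corrected continuation-byte test. It is not the patch itself and does not reproduce libnjb's function names or signatures; utf8_to_ucs2be and its parameters are assumptions made for this example. It handles one-, two- and three-byte sequences only, since UCS2 cannot represent anything above U+FFFF, and it does not reject overlong encodings or surrogate code points.

```c
#include <stddef.h>

/* Hypothetical helper, not libnjb's API.
 * A continuation byte is 10xxxxxx, i.e. 0x80..0xBF inclusive. */
static int is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Convert a NUL-terminated UTF-8 string to UCS2 big-endian.
 * Writes at most out_cap bytes, including a two-byte terminator,
 * and returns the number of UCS2 characters written (terminator excluded).
 * Malformed bytes and sequences longer than three bytes are skipped. */
static size_t utf8_to_ucs2be(const unsigned char *str,
                             unsigned char *out, size_t out_cap)
{
    size_t i = 0; /* read position in str */
    size_t n = 0; /* UCS2 characters written so far */

    if (out_cap < 2)
        return 0;

    /* Keep room for one more character plus the terminator. */
    while (str[i] != 0 && (n + 2) * 2 <= out_cap) {
        unsigned char c = str[i];
        unsigned int cp;

        if (c < 0x80) {
            cp = c;
            i += 1;
        } else if ((c & 0xE0) == 0xC0 && is_continuation(str[i + 1])) {
            cp = ((unsigned int)(c & 0x1F) << 6) | (str[i + 1] & 0x3F);
            i += 2;
        } else if ((c & 0xF0) == 0xE0 && is_continuation(str[i + 1])
                   && is_continuation(str[i + 2])) {
            cp = ((unsigned int)(c & 0x0F) << 12)
               | ((unsigned int)(str[i + 1] & 0x3F) << 6)
               | (str[i + 2] & 0x3F);
            i += 3;
        } else {
            /* Abnormal byte: skip it and resynchronise on the next one. */
            i += 1;
            continue;
        }

        out[2 * n]     = (unsigned char)(cp >> 8);   /* high byte first */
        out[2 * n + 1] = (unsigned char)(cp & 0xFF);
        n++;
    }

    out[2 * n] = 0;
    out[2 * n + 1] = 0;
    return n;
}
```

Because the && operators short-circuit, str[i + 2] is only read after str[i + 1] has been confirmed as a continuation byte, so the loop never reads past the terminating NUL.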
I am just curious why this bug has been here for such a long time. How many Chinese Linux/BSD users use a UTF-8 locale and transfer music to a Creative ZenMicro? Is the combination of these three factors really so rare that it almost never happens?