2016-10-25, 05:31
Battler
I found the issue: After the Japanese text is converted to UTF-8, the offending part has these characters: 人 . But for some reason, vBulletin changes them to � �� when rendering them. It does seem like vBulletin isn't liking that particular sequence of bytes.
Edit: Every time that sequence appears, it gets re-encoded, so if the already-borked text is passed through again, it gets even more borked. Maybe there's a word filter that's messing things up?

The weird thing is that when I click edit post, the text is fine there. So it only gets borked when the thread is shown to the user. Test: 日本の人, 日本人で.
日本からみんなさん、ようこそ!ここには、日本語でどうぞ。私は、日本a 人であるません、でも日本を話すことができます。

Edit #2: It screws up the text if a specific combination of characters is in a specific position in the line. Above, I added one space, and the mess-up went away.

Edit #3: It happens with any non-ASCII character at that specific position in the line. Specifically, the first 33 characters are fine, but if the 34th character is non-ASCII, it gets messed up.

Edit #4: Look at this line, you will see a space in the middle of it, even though there should be none:
abcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghij abcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghij

So basically, when vBulletin thinks there's a very long line without spaces, it automatically inserts one. Most likely, this is designed to prevent excessively long lines from messing up the layout. The problem is that Japanese text has no spaces (neither does Chinese text, for that matter), and, probably because the database encoding is not set correctly, the space gets inserted in the middle of the multiple bytes that encode a single character, messing it up. The wrong encoding setting would cause the vBulletin backend to interpret the text as most likely ISO 8859-1 (Latin-1 / Western European) rather than the correct UTF-8, so it counts bytes instead of characters.
In short, elmuerte needs to change the database encoding to UTF-8. Until he does, it is best to manually insert a space where you see a mess-up occurring.
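
To illustrate what I think is going on, here is a rough Python sketch. It is not vBulletin's actual code, and the function names `naive_break` and `safe_break` are made up for this example. The first function breaks a long run by counting bytes, as software would if it assumed a single-byte encoding; the second counts characters.

```python
def naive_break(text: str, limit: int) -> str:
    """Insert a space every `limit` BYTES, as if each byte were one character.

    Assumption: this mimics a backend that treats UTF-8 data as a
    single-byte encoding such as ISO 8859-1.
    """
    raw = text.encode("utf-8")
    chunks = [raw[i:i + limit] for i in range(0, len(raw), limit)]
    # Decoding with errors="replace" turns each broken byte sequence
    # into U+FFFD, the replacement character browsers show as a box or "?".
    return b" ".join(chunks).decode("utf-8", errors="replace")


def safe_break(text: str, limit: int) -> str:
    """Insert a space every `limit` CHARACTERS (code points)."""
    chunks = [text[i:i + limit] for i in range(0, len(text), limit)]
    return " ".join(chunks)


if __name__ == "__main__":
    sample = "日本人"  # 3 characters, 9 bytes in UTF-8
    print(repr(naive_break(sample, 4)))  # break lands inside a character
    print(repr(safe_break(sample, 2)))   # break lands between characters
```

With a limit of 4 bytes, the break falls inside the second character of 日本人, so the decoded result contains replacement characters much like the ones the forum shows. ASCII-only text is unaffected because every ASCII character is exactly one byte, which would explain why only non-ASCII characters at the break position get messed up.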

Last edited by Battler; 2016-10-25 at 17:12.