Old 2016-08-30, 15:38   #1130
science_man_88
"Forget I exist"
Jul 2009

8369₁₀ Posts

Originally Posted by Madpoo
With SQL collations it's more about the sorting, or even how certain umlauts are handled (it surprised me to see "UE" treated as equal to "Ü").

And when sorting, should "Éclair" come before/after "eclair", or, with binary collation, will it show up after "zebra"?
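
To make the "Éclair" question concrete, here is a minimal Python sketch rather than anything SQL-specific. Plain `sorted()` compares code points, which behaves much like a binary collation, while the `fold` helper (a hypothetical name, not part of any library) only approximates an accent-insensitive, case-insensitive collation by stripping combining marks and case-folding. Real database collations follow their own locale rules and will differ in the details.

```python
import unicodedata

def fold(s: str) -> str:
    """Rough stand-in for an accent- and case-insensitive sort key."""
    # NFD splits "É" into "E" plus a combining acute accent.
    decomposed = unicodedata.normalize("NFD", s)
    # Drop the combining marks, then case-fold what is left.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

# "\u00c9" is the precomposed capital E with acute accent.
words = ["zebra", "\u00c9clair", "eclair", "apple"]

# Code-point order: U+00C9 sorts after every ASCII letter, so the
# accented word lands after "zebra".
binary_order = sorted(words)

# Folded order: both spellings share the key "eclair" and sit together.
folded_order = sorted(words, key=fold)

print(binary_order)
print(folded_order)
```

With the folded key the two spellings tie, so Python's stable sort keeps them in their input order. Whether "Éclair" should come before or after "eclair" in that tie is exactly the kind of choice a locale-specific collation makes.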

Curiously, even for the same language, different locales may opt to sort accented characters differently. I'm trying to remember the example... I don't know if it was a difference in fr-FR and fr-CA, or maybe pt-PT and pt-BR. Whatever the case... languages are funny things.

Those account for the accent-sensitive and case-sensitive options in collations, but then with other character sets, such as those used for Cyrillic and Polish (not to mention Chinese, Japanese and Korean, i.e. CJK), you have to pay even more attention, and when comparing across two of them you need to find some common collation (like binary, perhaps) where both character sets have a place to live.

Before my current job I never thought a single moment about any of this... SQL collations, ASCII folding in search indices, German decompounding when indexing/searching, the double-wide western characters in CJK (or the frustrating search for a decent font that can show all of the common Unicode characters, monospaced. One that has Japanese *and* Korean, and won't show the Korean characters sideways as many of the freebies would do).
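
The "double-wide western characters" can be poked at from Python's standard `unicodedata` module. This is only a sketch under simple assumptions: `display_width` is a hypothetical helper that counts fullwidth and wide characters as two columns and everything else as one, which is a rough approximation of what a monospaced CJK-aware font or terminal does.

```python
import unicodedata

def display_width(text: str) -> int:
    """Approximate the number of monospaced columns a string occupies."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks sit on the previous character
        # "F" (fullwidth) and "W" (wide) characters take two columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("F", "W") else 1
    return width

fullwidth_abc = "\uff21\uff22\uff23"  # fullwidth Latin letters A, B, C
hangul = "\ud55c\uae00"               # two Hangul syllables

# NFKC compatibility normalization maps fullwidth Latin letters
# back to their ordinary ASCII forms.
narrowed = unicodedata.normalize("NFKC", fullwidth_abc)

print(display_width("ABC"), display_width(fullwidth_abc), display_width(hangul))
print(narrowed)
```

Normalizing with NFKC before indexing is one common way to make a search for "ABC" match its fullwidth form, though a real search index would typically apply its own folding rules on top.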

We haven't expanded to Turkey, Greece, any of the Arab countries or Israel. I imagine the fun we'll have if/when we do our first right-to-left (RTL) language and how that would impact our entire design... LOL

Anyway, I don't blame anyone for making the DB columns varchar like they are now... until it's a problem, you don't really know how interesting that makes things.
so basically you've been running into this a lot:

Edit: I realized my error; I had tried to use a URL in a YouTube tag.

Last fiddled with by science_man_88 on 2016-08-30 at 15:57