Comments on: Unicode, localization and C++ support

By: Elvis

Elvis — Wed, 18 May 2016 08:09:13 +0000

Why do I need to call ios_base::sync_with_stdio(false)?

By: Thiago

Thiago — Fri, 22 Apr 2016 13:29:47 +0000

Just an update: Since Visual Studio 2015 Update 2 (released 30/03/2016), there are new compiler switches to manage both source and execution character sets:

/source-charset:<iana-name>|.NNNN
/execution-charset:<iana-name>|.NNNN
/utf-8

The latter is a synonym for both “/source-charset:utf-8” and “/execution-charset:utf-8”.

More details are in the following Visual C++ Team Blog post:
https://blogs.msdn.microsoft.com/vcblog/2016/02/22/new-options-for-managing-character-sets-in-the-microsoft-cc-compiler/
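A minimal sketch of how the execution character set can be observed from code. The function name `literal_is_utf8` is made up for illustration. It assumes the compiler encodes the universal character name `\u00e9` in a narrow literal using the execution character set, so it should return true under /utf-8 or /execution-charset:utf-8 (and under the GCC/Clang defaults), and false under a single-byte code page such as Windows-1252.

```cpp
#include <cstring>

// Hypothetical helper: reports whether the narrow string literal for U+00E9
// was emitted as the two UTF-8 bytes 0xC3 0xA9.
// The \u escape does not depend on the source file's encoding, so only the
// execution character set decides which bytes end up in the binary.
bool literal_is_utf8() {
    const char* s = "\u00e9";
    return std::strlen(s) == 2
        && static_cast<unsigned char>(s[0]) == 0xC3
        && static_cast<unsigned char>(s[1]) == 0xA9;
}
```

Calling `literal_is_utf8()` once at startup is a cheap way to catch a build that was configured without the expected switch.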

By: Sergey

Sergey — Thu, 21 Apr 2016 18:14:50 +0000

I think one of the most important things to explain in an article like this is that you should not actually use wchar_t and std::wstring if you want code that works correctly with all Unicode symbols.

There is a very brief mention of the problems that riddle the wchar_t approach (more specifically, the fact that on Windows wchar_t is only 16 bits wide, which clearly can’t hold every Unicode code point). There is a good post about most of these problems on Stack Overflow: http://stackoverflow.com/a/11107667/3375765
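To make the 16-bit limitation concrete, here is a small sketch using char16_t, which has the same width as wchar_t on Windows. The helper name `utf16_code_point_count` is an assumption for illustration, and it assumes well-formed UTF-16 input. A code point outside the Basic Multilingual Plane, such as U+1F600, occupies two 16-bit units (a surrogate pair), so the string's size() no longer matches the number of code points.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in a well-formed UTF-16 string by
// skipping low (trailing) surrogates, which never start a code point.
std::size_t utf16_code_point_count(const std::u16string& s) {
    std::size_t count = 0;
    for (char16_t unit : s) {
        const bool is_low_surrogate = unit >= 0xDC00 && unit <= 0xDFFF;
        if (!is_low_surrogate) {
            ++count;
        }
    }
    return count;
}
```

For `u"a\U0001F600"`, size() reports 3 code units while the helper reports 2 code points.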

But it does not end there, oh no. Writing code that works as expected with any valid Unicode string will almost certainly require you to iterate over every code point in a string, convert between lower case and upper case, and run case-insensitive algorithms such as hashing, searching, and collation (comparison). Perhaps you work with UTF-8 strings and want to do all of this lazily, without first converting the whole string to UTF-32. The standard library gives you very little on that front (“almost nothing” may be closer to the truth). Additionally, I found setting up locales through the standard C++ facilities very confusing (the documentation is very sparse on that front).
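As a rough illustration of lazy code point iteration over a UTF-8 string, the sketch below decodes one code point at a time without building a UTF-32 copy. The names `next_code_point` and `utf8_code_point_count` are made up for this example. It is deliberately simplified: malformed input yields U+FFFD, but overlong encodings and encoded surrogates are not rejected, so a library such as ICU remains the better choice for production code.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decodes the code point starting at s[i] and advances i
// past it. Requires i < s.size(). On a malformed sequence it returns U+FFFD
// and advances by a single byte.
char32_t next_code_point(const std::string& s, std::size_t& i) {
    const auto byte = [&s](std::size_t k) {
        return static_cast<unsigned char>(s[k]);
    };
    const unsigned char lead = byte(i);
    std::size_t len = 0;
    char32_t cp = 0;
    if (lead < 0x80)                { len = 1; cp = lead; }
    else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
    else { ++i; return 0xFFFD; }                  // stray continuation byte
    if (i + len > s.size()) { ++i; return 0xFFFD; } // truncated sequence
    for (std::size_t k = 1; k < len; ++k) {
        if ((byte(i + k) & 0xC0) != 0x80) { ++i; return 0xFFFD; }
        cp = (cp << 6) | (byte(i + k) & 0x3F);    // append 6 payload bits
    }
    i += len;
    return cp;
}

// Hypothetical helper: counts code points by walking the string once.
std::size_t utf8_code_point_count(const std::string& s) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++count) {
        next_code_point(s, i);
    }
    return count;
}
```

The same loop shape can drive searching or hashing: call `next_code_point` repeatedly and process each value as it is produced.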

All in all, based on my experience, if you are serious about Unicode support in your product you should either write your own implementation of the Unicode rule set (which seems insane) or use a specialized library. Luckily for us, there is the excellent ICU library for all these needs. I’m not sure there is anything better in the C++ world. And it’s free! Most of its API relies on a custom string class, UnicodeString, but it is relatively easy to write your own ICU-like wrappers around the parts of its API that work with raw strings, so that it works with your own string class (if you have one).

By: Marco Alesiani

Marco Alesiani — Thu, 21 Apr 2016 09:05:59 +0000

Hi Mikhail, thanks for the comment. I thought it would be less confusing for explanation’s sake; anyway, I’ve edited the snippet to just reuse the same object.

By: Mikhail

Mikhail — Thu, 21 Apr 2016 08:41:09 +0000

Why are the converter_UTF8_wchar and converter_UTF8_char16 variables declared with exactly the same type?