unicode utf8 emoji - MySQL C# Text Encoding Problems
Even if the database is set to UTF8 you must do the following things to get Unicode fields to work correctly:
- Ensure you are using a Unicode field type like NVARCHAR or TEXT CHARSET utf8
- Whenever you insert anything into the field you must prefix it with the N character to indicate Unicode data as shown in the examples below
- Whenever you select based on Unicode data ensure you use the N prefix again
MySqlCommand cmd = new MySqlCommand("INSERT INTO EXAMPLE (someField) VALUES (N'Unicode Data')"); MySqlCommand cmd2 = new MySqlCommand("SELECT * FROM EXAMPLE WHERE someField=N'Unicode Data'");
If the database wasn't configured correctly or the data was inserted without using the N prefix it won't be possible to get the correct data out since it will have been downcast into the Latin 1/ASCII character set
I have an old MySQL database with encoding set to UTF-8. I am using Ado.Net Entity framework to connect to it.
The string that I retrieve from it have strange characters when ë like characters are expected.
For example: "ë" is "Ã«".
I thought I could get this right by converting from UTF8 to UTF16.
return Encoding.Unicode.GetString( Encoding.Convert( Encoding.UTF8, Encoding.Unicode, Encoding.UTF8.GetBytes(utf8))); }
This however doesn't change a thing.
How could I get the data from this database in proper form?
As others have said this could be a db issue, but it could also be caused by using an old version of the .net mysql connector.
What I actually wanted to comment on was the utf8 to utf16 conversion. The string you are trying to convert is actually alreay unicode encoded, so your "Ã«" characters actually takes up 4 bytes (or more) and are no longer, at the point of your conversion, a misrepresentation of the "ë" character. That is the reason why your conversion doesn't do anything. If you want to do a conversion like that I think you would have to encode your utf8 string as a old style 1 byte per character string, using a codepage where the byte values of Ã and « actually represent the utf8 byte sequence of ë and then treat the bytes of this new string as an utf8 string. Fun stuff.