unicode utf8 - MySQL C#Text Encoding Problems





emoji incorrect (5)


As others have said this could be a db issue, but it could also be caused by using an old version of the .net mysql connector.

What I actually wanted to comment on was the utf8 to utf16 conversion. The string you are trying to convert is actually alreay unicode encoded, so your "ë" characters actually takes up 4 bytes (or more) and are no longer, at the point of your conversion, a misrepresentation of the "ë" character. That is the reason why your conversion doesn't do anything. If you want to do a conversion like that I think you would have to encode your utf8 string as a old style 1 byte per character string, using a codepage where the byte values of à and « actually represent the utf8 byte sequence of ë and then treat the bytes of this new string as an utf8 string. Fun stuff.

I have an old MySQL database with encoding set to UTF-8. I am using Ado.Net Entity framework to connect to it.

The string that I retrieve from it have strange characters when ë like characters are expected.

For example: "ë" is "ë".

I thought I could get this right by converting from UTF8 to UTF16.

 return Encoding.Unicode.GetString(                
            Encoding.Convert(
            Encoding.UTF8,
            Encoding.Unicode,
            Encoding.UTF8.GetBytes(utf8)));
    }

This however doesn't change a thing.

How could I get the data from this database in proper form?




Try set the encoding by "set names utf8" query. You can set this parameter in mysql config too.




thank you The Mouth of a Cow , your solution works but still we need converting characters. i think this is your problem :) and for converting characters you can use this code

 System.Text.Encoding utf_8 = System.Text.Encoding.UTF8;

 string s = "unicode";

 //string to utf
 byte[] utf = System.Text.Encoding.UTF8.GetBytes(s);

 //utf to string
 string s2= System.Text.Encoding.UTF8.GetString(utf);



Even if the database is set to UTF8 you must do the following things to get Unicode fields to work correctly:

  1. Ensure you are using a Unicode field type like NVARCHAR or TEXT CHARSET utf8
  2. Whenever you insert anything into the field you must prefix it with the N character to indicate Unicode data as shown in the examples below
  3. Whenever you select based on Unicode data ensure you use the N prefix again

MySqlCommand cmd = new MySqlCommand("INSERT INTO EXAMPLE (someField) VALUES (N'Unicode Data')");

MySqlCommand cmd2 = new MySqlCommand("SELECT * FROM EXAMPLE WHERE someField=N'Unicode Data'");

If the database wasn't configured correctly or the data was inserted without using the N prefix it won't be possible to get the correct data out since it will have been downcast into the Latin 1/ASCII character set




The idea of storing a bunch of reference images is interesting. Would it be so wrong to store say 25Mb of sample images, and have the encoder try and compose an image using bits of those? With such a minuscule pipe, the machinery at either end is by necessity going to be much greater than the volume of data passing through, so what's the difference between 25Mb of code, and 1Mb of code and 24Mb of image data?

(note the original guidelines ruled out restricting the input to images already in the library - I'm not suggesting that).





c# mysql unicode utf-8