hexcode of é
https://www.codetable.net/hex/e9
Symbol Name: Latin Small Letter E With Acute
Html Entity: &eacute;
Hex Code: &#xE9;
Decimal Code: &#233;
Unicode Group: Latin-1 Supplement
When searching on http://www.unicode.org/charts/index.html, enter 00E9 with the letters in upper case. The result shows that the character belongs to Latin-1 Supplement.
Latin-1 Supplement https://www.unicode.org/charts/PDF/U0080.pdf
If "intérêt" shows up as "intÃ©rÃªt", you most likely (i.e. short of corruption due to double encoding) have UTF-8 encoded text being displayed as if it were ISO-8859-1.
Make sure the headers are well formed and declare the content as UTF-8 encoded.
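The effect described above can be reproduced with a minimal sketch. The class and method names (MojibakeDemo, ShowUtf8AsLatin1) are made up for illustration; the only library calls are Encoding.UTF8 and Encoding.GetEncoding("ISO-8859-1"), which these notes already use.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Hypothetical helper: encodes the text as UTF-8, then decodes those
    // bytes as if they were ISO-8859-1 (one character per byte).
    public static string ShowUtf8AsLatin1(string text)
    {
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return latin1.GetString(utf8Bytes);
    }

    public static void Main()
    {
        // "int\u00E9r\u00EAt" is "intérêt"; escapes avoid source-file encoding issues.
        string garbled = ShowUtf8AsLatin1("int\u00E9r\u00EAt");
        // garbled is "int\u00C3\u00A9r\u00C3\u00AAt", i.e. "intÃ©rÃªt".
        // How it renders depends on Console.OutputEncoding.
        Console.WriteLine(garbled);
    }
}
```

Each accented letter becomes two characters because its two UTF-8 bytes (0xC3 0xA9 for é, 0xC3 0xAA for ê) are each mapped to one ISO-8859-1 character.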
This article shows how to repair a string that has been encoded as UTF-8 twice (double encoded).
For example, say you have the string MÃ¼ller instead of the string Müller.
How did it happen?
The letter ü is encoded in UTF-8 as 2 bytes: 195 and 188.
If those bytes are treated as ISO-8859-1 characters and encoded as UTF-8 again, 195 becomes 195 and 131, which is the UTF-8 encoding of Ã,
and 188 becomes 194 and 188, which is the UTF-8 encoding of ¼.
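The byte arithmetic above can be checked with a short sketch. DoubleEncodingDemo and DoubleEncode are illustrative names, not part of any library; the sketch assumes the second encoding step misreads the bytes as ISO-8859-1, as described above.

```csharp
using System;
using System.Text;

public static class DoubleEncodingDemo
{
    // Hypothetical helper: encodes text as UTF-8, misreads the bytes as
    // ISO-8859-1 characters, then encodes that misread string as UTF-8 again.
    public static byte[] DoubleEncode(string text)
    {
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        byte[] once = Encoding.UTF8.GetBytes(text);   // "ü" gives 195, 188
        string misread = latin1.GetString(once);      // U+00C3 U+00BC, shown as "Ã¼"
        return Encoding.UTF8.GetBytes(misread);       // 195, 131, 194, 188
    }

    public static void Main()
    {
        // "\u00FC" is the letter ü.
        Console.WriteLine(string.Join(", ", DoubleEncode("\u00FC")));
        // prints: 195, 131, 194, 188
    }
}
```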
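Given a garbled (double-encoded) string, the repair steps are:
1. Use UTF-8 to convert the string into a UTF-8 byte array.
2. Convert the UTF-8 byte array into an ISO-8859-1 byte array.
3. Decode the ISO-8859-1 byte array as UTF-8 to get the correct string.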
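[Test]
public void Test20210409003()
{
    // Expected text after the repair.
    string correctFormat = "125,chaînes";
    // Double-encoded input: the UTF-8 bytes of "î" (0xC3 0xAE) show up as "Ã®".
    var utf8Str = "125,chaÃ®nes";
    Encoding iso = Encoding.GetEncoding("ISO-8859-1");
    Encoding utf8 = Encoding.UTF8;
    // Step 1: get the UTF-8 bytes of the garbled string.
    byte[] utfBytes = utf8.GetBytes(utf8Str);
    // Step 2: convert them to ISO-8859-1 bytes, which undoes the second encoding.
    byte[] isoBytes = Encoding.Convert(utf8, iso, utfBytes);
    // Step 3: decode those bytes as UTF-8 to recover the original text.
    var result = utf8.GetString(isoBytes);
    Console.WriteLine(result);
    Assert.That(result, Is.EqualTo(correctFormat));
}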
Use Encoding.Convert to adjust the byte array before attempting to decode it into your destination encoding.
Encoding iso = Encoding.GetEncoding("ISO-8859-1");
Encoding utf8 = Encoding.UTF8;
byte[] utfBytes = utf8.GetBytes(Message);
byte[] isoBytes = Encoding.Convert(utf8, iso, utfBytes);
string msg = iso.GetString(isoBytes);
The Latin-1 Supplement (also called C1 Controls and Latin-1 Supplement) is the second Unicode block in the Unicode standard. It encodes the upper range of ISO 8859-1: 80 (U+0080) - FF (U+00FF). Controls C1 (0080–009F) are not graphic. This block ranges from U+0080 to U+00FF, contains 128 characters and includes the C1 controls, Latin-1 punctuation and symbols, 30 pairs of majuscule and minuscule accented Latin characters and 2 mathematical operators.
The C1 controls and Latin-1 Supplement block has been included in its present form, with the same character repertoire since version 1.0 of the Unicode Standard.[3] Its block name in Unicode 1.0 was simply Latin1.
The answer would be you have wrong data in the database. What probably happened is that you did a conversion ISO-8859-1 -> UTF-8 on data that's already in UTF-8. Therefore, doing a conversion UTF-8 -> ISO-8859-1 gives you the original UTF-8 data back.
Make sure you're not calling utf8_encode (which does an ISO-8859-1 -> UTF-8 conversion) on UTF-8 data! This is a double-encoding problem: a string that was already encoded as UTF-8 went through another ISO-8859-1 to UTF-8 conversion.
Since every UTF-8 byte sequence is also a valid ISO-8859-1 byte sequence (well, not quite, but ISO-8859-1 is commonly extended so that that's the case), the ISO-8859-1 -> UTF-8 conversion over UTF-8 data raises no errors.
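Because the bad conversion never fails, a repair has to guess whether a string was double encoded. The sketch below is one cautious approach, not a definitive method: it reverses the conversion only when every character fits in ISO-8859-1 and the resulting bytes are strictly valid UTF-8. DoubleEncodingRepair and TryRepair are hypothetical names; the exception fallbacks and the throwOnInvalidBytes flag of UTF8Encoding are standard .NET features.

```csharp
using System;
using System.Text;

public static class DoubleEncodingRepair
{
    // Hypothetical helper: returns the repaired string, or the input unchanged
    // when it does not look double encoded.
    public static string TryRepair(string text)
    {
        // Throw instead of substituting '?' for characters outside ISO-8859-1.
        Encoding latin1 = Encoding.GetEncoding(
            "ISO-8859-1",
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        // Strict UTF-8: throw on invalid byte sequences instead of emitting U+FFFD.
        Encoding strictUtf8 = new UTF8Encoding(false, true);
        try
        {
            byte[] bytes = latin1.GetBytes(text);
            return strictUtf8.GetString(bytes);
        }
        catch (ArgumentException)
        {
            // EncoderFallbackException and DecoderFallbackException both derive
            // from ArgumentException: the text was not double encoded this way.
            return text;
        }
    }

    public static void Main()
    {
        // "M\u00C3\u00BCller" is the garbled "MÃ¼ller"; the repair yields "Müller".
        Console.WriteLine(TryRepair("M\u00C3\u00BCller") == "M\u00FCller"); // prints: True
        // A correct "Müller" is not valid UTF-8 as ISO-8859-1 bytes, so it is left alone.
        Console.WriteLine(TryRepair("M\u00FCller") == "M\u00FCller");       // prints: True
    }
}
```

This is a heuristic: text that legitimately contains a sequence such as Ã followed by © would also be rewritten, and data garbled through Windows-1252 rather than ISO-8859-1 may contain characters this sketch rejects.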
î read with the wrong encoding
Code page  Name          Description
850        ibm850        OEM Multilingual Latin 1; Western European (DOS)
1252       windows-1252  ANSI Latin 1; Western European (Windows)
28591      iso-8859-1    ISO 8859-1 Latin 1; Western European (ISO)
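[Test]
public void Test20210414001()
{
    // On .NET Core / .NET 5+, code pages 850 and 1252 are only available after
    // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) has been called.
    Console.WriteLine(Encoding.Default.EncodingName);
    Console.WriteLine(Encoding.Default.CodePage);
    string str = "î";
    // UTF-8 bytes of "î": 0xC3 0xAE.
    var array = Encoding.UTF8.GetBytes(str);
    // Decode the same bytes with three different single-byte code pages.
    var encoding2 = Encoding.GetEncoding(850);
    var str2 = encoding2.GetString(array);
    Console.WriteLine(str2);
    var encoding3 = Encoding.GetEncoding(1252);
    var str3 = encoding3.GetString(array);
    Console.WriteLine(str3);
    var encoding4 = Encoding.GetEncoding(28591);
    var str4 = encoding4.GetString(array);
    Console.WriteLine(str4);
}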
Code page 850 decodes the bytes as ├«
Code page 1252 decodes the bytes as Ã®
Code page 28591 decodes the bytes as Ã®
Test
QQ农场 encoded with gb2312 and decoded with Windows-1252 gives QQÅ©³¡
https://github.com/ChuckTest/UnitTest/blob/master/UnitTest/EncodingTest.cs
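The result can be reproduced without registering any extra code pages, as a rough sketch. The byte values below are an assumption derived from the garbled output itself (Å © ³ ¡ are 0xC5 0xA9 0xB3 0xA1 in ISO-8859-1), and ISO-8859-1 is used instead of Windows-1252 because it is built into every .NET runtime and gives the same result for these bytes according to the log. Gb2312AsLatin1Demo and its members are illustrative names.

```csharp
using System;
using System.Linq;
using System.Text;

public static class Gb2312AsLatin1Demo
{
    // Assumed GB2312 bytes of "QQ农场", inferred from the garbled output
    // rather than produced by a GB2312 encoder.
    public static readonly byte[] Gb2312Bytes = { 0x51, 0x51, 0xC5, 0xA9, 0xB3, 0xA1 };

    // Decodes the bytes as ISO-8859-1, one character per byte.
    public static string DecodeAsLatin1(byte[] bytes)
    {
        return Encoding.GetEncoding("ISO-8859-1").GetString(bytes);
    }

    public static void Main()
    {
        string garbled = DecodeAsLatin1(Gb2312Bytes);
        // Print code points instead of glyphs so the console encoding does not matter.
        Console.WriteLine(string.Join(" ", garbled.Select(c => ((int)c).ToString("X4"))));
        // prints: 0051 0051 00C5 00A9 00B3 00A1
    }
}
```

The two Chinese characters turn into four Western European characters because each takes two bytes in GB2312, and a single-byte code page maps every byte to its own character.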
string QQ农场 with 00936(gb2312)[Chinese Simplified (GB2312)], decode with 01252(Windows-1252)[Western European (Windows)] as: QQÅ©³¡
string QQ农场 with 00936(gb2312)[Chinese Simplified (GB2312)], decode with 01254(windows-1254)[Turkish (Windows)] as: QQÅ©³¡
string QQ农场 with 00936(gb2312)[Chinese Simplified (GB2312)], decode with 01258(windows-1258)[Vietnamese (Windows)] as: QQÅ©³¡
string QQ农场 with 00936(gb2312)[Chinese Simplified (GB2312)], decode with 28591(iso-8859-1)[Western European (ISO)] as: QQÅ©³¡
string QQ农场 with 00936(gb2312)[Chinese Simplified (GB2312)], decode with 28599(iso-8859-9)[Turkish (ISO)] as: QQÅ©³¡
string QQ农场 with 00936(gb2312)[Chinese Simplified (GB2312)], decode with 28605(iso-8859-15)[Latin 9 (ISO)] as: QQÅ©³¡
string QQ农场 with 00936(gb2312)[Chinese Simplified (GB2312)], decode with 65000(utf-7)[Unicode (UTF-7)] as: QQÅ©³¡
string QQ农场 with 10008(x-mac-chinesesimp)[Chinese Simplified (Mac)], decode with 01252(Windows-1252)[Western European (Windows)] as: QQÅ©³¡
string QQ农场 with 10008(x-mac-chinesesimp)[Chinese Simplified (Mac)], decode with 01254(windows-1254)[Turkish (Windows)] as: QQÅ©³¡
string QQ农场 with 10008(x-mac-chinesesimp)[Chinese Simplified (Mac)], decode with 01258(windows-1258)[Vietnamese (Windows)] as: QQÅ©³¡
string QQ农场 with 10008(x-mac-chinesesimp)[Chinese Simplified (Mac)], decode with 28591(iso-8859-1)[Western European (ISO)] as: QQÅ©³¡
string QQ农场 with 10008(x-mac-chinesesimp)[Chinese Simplified (Mac)], decode with 28599(iso-8859-9)[Turkish (ISO)] as: QQÅ©³¡
string QQ农场 with 10008(x-mac-chinesesimp)[Chinese Simplified (Mac)], decode with 28605(iso-8859-15)[Latin 9 (ISO)] as: QQÅ©³¡
string QQ农场 with 10008(x-mac-chinesesimp)[Chinese Simplified (Mac)], decode with 65000(utf-7)[Unicode (UTF-7)] as: QQÅ©³¡
string QQ农场 with 20936(x-cp20936)[Chinese Simplified (GB2312-80)], decode with 01252(Windows-1252)[Western European (Windows)] as: QQÅ©³¡
string QQ农场 with 20936(x-cp20936)[Chinese Simplified (GB2312-80)], decode with 01254(windows-1254)[Turkish (Windows)] as: QQÅ©³¡
string QQ农场 with 20936(x-cp20936)[Chinese Simplified (GB2312-80)], decode with 01258(windows-1258)[Vietnamese (Windows)] as: QQÅ©³¡
string QQ农场 with 20936(x-cp20936)[Chinese Simplified (GB2312-80)], decode with 28591(iso-8859-1)[Western European (ISO)] as: QQÅ©³¡
string QQ农场 with 20936(x-cp20936)[Chinese Simplified (GB2312-80)], decode with 28599(iso-8859-9)[Turkish (ISO)] as: QQÅ©³¡
string QQ农场 with 20936(x-cp20936)[Chinese Simplified (GB2312-80)], decode with 28605(iso-8859-15)[Latin 9 (ISO)] as: QQÅ©³¡
string QQ农场 with 20936(x-cp20936)[Chinese Simplified (GB2312-80)], decode with 65000(utf-7)[Unicode (UTF-7)] as: QQÅ©³¡
string QQ农场 with 50227(x-cp50227)[Chinese Simplified (ISO-2022)], decode with 01252(Windows-1252)[Western European (Windows)] as: QQÅ©³¡
string QQ农场 with 50227(x-cp50227)[Chinese Simplified (ISO-2022)], decode with 01254(windows-1254)[Turkish (Windows)] as: QQÅ©³¡
string QQ农场 with 50227(x-cp50227)[Chinese Simplified (ISO-2022)], decode with 01258(windows-1258)[Vietnamese (Windows)] as: QQÅ©³¡
string QQ农场 with 50227(x-cp50227)[Chinese Simplified (ISO-2022)], decode with 28591(iso-8859-1)[Western European (ISO)] as: QQÅ©³¡
string QQ农场 with 50227(x-cp50227)[Chinese Simplified (ISO-2022)], decode with 28599(iso-8859-9)[Turkish (ISO)] as: QQÅ©³¡
string QQ农场 with 50227(x-cp50227)[Chinese Simplified (ISO-2022)], decode with 28605(iso-8859-15)[Latin 9 (ISO)] as: QQÅ©³¡
string QQ农场 with 50227(x-cp50227)[Chinese Simplified (ISO-2022)], decode with 65000(utf-7)[Unicode (UTF-7)] as: QQÅ©³¡
string QQ农场 with 51936(EUC-CN)[Chinese Simplified (EUC)], decode with 01252(Windows-1252)[Western European (Windows)] as: QQÅ©³¡
string QQ农场 with 51936(EUC-CN)[Chinese Simplified (EUC)], decode with 01254(windows-1254)[Turkish (Windows)] as: QQÅ©³¡
string QQ农场 with 51936(EUC-CN)[Chinese Simplified (EUC)], decode with 01258(windows-1258)[Vietnamese (Windows)] as: QQÅ©³¡
string QQ农场 with 51936(EUC-CN)[Chinese Simplified (EUC)], decode with 28591(iso-8859-1)[Western European (ISO)] as: QQÅ©³¡
string QQ农场 with 51936(EUC-CN)[Chinese Simplified (EUC)], decode with 28599(iso-8859-9)[Turkish (ISO)] as: QQÅ©³¡
string QQ农场 with 51936(EUC-CN)[Chinese Simplified (EUC)], decode with 28605(iso-8859-15)[Latin 9 (ISO)] as: QQÅ©³¡
string QQ农场 with 51936(EUC-CN)[Chinese Simplified (EUC)], decode with 65000(utf-7)[Unicode (UTF-7)] as: QQÅ©³¡
string QQ农场 with 54936(GB18030)[Chinese Simplified (GB18030)], decode with 01252(Windows-1252)[Western European (Windows)] as: QQÅ©³¡
string QQ农场 with 54936(GB18030)[Chinese Simplified (GB18030)], decode with 01254(windows-1254)[Turkish (Windows)] as: QQÅ©³¡
string QQ农场 with 54936(GB18030)[Chinese Simplified (GB18030)], decode with 01258(windows-1258)[Vietnamese (Windows)] as: QQÅ©³¡
string QQ农场 with 54936(GB18030)[Chinese Simplified (GB18030)], decode with 28591(iso-8859-1)[Western European (ISO)] as: QQÅ©³¡
string QQ农场 with 54936(GB18030)[Chinese Simplified (GB18030)], decode with 28599(iso-8859-9)[Turkish (ISO)] as: QQÅ©³¡
string QQ农场 with 54936(GB18030)[Chinese Simplified (GB18030)], decode with 28605(iso-8859-15)[Latin 9 (ISO)] as: QQÅ©³¡
string QQ农场 with 54936(GB18030)[Chinese Simplified (GB18030)], decode with 65000(utf-7)[Unicode (UTF-7)] as: QQÅ©³¡