Double-decoding Unicode in Python (Python Unicode decoding)

25-02-02 21


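This article is for readers interested in double-decoding Unicode in Python. It walks through Python Unicode decoding in detail and also covers the following topics: "Python - UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 229393: character maps to <undefined>", "Python 3 UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d", producing JSON in Python without escaping Unicode and without unnecessary whitespace, and encoding and decoding with utf-8, gbk, and unicode-escape in Python.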

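Contents:

Double-decoding Unicode in Python (Python Unicode decoding)
Python - UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 229393: character maps to <undefined>
Python 3 UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d
Python json: do not escape Unicode and strip unnecessary whitespace
Python utf-8/gbk/unicode encoding and decoding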

Double-decoding Unicode in Python (Python Unicode decoding)

I am working against an application that seems keen on returning what I believe to be double UTF-8 encoded strings.

I send the string u'XüYß' encoded as UTF-8, so it becomes X\u00fcY\u00df (equal to X\xc3\xbcY\xc3\x9f).

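The server should simply echo what I sent, yet it returns the following: X\xc3\x83\xc2\xbcY\xc3\x83\xc2\x9f (it should be X\xc3\xbcY\xc3\x9f). If I decode it with str.decode('utf-8'), it becomes u'X\xc3\xbcY\xc3\x9f', which looks like a ... unicode string containing the original string encoded as UTF-8.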

But Python will not let me decode a unicode string without re-encoding it first, and that re-encoding fails for a reason that escapes me:

>>> ret = 'X\xc3\x83\xc2\xbcY\xc3\x83\xc2\x9f'.decode('utf-8')
>>> ret
u'X\xc3\xbcY\xc3\x9f'
>>> ret.decode('utf-8')
# Throws UnicodeEncodeError: 'ascii' codec can't encode ...

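How do I persuade Python to re-decode the string? And/or is there a practical way to debug what is actually in the strings, without passing them through all the implicit conversions that print applies?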

(And yes, I have reported this behaviour to the developers of the server side.)

Answer 1


In Python 2, ret.decode() first tries to encode ret implicitly with the system encoding, which in your case is ascii.

If you explicitly encode the unicode string, you should be fine. There is a built-in encoding that does what you need:

>>> 'X\xc3\xbcY\xc3\x9f'.encode('raw_unicode_escape').decode('utf-8')
'XüYß'
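The following Python 3 sketch reproduces the byte sequences quoted in the question and then reverses them. It assumes the server misreads the UTF-8 bytes as Latin-1 and encodes them a second time; the question does not confirm that this is what the server does, only that the bytes match. The variable names are illustrative.

```python
# Python 3 sketch: reproduce the double encoding from the question, then undo it.
original = "XüYß"

# Step 1: the client encodes the text as UTF-8.
sent = original.encode("utf-8")

# Step 2 (assumed server bug): the bytes are misread as Latin-1
# and then encoded as UTF-8 a second time.
echoed = sent.decode("latin-1").encode("utf-8")
print(echoed)

# Undo: decode once, map each character back to a single byte, decode again.
mojibake = echoed.decode("utf-8")
repaired = mojibake.encode("latin-1").decode("utf-8")
print(repaired)
```

In Python 3 there is no implicit ascii step, because str has no decode method, so the intermediate encode has to be written out explicitly.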

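Really, .encode('latin1') (or cp1252) would work too, because that is almost certainly what the server is using. The difference is that the raw_unicode_escape codec gives you something recognizable at the end instead of raising an exception: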

>>> '€\xe2\x82\xac'.encode('raw_unicode_escape').decode('utf8')
'\\u20ac€'
>>> '€\xe2\x82\xac'.encode('latin1').decode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode character '\u20ac' in position 0: ordinal not in range(256)

If you run into this kind of mixed data, you can use the codec a second time to normalize everything:

>>> '€\xe2\x82\xac'.encode('raw_unicode_escape').decode('utf8')
'\\u20ac€'
>>> '\\u20ac€'.encode('raw_unicode_escape')
b'\\u20ac\\u20ac'
>>> '\\u20ac€'.encode('raw_unicode_escape').decode('raw_unicode_escape')
'€€'
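As a possible way to package the latin1 variant for Python 3, here is a minimal sketch of a helper that leaves text alone when the round trip fails. The name undo_double_utf8 is hypothetical, and the fallback policy (return the input unchanged) is an assumption rather than part of the answer above. The call to ascii() also addresses the debugging part of the question, since it shows every non-ASCII character as an escape.

```python
def undo_double_utf8(text):
    """Remove one layer of "UTF-8 bytes read as Latin-1" damage.

    Hypothetical helper: if the text cannot be round-tripped, it is
    assumed to be undamaged and is returned unchanged.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except UnicodeError:
        # Covers both UnicodeEncodeError and UnicodeDecodeError.
        return text


damaged = "X\xc3\xbcY\xc3\x9f"
print(ascii(damaged))             # inspect the exact code points
print(undo_double_utf8(damaged))  # repaired text
print(undo_double_utf8("€"))      # not Latin-1 encodable, returned as is
```

One caveat of this design: a string that legitimately contains a sequence such as "Ã¼" would also be converted, so it is safest to apply the helper only to data known to come from the faulty source.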

Python - UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 229393: character maps to <undefined>


The file you are opening is not in UTF-8 format. Check its actual format (encoding) and use that instead of utf-8.

Try

  encoding='utf-8-sig'
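To make the suggestion concrete, here is a small sketch using in-memory bytes instead of a real file. It assumes the failing decode used cp1252, a common default on Windows; the original post does not say which encoding was actually in effect. It shows that a UTF-8 byte sequence containing 0x81 fails under cp1252, decodes under utf-8, and that utf-8-sig additionally strips a leading byte order mark.

```python
# "Á" encodes to b'\xc3\x81' in UTF-8, so it contains the byte 0x81.
data = "Á".encode("utf-8")

try:
    data.decode("cp1252")  # assumed platform default, for illustration only
except UnicodeDecodeError as exc:
    print(exc)

print(data.decode("utf-8"))

# A file saved "with BOM" starts with the bytes EF BB BF.
with_bom = b"\xef\xbb\xbf" + data
print(repr(with_bom.decode("utf-8")))      # the BOM survives as '\ufeff'
print(repr(with_bom.decode("utf-8-sig")))  # the BOM is stripped
```

The same encoding names can be passed to open() through its encoding parameter.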

Python 3 UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d

I want to build a search engine and am following a tutorial I found on the web. I want to test parsing HTML:

from bs4 import BeautifulSoup

def parse_html(filename):
    """Extract the Author,Title and Text from a HTML file
    which was produced by pdftotext with the option -htmlmeta."""
    with open(filename) as infile:
        html = BeautifulSoup(infile,"html.parser",from_encoding='utf-8')
        d = {'text': html.pre.text}
        if html.title is not None:
            d['title'] = html.title.text
        for meta in html.findAll('meta'):
            try:
                if meta['name'] in ('Author','Title'):
                    d[meta['name'].lower()] = meta['content']
            except KeyError:
                continue
        return d

parse_html("C:\\pdf\\pydf\\data\\muellner2011.html")

It fails with this error:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 867: character maps to <undefined>

I have seen some solutions online that use encode(), but I don't know how to insert the encode() function into my code. Can anyone help me?
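No answer was included with this question. One likely cause, offered as an assumption, is that open(filename) decodes the file with the platform default encoding (often cp1252 on Windows) before BeautifulSoup ever sees it, so encode() would not help. A minimal sketch of reading the file with an explicit encoding follows; read_html_text is a hypothetical helper, and utf-8 is only a guess at the file's real encoding.

```python
import os
import tempfile


def read_html_text(filename, encoding="utf-8"):
    """Read a file as text with an explicit encoding, not the platform default."""
    with open(filename, encoding=encoding) as infile:
        return infile.read()


# Demonstration with a throwaway file. The right double quote U+201D
# encodes to e2 80 9d in UTF-8, so the file contains the byte 0x9d.
with tempfile.NamedTemporaryFile("wb", suffix=".html", delete=False) as tmp:
    tmp.write("<pre>\u201cquoted\u201d</pre>".encode("utf-8"))

try:
    text = read_html_text(tmp.name)
    print(text)
finally:
    os.remove(tmp.name)
```

In parse_html, the equivalent change would be open(filename, encoding='utf-8'). Since the text is then already decoded, the from_encoding argument should no longer be needed.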

Python json: do not escape Unicode and strip unnecessary whitespace

data = json.dumps(data, separators=(',', ':'), ensure_ascii=False)
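For comparison, this short sketch prints the default output next to the compact form. The sample dictionary is made up for illustration.

```python
import json

data = {"name": "编码", "items": [1, 2]}

# Default: non-ASCII characters are escaped, and separators include spaces.
default = json.dumps(data)

# Compact: no spaces after "," and ":", non-ASCII characters written as is.
compact = json.dumps(data, separators=(",", ":"), ensure_ascii=False)

print(default)
print(compact)
```

When writing the compact form to a file, open the file with an explicit encoding such as utf-8 so that the unescaped characters are stored predictably.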

Python utf-8/gbk/unicode encoding and decoding

To find out which encoding a Python bytes object uses, you can install chardet first.

pip install chardet

Python utf-8 encoding and decoding

import chardet

# Note: this name shadows the built-in str; it is kept because the later snippets reuse it.
str = "python编码"
# Encode as UTF-8 bytes
str_utf8 = str.encode("utf-8")
print("Encoded: " + repr(str_utf8))
print(type(str_utf8))
print(chardet.detect(str_utf8))
print("Decoded: " + str_utf8.decode("utf-8"))

Output:

Encoded: b'python\xe7\xbc\x96\xe7\xa0\x81'
<class 'bytes'>
{'encoding': 'utf-8', 'confidence': 0.7525, 'language': ''}
Decoded: python编码

Python gbk encoding and decoding

# Encode as GBK bytes
str_gbk = str.encode("gbk")
print("Encoded: " + repr(str_gbk))
print(type(str_gbk))
print(chardet.detect(str_gbk))
print("Decoded: " + str_gbk.decode("gbk"))

Output:

Encoded: b'python\xb1\xe0\xc2\xeb'
<class 'bytes'>
{'encoding': None, 'confidence': 0.0, 'language': None}
Decoded: python编码

Python unicode encoding and decoding

# Encode as unicode-escape bytes
str_unicode = str.encode("unicode-escape")
print("Encoded: " + repr(str_unicode))
print(type(str_unicode))
print(chardet.detect(str_unicode))
print("Decoded: " + str_unicode.decode("unicode-escape"))

Output:

Encoded: b'python\\u7f16\\u7801'
<class 'bytes'>
{'encoding': 'ascii', 'confidence': 1.0, 'language': ''}
Decoded: python编码
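In the GBK output above, chardet could not identify the short GBK sample. When the set of plausible encodings is known in advance, one alternative is simply to try them in order. The following is a minimal standard-library sketch; decode_with_fallback and its candidate list are assumptions for illustration, not part of chardet.

```python
def decode_with_fallback(raw, encodings=("utf-8", "gbk")):
    """Try each candidate encoding in order and return (text, encoding_name)."""
    for name in encodings:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings matched")


text = "python编码"
print(decode_with_fallback(text.encode("utf-8")))
print(decode_with_fallback(text.encode("gbk")))
```

The order matters: utf-8 goes first because it rejects most non-UTF-8 input, whereas a permissive single-byte codec placed first would accept almost anything.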

