Converting Unicode to ASCII without errors in Python (a function for converting Unicode to characters)

25-02-04 16

This article covers converting Unicode to ASCII without errors in Python, along with functions for converting Unicode to characters. It also collects related material: converting filenames from Unicode to ASCII in bash; converting between Unicode escapes, Chinese text and ASCII in JavaScript; JavaScript functions for converting Chinese to &#XXX references; and Python ascii, utf and Unicode handling.

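Contents:

Converting Unicode to ASCII without errors in Python (a function for converting Unicode to characters)
bash: How to convert filenames from Unicode to ASCII
JS: Unicode to Chinese, Chinese to Unicode, ASCII to Unicode, Unicode to ASCII
js: Chinese characters to Unicode, Unicode to Chinese characters, ASCII to Unicode, Unicode to ASCII, and Chinese to &#XXX function code
Python ascii utf Unicode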

Converting Unicode to ASCII without errors in Python (a function for converting Unicode to characters)

My code just scrapes a web page and then converts it to Unicode.

html = urllib.urlopen(link).read()
html.encode("utf8","ignore")
self.response.out.write(html)

But I get a UnicodeDecodeError:

Traceback (most recent call last):
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/webapp/__init__.py", line 507, in __call__
    handler.get(*groups)
  File "/Users/greg/clounce/main.py", line 55, in get
    html.encode("utf8","ignore")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 2818: ordinal not in range(128)

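I assume this means the HTML contains a malformed attempt at Unicode somewhere. Can I just drop whatever bytes are causing the problem instead of getting an error?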

Answer 1

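Compression such as gzip has become very popular (around 73% of websites use it, including large sites such as Google, YouTube, Yahoo, Wikipedia, Reddit, Stack Overflow and the Stack Exchange Network sites).
If you run a simple decode, as in the original answer, on a gzipped response, you will get an error like this: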

UnicodeDecodeError: 'utf8' codec can't decode byte 0x8b in position 1

To decode a gzipped response, you need to import the following modules (in Python 3):

import gzip
import io

Then you can parse the content like this:

from urllib.request import urlopen  # Python 3 import, not shown in the original snippet

response = urlopen("https://example.com/gzipped-ressource")
buffer = io.BytesIO(response.read())  # Use StringIO.StringIO(response.read()) in Python 2
gzipped_file = gzip.GzipFile(fileobj=buffer)
decoded = gzipped_file.read()
content = decoded.decode("utf-8")  # Replace utf-8 with the source encoding of your requested resource

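This code reads the response and places the bytes in a buffer. The gzip module then reads the buffer through the GzipFile class. The decompressed data can then be read back out as bytes and finally decoded into normally readable text.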
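The traceback in the question comes from Python 2, where calling .encode() on a byte string first triggers an implicit ASCII decode, and that hidden step is what fails on byte 0xa0. As a minimal Python 3 sketch of the whole path, the hypothetical helper below (the name bytes_to_ascii and its default encoding are assumptions, not part of the answer) decompresses the body when it starts with the gzip magic bytes, decodes it while dropping undecodable bytes, and then discards every character outside ASCII:

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def bytes_to_ascii(raw, encoding="utf-8"):
    """Return an ASCII-only str built from raw bytes, dropping what cannot be kept."""
    # Decompress only when the payload looks like gzip data.
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    # errors="ignore" skips byte sequences that are invalid in the source encoding.
    text = raw.decode(encoding, errors="ignore")
    # A second pass drops valid characters that simply are not ASCII.
    return text.encode("ascii", errors="ignore").decode("ascii")


if __name__ == "__main__":
    page = "caf\u00e9 ok".encode("utf-8") + b"\xa0"  # stray 0xa0 byte at the end
    print(bytes_to_ascii(page))
    print(bytes_to_ascii(gzip.compress(page)))
```

Note that errors="ignore" loses data silently; passing errors="replace" instead keeps a visible placeholder for each dropped character.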

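bash: How to convert filenames from Unicode to ASCII

I have a bunch of music files on an NTFS partition mounted on Linux, and their filenames contain Unicode characters. I am having trouble writing a script to rename the files so that all filenames use only ASCII characters. I think the iconv command should work, but I cannot escape the characters for the 'mv' command.

Edit: It does not matter if there is no direct translation for the Unicode characters. I guess I will just replace those with "?" characters.

Solution

I don't think iconv has any character replacement facilities. Something like this in Python might help: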

#!/usr/bin/env python3
import sys

def unistrip(s):
    # Accept raw bytes as well as text
    if isinstance(s, bytes):
        s = s.decode('utf-8')
    chars = []
    for ch in s:
        # Anything above 0x7f is outside ASCII
        if ord(ch) > 0x7f:
            chars.append('?')
        else:
            chars.append(ch)
    return ''.join(chars)

if __name__ == '__main__':
    print(unistrip(sys.argv[1]))

Then call it like this:

$./unistrip.py "yikes_
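The script above only prints a cleaned-up name; the renaming step the question asks about could be sketched roughly as follows. This is an illustration under assumptions, not the answer's method: ascii_name and rename_to_ascii are hypothetical helpers, and the placeholder and dry_run parameters were added here so that the result can be previewed before anything on disk changes.

```python
import os


def ascii_name(name, placeholder="?"):
    """Replace every non-ASCII character in name with placeholder."""
    return "".join(ch if ord(ch) < 0x80 else placeholder for ch in name)


def rename_to_ascii(directory, placeholder="?", dry_run=True):
    """Rename entries in directory to ASCII-only names; return (old, new) pairs."""
    changes = []
    for old in sorted(os.listdir(directory)):
        new = ascii_name(old, placeholder)
        if new == old:
            continue  # already ASCII, nothing to do
        changes.append((old, new))
        if not dry_run:
            src = os.path.join(directory, old)
            dst = os.path.join(directory, new)
            # Two different names can collapse to the same ASCII name.
            if os.path.exists(dst):
                raise FileExistsError(dst)
            os.rename(src, dst)
    return changes
```

Because "?" is a reserved character in Windows filenames, a placeholder such as "_" may be the safer choice if the NTFS partition is also read from Windows.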

JS: Unicode to Chinese, Chinese to Unicode, ASCII to Unicode, Unicode to ASCII

Online conversion tool: https://oktools.net/unicode

Unicode to Chinese

function decodeUnicode(str) {
    return unescape(str.replace(/\\u/gi, '%u'));
}

Chinese to Unicode

function encodeUnicode(str) {
    let res = [];
    for (let i = 0; i < str.length; i++) {
        // Pad each UTF-16 code unit to four hex digits
        res[i] = ("0000" + str.charCodeAt(i).toString(16)).slice(-4);
    }
    return "\\u" + res.join("\\u");
}
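For comparison, the same \uXXXX round trip can be sketched in Python. The function names mirror the JavaScript ones but are otherwise hypothetical, and the choice to go through UTF-16 code units is an assumption made here so that characters outside the BMP come out as surrogate pairs, matching what charCodeAt produces:

```python
import re
import struct


def encode_unicode(text):
    """Write every UTF-16 code unit of text as a \\uXXXX escape."""
    data = text.encode("utf-16-be")
    units = struct.unpack(">%dH" % (len(data) // 2), data)
    return "".join("\\u%04x" % unit for unit in units)


def decode_unicode(escaped):
    """Turn \\uXXXX escapes back into characters, rejoining surrogate pairs."""
    raw = re.sub(
        r"\\u([0-9a-fA-F]{4})",
        lambda match: chr(int(match.group(1), 16)),
        escaped,
    )
    # Escaped surrogate halves arrive as two separate code points;
    # a round trip through UTF-16 merges each pair into one character.
    return raw.encode("utf-16-be", "surrogatepass").decode("utf-16-be")


if __name__ == "__main__":
    print(encode_unicode("\u4e2d\u6587"))
    print(decode_unicode("\\u4e2d\\u6587"))
```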

ASCII to Unicode

function A2U(str) {
    var reserved = '';

    for (var i = 0; i < str.length; i++) {
        // Emit each code unit as a decimal reference such as &#20013;
        reserved += '&#' + str.charCodeAt(i) + ';';
    }

    return reserved;
}

Unicode to ASCII

function U2A(str) {
    var reserved = '';
    // Collect every decimal reference such as &#20013;
    var code = str.match(/&#(\d+);/g);

    if (code === null) {
        return str;
    }

    for (var i = 0; i < code.length; i++) {
        reserved += String.fromCharCode(code[i].replace(/[&#;]/g, ''));
    }

    return reserved;
}
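A rough Python counterpart to A2U and U2A is sketched below. The helper names are hypothetical; html.unescape is the standard-library call doing the decoding, and unlike U2A it also resolves hexadecimal and named references and leaves the surrounding text in place.

```python
import html


def to_numeric_refs(text):
    """Write every character of text as a decimal reference such as &#20013;."""
    return "".join("&#%d;" % ord(ch) for ch in text)


def from_numeric_refs(text):
    """Resolve character references in text back into characters."""
    return html.unescape(text)


if __name__ == "__main__":
    refs = to_numeric_refs("\u4e2d\u6587")
    print(refs)
    print(from_numeric_refs(refs))
```

One difference from the JavaScript version: ord() returns a full code point, so a character outside the BMP becomes a single reference rather than two surrogate halves.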

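js: Chinese characters to Unicode, Unicode to Chinese characters, ASCII to Unicode, Unicode to ASCII, and Chinese to &#XXX function code

Many online tools include encoding-conversion code that is useful in a wide range of situations; the relevant material is collected here.

About Unicode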

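Unicode (known in Chinese as 统一码, 万国码 or 单一码) is a character encoding standard used on computers. It was created to overcome the limitations of traditional character encoding schemes: it assigns every character in every language a single, unique code so that text can be converted and processed across languages and platforms. It is maintained by an international organization and is designed to hold all the world's scripts and symbols. Unicode maps characters to the numbers 0 to 0x10FFFF, which gives 1,114,112 code points; a code point is a number that can be assigned to a character. Of Unicode's 17 planes, plane 0 (the BMP, Basic Multilingual Plane) is the most important; within it, Chinese characters fall mainly in the CJK Unified Ideographs block, U+4E00 to U+9FFF.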
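The numbers in this description can be checked from Python. The sketch below is only an illustration: plane_of and is_cjk_unified are hypothetical helpers, and the block boundaries are taken as the full allocated range of the CJK Unified Ideographs block, U+4E00 to U+9FFF.

```python
import sys

CJK_FIRST, CJK_LAST = 0x4E00, 0x9FFF  # CJK Unified Ideographs block


def plane_of(ch):
    """Return the plane number (0 to 16) that holds the character's code point."""
    return ord(ch) >> 16


def is_cjk_unified(ch):
    """Report whether ch lies in the CJK Unified Ideographs block."""
    return CJK_FIRST <= ord(ch) <= CJK_LAST


if __name__ == "__main__":
    # Highest code point and the total number of code points
    print(hex(sys.maxunicode), sys.maxunicode + 1)
    for ch in "A\u4e2d\U0001F600":
        print("U+%04X" % ord(ch), plane_of(ch), is_cjk_unified(ch))
```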

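About ASCII

ASCII is a computer encoding system based on the Latin alphabet. It is mainly used to display modern English and other Western European languages. It is the most widely used single-byte encoding system and is equivalent to the international standard ISO/IEC 646. The 7-bit ASCII range is 0-127, which is the international standard. Chinese characters fall outside that range, and different character sets use different byte ranges for them; common Chinese character sets include GB2312-80, GBK, Big5 and Unicode. GB2312 is a widely used Chinese character encoding standard. In it, each Chinese character is represented by 2 bytes, each with a value of 161-254 (hexadecimal A1-FE); the first byte corresponds to rows (区) 1-94 and the second byte to cells (位) 1-94.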
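The GB2312 byte layout described above can be inspected with Python's built-in gb2312 codec. This is a small sketch with hypothetical helper names; subtracting 0xA0 from each byte to obtain the row and cell numbers follows directly from the A1-FE range given in the text.

```python
def gb2312_bytes(ch):
    """Encode one character as GB2312 and return its byte values."""
    return list(ch.encode("gb2312"))


def qu_wei(ch):
    """Return the (row, cell) numbers, each 1 to 94, of a two-byte GB2312 character."""
    high, low = ch.encode("gb2312")
    return high - 0xA0, low - 0xA0


def is_ascii(text):
    """Report whether every character of text is in the 7-bit range 0-127."""
    return all(ord(ch) < 128 for ch in text)


if __name__ == "__main__":
    print([hex(b) for b in gb2312_bytes("\u4e2d")])
    print(qu_wei("\u4e2d"))
    print(is_ascii("abc"), is_ascii("\u4e2d"))
```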

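About native2ascii

native2ascii is a tool shipped with the Sun Java SDK. It converts text files in other encodings (such as *.txt, *.ini, *.properties and *.java) into Unicode-escaped form. The reason for converting is program internationalization. After installing the JDK, for example on Windows, the bin directory under the JDK installation directory contains native2ascii.exe, the tool that converts Chinese text to Unicode escapes. The command-line format is: native2ascii -[options] [inputfile [outputfile]]. For example, native2ascii zh.txt u.txt converts zh.txt to Unicode escapes and writes the result to u.txt.

The Chinese/Unicode conversion in this tool is implemented in PHP and supports both hexadecimal and decimal notation, converting in both directions between Chinese characters and Unicode; hexadecimal is the default.

Helper function used by all the functions below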

function left_zero_4(str) {
    // Left-pad a hex string with zeros to four digits
    if (str != null && str != '' && str != 'undefined') {
        while (str.length < 4) {
            str = '0' + str;
        }
    }
    return str;
}

Chinese characters to Unicode

function unicode(str) {
    var value = '';
    for (var i = 0; i < str.length; i++) {
        // Uses the left_zero_4 helper defined above
        value += '\\u' + left_zero_4(str.charCodeAt(i).toString(16));
    }
    return value;
}

Unicode to Chinese characters, ASCII to Unicode

function reconvert(str) {
    // \uXXXX escapes
    str = str.replace(/(\\u)(\w{1,4})/gi, function ($0) {
        return (String.fromCharCode(parseInt((escape($0).replace(/(%5Cu)(\w{1,4})/g, "$2")), 16)));
    });
    // Hexadecimal references such as &#x4e2d;
    str = str.replace(/(&#x)(\w{1,4});/gi, function ($0) {
        return String.fromCharCode(parseInt(escape($0).replace(/(%26%23x)(\w{1,4})(%3B)/g, "$2"), 16));
    });
    // Decimal references such as &#20013;
    str = str.replace(/(&#)(\d{1,6});/gi, function ($0) {
        return String.fromCharCode(parseInt(escape($0).replace(/(%26%23)(\d{1,6})(%3B)/g, "$2")));
    });