This article is a reference on handling huge numbers with numpy or pandas and on setting values above a threshold to 0 in numpy. It also collects practical notes on four related topics: 16 - numpy notes - Mofan pandas 4; Numpy Pandas; numpy - specifying dtype float32 with pandas.read_csv on pandas 0.10.1; and numpy.random.random & numpy.ndarray.astype & numpy.arange.
Contents:
- Handling huge numbers with numpy or pandas (numpy: set values above a threshold to 0)
- 16 - numpy notes - Mofan pandas 4
- Numpy Pandas
- numpy - specifying dtype float32 with pandas.read_csv on pandas 0.10.1
- numpy.random.random & numpy.ndarray.astype & numpy.arange
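Handling huge numbers with numpy or pandas (numpy: set values above a threshold to 0)
I am taking part in a competition in which I was given anonymized data. Quite a few columns hold HUGE values; the largest has 40 digits! I read the file with pd.read_csv, but those columns were converted to object dtype.
My original plan was to scale the data down, but because the values are treated as objects I cannot do arithmetic on them.
Does anyone have a suggestion for handling such large numbers in Pandas or Numpy?
Note that I tried converting the values to uint64 without luck. I got the error message: "long too big to convert"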
Answer 1
You can pass a converter to pandas so that int is called on each string as it is imported, or use any other custom converter function on the string:
import pandas as pd
from io import StringIO  # Python 3; the original used the Python 2 StringIO module

txt = '''\
line,Big_Num,text
1,1234567890123456789012345678901234567890,"That sure is a big number"
2,9999999999999999999999999999999999999999,"That is an even BIGGER number"
3,1,"Tiny"
4,-9999999999999999999999999999999999999999,"Really negative"
'''

# converters maps a column name to a function applied to each raw string
df = pd.read_csv(StringIO(txt), converters={'Big_Num': int})
print(df)
Output:
   line                                    Big_Num                           text
0     1   1234567890123456789012345678901234567890      That sure is a big number
1     2   9999999999999999999999999999999999999999  That is an even BIGGER number
2     3                                          1                           Tiny
3     4  -9999999999999999999999999999999999999999                Really negative
Now test arithmetic:
n = df["Big_Num"][1]
print(n, n + 1)
Output:
9999999999999999999999999999999999999999 10000000000000000000000000000000000000000
If the column contains any values that would make int raise an error, you can do something like this:
txt = '''\
line,Big_Num,text
1,1234567890123456789012345678901234567890,"That sure is a big number"
2,9999999999999999999999999999999999999999,"That is an even BIGGER number"
3,0.000000000000000001,"Tiny"
4,"a string","Use 0 for strings"
'''

def conv(s):
    # Try int first, then float; fall back to 0 for non-numeric text
    try:
        return int(s)
    except ValueError:
        try:
            return float(s)
        except ValueError:
            return 0

df = pd.read_csv(StringIO(txt), converters={'Big_Num': conv})
print(df)
Output:
   line                                   Big_Num                           text
0     1  1234567890123456789012345678901234567890      That sure is a big number
1     2  9999999999999999999999999999999999999999  That is an even BIGGER number
2     3                                     1e-18                           Tiny
3     4                                         0              Use 0 for strings
Every value in the column is then a Python int or float and supports arithmetic.
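Once the column holds Python ints, the scaling the question asks about becomes possible. The following is a minimal sketch, not part of the original answer: it builds the object-dtype column directly instead of reading a file, and the names `raw`, `big`, and `threshold` (with its value of 1e39) are assumptions chosen for illustration. Converting to float64 is lossy beyond roughly 16 significant digits, which is usually acceptable when the goal is only to rescale.

```python
import pandas as pd

# Hypothetical sample values, stored as strings the way a CSV reader sees them.
raw = [
    "1234567890123456789012345678901234567890",
    "9999999999999999999999999999999999999999",
    "1",
]

# dtype=object keeps each value as an exact Python int.
big = pd.Series([int(s) for s in raw], dtype=object)

# Exact integer arithmetic works element by element.
plus_one = big.map(lambda v: v + 1)

# For scaling, convert to float64 (lossy, but the magnitude is preserved).
as_float = big.astype("float64")
scaled = as_float / as_float.max()

# Set every value above an assumed threshold to 0.
threshold = 1e39
capped = as_float.where(as_float <= threshold, 0.0)

print(scaled)
print(capped)
```

`Series.where` keeps the entries for which the condition holds and replaces the rest, so this is one way to express "set values greater than some value to 0".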
16 - numpy notes - Mofan pandas 4
Code
import pandas as pd
import numpy as np
dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])
# iloc[row position, column position] = value to assign
df.iloc[0, 1] = np.nan
df.iloc[1, 2] = np.nan
# drop by row
print('-1-')
print(df.dropna(axis=0))
# drop a row if it contains any NaN (this is the default)
print('-2-')
print(df.dropna(axis=0, how='any'))
# drop a row only if all of its values are NaN
print('-3-')
print(df.dropna(axis=0, how='all'))
# fill NaN with a value
print('-4-')
print(df.fillna(value=0))
# test each element for null
print('-5-')
print(df.isnull())
# is there a null anywhere in the frame?
print('-6-')
print(np.any(df.isnull()) == True)
# read/save data: read_csv, to_csv
df1 = pd.DataFrame(np.ones((3, 4)) * 0, columns=['a', 'b', 'c', 'd'])
df2 = pd.DataFrame(np.ones((3, 4)) * 1, columns=['a', 'b', 'c', 'd'])
df3 = pd.DataFrame(np.ones((3, 4)) * 2, columns=['a', 'b', 'c', 'd'])
print('-7-')
print(df1)
print(df2)
print(df3)
# axis=0: stack vertically
res = pd.concat([df1, df2, df3], axis=0)
print('-8-')
print(res)
# ignore_index=True renumbers the rows from 0
res = pd.concat([df1, df2, df3], axis=0, ignore_index=True)
print('-9-')
print(res)
df1 = pd.DataFrame(np.ones((3, 4)) * 0, columns=['a', 'b', 'c', 'd'], index=[1, 2, 3])
df2 = pd.DataFrame(np.ones((3, 4)) * 1, columns=['b', 'c', 'd', 'e'], index=[2, 3, 4])
print('-10-')
print(df1)
print(df2)
# join modes
res = pd.concat([df1, df2])
print('-11-')
print(res)
# default: union of the columns (outer join)
res = pd.concat([df1, df2], join='outer')
print('-12-')
print(res)
# intersection of the columns (inner join)
res = pd.concat([df1, df2], join='inner')
print('-13-')
print(res)
res = pd.concat([df1, df2], join='inner', ignore_index=True)
print('-14-')
print(res)
# axis=1: side by side, keeping only df1's index
# (join_axes was removed in pandas 1.0; on newer versions use
#  pd.concat([df1, df2], axis=1).reindex(df1.index) instead)
res = pd.concat([df1, df2], axis=1, join_axes=[df1.index])
print('-15-')
print(res)
# axis=1: side by side
res = pd.concat([df1, df2], axis=1)
print('-16-')
print(res)
df1 = pd.DataFrame(np.ones((3, 4)) * 0, columns=['a', 'b', 'c', 'd'])
df2 = pd.DataFrame(np.ones((3, 4)) * 1, columns=['a', 'b', 'c', 'd'])
df3 = pd.DataFrame(np.ones((3, 4)) * 2, columns=['b', 'c', 'd', 'e'], index=[2, 3, 4])
# DataFrame.append was removed in pandas 2.0; on newer versions use
# pd.concat([df1, df2], ignore_index=True) instead
res = df1.append(df2, ignore_index=True)
print('-17-')
print(res)
res = df1.append([df2, df3], ignore_index=True)
print('-18-')
print(res)
# append a Series as one new row
s1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
res = df1.append(s1, ignore_index=True)
print('-19-')
print(res)
Output
-1-
A B C D
2013-01-03 8 9.0 10.0 11
2013-01-04 12 13.0 14.0 15
2013-01-05 16 17.0 18.0 19
2013-01-06 20 21.0 22.0 23
-2-
A B C D
2013-01-03 8 9.0 10.0 11
2013-01-04 12 13.0 14.0 15
2013-01-05 16 17.0 18.0 19
2013-01-06 20 21.0 22.0 23
-3-
A B C D
2013-01-01 0 NaN 2.0 3
2013-01-02 4 5.0 NaN 7
2013-01-03 8 9.0 10.0 11
2013-01-04 12 13.0 14.0 15
2013-01-05 16 17.0 18.0 19
2013-01-06 20 21.0 22.0 23
-4-
A B C D
2013-01-01 0 0.0 2.0 3
2013-01-02 4 5.0 0.0 7
2013-01-03 8 9.0 10.0 11
2013-01-04 12 13.0 14.0 15
2013-01-05 16 17.0 18.0 19
2013-01-06 20 21.0 22.0 23
-5-
A B C D
2013-01-01 False True False False
2013-01-02 False False True False
2013-01-03 False False False False
2013-01-04 False False False False
2013-01-05 False False False False
2013-01-06 False False False False
-6-
True
-7-
a b c d
0 0.0 0.0 0.0 0.0
1 0.0 0.0 0.0 0.0
2 0.0 0.0 0.0 0.0
a b c d
0 1.0 1.0 1.0 1.0
1 1.0 1.0 1.0 1.0
2 1.0 1.0 1.0 1.0
a b c d
0 2.0 2.0 2.0 2.0
1 2.0 2.0 2.0 2.0
2 2.0 2.0 2.0 2.0
-8-
a b c d
0 0.0 0.0 0.0 0.0
1 0.0 0.0 0.0 0.0
2 0.0 0.0 0.0 0.0
0 1.0 1.0 1.0 1.0
1 1.0 1.0 1.0 1.0
2 1.0 1.0 1.0 1.0
0 2.0 2.0 2.0 2.0
1 2.0 2.0 2.0 2.0
2 2.0 2.0 2.0 2.0
-9-
a b c d
0 0.0 0.0 0.0 0.0
1 0.0 0.0 0.0 0.0
2 0.0 0.0 0.0 0.0
3 1.0 1.0 1.0 1.0
4 1.0 1.0 1.0 1.0
5 1.0 1.0 1.0 1.0
6 2.0 2.0 2.0 2.0
7 2.0 2.0 2.0 2.0
8 2.0 2.0 2.0 2.0
-10-
a b c d
1 0.0 0.0 0.0 0.0
2 0.0 0.0 0.0 0.0
3 0.0 0.0 0.0 0.0
b c d e
2 1.0 1.0 1.0 1.0
3 1.0 1.0 1.0 1.0
4 1.0 1.0 1.0 1.0
d:\Alex\WorkLog\34-deeplearning\myWorks\TransferLearningExample\mofangTransferLearning\1.py:62: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.
To accept the future behavior, pass 'sort=True'.
To retain the current behavior and silence the warning, pass sort=False
res = pd.concat([df1,df2])
-11-
a b c d e
1 0.0 0.0 0.0 0.0 NaN
2 0.0 0.0 0.0 0.0 NaN
3 0.0 0.0 0.0 0.0 NaN
2 NaN 1.0 1.0 1.0 1.0
3 NaN 1.0 1.0 1.0 1.0
4 NaN 1.0 1.0 1.0 1.0
d:\Alex\WorkLog\34-deeplearning\myWorks\TransferLearningExample\mofangTransferLearning\1.py:66: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.
To accept the future behavior, pass 'sort=True'.
To retain the current behavior and silence the warning, pass sort=False
res = pd.concat([df1,df2], join=''outer'')
-12-
a b c d e
1 0.0 0.0 0.0 0.0 NaN
2 0.0 0.0 0.0 0.0 NaN
3 0.0 0.0 0.0 0.0 NaN
2 NaN 1.0 1.0 1.0 1.0
3 NaN 1.0 1.0 1.0 1.0
4 NaN 1.0 1.0 1.0 1.0
-13-
b c d
1 0.0 0.0 0.0
2 0.0 0.0 0.0
3 0.0 0.0 0.0
2 1.0 1.0 1.0
3 1.0 1.0 1.0
4 1.0 1.0 1.0
-14-
b c d
0 0.0 0.0 0.0
1 0.0 0.0 0.0
2 0.0 0.0 0.0
3 1.0 1.0 1.0
4 1.0 1.0 1.0
5 1.0 1.0 1.0
-15-
a b c d b c d e
1 0.0 0.0 0.0 0.0 NaN NaN NaN NaN
2 0.0 0.0 0.0 0.0 1.0 1.0 1.0 1.0
3 0.0 0.0 0.0 0.0 1.0 1.0 1.0 1.0
-16-
a b c d b c d e
1 0.0 0.0 0.0 0.0 NaN NaN NaN NaN
2 0.0 0.0 0.0 0.0 1.0 1.0 1.0 1.0
3 0.0 0.0 0.0 0.0 1.0 1.0 1.0 1.0
4 NaN NaN NaN NaN 1.0 1.0 1.0 1.0
-17-
a b c d
0 0.0 0.0 0.0 0.0
1 0.0 0.0 0.0 0.0
2 0.0 0.0 0.0 0.0
3 1.0 1.0 1.0 1.0
4 1.0 1.0 1.0 1.0
5 1.0 1.0 1.0 1.0
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py:6201: FutureWarning: Sorting because non-concatenation axis
is not aligned. A future version
of pandas will change to not sort by default.
To accept the future behavior, pass 'sort=True'.
To retain the current behavior and silence the warning, pass sort=False
sort=sort)
-18-
a b c d e
0 0.0 0.0 0.0 0.0 NaN
1 0.0 0.0 0.0 0.0 NaN
2 0.0 0.0 0.0 0.0 NaN
3 1.0 1.0 1.0 1.0 NaN
4 1.0 1.0 1.0 1.0 NaN
5 1.0 1.0 1.0 1.0 NaN
6 NaN 2.0 2.0 2.0 2.0
7 NaN 2.0 2.0 2.0 2.0
8 NaN 2.0 2.0 2.0 2.0
-19-
a b c d
0 0.0 0.0 0.0 0.0
1 0.0 0.0 0.0 0.0
2 0.0 0.0 0.0 0.0
3 1.0 2.0 3.0 4.0
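The listing above relies on `join_axes` and `DataFrame.append`, which recent pandas releases no longer provide. As a hedged sketch of how steps 15, 17, and 19 might be written today, the block below uses only `pd.concat`, `reindex`, and `Series.to_frame`. The variable names `side_by_side`, `stacked`, and `with_row` are illustrative and not from the original notes.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.zeros((3, 4)), columns=list("abcd"), index=[1, 2, 3])
df2 = pd.DataFrame(np.ones((3, 4)), columns=list("bcde"), index=[2, 3, 4])

# Step 15 without join_axes: concatenate side by side, then keep df1's rows.
side_by_side = pd.concat([df1, df2], axis=1).reindex(df1.index)

# Step 17 without append: stack vertically and renumber the rows.
stacked = pd.concat([df1, df2], ignore_index=True)

# Step 19 without append: turn the Series into a one-row frame first.
s1 = pd.Series([1, 2, 3, 4], index=list("abcd"))
with_row = pd.concat([df1, s1.to_frame().T], ignore_index=True)

print(side_by_side)
print(stacked)
print(with_row)
```

Because the frames here keep the mismatched indexes from step 10, `stacked` has five columns, with NaN wherever a frame lacks a column.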
Numpy Pandas
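NumPy and pandas are modules that make data-analysis computations much faster than Python's built-in lists and dictionaries. Their core is written in C and operates on whole arrays (matrices) at a time, which avoids the high cost of element-by-element Python code and can speed computation up many times over. Both are widely used in machine learning, including alongside TensorFlow.
Installing NumPy: search Google for www.numpy.org/, open "Getting Numpy", find the SourceForge link, and download numpy.
Alternatively, install the all-in-one Anaconda distribution (or choose Miniconda).
On Windows, enter the following in a terminal:
pip3 install numpy
pip3 install pandas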
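To make the point about array operations concrete, here is a small sketch that computes the same sum of squares twice, once with a plain Python loop and once with a single NumPy expression. The variable names and the size of 1000 are arbitrary choices for illustration, and the block makes no timing claim; it only shows that the two forms agree.

```python
import numpy as np

# Plain Python: one multiplication and one addition per loop iteration.
loop_total = 0
for v in range(1000):
    loop_total += v * v

# NumPy: the multiplication and the sum each run over the whole array in C.
arr = np.arange(1000)
vector_total = int((arr * arr).sum())

print(loop_total, vector_total)
```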
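numpy - specifying dtype float32 with pandas.read_csv on pandas 0.10.1
I have boiled down some fairly complicated read_csv calls to this simple test case. In my "real" scenario I actually use the converters argument, but I removed it for simplicity.
Here is my ipython session: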
>>> cat test.out
a b
0.76398 0.81394
0.32136 0.91063
>>> import pandas
>>> import numpy
>>> x = pandas.read_csv('test.out',dtype={'a': numpy.float32},delim_whitespace=True)
>>> x
         a        b
0  0.76398  0.81394
1  0.32136  0.91063
>>> x.a.dtype
dtype('float64')
I also tried this with a dtype of numpy.int32 or numpy.int64. Those choices raise an exception:
AttributeError: 'NoneType' object has no attribute 'dtype'
I assume the AttributeError occurs because pandas will not automatically try to convert/truncate the float values to integers?
I am running a 32-bit version of Python on a 32-bit machine.
>>> !uname -a
Linux ubuntu 3.0.0-13-generic #22-Ubuntu SMP Wed Nov 2 13:25:36 UTC 2011 i686 i686 i386 GNU/Linux
>>> import platform
>>> platform.architecture()
('32bit','ELF')
>>> pandas.__version__
'0.10.1'
Solution
See http://pandas.pydata.org/pandas-docs/dev/whatsnew.html#dtype-specification
In 0.11 you can do this:
# don't use dtype converters explicitly for the columns you care about;
# they will be converted to float64 if possible, or object if they cannot
df = pd.read_csv('test.csv'.....)

#### this is optional and related to the issue you posted ####
# force anything that is not numeric to NaN
# columns is the list of columns that you are interested in
# (convert_objects was removed in pandas 1.0; pd.to_numeric with errors='coerce' replaces it)
df[columns] = df[columns].convert_objects(convert_numeric=True)

# astype
df[columns] = df[columns].astype('float32')
See http://pandas.pydata.org/pandas-docs/dev/basics.html#object-conversion
It's not as efficient as doing it directly in read_csv (but that requires some low-level changes).
I have confirmed that with 0.11-dev this DOES work (the results are the same on 32-bit and 64-bit):
In [5]: x = pd.read_csv(StringIO.StringIO(data),dtype={'a': np.float32},delim_whitespace=True)
In [6]: x
Out[6]:
         a        b
0  0.76398  0.81394
1  0.32136  0.91063
In [7]: x.dtypes
Out[7]:
a    float32
b    float64
dtype: object
In [8]: pd.__version__
Out[8]: '0.11.0.dev-385ff82'
In [9]: quit()
vagrant@precise32:~/pandas$ uname -a
Linux precise32 3.2.0-23-generic-pae #36-Ubuntu SMP Tue Apr 10 22:19:09 UTC 2012 i686 i686 i386 GNU/Linux
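For readers on a current pandas release, the same two routes can be sketched as follows. This is an illustration under assumptions rather than the answerer's code: the in-memory `data` string mirrors test.out from the question, `sep=r"\s+"` stands in for `delim_whitespace=True`, and `pd.to_numeric(..., errors="coerce")` stands in for `convert_objects(convert_numeric=True)`.

```python
import io

import numpy as np
import pandas as pd

# Same contents as test.out in the question, held in memory.
data = "a b\n0.76398 0.81394\n0.32136 0.91063\n"

# Route 1: request float32 for column "a" while parsing.
x = pd.read_csv(io.StringIO(data), dtype={"a": np.float32}, sep=r"\s+")
print(x.dtypes)

# Route 2: parse with default dtypes, coerce non-numeric entries to NaN,
# then downcast the column afterwards.
y = pd.read_csv(io.StringIO(data), sep=r"\s+")
y["a"] = pd.to_numeric(y["a"], errors="coerce").astype("float32")
print(y.dtypes)
```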
numpy.random.random & numpy.ndarray.astype & numpy.arange
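Today I came across this code:
xb = np.random.random((nb, d)).astype('float32')  # create a 2-D matrix of random numbers (nb rows, d columns)
xb[:, 0] += np.arange(nb) / 1000.  # add an offset to each element of the first column
Understanding these two lines requires understanding three functions.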
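1. Generating random numbers
numpy.random.random(size=None)
When size is None, it returns a single float in the interval [0.0, 1.0).
When size is not None, it returns a numpy.ndarray. For example, numpy.random.random((1,2)) returns a NumPy array with 1 row and 2 columns.
2. Converting the type of every element of a NumPy array
numpy.ndarray.astype(dtype)
Returns a numpy.ndarray. For example, numpy.array([1, 2, 2.5]).astype(int) returns the NumPy array [1, 2, 2] (the fractional part is truncated).
3. Building an arithmetic sequence
numpy.arange([start,]stop,[step,]dtype=None)
It works much like Python's built-in range() and NumPy's numpy.linspace.
Returns a NumPy array. For example, numpy.arange(3) returns the NumPy array [0, 1, 2].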
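The three functions can be tried together in a runnable form of the two lines quoted at the start of this section. The original snippet does not define `nb` and `d`, so the values 5 and 3 below are assumptions chosen only to keep the output small, and `first_col_before` is a helper name added for illustration.

```python
import numpy as np

nb, d = 5, 3  # hypothetical sizes: nb rows, d columns

# random() yields float64 values in [0.0, 1.0); astype converts them to float32.
xb = np.random.random((nb, d)).astype("float32")
first_col_before = xb[:, 0].copy()

# arange(nb) is [0, 1, ..., nb-1]; dividing by 1000. gives small per-row offsets
# that are added in place to the first column.
xb[:, 0] += np.arange(nb) / 1000.

print(xb.dtype, xb.shape)
print(xb[:, 0] - first_col_before)
```

The in-place addition keeps `xb` as float32 even though `np.arange(nb) / 1000.` is float64, so the printed differences are only approximately 0, 0.001, 0.002, and so on.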