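Many readers have recently asked two related questions: in Python Pandas, how do you keep a selected column as a DataFrame instead of a Series, and how do you turn a pandas column into a list? This article answers both in detail. It also covers some related topics: Pandas basic operations with Series and DataFrame (Python); Python / Pandas: how to match a list of strings against a DataFrame column; Python Pandas -- DataFrame; and Python pandas dataframe. Let's get started!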
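Table of contents:
- Python Pandas: Keep selected columns as a DataFrame instead of a Series (pandas: convert a column to a list)
- Pandas Basic Operations: Series and DataFrame (Python)
- Python / Pandas: How to match a list of strings against a DataFrame column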
- Python Pandas -- DataFrame
- Python pandas dataframe
Python Pandas: Keep selected columns as a DataFrame instead of a Series (pandas: convert a column to a list)
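When I select a single column from a pandas DataFrame (e.g. df.iloc[:,0], df['A'], or df.A), the result is automatically converted to a Series rather than a single-column DataFrame. However, I am writing some functions that take a DataFrame as an input argument, so I would rather work with a single-column DataFrame than a Series; that way the functions can assume df.columns is accessible. Right now I have to convert the Series to a DataFrame explicitly with pd.DataFrame(df.iloc[:,0]). That doesn't seem like the cleanest approach. Is there a more elegant way to index directly from the DataFrame so that the result is a single-column DataFrame instead of a Series?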
Answer 1
As @Jeff mentioned, there are a few ways to do this, but I recommend using loc/iloc to be more explicit (and to raise errors early if you try something ambiguous):
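In [10]: df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])

In [11]: df
Out[11]:
   A  B
0  1  2
1  3  4

In [12]: df[['A']]

In [13]: df[[0]]

In [14]: df.loc[:, ['A']]

In [15]: df.iloc[:, [0]]

Out[12-15]:  # they all return the same thing:
   A
0  1
1  3

Note: In [13] reflects older pandas versions, where df[[0]] fell back to positional lookup. In current pandas, [] with a list is label-based, so df[[0]] raises a KeyError here because no column is labeled 0.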
With integer column names, the last two options remove the ambiguity (which is exactly why loc/iloc were created). For example:
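In [16]: df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 0])

In [17]: df
Out[17]:
   A  0
0  1  2
1  3  4

In [18]: df[[0]]  # ambiguous
Out[18]:
   A
0  1
1  3

This output comes from an older pandas version. Current pandas treats the list in df[[0]] purely as labels, so it returns the column named 0 (values 2 and 4) instead; df.iloc[:, [0]] (first column by position) and df.loc[:, [0]] (column labeled 0) state the intent unambiguously.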
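A minimal sketch of the options above, assuming a recent pandas (1.x or 2.x). It also covers the "convert a column to a list" part of the title with Series.tolist(), and shows Series.to_frame() as a tidier alternative to wrapping a Series in pd.DataFrame(...):

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])

# A single label returns a Series
s = df["A"]

# A list of labels (or of positions, with iloc) keeps the result a DataFrame
by_list = df[["A"]]
by_loc = df.loc[:, ["A"]]
by_iloc = df.iloc[:, [0]]

# An existing Series can be promoted to a one-column DataFrame
from_series = s.to_frame()

# "Turn a column into a list": tolist() returns plain Python values
values = df["A"].tolist()

print(type(s).__name__, type(by_loc).__name__, values)
# Series DataFrame [1, 3]
```

The list-of-labels form is the one to reach for when a function needs df.columns; tolist() is only for when you really want a plain Python list.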
Pandas Basic Operations: Series and DataFrame (Python)
Straight to the code:
Series
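import pandas
# print(pandas.Series([232, 455, 2, 3456, 2]))  # default index 0..n-1
t = pandas.Series([15, 2, 3, 4, 5], index=list("abcde"))
# print(t["c"])  # select by label
# print(t.iloc[1:4])  # positions 1 to 3 (end excluded)
# print(t.iloc[[1, 4]])  # positions 1 and 4; plain t[[1, 4]] relies on a deprecated positional fallback
# print(t[t > 2])  # boolean filtering
print(t.values)  # the underlying NumPy array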
DataFrame
import numpy
import pandas
numpy.random.seed(9)
t = pandas.DataFrame(numpy.random.random(40).reshape(10, 4),
                     index=list("abcdefghij"), columns=list("ABCD"))
# print(t)  # likewise, the data can also be a dict or a list of dicts
# print(t.index)
# print(t.columns)
# print(t.values)
# print(t["D"].mean())
# print(t.shape)
# print(t.dtypes)
# print(t.ndim)
# t.info()  # info() prints by itself and returns None
# print(t.describe())
# print(t.sort_values(by="D", ascending=False))  # sort rows by column D, descending
# print(t[:7])  # first 7 rows
# print(t["B"])  # select a column
# print(type(t["B"]))
# print(t.loc["h", :])  # slicing with loc; note that loc takes [], not ()
# print(t.loc[["h", "a"], ["B", "D"]])
# print(t.loc[["h", "a"], "A":"C"])
# print(t.iloc[1:8, [3, 1]])  # iloc slices by integer position
# t = t.iloc[1:4, [3, 2, 1]]  # try assigning the result
# print(t)
# t[t > 0.5] = numpy.nan
# print(t)
# print(t[(t["D"] > 0.2) & (t["D"] < 0.8)])  # conditional selection: AND
# print(t[(t["A"] > 0.8) | (t["D"] > 0.8)])  # conditional selection: OR
# t = t[t > 0.5]
# t2 = pandas.notnull(t)  # False marks NaN
# # print(t)
# # print(t2)
# # print(t.dropna(how="all"))  # drop rows that are entirely NaN
# print(t.fillna(8888))  # fill NaN with 8888
Python / Pandas: How to match a list of strings against a DataFrame column
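I want to compare two columns, Description and Employer: I want to check whether any keyword from Employer appears in the Description column. I have already split the Employer column into words and converted it to a list. Now I want to see whether any of those words appear in the corresponding Description.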
Sample input:
print(df.head(25))
          Date           Description   Amount  AutoNumber                 Employer
0    3/17/2015  WW120 TFR?FR xxx8690   140.00       49246  Cansel Survey Equipment
2    3/13/2015  JX154 TFR?FR xxx8690   150.00       49246  Cansel Survey Equipment
5     3/6/2015   CANSEL SURVEY E PAY  1182.08       49246  Cansel Survey Equipment
9     3/2/2015  UE200 TFR?FR xxx8690   180.00       49246  Cansel Survey Equipment
10   2/27/2015  JH401 TFR?FR xxx8690   400.00       49246  Cansel Survey Equipment
11   2/27/2015   CANSEL SURVEY E PAY   555.62       49246  Cansel Survey Equipment
12   2/25/2015  HU204 TFR?FR xxx8690   200.00       49246  Cansel Survey Equipment
13   2/23/2015  UQ263 TFR?FR xxx8690   102.00       49246  Cansel Survey Equipment
14   2/23/2015  UT460 TFR?FR xxx8690   200.00       49246  Cansel Survey Equipment
15   2/20/2015   CANSEL SURVEY E PAY  1222.05       49246  Cansel Survey Equipment
17   2/17/2015  UO414 TFR?FR xxx8690   250.00       49246  Cansel Survey Equipment
19   2/11/2015  HI540 TFR?FR xxx8690   130.00       49246  Cansel Survey Equipment
20   2/11/2015  HQ010 TFR?FR xxx8690   177.00       49246  Cansel Survey Equipment
21   2/10/2015  WU455 TFR?FR xxx8690   200.00       49246  Cansel Survey Equipment
22    2/6/2015  JJ500 TFR?FR xxx8690   301.00       49246  Cansel Survey Equipment
23    2/6/2015   CANSEL SURVEY E PAY  1182.08       49246  Cansel Survey Equipment
24    2/5/2015  IR453 TFR?FR xxx8690   168.56       49246  Cansel Survey Equipment
26    2/2/2015  RQ574 TFR?FR xxx8690   500.00       49246  Cansel Survey Equipment
27    2/2/2015  UT022 TFR?FR xxx8690   850.00       49246  Cansel Survey Equipment
28  12/31/2014  HU521 TFR?FR xxx8690   950.17       49246  Cansel Survey Equipment
I tried something like this, but it doesn't seem to work:
df['Text_Search'] = df['Employer'].apply(lambda x: x.split(" "))
df['Match'] = np.where(df['Description'].str.contains("|".join(df['Text_Search'])), "Yes", "No")
The output I want looks like this:
          Date           Description   Amount  AutoNumber  \
0    3/17/2015  WW120 TFR?FR xxx8690   140.00       49246
2    3/13/2015  JX154 TFR?FR xxx8690   150.00       49246
5     3/6/2015   CANSEL SURVEY E PAY  1182.08       49246
9     3/2/2015  UE200 TFR?FR xxx8690   180.00       49246
10   2/27/2015  JH401 TFR?FR xxx8690   400.00       49246
11   2/27/2015   CANSEL SURVEY E PAY   555.62       49246
12   2/25/2015  HU204 TFR?FR xxx8690   200.00       49246
13   2/23/2015  UQ263 TFR?FR xxx8690   102.00       49246
14   2/23/2015  UT460 TFR?FR xxx8690   200.00       49246
15   2/20/2015   CANSEL SURVEY E PAY  1222.05       49246
17   2/17/2015  UO414 TFR?FR xxx8690   250.00       49246
19   2/11/2015  HI540 TFR?FR xxx8690   130.00       49246
20   2/11/2015  HQ010 TFR?FR xxx8690   177.00       49246
21   2/10/2015  WU455 TFR?FR xxx8690   200.00       49246
22    2/6/2015  JJ500 TFR?FR xxx8690   301.00       49246
23    2/6/2015   CANSEL SURVEY E PAY  1182.08       49246
24    2/5/2015  IR453 TFR?FR xxx8690   168.56       49246
26    2/2/2015  RQ574 TFR?FR xxx8690   500.00       49246
27    2/2/2015  UT022 TFR?FR xxx8690   850.00       49246
28  12/31/2014  HU521 TFR?FR xxx8690   950.17       49246
29  12/30/2014  WZ553 TFR?FR xxx8690   200.00       49246
32  12/29/2014  JW173 TFR?FR xxx8690   300.00       49246
33  12/24/2014   CANSEL SURVEY E PAY  1219.21       49246
34  12/24/2014   CANSEL SURVEY E PAY   434.84       49246
36  12/23/2014  WT002 TFR?FR xxx8690   160.00       49246

                   Employer                  Text_Search  Match
0   Cansel Survey Equipment  [Cansel, Survey, Equipment]     No
2   Cansel Survey Equipment  [Cansel, Survey, Equipment]     No
5   Cansel Survey Equipment  [Cansel, Survey, Equipment]    Yes
9   Cansel Survey Equipment  [Cansel, Survey, Equipment]     No
10  Cansel Survey Equipment  [Cansel, Survey, Equipment]     No
11  Cansel Survey Equipment  [Cansel, Survey, Equipment]    Yes
12  Cansel Survey Equipment  [Cansel, Survey, Equipment]     No
13  Cansel Survey Equipment  [Cansel, Survey, Equipment]     No
14  Cansel Survey Equipment  [Cansel, Survey, Equipment]     No
15  Cansel Survey Equipment  [Cansel, Survey, Equipment]    Yes
17  Cansel Survey Equipment  [Cansel, Survey, Equipment]     No
19  Cansel Survey Equipment  [Cansel, Survey, Equipment]     No
20  Cansel Survey Equipment  [Cansel, Survey, Equipment]     No
21  Cansel Survey Equipment  [Cansel, Survey, Equipment]     No
22  Cansel Survey Equipment  [Cansel, Survey, Equipment]     No
23  Cansel Survey Equipment  [Cansel, Survey, Equipment]    Yes
24  Cansel Survey Equipment  [Cansel, Survey, Equipment]     No
26  Cansel Survey Equipment  [Cansel, Survey, Equipment]     No
27  Cansel Survey Equipment  [Cansel, Survey, Equipment]     No
28  Cansel Survey Equipment  [Cansel, Survey, Equipment]     No
29  Cansel Survey Equipment  [Cansel, Survey, Equipment]     No
32  Cansel Survey Equipment  [Cansel, Survey, Equipment]     No
33  Cansel Survey Equipment  [Cansel, Survey, Equipment]    Yes
34  Cansel Survey Equipment  [Cansel, Survey, Equipment]    Yes
36  Cansel Survey Equipment  [Cansel, Survey, Equipment]     No
Answer 1
Here is a readable solution using a custom search_func:
def search_func(row):
    # Text_Search holds lowercase keywords; test each one against the lowercased Description
    matches = [test_value in row["Description"].lower()
               for test_value in row["Text_Search"]]
    if any(matches):
        return "Yes"
    else:
        return "No"
Then apply this function row by row:
import pandas as pd

# create example data
df = pd.DataFrame({"Description": ["CANSEL SURVEY E PAY", "JX154 TFR?FR xxx8690"],
                   "Employer": ["Cansel Survey Equipment", "Cansel Survey Equipment"]})
print(df)
            Description                 Employer
0   CANSEL SURVEY E PAY  Cansel Survey Equipment
1  JX154 TFR?FR xxx8690  Cansel Survey Equipment

# create text searches and match column
df["Text_Search"] = df["Employer"].str.lower().str.split()
df["Match"] = df.apply(search_func, axis=1)

# show result
print(df)
            Description                 Employer                  Text_Search  Match
0   CANSEL SURVEY E PAY  Cansel Survey Equipment  [cansel, survey, equipment]    Yes
1  JX154 TFR?FR xxx8690  Cansel Survey Equipment  [cansel, survey, equipment]     No
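If you would rather avoid a row-wise apply, the same check can be sketched as a list comprehension over the two columns. This is an alternative to the answer's method, not part of it; like search_func it does plain case-insensitive substring matching, so a very short keyword would match almost any description:

```python
import pandas as pd

df = pd.DataFrame({
    "Description": ["CANSEL SURVEY E PAY", "JX154 TFR?FR xxx8690"],
    "Employer": ["Cansel Survey Equipment", "Cansel Survey Equipment"],
})

# Split each employer name into lowercase keywords
keywords = df["Employer"].str.lower().str.split()

# "Yes" if any keyword is a substring of the lowercased description of the same row
df["Match"] = [
    "Yes" if any(word in desc.lower() for word in words) else "No"
    for desc, words in zip(df["Description"], keywords)
]

print(df["Match"].tolist())
# ['Yes', 'No']
```

Pairing the columns with zip keeps each description matched only against its own row's keywords, which is what the question's single "|".join(...) over the whole column could not do.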
Python Pandas -- DataFrame
pandas.DataFrame
class pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)
Two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. Can be thought of as a dict-like container for Series objects. The primary pandas data structure.
Parameters:
data : numpy ndarray (structured or homogeneous), dict, or DataFrame
Dict can contain Series, arrays, constants, or list-like objects
index : Index or array-like
Index to use for resulting frame. Will default to np.arange(n) if no indexing information part of input data and no index provided
columns : Index or array-like
Column labels to use for resulting frame. Will default to np.arange(n) if no column labels are provided
dtype : dtype, default None
Data type to force. Only a single dtype is allowed. If None, infer
copy : boolean, default False
Copy data from inputs. Only affects DataFrame / 2d ndarray input
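A small sketch of the constructor parameters listed above (data, index, columns, dtype); the labels "x", "y", "z", "p", "q" are made up for illustration:

```python
import numpy as np
import pandas as pd

arr = np.arange(6).reshape(3, 2)  # [[0, 1], [2, 3], [4, 5]]

# index/columns supply the labels; dtype forces a single dtype for all columns
df = pd.DataFrame(arr, index=["x", "y", "z"], columns=["p", "q"], dtype="float64")

# Without index/columns, pandas falls back to default integer labels 0..n-1
df_default = pd.DataFrame(arr)

# A dict of columns: the keys become the column labels
df_dict = pd.DataFrame({"p": [0, 2, 4], "q": [1, 3, 5]}, index=["x", "y", "z"])

print(df)
print(df.equals(df_dict.astype("float64")))  # True: same labels, values and dtype
```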
See also
- DataFrame.from_records: constructor from tuples, also record arrays
- DataFrame.from_dict: from dicts of Series, arrays, or dicts
- DataFrame.from_items: from sequence of (key, value) pairs (deprecated in pandas 0.23 and removed in 1.0)
- pandas.read_csv, pandas.read_table, pandas.read_clipboard
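A quick sketch of the alternative constructors from the "See also" list. Since from_items no longer exists in current pandas, the last lines show one common replacement (building a dict from the pairs); that substitution is a suggestion, not something the original docs state:

```python
import pandas as pd

# Default orient="columns": dict keys become column labels
cols = pd.DataFrame.from_dict({"col1": [1, 2], "col2": [3, 4]})

# orient="index": dict keys become row labels; columns= names the values
rows = pd.DataFrame.from_dict(
    {"r1": [1, 3], "r2": [2, 4]}, orient="index", columns=["col1", "col2"]
)

# from_records: each tuple is one row
recs = pd.DataFrame.from_records([(1, 3), (2, 4)], columns=["col1", "col2"])

# Stand-in for the removed from_items: turn the (key, value) pairs into a dict
items = [("col1", [1, 2]), ("col2", [3, 4])]
from_pairs = pd.DataFrame(dict(items))

print(rows)
```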
1. A simple starter
Creating a DataFrame from a dictionary
from pandas import Series, DataFrame
import pandas as pd
import numpy as np

d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
print(df)
print(df.dtypes)
#    col1  col2
# 0     1     3
# 1     2     4
# col1    int64
# col2    int64
# dtype: object
Creating a DataFrame from a NumPy ndarray
df2 = pd.DataFrame(np.random.randint(low=0, high=10, size=(5, 5)), columns=['a', 'b', 'c', 'd', 'e'])
print(df2)
# sample output (the values are random, so yours will differ):
#    a  b  c  d  e
# 0  0  2  4  7  0
# 1  6  7  3  4  1
# 2  5  3  3  8  7
# 3  0  9  4  3  4
# 4  7  4  7  0  0
Python pandas dataframe
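DataFrame column types
df['客户id'] = pd.to_numeric(df['客户id'])  # convert one column ('客户id', i.e. "customer id") to numbers
df = pd.DataFrame(a, dtype='float')  # example 1: set the dtype when constructing
df = pd.DataFrame(data=d, dtype=np.int8)  # example 2
df = pd.read_csv("somefile.csv", dtype={'column_name': str})  # set per-column dtypes while reading
df[['col2', 'col3']] = df[['col2', 'col3']].apply(pd.to_numeric)  # convert several columns to numbers
df[['two', 'three']] = df[['two', 'three']].astype(float)  # cast several columns with astype
df.dtypes  # dtype of every column
type(mydata[0][0])  # Python type of a single element of a nested list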
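Check the dimensions: df.shape
Basic information about the table (dimensions, column names, dtypes, memory usage, etc.): df.info()
Dtype of every column: df.dtypes
Dtype of a single column: df['B'].dtype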
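A short sketch of the conversions and checks above on a made-up frame. It uses astype with a dict, which casts several columns in one call, as a variant of the column-list assignment shown earlier, and errors="coerce" to turn unparseable strings into NaN instead of raising:

```python
import pandas as pd

df = pd.DataFrame({"id": ["1", "2", "x"], "two": ["1.5", "2", "3"], "three": [1, 2, 3]})

# pd.to_numeric on a whole column; errors="coerce" turns "x" into NaN
df["id"] = pd.to_numeric(df["id"], errors="coerce")

# astype with a dict casts several columns at once (every value must be convertible)
df = df.astype({"two": float, "three": float})

print(df.dtypes)  # one dtype per column
print(df.shape)   # (rows, columns)
df.info()         # dimensions, column names, dtypes and memory usage
```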
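File operations
Saving and loading DataFrame data
- df.to_csv: write to a CSV file
- pd.read_csv: read a CSV file
- df.to_json: write to a JSON file
- pd.read_json: read a JSON file
- df.to_html: write to an HTML file
- pd.read_html: read tables from an HTML file (returns a list of DataFrames)
- df.to_excel: write to an Excel file
- pd.read_excel: read an Excel file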
pandas.DataFrame.to_csv
Writes a DataFrame to a CSV file. The signature below is from an older pandas release; current versions have removed tupleize_cols and spell line_terminator as lineterminator.
DataFrame.to_csv(path_or_buf=None, sep=',', na_rep='', float_format=None, columns=None, header=True, index=True,
index_label=None, mode='w', encoding=None, compression=None, quoting=None, quotechar='"',
line_terminator='\n', chunksize=None, tupleize_cols=None, date_format=None, doublequote=True,
escapechar=None, decimal='.')
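Parameters:
path_or_buf : file path; if omitted, the CSV output is returned as a string
sep : field delimiter for the output file, defaults to ','
na_rep : string used to represent missing data, defaults to ''
float_format : format string for floating-point numbers (e.g. the number of decimal places)
columns : columns to write
header : whether to write the column names, defaults to True
index : whether to write the row index, defaults to True
index_label : column label for the index column(s)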
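A minimal round trip using a few of the parameters above. With no path, to_csv returns the CSV text, which read_csv can parse back through io.StringIO; the frame and its values are invented for the example:

```python
import io

import pandas as pd

df = pd.DataFrame({"name": ["a", "b"], "score": [1.234, None]})

# No path: the CSV text is returned as a string
text = df.to_csv(sep=",", na_rep="NA", float_format="%.1f", index=False)
print(text)

# Read it back; na_values tells read_csv which marker means "missing"
back = pd.read_csv(io.StringIO(text), na_values=["NA"])
print(back)
```

index=False is used here because otherwise the default RangeIndex would be written as an extra unnamed first column.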
Conditional filtering
Single-condition filtering
Multi-condition filtering
Filtering by index
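The three kinds of filtering named above, sketched on a small invented frame. Boolean masks go inside [], several conditions are combined with & (and) or | (or), and each comparison needs its own parentheses:

```python
import pandas as pd

df = pd.DataFrame(
    {"city": ["A", "B", "C", "D"], "sales": [10, 25, 40, 5], "year": [2019, 2020, 2021, 2020]},
    index=["r1", "r2", "r3", "r4"],
)

# Single condition: a boolean Series inside []
big = df[df["sales"] > 20]

# Several conditions: & for AND, | for OR, parentheses around each comparison
mixed = df[(df["sales"] > 8) & (df["year"] == 2020)]
either = df[(df["sales"] < 8) | (df["city"] == "A")]
via_query = df.query("sales > 8 and year == 2020")  # same filter as mixed

# isin: keep rows whose value is one of several candidates
picked = df[df["city"].isin(["B", "D"])]

# Filtering by index labels
by_index = df.loc[["r1", "r3"]]
by_index_cond = df[df.index.isin(["r2", "r4"])]
```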
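Slicing
loc [rows by label, columns by name]; with the default integer index, row labels look like row numbers
iloc [rows by position, columns by position]
Used like loc, except that you pass positions instead of names: data.iloc[row_index, col_index]
ix (deprecated in pandas 0.20 and removed in 1.0; use loc or iloc instead)
at (fast access to a single value by row and column label)
iat (fast access to a single value by row and column position)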
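A compact sketch contrasting the accessors above on an invented frame. One detail worth noticing: label slices with loc include the end point, while position slices with iloc exclude it:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=["x", "y", "z"])

# loc: labels; the slice "x":"y" includes "y"
sub_loc = df.loc["x":"y", ["B"]]

# iloc: positions; the slice 0:2 stops before position 2
sub_iloc = df.iloc[0:2, [1]]

# at / iat: a single value by label / by position
v_at = df.at["z", "A"]
v_iat = df.iat[2, 0]

# df.ix no longer exists in pandas >= 1.0; use loc or iloc as above
print(sub_loc)
print(v_at, v_iat)
```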
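df.set_index('month')  # use the month column as the row index
df.set_index(['year', 'month'])  # build a MultiIndex from two columns
df.columns = [newName]  # replace all column labels (one new name per column)
df['Hour'] = pd.to_datetime(df['report_date'])  # parse a column into datetime values
df.rename(index=str, columns=new_names)  # new_names: a dict or function mapping old names to new ones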
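A runnable sketch of the index and renaming operations above; the columns year, month, report_date and value are invented to match the names used in those snippets. Note that set_index and rename return new DataFrames unless you assign the result back:

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2020, 2020, 2021],
    "month": [1, 2, 1],
    "report_date": ["2020-01-31", "2020-02-29", "2021-01-31"],
    "value": [10, 20, 30],
})

# set_index returns a new DataFrame; df itself is unchanged
by_month = df.set_index("month")
by_year_month = df.set_index(["year", "month"])  # MultiIndex

# Parse strings into datetime64 values
df["report_date"] = pd.to_datetime(df["report_date"])

# rename with a dict: only the listed columns change
renamed = df.rename(columns={"value": "amount"})

# Assigning to .columns replaces every label; the list length must match
df.columns = ["y", "m", "date", "v"]
```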
Deleting columns:
# by selecting only the columns to keep
data = data[['age']]
# with the del keyword
del data['name']
# with drop
data.drop(['name'], axis=1, inplace=True)
# with pop (also returns the removed column)
data.pop('name')
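The four approaches above, each applied to a fresh copy of a hypothetical frame (make_frame is an invented helper, used only so the variants don't interfere with one another):

```python
import pandas as pd


def make_frame():
    # hypothetical sample data for each variant
    return pd.DataFrame({"name": ["a", "b"], "age": [30, 40], "city": ["x", "y"]})


# 1. keep only the columns you want
kept = make_frame()[["age"]]

# 2. del removes a column in place
d = make_frame()
del d["name"]

# 3. drop returns a new frame (unless inplace=True is passed)
dropped = make_frame().drop(columns=["name"])

# 4. pop removes the column in place and returns it as a Series
p = make_frame()
name_col = p.pop("name")
```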
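df = pd.read_csv(INPUTFILE, encoding="utf-8")
df_bio = pd.read_csv(INPUTFILE, encoding="utf-8", header=None)  # header=None: the file has no header row; header=0: the first row holds the column names
Show the first few rows (5 by default):
df.head()
Show the last few rows (5 by default):
df.tail()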
Removing duplicate data
isDuplicated = df.duplicated()  # flag duplicate records (True where a row repeats an earlier one)
print(isDuplicated)
0 False
1 False
2 True
3 False
dtype: bool
# drop duplicate rows
print(df.drop_duplicates())  # drops rows whose values repeat in every column; the row at index 2 is removed
col1 col2
0 a 3
1 b 2
3 c 2
print(df.drop_duplicates(['col1']))  # drops rows with a repeated col1 value; the row at index 2 is removed
col1 col2
0 a 3
1 b 2
3 c 2
print(df.drop_duplicates(['col2']))  # drops rows with a repeated col2 value; the rows at index 2 and 3 are removed
col1 col2
0 a 3
1 b 2
print(df.drop_duplicates(['col1','col2']))  # drops rows repeating the same (col1, col2) pair; the row at index 2 is removed
col1 col2
0 a 3
1 b 2
3 c 2
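The original does not show the data behind these outputs. The sketch below assumes a hypothetical frame that is consistent with every output above, and adds keep="last", which keeps the last occurrence of each duplicate group instead of the first:

```python
import pandas as pd

# Hypothetical data, chosen so it reproduces the outputs shown above
df = pd.DataFrame({"col1": ["a", "b", "a", "c"], "col2": [3, 2, 3, 2]})

print(df.duplicated().tolist())                     # [False, False, True, False]
print(df.drop_duplicates().index.tolist())          # [0, 1, 3]
print(df.drop_duplicates(["col2"]).index.tolist())  # [0, 1]

# keep="last" keeps the last row of each duplicate group instead of the first
print(df.drop_duplicates(["col2"], keep="last").index.tolist())  # [2, 3]
```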
Convert the letters in a DataFrame column to upper or lower case
df['column_name'] = df['column_name'].str.upper()
df['column_name'] = df['column_name'].str.lower()
REF
https://www.cnblogs.com/aro7/p/9748202.html
https://www.cnblogs.com/hankleo/p/11462532.html
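That concludes today's introduction to Python Pandas: keeping selected columns as a DataFrame instead of a Series, and pandas: converting a column to a list. Thanks for reading! For more on Pandas basic operations: Series and DataFrame (Python), Python / Pandas: how to match a list of strings against a DataFrame column, Python Pandas -- DataFrame, Python pandas dataframe, and related topics, you can search this site.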