
Python Pandas: Keeping a selected column as a DataFrame instead of a Series (pandas: turning a column into a list)


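Many readers have recently been asking two questions: how to keep a selected column as a DataFrame instead of a Series in Python pandas, and how to turn a pandas column into a list. This article looks at these questions and also covers related material: basic pandas operations on Series and DataFrame (Python), Python / Pandas: how to match a list of strings against a DataFrame column, Python Pandas -- DataFrame, and Python pandas dataframe.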


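When selecting a single column from a pandas DataFrame (say df.iloc[:, 0], df['A'], or df.A), the resulting vector is automatically converted to a Series instead of a single-column DataFrame. However, I am writing some functions that take a DataFrame as an input argument. I therefore prefer to work with a single-column DataFrame instead of a Series, so that the function can assume df.columns is accessible. Right now I have to convert the Series into a DataFrame explicitly with something like pd.DataFrame(df.iloc[:, 0]). This doesn't seem like the cleanest method. Is there a more elegant way to index from a DataFrame directly so that the result is a single-column DataFrame instead of a Series?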

Answer 1

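As @Jeff mentions, there are a few ways to do this, but I recommend using loc / iloc to be more explicit (and to raise errors early if you are trying something ambiguous):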

In [10]: df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])

In [11]: df
Out[11]:
   A  B
0  1  2
1  3  4

In [12]: df[['A']]

In [13]: df[[0]]  # positional fallback in older pandas only; current versions raise KeyError here

In [14]: df.loc[:, ['A']]

In [15]: df.iloc[:, [0]]

Out[12-15]:  # they all return the same thing:
   A
0  1
1  3

With integer column names, the latter two choices remove the ambiguity (which is precisely why loc / iloc were created). For example:

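In [16]: df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 0])

In [17]: df
Out[17]:
   A  0
0  1  2
1  3  4

In [18]: df[[0]]  # ambiguous
Out[18]:
   A
0  1
1  3

Note: this output comes from the pandas version used in the original answer. Current pandas versions treat 0 here as a column label and return the column named 0 (values 2 and 4), which is one more reason to spell out the intent with loc or iloc.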
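The options discussed in the answer can be compared side by side in a small script. This is a minimal sketch, not part of the original answer: the variable names are made up for illustration, and `Series.to_frame()` and `Series.tolist()` are added because they cover the two conversions named in the title.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])

as_series = df["A"]                 # single label -> Series
as_frame = df[["A"]]                # list of labels -> one-column DataFrame
via_loc = df.loc[:, ["A"]]          # label-based and explicit
via_iloc = df.iloc[:, [0]]          # position-based and explicit
via_to_frame = df["A"].to_frame()   # turn an existing Series back into a DataFrame

print(type(as_series).__name__)     # Series
print(type(as_frame).__name__)      # DataFrame
print(list(as_frame.columns))       # ['A']

# The other direction: a column as a plain Python list
column_as_list = df["A"].tolist()
print(column_as_list)               # [1, 3]
```

A function that expects a DataFrame can then rely on `.columns` being present, whichever of the list-based forms the caller used.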

Basic pandas operations: Series and DataFrame (Python)

Straight to the code.

Series

import pandas

# print(pandas.Series([232, 455, 2, 3456, 2]))

t = pandas.Series([15,2,3,4,5],index=list("abcde"))


# print(t["c"])

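# print(t[1:4])          # integer slice: positional

# print(t.iloc[[1,4]])   # by position; t[[1,4]] relies on a deprecated positional fallback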

# print(t[t>2])

print(t.values)

DataFrame

import numpy
import pandas
numpy.random.seed(9)
t = pandas.DataFrame(numpy.random.random(40).reshape(10,4),
                     index=list("abcdefghij"),columns=list("ABCD"))

# print(t)  # likewise, the input data can also be a dict or a list of dicts

# print(t.index)

# print(t.columns)

# print(t.values)

# print(t["D"].mean())

# print(t.shape)

# print(t.dtypes)

# print(t.ndim)

# print(t.info())

# print(t.describe())

# print(t.sort_values(by="D", ascending=False))  # sort by a column ("e" is a row label, not a column)

# print(t[:7])   # first 7 rows

# print(t["B"])   # select a column
# print(type(t["B"]))

# print(t.loc["h", :])  # various loc slices; note that loc is followed by []
# print(t.loc[["h","a"], ["B","D"]])
# print(t.loc[["h","a"], "A":"C"])
# print(t.iloc[1:8,[3,1]])  # iloc slicing, directly by integer position

# t = t.iloc[1:4,[3,2,1]]   # try out assignment
# print(t)
# t[t>0.5]=numpy.nan   # numpy.NaN was removed in NumPy 2.0; use numpy.nan
# print(t)

# print(t[(t["D"]>0.2)&(t["D"]<0.8)])  # conditional selection, AND
# print(t[(t["A"]>0.8)|(t["D"]>0.8)])    # conditional selection, OR

# t = t[t>0.5]
# t2 = pandas.notnull(t)   # False marks NaN
# # print(t)
# # print(t2)
# # print(t.dropna(how="all"))   # drop rows that are entirely NaN
# print(t.fillna(8888))  # fill NaN

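Python / Pandas: How to match a list of strings against a DataFrame column

I want to compare two columns, Description and Employer. I want to see whether any keyword from Employer is found in the Description column. I have split the Employer column into words and converted them to a list. Now I want to check whether any of those words appear in the corresponding Description value.

Sample input: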

print(df.head(25))
          Date           Description   Amount  AutoNumber  \
0    3/17/2015  WW120 TFR?FR xxx8690   140.00       49246
2    3/13/2015  JX154 TFR?FR xxx8690   150.00       49246
5     3/6/2015   CANSEL SURVEY E PAY  1182.08       49246
9     3/2/2015  UE200 TFR?FR xxx8690   180.00       49246
10   2/27/2015  JH401 TFR?FR xxx8690   400.00       49246
11   2/27/2015   CANSEL SURVEY E PAY   555.62       49246
12   2/25/2015  HU204 TFR?FR xxx8690   200.00       49246
13   2/23/2015  UQ263 TFR?FR xxx8690   102.00       49246
14   2/23/2015  UT460 TFR?FR xxx8690   200.00       49246
15   2/20/2015   CANSEL SURVEY E PAY  1222.05       49246
17   2/17/2015  UO414 TFR?FR xxx8690   250.00       49246
19   2/11/2015  HI540 TFR?FR xxx8690   130.00       49246
20   2/11/2015  HQ010 TFR?FR xxx8690   177.00       49246
21   2/10/2015  WU455 TFR?FR xxx8690   200.00       49246
22    2/6/2015  JJ500 TFR?FR xxx8690   301.00       49246
23    2/6/2015   CANSEL SURVEY E PAY  1182.08       49246
24    2/5/2015  IR453 TFR?FR xxx8690   168.56       49246
26    2/2/2015  RQ574 TFR?FR xxx8690   500.00       49246
27    2/2/2015  UT022 TFR?FR xxx8690   850.00       49246
28  12/31/2014  HU521 TFR?FR xxx8690   950.17       49246

                   Employer
0   Cansel Survey Equipment
2   Cansel Survey Equipment
5   Cansel Survey Equipment
9   Cansel Survey Equipment
10  Cansel Survey Equipment
11  Cansel Survey Equipment
12  Cansel Survey Equipment
13  Cansel Survey Equipment
14  Cansel Survey Equipment
15  Cansel Survey Equipment
17  Cansel Survey Equipment
19  Cansel Survey Equipment
20  Cansel Survey Equipment
21  Cansel Survey Equipment
22  Cansel Survey Equipment
23  Cansel Survey Equipment
24  Cansel Survey Equipment
26  Cansel Survey Equipment
27  Cansel Survey Equipment
28  Cansel Survey Equipment

I tried something like this, but it doesn't seem to work.

df['Text_Search'] = df['Employer'].apply(lambda x: x.split(" "))
df['Match'] = np.where(df['Description'].str.contains("|".join(df['Text_Search'])), "Yes", "No")

The output I want looks like this:

          Date           Description   Amount  AutoNumber  \
0    3/17/2015  WW120 TFR?FR xxx8690   140.00       49246
2    3/13/2015  JX154 TFR?FR xxx8690   150.00       49246
5     3/6/2015   CANSEL SURVEY E PAY  1182.08       49246
9     3/2/2015  UE200 TFR?FR xxx8690   180.00       49246
10   2/27/2015  JH401 TFR?FR xxx8690   400.00       49246
11   2/27/2015   CANSEL SURVEY E PAY   555.62       49246
12   2/25/2015  HU204 TFR?FR xxx8690   200.00       49246
13   2/23/2015  UQ263 TFR?FR xxx8690   102.00       49246
14   2/23/2015  UT460 TFR?FR xxx8690   200.00       49246
15   2/20/2015   CANSEL SURVEY E PAY  1222.05       49246
17   2/17/2015  UO414 TFR?FR xxx8690   250.00       49246
19   2/11/2015  HI540 TFR?FR xxx8690   130.00       49246
20   2/11/2015  HQ010 TFR?FR xxx8690   177.00       49246
21   2/10/2015  WU455 TFR?FR xxx8690   200.00       49246
22    2/6/2015  JJ500 TFR?FR xxx8690   301.00       49246
23    2/6/2015   CANSEL SURVEY E PAY  1182.08       49246
24    2/5/2015  IR453 TFR?FR xxx8690   168.56       49246
26    2/2/2015  RQ574 TFR?FR xxx8690   500.00       49246
27    2/2/2015  UT022 TFR?FR xxx8690   850.00       49246
28  12/31/2014  HU521 TFR?FR xxx8690   950.17       49246
29  12/30/2014  WZ553 TFR?FR xxx8690   200.00       49246
32  12/29/2014  JW173 TFR?FR xxx8690   300.00       49246
33  12/24/2014   CANSEL SURVEY E PAY  1219.21       49246
34  12/24/2014   CANSEL SURVEY E PAY   434.84       49246
36  12/23/2014  WT002 TFR?FR xxx8690   160.00       49246

                   Employer                  Text_Search Match
0   Cansel Survey Equipment  [Cansel, Survey, Equipment]    No
2   Cansel Survey Equipment  [Cansel, Survey, Equipment]    No
5   Cansel Survey Equipment  [Cansel, Survey, Equipment]   Yes
9   Cansel Survey Equipment  [Cansel, Survey, Equipment]    No
10  Cansel Survey Equipment  [Cansel, Survey, Equipment]    No
11  Cansel Survey Equipment  [Cansel, Survey, Equipment]   Yes
12  Cansel Survey Equipment  [Cansel, Survey, Equipment]    No
13  Cansel Survey Equipment  [Cansel, Survey, Equipment]    No
14  Cansel Survey Equipment  [Cansel, Survey, Equipment]    No
15  Cansel Survey Equipment  [Cansel, Survey, Equipment]   Yes
17  Cansel Survey Equipment  [Cansel, Survey, Equipment]    No
19  Cansel Survey Equipment  [Cansel, Survey, Equipment]    No
20  Cansel Survey Equipment  [Cansel, Survey, Equipment]    No
21  Cansel Survey Equipment  [Cansel, Survey, Equipment]    No
22  Cansel Survey Equipment  [Cansel, Survey, Equipment]    No
23  Cansel Survey Equipment  [Cansel, Survey, Equipment]   Yes
24  Cansel Survey Equipment  [Cansel, Survey, Equipment]    No
26  Cansel Survey Equipment  [Cansel, Survey, Equipment]    No
27  Cansel Survey Equipment  [Cansel, Survey, Equipment]    No
28  Cansel Survey Equipment  [Cansel, Survey, Equipment]    No
29  Cansel Survey Equipment  [Cansel, Survey, Equipment]    No
32  Cansel Survey Equipment  [Cansel, Survey, Equipment]    No
33  Cansel Survey Equipment  [Cansel, Survey, Equipment]   Yes
34  Cansel Survey Equipment  [Cansel, Survey, Equipment]   Yes
36  Cansel Survey Equipment  [Cansel, Survey, Equipment]    No

Answer 1

Here is a readable solution using a custom search_func:

def search_func(row):
    matches = [test_value in row["Description"].lower()
               for test_value in row["Text_Search"]]
    if any(matches):
        return "Yes"
    else:
        return "No"

Then apply this function row by row:

# create example data
df = pd.DataFrame({"Description": ["CANSEL SURVEY E PAY", "JX154 TFR?FR xxx8690"],
                   "Employer": ["Cansel Survey Equipment", "Cansel Survey Equipment"]})

print(df)
    Description             Employer
0   CANSEL SURVEY E PAY     Cansel Survey Equipment
1   JX154 TFR?FR xxx8690    Cansel Survey Equipment

# create text searches and match column
df["Text_Search"] = df["Employer"].str.lower().str.split()
df["Match"] = df.apply(search_func, axis=1)

# show result
print(df)
    Description             Employer                    Text_Search                     Match
0   CANSEL SURVEY E PAY     Cansel Survey Equipment     [cansel, survey, equipment]     Yes
1   JX154 TFR?FR xxx8690    Cansel Survey Equipment     [cansel, survey, equipment]     No
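For context, the attempt in the question fails for two reasons: "|".join(...) is handed a Series whose elements are lists rather than strings, and str.contains applies one pattern to the whole column instead of a different keyword list per row. The sketch below is one possible variant of the answer's row-wise idea that pairs the two columns with zip instead of df.apply. The helper name has_keyword is an assumption made for illustration, not something from the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Description": ["CANSEL SURVEY E PAY", "JX154 TFR?FR xxx8690"],
    "Employer": ["Cansel Survey Equipment", "Cansel Survey Equipment"],
})


def has_keyword(description, employer):
    """Return True if any word of `employer` occurs in `description` (case-insensitive)."""
    text = description.lower()
    return any(word in text for word in employer.lower().split())


# Walk the two columns in parallel, one (description, employer) pair per row
df["Match"] = [
    "Yes" if has_keyword(desc, emp) else "No"
    for desc, emp in zip(df["Description"], df["Employer"])
]

print(df["Match"].tolist())  # ['Yes', 'No']
```

Like the answer's search_func, this is plain substring matching, so a very short employer word could match inside an unrelated token.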

Python Pandas -- DataFrame

pandas.DataFrame

class pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)

Two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. Can be thought of as a dict-like container for Series objects. The primary pandas data structure.

Parameters:

data : numpy ndarray (structured or homogeneous), dict, or DataFrame

Dict can contain Series, arrays, constants, or list-like objects

index : Index or array-like

Index to use for resulting frame. Will default to np.arange(n) if no indexing information part of input data and no index provided

columns : Index or array-like

Column labels to use for resulting frame. Will default to np.arange(n) if no column labels are provided

dtype : dtype, default None

Data type to force. Only a single dtype is allowed. If None, infer

copy : boolean, default False

Copy data from inputs. Only affects DataFrame / 2d ndarray input

See also

DataFrame.from_records
constructor from tuples, also record arrays
DataFrame.from_dict
from dicts of Series, arrays, or dicts
DataFrame.from_items
from sequence of (key, value) pairs (deprecated in pandas 0.23 and removed in 1.0)

pandas.read_csv, pandas.read_table, pandas.read_clipboard

1. A quick warm-up first

  Creating from a dictionary

from pandas import Series, DataFrame
import pandas as pd
import numpy as np
d = {'col1':[1,2],'col2':[3,4]}
df = pd.DataFrame(data=d)
print(df)
print(df.dtypes)
#   col1  col2
#0     1     3
#1     2     4
#col1    int64
#col2    int64
#dtype: object

Creating from a NumPy ndarray

df2 = pd.DataFrame(np.random.randint(low=0, high=10, size=(5, 5)),columns=['a', 'b', 'c', 'd', 'e'])
print (df2)
#   a  b  c  d  e
#0  0  2  4  7  0
#1  6  7  3  4  1
#2  5  3  3  8  7
#3  0  9  4  3  4
#4  7  4  7  0  0

 

Python pandas dataframe

 DataFrame column types

df['客户id'] = df['客户id'].apply(pd.to_numeric)  # '客户id' is a column named "customer id"
df = pd.DataFrame(a, dtype='float')  # example 1
df = pd.DataFrame(data=d, dtype=np.int8)  # example 2
df = pd.read_csv("somefile.csv", dtype = {'column_name' : str})
df[['col2','col3']] = df[['col2','col3']].apply(pd.to_numeric)
df[['two', 'three']] = df[['two', 'three']].astype(float)

    df.dtypes

    type(mydata[0][0])

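   Dimensions: df.shape
   Basic table information (dimensions, column names, data types, memory usage, etc.): df.info()
   Data type of every column: df.dtypes
   Data type of a single column: df['B'].dtype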
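The conversion calls listed above can be tried on a tiny frame. The following is a minimal sketch with made-up data. The `errors="coerce"` argument is not used in the snippets above; it is shown here as one way to handle values that cannot be parsed as numbers.

```python
import pandas as pd

df = pd.DataFrame({"col2": ["1", "2", "x"], "col3": ["1.5", "2.5", "3.5"]})

# errors="coerce" turns unparseable values into NaN instead of raising
df["col2"] = pd.to_numeric(df["col2"], errors="coerce")

# astype raises on bad input, so it suits columns known to be clean
df["col3"] = df["col3"].astype(float)

print(df.dtypes)
print(df["col2"].isna().tolist())  # [False, False, True]
```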

File operations

Saving and reading DataFrame data

  • df.to_csv: write to a CSV file
  • pd.read_csv: read a CSV file
  • df.to_json: write to a JSON file
  • pd.read_json: read a JSON file
  • df.to_html: write to an HTML file
  • pd.read_html: read an HTML file
  • df.to_excel: write to an Excel file
  • pd.read_excel: read an Excel file

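pandas.DataFrame.to_csv
Write a DataFrame to a CSV file
    DataFrame.to_csv(path_or_buf=None, sep=',', na_rep='', float_format=None, columns=None, header=True, index=True,
                     index_label=None, mode='w', encoding=None, compression=None, quoting=None, quotechar='"',
                     line_terminator='\n', chunksize=None, tupleize_cols=None, date_format=None, doublequote=True,
                     escapechar=None, decimal='.')
(This signature is from an older pandas release; line_terminator has since been renamed lineterminator.)

Parameters:
    path_or_buf : file path; if not given, the result is returned as a CSV string
    sep : field delimiter for the output file, "," by default
    na_rep : string used in place of missing data, '' by default
    float_format : format for floating-point numbers (number of decimal places)
    columns : columns to write
    header : whether to write the column names; True (write them) by default
    index : whether to write the index; True (write it) by default
    index_label : column label for the index column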
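A short round trip illustrates a few of these parameters. This is a sketch with made-up data: `io.StringIO` stands in for a real file, and the variable names are assumptions for illustration.

```python
import io

import pandas as pd

df = pd.DataFrame({"col1": [1.0, None], "col2": ["a", "b"]})

# With no path, to_csv returns the CSV text as a string
csv_text = df.to_csv(index=False, na_rep="NA", float_format="%.2f")
print(csv_text)

# Read the text back; "NA" is among the markers read_csv treats as missing by default
restored = pd.read_csv(io.StringIO(csv_text))
print(restored)
```

Passing `index=False` keeps the row index out of the file, which avoids an extra unnamed column when the file is read back.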

 

 

 

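Conditional filtering

Single-condition filtering

Select the records where col1 is greater than n: data[data['col1']>n]
Filter the records where col1 is greater than n, but show only the values of col2 and col3: data[['col2','col3']][data['col1']>n]
Select specific rows with isin, which filters records by membership in a set of values. Records whose col1 value equals an element of list: data[data.col1.isin(list)]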
 
 

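Multi-condition filtering

The & (and) and | (or) operators, or dedicated functions, can be used to filter on several conditions.
Using & to select the records where col1 is greater than n and col2 is greater than m: data[(data['col1'] > n) & (data['col2'] > m)]
The same with numpy's logical_and function: data[np.logical_and(data['col1']> n,data['col2']>m)]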
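The filters above can be checked on a small frame. This is a minimal sketch with made-up values for data, n and m. The `data.loc[mask, columns]` form is shown as an alternative to chaining two [] selections, since it does the row and column selection in one step.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "col1": [1, 5, 8, 3],
    "col2": [10, 2, 7, 9],
    "col3": list("wxyz"),
})
n, m = 2, 5

single = data[data["col1"] > n]                        # rows 1, 2, 3
subset = data.loc[data["col1"] > n, ["col2", "col3"]]  # same rows, two columns
members = data[data["col1"].isin([1, 8])]              # rows 0, 2
both = data[(data["col1"] > n) & (data["col2"] > m)]   # rows 2, 3
either = data[(data["col1"] > n) | (data["col2"] > m)]  # all four rows
same = data[np.logical_and(data["col1"] > n, data["col2"] > m)]

print(both)
```

The parentheses around each comparison are required, because & and | bind more tightly than > in Python.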
 
 

Index-based selection

Slicing

Select specific rows with a slice: data[n:m]
Select specific columns by passing column names: data[['col1','col2']]
 

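loc [rows by index label, columns by name]

When the columns already have names, data['col1'] selects an entire column. If you know both the column names and the index labels, .loc selects rows and columns at the same time: data.loc[index,'column_names']
 

iloc [rows by position, columns by position]

Used like loc, except that you pass integer positions instead of names: data.iloc[row_index,col_index]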
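The difference between label-based and position-based selection shows most clearly when the index is not a plain 0..n-1 range. The following is a minimal sketch with a made-up frame and string index labels.

```python
import pandas as pd

data = pd.DataFrame(
    {"col1": [10, 20, 30], "col2": [1.5, 2.5, 3.5]},
    index=["x", "y", "z"],
)

by_label = data.loc["y", "col2"]   # row label, column name
by_position = data.iloc[1, 1]      # row position, column position
print(by_label, by_position)       # 2.5 2.5

# Label slices include the end point; position slices exclude it
label_slice = data.loc["x":"y", "col1"].tolist()
position_slice = data.iloc[0:1, 0].tolist()
print(label_slice)      # [10, 20]
print(position_slice)   # [10]
```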

 

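ix

ix was more flexible: its arguments could be either positions or names, making it a combination of loc and iloc. When using it, be consistent and avoid mixing positions and names within the row selector or within the column selector: data.ix[n:m,['col1','col2']]
However, ix was deprecated in pandas 0.20 and removed in pandas 1.0; use loc or iloc instead.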
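A mixed selection such as "rows n to m by position, columns by name" can still be written without ix. The sketch below shows two possible translations on a made-up frame; which one reads better depends on the surrounding code.

```python
import pandas as pd

data = pd.DataFrame(
    {"col1": [1, 2, 3, 4], "col2": [5, 6, 7, 8], "col3": [9, 10, 11, 12]},
    index=list("abcd"),
)
n, m = 1, 3

# Option 1: translate the row positions to labels, then use loc
by_loc = data.loc[data.index[n:m], ["col1", "col2"]]

# Option 2: translate the column names to positions, then use iloc
by_iloc = data.iloc[n:m, data.columns.get_indexer(["col1", "col2"])]

print(by_loc.equals(by_iloc))  # True
```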
 
 

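at

Quickly accesses a single element of the DataFrame by row index label and column label; columns can only be given by name: data.at[row_index,'column_names']
 
 

iat

Same as at, but takes integer positions only: data.iat[row_index,column_index]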
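Both accessors work on exactly one cell, for reading and for writing. This is a minimal sketch on a made-up frame with string row labels.

```python
import pandas as pd

data = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]}, index=["r0", "r1"])

label_value = data.at["r1", "col2"]   # label-based scalar access
position_value = data.iat[1, 1]       # position-based scalar access
print(label_value, position_value)    # 4 4

data.at["r0", "col1"] = 100           # both accessors also support assignment
print(data.iat[0, 0])                 # 100
```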

 

 

df.set_index('month')    

df.set_index(['year','month'])     

DataFrame.columns = [newName]

df['Hour'] = pd.to_datetime(df['report_date'])

df.rename(index=str, columns=new_names)

 

Deleting columns

# by selecting only the columns to keep
data = data[['age']]

# with the del keyword
del data['name']

# with the drop method
data.drop(['name'],axis=1, inplace=True)

# with pop
data.pop('name')

 

df = pd.read_csv(INPUTFILE, encoding = "utf-8")

df_bio = pd.read_csv(INPUTFILE, encoding = "utf-8", header=None) # header=None, header=0

 

Show the first few rows

df.head()

Show the last few rows

df.tail()

 

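Removing duplicate data
isDuplicated=df.duplicated() # flag duplicate records
print(isDuplicated)
0    False
1    False
2     True
3    False
dtype: bool

# remove the duplicate data
print(df.drop_duplicates()) # drop records that are identical in every column; the row with index 2 is removed
  col1  col2
0    a     3
1    b     2
3    c     2

print(df.drop_duplicates(['col1'])) # drop records with a repeated col1 value; the row with index 2 is removed
  col1  col2
0    a     3
1    b     2
3    c     2

print(df.drop_duplicates(['col2'])) # drop records with a repeated col2 value; the rows with index 2 and 3 are removed
  col1  col2
0    a     3
1    b     2

print(df.drop_duplicates(['col1','col2'])) # drop records repeated in the given columns (col1 and col2); the row with index 2 is removed
  col1  col2
0    a     3
1    b     2
3    c     2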
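The outputs above can be reproduced with a small frame. The input frame is not shown in the original, so the one below is an assumption: row 2 is taken to repeat row 0, which is consistent with every printed result. The `keep="last"` call is an extra illustration of how to retain the last occurrence instead of the first.

```python
import pandas as pd

# Assumed input: row 2 repeats row 0 (the original does not show the frame)
df = pd.DataFrame({"col1": ["a", "b", "a", "c"], "col2": [3, 2, 3, 2]})

print(df.duplicated().tolist())                     # [False, False, True, False]
print(df.drop_duplicates().index.tolist())          # [0, 1, 3]
print(df.drop_duplicates(["col1"]).index.tolist())  # [0, 1, 3]
print(df.drop_duplicates(["col2"]).index.tolist())  # [0, 1]

# keep="last" retains the last occurrence of each duplicate instead of the first
kept_last = df.drop_duplicates(["col2"], keep="last")
print(kept_last.index.tolist())                     # [2, 3]
```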

Converting the letters in a column to upper or lower case
df['column_name'] = df['column_name'].str.upper()

df['column_name'] = df['column_name'].str.lower()

 

 

REF

https://www.cnblogs.com/aro7/p/9748202.html

https://www.cnblogs.com/hankleo/p/11462532.html
