Computing percentages in a pandas DataFrame and putting them in a list (how to compute percentages in pandas)

25-01-25 22

This article looks at how to compute percentages in a pandas DataFrame and put the results in a list. It also collects related material: SQL-style row numbering in pandas, finding the position of a given index in a DataFrame, and a short DataFrame reference with worked examples.

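Contents:

Computing percentages in a pandas DataFrame and putting them in a list (how to compute percentages in pandas)
SQL-like window functions in pandas: row numbering in a Python pandas DataFrame
Getting the position of a given index in a pandas DataFrame
Python Pandas -- DataFrame
Python pandas dataframe

Computing percentages in a pandas DataFrame and putting them in a list (how to compute percentages in pandas)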

You can try:

df.groupby('gender')['impressions'].apply(lambda x : (sum(x)/sum(df['impressions'])*100))


gender
female    57.0276
male      42.9724

and

df.groupby('gender')['impressions'].apply(lambda x : (sum(x)/sum(df['impressions'])*100)).to_list()


[57.02762682448004,42.972373175519957]

If you want the exact DataFrame layout you asked for, save the result above as "s" and then do the following:

s=df.groupby('gender')['impressions'].apply(lambda x : (sum(x)/sum(df['impressions'])*100))

pd.DataFrame(s).T

gender          female       male
impressions  57.027627  42.972373

Here you go:

df_agg = df.drop(['age'],axis=1).groupby('gender').sum()
print(df_agg['impressions']/df_agg['impressions'].sum()*100)

This prints (the values depend on your data):

F    71.428571
M    28.571429
Name: impressions, dtype: float64

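You can also try the following approach, which selects the column before summing and multiplies by 100 so that the result is a percentage rather than a fraction:

(df.groupby('gender')['impressions'].sum() / df['impressions'].sum() * 100).to_frame(name='impressions').T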
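The answers above never show their input, so here is a minimal, self-contained sketch of the same idea. The sample data is hypothetical; only the column names `gender`, `age` and `impressions` come from the answers.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the answers above.
df = pd.DataFrame({
    "gender": ["female", "male", "female", "male"],
    "age": [25, 31, 42, 37],
    "impressions": [30, 20, 45, 5],
})

# Sum impressions per gender, then divide by the grand total.
per_gender = df.groupby("gender")["impressions"].sum()
pct = per_gender / per_gender.sum() * 100

pct_list = pct.tolist()      # plain Python list, ordered by group key
pct_row = pct.to_frame().T   # one-row DataFrame with one column per gender

print(pct_list)
print(pct_row)
```

Using the vectorised `sum()` instead of `apply` with a lambda avoids recomputing the grand total for every group, which probably matters only on larger frames.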

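SQL-like window functions in pandas: row numbering in a Python pandas DataFrame

I come from a SQL background and frequently use the following data processing steps:

Partition the table by one or more fields
For each partition, add a row number to each of its rows, ranking the rows by one or more other fields, with the analyst specifying ascending or descending order

For example: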

df = pd.DataFrame({'key1': ['a', 'a', 'a', 'b', 'a'],
                   'data1': [1, 2, 2, 3, 3],
                   'data2': [1, 10, 2, 3, 30]})
df
   data1  data2 key1
0      1      1    a
1      2     10    a
2      2      2    a
3      3      3    b
4      3     30    a

I am looking for the pandas equivalent of this SQL window function:

RN = ROW_NUMBER() OVER (PARTITION BY Key1 ORDER BY Data1 ASC, Data2 DESC)

   data1  data2 key1  RN
0      1      1    a   1
1      2     10    a   2
2      2      2    a   3
3      3      3    b   1
4      3     30    a   4

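I tried the following, which should work when there is no "partition" (it was written for Python 2 and an old pandas release; on current versions sort_index(by=...) is spelled sort_values and xrange is range):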

def row_number(frame, orderby_columns, orderby_direction, name):
    frame.sort_index(by=orderby_columns, ascending=orderby_direction, inplace=True)
    frame[name] = list(xrange(len(frame.index)))

I tried to extend this idea to work with partitions (groups in pandas), but the following did not work:

df1 = df.groupby('key1').apply(lambda t: t.sort_index(by=['data1', 'data2'], ascending=[True, False], inplace=True)).reset_index()

def nf(x):
    x['rn'] = list(xrange(len(x.index)))

df1['rn1'] = df1.groupby('key1').apply(nf)

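But when I do this, I get a lot of NaNs.

Ideally there would be a concise way to replicate SQL's window-function behaviour (I have already figured out window-based aggregates; that is a one-liner in pandas). Can someone share the most idiomatic way to number rows like this in pandas?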

Answer 1

You can do this by using groupby together with the rank method twice:

In [11]: g = df.groupby('key1')

Use the method='min' argument so that rows with the same data1 value share the same RN:

In [12]: g['data1'].rank(method='min')
Out[12]:
0    1
1    2
2    2
3    1
4    4
dtype: float64

In [13]: df['RN'] = g['data1'].rank(method='min')

Then group by these results and add the rank with respect to data2:

In [14]: g1 = df.groupby(['key1', 'RN'])

In [15]: g1['data2'].rank(ascending=False) - 1
Out[15]:
0    0
1    0
2    1
3    0
4    0
dtype: float64

In [16]: df['RN'] += g1['data2'].rank(ascending=False) - 1

In [17]: df
Out[17]:
   data1  data2 key1  RN
0      1      1    a   1
1      2     10    a   2
2      2      2    a   3
3      3      3    b   1
4      3     30    a   4

It feels like there ought to be a native way to do this (there may well be!).
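One candidate for that more direct route, offered as a sketch and not as the answer's own method, is to sort first and then number rows within each group with `cumcount`. It uses the question's data and reproduces the RN column the question asks for.

```python
import pandas as pd

df = pd.DataFrame({
    "key1": ["a", "a", "a", "b", "a"],
    "data1": [1, 2, 2, 3, 3],
    "data2": [1, 10, 2, 3, 30],
})

# ORDER BY data1 ASC, data2 DESC
ordered = df.sort_values(["data1", "data2"], ascending=[True, False])

# PARTITION BY key1: number the rows inside each group, starting at 1.
# The assignment aligns on the index, so df keeps its original row order.
df["RN"] = ordered.groupby("key1").cumcount() + 1

print(df)
```

Unlike the two-step `rank` approach, this yields integers directly and extends to any number of ordering columns by adding them to the `sort_values` call.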

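Getting the position of a given index in a pandas DataFrame

Suppose I have a DataFrame like this:

df
     A  B
5    0  1
18   2  3
125  4  5

where 5, 18 and 125 are the index.

I want to get the row before (or after) a certain index. For example, I have index 18 (obtained, say, with df[df.A==2].index) and I want the row before it, but I do not know that this row has index 5.

Two sub-questions:

How can I get the position of index 18? Something like df.loc[18].get_position() that would return 1, so that I could reach the previous row with df.iloc[df.loc[18].get_position()-1]
Is there another solution, a bit like the -C, -A or -B options of grep?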

Answer 1

For the first question:

base = df.index.get_indexer_for((df[df.A == 2].index))

or

base = df.index.get_loc(18)

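To get the surrounding rows (with base in the array form returned by get_indexer_for, since pd.Index does not accept a bare integer):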

mask = pd.Index(base).union(pd.Index(base - 1)).union(pd.Index(base + 1))

I used Index and union to remove duplicates. You may want to keep them, in which case you can use np.concatenate.

Watch out for matches on the first or last row :)
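To make the edge-case warning concrete, here is a small sketch that looks up the position of label 18 and selects one row of context on each side, clipped to the bounds of the frame. It assumes a unique index, so that `get_loc` returns a single integer; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0, 2, 4], "B": [1, 3, 5]}, index=[5, 18, 125])

# Integer position of label 18 (assumes the index has no duplicate labels).
pos = df.index.get_loc(18)

# Positions of the previous row, the match, and the next row,
# clipped so that a match on the first or last row does not wrap around.
around = np.arange(max(pos - 1, 0), min(pos + 2, len(df)))
context = df.iloc[around]

# The row just before the match, or None when the match is the first row.
previous_row = df.iloc[pos - 1] if pos > 0 else None

print(context)
print(previous_row)
```

Without the clipping, `df.iloc[pos - 1]` with `pos == 0` would silently return the last row, because negative positions count from the end.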

Python Pandas -- DataFrame

pandas.DataFrame

class pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)

Two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. Can be thought of as a dict-like container for Series objects. The primary pandas data structure.

Parameters:

data : numpy ndarray (structured or homogeneous), dict, or DataFrame

Dict can contain Series, arrays, constants, or list-like objects

index : Index or array-like

Index to use for resulting frame. Will default to np.arange(n) if no indexing information part of input data and no index provided

columns : Index or array-like

Column labels to use for resulting frame. Will default to np.arange(n) if no column labels are provided

dtype : dtype, default None

Data type to force. Only a single dtype is allowed. If None, infer

copy : boolean, default False

Copy data from inputs. Only affects DataFrame / 2d ndarray input

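Conditional filtering

Single-condition filtering

Select the rows where col1 is greater than n: data[data['col1']>n]

Select the rows where col1 is greater than n, but show only the col2 and col3 columns: data[['col2','col3']][data['col1']>n]

Select specific rows: use isin to filter on a set of values. Select the rows whose col1 value is one of the elements of list: data[data.col1.isin(list)]

Multi-condition filtering

Use the & (and) and | (or) operators, or the equivalent functions, to combine conditions. Each condition needs its own parentheses.

Use & to select the rows where col1 is greater than n and col2 is greater than m: data[(data['col1'] > n) & (data['col2'] > m)]

Use numpy's logical_and function to do the same: data[np.logical_and(data['col1']> n,data['col2']>m)]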
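The filters above can be tried on a small frame. The data and the thresholds `n` and `m` below are made up for illustration; the column names follow the text.

```python
import pandas as pd

# Made-up data; column names follow the text above.
data = pd.DataFrame({
    "col1": [1, 5, 8, 12],
    "col2": [10, 2, 7, 9],
    "col3": ["w", "x", "y", "z"],
})
n, m = 4, 5

single = data[data["col1"] > n]                          # one condition
subset = data.loc[data["col1"] > n, ["col2", "col3"]]    # filter rows, pick columns
members = data[data["col1"].isin([1, 12])]               # membership test
both = data[(data["col1"] > n) & (data["col2"] > m)]     # and
either = data[(data["col1"] > n) | (data["col2"] > m)]   # or

print(both)
```

Selecting rows and columns in a single `.loc` call, as in `subset`, does the same job as the chained `data[[...]][mask]` form in one indexing step.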

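Index-based selection

Slicing

Use a slice to select rows by position: data[n:m]

Pass a list of column names to select columns: data[['col1','col2']]

loc [rows by index label, columns by name]

When the columns have names, data['col1'] selects a whole column. If you know both the column names and the index labels, use .loc to select rows and columns together: data.loc[index,'column_name']

iloc [rows by position, columns by position]

Used the same way as loc, but takes integer positions instead of labels: data.iloc[row_index,col_index]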

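ix

ix was more permissive: its arguments could be either positions or labels, which made it a combination of loc and iloc. It had to be used consistently: positions and labels should not be mixed within the row selector, and likewise not within the column selector: data.ix[n:m,['col1','col2']]

However, ix was deprecated in pandas 0.20 and removed in pandas 1.0; use loc or iloc instead.

at

Quickly access a single element of the DataFrame by row index label and column label; columns can only be selected by name: data.at[row_index,'column_name']

iat

Same purpose as at, but takes integer positions only: data.iat[row_index,column_index]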
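A short side-by-side sketch may help keep the label-based and position-based accessors apart. The frame and its row labels `r1` to `r3` are invented for the example.

```python
import pandas as pd

# Invented frame with string row labels to make labels and positions distinct.
data = pd.DataFrame(
    {"col1": [10, 20, 30], "col2": ["x", "y", "z"]},
    index=["r1", "r2", "r3"],
)

by_label = data.loc["r2", "col1"]             # row label, column name
by_position = data.iloc[1, 0]                 # row position, column position

label_slice = data.loc["r1":"r2", ["col1"]]   # label slices include the end label
pos_slice = data.iloc[0:2, [0]]               # positional slices exclude the end

fast_label = data.at["r3", "col2"]            # single element, by labels
fast_pos = data.iat[2, 1]                     # single element, by positions

print(by_label, by_position, fast_label, fast_pos)
```

The one asymmetry worth remembering is the slice end: `loc["r1":"r2"]` returns two rows, and so does `iloc[0:2]`, even though the second bound is written differently.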

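#Use one column as the index (returns a new DataFrame)
df.set_index('month')

#Use several columns as a multi-level index
df.set_index(['year','month'])

#Replace all column labels; the list needs one label per column
df.columns = [newName]

#Parse a date column into datetime values
df['Hour'] = pd.to_datetime(df['report_date'])

#Convert index labels to str and rename columns; new_names maps old labels to new ones
df.rename(index = str,columns = new_names)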
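The snippets above assume an existing frame, so here is a self-contained sketch of the same calls. The data is made up, the column name `reported_at` is hypothetical, and extracting the hour with `.dt.hour` is an assumption about what the `Hour` column was meant to hold.

```python
import pandas as pd

# Made-up data; the column names year, month and report_date follow the snippets above.
df = pd.DataFrame({
    "year": [2024, 2024],
    "month": [1, 2],
    "report_date": ["2024-01-15 08:30:00", "2024-02-20 17:45:00"],
})

# set_index returns a new frame; df itself is unchanged.
indexed = df.set_index(["year", "month"])

# rename with a mapping from old to new column labels (also returns a new frame).
renamed = df.rename(columns={"report_date": "reported_at"})

# Parse strings into datetimes, then (as an assumption) pull out the hour.
timestamps = pd.to_datetime(df["report_date"])
hours = timestamps.dt.hour

print(indexed)
print(hours.tolist())
```

Both `set_index` and `rename` leave the original frame untouched unless the result is assigned back.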

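Deleting columns (alternative approaches; use one of them):

#By selecting the columns to keep
data = data[['age']]

#With the del keyword
del data['name']

#With the drop function
data.drop(['name'],axis=1, inplace=True)

#With pop, which also returns the removed column
data.pop('name')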
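The four approaches differ in whether they modify the frame in place and in what they return. A minimal sketch on made-up data (the `city` column is an addition for the example):

```python
import pandas as pd

# Made-up data; the city column is added only for this example.
data = pd.DataFrame({
    "name": ["Ann", "Bob"],
    "age": [30, 41],
    "city": ["Oslo", "Rome"],
})

kept = data[["age"]]                    # selection: new frame, original untouched
dropped = data.drop(columns=["name"])   # drop without inplace: also a new frame
popped = data.pop("city")               # removes in place and returns the column
del data["name"]                        # removes in place, returns nothing

print(data.columns.tolist())
```

`drop(columns=[...])` is equivalent to `drop([...], axis=1)`; which form to use is a matter of taste.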

df = pd.read_csv(INPUTFILE, encoding = "utf-8")

df_bio = pd.read_csv(INPUTFILE, encoding = "utf-8", header=None) # header=None: the file has no header row; header=0: the first row holds the column names
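The effect of the `header` argument is easy to see on an in-memory CSV. This sketch uses `io.StringIO` in place of a file, and the CSV text is invented for the example.

```python
import io

import pandas as pd

# Invented CSV content, held in memory instead of a file.
csv_text = "col1,col2\na,3\nb,2\n"

# header=0: the first line supplies the column names.
with_header = pd.read_csv(io.StringIO(csv_text), header=0)

# header=None: every line is data and the columns are numbered 0, 1, ...
without_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(with_header)
print(without_header)
```

With `header=None` the former header line becomes the first data row, so the frame has one more row and every column is read as strings.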

Show the first few rows

df.head()

Show the last few rows

df.tail()

Removing duplicate rows

isDuplicated=df.duplicated() #flag duplicated rows
print(isDuplicated)
0    False
1    False
2     True
3    False
dtype: bool

#Remove the duplicated rows
print(df.drop_duplicates()) #drops rows whose values match in every column; the row with index 2 is removed
  col1  col2
0    a     3
1    b     2
3    c     2

print(df.drop_duplicates(['col1'])) #drops rows with a repeated col1 value; the row with index 2 is removed
  col1  col2
0    a     3
1    b     2
3    c     2

print(df.drop_duplicates(['col2'])) #drops rows with a repeated col2 value; the rows with index 2 and 3 are removed
  col1  col2
0    a     3
1    b     2

print(df.drop_duplicates(['col1','col2'])) #drops rows that repeat in the given columns (col1 and col2); the row with index 2 is removed
  col1  col2
0    a     3
1    b     2
3    c     2
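The input frame is never shown above. The sketch below assumes that row 2 repeats row 0, which is consistent with every output printed in this section, and also illustrates the `keep` parameter, which the text does not cover.

```python
import pandas as pd

# Assumed input: row 2 repeats row 0, consistent with the outputs shown above.
df = pd.DataFrame({"col1": ["a", "b", "a", "c"], "col2": [3, 2, 3, 2]})

flags = df.duplicated()        # True for every repeat after the first occurrence
deduped = df.drop_duplicates() # compares all columns, keeps the first occurrence

# keep="last" retains the last occurrence of each col2 value instead of the first.
by_col2_last = df.drop_duplicates(["col2"], keep="last")

# keep=False drops every row whose col2 value appears more than once.
no_repeats = df.drop_duplicates(["col2"], keep=False)

print(deduped)
print(by_col2_last)
```

By default both `duplicated` and `drop_duplicates` use `keep="first"`, which is why index 2 and not index 0 is the row removed in the outputs above.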

Converting a column of strings to upper or lower case
df['column_name'] = df['column_name'].str.upper()

df['column_name'] = df['column_name'].str.lower()

REF

https://www.cnblogs.com/aro7/p/9748202.html

https://www.cnblogs.com/hankleo/p/11462532.html
