Python - Converting JSON to a Pandas DataFrame (Python JSON conversion)

25-02-21 14

Contents:

Python - Converting JSON to a Pandas DataFrame (Python JSON conversion)
Pandas DataFrame - Converting columns to JSON and adding them as a new column
Pandas - Converting a DataFrame to nested JSON
Python Pandas -- DataFrame
Python pandas dataframe

Python - Converting JSON to a Pandas DataFrame (Python JSON conversion)

Answer 1

小编典典

I found a quick and easy solution for what I wanted, using json_normalize() from pandas 0.13.

# Python 3: urllib2 was split up, Request and urlopen now live in urllib.request
import json
from urllib.request import Request, urlopen

import pandas as pd

path1 = '42.974049,-81.205203|42.974298,-81.195755'
request = Request('http://maps.googleapis.com/maps/api/elevation/json?locations=' + path1 + '&sensor=false')
response = urlopen(request)
elevations = response.read()
data = json.loads(elevations)
# pandas 1.0+ exposes this as pd.json_normalize; older releases import it from pandas.io.json
pd.json_normalize(data['results'])

This gives a nicely flattened DataFrame containing the JSON data I got from the Google Maps API.
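Because the request above depends on a live service, a minimal offline sketch may help show what json_normalize does to nested records. The payload below is made up for illustration (its values and field set are assumptions, not real API output), and the sketch assumes pandas 1.0 or later, where the function is available as pd.json_normalize.

```python
import pandas as pd

# Hypothetical payload shaped like an elevation-style response (made-up values).
data = {
    "results": [
        {"elevation": 243.3, "location": {"lat": 42.974049, "lng": -81.205203}, "resolution": 19.1},
        {"elevation": 244.1, "location": {"lat": 42.974298, "lng": -81.195755}, "resolution": 19.1},
    ],
    "status": "OK",
}

# Each record becomes one row; nested keys are joined with "." into column names.
flat = pd.json_normalize(data["results"])
print(flat.columns.tolist())
print(flat)
```

The nested "location" dictionary is spread over the columns location.lat and location.lng, which is what makes the result flat.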

Pandas DataFrame - Converting columns to JSON and adding them as a new column

You can call agg with dict along axis=1.

For a dictionary:

out = df[['col1']].assign(new_col=df.iloc[:,1:].agg(dict,1))

For JSON:

out = df[['col1']].assign(new_col=df.iloc[:,1:].agg(pd.Series.to_json,1))

print(out)

   col1                                            new_col
0   cat  {'col2': 'black','col3': 'small','col4': 'lo...
1   dog  {'col2': 'white','col3': 'medium','col4': 'b...
2  mice  {'col2': 'grey','col3': 'tinny','col4': 'fast'}

As you might expect, there are many ways to do this, but here is the one that came to mind:

>>> import pandas as pd
>>> d = {"col1": ["cat",'dog','mice'],"col2": ["black","white","grey"],"col3": ["small",'medium','tinny'],'col4': ['lovely','brave','fast']}
>>> df = pd.DataFrame(d)
>>> pd.concat([df[['col1']],pd.DataFrame({"newcol": df[['col2','col3','col4']].to_dict(orient='records')})],axis=1)

If you do not know the DataFrame's column names in advance, you can select the columns by position instead. In this case, from column 1 to the last.

>>> pd.concat([df[['col1']],pd.DataFrame({"newcol": df.iloc[:,1:].to_dict(orient='records')})],axis=1)

Use df.to_json(orient='records') to dump a list of JSON records, then load the JSON into a list of dicts and assign it to a new column.

import json
import pandas as pd
df = pd.DataFrame({'col1': ['cat','dog','mice'],'col2' : ['black','white','grey'],'col3' : ['small','medium','tinny']})

# create json column
# data_json = df.iloc[:,1:].to_json(orient='records')
# data = json.loads(data_json)
data = df.iloc[:,1:].to_dict(orient='records')

# keep first column
dfn = df.iloc[:,[0]].copy()
dfn['newcol'] = data
# dfn['newcol'] = pd.Series(data).map(json.dumps)

dfn

   col1                               newcol
0   cat   {"col2": "black","col3": "small"}
1   dog  {"col2": "white","col3": "medium"}
2  mice    {"col2": "grey","col3": "tinny"}

data_json (type str)

[{"col2":"black","col3":"small"},{"col2":"white","col3":"medium"},{"col2":"grey","col3":"tinny"}]
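The commented-out lines in the snippet above hint at a JSON-string variant. A minimal sketch of that route, assuming the same three-column frame, could look like this; it stores each row's dictionary as a JSON string rather than as a Python dict.

```python
import json

import pandas as pd

df = pd.DataFrame({
    "col1": ["cat", "dog", "mice"],
    "col2": ["black", "white", "grey"],
    "col3": ["small", "medium", "tinny"],
})

# One JSON string holding a list of records, one record per row.
data_json = df.iloc[:, 1:].to_json(orient="records")
# Parse it back into a list of plain dicts.
data = json.loads(data_json)

# Keep the first column and attach each record as a JSON string.
dfn = df.iloc[:, [0]].copy()
dfn["newcol"] = pd.Series(data, index=dfn.index).map(json.dumps)
print(dfn)
```

Storing strings is handy when the column will be written to a database or CSV file; storing dicts is more convenient for further processing in Python.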

In addition to Anky's answer, I found this article, which describes more operations: https://codeflex.co/mysql-table-migration-with-pandas-dataframe/

I use three columns in the example below.

data = {'col1': ['cat','dog','mice'],'col2': ['black','white','grey'],'col3': ['small','medium','tinny']}
import pandas as pd
df = pd.DataFrame(data)
col = list(df.columns)

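We can use a lambda function like this:

df.apply(lambda x: {col[1]: x[col[1]], col[2]: x[col[2]]}, axis=1)

You can add the result to the DataFrame as follows:

df['new_col'] = df.apply(lambda x: {col[1]: x[col[1]], col[2]: x[col[2]]}, axis=1)

This produces the following output.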

df
   col1   col2    col3                              new_col
0   cat  black   small   {'col2': 'black','col3': 'small'}
1   dog  white  medium  {'col2': 'white','col3': 'medium'}
2  mice   grey   tinny    {'col2': 'grey','col3': 'tinny'}

Then use df.drop to remove the columns that are no longer needed.

This should produce the desired output.


df.drop(['col2','col3'],axis = 1)
   col1                              new_col
0   cat   {'col2': 'black','col3': 'small'}
1   dog  {'col2': 'white','col3': 'medium'}
2  mice    {'col2': 'grey','col3': 'tinny'}

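For the given requirement, I suggest using itertuples to generate a list of dictionaries and assign it to the DataFrame.

See the pandas documentation for more about itertuples().

For a slightly different case, where you want to keep the index and convert all values of each row into a single dictionary, you can do this: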

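import pandas as pd
data = {'col1': ['cat','dog','mice'],'col2': ['black','white','grey'],'col3': ['small','medium','tinny'],'col4': ['lovely','brave','fast']}
df = pd.DataFrame(data)

def getDictColumn_df1(df,new_col_name="newcol",cols_from_start=1):
    # itertuples() yields one namedtuple per row; _asdict() turns it into a dict
    # (the 'Index' key holds the row label)
    df[new_col_name] = tuple(map(lambda row: row._asdict(),df.iloc[:,cols_from_start:].itertuples()))
    return df[['col1',new_col_name]]

getDictColumn_df1(df)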
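The itertuples suggestion above can also be sketched without the row label in each dictionary. The version below is an illustration rather than the original answer's code: it passes index=False so that only the selected columns end up in each dict, and it leaves the input frame unmodified.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["cat", "dog", "mice"],
    "col2": ["black", "white", "grey"],
    "col3": ["small", "medium", "tinny"],
})

# index=False leaves the row label out of each namedtuple,
# so _asdict() yields only the selected columns.
records = [row._asdict() for row in df.iloc[:, 1:].itertuples(index=False)]

# Build the result on a copy so df itself is not changed.
out = df[["col1"]].copy()
out["newcol"] = records
print(out)
```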

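Pandas - Converting a DataFrame to nested JSON

My question is essentially the reverse of this one:

Create a Pandas DataFrame from deeply nested JSON

I would like to know whether the opposite is possible. Given a table like this: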

     Library  Level           School Major  2013 Total
200  MS_AVERY  UGRAD  GENERAL STUDIES  GEST        5079
201  MS_AVERY  UGRAD  GENERAL STUDIES  HIST           5
202  MS_AVERY  UGRAD  GENERAL STUDIES  MELC           2
203  MS_AVERY  UGRAD  GENERAL STUDIES  PHIL          10
204  MS_AVERY  UGRAD  GENERAL STUDIES  PHYS           1
205  MS_AVERY  UGRAD  GENERAL STUDIES  POLS          53

Is it possible to generate a nested dictionary (or JSON) like this:

Dictionary:

{'MS_AVERY': 
    { 'UGRAD' :
        {'GENERAL STUDIES' : {'GEST' : 5}
                             {'MELC' : 2}

 ...
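No answer to this question is included here. One possible approach, offered only as a sketch, is to walk the rows and build one dictionary level per grouping column. The helper name to_nested is hypothetical, and the frame below copies just three rows of the table above.

```python
import json

import pandas as pd

df = pd.DataFrame({
    "Library": ["MS_AVERY"] * 3,
    "Level": ["UGRAD"] * 3,
    "School": ["GENERAL STUDIES"] * 3,
    "Major": ["GEST", "HIST", "MELC"],
    "2013 Total": [5079, 5, 2],
})


def to_nested(frame, keys, value):
    """Nest `value` under one dict level per column named in `keys` (hypothetical helper)."""
    nested = {}
    for *path, leaf, total in frame[keys + [value]].itertuples(index=False):
        node = nested
        for key in path:
            # Descend one level, creating the inner dict on first sight.
            node = node.setdefault(key, {})
        # Convert NumPy integers to plain int so json.dumps accepts them.
        node[leaf] = int(total)
    return nested


nested = to_nested(df, ["Library", "Level", "School", "Major"], "2013 Total")
print(json.dumps(nested, indent=2))
```

If two rows share the same full key path, the later row overwrites the earlier one, so this sketch assumes the key columns identify each row uniquely.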

Python Pandas -- DataFrame

pandas.DataFrame

class pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)

Two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. Can be thought of as a dict-like container for Series objects. The primary pandas data structure.

Parameters:

data : numpy ndarray (structured or homogeneous), dict, or DataFrame

Dict can contain Series, arrays, constants, or list-like objects

index : Index or array-like

Index to use for resulting frame. Will default to np.arange(n) if no indexing information part of input data and no index provided

columns : Index or array-like

Column labels to use for resulting frame. Will default to np.arange(n) if no column labels are provided

dtype : dtype, default None

Data type to force. Only a single dtype is allowed. If None, infer

copy : boolean, default False

Copy data from inputs. Only affects DataFrame / 2d ndarray input
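As a small illustration of the parameters listed above, the sketch below builds one frame from a dict and one from a 2-D ndarray. The column names and values are arbitrary examples.

```python
import numpy as np
import pandas as pd

# From a dict of list-likes: keys become column labels, index is given explicitly.
df_dict = pd.DataFrame({"col1": ["a", "b"], "col2": [3, 2]}, index=["x", "y"])

# From a 2-D ndarray: columns are named explicitly and a single dtype is forced.
# No index is given, so rows are labeled 0, 1, 2.
df_arr = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["col1", "col2"], dtype="float64")

print(df_dict)
print(df_arr.dtypes)
```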

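Conditional filtering

Single-condition filtering

Select the records where col1 is greater than n: data[data['col1']>n]

Filter on col1 greater than n, but show only the values of col2 and col3: data[['col2','col3']][data['col1']>n]

Selecting specific rows: use isin to filter records by a set of values. Select the records whose col1 value equals an element of list: data[data.col1.isin(list)]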
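The single-condition filters above can be tried on a small frame. The data, n, and the list of wanted values below are made-up examples; the list is named wanted instead of list to avoid shadowing the built-in, and .loc is shown as one alternative way to filter rows and pick columns in a single step.

```python
import pandas as pd

data = pd.DataFrame({
    "col1": [1, 5, 8, 3],
    "col2": [10, 20, 30, 40],
    "col3": ["w", "x", "y", "z"],
})
n = 2
wanted = [1, 8]

# Rows where col1 is greater than n.
over_n = data[data["col1"] > n]

# Same row filter, but only col2 and col3 are returned.
subset = data.loc[data["col1"] > n, ["col2", "col3"]]

# Rows whose col1 value appears in `wanted`.
members = data[data["col1"].isin(wanted)]
```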

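Multi-condition filtering

Multiple conditions can be combined with the & (and) and | (or) operators, or with dedicated functions.

Use & to select the records where col1 is greater than n and col2 is greater than m: data[(data['col1'] > n) & (data['col2'] > m)]

Use numpy's logical_and function to do the same thing: data[np.logical_and(data['col1']> n,data['col2']>m)]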
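A short sketch of the combined conditions, using made-up data and thresholds. Note that each comparison needs its own parentheses, because & and | bind more tightly than > in Python.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"col1": [1, 5, 8, 3], "col2": [10, 20, 30, 40]})
n, m = 4, 25

# Both conditions must hold.
both = data[(data["col1"] > n) & (data["col2"] > m)]

# At least one condition must hold.
either = data[(data["col1"] > n) | (data["col2"] > m)]

# np.logical_and builds the same boolean mask as &.
same = data[np.logical_and(data["col1"] > n, data["col2"] > m)]
```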

Index-based selection

Slicing

Select specific rows with a slice: data[n:m]

Select specific columns by passing column names: data[['col1','col2']]

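loc [rows by index label, columns by name]

When the columns are named, data['col1'] selects a whole column. If you know both the column names and the index labels, you can use .loc to select rows and columns at the same time: data.loc[index,'column_names']

iloc [rows by position, columns by position]

Used the same way as loc, except that you pass the column's integer position instead of its name: data.iloc[row_index,col_index]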
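The difference between slicing, loc, and iloc is easiest to see when the index labels are not integers. The frame below is a made-up example with string labels.

```python
import pandas as pd

data = pd.DataFrame(
    {"col1": [1, 5, 8], "col2": [10, 20, 30]},
    index=["a", "b", "c"],
)

# Plain integer slices are positional and exclude the end position.
rows = data[0:2]

# loc works with labels; iloc works with integer positions.
by_label = data.loc["b", "col2"]
by_pos = data.iloc[1, 1]

# Label slices with loc include the end label.
label_slice = data.loc["a":"b"]
```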

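ix function

ix was more flexible: its arguments could be either positions or names, so it effectively combined loc and iloc. Be consistent within each selector, though: do not mix positions and names in the row selector, and likewise do not mix them in the column selector: data.ix[n:m,['col1','col2']]

However, ix is deprecated in recent versions (it was removed in pandas 1.0); use loc or iloc instead.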

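at function

Quickly accesses a single DataFrame element by row index label and column label; columns can only be selected by name: data.at[row_index,'column_names']

iat function

Works like at, but takes only integer positions: data.iat[row_index,column_index]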
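A minimal sketch of at and iat on a made-up frame with string row labels. Both accessors return a single scalar and can also be used to assign to one cell.

```python
import pandas as pd

data = pd.DataFrame(
    {"col1": [1, 5, 8], "col2": [10, 20, 30]},
    index=["a", "b", "c"],
)

# at: row label and column label.
value_by_label = data.at["b", "col2"]

# iat: row position and column position.
value_by_pos = data.iat[1, 1]

# Assignment to a single cell.
data.at["c", "col1"] = 9
```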

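df.set_index('month')  # use one column as the index

df.set_index(['year','month'])  # several columns give a MultiIndex

df.columns = [newName]  # assign a list with one label per existing column

df['Hour'] = pd.to_datetime(df['report_date'])  # parse the column as datetimes

df.rename(index=str, columns=new_names)  # the keyword is columns, not column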
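The index and renaming calls above can be combined into a short runnable sketch. The frame, the new column name reported_at, and the use of .dt.hour to pull the hour out of the parsed timestamp are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2024, 2024],
    "month": [1, 2],
    "report_date": ["2024-01-15 08:30", "2024-02-20 17:45"],
})

# set_index returns a new frame; two columns produce a MultiIndex.
indexed = df.set_index(["year", "month"])

# Parse the strings as datetimes, then keep only the hour component.
df["Hour"] = pd.to_datetime(df["report_date"]).dt.hour

# rename takes a mapping from old labels to new labels.
renamed = df.rename(columns={"report_date": "reported_at"})
```

Neither set_index nor rename changes df unless the result is assigned back.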

Deleting columns:

# by selecting only the columns to keep
data = data[['age']]

# with the del keyword
del data['name']

# with the drop function
data.drop(['name'],axis=1, inplace=True)

# with pop
data.pop('name')
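The deletion methods differ in whether they modify the frame in place and in what they return. The sketch below, on a made-up frame, shows the three behaviours side by side.

```python
import pandas as pd

data = pd.DataFrame({"name": ["a", "b"], "age": [30, 40], "city": ["x", "y"]})

# drop without inplace=True returns a new frame and leaves data unchanged.
dropped = data.drop(columns=["city"])

# pop removes the column in place and returns it as a Series.
popped = data.pop("name")

# del removes the column in place and returns nothing.
del data["city"]
```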

df = pd.read_csv(INPUTFILE, encoding = "utf-8")

df_bio = pd.read_csv(INPUTFILE, encoding = "utf-8", header=None) # header=None: the file has no header row; header=0: the first row is the header

Show the first few rows

df.head()

Show the last few rows

df.tail()

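Removing duplicate rows

isDuplicated=df.duplicated() # flag duplicate records
print(isDuplicated)
0    False
1    False
2     True
3    False
dtype: bool

# remove duplicate rows
print(df.drop_duplicates()) # drops records whose values match in every column; the row with index 2 is removed
  col1  col2
0    a     3
1    b     2
3    c     2

print(df.drop_duplicates(['col1'])) # drops records with a repeated col1 value; the row with index 2 is removed
  col1  col2
0    a     3
1    b     2
3    c     2

print(df.drop_duplicates(['col2'])) # drops records with a repeated col2 value; the rows with index 2 and 3 are removed
  col1  col2
0    a     3
1    b     2

print(df.drop_duplicates(['col1','col2'])) # drops records that repeat in the given columns (col1 and col2); the row with index 2 is removed
  col1  col2
0    a     3
1    b     2
3    c     2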
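The frame behind the output above is not shown. The sketch below uses a frame reconstructed from the printed results (an assumption), and adds keep="last" as one variation that retains the last occurrence instead of the first.

```python
import pandas as pd

# Reconstructed from the printed output above; the original frame is not shown.
df = pd.DataFrame({"col1": ["a", "b", "b", "c"], "col2": [3, 2, 2, 2]})

# True for every row that repeats an earlier row.
flags = df.duplicated()

# Default: compare all columns and keep the first occurrence.
deduped = df.drop_duplicates()

# Compare only col2 and keep the last occurrence of each value.
by_col2_last = df.drop_duplicates(["col2"], keep="last")
```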

Converting the letters in a df column to upper or lower case
df['column_name'] = df['column_name'].str.upper()

df['column_name'] = df['column_name'].str.lower()
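A brief sketch of the case conversion on a made-up column that includes a missing value. The .str accessor skips missing entries instead of raising an error.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Cat", None, "dOg"]})

# Missing values stay missing after the conversion.
df["upper"] = df["name"].str.upper()
df["lower"] = df["name"].str.lower()
print(df)
```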
