Python Pandas: slicing a MultiIndex by the second level (or any other level) (pandas hierarchical index)

25-02-17 14

This article covers slicing a pandas MultiIndex by its second level (or any other level). It also touches on a MultiIndex problem in pandas 0.24, two-row Excel headers and hierarchical indexes (MultiIndex), the purpose of codes in a pandas MultiIndex, and how to specify min_itemsize for the elements of a MultiIndex when storing to PyTables.

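Contents of this article:

Python Pandas: slicing a MultiIndex by the second level (or any other level) (pandas hierarchical index)
A problem with MultiIndex in pandas 0.24
Pandas Excel: two-row headers, multi-level index, hierarchical index (MultiIndex)
What is the purpose of codes in a pandas MultiIndex?
pandas PyTables: how to specify min_itemsize for the elements of a MultiIndex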

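Python Pandas: slicing a MultiIndex by the second level (or any other level) (pandas hierarchical index)

There are many posts about slicing level[0] of a MultiIndex by a range of level[1]. However, I cannot find a solution to my problem: for each level[0] index value, I need a range of the level[1] index.

The DataFrame: First runs from A to Z and Rank runs from 1 to 400. I need the first 2 and the last 2 rows for each level[0] value (First), but not in the same step.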

           Title Score
First Rank 
A     1    foo   100
      2    bar   90
      3    lime  80
      4    lame  70
B     1    foo   400
      2    lime  300
      3    lame  200
      4    dime  100

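I am trying to get the last 2 rows of the level[1] index for each group with the code below, but it only slices correctly for the first level[0] value.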

[IN]  df.ix[x.index.levels[1][-2]:]
[OUT] 
               Title Score
    First Rank 
    A     3    lime  80
          4    lame  70
    B     1    foo   400
          2    lime  300
          3    lame  200
          4    dime  100

I got the first 2 rows by swapping the index levels, but I cannot make this work for the last 2 rows.

df.index = df.index.swaplevel("Rank","First")
df= df.sortlevel() #to sort by Rank
df.ix[1:2] #Produces the first 2 ranks with 2 level[1] (First) each.
           Title Score
Rank First 
1     A    foo   100
      B    foo   400
2     A    bar   90
      B    lime  300

Of course, I can swap the levels back to get this:

df2 = df.ix[1:2]
df2.index = ttt.index.swaplevel("First","rank") #change the order of the indices back.
df2.sortlevel()
               Title Score
    First Rank 
    A     1    foo   100
          2    bar   90
    B     1    foo   400
          2    lime  300

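I would appreciate any help with the following:

The last 2 rows of index level 1 (Rank)
A better way to get the first 2 rows

Edit, following the feedback from @ako:

pd.IndexSlice really makes it easy to slice the index at any level. Here is a more general solution, followed by the step-by-step approach I used to get the first and last two rows. More information is available here: http://pandas.pydata.org/pandas-docs/stable/advanced.html#using-slicers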

"""    
Slicing a dataframe at the level[2] index of the
major axis (rows) for specific labels, and at the level[1] index for columns.

"""
    idx = pd.IndexSlice
    df.loc[idx[:,:,['some label','another label']],idx[:,'yet another label']]

"""
Thanks to @ako, below is my solution, including how I
get the top and last 2 rows.
"""
    idx = pd.IndexSlice
    # Top 2
    df.loc[idx[:,[1,2]],:] #[1,2] is NOT a row position, it is the rank label.
    # Last 2
    ranks = df.index.levels[df.index.names.index("Rank")] # unique rank labels, sorted
    last2 = list(ranks[-2:]) # the two largest rank labels
    df.loc[idx[:,last2],:] #for last 2 - assuming all level[0] have the same lengths.
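
The label-based slicing above relies on every First group sharing the same Rank labels. As a minimal sketch of a position-based alternative, the example below rebuilds the small sample frame from the question (the data is copied from the table shown earlier) and uses groupby with head and tail, which take the first and last rows of each group regardless of the labels they carry. This is one possible approach, not the method from the original answer.

```python
import pandas as pd

# Rebuild the small example frame from the question (levels "First" and "Rank").
index = pd.MultiIndex.from_product(
    [["A", "B"], [1, 2, 3, 4]], names=["First", "Rank"]
)
df = pd.DataFrame(
    {
        "Title": ["foo", "bar", "lime", "lame", "foo", "lime", "lame", "dime"],
        "Score": [100, 90, 80, 70, 400, 300, 200, 100],
    },
    index=index,
).sort_index()

# First two and last two rows within each "First" group, selected by position.
top2 = df.groupby(level="First").head(2)
last2 = df.groupby(level="First").tail(2)

# Label-based selection for comparison: explicit "Rank" labels via IndexSlice.
idx = pd.IndexSlice
rank_1_and_2 = df.loc[idx[:, [1, 2]], :]

print(top2)
print(last2)
```

Because head and tail work by position, the index should be sorted by Rank within each group first; sort_index() takes care of that here.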

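A problem with MultiIndex in pandas 0.24

How can this pandas 0.24 MultiIndex problem be solved?

I need to create a pandas DataFrame with a MultiIndex whose second level consists of empty tuples. I can create the DataFrame, but I cannot index it by its columns.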

import pandas as pd
cols = pd.MultiIndex.from_arrays([ [1,2],[(),()]])
df   = pd.DataFrame([[5,6]],columns = cols)
df[cols]

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Users\Administrator\Anaconda3\lib\site-packages\pandas\core\indexes\multi.py", line 1295, in from_arrays
    if len(arrays[i]) != len(arrays[i - 1]):
TypeError: object of type 'int' has no len()
>>> cols = pd.MultiIndex.from_arrays([ [1,2],[(),()]])
>>> df   = pd.Dataframe([[5,6]],columns = cols)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'pandas' has no attribute 'Dataframe'
>>> df   = pd.DataFrame([[5,6]],columns = cols)
>>> df[cols]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Users\Administrator\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2679, in __getitem__
    return self._getitem_array(key)
  File "D:\Users\Administrator\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2723, in _getitem_array
    indexer = self.loc._convert_to_indexer(key,axis=1)
  File "D:\Users\Administrator\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 1314, in _convert_to_indexer
    indexer = check = labels.get_indexer(objarr)
  File "D:\Users\Administrator\Anaconda3\lib\site-packages\pandas\core\indexes\multi.py", line 2042, in get_indexer
    indexer = self._engine.get_indexer(target)
  File "pandas\_libs\index.pyx", line 654, in pandas._libs.index.BaseMultiIndexCodesEngine.get_indexer
  File "pandas\_libs\index.pyx", line 648, in pandas._libs.index.BaseMultiIndexCodesEngine._extract_level_codes
  File "D:\Users\Administrator\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 3207, in get_indexer
    target = _ensure_index(target)
  File "D:\Users\Administrator\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 4957, in _ensure_index
    return Index(index_like)
  File "D:\Users\Administrator\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 435, in __new__
    data,names=name or kwargs.get('names'))
  File "D:\Users\Administrator\Anaconda3\lib\site-packages\pandas\core\indexes\multi.py", line 1356, in from_tuples
    return MultiIndex.from_arrays(arrays,sortorder=sortorder,names=names)
  File "D:\Users\Administrator\Anaconda3\lib\site-packages\pandas\core\indexes\multi.py", line 1305, in from_arrays
    names=names,verify_integrity=False)
  File "D:\Users\Administrator\Anaconda3\lib\site-packages\pandas\core\indexes\multi.py", line 222, in __new__
    raise ValueError('Must pass non-zero number of levels/labels')
ValueError: Must pass non-zero number of levels/labels

Is there another way to index this DataFrame by its columns? My pandas version is 0.24.2.

Thanks
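
The traceback suggests that the failure comes from pandas rebuilding an index out of the column tuples, where an empty tuple contributes zero levels. As a hedged sketch only, the example below swaps the empty tuple for an empty-string placeholder (this substitution is an assumption and changes the labels the question asked for) and shows that both label-based and positional column selection then work. It does not reproduce or fix the behaviour with `()` itself.

```python
import pandas as pd

# Assumption: "" stands in for the empty tuple () used in the question.
cols = pd.MultiIndex.from_arrays([[1, 2], ["", ""]])
df = pd.DataFrame([[5, 6]], columns=cols)

# Label-based selection with the full MultiIndex of columns.
by_label = df[cols]

# Positional selection, which does not depend on the column labels at all.
by_position = df.iloc[:, [0, 1]]

print(by_label)
print(by_position)
```

Positional selection with iloc never looks at the labels, so it may be the safer route when the labels themselves are unusual objects.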

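Pandas Excel: two-row headers, multi-level index, hierarchical index (MultiIndex)

import pandas as pd
import numpy as np

Multi-level indexes

A multi-level index (also called a hierarchical index) is an important pandas feature that lets a Series or DataFrame carry two or more index levels.
In essence, a single-level index corresponds to an Index object, and a multi-level index corresponds to a MultiIndex object.

1. Multi-level indexes on Series objects

Creating a Series with a multi-level index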

se1=pd.Series(np.random.randn(4),index=[list("aabb"),[1,2,1,2]])
se1

Output:

a  1    0.945676
   2    1.240454
b  1    1.021960
   2    0.363063
dtype: float64

Selecting subsets

se1['a']

Output:

1    0.945676
2    1.240454
dtype: float64

se1['a':'b']

Output:

a  1    0.945676
   2    1.240454
b  1    1.021960
   2    0.363063
dtype: float64

Selection on the inner level is possible as well

se1[:,1]

Output:

a    0.945676
b    1.021960
dtype: float64
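
As a small illustrative sketch, the same inner-level selection can also be written with the xs method, which names the level explicitly. The Series below uses fixed values instead of random ones so the result is reproducible; the variable names are chosen for this example only.

```python
import pandas as pd

# Same index layout as se1 above, but with fixed values.
se = pd.Series([10, 20, 30, 40], index=[list("aabb"), [1, 2, 1, 2]])

# Outer-level selection: everything under "a".
outer_a = se.loc["a"]

# Inner-level selection: label 1 on level 1, across all outer labels.
inner_1 = se.xs(1, level=1)

# Same rows, but keep both index levels in the result.
inner_1_full = se.xs(1, level=1, drop_level=False)

print(outer_a)
print(inner_1)
print(inner_1_full)
```

Passing drop_level=False is handy when the result has to be combined with other objects that are still indexed by both levels.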

2. Multi-level indexes on DataFrame objects

Creation

df1=pd.DataFrame(np.arange(12).reshape(4,3),index=[list("AABB"),[1,2,1,2]],columns=[list("XXY"),[10,11,10]])
df1

Output:

      X       Y
     10  11  10
A 1   0   1   2
  2   3   4   5
B 1   6   7   8
  2   9  10  11

Every level can be given a name

df1.columns.names=['XY','sum']
df1.index.names=['AB','num']
df1

Output:

XY       X       Y
sum     10  11  10
AB num
A  1     0   1   2
   2     3   4   5
B  1     6   7   8
   2     9  10  11

A MultiIndex object can also be created first and then assigned as the index

df1.index=pd.MultiIndex.from_arrays([list("AABB"),[3,4,3,4]],names=["AB","num"])
df1

Output:

XY       X       Y
sum     10  11  10
AB num
A  3     0   1   2
   4     3   4   5
B  3     6   7   8
   4     9  10  11

The index levels can be swapped

df1.swaplevel('AB','num')

Output:

XY       X       Y
sum     10  11  10
num AB
3   A    0   1   2
4   A    3   4   5
3   B    6   7   8
4   B    9  10  11
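
Note that swaplevel only reorders the levels and leaves the rows where they were. The sketch below, which rebuilds df1 in one step using the same values as above, shows a possible follow-up: sorting after the swap, and selecting from the two-level columns. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild df1 with named row and column levels.
df1 = pd.DataFrame(
    np.arange(12).reshape(4, 3),
    index=pd.MultiIndex.from_arrays(
        [list("AABB"), [3, 4, 3, 4]], names=["AB", "num"]
    ),
    columns=pd.MultiIndex.from_arrays(
        [list("XXY"), [10, 11, 10]], names=["XY", "sum"]
    ),
)

# Swap the row levels, then sort so that rows are grouped by "num".
swapped = df1.swaplevel("AB", "num").sort_index()

# Column selection on the outer level: all columns under "X".
x_block = df1["X"]

# Column selection on the inner level: every column whose "sum" label is 10.
sum_10 = df1.xs(10, axis=1, level="sum")

print(swapped)
print(x_block)
print(sum_10)
```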

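What is the purpose of codes in a pandas MultiIndex?

The codes specify, for each entry of the index, the position of its label within the corresponding level.

For example:

pd.MultiIndex(levels=[[1,2],['red','blue']],codes=[[1,0,1],[0,1,1]])

gives the result:

MultiIndex([(2,'red'),(1,'blue'),(2,'blue')],)

If we change the codes: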

pd.MultiIndex(levels =  [[1,codes=[[0,1],[1,0]])

then the result is:

MultiIndex([(1,'red')],)
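
To make the relationship between levels and codes concrete, here is a minimal, self-contained sketch. The particular codes are chosen for this illustration and are not taken from the answer above. Each entry of the index is obtained by looking up levels[k][codes[k][i]] for every level k.

```python
import pandas as pd

levels = [[1, 2], ["red", "blue"]]
codes = [[0, 1, 1], [1, 0, 0]]  # illustrative codes, one list per level

mi = pd.MultiIndex(levels=levels, codes=codes)

# Decode by hand: entry i on level k is levels[k][codes[k][i]].
decoded = [
    tuple(levels[k][codes[k][i]] for k in range(len(levels)))
    for i in range(len(codes[0]))
]

print(mi.tolist())
print(decoded)
```

Storing small integer codes instead of repeating the labels is what keeps a MultiIndex compact when the same labels occur many times.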

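pandas PyTables: how to specify min_itemsize for the elements of a MultiIndex

I am storing a pandas DataFrame as a PyTables table that contains a MultiIndex.

The first level of the MultiIndex is a string corresponding to a user ID. Most user IDs are 13 characters long, but some are 15 characters long. When I append a record containing a long user ID, PyTables raises an error because it expects a 13-character field.

ValueError('Trying to store a string with len [15] in [user] column but\nthis column has a limit of [13]!\nConsider using min_itemsize to preset the sizes on these columns',)

However, I do not know how to set min_itemsize for the elements of a MultiIndex. I have tried {'index': 15}, but it does not work...

I know I could force all IDs to be 15 characters long from the start by padding them with spaces, but I would like to avoid that.

Thank you for your help!

Answer 1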

You need to specify the name of the MultiIndex level for which you want to set min_itemsize. Here is an example:

Create two frames with a MultiIndex

In [1]: df1 = DataFrame(np.random.randn(4,2),index=MultiIndex.from_product([['abcdefghijklm','foo'],[1,2]],names=['string','number']))

In [2]: df2 = DataFrame(np.random.randn(4,2),index=MultiIndex.from_product([['abcdefghijklmop','foo'],[1,2]],names=['string','number']))

In [3]: df1
Out[3]: 
                             0         1
string        number                    
abcdefghijklm 1       0.737976  0.840718
              2       0.605763  1.797398
foo           1       1.589278  0.104186
              2       0.029387  1.417195

[4 rows x 2 columns]

In [4]: df2
Out[4]: 
                               0         1
string          number                    
abcdefghijklmop 1       0.539507 -1.059085
                2       1.263722 -1.773187
foo             1       1.625073  0.078650
                2      -0.030827 -1.691805

[4 rows x 2 columns]

Create the store

In [9]: store = pd.HDFStore('test.h5',mode='w')

In [10]: store.append('df1',df1)

Here is how the length was computed

In [12]: store.get_storer('df1').table
Out[12]: 
/df1/table (Table(4,)) ''
  description := {
  "index": Int64Col(shape=(), dflt=0, pos=0),
  "values_block_0": Float64Col(shape=(2,), dflt=0.0, pos=1),
  "number": Int64Col(shape=(), dflt=0, pos=2),
  "string": StringCol(itemsize=13, shape=(), dflt='', pos=3)}
  byteorder := 'little'
  chunkshape := (1456,)
  autoindex := True
  colindexes := {
    "index": Index(6, medium, shuffle, zlib(1)).is_csi=False,
    "number": Index(6, medium, shuffle, zlib(1)).is_csi=False,
    "string": Index(6, medium, shuffle, zlib(1)).is_csi=False}

Here is the error you are getting now

In [13]: store.append('df1',df2)
ValueError: Trying to store a string with len [15] in [string] column but
this column has a limit of [13]!
Consider using min_itemsize to preset the sizes on these columns

Specify min_itemsize with the name of the level

In [14]: store.append('df',df1,min_itemsize={ 'string' : 15 })

In [15]: store.get_storer('df').table
Out[15]: 
/df/table (Table(4,)) ''
  description := {
  "index": Int64Col(shape=(), dflt=0, pos=0),
  "values_block_0": Float64Col(shape=(2,), dflt=0.0, pos=1),
  "number": Int64Col(shape=(), dflt=0, pos=2),
  "string": StringCol(itemsize=15, shape=(), dflt='', pos=3)}
  byteorder := 'little'
  chunkshape := (1394,)
  autoindex := True
  colindexes := {
    "index": Index(6, medium, shuffle, zlib(1)).is_csi=False,
    "number": Index(6, medium, shuffle, zlib(1)).is_csi=False,
    "string": Index(6, medium, shuffle, zlib(1)).is_csi=False}

Append

In [16]: store.append('df',df2)

In [19]: store.df
Out[19]: 
                               0         1
string          number                    
abcdefghijklm   1       0.737976  0.840718
                2       0.605763  1.797398
foo             1       1.589278  0.104186
                2       0.029387  1.417195
abcdefghijklmop 1       0.539507 -1.059085
                2       1.263722 -1.773187
foo             1       1.625073  0.078650
                2      -0.030827 -1.691805

[8 rows x 2 columns]

In [20]: store.close()
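
When the longest ID is not known in advance, the sizes can be derived from the data before the first append. The helper below is a hypothetical sketch (string_level_sizes is not a pandas function) that scans the string-valued index levels of one or more frames and returns a dict keyed by level name, in the shape that min_itemsize expects. It measures UTF-8 byte length, on the assumption that this is what the fixed-width string column has to accommodate.

```python
import numpy as np
import pandas as pd


def string_level_sizes(*frames):
    """Return {level name: longest string length in bytes} across all frames.

    Hypothetical helper for illustration; only levels whose values are all
    Python strings are included.
    """
    sizes = {}
    for frame in frames:
        for name in frame.index.names:
            values = frame.index.get_level_values(name)
            if all(isinstance(v, str) for v in values):
                longest = max(len(v.encode("utf-8")) for v in values)
                sizes[name] = max(sizes.get(name, 0), longest)
    return sizes


df1 = pd.DataFrame(
    np.random.randn(4, 2),
    index=pd.MultiIndex.from_product(
        [["abcdefghijklm", "foo"], [1, 2]], names=["string", "number"]
    ),
)
df2 = pd.DataFrame(
    np.random.randn(4, 2),
    index=pd.MultiIndex.from_product(
        [["abcdefghijklmop", "foo"], [1, 2]], names=["string", "number"]
    ),
)

sizes = string_level_sizes(df1, df2)
print(sizes)
```

The resulting dict can be passed as min_itemsize on the first store.append call, in the same way as the literal { 'string' : 15 } in the session above. Adding some headroom is a sensible choice if longer IDs may arrive later, since the column width is fixed once the table has been created.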
