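That's all we have to share on filling an empty Python DataFrame with a loop (nested Python while loops). Thank you for taking the time to read this site. For more on Python time-series DataFrame queries inside a for loop, Pandas (Python): filling empty cells with the previous row's value?, python – using cycle in Django, python – works fine when run one by one but errors in a loop, and related topics, don't forget to search this site.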
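Contents of this article:
- Filling an empty Python DataFrame using a loop (nested Python while loops)
- Python time-series DataFrame query inside a for loop
- Pandas (Python): fill empty cells with the previous row's value?
- python – Using cycle in Django
- python – Works fine when run one by one, errors when run in a loop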
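Filling an empty Python DataFrame using a loop (nested Python while loops)
Let's say I want to create an empty DataFrame and fill it with values produced in a loop.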
import pandas as pd
import numpy as np
years = [2013,2014,2015]
dn=pd.DataFrame()
for year in years:
    df1 = pd.DataFrame({'Incidents': [ 'C','B','A'],year: [1,1,1 ],}).set_index('Incidents')
    print (df1)
    # DataFrame.append was deprecated in pandas 1.4 and removed in 2.0
    dn=dn.append(df1,ignore_index = False)
Even with ignore_index=False, append produces a diagonal matrix:
>>> dn
2013 2014 2015
Incidents
C 1 NaN NaN
B 1 NaN NaN
A 1 NaN NaN
C NaN 1 NaN
B NaN 1 NaN
A NaN 1 NaN
C NaN NaN 1
B NaN NaN 1
A NaN NaN 1
[9 rows x 3 columns]
It should look like this:
>>> dn
2013 2014 2015
Incidents
C 1 1 1
B 1 1 1
A 1 1 1
[3 rows x 3 columns]
Is there a better way? Is there a way around append?
My pandas version is '0.13.1-557-g300610e'.
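A sketch of one common alternative, assuming a reasonably recent pandas: collect the per-year frames in a list and concatenate them once with pd.concat(axis=1), so the frames are placed side by side and their rows align on the shared Incidents index instead of being stacked.

```python
import pandas as pd

years = [2013, 2014, 2015]
frames = []
for year in years:
    df1 = pd.DataFrame({'Incidents': ['C', 'B', 'A'], year: [1, 1, 1]}).set_index('Incidents')
    frames.append(df1)

# axis=1 joins the frames column-wise, aligning rows on the 'Incidents' index,
# instead of stacking them vertically as append does
dn = pd.concat(frames, axis=1)
print(dn)
```

Because concat is called only once, this also avoids the repeated copying that calling append inside a loop causes.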
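Python time-series DataFrame query inside a for loop
You don't need to extract the rows after you have found the intervals. The intervals and the rows between them can be extracted together, which should also be more efficient.
Solution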
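The key to efficient extraction is capturing event streaks with the diff-cumsum trick. Your case is a little special because the element right after a streak also counts as part of the streak. This is reflected in the definition of the flag is_ev in the code.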
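After applying the diff-cumsum trick, event streaks and non-event streaks alternate. Which remainder modulo 2 marks the event streaks is determined by the initial Status.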
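A minimal sketch of the diff-cumsum labeling on a hypothetical toy Status column. It counts the rows where is_ev flips with astype(int).diff().ne(0).cumsum(), which labels the same streaks as taking is_ev.diff().cumsum() and setting the first value to 0:

```python
import pandas as pd

# Hypothetical toy Status column: 1 = event active, 0 = inactive
status = pd.Series([1, 1, 0, 0, 1, 1, 0, 0])

# The row where Status drops from 1 to 0 still counts as part of the event
is_end = status.diff() == -1
is_ev = (status == 1) | is_end

# diff-cumsum: flag every row where is_ev flips, then count the flips so far
ev_number = is_ev.astype(int).diff().fillna(0).ne(0).cumsum()

# Status starts at 1, so the event streaks carry even labels (modulus 0)
print(pd.DataFrame({"Status": status, "is_ev": is_ev, "ev_number": ev_number}))
```

Labels come out as 0, 0, 0, 1, 2, 2, 2, 3: the two event streaks (0 and 2) each include their closing Status = 0 row, and the non-event streaks get the odd labels in between.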
df["diff"] = df["Status"].diff()
df["is_start"] = df["diff"] == 1
df["is_end"] = df["diff"] == -1
df["is_ev"] = (df["Status"] == 1) | df["is_end"]
df["ev_number"] = df["is_ev"].diff().cumsum()
df["ev_number"].iat[0] = 0
# output 1: interval marks
df_start_end = df[df["is_start"] | df["is_end"]]
# output 2: records between (inclusive) events in chronological ordering
# - different events can be subset using
# df_records[df_records["ev_number"] == N]
# - Why set the modulus:
#   If df begins with Status = 1, event records have even df["ev_number"], i.e. modulus 0 with divisor 2
#   If df begins with Status = 0, event records have odd df["ev_number"], so modulus = 1
modulus = 0 if df["Status"].iat[0] == 1 else 1
df_records = df[df["ev_number"] % 2 == modulus]
Results
print(df_start_end.drop(columns=["diff","is_ev"]))
ID Timestamp Value Status is_start is_end ev_number
271381 64 2010-09-22 00:44:15.890 21.5 0.0 False True 0
259875 64 2010-09-22 00:44:18.440 23.0 1.0 True False 2
205910 64 2010-09-22 00:44:23.440 24.5 0.0 False True 2
print(df_records.drop(columns=["diff","is_start","is_end","is_ev"]))
ID Timestamp Value Status ev_number
103177 64 2010-09-21 23:13:21.090 21.5 1.0 0
252019 64 2010-09-22 00:44:14.890 21.5 1.0 0
271381 64 2010-09-22 00:44:15.890 21.5 0.0 0
259875 64 2010-09-22 00:44:18.440 23.0 1.0 2
18870 64 2010-09-22 00:44:19.890 24.5 1.0 2
205910 64 2010-09-22 00:44:23.440 24.5 0.0 2
Further steps
1. List of intervals (StartTime, EndTime)
df_out = df_start_end.loc[df_start_end["is_start"],["Timestamp","ev_number"]]\
.merge(df_start_end.loc[df_start_end["is_end"],["ID","Timestamp","ev_number"]],how="outer",on="ev_number")\
.rename(columns={"Timestamp_x": "StartTime","Timestamp_y": "EndTime"})\
.sort_values("EndTime")\
.reset_index(drop=True)
# add the first record as first interval StartTime when needed
if modulus == 0:
    df_out.loc[0, "StartTime"] = df["Timestamp"].iat[0]
print(df_out[["ID","StartTime","EndTime","ev_number"]])
ID StartTime EndTime ev_number
0 64 2010-09-21 23:13:21.090 2010-09-22 00:44:15.890 0.0
1 64 2010-09-22 00:44:18.440 2010-09-22 00:44:23.440 2.0
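For comparison, here is a sketch that builds the same kind of interval table with groupby instead of merging start and end rows (a swapped-in technique, run on hypothetical toy data with one reading per minute). Each event streak's first and last timestamps become StartTime and EndTime:

```python
import pandas as pd

# Hypothetical toy data: one reading per minute, Status 1 = event active
df = pd.DataFrame({
    "Timestamp": pd.date_range("2010-09-22", periods=8, freq="min"),
    "Status": [1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0],
})

# Same flags as in the answer: Status == 1 rows plus the row where Status drops to 0
is_end = df["Status"].diff() == -1
is_ev = (df["Status"] == 1) | is_end

# diff-cumsum streak labels, starting at 0
df["ev_number"] = is_ev.astype(int).diff().fillna(0).ne(0).cumsum()
modulus = 0 if df["Status"].iat[0] == 1 else 1
records = df[df["ev_number"] % 2 == modulus]

# One row per event: first and last timestamp of each event streak
intervals = (
    records.groupby("ev_number")["Timestamp"]
    .agg(StartTime="first", EndTime="last")
    .reset_index()
)
print(intervals)
```

One difference to keep in mind: if the data ends while an event is still active, this sketch reports the last available timestamp as EndTime, whereas the outer merge above leaves EndTime empty (NaT) for that event.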
2. Append (StartTime, EndTime) to the right of df
This is easy to do by matching on ev_number.
df_with_interval = df\
.merge(df_start_end.loc[df_start_end["is_start"],["Timestamp","ev_number"]],on="ev_number",suffixes=("","_1"))\
.merge(df_start_end.loc[df_start_end["is_end"],["Timestamp","ev_number"]],on="ev_number",suffixes=("","_2"))\
.rename(columns={"Timestamp_1": "StartTime","Timestamp_2": "EndTime"})\
.sort_values("Timestamp")
print(df_with_interval)
ID Timestamp ... StartTime EndTime
0 64 2010-09-22 00:44:18.440 ... 2010-09-22 00:44:18.440 2010-09-22 00:44:23.440
1 64 2010-09-22 00:44:19.890 ... 2010-09-22 00:44:18.440 2010-09-22 00:44:23.440
2 64 2010-09-22 00:44:23.440 ... 2010-09-22 00:44:18.440 2010-09-22 00:44:23.440
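However, since I don't know the actual use case for the data, I won't go any further. I believe the core problem has been solved at this point.
This is an example of why you shouldn't assign the means of solutions when you ask questions.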
Pandas (Python): fill empty cells with the previous row's value?
I want to fill empty cells with the value from the previous row if that value starts with a number. For example, I have
Text Text 30 Text Text Text Text Text Text 31 Text Text Text Text 31 Text Text Text Text Text Text 32 Text Text Text Text Text Text Text Text Text Text Text Text
But I want
Text Text 30 Text Text 30 Text Text 30 Text Text 31 Text TextText Text 31 Text Text 31 Text Text 31 Text Text 32 Text TextText Text Text Text Text Text Text Text Text Text
I tried to achieve this with the following code:
data = pd.read_csv('DATA.csv',sep='\t', dtype=object, error_bad_lines=False)
data = data.fillna(method='ffill', inplace=True)
print(data)
But it didn't work.
Is there any way to do this?
Answer 1
First, replace the empty cells with NaN:
df[df[0]==""] = np.nan
Now, use ffill():
df.ffill()
#       0
# 0  Text
# 1    30
# 2    30
# 3    30
# 4    31
# 5  Text
# 6    31
# 7    31
# 8    31
# 9    32
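A self-contained sketch of the same idea for current pandas, on hypothetical single-column data. It swaps the boolean-mask assignment for DataFrame.replace and calls ffill() directly, since fillna(method=...) is deprecated as of pandas 2.1. It also shows why the question's code failed: fillna(..., inplace=True) modifies the frame in place and returns None, so assigning its result back to data throws the DataFrame away. Note that this is a plain forward-fill; it does not check whether the previous value starts with a number.

```python
import numpy as np
import pandas as pd

# Hypothetical single-column data; "" marks an empty cell
df = pd.DataFrame({0: ["Text", "30", "", "", "31", "Text", "31", "", "", "32"]})

# Empty strings are not "missing" to pandas, so turn them into NaN first,
# then copy the last non-missing value downwards
out = df.replace("", np.nan).ffill()
print(out)

# inplace=True returns None, which is what the question's code assigned to data
result_of_inplace = df.replace("", np.nan).fillna("x", inplace=True)
```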
python – Using cycle in Django
I have a web page where I loop, and inside the loop I use cycle.
{% for o in something %}
{% for c in o %}
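Now, this means that each time through the inner loop, the first div tag becomes white. What I want instead is to alternate between white and black: start with white, and then the next time through the loop, start the first div tag with black. Is this possible to achieve here?
Best answer
There is an accepted, open bug about this issue. You may want to try the suggested change and see whether it works for you.
If you don't want to try it, or it doesn't work, give this a try: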
{% cycle 'white' 'black' as divcolors %}
{% for o in something %}
{% for c in o %}
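As far as I understand, the cycle will start with white and then step through its values every time it is used (meaning it won't restart at white each time).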
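Django's {% cycle %} tag is not Python, but the difference the answer relies on can be illustrated in plain Python with itertools.cycle. This is an analogy, not Django's implementation, and the nested rows data is hypothetical: a cycle created anew inside the outer loop restarts at "white" for every row, while a single cycle created outside keeps advancing across rows.

```python
from itertools import cycle

rows = [["a", "b", "c"], ["d", "e"], ["f", "g", "h"]]  # hypothetical nested data

# Cycle created inside the outer loop: restarts at "white" for every row
restarting = []
for row in rows:
    colors = cycle(["white", "black"])
    restarting.append([next(colors) for _ in row])

# One cycle shared by all rows: continues from wherever the previous row stopped
shared_colors = cycle(["white", "black"])
continuing = [[next(shared_colors) for _ in row] for row in rows]

print(restarting)
print(continuing)
```

With one shared cycle, the first color of each row depends on how many items the previous rows consumed, so the starting colors alternate from row to row only when every inner sequence has an odd length.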
python – Works fine when run one by one, errors when run in a loop
[[0 0 0 0 0]
 [1 1 1 1 1]
 [1 0 0 0 1]
 [0 1 0 0 1]
 [1 1 1 1 0]]
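A cluster is a set of coordinates (I actually use lists, but that doesn't matter):
c1=[[1,0],[1,1],[1,2],[1,3],[1,4],[2,0],[2,4],[3,4]]
Another cluster in this grid is given by:
c2=[[3,1],[4,0],[4,1],[4,2],[4,3]]
Now, I have written a method that, for a given starting coordinate (if its value is 1), returns the cluster that point belongs to (e.g. if I choose the coordinate [1,1], it returns c1).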
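For comparison, here is a sketch of the same lookup done iteratively with breadth-first search. This is a swapped-in technique, not the author's recursive getCluster; the name get_cluster is hypothetical, and it is written for Python 3. A set of visited cells replaces the moveD direction bookkeeping, and the explicit queue avoids hitting Python's recursion limit on large clusters.

```python
from collections import deque

import numpy as np


def get_cluster(grid, start):
    """Return the set of (row, col) cells reachable from `start` through
    4-neighbour steps over cells equal to 1 (iterative flood fill)."""
    n_rows, n_cols = grid.shape
    if grid[start] != 1:
        return set()  # the starting coordinate is not in any cluster
    cluster = {start}
    queue = deque([start])
    while queue:
        i, j = queue.popleft()
        for di, dj in ((0, -1), (0, 1), (-1, 0), (1, 0)):  # left, right, up, down
            ni, nj = i + di, j + dj
            if (0 <= ni < n_rows and 0 <= nj < n_cols
                    and grid[ni, nj] == 1 and (ni, nj) not in cluster):
                cluster.add((ni, nj))
                queue.append((ni, nj))
    return cluster


grid = np.array([[0, 0, 0, 0, 0],
                 [1, 1, 1, 1, 1],
                 [1, 0, 0, 0, 1],
                 [0, 1, 0, 0, 1],
                 [1, 1, 1, 1, 0]])
print(sorted(get_cluster(grid, (1, 1))))  # the cells of c1
print(sorted(get_cluster(grid, (3, 1))))  # the cells of c2
```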
For testing, I pick the point (1,1) and a small grid. This is the output when the result is good:
Number of recursions: 10
Length of cluster: 10
[[1 1 1 0 1]
 [1 1 0 1 1]
 [0 1 0 0 1]
 [1 1 1 0 0]
 [0 1 0 1 1]]
[[1 1 1 0 0]
 [1 1 0 0 0]
 [0 1 0 0 0]
 [1 1 1 0 0]
 [0 1 0 0 0]]
I wanted to know how fast my algorithm is as the cluster size grows. If I run the program, then rerun it, and repeat that many times, it always produces good results. If I use a loop, it starts giving wrong results. Here is one possible output of such a test run:
Number of recursions: 10
Length of cluster: 10
[[1 1 1 0 1]
 [1 1 0 1 1]
 [0 1 0 0 1]
 [1 1 1 0 0]
 [0 1 0 1 1]]
[[1 1 1 0 0]
 [1 1 0 0 0]
 [0 1 0 0 0]
 [1 1 1 0 0]
 [0 1 0 0 0]]
Number of recursions: 8
Length of cluster: 8
[[0 1 1 1 0]
 [1 1 1 0 0]
 [1 0 0 0 0]
 [1 1 1 0 1]
 [1 1 0 0 0]]
[[0 0 0 0 0]   - the first one is always good, this one already has an error
 [1 1 0 0 0]
 [1 0 0 0 0]
 [1 1 1 0 0]
 [1 1 0 0 0]]
Number of recursions: 1
Length of cluster: 1
[[1 1 1 1 1]
 [0 1 0 1 0]
 [0 1 0 0 0]
 [0 1 0 0 0]
 [0 1 1 0 1]]
[[0 0 0 0 0]   - till end
 [0 1 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]]
Number of recursions: 1
Length of cluster: 1
[[1 1 1 1 1]
 [0 1 1 0 0]
 [1 0 1 1 1]
 [1 1 0 1 0]
 [0 1 1 1 0]]
[[0 0 0 0 0]
 [0 1 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]]
... till end
I'll give the loop code (I could post all of the code, but it is too big, and the error is probably caused by something I do in the loop):
import numpy as np
from time import time

def test(N,p,testTime,length):
    assert N>0
    x=1
    y=1
    a=PercolationGrid(N) #this is a class that creates a grid
    a.useFixedProbability(p) #the probability that given point will be 1
    a.grid[x,y]=1 #I put the starting point as 1 manually
    cluster=Cluster(a)
    t0=time()
    cluster.getCluster(x,y) #this is what I'm testing how fast is it
    t1=time()
    stats=cluster.getStats() #get the length of cluster and some other data
    testTime.append(t1-t0)
    testTime.sort()
    length.append(stats[1]) #[1] is the length stat that interests me
    length.sort() #both sorts are so I can use plot later
    print a.getGrid() #show whole grid
    clusterGrid=np.zeros(N*N,dtype='int8').reshape(N,N) #create zero grid where I'll "put" the cluster of interest
    c1=cluster.getClusterCoordinates() #this is recursive method (if it has any importance)
    for xy in c1:
        k=xy[0]
        m=xy[1]
        clusterGrid[k,m]=1
    print clusterGrid
    del a,cluster,clusterGrid

testTime=[]
length=[]
p=0.59
N=35
np.set_printoptions(threshold='nan') #so the output doesn't shrink
for i in range(10):
    test(N,p,testTime,length)
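I assume I'm doing something wrong with freeing memory or something similar (unless it is some trivial mistake in the loop that I can't see)? I'm using Python 2.7.3 on 64-bit Linux.
Edit:
I know people here shouldn't have to check the whole code, only the specific problem, but I can't find out what's happening. The only suggestion was that I might have some static variables, but that doesn't seem to be the case to me. So, if someone has the goodwill and energy, you can browse the code and maybe you'll spot something. I started using classes only recently, so be prepared for a lot of bad stuff.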
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import time

class ProbabilityGrid(object):
    """
    This class gives 2D quadratic array (a grid) which is filled with
    float values from 0-1,which in many cases represent probabilities
    """
    def __init__(self,size=2,dataType='float16'):
        """initialization of a grid with 0. values"""
        assert size>1
        assert dataType=='float64' or dataType=='float32' or dataType=='float16'
        self.n=size
        self.dataType=dataType
        self.grid=np.zeros((size,size),dtype=dataType)

    def getGrid(self):
        """returns a 2D probability array"""
        return self.grid

    def getSize(self):
        """returns a size of a 2D array"""
        return self.n

    def fillRandom(self):
        """fills the grid with uniformly random values from 0 to 1"""
        n=self.n
        self.grid=np.random.rand(n,n)

    def fixedProbabilities(self,p):
        """fills the grid with fixed value from 0 to 1"""
        assert p<1.0
        self.grid=p*np.ones((self.n,self.n))

class PercolationGrid(object):
    """
    percolation quadratic grid filled with 1 and 0,int8
    which represent a state.
    Percolation grid is closly connected to probabilies grid.
    ProbabilityGrid gives the starting probabilities will the [i,j] spot
    be filled or not.
    All functions change the PercolationGrid.grid when
    ProbabilityGrid.grid changes,so in a way their values are connected
    """
    def __init__(self,size=2,dataType='int8'):
        """
        initialization of PercolationGrid,sets uniformly 0 and 1 to grid
        """
        assert size>1
        assert dataType=='int64' or dataType=='int32' or dataType=='int8'
        self.n=size
        self.dataType=dataType
        self.grid=np.zeros((size,size),dtype=dataType)
        self.pGrid=ProbabilityGrid(self.n)
        self.pGrid.fillRandom()
        self.useProbabilityGrid()

    #def fillRandom(self,min=0,max=1,distribution='uniform'):
    #    n=self.n
    #    self.grid=np.random.random_integers(min,max,n*n).reshape(n,n)

    def getGrid(self):
        """returns a 2D percolation array"""
        return self.grid

    def useProbabilityGrid(self): #use probability grid to get Percolation grid of 0s and 1es
        """
        this method fills the PercolationGrid.grid according to probabilities
        from Probability.grid
        """
        comparisonGrid=np.random.rand(self.n,self.n)
        self.grid=np.array(np.floor(self.pGrid.grid-comparisonGrid)+1,dtype=self.dataType)
        # Here I used a trick. To simulate whether 1 will apear with probability p,
        # we can use uniform random generator which returns values from 0 to 1. If
        # the value<p then we get 1,if value>p it's 0.
        # But instead looping over each element,it's much faster to make same sized
        # grid of random,uniform values from 0 to 1,calculate the difference,add 1
        # and use floor function. Then floor(p-value)+1 gives 1 if value<p
        # and 0 if value>p. The result is
        # converted to data type of percolation array.

    def useFixedProbability(self,p):
        """
        this method fills the PercolationGrid according to fixed probabilities
        of being filled,for example,a large grid with parameter p set to 0.33
        should,aproximatly have one third of places filed with ones and 2/3 with 0
        """
        self.pGrid.fixedProbabilities(p)
        self.useProbabilityGrid()

    def probabilityCheck(self):
        """
        this method checks the number of ones vs number of elements,
        good for checking if the filling of a grid was close to probability
        we had in mind. Of course,the accuracy is larger as grid size grows.
        For smaller grid sizes you can still check the probability by
        running the test multiple times.
        """
        sum=self.grid.sum()
        print float(sum)/float(self.n*self.n)
        #this works because values can only be 0 or 1,so the sum/size gives
        #the ratio of ones vs size

    def setGrid(self,grid):
        shape=grid.shape
        i,j=shape[0],shape[1]
        assert i>1 and j>1
        if i!=j:
            print ("The grid needs to be NxN shape,N>1")
        self.grid=grid

    def setProbabilities(self,grid):
        shape=grid.shape
        i,j=shape[0],shape[1]
        assert i>1 and j>1
        if i!=j:
            print ("The grid needs to be NxN shape,N>1")
        self.pGrid.grid=grid
        self.useProbabilityGrid()

    def showPercolations(self):
        fig1=plt.figure()
        fig2=plt.figure()
        ax1=fig1.add_subplot(111)
        ax2=fig2.add_subplot(111)
        myColors=[(1.0,1.0,1.0),(1.0,0.0,1.0)]
        mycmap=mpl.colors.ListedColormap(myColors)
        subplt1=ax1.matshow(self.pGrid.grid,cmap='jet')
        cbar1=fig1.colorbar(subplt1)
        subplt2=ax2.matshow(self.grid,cmap=mycmap)
        cbar2=fig2.colorbar(subplt2,ticks=[0.25,0.75])
        cbar2.ax.set_yticklabels(['None','Percolated'],rotation='vertical')

class Cluster(object):
    """This is a class of percolation clusters"""
    def __init__(self,array):
        self.grid=array.getGrid()
        self.N=len(self.grid[0,])
        self.cluster={}
        self.numOfSteps=0

    #next 4 functions return True if field next to given field is 1 or False if it's 0
    def moveLeft(self,i,j):
        moveLeft=False
        assert i<self.N
        assert j<self.N
        if j>0 and self.grid[i,j-1]==1:
            moveLeft=True
        return moveLeft

    def moveRight(self,i,j):
        moveRight=False
        assert i<self.N
        assert j<self.N
        if j<self.N-1 and self.grid[i,j+1]==1:
            moveRight=True
        return moveRight

    def moveDown(self,i,j):
        moveDown=False
        assert i<self.N
        assert j<self.N
        if i<self.N-1 and self.grid[i+1,j]==1:
            moveDown=True
        return moveDown

    def moveUp(self,i,j):
        moveUp=False
        assert i<self.N
        assert j<self.N
        if i>0 and self.grid[i-1,j]==1:
            moveUp=True
        return moveUp

    def listofOnes(self):
        """nested list of connected ones in each row"""
        outlist=[]
        for i in xrange(self.N):
            outlist.append([])
            helplist=[]
            for j in xrange(self.N):
                if self.grid[i,j]==0:
                    if (j>0 and self.grid[i,j-1]==0) or (j==0 and self.grid[i,j]==0):
                        continue # condition needed because of edges
                    outlist[i].append(helplist)
                    helplist=[]
                    continue
                helplist.append((i,j))
                if self.grid[i,j]==1 and j==self.N-1:
                    outlist[i].append(helplist)
        return outlist

    def getCluster(self,i=0,j=0,moveD=[1,1,1,1]): #(left,right,up,down)
        #moveD short for moveDirections,1 means that it tries to move it to that side,0 so it doesn't try
        self.numOfSteps=self.numOfSteps+1
        if self.grid[i,j]==1:
            self.cluster[(i,j)]=True
        else:
            print "the starting coordinate is not in any cluster"
            return
        if moveD[0]==1:
            try: #if it comes to same point from different directions we'd get an infinite recursion,checking if it already been on that point prevents that
                self.cluster[(i,j-1)]
                moveD[0]=0
            except:
                if self.moveLeft(i,j)==False: #check if 0 or 1 is left to (i,j)
                    moveD[0]=0
                else:
                    self.getCluster(i,j-1,[1,0,1,1]) #right is 0,because we came from left
        if moveD[1]==1:
            try:
                self.cluster[(i,j+1)]
                moveD[1]=0
            except:
                if self.moveRight(i,j)==False:
                    moveD[1]=0
                else:
                    self.getCluster(i,j+1,[0,1,1,1])
        if moveD[2]==1:
            try:
                self.cluster[(i-1,j)]
                moveD[2]=0
            except:
                if self.moveUp(i,j)==False:
                    moveD[2]=0
                else:
                    self.getCluster(i-1,j,[1,1,1,0])
        if moveD[3]==1:
            try:
                self.cluster[(i+1,j)]
                moveD[3]=0
            except:
                if self.moveDown(i,j)==False:
                    moveD[3]=0
                else:
                    self.getCluster(i+1,j,[1,1,0,1])
        if moveD==(0,0,0,0):
            return

    def getClusterCoordinates(self):
        return self.cluster

    def getStats(self):
        print "Number of recursions:",self.numOfSteps
        print "Length of cluster:",len(self.cluster)
        return (self.numOfSteps,len(self.cluster))
Solution
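The problem is the mutable default argument moveD=[1,1,1,1] in getCluster. Python evaluates a default value only once, when the def statement runs, so every call that omits moveD shares the same list. The moveD[k]=0 assignments made during one top-level call are still in that list at the next call, which is why the first run in the loop is correct and later runs miss parts of the cluster, while restarting the program (which rebuilds the default) always works. Here is a link to a blog post that shows an example of this.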
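Below is a working version of the getCluster method that both fixes the default argument problem and removes the extraneous moveD assignments that exhibited the problematic behavior.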
def getCluster(self,i=0,j=0,moveD=None): #(left,right,up,down)
    #moveD short for moveDirections,1 means that it tries to move it to that side,0 so it doesn't try
    if moveD is None:
        moveD = [1,1,1,1] # fresh list on every call that omits moveD
    self.numOfSteps=self.numOfSteps+1
    if self.grid[i,j]==1:
        self.cluster[(i,j)]=True
    else:
        print "the starting coordinate is not in any cluster"
        return
    if moveD[0]==1:
        try: #if it comes to same point from different directions we'd get an infinite recursion,checking if it already been on that point prevents that
            self.cluster[(i,j-1)]
        except KeyError:
            if self.moveLeft(i,j): #check if 0 or 1 is left to (i,j)
                self.getCluster(i,j-1,[1,0,1,1]) #right is 0,because we came from left
    if moveD[1]==1:
        try:
            self.cluster[(i,j+1)]
        except KeyError:
            if self.moveRight(i,j):
                self.getCluster(i,j+1,[0,1,1,1])
    if moveD[2]==1:
        try:
            self.cluster[(i-1,j)]
        except KeyError:
            if self.moveUp(i,j):
                self.getCluster(i-1,j,[1,1,1,0])
    if moveD[3]==1:
        try:
            self.cluster[(i+1,j)]
        except KeyError:
            if self.moveDown(i,j):
                self.getCluster(i+1,j,[1,1,0,1])
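The underlying Python pitfall can be reproduced in isolation. In this sketch (Python 3; explore_bad and explore_good are hypothetical stand-ins for getCluster), the default list is created once when def runs, so a zero written into it by one call is still there on the next call, while the None sentinel used in the fixed getCluster builds a fresh list every time:

```python
def explore_bad(blocked, moveD=[1, 1, 1, 1]):
    # The default list is built once, at definition time, and shared by all calls
    moveD[blocked] = 0
    return list(moveD)  # return a copy so later calls can't change this result


def explore_good(blocked, moveD=None):
    # None sentinel: build a fresh [1, 1, 1, 1] on every call that omits moveD
    if moveD is None:
        moveD = [1, 1, 1, 1]
    moveD[blocked] = 0
    return list(moveD)


first_bad = explore_bad(0)     # [0, 1, 1, 1]
second_bad = explore_bad(2)    # [0, 1, 0, 1]: the 0 from the first call leaked in
first_good = explore_good(0)   # [0, 1, 1, 1]
second_good = explore_good(2)  # [1, 1, 0, 1]
print(first_bad, second_bad, first_good, second_good)
```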