pandas DataFrame Operations Explained

Time: 2023-06-13 09:20:26
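    >>> d2 = {'doc': ['txt1', 'txt2'], 'nid': [100, 200]}  # multiple rows; the dict values must all have the same length
    >>> df2 = pd.DataFrame(data=d2, columns=('nid', 'doc'))
    >>> df2
       nid   doc
    0  100  txt1
    1  200  txt2

1.2 add

add combines two DataFrames element by element, aligned on index and column labels: numbers are summed and strings are concatenated. It does not append rows.

1.2.1 Adding a single-row DataFrame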
 

    >>> import pandas as pd
    >>> d = {'doc': ['txt1'], 'nid': [100]}
    >>> df = pd.DataFrame(data=d, columns=('nid', 'doc'))
    >>> d2 = {'doc': ['txt2'], 'nid': [200]}
    >>> df
       nid   doc
    0  100  txt1
    >>> df = df.add(pd.DataFrame(d2))
    >>> df
            doc  nid
    0  txt1txt2  300
1.2.2 Adding multi-row DataFrames

    >>> import pandas as pd
    >>> d = {'doc': ['txt1', 'text3'], 'nid': [100, 300]}
    >>> df = pd.DataFrame(data=d, columns=('nid', 'doc'))
    >>> df
       nid    doc
    0  100   txt1
    1  300  text3
    >>> d2 = {'doc': ['txt2'], 'nid': [200]}
    >>> df2 = df.add(pd.DataFrame(d2))
    >>> df
       nid    doc
    0  100   txt1
    1  300  text3
    >>> df2  # row counts differ: rows without a counterpart become NaN
            doc    nid
    0  txt1txt2  300.0
    1       NaN    NaN
    >>> d3 = {'doc': ['txt2', 'text4'], 'nid': [200, 400]}
    >>> df3 = df.add(pd.DataFrame(d3))
    >>> df3  # row counts match: each row is added to its counterpart
              doc  nid
    0    txt1txt2  300
    1  text3text4  700
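
The NaN row above comes from label alignment: add matches cells by index and column, and a cell that is missing on one side yields NaN. The following is a minimal sketch, assuming numeric-only columns and the illustrative names `left` and `right`, of how the `fill_value` parameter of `DataFrame.add` substitutes a value for the missing cell before adding.

```python
import pandas as pd

# Hypothetical numeric frames; `left` has two rows, `right` only one.
left = pd.DataFrame({'nid': [100, 300]})
right = pd.DataFrame({'nid': [200]})

# Plain add: row 1 exists only in `left`, so the result there is NaN.
plain = left.add(right)

# fill_value=0 treats the missing cell in `right` as 0 before adding.
filled = left.add(right, fill_value=0)

print(plain)
print(filled)
```

If the goal is to append rows rather than add values, add is the wrong tool regardless of `fill_value`.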
1.3 append

    >>> import pandas as pd
    >>> d = {'doc': ['txt1', 'text3'], 'nid': [100, 300]}
    >>> df = pd.DataFrame(data=d, columns=('nid', 'doc'))
    >>> d2 = {'doc': ['txt2'], 'nid': [200]}
    >>> df = df.append(pd.DataFrame(data=d2,
    ...                             columns=('nid', 'doc')),
    ...                ignore_index=True)
    >>> df
       nid    doc
    0  100   txt1
    1  300  text3
    2  200   txt2
    >>> df.to_csv('p.txt', index=False)  # save as a CSV file
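
`DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so the session above only runs on older versions. A sketch of the same row append with `pd.concat` might look like this; `new_rows` and `combined` are illustrative names, not part of the original code.

```python
import pandas as pd

df = pd.DataFrame({'nid': [100, 300], 'doc': ['txt1', 'text3']})
new_rows = pd.DataFrame({'nid': [200], 'doc': ['txt2']})  # illustrative name

# ignore_index=True renumbers the result 0..n-1, like the append call above.
combined = pd.concat([df, new_rows], ignore_index=True)
print(combined)
```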
1.4 merge

Method signature:
DataFrame.merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False)
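
To make a few of these parameters concrete, here is a small sketch using hypothetical frames whose key columns have different names (`nid` and `id`). It shows `left_on`/`right_on` selecting the keys and `suffixes` renaming a non-key column that exists on both sides.

```python
import pandas as pd

# Hypothetical frames whose key columns have different names.
left = pd.DataFrame({'nid': [100, 300], 'doc': ['txt1', 'text3']})
right = pd.DataFrame({'id': [100, 200], 'doc': ['a', 'b']})

# left_on/right_on name the key column on each side; the non-key column
# `doc` exists in both frames, so it receives the suffixes.
merged = left.merge(right, how='inner', left_on='nid', right_on='id',
                    suffixes=('_left', '_right'))
print(merged)
```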

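1.4.1 Merging DataFrames with identical columns

With no on argument, merge joins on every column the two frames share, here both nid and doc. No row matches on both, so the default inner join is empty; an outer join keeps the rows from both sides.

    >>> import pandas as pd
    >>> d = {'doc': ['txt1', 'text3'], 'nid': [100, 300]}
    >>> df = pd.DataFrame(data=d, columns=('nid', 'doc'))
    >>> d2 = {'doc': ['txt2', 'txt1'], 'nid': [200, 500]}
    >>> df2 = df.merge(pd.DataFrame(d2, columns=('nid', 'doc')))  # default how='inner'
    >>> df2
    Empty DataFrame
    Columns: [nid, doc]
    Index: []
    >>> df
       nid    doc
    0  100   txt1
    1  300  text3
    >>> df2 = df.merge(pd.DataFrame(d2,
    ...                             columns=('nid', 'doc')),
    ...                how='outer')  # outer join
    >>> df2
       nid    doc
    0  100   txt1
    1  300  text3
    2  200   txt2
    3  500   txt1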
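
When reading an outer merge it can help to see which side each row came from. The sketch below uses the `indicator` parameter from the signature; the frame `other` is hypothetical and includes one extra row (100, 'txt1') so that a matching row appears.

```python
import pandas as pd

df = pd.DataFrame({'nid': [100, 300], 'doc': ['txt1', 'text3']})
# Hypothetical right-hand frame; the last row matches a row in `df`.
other = pd.DataFrame({'nid': [200, 500, 100], 'doc': ['txt2', 'txt1', 'txt1']})

# With no `on` argument, merge joins on all shared columns: nid and doc.
# indicator=True adds a `_merge` column: 'left_only', 'right_only' or 'both'.
tagged = df.merge(other, how='outer', indicator=True)
print(tagged)
```

The row order of an outer merge may differ between pandas versions, so it is safer to look rows up by key than by position.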
1.4.2 Merging DataFrames with partly shared columns

    >>> import pandas as pd
    >>> d = {'doc': ['txt1', 'text3'], 'nid': [100, 300]}
    >>> df = pd.DataFrame(data=d, columns=('nid', 'doc'))
    >>> d2 = {'nid': [200]}  # only one column in common
    >>> df2 = df.merge(pd.DataFrame(d2, columns=('nid',)), how='outer')
    >>> df2
       nid    doc
    0  100   txt1
    1  300  text3
    2  200    NaN
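
The effect of the `how` parameter is easiest to see by counting rows. This is a rough sketch with a hypothetical `keys` frame in which one nid (300) matches and one (200) does not.

```python
import pandas as pd

df = pd.DataFrame({'nid': [100, 300], 'doc': ['txt1', 'text3']})
keys = pd.DataFrame({'nid': [200, 300]})  # hypothetical keys: 300 matches, 200 does not

# Row count per join type when merging on the shared column `nid`.
sizes = {how: len(df.merge(keys, on='nid', how=how))
         for how in ('inner', 'left', 'right', 'outer')}
print(sizes)
```

Only the inner join drops unmatched keys on both sides; left and right keep one side whole, and outer keeps every key.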
1.4.3 Merging DataFrames with no shared columns

    >>> import pandas as pd
    >>> d = {'doc': ['txt1', 'text3'], 'nid': [100, 300]}
    >>> df = pd.DataFrame(data=d, columns=('nid', 'doc'))
    >>> df2 = pd.DataFrame()
    >>> df3 = df2.merge(df, how='outer')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Library/Python/2.7/site-packages/pandas/core/frame.py", line 4607, in merge
        copy=copy, indicator=indicator)
      File "/Library/Python/2.7/site-packages/pandas/tools/merge.py", line 61, in merge
        copy=copy, indicator=indicator)
      File "/Library/Python/2.7/site-packages/pandas/tools/merge.py", line 538, in __init__
        self._validate_specification()
      File "/Library/Python/2.7/site-packages/pandas/tools/merge.py", line 883, in _validate_specification
        raise MergeError('No common columns to perform merge on')
    pandas.tools.merge.MergeError: No common columns to perform merge on
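
In recent pandas versions the same exception is importable as `pandas.errors.MergeError`. One way to guard against it is sketched below; `merge_or_fallback` is a hypothetical helper, and returning the right-hand frame unchanged is an assumed policy for this example, not something pandas does itself.

```python
import pandas as pd
from pandas.errors import MergeError

df = pd.DataFrame({'nid': [100, 300], 'doc': ['txt1', 'text3']})
empty = pd.DataFrame()  # no columns at all, as in the session above


def merge_or_fallback(left, right):
    """Outer-merge two frames; if they share no columns, return a copy of `right`."""
    try:
        return left.merge(right, how='outer')
    except MergeError:
        # Assumed fallback policy for this sketch, not pandas behavior.
        return right.copy()


result = merge_or_fallback(empty, df)
print(result)
```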

Original article by ItWorker. If you repost it, please credit the source: https://blog.ytso.com/9340.html
