pandas apply: Usage Patterns Explained

Date: 2023-06-13 09:11:18

Calling apply on a DataFrame is a very common operation. How often do you use it?

When apply runs, the object passed to the function is a Series whose index is either the DataFrame's index (axis=0) or the DataFrame's columns (axis=1).
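To make this concrete, the sketch below reports the labels of the Series that the function receives for each axis. The frame `demo` and the helper `seen_index` are hypothetical names introduced only for this illustration.

```python
import pandas as pd

# Hypothetical frame with distinct row labels so the two axes are easy to tell apart.
demo = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["r0", "r1"])

def seen_index(s):
    # Join the labels of the Series that apply hands to the function.
    return ", ".join(str(label) for label in s.index)

by_column = demo.apply(seen_index, axis=0)  # one call per column
by_row = demo.apply(seen_index, axis=1)     # one call per row

print(by_column)
print(by_row)
```

With axis=0 each column arrives labelled by the row index (r0, r1); with axis=1 each row arrives labelled by the column names (A, B).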

Basic syntax:

DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwds)

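Parameters

func : function The function to apply to each row or column.
axis : {0 or 'index', 1 or 'columns'}, default 0 The axis along which the function is applied.
- 0 or 'index' : apply the function to each column.
- 1 or 'columns' : apply the function to each row.
raw : bool, default False Determines whether each row or column is passed as a Series or as an ndarray.
- False : pass each row or column to the function as a Series.
- True : the function receives ndarray objects instead. If you are only applying a NumPy reduction function, this gives much better performance.
result_type : {'expand', 'reduce', 'broadcast', None}, default None These only take effect when axis=1 (columns).
- 'expand' : list-like results are turned into columns.
- 'reduce' : return a Series if possible rather than expanding list-like results. This is the opposite of 'expand'.
- 'broadcast' : results are broadcast to the original shape of the DataFrame; the original index and columns are retained.

The default behaviour (None) depends on the return value of the applied function: list-like results are returned as a Series of those results. If the applied function returns a Series, however, the results are expanded into columns.

args : tuple Positional arguments to pass to func in addition to the array/Series.
**kwds Additional keyword arguments to pass to func as keyword arguments.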
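The args and **kwds parameters are easiest to understand with a function that takes more than one argument. The following is a minimal sketch; `scale_and_shift`, `factor`, and `offset` are hypothetical names chosen for illustration, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

def scale_and_shift(col, factor, offset=0):
    # `col` is the Series supplied by apply; `factor` arrives through args
    # and `offset` through the extra keyword arguments.
    return col * factor + offset

result = df.apply(scale_and_shift, args=(3,), offset=7)
print(result)
```

Every value in column A becomes 4 * 3 + 7 = 19 and every value in column B becomes 9 * 3 + 7 = 34.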

Examples

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])
>>> df
   A  B
0  4  9
1  4  9
2  4  9

Applying a NumPy universal function:

>>> df.apply(np.sqrt)
     A    B
0  2.0  3.0
1  2.0  3.0
2  2.0  3.0

Using a reduction function along either axis:

>>> df.apply(np.sum, axis=0)
A    12
B    27
dtype: int64

>>> df.apply(np.sum, axis=1)
0    13
1    13
2    13
dtype: int64

When the function returns a list-like, the result is a Series:

>>> df.apply(lambda x: [1, 2], axis=1)
0    [1, 2]
1    [1, 2]
2    [1, 2]
dtype: object

Passing result_type='expand' expands list-like results into the columns of a DataFrame:

>>> df.apply(lambda x: [1, 2], axis=1, result_type='expand')
   0  1
0  1  2
1  1  2
2  1  2

Returning a Series from the function is similar to passing result_type='expand'. The resulting column names are the index of the returned Series.

>>> df.apply(lambda x: pd.Series([1, 2], index=['foo', 'bar']), axis=1)
   foo  bar
0    1    2
1    1    2
2    1    2
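A common use of this behaviour is deriving several new values from each row at once. The following is a sketch under that assumption; the helper `row_summary` and the labels `total` and `diff` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

def row_summary(row):
    # The keys of the returned Series become the column names of the result.
    return pd.Series({"total": row["A"] + row["B"], "diff": row["B"] - row["A"]})

summary = df.apply(row_summary, axis=1)
print(summary)
```

The result has one row per input row and the two columns total and diff.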

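Passing result_type='broadcast' ensures that the result has the same shape as the original DataFrame, whether the function returns a list-like or a scalar, with the values broadcast along the axis. The resulting column names are the original ones.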

>>> df.apply(lambda x: [1, 2], axis=1, result_type='broadcast')
   A  B
0  1  2
1  1  2
2  1  2
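The scalar case can be sketched in the same way. The example below assumes a function that returns a single row total, which is then repeated across every column of that row; the name `row_totals` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

# The lambda returns one scalar per row; 'broadcast' repeats it across the row.
row_totals = df.apply(lambda row: row.sum(), axis=1, result_type="broadcast")
print(row_totals)
```

The result keeps the original index and the columns A and B, and every cell holds the row total 13.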

Patterns for applying custom functions

Basic usage

# Define the logic of the custom function
>>> def fx(x):
...     return x * 3 + 7
...
# Apply the custom function
>>> df.apply(fx)
    A   B
0  19  34
1  19  34
2  19  34

Applying the function to a single column

>>> df['B'].apply(fx)
0    34
1    34
2    34
Name: B, dtype: int64
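Note that calling apply on a single column is Series.apply, which passes individual values to the function, whereas DataFrame.apply passes a whole column. The sketch below makes the difference visible; `describe_input` is a hypothetical helper.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

def describe_input(x):
    # Report the type name of whatever apply passes in.
    return type(x).__name__

from_frame = df.apply(describe_input)        # called once per column
from_series = df["B"].apply(describe_input)  # called once per element

print(from_frame)
print(from_series)
```

A function like fx works in both cases because x * 3 + 7 is valid for a scalar and for a Series alike.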

Applying the function to a single column and storing the result as a new column

>>> df['new'] = df['B'].apply(fx)
>>> df
   A  B  new
0  4  9   34
1  4  9   34
2  4  9   34

Applying the same logic with a list comprehension

>>> df['new2'] = [x * 3 + 7 for x in df['B']]
>>> df
   A  B  new  new2
0  4  9   34    34
1  4  9   34    34
2  4  9   34    34
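For simple arithmetic like this, the same column can also be computed with a plain vectorized expression, without apply or a comprehension. The following sketch compares the three approaches; the variable names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

def fx(x):
    return x * 3 + 7

via_apply = df["B"].apply(fx)                     # element-wise call of fx
via_listcomp = [x * 3 + 7 for x in df["B"]]       # plain Python loop
via_vectorized = df["B"] * 3 + 7                  # whole-column arithmetic

print(via_vectorized)
```

All three produce the same values; the vectorized form avoids a Python-level call per element, which usually matters on large frames.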
