[pandas] Tutorial 6: How to calculate summary statistics

Time: 2023-09-14 09:15:12

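Calculating summary statistics with pandas

This section uses the data file data/titanic.csv, which is available from the CSDN library resource "pandas案例和教程所使用的数据-机器学习文档类资源-CSDN文库".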

Load the data

import pandas as pd

titanic = pd.read_csv("data/titanic.csv")
titanic.head()

   PassengerId  Survived  Pclass  \
0            1         0       3   
1            2         1       1   
2            3         1       3   
3            4         1       1   
4            5         0       3   

                                                Name     Sex   Age  SibSp  \
0                            Braund, Mr. Owen Harris    male  22.0      1   
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
2                             Heikkinen, Miss. Laina  female  26.0      0   
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   
4                           Allen, Mr. William Henry    male  35.0      0   

   Parch            Ticket     Fare Cabin Embarked  
0      0         A/5 21171   7.2500   NaN        S  
1      0          PC 17599  71.2833   C85        C  
2      0  STON/O2. 3101282   7.9250   NaN        S  
3      0            113803  53.1000  C123        S  
4      0            373450   8.0500   NaN        S

What is the average age of the passengers?

titanic["Age"].mean()
# output 
# 29.69911764705882
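
By default, mean() ignores missing values, which matters for the Age column because not every passenger has a recorded age. The following is a minimal sketch of that behavior; the four values in `ages` are a hypothetical stand-in, not rows from titanic.csv.

```python
import pandas as pd

# Hypothetical miniature stand-in for an Age column with one missing value
ages = pd.Series([22.0, 38.0, None, 35.0], name="Age")

# Default behavior: the missing value is skipped, so this is (22 + 38 + 35) / 3
mean_skipping_nan = ages.mean()

# With skipna=False the missing value propagates and the result is NaN
mean_with_nan = ages.mean(skipna=False)

print(mean_skipping_nan)
print(mean_with_nan)
```

Keeping the default is usually what is wanted for a quick summary, while skipna=False can be useful as a check that a column is complete.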

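Different statistics are available and can be applied to columns with numerical data.

What are the median age and the median ticket fare of the Titanic passengers?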

titanic[["Age", "Fare"]].median()

Age     28.0000
Fare    14.4542
dtype: float64

titanic[["Age", "Fare"]].describe()

              Age        Fare
count  714.000000  891.000000
mean    29.699118   32.204208
std     14.526497   49.693429
min      0.420000    0.000000
25%     20.125000    7.910400
50%     28.000000   14.454200
75%     38.000000   31.000000
max     80.000000  512.329200

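Selecting multiple columns of a DataFrame returns a DataFrame, and the statistic is calculated for each numeric column.
describe() is a built-in method that gives a quick statistical overview of the numerical data.

Instead of the predefined set of statistics, a specific combination of aggregating statistics can be defined per column with agg(), as follows: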
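
The two points above can be checked on a small example. This is only a sketch: the DataFrame `df` below is made-up data with the same column names as the Titanic table, used so the snippet runs without the CSV file.

```python
import pandas as pd

# Made-up data; only the column names mirror the Titanic table
df = pd.DataFrame(
    {
        "Age": [22.0, 38.0, None, 35.0],
        "Fare": [7.25, 71.2833, 7.925, 53.1],
    }
)

# Double brackets select several columns and return a DataFrame
subset = df[["Age", "Fare"]]

# Single brackets select one column and return a Series
single = df["Age"]

# describe() reports count, mean, std, min, quartiles and max per column;
# count only includes non-missing values, so Age and Fare can differ
summary = subset.describe()
print(summary)
```

This is the same reason the count row in the output above differs between Age and Fare: missing ages are not counted.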

titanic.agg(
    {
        "Age": ["min", "max", "median", "skew"],
        "Fare": ["min", "max", "median", "mean"],
    }
)

              Age        Fare
min      0.420000    0.000000
max     80.000000  512.329200
median  28.000000   14.454200
skew     0.389108         NaN
mean          NaN   32.204208
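
The NaN entries in the result are not missing data in the source table: they mark statistics that were not requested for that column. A small sketch with made-up numbers (the frame `df` is hypothetical) shows the same effect:

```python
import pandas as pd

# Made-up numbers chosen so the results are easy to verify by hand
df = pd.DataFrame({"Age": [20.0, 30.0, 40.0], "Fare": [10.0, 20.0, 60.0]})

# Request different statistics for each column
result = df.agg(
    {
        "Age": ["min", "max"],
        "Fare": ["max", "mean"],
    }
)
print(result)

# "mean" was only requested for Fare, so the Age cell of that row is NaN;
# likewise "min" was only requested for Age
```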

Aggregating statistics grouped by category

What is the average age of female versus male Titanic passengers?

titanic[['Sex', 'Age']].groupby("Sex").mean()

              Age
Sex              
female  27.915709
male    30.726645

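Since we are interested in the average age for each sex, we first select the two columns Sex and Age, then call groupby() on the Sex column to make one group per category, and finally calculate the mean age within each group. This kind of calculation is an instance of the more general split-apply-combine pattern: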

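Split the data into groups
Apply a function to each group independently
Combine the results into a data structure
In pandas, the apply and combine steps are typically done together.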
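
To make the three steps concrete, they can be spelled out by hand and compared with the one-line groupby call. This is an illustrative sketch only, not how pandas implements groupby internally; the four-row `df` is made-up data.

```python
import pandas as pd

# Made-up data with the same column names as the tutorial
df = pd.DataFrame(
    {
        "Sex": ["male", "female", "female", "male"],
        "Age": [22.0, 38.0, 26.0, 35.0],
    }
)

# Split: one sub-DataFrame per value of Sex
groups = {sex: group for sex, group in df.groupby("Sex")}

# Apply: compute the mean age of each group independently
means = {sex: group["Age"].mean() for sex, group in groups.items()}

# Combine: collect the per-group results into one Series
manual = pd.Series(means, name="Age").rename_axis("Sex")

# The built-in call performs apply and combine in one step
builtin = df.groupby("Sex")["Age"].mean()

print(manual)
print(builtin)
```

Both Series hold the same values, which is why the short groupby form is normally preferred.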

The calculation above can also be written as:

titanic.groupby("Sex")["Age"].mean()

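Here the data is grouped by Sex first and the Age column is selected afterwards, so the mean is calculated only for Age.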

What is the mean ticket fare for each combination of sex and cabin class?

titanic.groupby(['Sex', 'Pclass'])['Fare'].mean()

Sex     Pclass
female  1         106.125798
        2          21.970121
        3          16.118810
male    1          67.226127
        2          19.741782
        3          12.661633
Name: Fare, dtype: float64

Grouping can be done by multiple columns at the same time: pass the column names to groupby() as a list.
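
Grouping by two columns returns a Series with a two-level index, one level per grouping column. As a sketch on made-up data (the frame `df` is hypothetical), individual groups can be looked up with a tuple, and unstack() can turn one index level into columns if a table layout is easier to read:

```python
import pandas as pd

# Made-up data; column names follow the tutorial
df = pd.DataFrame(
    {
        "Sex": ["female", "female", "male", "male", "male"],
        "Pclass": [1, 3, 1, 3, 3],
        "Fare": [80.0, 10.0, 50.0, 8.0, 12.0],
    }
)

# Mean fare per (Sex, Pclass) combination; the index has two levels
mean_fare = df.groupby(["Sex", "Pclass"])["Fare"].mean()

# Look up one group with a tuple of (Sex, Pclass)
male_third_class = mean_fare.loc[("male", 3)]

# Move the Pclass level into columns: rows are Sex, columns are Pclass
table = mean_fare.unstack("Pclass")

print(mean_fare)
print(table)
```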

Counting records by category

How many passengers are in each cabin class?

titanic["Pclass"].value_counts()

3    491
1    216
2    184
Name: Pclass, dtype: int64

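value_counts() counts the number of records for each category in a column.
Both size and count can be used in combination with groupby. size includes NaN values and simply gives the number of rows (the size of the table), whereas count excludes missing values. In value_counts(), the dropna argument controls whether NaN values are included or excluded.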
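
The difference between size, count and the dropna argument is easiest to see on a column that contains missing values. A minimal sketch, using a made-up five-row frame `df` rather than the real data:

```python
import pandas as pd

# Made-up data: two of the three third-class passengers have no recorded age
df = pd.DataFrame(
    {
        "Pclass": [3, 1, 3, 2, 3],
        "Age": [22.0, 38.0, None, 35.0, None],
    }
)

# size counts rows per group, including rows where Age is NaN
rows_per_class = df.groupby("Pclass")["Age"].size()

# count only counts rows per group where Age is not missing
known_ages_per_class = df.groupby("Pclass")["Age"].count()

# value_counts drops NaN by default ...
age_counts = df["Age"].value_counts()

# ... and reports it as its own category with dropna=False
age_counts_with_nan = df["Age"].value_counts(dropna=False)

print(rows_per_class)
print(known_ages_per_class)
print(age_counts_with_nan)
```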

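Remember

Aggregation statistics can be calculated on entire columns or rows.

groupby provides the power of the split-apply-combine pattern.

value_counts is a convenient shortcut to count the number of entries in each category of a variable.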

[Reference]

How to calculate summary statistics? — pandas 1.5.2 documentation (pydata.org)
