Using the requests library and the BeautifulSoup parsing library in Python

Date: 2023-09-14 09:14:25

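requests

Installation

Open a cmd window and check your Python environment; Python 3.7 or later is required.
Then enter the following command to download the requests library: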

pip install requests -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com

Creating the project

Create a Python file; it is best to avoid Chinese characters in its name.

Test code

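# 1. Import the module
import requests

# 2. Send the request and get the response
response = requests.get("http://www.baidu.com")
print(response)  # prints the Response object, e.g. <Response [200]>; the status code itself is response.status_code

# 3. Get the response data
# print(response.encoding)  # ISO-8859-1

# response.encoding = 'utf-8'  # set the encoding
# print(response.text)

# The two commented-out lines above are equivalent to the single line below
print(response.content.decode())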
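
To see why the encoding step matters, here is a minimal sketch that works on a plain bytes value instead of a live response. It assumes the server sent UTF-8 bytes without declaring a charset, which is the situation in which requests reports ISO-8859-1 as above. The sample string and variable names are illustrative only.

```python
# Bytes as a server might send them: UTF-8 encoded text (illustrative sample).
raw = "百度一下".encode("utf-8")

# Decoding with the wrong codec does not fail; it silently yields garbled text.
wrong = raw.decode("ISO-8859-1")

# bytes.decode() defaults to UTF-8, which is why content.decode() recovers the text.
right = raw.decode()

print(repr(wrong))
print(right)
```

Setting response.encoding = 'utf-8' before reading response.text has the same effect as decoding response.content yourself.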

Small example (requesting the epidemic home page)

Example code:

# 1. Import the module
import requests

# 2. Send the request and get the response
response = requests.get("https://ncov.dxy.cn/ncovh5/view/pneumonia")

# 3. Get the data from the response
print(response.content.decode())
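
A real request can hang or return an error page, so a slightly more defensive version of the steps above might look like the sketch below. The helper name fetch_html and the 10-second default are assumptions made for illustration; requests.get(..., timeout=...) and raise_for_status() are standard requests calls.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the body of url decoded as UTF-8, or raise on failure."""
    # timeout keeps the call from waiting forever on an unresponsive server
    response = requests.get(url, timeout=timeout)
    # raise requests.HTTPError for 4xx and 5xx status codes
    response.raise_for_status()
    return response.content.decode("utf-8")
```

Calling fetch_html("https://ncov.dxy.cn/ncovh5/view/pneumonia") inside a try/except requests.RequestException block lets the script report a network problem instead of crashing.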

BeautifulSoup

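Introduction

Beautiful Soup is a Python library for extracting data from HTML and XML files. It works with the parser of your choice to provide idiomatic ways of navigating, searching, and modifying the document. Beautiful Soup can save you hours or even days of work.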

Installation

Run the following two commands, or let PyCharm install the packages automatically.
pip install bs4
pip install lxml

Learning example

# 1. Import the module
from bs4 import BeautifulSoup

# 2. Create the BeautifulSoup object
soup = BeautifulSoup('<html>data</html>', 'lxml')
print(soup)
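
The second argument selects the parser. As a hedged sketch, the same call with Python's built-in 'html.parser' (an alternative the example above does not use) needs no extra install and leaves the fragment as written, whereas lxml may add structural tags it considers missing.

```python
from bs4 import BeautifulSoup

# 'html.parser' ships with Python, so only bs4 itself must be installed.
soup = BeautifulSoup('<html>data</html>', 'html.parser')

print(soup)              # <html>data</html>
print(soup.html.string)  # data
```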

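The find method

Introduction

find returns the first tag that matches the search conditions, or None when nothing matches; find_all returns every match.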

Example (finding by tag name)

Example code:

# 1. Import the module
from bs4 import BeautifulSoup

# 2. Prepare the document string
html = '''
<html>
<head>
    <title>The Dormouse's story</title>
</head>
<body>
<p class="title">
<b>The Dormouse's story</b>
</p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.
</p>
<p class="story">...</p>
</body>
</html>
'''

# 3. Create the BeautifulSoup object
soup = BeautifulSoup(html, 'lxml')

# 4. Find the title tag
title = soup.find('title')
print(title)

# 5. Find the first a tag
a = soup.find('a')
print(a)

# 6. Find all a tags
a_s = soup.find_all('a')
print(a_s)
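
The difference between find and find_all shows up most clearly when nothing matches. The sketch below uses a shortened, illustrative copy of the sample document and the built-in 'html.parser' so that it runs without lxml.

```python
from bs4 import BeautifulSoup

# Shortened sample document (illustrative).
html = '''
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
'''
soup = BeautifulSoup(html, 'html.parser')

# find returns None when nothing matches, so check before using the result.
table = soup.find('table')
print(table is None)

# find_all always returns a list-like ResultSet, possibly empty.
links = soup.find_all('a')
hrefs = [a['href'] for a in links]
print(hrefs)

# limit caps the number of results.
first_two = soup.find_all('a', limit=2)
print(len(first_two))
```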

Example (finding by attribute)

Example code:

# 1. Import the module
from bs4 import BeautifulSoup

# 2. Prepare the document string
html = '''
<html>
<head>
    <title>The Dormouse's story</title>
</head>
<body>
<p class="title">
<b>The Dormouse's story</b>
</p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.
</p>
<p class="story">...</p>
</body>
</html>
'''

# 3. Create the BeautifulSoup object
soup = BeautifulSoup(html, 'lxml')

# Find by attribute: look up the tag whose id is link1
# Method 1: pass the attribute as a keyword argument
a = soup.find(id='link1')
print(a)

# Method 2: pass a dictionary of attributes through attrs
a = soup.find(attrs={'id': 'link1'})
print(a)
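
Two related cases come up often: searching by class, and combining a tag name with an attribute. The sketch below uses a shortened, illustrative document and 'html.parser'; class_ is the keyword Beautiful Soup provides because class is reserved in Python.

```python
from bs4 import BeautifulSoup

# Shortened sample document (illustrative).
html = '''
<p class="title"><b>The Dormouse's story</b></p>
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
'''
soup = BeautifulSoup(html, 'html.parser')

# class is a Python keyword, so Beautiful Soup accepts class_ instead.
sisters = soup.find_all(class_='sister')

# A tag name and an attribute filter can be combined.
lacie = soup.find('a', id='link2')

# The attrs dictionary also works for class and for names that are not valid keywords.
title_p = soup.find('p', attrs={'class': 'title'})

print(len(sisters), lacie['href'], title_p.b.string)
```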

Example (finding by text)

Example code:

# 1. Import the module
from bs4 import BeautifulSoup

# 2. Prepare the document string
html = '''
<html>
<head>
    <title>The Dormouse's story</title>
</head>
<body>
<p class="title">
<b>The Dormouse's story</b>
</p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.
</p>
<p class="story">...</p>
</body>
</html>
'''

# 3. Create the BeautifulSoup object
soup = BeautifulSoup(html, 'lxml')

# Find by text: get the text node whose content is exactly Elsie
# (text= is the older name of this argument; Beautiful Soup 4.4.0+ also accepts string=)
text = soup.find(text='Elsie')
print(text)
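
A text search returns the text node, not the enclosing tag. The sketch below, on a shortened illustrative document, uses the string argument (the name Beautiful Soup 4.4.0 and later use for text searches), walks up to the tag with .parent, and shows a partial match with a compiled regular expression.

```python
import re
from bs4 import BeautifulSoup

# Shortened sample document (illustrative).
html = '''
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
'''
soup = BeautifulSoup(html, 'html.parser')

# Searching by string alone returns the text node, not the tag.
node = soup.find(string='Elsie')
print(type(node).__name__)

# .parent walks up to the enclosing tag.
print(node.parent['id'])

# Combining a tag name with string= returns the tag itself;
# a compiled regular expression matches part of the text.
tag = soup.find('a', string=re.compile('^Lac'))
print(tag['href'])
```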

Example (using Tag attributes)

Example code:

# 1. Import the module
from bs4 import BeautifulSoup

# 2. Prepare the document string
html = '''
<html>
<head>
    <title>The Dormouse's story</title>
</head>
<body>
<p class="title">
<b>The Dormouse's story</b>
</p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.
</p>
<p class="story">...</p>
</body>
</html>
'''

# 3. Create the BeautifulSoup object
soup = BeautifulSoup(html, 'lxml')

a = soup.find(attrs={'id': 'link1'})

# The Tag object
print(type(a))  # <class 'bs4.element.Tag'>
print('Tag name:', a.name)
print('All attributes of the tag:', a.attrs)  # class is printed as a list, because one class attribute can hold several values
print('Text content of the tag:', a.text)
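
Beyond name, attrs and text, individual attributes can be read directly from the Tag. The sketch below uses a one-line illustrative tag (with a second class value and padded text added for demonstration) and 'html.parser'.

```python
from bs4 import BeautifulSoup

# Illustrative tag; the extra class value and the spaces are added for the demo.
html = '<a href="http://example.com/elsie" class="sister big" id="link1"> Elsie </a>'
soup = BeautifulSoup(html, 'html.parser')
a = soup.find(id='link1')

# Attributes can be read like dictionary keys.
print(a['href'])

# .get returns None (or a default) instead of raising KeyError for a missing attribute.
print(a.get('title'))
print(a.get('title', 'no title'))

# class is multi-valued, so it comes back as a list.
print(a['class'])

# get_text(strip=True) trims the surrounding whitespace.
print(a.get_text(strip=True))
```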

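Example (extracting the latest epidemic data for each country from the epidemic home page)

In the page source, press Ctrl+F to locate the region that holds the data you want, note the id of the tag that encloses it, and then pass that id to the find method to get the tag's text content.

Example code: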

# 1. Import the required modules
import requests
from bs4 import BeautifulSoup

# 2. Send the request and get the content of the epidemic home page
response = requests.get('https://ncov.dxy.cn/ncovh5/view/pneumonia')
home_page = response.content.decode()
# print(home_page)

# 3. Use BeautifulSoup to get the epidemic data
soup = BeautifulSoup(home_page, 'lxml')
script = soup.find(id='getAreaStat')
text = script.text
print(text)
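
The text of that script tag is JavaScript rather than plain data. Assuming it has the shape try { window.getAreaStat = [ ...JSON array... ]}catch(e){} (an assumption about the page), the JSON array can be cut out with a regular expression and parsed with json.loads. The sketch below runs offline on a made-up stand-in for the page; the field names name and confirmed are hypothetical.

```python
import json
import re
from bs4 import BeautifulSoup

# Made-up stand-in for the downloaded page; field names are hypothetical.
home_page = '''
<html><body>
<script id="getAreaStat">try { window.getAreaStat = [{"name": "A", "confirmed": 10}, {"name": "B", "confirmed": 3}]}catch(e){}</script>
</body></html>
'''

soup = BeautifulSoup(home_page, 'html.parser')
script = soup.find(id='getAreaStat')
if script is None:
    raise SystemExit('script tag with id getAreaStat not found')

# Greedy match from the first "[" to the last "]" captures the whole JSON array.
match = re.search(r'\[.*\]', script.string, re.S)
areas = json.loads(match.group())

for area in areas:
    print(area['name'], area['confirmed'])
```

Checking for None first matters here because find returns None if the page layout changes or the request returned a different page.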
