Python Crawler --- Comments

Time: 2023-09-14 09:12:15

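Some sites load part of their data with JavaScript, so that data is missing from the HTML you get when you fetch the page directly.

The way around this is to find the URL the page itself requests the comments from (typically visible in the browser's developer tools, under network requests) and send a request to that address to get the comment data.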
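Once the comment address has been found, it helps to see how it breaks down into a path and a query string. The sketch below rebuilds the URL used in this post from its parts. The helper name `build_comments_url` and its defaults are hypothetical, and the idea that changing `page` returns further pages is an assumption based only on the parameter name, not something this post verifies.

```python
from urllib.parse import urlencode

# Path pattern taken from the URL used in this post; note_id is the number after /notes/.
BASE = "https://www.jianshu.com/notes/{note_id}/comments"


def build_comments_url(note_id, page=1, max_id=1586510606000, order_by="likes_count"):
    """Hypothetical helper: assemble the comment URL from its query parameters."""
    # Parameter names and default values are copied from the URL in this post.
    query = {
        "comment_id": "",
        "author_only": "false",
        "since_id": 0,
        "max_id": max_id,
        "order_by": order_by,
        "page": page,
    }
    # urlencode keeps the dict's insertion order and writes empty values as "key=".
    return BASE.format(note_id=note_id) + "?" + urlencode(query)


if __name__ == "__main__":
    print(build_comments_url(26504955))
```

Keeping the parameters in a dict makes it easier to try other values than editing one long URL string by hand.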

# coding=utf-8
import json
import webbrowser

import requests


def requests_view(response):
    """Debug helper: save the response body to tmp.html and open it in a browser tab."""
    # Insert a <base> tag so relative links in the saved page resolve against the original URL.
    base_tag = ('<head><base href="%s">' % response.url).encode('utf-8')
    content = response.content.replace(b"<head>", base_tag)
    with open('tmp.html', 'wb') as tmp_html:
        tmp_html.write(content)
    webbrowser.open_new_tab("tmp.html")


# Send a browser-like User-Agent with the request.
headers = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'}
# Request the comment address directly instead of the article page; the response is JSON.
response = requests.get("https://www.jianshu.com/notes/26504955/comments?comment_id=&author_only=false&since_id=0&max_id=1586510606000&order_by=likes_count&page=1", headers=headers, timeout=10)
response.raise_for_status()
comments = json.loads(response.content)

# Print each commenter's nickname and comment body when the response reports comments.
if comments['comment_exist']:
    for item in comments['comments']:
        print(item['user']['nickname'], item['compiled_content'])
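The extraction step can also be separated from the network request so it can be tried on a saved or hand-written payload. This is only a sketch: the function name `extract_comments` and the sample data are made up for illustration, and the payload shape (`comment_exist`, `comments`, `user.nickname`, `compiled_content`) is assumed from the field names the code above reads.

```python
def extract_comments(payload):
    """Return (nickname, content) pairs from a decoded comment payload."""
    # Treat a missing or false comment_exist flag as "no comments".
    if not payload.get("comment_exist"):
        return []
    return [
        (item["user"]["nickname"], item["compiled_content"])
        for item in payload.get("comments", [])
    ]


# Hand-written sample data, not a real server response.
sample = {
    "comment_exist": True,
    "comments": [
        {"user": {"nickname": "alice"}, "compiled_content": "Nice post"},
        {"user": {"nickname": "bob"}, "compiled_content": "Thanks for sharing"},
    ],
}

if __name__ == "__main__":
    for nickname, content in extract_comments(sample):
        print(nickname, content)
```

With the request code above, the same function would be called as `extract_comments(comments)`.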
