
Python: saving images from web pages with a crawler

Python, web pages, saving, scraping

Posted: 2023-09-27 14:27:02

The target site is the car section of a desktop wallpaper website (http://desk.zol.com.cn/qiche/):

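While debugging, uncomment the two print lines at the top of handle_starttag; they print every tag name and its attribute list as the parser reaches it.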
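To make those debug prints easier to read, here is a small sketch of what HTMLParser passes to handle_starttag. The class name TagLogger and the sample markup (including its href) are hypothetical; the markup is only shaped like the album links the crawler matches. The point is that attrs is a list of (name, value) tuples in source order, which is why the script can test positions such as attrs[0] and read the link from attrs[1][1].

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record every start tag, as the commented-out debug prints would show it."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # tag is the lowercased tag name; attrs is a list of (name, value)
        # tuples in the order they appear in the source
        self.seen.append((tag, attrs))
        print(tag)
        print(attrs)


# Hypothetical markup, shaped like an album link on the index page
sample = ('<a class="pic" href="/bizhi/1_1_1.html" target="_blank" title="car">'
          '<img src="x.jpg"></a>')
logger = TagLogger()
logger.feed(sample)
```

For the `<a>` tag this prints `[('class', 'pic'), ('href', '/bizhi/1_1_1.html'), ('target', '_blank'), ('title', 'car')]`: four attributes, the first being class="pic", matching the checks the script performs on the index page.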

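#!/usr/bin/env python3
import re
import urllib.request
from html.parser import HTMLParser

base = "http://desk.zol.com.cn"
path = '/home/mk/cars/'  # directory for downloaded images (must already exist)
star = ''                # album link taken from the index page; used to detect when "next" wraps around


def fetch(url):
    # Download a page and decode it to str. Only ASCII tag and attribute
    # values are needed, so undecodable bytes are simply replaced.
    with urllib.request.urlopen(url) as response:
        return response.read().decode('utf-8', errors='replace')


def get_url(page_url):
    # Parse one picture page: download its big image and follow its "next" link
    parser = parse(False)
    parser.feed(fetch(page_url))


def download(url):
    res = re.search(r'[0-9]+\.jpg', url)  # numeric file name taken from the image URL
    if res is None:
        print('skipping, no numeric .jpg name in:', url)
        return
    print('downloading:', res.group())
    with urllib.request.urlopen(url) as response:
        content = response.read()
    filename = path + res.group()
    with open(filename, 'wb') as f:  # binary mode: image data is bytes
        f.write(content)


class parse(HTMLParser):
    def __init__(self, Index):
        HTMLParser.__init__(self)
        self.Index = Index  # True for the category index page, False for a picture page

    def handle_starttag(self, tag, attrs):
        # print(tag)
        # print(attrs)
        global star
        if self.Index:
            # Album links on the index page: <a> with 4 attributes, the first being class="pic"
            if tag == 'a' and len(attrs) == 4 and attrs[0] == ('class', 'pic'):
                new = base + attrs[1][1]
                print('found a link:', new)
                star = new
                get_url(new)
        else:
            # The full-size picture: <img> whose first attribute is id="bigImg"
            if tag == 'img' and len(attrs) >= 2 and attrs[0] == ('id', 'bigImg'):
                image_url = attrs[1][1]
                print('found a picture:', image_url)
                download(image_url)
            # The "next picture" link: <a> with 4 attributes, the second being class="next"
            if tag == 'a' and len(attrs) == 4 and attrs[1] == ('class', 'next'):
                next_url = base + attrs[2][1]
                print('found a link:', next_url)
                if star != next_url:  # stop once the album wraps back to its first page
                    get_url(next_url)


Index_url = 'http://desk.zol.com.cn/qiche/'
Parser_index = parse(True)
Parser_index.feed(fetch(Index_url))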
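The script follows each "next" link by calling get_url again from inside handle_starttag, so every picture page adds several stack frames, and a long album could eventually hit Python's default recursion limit of 1000. The positional attrs checks also stop matching if the site reorders attributes. Below is a minimal sketch of an alternative, assuming the same page layout (an `<img id="bigImg">` and an `<a class="next">` on each picture page). It uses a loop with a visited set instead of recursion and the star global, and looks attributes up by name. PicturePageParser, crawl_album, the caller-supplied fetch function and the example.test URLs are all hypothetical.

```python
from html.parser import HTMLParser


class PicturePageParser(HTMLParser):
    """Collect the big image URL and the "next" link from one picture page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.image_url = None
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # look attributes up by name instead of by position
        if tag == 'img' and a.get('id') == 'bigImg':
            self.image_url = a.get('src')
        elif tag == 'a' and 'next' in (a.get('class') or '').split():
            self.next_href = a.get('href')


def crawl_album(start_url, fetch, base):
    """Follow "next" links iteratively, stopping at a page already visited.

    fetch(url) -> str is supplied by the caller (hypothetical).
    Returns the image URLs in the order they were found.
    """
    images = []
    visited = set()
    url = start_url
    while url and url not in visited:
        visited.add(url)
        parser = PicturePageParser()
        parser.feed(fetch(url))
        if parser.image_url:
            images.append(parser.image_url)
        url = base + parser.next_href if parser.next_href else None
    return images


# Hypothetical three-page album whose last "next" link wraps back to the first page
pages = {
    'http://example.test/p1.html': '<img id="bigImg" src="http://img.test/101.jpg"><a class="next" href="/p2.html">next</a>',
    'http://example.test/p2.html': '<img id="bigImg" src="http://img.test/102.jpg"><a class="next" href="/p3.html">next</a>',
    'http://example.test/p3.html': '<img id="bigImg" src="http://img.test/103.jpg"><a class="next" href="/p1.html">next</a>',
}
print(crawl_album('http://example.test/p1.html', pages.__getitem__, 'http://example.test'))
```

With the in-memory pages above, this prints the three image URLs once each and stops when p3 links back to p1. In a real run, fetch would download and decode each page (for example with urllib.request.urlopen), and each returned URL could then be passed to download().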

The only drawback concerns the quality of the desktop wallpapers on the site.

