Building a Newspaper-Reading Program in Python by Parsing Web Pages
Time: 2023-06-13 09:15:40
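The example in this article uses Python to view the image edition of the newspaper Cankao Xiaoxi (《参考消息》, "Reference News"): it finds the current day's issue on the site and automatically downloads the page images to a local folder for viewing. The implementation is as follows: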
# coding=gbk
# Python 2 script. The literals "共" and "页" below are compared against the
# raw page bytes, so this file is assumed to be saved in the page's encoding (GBK).
import urllib2
import socket
import re
import time
import os
import sys

# Timeout in seconds for every network request.
timeout = 10
socket.setdefaulttimeout(timeout)

home_url = "http://www.hqck.net"
home_page = ""
try:
    home_page_context = urllib2.urlopen(home_url)
    home_page = home_page_context.read()
    print "Read home page finished."
    print "-------------------------------------------------"
except urllib2.HTTPError, e:
    print e.code
    sys.exit(1)
except urllib2.URLError, e:
    print e.reason
    sys.exit(1)
except Exception, e:
    print e
    sys.exit(1)

# Find the link to the latest issue on the home page.
reg_str = r'<a class="item-baozhi" href="/arc/jwbt/ckxx/\d{4}/\d{4}/\w+\.html" rel="external nofollow"><span class.+>.+</span></a>'
news_url_reg = re.compile(reg_str)
today_cankao_news = news_url_reg.findall(home_page)
if len(today_cankao_news) == 0:
    print "Cannot find today's news!"
    sys.exit(1)

my_news = today_cankao_news[0]
print "Latest news link = " + my_news
print

# Cut the relative article path ("/arc/....html") out of the matched tag.
url_s = my_news.find("/arc/")
url_e = my_news.find(".html")
url_e = url_e + 5
print "Link index = [" + str(url_s) + "," + str(url_e) + "]"
my_news = my_news[url_s:url_e]
print "part url = " + my_news
full_news_url = home_url + my_news
print "full url = " + full_news_url
print

# Create the download folder and a per-day subfolder inside it.
image_folder = "E:\\new_folder\\"
if not os.path.exists(image_folder):
    os.makedirs(image_folder)
today_num = time.strftime("%Y-%m-%d")
image_folder = image_folder + today_num + "\\"
if not os.path.exists(image_folder):
    os.makedirs(image_folder)
print "News image folder = " + image_folder
print

# Read the first page to find out how many pages the issue has.
context_uri = full_news_url[0:-5]
first_page_url = context_uri + ".html"
try:
    first_page_context = urllib2.urlopen(first_page_url)
    first_page = first_page_context.read()
except urllib2.URLError, e:
    print e
    sys.exit(1)

# The page count appears as "共N页"; in GBK, "共" occupies two bytes,
# which is why the number starts at offset 2.
tot_page_index = first_page.find("共")
tmp_str = first_page[tot_page_index:tot_page_index + 10]
end_s = tmp_str.find("页")
page_num = tmp_str[2:end_s]
print page_num
page_count = int(page_num)
print "Total " + page_num + " pages:"
print

page_index = 1
download_suc = True
while page_index <= page_count:
    # Page 1 is xxx.html; page k > 1 is xxx_k.html.
    page_url = context_uri
    if page_index > 1:
        page_url = page_url + "_" + str(page_index)
    page_url = page_url + ".html"
    print "News page link = " + page_url
    try:
        news_img_page_context = urllib2.urlopen(page_url)
    except urllib2.URLError, e:
        print e
        download_suc = False
        break
    news_img_page = news_img_page_context.read()
    # Uncomment to dump the page for debugging:
    # f = open("e:\\page.html", "w")
    # f.write(news_img_page)
    # f.close()

    # Take the first image URL that matches.
    reg_str = r"http://image\S+jpg"
    image_reg = re.compile(reg_str)
    image_results = image_reg.findall(news_img_page)
    if len(image_results) == 0:
        print "Cannot find news page " + str(page_index) + "!"
        download_suc = False
        break
    image_url = image_results[0]
    print "News image url = " + image_url
    image_name = image_folder + "page_" + str(page_index) + ".jpg"
    print "Getting image..."
    try:
        news_image_context = urllib2.urlopen(image_url)
        imgf = open(image_name, "wb")
        try:
            # Stream the image to disk in 10 KB chunks.
            while True:
                data = news_image_context.read(1024 * 10)
                if not data:
                    break
                imgf.write(data)
        finally:
            imgf.close()
    except:
        download_suc = False
        print "Save image " + str(page_index) + " failed!"
        print "Unexpected error: " + str(sys.exc_info()[0]) + " " + str(sys.exc_info()[1])
    else:
        print "Save image " + str(page_index) + " succeeded!"
    print
    page_index = page_index + 1

if download_suc:
    print "News download succeeded! Path = \"" + str(image_folder) + "\""
    print "Enjoy it! ^^"
else:
    print "News download failed!"
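The parsing steps in the script (finding the article link, reading the "共N页" page count, and deriving the per-page URLs) can be tried in isolation, without any network access. The following is a minimal sketch under stated assumptions, not the author's code: it is written for Python 3 rather than the Python 2 used above, it works on already-decoded text instead of raw GBK bytes, all function names are hypothetical, and the image pattern uses a non-greedy variant of the script's expression.

```python
import re

# Hypothetical helpers; the patterns are modeled on the ones in the script.
ARTICLE_LINK_RE = re.compile(r'href="(/arc/jwbt/ckxx/\d{4}/\d{4}/\w+\.html)"')
PAGE_COUNT_RE = re.compile(r"共\s*(\d+)\s*页")
# Non-greedy so the match stops at the first "jpg" (an assumption, see above).
IMAGE_URL_RE = re.compile(r"http://image\S+?jpg")


def find_article_path(home_html):
    """Return the first article path found in the home page text, or None."""
    match = ARTICLE_LINK_RE.search(home_html)
    return match.group(1) if match else None


def parse_page_count(article_html):
    """Return N from the first "共N页" marker, or None if there is none."""
    match = PAGE_COUNT_RE.search(article_html)
    return int(match.group(1)) if match else None


def build_page_urls(first_page_url, page_count):
    """Page 1 is xxx.html; page k > 1 is xxx_k.html, as in the script."""
    stem = first_page_url[:-len(".html")]
    urls = [first_page_url]
    for index in range(2, page_count + 1):
        urls.append("%s_%d.html" % (stem, index))
    return urls


def find_image_url(page_html):
    """Return the first image URL in the page text, or None."""
    match = IMAGE_URL_RE.search(page_html)
    return match.group(0) if match else None


if __name__ == "__main__":
    # Invented sample markup, used only to exercise the helpers.
    home = '<a class="item-baozhi" href="/arc/jwbt/ckxx/2014/0801/abc123.html">'
    path = find_article_path(home)
    for url in build_page_urls("http://www.hqck.net" + path, 3):
        print(url)
```

Capturing the path and the page count with regex groups avoids the manual index arithmetic in the script, and returning None on a failed match lets the caller decide whether to stop or retry.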