Python crawler program: scraping images from a web page

A simple piece of Python code for scraping the images on a web page.

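# coding=utf-8

import re
import urllib.request

url = "https://bh.sb/post/category/main/"

def getHtml(url):
    page = urllib.request.urlopen(url)  # urllib.request.urlopen() opens the given URL
    # read() returns the response body as bytes; decode it (UTF-8 assumed) so the regex can search a str
    html = page.read().decode('utf-8', errors='ignore')
    return html

def getImg(html):
    p1 = r'https://[^\s]+?\.jpg|https://[^\s]+?\.png'    # regular expression that matches image URLs
    imgre = re.compile(p1)     # re.compile() compiles the pattern into a regular expression object
    imglist = re.findall(imgre, html)      # re.findall() returns every substring of html that matches imgre
    # Loop over the matched image URLs and save each one locally.
    # urllib.request.urlretrieve() does the core work: it downloads the remote data straight to a local file.
    # Files are named with the counter x, which increases by one per image.
    x = 0

    for imgurl in imglist:
        # imgurl[-4:] is '.jpg' or '.png', so each file keeps its real extension
        urllib.request.urlretrieve(imgurl, '/home/zlee/image-spider/%s%s' % (x, imgurl[-4:]))
        x = x + 1
    return imglist


html = getHtml(url)
print(getImg(html))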
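
The regular expression step in getImg can be tried on its own, without any network access. The following is a minimal sketch, assuming the intended pattern is "an https:// URL containing no whitespace that ends in .jpg or .png". The helper name find_image_urls and the sample HTML string are made up for illustration and are not part of the original code; quotes and angle brackets are also excluded here so that a match cannot run past the end of an attribute value.

```python
import re

# Assumed intent of the pattern in getImg: absolute https URLs ending in .jpg or .png.
# Quotes and angle brackets are excluded so a match stays inside one attribute value.
IMG_PATTERN = re.compile(r'https://[^\s"\'<>]+?\.(?:jpg|png)')

def find_image_urls(html):
    """Return every image URL found in html, in document order (hypothetical helper)."""
    return IMG_PATTERN.findall(html)

# Made-up sample page, used only to show what the pattern picks up.
sample_html = '''
<img src="https://example.com/a/cat.jpg" alt="cat">
<a href="https://example.com/page.html">link</a>
<img src="https://example.com/b/dog.png">
'''

print(find_image_urls(sample_html))
```

Only the two image addresses are returned; the link to page.html is skipped because it does not end in .jpg or .png.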

After the crawl finishes, the downloaded images appear in the specified storage directory.
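
The download loop names each saved file with an incrementing counter. As a small sketch of how such a target path could be built while taking the extension from the image URL itself, the helper below joins a directory, the counter, and the extension. The name build_save_path, the example.com URLs, and the fallback to '.jpg' when a URL has no extension are all assumptions made for illustration.

```python
import os
from urllib.parse import urlparse

def build_save_path(save_dir, index, img_url):
    """Build a local path of the form <save_dir>/<index><ext> (hypothetical helper)."""
    # Take the extension from the URL path only, ignoring any ?query part.
    # Falling back to '.jpg' when there is no extension is an assumption.
    ext = os.path.splitext(urlparse(img_url).path)[1] or '.jpg'
    return os.path.join(save_dir, '%s%s' % (index, ext))

save_dir = '/home/zlee/image-spider'
# Placeholder URLs, not taken from the crawled site.
urls = ['https://example.com/a/cat.jpg', 'https://example.com/b/dog.png?size=large']
for i, u in enumerate(urls):
    print(build_save_path(save_dir, i, u))
```

The resulting path can be passed as the second argument of the download call, in place of a hard-coded file name.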

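Copyright notice: This article originates from CSDN. Thanks to the blogger for the original article, which is licensed under CC 4.0 BY-SA; when reposting, please include a link to the original source and this notice.
Original link: https://blog.csdn.net/LEE18254290736/article/details/85058210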

Posted on 2020-03-08 10:56:55