黑基网 首页 电脑 软件精选 查看内容

使用Web Scraper 插件,不需要编程,也能爬网

2018-2-18 23:46| 投稿: xiaotiger |来自: 互联网

摘要: 使用Web Scraper 插件,不需要编程,也能爬网,使用Web Scraper插件,能够创建一个网站地图,并能遍历网站,抓取我们感兴趣的数据,比如,我们登陆淘宝,京东等商务网站,我们可以通过 Web Scraper,抓取某一类商品 ...

使用Web Scraper 插件,不需要编程,也能爬网,使用Web Scraper插件,能够创建一个网站地图,并能遍历网站,抓取我们感兴趣的数据,比如,我们登陆淘宝,京东等商务网站,我们可以通过 Web Scraper,抓取某一类商品的规格说明,价格,厂家等信息,我们通过Web Scraper可以抓取我们进入头条上的最热门的文章,也可以抓取我们自己的所有文章列表,发布时间,阅读和浏览量等信息,当然也能抓取我们的粉丝列表。 最最最重要的是,你不需要写任何的代码,只需点击,点击,点击,最后还能把抓取的结果导出为Excel可以识别的CSV格式。这功能,爽!!!

其官方网站如下:http://webscraper.io/tutorials

Web Scraper Chrome 插件的安装

打开Chrome浏览器,输入下面的URL地址:

https://chrome.google.com/webstore/detail/web-scraper/jnhgnonknehpejjnehehllkliplmbmhn

点击“Added to Chrome”就安装了,安装后,在浏览器中按下F12或者点击右键,选择“检查(Inspect)”,在开发者工具下面就能看到WebScraper的Tab。

Web Scraper Chrome 插件的入门例子

下面以抓取京东上面的所有的手机信息为例子,使用Web Scraper演示一下其使用和操作方法。

Step1. 创建一个京东手机的SiteMap(网站站点图)

Step2. 在SiteMap上点击选择需要抓取的信息

如下图所示意,想抓取当前京东上,热门推荐的手机的网站和品牌信息,则抓取方法如下:命名一个id,这个id是自己定义的,然后选择抓取的类型,比如本例子中我们选择,“Link”

  • Link

  • PopupLink

  • Table

  • Element Attribute

  • Image

  • Groupped

  • HTML

  • Element

  • Element Click

  • Element Scroll down

然后选择你感兴趣元素,比如我选择了iPhone,荣耀,小米,华为,Vivo,Oppo,其会自动生成一个获取这些信息数据的表达式,我们可以称呼其为XPath,最后,点击保存。

Step3. 保存后,点击“Data Preview”预览数据。

Step4. 点击“Data Preview”预览的数据如下。

Step5. 点击“Scrape”,立马开始抓取数据。

当出现“Scraping Finished”的字样的时候,说明已经抓取成功。

Step6. 点击“Export Data as CSV”,导出为CVS的数据格式,这样Excel就能打开

写在最后的话

本文简单介绍总结了Web Scraper的插件的功能,安装以及一个简单的单页面例子。其实Web Scraper的功能远远不止于此,其实还能抓取分页,还能多页多元素的抓取,还能抓取二级页面,比如,所有iphone或者华为手机的价格,配置等信息,如果你有兴趣,请在本文后留言,如果收藏和转发数超过100,我将继续分享Web Scraper的高级功能。最后祝大家新年快乐,天天开心。

  • 如果你对笔者的分享感趣的话,请收藏并关注我的自媒体号;

  • 如果你有任何疑问需要探讨,欢迎在文章末尾留言,我尽量在第一时间个大家回复。

, groupId: 6520224868032578062, itemId: 6520224868032578062, type: 1, subInfo: { isOriginal: false, source: 冰尘无极, time: 2018-02-17 23:08:13 }, tagInfo: { tags: [{"name":"Chrome"},{"name":"Excel"},{"name":"iPhone"},{"name":"华为手机"},{"name":"京东"}], groupId: 6520224868032578062, itemId: 6520224868032578062, repin: 0, }, has_extern_link: 0 }, commentInfo: { groupId: 6520224868032578062, itemId: 6520224868032578062, comments_count: 6, ban_comment: 0 }, mediaInfo: { uid: 74999065942, name: 冰尘无极, avatar: //p4.pstatp.com/large/50aa000356b49be004a7, openUrl: /c/user/74999065942/, follow: false }, pgcInfo: {"media_info":{"open_url":"/c/user/74999065942/","avatar_url":"https://p4.pstatp.com/large/50aa000356b49be004a7","media_id":1583413888505870,"name":"冰尘无极","user_verified":false},"articles":[{"item_id":"6523525438197727748","url":"/item/6523525438197727748","title":"大型网站架构前期最容易被忽略的非功能性需求的4大核心特性"},{"item_id":"6520224868032578062","url":"/item/6520224868032578062","title":"使用Web Scraper 插件,不需要编程,也能爬网"},{"item_id":"6523115506239537677","url":"/item/6523115506239537677","title":"Selenium ---Web自动化测试的神兵利器,值得收藏!"},{"item_id":"6518920157761372676","url":"/item/6518920157761372676","title":"美国程序员大都愿意开源,值得每一个软件从业人员的深思!"}]}, feedInfo: { url: /api/pc/feed/, category: __all__, initList: [{"comments_count":13,"media_avatar_url":"//p3.pstatp.com/large/568f0006013e96d2b37d","is_feed_ad":false,"is_diversion_page":false,"title":"HTML5技术资源分享 ES6编程风格","single_mode":true,"gallary_image_count":25,"middle_mode":false,"has_video":false,"video_duration_str":null,"source_url":"/group/6521117792043794957/","source":"杭州千锋教育","more_mode":null,"article_genre":"article","has_gallery":false,"video_play_count":0,"image_url":"//p3.pstatp.com/list/190x124/61690001f09dd50fb25d","group_id":"6521117792043794957","is_related":true,"media_url":"/c/user/85614609846/"},{"comments_count":1,"is_related":true,"is_feed_ad":false,"is_diversion_page":false,"title":"c语言为什么就不能把java压倒?","single_mode":true,"gallary_image_count":0,"middle_mode":false,"has_video":false,"video_duration_str":null,"source_url":"/group/6522680942375469315/","source":"悟空问答","more_mode":null,"article_genre":"article","has_gallery":false,"video_play_count":0,"image_url":"//p3.pstatp.com/list/190x124/6589001bfad79a53fd5b","group_id":"6522680942375469315"},{"comments_count":9,"is_related":true,"is_feed_ad":false,"is_diversion_page":false,"title":"centos上面自带的openjdk 1.8为何被很多人舍弃删除,而下载了jdk官网上的?","single_mode":true,"gallary_image_count":1,"middle_mode":false,"has_video":false,"video_duration_str":null,"source_url":"/group/6522391071450726660/","source":"悟空问答","more_mode":null,"article_genre":"article","has_gallery":false,"video_play_count":0,"image_url":"//p1.pstatp.com/list/190x124/5fed0008707d8a3d13f0","group_id":"6522391071450726660"},{"comments_count":6,"media_avatar_url":"//p3.pstatp.com/large/5935000137bed70a8c60","is_feed_ad":false,"is_diversion_page":false,"title":"你知道什么是系统架构吗","single_mode":true,"gallary_image_count":5,"middle_mode":false,"has_video":false,"video_duration_str":null,"source_url":"/group/6521963859585008141/","source":"石头大V","more_mode":null,"article_genre":"article","has_gallery":false,"video_play_count":0,"image_url":"//p3.pstatp.com/list/190x124/616f0001d544872c1875","group_id":"6521963859585008141","is_related":true,"media_url":"/c/user/59909877329/"},{"comments_count":4,"media_avatar_url":"//p1.pstatp.com/large/216d00213d5ba1354e79","is_feed_ad":false,"is_diversion_page":false,"title":"Docker命令速查表,收藏!","single_mode":true,"gallary_image_count":3,"middle_mode":false,"has_video":false,"video_duration_str":null,"source_url":"/group/6521504418376974851/","source":"云智小号","more_mode":null,"article_genre":"article","has_gallery":false,"video_play_count":0,"image_url":"//p3.pstatp.com/list/190x124/616a00040d547b25ad82","group_id":"6521504418376974851","is_related":true,"media_url":"/c/user/60798381091/"},{"comments_count":50,"media_avatar_url":"//p3.pstatp.com/large/1541/6137994502","is_feed_ad":false,"is_diversion_page":false,"title":"苹果手机内存又满了?关闭这三个功能,瞬间多出几个G!","single_mode":true,"gallary_image_count":4,"middle_mode":false,"has_video":false,"video_duration_str":null,"source_url":"/group/6521569759824183811/","source":"泡泡网","more_mode":null,"article_genre":"article","has_gallery":false,"video_play_count":4,"image_url":"//p3.pstatp.com/list/190x124/616a0004912e05dafecc","group_id":"6521569759824183811","is_related":true,"media_url":"/c/user/3657894252/"},{"comments_count":27,"media_avatar_url":"//p10.pstatp.com/large/46c400065347203f3ce3","is_feed_ad":false,"is_diversion_page":false,"title":"阿里巴巴规范之代码格式,就照这个写,指定没错","single_mode":true,"gallary_image_count":0,"middle_mode":true,"has_video":false,"video_duration_str":null,"source_url":"/group/6521203857580622350/","source":"Free码农","more_mode":null,"article_genre":"article","has_gallery":false,"video_play_count":9,"image_url":"//p1.pstatp.com/list/190x124/616d0000a92ccd25e1d0","group_id":"6521203857580622350","is_related":true,"media_url":"/c/user/50429504684/"},{"comments_count":11,"is_related":true,"is_feed_ad":false,"is_diversion_page":false,"title":"为什么现在大多数网站是html结尾,很少见以jsp结尾?","single_mode":true,"gallary_image_count":2,"middle_mode":false,"has_video":false,"video_duration_str":null,"source_url":"/group/6520641560437063943/","source":"悟空问答","more_mode":null,"article_genre":"article","has_gallery":false,"video_play_count":0,"image_url":"//p9.pstatp.com/list/190x124/5fed000466c57ba08a65","group_id":"6520641560437063943"},{"comments_count":54,"media_avatar_url":"//p3.pstatp.com/large/1232000228220966c025","is_feed_ad":false,"is_diversion_page":false,"title":"漫画:你别做傻事啊!网上都是骗人的!","single_mode":true,"gallary_image_count":4,"middle_mode":false,"has_video":false,"video_duration_str":null,"source_url":"/group/6521680890379108867/","source":"酒妹漫画","more_mode":null,"article_genre":"article","has_gallery":false,"video_play_count":0,"image_url":"//p1.pstatp.com/list/190x124/5b4d0002615ff39c34f8","group_id":"6521680890379108867","is_related":true,"media_url":"/c/user/52513999763/"},{"comments_count":41,"media_avatar_url":"//p6.pstatp.com/large/5acc000ddff6e2e9dd87","is_feed_ad":false,"is_diversion_page":false,"title":"区块链传奇之国内三大公链:NEO小蚁、Qtum量子链、BTM比原链简介","single_mode":true,"gallary_image_count":7,"middle_mode":false,"has_video":false,"video_duration_str":null,"source_url":"/group/6522058198411641358/","source":"区块链传奇","more_mode":null,"article_genre":"article","has_gallery":false,"video_play_count":0,"image_url":"//p3.pstatp.com/list/190x124/568e00012c96360c2e45","group_id":"6522058198411641358","is_related":true,"media_url":"/c/user/4220372764/"},{"comments_count":30,"is_related":true,"is_feed_ad":false,"is_diversion_page":false,"title":"哪里可以免费下载纯净的windows系统?","single_mode":true,"gallary_image_count":7,"middle_mode":false,"has_video":false,"video_duration_str":null,"source_url":"/group/6521910904311775491/","source":"悟空问答","more_mode":null,"article_genre":"article","has_gallery":false,"video_play_count":0,"image_url":"//p3.pstatp.com/list/190x124/5fee0007abb38ed5a983","group_id":"6521910904311775491"},{"comments_count":3,"media_avatar_url":"//p5a.pstatp.com/large/59360004ec2da4f46ca0","is_feed_ad":false,"is_diversion_page":false,"title":"python 利用PDFMiner包操作PDF","single_mode":true,"gallary_image_count":4,"middle_mode":false,"has_video":false,"video_duration_str":null,"source_url":"/group/6520369401739362824/","source":"python宠儿","more_mode":null,"article_genre":"article","has_gallery":false,"video_play_count":1,"image_url":"//p3.pstatp.com/list/190x124/6165000091454c0d00f1","group_id":"6520369401739362824","is_related":true,"media_url":"/c/user/85632433002/"},{"comments_count":26,"is_related":true,"is_feed_ad":false,"is_diversion_page":false,"title":"怎样获取win10系统最高管理权限?","single_mode":true,"gallary_image_count":4,"middle_mode":false,"has_video":false,"video_duration_str":null,"source_url":"/group/6521483749018829064/","source":"悟空问答","more_mode":null,"article_genre":"article","has_gallery":false,"video_play_count":0,"image_url":"//p3.pstatp.com/list/190x124/5fed000652a9ba2b7e6a","group_id":"6521483749018829064"},{"comments_count":5,"media_avatar_url":"//p9.pstatp.com/large/4e73000078819aca1a3f","is_feed_ad":false,"is_diversion_page":false,"title":"井底之蛙:教你快速搭建Elasticsearch搜索集群,So Easy!","single_mode":true,"gallary_image_count":5,"middle_mode":false,"has_video":false,"video_duration_str":null,"source_url":"/group/6521340756957856269/","source":"井底一只蛙","more_mode":null,"article_genre":"article","has_gallery":false,"video_play_count":0,"image_url":"//p1.pstatp.com/list/190x124/616c0001c227fef6efe4","group_id":"6521340756957856269","is_related":true,"media_url":"/c/user/81230464381/"},{"comments_count":0,"media_avatar_url":"//p2.pstatp.com/large/5e790002d9c4cd2cbb72","is_feed_ad":false,"is_diversion_page":false,"title":"为什么大家总喜欢黑PHP?PHP到底做错了什么","single_mode":true,"gallary_image_count":4,"middle_mode":false,"has_video":false,"video_duration_str":null,"source_url":"/group/6520932110394458637/","source":"加班菌的日常","more_mode":null,"article_genre":"article","has_gallery":false,"video_play_count":0,"image_url":"//p3.pstatp.com/list/190x124/616b000019b6f5fa3a9f","group_id":"6520932110394458637","is_related":true,"media_url":"/c/user/82746053034/"},{"comments_count":8,"media_avatar_url":"//p1.pstatp.com/large/567c0004b787dd3b62c2","is_feed_ad":false,"is_diversion_page":false,"title":"新switch快来了?你会站在哪一边?","single_mode":true,"gallary_image_count":3,"middle_mode":false,"has_video":false,"video_duration_str":null,"source_url":"/group/6522043761189454349/","source":"德意志的情怀","more_mode":null,"article_genre":"article","has_gallery":false,"video_play_count":0,"image_url":"//p3.pstatp.com/list/190x124/616f0002718d2fa9ac3b","group_id":"6522043761189454349","is_related":true,"media_url":"/c/user/51278944390/"}] }, shareInfo: { shareUrl: https://m.toutiao.com/group/6520224868032578062/, abstract: 使用WebScraper插件,不需要编程,也能爬网,使用WebScraper插件。能够创建一个网站地图,并能遍历网站,抓取我们感兴趣的数据,比如,我们登陆淘宝,京东等商务网站。
小编推荐:欲学习电脑技术、系统维护、网络管理、编程开发和安全攻防等高端IT技术,请 点击这里 注册黑基账号,公开课频道价值万元IT培训教程免费学,让您少走弯路、事半功倍,好工作升职加薪!



免责声明:本文由投稿者转载自互联网,版权归原作者所有,文中所述不代表本站观点,若有侵权或转载等不当之处请联系我们处理,让我们一起为维护良好的互联网秩序而努力!联系方式见网站首页右下角。


鲜花

握手

雷人

路过

鸡蛋

相关阅读

发表评论

最新评论


新出炉

返回顶部