spider

Here are 2,050 public repositories matching this topic...

wawawawo commented Jun 21, 2017

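I don't know why my spider scrapes nothing; the JSON file is 0 KB. I changed one thing in the spider: from scrapy.spiders import Spider (because an error said to use spiders). I also changed log to logging. I can't really make sense of the run output, so I would appreciate any pointers.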

D:\LZZZZB\w3school>scrapy crawl w3school
2017-06-21 22:33:03 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: w3school)
2017-06-21 22:33:03 [scrapy.utils.log] INFO: Overridden settings: {'BOT_NAME': 'w3school', 'NEWSPIDER_MODULE': 'w3school.spiders', 'ROBOTSTXT

crawlab
seamusic commented Feb 15, 2020

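Bug description
Following the tutorial documentation, I installed and started it with docker-compose up -d, then ran a task directly and got an error.
I'm not sure what is wrong.
My Docker environment runs on Windows 10.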

2020-02-15 15:58:04 [scrapy.utils.log] INFO: Scrapy 1.8.0 started (bot: xueqiu)
2020-02-15 15:58:04 [scrapy.utils.log] INFO: Versions: lxml 4.5.0.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 19.10.0, Python 3.6.9 (default, Nov 7 2019, 10:44:02) - [GCC 8.3.0], pyOpenSSL 19

z1220726337 commented Jan 30, 2019

I ran these 4 commands. IPs are being collected, but I can't retrieve them when calling from the Python client.

  • Start the scrapy workers, including the proxy IP crawler and validator

    python crawler_booter.py --usage crawler
    python crawler_booter.py --usage validator

  • Start the scheduler, including scheduled proxy IP crawling and validation

    python scheduler_booter.py --usage crawler
    python scheduler_booter.py --usage validator

endafarrell commented Apr 14, 2018

I copied the examples/sciencenet_spider.py example and tried to run it using python 3.6 - but:

python sciencenet_spider.py
[2018:04:14 22:21:26] Spider started!
[2018:04:14 22:21:26] Using selector: KqueueSelector
[2018:04:14 22:21:26] Base url: http://blog.sciencenet.cn/
[2018:04:14 22:21:26] Item "Post": 0
[2018:04:14 22:21:26] Requests count: 0
[2018:04:14 22:21:26] Error coun
LWsmile commented Nov 27, 2018

Linux: HTTPConnectionPool(host='192.168.0.24', port=6801): Max retries exceeded with url: /listprojects.json (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f0a78b2d828>: Failed to establish a new connection: [Errno 111] Connection refused',))
Windows: HTTPConnectionPool(host='localhost', port=6801): Max retries exceeded with url: /jobs (Caused by Ne

JayBizzle commented Jan 21, 2020
  • Filestack
  • Google-Ads-Overview Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36
  • Google-Ads-Overview Mozilla/5.0 (Linux; U; Android 6.0.1; generic) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Version/4.0 Mobile Safari/537.36
  • Google-Ads-Overview Mozilla/5.0 (Linux; U; Android 2.3.4; generic) AppleW
