spider

I want to get the price (the value in the red frame).

	c.OnHTML("div[id=price]", func(e *colly.HTMLElement) {
		fmt.Printf("test----%+v\n", e)
		price, err := strconv.ParseFloat(e.Text, 64)
		// price := e.Text
		fmt.Printf("********* price----%+v\n", price)

		if err != nil {

			fmt.Pri

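I don't know why my spider isn't scraping anything; the JSON file is 0 KB. I changed a little in the spider: from scrapy.spiders import Spider (because an error said to use spiders). I also changed log to logging. I can't really make sense of the output, so I'd appreciate any pointers.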

D:\LZZZZB\w3school>scrapy crawl w3school
2017-06-21 22:33:03 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: w3school)
2017-06-21 22:33:03 [scrapy.utils.log] INFO: Overridden settings: {'BOT_NAME': 'w3school', 'NEWSPIDER_MODULE': 'w3school.spiders', 'ROBOTSTXT

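Bug description
Following the tutorial documentation, I installed and started everything with docker-compose up -d, but running a task right afterwards fails with an error.
I'm not sure what is wrong.
My Docker environment is Windows 10.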

2020-02-15 15:58:04 [scrapy.utils.log] INFO: Scrapy 1.8.0 started (bot: xueqiu)
2020-02-15 15:58:04 [scrapy.utils.log] INFO: Versions: lxml 4.5.0.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 19.10.0, Python 3.6.9 (default, Nov 7 2019, 10:44:02) - [GCC 8.3.0], pyOpenSSL 19

Reflect this kind of thing:

{
method : 'POST',
form : { key: 'value', key2: 'value'}
}

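I ran the following 4 commands. IPs are being collected, but I can't retrieve them when calling from the Python client.

Start the scrapy workers, including the proxy IP crawler and the validator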

python crawler_booter.py --usage crawler
python crawler_booter.py --usage validator
Start the schedulers, including scheduled proxy IP crawling and validation

python scheduler_booter.py --usage crawler
python scheduler_booter.py --usage validator

It would be a much better user experience to use custom widgets for spider args. For example, if we could select a category from a list or enter a URL in a separate field, it would be much easier for end users to work with.

Hi, according to the following links

https://doc.scrapy.org/en/latest/topics/spiders.html#spiderargs
https://scrapyd.readthedocs.io/en/stable/api.html#schedule-json

Parameters can be passed to the Spider class during initialization, but I can't see anywhere to enter them.
I would be grateful if this feature were added.
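
For context, scrapyd's schedule.json endpoint hands any POST parameter it does not recognize to the spider as an argument, so an input form for spider args would ultimately need to produce a request body like the one below. This is only a minimal sketch: the project name, spider name, and the two arguments (category, start_url) are hypothetical, and the list of reserved parameter names reflects my reading of the scrapyd documentation rather than anything stated in this issue.

```python
from urllib.parse import urlencode

# Parameter names that scrapyd's schedule.json interprets itself (assumed
# from the scrapyd docs); anything else is passed on as a spider argument.
RESERVED = {"project", "spider", "setting", "jobid", "priority", "_version"}


def build_schedule_body(project, spider, **spider_args):
    """Return the form-encoded body for a POST to scrapyd's schedule.json."""
    clash = RESERVED.intersection(spider_args)
    if clash:
        raise ValueError(f"reserved scrapyd parameter(s): {sorted(clash)}")
    fields = {"project": project, "spider": spider}
    fields.update(spider_args)
    return urlencode(fields)


body = build_schedule_body(
    "myproject",                          # hypothetical project name
    "myspider",                           # hypothetical spider name
    category="books",                     # hypothetical spider argument
    start_url="http://example.com/list",  # hypothetical spider argument
)
print(body)
```

The resulting string would be sent as the body of a POST to /schedule.json on the scrapyd host; inside the spider, the extra values arrive as keyword arguments to `__init__`.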

I copied the examples/sciencenet_spider.py example and tried to run it using Python 3.6, but:

python sciencenet_spider.py
[2018:04:14 22:21:26] Spider started!
[2018:04:14 22:21:26] Using selector: KqueueSelector
[2018:04:14 22:21:26] Base url: http://blog.sciencenet.cn/
[2018:04:14 22:21:26] Item "Post": 0
[2018:04:14 22:21:26] Requests count: 0
[2018:04:14 22:21:26] Error coun

Linux: HTTPConnectionPool(host='192.168.0.24', port=6801): Max retries exceeded with url: /listprojects.json (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f0a78b2d828>: Failed to establish a new connection: [Errno 111] Connection refused',))
Windows: HTTPConnectionPool(host='localhost', port=6801): Max retries exceeded with url: /jobs (Caused by Ne
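
"[Errno 111] Connection refused" generally means nothing accepted the TCP connection on that host and port, so a first step could be to check reachability separately from the HTTP client. The helper below is a rough sketch, not part of the project in question; `port_is_open` is a hypothetical name, and the host and port values are simply copied from the error messages.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


if __name__ == "__main__":
    # Host and port values taken from the error messages in the report.
    for host in ("192.168.0.24", "localhost"):
        print(host, 6801, port_is_open(host, 6801))
```

If this prints False for a host, the service is probably not running there, is listening on a different port, or is bound to another interface.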

Filestack
Google-Ads-Overview Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36
Google-Ads-Overview Mozilla/5.0 (Linux; U; Android 6.0.1; generic) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Version/4.0 Mobile Safari/537.36
Google-Ads-Overview Mozilla/5.0 (Linux; U; Android 2.3.4; generic) AppleW

Here are 2,050 public repositories matching this topic...

facert / awesome-spider

gocolly / colly

jhao104 / proxy_pool

shengqiangzhang / examples-of-web-crawlers

guyueyingmu / avbook

s0md3v / Photon

henrylee2cn / pholcus

luyishisi / Anti-Anti-Spider

crawlab-team / crawlab

bda-research / node-crawler

SpiderClub / haipproxy

BruceDone / awesome-crawler

tophubs / TopList

gaojiuli / toapi

DormyMo / SpiderKeeper

shiyanhui / dht

jae-jae / QueryList

Gerapy / Gerapy

gaojiuli / gain

howie6879 / owllook

my8100 / scrapydweb

jumper2014 / lianjia-beike-spider

sjdirect / abot

JAVClub / core

hu17889 / go_spider

xianhu / PSpider

JayBizzle / Crawler-Detect

howie6879 / ruia

geziyor / geziyor

wkunzhi / Python3-Spider
