web-crawler

Here are 494 public repositories matching this topic...

crawlab
seamusic commented Feb 15, 2020

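Bug description
Following the tutorial documentation, I installed and started everything with docker-compose up -d, but running a task right afterwards fails with an error.
I can't tell where the problem is.
My Docker environment runs on Windows 10.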

`2020-02-15 15:58:04 [scrapy.utils.log] INFO: Scrapy 1.8.0 started (bot: xueqiu)
2020-02-15 15:58:04 [scrapy.utils.log] INFO: Versions: lxml 4.5.0.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 19.10.0, Python 3.6.9 (default, Nov 7 2019, 10:44:02) - [GCC 8.3.0], pyOpenSSL 19

jnioche commented Oct 1, 2018

As is done for ES, we could route the documents in the statusupdaterbolt based on the host name or IP. The spouts would then check that the number of spout instances equals the number of shards and filter their queries per shard accordingly.

At the moment, we can have only one instance of a spout.

https://lucene.apache.org/solr/guide/6_6/shards-and-indexing-data-in-solrcloud.html
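
A rough sketch of the routing idea proposed above, under stated assumptions: the function names (shard_for, check_spout_count, urls_for_spout) are hypothetical and do not come from the project or from Solr, and CRC32 is only a stand-in for whatever hash the backend's router really uses. It shows documents being keyed by host name so that one host always lands on one shard, and each spout instance keeping only the URLs of its own shard.

```python
import zlib
from urllib.parse import urlsplit


def shard_for(url: str, num_shards: int) -> int:
    """Map a URL to a shard index, using its host name as the routing key."""
    # urlsplit(...).hostname is lower-cased, so host case does not affect routing.
    host = urlsplit(url).hostname or ""
    # Assumption: CRC32 stands in for the backend's real routing hash.
    # It is stable across processes, unlike Python's built-in hash() for str.
    return zlib.crc32(host.encode("utf-8")) % num_shards


def check_spout_count(num_spouts: int, num_shards: int) -> None:
    """Fail fast unless there is exactly one spout instance per shard."""
    if num_spouts != num_shards:
        raise ValueError(
            f"expected {num_shards} spout instances, got {num_spouts}"
        )


def urls_for_spout(urls, spout_index: int, num_shards: int):
    """Keep only the URLs that belong to the shard served by this spout."""
    return [u for u in urls if shard_for(u, num_shards) == spout_index]
```

With this layout every URL of a given host is handled by the same spout instance, which is what makes per-shard filtering of the queries possible. In a real deployment the filter would be pushed into the query sent to the shard rather than applied in memory as it is here.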

essiembre commented Feb 24, 2020

@LeMoussel @essiembre Thanks, I would be interested to see that, as I might have to write a committer myself. I need a way to send crawled docs to temporary storage for further processing, which is not possible within the Norconex products.
At the risk of widening this thread too much, is a "committer" the right component to be doing that in? I mean taking the actual crawled files (wheth

benhalverson commented Oct 16, 2017

Environment:

@angular/cli: 1.4.7
node: 8.5.0
os: darwin x64
@angular/common: 4.0.0
@angular/core: 4.0.0
@angular/forms: 4.0.0
@angular/http: 4.0.0
@angular/platform-browser: 4.0.0
@angular/platform-browser-dynamic: 4.0.0
@angular/router: 4.0.0
@angular/cli: 1.4.7
@angular/compiler: 4.0.0
@angular/compiler-cli: 4.0.0
typescript: 2.2.2

Steps to reproduce
siteshooter -in
