scraping-websites

Unless I missed something, the documentation doesn't explain how to query document metadata (searching "site:montferret.dev metadata" on Google returned nothing, and neither did grepping the source code).

As an example, I tried to query the og:url metadata.
I tried variations of //meta[property='og:url']::attr(content), with or without the leading //, and with or without the `attr(conte
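
For comparison, here is a minimal sketch of the same lookup outside Ferret, using only Python's standard library. It does not use Ferret's query language, and the helper names (`MetaPropertyParser`, `get_meta_property`) and the sample HTML are assumptions made up for illustration. In standard XPath, attribute tests take an `@` prefix, so the equivalent expression would be `//meta[@property='og:url']/@content`; whether Ferret accepts that form is not confirmed here.

```python
from html.parser import HTMLParser


class MetaPropertyParser(HTMLParser):
    """Collect <meta property="..." content="..."> pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("property")
        content = attributes.get("content")
        if name and content is not None:
            # Keep the first occurrence if a property is repeated.
            self.properties.setdefault(name, content)


def get_meta_property(html, name):
    """Return the content of the first <meta property=name>, or None if absent."""
    parser = MetaPropertyParser()
    parser.feed(html)
    parser.close()
    return parser.properties.get(name)


# Hypothetical page used only to exercise the helper.
SAMPLE_HTML = """
<html>
  <head>
    <title>Example</title>
    <meta property="og:title" content="Example page">
    <meta property="og:url" content="https://example.com/page" />
  </head>
  <body></body>
</html>
"""

if __name__ == "__main__":
    print(get_meta_property(SAMPLE_HTML, "og:url"))
```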

Currently, in master, logs are always written to a file; there is no way to turn this off.

I think a good improvement for this would be to make the file logging optional, as not everyone has log rotation configured or needs logging (especially ad-hoc users).
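
The idea of an opt-in file log can be sketched in a few lines. This is an illustration in Python rather than Crawly's Elixir code, and the function name `build_logger` and its `log_file` parameter are assumptions, not part of any existing API: console logging is always on, and a file handler is attached only when a path is supplied.

```python
import logging
import sys


def build_logger(name, log_file=None):
    """Return a logger that writes to stderr, and to log_file only if one is given."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Start from a clean slate so repeated calls do not stack handlers.
    logger.handlers.clear()
    logger.propagate = False

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")

    console = logging.StreamHandler(sys.stderr)
    console.setFormatter(formatter)
    logger.addHandler(console)

    # File logging is opt-in: ad-hoc users who pass nothing get no log file.
    if log_file is not None:
        file_handler = logging.FileHandler(log_file)
        file_handler.setFormatter(formatter)
        logger.addHandler(file_handler)

    return logger


if __name__ == "__main__":
    build_logger("adhoc").info("console only, nothing written to disk")
```

Keeping the default at "no file" means users without log rotation never accumulate files they did not ask for.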

Mentioned first here: elixir-crawly/crawly#155 (comment)

Some improvements I can think of are:

optional switch to enabl

It would be beneficial to return the URL of the sitemap.xml file directly.

Is your feature request related to a problem? Please describe.
It would be interesting to capture whether a listing is a Buy It Now or an Auction listing and, for auctions, how many bids it received.

Describe the solution you'd like
Another two columns on the df for "Listing Type" and "Number of Bids" where if BIN the latter should be 0. Really we could get away with just one column as all BIN wil

The idea is to have an option like option 3 (do a Google search, save the URLs found, and search them for emails), but one that searches a list of phrases.

The list can be stored in a .txt file.
The option can ask for the number of Google search results to fetch.

Would a PR for a video download feature be welcome, or is the gem only for audio?


Here are 1,084 public repositories matching this topic...

MontFerret / ferret

Anorov / cloudflare-scrape

csbun / thal

TebbaaX / Katana

Python-World / Python_and_the_Web

elixir-crawly / crawly

AmmeySaini / Edu-Mail-Generator

slotix / dataflowkit

baptisteArno / tinking

avidLearnerInProgress / python-automation-scripts

pawlaczyk / sarenka

jvandenaardweg / linkedin-profile-scraper

spekulatius / PHPScraper

Go-phie / gophie

voliveirajr / seleniumcrawler

RyuzakiH / CloudflareSolverRe

JusticeRage / ApkTrack

fedecalendino / nintendeals

alash3al / scraply

driscoll42 / ebayMarketAnalyzer

DiegoCaraballo / Email-extractor

fernandod1 / Instagram-to-discord

Cartmanishere / zippyshare-scraper

lkuffo / web-scraping

kennethreitz / requests-html

jdaviderb / youtube-audio

unixfox / pupflare

alpdias / instagram-bot

johnbumgarner / newspaper3_usage_overview

satyawikananda / anitop
