scraping-websites
Here are 1,084 public repositories matching this topic...
- Updated Dec 21, 2021 - Python
- Updated Nov 22, 2018 - JavaScript
- Updated Nov 30, 2021 - Python
- Updated Nov 13, 2021 - Python
Currently, on master, file logging is always enabled and cannot be turned off.
I think a good improvement would be to make file logging optional, since not everyone has log rotation configured or needs logs at all (especially ad-hoc users).
Mentioned first here: elixir-crawly/crawly#155 (comment)
Some improvements I can think of:
- optional switch to enabl
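As a rough illustration of the "optional file logging" idea, here is a minimal Python sketch. It is not Crawly's (Elixir) implementation or configuration; the function name `build_logger` and the `log_file` parameter are hypothetical. The assumption is simply that file output is attached only when a path is supplied.

```python
import logging
from typing import Optional


def build_logger(name: str, log_file: Optional[str] = None) -> logging.Logger:
    """Return a logger that writes to a file only when log_file is given.

    Hypothetical helper: name and signature are illustrative assumptions.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False  # keep records out of the root logger

    if log_file is None:
        # File logging is off: records are discarded instead of written.
        logger.addHandler(logging.NullHandler())
    else:
        handler = logging.FileHandler(log_file, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Ad-hoc users would call `build_logger("spider")` and get no log file, while users with log rotation in place could pass a path explicitly.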
- Updated Oct 18, 2021 - Python
- Updated Jun 12, 2020 - Go
- Updated Apr 15, 2021 - TypeScript
- Updated Jan 31, 2021 - Python
- Updated Jan 3, 2022 - Python
- Updated Jun 22, 2021 - TypeScript
It would be beneficial to return the URL of the sitemap.xml file directly.
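One way such a lookup could work, sketched in Python under stated assumptions: sitemaps are commonly declared with a `Sitemap:` line in robots.txt, and `/sitemap.xml` at the site root is a common fallback location. The function name `sitemap_urls` is hypothetical and is not taken from the project this request refers to.

```python
from typing import List
from urllib.parse import urljoin


def sitemap_urls(base_url: str, robots_txt: str) -> List[str]:
    """Return sitemap URLs declared in robots.txt, else the conventional default.

    Hypothetical helper: it takes the already-downloaded robots.txt text,
    so no network access happens here.
    """
    found = []
    for line in robots_txt.splitlines():
        # Split on the first colon only, so the URL's own "://" stays intact.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    # Fall back to the conventional location at the site root.
    return found or [urljoin(base_url, "/sitemap.xml")]
```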
- Updated Jul 17, 2021 - Go
- Updated Feb 28, 2019 - Python
- Updated Jun 29, 2021 - C#
- Updated Nov 4, 2021 - Java
- Updated Dec 12, 2021 - Python
Is your feature request related to a problem? Please describe.
It would be useful to capture whether a listing is Buy It Now (BIN) or Auction and, for auctions, how many bids it received.
Describe the solution you'd like
Add two more columns to the df, "Listing Type" and "Number of Bids"; for BIN listings the latter should be 0. Really, we could get away with just one column, as all BIN wil
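The two requested columns could be derived roughly as follows. This is a sketch in plain Python rather than the project's own DataFrame code, and it assumes each scraped row carries a raw text field (here called `bid_text`, a hypothetical name) that reads like "3 bids" for auctions and contains no bid count for Buy It Now listings.

```python
import re
from typing import Dict, List

# Matches "3 bids", "1 bid", "0 bids" (case-insensitive).
BIDS_RE = re.compile(r"(\d+)\s+bids?", re.IGNORECASE)


def add_listing_columns(rows: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Return copies of rows with "Listing Type" and "Number of Bids" added."""
    out = []
    for row in rows:
        match = BIDS_RE.search(row.get("bid_text", ""))
        new_row: Dict[str, object] = dict(row)
        # Assumption: a visible bid count means the listing is an auction.
        new_row["Listing Type"] = "Auction" if match else "Buy It Now"
        new_row["Number of Bids"] = int(match.group(1)) if match else 0
        out.append(new_row)
    return out
```

Keeping both columns distinguishes an auction with no bids yet from a Buy It Now listing, which a single bid-count column could not do.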
Pareto Plot Warning
The idea is to have an option like option 3 (do a Google search, save the URLs found, and search them for emails), but one that searches a list of phrases instead of a single query.
The list could be read from a .txt file.
The option could also ask how many Google search results to use.
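A minimal Python sketch of that flow, under stated assumptions: phrases are stored one per line in the .txt file, and the search and download steps are passed in as callables. The names `load_phrases`, `emails_for_phrases`, `search`, and `fetch` are all hypothetical and do not come from the project; the email pattern is a simple approximation, not a full address validator.

```python
import re
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Set

# Simple approximation of an email address; not a full validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def load_phrases(path: str) -> List[str]:
    """Read one search phrase per line, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def emails_for_phrases(
    phrases: Iterable[str],
    search: Callable[[str, int], List[str]],  # hypothetical: phrase, limit -> URLs
    fetch: Callable[[str], str],              # hypothetical: URL -> page text
    max_results: int,
) -> Dict[str, Set[str]]:
    """Map each phrase to the set of emails found on its result pages."""
    results: Dict[str, Set[str]] = {}
    for phrase in phrases:
        found: Set[str] = set()
        for url in search(phrase, max_results):
            found.update(EMAIL_RE.findall(fetch(url)))
        results[phrase] = found
    return results
```

Injecting `search` and `fetch` keeps the sketch independent of any particular search backend and makes it testable with stubs.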
- Updated Apr 7, 2021 - Python
- Updated Dec 13, 2021 - Python
- Updated Aug 17, 2021 - Python
- Updated Dec 5, 2020 - Python
Download video
Would a PR adding a video download feature be welcome, or is the gem intended for audio only?
- Updated Dec 18, 2021 - JavaScript
- Updated Mar 24, 2021 - Python
- Updated Aug 23, 2021 - Python
- Updated Dec 29, 2021 - TypeScript
Unless I missed something, the documentation doesn't explain how to query document metadata (searching "site:montferret.dev metadata" through Google returned nothing, and neither did grepping the source code).
As an example, I tried to query the `og:url` metadata. I tried variations of `//meta[property='og:url']::attr(content)`, with or without the leading `//`, and with or without the `attr(conte
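
For comparison only, here is how the same `og:url` lookup can be done outside Ferret with Python's standard `html.parser` module. This is a swapped-in technique, not Ferret's query language, and the names `MetaCollector` and `page_metadata` are hypothetical. As a side note, in standard XPath an attribute test is written with `@`, as in `//meta[@property='og:url']/@content`; whether Ferret accepts that form is not confirmed here.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple


class MetaCollector(HTMLParser):
    """Collect <meta> tags keyed by their property (or name) attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: Dict[str, str] = {}

    def handle_starttag(
        self, tag: str, attrs: List[Tuple[str, Optional[str]]]
    ) -> None:
        if tag != "meta":
            return
        attr_map = dict(attrs)
        # Open Graph tags use property="..."; classic tags use name="...".
        key = attr_map.get("property") or attr_map.get("name")
        content = attr_map.get("content")
        if key and content is not None:
            self.meta.setdefault(key, content)  # keep the first occurrence


def page_metadata(html: str) -> Dict[str, str]:
    """Return a mapping such as {"og:url": "..."} for the given HTML text."""
    collector = MetaCollector()
    collector.feed(html)
    return collector.meta
```

With the page source in hand, `page_metadata(html).get("og:url")` yields the value of the tag's `content` attribute, or `None` when the tag is absent.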