webscraping

When a site serves a webpage under /folder/file1.html as well as under /folder itself, the two paths conflict on disk.
For the first URL, suckit creates a local folder; for the second, it tries to save the webpage at the same path as that folder and crashes:

[ERROR] Couldn't create fusor.net/old-boards/songs.com: Is a directory (os error 21)

thread '<unnamed>' panicked at 'Couldn't create fu
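One common way to avoid this kind of collision is to store the page inside the folder as index.html whenever the target path is already a directory. The sketch below illustrates that idea in Python; it is not suckit's code or its actual fix, and the names `resolve_save_path` and `save_page` are hypothetical helpers introduced only for illustration.

```python
from pathlib import Path


def resolve_save_path(root: Path, url_path: str) -> Path:
    """Map a URL path to a local file path (hypothetical helper)."""
    target = root.joinpath(*[part for part in url_path.split("/") if part])
    if target.is_dir():
        # A deeper URL already created this folder, so keep the page inside it.
        return target / "index.html"
    return target


def save_page(root: Path, url_path: str, body: bytes) -> Path:
    """Write a page to disk, handling both orders of the file/folder conflict."""
    path = resolve_save_path(root, url_path)
    # If an ancestor was saved earlier as a plain file, convert it into a
    # folder and keep its old content as index.html inside that folder.
    for ancestor in reversed(path.parents):
        if ancestor.is_file():
            content = ancestor.read_bytes()
            ancestor.unlink()
            ancestor.mkdir()
            (ancestor / "index.html").write_bytes(content)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(body)
    return path
```

With this layout, /folder ends up at folder/index.html no matter whether it is downloaded before or after /folder/file1.html. One trade-off is that a real page named /folder/index.html would still collide, so a real tool would need a further rule for that case.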

Download by file extension
Download by MIME type, e.g. png should also match the image/png MIME type

dude scrape ... --download png,jpg  # download all png and jpg files
dude scrape ... --download *  # download all files
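The matching rule requested here (extension first, MIME type as a fallback, `*` for everything) could look roughly like the sketch below. This is an assumption about how such a filter might behave, not dude's implementation: `should_download` and its parameters are hypothetical, and the extension-to-MIME lookup relies on Python's standard `mimetypes` module.

```python
import mimetypes
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse


def should_download(url: str, content_type: Optional[str], wanted: str) -> bool:
    """Decide whether a resource matches a comma-separated pattern list.

    `wanted` mirrors the proposed --download value, e.g. "png,jpg" or "*".
    """
    patterns = {p.strip().lower().lstrip(".") for p in wanted.split(",") if p.strip()}
    if "*" in patterns:
        return True

    # 1) Match on the file extension in the URL path (query string ignored).
    extension = PurePosixPath(urlparse(url).path).suffix.lower().lstrip(".")
    if extension in patterns:
        return True

    # 2) Fall back to the Content-Type header, so "png" also matches image/png.
    if content_type:
        mime = content_type.split(";", 1)[0].strip().lower()
        for pattern in patterns:
            guessed, _ = mimetypes.guess_type(f"file.{pattern}")
            if guessed == mime:
                return True
    return False
```

The MIME fallback matters for URLs without a file extension, such as an image served from a path like /img?id=1, where only the response header reveals the type.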


Here are 4,547 public repositories matching this topic...

huginn / huginn

alirezamika / autoscraper

niespodd / browser-fingerprinting

anaskhan96 / soup

holgerd77 / django-dynamic-scraper

maxhumber / gazpacho

Skallwar / suckit

Panic when folder path with dot serves a webpage

Fonts download support

Describe the benefit of each feature in the README

benibela / xidel

[Request] Integrate the EXPath Binary Module

openaustralia / morph

TheCodeMonks / NYTimes-App

chris-greening / instascrape

m8r0wn / CrossLinked

rootVIII / proxy_requests

ayushi7rawat / Youtube-Projects

jchao01 / TradingView-data-scraper

salimk / Rcrawler

yusuzech / r-web-scraping-cheat-sheet

dmi3kno / polite

roniemartinez / dude

Option to download/save files by extension

Selector for JSON contents

Add Autoscraper to Project

davidteather / TikTokBot

owainlewis / falkor

mthipparthi / operating-systems-three-easy-pieces

decryptr / decryptr

sushant10 / HQ_Bot

milaan9 / 91_Python_Mini_Projects

s32x / anirip

0xPrateek / Stardox

feddelegrand7 / ralger

pwlmaciejewski / imghash

CuriousLearner / GeeksForGeeksScrapper
