html-parsing

I have mostly tested htmldate on a set of English, German and French web pages I had run into by surfing or during web crawls. There are definitely further web pages and cases in other languages for which the extraction of a date doesn't work so far.

Please install the dateparser library beforehand, as it significantly extends linguistic coverage: `pip` or `pip3 install -U dateparser` or `pi

Hi, I'm currently parsing a website that requires authentication to access some of its content.
So I log in with a browser, copy all the cookies it creates, and add them to each parse request.

    void editRequest(BoundRequestBuilder req, * *, * *) {
        cookieService.cookies.each { req.addCookie(it) }
    }

That works like a charm until the website changes the cookies in a response - shortly after this my c
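One way to picture the problem described in this excerpt is to keep the stored cookies in sync with whatever the server sends back in its `Set-Cookie` headers. The following is only a minimal sketch using Python's standard library, not the Groovy of the excerpt and not the API of the scraper in question; the function names `merge_set_cookie` and `cookie_header` are hypothetical, and treating `Max-Age=0` as a deletion is a simplifying assumption.

```python
from http.cookies import SimpleCookie


def merge_set_cookie(stored, set_cookie_headers):
    """Return a copy of `stored` (name -> value) updated from Set-Cookie headers.

    Hypothetical helper: a cookie sent with Max-Age=0 is treated as deleted,
    any other cookie overwrites the stored value of the same name.
    """
    merged = dict(stored)  # do not mutate the caller's dictionary
    for header in set_cookie_headers:
        parsed = SimpleCookie()
        parsed.load(header)  # parses "name=value; Attr=..." into morsels
        for name, morsel in parsed.items():
            if morsel["max-age"] == "0":
                merged.pop(name, None)
            else:
                merged[name] = morsel.value
    return merged


def cookie_header(stored):
    """Build the value of a Cookie request header from the stored cookies."""
    return "; ".join(f"{name}={value}" for name, value in stored.items())


# Cookies copied from the browser session, then refreshed after a response.
cookies = {"session": "old", "lang": "en"}
cookies = merge_set_cookie(
    cookies,
    ["session=new; Path=/; HttpOnly", "lang=deleted; Max-Age=0"],
)
print(cookie_header(cookies))
```

Calling the merge step after every response, before the next request is built, would keep a rotated session cookie from going stale; expiry dates, domains and paths are deliberately ignored here.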

Here are 69 public repositories matching this topic...

PuerkitoBio / goquery

inikulin / parse5

cezheng / Fuzi

ruippeixotog / scala-scraper

milesj / interweave

miso-belica / jusText

bookieio / breadability

petdance / htmlparsing

adbar / htmldate

Test htmldate on further web pages and report bugs

Check the language, clarity and consistency of documentation

ange007 / HTMLp

liuderchi / ide-html

MauriceConrad / XML-Parser

digitalfondue / jfiveparse

whimtrip / jwht-scrapper

Headers from response

mohaxspb / ScpFoundationRu

peterhil / slurp

siongui / go-facebook-post-parser

whimtrip / jwht-htmltopojo

shabanali-faghani / IUST-HTMLCharDet

bradmontgomery / django-janitor

emmanuelroecker / php-simply-html

hrbrmstr / drill-html-tools

ktodorov / go-summarizer

heinrichreimer / android-wg-planer

raymccrae / swift-htmlsaxparser

rgladwell / microtesia

OpenBookPublishers / geturls

decal / cgiaudit

bestrandomnameever / MangaUpdates-iOSApp

AntoData / WebScraperAllMusic
