Skip to content
#

text-extraction

Here are 118 public repositories matching this topic...

gunnsth
gunnsth commented Oct 21, 2019

Is your feature request related to a problem? Please describe.
The problem is inefficiency when simply looking for a single operand and then stopping processing.
For example, if only looking for a single colored pixel in a page.

Describe the solution you'd like
It would make sense to be able to set a stop flag on the processor and return out of the handler, which would cause the proc

adbar
adbar commented Jan 9, 2020

I have mostly tested trafilatura on a set of English, German and French web pages I had run into by surfing or during web crawls. There are definitely further web pages and cases in other languages for which the extraction doesn't work so far.

Corresponding bug reports can either be filed as a list in an issue like this one or in the code as XPath expressions in [xpaths.py](https://github.com

Improve this page

Add a description, image, and links to the text-extraction topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the text-extraction topic, visit your repo's landing page and select "manage topics."

Learn more