
text-mining

Here are 1,695 public repositories matching this topic...

texthero
henrifroese commented Sep 23, 2020

It would be great to have friendlier and more fun text content in the doctests (instead of "Aha", "Text", ...). It would also be nicer for users if the docstring examples were consistent with one another.

One idea is to use famous lines spoken by movie superheroes. Here are a few examples:

  • I have the power!
  • Flame on!
  • HULK SMASH!
  • Holy ____ Batman!
  • I am the vengeance, I am the night, I am BATMAN!
Labels: good first issue, testing
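To make the proposal concrete, here is a minimal sketch of a docstring written in that style. The function `lowercase` is hypothetical and works on a plain Python list so the example stays self-contained; texthero's own functions operate on pandas Series, so a real texthero docstring would look different. The `>>>` examples are checked with the standard-library `doctest` module.

```python
import doctest


def lowercase(texts):
    """Return a new list with every text converted to lowercase.

    Hypothetical example function: the docstring uses superhero lines
    as sample input, as proposed above.

    >>> lowercase(["HULK SMASH!", "Flame on!"])
    ['hulk smash!', 'flame on!']
    >>> lowercase(["I am the vengeance, I am the night, I am BATMAN!"])
    ['i am the vengeance, i am the night, i am batman!']
    """
    # Lowercase each string and collect the results in a new list.
    return [text.lower() for text in texts]


if __name__ == "__main__":
    # Run the docstring examples above as tests.
    doctest.testmod(verbose=True)
```

Reusing the same small set of quotes across all docstrings keeps the examples consistent, and running them through `doctest` ensures the shown outputs stay correct.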

Open Source research tool to search, browse, analyze and explore large document collections, using a Semantic Search Engine and an Open Source Text Mining & Text Analytics platform (integrates ETL for document processing, OCR for images & PDFs, named entity recognition for persons, organizations & locations, metadata management by thesaurus & ontologies, and a search user interface & search apps for full-text search, faceted search & knowledge graphs).

  • Updated Mar 7, 2022
  • Shell
adbar commented Jan 9, 2020

I have mostly tested trafilatura on a set of English, German and French web pages that I came across while browsing or during web crawls. There are certainly other web pages, including pages in other languages, for which extraction does not work yet.

Corresponding bug reports can be filed either as a list in an issue like this one or in the code as XPath expressions in [xpaths.py](https://github.com

Labels: good first issue, up for grabs
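As a rough illustration of what recording a failing case "as an XPath expression" could look like, here is a minimal sketch of an ordered list of content XPaths tried one after another. The names `CONTENT_XPATHS` and `extract_main_text` and the expressions themselves are hypothetical, not taken from trafilatura's xpaths.py. The sketch also assumes well-formed XHTML input, because it uses the standard-library `xml.etree.ElementTree`, which supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical XPath expressions for the main content, tried in order.
# A page that currently fails would be handled by appending one more entry.
CONTENT_XPATHS = [
    ".//article",
    ".//div[@class='post-content']",
    ".//div[@id='main']",
]


def extract_main_text(xhtml):
    """Return the text of the first element matched by CONTENT_XPATHS, or None."""
    root = ET.fromstring(xhtml)  # requires well-formed XML
    for xpath in CONTENT_XPATHS:
        element = root.find(xpath)
        if element is not None:
            # Join all text nodes with spaces, then collapse runs of whitespace.
            return " ".join(" ".join(element.itertext()).split())
    return None


page = (
    '<html><body><div class="nav">Menu</div>'
    '<div class="post-content"><p>Hello</p><p>world</p></div></body></html>'
)
print(extract_main_text(page))
```

Real-world HTML is often not well-formed XML, so an actual extractor would more likely use an HTML parser such as `lxml.html`, which also offers full XPath 1.0 support.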
