-
Updated
Dec 7, 2021
#
text-mining
Here are 1,695 public repositories matching this topic...
nlp
language
machine-learning
natural-language-processing
text-mining
awesome
deep-learning
awesome-list
extract text from any document. no muss. no fuss.
-
Updated
Mar 21, 2022 - HTML
Library to scrape and clean web pages to create massive datasets.
python
nlp
data-science
natural-language-processing
text-mining
open
artificial-intelligence
language-model
-
Updated
Nov 11, 2020 - Python
Beautiful visualizations of how language differs among document types.
visualization
d3
nlp
machine-learning
natural-language-processing
text-mining
word2vec
exploratory-data-analysis
word-embeddings
sentiment
eda
topic-modeling
scatter-plot
japanese-language
stylometry
computational-social-science
text-visualization
text-as-data
stylometric
semiotic-squares
-
Updated
Nov 15, 2021 - Python
a curated list of R tutorials for Data Science, NLP and Machine Learning
-
Updated
Oct 19, 2020 - R
A curated list of resources dedicated to text summarization
nlp
machine-learning
natural-language-processing
text-mining
deep-learning
extractive-text-summarization
abstractive-text-summarization
-
Updated
Jan 19, 2022
Python package for Korean natural language processing.
-
Updated
Mar 3, 2022 - Python
Manuscript of the book "Tidy Text Mining with R" by Julia Silge and David Robinson
-
Updated
Feb 7, 2022 - TeX
Text mining using tidy tools ✨ 📄 ✨
-
Updated
Mar 23, 2022 - R
AutoPhrase: Automated Phrase Mining from Massive Text Corpora
-
Updated
Jan 27, 2022 - C++
A configurable web spider with an easy-to-use web console
-
Updated
Aug 21, 2018 - Java
Starter code to solve real world text data problems. Includes: Gensim Word2Vec, phrase embeddings, Text Classification with Logistic Regression, word count with pyspark, simple text preprocessing, pre-trained embeddings and more.
nlp
count
machine-learning
natural-language-processing
text-mining
practice
article
text-classification
word2vec
gensim
tf-idf
-
Updated
Dec 2, 2020 - Jupyter Notebook
Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK.
-
Updated
Sep 18, 2021 - Python
Fast vectorization, topic modeling, distances and GloVe word embeddings in R.
natural-language-processing
text-mining
word2vec
word-embeddings
efficiency
topic-modeling
vignette
glove
vectorization
latent-dirichlet-allocation
-
Updated
Sep 19, 2020 - R
A collection of notebooks for Natural Language Processing from NLP Town
-
Updated
May 25, 2021 - Jupyter Notebook
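Crawls historical news text about listed companies (individual stocks) from Sina Finance, National Business Daily (NBD), JRJ, China Securities Network and Securities Times Network, performs text analysis and extracts feature sets, then trains classifiers such as SVM and random forest, and finally classifies newly crawled news data.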
-
Updated
Jan 20, 2021 - Python
Fast topic modeling platform
python
c-plus-plus
machine-learning
text-mining
bigdata
topic-modeling
python-api
bigartm
regularizer
-
Updated
Feb 12, 2022 - C++
Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Text Mining & Text Analytics platform (Integrates ETL for document processing, OCR for images & PDF, named entity recognition for persons, organizations & locations, metadata management by thesaurus & ontologies, search user interface & search apps for fulltext search, faceted search & knowledge graph)
python
search
search-engine
semantic
text-mining
ocr
osint
ui
annotation
thesaurus
text-analysis
journalism
faceted-search
named-entity-recognition
research-tool
search-interface
ontologies
skos
fulltext-search
investigative-journalism
-
Updated
Mar 7, 2022 - Shell
R package for web-based interactive topic model visualization.
-
Updated
Aug 20, 2020 - JavaScript
Repository with everything necessary for sentiment analysis and related areas
nlp
text-mining
sentiment-analysis
text-analysis
lexicon
sentiment-lexcions
opinion-mining
sentiment-polarity
nlp-machine-learning
sentiment-classifier
sentiment-classification
-
Updated
Oct 19, 2021
Resources for learning about Text Mining and Natural Language Processing
nlp
list
machine-learning
natural-language-processing
text-mining
data-mining
awesome
sentiment-analysis
text-classification
text-analysis
topic-modeling
awesome-list
text-analytics
nlp-machine-learning
-
Updated
Jul 12, 2021
Various Algorithms for Short Text Mining
python
package
machine-learning
natural-language-processing
text-mining
algorithm
neural-network
python-library
topic-modeling
-
Updated
Dec 30, 2021 - Python
adbar
commented
Jan 9, 2020
I have mostly tested trafilatura on a set of English, German and French web pages I had run into by surfing or during web crawls. There are definitely further web pages and cases in other languages for which the extraction doesn't work so far.
Corresponding bug reports can either be filed as a list in an issue like this one or in the code as XPath expressions in [xpaths.py](https://github.com
Language, Knowledge, Cognition
python
nlp
macos
natural-language-processing
text-mining
algorithm
study
knowledge
philosophy
text-analysis
entities
artificial-intelligence
knowledge-graph
cognitive-science
knowledge-base
nature
knowledge-representation
computational-social-science
natural-language-understanding
hypergraphs
beliefs
digital-media
hypergraph-knowledge
-
Updated
Feb 24, 2022 - C
RMDL: Random Multimodel Deep Learning for Classification
machine-learning
information-retrieval
text-mining
data-mining
deep-neural-networks
deep-learning
text-classification
tensorflow
keras
cnn
dnn
recurrent-neural-networks
classification
rnn
image-classification
ensemble-learning
convolutional-neural-networks
multimodel
-
Updated
Mar 8, 2022 - Python
python
spam
data-science
machine-learning
text-mining
data-mining
text-classification
metrics
text
text-analysis
python3
classification
text-processing
python2
spam-filtering
spam-detection
spam-classification
adversarial-examples
black-box-attacks
black-box-benchmarking
-
Updated
Jan 7, 2022 - Python
Machine Learning Lectures at the European Space Agency (ESA) in 2018
machine-learning
text-mining
lectures
deep-learning
neural-network
random-forest
clustering
linear-regression
pca
topic-modeling
machinelearning
tf-idf
decision-trees
support-vector-machines
lecture-videos
lecture-material
lecture-slides
anomaly-detection
-
Updated
Aug 29, 2020 - Jupyter Notebook
It would be great to have more friendly and funny doctest text content (instead of "Aha", "Text", ...). It's also nicer for users if the docstring examples are all similar.
One idea, for instance, is to use famous sentences said by movie Superheroes. Here are a few examples:
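As a rough illustration of the idea (not taken from the issue itself), a docstring example in this style might look like the sketch below. The helper `count_words` and the choice of quotes are hypothetical and serve only to show the format.

```python
import doctest


def count_words(text: str) -> int:
    """Count whitespace-separated words in a string.

    Hypothetical helper, used only to illustrate the docstring style.

    >>> count_words("I am Groot")
    3
    >>> count_words("I am Iron Man")
    4
    """
    return len(text.split())


if __name__ == "__main__":
    # Run the examples embedded in the docstring above.
    print(doctest.testmod())
```

Using the same small pool of quotes across all docstrings would keep the examples consistent from one function to the next.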