Updated Oct 9, 2021 - Shell
#text-processing
Here are 1,007 public repositories matching this topic...
Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.
Updated Jul 21, 2021 - Python
Intuitive find & replace CLI (sed alternative)
Updated Oct 8, 2021 - Rust
fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.
natural-language-processing
deep-learning
text-classification
chinese-nlp
text-processing
nlp-parsing
nlp-library
Updated Sep 25, 2021 - Python
Text Classification Algorithms: A Survey
deep-learning
random-forest
text-classification
recurrent-neural-networks
naive-bayes-classifier
dimensionality-reduction
logistic-regression
document-classification
convolutional-neural-networks
text-processing
decision-trees
boosting-algorithms
support-vector-machines
hierarchical-attention-networks
nlp-machine-learning
conditional-random-fields
k-nearest-neighbours
deep-belief-network
rocchio-algorithm
deep-neural-network
Updated Apr 9, 2021 - Python
Python library for creating PEG parsers
python
parsing
parser-combinators
python3
parsing-expression-grammar
python-3
text-processing
python-2
python2
parsing-library
peg-parsers
Updated Oct 13, 2021 - Python
Program to convert lines of text into a tree structure.
Updated Jun 13, 2021 - Go
Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and Twitter, 330 million English tweets).
nlp
tokenizer
text-processing
semeval
nlp-library
word-segmentation
spelling-correction
tokenization
text-segmentation
spell-corrector
word-normalization
Updated Feb 8, 2021 - Python
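The hashtag-splitting step mentioned in the Ekphrasis description can be illustrated with a minimal sketch of statistics-based word segmentation. This is not Ekphrasis's actual code or API: the `COUNTS` table, `word_logprob`, and `segment` are hypothetical names, and the counts are toy values standing in for real corpus statistics.

```python
import math
from functools import lru_cache

# Toy unigram counts (hypothetical values, not real corpus statistics).
COUNTS = {
    "make": 50, "america": 30, "great": 40, "again": 35,
    "a": 100, "gain": 5, "me": 20, "rica": 1,
}
TOTAL = sum(COUNTS.values())
MAX_WORD_LEN = 20  # assumed upper bound on the length of a single word


def word_logprob(word):
    """Log-probability of a word under the toy unigram model."""
    count = COUNTS.get(word)
    if count is None:
        # Unknown words are penalized more heavily the longer they are.
        return math.log(1.0 / (TOTAL * 10 ** len(word)))
    return math.log(count / TOTAL)


def segment(text):
    """Split text into the word sequence with the highest total log-probability."""
    text = text.lower()

    @lru_cache(maxsize=None)
    def best(start):
        # Returns (score, words) for the best segmentation of text[start:].
        if start == len(text):
            return 0.0, ()
        candidates = []
        for end in range(start + 1, min(len(text), start + MAX_WORD_LEN) + 1):
            word = text[start:end]
            rest_score, rest_words = best(end)
            candidates.append((word_logprob(word) + rest_score, (word,) + rest_words))
        return max(candidates)

    return list(best(0)[1])


print(segment("MakeAmericaGreatAgain"))
```

With these toy counts, the hashtag text is split into "make", "america", "great", "again"; a real tool would draw its counts from large corpora such as those named in the description.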
A fast implementation of Aho-Corasick in Rust.
Updated Sep 15, 2021 - Rust
A simple Python module for parsing human names into their individual components
Updated Jun 22, 2021 - Python
Open Korean Text Processor - An Open-source Korean Text Processor
natural-language-processing
tokenizer
korean
text-processing
korean-text-processing
korean-tokenizer
Updated Mar 1, 2021 - Scala
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).
python
nlp
machine-learning
natural-language-processing
library
linguistics
computational-linguistics
text-processing
nlp-library
search-algorithms
evaluation-metrics
folia
language-modelling
Updated Mar 13, 2019 - Python
python
spam
data-science
machine-learning
text-mining
data-mining
text-classification
metrics
text
text-analysis
python3
classification
text-processing
python2
spam-filtering
spam-detection
spam-classification
adversarial-examples
black-box-attacks
black-box-benchmarking
Updated Oct 14, 2018 - Python
nlp
go
natural-language-processing
language-detection
language-modeling
golang-library
text-processing
nlp-machine-learning
language-recognition
language-processing
language-identification
language-classification
Updated Jul 8, 2021 - Go
Textpipe: clean and extract metadata from text
Updated Jun 9, 2021 - Python
Open issue: stringi cheat sheet
waynelapierre commented on Jan 1, 2021
Are there any cheat sheets available for stringi, like this one for stringr: http://edrub.in/CheatSheets/cheatSheetStringr.pdf
A cheat sheet would make things more efficient, since base R, stringr, and stringi have similar but distinct syntax, which can be confusing at times.
A low level regular expression library that uses deterministic finite automata.
Updated Sep 17, 2021 - Rust
Automatic Korean word spacing with Python
Updated Oct 11, 2021 - Python
Open issue: Support for stdin
ad-si commented on Jan 8, 2019
Pure-Python Japanese character interconverter for Hiragana, Katakana, Hankaku, and Zenkaku
transliteration
japanese-language
text-processing
pure-python
preprocessing
character-converter
japanese-kana
julius
Updated Oct 11, 2021 - Python
Text vectorization tool to outperform TFIDF for classification tasks
python
nlp
machine-learning
natural-language-processing
text-classification
text-analysis
tf-idf
text-processing
Updated Dec 3, 2020 - Python
A tool that lets you detect and translate text.
nlp
recognition
deep-learning
text
craft
pytorch
text-recognition
text-processing
ocr-recognition
crnn
scene-text-detection
scene-text-detectors
Updated Sep 10, 2019 - Python
Python library for Natural Language Preprocessing (NLPre)
Updated Jun 29, 2021 - Python
nlp
natural-language-processing
japanese
text-processing
mecab
kytea
sudachi
allennlp
sentencepiece
janome
Updated Sep 26, 2021 - Python
Utility collection for Japanese text processing: Hiraganize, Katakanize, and Romanize.
Updated Aug 27, 2020 - JavaScript
Extract indicators of compromise from text, including "escaped" ones.
ioc
text-mining
data-mining
command-line
regex
regexp
extract
extraction
command-line-tool
text-processing
iocs
defang
indicators-of-compromise
escaping
Updated Apr 19, 2020 - Go
I'd like to be able to run commands on all lines of a file. For example,
bsed wrap lines with "
should execute on all lines of the file. The current workaround is to include a trivial filter, like: wrap lines containing '.' with "
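To make the requested behavior concrete, here is a minimal Python sketch of the difference between wrapping every line and wrapping only lines that pass a filter. It is not bsed's implementation: `wrap_lines` and its `containing` parameter are hypothetical names, and the filter is assumed to be a literal substring match.

```python
def wrap_lines(lines, wrapper, containing=None):
    """Wrap each line with `wrapper`.

    If `containing` is given, only lines that include that literal
    substring are wrapped; all other lines pass through unchanged.
    """
    return [
        f"{wrapper}{line}{wrapper}"
        if containing is None or containing in line
        else line
        for line in lines
    ]


lines = ["first line.", "second line"]

# Requested behavior: no filter, so every line is wrapped.
print(wrap_lines(lines, '"'))

# Workaround-style call: only lines containing '.' are wrapped.
print(wrap_lines(lines, '"', containing="."))
```

Under the literal-match assumption, the filtered call leaves lines without a period untouched, which is why a filter-free form would be more robust than the workaround.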