# text-processing

Here are 1,007 public repositories matching this topic...

learnbyexample / Command-line-text-processing

⚡ From finding text to search and replace, from sorting to beautifying text and more 🎨

ruby linux command-line regex perl ebook awk sed text-processing grep

Updated Oct 9, 2021
Shell

google / diff-match-patch

Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.

diff match patch text-processing difference

Updated Jul 21, 2021
Python

chmln / sd

Intuitive find & replace CLI (sed alternative)

rust cli terminal command-line regex text-processing

Updated Oct 8, 2021
Rust

fastnlp / fastNLP

fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.

natural-language-processing deep-learning text-classification chinese-nlp text-processing nlp-parsing nlp-library

Updated Sep 25, 2021
Python

kk7nc / Text_Classification

Text Classification Algorithms: A Survey

deep-learning random-forest text-classification recurrent-neural-networks naive-bayes-classifier dimensionality-reduction logistic-regression document-classification convolutional-neural-networks text-processing decision-trees boosting-algorithms support-vector-machines hierarchical-attention-networks nlp-machine-learning conditional-random-fields k-nearest-neighbours deep-belief-network rocchio-algorithm deep-neural-network

Updated Apr 9, 2021
Python

pyparsing / pyparsing

Python library for creating PEG parsers

python parsing parser-combinators python3 parsing-expression-grammar python-3 text-processing python-2 python2 parsing-library peg-parsers

Updated Oct 13, 2021
Python

birchb1024 / frangipanni

Program to convert lines of text into a tree structure.

go golang tree-structure text-processing

Updated Jun 13, 2021
Go

abadojack / whatlanggo

Natural language detection library for Go

nlp go language text-processing

Updated Jan 15, 2021
Go

sstadick / hck

A sharp cut(1) clone.

rust command-line text-processing

Updated Sep 27, 2021
Rust

cbaziotis / ekphrasis

Ekphrasis is a text processing tool geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from two large corpora (English Wikipedia and a Twitter corpus of 330 million English tweets).

nlp tokenizer text-processing semeval nlp-library word-segmentation spelling-correction tokenization text-segmentation spell-corrector word-normalization

Updated Feb 8, 2021
Python

BurntSushi / aho-corasick

A fast implementation of Aho-Corasick in Rust.

search finite-state-machine text-processing aho-corasick substring-matching

Updated Sep 15, 2021
Rust

derek73 / python-nameparser

A simple Python module for parsing human names into their individual components

python text-processing text-parser python-module

Updated Jun 22, 2021
Python

open-korean-text / open-korean-text

Open Korean Text Processor - An Open-source Korean Text Processor

natural-language-processing tokenizer korean text-processing korean-text-processing korean-tokenizer

Updated Mar 1, 2021
Scala

proycon / pynlpl

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).

python nlp machine-learning natural-language-processing library linguistics computational-linguistics text-processing nlp-library search-algorithms evaluation-metrics folia language-modelling

Updated Mar 13, 2019
Python

andrewbihl / bsed

Open issue: Allow catch-all line commands

andrewbihl commented Jan 27, 2020

I'd like to be able to run commands on all lines of a file. For example, `bsed wrap lines with "` should execute on every line of the file. The current workaround is to include a trivial filter, such as `wrap lines containing '.' with "`.

good first issue Easy

airbnb / artificial-adversary

🗣️ Tool to generate adversarial text examples and test machine learning models against them

python spam data-science machine-learning text-mining data-mining text-classification metrics text text-analysis python3 classification text-processing python2 spam-filtering spam-detection spam-classification adversarial-examples black-box-attacks black-box-benchmarking

Updated Oct 14, 2018
Python

pemistahl / lingua-go

👄 The most accurate natural language detection library in the Go ecosystem, suitable for long and short text alike

nlp go natural-language-processing language-detection language-modeling golang-library text-processing nlp-machine-learning language-recognition language-processing language-identification language-classification

Updated Jul 8, 2021
Go

textpipe / textpipe

Textpipe: clean and extract metadata from text

nlp text-analysis named-entities named-entity-recognition text-processing language-identification

Updated Jun 9, 2021
Python

gagolews / stringi

Open issue: stringi cheat sheet

waynelapierre commented Jan 1, 2021

Are there any cheat sheets available for stringi, like this one for stringr? http://edrub.in/CheatSheets/cheatSheetStringr.pdf

A cheat sheet would be more efficient, since base R, stringr, and stringi use different but similar syntax, which can be confusing at times.

documentation good first issue

BurntSushi / regex-automata

A low level regular expression library that uses deterministic finite automata.

rust automata regex regexp text-processing nfa automaton dfa regex-engine

Updated Sep 17, 2021
Rust

haven-jeon / PyKoSpacing

Automatic Korean word spacing with Python

nlp text-processing spacing korean-nlp

Updated Oct 11, 2021
Python

linuxscout / pyarabic

pyarabic

text-processing nlp-library arabic-language

Updated Aug 20, 2021
Python

open-i18n / rust-unic

Open issue: Support for stdin

ad-si commented Jan 8, 2019

This should work:

```
cat test.txt | unic-inspector
```

enhancement help wanted good first issue A: apps

ikegami-yukino / jaconv

Pure-Python Japanese character interconverter for Hiragana, Katakana, Hankaku, and Zenkaku

transliteration japanese-language text-processing pure-python preprocessing character-converter japanese-kana julius

Updated Oct 11, 2021
Python

textvec / textvec

Text vectorization tool to outperform TF-IDF for classification tasks

python nlp machine-learning natural-language-processing text-classification text-analysis tf-idf text-processing

Updated Dec 3, 2020
Python

s3nh / text-detector

Tool that allows you to detect and translate text.

nlp recognition deep-learning text craft pytorch text-recognition text-processing ocr-recognition crnn scene-text-detection scene-text-detectors

Updated Sep 10, 2019
Python

NIHOPA / NLPre

Python library for Natural Language Preprocessing (NLPre)

python nlp natural-language-processing text-processing nlp-parsing

Updated Jun 29, 2021
Python

himkt / konoha

🌿 An easy-to-use Japanese text processing tool that makes it possible to switch tokenizers with minimal code changes.

nlp natural-language-processing japanese text-processing mecab kytea sudachi allennlp sentencepiece janome

Updated Sep 26, 2021
Python

hakatashi / japanese.js

Util collection for Japanese text processing. Hiraganize, Katakanize, and Romanize.

javascript utility katakana hiragana japanese text-processing romanize

Updated Aug 27, 2020
JavaScript

assafmo / xioc

Extract indicators of compromise from text, including "escaped" ones.

ioc text-mining data-mining command-line regex regexp extract extraction command-line-tool text-processing iocs defang indicators-of-compromise escaping

Updated Apr 19, 2020
Go
