Updated Feb 3, 2021 - Python
# tokenization
Here are 263 public repositories matching this topic...
python
nlp
data-science
machine-learning
natural-language-processing
ai
deep-learning
neural-network
text-classification
cython
artificial-intelligence
spacy
named-entity-recognition
neural-networks
nlp-library
tokenization
entity-linking
Unsupervised text tokenizer focused on computational efficiency
Updated Jan 28, 2021 - C++
Ravencoin Core integration/staging tree
Updated Jan 27, 2021 - C
Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from two large corpora (English Wikipedia and Twitter, the latter comprising 330 million English tweets).
nlp
tokenizer
text-processing
semeval
nlp-library
word-segmentation
spelling-correction
tokenization
text-segmentation
spell-corrector
word-normalization
Updated Aug 13, 2020 - Python
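The hashtag splitting mentioned in the Ekphrasis description can be illustrated with a toy example. The sketch below is not Ekphrasis's code or API; it is a minimal dynamic-programming segmenter, assuming a tiny hand-made unigram count table (`WORD_COUNTS`, with invented numbers) in place of the Wikipedia and Twitter statistics. The function names `segment` and `split_hashtag` are hypothetical.

```python
import math
from functools import lru_cache

# Toy unigram counts, invented for illustration only.
# A real tool would derive such statistics from a large corpus.
WORD_COUNTS = {
    "make": 500, "america": 300, "great": 400, "again": 350,
    "a": 2000, "me": 800, "rica": 5, "gain": 60, "eat": 200, "gre": 1,
}
TOTAL = sum(WORD_COUNTS.values())


def word_log_prob(word):
    """Log-probability of a word under the toy unigram model."""
    count = WORD_COUNTS.get(word)
    if count is None:
        # Unknown words are penalised, more heavily the longer they are.
        return math.log(1.0 / (TOTAL * 10 ** len(word)))
    return math.log(count / TOTAL)


def segment(text):
    """Split a run of letters into the most probable word sequence."""
    text = text.lower()

    @lru_cache(maxsize=None)
    def best(start):
        # Returns (score, words) for the best split of text[start:].
        if start == len(text):
            return 0.0, ()
        candidates = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            rest_score, rest_words = best(end)
            candidates.append((word_log_prob(word) + rest_score,
                               (word,) + rest_words))
        return max(candidates)

    return list(best(0)[1])


def split_hashtag(tag):
    """Strip the leading '#' and segment the remainder."""
    return segment(tag.lstrip("#"))


print(split_hashtag("#MakeAmericaGreatAgain"))
```

With this count table, competing splits such as "a" + "me" + "rica" score lower than "america" because the summed log-probabilities favour the single frequent word; the quality of any such segmenter depends on the corpus statistics behind it.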
PHP Text Analysis is a library for performing Information Retrieval (IR) and Natural Language Processing (NLP) tasks using the PHP language
Updated Dec 20, 2020 - PHP
Natural Language Processing Pipeline - Sentence Splitting, Tokenization, Lemmatization, Part-of-speech Tagging and Dependency Parsing
parse
machine-translation
embeddings
information-extraction
dependency-parser
universal-dependencies
part-of-speech-tagger
dependency-parsing
tokenization
lemmatization
sentence-splitting
nlp-cube
language-pipeline
Updated Feb 2, 2021 - Python
nlp
machine-learning
natural-language-processing
text-classification
spacy
visualizer
named-entity-recognition
ner
dependency-parsing
tokenization
word-vectors
visualizers
streamlit
part-of-speech-tagging
Updated Jan 27, 2021 - Python
ClangKit provides an Objective-C frontend to LibClang. Source tokenization, diagnostics and fix-its are currently implemented.
c
syntax-highlighting
c-plus-plus
parsing
objective-c
code
llvm
static-analysis
clang
source
diagnostics
tokenization
Updated May 9, 2017 - C
Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
multilingual
nlp
machine-learning
natural-language-processing
pytorch
artificial-intelligence
adapters
deeplearning
language-model
universal-dependencies
dependency-parsing
tokenization
lemmatization
sentence-segmentation
morphological-tagging
part-of-speech-tagging
xlm-roberta
Updated Jan 27, 2021 - Python
Rule-based token and sentence segmentation for the Russian language
Updated Nov 27, 2020 - Python
Fast and customizable text tokenization library with BPE and SentencePiece support
python
unicode
natural-language-processing
cpp
icu
tokenizer
machine-translation
tokenization
bpe
sentencepiece
Updated Jan 27, 2021 - C++
Secure storage for personal records built to comply with GDPR
golang
security
privacy
encryption
database
vault
application-server
compliance
passportjs
tokenization
gdpr
legaltech
anonymization
pii
data-anonymization
privacy-by-design
user-consent
piidata
ccpa
gdpr-requirements
Updated Jan 20, 2021 - Go
Simple NLP in Rust with Python bindings
Updated Jul 22, 2020 - Rust
Language modeling and text classification in the Malayalam language using ULMFiT
Updated Feb 2, 2021 - Jupyter Notebook
An unofficial Sudachi clone in Rust (incomplete) 🦀
Updated Aug 27, 2020 - Rust
Collection of Wongnai's datasets
Updated Aug 26, 2019
High performance tokenizers for natural language processing and other related tasks
Updated Jan 17, 2021 - Julia
python
nlp
docker
spacy
named-entity-recognition
sense2vec
part-of-speech-tagger
tokenization
sentence-segmentation
Updated Oct 1, 2020 - Python
Natural Language Processing Toolkit in Golang
Updated May 9, 2020 - Go
Tokenize, encrypt/decrypt, mask your data on the fly with Vaulty proxy
Updated Jan 24, 2021 - Go
Multilingual tokenizer that automatically tags each token with its type
multilingual
german
tokenizer
tagging
latin
french
hindi
wink
devanagari
marathi
tokenization
konkani
Updated Nov 24, 2020 - JavaScript
POS tagger, lemmatizer and stemmer for the French language in JavaScript
Updated Sep 13, 2017 - JavaScript
Smart Language Model
Updated Sep 20, 2020 - C++
Custom Russian tokenizer for spaCy
Updated May 14, 2019 - Python
Rosette API Client Library for Python
python
nlp
machine-learning
natural-language-processing
text-mining
sentiment-analysis
text
morphology
text-analysis
language-detection
fuzzy-matching
name-generation
tokenization
categorization
lemmatization
relation-extraction
entity-extraction
language-identification
name-translation
name-similarity
Updated Jun 16, 2020 - Python