Updated Jun 25, 2021 - Python
# tokenization
Here are 319 public repositories matching this topic...
python
nlp
data-science
machine-learning
natural-language-processing
ai
deep-learning
neural-network
text-classification
cython
artificial-intelligence
spacy
named-entity-recognition
neural-networks
nlp-library
tokenization
entity-linking
Ravencoin Core integration/staging tree
Updated Jun 19, 2021 - C
Secure vault for customer records built to comply with GDPR
security
privacy
encryption
database
vault
application-server
compliance
passportjs
tokenization
gdpr
data-protection
legaltech
anonymization
pii
data-anonymization
secure-storage
privacy-by-design
user-consent
piidata
ccpa
Updated Jun 13, 2021 - Go
Unsupervised text tokenizer focused on computational efficiency
Updated Jan 28, 2021 - C++
Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
multilingual
nlp
machine-learning
natural-language-processing
pytorch
artificial-intelligence
adapters
deeplearning
language-model
universal-dependencies
dependency-parsing
tokenization
lemmatization
sentence-segmentation
morphological-tagging
part-of-speech-tagging
xlm-roberta
Updated Jun 23, 2021 - Python
Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from two large corpora (English Wikipedia and a Twitter corpus of 330 million English tweets).
nlp
tokenizer
text-processing
semeval
nlp-library
word-segmentation
spelling-correction
tokenization
text-segmentation
spell-corrector
word-normalization
Updated Feb 8, 2021 - Python
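The hashtag-splitting idea in the Ekphrasis description (word segmentation driven by word statistics) can be illustrated with a minimal sketch. This is not Ekphrasis's own code or API: the `COUNTS` table holds made-up toy frequencies, and `word_logprob`, `segment` and `split_hashtag` are hypothetical names chosen for this example. A real tool would load counts from a large corpus instead.

```python
import math
from functools import lru_cache

# Toy unigram counts (hypothetical values, for illustration only).
COUNTS = {"make": 50, "america": 20, "great": 40, "again": 30, "a": 100, "gain": 5}
TOTAL = sum(COUNTS.values())
MAX_WORD_LEN = max(len(word) for word in COUNTS)


def word_logprob(word):
    """Log-probability of a word under the toy unigram model."""
    if word in COUNTS:
        return math.log(COUNTS[word] / TOTAL)
    # Unseen strings get a penalty that grows with their length.
    return math.log(1.0 / (TOTAL * 10 ** len(word)))


def segment(text):
    """Split text into the word sequence with the highest total log-probability."""

    @lru_cache(maxsize=None)
    def best(start):
        # Returns (score, words) for the best segmentation of text[start:].
        if start == len(text):
            return 0.0, ()
        candidates = []
        for end in range(start + 1, min(len(text), start + MAX_WORD_LEN) + 1):
            word = text[start:end]
            rest_score, rest_words = best(end)
            candidates.append((word_logprob(word) + rest_score, (word,) + rest_words))
        return max(candidates)

    return list(best(0)[1])


def split_hashtag(tag):
    """Strip the leading '#' and segment the lowercased remainder."""
    return segment(tag.lstrip("#").lower())
```

Calling `split_hashtag("#MakeAmericaGreatAgain")` with these toy counts picks the four-word split over alternatives such as "a" + "gain", because each extra or rarer word lowers the total score.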
PHP Text Analysis is a library for performing Information Retrieval (IR) and Natural Language Processing (NLP) tasks using the PHP language
Updated May 17, 2021 - PHP
nlp
machine-learning
natural-language-processing
text-classification
spacy
visualizer
named-entity-recognition
ner
dependency-parsing
tokenization
word-vectors
visualizers
streamlit
part-of-speech-tagging
Updated Mar 11, 2021 - Python
Natural Language Processing Pipeline - Sentence Splitting, Tokenization, Lemmatization, Part-of-speech Tagging and Dependency Parsing
parse
machine-translation
embeddings
information-extraction
dependency-parser
universal-dependencies
part-of-speech-tagger
dependency-parsing
tokenization
lemmatization
sentence-splitting
nlp-cube
language-pipeline
Updated Jun 10, 2021 - Python
ClangKit provides an Objective-C frontend to LibClang. Source tokenization, diagnostics and fix-its are currently implemented.
c
syntax-highlighting
c-plus-plus
parsing
objective-c
code
llvm
static-analysis
clang
source
diagnostics
tokenization
Updated May 9, 2017 - C
Rule-based token and sentence segmentation for the Russian language
Updated Feb 18, 2021 - Python
Fast and customizable text tokenization library with BPE and SentencePiece support
python
unicode
natural-language-processing
cpp
icu
tokenizer
machine-translation
tokenization
bpe
sentencepiece
Updated Jun 24, 2021 - C++
Simple NLP in Rust with Python bindings
Updated Jun 15, 2021 - Rust
TokenScript schema, specs and paper
Updated Apr 9, 2021 - HTML
High performance tokenizers for natural language processing and other related tasks
Updated Apr 10, 2021 - Julia
Language Modeling and Text Classification in the Malayalam Language using ULMFiT
Updated Jun 8, 2021 - Jupyter Notebook
An unofficial Sudachi clone in Rust (incomplete) 🦀
Updated Aug 27, 2020 - Rust
Collection of Wongnai's datasets
Updated Aug 26, 2019
python
nlp
docker
spacy
named-entity-recognition
sense2vec
part-of-speech-tagger
tokenization
sentence-segmentation
Updated Jun 10, 2021 - Python
Natural Language Processing Toolkit in Golang
Updated May 9, 2020 - Go
Tokenize, encrypt/decrypt, mask your data on the fly with Vaulty proxy
Updated Jan 24, 2021 - Go
Multilingual tokenizer that automatically tags each token with its type
multilingual
german
tokenizer
tagging
latin
french
hindi
wink
devanagari
marathi
tokenization
konkani
Updated Jun 7, 2021 - JavaScript
Custom Russian tokenizer for spaCy
Updated May 14, 2019 - Python
Smart Language Model
Updated Sep 20, 2020 - C++
POS tagger, lemmatizer and stemmer for the French language in JavaScript
Updated Sep 13, 2017 - JavaScript