Updated Apr 29, 2022 - Python
#tokenization
Here are 417 public repositories matching this topic...
python
nlp
data-science
machine-learning
natural-language-processing
ai
deep-learning
neural-network
text-classification
cython
artificial-intelligence
spacy
named-entity-recognition
neural-networks
nlp-library
tokenization
entity-linking
LunaSec - Open source AppSec platform that automatically notifies you the next time vulnerabilities like Log4Shell or node-ipc happen. Track your dependencies and builds in a centralized service. Get started in one click via our GitHub App or host it yourself. https://github.com/apps/lunatrace-by-lunasec/
security
dependency-analysis
cybersecurity
pci-dss
web-security
compliance
hardening
scanning
cve-scanning
tokenization
gdpr
security-tools
devsecops
zero-trust
soc2
privacy-by-design
sbom
scanning-tool
sbom-generator
log4shell
Updated Apr 30, 2022 - TypeScript
A secure user directory built for developers to comply with the GDPR
security
privacy
encryption
database
vault
application-server
compliance
passportjs
tokenization
gdpr
data-protection
legaltech
anonymization
pii
data-anonymization
secure-storage
privacy-by-design
user-consent
piidata
ccpa
Updated Apr 21, 2022 - Go
Unsupervised text tokenizer focused on computational efficiency
Updated Jan 28, 2021 - C++
Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
multilingual
nlp
machine-learning
natural-language-processing
pytorch
artificial-intelligence
adapters
deeplearning
language-model
universal-dependencies
dependency-parsing
tokenization
lemmatization
sentence-segmentation
morphological-tagging
part-of-speech-tagging
xlm-roberta
Updated Mar 29, 2022 - Python
Ekphrasis is a text processing tool geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from 2 big corpora (English Wikipedia and Twitter, 330 million English tweets).
nlp
tokenizer
text-processing
semeval
nlp-library
word-segmentation
spelling-correction
tokenization
text-segmentation
spell-corrector
word-normalization
Updated Feb 8, 2021 - Python
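The hashtag-splitting step mentioned in the Ekphrasis description can be illustrated with a small sketch. This is not Ekphrasis's code or API; it is a generic unigram-based word segmenter, and the `TOY_COUNTS` table, `word_logprob`, and `segment` names are hypothetical, with toy counts standing in for real corpus statistics.

```python
import math
from functools import lru_cache

# Hypothetical toy unigram counts; a real tool would load counts
# gathered from a large corpus instead.
TOY_COUNTS = {"best": 50, "day": 80, "ever": 40, "be": 100, "st": 1, "a": 200, "eve": 5}
TOTAL = sum(TOY_COUNTS.values())


def word_logprob(word):
    """Log-probability of a word under the toy unigram model."""
    count = TOY_COUNTS.get(word)
    if count is None:
        # Unknown words are penalized, more heavily the longer they are.
        return math.log(1.0 / (TOTAL * 10 ** len(word)))
    return math.log(count / TOTAL)


def segment(text, max_word_len=20):
    """Split text into the word sequence with the highest total log-probability."""

    @lru_cache(maxsize=None)
    def best(start):
        # Returns (score, words) for the best segmentation of text[start:].
        if start == len(text):
            return 0.0, ()
        candidates = []
        for end in range(start + 1, min(len(text), start + max_word_len) + 1):
            word = text[start:end]
            rest_score, rest_words = best(end)
            candidates.append((word_logprob(word) + rest_score, (word,) + rest_words))
        return max(candidates)

    return list(best(0)[1])


print(segment("bestdayever"))  # ['best', 'day', 'ever']
```

With realistic corpus counts the same dynamic program prefers frequent words over rare fragments, which is the general idea behind splitting hashtags using word statistics.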
nlp
machine-learning
natural-language-processing
text-classification
spacy
visualizer
named-entity-recognition
ner
dependency-parsing
tokenization
word-vectors
visualizers
streamlit
part-of-speech-tagging
Updated Apr 7, 2022 - Python
Natural Language Processing Pipeline - Sentence Splitting, Tokenization, Lemmatization, Part-of-speech Tagging and Dependency Parsing
parse
machine-translation
embeddings
information-extraction
dependency-parser
universal-dependencies
part-of-speech-tagger
dependency-parsing
tokenization
lemmatization
sentence-splitting
nlp-cube
language-pipeline
Updated Feb 10, 2022 - Python
PHP Text Analysis is a library for performing Information Retrieval (IR) and Natural Language Processing (NLP) tasks using the PHP language
Updated May 17, 2021 - PHP
ClangKit provides an Objective-C frontend to LibClang. Source tokenization, diagnostics and fix-its are actually implemented.
c
syntax-highlighting
c-plus-plus
parsing
objective-c
code
llvm
static-analysis
clang
source
diagnostics
tokenization
Updated Aug 2, 2021 - C
Rule-based token and sentence segmentation for the Russian language
Updated Feb 18, 2021 - Python
Fast and customizable text tokenization library with BPE and SentencePiece support
python
unicode
natural-language-processing
cpp
icu
tokenizer
machine-translation
tokenization
bpe
sentencepiece
Updated Mar 7, 2022 - C++
Updated Apr 19, 2022 - Rust
An official Sudachi clone in Rust 🦀
Updated Mar 1, 2022 - Rust
TokenScript schema, specs and paper
Updated Apr 29, 2022 - HTML
Simple NLP in Rust with Python bindings
Updated Jun 15, 2021 - Rust
This repository is a complete guide to natural language processing (NLP) in Python. It covers various techniques for implementing NLP, including parsing and text processing, and shows how to use NLP for text feature engineering.
nlp
ipython-notebook
named-entity-recognition
bag-of-words
tf-idf
stopwords
tokenization
stemming
lemmatization
sentence-segmentation
termfrequency
partofspeech-tagger
vocabulary-matching
python4everybody
python4datascience
tutor-milaan9
inversedocumentfrequency
Updated Nov 9, 2021 - Jupyter Notebook
High performance tokenizers for natural language processing and other related tasks
Updated Dec 30, 2021 - Julia
Implementation of the GBST block from the Charformer paper, in Pytorch
Updated Jul 15, 2021 - Python
[LREC 2022] An off-the-shelf pre-trained Tweet NLP Toolkit (NER, tokenization, lemmatization, POS tagging, dependency parsing) + Tweebank-NER dataset
machine-learning
natural-language-processing
named-entity-recognition
dependency-parser
ner
pos-tagging
tokenization
text-annotation
lemmatization
tweet-analysis
twitter-nlp
nlp-toolkit
Updated Apr 21, 2022 - Python
Language Modeling and Text Classification in Malayalam Language using ULMFiT
Updated Apr 6, 2022 - Jupyter Notebook
Collection of Wongnai's datasets
Updated Aug 26, 2019
python
nlp
docker
spacy
named-entity-recognition
sense2vec
part-of-speech-tagger
tokenization
sentence-segmentation
Updated Oct 11, 2021 - Python
Natural Language Processing Toolkit in Golang
Updated May 9, 2020 - Go