tokenization

Here are 490 public repositories matching this topic...

LunaSec - Open Source AppSec platform that automatically notifies you the next time vulnerabilities like Log4Shell or node-ipc happen. Track your dependencies and builds in a centralized service. Get started in one click via our GitHub App or host it yourself. https://github.com/apps/lunatrace-by-lunasec/

  • Updated Oct 20, 2022
  • TypeScript

Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and Twitter, 330 million English tweets).

  • Updated Sep 29, 2022
  • Python

This repository is a complete guide to natural language processing (NLP) in Python. It covers various techniques for implementing NLP, including parsing and text processing, and shows how to use NLP for text feature engineering.

  • Updated Jul 4, 2022
  • Jupyter Notebook
