Natural Language Toolkit
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML.
An accurate natural language detection library, suitable for short text and mixed-language text
Generalist model for NER (Extract any entity types from texts)
Extensive Language Pack for Tree-Sitter
Thai Natural Language Processing library
Microsoft Azure Text Analytics Client Library for Python
Textile processing for Python.
Natural language processing augmentation library for deep neural networks
Python package for Korean natural language processing.
Pyap is an MIT-licensed text processing library, written in Python, for detecting and parsing addresses. Currently it supports US, Canadian and British addresses.
Extract quantities from unstructured text.
Functions to preprocess and normalize text.
Module for automatic summarization of text documents and HTML pages.
NeMo text processing for ASR and TTS
Python library for processing Chinese text
🦛 CHONK your texts with Chonkie ✨ - The no-nonsense chunking library
NLP, before and after spaCy
The goal of the Indic NLP Library is to build Python-based libraries for common text processing and Natural Language Processing in Indian languages.
A text summarization and keyword extraction package based on TextRank
Nonsense String Evaluator
Natural Language Processing (NLP) library for Urdu language.
uroman is a universal romanizer. It converts text in any script to the standard Latin alphabet.
A base class for wrapping text-processing tools
A fast Voice Activity Detection and Transcription System
Wrappers for several pre-processing scripts from the Moses toolkit.
Process-Sanskrit is a Python library for automatic Sanskrit text annotation and inflected dictionary search
Python ctypes bindings for reliq
Blazing-fast Thai text processing library powered by Rust
Convert HTML to markdown
A command to manage a header section for a source code tree
A Python library for a _FULL_ Zalgo experience
Real-time processing and delivery of sentences from a continuous stream of characters or text chunks.
Pre-processing package for text strings
STAM is a library for dealing with standoff annotations on text; this is the Python binding.
Identification and conversion functions for Chinese text processing
A library for extracting abbreviations from text.
A library for augmenting text for natural language processing applications.
A text extraction library supporting PDFs, images, office documents and more
Onnx Text Recognition (OnnxTR): docTR Onnx-Wrapper for high-performance OCR on documents.
An MCP server that extracts and formats Bilibili video content into structured text, optimized for LLM processing and analysis.
Python bindings for MeTA
Aspose.PSD for Python via .NET is a standalone API to read, write, process, and convert Adobe Photoshop PSD and PSB files without installing Adobe Photoshop®, and AI files without Adobe Illustrator®
An augmentation library based on spaCy for joint augmentation of text and labels.