This curated list contains 840 awesome open-source projects with a total of 2.7M stars grouped into 32 categories. All projects are ranked by a project-quality score, which is calculated based on various metrics automatically collected from GitHub and different package managers. If you would like to add or update projects, feel free to open an issue, submit a pull request, or directly edit the projects.yaml. Contributions are very welcome!
Contents
- Machine Learning Frameworks 54 projects
- Data Visualization 49 projects
- Text Data & NLP 82 projects
- Image Data 49 projects
- Graph Data 29 projects
- Audio Data 23 projects
- Geospatial Data 22 projects
- Financial Data 23 projects
- Time Series Data 20 projects
- Medical Data 19 projects
- Optical Character Recognition 11 projects
- Data Containers & Structures 28 projects
- Data Loading & Extraction 23 projects
- Web Scraping & Crawling 1 project
- Data Pipelines & Streaming 35 projects
- Distributed Machine Learning 26 projects
- Hyperparameter Optimization & AutoML 45 projects
- Reinforcement Learning 19 projects
- Recommender Systems 14 projects
- Privacy Machine Learning 6 projects
- Workflow & Experiment Tracking 35 projects
- Model Serialization & Conversion 11 projects
- Model Interpretability 46 projects
- Vector Similarity Search (ANN) 12 projects
- Probabilistics & Statistics 21 projects
- Adversarial Robustness 8 projects
- GPU Utilities 18 projects
- Tensorflow Utilities 13 projects
- Sklearn Utilities 17 projects
- Pytorch Utilities 27 projects
- Database Clients 1 project
- Others 52 projects
Explanation
- 🥇🥈🥉 Combined project-quality score
- ⭐️ Star count from GitHub
- 🐣 New project (less than 6 months old)
- 💤 Inactive project (6 months no activity)
- 💀 Dead project (12 months no activity)
- 📈📉 Project is trending up or down
- ➕ Project was recently added
- ❗️ Warning (e.g. missing/risky license)
- 👨‍💻 Contributors count from GitHub
- 🔀 Fork count from GitHub
- 📋 Issue count from GitHub
- ⏱️ Last update timestamp on package manager
- 📥 Download count from package manager
- 📦 Number of dependent projects
- Tensorflow related project
- Sklearn related project
- PyTorch related project
- MxNet related project
- Apache Spark related project
- Jupyter related project
- PaddlePaddle related project
- Pandas related project
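The activity markers in the legend above (new, inactive, dead) follow simple age thresholds. A minimal sketch of that classification is shown here, assuming a month is approximated as 30 days and that "dead" takes precedence over "inactive"; the function name `classify_status` and its parameters are hypothetical and are not taken from the list's actual tooling.

```python
from datetime import date, timedelta

# Assumption: one "month" is approximated as 30 days.
# The tooling behind this list may count months differently.
DAYS_PER_MONTH = 30


def classify_status(created: date, last_activity: date, today: date) -> str:
    """Return 'dead', 'inactive', 'new', or 'active' using the legend's thresholds."""
    idle = today - last_activity
    age = today - created
    if idle >= timedelta(days=12 * DAYS_PER_MONTH):
        return "dead"      # 12 months without activity
    if idle >= timedelta(days=6 * DAYS_PER_MONTH):
        return "inactive"  # 6 months without activity
    if age < timedelta(days=6 * DAYS_PER_MONTH):
        return "new"       # less than 6 months old
    return "active"


if __name__ == "__main__":
    today = date(2021, 2, 25)
    print(classify_status(date(2015, 1, 1), date(2021, 2, 20), today))
    print(classify_status(date(2015, 1, 1), date(2020, 6, 1), today))
```

The inactivity checks run before the "new" check so that a young but abandoned project is not reported as new; that ordering is a design choice of this sketch, not something the legend specifies.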
Machine Learning Frameworks
General-purpose machine learning and deep learning frameworks.
Tensorflow (π₯ 44 Β· β 160K) - An Open Source Machine Learning Framework for Everyone. Apache-2

- GitHub (👨‍💻 3.5K · 🔀 84K · 📦 120K · 📋 31K - 13% open · ⏱️ 25.02.2021): git clone https://github.com/tensorflow/tensorflow
- PyPi (📥 4M / month · 📦 23K · ⏱️ 21.01.2021): pip install tensorflow
- Conda (📥 2.3M · ⏱️ 15.07.2020): conda install -c conda-forge tensorflow
- Docker Hub (📥 48M · ⭐ 1.8K · ⏱️ 25.02.2021): docker pull tensorflow/tensorflow
scikit-learn (π₯ 37 Β· β 45K) - scikit-learn: machine learning in Python. BSD-3

- GitHub (👨‍💻 2.2K · 🔀 21K · 📥 660 · 📦 190K · 📋 9.1K - 25% open · ⏱️ 25.02.2021): git clone https://github.com/scikit-learn/scikit-learn
- PyPi (📥 7.7M / month · 📦 38K · ⏱️ 19.01.2021): pip install scikit-learn
- Conda (📥 7.1M · ⏱️ 21.01.2021): conda install -c conda-forge scikit-learn
StatsModels (π₯ 36 Β· β 6K) - Statsmodels: statistical modeling and econometrics in Python. BSD-3
- GitHub (👨‍💻 300 · 🔀 2.2K · 📥 25 · 📦 38K · 📋 4.3K - 47% open · ⏱️ 19.02.2021): git clone https://github.com/statsmodels/statsmodels
- PyPi (📥 2.3M / month · 📦 6.7K · ⏱️ 02.02.2021): pip install statsmodels
- Conda (📥 3.4M · ⏱️ 15.02.2021): conda install -c conda-forge statsmodels
XGBoost (π₯ 35 Β· β 21K) - Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or.. Apache-2
- GitHub (👨‍💻 510 · 🔀 7.9K · 📥 1.9K · 📦 14K · 📋 3.9K - 6% open · ⏱️ 25.02.2021): git clone https://github.com/dmlc/xgboost
- PyPi (📥 1.9M / month · 📦 1.6K · ⏱️ 20.01.2021): pip install xgboost
- Conda (📥 1.4M · ⏱️ 10.12.2020): conda install -c conda-forge xgboost
LightGBM (π₯ 35 Β· β 12K) - A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT,.. MIT
- GitHub (👨‍💻 210 · 🔀 3.2K · 📥 97K · 📦 5.8K · 📋 2.1K - 4% open · ⏱️ 24.02.2021): git clone https://github.com/microsoft/LightGBM
- PyPi (📥 1M / month · 📦 560 · ⏱️ 08.12.2020): pip install lightgbm
- Conda (📥 500K · ⏱️ 23.02.2021): conda install -c conda-forge lightgbm
Theano (π₯ 34 Β· β 9.4K) - Theano is a Python library that allows you to define, optimize, and.. BSD-3
MXNet (π₯ 33 Β· β 19K) - Lightweight, Portable, Flexible Distributed/Mobile Deep Learning.. Apache-2

- GitHub (👨‍💻 950 · 🔀 6.8K · 📥 24K · 📦 1.8K · 📋 9.4K - 19% open · ⏱️ 24.02.2021): git clone https://github.com/apache/incubator-mxnet
- PyPi (📥 99K / month · 📦 440 · ⏱️ 07.02.2021): pip install mxnet
- Conda (📥 5.8K · ⏱️ 29.02.2020): conda install -c anaconda mxnet
pytorch-lightning (π₯ 33 Β· β 12K) - The lightweight PyTorch wrapper for high-performance.. Apache-2

- GitHub (👨‍💻 400 · 🔀 1.4K · 📥 110 · 📦 2K · 📋 3K - 10% open · ⏱️ 25.02.2021): git clone https://github.com/PyTorchLightning/pytorch-lightning
- PyPi (📥 96K / month · 📦 14 · ⏱️ 24.02.2021): pip install pytorch-lightning
- Conda (📥 46K · ⏱️ 24.02.2021): conda install -c conda-forge pytorch-lightning
jax (π₯ 32 Β· β 12K) - Composable transformations of Python+NumPy programs: differentiate,.. Apache-2
Thinc (π₯ 32 Β· β 2.2K) - A refreshing functional take on deep learning, compatible with your favorite.. MIT
Catboost (π₯ 31 Β· β 5.7K) - A fast, scalable, high performance Gradient Boosting on Decision.. Apache-2
PaddlePaddle (π₯ 30 Β· β 14K) - PArallel Distributed Deep LEarning: Machine Learning.. Apache-2

TFlearn (π₯ 30 Β· β 9.5K) - Deep learning library featuring a higher-level API for TensorFlow. MIT

Vowpal Wabbit (π₯ 30 Β· β 7.4K) - Vowpal Wabbit is a machine learning system which pushes the.. BSD-3
Turi Create (π₯ 29 Β· β 10K) - Turi Create simplifies the development of custom machine learning.. BSD-3
tensorpack (π₯ 28 Β· β 5.9K) - A Neural Net Training Interface on TensorFlow, with focus.. Apache-2

Ignite (π₯ 27 Β· β 3.3K) - High-level library to help with training and evaluating neural.. BSD-3

Jina (π₯ 27 Β· β 2.4K) - An easier way to build neural search on the cloud. Apache-2
- GitHub (👨‍💻 86 · 🔀 360 · 📦 49 · 📋 660 - 7% open · ⏱️ 25.02.2021): git clone https://github.com/jina-ai/jina
- PyPi (📥 4K / month · ⏱️ 25.02.2021): pip install jina
- Docker Hub (📥 150K · ⏱️ 25.02.2021): docker pull jinaai/jina
Flax (π₯ 27 Β· β 1.5K) - Flax is a neural network ecosystem for JAX that is designed for.. Apache-2
jax
CNTK (π₯ 26 Β· β 17K Β· π€ ) - Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit. MIT
Neural Network Libraries (π₯ 25 Β· β 2.4K) - Neural Network Libraries. Apache-2
xLearn (π₯ 24 Β· β 2.8K Β· π€ ) - High performance, easy-to-use, and scalable machine learning (ML).. Apache-2
einops (π₯ 24 Β· β 2.5K) - Deep learning operations reinvented (for pytorch, tensorflow, jax and.. MIT
ktrain (π₯ 24 Β· β 750) - ktrain is a Python library that makes deep learning and AI more.. Apache-2

tensorflow-upstream (π₯ 24 Β· β 540) - TensorFlow ROCm port. Apache-2

SHOGUN (π₯ 23 Β· β 2.8K) - Unified and efficient Machine Learning. BSD-3
- GitHub (👨‍💻 250 · 🔀 1K · 📋 1.6K - 33% open · ⏱️ 08.12.2020): git clone https://github.com/shogun-toolbox/shogun
- Conda (📥 92K · ⏱️ 25.06.2018): conda install -c conda-forge shogun
- Docker Hub (📥 1.4K · ⭐ 1 · ⏱️ 31.01.2019): docker pull shogun/shogun
mace (π₯ 21 Β· β 4.3K) - MACE is a deep learning inference framework optimized for mobile.. Apache-2
- GitHub (👨‍💻 56 · 🔀 760 · 📥 1.3K · 📋 630 - 6% open · ⏱️ 18.02.2021): git clone https://github.com/XiaoMi/mace
Neural Tangents (π₯ 21 Β· β 1.3K) - Fast and Easy Infinite Neural Networks in Python. Apache-2
ThunderSVM (π₯ 20 Β· β 1.3K) - ThunderSVM: A Fast SVM Library on GPUs and CPUs. Apache-2
Haiku (π₯ 20 Β· β 970) - JAX-based neural network library. Apache-2
- GitHub (👨‍💻 37 · 🔀 67 · 📦 64 · 📋 70 - 28% open · ⏱️ 24.02.2021): git clone https://github.com/deepmind/dm-haiku
Torchbearer (π₯ 20 Β· β 580 Β· π ) - torchbearer: A model fitting library for PyTorch. MIT

Objax (π₯ 19 Β· β 570 Β· π£ ) - Objax is a machine learning framework that provides an Object.. Apache-2
jax
elegy (π₯ 17 Β· β 180) - Elegy is a framework-agnostic Trainer interface for the Jax.. Apache-2

jax
ThunderGBM (π₯ 16 Β· β 580) - ThunderGBM: Fast GBDTs and Random Forests on GPUs. Apache-2
NeoML (π₯ 13 Β· β 570) - Machine learning framework for both deep learning and traditional.. Apache-2
- GitHub (👨‍💻 18 · 🔀 80 · 📋 29 - 55% open · ⏱️ 20.02.2021): git clone https://github.com/neoml-lib/neoml
Show 7 hidden projects...
- dlib (
π₯ 32 Β·β 9.9K) - A toolkit for making real world machine learning and data analysis..βοΈBSL-1.0
- NuPIC (
π₯ 24 Β·β 6.2K Β·π ) - Numenta Platform for Intelligent Computing is an implementation..βοΈAGPL-3.0
- Lasagne (
π₯ 23 Β·β 3.8K Β·π ) - Lightweight library to build and train neural networks in Theano.MIT
- neon (
π₯ 22 Β·β 3.9K Β·π ) - Intel Nervana reference deep learning framework committed to best..Apache-2
- MindsDB (
π₯ 20 Β·β 3.5K) - Predictive AI layer for existing databases.βοΈGPL-3.0
- NeuPy (
π₯ 20 Β·β 660 Β·π ) - NeuPy is a Tensorflow based python library for prototyping and building..MIT
- StarSpace (
π₯ 13 Β·β 3.6K Β·π ) - Learning embeddings for classification, retrieval and ranking.MIT
Data Visualization
General-purpose and task-specific data visualization libraries.
Matplotlib (π₯ 41 Β· β 13K) - matplotlib: plotting with Python. Python-2.0
- GitHub (👨‍💻 1.2K · 🔀 5.6K · 📦 330K · 📋 7.7K - 21% open · ⏱️ 20.02.2021): git clone https://github.com/matplotlib/matplotlib
- PyPi (📥 7.6M / month · 📦 79K · ⏱️ 28.01.2021): pip install matplotlib
- Conda (📥 8.3M · ⏱️ 28.01.2021): conda install -c conda-forge matplotlib
Seaborn (π₯ 37 Β· β 8.1K) - Statistical data visualization using matplotlib. BSD-3
- GitHub (👨‍💻 150 · 🔀 1.4K · 📥 140 · 📦 83K · 📋 1.8K - 5% open · ⏱️ 17.02.2021): git clone https://github.com/mwaskom/seaborn
- PyPi (📥 1.6M / month · 📦 13K · ⏱️ 20.12.2020): pip install seaborn
- Conda (📥 2M · ⏱️ 28.01.2021): conda install -c conda-forge seaborn
Plotly (π₯ 35 Β· β 9K) - The interactive graphing library for Python (includes Plotly Express). MIT
- GitHub (👨‍💻 160 · 🔀 1.8K · 📦 5 · 📋 1.9K - 43% open · ⏱️ 14.01.2021): git clone https://github.com/plotly/plotly.py
- PyPi (📥 2M / month · 📦 5K · ⏱️ 12.01.2021): pip install plotly
- Conda (📥 1.2M · ⏱️ 12.01.2021): conda install -c conda-forge plotly
- NPM (📥 250K / month · 📦 4 · ⏱️ 12.01.2021): npm install plotlywidget
dash (π₯ 34 Β· β 14K) - Analytical Web Apps for Python, R, Julia, and Jupyter. No JavaScript Required. MIT
wordcloud (π₯ 31 Β· β 7.9K) - A little word cloud generator in Python. MIT
- GitHub (👨‍💻 59 · 🔀 2K · 📦 9.1K · 📋 440 - 21% open · ⏱️ 11.11.2020): git clone https://github.com/amueller/word_cloud
- PyPi (📥 180K / month · 📦 1.1K · ⏱️ 11.11.2020): pip install wordcloud
- Conda (📥 190K · ⏱️ 14.01.2021): conda install -c conda-forge wordcloud
bqplot (π₯ 30 Β· β 3K) - Plotting library for IPython/Jupyter notebooks. Apache-2

- GitHub (👨‍💻 51 · 🔀 410 · 📦 1.3K · 📋 510 - 36% open · ⏱️ 23.02.2021): git clone https://github.com/bqplot/bqplot
- PyPi (📥 14K / month · 📦 110 · ⏱️ 23.02.2021): pip install bqplot
- Conda (📥 500K · ⏱️ 13.02.2021): conda install -c conda-forge bqplot
- NPM (📥 160K / month · 📦 10 · ⏱️ 23.02.2021): npm install bqplot
PyQtGraph (π₯ 30 Β· β 2.3K) - Fast data visualization and GUI tools for scientific / engineering.. MIT
pandas-profiling (π₯ 29 Β· β 6.8K) - Create HTML profiling reports from pandas DataFrame.. MIT


- GitHub (👨‍💻 69 · 🔀 1K · 📦 3.2K · 📋 440 - 16% open · ⏱️ 21.02.2021): git clone https://github.com/pandas-profiling/pandas-profiling
- PyPi (📥 150K / month · 📦 160 · ⏱️ 07.02.2021): pip install pandas-profiling
- Conda (📥 110K · ⏱️ 20.02.2021): conda install -c conda-forge pandas-profiling
HoloViews (π₯ 29 Β· β 1.8K) - With Holoviews, your data visualizes itself. BSD-3

- GitHub (👨‍💻 100 · 🔀 300 · 📋 2.5K - 27% open · ⏱️ 23.02.2021): git clone https://github.com/holoviz/holoviews
- PyPi (📥 49K / month · 📦 170 · ⏱️ 17.02.2021): pip install holoviews
- Conda (📥 430K · ⏱️ 22.02.2021): conda install -c conda-forge holoviews
- NPM (📥 7.2K / month · ⏱️ 24.05.2020): npm install @pyviz/jupyterlab_pyviz
VisPy (π₯ 28 Β· β 2.6K) - High-performance interactive 2D/3D data visualization library. BSD-3

- GitHub (👨‍💻 150 · 🔀 540 · 📦 460 · 📋 1.1K - 30% open · ⏱️ 07.02.2021): git clone https://github.com/vispy/vispy
- PyPi (📥 13K / month · 📦 120 · ⏱️ 28.11.2020): pip install vispy
- Conda (📥 130K · ⏱️ 13.01.2021): conda install -c conda-forge vispy
- NPM (📥 78 / month · ⏱️ 15.03.2020): npm install vispy
datashader (π₯ 28 Β· β 2.4K) - Quickly and accurately render even the largest data. BSD-3
- GitHub (👨‍💻 43 · 🔀 310 · 📦 590 · 📋 470 - 30% open · ⏱️ 17.01.2021): git clone https://github.com/holoviz/datashader
- PyPi (📥 8.9K / month · 📦 70 · ⏱️ 07.01.2021): pip install datashader
- Conda (📥 140K · ⏱️ 08.01.2021): conda install -c conda-forge datashader
missingno (π₯ 27 Β· β 2.7K) - Missing data visualization module for Python. MIT
- GitHub (👨‍💻 15 · 🔀 350 · 📦 2.9K · 📋 100 - 14% open · ⏱️ 28.12.2020): git clone https://github.com/ResidentMario/missingno
- PyPi (📥 110K / month · 📦 150 · ⏱️ 09.07.2019): pip install missingno
- Conda (📥 76K · ⏱️ 15.02.2020): conda install -c conda-forge missingno
data-validation (π₯ 27 Β· β 520) - Library for exploring and validating machine learning.. Apache-2


Perspective (π₯ 26 Β· β 3.2K) - Streaming pivot visualization via WebAssembly. Apache-2

- GitHub (👨‍💻 62 · 🔀 350 · 📦 180 · 📋 390 - 19% open · ⏱️ 21.02.2021): git clone https://github.com/finos/perspective
- PyPi (📥 460 / month · 📦 8 · ⏱️ 24.02.2021): pip install perspective-python
- NPM (📥 2K / month · ⏱️ 12.02.2021): npm install @finos/perspective-jupyterlab
PyVista (π₯ 26 Β· β 700) - 3D plotting and mesh analysis through a streamlined interface for the.. MIT

- GitHub (👨‍💻 55 · 🔀 140 · 📥 63 · 📦 260 · 📋 420 - 31% open · ⏱️ 24.02.2021): git clone https://github.com/pyvista/pyvista
- PyPi (📥 9.8K / month · 📦 26 · ⏱️ 04.02.2021): pip install pyvista
- Conda (📥 63K · ⏱️ 04.02.2021): conda install -c conda-forge pyvista
pythreejs (π₯ 26 Β· β 700 Β· π ) - A Jupyter - Three.js bridge. BSD-3

- GitHub (👨‍💻 27 · 🔀 160 · 📦 15 · 📋 200 - 30% open · ⏱️ 24.02.2021): git clone https://github.com/jupyter-widgets/pythreejs
- PyPi (📥 5.4K / month · 📦 26 · ⏱️ 09.10.2020): pip install pythreejs
- Conda (📥 270K · ⏱️ 12.10.2020): conda install -c conda-forge pythreejs
- NPM (📥 6.7K / month · 📦 8 · ⏱️ 19.03.2020): npm install jupyter-threejs
Facets Overview (π₯ 25 Β· β 6.5K) - Visualizations for machine learning datasets. Apache-2

Chartify (π₯ 25 Β· β 2.8K) - Python library that makes it easy for data scientists to create.. Apache-2
HyperTools (π₯ 25 Β· β 1.6K) - A Python toolbox for gaining geometric insights into high-dimensional.. MIT
hvPlot (π₯ 25 Β· β 350) - A high-level plotting API for pandas, dask, xarray, and networkx built on.. BSD-3
Multicore-TSNE (π₯ 23 Β· β 1.5K) - Parallel t-SNE implementation with Python and Torch.. BSD-3

- GitHub (👨‍💻 15 · 🔀 200 · 📦 210 · 📋 56 - 64% open · ⏱️ 19.08.2020): git clone https://github.com/DmitryUlyanov/Multicore-TSNE
- PyPi (📥 2K / month · 📦 28 · ⏱️ 09.01.2019): pip install MulticoreTSNE
- Conda (📥 6.1K · ⏱️ 12.11.2018): conda install -c conda-forge multicore-tsne
python-ternary (π₯ 23 Β· β 400) - Ternary plotting library for python with matplotlib. MIT
- GitHub (👨‍💻 25 · 🔀 110 · 📥 14 · 📦 59 · 📋 100 - 24% open · ⏱️ 17.02.2021): git clone https://github.com/marcharper/python-ternary
- PyPi (📥 740 / month · 📦 20 · ⏱️ 17.02.2021): pip install python-ternary
- Conda (📥 49K · ⏱️ 17.02.2021): conda install -c conda-forge python-ternary
D-Tale (π₯ 22 Β· β 2.1K) - Visualizer for pandas data structures. βοΈLGPL-2.1


Pandas-Bokeh (π₯ 22 Β· β 620) - Bokeh Plotting Backend for Pandas and GeoPandas. MIT

Sweetviz (π₯ 19 Β· β 1.3K) - Visualize and compare datasets, target values and associations, with one.. MIT
animatplot (π₯ 19 Β· β 360) - A python package for animating plots build on matplotlib. MIT
AutoViz (π₯ 19 Β· β 310) - Automatically Visualize any dataset, any size with a single line of.. Apache-2
data-describe (π₯ 14 Β· β 260) - datadescribe: Pythonic EDA Accelerator for Data Science. Apache-2
Show 6 hidden projects...
- plotnine (
π₯ 27 Β·β 2.6K) - A grammar of graphics for Python.βοΈGPL-2.0
- PDPbox (
π₯ 23 Β·β 530 Β·π ) - python partial dependence plot toolbox.MIT
- pivottablejs (
π₯ 19 Β·β 420 Β·π ) - Dragndrop Pivot Tables and Charts for Jupyter/IPython..MIT
- ivis (
π₯ 18 Β·β 220) - Dimensionality reduction in very large datasets using Siamese..βοΈGPL-2.0
- pdvega (
π₯ 16 Β·β 340 Β·π ) - Interactive plotting for Pandas using Vega-Lite.MIT
- nptsne (
π₯ 14 Β·β 25) - nptsne is a numpy compatible python binary package that offers a number..Apache-2
Text Data & NLP
Libraries for processing, cleaning, manipulating, and analyzing text data as well as libraries for NLP tasks such as language detection, fuzzy matching, classification, seq2seq learning, conversational AI, keyword extraction, and translation.
spaCy (π₯ 37 Β· β 20K) - Industrial-strength Natural Language Processing (NLP) in Python. MIT
- GitHub (👨‍💻 570 · 🔀 3.3K · 📥 3K · 📦 22K · 📋 4.4K - 2% open · ⏱️ 24.02.2021): git clone https://github.com/explosion/spaCy
- PyPi (📥 790K / month · 📦 3.1K · ⏱️ 14.02.2021): pip install spacy
- Conda (📥 1.5M · ⏱️ 14.02.2021): conda install -c conda-forge spacy
transformers (π₯ 36 Β· β 41K) - Transformers: State-of-the-art Natural Language.. Apache-2


- GitHub (👨‍💻 780 · 🔀 10K · 📥 1.3K · 📦 8K · 📋 6.1K - 11% open · ⏱️ 25.02.2021): git clone https://github.com/huggingface/transformers
- PyPi (📥 600K / month · 📦 130 · ⏱️ 24.02.2021): pip install transformers
- Conda (📥 23K · ⏱️ 09.02.2021): conda install -c conda-forge transformers
gensim (π₯ 35 Β· β 12K) - Topic Modelling for Humans. βοΈLGPL-2.1
- GitHub (👨‍💻 390 · 🔀 3.9K · 📥 3K · 📦 21K · 📋 1.6K - 21% open · ⏱️ 13.02.2021): git clone https://github.com/RaRe-Technologies/gensim
- PyPi (📥 4.5M / month · 📦 4.7K · ⏱️ 15.11.2020): pip install gensim
- Conda (📥 600K · ⏱️ 14.05.2020): conda install -c conda-forge gensim
nltk (π₯ 34 Β· β 9.7K) - Suite of libraries and programs for symbolic and statistical natural.. Apache-2
fairseq (π₯ 31 Β· β 11K Β· π ) - Facebook AI Research Sequence-to-Sequence Toolkit written in.. MIT

ChatterBot (π₯ 31 Β· β 11K) - ChatterBot is a machine learning, conversational dialog engine for.. BSD-3
sentencepiece (π₯ 31 Β· β 4.8K) - Unsupervised text tokenizer for Neural Network-based text.. Apache-2
- GitHub (👨‍💻 49 · 🔀 650 · 📥 11K · 📦 6K · 📋 420 - 6% open · ⏱️ 23.02.2021): git clone https://github.com/google/sentencepiece
- PyPi (📥 790K / month · 📦 240 · ⏱️ 10.01.2021): pip install sentencepiece
- Conda (📥 32K · ⏱️ 09.02.2021): conda install -c conda-forge sentencepiece
flair (π₯ 30 Β· β 10K) - A very simple framework for state-of-the-art Natural Language Processing.. MIT

snowballstemmer (π₯ 30 Β· β 470) - Snowball compiler and stemming algorithms. BSD-3
- GitHub (👨‍💻 25 · 🔀 130 · 📦 45K · 📋 59 - 28% open · ⏱️ 02.02.2021): git clone https://github.com/snowballstem/snowball
- PyPi (📥 2.1M / month · 📦 13K · ⏱️ 21.01.2021): pip install snowballstemmer
- Conda (📥 2.1M · ⏱️ 21.01.2021): conda install -c conda-forge snowballstemmer
TextBlob (π₯ 29 Β· β 7.6K) - Simple, Pythonic, text processing--Sentiment analysis, part-of-speech.. MIT
- GitHub (👨‍💻 34 · 🔀 990 · 📥 88 · 📦 10K · 📋 230 - 35% open · ⏱️ 18.02.2021): git clone https://github.com/sloria/TextBlob
- PyPi (📥 220K / month · 📦 2.5K · ⏱️ 24.02.2019): pip install textblob
- Conda (📥 110K · ⏱️ 24.02.2019): conda install -c conda-forge textblob
Rasa (π₯ 28 Β· β 11K) - Open source machine learning framework to automate text- and voice-.. Apache-2

stanza (π₯ 28 Β· β 5.2K) - Official Stanford NLP Python Library for Many Human Languages. Apache-2
Tokenizers (π₯ 28 Β· β 4.3K) - Fast State-of-the-Art Tokenizers optimized for Research and.. Apache-2
sentence-transformers (π₯ 28 Β· β 4.2K) - Sentence Embeddings with BERT & XLNet. Apache-2

Dedupe (π₯ 28 Β· β 2.9K) - A python library for accurate and scalable fuzzy matching, record.. MIT
phonenumbers (π₯ 28 Β· β 2.6K) - Python port of Google's libphonenumber. Apache-2
- GitHub (👨‍💻 22 · 🔀 330 · 📋 120 - 2% open · ⏱️ 09.02.2021): git clone https://github.com/daviddrysdale/python-phonenumbers
- PyPi (📥 600K / month · 📦 2.3K · ⏱️ 09.02.2021): pip install phonenumbers
- Conda (📥 380K · ⏱️ 04.08.2019): conda install -c conda-forge phonenumbers
DeepPavlov (π₯ 26 Β· β 5K) - An open source library for deep learning end-to-end dialog.. Apache-2

GluonNLP (π₯ 26 Β· β 2.2K) - Toolkit that enables easy text preprocessing, datasets loading.. Apache-2

TextDistance (π₯ 26 Β· β 1.9K) - Compute distance between sequences. 30+ algorithms, pure python.. MIT
TensorFlow Text (π₯ 26 Β· β 700) - Making text a first-class citizen in TensorFlow. Apache-2

inflect (π₯ 26 Β· β 480 Β· π ) - Correctly generate plurals, ordinals, indefinite articles; convert.. MIT
vaderSentiment (π₯ 25 Β· β 2.8K Β· π€ ) - VADER Sentiment Analysis. VADER (Valence Aware Dictionary.. MIT
haystack (π₯ 25 Β· β 1.4K) - End-to-end Python framework for building natural language search.. Apache-2
jellyfish (π₯ 25 Β· β 1.4K) - a python library for doing approximate and phonetic matching of.. BSD-2
- GitHub (👨‍💻 20 · 🔀 120 · 📦 2.1K · 📋 96 - 10% open · ⏱️ 30.12.2020): git clone https://github.com/jamesturk/jellyfish
- PyPi (📥 630K / month · 📦 650 · ⏱️ 21.05.2020): pip install jellyfish
- Conda (📥 110K · ⏱️ 08.01.2021): conda install -c conda-forge jellyfish
pyahocorasick (π₯ 25 Β· β 580) - Python module (C extension and plain python) implementing Aho-.. BSD-3
- GitHub (👨‍💻 20 · 🔀 88 · 📦 500 · 📋 98 - 32% open · ⏱️ 26.01.2021): git clone https://github.com/WojciechMula/pyahocorasick
- PyPi (📥 120K / month · 📦 100 · ⏱️ 26.01.2021): pip install pyahocorasick
- Conda (📥 110K · ⏱️ 13.10.2020): conda install -c conda-forge pyahocorasick
Snips NLU (π₯ 24 Β· β 3.5K Β· π€ ) - Snips Python library to extract meaning from text. Apache-2
T5 (π₯ 24 Β· β 3.2K) - Code for the paper Exploring the Limits of Transfer Learning with a.. Apache-2

Sumy (π₯ 24 Β· β 2.5K) - Module for automatic summarization of text documents and HTML pages. Apache-2
fastNLP (π₯ 24 Β· β 2K) - fastNLP: A Modularized and Extensible NLP Framework. Currently still.. Apache-2
pytorch-nlp (π₯ 24 Β· β 1.9K) - Basic Utilities for PyTorch Natural Language Processing (NLP). BSD-3

PyTextRank (π₯ 24 Β· β 1.5K) - Python implementation of TextRank for phrase extraction and.. MIT
sense2vec (π₯ 24 Β· β 1.2K) - Contextually-keyed word vectors. MIT
- GitHub (👨‍💻 15 · 🔀 200 · 📥 13K · 📦 50 · 📋 94 - 15% open · ⏱️ 07.02.2021): git clone https://github.com/explosion/sense2vec
- PyPi (📥 2.2K / month · 📦 12 · ⏱️ 07.02.2021): pip install sense2vec
- Conda (📥 15K · ⏱️ 16.03.2020): conda install -c conda-forge sense2vec
spacy-transformers (π₯ 24 Β· β 910) - Use pretrained transformers like BERT, XLNet and GPT-2.. MIT
spacy
SciSpacy (π₯ 24 Β· β 840) - A full spaCy pipeline and models for scientific/biomedical documents. Apache-2
Ciphey (π₯ 23 Β· β 6.3K) - Automatically decrypt encryptions without knowing the key or cipher,.. MIT
- GitHub (👨‍💻 39 · 🔀 350 · 📋 220 - 21% open · ⏱️ 22.02.2021): git clone https://github.com/Ciphey/Ciphey
- PyPi (📥 2.3K / month · ⏱️ 02.12.2020): pip install ciphey
- Docker Hub (📥 8.1K · ⭐ 2 · ⏱️ 14.02.2021): docker pull remnux/ciphey
flashtext (π₯ 23 Β· β 4.7K Β· π€ ) - Extract Keywords from sentence or Replace keywords in sentences. MIT
textgenrnn (π₯ 23 Β· β 4.3K Β· π€ ) - Easily train your own text-generating neural network of any.. MIT

neuralcoref (π₯ 23 Β· β 2.2K) - Fast Coreference Resolution in spaCy with Neural Networks. MIT
- GitHub (👨‍💻 20 · 🔀 380 · 📥 170 · 📦 290 · 📋 260 - 17% open · ⏱️ 24.02.2021): git clone https://github.com/huggingface/neuralcoref
- PyPi (📥 2.4K / month · 📦 18 · ⏱️ 08.04.2019): pip install neuralcoref
- Conda (📥 6.3K · ⏱️ 21.02.2020): conda install -c conda-forge neuralcoref
pySBD (π₯ 23 Β· β 280) - pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence.. MIT
scattertext (π₯ 22 Β· β 1.5K) - Beautiful visualizations of how language differs among document.. Apache-2
- GitHub (👨‍💻 10 · 🔀 200 · 📦 150 · 📋 74 - 22% open · ⏱️ 08.02.2021): git clone https://github.com/JasonKessler/scattertext
- PyPi (📥 1.4K / month · 📦 16 · ⏱️ 18.01.2021): pip install scattertext
- Conda (📥 45K · ⏱️ 18.01.2021): conda install -c conda-forge scattertext
FARM (π₯ 22 Β· β 1.1K) - Fast & easy transfer learning for NLP. Harvesting language models.. Apache-2

DeepMatcher (π₯ 21 Β· β 3.5K Β· π€ ) - Python package for performing Entity and Text Matching using.. BSD-3
NLP Architect (π₯ 21 Β· β 2.6K) - A model library for exploring state-of-the-art deep learning.. Apache-2
gpt-2-simple (π₯ 21 Β· β 2.5K) - Python package to easily retrain OpenAI's GPT-2 text-.. MIT

Texar (π₯ 21 Β· β 2.1K Β· π€ ) - Toolkit for Machine Learning, Natural Language Processing, and.. Apache-2

Texthero (π₯ 20 Β· β 2.1K) - Text preprocessing, representation and visualization from zero to hero. MIT
DELTA (π₯ 20 Β· β 1.4K) - DELTA is a deep learning based natural language and speech.. Apache-2

- GitHub (👨‍💻 41 · 🔀 280 · 📋 77 - 11% open · ⏱️ 17.12.2020): git clone https://github.com/Delta-ML/delta
- PyPi (📥 3 / month · ⏱️ 27.03.2020): pip install delta-nlp
- Docker Hub (📥 12K · ⏱️ 24.02.2021): docker pull zh794390558/delta
Sockeye (π₯ 20 Β· β 990) - Sequence-to-sequence framework with a focus on Neural Machine.. Apache-2

YouTokenToMe (π₯ 20 Β· β 720) - Unsupervised text tokenizer focused on computational efficiency. MIT
Kashgari (π₯ 19 Β· β 2K) - Kashgari is a production-level NLP Transfer learning framework.. Apache-2

VizSeq (π₯ 15 Β· β 310) - An Analysis Toolkit for Natural Language Generation (Translation,.. MIT
OpenNRE (π₯ 14 Β· β 3K) - An Open-Source Package for Neural Relation Extraction (NRE). MIT
- GitHub (👨‍💻 9 · 🔀 860 · 📋 310 - 6% open · ⏱️ 24.11.2020): git clone https://github.com/thunlp/OpenNRE
TransferNLP (π₯ 14 Β· β 290 Β· π€ ) - NLP library designed for reproducible experimentation.. MIT

NeuralQA (π₯ 14 Β· β 180) - NeuralQA: A Usable Library for Question Answering on Large Datasets with.. MIT
textvec (π₯ 14 Β· β 160) - Text vectorization tool to outperform TFIDF for classification tasks. MIT

Show 9 hidden projects...
- fuzzywuzzy (
π₯ 29 Β·β 7.9K) - Fuzzy String Matching in Python.βοΈGPL-2.0
- langid (
π₯ 26 Β·β 1.7K Β·π ) - Stand-alone language identification system.BSD-3
- polyglot (
π₯ 24 Β·β 1.8K) - Multilingual text (NLP) processing toolkit.βοΈGPL-3.0
- anaGo (
π₯ 22 Β·β 1.4K Β·π ) - Bidirectional LSTM-CRF and ELMo for Named-Entity Recognition,..MIT
- stop-words (
π₯ 21 Β·β 120 Β·π ) - Get list of common stop words in various languages in Python.BSD-3
- MatchZoo (
π₯ 20 Β·β 3.4K Β·π ) - Facilitating the design, comparison and sharing of deep..Apache-2
- pyfasttext (
π₯ 19 Β·β 230 Β·π ) - Yet another Python binding for fastText.βοΈGPL-3.0
- NeuroNER (
π₯ 17 Β·β 1.5K Β·π ) - Named-entity recognition using neural networks. Easy-to-use and..MIT
- ONNX-T5 (
π₯ 14 Β·β 140 Β·π£ ) - Summarization, translation, sentiment-analysis, text-generation..Apache-2
Image Data
Libraries for image & video processing, manipulation, and augmentation as well as libraries for computer vision tasks such as facial recognition, object detection, and classification.
Pillow (π₯ 39 Β· β 8.2K) - The friendly PIL fork (Python Imaging Library). βοΈPIL
- GitHub (👨‍💻 350 · 🔀 1.6K · 📦 410K · 📋 2.1K - 11% open · ⏱️ 24.02.2021): git clone https://github.com/python-pillow/Pillow
- PyPi (📥 10M / month · 📦 110K · ⏱️ 02.01.2021): pip install Pillow
- Conda (📥 7.3M · ⏱️ 08.02.2021): conda install -c conda-forge pillow
torchvision (π₯ 36 Β· β 8.4K Β· π ) - Datasets, Transforms and Models specific to Computer.. BSD-3

- GitHub (👨‍💻 360 · 🔀 4.4K · 📦 42K · 📋 1.6K - 28% open · ⏱️ 25.02.2021): git clone https://github.com/pytorch/vision
- PyPi (📥 450K / month · 📦 4.6K · ⏱️ 10.12.2020): pip install torchvision
- Conda (📥 34K · ⏱️ 14.10.2018): conda install -c conda-forge torchvision
scikit-image (π₯ 32 Β· β 4.2K Β· π ) - Image processing in Python. BSD-2
- GitHub (👨‍💻 480 · 🔀 1.8K · 📦 62K · 📋 2.2K - 31% open · ⏱️ 23.02.2021): git clone https://github.com/scikit-image/scikit-image
- PyPi (📥 1.4M / month · 📦 15K · ⏱️ 23.12.2020): pip install scikit-image
- Conda (📥 2.2M · ⏱️ 21.01.2021): conda install -c conda-forge scikit-image
opencv-python (π₯ 30 Β· β 1.8K) - Automated CI toolchain to produce precompiled opencv-python,.. MIT
Face Recognition (π₯ 29 Β· β 39K) - The world's simplest facial recognition api for Python.. MIT

Albumentations (π₯ 28 Β· β 7.4K) - Fast image augmentation library and easy to use wrapper.. MIT

- GitHub (👨‍💻 74 · 🔀 950 · 📦 2.8K · 📋 420 - 43% open · ⏱️ 24.02.2021): git clone https://github.com/albumentations-team/albumentations
- PyPi (📥 49K / month · 📦 130 · ⏱️ 29.11.2020): pip install albumentations
- Conda (📥 15K · ⏱️ 29.11.2020): conda install -c conda-forge albumentations
Kornia (π₯ 28 Β· β 3.6K) - Open Source Differentiable Computer Vision Library for PyTorch. Apache-2

imutils (π₯ 28 Β· β 3.5K) - A series of convenience functions to make basic image processing.. MIT
ImageHash (π₯ 28 Β· β 1.8K) - A Python Perceptual Image Hashing Module. BSD-2
- GitHub (👨‍💻 17 · 🔀 250 · 📦 2.1K · 📋 87 - 19% open · ⏱️ 03.01.2021): git clone https://github.com/JohannesBuchner/imagehash
- PyPi (📥 230K / month · 📦 530 · ⏱️ 19.11.2020): pip install ImageHash
- Conda (📥 100K · ⏱️ 19.11.2020): conda install -c conda-forge imagehash
imageai (π₯ 27 Β· β 5.9K) - A python library built to empower developers to build applications and.. MIT
detectron2 (π₯ 26 Β· β 15K) - Detectron2 is FAIR's next-generation platform for object.. Apache-2

InsightFace (π₯ 26 Β· β 8.6K) - Face Analysis Project on MXNet. MIT

PyTorch Image Models (π₯ 26 Β· β 7.5K) - PyTorch image models, scripts, pretrained weights --.. Apache-2

- GitHub (👨‍💻 29 · 🔀 1.1K · 📥 250K · 📦 330 · 📋 260 - 10% open · ⏱️ 19.02.2021): git clone https://github.com/rwightman/pytorch-image-models
MMDetection (π₯ 25 Β· β 14K) - OpenMMLab Detection Toolbox and Benchmark. Apache-2

- GitHub (👨‍💻 190 · 🔀 4.7K · 📦 27 · 📋 3.4K - 9% open · ⏱️ 25.02.2021): git clone https://github.com/open-mmlab/mmdetection
Augmentor (π₯ 24 Β· β 4.3K Β· π€ ) - Image augmentation library in Python for machine learning. MIT
facenet-pytorch (π₯ 24 Β· β 1.9K) - Pretrained Pytorch face detection (MTCNN) and recognition.. MIT

mtcnn (π₯ 24 Β· β 1.4K) - MTCNN face detection implementation for TensorFlow, as a PIP package. MIT

Face Alignment (π₯ 23 Β· β 4.7K) - 2D and 3D Face alignment library build using pytorch. BSD-3

segmentation_models (π₯ 23 Β· β 3K Β· π€ ) - Segmentation models with pretrained backbones. Keras.. MIT

CellProfiler (π₯ 23 Β· β 540) - An open-source application for biological image analysis. BSD-3
Caer (π₯ 23 Β· β 440 Β· π£ ) - A lightweight Computer Vision library. Scale your models, not boilerplate. MIT
Image Deduplicator (π₯ 22 Β· β 3.4K) - Finding duplicate images made easy!. Apache-2

vidgear (π₯ 22 Β· β 1.6K) - High-performance cross-platform Video Processing Python framework.. Apache-2
Image Super-Resolution (π₯ 21 Β· β 2.6K) - Super-scale your images and run experiments with.. Apache-2

- GitHub (👨‍💻 9 · 🔀 480 · 📦 41 · 📋 160 - 36% open · ⏱️ 11.11.2020): git clone https://github.com/idealo/image-super-resolution
- PyPi (📥 1.8K / month · 📦 8 · ⏱️ 08.01.2020): pip install ISR
- Docker Hub (📥 130 · ⏱️ 01.04.2019): docker pull idealo/image-super-resolution-gpu
tensorflow-graphics (π₯ 21 Β· β 2.4K) - TensorFlow Graphics: Differentiable Graphics Layers.. Apache-2

Classy Vision (π₯ 21 Β· β 1.2K) - An end-to-end PyTorch framework for image and video.. MIT

MMF (π₯ 20 Β· β 4.2K) - A modular framework for vision & language multimodal research from.. BSD-3

image-match (π₯ 20 Β· β 2.5K) - Quickly search over billions of images. Apache-2
Torch Points 3D (π₯ 20 Β· β 1.1K) - Pytorch framework for doing deep learning on point clouds. BSD-3

vit-pytorch (π₯ 18 Β· β 2.7K Β· π£ ) - Implementation of Vision Transformer, a simple way to.. MIT

Norfair (π₯ 18 Β· β 900) - Lightweight Python library for adding real-time 2D object tracking to.. BSD-3
PaddleDetection (π₯ 17 Β· β 2.3K) - Object detection and instance segmentation toolkit.. Apache-2

- GitHub (👨‍💻 48 · 🔀 660 · 📋 1.2K - 28% open · ⏱️ 25.02.2021): git clone https://github.com/PaddlePaddle/PaddleDetection
pycls (π₯ 15 Β· β 1.4K) - Codebase for Image Classification Research, written in PyTorch. MIT

- GitHub (👨‍💻 9 · 🔀 160 · 📦 3 · 📋 55 - 20% open · ⏱️ 14.01.2021): git clone https://github.com/facebookresearch/pycls
DEβ«ΆTR (π₯ 14 Β· β 6.2K) - End-to-End Object Detection with Transformers. Apache-2

- GitHub (👨‍💻 19 · 🔀 940 · 📋 280 - 22% open · ⏱️ 15.11.2020): git clone https://github.com/facebookresearch/detr
PySlowFast (π₯ 14 Β· β 3.4K) - PySlowFast: video understanding codebase from FAIR for.. Apache-2

- GitHub (👨‍💻 19 · 🔀 630 · 📦 2 · 📋 350 - 47% open · ⏱️ 25.02.2021): git clone https://github.com/facebookresearch/SlowFast
Show 4 hidden projects...
- glfw (
π₯ 29 Β·β 7.3K) - A multi-platform library for OpenGL, OpenGL ES, Vulkan, window and input.βοΈZlib
- chainercv (
π₯ 25 Β·β 1.4K Β·π ) - ChainerCV: a Library for Deep Learning in Computer Vision.MIT
- Pillow-SIMD (
π₯ 23 Β·β 1.6K Β·π€ ) - The friendly PIL fork.βοΈPIL
- Luminoth (
π₯ 21 Β·β 2.3K Β·π ) - Deep Learning toolkit for Computer Vision.BSD-3
Graph Data
Libraries for graph processing, clustering, embedding, and machine learning tasks.
networkx (π₯ 36 Β· β 8.7K) - Network Analysis in Python. BSD-3
- GitHub (👨‍💻 500 · 🔀 2.2K · 📥 51 · 📦 68K · 📋 2.6K - 10% open · ⏱️ 24.02.2021): git clone https://github.com/networkx/networkx
- PyPi (📥 4.3M / month · 📦 21K · ⏱️ 22.08.2020): pip install networkx
- Conda (📥 3.1M · ⏱️ 23.08.2020): conda install -c conda-forge networkx
PyTorch Geometric (π₯ 28 Β· β 10K) - Geometric Deep Learning Extension Library for PyTorch. MIT

dgl (π₯ 27 Β· β 6.7K) - Python package built to ease deep learning on graph, on top of existing.. Apache-2
StellarGraph (π₯ 25 Β· β 1.8K) - StellarGraph - Machine Learning on Graphs. Apache-2

ogb (π₯ 22 Β· β 750) - Benchmark datasets, data loaders, and evaluators for graph machine learning. MIT
torch-cluster (π₯ 21 Β· β 340) - PyTorch Extension Library of Optimized Graph Cluster.. MIT

AmpliGraph (π₯ 20 Β· β 1.4K) - Python library for Representation Learning on Knowledge.. Apache-2

graph-nets (π₯ 19 Β· β 4.8K) - Build Graph Nets in Tensorflow. Apache-2

PyTorch-BigGraph (π₯ 19 Β· β 2.7K) - Generate embeddings from large-scale graph-structured.. BSD-3

PyKEEN (π₯ 19 Β· β 320) - A Python library for learning and evaluating knowledge graph embeddings. MIT
Paddle Graph Learning (π₯ 18 Β· β 910) - Paddle Graph Learning (PGL) is an efficient and.. Apache-2

pytorch_geometric_temporal (π₯ 16 Β· β 360) - A Temporal Extension Library for PyTorch Geometric. MIT

GraphEmbedding (π₯ 15 Β· β 1.8K) - Implementation and experiments of graph embedding algorithms. MIT

- GitHub (👨‍💻 8 · 🔀 560 · 📦 7 · 📋 40 - 67% open · ⏱️ 18.10.2020): git clone https://github.com/shenweichen/GraphEmbedding
AutoGL (π₯ 14 Β· β 580 Β· π£ ) - An autoML framework & toolkit for machine learning on graphs. MIT

OpenKE (π₯ 13 Β· β 2.4K Β· π€ ) - An Open-Source Package for Knowledge Embedding (KE). MIT
- GitHub (👨‍💻 10 · 🔀 740 · 📋 280 - 19% open · ⏱️ 08.04.2020): git clone https://github.com/thunlp/OpenKE
GraphVite (π₯ 13 Β· β 850) - GraphVite: A General and High-performance Graph Embedding System. Apache-2
Show 8 hidden projects...
- igraph (
π₯ 27 Β·β 780) - Python interface for igraph.βοΈGPL-2.0
- pygal (
π₯ 26 Β·β 2.3K) - PYthon svg GrAph plotting Library.βοΈLGPL-3.0
- Karate Club (
π₯ 21 Β·β 1.2K) - Karate Club: An API Oriented Open-source Python Framework for..βοΈGPL-3.0
- DeepWalk (
π₯ 19 Β·β 2.2K Β·π€ ) - DeepWalk - Deep Learning for Graphs.βοΈGPL-3.0
- Sematch (
π₯ 16 Β·β 340 Β·π ) - semantic similarity framework for knowledge graph.Apache-2
- pyRDF2Vec (
π₯ 15 Β·β 85) - Python Implementation and Extension of RDF2Vec.MIT
- GraphSAGE (
π₯ 14 Β·β 2.2K Β·π ) - Representation learning on large graphs using stochastic..MIT
- OpenNE (
π₯ 14 Β·β 1.4K Β·π ) - An Open-Source Package for Network Embedding (NE).MIT
Audio Data
Libraries for audio analysis, manipulation, transformation, and extraction, as well as speech recognition and music generation tasks.
DeepSpeech (π₯ 31 Β· β 17K Β· π ) - DeepSpeech is an open source embedded (offline, on-.. MPL-2.0

torchaudio (π₯ 28 Β· β 1.2K) - Data manipulation and transformation for audio signal.. BSD-2

audioread (π₯ 27 Β· β 360) - cross-library (GStreamer + Core Audio + MAD + FFmpeg) audio decoding.. MIT
pyAudioAnalysis (π₯ 26 Β· β 3.8K) - Python Audio Analysis Library: Feature Extraction,.. Apache-2
python-soundfile (π₯ 25 Β· β 370) - SoundFile is an audio library based on libsndfile, CFFI, and.. BSD-3
python_speech_features (π₯ 24 Β· β 1.8K) - This library provides common speech features for ASR.. MIT
tinytag (π₯ 22 Β· β 440) - Read music meta data and length of MP3, OGG, OPUS, MP4, M4A, FLAC, WMA and.. MIT
TTS (π₯ 20 Β· β 3.2K) - Deep learning for Text to Speech (Discussion forum:.. MPL-2.0
- GitHub (👨‍💻 51 · 🔀 670 · 📥 130 · 📋 470 - 6% open · ⏱️ 12.02.2021): git clone https://github.com/mozilla/TTS
Show 4 hidden projects...
- SpeechRecognition (
π₯ 30 Β·β 5.4K Β·π ) - Speech recognition module for Python, supporting..BSD-3
- aubio (
π₯ 26 Β·β 2.1K) - a library for audio and music analysis.βοΈGPL-3.0
- Essentia (
π₯ 22 Β·β 1.7K) - C++ library for audio and music analysis, description and..βοΈAGPL-3.0
- Madmom (
π₯ 20 Β·β 720 Β·π ) - Python audio and music signal processing library.BSD-3
Geospatial Data
Libraries to load, process, analyze, and write geographic data as well as libraries for spatial analysis, map visualization, and geocoding.
pydeck (π₯ 33 Β· β 8.5K) - WebGL2 powered geospatial visualization layers. MIT

- GitHub (👨‍💻 160 · 🔀 1.5K · 📦 1.4K · 📋 2.1K - 4% open · ⏱️ 24.02.2021): git clone https://github.com/visgl/deck.gl
- PyPi (📥 81K / month · 📦 2 · ⏱️ 12.02.2021): pip install pydeck
- Conda (📥 16K · ⏱️ 12.02.2021): conda install -c conda-forge pydeck
- NPM (📥 190K / month · 📦 560 · ⏱️ 23.02.2021): npm install deck.gl
folium (π₯ 32 Β· β 5.2K) - Python Data. Leaflet.js Maps. MIT
- GitHub (👨‍💻 120 · 🔀 1.9K · 📦 8.7K · 📋 840 - 17% open · ⏱️ 18.01.2021): git clone https://github.com/python-visualization/folium
- PyPi (📥 150K / month · 📦 970 · ⏱️ 18.01.2021): pip install folium
- Conda (📥 330K · ⏱️ 06.01.2021): conda install -c conda-forge folium
GeoPandas (π₯ 31 Β· β 2.5K) - Python tools for geographic data. BSD-3

- GitHub (👨‍💻 130 · 🔀 530 · 📥 900 · 📦 7.3K · 📋 980 - 29% open · ⏱️ 23.02.2021): git clone https://github.com/geopandas/geopandas
- PyPi (📥 380K / month · 📦 1.2K · ⏱️ 25.01.2021): pip install geopandas
- Conda (📥 870K · ⏱️ 25.01.2021): conda install -c conda-forge geopandas
Rasterio (π₯ 30 Β· β 1.4K) - Rasterio reads and writes geospatial raster datasets. BSD-3
- GitHub (👨‍💻 110 · 🔀 390 · 📥 700 · 📦 2.8K · 📋 1.3K - 10% open · ⏱️ 18.02.2021): git clone https://github.com/mapbox/rasterio
- PyPi (📥 150K / month · 📦 850 · ⏱️ 25.01.2021): pip install rasterio
- Conda (📥 900K · ⏱️ 25.01.2021): conda install -c conda-forge rasterio
pyproj (π₯ 29 Β· β 580) - Python interface to PROJ (cartographic projections and coordinate.. MIT
ipyleaflet (π₯ 27 Β· β 1.1K) - A Jupyter - Leaflet.js bridge. MIT

- GitHub (👨‍💻 63 · 🔀 280 · 📦 700 · 📋 390 - 34% open · ⏱️ 11.02.2021): git clone https://github.com/jupyter-widgets/ipyleaflet
- PyPi (📥 13K / month · 📦 98 · ⏱️ 05.01.2021): pip install ipyleaflet
- Conda (📥 610K · ⏱️ 16.01.2021): conda install -c conda-forge ipyleaflet
- NPM (📥 160K / month · 📦 2 · ⏱️ 05.01.2021): npm install jupyter-leaflet
ArcGIS API (π₯ 25 Β· β 970) - Documentation and samples for ArcGIS API for Python. Apache-2
- GitHub (👨‍💻 61 · 🔀 710 · 📋 320 - 36% open · ⏱️ 26.01.2021): git clone https://github.com/Esri/arcgis-python-api
- PyPi (📥 12K / month · 📦 20 · ⏱️ 27.01.2021): pip install arcgis
- Docker Hub (📥 4.1K · ⭐ 32 · ⏱️ 06.03.2020): docker pull esridocker/arcgis-api-python-notebook
EarthPy (π₯ 20 Β· β 230) - A package built to support working with spatial data using open source.. BSD-3
pymap3d (π₯ 19 Β· β 180) - pure-Python (Numpy optional) 3D coordinate conversions for geospace ecef.. BSD-2
Show 7 hidden projects...
- Geocoder (
π₯ 30 Β·β 1.3K Β·π ) - Python Geocoder.MIT
- Cartopy (
π₯ 27 Β·β 1.4K) - Cartopy - a cartographic python library with matplotlib support.βοΈLGPL-3.0
- Satpy (
π₯ 25 Β·β 680) - Python package for earth-observing satellite data processing.βοΈGPL-3.0
- gmaps (
π₯ 22 Β·β 700 Β·π ) - Google maps for Jupyter notebooks.BSD-3
- Sentinelsat (
π₯ 22 Β·β 570) - Search and download Copernicus Sentinel satellite images.βοΈGPL-3.0
- Mapbox GL (
π₯ 20 Β·β 560 Β·π ) - Use Mapbox GL JS to visualize data in a Python Jupyter notebook.MIT
- geoplotlib (
π₯ 19 Β·β 900 Β·π ) - python toolbox for visualizing geographical data and making maps.MIT
Financial Data
Libraries for algorithmic stock/crypto trading, risk analytics, backtesting, technical analysis, and other tasks on financial data.
yfinance (π₯ 30 Β· β 4.4K) - Yahoo! Finance market data downloader (+faster Pandas Datareader). Apache-2
Alpha Vantage (π₯ 27 Β· β 3.2K) - A python wrapper for Alpha Vantage API for financial data. MIT
empyrical (π₯ 25 Β· β 730) - Common financial risk and performance metrics. Used by zipline and.. Apache-2
- GitHub (👨‍💻 22 · 🔀 230 · 📦 530 · 📋 53 - 50% open · ⏱️ 14.10.2020): git clone https://github.com/quantopian/empyrical
- PyPi (📥 17K / month · 📦 220 · ⏱️ 13.10.2020): pip install empyrical
- Conda (📥 9.7K · ⏱️ 14.10.2020): conda install -c conda-forge empyrical
Alphalens (π₯ 24 Β· β 1.8K Β· π€ ) - Performance analysis of predictive (alpha) stock factors. Apache-2
- GitHub (👨‍💻 25 · 🔀 660 · 📦 350 · 📋 180 - 20% open · ⏱️ 27.04.2020): git clone https://github.com/quantopian/alphalens
- PyPi (📥 1.9K / month · 📦 14 · ⏱️ 27.04.2020): pip install alphalens
- Conda (📥 11K · ⏱️ 16.05.2020): conda install -c conda-forge alphalens
stockstats (π₯ 24 Β· β 720) - Supply a wrapper ``StockDataFrame`` based on the.. BSD-3
Enigma Catalyst (π₯ 23 Β· β 2K) - An Algorithmic Trading Library for Crypto-Assets in Python. Apache-2
TensorTrade (π₯ 21 Β· β 3K) - An open source reinforcement learning framework for training,.. Apache-2
Qlib (π₯ 20 Β· β 4.4K Β· π£ ) - Qlib is an AI-oriented quantitative investment platform, which aims.. MIT

finmarketpy (π₯ 20 Β· β 2.5K) - Python library for backtesting trading strategies & analyzing.. Apache-2
tf-quant-finance (π₯ 19 Β· β 2.5K) - High-performance TensorFlow library for quantitative.. Apache-2

Crypto Signals (π₯ 18 Β· β 2.7K) - Github.com/CryptoSignal - #1 Quant Trading & Technical Analysis.. MIT
- GitHub (👨‍💻 25 · 🔀 720 · 📋 230 - 17% open · ⏱️ 03.09.2020): git clone https://github.com/CryptoSignal/crypto-signal
- Docker Hub (📥 42K · ⭐ 8 · ⏱️ 03.09.2020): docker pull shadowreaver/crypto-signal
Show 6 hidden projects...
- backtrader (
π₯ 26 Β·β 5.8K) - Python Backtesting library for trading strategies.βοΈGPL-3.0
- PyAlgoTrade (
π₯ 23 Β·β 3.2K Β·π ) - Python Algorithmic Trading Library.Apache-2
- arch (
π₯ 23 Β·β 650 Β·π ) - ARCH models in Python.βοΈNCSA
- FinTA (
π₯ 22 Β·β 860) - Common financial technical indicators implemented in Pandas.βοΈLGPL-3.0
- Backtesting.py (
π₯ 17 Β·β 1K) - Backtest trading strategies in Python.βοΈAGPL-3.0
- surpriver (
π₯ 12 Β·β 1.1K Β·π£ ) - Find big moving stocks before they move using machine..βοΈGPL-3.0
Time Series Data
Libraries for forecasting, anomaly detection, feature extraction, and machine learning on time-series and sequential data.
Prophet (π₯ 29 Β· β 12K) - Tool for producing high quality forecasts for time series data that has.. MIT
pmdarima (π₯ 26 Β· β 820) - A statistical library designed to fill the void in Python's time series.. MIT
Darts (π₯ 22 Β· β 740) - A python library for easy manipulation and forecasting of time series. Apache-2
- GitHub (👨‍💻 23 · 🔀 96 · 📦 12 · 📋 68 - 30% open · ⏱️ 03.02.2021): git clone https://github.com/unit8co/darts
- PyPi (📥 4.3K / month · ⏱️ 03.02.2021): pip install u8darts
- Docker Hub (📥 100 · ⏱️ 03.02.2021): docker pull unit8/darts
STUMPY (π₯ 20 Β· β 1.7K) - STUMPY is a powerful and scalable Python library for computing a Matrix.. BSD-3
pytorch-forecasting (π₯ 19 Β· β 800) - Time series forecasting with PyTorch. MIT
matrixprofile-ts (π₯ 19 Β· β 610 Β· π€ ) - A Python library for detecting patterns and anomalies.. Apache-2
Auto TS (π₯ 18 Β· β 180) - Automatically build ARIMA, SARIMAX, VAR, FB Prophet and XGBoost.. Apache-2
ADTK (π₯ 17 Β· β 600 Β· π€ ) - A Python toolkit for rule-based/unsupervised anomaly detection in time.. MPL-2.0
tick (π₯ 17 Β· β 320 Β· π€ ) - Module for statistical learning, with a particular emphasis on time-.. BSD-3
Show 3 hidden projects...
Medical Data
Libraries for processing and analyzing medical data such as MRIs, EEGs, genomic data, and other medical imaging formats.
Lifelines (π₯ 29 Β· β 1.6K) - Survival analysis in Python. MIT
- GitHub (👨‍💻 92 · 🔀 410 · 📦 510 · 📋 780 - 24% open · ⏱️ 11.02.2021): git clone https://github.com/CamDavidsonPilon/lifelines
- PyPi (📥 93K / month · 📦 130 · ⏱️ 05.02.2021): pip install lifelines
- Conda (📥 130K · ⏱️ 08.02.2021): conda install -c conda-forge lifelines
NiBabel (π₯ 29 Β· β 390) - Python package to access a cacophony of neuro-imaging file formats. MIT
MNE (π₯ 27 Β· β 1.5K) - MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python. BSD-3
DIPY (π₯ 27 Β· β 390) - DIPY is the paragon 3D/4D+ imaging library in Python. Contains generic.. BSD-3
DeepVariant (π₯ 21 Β· β 2.2K) - DeepVariant is an analysis pipeline that uses a deep neural.. BSD-3

NiftyNet (π₯ 21 Β· β 1.3K Β· π€ ) - [unmaintained] An open-source convolutional neural.. Apache-2

Brainiak (π₯ 19 Β· β 230) - Brain Imaging Analysis Kit. Apache-2
- GitHub (👨‍💻 32 · 🔀 110 · 📦 12 · 📋 180 - 34% open · ⏱️ 19.02.2021): git clone https://github.com/brainiak/brainiak
- PyPi (📥 120 / month · 📦 1 · ⏱️ 15.10.2020): pip install brainiak
- Docker Hub (📥 490 · ⭐ 1 · ⏱️ 15.10.2020): docker pull brainiak/brainiak
Medical Detection Toolkit (π₯ 12 Β· β 900 Β· π€ ) - The Medical Detection Toolkit contains 2D + 3D.. Apache-2

- GitHub (👨‍💻 3 · 🔀 230 · 📋 110 - 24% open · ⏱️ 18.04.2020): git clone https://github.com/MIC-DKFZ/medicaldetectiontoolkit
MedicalNet (π₯ 11 Β· β 1.1K) - Many studies have shown that the performance on deep learning is.. MIT
- GitHub (👨‍💻 1 · 🔀 280 · 📋 57 - 75% open · ⏱️ 27.08.2020): git clone https://github.com/Tencent/MedicalNet
Show 4 hidden projects...
- DLTK (
π₯ 20 Β·β 1.2K Β·π ) - Deep Learning Toolkit for Medical Image Analysis.Apache-2
- MedPy (
π₯ 20 Β·β 310 Β·π€ ) - Medical image processing in Python.βοΈGPL-3.0
- MedicalTorch (
π₯ 15 Β·β 710 Β·π ) - A medical imaging framework for Pytorch.Apache-2
- DeepNeuro (
π₯ 14 Β·β 98 Β·π€ ) - A deep learning python package for neuroimaging data. Made by:.MIT
Optical Character Recognition
Libraries for optical character recognition (OCR) and text extraction from images or videos.
Tesseract (π₯ 30 Β· β 3.4K) - Python-tesseract is an optical character recognition (OCR) tool.. Apache-2
EasyOCR (π₯ 27 Β· β 11K) - Ready-to-use OCR with 80+ supported languages and all popular writing.. Apache-2
OCRmyPDF (π₯ 26 Β· β 3.9K) - OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them.. MPL-2.0
attention-ocr (π₯ 21 Β· β 840) - A Tensorflow model for text recognition (CNN + seq2seq with.. MIT

keras-ocr (π₯ 20 Β· β 770) - A packaged and flexible version of the CRAFT text detector and.. MIT

doc2text (π₯ 19 Β· β 1.2K) - Detect text blocks and OCR poorly scanned PDFs in bulk. Python module.. MIT
Mozart (π₯ 10 Β· β 230 Β· π£ ) - An optical music recognition (OMR) system. Converts sheet.. Apache-2

- GitHub (👨‍💻 5 · 🔀 36 · 📋 2 - 50% open · ⏱️ 14.01.2021): git clone https://github.com/aashrafh/Mozart
Show 1 hidden project...
- pdftabextract (
π₯ 20 Β·β 1.9K Β·π ) - A set of tools for extracting tables from PDF files..Apache-2
Data Containers & Structures
General-purpose data containers & structures as well as utilities & extensions for pandas.
numpy (π₯ 42 Β· β 16K) - The fundamental package for scientific computing with Python. BSD-3
- GitHub (👨‍💻 1.3K · 🔀 5.3K · 📥 310K · 📦 630K · 📋 9.6K - 23% open · ⏱️ 25.02.2021): git clone https://github.com/numpy/numpy
- PyPi (📥 28M / month · 📦 170K · ⏱️ 07.02.2021): pip install numpy
- Conda (📥 17M · ⏱️ 10.02.2021): conda install -c conda-forge numpy
h5py (π₯ 36 Β· β 1.5K) - HDF5 for Python -- The h5py package is a Pythonic interface to the HDF5.. BSD-3
Arrow (π₯ 35 Β· β 7.2K) - Apache Arrow is a cross-language development platform for in-memory.. Apache-2
numexpr (π₯ 31 Β· β 1.5K) - Fast numerical array expression evaluator for Python, NumPy, PyTables,.. MIT
TinyDB (π₯ 29 Β· β 4.1K) - TinyDB is a lightweight document oriented database optimized for your.. MIT
Koalas (π₯ 29 Β· β 2.7K) - Koalas: pandas API on Apache Spark. Apache-2


- GitHub (👨‍💻 47 · 🔀 300 · 📥 1K · 📦 76 · 📋 520 - 17% open · ⏱️ 23.02.2021): git clone https://github.com/databricks/koalas
- PyPi (📥 510K / month · 📦 2 · ⏱️ 22.01.2021): pip install koalas
- Conda (📥 78K · ⏱️ 22.01.2021): conda install -c conda-forge koalas
Bottleneck (π₯ 29 Β· β 580) - Fast NumPy array functions written in C. BSD-2
- GitHub (👨‍💻 21 · 🔀 65 · 📦 19K · 📋 200 - 12% open · ⏱️ 24.01.2021): git clone https://github.com/pydata/bottleneck
- PyPi (📥 140K / month · 📦 2.9K · ⏱️ 21.02.2020): pip install Bottleneck
- Conda (📥 1.4M · ⏱️ 21.01.2021): conda install -c conda-forge bottleneck
Modin (π₯ 28 Β· β 5.7K) - Modin: Speed up your Pandas workflows by changing a single line of.. Apache-2

datasketch (π₯ 27 Β· β 1.4K) - MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog,.. MIT
zarr (π₯ 26 Β· β 640) - An implementation of chunked, compressed, N-dimensional arrays for Python. MIT
Arctic (π₯ 24 Β· β 2.2K) - Arctic is a high performance datastore for numeric data. βοΈLGPL-2.1
- GitHub (👨‍💻 71 · 🔀 440 · 📥 94 · 📦 110 · 📋 510 - 17% open · ⏱️ 26.01.2021): git clone https://github.com/man-group/arctic
- PyPi (📥 1.7K / month · 📦 42 · ⏱️ 01.12.2020): pip install arctic
- Conda (📥 13K · ⏱️ 16.12.2019): conda install -c conda-forge arctic
Vaex (π₯ 23 Β· β 5.8K) - Out-of-Core DataFrames for Python, ML, visualize and explore big tabular data.. MIT
PandaralΒ·lel (π₯ 23 Β· β 1.4K) - A simple and efficient tool to parallelize Pandas.. BSD-3


datatable (π₯ 21 Β· β 1.1K) - A Python package for manipulating 2-dimensional tabular data.. MPL-2.0
StaticFrame (π₯ 20 Β· β 210) - Immutable and grow-only Pandas-like DataFrames with a more explicit.. MIT
- GitHub (👨‍💻 15 · 🔀 20 · 📦 6 · 📋 280 - 9% open · ⏱️ 25.02.2021): git clone https://github.com/InvestmentSystems/static-frame
- PyPi (📥 640 / month · ⏱️ 19.02.2021): pip install static-frame
- Conda (📥 66K · ⏱️ 20.02.2021): conda install -c conda-forge static-frame
Bounter (π₯ 17 Β· β 900) - Efficient Counter that uses a limited (bounded) amount of memory.. MIT
PandaPy (π₯ 14 Β· β 470) - PandaPy has the speed of NumPy and the usability of Pandas 10x to 50x.. MIT

Show 5 hidden projects...
- Blaze (
π₯ 28 Β·β 2.9K Β·π ) - NumPy and Pandas interface to Big Data.BSD-3
- sklearn-pandas (
π₯ 28 Β·β 2.3K) - Pandas integration with sklearn.βοΈZlib
- pandasql (
π₯ 22 Β·β 940 Β·π ) - sqldf for pandas.MIT
- pickleDB (
π₯ 21 Β·β 540 Β·π ) - pickleDB is an open source key-value store using Python's json..BSD-3
- Pandas Summary (
π₯ 21 Β·β 360 Β·π ) - An extension to pandas dataframes describe function.MIT
Data Loading & Extraction
Libraries for loading, collecting, and extracting data from a variety of data sources and formats.