# data-parallelism

Here are 22 public repositories matching this topic...
Distributed deep learning with a focus on distributed training, using Keras and Apache Spark (a minimal data-parallel sketch follows this entry).
Topics: data-science, machine-learning, spark, apache-spark, deep-learning, hadoop, tensorflow, keras, keras-models, optimization-algorithms, data-parallelism, distributed-optimizers
Updated Jul 25, 2018 - Python
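As a refresher on the synchronous data parallelism this project builds on, here is a framework-free numpy sketch (all names are illustrative, not this project's API): each of four simulated workers computes a gradient on its shard of the batch, and because the shards are equal-sized, the averaged gradient equals the full-batch gradient.

```python
import numpy as np

# Minimal synchronous data parallelism: 4 "workers" each compute a
# gradient on their shard; the averaged update equals a full-batch step.
rng = np.random.default_rng(0)
X = rng.random((64, 3))
y = X @ np.array([1.0, -2.0, 0.5])      # ground-truth weights to recover
w = np.zeros(3)

def shard_gradient(w, Xs, ys):
    """Gradient of mean-squared error on one worker's shard."""
    return 2.0 * Xs.T @ (Xs @ w - ys) / len(ys)

for _ in range(200):
    shards = zip(np.array_split(X, 4), np.array_split(y, 4))
    grads = [shard_gradient(w, Xs, ys) for Xs, ys in shards]  # parallel in practice
    w -= 0.5 * np.mean(grads, axis=0)   # "all-reduce" (here: plain mean), then update

print(np.round(w, 3))                   # ≈ [ 1.  -2.   0.5]
```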
A state-of-the-art multithreading runtime: message-passing based, fast, scalable, ultra-low overhead (a fork-join illustration follows this entry).
Topics: runtime, scheduler, openmp, multithreading, parallelism, task-scheduler, message-passing, threadpool, data-parallelism, fork-join, work-stealing, task-parallelism
Updated Jul 4, 2021 - Nim
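The runtime itself is written in Nim; to keep all examples here in one language, below is the fork-join pattern such a runtime schedules, sketched with Python's standard-library ProcessPoolExecutor. This is purely illustrative and carries none of the work-stealing or low-overhead machinery the project provides.

```python
from concurrent.futures import ProcessPoolExecutor

def chunk_sum(chunk):
    """Work done by one task; a real runtime would steal idle tasks."""
    return sum(chunk)

def parallel_sum(data, workers=4):
    # Fork: split the data and submit one task per chunk.
    size = (len(data) + workers - 1) // workers
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(chunk_sum, chunks)  # chunks run concurrently
    # Join: combine the partial results.
    return sum(partials)

if __name__ == "__main__":  # guard required for process pools on some platforms
    print(parallel_sum(list(range(1_000_000))))  # 499999500000
```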
Colossal-AI: A Unified Deep Learning System for Large-Scale Parallel Training
Topics: deep-learning, hpc, large-scale, data-parallelism, model-parallelism, distributed-training, pipeline-parallelism
Updated Nov 3, 2021 - Python
Paddle Distributed Training Extended: the PaddlePaddle distributed training extension package.
Topics: benchmark, cloud, lightning, elastic, unsupervised-learning, large-scale, data-parallelism, paddlepaddle, model-parallelism, distributed-algorithm, self-supervised-learning, pipeline-parallelism, pretraining, fleet-api, paddlecloud
Updated Nov 3, 2021 - Shell
Distributed Keras Engine: make Keras faster with only one line of code (a generic data-parallel Keras sketch follows this entry).
Topics: python, distributed-systems, machine-learning, deep-neural-networks, deep-learning, neural-network, tensorflow, parallel-computing, keras, distributed, ray, keras-models, keras-classification-models, keras-neural-networks, tensorflow-models, keras-tensorflow, data-parallelism, distributed-deep-learning, distributed-keras-engine, plaidml
Updated Oct 3, 2019 - Python
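The project's own one-line API is not reproduced here; as a stand-in, TensorFlow's built-in tf.distribute.MirroredStrategy gives Keras models the same data-parallel effect with roughly one extra line:

```python
import numpy as np
import tensorflow as tf

# MirroredStrategy replicates the model on all local devices and
# all-reduces gradients; each batch is split across the replicas.
strategy = tf.distribute.MirroredStrategy()

with strategy.scope():               # variables created here are mirrored
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(32,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

X = np.random.rand(1024, 32).astype("float32")
y = np.random.rand(1024, 1).astype("float32")
model.fit(X, y, batch_size=128, epochs=1)  # each replica sees 128 / n_devices
```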
Ternary Gradients to Reduce Communication in Distributed Deep Learning (TensorFlow); a sketch of the ternarization step follows this entry.
Updated Nov 19, 2018 - Python
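The core TernGrad idea is to quantize each gradient to three levels {-s, 0, +s} with a stochastic mask so the result stays unbiased. A minimal numpy sketch of that quantization step as described in the paper (not the repository's TensorFlow implementation):

```python
import numpy as np

def ternarize(grad, rng=None):
    """Quantize a gradient to {-s, 0, +s}; unbiased because
    E[output] = s * sign(g) * (|g| / s) = g."""
    rng = rng or np.random.default_rng()
    s = np.abs(grad).max()
    if s == 0.0:
        return np.zeros_like(grad)
    keep = rng.random(grad.shape) < np.abs(grad) / s  # Bernoulli(|g|/s) mask
    return s * np.sign(grad) * keep  # ~2 bits/element plus one scalar to send

g = np.array([0.9, -0.1, 0.4, 0.0])
print(ternarize(g))                  # e.g. [ 0.9 -0.   0.9  0. ]
```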
Orkhon: ML Inference Framework and Server Runtime
Updated Feb 1, 2021 - Rust
Topics: openmp, mpi, intel, matrix-multiplication, high-performance-computing, parallel-algorithm, algorithm-analysis, data-parallelism, supercomputing, fox-algorithm
Updated Jan 28, 2019 - C
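The fox-algorithm topic above refers to Fox's blocked matrix-multiplication algorithm: at step k, the A block on the diagonal offset k is broadcast along each process row while B blocks shift upward. A serial numpy simulation of that block schedule (the repository itself is MPI C; this only checks the indexing):

```python
import numpy as np

def fox_matmul(A, B, q):
    """Serially simulate Fox's algorithm on a q x q block grid."""
    n = A.shape[0]
    assert n % q == 0, "matrix size must divide evenly into blocks"
    bs = n // q
    Ab = A.reshape(q, bs, q, bs).swapaxes(1, 2)  # Ab[i, j] is block (i, j)
    Bb = B.reshape(q, bs, q, bs).swapaxes(1, 2)
    Cb = np.zeros_like(Ab)
    for k in range(q):                # q broadcast/multiply/shift rounds
        for i in range(q):
            m = (i + k) % q           # A block broadcast along process row i
            for j in range(q):
                # Bb[m, j] is the B block that has shifted up k times.
                Cb[i, j] += Ab[i, m] @ Bb[m, j]
    return Cb.swapaxes(1, 2).reshape(n, n)

A, B = np.random.rand(6, 6), np.random.rand(6, 6)
assert np.allclose(fox_matmul(A, B, 3), A @ B)  # matches plain matmul
```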
Understanding the effects of data parallelism and sparsity on neural network training
Updated Jul 27, 2021 - Python
Development of Project HPGO | Hybrid Parallelism Global Orchestration (a pipeline-parallel micro-batching sketch follows this entry)
Topics: rust, machine-learning, tensorflow, pytorch, data-parallelism, model-parallelism, distributed-training, pipedream, gpipe, pipeline-parallelism
Updated Mar 26, 2021
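For context on the gpipe topic: pipeline parallelism splits a batch into micro-batches so that consecutive model stages can work concurrently. A serial PyTorch sketch of the micro-batch schedule (illustrative only; a real pipeline overlaps the two loops across devices):

```python
import torch

# Two pipeline stages; in a real system each lives on its own device.
stage1 = torch.nn.Linear(8, 8)
stage2 = torch.nn.Linear(8, 1)

batch = torch.randn(16, 8)
micro_batches = batch.chunk(4)       # 4 micro-batches of 4 samples each

activations = [stage1(mb) for mb in micro_batches]  # stage-1 forward passes
outputs = [stage2(act) for act in activations]      # stage-2 forward passes
loss = torch.cat(outputs).sum()
loss.backward()                      # gradients reach both stages' weights
```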
Dependence-Based Code Transformation for Coarse-Grained Parallelism
Updated Dec 8, 2018 - C++
Torch Automatic Distributed Neural Network (TorchAD-NN) training library. Built on top of TorchMPI, this module automatically parallelizes neural network training.
Topics: machine-learning, neural-network, torch7, openmpi, data-parallelism, model-parallelism, distributed-machine-learning
Updated Feb 28, 2018 - Lua
A decentralized and distributed framework for training DNNs
Updated Aug 25, 2019 - Python
Distributing Deep Learning Hyperparameter Tuning for 3D Medical Image Segmentation (a minimal Ray Tune sketch follows this entry)
Topics: deep-neural-networks, deep-learning, tensorflow, raylib, distributed, hyperparameter-tuning, data-parallelism, medical-image-segmentation, 3d-unet, ray-tune, distributed-hyperparameter-tuning, experiment-parallelism
Updated Oct 29, 2021 - Python
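The ray-tune topic points at Ray Tune's experiment parallelism: each hyperparameter trial runs as an independent task. A minimal sketch using a toy objective as a stand-in for 3D U-Net training (train_model is hypothetical, and the tune.report/tune.run API shown is the Ray 1.x style):

```python
from ray import tune  # pip install "ray[tune]"

def train_model(config):
    # Toy objective standing in for a real 3D U-Net training run.
    score = (config["lr"] - 0.01) ** 2
    tune.report(loss=score)          # report one result for this trial

analysis = tune.run(
    train_model,
    config={"lr": tune.grid_search([0.001, 0.01, 0.1])},  # 3 parallel trials
)
print(analysis.get_best_config(metric="loss", mode="min"))  # {'lr': 0.01}
```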
A C#-based download manager that uses task-based programming with the Task Parallel Library: scheduling, controlling, and managing tasks with data parallelism.
Updated Apr 2, 2018 - C#
Example of distributed PyTorch (a minimal DistributedDataParallel sketch follows this entry)
Topics: pytorch, data-parallelism, distributed-training, multi-node-dataparallelism, multi-gpu-training, modelparallelism, pytorch-mp, pytorch-dp
Updated Mar 23, 2019 - Python
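For reference, the current idiom for multi-process data parallelism in PyTorch is DistributedDataParallel. A minimal, CPU-friendly sketch (the model and sizes are placeholders; launch with torchrun so the rank/world-size environment variables are set):

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # Launch: torchrun --nproc_per_node=4 this_script.py
    dist.init_process_group(backend="gloo")  # use "nccl" on GPU nodes
    model = torch.nn.Linear(10, 1)
    ddp_model = DDP(model)                   # gradients are all-reduced for us
    opt = torch.optim.SGD(ddp_model.parameters(), lr=0.1)
    for _ in range(3):
        opt.zero_grad()
        loss = ddp_model(torch.randn(32, 10)).sum()
        loss.backward()                      # triggers gradient synchronization
        opt.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```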
Official repository for the paper: Distributing Deep Learning Hyperparameter Tuning for 3D Medical Image Segmentation
Topics: deep-neural-networks, deep-learning, tensorflow, raylib, distributed, hyperparameter-tuning, data-parallelism, medical-image-segmentation, 3d-unet, ray-tune, distributed-hyperparameter-tuning, experiment-parallelism
Updated Oct 29, 2021 - Python
A POSIX vampire number generator (a short Python checker follows this entry).
Topics: c, linux, homebrew, freebsd, algorithm, linked-list, math, makefile, quicksort, mathematics, posix, checkpoint, checkpoint-restart, data-parallelism, recreational-mathematics, recreational, unrolled-linked-list, vampire-number
Updated Oct 13, 2021 - C
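For anyone unfamiliar with the term: a vampire number has an even number of digits and factors into two "fangs" of half that length whose digits together are a permutation of its own (and the fangs may not both end in zero). A short brute-force Python checker, far slower than the repository's optimized C generator:

```python
from itertools import permutations

def is_vampire(n):
    s = str(n)
    if len(s) % 2:
        return False
    half = len(s) // 2
    for digits in set(permutations(s)):       # try every digit rearrangement
        x = int("".join(digits[:half]))
        y = int("".join(digits[half:]))
        if (x * y == n
                and not (x % 10 == 0 and y % 10 == 0)  # fangs not both /10
                and len(str(x)) == half                # reject leading zeros
                and len(str(y)) == half):
            return True
    return False

print([n for n in range(1000, 10000) if is_vampire(n)])  # 1260, 1395, 1435, ...
```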
MapReduceSimulator for Scheduling and Provisioning Algorithms
Updated Oct 28, 2020 - Java
Hi,
I have tried both loss.backward() and model_engine.backward(loss) in my code and observed several subtle differences. For one, retain_graph=True does not work with model_engine.backward(loss). This is causing me a problem, because the autograd buffers are not retained between backward passes.
Please look into this if you can.
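For contrast, a minimal sketch of what retain_graph=True does with plain loss.backward(); the model_engine call is engine-wrapper style (as in the report above) and appears only in a comment, since the report says it does not honor the flag:

```python
import torch

x = torch.randn(4, requires_grad=True)
loss = (x * 2).sum()

loss.backward(retain_graph=True)  # keep intermediate buffers for reuse
loss.backward()                   # second pass works: the graph was retained

# With an engine wrapper the same intent would read
#   model_engine.backward(loss)
# which, per the report above, does not propagate retain_graph=True,
# so the saved buffers are freed and a second backward pass fails.
```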