Let's use this Issue to track performance issues and enhancement requests, so it's easier to prioritize the work.
This is for PyTorch `transformers`.
I will also label it as a Good Difficult Issue in case someone is ready for the challenging but rewarding experience of figuring things out. If you want to take on a challenge, comment in the corresponding Issue/PR that resonates with you so that others know you're working on it.
If I missed any other relevant open performance-related Issues/PRs that need attention, please comment below.
Regression:
#11218 Regression after Bart-like refactoring - need to compare against the original Bart refactoring PR, since the regression most likely happened there.
Odd slowness:
#10816 figuring out why eval with --fp16_full_eval is 25% slower
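A slowdown like the 25% mentioned above is easier to track when it is measured the same way each time. Below is a minimal sketch of such a measurement using only the standard library. The helper names `median_runtime` and `relative_slowdown` are hypothetical and not part of `transformers`. For GPU workloads the timed callable would also need to synchronize (e.g. via `torch.cuda.synchronize()`) before returning, since CUDA kernels launch asynchronously.

```python
import statistics
import time


def median_runtime(fn, repeats=5, warmup=1):
    """Return the median wall-clock time of calling fn(), in seconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are executed but not timed
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def relative_slowdown(baseline_s, candidate_s):
    """Fractional slowdown of candidate vs. baseline (0.25 means 25% slower)."""
    return (candidate_s - baseline_s) / baseline_s
```

One would time the baseline eval and the `--fp16_full_eval` eval with `median_runtime` and pass the two results to `relative_slowdown`. Using the median rather than the mean makes the comparison less sensitive to a single noisy run.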
Fused kernels possibilities:
#11368 Megatron fused CUDA kernels to improve Hugging Face model classes' scalability
Research PyTorch kernels?
DeepSpeed has various kernels that we might be able to use.
Faster / leaner startup / module loading
#12274 - skip storage allocation which gets dropped for pretrained weights
Faster optimizers
#12084 - a proposal to port MemoryEfficientFP16Optimizer from fairseq
#9965 - torch.optim._multi_tensor faster optimizers - the test script has some bottleneck, so it needs to be profiled
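For the "need to profile" item, a first pass over the Python-level bottleneck in a test script could look like the sketch below. It uses the standard library's `cProfile`. `profile_top` and `fake_step` are hypothetical names, and `fake_step` merely stands in for one iteration of the real script. Note that `cProfile` only sees Python function calls, so time spent inside GPU kernels would need a different tool such as `torch.profiler`.

```python
import cProfile
import io
import pstats


def profile_top(fn, limit=10):
    """Run fn() under cProfile and return a text report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        fn()
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(limit)  # keep only the top `limit` rows
    return buffer.getvalue()


def fake_step():
    # Hypothetical stand-in for one iteration of the test script.
    return sum(i * i for i in range(10_000))


if __name__ == "__main__":
    print(profile_top(fake_step))
```

Sorting by cumulative time puts the call paths that dominate the run at the top, which is usually the quickest way to see whether the bottleneck is in the optimizer step or in the surrounding script.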
stas00 changed the title from "[Performance] Tracking open Issues and PRs" to "[Performance] Tracking open Issues and PRs (pytorch)" on Jun 12, 2021
stas00 changed the title from "[Performance] Tracking open Issues and PRs (pytorch)" to "[Performance] Tracking open Issues and PRs (pytorch transformers)" on Jun 12, 2021
Scalability
Deepspeed-specific features
from_pretrained loading faster
Tests