fix attention_mask is overwritten by dummy tensor at DeepSpeedSelfAttentionFunction
#1913
opened Apr 26, 2022 by codertimo
[zero-3] add bwd support for list/dict types returned in fwd
#1857
opened Mar 23, 2022 by jeffra
3 of 4 tasks
remove force-multi and fix None val check in base tuner in autotuning
#1657
opened Dec 22, 2021 by cli99
[zero-3] add support for new params added during fwd pass
#1606
opened Dec 1, 2021 by jeffra
support batch size dimension in 2D sparse attention mask
#1597
opened Nov 29, 2021 by jglaser
Optimizer state loading fix for bitsandbytes 8-bit optimizers.
#1582
opened Nov 22, 2021 by TimDettmers
Refine quantizer for supporting larger hidden-dim and group size
#1544
opened Nov 9, 2021 by RezaYazdaniAminabadi
Add some improvements for pipeline module, engine and assertion into ds engine
#1529
opened Nov 6, 2021 by hyunwoongko
parallelize writing of layer checkpoint files across data parallel instances
#1419
opened Sep 30, 2021 by adammoody