distributed-training

Here are 80 public repositories matching this topic...

jankrynauw commented Jun 6, 2019

We would like to forward a particular 'key' column, which is part of the features, so that it appears alongside the predictions. This would let us identify which set of features a particular prediction belongs to. Here is an example of the prediction output using tensorflow.contrib.estimator.multi_class_head:

{"classes": ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"],
 "scores": [0.068196
enhancement help wanted good first issue
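
The key-forwarding idea requested in the issue above can be illustrated without TensorFlow. The following is a minimal sketch in plain Python, not the estimator API: the helper `attach_keys`, the sample rows, and the sample predictions are all hypothetical, and it assumes predictions are returned in the same order as the input rows.

```python
from typing import Any, Dict, Iterable, Iterator


def attach_keys(
    feature_rows: Iterable[Dict[str, Any]],
    predictions: Iterable[Dict[str, Any]],
    key_name: str = "key",
) -> Iterator[Dict[str, Any]]:
    """Yield each prediction with the key of its input row copied in.

    Assumes both iterables are in the same order (hypothetical helper).
    """
    for row, prediction in zip(feature_rows, predictions):
        # Copy so the caller's prediction dict is not mutated.
        merged = dict(prediction)
        merged[key_name] = row[key_name]
        yield merged


# Hypothetical inputs: two feature rows and their predictions, in order.
rows = [{"key": "a1", "x": 0.2}, {"key": "b7", "x": 0.9}]
preds = [
    {"classes": ["0", "1"], "scores": [0.8, 0.2]},
    {"classes": ["0", "1"], "scores": [0.1, 0.9]},
]

keyed = list(attach_keys(rows, preds))
```

Each element of `keyed` carries both the model output and the identifier of the row that produced it, which is the property the issue asks for.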
ColossalAI
SMesForoush commented Mar 12, 2022

Dear Colossal-AI team,
I have a few features in mind that I think would be helpful to the project, and I wanted to ask which of them would be most useful so that I can start implementing them.
Loki-Promtail is a tool for monitoring distributed logs with Grafana. Connecting the Distributed Logger to it and extracting labels from the log structure would be a user-friendly sys

good first issue
borzunov commented Sep 21, 2021

Simple mistakes trigger unclear error messages in the ALBERT example, namely:

  • Missing unpacked data for the trainer (currently triggers requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/api/models/data/tokenizer)
  • Running all peers in --client_mode (currently triggers AllReduce failed: could not find a group)

It would be great to

good first issue help wanted
adaptdl
aurickq commented Sep 6, 2020

torchtext (as of 0.4.0) adopts torch.utils.data.DataLoader, and the older iterator interface is deprecated. Ensure that AdaptDL's AdaptiveDataLoader supports this new torchtext interface for data loading, and port the example transformer code to the new interface. After that, adaptdl.data.iterator can be deprecated or removed.

enhancement good first issue