# distributed-training

Here are 70 public repositories matching this topic...

jankrynauw commented Jun 6, 2019

We would like to forward a particular 'key' column, which is part of the features, so that it appears alongside the predictions. This would let us identify which set of features a particular prediction belongs to. Here is an example of the prediction output when using tensorflow.contrib.estimator.multi_class_head:

{"classes": ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"],
 "scores": [0.068196
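
To illustrate what the request amounts to, here is a minimal sketch in plain Python rather than TensorFlow. It assumes predictions come back in the same order as the input rows; the function name `attach_keys`, the `key` field name, and the sample data are all hypothetical and not taken from the issue.

```python
from typing import Any, Dict, Iterable, Iterator


def attach_keys(
    feature_rows: Iterable[Dict[str, Any]],
    predictions: Iterable[Dict[str, Any]],
    key_name: str = "key",
) -> Iterator[Dict[str, Any]]:
    """Yield a copy of each prediction with the matching row's key added.

    Assumption: predictions are emitted in the same order as feature_rows.
    """
    for row, pred in zip(feature_rows, predictions):
        merged = dict(pred)  # copy so the original prediction is untouched
        merged[key_name] = row[key_name]
        yield merged


# Hypothetical sample data, for illustration only.
rows = [
    {"key": "a-17", "pixels": [0, 1]},
    {"key": "b-42", "pixels": [1, 0]},
]
preds = [
    {"classes": ["0", "1"], "scores": [0.9, 0.1]},
    {"classes": ["0", "1"], "scores": [0.2, 0.8]},
]
keyed = list(attach_keys(rows, preds))
```

Each entry of `keyed` carries the `key` of the row it was computed from, which is the behaviour the comment asks for. Doing this inside the model's own prediction output, rather than in a post-processing step like this one, avoids relying on the ordering assumption.
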

borzunov commented Sep 21, 2021

Simple mistakes trigger unclear error messages in the ALBERT example, namely:

  • Absence of the unpacked data for trainer (currently triggers requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/api/models/data/tokenizer)
  • Running all peers in --client_mode (currently triggers AllReduce failed: could not find a group)
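
One way to surface clearer messages for the two cases above is a pair of pre-flight checks that run before training starts. This is a sketch under stated assumptions, not the example's actual code: the function names are hypothetical, the 404 URL above suggests the missing local path was looked up as a remote model name, and the requirement that at least one peer runs outside client mode is assumed from the second error.

```python
from pathlib import Path
from typing import Sequence


def check_tokenizer_dir(path: str) -> Path:
    """Fail early with a readable message if the local data is missing."""
    p = Path(path)
    if not p.is_dir():
        raise FileNotFoundError(
            f"Tokenizer directory '{path}' not found. Unpack the training "
            "data first; a missing local path may otherwise be looked up "
            "as a remote model name."
        )
    return p


def check_peer_modes(client_mode_flags: Sequence[bool]) -> None:
    """Reject a configuration in which every known peer is in client mode.

    Assumption: averaging needs at least one peer without --client_mode.
    """
    if client_mode_flags and all(client_mode_flags):
        raise ValueError(
            "All peers are running with --client_mode; start at least one "
            "peer without it so that a group can be formed."
        )


# A mixed configuration passes silently.
check_peer_modes([True, False])
```

In practice a single peer only knows its own flag, so the second check would have to be driven by whatever peer information is available at runtime; the list argument here is a simplification.
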

It would be great to
