# distributed-training

Here are 44 public repositories matching this topic...

jankrynauw commented Jun 6, 2019

We would like to forward a particular 'key' column, which is part of the features, so that it appears alongside the predictions. This makes it possible to identify which set of features a particular prediction belongs to. Here is an example of prediction output using tensorflow.contrib.estimator.multi_class_head:

{"classes": ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"],
 "scores": [0.068196
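In TF 1.x, `tf.contrib.estimator.forward_features` was intended for exactly this; outside the Estimator API, the idea reduces to zipping the key column back onto each prediction dict. A minimal framework-free sketch of that pairing (all names and values below are illustrative, not from the issue, and it assumes keys and predictions arrive in the same, unshuffled order):

```python
def attach_keys(keys, predictions, key_name="key"):
    """Pair each prediction dict with the key of the example it came from.

    Assumes `keys` and `predictions` are in the same order, which holds
    when the input pipeline is not shuffled at predict time.
    """
    out = []
    for key, pred in zip(keys, predictions):
        enriched = dict(pred)      # copy so the original prediction is untouched
        enriched[key_name] = key   # forward the identifying column
        out.append(enriched)
    return out

# Hypothetical example: two predictions from a multi-class head.
preds = [
    {"classes": ["0", "1"], "scores": [0.9, 0.1]},
    {"classes": ["0", "1"], "scores": [0.2, 0.8]},
]
result = attach_keys(["row-17", "row-42"], preds)
```

Each element of `result` then carries both the model output and the key identifying its source row.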
aurickq commented Sep 15, 2020

Currently, AdaptDL will change the batch size whenever:

  1. The job is restarted.
  2. A new epoch is started.

This can cause the batch size to fluctuate frequently.

Instead, we should only change the batch size if the new batch size yields a noticeable improvement in the predicted speedup (e.g. by 5% or more). Also consider adding a penalty term when finding the preferred batch size to …
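The proposed rule, switching only when a candidate's predicted speedup beats the current one by a threshold, with a penalty discouraging change, could be sketched as follows. This is a hypothetical illustration, not AdaptDL code: the 5% threshold comes from the comment, while `predicted_speedup`, `change_penalty`, and the candidate set are assumptions.

```python
def choose_batch_size(current_bs, candidates, predicted_speedup,
                      min_gain=0.05, change_penalty=0.02):
    """Keep the current batch size unless a candidate is clearly better.

    predicted_speedup: function mapping a batch size to its predicted speedup.
    min_gain: required relative improvement (5% per the issue) before switching.
    change_penalty: fixed cost subtracted from non-current candidates,
    discouraging fluctuation between near-equivalent batch sizes.
    """
    current = predicted_speedup(current_bs)
    best_bs, best_score = current_bs, current
    for bs in candidates:
        score = predicted_speedup(bs)
        if bs != current_bs:
            score -= change_penalty  # penalty term for changing batch size
        if score > best_score:
            best_bs, best_score = bs, score
    # Only switch if the penalized winner still improves on the current
    # predicted speedup by at least min_gain.
    if best_bs != current_bs and best_score >= current * (1 + min_gain):
        return best_bs
    return current_bs

# Hypothetical speedup predictions for three candidate batch sizes.
speedup = lambda bs: {64: 1.0, 128: 1.03, 256: 1.2}[bs]
```

With these numbers, moving from 64 to 128 is rejected (a 3% gain, minus the penalty, falls below the 5% threshold), while 256 clears it; this keeps the batch size stable across restarts and epoch boundaries unless a change is clearly worthwhile.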
