[SPARK-31798][SHUFFLE][API] Shuffle Writer API changes to return custom map output metadata #28616

mccheah · 2020-05-22T20:09:31Z

Introduces MapOutputMetadata, an opaque object that map output writers can return.

Note that this PR only proposes the API changes on the shuffle writer side. Follow-up patches will propose accepting the metadata on the driver and persisting it in the driver's shuffle metadata storage plugin.
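
A minimal sketch of the writer-side flow, for illustration only. The two stand-in types mirror the names in this PR but are reduced to the fields shown in the diff; RemoteLocationMetadata, RemoteMapOutputWriter, recordBytes, and the storage URI are hypothetical, and the real commitAllPartitions() also declares IOException.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.Optional;

// Stand-in for the marker interface added in this PR.
interface MapOutputMetadata extends Serializable {}

// Reduced stand-in for MapOutputCommitMessage: partition lengths plus optional metadata.
final class MapOutputCommitMessage {
  private final long[] partitionLengths;
  private final Optional<MapOutputMetadata> mapOutputMetadata;

  private MapOutputCommitMessage(long[] lengths, Optional<MapOutputMetadata> metadata) {
    this.partitionLengths = lengths;
    this.mapOutputMetadata = metadata;
  }

  static MapOutputCommitMessage of(long[] lengths, MapOutputMetadata metadata) {
    return new MapOutputCommitMessage(lengths, Optional.of(metadata));
  }

  long[] getPartitionLengths() { return partitionLengths; }
  Optional<MapOutputMetadata> getMapOutputMetadata() { return mapOutputMetadata; }
}

// Hypothetical plugin metadata: where a remote store placed this map task's output.
final class RemoteLocationMetadata implements MapOutputMetadata {
  private static final long serialVersionUID = 1L;
  final String storageUri;
  RemoteLocationMetadata(String storageUri) { this.storageUri = storageUri; }
}

// Hypothetical writer. The real ShuffleMapOutputWriter also hands out partition
// writers and supports abort; both are omitted here.
final class RemoteMapOutputWriter {
  private final long[] bytesWritten;
  private final String storageUri;

  RemoteMapOutputWriter(int numPartitions, String storageUri) {
    this.bytesWritten = new long[numPartitions];
    this.storageUri = storageUri;
  }

  // Accumulates the byte count for one partition id.
  void recordBytes(int partitionId, long numBytes) {
    bytesWritten[partitionId] += numBytes;
  }

  // Returns the per-partition lengths together with the plugin-specific metadata.
  MapOutputCommitMessage commitAllPartitions() {
    return MapOutputCommitMessage.of(
        bytesWritten.clone(), new RemoteLocationMetadata(storageUri));
  }
}

final class WriterSketch {
  public static void main(String[] args) {
    RemoteMapOutputWriter writer =
        new RemoteMapOutputWriter(3, "s3://hypothetical-bucket/shuffle/0/map/7");
    writer.recordBytes(0, 128L);
    writer.recordBytes(2, 64L);
    MapOutputCommitMessage message = writer.commitAllPartitions();
    System.out.println(Arrays.toString(message.getPartitionLengths()));
    System.out.println(message.getMapOutputMetadata().isPresent());
  }
}
```

Core Spark only needs the partition lengths from the commit message; the metadata stays opaque to it and is meaningful only to the plugin that produced it.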

Why are the changes needed?

For the complete design discussion, refer to this design document.

Does this PR introduce any user-facing change?

Enables additional APIs for the shuffle storage plugin tree. Usage will become more apparent as the API evolves.

How was this patch tested?

No tests here, since this is only an API-side change that is not consumed by core Spark itself.

SparkQA · 2020-05-22T22:48:52Z

Test build #123020 has finished for PR 28616 at commit 6ecd3ad.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

seanli-rallyhealth · 2020-05-25T15:30:23Z

core/src/main/java/org/apache/spark/shuffle/api/ShuffleMapOutputWriter.java

@@ -63,7 +64,7 @@
   * The returned array should contain, for each partition from (0) to (numPartitions - 1), the
   * number of bytes written by the partition writer for that partition id.
   */
-  long[] commitAllPartitions() throws IOException;
+  MapOutputCommitMessage commitAllPartitions() throws IOException;


Does the Javadoc comment above need to be rewritten? It still describes the return value as an array.

attilapiros · 2020-05-26T16:19:30Z

core/src/main/java/org/apache/spark/shuffle/api/metadata/MapOutputCommitMessage.java

+  private final long[] partitionLengths;
+  private final Optional<MapOutputMetadata> mapOutputMetadata;
+
+  MapOutputCommitMessage(


Nit: we could make this constructor private if we want to encourage use of the static factory method MapOutputCommitMessage.of.
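
A sketch of what this suggestion could look like, assuming two overloads of the static factory (one with metadata, one without). The class below is a self-contained stand-in, not the code in this PR; the getter names are assumptions.

```java
import java.io.Serializable;
import java.util.Optional;

// Stand-in for the marker interface added in this PR.
interface MapOutputMetadata extends Serializable {}

final class MapOutputCommitMessage {
  private final long[] partitionLengths;
  private final Optional<MapOutputMetadata> mapOutputMetadata;

  // Private: callers cannot hand in a raw Optional and must go through a factory.
  private MapOutputCommitMessage(
      long[] partitionLengths, Optional<MapOutputMetadata> mapOutputMetadata) {
    this.partitionLengths = partitionLengths;
    this.mapOutputMetadata = mapOutputMetadata;
  }

  // For writers that have no custom metadata to report.
  static MapOutputCommitMessage of(long[] partitionLengths) {
    return new MapOutputCommitMessage(partitionLengths, Optional.empty());
  }

  // For writers that attach metadata; Optional.of rejects null with a NullPointerException.
  static MapOutputCommitMessage of(long[] partitionLengths, MapOutputMetadata metadata) {
    return new MapOutputCommitMessage(partitionLengths, Optional.of(metadata));
  }

  long[] getPartitionLengths() { return partitionLengths; }
  Optional<MapOutputMetadata> getMapOutputMetadata() { return mapOutputMetadata; }
}

final class FactorySketch {
  public static void main(String[] args) {
    long[] lengths = {10L, 20L};
    MapOutputCommitMessage plain = MapOutputCommitMessage.of(lengths);
    MapOutputCommitMessage withMetadata =
        MapOutputCommitMessage.of(lengths, new MapOutputMetadata() {});
    System.out.println(plain.getMapOutputMetadata().isPresent());
    System.out.println(withMetadata.getMapOutputMetadata().isPresent());
  }
}
```

With the constructor private, the factories are the only way to build a message, so the Optional wrapping is decided in one place.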

SparkQA · 2020-06-03T23:42:48Z

Test build #123503 has finished for PR 28616 at commit 4fd056d.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

mccheah · 2020-06-09T00:23:46Z

retest this please

mccheah · 2020-06-09T00:24:26Z

@holdenk @squito can you take a look? Starting off the SPARK-25299 features.

SparkQA · 2020-06-09T04:04:09Z

Test build #123661 has finished for PR 28616 at commit 4fd056d.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-06-15T01:12:38Z

Test build #124014 has finished for PR 28616 at commit 4fd056d.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2020-06-20T03:01:05Z

Retest this please.

SparkQA · 2020-06-20T05:34:41Z

Test build #124308 has finished for PR 28616 at commit 4fd056d.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2020-06-21T23:31:12Z

Retest this please

SparkQA · 2020-06-22T02:34:01Z

Test build #124341 has finished for PR 28616 at commit 4fd056d.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun

+1, LGTM. Thank you, @mccheah .
Merged to master for Apache Spark 3.1. (December 2020).

HyukjinKwon · 2020-06-29T06:29:29Z

LGTM too

tgravescs · 2020-09-01T13:47:38Z

core/src/main/java/org/apache/spark/shuffle/api/metadata/MapOutputMetadata.java

+ * All implementations must be serializable since this is sent from the executors to
+ * the driver.
+ */
+public interface MapOutputMetadata extends Serializable {}


Sorry for commenting on a closed PR; I'm looking at this to review the newer PR, https://github.com/apache/spark/pull/28618/files. These should probably be annotated with @Since.

Also, should these be @Evolving or @DeveloperApi rather than @Private? This by itself doesn't do any good, and the intention is for people to be able to implement it, right?

HyukjinKwon · 2020-09-02

I roughly remember asking @squito the same thing before. As I recall, the reason was that the API is not stable yet and is presumably meant to be tested internally before being exposed as a public API, I guess.
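
For illustration, a sketch of how the interface might look with a stability annotation and a version tag. The @Evolving annotation here is a local stand-in defined only so the sketch compiles on its own, and the 3.1.0 in the Javadoc @since tag is an assumption based on the merge note above; neither is taken from the merged code.

```java
import java.io.Serializable;
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Local stand-in for a stability annotation such as Spark's @Evolving.
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Evolving {}

/**
 * Opaque, plugin-defined metadata about a map task's output.
 * All implementations must be serializable since this is sent from the executors to
 * the driver.
 *
 * @since 3.1.0
 */
@Evolving
interface MapOutputMetadata extends Serializable {}

final class AnnotationSketch {
  public static void main(String[] args) {
    // Runtime retention makes the stability marker visible to tooling via reflection.
    System.out.println(MapOutputMetadata.class.isAnnotationPresent(Evolving.class));
  }
}
```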

mccheah added 4 commits March 16, 2020 14:42

Return an optional map output metadata from shuffle map output writers.

98cc9b7

Merge remote-tracking branch 'origin/master' into return-map-output-metadata

8c689ba

Fix check

90084ea

Merge remote-tracking branch 'origin/master' into return-map-output-metadata

6ecd3ad

mccheah changed the title ~~[SPARK-31798] Shuffle Writer API changes to return custom map output metadata~~ [SPARK-31798][SHUFFLE][API] Shuffle Writer API changes to return custom map output metadata May 22, 2020

mccheah mentioned this pull request May 23, 2020

[SPARK-31801][API][SHUFFLE] Register map output metadata #28618

Closed

seanli-rallyhealth reviewed May 25, 2020

attilapiros reviewed May 26, 2020

Address comments

4fd056d

probot-autolabeler bot added the CORE label Jun 3, 2020

dongjoon-hyun approved these changes Jun 22, 2020

dongjoon-hyun closed this in aa4c100 Jun 22, 2020

tgravescs reviewed Sep 1, 2020

attilapiros mentioned this pull request Dec 14, 2020

[SPARK-31801][API][SHUFFLE] Register map output metadata #30763

Closed
