Fix OpWorkflowModelLocalTest due to flaky XGBoost training #494

Merged
merged 6 commits into from Jul 23, 2020

TuanNguyen27 (Collaborator) commented on Jul 20, 2020

Source of flakiness: by default, BinaryClassificationModelSelector.withTrainValidationSplit can produce a training set that contains only positive or only negative labels, in which case XGBoost training fails.
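The failure condition amounts to a precondition on the training labels: a binary classifier evaluated with AUC-PR needs at least one sample of each class. The Scala sketch below only illustrates that precondition. `LabelCheck` and `hasBothClasses` are hypothetical names, not TransmogrifAI or XGBoost APIs, and labels are assumed to be encoded as 0.0 (negative) and 1.0 (positive).

```scala
// Hypothetical helper, not part of TransmogrifAI or XGBoost.
// Checks whether a set of binary labels contains both classes,
// which XGBoost's AUC-PR metric requires.
object LabelCheck {
  // Assumes labels are encoded as 0.0 (negative) and 1.0 (positive).
  def hasBothClasses(labels: Seq[Double]): Boolean =
    labels.contains(0.0) && labels.contains(1.0)

  def main(args: Array[String]): Unit = {
    val mixed = Seq(0.0, 1.0, 1.0, 0.0)
    val onlyPositive = Seq(1.0, 1.0, 1.0)
    println(hasBothClasses(mixed))        // true
    println(hasBothClasses(onlyPositive)) // false: the case that makes training fail
  }
}
```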

We address this flakiness by fixing the seed of the DataSplitter used by withTrainValidationSplit, so the test gets the same data split every time it runs.

ml.dmlc.xgboost4j.java.XGBoostError: [16:55:13] /xgboost/src/metric/rank_metric.cc:515: Check failed: !auc_error: AUC-PR: the dataset only contains pos or neg samples
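Why a fixed seed removes the flakiness can be shown with a plain-Scala sketch. This is not the TransmogrifAI DataSplitter; `SeededSplit` and `split` are hypothetical names, and the per-row random assignment is an assumption made for illustration. The point is that a pseudo-random generator seeded with the same value produces the same draws, and therefore the same partition, on every run.

```scala
import scala.util.Random

// Hypothetical sketch, not the TransmogrifAI DataSplitter.
// Assigns each row to train or validation using a Random seeded with `seed`,
// so the same seed always yields the same partition.
object SeededSplit {
  def split[A](rows: Seq[A], trainFraction: Double, seed: Long): (Seq[A], Seq[A]) = {
    val rng = new Random(seed)
    // One uniform draw per row, in order; the draw sequence depends only on the seed.
    val tagged = rows.map(r => (r, rng.nextDouble() < trainFraction))
    val (train, validation) = tagged.partition(_._2)
    (train.map(_._1), validation.map(_._1))
  }

  def main(args: Array[String]): Unit = {
    val labels = Seq(1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0)
    val first = split(labels, trainFraction = 0.75, seed = 42L)
    val second = split(labels, trainFraction = 0.75, seed = 42L)
    println(first == second) // true: identical split on every run
  }
}
```

A fixed seed makes the split reproducible, not guaranteed to be balanced: the test presumably passes reliably because the split produced by the chosen seed happens to contain both classes in the training set.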
codecov bot commented on Jul 20, 2020

Codecov Report

Merging #494 into master will decrease coverage by 3.80%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master     #494      +/-   ##
==========================================
- Coverage   82.63%   78.83%   -3.81%     
==========================================
  Files         345      345              
  Lines       11702    11702              
  Branches      388      388              
==========================================
- Hits         9670     9225     -445     
- Misses       2032     2477     +445     
Impacted Files Coverage Δ
...scala/com/salesforce/op/utils/text/TextUtils.scala 0.00% <0.00%> (-100.00%) ⬇️
.../scala/com/salesforce/op/test/FeatureAsserts.scala 0.00% <0.00%> (-100.00%) ⬇️
...ala/com/salesforce/op/readers/CSVAutoReaders.scala 0.00% <0.00%> (-100.00%) ⬇️
...la/com/salesforce/op/test/TestFeatureBuilder.scala 0.00% <0.00%> (-100.00%) ⬇️
...om/salesforce/op/stages/impl/feature/OpNGram.scala 0.00% <0.00%> (-100.00%) ⬇️
...alesforce/op/stages/impl/feature/OpHashingTF.scala 0.00% <0.00%> (-100.00%) ⬇️
...lesforce/op/stages/impl/feature/LangDetector.scala 0.00% <0.00%> (-100.00%) ⬇️
...sforce/op/aggregators/CustomMonoidAggregator.scala 0.00% <0.00%> (-100.00%) ⬇️
...sforce/op/stages/base/binary/BinaryEstimator.scala 0.00% <0.00%> (-100.00%) ⬇️
...e/op/stages/impl/feature/TextMapLenEstimator.scala 0.00% <0.00%> (-100.00%) ⬇️
... and 111 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f764842...2313d09.

nicodv (Collaborator) approved these changes on Jul 20, 2020

LGTM

gerashegalov (Collaborator) commented on Jul 20, 2020

Add a description of the flakiness and how your fix addresses it.

Jauntbox (Collaborator) left a comment

LGTM

TuanNguyen27 merged commit 7759915 into master on Jul 23, 2020
5 of 6 checks passed
codecov/project: 78.83% (-3.81%) compared to f764842
CodeFactor: No issues found.
Travis CI - Pull Request: Build Passed
build: Workflow: build
codecov/patch: Coverage not affected when comparing f764842...2313d09
salesforce-cla: All contributors have signed the CLA
TuanNguyen27 deleted the fixFlakyXGB branch on Jul 23, 2020
4 participants