NullPointerException while compute accuracy with ComputeModelStatistics #736
Comments
@ttpro1995 thank you for the detailed repro. I will take a look at this when I get a chance. Could you please post how you read the "data" variable, so that I'm not missing any part of what reproduces the issue? My first guess is that there might be some missing values in the dataset.
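One quick way to test the missing-values guess is to count empty cells per column before handing the file to Spark. The following is only a sketch using Python's standard `csv` module; the `count_missing` helper, the sample text, and the column names are made up for illustration and are not the actual HIGGS prediction file.

```python
import csv
import io


def count_missing(csv_text):
    """Return {column: number of empty or absent cells} for a CSV with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    missing = {name: 0 for name in fieldnames}
    for row in reader:
        for name in fieldnames:
            value = row.get(name)
            # DictReader yields None for cells absent from a short row.
            if value is None or value.strip() == "":
                missing[name] += 1
    return missing


# Hypothetical sample: the second data row lacks a prediction, the third lacks a label.
sample = "label,prediction\n1,0.9\n0,\n,0.3\n"
print(count_missing(sample))
```

If every count is zero for the real file, missing values are unlikely to be the cause of the exception.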
Code that works in Python with scikit-learn (so the data is not broken).
Another case in Spark Scala, with an error, using different data. Notebook on zepl.com. **Export to Zeppelin (0.8.x):** Data used in the notebook:
@ttpro1995 oh sorry, I just saw the problem now: you are setting the scores column but not the scored labels column, which is required if you are trying to compute all metrics.
I don't know what each function or each set-column method does.
What should setScoredLabelsCol be set to? I suggest the documentation include more detail: the meaning of each parameter and its default value (currently I have to instantiate an object in code and then call the getter to find out the default value).
@ttpro1995 I think the column name convention is similar, though maybe not identical, across most ML platforms, at least in scikit-learn, Apache Spark and ML.NET. The labelCol is the true (original) label from the dataset, the ScoredLabelsCol is the label assigned by the classifier, and the ScoresCol is the raw prediction, for example the distance from the separating hyperplane for a support vector machine classifier. There is usually also a probability column, which ComputeModelStatistics does not use directly, although you could pass it as the input to the ScoresCol too (it is used for the AUC metric).
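To make the three roles concrete, here is a small plain-Python sketch, independent of Spark and of ComputeModelStatistics itself. The toy rows and the helper names `accuracy` and `auc` are assumptions made up for illustration. It shows why accuracy needs the scored labels column while AUC needs the scores column.

```python
# Hypothetical toy rows: (label, scored_label, score).
# label        = true label from the dataset
# scored_label = label assigned by the classifier
# score        = raw prediction (or probability) for the positive class
rows = [
    (1, 1, 0.9),
    (0, 0, 0.2),
    (1, 0, 0.4),
    (0, 1, 0.6),
    (1, 1, 0.8),
]


def accuracy(rows):
    """Fraction of rows where the assigned label equals the true label."""
    correct = sum(1 for label, scored_label, _ in rows if label == scored_label)
    return correct / len(rows)


def auc(rows):
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win. Only the true label and the score are used.
    """
    pos = [score for label, _, score in rows if label == 1]
    neg = [score for label, _, score in rows if label == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


print(accuracy(rows))  # 3 of 5 assigned labels match the true labels
print(auc(rows))       # 5 of 6 positive/negative pairs are ranked correctly
```

Accuracy never looks at the score, and AUC never looks at the assigned label, so a metric set that includes both needs both columns.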
@ttpro1995 "I suggest there should be more detail in documentation. Including: meaning of each parameter, default value (I have to new a object in code, then use get function to know the default value),"
Version
Data (CSV with header): https://gist.github.com/ttpro1995/69051647a256af912803c9a16040f43a
Download the data, save it as a CSV file, and put it into the folder
/data/public/HIGGS/higgs.test.predictioncsv
Schema
Code
Exception