Questions tagged [machine-learning]
Machine Learning is a subfield of computer science that draws on elements from algorithmic analysis, computational statistics, mathematics, optimization, etc. It is mainly concerned with the use of data to construct models that have high predictive/forecasting ability. Topics include model building, applications, theory, etc.
11,395 questions
3
votes
4
answers
160
views
Rounding Float Values in ML Models
Let's assume I have a column with float values (e.g., 3.12334354454, 5.75434331354, and so on). If I round these values to two decimal places (e.g., 3.12, 5.75),
I think the advantages and ...
1
vote
0
answers
21
views
Non-smooth ROC Curve
I have a question regarding my ROC curve. It is a health science-related project, and I am trying to predict if the hospital report matches the company. The dependent variable is binary (0 and 1). The ...
2
votes
1
answer
173
views
Suggestion for data analysis with meteorological data
I'm not sure whether this is the right place to ask for advice about my issue; if not, sorry, you can close this post.
I have a project at university where I have to analyse a dataset with meteorological ...
2
votes
0
answers
26
views
Suppose one category in a variable creates data leakage; can we use the other categories in the same variable as dummies to predict?
We are predicting conversion. Conversion means a customer converted from paying one-off to paying regularly (subscribing).
If one feature is a categorical feature "Activity", consisting of 15+ ...
1
vote
1
answer
108
views
SHAP summary plot - will the effect of each feature always be the same in the 2 classes?
I'm looking at the explanation of SHAP from DataCamp, and when looking at the summary plot it looks:
We have 2 classes, and it seems that the effect of a feature for one class has the same value (...
2
votes
0
answers
37
views
How does fine-tuning actually work?
So I'm currently fine-tuning a pretrained model with 35k images across 5 classes. Very high class imbalance, with one class being 73% of the distribution.
Handled this by using a weighted loss ...
5
votes
1
answer
82
views
How to correctly perform link prediction inference on a new, unseen graph?
I'm working on an industrial AI use case where I train a Graph Neural Network (GCN) for link prediction — specifically, to predict successor tasks in project planning graphs (e.g., for construction or ...
5
votes
0
answers
56
views
What are some popular but outdated or ineffective practices in data science?
I was taught stepwise feature selection (like forward and backward selection) during college, and at the time, it seemed like a really effective way to pick features. But recently I have been reading ...
2
votes
1
answer
39
views
Why are the mean SHAP values of one class 2x those of the other class?
I'm looking at the explanation of SHAP from DataCamp.
I looked at the summary plot:
So when looking at:
We have 2 classes, so I assume that the effect of a ...
0
votes
0
answers
14
views
Can I use a machine's daily historical error occurrence counts to predict when the errors will cross a certain threshold?
I have been working on a project for predictive maintenance and have been studying research papers on it. According to my observation, predictive maintenance is mainly done using sensor data tracking ...
1
vote
0
answers
13
views
Anomaly detection in time series for drops
I am looking into different statistical methods for determining a decrease in a numeric "count" feature across a time-series dataset. The dataset is relatively small (about 50 records), and ...
0
votes
0
answers
12
views
Isolation Forest sample size
I am using sklearn's Isolation Forest as a model to detect anomalies. My dataset is relatively small, 50 records with only 2-3 features.
To prevent any overfitting, what would you recommend to tune ...
1
vote
0
answers
27
views
What's wrong with my ML implementation? (from a technical report)
I came across a (short and curt) technical report that claims to be SOTA on keyword spotting, but it didn't share its code and had a very short explanation of its network. I implemented the model, but ...
4
votes
1
answer
59
views
Unsupervised Isolation Forest sklearn hyperparameters
I am using sklearn's IsolationForest for an unsupervised anomaly detection task. According to the docs, https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html, there are ...
3
votes
0
answers
58
views
How can I link tasks using machine learning / AI based on historical task sequences?
I'm working on an AI model to predict dependency links between tasks for industrial planning, based on historical project data. I have two tables:
Task Table (15 sheets, one sheet = one ...