Questions tagged [feature-engineering]
the process of using domain knowledge of the data to create features that improve machine learning algorithms
647 questions
3
votes
4
answers
160
views
Rounding Float Values in ML Models
Let's assume I have a column with float values (e.g., 3.12334354454, 5.75434331354, and so on). If I round these values to two decimal places (e.g., 3.12, 5.75),
I think the advantages and ...
1
vote
0
answers
20
views
SHAP vs. Manual Analysis: Why Opposite Correlations for a feature?
When plotting a SHAP beeswarm plot on my binary classification model (predicting subscription renewal probability), one of the columns indicates that high feature values correlate with low SHAP values ...
0
votes
0
answers
10
views
How to Represent Structured Inputs in a Neural Network for Multi-Entity Prediction?
I'm building a neural network model to predict which student in a class will achieve the highest score on an upcoming exam (this is not the actual task; I modified the task to maintain ...
1
vote
1
answer
48
views
I didn't scale all the features I used for prediction. Does that make sense?
In my regression-based machine learning project, I have features like coordinates (latitude and longitude) that I prefer not to scale or transform. The main reason is that reversing the transformation ...
0
votes
1
answer
17
views
Calculating risk or amount of slipperiness based on historical weather data
Given hourly updates of precipitation amount (for the preceding hour) and temperature, how would you calculate whether it is slippery?
3
votes
1
answer
258
views
How Should I Handle Ordered Features with a Censored Outcome Variable? [closed]
I have a dataset with many ordered features, most of which have 3 levels (e.g., 0, 1, 2), and my outcome variable is censored. I’m debating whether to treat these ordinal features as numeric or ...
0
votes
0
answers
9
views
Scaling and Feature Transformations that are Non Symmetrical for Classification
I want to transform some feature values within my model using a cube root transformation, for the purposes of easing some skewness in my data. However, I've noticed that after I cube root certain ...
0
votes
0
answers
8
views
How to Use a tsfresh Feature Calculator with Results from Another Feature Calculator
When using the tsfresh library for feature extraction, is it possible to run a feature calculator that takes the results of another feature calculator as its parameters?
For instance, I want to ...
0
votes
1
answer
61
views
Why should I not use an ID as a field in feature engineering for ML?
When engineering and deriving features, why should I not use an ID as a field for tasks like regression?
1
vote
0
answers
30
views
I am trying to build a logistic regression model
I have time series data on a family's spending on different products. Each product is assigned to a category (which can be a two-level category path), e.g., (Food > Chicken) or (...
0
votes
0
answers
9
views
Training upstream model parameters with end of pipeline actuals
Existing Model
I have an existing, pre-trained, RandomForest model. For this example, let's assume the model was trained with 3 input values like this synthetic data set:
...
0
votes
0
answers
8
views
Importance of resampling when establishing a cutoff for categorical data
I am reading Feature Engineering and Selection by Max Kuhn and Kjell Johnson, and on page 97, section 5.2, it says the following (my question concerns the last sentence):
'Although near-zero variance ...
0
votes
0
answers
53
views
Why can't my neural network model learn the abs(x1-x2) function?
I am trying to train a simple neural network model for multiclass classification.
I have columns x1, x2, x3, x4 and 4 classes to predict.
If I train on just x1, x2, x3, x4, I get an accuracy of 88%.
With ...
0
votes
0
answers
8
views
Creating object profiles based on their attributes
I'm working on a recommendation system to suggest alternative cities based on how similar city A is to the recommendations. To do this, I gathered information about each city's different points of ...
0
votes
0
answers
13
views
Approach to feature engineer mean columns to avoid data leakage?
I understand the intuition behind data leakage, but am not sure of a correct process to avoid it. Suppose there were calculated columns for averages of particular groups, calculated and created ...