Questions tagged [feature-engineering]

the process of using domain knowledge of the data to create features that improve machine learning algorithms

3 votes
4 answers
160 views

Rounding Float Values in ML Models

Let's assume I have a column with float values (e.g., 3.12334354454, 5.75434331354, and so on). If I round these values to two decimal places (e.g., 3.12, 5.75), I think the advantages and ...
Guna's user avatar
  • 390
1 vote
0 answers
20 views

SHAP vs. Manual Analysis: Why Opposite Correlations for a feature?

When plotting a SHAP beeswarm plot on my binary classification model (predicting subscription renewal probability), one of the columns indicates that high feature values correlate with low SHAP values ...
fendrbud's user avatar
0 votes
0 answers
10 views

How to Represent Structured Inputs in a Neural Network for Multi-Entity Prediction?

I'm building a neural network model to predict which student in a class will achieve the highest score on an upcoming exam (this is not the actual task; I modified the task to maintain ...
Saffy's user avatar
  • 11
1 vote
1 answer
48 views

I didn't scale all features I used for prediction, does it make sense?

In my regression-based machine learning project, I have features like coordinates (latitude and longitude) that I prefer not to scale or transform. The main reason is that reversing the transformation ...
ml.freak's user avatar
  • 103
0 votes
1 answer
17 views

Calculating risk or amount of slipperiness based on historical weather data

Given hourly updates of precipitation amount (for the preceding hour) and temperature, how would you calculate whether it's slippery or not?
tsorn's user avatar
  • 173
3 votes
1 answer
258 views

How Should I Handle Ordered Features with a Censored Outcome Variable? [closed]

I have a dataset with many ordered features, most of which have 3 levels (e.g., 0, 1, 2), and my outcome variable is censored. I’m debating whether to treat these ordinal features as numeric or ...
Seydou GORO's user avatar
0 votes
0 answers
9 views

Scaling and Feature Transformations that are Non Symmetrical for Classification

I want to transform some feature values within my model using a cube root transformation, for the purposes of easing some skewness in my data. However, I've noticed that after I cube root certain ...
user54565's user avatar
0 votes
0 answers
8 views

How to Use a tsfresh Feature Calculator with Results from Another Feature Calculator

When using the tsfresh library for feature extraction, is it possible to run a feature calculator that takes the results of another feature calculator as its parameters? For instance, I want to ...
Akrem GOMRI's user avatar
0 votes
1 answer
61 views

Why should I not use Id as a field in feature engineering for ML

While feature engineering and deriving features, why should I not use Id as a field for tasks like regression?
Aryan's user avatar
  • 1
1 vote
0 answers
30 views

I am trying to build a logistic regression model

I have time series data on a family's spending on different products. Each product is assigned to a category (which can be a two-level category path), e.g. (Food > Chicken) or (...
ted's user avatar
  • 111
0 votes
0 answers
9 views

Training upstream model parameters with end of pipeline actuals

Existing Model: I have an existing, pre-trained RandomForest model. For this example, let's assume the model was trained with 3 input values like this synthetic data set: ...
Jed's user avatar
  • 129
0 votes
0 answers
8 views

Importance of resampling when establishing a cutoff for categorical data

I am reading Feature Engineering and Selection by Max Kuhn and Kjell Johnson, and on page 97, section 5.2, it has the following (my question concerns the last sentence): 'Although near-zero variance ...
horned-sphere's user avatar
0 votes
0 answers
53 views

Why can't my neural network model learn abs(x1-x2) function?

I am trying to train a simple neural network model for multiclass classification. I have x1,x2,x3,x4 columns with 4 classes to predict. If I just train on x1,x2,x3,x4 then I get an accuracy of 88%. With ...
Rushabh Kheni's user avatar
0 votes
0 answers
8 views

Creating object profiles based on their attributes

I'm working on a recommendation system to suggest alternative cities based on how similar city A is to the recommendations. To do this, I gathered information about each city's different points of ...
James's user avatar
  • 1
0 votes
0 answers
13 views

Approach to feature engineer mean columns to avoid data leakage?

I understand the intuition behind data leakage, but am not sure of a correct process to avoid it. Suppose there were calculated columns for averages of particular groups, calculated and created ...
ssou's user avatar
  • 13
