feature-engineering

With Featuretools 1.0.0 we add a dataframe to an EntitySet with the following:

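import featuretools as ft

# Create an empty EntitySet, then register orders_df (an existing DataFrame).
es = ft.EntitySet('new_es')

es.add_dataframe(dataframe=orders_df,
                 dataframe_name='orders',
                 index='order_id',
                 time_index='order_date')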

Improvement

However, you could also change the EntitySet setter to add it with this approach:

es = ft.Ent

When on demand feature views are requested at retrieval time (e.g. via get_X_features), the output feature vectors also include request data and dependent feature values, even when the user did not request those features.

Expected Behavior

Non-specified dependent feature values are not returned in output

Current Behavior

Non-specified dependent feature values are in output
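
The expected behavior can be illustrated with a minimal sketch that post-filters retrieved rows down to what the caller asked for. This is not the feature store's actual implementation: `filter_requested_features`, `keep_keys`, and the sample column names are hypothetical, chosen only to show the idea.

```python
from typing import Any, Dict, Iterable, List


def filter_requested_features(
    rows: List[Dict[str, Any]],
    requested: Iterable[str],
    keep_keys: Iterable[str] = (),
) -> List[Dict[str, Any]]:
    """Drop every column the caller did not ask for.

    `rows` stands in for retrieved feature vectors, `requested` for the
    feature names the user specified, and `keep_keys` for entity keys that
    should always survive. All names here are hypothetical.
    """
    wanted = set(requested) | set(keep_keys)
    return [{k: v for k, v in row.items() if k in wanted} for row in rows]


# A retrieval result that leaked a request-data field ("val_to_add") and a
# dependent feature ("conv_rate") alongside the one requested feature.
raw_rows = [
    {"driver_id": 1001, "conv_rate_plus_val": 1.5, "val_to_add": 1, "conv_rate": 0.5},
]

filtered_rows = filter_requested_features(
    raw_rows, requested=["conv_rate_plus_val"], keep_keys=["driver_id"]
)
print(filtered_rows)
```

Filtering at the end keeps the dependent values available while the on demand transformation runs, and removes them only from what is handed back to the user.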

Steps to reprodu

Currently we use the default Spark catalog to load tables from the Hive metastore.

We should also test loading through a non-default Spark catalog and make sure all user tables can be loaded for an OpenMLDB session.

Problem
Some of our transformers & estimators are not thoroughly tested or not tested at all.

Solution
Use OpTransformerSpec and OpEstimatorSpec base test specs to provide tests for all existing transformers & estimators.

There are several evaluation metrics that would be particularly beneficial for (binary) imbalanced classification problems and would be greatly appreciated additions. I will rank-order these by implementation priority (and likely ease of implementation):

AUCPR - helpful in the event that class labels are needed and the positive class is of greater importance.
**F2 Scor
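
For reference, the two metrics named above can be sketched in plain Python. This is an illustrative sketch, not any library's implementation: `fbeta_score` and `average_precision` are hypothetical helper names, AUCPR is approximated here by average precision (a common step-wise estimate of the area under the precision-recall curve), and tied scores get no special handling.

```python
from typing import Sequence


def fbeta_score(y_true: Sequence[int], y_pred: Sequence[int], beta: float = 2.0) -> float:
    """F-beta for binary labels; beta=2 weights recall more heavily than precision."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def average_precision(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Mean of the precision values measured at the rank of each positive."""
    n_pos = sum(y_true)
    if n_pos == 0:
        return 0.0
    # Rank examples from highest to lowest predicted score.
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


# Small imbalanced example: 3 positives out of 8.
y_true = [1, 0, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1, 0.05, 0.01]

f2 = fbeta_score(y_true, y_pred, beta=2.0)
ap = average_precision(y_true, y_score)
print(f2, ap)
```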

At the moment, the categorical tree encoder and the tree discretiser have an argument is_regression that the user must set to indicate whether the task is classification or regression.

Sklearn detects this automatically with is_classification (see the decision tree source code).

Can we bring this functionality to feature-engine?

I think we can :p
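
One way this could work is to infer the task from the target values instead of asking the user. The sketch below is a plain-Python heuristic, not sklearn's or feature-engine's logic: `looks_like_classification` and the `max_classes` threshold are assumptions made for illustration. In practice, sklearn's `sklearn.utils.multiclass.type_of_target` is a more robust starting point.

```python
from numbers import Real
from typing import Iterable


def looks_like_classification(y: Iterable, max_classes: int = 20) -> bool:
    """Guess whether a target describes classes rather than a continuous value.

    Heuristic (an assumption, not a library rule):
    - any non-numeric or boolean label means classification;
    - any value with a fractional part (or NaN/inf) means regression;
    - otherwise, few distinct integer-valued labels mean classification.
    """
    distinct = set(y)
    if any(isinstance(v, bool) or not isinstance(v, Real) for v in distinct):
        return True
    if any(not float(v).is_integer() for v in distinct):
        return False
    return len(distinct) <= max_classes


print(looks_like_classification([0, 1, 1, 0]))
print(looks_like_classification([0.5, 1.2, 3.3]))
```

A transformer could call such a check on the target inside fit and fall back to an explicit argument only when the guess is ambiguous.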

Just reviewing the docs and found this under the AutoML User Guide:

We should figure out a way to deal with this kind of thing. I think a couple of options here are:

Modifying the cell to only show the first few keys or so of the output.
Modifying the output cell so that

Here are 1,225 public repositories matching this topic...

microsoft / nni

EpistasisLab / tpot

alteryx / featuretools

alibaba / Alink

feast-dev / feast

4paradigm / OpenMLDB

apachecn / fe4ml-zh

salesforce / TransmogrifAI

mljar / mljar-supervised

ClimbsRocks / auto_ml

DeepWisdom / AutoDL

rorysroes / SGX-Full-OrderBook-Tick-Data-Trading-Strategy

HouJP / kaggle-quora-question-pairs

abhayspawar / featexp

feature-engine / feature_engine

jeongyoonlee / Kaggler

HunterMcGushion / hyperparameter_hunter

Yimeng-Zhang / feature-engineering-and-feature-selection

sberbank-ai-lab / LightAutoML

duxuhao / Feature-Selection

aikho / awesome-feature-engineering

LastAncientOne / Deep-Learning-Machine-Learning-Stock

alteryx / evalml

alteryx / open_source_demos

minerva-ml / open-solution-home-credit

firmai / deltapy

SimonBlanke / Hyperactive

fraunhoferportugal / tsfel

yzkang / My-Data-Competition-Experience

ashishpatel26 / Amazing-Feature-Engineering
