
preprocessing

Here are 863 public repositories matching this topic...

igel
nidhaloff commented May 27, 2021

Hello everyone,

First of all, I want to take a moment to thank all contributors and everyone who has supported this project in any way ;) You are awesome!

If you like the project and are interested in contributing to or maintaining it, you can contact me here or send me a private message:

PS: You need to be familiar with Python and machine learning.

help wanted, good first issue, contribution, feature
EvenOldridge commented Jun 8, 2021

The current version of bucketize uses fixed boundaries. If users don't know these boundaries, they have to compute them themselves using cuDF.

We should support splitting continuous variables into buckets based on quantile and uniform splits of the data.

For uniform splits, the statistics-gathering phase needs to compute the min and max of the column and derive the boundaries for N equal-width buckets.
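Both data-driven strategies can be sketched in plain Python. This is an illustration only, not the project's bucketize op or any cuDF API; the function names `uniform_boundaries`, `quantile_boundaries`, and `bucketize` are hypothetical. It assumes boundaries are the N-1 interior cut points and that a value equal to a boundary falls into the higher bucket. In the real op the min/max and quantile statistics would be computed on the GPU with cuDF rather than with the standard library.

```python
import bisect
import statistics
from typing import List, Sequence


def uniform_boundaries(values: Sequence[float], num_buckets: int) -> List[float]:
    """Interior cut points that split [min, max] into equal-width buckets."""
    lo, hi = min(values), max(values)  # statistics-gathering pass
    width = (hi - lo) / num_buckets
    return [lo + i * width for i in range(1, num_buckets)]


def quantile_boundaries(values: Sequence[float], num_buckets: int) -> List[float]:
    """Interior cut points so each bucket holds roughly the same number of rows."""
    return statistics.quantiles(values, n=num_buckets, method="inclusive")


def bucketize(values: Sequence[float], boundaries: List[float]) -> List[int]:
    """Map each value to a bucket id in [0, len(boundaries)] (hypothetical helper).

    bisect_right puts a value equal to a boundary into the higher bucket.
    """
    return [bisect.bisect_right(boundaries, v) for v in values]


if __name__ == "__main__":
    skewed = [1, 2, 3, 4, 100]
    # Uniform: one outlier stretches the range, so most rows land in bucket 0.
    print(bucketize(skewed, uniform_boundaries(skewed, 2)))   # [0, 0, 0, 0, 1]
    # Quantile: the cut point follows the data, so rows are spread more evenly.
    print(bucketize(skewed, quantile_boundaries(skewed, 2)))  # [0, 0, 1, 1, 1]
```

The example shows why both modes are worth supporting: uniform splits preserve the scale of the data, while quantile splits keep bucket sizes balanced when the column is skewed.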

enhancement, good first issue
OswinC commented Feb 11, 2022
BaseColumn::genericUnaryUDF
BaseColumn::genericBinaryUDF
BaseColumn::genericTrinaryUDF

https://github.com/facebookresearch/torcharrow/blob/main/csrc/velox/column.h#L364-L377

This is in the Eager Mode/Velox Backend.

The generic UDF call methods should be general enough not to be bound to any column. For example, when there are no arguments or all arguments are scalars, conceptual…
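The methods referenced here are C++ methods in the linked column.h; the following is only a Python illustration of the idea, not torcharrow's API. `call_generic_udf`, the list-as-column stand-in, and the `size` parameter are all hypothetical. It sketches one way a UDF call can live outside any column class: scalar arguments are broadcast to every row, and when no argument is a column (zero arguments, or scalars only) the caller supplies the row count explicitly.

```python
from typing import Any, Callable, List, Optional

# Hypothetical stand-in: a "column" is a plain Python list; anything else is a scalar.
Column = List[Any]


def call_generic_udf(fn: Callable[..., Any], *args: Any, size: Optional[int] = None) -> Column:
    """Apply fn row by row to a mix of column and scalar arguments.

    A free function rather than a method on a column class, so it also works
    when there are no arguments or when every argument is a scalar.
    """
    lengths = {len(a) for a in args if isinstance(a, list)}
    if len(lengths) > 1:
        raise ValueError("all column arguments must have the same length")
    if lengths:
        n = lengths.pop()  # row count comes from the column arguments
    elif size is not None:
        n = size  # no column to take the length from: caller says how many rows
    else:
        raise ValueError("size is required when no argument is a column")
    # Broadcast scalars: every row receives the same scalar value.
    return [fn(*(a[i] if isinstance(a, list) else a for a in args)) for i in range(n)]


if __name__ == "__main__":
    print(call_generic_udf(lambda a, b: a + b, [1, 2, 3], 10))  # column + scalar
    print(call_generic_udf(lambda: 7, size=2))                  # no arguments at all
```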

good first issue
cfezequiel commented Oct 14, 2020

Is your feature request related to a problem? Please describe.
Change split values from all caps to lowercase.
This would make file and directory names more consistent with the split values.

Describe the solution you'd like
TRAIN -> train
VALIDATION -> validation
TEST -> test
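A minimal sketch of the proposed mapping, assuming split values are modeled as a string enum; the `Split` class and its `parse` method are hypothetical, since the issue does not show how the project stores split values. Accepting the old all-caps spellings in `parse` is also an assumption, added so existing inputs keep working after the change.

```python
import enum


class Split(str, enum.Enum):
    """Hypothetical split enum using the proposed lowercase values."""

    TRAIN = "train"
    VALIDATION = "validation"
    TEST = "test"

    @classmethod
    def parse(cls, value: str) -> "Split":
        # Accept legacy all-caps values ("TRAIN") as well as the new lowercase ones.
        return cls(value.strip().lower())


if __name__ == "__main__":
    split = Split.parse("VALIDATION")
    # The lowercase value can be used directly as a file or directory name.
    print(f"output/{split.value}")
```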

Describe alternatives you've considered

  1. No change
  • There's a bit of skew when it comes to mapping split val…
enhancement, good first issue
