Description
Is your feature request related to a problem?
I recently posted a question on Stack Overflow asking for help with pandas and down-sampling. I had some forward-binned data: particle counts logged at various diameters:
296.54,303.14,311.88,320.87,330.12,339.63,349.42,359.49,369.86,380.52,391.49,402.77 nm
The counts in the first bin were logged between 296.54 and 303.14 nm, and so on.
I wanted to re-bin those counts to a new set of diameters:
300,325,350,375,400 nm
where the counts would be logged between 300 and 325 nm and so on.
I had to, column by column, brute-force add and interpolate matching diameters. For instance, the 300-325 nm bin contains part of the 296.54-303.14 nm bin, the bins up to 320.87 nm in full, and part of the 320.87-330.12 nm bin, and so on.
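To make the arithmetic concrete, here is how the edge fractions for the 300-325 nm bin work out (my own worked example, using the diameters above):

```python
import numpy as np

# high-res bin edges around the new 300-325 nm bin
edges = np.array([296.54, 303.14, 311.88, 320.87, 330.12])

# fraction of the 296.54-303.14 nm bin that falls inside [300, 325]
first_fraction = (edges[1] - 300.0) / (edges[1] - edges[0])   # ~0.476

# the 303.14-311.88 and 311.88-320.87 nm bins lie entirely inside the new bin

# fraction of the 320.87-330.12 nm bin that falls inside [300, 325]
last_fraction = (325.0 - edges[3]) / (edges[4] - edges[3])    # ~0.446
```

The new bin's count is then the full middle bins plus these fractions of the two partially overlapped edge bins.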
Describe the solution you'd like
I would like a way to interpolate a pandas DataFrame not to a higher resolution but to a lower one, defined by a second set of values, either along the columns (my specific case) or along the index. Currently the interpolate function only fills missing data; to my knowledge it cannot rebuild a DataFrame at an entirely new resolution, one that may not, in fact, be uniformly spaced.
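One way such a feature could be implemented (a sketch only, not an existing pandas API; `rebin_columns` is my own hypothetical name) is to interpolate the cumulative counts at the new edges and then difference, which is equivalent to summing fractional bin overlaps and conserves totals:

```python
import numpy as np
import pandas as pd

def rebin_columns(df, new_edges):
    # Hypothetical sketch of the requested behaviour: conservatively rebin
    # forward-binned counts whose column labels are the left bin edges.
    old_edges = df.columns.to_numpy(dtype=float)
    values = df.to_numpy(dtype=float)
    # cumulative counts at each old edge; the last column has no right edge
    # inside old_edges, so it is excluded from the running sum
    cum = np.hstack([np.zeros((len(df), 1)),
                     np.cumsum(values[:, :-1], axis=1)])
    # linearly interpolate the cumulative curve at the new edges, then
    # difference to recover per-bin counts -- this conserves total counts
    new_cum = np.vstack([np.interp(new_edges, old_edges, row) for row in cum])
    return pd.DataFrame(np.diff(new_cum, axis=1),
                        index=df.index, columns=new_edges[:-1])
```

Because `np.interp` handles arbitrary monotonic grids, the new edges need not be uniformly spaced.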
API breaking implications
None anticipated: this would be a new method (or a new mode of an existing one) rather than a change to existing behaviour.
Describe alternatives you've considered
The challenge is that the size spectrum cannot be linearly interpolated between the smallest and largest diameters, so a simple sum, re-index, and interpolate would not conserve the spectral data.
Nor does the resample function do the trick, since it applies only to datetime-like indices.
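As a small illustration of why the naive approach fails (my own example): a union-reindex followed by interpolate only estimates point values at the new diameters, it does not redistribute counts, so the total changes:

```python
import pandas as pd

counts = pd.Series([5.0, 7.0, 3.0], index=[296.54, 303.14, 311.88])
# union-reindex and interpolate: fills point estimates at the new
# diameters instead of redistributing the binned counts
naive = (counts.reindex(counts.index.union([300.0, 310.0]))
               .interpolate(method="index"))
# the "total counts" have changed, so the spectrum is not conserved
print(naive.sum(), counts.sum())
```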
Additional context
This particular example is for forward-binned data. Some thought would have to be given to handling backward-binned or center-binned data.
My code example
I apologize for the crudity, but I am not a Pandas developer. Here is what my implementation ended up being:
import string

import numpy as np
import pandas as pd

# set up a 10x12 test array
test_array = np.arange(120).reshape(10, 12)
# set a simple index
index_list = list(string.ascii_lowercase)[:10]
# initialize the high-resolution dataframe
df = pd.DataFrame(test_array, index=index_list)
# set the high-resolution column headers
df.columns = [296.54, 303.14, 311.88, 320.87, 330.12, 339.63,
              349.42, 359.49, 369.86, 380.52, 391.49, 402.77]
# set a column diameter array to searchsort through
df_column_array = np.array(df.columns)
# set the new low-resolution diameters
new_columns = np.array([300, 325, 350, 375, 400])
# initialize a low-res dataframe
new_df = pd.DataFrame(columns=new_columns, index=index_list)
# loop through the low-res columns up to the second last
# (since these bins are forward counted)
for i, column in enumerate(new_columns[:-1]):
    # locate the closest high-res bin edge before the current low-res bin
    df_columns_backward_index = np.searchsorted(df.columns, column) - 1
    # locate the closest high-res bin edge after the current low-res bin
    df_columns_forward_index = np.searchsorted(df.columns, new_columns[i + 1])
    # calculate the first high-res bin size
    backward_bin_size = (df.columns[df_columns_backward_index + 1]
                         - df.columns[df_columns_backward_index])
    # calculate the offset and fraction for interpolation
    backward_offset = column - df.columns[df_columns_backward_index]
    backward_fraction = 1 - backward_offset / backward_bin_size
    # repeat for the last high-res bin closest to the current low-res bin
    forward_bin_size = (df.columns[df_columns_forward_index]
                        - df.columns[df_columns_forward_index - 1])
    forward_offset = df.columns[df_columns_forward_index] - new_columns[i + 1]
    forward_fraction = 1 - forward_offset / forward_bin_size
    # sum the fractions of the high-res bins partially in the low-res bin
    # along with the high-res bins fully within the low-res bin
    new_df[column] = (
        df.iloc[:, df_columns_backward_index] * backward_fraction
        + df.iloc[:, df_columns_backward_index + 1:
                     df_columns_forward_index - 1].sum(axis=1)
        + df.iloc[:, df_columns_forward_index - 1] * forward_fraction
    )
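As a sanity check (my own addition, not part of the original script), the first low-res bin for row "a" can be recomputed by hand from the edge fractions; the result should match `new_df.loc["a", 300]` produced by the loop above:

```python
import numpy as np

# high-res edges around the 300-325 nm bin, and row "a"'s first four counts
edges = np.array([296.54, 303.14, 311.88, 320.87, 330.12])
row = np.arange(4, dtype=float)                     # counts 0, 1, 2, 3
first = (edges[1] - 300.0) / (edges[1] - edges[0])  # partial leading bin
last = (325.0 - edges[3]) / (edges[4] - edges[3])   # partial trailing bin
expected = row[0] * first + row[1] + row[2] + row[3] * last
print(expected)  # should match new_df.loc["a", 300] from the script above
```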