ENH: retain attrs when concat dataframes #41828

Closed
xiki-tempula opened this issue Jun 5, 2021 · 3 comments · Fixed by #42252

Comments

@xiki-tempula (Contributor) commented Jun 5, 2021

Is your feature request related to a problem?

I wish attrs could be retained when concatenating DataFrames.

import pandas as pd
d = {'col1': [1, 2], 'col2': [3, 4]}
df1 = pd.DataFrame(data=d)
df1.attrs = {1:1}
df2 = pd.DataFrame(data=d)
df2.attrs = {1:1}
pd.concat([df1, df2]).attrs
{}

Describe the solution you'd like

d = {'col1': [1, 2], 'col2': [3, 4]}
df1 = pd.DataFrame(data=d)
df1.attrs = {1:1}
df2 = pd.DataFrame(data=d)
df2.attrs = {1:1}
pd.concat([df1, df2]).attrs
{1: 1}

API breaking implications

N/A

Describe alternatives you've considered

N/A

@lithomas1 (Member) commented Jun 6, 2021

Hi @xiki-tempula, thanks for the report.
This is indeed currently not implemented (xref #28283), and I guess the correct way to propagate metadata in this case would be to drop them when they don't match as in your example. PRs to fix this are welcome.
cc @TomAugspurger

@xiki-tempula (Contributor, Author) commented Jun 6, 2021

@lithomas1 Thanks for the comment.

I'm thinking that the logic of concat should be:

import pandas as pd

def concat(objs, *args, **kwargs):
    '''Concatenate pandas objects along a particular axis with optional set
    logic along the other axes. If all pandas objects have the same attrs
    attribute, the new pandas object will have this attrs attribute. A
    ValueError is raised if any pandas object has a different attrs.

    Returns
    -------
    DataFrame
        Concatenated pandas object.
    '''
    # Sanity check: every input must carry the same attrs as the first one
    attrs = objs[0].attrs
    for obj in objs:
        if attrs != obj.attrs:
            raise ValueError('All pandas objects should have the same attrs.')
    new = pd.concat(objs, *args, **kwargs)
    new.attrs = attrs
    return new

What are your thoughts?

@TomAugspurger (Contributor) commented Jun 6, 2021

No, I don't think we should raise if the attrs don't match. They aren't supposed to affect the result of the computation.

For now, let's just support the case where attrs match, dropping them in other cases. We can add a keyword to concat later to control that behavior.
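
For illustration, the behavior described above (keep attrs when every input agrees, drop them otherwise, never raise) could be sketched as a user-side wrapper. This is only a rough sketch, not the implementation that went into pandas: the name `concat_keep_matching_attrs` is hypothetical, and it assumes attrs values can be compared with plain `==` (NumPy arrays stored in attrs would need different handling).

```python
import pandas as pd


def concat_keep_matching_attrs(objs, **kwargs):
    """Hypothetical helper: concatenate, keeping attrs only if all inputs agree."""
    objs = list(objs)
    # pd.concat raises ValueError on an empty list, so objs[0] below is safe.
    result = pd.concat(objs, **kwargs)
    first = objs[0].attrs
    if all(obj.attrs == first for obj in objs[1:]):
        # All inputs carry equal attrs: propagate a shallow copy.
        result.attrs = dict(first)
    else:
        # Mismatch: drop the metadata silently instead of raising.
        result.attrs = {}
    return result
```

With the frames from the original report, `concat_keep_matching_attrs([df1, df2]).attrs` would be `{1: 1}`, while changing `df2.attrs` to `{1: 2}` would give `{}`. Setting `result.attrs` explicitly in both branches keeps the outcome the same regardless of what the installed pandas version does with attrs in `pd.concat`.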
