ENH: DataFrame Constructions from Data Classes #37577

Open
daskol opened this issue Nov 2, 2020 · 3 comments

@daskol daskol commented Nov 2, 2020

Is your feature request related to a problem?

I would like to construct a pandas.DataFrame from an iterable of dataclasses.dataclass instances, the same way DataFrame.from_records builds one from an iterable of tuples. The rationale is that a data class is a more strongly typed object than a general tuple or dictionary. Also, data classes are more memory-efficient than tuples. This makes data classes attractive to use instead of dicts or tuples whenever the schema is known.

Describe the solution you'd like

I would like a class method, .from_dataclasses, that allows DataFrame construction and type inference from a uniform (for simplicity) sequence of data class instances. See the example below.

import pandas as pd
from dataclasses import dataclass


@dataclass
class Record:
    id: int
    name: str
    constant: float

df = pd.DataFrame.from_dataclasses([
    Record(0, 'Landau', 3.1415926),
    Record(1, 'Kapitsa', 2.718281828459045),
    Record(2, 'Bogolyubov', 6.62607015),
])

print(df.dtypes)
#  id            int64
#  name         object
#  constant    float64
#  dtype: object

In the example above, the schema of the DataFrame is inferred from the Record.__annotations__ dictionary, which contains the type information provided by the user. The API could also provide ways to validate the schema at runtime by comparing the actual type of each value with the type specified for its column.
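To make the schema and validation idea concrete, here is a minimal standard-library sketch. The helper names `schema_of` and `validate` are hypothetical and not part of pandas; the check assumes annotations are plain classes (not strings or generics such as `List[int]`) and skips anything else.

```python
from dataclasses import dataclass, fields


@dataclass
class Record:
    id: int
    name: str
    constant: float


def schema_of(cls):
    """Hypothetical helper: map field name -> annotated type, in declaration order."""
    return {f.name: f.type for f in fields(cls)}


def validate(record):
    """Hypothetical helper: return the names of fields whose value does not
    match the annotated type. Annotations that are not plain classes are skipped."""
    return [
        f.name
        for f in fields(record)
        if isinstance(f.type, type)
        and not isinstance(getattr(record, f.name), f.type)
    ]


schema = schema_of(Record)
good = validate(Record(0, 'Landau', 3.1415926))
bad = validate(Record('0', 'Landau', 3))  # str for an int field, int for a float field
```

Note that dataclasses do not enforce annotations themselves, so `Record('0', 'Landau', 3)` is constructed without error; any validation has to be an explicit extra step like the one sketched here.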

API breaking implications

There are no breaking API changes in general, but this would require a minimum Python version of 3.7.

Contributor

@TomAugspurger TomAugspurger commented Nov 2, 2020

We already support that in the main DataFrame constructor, right? https://pandas.pydata.org/docs/user_guide/dsintro.html?highlight=dataclass#from-a-list-of-dataclasses
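For readers following along, this is roughly what the linked user guide section describes: passing a list of dataclass instances straight to the DataFrame constructor. This is a sketch assuming pandas 1.1 or later; the `Record` class is reused from the issue description.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class Record:
    id: int
    name: str
    constant: float


# The plain constructor accepts a list of dataclass instances;
# column names follow the field order of the dataclass.
df = pd.DataFrame([
    Record(0, 'Landau', 3.1415926),
    Record(1, 'Kapitsa', 2.718281828459045),
    Record(2, 'Bogolyubov', 6.62607015),
])

print(df.dtypes)
```

As far as I know, the column dtypes here are inferred from the values rather than from the annotations, so the annotations act as documentation, not as an enforced schema.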

Author

@daskol daskol commented Nov 2, 2020

Wow! It really works. Nice!

Well, I guess an explicit mention of data classes in the API reference is needed. I suspect many people (especially experienced users) look things up in the reference and do not read the user guide intro.

Contributor

@jreback jreback commented Nov 2, 2020

Sure, we would take updated docs.

We could also take a PR to import dataclass at the top level (the current import was only done this way for 3.6 compat): https://github.com/pandas-dev/pandas/blob/master/pandas/core/dtypes/inference.py#L419
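As an illustration of the kind of check being discussed, here is a small sketch with `dataclasses` imported at module level. It is not copied from the pandas source; the function name `is_dataclass_instance` is a hypothetical stand-in, and it assumes Python 3.7 or later so the import cannot fail.

```python
from dataclasses import dataclass, is_dataclass


def is_dataclass_instance(obj):
    """Return True for dataclass instances, False for dataclass types
    and for everything else."""
    # dataclasses.is_dataclass is True for both the class and its instances,
    # so the class itself has to be excluded explicitly.
    return is_dataclass(obj) and not isinstance(obj, type)


@dataclass
class Point:
    x: int
    y: int
```

With this, `is_dataclass_instance(Point(1, 2))` is true while `is_dataclass_instance(Point)` and `is_dataclass_instance((1, 2))` are false.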
