dataframes

We can reduce friction by figuring out how to load data into polars memory most efficiently.

Describe the bug
pa.errors.SchemaErrors.failure_cases only returns the first 10 failure cases.

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandera (0.6.5).
(optional) I have confirmed this bug exists on the master branch of pandera.

Note: Please read [this guide](https://matthewrocklin.c

For pipeline stages provided by pdpipe.basic_stages, supplying conditions to the prec and post keyword arguments may produce incorrect error messages.

Example Code

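import pandas as pd
import pdpipe as pdp

# Three rows (index 1, 2, 3) with columns 'a' and 'b'
df = pd.DataFrame([[1, 4], [4, 5], [1, 11]], [1, 2, 3], ['a', 'b'])
# The precondition requires a column 'x', which df lacks, so applying the
# pipeline should raise an error; its message is what this report is about
pline = pdp.PdPipeline([
    pdp.FreqDrop(2, 'a', prec=pdp.cond.HasAllColumns(['x']))
])
pline.apply(df)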

Currently we neither test nor document that Eland works with data streams; we should probably add tests to make sure everything works properly.

riptable currently only supports changing settings (e.g. the number of threads used for calculations and I/O) by calling library functions or setting class-level attributes.

It'd be helpful if the default values for these settings (at least the most important ones) could be overridden using environment variables, similar to how numba supports changing the cache path or number of threads to b
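A minimal sketch of the requested pattern, using only the Python standard library. The variable name RIPTABLE_NUM_THREADS, the built-in default of 4, and the Settings class are all hypothetical; riptable does not read this variable today. The precedence is: an explicit runtime assignment wins, then the environment variable, then the built-in default.

```python
import os

# Hypothetical names, for illustration only; not part of riptable.
NUM_THREADS_ENV = "RIPTABLE_NUM_THREADS"
_BUILTIN_DEFAULT_THREADS = 4


def default_num_threads(env=os.environ):
    """Return the thread-count default, honoring the env var if it is valid."""
    raw = env.get(NUM_THREADS_ENV, "").strip()
    if not raw:
        return _BUILTIN_DEFAULT_THREADS
    try:
        value = int(raw)
    except ValueError:
        # Ignore unparsable values rather than failing at import time.
        return _BUILTIN_DEFAULT_THREADS
    return value if value > 0 else _BUILTIN_DEFAULT_THREADS


class Settings:
    def __init__(self, env=os.environ):
        # The env var only changes the starting value; code can still
        # override it later by assigning to the attribute.
        self.num_threads = default_num_threads(env)


settings = Settings({NUM_THREADS_ENV: "8"})
print(settings.num_threads)
settings.num_threads = 16  # explicit runtime override still works
```

Falling back silently on invalid values keeps a bad environment from breaking imports; raising a ValueError instead would surface typos sooner, and either choice seems defensible.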

Some unit tests asserting e.g. the length or some other property of the datasets would be nice to have.
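A minimal sketch of such tests in pytest style, using only the standard library. The load_dataset function, the inline CSV, and the expected row and column values are hypothetical stand-ins for a real dataset loader; in practice the expected values would come from the dataset's documentation.

```python
import csv
import io

# Hypothetical stand-in for a library's dataset file and loader.
_SAMPLE_CSV = "id,label\n1,a\n2,b\n3,c\n"

EXPECTED_ROWS = 3
EXPECTED_COLUMNS = ["id", "label"]


def load_dataset():
    """Return the dataset as a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(_SAMPLE_CSV)))


def test_dataset_length():
    assert len(load_dataset()) == EXPECTED_ROWS


def test_dataset_columns():
    assert all(list(row.keys()) == EXPECTED_COLUMNS for row in load_dataset())


def test_no_missing_values():
    assert all(v not in ("", None) for row in load_dataset() for v in row.values())
```

Running pytest on a file like this would pick up the three test_ functions automatically.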

As a user, I wish I could access a table's column schema with a column_schemas attribute that is a dictionary of column schemas.

df.ww.column_schemas

This could help users understand, better than the columns attribute does, that they can use df.ww.column_schemas[col] instead of df.ww[col].schema.

We should not remove the columns attribute so we don't
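A minimal sketch of the proposed attribute on a toy schema class. ColumnSchema and ToyTableSchema here are hypothetical stand-ins, not woodwork's classes; the sketch assumes the existing columns attribute maps column names to per-column schemas and keeps it in place, adding column_schemas as a more explicitly named view of the same dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ColumnSchema:
    """Toy per-column schema (hypothetical, not woodwork's class)."""
    logical_type: str
    semantic_tags: Set[str] = field(default_factory=set)


class ToyTableSchema:
    def __init__(self, columns: Dict[str, ColumnSchema]):
        # Existing attribute, kept so current callers keep working.
        self.columns = columns

    @property
    def column_schemas(self) -> Dict[str, ColumnSchema]:
        """Column name -> ColumnSchema; same dict as `columns`, clearer name."""
        return self.columns


schema = ToyTableSchema({
    "age": ColumnSchema("Integer", {"numeric"}),
    "name": ColumnSchema("Unknown"),
})
print(schema.column_schemas["age"].logical_type)
```

Returning the same dictionary keeps the two attributes in sync; returning a copy would instead protect the schema from accidental mutation by callers.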

Add a few useful date/time types from the time package (https://hackage.haskell.org/package/time), e.g.:

POSIXTime
Date
etc.

A checklist of where to add things:

Prim constructors go here: https://github.com/ocramz/heidi/blob/master/src/Data/Generics/Encode/Internal/Prim.hs#L25
Heidi instances go here: https://github.com/ocramz/heidi/blob/master/src/Data/Generics


Here are 195 public repositories matching this topic...

pola-rs / polars

JuliaData / DataFrames.jl

TileDB-Inc / TileDB

pandera-dev / pandera

rocketlaunchr / dataframe-go

pdpipe / pdpipe


elixir-nx / explorer

polyaxon / datatile

elastic / eland

JuliaData / DataFramesMeta.jl

aiguofer / gspread-pandas

rtosholdings / riptable

RumbleDB / rumble

stefmolin / pandas-workshop

DataHaskell / dh-core

alteryx / woodwork

zbrookle / dataframe_sql

JuliaAcademy / DataFrames

hablapps / sparkOptics

Thomas-George-T / Movies-Analytics-in-Spark-and-Scala

JuliaData / DataTables.jl

zbrookle / sql_to_ibis

isarn / isarn-sketches-spark

hackersandslackers / pandas-sqlalchemy-tutorial

JuliaGraphs / GraphDataFrameBridge.jl

dlab-berkeley / R-Data-Wrangling

dkaslovsky / ElasticBatch

zgbjgg / jun

ocramz / heidi

kmatarese / glide
