data-processing

🚨🚨 Feature Request

Related to an existing Issue
A new implementation (Improvement, Extension)

If your feature will improve `HUB`

Need a way to check if a dataset already exists.

`hub.empty` throws an error if a dataset exists, and `hub.load` throws an error if the dataset does not exist.

Need a way to check if a dataset already exists without throwing a
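
One way to picture the request is a small wrapper that turns the "load raises when missing" behaviour into a boolean. The sketch below is only an illustration under stated assumptions: `dataset_exists`, `fake_load`, and the `missing_errors` parameter are hypothetical names, the stand-in loader is not Hub code, and the exception type Hub actually raises is not given in the issue.

```python
from typing import Callable, Tuple, Type


def dataset_exists(
    path: str,
    load: Callable[[str], object],
    missing_errors: Tuple[Type[BaseException], ...] = (Exception,),
) -> bool:
    """Return True if load(path) succeeds, False if it raises one of missing_errors.

    `load` stands for whatever loader raises when the dataset is absent
    (the issue mentions hub.load); the exception types are an assumption.
    """
    try:
        load(path)
    except missing_errors:
        return False
    return True


# Stand-in loader for demonstration only; it is NOT part of Hub.
_FAKE_STORE = {"./data/existing"}


def fake_load(path: str) -> str:
    if path not in _FAKE_STORE:
        raise FileNotFoundError(path)
    return f"dataset at {path}"


print(dataset_exists("./data/existing", fake_load))
print(dataset_exists("./data/missing", fake_load, (FileNotFoundError,)))
```

Passing the narrowest exception type available keeps unrelated failures (for example, permission or network errors) from being reported as "dataset does not exist".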

Describe the bug
`pa.errors.SchemaErrors.failure_cases` only returns the first 10 failure cases.

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandera (0.6.5).
(optional) I have confirmed this bug exists on the master branch of pandera.

Setting `pretrained_model_name` not only defines the model architecture but also loads the pre-trained checkpoint. We should have a separate hparam to control whether the pre-trained checkpoint is loaded.

Hello Benito,

For a specific task I need a "bitwise exclusive or"-function, but I realized xidel doesn't have one. So I created a function for that.

I was wondering if, in addition to the EXPath File Module, you'd be interested in integrating the EXPath Binary Module as well. Then I can use bin:xor() instead (although for
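
The author's own function is not shown in this excerpt. As a rough illustration of what a hand-written "bitwise exclusive or" looks like when no XOR operator is available, here is a minimal Python sketch that uses only division and remainder. The name `xor_arith` and the restriction to non-negative integers are assumptions made for this example.

```python
def xor_arith(a: int, b: int) -> int:
    """Bitwise XOR of two non-negative integers using only arithmetic.

    Walks both numbers bit by bit (least significant first) and sets the
    output bit whenever the two input bits differ.
    """
    if a < 0 or b < 0:
        raise ValueError("only non-negative integers are supported in this sketch")
    result = 0
    place = 1  # value of the current bit position: 1, 2, 4, ...
    while a > 0 or b > 0:
        if a % 2 != b % 2:
            result += place
        a //= 2
        b //= 2
        place *= 2
    return result


print(xor_arith(5, 3))  # 0b101 xor 0b011 = 0b110
```

In Python the built-in `^` operator does the same job; the arithmetic version only mirrors the kind of workaround needed where such an operator is missing.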

Write unit test coverage for SafeDataset and SafeDataLoader, along with the functions in utils.py.

The exception in the subject is thrown by the following code:

from datetime import date
from pysparkling.sql.session import SparkSession
from pysparkling.sql.functions import collect_set

spark = SparkSession.Builder().getOrCreate()

dataset_usage = [
    ('steven', 'UUID1', date(2019, 7, 22)),
]
dataset_usage_schema = 'id: string, datauid: string, access_date: date'

df = spa

Is your feature request related to a problem? Please describe.
To prepare for medical NER detection, we need to create a reader for the BC5CDR dataset in the BLUE Benchmark: https://github.com/ncbi-nlp/BLUE_Benchmark

Describe the solution you'd like

Develop a reader for BC5CDR
Annotate the Entity Mentions from the dataset.

Here are 549 public repositories matching this topic...

johnkerl / miller

lorien / awesome-web-scraping

activeloopai / Hub

NVIDIA / DALI

asyml / texar

TomWright / dasel

dashbitco / broadway

onceupon / Bash-Oneliner

python-bonobo / bonobo

microsoft / DialoGPT

GoogleCloudPlatform / data-science-on-gcp

GoogleCloudPlatform / DataflowJavaSDK

pandera-dev / pandera

asyml / texar-pytorch

infoslack / awesome-kafka

benibela / xidel

kousun12 / eternal

constellation-rs / amadeus

msamogh / nonechucks

alttch / rapidtables

SebKrantz / collapse

Yord / pxi

maykulkarni / Machine-Learning-Notebooks

svenkreiss / pysparkling

streamnative / pulsar-flink

PytLab / VASPy

iTechArt / convtools

lithops-cloud / lithops

matousc89 / padasip

asyml / forte
