data-engineering

Currently, the funnel report percentage is calculated as:
(number at a given funnel step) / sum(everything in the funnel)

Example from blog:

Here, the Discussed Pricing (900) gets divided by 11900 (sum of all ev
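
The calculation described above can be sketched in a few lines. This is only an illustration: the figures 900 ("Discussed Pricing") and 11900 (the sum) come from the text, while the other step names and counts are hypothetical values chosen so that the total matches.

```python
# Hypothetical funnel counts. Only "Discussed Pricing" (900) and the overall
# sum (11900) come from the text; the other steps and counts are made up.
funnel = {
    "Visited Site": 6000,
    "Signed Up": 3000,
    "Requested Demo": 2000,
    "Discussed Pricing": 900,
}


def share_of_total(counts):
    """Return each step's count divided by the sum of all step counts."""
    total = sum(counts.values())
    return {step: count / total for step, count in counts.items()}


shares = share_of_total(funnel)
print(f"{shares['Discussed Pricing']:.2%}")  # prints 7.56%
```

With this formula every step is measured against the combined total of all steps, so the percentages always add up to 100%.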

Opened from the Prefect Public Slack Community

michael.ball: Hey there. I’ve been playing around with Docker storage today, trying to get all source code packaged together with the flows each time they are registered, and am using the files and env_vars attributes as outlined in the Docs. But it seems that my .dockerignore file (in the directory from whic

Describe the bug
Data Docs columns shrink to a width of one character when the batch comes from a long query.

To Reproduce
Steps to reproduce the behavior:

Make a batch from a long query string.
Run validation.
Render the result to Data Docs.
See screenshot
[Screenshot: Data documentation compiled by Great Expectations]

Tell us about the problem you're trying to solve

We can probably reduce the Docker image size of our Java-based connectors by using the ADD command instead of COPYing the tar archive. See this PR for an example.

Describe the solution you’d like

Use the ADD command to reduce the size of the Docker images.

Expected Behavior

Feature views should have a creation time (i.e., created_timestamp) set at the first feast apply.

Current Behavior

Feature views do not have a creation time when they are created.

Steps to reproduce

feast init fs
cd fs
feast apply
feast registry-dump
{
  "spec": {
    "name": "driver_id",
    "valueType": "INT64",
    "description": "driver

Steps to reproduce:

From the UI, create a repository.
Upload a file.
From the uncommitted tab, commit the change.
From the Objects tab, click the "branch: main" drop down.
Click the arrow on the right.
Select the first commit with the "Repository created" message.

Result: the "get started" screen appears.
Expected: screen should be empty, because this is a past commit.

If they are not class methods, the method will be invoked for every test and a session will be created for each of those tests.

class PySparkTest(unittest.TestCase):
    @classmethod
    def suppress_py4j_logging(cls):
        logger = logging.getLogger('py4j')
        logger.setLevel(logging.WARN)

    @classmethod
    def create_testing_pyspark_session(cls):
        return Sp
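
The point about class methods can be illustrated without Spark. The sketch below is an assumption-laden stand-in, not the snippet's actual test class: `FakeSession`, `SharedSessionTest`, `PerTestSessionTest`, and `count_sessions` are hypothetical names, and `FakeSession` merely counts how often an "expensive" session object gets created. It relies on the standard `unittest` hooks `setUpClass` (runs once per class) and `setUp` (runs before every test).

```python
import io
import unittest


class FakeSession:
    """Hypothetical stand-in for an expensive resource such as a Spark session."""
    created = 0

    def __init__(self):
        FakeSession.created += 1


class SharedSessionTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Runs once for the whole class, so a single session is shared.
        cls.session = FakeSession()

    def test_one(self):
        self.assertIsNotNone(self.session)

    def test_two(self):
        self.assertIsNotNone(self.session)


class PerTestSessionTest(unittest.TestCase):
    def setUp(self):
        # Runs before every test method, so each test creates its own session.
        self.session = FakeSession()

    def test_one(self):
        self.assertIsNotNone(self.session)

    def test_two(self):
        self.assertIsNotNone(self.session)


def count_sessions(test_case):
    """Run one TestCase class and return how many FakeSession objects it created."""
    FakeSession.created = 0
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case)
    unittest.TextTestRunner(stream=io.StringIO()).run(suite)
    return FakeSession.created


print(count_sessions(SharedSessionTest), count_sessions(PerTestSessionTest))
```

The class-level hook creates one session for two tests, while the per-test hook creates two. With a real Spark session the difference is mostly start-up time.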

Hi,

I am using some basic functions from pyjanitor, such as clean_names() and collapse_levels(), in code that I want to productionise.
There are limits on the size of the production code base.
Currently, the requirements.txt for pyjanitor alone is huge.
I don't think I need all of those dependencies in my code.
How can I remove the unnecessary ones?

load_dotted_path raises the following error if it is unable to load the module:

Traceback (most recent call last):
  File "/Users/Edu/Desktop/import-error/script.py", line 4, in <module>
    load_dotted_path('tests.quality.fn')
  File "/Users/Edu/dev/ploomber/src/ploomber/util/dotted_path.py", line 128, in load_dotted_path
    module = importlib.import_module(mod)
  File "/Users/
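
For context, the traceback shows the loader delegating to `importlib.import_module`. A minimal sketch of that general pattern follows. It is not ploomber's implementation: `load_dotted_path_sketch` is a hypothetical name, and the split-on-last-dot rule is an assumption made for illustration.

```python
import importlib


def load_dotted_path_sketch(dotted_path):
    """Import 'package.module.name' and return the object called 'name'."""
    # Assumption: everything before the last dot is the module path.
    mod, _, name = dotted_path.rpartition(".")
    if not mod:
        raise ValueError(f"expected 'module.attribute', got {dotted_path!r}")
    # importlib.import_module raises ModuleNotFoundError if the module
    # (or one of its parent packages) cannot be found.
    module = importlib.import_module(mod)
    return getattr(module, name)


join = load_dotted_path_sketch("os.path.join")
```

When the module part cannot be imported, the `ModuleNotFoundError` from `importlib` propagates unchanged, which is the kind of raw traceback shown above.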

When using Ubuntu out of the box, both natively and within Windows WSL2, the asset consumer FVT tends to fail with:

[INFO] --- maven-compiler-plugin:3.8.1:compile (default-compile) @ asset-consumer-fvt ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 7 source files to /home/nigel/src/egeria/open-metadata-test/open-metadata-fvt/access-services-fvt/asset-consumer-fvt/tar

Here are 984 public repositories matching this topic...

apache / superset

eugeneyan / applied-ml

andkret / Cookbook

datastacktv / data-engineer-roadmap

PrefectHQ / prefect

great-expectations / great_expectations

airbytehq / airbyte

Jeffail / benthos

feast-dev / feast

awslabs / aws-data-wrangler

adilkhash / Data-Engineering-HowTo

treeverse / lakeFS

kantord / just-dashboard

quiltdata / quilt

GoogleCloudPlatform / data-science-on-gcp

benthecoder / yt-channels-DS-AI-ML-CS

san089 / goodreads_etl_pipeline

AlexIoannides / pyspark-example-project

pyjanitor-devs / pyjanitor

abhishek-ch / around-dataengineering

ploomber / ploomber

oleg-agapov / data-engineering-book

san089 / Udacity-Data-Engineering-Projects

gunnarmorling / awesome-opensource-data-engineering

automaticmode / active_workflow

sodadata / soda-sql

odpi / egeria

dataform-co / dataform

kevintpeng / Learn-Something-Every-Day

Cascading / cascading
