Build data pipelines, the easy way
StreamPark: make stream processing easier! An easy-to-use streaming application development framework and operations platform.
Example project implementing best practices for PySpark ETL jobs and applications.
An end-to-end GoodReads data pipeline for building a data lake, data warehouse, and analytics platform.
A few projects related to data engineering, including data modeling, cloud infrastructure setup, data warehousing, and data lake development.
A scalable, general-purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton
A simplified, lightweight ETL Framework based on Apache Spark
A scalable, general-purpose micro-framework for defining dataflows. You can use it to build dataframes, NumPy matrices, Python objects, ML models, and more. Embed Hamilton anywhere Python runs, e.g. Spark, Airflow, Jupyter, FastAPI, or plain Python scripts (a minimal sketch appears after this list).
A high-performance data processing system written in Clojure.
A lightweight ETL (extract, transform, load) library and data integration toolbox for .NET.
A simple Spark-powered ETL framework that just works
Pipebird is open source infrastructure for securely sharing data with customers.
Watchmen Platform is a low-code data platform for data pipelines, metadata management, analysis, and quality management.
Data pipelines from reusable components.
Download DIG to run on your laptop or server.
A web engine for your business, powered by a top-shelf, modern Ruby & JS stack. Out-of-the-box support for automation, CMS, blog, forum, and email. Developer-friendly and easily extendable for your SaaS/XaaS project. Built with familiar tooling including Devise, Sidekiq, Ember.js, and PostgreSQL.
The goal of this project is to track Uber Rides and Uber Eats expenses through data engineering processes, using technologies such as Apache Airflow, AWS Redshift, and Power BI.
This repository will help you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-life work, using PySpark and Spark SQL for development. At the end of the course, it also walks through a few case studies.
csvplus extends the standard Go encoding/csv package with a fluent interface, lazy stream operations, indices, and joins.
Ethereum Analytical Database: an Ethereum data access solution for analytics and application development, backed by ClickHouse, a fast analytical database.
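Several entries above (the PySpark ETL best-practices example and the Spark-powered ETL frameworks) center on the same extract-transform-load shape. The sketch below illustrates that shape with plain PySpark; the file paths and column names are placeholders, not taken from any of the listed repositories.

```python
# Minimal PySpark ETL sketch (paths and column names are illustrative placeholders).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl_example").getOrCreate()

# Extract: read raw CSV data.
raw = spark.read.csv("data/raw_orders.csv", header=True, inferSchema=True)

# Transform: drop bad rows, parse dates, aggregate to daily revenue.
transformed = (
    raw.dropna(subset=["order_id"])
       .withColumn("order_date", F.to_date("order_date"))
       .groupBy("order_date")
       .agg(F.sum("amount").alias("daily_revenue"))
)

# Load: write the result as Parquet, overwriting any previous run.
transformed.write.mode("overwrite").parquet("data/warehouse/daily_revenue")

spark.stop()
```

Using overwrite mode keeps the job idempotent, so rerunning it for the same input produces the same warehouse state.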
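The Hamilton entries above describe a dataflow style in which each Python function is a node and its parameter names declare its upstream dependencies. Here is a minimal sketch of that pattern, assuming the hamilton package's driver.Driver and execute APIs; the module name, column names, and sample data are illustrative, not taken from the project's documentation.

```python
# my_functions.py -- each function defines a node; its parameters name its dependencies.
import pandas as pd

def spend(raw_df: pd.DataFrame) -> pd.Series:
    """Marketing spend column pulled from the input frame."""
    return raw_df["spend"]

def signups(raw_df: pd.DataFrame) -> pd.Series:
    """Signups column pulled from the input frame."""
    return raw_df["signups"]

def spend_per_signup(spend: pd.Series, signups: pd.Series) -> pd.Series:
    """Cost per signup, derived from the two nodes above."""
    return spend / signups
```

```python
# run.py -- wire the module into a Driver and request the outputs you want.
import pandas as pd
from hamilton import driver
import my_functions  # the module sketched above

raw_df = pd.DataFrame({"spend": [10.0, 20.0, 30.0], "signups": [1, 2, 3]})
dr = driver.Driver({}, my_functions)  # config dict, then one or more modules
result = dr.execute(["spend_per_signup"], inputs={"raw_df": raw_df})
print(result)
```

Because the dependency graph is just functions in a module, the same code can be executed from a script, a notebook, an Airflow task, or a FastAPI handler, which is the portability the description refers to.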