# spark-sql

Here are 443 public repositories matching this topic...

getredash / redash

Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.

visualization javascript mysql python bigquery bi spark dashboard athena analytics postgresql business-intelligence redash redshift databricks hacktoberfest spark-sql

Updated Aug 13, 2021
Python

dotnet / spark

Open

[FEATURE REQUEST]: Implement ML Features

22

GoEddie commented Dec 30, 2019

This is to track implementation of the ML-Features: https://spark.apache.org/docs/latest/ml-features

Bucketizer has been implemented in dotnet/spark#378 but there are more features that should be implemented.

Feature Extractors
- TF-IDF
- Word2Vec (dotnet/spark#491)
- CountVectorizer (https://github.com/dotnet/spark/p

enhancement help wanted good first issue

Open

Implement accumulator in TaskRunner.cs

Open

Write unit tests for SimpleWorker.

2

almond-sh / almond

A Scala kernel for Jupyter

scala spark jupyter repl jupyter-notebook jupyter-kernels spark-sql

Updated Aug 14, 2021
Scala

oeljeklaus-you / UserActionAnalyzePlatform

Big data platform for e-commerce user behavior analysis

java spark hadoop sparkjava accumulator spark-sql kyro

Updated Jan 20, 2021
Java

apache / incubator-kyuubi

Apache Kyuubi is a distributed multi-tenant JDBC server for large-scale data processing and analytics, built on top of Apache Spark

kubernetes sql spark hive hadoop jdbc thrift data-lake spark-sql kyuubi-server thrift-jdbc odbc-server

Updated Aug 14, 2021
Scala

databricks / LearningSparkV2

This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]

spark apache-spark mllib structured-streaming spark-sql spark-mllib mlflow delta-lake

Updated Apr 15, 2021
Scala

qubole / sparklens

Qubole Sparklens tool for performance tuning Apache Spark

performance scala spark simulation cluster scheduler scheduling performance-metrics performance-tuning performance-visualization performance-analysis sparkjava spark-job spark-applications spark-sql spark-mllib spark-ml

Updated Jul 7, 2021
Scala

jaceklaskowski / mastering-spark-sql-book

The Internals of Spark SQL

spark apache-spark book mkdocs internals spark-sql mkdocs-material

Updated Aug 11, 2021

microsoft / data-accelerator

Open

Web: Enable uploading jar files and csv from the web portal

carlbrochu commented Apr 18, 2019

Is your feature request related to a problem? Please describe.
Today the user needs to deploy UDF JARs and reference data CSVs manually to the blob location.

Describe the solution you'd like
Enable the user to choose a file on a local disk, which the web portal will then upload to the right location.

enhancement help wanted good first issue

Open

Web: Handle saving automatically and avoid navigating away with changes

Open

Services: Improve comment support

jaceklaskowski / spark-workshop

Apache Spark™ and Scala Workshops

workshop spark apache-spark spark-sql spark-mllib spark-structured-streaming spark-workshops

Updated Sep 18, 2020
HTML

cuebook / cuelake

Use SQL to build ELT pipelines on a data lakehouse.

sql apache-spark etl pipelines data-engineering data-lake data-transfer delta data-integration upsert elt data-pipeline datalake data-ingestion spark-sql zeppelin-notebook apache-iceberg lakehouse incremental-updates

Updated Aug 10, 2021
JavaScript

Chabane / bigdata-playground

A complete example of a big data application using: Kubernetes (kops/AWS), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, Node.js, Angular, GraphQL

Updated Feb 1, 2019
TypeScript

polomarcus / Spark-Structured-Streaming-Examples

Spark Structured Streaming / Kafka / Cassandra / Elastic

kafka spark cassandra structured-streaming spark-sql

Updated Oct 1, 2018
Scala

microsoft / MCW-Big-data-and-visualization

MCW Big data and visualization

machine-learning power-bi spark-sql hdinsight azure-data-factory database-administrator

Updated Jul 27, 2021
JavaScript

mc2-project / opaque-sql

An encrypted data analytics platform

security machine-learning privacy spark analytics enclave spark-sql

Updated Aug 12, 2021
Scala

kevinschaich / pyspark-cheatsheet

🐍 Quick reference guide to common patterns & functions in PySpark.

documentation data-science data docs spark reference guide pyspark cheatsheet cheat quickstart references guides cheatsheets spark-sql pyspark-tutorial

Updated May 27, 2021

bluishglc / bdp

A prototype big data platform; the source code for the book Big Data Platform Architecture and Prototype

redis demo kafka spark prototype bigdata spark-streaming quickstart sparksql oozie sqoop spark-sql spark-streaming-examples sqoop-import spark-demo middle-end middle-office spark-examples

Updated Aug 12, 2020
Java

wangj1106 / recommendMoteur

Movie recommendation system and movie recommendation engine, built with Spark

movies kafka spark spark-streaming recommendation-engine recommender-system flume als recommendation spark-sql

Updated Jun 25, 2018
Scala

minio / spark-select

A library for Spark DataFrame using MinIO Select API

select spark sbt bigdata pyspark minio parquet-files spark-sql amazon-s3

Updated Sep 27, 2019
Scala

streamnative / pulsar-spark

When Apache Pulsar meets Apache Spark

data-science spark apache-spark stream-processing flink data-processing batch-processing structured-streaming spark-sql apache-pulsar

Updated Aug 4, 2021
Scala

xiaogp / recsys_spark

Spark SQL implementations of ItemCF, UserCF, and Swing: recommender systems, recommendation algorithms, collaborative filtering

collaborative-filtering recommender-system spark-sql

Updated Dec 19, 2019
Scala

streamnative / awesome-pulsar

A curated list of Pulsar tools, integrations and resources.

spark apache-spark messaging prometheus apache-storm apache-flink apache-kafka pub-sub grafana-dashboard spark-sql elastic-beats spark-structured-streaming apache-bookkeeper apache-pulsar

Updated Dec 29, 2020

DTStack / dt-sql-parser

Open

Add notes on what to watch for when importing the package for bundling

wewoor commented Jul 9, 2021

The default way of importing from dtsql-parser makes the bundle file too large

good first issue

Open

Add an online preview

huangyueranbbc / SparkDemo

Complete Spark example code demos (Java, Scala)

spark hadoop bigdata spark-streaming operator sparkline sparkjava spark-sql sparkfun-products sparkp

Updated May 9, 2020
Java

spider-123-eng / Spark

Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs that allow data workers to efficiently execute streaming, machine learning, or SQL workloads that require fast iterative access to datasets. This project will have sample programs for Spark in the Scala language.

streaming consumer parquet kafka-producer spark-sql spark-kafka-integration spark-streaming-data spark-transformations spark-to-cassandra-connection spark-dataframes spark-joins spark-hive-context spark-jdbc-connection spark-with-mangodb spark-aggregations-using-dataframe spark-use-cases cassandra-installation spark-datadog spark-mangodb spark-catalog-api

Updated Aug 9, 2021
Scala

hablapps / sparkOptics

Optics for Spark DataFrames

scala spark optics dataframe dataframes spark-sql

Updated Mar 5, 2021
Scala

harryprince / geospark

Bring sf to Spark in production

r apache-spark gis spatial-analysis spark-sql spatial-queries sparklyr-extension large-scale-spatial-analysis

Updated Apr 19, 2021
R

kaantas / spark-twitter-sentiment-analysis

Sentiment Analysis of a Twitter Topic with Spark Structured Streaming

python twitter kafka spark apache-spark sentiment-analysis twitter-api pyspark apache-kafka afinn twitter-sentiment-analysis spark-sql spark-structured-streaming pykafka twitter-topic

Updated Dec 12, 2018
Python

sjyttkl / spark_learning

Learning Spark with the 尚硅谷 (Atguigu) big data Spark course, latest 2019 edition

spark spark-sql spark-core

Updated Aug 11, 2020
Scala

airbnb / airbnb-spark-thrift

A library for loading Thrift data into Spark SQL

spark thrift spark-streaming spark-sql

Updated Sep 7, 2018
Scala
