-
Updated
Aug 13, 2021 - Python
#
spark-sql
Here are 443 public repositories matching this topic...
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
visualization
javascript
mysql
python
bigquery
bi
spark
dashboard
athena
analytics
postgresql
business-intelligence
redash
redshift
databricks
hacktoberfest
spark-sql
A Scala kernel for Jupyter
-
Updated
Aug 14, 2021 - Scala
Apache Kyuubi is a distributed multi-tenant JDBC server for large-scale data processing and analytics, built on top of Apache Spark
kubernetes
sql
spark
hive
hadoop
jdbc
thrift
data-lake
spark-sql
kyuubi-server
thrift-jdbc
odbc-server
-
Updated
Aug 14, 2021 - Scala
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
-
Updated
Apr 15, 2021 - Scala
Qubole Sparklens tool for performance tuning Apache Spark
performance
scala
spark
simulation
cluster
scheduler
scheduling
performance-metrics
performance-tuning
performance-visualization
performance-analysis
sparkjava
spark-job
spark-applications
spark-sql
spark-mllib
spark-ml
-
Updated
Jul 7, 2021 - Scala
The Internals of Spark SQL
-
Updated
Aug 11, 2021
carlbrochu
commented
Apr 18, 2019
Is your feature request related to a problem? Please describe.
Today the user must manually deploy UDF JARs and reference-data CSVs to the blob location.
Describe the solution you'd like
Let the user choose a file on local disk, which the web portal then uploads to the right location.
Apache Spark™ and Scala Workshops
-
Updated
Sep 18, 2020 - HTML
Use SQL to build ELT pipelines on a data lakehouse.
sql
apache-spark
etl
pipelines
data-engineering
data-lake
data-transfer
delta
data-integration
upsert
elt
data-pipeline
datalake
data-ingestion
spark-sql
zeppelin-notebook
apache-iceberg
lakehouse
incremental-updates
-
Updated
Aug 10, 2021 - JavaScript
A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
nodejs
python
graphql
docker
machine-learning
angular
scala
kafka
big-data
apache-spark
mongodb
hadoop
avro
twitter-api
hbase
spark-streaming
parquet
apache-flink
kops
spark-sql
-
Updated
Feb 1, 2019 - TypeScript
Spark Structured Streaming / Kafka / Cassandra / Elastic
-
Updated
Oct 1, 2018 - Scala
MCW Big data and visualization
-
Updated
Jul 27, 2021 - JavaScript
documentation
data-science
data
docs
spark
reference
guide
pyspark
cheatsheet
cheat
quickstart
references
guides
cheatsheets
spark-sql
pyspark-tutorial
-
Updated
May 27, 2021
A prototype big data platform: the source code for the book Big Data Platform Architecture and Prototype
redis
demo
kafka
spark
prototype
bigdata
spark-streaming
quickstart
sparksql
oozie
sqoop
spark-sql
spark-streaming-examples
sqoop-import
spark-demo
middle-end
middle-office
spark-examples
-
Updated
Aug 12, 2020 - Java
Movie recommendation system / movie recommendation engine, built with Spark
movies
kafka
spark
spark-streaming
recommendation-engine
recommender-system
flume
als
recommendation
spark-sql
-
Updated
Jun 25, 2018 - Scala
When Apache Pulsar meets Apache Spark
data-science
spark
apache-spark
stream-processing
flink
data-processing
batch-processing
structured-streaming
spark-sql
apache-pulsar
-
Updated
Aug 4, 2021 - Scala
Spark SQL implementations of ItemCF, UserCF, and Swing; recommender systems, recommendation algorithms, collaborative filtering
-
Updated
Dec 19, 2019 - Scala
A curated list of Pulsar tools, integrations and resources.
spark
apache-spark
messaging
prometheus
apache-storm
apache-flink
apache-kafka
pub-sub
grafana-dashboard
spark-sql
elastic-beats
spark-structured-streaming
apache-bookkeeper
apache-pulsar
-
Updated
Dec 29, 2020
Open
Add notes on what to watch for when importing for a bundle
wewoor
commented
Jul 9, 2021
The default way of importing from dtsql-parser makes the bundle file too large
Open
Add an online preview
Comprehensive Spark example code (Java, Scala)
spark
hadoop
bigdata
spark-streaming
operator
sparkline
sparkjava
spark-sql
sparkfun-products
sparkp
-
Updated
May 9, 2020 - Java
Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs that allow data workers to efficiently execute streaming, machine learning, or SQL workloads that require fast iterative access to datasets. This project contains sample programs for Spark in Scala.
streaming
consumer
parquet
kafka-producer
spark-sql
spark-kafka-integration
spark-streaming-data
spark-transformations
spark-to-cassandra-connection
spark-dataframes
spark-joins
spark-hive-context
spark-jdbc-connection
spark-with-mangodb
spark-aggregations-using-dataframe
spark-use-cases
cassandra-installation
spark-datadog
spark-mangodb
spark-catalog-api
-
Updated
Aug 9, 2021 - Scala
Bring sf to Spark in production
r
apache-spark
gis
spatial-analysis
spark-sql
spatial-queries
sparklyr-extension
large-scale-spatial-analysis
-
Updated
Apr 19, 2021 - R
Sentiment Analysis of a Twitter Topic with Spark Structured Streaming
python
twitter
kafka
spark
apache-spark
sentiment-analysis
twitter-api
pyspark
apache-kafka
afinn
twitter-sentiment-analysis
spark-sql
spark-structured-streaming
pykafka
twitter-topic
-
Updated
Dec 12, 2018 - Python
A library for loading Thrift data into Spark SQL
-
Updated
Sep 7, 2018 - Scala
This is to track implementation of the ML-Features: https://spark.apache.org/docs/latest/ml-features
Bucketizer has been implemented in dotnet/spark#378 but there are more features that should be implemented.