# Apache Spark

Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Here are 7,039 public repositories matching this topic...
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Topics: python, aws, data-science, machine-learning, caffe, theano, big-data, spark, deep-learning, hadoop, tensorflow, numpy, scikit-learn, keras, pandas, kaggle, scipy, matplotlib, mapreduce
Updated Oct 12, 2022 - Python
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Topics: visualization, javascript, mysql, python, bigquery, bi, spark, dashboard, athena, analytics, postgresql, business-intelligence, redash, redshift, databricks, hacktoberfest, spark-sql
Updated Oct 26, 2022 - Python
Learn and understand Docker and container technologies, with real DevOps practice!
Updated Oct 22, 2022 - Go
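Programming e-books and programming books, covering C, C#, Docker, Elasticsearch, Git, Hadoop, HeadFirst, Java, Javascript, jvm, Kafka, Linux, Maven, MongoDB, MyBatis, MySQL, Netty, Nginx, Python, RabbitMQ, Redis, Scala, Solr, Spark, Spring, SpringBoot, SpringCloud, TCPIP, Tomcat, Zookeeper, artificial intelligence, big data, concurrent programming, databases, data mining, new interview questions, architecture design, algorithms, computer science, design patterns, software testing, refactoring and optimization, and more categories.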
Topics: mysql, python, linux, docker, redis, elasticsearch, spark, spring, hadoop, rabbitmq, solr, jvm, netty, springboot, mybatis, springcloud
Updated May 18, 2022
Topics: mysql, rust, bigquery, chart, sql, spark, presto, hive, microservice, serverless, athena, analytics, postgresql, cube
Updated Oct 29, 2022 - Rust
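Flink learning blog. http://www.54tianzhisheng.cn/ Covers Flink basics, concepts, principles, hands-on practice, performance tuning, and source code analysis. Includes learning examples for Flink Connector, Metrics, Library, DataStream API, Table API & SQL, and more, plus case studies of large-scale production Flink projects (PV/UV, log storage, real-time deduplication of tens of billions of records, monitoring and alerting). You are welcome to support my column "Big Data Real-Time Computing Engine Flink: Practice and Performance Optimization".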
Topics: mysql, redis, elasticsearch, streaming, kafka, spark, influxdb, rabbitmq, clickhouse, hbase, stream-processing, opentsdb, loki, flink, rocketmq
Updated Sep 22, 2022 - Java
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
Topics: machine-learning, spark, deep-learning, uber, mxnet, tensorflow, mpi, keras, pytorch, machinelearning, baidu, deeplearning, ray
Updated Oct 27, 2022 - Python
Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch; a modular and tiny C++ library for running math code; and a Java-based math library on top of the core C++ library. Also includes SameDiff: a PyTorch/TensorFlow-like library for running deep learning using automatic differentiation.
Topics: python, java, clojure, scala, spark, hadoop, gpu, intellij, linear-algebra, artificial-intelligence, deeplearning, neural-nets, dl4j, matrix-library, deeplearning4j
Updated Oct 29, 2022 - Java
List of Data Science Cheatsheets to rule the world
Updated Jun 9, 2022
A Flexible and Powerful Parameter Server for large-scale machine learning
Topics: machine-learning, scala, spark, model, spark-streaming, online-learning, parameter-server, high-dimensional
Updated Sep 12, 2022 - Java
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Topics: python, java, data-science, machine-learning, opensource, r, big-data, spark, deep-learning, hadoop, random-forest, gpu, naive-bayes, h2o, distributed, pca, gbm, ensemble-learning, automl, h2o-automl
Updated Oct 29, 2022 - Jupyter Notebook
Alluxio, data orchestration for analytics and machine learning in the cloud
Topics: spark, presto, hadoop, tensorflow, data-analysis, alluxio, memory-speed, data-orchestration, virtual-distributed-filesystem
Updated Oct 29, 2022 - Java
macOS development environment setup: Easy-to-understand instructions with automated setup scripts for developer tools like Vim, Sublime Text, Bash, iTerm, Python data analysis, Spark, Hadoop MapReduce, AWS, Heroku, JavaScript web development, Android development, common data stores, and dev-based OS X defaults.
Topics: nodejs, mysql, python, git, vim, macos, linux, bash, redis, cli, mac, aws, elasticsearch, cloud, spark, mongodb, iterm2, sublime-text, postgresql, android-development
Updated Oct 10, 2022 - Python
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Updated Oct 29, 2022 - Scala
Created by Matei Zaharia
Released May 26, 2014
- Repository: apache/spark
- Website: spark.apache.org
- Wikipedia