big-data

Here are 2,533 public repositories matching this topic...

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

  • Updated May 13, 2021
  • Python
kloczek commented Jun 9, 2021

After adding the patch that fixes #4209, I found that Sphinx emits some warnings.

+ /usr/bin/python3 setup.py build_sphinx -b man --build-dir build/sphinx
Unable to find pgen, not compiling formal grammar.
running build_sphinx
Running Sphinx v4.0.2
making output directory... done
loading intersphinx inventory from https://docs.python.org/3/objects.inv...
building [mo]: targets for 0 po
pseudotensor commented Jan 12, 2021

Problem: the approximate method can still be slow for many trees
catboost version: master
Operating System: ubuntu 18.04
CPU: i9
GPU: RTX2080

It would be good to be able to specify how many trees to use when computing Shapley values. The model.predict and prediction_type versions already allow this, as do lgbm/xgb.

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

  • Updated Jul 17, 2021
  • Jupyter Notebook
vespa
kkraune commented Apr 2, 2021

... to make it easier to read Vespa documentation on an e-reader / offline

Vespa documentation is generated using Jekyll from .md and .html files. Look into options for generating the artifact as part of site generation (there might be plugins we can use here).

jaceklaskowski commented Jun 15, 2021
  • Delta Lake 1.0.0
  • Spark 3.1.2
  • Scala 2.12
  • AdoptOpenJDK-11.0.11+9 (build 11.0.11+9)

The following code throws a NullPointerException. It targets a directory-based Delta table that does not exist and uses a generated column.

import io.delta.tables.DeltaTable
DeltaTable.create
  .addColumn(
    DeltaTable.columnBuilder("value")
      .generatedAlwaysAs("true")
      .nullab
seut commented Jun 22, 2021

Use case:

1.) A user may want to back up all tables but no metadata (users, privileges, etc.) without explicitly listing each table in the CREATE SNAPSHOT statement.

2.) A user may want to transfer users and privileges, custom analyzers, or user-defined functions from one cluster to another without backing up the complete cluster, including all data (tables).

*Feature description
