exploratory-data-analysis

As a user,

It would be nice to have the "Observed Value" field standardized to show the percentage of "successful" validations, rather than a mix of 0% / 100%. The current mix is confusing because different levels of validation output use different verbiage, which trips up anyone not used to the expectations. I've given an example of what I mean in the screenshot below:

![image](https://user-images.g

Hi there,

I think there might be a mistake in the documentation. The "Understanding Scaled F-Score" section says:

The F-Score of these two values is defined as:

$$ \mathcal{F}_\beta(\mbox{prec}, \mbox{freq}) = (1 + \beta^2) \frac{\mbox{prec} \cdot \mbox{freq}}{\beta^2 \cdot \mbox{prec} + \mbox{freq}}. $$

$\beta \in \mathcal{R}^+$ is a scaling factor where frequency is favored if $\beta
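
To make the quoted formula easier to check numerically, here is a minimal Python sketch of it. The function name `f_beta`, the zero-denominator guard, and the sample inputs are assumptions made for illustration; this is not code from the library's documentation.

```python
def f_beta(prec: float, freq: float, beta: float = 1.0) -> float:
    """Evaluate (1 + beta^2) * prec * freq / (beta^2 * prec + freq)."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    denom = beta**2 * prec + freq
    if denom == 0:
        # Assumption: return 0 when both inputs are 0 instead of dividing by zero.
        return 0.0
    return (1 + beta**2) * prec * freq / denom


# Compare how the score moves as beta changes for the same pair of inputs.
for beta in (0.5, 1.0, 2.0):
    print(beta, round(f_beta(0.9, 0.1, beta=beta), 4))
```

Evaluating the same pair of inputs at several values of $\beta$ shows which of the two arguments the score tracks more closely, which is a quick way to test a claim about which quantity a given $\beta$ favors.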

Hello,

First of all, thanks for the great package.
I'm trying to compute density maps of a 3-dimensional point distribution. I understood from the documentation that a variable bandwidth method was available, but I couldn't figure out how to set up this option.
Additionally, in the case of a fixed bandwidth KDE for multidimensional data, I would have expected as in the stats_models_multivari

To make differences between datasets easier to spot visually (especially when there are many columns), it would be helpful if one could sort the categorical columns by the Jensen–Shannon divergence.

The code below tries to do so, but it seems to distort the labels on the y-axis. Also, if the jsd column contains missing values, those variables are dropped from the graph.

library(in
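
The requested ordering can be sketched independently of the plotting package. The Python example below is a hedged illustration only: it does not use the R package in question, the column names and level frequencies are made up, and `jsd` and `sort_key` are hypothetical helpers. It computes the Jensen–Shannon divergence (base 2) per column and sorts columns by it, placing columns with a missing score last instead of dropping them.

```python
import math


def jsd(p: dict, q: dict) -> float:
    """Jensen-Shannon divergence (log base 2) between two discrete distributions."""
    keys = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2 over the union of levels.
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a: dict) -> float:
        # Kullback-Leibler divergence of a from m; zero-probability terms contribute 0.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


# Hypothetical per-column level frequencies in two datasets; None marks a
# column that cannot be compared (standing in for a missing jsd value).
columns = {
    "colour": ({"red": 0.5, "blue": 0.5}, {"red": 0.5, "blue": 0.5}),
    "size": ({"S": 1.0}, {"L": 1.0}),
    "shape": ({"round": 0.7, "square": 0.3}, {"round": 0.4, "square": 0.6}),
    "empty": None,
}

scores = {name: (jsd(*pair) if pair is not None else None)
          for name, pair in columns.items()}


def sort_key(name: str):
    score = scores[name]
    # Missing scores sort last; otherwise largest divergence first.
    return (score is None, -(score or 0.0))


ordered = sorted(scores, key=sort_key)
print(ordered)
```

Keeping the missing-score columns at the end of the ordering, rather than filtering them out, is one way to avoid variables silently disappearing from the graph.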

Adding a description for each parameter will help users understand how to specify its value. For example: the format of the longitude in the Yelp.businesses table, or the maximum number of results a user can expect (if we add a limit parameter in the future).

Great and very clear step-by-step package tutorial, Matt!

A time-saving suggestion (if I may): in the step "Examining the Results" (after Step 3), where you have:

marketing_campaign_correlated_tbl %>%
    filter(feature %in% c("DURATION", "POUTCOME", "PDAYS",
                          "PREVIOUS", "CONTACT", "HOUSING")) %>%
    plot_correlation_funnel(interactive = FALSE, limits = c(-0.4, 0.4))

Why not "automatica

Here are 941 public repositories matching this topic...

pandas-profiling / pandas-profiling

great-expectations / great_expectations

JasonKessler / scattertext

jadianes / data-science-your-way

rasbt / musicmood

ropensci / visdat

neerjad / DataVisualization

mirador / mirador

tommyod / KDEpy

alastairrushworth / inspectdf

harunshimanto / 100-Days-Of-ML-Code

mstaniak / autoEDA-resources

dvgodoy / handyspark

ank0409 / Ditching-Excel-for-Python

jadianes / spark-r-notebooks

ujjwalkarn / xda

ahmedbesbes / How-to-score-0.8134-in-Titanic-Kaggle-Challenge

lozuwa / impy

dgwozdz / HN_SO_analysis

jadianes / data-journalism

sfu-db / dataprep

joachim-gassen / ExPanDaR

Jean-njoroge / Breast-cancer-risk-prediction

zmjones / edarf

ajaymache / data-analysis-using-python

kianweelee / Edator

business-science / correlationfunnel

ben519 / mltools

pyaf / DenseNet-MURA-PyTorch

wpinvestigative / kushner_eb5_census
