# exploratory-data-analysis

Here are 941 public repositories matching this topic...

PSUlion16 commented Apr 9, 2020

As a user,

it would be nice to have the "Observed Value" field standardized to show the percentage of "successful" validations, instead of a mix of 0% / 100%. The current mix is confusing because different validation outputs use different wording, which can confuse someone who is not used to the expectations. The screenshot below shows what I mean:

![image](https://user-images.g

Luke-in-the-sky commented Jun 3, 2018

Hi there,

I think there might be a mistake in the documentation. The "Understanding Scaled F-Score" section says:

The F-Score of these two values is defined as:

$$ \mathcal{F}_\beta(\mbox{prec}, \mbox{freq}) = (1 + \beta^2) \frac{\mbox{prec} \cdot \mbox{freq}}{\beta^2 \cdot \mbox{prec} + \mbox{freq}}. $$

$\beta \in \mathcal{R}^+$ is a scaling factor where frequency is favored if $\beta
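To make the quoted formula concrete, here is a minimal Python sketch that evaluates it directly. The function name `f_beta` is hypothetical and not part of the documented package; it simply transcribes $(1 + \beta^2)\,\mathrm{prec}\cdot\mathrm{freq} / (\beta^2\,\mathrm{prec} + \mathrm{freq})$, assuming both inputs lie in $[0, 1]$.

```python
def f_beta(prec: float, freq: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of prec and freq, transcribing the quoted formula.

    Hypothetical helper for illustration; not the package's own API.
    """
    if beta <= 0:
        # The formula defines beta as a positive real number.
        raise ValueError("beta must be a positive real number")
    denom = beta**2 * prec + freq
    if denom == 0:
        # Both inputs are zero: treat the score as 0 instead of dividing by zero.
        return 0.0
    return (1 + beta**2) * prec * freq / denom


# beta = 1 reduces to the plain harmonic mean of prec and freq.
print(f_beta(0.5, 0.5))
# With beta = 2, swapping which input is large changes the score.
print(f_beta(0.1, 0.9, beta=2), f_beta(0.9, 0.1, beta=2))
```

With `beta=2`, `f_beta(0.1, 0.9, 2)` (about 0.346) is larger than `f_beta(0.9, 0.1, 2)` (about 0.122). So under the formula as written, a $\beta$ above 1 weights freq more heavily, mirroring how recall is weighted in the standard $F_\beta$ score.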

ytarricq commented Jul 19, 2019

Hello,

First of all, thanks for the great package.
I'm trying to compute density maps of a 3-dimensional point distribution. I understood from the documentation that a variable bandwidth method is available, but I couldn't figure out how to enable this option.
Additionally, in the case of a fixed-bandwidth KDE for multidimensional data, I would have expected as in the stats_models_multivari

inspectdf
RoelVerbelen commented Apr 1, 2020

To make differences between datasets easier to spot visually (especially when there are many columns), it would be helpful to be able to sort the categorical columns by their Jensen–Shannon divergence.

The code below tries to do so, but it seems to distort the labels on the y-axis. Also, if the jsd column contains missing values, those variables are dropped from the graph.

library(in
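One way to get the ordering requested above, while keeping columns whose divergence is missing instead of dropping them, is sketched below in Python. It does not use inspectdf or its API: the helper names and the dict-of-lists input format are hypothetical, `None` stands in for a missing value, and JSD is computed with base-2 logarithms so values fall in [0, 1].

```python
import math
from collections import Counter


def category_proportions(values):
    """Relative frequency of each level, ignoring None (missing) entries."""
    counts = Counter(v for v in values if v is not None)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()} if total else {}


def jensen_shannon_divergence(p, q):
    """JSD (base-2 logs) between two dicts mapping level -> probability."""
    levels = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2.
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in levels}

    def kl_to_m(a):
        # KL(A || M); terms with zero probability contribute nothing.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in levels if a.get(k, 0.0) > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


def sort_columns_by_jsd(df1, df2):
    """Return (column, jsd) pairs, most divergent first.

    df1, df2: hypothetical dicts mapping column name -> list of categories.
    Columns whose JSD is undefined (no non-missing values on one side) are
    kept at the end with jsd=None rather than removed.
    """
    results = []
    for col in sorted(set(df1) | set(df2)):
        p = category_proportions(df1.get(col, []))
        q = category_proportions(df2.get(col, []))
        jsd = jensen_shannon_divergence(p, q) if p and q else None
        results.append((col, jsd))
    # Known values first (largest JSD first), undefined ones last.
    results.sort(key=lambda item: (item[1] is None, -(item[1] or 0.0)))
    return results


df1 = {"color": ["red", "red", "blue", "blue"], "size": ["S", "M", "S", "M"], "shape": [None, None]}
df2 = {"color": ["red", "red", "red", "red"], "size": ["S", "M", "S", "M"], "shape": ["circle"]}
print(sort_columns_by_jsd(df1, df2))
```

Sorting on the tuple `(jsd is None, -jsd)` puts the most divergent columns first and pushes columns with an undefined JSD to the end instead of discarding them; the resulting order could then be used to order the y-axis labels of a plot.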
100-Days-Of-ML-Code
sfd99 commented Aug 9, 2019

Great and very clear step-by-step package tutorial, Matt!

A time-saving suggestion (if I may): in the step "Examining the Results" (after Step 3), where you have:

marketing_campaign_correlated_tbl %>%
  filter(feature %in% c("DURATION", "POUTCOME", "PDAYS",
                        "PREVIOUS", "CONTACT", "HOUSING")) %>%
  plot_correlation_funnel(interactive = FALSE, limits = c(-0.4, 0.4))

Why not "automatica
