2,884 questions
-1
votes
0
answers
14
views
Longitudinal Analysis with High Variability in Time Entries per Subject
I am working with retrospective data from a symptom tracking app and I aim to identify different symptom trajectory classes within this data. After reviewing relevant literature, I have found that ...
0
votes
0
answers
12
views
pooling phase of multiple imputation
Regarding the pooling phase of multiple imputation, the standard approach is to apply Rubin's rules for combining parameter estimates. For example, let's say we have a dataset with missing values for ...
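For readers unfamiliar with the pooling step, here is a minimal sketch of Rubin's rules for a single scalar parameter. The function name `pool_rubin` and the input numbers are made up for illustration and are not taken from the question; the sketch only covers the pooled point estimate and total variance, not degrees of freedom or confidence intervals.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one scalar parameter across m imputed datasets (Rubin's rules).

    estimates: the per-imputation point estimates
    variances: the per-imputation squared standard errors
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")

    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divides by m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical estimates and variances from m = 3 imputed datasets.
pooled_estimate, pooled_variance = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The pooled standard error is the square root of `pooled_variance`.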
0
votes
2
answers
47
views
Missing values in olive oil dataset
I have a dataset of olive oil samples, with the goal of creating a classification model for oil quality. I'm having trouble deciding how to deal with missing data. Have a look at the data here if you ...
-1
votes
3
answers
140
views
Drop rows with missing values in all columns [duplicate]
It looks like tidyr's drop_na will drop rows if any of the specified columns contain missing values.
Example:
> library(tidyverse)
> df <- data.frame(a=c(1,NA,2,NA), b=c(3,4,NA,NA))
> df
...
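For comparison only, this is how the any-versus-all distinction looks in pandas on the same toy data. It is a Python analogue and not the tidyr solution the question asks for; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Same toy data as the R example: the last row is missing in every column.
df = pd.DataFrame({"a": [1, np.nan, 2, np.nan], "b": [3, 4, np.nan, np.nan]})

# how="any" (the default) drops a row if at least one column is missing.
dropped_if_any = df.dropna(how="any")

# how="all" drops a row only if every column is missing.
dropped_if_all = df.dropna(how="all")
```

With `how="all"`, only the final row is removed; with `how="any"`, only the first row survives.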
-1
votes
0
answers
29
views
Best practices for handling missing data in pandas to maintain model accuracy [closed]
I’m working with a dataset in pandas that contains several columns with missing values (NaN). I’m trying to decide on the best strategy to handle this missing data before feeding it into a machine ...
0
votes
0
answers
70
views
Why does R evaluate `NA==T|F` as NA, but `NA==F|T` as True? (and related Qs) [duplicate]
If you run the following lines of code in R, you may be surprised by the results (printed above each line as a comment)
#1: NA
NA==T
#2: NA
NA==F
#3: NA
NA==T&T
#4: FALSE
NA==F&F
#5: NA
NA==F&...
1
vote
0
answers
32
views
How can I add zero for empty or missing rows?
I have been trying to resolve this for two days and feel the need for help. I've created a cumulative graph, only it's showing as cumulative! That is because there aren't necessarily rows of data ...
-4
votes
0
answers
46
views
In data analysis [closed]
How do I handle missing data in a large dataset using Python?
I’m working on a large dataset using Python and I noticed that some columns have missing values (NaN). I tried using df.dropna() and df....
1
vote
3
answers
139
views
Python - How to check for missing values not represented by NaN? [duplicate]
I am looking for guidance on how to check for missing values in a DataFrame that are not the typical "NaN" or "np.nan" in Python. I have a dataset/DataFrame that has a string ...
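One possible way to approach this kind of check is sketched below. The excerpt is cut off before it names the actual placeholder, so the `SENTINELS` list and the sample DataFrame are hypothetical; swap in whatever values your data really uses.

```python
import pandas as pd

# Hypothetical placeholder strings; the question's real placeholder is not shown.
SENTINELS = ["", "?", "N/A", "missing"]

# Made-up sample data for illustration.
df = pd.DataFrame({
    "city": ["Oslo", "?", "N/A"],
    "temp": ["12", "", "7"],
})

# Flag a cell as missing if it is a real NaN or one of the placeholder strings.
missing_mask = df.isna() | df.isin(SENTINELS)

# Count flagged cells per column.
missing_counts = missing_mask.sum()
```

Listing the placeholders explicitly keeps the check auditable, at the cost of having to know them up front.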
1
vote
1
answer
40
views
Why does RandomForestClassifier in scikit-learn predict even on all-NaN input?
I am training a random forest classifier in Python with sklearn; see the code below:
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(random_state=42)
rf.fit(X = df.drop("...
0
votes
0
answers
42
views
Why does ydata-profiling not detect missing values in PySpark DataFrame when using None?
I'm using ydata-profiling to generate profiling reports from a large PySpark DataFrame without converting it to Pandas (to avoid memory issues on large datasets).
Some columns contain the string "...
0
votes
0
answers
49
views
R mice leaves missing values when I use a where-matrix
I have a large data frame with a lot of variables measured at three time points t1, t2 and t3. I only want to impute those missing values where the corresponding time point was answered at all, that is where ...
1
vote
1
answer
47
views
Creating Artificial Gaps in R Dataset [duplicate]
I am processing data using Random Forest, and I am trying to create random artificial gaps in my dataset so that I can test how accurate the random forest predictions are.
TIMESTAMP <- c(2001:2020)
...
0
votes
0
answers
46
views
How do I get my data to not disappear when I click another fragment? Android Studio
I am trying to make an app that controls the aspects of a garden: changing the temperature, the humidity, the wind, etc. My new issue is that my data keeps disappearing after I click another ...
0
votes
0
answers
13
views
Pandas - How to backfill a main dataframe with values from another while prioritizing the main dataframe [duplicate]
SET UP MY PROBLEM
I have two pandas dataframes. First, I have main:
import pandas as pd
import numpy as np
main = pd.DataFrame({"foo":{"a":1.0,"b":2.0,"c":3.0,&...