0

I'm trying to expand my data set using R. I have recorded the observations for each sample and calculated percentages based on those observations. I need now to expand each sample to list each possible observation without doing any calculations. Example of myData: Starting data set:

Sample    Observation    Percent
A         Y              50
A         N              50
B         Y              10
B         N              80
B         Don't know     10 

Desired data set:

Sample    Observation    Percent
A         Y              50
A         N              50
A         Don't know     NA
B         Y              10
B         N              80
B         Don't know     10 

So in this case, I would need to expand all of sample A to include the "Don't know" category and fill that in with "NA".

I have tried

myTable <- table(myData)
TableFrame2 <- data.frame(myTable)

Which expands the data set but messes up the Percentage column (?why). I thought I could merge the percentages back, but I need to match that column to the expanded set by both the sample and Observation columns to get an exact match. Any suggestions?

1 Answer 1

1

One way is to merge/join the combinations back into the data. (I altered the data slightly to make it easy to copy/paste here in SO.)

dat <- read.table(header=TRUE, stringsAsFactors=FALSE, text='
Sample    Observation    Percent
A         Y              50
A         N              50
B         Y              10
B         N              80
B         Don_t_know     10 ')

Base R

merge(
  dat,
  expand.grid(Sample = unique(dat$Sample),
              Observation = unique(dat$Observation),
              stringsAsFactors = FALSE),
  by = c("Sample", "Observation"),
  all = TRUE
)
#   Sample Observation Percent
# 1      A  Don_t_know      NA
# 2      A           N      50
# 3      A           Y      50
# 4      B  Don_t_know      10
# 5      B           N      80
# 6      B           Y      10

Tidyverse:

library(dplyr)
library(tidyr)

dat %>%
  full_join(
    crossing(Sample = unique(dat$Sample), Observation = unique(dat$Observation)),
    by = c("Sample", "Observation")
  )
#   Sample Observation Percent
# 1      A           Y      50
# 2      A           N      50
# 3      B           Y      10
# 4      B           N      80
# 5      B  Don_t_know      10
# 6      A  Don_t_know      NA

or even

dat %>%
  full_join(expand(., Sample, Observation))
# Joining, by = c("Sample", "Observation")
#   Sample Observation Percent
# 1      A           Y      50
# 2      A           N      50
# 3      B           Y      10
# 4      B           N      80
# 5      B  Don_t_know      10
# 6      A  Don_t_know      NA

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.