0

I am having trouble developing a function for the purpose of zeroing out the "have" column (which is days in-between collection) for 21 days AFTER a positive result, by identifier. At this point I am not concerned with zeroing out the POSITVO rows, just the collection dates within 21 days after a positivo. I have added a "WANT" column, but anything close would be great! Data below and what I have been trying with chatgtp below:

Thanks for your help!

 dput(df)
structure(list(result = c("Negativo", "Negativo", "Negativo", 
"Negativo", "Negativo", "Positivo", "Positivo", "Negativo", "Negativo", 
"Negativo", "Negativo", "Negativo", "Negativo", "Negativo", "Negativo", 
"Negativo", "Negativo", "Negativo", "Negativo", "Negativo", "Negativo", 
"Negativo", "Negativo", "Positivo", "Negativo", "Negativo", "Negativo", 
"Negativo", "Negativo", "Negativo", "Negativo"), identifier = c("a", 
"a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", 
"a", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", 
"b", "b", "b", "b"), date_collection = structure(c(1631059200, 
1631664000, 1632355200, 1632787200, 1634601600, 1635292800, 1635811200, 
1636416000, 1637107200, 1637712000, 1638144000, 1638748800, 1639440000, 
1640044800, 1640736000, 1640044800, 1640649600, 1641254400, 1641859200, 
1643241600, 1645142400, 1645660800, 1646352000, 1646784000, 1647388800, 
1648080000, 1648512000, 1649203200, 1649721600, 1650326400, 1650931200
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), have = c(0, 
7, 8, 5, 21, 8, 6, 7, 8, 7, 5, 7, 8, 7, 8, 0, 7, 7, 7, 16, 22, 
6, 8, 5, 7, 8, 5, 8, 6, 7, 7), want = c(0, 7, 8, 5, 21, 8, 6, 
0, 0, 1, 5, 7, 8, 7, 8, 0, 7, 7, 7, 16, 22, 6, 8, 5, 0, 0, 0, 
8, 6, 7, 7)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-31L))

I tried this code below, but it did not run correctly both not finding the correct variable names and also zeroing out to many obsevations.

set_days_at_risk <- function(df) {
  for (person_id in unique(df$person)) {
    person_df <- df %>% filter(person == person_id)
    for (i in 1:(nrow(person_df) - 1)) {
      if (person_df$result[i] == "positive") {
        end_date <- person_df$date_of_collection[i] + days(21)
        person_df$days_at_risk[person_df$date_of_collection > person_df$date_of_collection[i] & person_df$date_of_collection <= end_date] <- 0
      }
    }
    df[df$person == person_id, "days_at_risk"] <- person_df$days_at_risk
  }
  return(df)
}
2
  • 2
    Instead of a ChatGPT solution you will get better answers if you post expected output. (And the ChatGPT code is not complete, library calls to load the needed packages is missing and an error is thrown.) Commented May 7, 2024 at 19:14
  • Also, is your posted ChatGPT code what ChatGPT output from the problem description? I have just tried twice and the solutions are dplyr's (old) way, group_by/mutate/ungroup, no for loops. Commented May 7, 2024 at 19:21

2 Answers 2

0

I believe there is an error in your 'want' column as df[7,5] should be 0? Also, this repex assumes "within 21 days" == "< 21 days". If this is not the case, comment and I will update this answer.

A stepwise approach:

  1. Create tmp column to define pre and post positive result by identifier
  2. Create tmp1, equivalent to your 'have' column
  3. Group data by identifier and tmp columns and calculate number of days from first date_collection value in each group
  4. Use tmp, tmp1, and days columns to zero values
  5. Remove unneeded columns
library(dplyr)

df1 <- df |>
  group_by(identifier) |>
  mutate(tmp = if_else(result == "Negativo", 0, 1),
         tmp = cumsum(tmp != lag(tmp, default = 1)),
         tmp = if_else(tmp > 1, 2, 1),
         tmp1 = as.integer(difftime(date_collection,
                                    lag(date_collection),
                                    units = "days")),
         tmp1 = if_else(is.na(tmp1), 0, tmp1)) |>
  group_by(identifier, tmp) |>
  mutate(days = as.integer(difftime(date_collection,
                           first(date_collection),
                           units = "days")),
         want = case_when(tmp == 1 ~ tmp1,
                           tmp == 2 & days != 0 & days < 21 ~ 0,
                           .default = tmp1)) |>
  ungroup() |>
  select(result, identifier, date_collection, want)

data.frame(df1)
#      result identifier date_collection want
# 1  Negativo          a      2021-09-08    0
# 2  Negativo          a      2021-09-15    7
# 3  Negativo          a      2021-09-23    8
# 4  Negativo          a      2021-09-28    5
# 5  Negativo          a      2021-10-19   21
# 6  Positivo          a      2021-10-27    8
# 7  Positivo          a      2021-11-02    0
# 8  Negativo          a      2021-11-09    0
# 9  Negativo          a      2021-11-17    8
# 10 Negativo          a      2021-11-24    7
# 11 Negativo          a      2021-11-29    5
# 12 Negativo          a      2021-12-06    7
# 13 Negativo          a      2021-12-14    8
# 14 Negativo          a      2021-12-21    7
# 15 Negativo          a      2021-12-29    8
# 16 Negativo          b      2021-12-21    0
# 17 Negativo          b      2021-12-28    7
# 18 Negativo          b      2022-01-04    7
# 19 Negativo          b      2022-01-11    7
# 20 Negativo          b      2022-01-27   16
# 21 Negativo          b      2022-02-18   22
# 22 Negativo          b      2022-02-24    6
# 23 Negativo          b      2022-03-04    8
# 24 Positivo          b      2022-03-09    5
# 25 Negativo          b      2022-03-16    0
# 26 Negativo          b      2022-03-24    0
# 27 Negativo          b      2022-03-29    0
# 28 Negativo          b      2022-04-06    8
# 29 Negativo          b      2022-04-12    6
# 30 Negativo          b      2022-04-19    7
# 31 Negativo          b      2022-04-26    7
0

First finding a new group if result is Positivo and then adding days for indicating the 21 days restriction.

Diff is then the new column if a positive is found, days are above 21 and results are negative again.

Note that I don't know how to get 1 at row 10 though.

library(dplyr)

df %>% 
  mutate(Diff = c(0, diff(date_collection)), 
         grp = cumsum(lag(result, default = "Negativo") == "Negativo" & 
                      result == "Positivo"), .by = identifier) %>% 
  mutate(days = cumsum(if_else(result != "Positivo", Diff, 0)), 
         Diff = if_else(result != "Positivo" & 
                        days <= 21 & 
                        result[1] == "Positivo", 
                          0, Diff), .by = c(identifier, grp)) %>% 
  select(-c(grp, days))

output

# A tibble: 31 × 6
   result   identifier date_collection      have  want  Diff
   <chr>    <chr>      <dttm>              <dbl> <dbl> <dbl>
 1 Negativo a          2021-09-08 00:00:00     0     0     0
 2 Negativo a          2021-09-15 00:00:00     7     7     7
 3 Negativo a          2021-09-23 00:00:00     8     8     8
 4 Negativo a          2021-09-28 00:00:00     5     5     5
 5 Negativo a          2021-10-19 00:00:00    21    21    21
 6 Positivo a          2021-10-27 00:00:00     8     8     8
 7 Positivo a          2021-11-02 00:00:00     6     6     6
 8 Negativo a          2021-11-09 00:00:00     7     0     0
 9 Negativo a          2021-11-17 00:00:00     8     0     0
10 Negativo a          2021-11-24 00:00:00     7     1     7
11 Negativo a          2021-11-29 00:00:00     5     5     5
12 Negativo a          2021-12-06 00:00:00     7     7     7
13 Negativo a          2021-12-14 00:00:00     8     8     8
14 Negativo a          2021-12-21 00:00:00     7     7     7
15 Negativo a          2021-12-29 00:00:00     8     8     8
16 Negativo b          2021-12-21 00:00:00     0     0     0
17 Negativo b          2021-12-28 00:00:00     7     7     7
18 Negativo b          2022-01-04 00:00:00     7     7     7
19 Negativo b          2022-01-11 00:00:00     7     7     7
20 Negativo b          2022-01-27 00:00:00    16    16    16
21 Negativo b          2022-02-18 00:00:00    22    22    22
22 Negativo b          2022-02-24 00:00:00     6     6     6
23 Negativo b          2022-03-04 00:00:00     8     8     8
24 Positivo b          2022-03-09 00:00:00     5     5     5
25 Negativo b          2022-03-16 00:00:00     7     0     0
26 Negativo b          2022-03-24 00:00:00     8     0     0
27 Negativo b          2022-03-29 00:00:00     5     0     0
28 Negativo b          2022-04-06 00:00:00     8     8     8
29 Negativo b          2022-04-12 00:00:00     6     6     6
30 Negativo b          2022-04-19 00:00:00     7     7     7
31 Negativo b          2022-04-26 00:00:00     7     7     7
2
  • getting this error: Error in mutate(): ! Can't supply .by when .data is a grouped data frame.
    – Levi M
    Commented May 8, 2024 at 13:05
  • Then your data might already be grouped. Try ungrouping by doing df %>% ungroup() %>% mutate(... first. Commented May 8, 2024 at 14:45

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.