Function for zeroing time at risk after positive

Question

I am having trouble developing a function for the purpose of zeroing out the "have" column (which is days in-between collection) for 21 days AFTER a positive result, by identifier. At this point I am not concerned with zeroing out the POSITVO rows, just the collection dates within 21 days after a positivo. I have added a "WANT" column, but anything close would be great! Data below and what I have been trying with chatgtp below:

Thanks for your help!

 dput(df)
structure(list(result = c("Negativo", "Negativo", "Negativo", 
"Negativo", "Negativo", "Positivo", "Positivo", "Negativo", "Negativo", 
"Negativo", "Negativo", "Negativo", "Negativo", "Negativo", "Negativo", 
"Negativo", "Negativo", "Negativo", "Negativo", "Negativo", "Negativo", 
"Negativo", "Negativo", "Positivo", "Negativo", "Negativo", "Negativo", 
"Negativo", "Negativo", "Negativo", "Negativo"), identifier = c("a", 
"a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", 
"a", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", 
"b", "b", "b", "b"), date_collection = structure(c(1631059200, 
1631664000, 1632355200, 1632787200, 1634601600, 1635292800, 1635811200, 
1636416000, 1637107200, 1637712000, 1638144000, 1638748800, 1639440000, 
1640044800, 1640736000, 1640044800, 1640649600, 1641254400, 1641859200, 
1643241600, 1645142400, 1645660800, 1646352000, 1646784000, 1647388800, 
1648080000, 1648512000, 1649203200, 1649721600, 1650326400, 1650931200
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), have = c(0, 
7, 8, 5, 21, 8, 6, 7, 8, 7, 5, 7, 8, 7, 8, 0, 7, 7, 7, 16, 22, 
6, 8, 5, 7, 8, 5, 8, 6, 7, 7), want = c(0, 7, 8, 5, 21, 8, 6, 
0, 0, 1, 5, 7, 8, 7, 8, 0, 7, 7, 7, 16, 22, 6, 8, 5, 0, 0, 0, 
8, 6, 7, 7)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-31L))

I tried this code below, but it did not run correctly both not finding the correct variable names and also zeroing out to many obsevations.

set_days_at_risk <- function(df) {
  for (person_id in unique(df$person)) {
    person_df <- df %>% filter(person == person_id)
    for (i in 1:(nrow(person_df) - 1)) {
      if (person_df$result[i] == "positive") {
        end_date <- person_df$date_of_collection[i] + days(21)
        person_df$days_at_risk[person_df$date_of_collection > person_df$date_of_collection[i] & person_df$date_of_collection <= end_date] <- 0
      }
    }
    df[df$person == person_id, "days_at_risk"] <- person_df$days_at_risk
  }
  return(df)
}

Instead of a ChatGPT solution you will get better answers if you post expected output. (And the ChatGPT code is not complete, library calls to load the needed packages is missing and an error is thrown.) — Rui Barradas, Commented May 7, 2024 at 19:14
Also, is your posted ChatGPT code what ChatGPT output from the problem description? I have just tried twice and the solutions are dplyr's (old) way, group_by/mutate/ungroup, no for loops. — Rui Barradas, Commented May 7, 2024 at 19:21

L Tyrone · Accepted Answer · 2024-05-07 21:58:01Z

I believe there is an error in your 'want' column as df[7,5] should be 0? Also, this repex assumes "within 21 days" == "< 21 days". If this is not the case, comment and I will update this answer.

A stepwise approach:

Create tmp column to define pre and post positive result by identifier
Create tmp1, equivalent to your 'have' column
Group data by identifier and tmp columns and calculate number of days from first date_collection value in each group
Use tmp, tmp1, and days columns to zero values
Remove unneeded columns

library(dplyr)

df1 <- df |>
  group_by(identifier) |>
  mutate(tmp = if_else(result == "Negativo", 0, 1),
         tmp = cumsum(tmp != lag(tmp, default = 1)),
         tmp = if_else(tmp > 1, 2, 1),
         tmp1 = as.integer(difftime(date_collection,
                                    lag(date_collection),
                                    units = "days")),
         tmp1 = if_else(is.na(tmp1), 0, tmp1)) |>
  group_by(identifier, tmp) |>
  mutate(days = as.integer(difftime(date_collection,
                           first(date_collection),
                           units = "days")),
         want = case_when(tmp == 1 ~ tmp1,
                           tmp == 2 & days != 0 & days < 21 ~ 0,
                           .default = tmp1)) |>
  ungroup() |>
  select(result, identifier, date_collection, want)

data.frame(df1)
#      result identifier date_collection want
# 1  Negativo          a      2021-09-08    0
# 2  Negativo          a      2021-09-15    7
# 3  Negativo          a      2021-09-23    8
# 4  Negativo          a      2021-09-28    5
# 5  Negativo          a      2021-10-19   21
# 6  Positivo          a      2021-10-27    8
# 7  Positivo          a      2021-11-02    0
# 8  Negativo          a      2021-11-09    0
# 9  Negativo          a      2021-11-17    8
# 10 Negativo          a      2021-11-24    7
# 11 Negativo          a      2021-11-29    5
# 12 Negativo          a      2021-12-06    7
# 13 Negativo          a      2021-12-14    8
# 14 Negativo          a      2021-12-21    7
# 15 Negativo          a      2021-12-29    8
# 16 Negativo          b      2021-12-21    0
# 17 Negativo          b      2021-12-28    7
# 18 Negativo          b      2022-01-04    7
# 19 Negativo          b      2022-01-11    7
# 20 Negativo          b      2022-01-27   16
# 21 Negativo          b      2022-02-18   22
# 22 Negativo          b      2022-02-24    6
# 23 Negativo          b      2022-03-04    8
# 24 Positivo          b      2022-03-09    5
# 25 Negativo          b      2022-03-16    0
# 26 Negativo          b      2022-03-24    0
# 27 Negativo          b      2022-03-29    0
# 28 Negativo          b      2022-04-06    8
# 29 Negativo          b      2022-04-12    6
# 30 Negativo          b      2022-04-19    7
# 31 Negativo          b      2022-04-26    7

Andre Wildberg · Accepted Answer · 2024-05-07 22:55:52Z

First finding a new group if result is Positivo and then adding days for indicating the 21 days restriction.

Diff is then the new column if a positive is found, days are above 21 and results are negative again.

Note that I don't know how to get 1 at row 10 though.

library(dplyr)

df %>% 
  mutate(Diff = c(0, diff(date_collection)), 
         grp = cumsum(lag(result, default = "Negativo") == "Negativo" & 
                      result == "Positivo"), .by = identifier) %>% 
  mutate(days = cumsum(if_else(result != "Positivo", Diff, 0)), 
         Diff = if_else(result != "Positivo" & 
                        days <= 21 & 
                        result[1] == "Positivo", 
                          0, Diff), .by = c(identifier, grp)) %>% 
  select(-c(grp, days))

output

# A tibble: 31 × 6
   result   identifier date_collection      have  want  Diff
   <chr>    <chr>      <dttm>              <dbl> <dbl> <dbl>
 1 Negativo a          2021-09-08 00:00:00     0     0     0
 2 Negativo a          2021-09-15 00:00:00     7     7     7
 3 Negativo a          2021-09-23 00:00:00     8     8     8
 4 Negativo a          2021-09-28 00:00:00     5     5     5
 5 Negativo a          2021-10-19 00:00:00    21    21    21
 6 Positivo a          2021-10-27 00:00:00     8     8     8
 7 Positivo a          2021-11-02 00:00:00     6     6     6
 8 Negativo a          2021-11-09 00:00:00     7     0     0
 9 Negativo a          2021-11-17 00:00:00     8     0     0
10 Negativo a          2021-11-24 00:00:00     7     1     7
11 Negativo a          2021-11-29 00:00:00     5     5     5
12 Negativo a          2021-12-06 00:00:00     7     7     7
13 Negativo a          2021-12-14 00:00:00     8     8     8
14 Negativo a          2021-12-21 00:00:00     7     7     7
15 Negativo a          2021-12-29 00:00:00     8     8     8
16 Negativo b          2021-12-21 00:00:00     0     0     0
17 Negativo b          2021-12-28 00:00:00     7     7     7
18 Negativo b          2022-01-04 00:00:00     7     7     7
19 Negativo b          2022-01-11 00:00:00     7     7     7
20 Negativo b          2022-01-27 00:00:00    16    16    16
21 Negativo b          2022-02-18 00:00:00    22    22    22
22 Negativo b          2022-02-24 00:00:00     6     6     6
23 Negativo b          2022-03-04 00:00:00     8     8     8
24 Positivo b          2022-03-09 00:00:00     5     5     5
25 Negativo b          2022-03-16 00:00:00     7     0     0
26 Negativo b          2022-03-24 00:00:00     8     0     0
27 Negativo b          2022-03-29 00:00:00     5     0     0
28 Negativo b          2022-04-06 00:00:00     8     8     8
29 Negativo b          2022-04-12 00:00:00     6     6     6
30 Negativo b          2022-04-19 00:00:00     7     7     7
31 Negativo b          2022-04-26 00:00:00     7     7     7

getting this error: Error in mutate(): ! Can't supply .by when .data is a grouped data frame. — Levi M, Commented May 8, 2024 at 13:05
Then your data might already be grouped. Try ungrouping by doing df %>% ungroup() %>% mutate(... first. — Andre Wildberg, Commented May 8, 2024 at 14:45

Collectives™ on Stack Overflow

Function for zeroing time at risk after positive

2 Answers 2

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Related