All Questions
Tagged with duplicates and r
1,462 questions
2 votes, 1 answer, 72 views
Remove duplicate names while replacing underscores with spaces in R
I have last names (left) and first names (right) separated by a comma.
Among the last names, I often (but not always) have duplicates separated by an underscore. How to remove the duplicates and, for ...
0 votes, 0 answers, 26 views
How can I use fastLink in R to get partial numeric matches?
I am attempting to link two datasets using fastLink. I have manually found matches between some cases that fastLink failed to pair, and I am trying to understand why this may be. To test what's going ...
13 votes, 7 answers, 1k views
Remove duplicates across multiple vectors
I want to remove all duplicates across multiple vectors, leaving none. For example, for these vectors:
a <- c("dog", "fish", "cow")
b <- c("dog", "...
0 votes, 1 answer, 62 views
Drop duplicated rows from files in a folder based on two columns in R
I have over 250 large .txt files (each approximately 1 GB) in a folder. I would like to remove duplicated rows based on two columns, id1 and id2, while being mindful of my MacBook's memory limitations.
An ...
1 vote, 4 answers, 63 views
Remove duplicates based on date relation
I am conducting a study on urine infections. Each row is a urine culture result. Patients will have a hospital number. Some of them may submit multiple urine samples throughout the study and therefore ...
0 votes, 2 answers, 53 views
dplyr equivalent to duplicated() to show duplicated rows except the first
What is the dplyr equivalent of df[duplicated(df[,subset]),], that is, for each set of duplicates based on the subset columns, keep all rows except the first match?
This will show all duplicated rows, ...
3 votes, 1 answer, 70 views
How to avoid transposition of duplicates into lists with pivot_wider?
I have duplicates on the first 3 columns that I would like to keep after pivot_wider transposition, but not in list format.
How to do it?
Initial data with duplicates:
dat0 <-
structure(list(id = c(...
0 votes, 2 answers, 52 views
R identify columns having the same value for the entire dataframe
I am trying to identify the columns in data tables where all of the entries in each column are the same. The challenge is that the value may be of different classes within and across the different tables ...
1 vote, 3 answers, 57 views
For every instance of "A" in Column1, create a new column with all values associated with "A" from Column2
Here is an example data frame:
df <- data.frame(Key = c(rep("A", 2), rep("B", 4), rep("C", 3), rep("D", 2)),
DataID = round(runif(11, min = ...
2 votes, 0 answers, 59 views
Matching only unique participants
I am having trouble matching my historical cohort to a study cohort in RStudio.
Objective: Get a match to every patient in my study cohort using the historical cohort.
I want each patient in the ...
2 votes, 3 answers, 85 views
Filter function in Base R
I was curious about methods to return duplicate values in a vector, list, or array in R. Focusing on a vector, I defined the following:
myvec <- c('a', letters)
which duplicates the letter 'a' in ...
0 votes, 2 answers, 60 views
Select only columns that have no duplicates considering groups
I have a rather large dataset with both long and short data inside: some columns have a unique value for a given subject and visit, while others have multiple values.
The short data is duplicated to match ...
1 vote, 4 answers, 59 views
Aggregating rows in a table using multiple aggregate operations based on column name in R
I have a table of website pages and their visits. In some cases, rows are duplicated. I want to deduplicate rows based on the yearMonth and page columns while summing users and sessions ...
0 votes, 2 answers, 151 views
Keep only entries in a data frame with the largest group of elements
I have this exact data frame, only a bit longer:
mydf <- data.frame(ids=c('D3022TexB4//D3022TexB7','D3022TexC10//D3026TexC1','D3021TexA6//D3022TexC8','D3022TexB4//D3022TexB7','D3021TexA6//...
0 votes, 2 answers, 57 views
How to count the number of rows that have duplicate data in two columns combined? R language [closed]
I have two columns, index and reference, which when combined are meant to make up a unique ID for a given row of data. I want to see if there are any duplicates of these "unique IDs" or ...