In R 4.4.0, I found that the base function merge accepts an argument iby in place of by. RGui with vanilla settings reproduces the same result on two different computers.

However, when I inspect the source of base::merge via getAnywhere(merge.data.frame), there is no mention of an iby argument. Thus, I have no idea why a call with iby runs successfully.

Can anybody explain why this is the case? What is the mechanism behind it?

df1 <- data.frame(ID = c(1, 2, 3), Name = c("Alice", "Bob", "Charlie"), stringsAsFactors = FALSE)
df2 <- data.frame(ID = c(1, 2, 4), Age = c(24, 25, 26), stringsAsFactors = FALSE)

result_iby <- merge(df1, df2, iby = "ID")
result_by <- merge(df1, df2, by = "ID")

print(result_iby)
  • I can reproduce this behavior. It also happens with aby, bby, and aaby. For instance, after result_iby <- merge(df1, df2, aaby = "ID"), identical(result_iby, result_by) returns TRUE. Commented Mar 19 at 18:54

1 Answer

OK, I have a theory. First of all, this works too:

result_garbage <- merge(df1, df2, garbage = "ID")

There are two reasons this looks like it works.

  • merge has a ... argument that will swallow any unrecognized arguments
  • the default value of the by argument is intersect(names(x), names(y)), which happens to be "ID" in this case.

So the iby argument is silently ignored, and the default value of by happens to supply the same column.
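The two points above can be reproduced with a small stand-in function. This is only a sketch: toy_merge is a hypothetical name, and it mimics just the by default and the ... argument of merge, not the real merging logic.

```r
# Hypothetical stand-in for merge.data.frame: same `by` default, same `...`.
toy_merge <- function(x, y, by = intersect(names(x), names(y)), ...) {
  extras <- list(...)          # anything unmatched ends up here
  list(by = by, ignored = names(extras))
}

df1 <- data.frame(ID = c(1, 2, 3), Name = c("Alice", "Bob", "Charlie"), stringsAsFactors = FALSE)
df2 <- data.frame(ID = c(1, 2, 4), Age = c(24, 25, 26), stringsAsFactors = FALSE)

# `iby` does not match any formal argument, so it is collected by `...`
# and `by` falls back to its default.
res <- toy_merge(df1, df2, iby = "ID")
print(res$by)
print(res$ignored)

# The real method also has `...` among its formal arguments.
print("..." %in% names(formals(merge.data.frame)))
```

Here res$by comes from the default, not from the value passed as iby; the two only coincide because "ID" is the one column name the data frames share.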

A test of this is that merge(df1, df2, by = "garbage") throws

Error in fix.by(by.x, x) : 'by' must specify a uniquely valid column

while merge(df1, df2, iby = "garbage") works fine (merges on "ID").
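This check can be run as a short script. It is a minimal sketch that reuses the data frames from the question and wraps the failing call in tryCatch so the script keeps going; the exact wording of the message may vary with the R locale.

```r
df1 <- data.frame(ID = c(1, 2, 3), Name = c("Alice", "Bob", "Charlie"), stringsAsFactors = FALSE)
df2 <- data.frame(ID = c(1, 2, 4), Age = c(24, 25, 26), stringsAsFactors = FALSE)

# `by` is a real argument, so the bad column name is actually used and fails.
msg <- tryCatch(
  merge(df1, df2, by = "garbage"),
  error = function(e) conditionMessage(e)
)
print(msg)

# `iby` is not a real argument: it is swallowed by `...`,
# and `by` keeps its default (the shared column names).
res <- merge(df1, df2, iby = "garbage")
same_as_default <- identical(res, merge(df1, df2))
print(same_as_default)
```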

Arguably R should be more helpful about reporting when unrecognized arguments are passed through ... and discarded. As people have realized this, functions like rlang::check_dots_used (and similar functions in other packages) have become more widely used.
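For illustration, a similar safeguard can be approximated in base R with a thin wrapper that rejects argument names merge.data.frame does not declare. This is a sketch, not the rlang approach: strict_merge is a hypothetical helper, it only handles data frames, and it requires exact argument names (partial names and unnamed extras are rejected).

```r
# Hypothetical wrapper: fail loudly instead of letting `...` swallow typos.
strict_merge <- function(x, y, ...) {
  # Argument names that merge.data.frame actually declares, minus x, y and `...`.
  allowed <- setdiff(names(formals(merge.data.frame)), c("x", "y", "..."))
  dots <- list(...)
  supplied <- names(dots)
  if (is.null(supplied)) supplied <- rep("", length(dots))  # all extras unnamed
  unknown <- setdiff(supplied, allowed)
  if (length(unknown) > 0) {
    stop("Unrecognized argument(s): ", paste(sQuote(unknown), collapse = ", "))
  }
  merge(x, y, ...)
}

df1 <- data.frame(ID = c(1, 2, 3), Name = c("Alice", "Bob", "Charlie"), stringsAsFactors = FALSE)
df2 <- data.frame(ID = c(1, 2, 4), Age = c(24, 25, 26), stringsAsFactors = FALSE)

ok <- strict_merge(df1, df2, by = "ID")                      # merges as usual
bad <- try(strict_merge(df1, df2, iby = "ID"), silent = TRUE) # now an error
print(inherits(bad, "try-error"))
```

The allowed names are read from formals at call time so the wrapper follows whatever the installed version of merge.data.frame declares, rather than a hard-coded list.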
