Why is `[` better than `subset`?
`[` is faster:

    require(microbenchmark)
    microbenchmark(subset(airquality, Month == 8 & Temp > 90),
                   airquality[airquality$Month == 8 & airquality$Temp > 90, ])

    Unit: microseconds
                                                           expr     min       lq   median       uq     max neval
                     subset(airquality, Month == 8 & Temp > 90) 301.994 312.1565 317.3600 349.4170 500.903   100
     airquality[airquality$Month == 8 & airquality$Temp > 90, ] 234.807 239.3125 244.2715 271.7885 340.058   100
When I need to filter a data.frame, i.e., extract rows that meet certain conditions, I prefer to use the `subset` function:

    subset(airquality, Month == 8 & Temp > 90)

rather than the `[` function:

    airquality[airquality$Month == 8 & airquality$Temp > 90, ]
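
For what it's worth, here is a quick check that the two forms really select the same rows for this condition. It is only a sketch using the built-in `airquality` data; the variable names (`by_subset`, `by_bracket`, and so on) are mine. The second half shows one case where, as far as I can tell, the two forms differ: when the condition evaluates to `NA`, `subset` drops the row while `[` returns a row filled with `NA`.

```r
# Same condition, two spellings (airquality ships with base R).
by_subset  <- subset(airquality, Month == 8 & Temp > 90)
by_bracket <- airquality[airquality$Month == 8 & airquality$Temp > 90, ]

# Month and Temp contain no missing values, so the results should match.
same_rows <- isTRUE(all.equal(by_subset, by_bracket))
same_rows

# Ozone does contain missing values, so the two spellings diverge:
# subset() drops rows where the condition is NA, `[` keeps them as NA rows.
ozone_subset  <- subset(airquality, Ozone > 100)
ozone_bracket <- airquality[airquality$Ozone > 100, ]
extra_rows <- nrow(ozone_bracket) - nrow(ozone_subset)
extra_rows == sum(is.na(airquality$Ozone))
```
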
There are two main reasons for my preference:
1. I find the code reads better, from left to right. Even people who know nothing about R could tell what the `subset` statement above is doing.
2. Because columns can be referred to as variables in the `select` expression, I can save a few keystrokes. In my example above, I only had to type `airquality` once with `subset`, but three times with `[`.
So I was living happily, using `subset` everywhere because it is shorter and reads better, even advocating its beauty to my fellow R coders. But yesterday my world fell apart. While reading the `subset` documentation, I noticed this section:
> This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like `[`, and in particular the non-standard evaluation of argument `subset` can have unanticipated consequences.
Could someone help clarify what the authors mean?
First, what do they mean by "for use interactively"? I know what an interactive session is, as opposed to a script run in BATCH mode, but I don't see what difference it should make.
Then, could you please explain "the non-standard evaluation of argument subset" and why it is dangerous, maybe provide an example?
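
To make the second question concrete, here is a minimal sketch of two situations where I believe the non-standard evaluation can bite. The toy data frame `df`, the variable `threshold`, and the helper `filter_rows` are all hypothetical names made up for illustration. In the first case a column name silently shadows a variable of the same name; in the second, a call that works when typed at the prompt fails once it is wrapped in a function, because `subset` captures the expression `cond` instead of the condition the caller wrote.

```r
# Hazard 1: a column shadows a variable with the same name.
df <- data.frame(x = 1:5, threshold = 10)  # hypothetical toy data
threshold <- 3

by_subset  <- subset(df, x > threshold)  # `threshold` resolves to the column (10)
by_bracket <- df[df$x > threshold, ]     # `threshold` resolves to the variable (3)

nrow(by_subset)   # 0
nrow(by_bracket)  # 2

# Hazard 2: the same call behaves differently inside a function.
typed_directly <- subset(airquality, Month == 8)  # fine at the prompt

filter_rows <- function(data, cond) {
  # subset() sees the symbol `cond`; evaluating it looks for `Month`
  # in the caller's environment, not in `data`.
  subset(data, cond)
}

# Assumes no object named `Month` exists in the calling environment.
wrapped <- tryCatch(
  filter_rows(airquality, Month == 8),
  error = function(e) conditionMessage(e)
)
wrapped  # an error message rather than a data frame

# The standard form has no such surprise when used programmatically.
filter_rows_std <- function(data, month) data[data$Month == month, ]
nrow(filter_rows_std(airquality, 8)) == nrow(typed_directly)
```

If that reading is right, it would explain the "for use interactively" wording: at the prompt you can see which names are in scope, whereas inside a function the lookup depends on whatever the data frame and the caller happen to contain.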