Tidy Evaluation: Dynamic column manipulation using dplyr
When it comes to R, the tidyverse is my dream land. I often want to write a function that manipulates data frames, but I can’t figure out a good way to pass field names to it and then run dplyr code on those fields.
I know there are some workarounds for this in base R, but I recently started learning about tidy evaluation. It’s a fairly simple framework (implemented in the rlang package and used throughout dplyr) that lets you wrap dplyr commands in your own functions.
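As a minimal illustration of wrapping dplyr code in a function, assuming rlang 0.4.0 or later (which introduced the `{{ }}` "curly-curly" operator), the hypothetical helper `mean_of()` below takes a bare column name and forwards it into `summarise()`. Without the `{{ }}`, the bare name `Sepal.Width` would be evaluated outside the data frame and fail with an "object not found" error.

```r
suppressMessages(library(tidyverse))

# Hypothetical helper (not part of any package): `col` is a bare column
# name such as Sepal.Width, not a string. {{ }} (rlang >= 0.4.0) passes
# the caller's expression through to summarise(), where it is evaluated
# against the columns of `df`.
mean_of <- function(df, col) {
  df %>% summarise(mean_value = mean({{ col }}))
}

mean_of(iris, Sepal.Width)
```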
The documentation for this framework is extensive, and I suggest reading it. Here I’ll walk through some examples that have worked for me.
Add a column to a dataframe
In this example, I create a function that takes a dataframe and the name of one of its columns (as a string) and returns the dataframe with a lagged version of that column added.
suppressMessages(library(tidyverse))
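add_lag_field <- function(df, col) {
  # define the new field name, e.g. "Sepal.Width_lag"
  var_name <- paste0(col, "_lag")
  # sym() turns each string into a symbol, !! injects it into mutate(),
  # and := lets the left-hand side be a computed name
  df %>% mutate(
    !!sym(var_name) := lag(!!sym(col))
  )
}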
head(add_lag_field(iris, "Sepal.Width"))
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Width_lag
## 1          5.1         3.5          1.4         0.2  setosa              NA
## 2          4.9         3.0          1.4         0.2  setosa             3.5
## 3          4.7         3.2          1.3         0.2  setosa             3.0
## 4          4.6         3.1          1.5         0.2  setosa             3.2
## 5          5.0         3.6          1.4         0.2  setosa             3.1
## 6          5.4         3.9          1.7         0.4  setosa             3.6
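An alternative sketch of the same lag function, assuming a recent dplyr (1.0 or later), looks the column up through the `.data` pronoun instead of `sym()` and `!!`. `add_lag_field_data()` is a hypothetical name used only here. A plain string unquoted with `!!` is also accepted as a name on the left of `:=`, so `sym()` isn't needed there either.

```r
suppressMessages(library(tidyverse))

# Hypothetical variant of add_lag_field(): `col` is still a string,
# but it is looked up via the .data pronoun rather than sym() + !!.
add_lag_field_data <- function(df, col) {
  # new field name, e.g. "Sepal.Width_lag"
  var_name <- paste0(col, "_lag")
  # !!var_name injects the string as the new column's name;
  # dplyr::lag is called explicitly so stats::lag can't be picked up
  df %>% mutate(!!var_name := dplyr::lag(.data[[col]]))
}

head(add_lag_field_data(iris, "Sepal.Width"))
```

On `iris` this should produce the same `Sepal.Width_lag` column as `add_lag_field()`. The `.data[[col]]` form skips the string-to-symbol conversion and is the approach the dplyr programming vignette describes for column names held in strings.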