A simple implementation of data shadowing in R

Idiomatic R often uses a neat syntactic sugar called shadowing.

Imagine you have a dataframe (df) containing the costs of summer camps:

Child  Week  Cost
1      1     $60
1      2     $200
2      1     $400
2      2     $275
2      2     $1000
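To follow along, the table can be rebuilt as a data frame. This is a minimal sketch, assuming the costs are stored as plain numbers rather than as strings with a dollar sign (the post does not say how they are stored):

```r
# Build the summer camp table; Cost is assumed numeric, without the "$" sign.
df <- data.frame(
  Child = c(1, 1, 2, 2, 2),
  Week  = c(1, 2, 1, 2, 2),
  Cost  = c(60, 200, 400, 275, 1000)
)
print(df)
```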

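A common access pattern is to filter the data frame using its column names directly. Note, however, that we pass a bare expression, not a lambda as we would in other languages. (The form below works when df is a data.table; a plain data.frame would need df[df$Week == 1, ].)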

df[Week == 1]

To learn why this works, we can implement a crude version of a filtering function ourselves.

First, let’s define a function that takes a dataframe and an expression:

library(rlang)  # provides enexpr()

simple_filter <- function(df, e) {
  print(enexpr(e))  # print the captured expression without evaluating it
}

If you call this as “simple_filter(df, Week == 1)”, it will print out:

Week == 1

This output is the printed form of the abstract syntax tree of the expression passed as e, much like calling toString on it in other languages.
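The captured expression is an ordinary R object that can be taken apart. The sketch below builds the same expression by hand with base R's quote(), which is an assumption made for illustration; the post captures it from a function argument instead:

```r
# quote() returns the expression unevaluated, like a captured argument.
q <- quote(Week == 1)

class(q)     # "call"
as.list(q)   # the symbol `==`, the symbol Week, and the constant 1
deparse(q)   # "Week == 1"
```

The tree has three nodes: the operator being called and its two operands. Nothing has been evaluated yet, which is why no variable named Week needs to exist at this point.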

Note that if we replace print(enexpr(e)) with print(e), we’ll get an error:

Error in print(e) : object 'Week' not found

There is no variable named ‘Week’ in scope in the calling environment. We only get this error at the point where e is used, because arguments are promises: they represent the expression passed in and are evaluated lazily, on first use.
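Lazy evaluation can be seen directly with two small helper functions. Both names are hypothetical, and the sketch assumes that no variable called Week exists in the calling environment:

```r
# The argument is never touched, so the promise is never forced: no error.
ignores_arg <- function(e) "never looked at e"
ignores_arg(Week == 1)

# Returning e forces the promise, so Week == 1 is evaluated in the caller's
# scope and fails there. tryCatch() captures the message instead of stopping.
uses_arg <- function(e) e
result <- tryCatch(
  uses_arg(Week == 1),
  error = function(err) conditionMessage(err)
)
result
```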

The difference between the expression Week == 1 and the lambda (Week) => Week == 1 is that in a lambda, context is provided. Context includes both arguments and variables the lambda closes over. These form the “environment” in which the function runs.
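The contrast can be made concrete in R itself. In this sketch the lambda receives Week as an argument, while the bare expression (built here with quote(), as an illustration) gets its context only when it is evaluated:

```r
# A lambda carries its own context: Week is supplied as an argument.
is_first_week <- function(Week) Week == 1
is_first_week(1)               # TRUE

# A bare expression has no context until one is supplied at evaluation time.
query <- quote(Week == 1)
eval(query, list(Week = 1))    # TRUE
eval(query, list(Week = 2))    # FALSE
```

The same expression gives different answers depending on the context it is evaluated in, which is what a filter needs when it visits each row.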

To complete our filter implementation, we need to build an environment in which the expression's variables can be looked up:

env <- new.env()

Then all we need to do is to loop over the rows in the dataframe, populating the variables as we look at each row. The assign function inserts a value into the environment:

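cols <- colnames(df)
for(i in seq_len(nrow(df))) {   # seq_len() copes with an empty data frame
  for(j in seq_len(ncol(df))) {
    col <- cols[j]
    assign(col, df[i, j], env)  # bind this row's value to the column name
  }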

  ...
}
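For a single row, the effect of the inner loop can be seen in isolation. In this sketch the values are typed in by hand for the first row of the table (with Cost assumed numeric), and quote() stands in for the captured expression:

```r
env <- new.env()

# Populate the environment the way the inner loop would for the first row.
assign("Child", 1, envir = env)
assign("Week", 1, envir = env)
assign("Cost", 60, envir = env)

ls(env)                               # "Child" "Cost"  "Week"
get("Week", envir = env)              # 1
eval(quote(Week == 1), envir = env)   # TRUE
```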

For our final step, we can put this all together into a function which prints out rows that match our expression. This adds a call to eval, which evaluates the given expression in the context of the environment we’ve built from the row:

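simple_filter <- function(df, e) {
  query <- enexpr(e)  # capture the expression; enexpr() comes from rlang
  cols <- colnames(df)
  env <- new.env()
  for(i in seq_len(nrow(df))) {
    # Bind each column name to this row's value
    for(j in seq_len(ncol(df))) {
      col <- cols[j]
      assign(col, df[i, j], env)
    }

    # Evaluate the query against the row; isTRUE() treats NA as no match
    if(isTRUE(eval(query, envir = env))) {
      print(df[i, ])
    }
  }
}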
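As a usage sketch, here is a variant that runs without extra packages and returns the matching rows instead of printing them. It is not the function above: simple_filter_rows and camps are hypothetical names, base R's substitute() stands in for enexpr(), each row is passed to eval() as a named list rather than through an environment, and Cost is assumed numeric:

```r
simple_filter_rows <- function(df, e) {
  query <- substitute(e)          # base R counterpart of capturing e
  keep <- logical(nrow(df))
  for (i in seq_len(nrow(df))) {
    # as.list() on a one-row data frame gives a named list of column values,
    # which eval() accepts as the evaluation context.
    row <- as.list(df[i, , drop = FALSE])
    keep[i] <- isTRUE(eval(query, row, parent.frame()))
  }
  df[keep, , drop = FALSE]
}

camps <- data.frame(
  Child = c(1, 1, 2, 2, 2),
  Week  = c(1, 2, 1, 2, 2),
  Cost  = c(60, 200, 400, 275, 1000)
)
simple_filter_rows(camps, Week == 1)    # rows 1 and 3
simple_filter_rows(camps, Cost > 250)   # rows 3, 4 and 5
```

Passing parent.frame() as the enclosing environment lets the expression also refer to variables defined by the caller, which the column names then mask.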

And there you have it!
