Building a statistical significance testing web service powered by R

R is a programming language for statistical and mathematical computing. R programs often operate on large, in-memory data sets, which makes them feel somewhat similar to database programming. Examples in the R Cookbook bear a resemblance to functional programming in Clojure, as others have noted.

I’ve been exploring the language to gain insight into related but disparate technologies that I use regularly (e.g. Postgres), but for this to be really useful, I’d like to see R behind a web service. The official website lists many defunct attempts at using R this way, often abandoned once the maintainer finished a master’s degree.

A couple have survived, notably Rook and rApache. Rook is a web server inside R, and rApache, as you might guess, is an Apache module that calls R. I’ve chosen rApache because I’d like a battle-tested front-end. While R seems to have very committed maintainers, there do not seem to be very many of them, and I have yet to find examples of anyone running either project as a production application.

Inspired by WolframAlpha’s APIs, I built a small web service that tests whether the difference between two proportions is statistically significant. In the future I intend to test its performance and security, and to evaluate the available JSON libraries.

Here is the installation procedure (on Debian/Ubuntu, run as root):

apt-get update
apt-get upgrade
apt-get install r-base r-base-dev 
apt-get install apache2-mpm-prefork apache2-prefork-dev 
apt-get install git-core
git clone https://github.com/jeffreyhorner/rapache.git
cd rapache
./configure
make
make test
make install
vi /etc/apache2/httpd.conf

Apache configuration settings (added to httpd.conf):


LoadModule R_module /usr/lib/apache2/modules/mod_R.so


SetHandler r-info


ROutputErrors


        SetHandler r-script
        RHandler sys.source
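`SetHandler` applies to whatever scope it appears in. In practice, the directives above sit inside `<Location>` or `<Directory>` sections so that only one URL path is handed to R. The sketch below shows a typical layout. The paths are assumptions: `/RApacheInfo` is a hypothetical URL for the `r-info` status page, and `/var/www/R` is a hypothetical script directory chosen to match the `/R/ws.R` URL used below (this presumes a DocumentRoot of `/var/www`).

```apache
LoadModule R_module /usr/lib/apache2/modules/mod_R.so

# Status page for the embedded R interpreter (hypothetical URL)
<Location /RApacheInfo>
    SetHandler r-info
</Location>

# Show R errors in the response, which is useful while developing
ROutputErrors

# Source every file in this directory as an R script (hypothetical path)
<Directory /var/www/R>
    SetHandler r-script
    RHandler sys.source
</Directory>
```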


/etc/init.d/apache2 restart

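And these are the contents of ws.R, which reads the two observed proportions (p, pc) and their sample sizes (N, Nc) from the query string: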


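# rApache fills GET with the query-string parameters and provides setContentType()
setContentType("application/json")

# Unpooled two-proportion z statistic (the operator must end the line, not start the next)
zscore<-function(p, pc, N, Nc){
     (p - pc) / sqrt(p * (1 - p) / N + pc * (1 - pc) / Nc) }

# One-sided test at the 5% level (qnorm(0.95) is about 1.645)
significant<-function(p, pc, N, Nc){
     zscore(p, pc, N, Nc) > 1.65 }

# Reject missing parameters as well as overly long ones
valid<-function(x){ !is.null(x) && nchar(x) < 10 }

# [[ ]] matches names exactly; GET$p would silently return pc if p were absent
if (!valid(GET[["pc"]])
 || !valid(GET[["p"]])
 || !valid(GET[["N"]])
 || !valid(GET[["Nc"]])) {
  cat('error:arg length')
} else {
  cat(significant(as.numeric(GET[["p"]]),
                  as.numeric(GET[["pc"]]),
                  as.numeric(GET[["N"]]),
                  as.numeric(GET[["Nc"]])))
}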

OK
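The arithmetic can be checked in a plain R session before it is deployed behind Apache. The following is a minimal sketch under the assumption that no rApache objects (GET, setContentType) are needed. `zscore()` and `significant()` are copied from ws.R, while `p_value()` is a hypothetical helper added only for illustration. It uses base R's `pnorm()` and `qnorm()`.

```r
# zscore() and significant() copied from ws.R
zscore <- function(p, pc, N, Nc) {
  # Unpooled two-proportion z statistic
  (p - pc) / sqrt(p * (1 - p) / N + pc * (1 - pc) / Nc)
}

significant <- function(p, pc, N, Nc) {
  zscore(p, pc, N, Nc) > 1.65
}

# Hypothetical helper: one-sided p-value, P(Z >= z) under the null hypothesis
p_value <- function(p, pc, N, Nc) {
  pnorm(zscore(p, pc, N, Nc), lower.tail = FALSE)
}

# The hard-coded 1.65 is roughly the one-sided 5% critical value
cutoff <- qnorm(0.95)

z <- zscore(0.15, 0.10, 1000, 1100)
cat(sprintf("z = %.3f, cutoff = %.4f, one-sided p = %.5f\n",
            z, cutoff, p_value(0.15, 0.10, 1000, 1100)))
print(significant(0.15, 0.10, 1000, 1100))  # TRUE
```

Reporting a p-value alongside the TRUE/FALSE answer would let a caller choose their own significance level. The service as written hard-codes 5%.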

For instance, requesting http://localhost:8080/R/ws.R?p=.15&pc=.10&N=1000&Nc=1100 returns "TRUE".
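Plugging the example request into `zscore()` shows why the answer is TRUE (intermediate values rounded):

$$
z = \frac{p - p_c}{\sqrt{\frac{p(1-p)}{N} + \frac{p_c(1-p_c)}{N_c}}}
  = \frac{0.15 - 0.10}{\sqrt{\frac{0.15 \cdot 0.85}{1000} + \frac{0.10 \cdot 0.90}{1100}}}
  = \frac{0.05}{\sqrt{0.0001275 + 0.0000818}}
  \approx \frac{0.05}{0.01447}
  \approx 3.46
$$

Since 3.46 > 1.65, `significant()` returns TRUE. The 1.65 cutoff is approximately the one-sided 5% critical value of the standard normal distribution. The service therefore answers "is p significantly larger than pc?" rather than "do p and pc differ?".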