About
I am a software engineer building open-source software on the Shiny team at RStudio. Beyond software engineering, I have been extremely lucky to work in many different realms, including as a journalist at the New York Times, a data scientist at the Johns Hopkins Data Science Lab and at Dealer.com in Vermont, and “data artist in residence” at the startup Conduce in California.
I recently finished my PhD in biostatistics at Vanderbilt University. Before that, I earned my undergraduate degree from the University of Vermont, where I majored in mathematics and statistics and minored in computer science.
I like data: manipulating it, modeling it, making it (simulation), visualizing it, and yes, even cleaning it. I do these things with some combination of R, Python, and JavaScript.
When I am not in “work mode,” I love to bike places, read science fiction, take photos of birds, and wander around gardens and museums.
Have a fantastic day!
Projects
For a much more up-to-date and topical list of my work, check out the data science/statistics/visualization blog that I run with Lucy D’Agostino McGowan: Live Free or Dichotomize.
- A real-time JavaScript simulation of epidemic spreading on a network.
- Built to provide intuition for spreading in constrained environments.
- All parameters related to spread are tunable.
- An R package for building a CV/resume from a spreadsheet of information.
- Built around the pagedown package in R.
- The framework supplied by the package is entirely self-sufficient, so users are not dependent on package version changes.
- A Shiny app for exploring results of Phenome-Wide Association Studies (PheWAS).
- Allows users to look directly at the individual data behind each result to identify spurious or novel associations.
- Built as an R package framework, allowing modular construction of apps based on a project’s needs.
- A full implementation and explanation of the t-SNE visualization algorithm.
- Featured in the “Explorables” section of Observable.
- A series of small, self-contained functions for doing statistical computation in JavaScript.
- Functions are optimized for legibility as well as speed.
- A set of Shiny modules that let Shiny sense the world around it.
- Currently has touch, sound, motion, and vision ‘senses.’
- Bundled into an R package.
- A resource for explaining what statistical significance really means.
- Told as a story to help make it memorable.
- Takes the form of a reproducible R Markdown document so others can recreate it.
- A start-to-finish walkthrough of making a website using R Markdown and hosting it on GitHub.
- Made in collaboration with Lucy McGowan.
- Presented at the statistical computing workshop for Vanderbilt Medical Center.
- See sample site here.
- A visual exploration of Kaplan-Meier survival curves on left-truncated survival data.
- Drag the conditional slider to see how the survival curve changes depending on the age of entry.
- All logic for the K-M curve is written from scratch in JavaScript and is much more performant than the survival package in R.
- For more information on the algorithm to generate a K-M curve, see the Wikipedia page.
- Also see my histogram made in the same way.
- My first attempts at making a D3 library.
- Will ultimately be paired with a companion R app for interactive visualization for statisticians.
- Uses the reusable D3 structure proposed by Elliot Bentley.
- An interactive exploration of what produce is in season.
- Data scraped from here using Python.
- Allows the user to select different in-season ingredients and search for recipes containing them.
- Notebooks for scraping are in the GitHub repo.
- R Markdown document for a statistical computing workshop I gave at Vanderbilt.
- A brief overview of some common visualization mistakes and code to fix them in ggplot.
- Provides an overview of some newer visualization tools.
- Demonstrates how a sequence of independent Bernoulli trials makes up the binomial distribution.
- Allows the user to adjust the parameters of the Bernoulli trials and generate samples.
- Calculates and displays a 95% confidence interval and a Wilson hypothesis test based on the generated data.
- All statistics functions are written from scratch in vanilla JavaScript.
- An interactive exploration of the likelihood function.
- Visually explains the concepts of support intervals and likelihood ratios.
- Allows the user to input their own data to create figures for reports and presentations.
- Allows the user to explore what a frequentist confidence interval truly is.
- The behavior of confidence intervals is confusing to many people, including the scientists who use them.
- All statistics functions are written in base JavaScript. See the GitHub repo for code.
- Made in an effort to visualize what happens when you transform a probability distribution with a function.
- Transforms a normal random variable by its own CDF, which yields a uniform distribution (the probability integral transform). See here for more info.
- Inspired by my coursework in probability at Vanderbilt.
- Uses open data from NASA satellites on global temperature anomalies.
- Fresh data is downloaded every day and pushed to the static page via shell scripts, avoiding the need for servers.
- Data source.
- An R package to generate interactive and embeddable Manhattan plots for genome-wide association studies.
- Binds R and JavaScript + D3 using the htmlwidgets package.
- Companion visualization to What Do Farmer’s Markets Sell?
- Explore different states’ paths through various metrics relating to farmers markets.
- Uses an equal-sized state map as the menu to reduce the bias associated with standard map projections.
- Data courtesy of Data.gov.
- Select different types of goods (e.g., vegetables, fruit) and see which markets sell them.
- Assemble different combinations of goods to explore regional trends.
- Dynamic layout adjusts to mobile or desktop views.
- Be patient with it: more than eight thousand points are drawn to the screen, which can bog down older phones and computers.
- Data courtesy of Data.gov.
- Developed as an experiment in exploratory data visualization.
- Select different controls for comparison (e.g., non-dominant arm growth) to see linked SNPs.
- A Manhattan plot is a commonly used tool for assessing the genetic roots of traits.
- Uses data from the FAMuSS study (Thompson et al. 2004).
- First-place project at the 2014 UVM CS Fair.
- Teaches numbers 0-9 in American Sign Language.
- Uses three.js and WebGL for rendering.
- Built to apply multiple HCI and cognitive psychology theories (e.g., object consistency and the generation effect) to maximize the learning experience.
- Wave your hands around and watch D3.js mirror you!
- Requires a Leap Motion device.
- In the future I plan to implement ways to interact with D3 visualizations by recognizing gestures with machine learning algorithms.
- Video of it in action, if you don’t have a Leap.
- Note: When using it, start by waving your hands around above the Leap Motion device and watch it calibrate!
- A visualization developed for LabInTheWild at the University of Michigan to help participants place themselves among differing demographics.
- Using d3.hexbin, I took 18k+ data points and binned them to help explore geographic trends in alternative energy filling stations.
- A visualization that explores how electricity is generated in the state of California. Data was cleaned using Python, and the visualization was built with d3.js.
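The Kaplan-Meier project above computes the curve from scratch for left-truncated data. The sketch below is not that project's code. It is a minimal JavaScript illustration of the product-limit estimator with delayed entry, under a few assumptions:
- Each subject is a hypothetical { entry, exit, event } record.
- A subject is at risk at time t when entry < t ≤ exit (conventions for ties at entry vary).
- The optional conditionAge argument is one plausible reading of the "conditional slider": it gives survival conditional on having reached that age.

```javascript
// Hypothetical record shape (an assumption for this sketch): { entry, exit, event }
//   entry: age at which the subject entered observation (left-truncation time)
//   exit:  age at the event or at censoring
//   event: true if the event was observed, false if censored
function kaplanMeier(subjects, conditionAge = -Infinity) {
  // Distinct observed event times after the conditioning age, in ascending order.
  const eventTimes = [
    ...new Set(subjects.filter((s) => s.event).map((s) => s.exit)),
  ]
    .filter((t) => t > conditionAge)
    .sort((a, b) => a - b);

  let survival = 1;
  const curve = [];
  for (const t of eventTimes) {
    // Risk set at t: entered before t and still under observation at t.
    // This is where left truncation matters: a subject who enters at 70
    // is never counted as at risk for an event at 60.
    const riskSet = subjects.filter((s) => s.entry < t && s.exit >= t);
    const events = riskSet.filter((s) => s.event && s.exit === t).length;
    // Skip times whose only events belong to subjects outside the risk set
    // (e.g. entry === exit), which would otherwise divide by an empty set.
    if (events === 0) continue;
    // Product-limit step: multiply by the estimated probability of
    // surviving past t given survival up to t.
    survival *= 1 - events / riskSet.length;
    curve.push({ time: t, atRisk: riskSet.length, events, survival });
  }
  return curve;
}

// Small made-up cohort, for illustration only.
const cohort = [
  { entry: 0, exit: 2, event: true },
  { entry: 0, exit: 3, event: false },
  { entry: 1, exit: 4, event: true },
  { entry: 3, exit: 5, event: true },
];

console.log(kaplanMeier(cohort)); // unconditional curve
console.log(kaplanMeier(cohort, 3)); // survival conditional on reaching age 3
```

This version rescans the whole cohort at every event time, so it costs roughly O(n·k) for n subjects and k event times. Sorting entries and exits once and sweeping through them would scale better on large data.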
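The Bernoulli/binomial project mentions a 95% confidence interval and a Wilson hypothesis test. As a hedged illustration only (not that project's code), here is a minimal JavaScript sketch of two pieces:
- the Wilson score interval for a proportion;
- the score statistic whose inversion produces that interval.

The function names and the default z value (the 0.975 normal quantile) are assumptions made for this example.

```javascript
// Wilson score interval for a binomial proportion.
// successes: number of successes among n Bernoulli trials.
// z: standard normal quantile; 1.959963984540054 gives ~95% coverage.
function wilsonInterval(successes, n, z = 1.959963984540054) {
  if (n <= 0) throw new RangeError("n must be positive");
  const pHat = successes / n;
  const z2 = z * z;
  const denom = 1 + z2 / n;
  // The center is pulled from pHat toward 1/2, more strongly for small n.
  const center = (pHat + z2 / (2 * n)) / denom;
  const halfWidth =
    (z / denom) * Math.sqrt((pHat * (1 - pHat)) / n + z2 / (4 * n * n));
  return { lower: center - halfWidth, upper: center + halfWidth };
}

// Score (Wilson) test statistic for H0: p = p0. The standard error uses p0,
// not pHat, which is what makes the interval above its exact inversion.
function wilsonScoreZ(successes, n, p0) {
  return (successes / n - p0) / Math.sqrt((p0 * (1 - p0)) / n);
}

console.log(wilsonInterval(7, 20));
console.log(wilsonScoreZ(7, 20, 0.5));
```

The interval is exactly the set of p0 values for which |wilsonScoreZ| ≤ z, so its endpoints are where the test sits on the edge of rejection. Unlike the simpler Wald interval, it never extends below 0 or above 1, even with zero successes.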
CV/Resume
Want a longer list of the stuff I’ve done related to my career? I have a CV!
Need a short, to-the-point, single-page summary of my data science career? Try my resume!
Interested in how I made these? Check out the repo: github.com/nstrayer/cv