Nick Strayer | Visual Data Scientist

About

I am a principal software engineer at Posit building tools to help data scientists work better. Beyond software engineering, I have been extremely lucky to work in many different realms, including as a journalist at the New York Times, a data scientist at the Johns Hopkins Data Science Lab and at Dealer.com in Vermont, and a “data artist in residence” at the startup Conduce in California.

I have a PhD in biostatistics from Vanderbilt University and an undergraduate degree from the University of Vermont where I majored in mathematics and statistics and minored in computer science.

When I am not in “work mode” I love to bike places, read science fiction, take photos of birds, and wander around gardens/museums.

Have a fantastic day!

Projects

For a much more up-to-date and topical list of my work, check out the data science/statistics/visualization blog that I run with Lucy D’Agostino McGowan: Live Free or Dichotomize.

A graphical experiment of exponential spread

  • A real-time javascript simulation of epidemic spreading on a network
  • Built to provide intuition for spreading in constrained environments
  • All parameters related to spread are tunable

DataDrivenCV package

  • An R package for building a CV/Resume from a spreadsheet of information
  • Built around the pagedown package in R
  • Framework supplied by the package is entirely self-sufficient, so users are not dependent on package version changes.

Phewas-ME

  • Shiny app for exploring results of Phenome-Wide Association Studies (PheWAS)
  • Allows the user to look directly at the individual-level data generating the results to identify spurious or novel associations.
  • Built as an R package framework, allowing modular construction of apps based upon a project’s needs.

t-SNE explained in plain javascript

  • Full implementation and explanation of the t-SNE visualization algorithm
  • Featured in “Explorables” section of Observable

Javascript Statistics Snippets

  • A series of small self-contained functions for doing statistical computation in javascript
  • Functions are optimized for speed along with legibility

Shinysense

  • A set of shiny modules for letting shiny sense the world around it.
  • Currently has touch, sound, motion, and vision ‘senses.’
  • Bundled into an R package.

What are P-Values, Really?

  • A resource for explaining what statistical significance really means.
  • Storified to try and make it memorable.
  • Takes the form of a reproducible R Markdown document so others can recreate it

Data Visualization Best Practices in R


Making Nice Looking Websites Using RMarkdown

  • A walkthrough from start to finish of making a website using RMarkdown and hosting it on Github.
  • Made in collaboration with Lucy McGowan
  • Presented at the statistical computing workshop for Vanderbilt Medical Center
  • See sample site here

Conditional Survival Curves on Truncated Survival Data

  • A visual exploration of Kaplan-Meier survival curves on left-truncated survival data.
  • Drag the conditional slider to see how the survival curve changes depending on the age of entry.
  • All logic for K-M curve written from scratch in javascript and much more performant than the survival package in R.
  • For more information on the algorithm to generate a K-M curve, see the Wikipedia page.
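
As a rough illustration of the idea (a Python sketch, not the project's JavaScript code), the Kaplan-Meier product-limit estimate can be adapted to left truncation by counting a subject as at risk only after their entry age. The function name `km_left_truncated` and the `condition_on` parameter are hypothetical names chosen for this example; `condition_on` mimics a conditional slider by ignoring events at or before a given age.

```python
def km_left_truncated(entry, exit_, event, condition_on=0):
    """Kaplan-Meier curve for left-truncated data (illustrative sketch).

    entry[i]  : age at which subject i came under observation
    exit_[i]  : age at death or censoring
    event[i]  : True if subject i died at exit_[i], False if censored
    condition_on : only event times after this age contribute (assumed
                   stand-in for conditioning on survival to that age)
    """
    # Distinct ages at which at least one death was observed.
    death_times = sorted(
        {t for t, d in zip(exit_, event) if d and t > condition_on}
    )
    surv, curve = 1.0, []
    for t in death_times:
        # A subject is at risk at t only if already entered and not yet exited.
        at_risk = sum(1 for a, b in zip(entry, exit_) if a < t <= b)
        deaths = sum(1 for b, d in zip(exit_, event) if d and b == t)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve


# Four subjects; the third and fourth enter observation late (ages 2 and 3).
curve = km_left_truncated(
    entry=[0, 0, 2, 3],
    exit_=[4, 5, 6, 7],
    event=[True, False, True, True],
)
```

The only change from the standard estimator is the `a < t` condition in the risk set, which keeps late entrants from being counted before they were observed.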

Reusable Statistics Plots in D3

  • Also see my histogram made in the same way.
  • My first attempts at making a d3 library.
  • Ultimately will be tied with a companion R app for interactive visualization for statisticians.
  • Uses the reusable d3 structure proposed by Elliot Bentley.

What’s In Season?

  • An interactive exploration of what produce is in season.
  • Data scraped from here using python.
  • Allows the user to select different in-season ingredients and search for recipes containing them.
  • Notebooks for scraping are in the GitHub repo.

Data Visualization In R

  • Rmarkdown document for a statistical computing workshop I gave at Vanderbilt.
  • A brief overview of some common visualization mistakes and code to fix them in ggplot
  • Provides an overview of some newer visualization tools.

Binomially Distributed Fun!

  • Demonstrates how a sequence of independent Bernoulli trials makes up the binomial distribution.
  • Allows the user to adjust the parameters of the Bernoulli and generate samples.
  • Calculates and displays a 95% confidence interval and Wilson hypothesis test based upon the generated data.
  • All statistics functions are written from scratch in vanilla JavaScript.
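
To make the mechanics concrete, here is a minimal Python sketch (not the page's JavaScript) that sums independent Bernoulli trials into a binomial count and then computes a 95% Wilson score interval for the success probability. It assumes the Wilson score interval is the calculation referred to above; the function names are hypothetical.

```python
import math
import random


def simulate_bernoulli_sum(n, p, seed=0):
    """Run n independent Bernoulli(p) trials and return the success count."""
    rng = random.Random(seed)
    return sum(1 for _ in range(n) if rng.random() < p)


def wilson_interval(successes, n, z=1.959963984540054):
    """Wilson score interval for a binomial proportion (z defaults to 95%)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


successes = simulate_bernoulli_sum(n=50, p=0.3)
low, high = wilson_interval(successes, 50)
```

Unlike the simpler Wald interval, the Wilson interval always stays inside [0, 1], which matters for small samples or proportions near the boundaries.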

The Likelihood Function

  • An interactive exploration of the likelihood function.
  • Visually explains the concepts of support intervals and likelihood ratios.
  • Allows the user to input their own data for creating figures for reports/presentations.
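
For readers unfamiliar with the terms, the sketch below (Python, under the assumption of binomial data, which the page does not specify) shows a likelihood ratio between two candidate values of p and a 1/k support interval: the set of p values whose likelihood is within a factor of k of the maximum. The function names, the grid search, and the default `k=8` are illustrative choices, not taken from the project.

```python
import math


def log_likelihood(p, successes, n):
    """Binomial log-likelihood, dropping the constant binomial coefficient."""
    return successes * math.log(p) + (n - successes) * math.log(1 - p)


def likelihood_ratio(p1, p2, successes, n):
    """How many times better the data support p1 than p2."""
    return math.exp(log_likelihood(p1, successes, n) - log_likelihood(p2, successes, n))


def support_interval(successes, n, k=8, grid=1000):
    """1/k support interval via grid search; requires 0 < successes < n."""
    p_hat = successes / n  # maximum likelihood estimate
    inside = [
        i / grid
        for i in range(1, grid)
        if likelihood_ratio(p_hat, i / grid, successes, n) <= k
    ]
    return min(inside), max(inside)


lr = likelihood_ratio(0.7, 0.5, successes=7, n=10)
low, high = support_interval(successes=7, n=10)
```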

Confidence Intervals Explained

  • Allows the user to explore what a frequentist confidence interval truly is.
  • To many people, including the scientists who use them, the behavior of Confidence Intervals is confusing.
  • All statistics functions are written from base javascript. See github repo for code.
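
The frequentist interpretation is easiest to see by simulation. The Python sketch below (an illustration, not the page's code) repeatedly draws samples from a normal distribution with a known standard deviation, builds a 95% interval for the mean each time, and reports the fraction of intervals that contain the true mean. The parameter values and the function name are assumptions made for the example.

```python
import math
import random


def coverage(true_mean=10.0, sigma=2.0, n=25, reps=2000, seed=1):
    """Fraction of 95% z-intervals for the mean that cover true_mean."""
    rng = random.Random(seed)
    z = 1.959963984540054  # 97.5th percentile of the standard normal
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / reps


rate = coverage()
```

The "95%" describes the long-run behavior of the procedure: roughly 95% of intervals built this way cover the true value, while any single interval either does or does not.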

Probability Integral Transformations

  • Made in an effort to visualize what happens when you transform a probability distribution with a function.
  • Uses the normal distribution transformed by the normal cdf, resulting in a uniform distribution. See here for more info.
  • Inspired by my course work in Probability at Vanderbilt.
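
The transformation described above can be checked numerically. This Python sketch (not the page's code; function names are hypothetical) draws standard normal samples, pushes each through the normal CDF, and counts how many land in each tenth of [0, 1]. If the result is uniform, the ten counts should be roughly equal.

```python
import math
import random


def normal_cdf(x):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def pit_samples(n, seed=0):
    """Draw n standard normal values and transform each by the normal CDF."""
    rng = random.Random(seed)
    return [normal_cdf(rng.gauss(0, 1)) for _ in range(n)]


def decile_counts(values):
    """Count how many values fall in each of the ten bins of width 0.1."""
    counts = [0] * 10
    for u in values:
        counts[min(int(u * 10), 9)] += 1
    return counts


us = pit_samples(10_000)
counts = decile_counts(us)
```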

Where Are Wildfires Burning?

  • Uses open data from NASA satellites on global temperature anomalies.
  • Fresh data is downloaded every day and pushed to the static page via shell scripts, avoiding the need for servers.
  • Data source.

Interactive Manhattan Plot R Package.

  • An R package to generate interactive and embeddable Manhattan plots for genome-wide association studies.
  • Binds R and Javascript + D3 using the HTMLWidgets package.

State Farmers Market Profiles.

  • Companion visualization to What Do Farmer’s Markets Sell?
  • Explore different states’ paths through different metrics relating to farmers markets.
  • Uses an equal-sized states map as the menu to reduce bias associated with normal projections.
  • Data courtesy of Data.gov.

What Do Farmer’s Markets Sell?

  • Select different good types (e.g. Vegetables, Fruit) and see which markets sell them.
  • Assemble different combinations of goods to explore regional trends.
  • Dynamic layout adjusts to mobile or desktop views.
  • Be patient with it: more than eight thousand points are being drawn to the screen, which will bog down older phones/computers.
  • Data courtesy of Data.gov

Interactive Manhattan Plot Viewer.

  • Developed as an experiment in exploratory data visualization.
  • Select different controls for comparison, e.g. non-dominant arm growth, to see linked SNPs.
  • A Manhattan plot is a commonly used tool for assessing the genetic roots of traits.
  • Uses data from the FAMuSS study (Thompson et al. 2004).

Learn ASL numbers with Leap Motion.

  • First place project at 2014 UVM CS Fair.
  • Teaches numbers 0-9 in American Sign Language.
  • Utilizes three.js and webGL for rendering.
  • Built to exploit multiple HCI and Cognitive Psychology theories (e.g. object consistency and the generation effect) in order to maximize the learning experience.

Experimental Leap Motion + D3.js project.

  • Wave your hands around and watch D3.js mirror you!
  • Requires a leap motion device.
  • In the future I plan on implementing ways to interact with D3 visualizations by recognizing gestures using machine learning algorithms.
  • Video of it in action, in case you don’t have a Leap.
  • Note: When using, start by waving your hands around above the Leap Motion device and watch it calibrate!

Polio’s impact on the United States.


labinthewild.org Interactive Visualization.

  • A visualization developed for LabInTheWild at the University of Michigan to help participants place themselves among differing demographics.

Alternative energy filling stations in the U.S.

  • Using d3.hexbin, I took 18k+ data points and binned them to help explore geographic trends in alternative energy filling stations.

Marvel Vs. DC in the theater.


Where does California get its energy?

  • A visualization that explores how electricity is generated in the state of California. Data was cleaned using python and then the visualization was generated using d3.js.

CV/Resume

Want a longer list of the stuff I’ve done related to my career? I have a CV!

Need a short and to-the-point single page annotation of my data-science career? Try my resume!

Interested in how I made these? Check out the repo: github.com/nstrayer/cv

Contact

I am always interested in getting involved in new projects or just connecting with others. Feel free to get in touch!

email: nick.strayer (at) gmail

twitter: NicholasStrayer

github: nstrayer