Nick Strayer | Visual Data Scientist
About
I am a principal software engineer at Posit building tools to help data scientists work better. In addition to being a software engineer, I have been extremely lucky to work in many different realms, including as a journalist at the New York Times, a data scientist at the Johns Hopkins Data Science Lab and Dealer.com in Vermont, and “data artist in residence” at the startup Conduce in California.
I have a PhD in biostatistics from Vanderbilt University and an undergraduate degree from the University of Vermont where I majored in mathematics and statistics and minored in computer science.
When I am not in “work mode” I love to bike places, read science fiction, take photos of birds, and wander around gardens/museums.
Have a fantastic day!
Projects
For a much more up-to-date and topical list of my work, check out the data science/statistics/visualization blog that I run with Lucy D’Agostino McGowan: Live Free or Dichotomize.
DataDrivenCV package
- An R package for building a CV/Resume from a spreadsheet of information
- Built around the pagedown package in R
- The framework supplied by the package is entirely self-sufficient, so users are not dependent on package version changes.
Data Visualization Best Practices in R
- Intermediate level course exploring visualization best practices in R.
- Uses ggplot2 and the tidyverse packages.
- See the story A Multimillion-Dollar Startup Hid A Sexual Harassment Incident By Its CEO for information on why I no longer actively promote this course.
Making Nice Looking Websites Using RMarkdown
- A walkthrough from start to finish of making a website using RMarkdown and hosting it on GitHub.
- Made in collaboration with Lucy McGowan
- Presented at the statistical computing workshop for Vanderbilt Medical Center
- See sample site here
Conditional Survival Curves on Truncated Survival Data
- A visual exploration of Kaplan-Meier survival curves on left-truncated survival data.
- Drag the conditional slider to see how the survival curve changes depending on the age of entry.
- All logic for the K-M curve is written from scratch in JavaScript and is much more performant than the survival package in R.
- For more information on the algorithm to generate a K-M curve, see the Wikipedia page.
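For readers curious about the estimator itself, here is a minimal Python sketch of a Kaplan-Meier curve on left-truncated data. It is not the project's JavaScript code; the function name, argument names, and the `condition_age` parameter (standing in for the conditional slider) are all hypothetical, and it assumes every subject enters observation strictly before they exit.

```python
def kaplan_meier(entry, exit_time, event, condition_age=0.0):
    """Return a list of (time, survival) steps for left-truncated data.

    entry[i]      -- age at which subject i came under observation (assumed)
    exit_time[i]  -- age at the event or at censoring
    event[i]      -- True if the event was observed, False if censored
    condition_age -- estimate survival conditional on reaching this age
    """
    # Keep only subjects still under observation after the conditioning age,
    # and treat them as entering no earlier than that age.
    subjects = [
        (max(e, condition_age), x, d)
        for e, x, d in zip(entry, exit_time, event)
        if x > condition_age
    ]
    event_times = sorted({x for _, x, d in subjects if d})

    survival = 1.0
    curve = []
    for t in event_times:
        # With left truncation, a subject is at risk at t only if they
        # have already entered and have not yet exited.
        at_risk = sum(1 for e, x, _ in subjects if e < t <= x)
        deaths = sum(1 for _, x, d in subjects if d and x == t)
        if at_risk > 0:
            survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical toy data: the second subject enters late, at age 2.
print(kaplan_meier(entry=[0, 2, 0], exit_time=[1, 3, 4], event=[True, True, True]))
```

Raising `condition_age` shrinks the risk set to subjects observed past that age, which is why the curve changes as the entry age changes.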
Reusable Statistics Plots in D3
- Also see my histogram made in the same way.
- My first attempts at making a D3 library.
- Ultimately will be tied with a companion R app for interactive visualization for statisticians.
- Uses the reusable D3 structure proposed by Elliot Bentley.
What’s In Season?
- An interactive exploration of what produce is in season.
- Data scraped from here using Python.
- Allows the user to select different in-season ingredients and search for recipes containing them.
- Notebooks for scraping are in the GitHub repo.
Binomially Distributed Fun!
- Demonstrates how a sequence of independent Bernoulli trials makes up the binomial distribution.
- Allows the user to toggle the parameters of the Bernoulli and generate samples.
- Calculates and displays a 95% confidence interval and Wilson hypothesis test based upon the generated data.
- All statistics functions are written from scratch in vanilla JavaScript.
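The idea above can be sketched in a few lines of Python (the project itself is written in JavaScript). This draws independent Bernoulli trials, sums them into a binomial count, and computes a Wilson score interval for the success probability. The function names are hypothetical, and whether the page uses exactly this interval formula is an assumption.

```python
import math
import random


def simulate_bernoulli(n, p, rng):
    """Draw n independent Bernoulli(p) trials as a list of 0s and 1s."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def wilson_interval(successes, n, z=1.959963984540054):
    """Wilson score interval for a binomial proportion.

    z defaults to the standard normal quantile for a 95% interval.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


rng = random.Random(42)  # fixed seed so the run is repeatable
trials = simulate_bernoulli(n=50, p=0.3, rng=rng)
successes = sum(trials)  # the sum of the trials is one Binomial(50, 0.3) draw
print(successes, wilson_interval(successes, len(trials)))
```

Unlike the simpler Wald interval, the Wilson interval stays inside [0, 1] even when the observed proportion is 0 or 1.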
Probability Integral Transformations
- Made in an effort to visualize what happens when you transform a probability distribution with a function.
- Uses the normal distribution transformed by the normal CDF, resulting in a uniform distribution. See here for more info.
- Inspired by my course work in Probability at Vanderbilt.
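As a rough illustration of the transformation described above, the Python sketch below pushes standard normal draws through the normal CDF and bins the results; the bin counts should come out roughly equal, as expected for a uniform distribution. The function names and the choice of ten bins are assumptions for illustration, not taken from the project.

```python
import math
import random


def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution, computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def pit_histogram(n=10_000, bins=10, seed=0):
    """Transform n standard normal draws by their own CDF and bin them on [0, 1]."""
    rng = random.Random(seed)
    counts = [0] * bins
    for _ in range(n):
        u = normal_cdf(rng.gauss(0.0, 1.0))  # u lies in [0, 1]
        # Clamp the index so a value of exactly 1.0 lands in the last bin.
        counts[min(int(u * bins), bins - 1)] += 1
    return counts


print(pit_histogram())
```

The same trick works for any continuous distribution: applying a variable's own CDF to it gives a uniform variable on [0, 1].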
Where Are Wildfires Burning?
- Uses open data from NASA satellites on global temperature anomalies.
- Fresh data is downloaded every day and pushed to the static page via shell scripts, avoiding the need for servers.
- Data source.
Interactive Manhattan Plot R Package.
- An R package to generate interactive and embeddable Manhattan plots for genome-wide association studies.
- Binds R and JavaScript + D3 using the htmlwidgets package.
State Farmers Market Profiles.
- Companion visualization to What Do Farmer’s Markets Sell?
- Explore different states’ paths through different metrics relating to farmers markets.
- Uses an equal-sized state map as the menu to reduce the bias associated with normal projections.
- Data courtesy of Data.gov.
What Do Farmer’s Markets Sell?
- Select different good types (e.g. Vegetables, Fruit) and see which markets sell them.
- Assemble different combinations of goods to explore regional trends.
- Dynamic layout adjusts to mobile or desktop views.
- Be patient with it; more than eight thousand points are being drawn to the screen, which will bog down older phones/computers.
- Data courtesy of Data.gov
Interactive Manhattan Plot Viewer.
- Developed as an experiment in exploratory data visualization.
- Select different controls for comparison, e.g. non-dominant arm growth, to see linked SNPs.
- A Manhattan plot is a commonly used tool for assessing the genetic roots of traits.
- Uses data from the FAMuSS study (Thompson et al. 2004).
Experimental Leap Motion + D3.js project.
- Wave your hands around and watch D3.js mirror you!
- Requires a leap motion device.
- In the future I plan on implementing ways to interact with D3 visualizations by recognizing gestures using machine learning algorithms.
- Video of it in action, in case you don’t have a Leap.
- Note: When using, start by waving your hands around above the Leap Motion device and watch it calibrate!
Polio’s impact on the United States.
- A project for Data Science 2 (Math 295) taught by Professor James Bagrow at the University of Vermont.
- IPython notebook and data files available on my GitHub.
Marvel Vs. DC in the theater.
- I extracted data from The Verge article ‘Marvel’s movie business is crushing DC’s and it’s not close.’
CV/Resume
Want a longer list of the stuff I’ve done related to my career? I have a CV!
Need a short and to-the-point single-page summary of my data science career? Try my resume!
Interested in how I made these? Check out the repo: github.com/nstrayer/cv
Contact
I am always interested in getting involved in new projects or just connecting with others. Feel free to get in touch!
email: nick.strayer (at) gmail
twitter: NicholasStrayer
github: nstrayer