Hi, I'm Nick

Visual Software Engineer and Data Scientist

I am a principal software engineer at Posit building tools to help data scientists work better.

About me

At Posit, I develop tools that enhance data scientists' workflows. My career spans multiple disciplines—from being a journalist at the New York Times to a data scientist at Johns Hopkins Data Science Lab and even a "data artist in residence" at a geospatial visualization startup.

My Ph.D. from Vanderbilt University focused on statistical methodologies for network data, deep learning, and data visualization. At the University of Vermont, I studied statistics, mathematics, and computer science.

When I am not in "work mode," I enjoy photographing birds and my daughter, running and hiking, reading science fiction, and contributing to open-source projects.

Projects

A graphical experiment of exponential spread

A real-time JavaScript simulation of epidemic spreading on a network

Built to provide intuition for spreading in constrained environments

All parameters related to spread are tunable
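
For a rough sense of the mechanics, here is a minimal sketch of a discrete-time susceptible/infected/recovered spread on a small network. It is written in Python rather than the project's JavaScript and is not the project's code; the ring-shaped network, the function names, and the parameters `p_transmit` and `recovery_steps` are all illustrative assumptions.

```python
import random


def make_ring_network(n, k):
    """Adjacency sets for a ring where each node links to its k nearest
    neighbors on each side (a hypothetical stand-in for a real network)."""
    neighbors = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            neighbors[i].add(j)
            neighbors[j].add(i)
    return neighbors


def simulate_spread(neighbors, p_transmit, recovery_steps, n_steps, seed=0):
    """Return the number of infected nodes at each step, starting with one.

    p_transmit     : chance an infected node infects each susceptible
                     neighbor in one step (assumed tunable parameter)
    recovery_steps : steps a node stays infectious before recovering
    """
    rng = random.Random(seed)
    state = {node: "S" for node in neighbors}
    infected_at = {}

    patient_zero = rng.choice(sorted(neighbors))
    state[patient_zero] = "I"
    infected_at[patient_zero] = 0
    history = [1]

    for t in range(1, n_steps + 1):
        # Collect new infections first so the update is synchronous.
        newly_infected = []
        for node in neighbors:
            if state[node] != "I":
                continue
            for nb in neighbors[node]:
                if state[nb] == "S" and rng.random() < p_transmit:
                    newly_infected.append(nb)

        # Nodes that have been infectious long enough recover.
        for node in neighbors:
            if state[node] == "I" and t - infected_at[node] >= recovery_steps:
                state[node] = "R"

        for nb in newly_infected:
            if state[nb] == "S":
                state[nb] = "I"
                infected_at[nb] = t

        history.append(sum(1 for s in state.values() if s == "I"))
    return history
```

Calling `simulate_spread(make_ring_network(200, 2), 0.3, 5, 50)` returns the infected count per step; raising `k` loosens the constraint on the network and lets the spread move faster.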

DataDrivenCV package

An R package for building a CV/Resume from a spreadsheet of information

Built around the pagedown package in R

The framework supplied by the package is entirely self-sufficient, so users are not dependent on package version changes.

Phewas-ME

Shiny app for exploring results of Phenome-Wide Association Studies (PheWAS)

Allows the user to look directly at the individual-level data behind the results to identify spurious or novel associations.

Built as an R package framework, allowing modular construction of apps based on a project's needs.

t-SNE explained in plain javascript

Full implementation and explanation of the t-SNE visualization algorithm

Featured in "Explorables" section of Observable

Javascript Statistics Snippets

A series of small self-contained functions for doing statistical computation in javascript

Functions are optimized for speed along with legibility

Shinysense

A set of shiny modules for letting shiny sense the world around it.

Currently has touch, sound, motion, and vision 'senses.'

Bundled into an R package.

What are P-Values, Really?

A resource for explaining what statistical significance really means.

Storified to try and make it memorable.

Takes the form of a reproducible R Markdown document so others can recreate it.

Data Visualization Best Practices in R

Intermediate level course exploring visualization best practices in R.

Uses ggplot2 and the tidyverse packages.

See the story A Multimillion-Dollar Startup Hid A Sexual Harassment Incident By Its CEO for information on why I no longer actively promote this course.

Making Nice Looking Websites Using RMarkdown

A walkthrough from start to finish of making a website using RMarkdown and hosting it on Github.

Made in collaboration with Lucy McGowan

Presented at the statistical computing workshop for Vanderbilt Medical Center

See sample site here

Conditional Survival Curves on Truncated Survival Data

A visual exploration of Kaplan-Meier survival curves on left-truncated survival data.

Drag the conditional slider to see how the survival curve changes depending on the age of entry.

All logic for the K-M curve is written from scratch in JavaScript and is much more performant than the survival package in R.

For more information on the algorithm to generate a K-M curve, see the Wikipedia page.
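
As a rough illustration of the idea, here is a minimal Python sketch of the standard product-limit (Kaplan-Meier) estimator with delayed entry. It is not the project's JavaScript code; the function name, argument names, and the `condition_age` parameter are illustrative assumptions. The key detail for left truncation is the risk set: a subject only counts as at risk at time t if they entered observation before t and have not yet left.

```python
def km_left_truncated(entry, exit_, event, condition_age=0.0):
    """Kaplan-Meier curve for left-truncated data, conditional on survival
    to condition_age.

    entry[i] : age at which subject i came under observation
    exit_[i] : age at event or censoring (assumed greater than entry[i])
    event[i] : True if the event was observed at exit_[i], False if censored
    Returns a list of (time, survival) pairs, one per distinct event time.
    """
    # Only event times after the conditioning age contribute to the curve.
    event_times = sorted(
        {x for x, d in zip(exit_, event) if d and x > condition_age}
    )

    survival = 1.0
    curve = []
    for t in event_times:
        # Risk set: entered before t and still under observation at t.
        at_risk = sum(1 for a, x in zip(entry, exit_) if a < t <= x)
        deaths = sum(1 for x, d in zip(exit_, event) if d and x == t)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve
```

Moving `condition_age` plays the role of the slider: later values drop the early event times and restart the curve at 1.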

Reusable Statistics Plots in D3

Also see my histogram made in the same way.

My first attempts at making a d3 library.

Ultimately will be tied with a companion R app for interactive visualization for statisticians.

Uses the reusable d3 structure proposed by Elliot Bentley.

What's In Season?

An interactive exploration of what produce is in season.

Data scraped from here using python.

Allows the user to select different in-season ingredients and search for recipes containing them.

Notebooks for scraping in github repo.

Data Visualization In R

Rmarkdown document for a statistical computing workshop I gave at Vanderbilt.

A brief overview of some common visualization mistakes and code to fix them in ggplot

Provides an overview of some newer visualization tools.

Binomially Distributed Fun!

Demonstrates how a sequence of independent Bernoulli trials makes up the binomial distribution.

Allows the user to adjust the parameters of the Bernoulli trials and generate samples.

Calculates and displays a 95% confidence interval and Wilson hypothesis test based on the generated data.

All statistics functions are written from scratch in vanilla JavaScript.
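
To make the pieces concrete, here is a minimal Python sketch of drawing Bernoulli trials and computing a Wilson score interval for the success probability. It is an illustration only, not the project's JavaScript; the function names are assumptions, and the Wilson score interval is used here as one common choice for a binomial proportion.

```python
import math
import random


def bernoulli_trials(p, n, seed=None):
    """Draw n independent Bernoulli(p) outcomes as a list of 0s and 1s.
    Their sum is a single draw from Binomial(n, p)."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n)]


def wilson_interval(successes, n, z=1.959963984540054):
    """Wilson score interval for a binomial proportion.
    The default z is the normal quantile for a 95% interval."""
    p_hat = successes / n
    denom = 1.0 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1.0 - p_hat) / n + z**2 / (4 * n**2)
    )
    return center - half_width, center + half_width
```

For example, `wilson_interval(sum(bernoulli_trials(0.3, 50, seed=1)), 50)` gives an interval for the underlying probability from one simulated sample.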

The Likelihood Function

An interactive exploration of the likelihood function.

Visually explains the concepts of support intervals and likelihood ratios.

Allows the user to input their own data for creating figures for reports/presentations.
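
As a small worked illustration of those two concepts, here is a Python sketch for binomial data. It is not the project's code; the binomial model, the function names, the grid search, and the default 1/8 support level are all assumptions chosen for the example.

```python
import math


def relative_likelihood(p, successes, n):
    """Likelihood ratio L(p) / L(p_hat) for binomial data, where p_hat is
    the maximum likelihood estimate successes / n. Expects 0 < p < 1."""
    p_hat = successes / n

    def log_lik(q):
        # Skip terms with a zero exponent so log(0) is never evaluated.
        total = 0.0
        if successes > 0:
            total += successes * math.log(q)
        if successes < n:
            total += (n - successes) * math.log(1.0 - q)
        return total

    return math.exp(log_lik(p) - log_lik(p_hat))


def support_interval(successes, n, k=8, grid_size=10001):
    """Range of p whose likelihood is at least 1/k of the maximum,
    found by a simple grid search over the open interval (0, 1)."""
    grid = [i / (grid_size - 1) for i in range(1, grid_size - 1)]
    supported = [
        p for p in grid if relative_likelihood(p, successes, n) >= 1.0 / k
    ]
    return supported[0], supported[-1]
```

With 5 successes in 10 trials, `support_interval(5, 10)` comes out at roughly 0.21 to 0.79: every value of p in that range is supported at least one eighth as well as the best-supported value, 0.5.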

Confidence Intervals Explained

Allows the user to explore what a frequentist confidence interval truly is.

To many people, including the scientists who use them, the behavior of Confidence Intervals is confusing.

All statistics functions are written in plain JavaScript. See the GitHub repo for code.

Probability Integral Transformations

Made in an effort to visualize what happens when you transform a probability distribution with a function.

Uses the normal distribution transformed by the normal CDF, resulting in a uniform distribution. See here for more info.

Inspired by my course work in Probability at Vanderbilt.
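
The transformation itself is short enough to sketch. The Python below (an illustration, not the project's code; function names and defaults are assumptions) draws normal samples, pushes each through the normal CDF, and bins the results so the flat, uniform shape can be checked.

```python
import math
import random


def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of the Normal(mu, sigma) distribution via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def pit_samples(n, mu=0.0, sigma=1.0, seed=0):
    """Draw n normal samples and transform each by its own CDF.
    The results should look Uniform(0, 1)."""
    rng = random.Random(seed)
    return [normal_cdf(rng.gauss(mu, sigma), mu, sigma) for _ in range(n)]


def histogram(values, bins=10):
    """Count values in equal-width bins on [0, 1]."""
    counts = [0] * bins
    for v in values:
        # Clamp so a value of exactly 1.0 lands in the last bin.
        counts[min(int(v * bins), bins - 1)] += 1
    return counts
```

Running `histogram(pit_samples(20000))` should give ten counts that are all close to 2000, which is the uniform distribution showing up in the binned output.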

Where Are Wildfires Burning?

Uses open data from NASA satellites on global temperature anomalies.

Fresh data is downloaded every day and pushed to the static page via shell scripts avoiding the need for servers.

Data source.

Interactive Manhattan Plot R Package.

An R package to generate interactive and embeddable Manhattan plots for genome-wide association studies.

Binds R and Javascript + D3 using the HTMLWidgets package.

State Farmers Market Profiles.

Companion visualization to What Do Farmer's Markets Sell?

Explore different states' paths through various metrics relating to farmers markets.

Uses an equal-sized states map as the menu to reduce the bias associated with standard map projections.

Data courtesy of Data.gov.

What Do Farmer's Markets Sell?

Select different good types (e.g. Vegetables, Fruit) and see which markets sell them.

Assemble different combinations of goods to explore regional trends.

Dynamic layout adjusts to mobile or desktop views.

Be patient with it: more than eight thousand points are being drawn to the screen, which will bog down older phones and computers.

Data courtesy of Data.gov

Interactive Manhattan Plot Viewer.

Developed as an experiment in exploratory data visualization.

Select different controls for comparison (e.g., non-dominant arm growth) to see linked SNPs.

A Manhattan plot is a commonly used tool for assessing the genetic roots of traits.

Uses data from the FAMuSS study (Thompson et al. 2004).

Learn ASL numbers with Leap Motion.

First place project at 2014 UVM CS Fair.

Teaches numbers 0-9 in American Sign Language.

Utilizes three.js and webGL for rendering.

Built to exploit multiple HCI and cognitive psychology theories (e.g., object consistency and the generation effect) to maximize the learning experience.

Experimental Leap Motion + D3.js project.

Wave your hands around and watch D3.js mirror you!

Requires a leap motion device.

In the future I plan on implementing ways to interact with D3 visualizations by recognizing gestures using machine learning algorithms.

Video of it in action, in case you don't have a Leap.

Note: When using, start by waving your hands around above the Leap Motion device and watch it calibrate!

Polio's impact on the United States.

A project for Data Science 2 (Math 295) taught by Professor James Bagrow at the University of Vermont.

IPython notebook and data files are available on my GitHub.

labinthewild.org Interactive Visualization.

A visualization developed for LabInTheWild at the University of Michigan to help participants place themselves among differing demographics.

Alternative energy filling stations in the U.S.

Using d3.hexbin, I took 18k+ data points and binned them to help explore geographic trends in alternative energy filling stations.

Marvel Vs. DC in the theater.

Where does California get its energy?

A visualization that explores how electricity is generated in the state of California. Data was cleaned using python and then the visualization was generated using d3.js.

Skills

TypeScript, JavaScript, R, Python, Swift, Software Architecture, UI/UX, Statistical Computing, Machine Learning, Data Visualization, Biostatistics, React, Node.js, D3.js, VS Code Extension API

The boring stuff

CV

Want a longer list of the stuff I've done related to my career? I have a CV!

Resume

Need a short and to-the-point single page annotation of my data-science career? Try my resume!

Interested in how I made these? Check out the repo: github.com/nstrayer/cv

Get in touch

I am always interested in getting involved in new projects or just connecting with others. Feel free to get in touch!