Main
Nick Strayer
As a software developer I use my background in data science to build tools to help people explore, understand, and work with their data better. I have made visualizations viewed by hundreds of thousands of people, sped up query times for 25 terabytes of data by an average of 4,800 times, and built packages for R that let you do magic.
Education
Ph.D., Biostatistics
Vanderbilt University
Nashville, TN
2020
- Dissertation: Network analysis and visualization for electronic health records data.
- Specialized in creating high-performance interactive visualization platforms
- Developed algorithms for efficient real-time network data processing
B.S., Mathematics, Statistics (minor C.S.)
University of Vermont
Burlington, VT
2015
- Thesis: An agent based model of Diel Vertical Migration patterns of Mysis diluviana
- Focused on computational efficiency, simulation optimization, and interactive model exploration
Research Experience
Graduate Research Assistant
TBILab (Yaomin Xu’s Lab)
Vanderbilt University
Current - 2015
- Primarily working with large EHR and Biobank datasets.
- Developing network-based methods to investigate and visualize clinically relevant patterns in data.
Data Science Researcher
Data Science Lab
Johns Hopkins University
2018 - 2017
- Building R Shiny applications in the contexts of wearables and statistics education.
- Work primarily done in R Shiny and Javascript (Node.js and d3.js).
Undergraduate Researcher
Rubenstein Ecosystems Science Laboratory
University of Vermont
2015 - 2013
- Analyzed and visualized data for CATOS fish tracking project.
- Head of data mining project to establish temporal trends in population densities of Mysis diluviana (Mysis).
- Ran project to mathematically model the migration patterns of Mysis (honors thesis project).
Human Computer Interaction Researcher
LabInTheWild (Reineke Lab)
University of Michigan
2015
- Led development and implementation of interactive data visualizations to help users compare themselves to other demographics.
Undergraduate Researcher
Bentil Laboratory
University of Vermont
2014 - 2013
- Developed mathematical model to predict the transport of sulfur through the environment with applications in waste cleanup.
Research Assistant
Adair Laboratory
University of Vermont
2013 - 2012
- Independently analyzed and constructed statistical models for large data sets pertaining to carbon decomposition rates.
Industry Experience
While most recently I have had the job title of “software engineer”, I have worked in a variety of roles ranging from journalist to data scientist. Ultimately categorization is hard.
Principal Software Engineer
Posit
Remote
Current - 2024
- Architect and develop full-stack solutions for the Positron data science IDE
- Work across the TypeScript, Python, and Rust codebase to build user-centric interfaces that balance performance with intuitive design
- Collaborate across teams to ensure reliable, maintainable codebase architecture
- Mentor junior developers on frontend best practices and code quality standards
Senior Software Engineer
Posit
Remote
2024 - 2023
- Created and led development of ShinyUiEditor, a React-based drag-and-drop interface builder
- Designed architecture for real-time previewing and component manipulation using a custom pseudo-AST format that allowed translation into either R or Python from the same AST.
- Spearheaded work to simplify and unify the UI layer of R and Shiny using custom web components.
Software Engineer
Posit
Remote
2023 - 2020
- Part of the team that created Shiny for Python, a ground-up rewrite of R’s Shiny framework in Python
Data Journalist - Graphics Department
New York Times
New York, New York
2016
- Reporter with the graphics desk covering topics in science, politics, and sport.
- Work primarily done in R, Javascript, and Adobe Illustrator.
- Developed interactive, data-dense visualizations viewed by hundreds of thousands of users
Engineering Intern - User Experience
Dealer.com
Burlington, VT
2015
- Built internal tool to help analyze and visualize user interaction with back-end products.
Data Science Intern
Dealer.com
Burlington, VT
2015
- Worked with the product analytics team to help parse and visualize large stores of data to drive business decisions.
Data Artist In Residence
Conduce
Carpinteria, CA
2015 - 2014
- Envisioned, prototyped and implemented visualization framework in the course of one month.
- Constructed training protocol for bringing third parties up to speed with new protocol.
Software Engineering Intern
Conduce
Carpinteria, CA
2014
- Incorporated d3.js into the company’s main software platform.
Teaching Experience
I am passionate about education. I believe that no topic is too complex if the teacher is empathetic and willing to think about new methods of approaching a task.
Javascript for Shiny Users
RStudio::conf 2020
N/A
2020
- Served as TA for a two-day workshop on how to leverage Javascript in Shiny applications
- Lectured on using R2D3 package to build interactive visualizations.
Data Visualization Best Practices
DataCamp
N/A
2019
- Designed a course from the ground up to teach best practices for scientific visualizations.
- Uses R and ggplot2.
- In top 10% on platform by popularity.
Improving your visualization in Python
DataCamp
N/A
2019
- Designed a course from the ground up to teach advanced methods for enhancing visualization.
- Uses Python, matplotlib, and seaborn.
Advanced Statistical Learning and Inference
Vanderbilt Biostatistics Department
Nashville, TN
2018 - 2017
- Served as TA and lectured
- Topics ranged from penalized regression to boosted trees and neural networks
- Highest level course offered in department
Advanced Statistical Computing
Vanderbilt Biostatistics Department
Nashville, TN
2018
- Served as TA and lectured
- Covered modern statistical computing algorithms
- 4th year PhD level class
Statistical Computing in R
Vanderbilt Biostatistics Department
Nashville, TN
2017
- Served as TA and lectured
- Covered introduction to R language for statistics applications
- Graduate level class
Selected Data Science Writing
I regularly write about data science and visualization on my blog, LiveFreeOrDichotomize.
Using AWK and R to Parse 25tb
LiveFreeOrDichotomize.com
N/A
2019
- Achieved 4,800x performance improvement for large-scale genomic data processing.
- Reached top of HackerNews multiple times
Classifying physical activity from smartphone data
RStudio Tensorflow Blog
N/A
2018
- Walkthrough of training a convolutional neural network to achieve state-of-the-art recognition of activities from accelerometer data.
- Contracted article.
The United States of Seasons
LiveFreeOrDichotomize.com
N/A
2018
- GIS analysis of weather data to find the most ‘seasonal’ locations in the United States
- Used Bayesian regression methods for smoothing sparse geospatial data.
A year as told by fitbit
LiveFreeOrDichotomize.com
N/A
2017
- Analyzed a full year’s worth of second-level heart rate data from a wearable device.
- Demonstrated visualization-based inference for large data.
MCMC and the case of the spilled seeds
LiveFreeOrDichotomize.com
N/A
2017
- Full Bayesian MCMC sampler running in your browser.
- Coded from scratch in vanilla Javascript.
The Traveling Metallurgist
LiveFreeOrDichotomize.com
N/A
2017
- Pure Javascript implementation of a traveling salesman solution using simulated annealing.
- Allows reader to customize the number and location of cities to attempt to trick the algorithm.
Selected Press (About)
Great paper? Swipe right on the new ‘Tinder for preprints’ app
Science
N/A
2017
- Story of the app Papr made with Jeff Leek and Lucy D’Agostino McGowan.
Swipe right for science: Papr app is ‘Tinder for preprints’
Nature News
N/A
2017
- Second press article for app Papr.
The Deeper Story in the Data
University of Vermont Quarterly
N/A
2016
- Story on my path post graduation and the power of narrative.
Selected Press (By)
The Great Student Migration
The New York Times
N/A
2016
- Most shared NYT article of August 2016, demonstrating ability to create engaging UIs.
- Used d3.js to render 100 maps in real time for personalized inspection by readers.
Wildfires are Getting Worse
The New York Times
N/A
2016
- GIS analysis and modeling of fire patterns and trends
- Data in collaboration with NASA and USGS
Who’s Speaking at the Democratic National Convention?
The New York Times
N/A
2016
- Data scraped from CSPAN records to figure out who talked at past conventions.
Who’s Speaking at the Republican National Convention?
The New York Times
N/A
2016
- Used same data scraping techniques as Who’s Speaking at the Democratic National Convention?
A Trail of Terror in Nice, Block by Block
The New York Times
N/A
2016
- Led research effort to put together story of 2016 terrorist attack in Nice, France in less than 12 hours.
- Work won a silver medal at Malofiej 2017 and gold at the Society for News Design.
Selected Publications, Posters, and Talks
Building a software package in tandem with machine learning methods research can result in both more rigorous code and more rigorous research
ENAR 2020
N/A
2020
- Invited talk in Human Data Interaction section.
- How and why building an R package can benefit methodological research
Stochastic Block Modeling in R, Statistically rigorous clustering with rigorous code
RStudio::conf 2020
N/A
2020
- Invited talk about new sbmR package.
- Focus on how software development and methodological research can both benefit when done in tandem.
PheWAS-ME: A web-app for interactive exploration of multimorbidity patterns in PheWAS
Bioinformatics
N/A
2020
- Manuscript detailing application for the exploration of multimorbidity patterns in PheWAS analyses
- See landing page for more information.
Charge Reductions Associated with Shortening Time to Recovery in Septic Shock
Chest
N/A
2019
- Authored with Wesley H. Self, MD MPH; Dandan Liu, PhD; Stephan Russ, MD, MPH; Michael J. Ward, MD, PhD, MBA; Nathan I. Shapiro, MD, MPH; Todd W. Rice, MD, MSc; Matthew W. Semler, MD, MSc.
Multimorbidity Explorer | A shiny app for exploring EHR and biobank data
RStudio::conf 2019
N/A
2019
- Contributed Poster. Authored with Yaomin Xu.
Taking a network view of EHR and Biobank data to find explainable multivariate patterns
Vanderbilt Biostatistics Seminar Series
N/A
2019
- University-wide seminar series.
Patient-specific risk factors independently influence survival in Myelodysplastic Syndromes in an unbiased review of EHR records
Under review (copy available upon request)
N/A
2019
- Bayesian network analysis used to find novel subgroups of patients with Myelodysplastic Syndromes (MDS).
- Analysis done using method built for my dissertation.
Patient specific comorbidities impact overall survival in myelofibrosis
Under review (copy available upon request)
N/A
2019
- Bayesian network analysis used to find robust novel subgroups of patients with given genetic mutations.
- Analysis done using method built for my dissertation.
R timelineViz: Visualizing the distribution of study events in longitudinal studies
Under review (copy available upon request)
N/A
2018
- Authored with Alex Sunderman of the Vanderbilt Department of Epidemiology.
Continuous Classification using Deep Neural Networks
Vanderbilt Biostatistics Qualification Exam
N/A
2017
- Review of methods for classifying continuous data streams using neural networks
- Successfully met qualifying examination standards
Asymmetric Linkage Disequilibrium: Tools for Dissecting Multiallelic LD
Journal of Human Immunology
N/A
2015
- Authored with Richard Single, Vanja Paunic, Mark Albrecht, and Martin Maiers.
An Agent Based Model of Mysis Migration
International Association of Great Lakes Research Conference
N/A
2015
- Authored with Brian O’Malley, Sture Hansson, and Jason Stockwell.
Declines of Mysis diluviana in the Great Lakes
Journal of Great Lakes Research
N/A
2015
- Authored with Peter Euclide and Jason Stockwell.