As electronic health records (EHRs) and biobanks have grown in popularity, so has the amount of data available for discovering relationships between a patient’s genotype and phenotype.
EHRs contain vast quantities of information on individual patients. One useful set of information is the ICD-9 and ICD-10 codes, which are used to keep track of billable actions. Phecodes map ICD codes to phenotypes (patient conditions) for research purposes, e.g. Phecode 360.2: Progressive Myopia. Additionally, biobanks provide data on a patient’s biomarkers, such as genetic mutations (i.e., their genotype).
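The ICD-to-phecode step described above can be sketched as a simple lookup. This is only an illustration: the function name, the tuple layout of the billing rows, and the shape of the mapping table are assumptions made here, not part of ME. A real analysis would load the published phecode map rather than a hand-written dictionary.

```python
from collections import defaultdict


def icd_to_phecodes(billing_rows, icd_phecode_map):
    """Collapse per-patient ICD billing codes into sets of phecodes.

    billing_rows: iterable of (patient_id, icd_version, icd_code) tuples
        (hypothetical layout chosen for this sketch).
    icd_phecode_map: dict keyed by (icd_version, icd_code) -> phecode string.
    ICD codes missing from the map are skipped rather than guessed.
    """
    patient_phecodes = defaultdict(set)
    for patient_id, icd_version, icd_code in billing_rows:
        # Normalize whitespace and case before the lookup.
        key = (icd_version, icd_code.strip().upper())
        phecode = icd_phecode_map.get(key)
        if phecode is not None:
            # A set removes repeat billings of the same condition.
            patient_phecodes[patient_id].add(phecode)
    return dict(patient_phecodes)
```

Because repeated billings collapse into one set entry, each patient ends up with a yes/no indicator per phenotype, which is the form an association test needs.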
A popular method for linking genotypes with phenotypes is the Phenome-Wide Association Study (PheWAS), developed at Vanderbilt. PheWAS uses EHR data to produce a list of phenotypes significantly associated with a pre-specified genotype.
ME is a Shiny application that lets researchers explore PheWAS results and investigate the individual-level data that produced them.
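To make the idea concrete, here is a minimal sketch of a PheWAS-style scan. It is a simplification under stated assumptions: it tests each phecode against carrier status with Fisher's exact test and a Bonferroni threshold, whereas published PheWAS analyses typically use regression models adjusted for covariates. The function names and input shapes are hypothetical and are not taken from ME.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    total = sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= observed * (1 + 1e-9)
    )
    return min(1.0, total)


def phewas(carriers, phenotypes, alpha=0.05):
    """Scan all phecodes for association with a single genotype.

    carriers: dict of patient_id -> bool (carries the genotype or not).
    phenotypes: dict of patient_id -> set of phecodes.
    Returns (phecode, p_value, significant) tuples sorted by p-value.
    """
    codes = sorted(set().union(*phenotypes.values()))
    if not codes:
        return []
    threshold = alpha / len(codes)  # Bonferroni correction
    results = []
    for code in codes:
        a = b = c = d = 0
        for patient_id, is_carrier in carriers.items():
            has_code = code in phenotypes.get(patient_id, set())
            if is_carrier and has_code:
                a += 1
            elif is_carrier:
                b += 1
            elif has_code:
                c += 1
            else:
                d += 1
        p_value = fisher_exact_two_sided(a, b, c, d)
        results.append((code, p_value, p_value < threshold))
    return sorted(results, key=lambda r: r[1])
```

The output is the kind of ranked phenotype list that PheWAS results are usually reported as: one p-value per phecode, with a flag for those that survive multiple-testing correction.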
PheWAS results are typically delivered with static plots and tables. ME allows researchers to interact with those results.
By giving researchers the ability to look at the network behavior of genotype-phenotype associations, ME allows for more nuanced insights from the data than a single p-value can provide.
ME is currently being used in the following scenarios:
All plots are custom-built interactive JavaScript visualizations made with the help of the r2d3 package. Visualizations use both d3.js and three.js for efficient rendering of large amounts of data.
To ease the creation of new versions for different use cases, a helper package, meToolkit(), was built with Shiny modules. Standardized inputs and outputs allow easy swapping and testing of app components.
All coding was done in RStudio Server Pro hosted on AWS EC2. The app is a Shiny app in dashboard format, thanks to shinydashboard. Custom interactive plots are built with d3.js and called from R using r2d3.
Data for the app is managed using Hive running on AWS Athena. The larger-than-memory datasets are stored in Apache Parquet files for efficient queries.
Completed apps are most frequently hosted on the lab’s RStudio Connect server. Occasionally, apps are run locally in Docker containers for speed and security reasons.
This research and application were made possible by the following support:
CTSA award No. UL1 TR002243 from the National Center for Advancing Translational Sciences.
Many thanks to those who helped support this research:
Quinn Wells, Pharm.D., M.D. | Michael R. Savona, M.D.
Joshua C. Denny, MD, MS | Vanderbilt Drug Repurposing program
When presented at rstudio::conf 2019, this tab contained a locally running version of the app.
For a live demo, you can clone the app from the GitHub repo.
To run the app with Docker and avoid dependency installation woes, use the provided Docker image and its installation instructions.