class: center, middle, inverse, title-slide # Taking a network view of EHR and Biobank data to find explainable multivariate patterns ### Nick Strayer ### 2019-04-03 --- # Talk outline ![:space 6]() - Electronic health records - PheWAS - Changing how we  these data - Multimorbidity explorer - Changing how we  these data - The stochastic block model - Applying to real data - Future directions --- # Electronic health records (EHR) ![:space 1]() In an effort to make healthcare more efficient, EHR systems have become common in the US. ![:space 3]() .pull-left[
Although these systems were originally built for billing, they still hold a large amount of information that, with careful effort, can _hopefully_ be extracted for research. ] .pull-right[
In this presentation I will focus on the subset of EHR pertaining to billing codes: ICD9, ICD10, and Phecodes. ] ![:space 4]() --- # Biobanks ![:space 6]
Some hospitals have repositories of biological samples that can be matched to their EHR. ![:space 8]
Data could be anything from plain unprocessed-plasma all the way to full single-cell sequencing. ![:space 8]
Here I will focus on plain SNP-chip readings, i.e. the presence or absence of a given marker at multiple points on the genome. --- # PheWAS ![:space 5] The PheWAS (phenome-wide association study) technique was developed to extract information from these data. ![:space 8]  --- ## Concept ![:space 5]  --- ## The univariate problem ![:space 15]()
PheWAS looks at one genotype-phenotype association at a time. ![:space 12]() .pull-left[
This gives us the multiple-comparisons problem. ] .pull-right[
Also, does the world work like this? ] --- # Changing how we  these data  --- # Multimorbidity explorer ![:space 14]()  --- ## What it is An application that lets researchers explore the results of PheWAS studies and investigate the individual-level data that produced those results using
--- ## Why it is ![:space 15]() .pull-left[ ####
Interact with results PheWAS results are typically delivered with static plots and tables. ME allows researchers to interact with those results. ] .pull-right[ ####
Expand past plain associations By giving researchers the ability to look at the network behavior of genotype-phenotype associations, ME can provide more nuanced insights from the data than a table. ] --- ## How it is ![:space 4]()
Central package contains all the building blocks of a typical ME deployment. ![:indent]()
Visualizations custom built in JavaScript. ![:space 5]()
Creating a custom app is an exercise in combining the necessary components. ![:space 5]()
Hosted on lab RStudio Connect instance running on AWS. ![:indent]()
If more security is desired, a Docker image is available. --- class: center, middle ## Demo --- # Changing how we  these data ![:space 8]()
If visualizing the data as a network can help us understand them, why not model them as a network, too? ![:space 4]()
Mathematical and statistical models of networks are a diverse and booming field. ![:space 4]()
Unfortunately, many of the methods sit on fragile methodological foundations. ![:indent]()
Luckily, a new model type has taken the stage recently... --- ## The Stochastic Block Model ![:space 3]()  --- ## SBM formula  --- ## Bipartite expansion ![:space 2]() In its basic form the SBM only works with a single type of node, but we have two (phenotypes and patients)... ![:space 4]()  ![:space 6]()
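To make the single-node-type setting concrete, here is a minimal sketch of sampling a network from a basic SBM. It assumes a Bernoulli edge model (the model in this talk may use a different edge distribution), and the function name `sample_sbm` and its parameters are hypothetical, chosen for illustration only. Note that nothing in it distinguishes one kind of node from another: any node can land in any block.

```python
import numpy as np

def sample_sbm(block_sizes, block_probs, seed=0):
    """Sample an undirected, single-node-type network from a Bernoulli SBM.

    block_sizes: number of nodes in each block (hypothetical parameter name).
    block_probs: symmetric matrix; entry [r, s] is the probability of an
                 edge between a node in block r and a node in block s.
    """
    rng = np.random.default_rng(seed)
    block_probs = np.asarray(block_probs, dtype=float)
    # Block label for every node, built by repeating each block index
    membership = np.repeat(np.arange(len(block_sizes)), block_sizes)
    n = membership.size
    # The edge probability of a node pair depends only on the two blocks
    pair_probs = block_probs[membership[:, None], membership[None, :]]
    # Draw each pair once (upper triangle, no self-loops), then symmetrize
    upper = np.triu(rng.random((n, n)) < pair_probs, k=1)
    adjacency = (upper | upper.T).astype(int)
    return membership, adjacency

# Two blocks: dense within blocks, sparse between them (illustrative values)
membership, adjacency = sample_sbm([3, 2], [[0.9, 0.1], [0.1, 0.8]])
```

Fitting runs in the opposite direction: given only the adjacency matrix, infer the block memberships and the block-level edge rates.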
To fix this we can add a constraint to the model so that only nodes of the same type are clustered together. ![:indent]() <img src='figures/bell-curve.svg' width = 40px/> This is equivalent to an infinitely strong prior on cluster 'types'. --- ## Bipartite SBM formula  --- ## Statistical features The model is fit in a Bayesian manner  ![:space 2]() .pull-left[ ### Priors  Priors are set on both the group/cluster structure and the edge counts between groups. ] .pull-right[ ### Fitting .center[
] The model is fit by MCMC using Metropolis-Hastings. This is very similar to how a Dirichlet mixture model is fit. ] --- ### Prior on clusters  --- ### Prior on edge counts  --- ### MCMC acceptance formula  --- ## Simulation results ![:space 2]()  The BiSBM outperforms Dirichlet mixture models even when the true generating distribution is Dirichlet. ![:space 3]() .right[  ] --- class: center, middle # Applying in the real world --- ### Looking for [patterns in Myeloid disease](  --- ### Investigating large-scale patient structure 55k patients over 1500 phecodes, dimension reduced using BiSBM, visualized with UMAP.  --- # Future directions ![:space 18]() - Simulations! - Turning into a semi-supervised method - More battle-testing with real data - Optimizing visualizations for information transfer --- # Acknowledgements ![:space 2]() Thanks to those who have helped build these ideas to this point - Yaomin Xu - TBILab members - Wells Lab - Savona Lab - Travis Spaulding - Denny Lab - Vanderbilt drug repurposing group - Vanderbilt PheWAS group ![:space 7]() And thanks to the funding sources who have paid me to play around with JavaScript - NIH Big Data to Knowledge training grant - Vanderbilt Biostatistics department 2018 development grant --- # References These slides: ![:indent]()
[]( ![:indent]()
[]( ![:space 8]() Relevant papers: -  -  -  -