class: center, middle, inverse, title-slide # Making Magic with Keras and Shiny ## An exploration of Shiny’s position in the data science pipeline ### Nick Strayer ### 2018/01/22 --- # Me .pull-left[ - [@NicholasStrayer](https://twitter.com/NicholasStrayer) - [nickstrayer.me](http://nickstrayer.me/) - [LiveFreeOrDichotomize](http://livefreeordichotomize.com/) ] .pull-right[ - PhD Candidate in Biostatistics at Vanderbilt University - Previously: - Johns Hopkins Data Science Lab - New York Times graphics department - Data visualization "engineer" ] # Slides .center[ <p style = "font-size:34px"> nickstrayer.me/dataDayTexas </p> ] --- class: middle # Outline Three sections 1. What... 2. Why... 3. How... ... I did --- class: center, middle # What? --- class: middle # Shiny app for casting spells .center[ <img src = "http://digitalspyuk.cdnds.net/16/46/480x198/gallery-1479227655-tumblr-inline-n9laetes451qh7ojh.gif"/> <a href = "http://digitalspyuk.cdnds.net/16/46/480x198/gallery-1479227655-tumblr-inline-n9laetes451qh7ojh.gif">source</a> ] > Any sufficiently advanced technology is indistinguishable from magic. - Arthur C. Clark --- class: center, middle # Demo _In which the rest of the talk is invalidated..._ --- class: middle # Three main steps: 1. Shiny app for gathering data 2. Modeling data with Keras 3. Shiny app for presenting data --- class: center, middle # Why? --- class: middle <img src = "https://raw.githubusercontent.com/nstrayer/shinysense_useR2017/master/screenshots/data_science_flow_goal.png" /> I want to show Shiny can fit into the datascience workflow from the begining to the end. Traditionally shiny has been used to show off results afterwards, but that's missing a huge portion of its potential. --- # Shiny for gathering data Who is better at knowing what they want their data to look like more than the ones who are going to be using it? .center[ <img src = "http://www.michaelbunker.com/wp-content/uploads/sites/9/2015/06/wheat8.jpg"/ height = 250> <a href = "http://www.michaelbunker.com"/><span size = 8>source</span></a> ] Many people like statisticians are given entire courses on how to set up things like clinical trials, but datascience often assumes the data has magically appeared. --- # Shiny for presenting models This is the traditional way shiny is used, and there's a reason for that: it's great. .center[ <img src = "model_plot.svg" height = 250/> ] Instead of having to get fancy with learning how to port tensorflow models to Swift or Java you can just throw a predict function inside of a simple app. _How well can this scale?_ --- class: center, middle # How? --- class: middle .pull-left[ # Data gathering app We need to... - Gather data from accelerometer. - Visualize results so that I could make sure I wasn't recording bad data. - Save data somewhere I could use it. ] .pull-right[ <img src = 'gathering.png' height = 550/> ] --- class: middle # Shiny .pull-left[ > Shiny is an R package that makes it easy to build interactive web apps straight from R. ] .pull-right[ <img src = "http://hexb.in/vector/shiny.svg" height = 300/> ] [RStudio](https://shiny.rstudio.com/) --- class: middle #Gathering data from accelerometer Used my package `shinysense`. .center[ <img src = 'shinysenseslim.png' height = 150 /> ] Contains functions to bring data into Shiny apps. One of these data sources happens to be accelerometer data. --- # Movr .pull-left[ __ui.R__ ```r shinymovrUI("movr_button") ``` ] .pull-right[ __server.R__ ```r movement <- callModule( shinymovr, "movr_button") ``` ] Binds to the `devicemotion` api in javascript to pull in motion data. ```js window.addEventListener("deviceorientation", handleMovementFunc); ``` <div class="garden"> <div class="ball"><img src = "http://hexb.in/vector/shiny.svg" height = 100/> </div> </div> --- # Visualizing data Not that hard as `shinysense::movr` returns an observable that can be used like any other shiny object. .pull-left[ ```r movement() %>% ggplot(aes( x = time, y = accel )) + geom_line(aes( color = direction )) + ... ``` ] .pull-right[ <img src = 'visualized_app.png' height = 300/> ] Added a delete button if bad data was detected. --- class: middle # Saving the data Two options: 1. Use shinyapps.io and do a dropbox or similar upload 2. Just host the app from my own server - This was easier. - Better for other reasons I will get to later as well. Luckily the RStudio Server makes this crazy easy. ```r write_csv(rvs$movements, paste0(sessionId, '_movement.csv')) ``` --- class: center, middle # Data exploration/ visualization --- # Getting data all in By saving all of the data to the home repo as unique id'd `.csv`s all I have to do is read them in with a single `purrr` line. ```r # get the files in our directory all_files <- list.files('gather_data') # just get the csvs csv_names <- all_files[grepl('.csv', all_files)] # load all data into a dataframe data <- csv_names %>% map_df(read_csv) ``` By far the best thing about this whole project. --- # Visualizing all data for trends ```r ggplot(data, aes(x = time, y = accel, group = recording_id)) + geom_line(aes(color = direction), alpha = 0.15) + facet_wrap(~label) + theme_void() + guides(color = FALSE) + theme(strip.text = element_text(family = "Amatic SC", size = 18)) ``` ![](index_files/figure-html/unnamed-chunk-8-1.png)<!-- --> --- class: center, middle # Modeling --- # Converting data to tensors .pull-left[ Currently the largest sticking point in the R world for deep learning. (_opinion_) We (R users) like to think flat and tidy, tensors do not. ] .pull-right[ <img style="margin-top:-55px" src = 'http://www.sunlab.org/teaching/cse6250/fall2017/lab/image/DL/tensor.png' height = 220/> ] ```r directions <- c('m_x','m_y','m_z') wide_data <- data %>% # take data and make it wide. Et tu brutus? spread(direction, accel) %>% # group into each individual spell cast group_by(label, recording_id) %>% # convert each grouping into a list with a matrix inside. do( data = as.matrix(.[directions]) ) ``` One-hot encoding is even more messy at this point. --- class: middle # Building Keras model <p style='padding-bottom: 25px;'> When building a deep learning model you need to ask yourself three things </p> .pull-left[ 1. What architecture fits my problem/limitations best? 2. How do I stack my layers? 3. Couldn't KNN do this better? ] .pull-right[ <img src = "https://s3.amazonaws.com/keras.io/img/keras-logo-2018-large-1200.png"/> ] --- # Architecture .pull-left[ For sequential data in deep learning there are two options, recurrent and convolutional. Reccurent is fantastic for learning time dependent patterns but are slow. Convolutional is great at recognizing patterns and are fast, but bad at time dependent patterns. Since I don't have a ton of compute and accelerometer data is often pattern recognized I chose a CNN. ] .pull-right[ <img style='margin-top: -95px' src = "modelShape.svg" height = 550 /> ] --- # R's Keras vs Python's This is where R shines in deep learning. The pipe operator turns the sequential API into a much more readable format than in python. .pull-left[ __R__ ```r model <- keras_model_sequential() model %>% layer_conv_1d(...) %>% layer_batch_normalization() %>% layer_global_max_pooling_1d() %>% layer_dropout(0.4) %>% layer_dense( num_classes, activation = 'softmax' ) ``` ] .pull-right[ __Python__ ```python model = Sequential() model.add(conv_1d(...)) model.add(batch_normalization) model.add(global_max_pooling_1d) model.add(dropout(0.4)) model.add(dense( num_classes, activation = 'softmax' )) ``` ] --- # Saving model for future use <style> .small_code { font-size: 0.85em; } </style> You can dump the keras model as its own file to be loaded up and instantly used later. .pull-left[ __Now__ .small_code[ ```r model %>% save_model_hdf5('model.h5') ``` ] ] .pull-right[ __Later__ .small_code[ ```r model <- load_model_hdf5('model.h5') ``` ] ] .center[ <img src = "https://upload.wikimedia.org/wikipedia/commons/thumb/b/b2/Now-and-Laters.jpg/1200px-Now-and-Laters.jpg" width = 25%/> ] While you technically could do this with any model using `saveRds`, it's nice to have it be a first-class citizen. --- class: center, middle # Final Shiny app --- # Wire in Keras You simply have to plop it in like any other package. ```r library(keras) ``` Make sure to turn off GPU mode! You don't need the speed. ```r use_session_with_seed( 42, disable_gpu = TRUE, disable_parallel_cpu = FALSE ) ``` If running on a server that you use to train this can cause fun crashes due to memory access permissions. --- class: middle # Testing it .pull-left[ __Problem:__ _Need to test the app on a phone, but how to do that?_ ] .pull-right[ __Solution:__ _Use ShinyServer hosted on your own computer! _ ] <div style = 'padding-top: 70px'> </div> By hosting your own server this process is way faster. Simply save your app's files and refresh your page and it's updated! Massively speeds up the iteration process. --- # Making it look pretty _The most important part_ It's very easy to add some CSS to your app. ```r tags$style(HTML(" @import url('//fonts.googleapis.com/css?family=Amatic+SC|Josefin+Slab'); h2, h3, button { font-family: 'Amatic SC', cursive; } h2 { font-size: 50px; } p { font-family: 'Josefin Slab', cursive; font-size: 18px; } button { font-size: 28px; }")) ``` --- .pull-left[ __Before__ <img src = 'ugly_final.png' width = 300px/> ] .pull-right[ __After__ <img src = 'pretty_final.png' width = 300px/> ] --- class:middle # Deploying it Do we want to kill our server? .pull-left[ __Yes__ Leave it on personal server. ] .pull-right[ __No__ Send it to shinyapps.io ] <div style='text-align: center; padding-top: 65px'> <a href = 'https://jhubiostatistics.shinyapps.io/cast_spells/'> jhubiostatistics.shinyapps.io/cast_spells/ </a> </div> --- # Thanks To - [Johns Hopkins Data Science Lab](http://jhudatascience.org/) - Support in developing `shinysense` and all the hosting time - [Lynn Bender](https://www.linkedin.com/in/lynnbender/) - For organizing this conference - [RStudio](https://www.rstudio.com/) - For all the amazing packages that they put out. [github repo](https://github.com/nstrayer/spell_casts)