class: center, middle, inverse, title-slide

# Efficiently inferring community structure in bipartite networks

## Daniel B. Larremore, Aaron Clauset, and Abigail Z. Jacobs

### Presented by: Nick Strayer

### 2019-03-13

---

## Presentation Layout

- Network Refresher
- What are they trying to do?
- Why are they trying to do it?
- The "old" ways
- The SBM
- Degree Correction
- The BiSBM
- Results
- My thoughts

![:space 8]

You can find these slides online, and the raw materials on GitHub.

---
class: center, middle

# Network Refresher

---
class: middle

---
class: middle

---
class: middle

---
class: middle

---

# What are they trying to do?

![:space 7]

> This bipartite stochastic block model yields a projection-free and statistically principled method for community detection that makes clear assumptions and parameter choices and yields interpretable results.

![:space 10]

Build a model for detecting communities of nodes that...

- doesn't need to project the data to a unipartite structure
- is statistically 'principled'
- and is interpretable

---

# Why are they trying to do it?

No projections:

> Using projections creates both practical and principled issues... tend to inflate measures such as assortativity and the clustering coefficient... loss of information... projection of a highly structured bipartite network can appear unstructured

![:space 5]

Statistically principled and interpretable:

> [Non-SBM attempts] express implicit modeling restrictions and assumptions in their output... Stochastic block models have the advantage of explicitly stating the underlying assumptions, which improves the interpretability of the results.

---

# The "old" ways

![:space 15]

- Project the bipartite network into a unipartite one and cluster on that

![:space 5]

- Cluster on the bipartite network using matrix decomposition methods

![:space 5]

- Use complex, black-box optimization methods

---

# The SBM

- Models the community structure of nodes by collapsing them into blocks
- Defines a probability distribution over all possible community forms
- Therefore it is a 'generative' model

---
class: middle

## SBM formulation

---

# Degree Correction

![:space 1]

### Why

Looking only at the average number of connections between groups makes the model cluster high-degree nodes with each other. A node that is noisy but popular can therefore distort the inferred communities.

![:space 6]

### How

Introduce a new parameter `\(\theta_i\)` which controls the expected degree of each node, changing the `\(\lambda\)` of the Poisson from `\(\omega_{i,j}\)` to `\(\theta_i\theta_j\omega_{i,j}\)`.

---

# The BiSBM

### How

By not letting the SBM place nodes of the same type in a cluster together, we force it to take a bipartite form.

### What changes

`\(k\)` groups `\(\to\)` `\(k_a\)` and `\(k_b\)` groups.

Full `\(\omega\)` matrix `\(\to\)` block matrix with upper-right partition `\(B\)`: `\((k_a \times k_b)\)`

### What doesn't change

Forcing this structure amounts to an infinitely strong prior that nodes of different types never share a group, so the posterior keeps the same form as the ordinary SBM's.

---

## BiSBM formulation
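
---

## BiSBM likelihood, sketched in code

With degree correction, the log-likelihood reduces to a sum over block edge counts. Below is a minimal Python sketch of that sum; the function name and data layout are my own for illustration, not the authors' implementation, and it assumes an unweighted edge list.

```python
import numpy as np

def bisbm_dc_log_likelihood(edges, labels_a, labels_b, k_a, k_b):
    """Degree-corrected bipartite SBM log-likelihood, up to constants.

    edges    : iterable of (i, j) pairs; i is a type-a node, j a type-b node
    labels_a : group of each type-a node, values in 0..k_a-1
    labels_b : group of each type-b node, values in 0..k_b-1
    """
    # m[r, s] counts edges between type-a group r and type-b group s
    m = np.zeros((k_a, k_b))
    for i, j in edges:
        m[labels_a[i], labels_b[j]] += 1

    # In a bipartite network a group's total degree is a row/column sum of m
    kappa_a = m.sum(axis=1)
    kappa_b = m.sum(axis=0)

    # L = sum_{r,s} m_rs * log(m_rs / (kappa_r * kappa_s)), skipping empty blocks
    ll = 0.0
    for r in range(k_a):
        for s in range(k_b):
            if m[r, s] > 0:
                ll += m[r, s] * np.log(m[r, s] / (kappa_a[r] * kappa_b[s]))
    return ll
```

Maximizing this score over label assignments is what the fitting loop on the next slide does.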
---

## How it's fit

![:space 3]

1. Assign a random group to every node
2. Take node `\(i\)` and move it to another group of the same type, recording the posterior probability
3. Repeat step 2 for every available group of the same type
4. Repeat steps 2 and 3 for every node in the network
5. Choose the move that yielded the best posterior probability (this may be a decline) and change group assignments accordingly
6. Repeat steps 2 through 5 until convergence

(A code sketch of this loop is in the appendix slide at the end.)

![:space 8]

> The algorithm should be run many times and the highest score from among these independent replicates selected.

---

# Results

### What

They tested the model on simulated data and on well-characterized real-world networks.

### How

Comparisons are made between the BiSBM and the plain SBM run on unipartite projections.

### Why

The way existing bipartite methods work meant they couldn't be compared directly, hence the projections.

---

## How they measured performance for simulated data

![:space 3]

- Used mutual information between truth and model results to measure the 'degree to which knowledge of one partition lets us predict the other partition'.

![:space 3]

- To remove the influence of group size they used a normalized version of mutual information.
  - `\(I_{\text{norm}}(X,Y) = 1 \text{ iff } X = Y\)`
  - `\(I_{\text{norm}}(X,Y) = 0\)` when `\(X\)` and `\(Y\)` are completely uncorrelated.

![:space 3]

- Compared the bipartite cluster choice for one type to the unipartite projection onto that type.

---
class: middle

---

## Easy simulation

- Non-overlapping structure with the same number of clusters for both types
- Uniform degree distribution among nodes

![:space 5]

---

## Hard simulation

- Overlapping structure with differing numbers of clusters for the two types
- Wide degree distribution among nodes

![:space 5]

---

## Real data

![:space 20]

- No 'truth' to compare to, so well-characterized networks are used
- Used three networks that differed greatly in size
- Results 'make sense'

---

## Southern Women Cohort

- __18__ women attended __14__ events
- The `hello_world` of bipartite networks
- Exactly matched the expert-derived partitions

![:space 7]

---

## Malaria genes

- __297__ genes and __806__ genetic substrings
- Wide degree distributions
- Clusters seemed biologically meaningful when degree correction was used

![:space 7]

---

## IMDB

- __53,158__ actors and __39,786__ movies
- Partitions correlated almost directly with language and genre

![:space 7]

---

# My thoughts

![:space 3]

- Many of the limitations in this work have since been fixed
  - The ability to let the model choose the number of partitions
  - Sampling inefficiency
  - Hierarchical structure
- The difficulty with networks (and all unsupervised learning, really) is determining how well they work in the real world
- Thinking about high-dimensional data like this gets around a lot of the problems we have with classic statistics methods
  - Curse of dimensionality
  - Interpretability...
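
---

## Appendix: the fitting loop in code

A minimal Python sketch of the greedy loop from the "How it's fit" slide, reusing the `bisbm_dc_log_likelihood` helper from the earlier code slide. It is deliberately simplified: it stops at the first non-improving move, and it recomputes the full likelihood per candidate move where the paper updates block counts incrementally. Treat it as an illustration, not the authors' algorithm.

```python
import random

def fit_bisbm(edges, n_a, n_b, k_a, k_b):
    """Greedy single-move search over bipartite group assignments."""
    # Step 1: random starting assignment for each node type
    labels_a = [random.randrange(k_a) for _ in range(n_a)]
    labels_b = [random.randrange(k_b) for _ in range(n_b)]

    def score():
        return bisbm_dc_log_likelihood(edges, labels_a, labels_b, k_a, k_b)

    best = score()
    while True:
        # Steps 2-4: try moving every node to every other group of its type
        best_move, best_gain = None, float("-inf")
        for labels, k, n in ((labels_a, k_a, n_a), (labels_b, k_b, n_b)):
            for i in range(n):
                old = labels[i]
                for g in range(k):
                    if g == old:
                        continue
                    labels[i] = g
                    gain = score() - best
                    if gain > best_gain:
                        best_gain, best_move = gain, (labels, i, g)
                labels[i] = old  # undo the trial move
        # Steps 5-6, simplified: stop on the first non-improving sweep.
        # The paper also accepts declining moves within a sweep, keeps the
        # best state seen, and reruns from many random initializations.
        if best_gain <= 0:
            break
        labels, i, g = best_move
        labels[i] = g
        best += best_gain
    return labels_a, labels_b, best
```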