class: center, middle, inverse, title-slide

# Efficiently inferring community structure in bipartite networks

## Daniel B. Larremore, Aaron Clauset, and Abigail Z. Jacobs

### Presented by: Nick Strayer

### 2019-03-13

---

## Presentation Layout

- Network Refresher
- What are they trying to do?
- Why are they trying to do it?
- The "old" ways
- The SBM
- Degree Correction
- The BiSBM
- Results
- My thoughts

![:space 8]

You can find these slides online, and the raw materials on GitHub.

---
class: center, middle

# Network Refresher

---
class: middle

---
class: middle

---
class: middle

---
class: middle

---

# What are they trying to do?

![:space 7]

> This bipartite stochastic block model yields a projection-free and statistically principled method for community detection that makes clear assumptions and parameter choices and yields interpretable results.

![:space 10]

Build a model for detecting communities of nodes that...

- doesn't need to project the data to a unipartite structure
- is statistically 'principled'
- and is interpretable

---

# Why are they trying to do it?

No projections:

> Using projections creates both practical and principled issues... tend to inflate measures such as assortativity and the clustering coefficient... loss of information... projection of a highly structured bipartite network can appear unstructured

![:space 5]

Statistically principled and interpretable:

> [Non-SBM attempts] express implicit modeling restrictions and assumptions in their output... Stochastic block models have the advantage of explicitly stating the underlying assumptions, which improves the interpretability of the results.

---

# The "old" ways

![:space 15]

- Project the bipartite network into a unipartite one and cluster on that

![:space 5]

- Cluster on the bipartite network using matrix decomposition methods

![:space 5]

- Use complex, black-box optimization methods

---

# The SBM

- Models the community structure of nodes by collapsing them into blocks
- Defines a probability distribution over all possible community forms
- Therefore it is a 'generative' model

---
class: middle

## SBM formulation

---

# Degree Correction

![:space 1]

### Why

Looking only at the average number of connections between groups makes the model cluster high-degree nodes with each other. A node that is noisy but popular can therefore distort the inferred communities.

![:space 6]

### How

Introduce a new parameter `\(\theta_i\)` which controls the expected degree of each node, changing the `\(\lambda\)` of the Poisson from `\(\omega_{i,j}\)` to `\(\theta_i\theta_j\omega_{i,j}\)`.

---

# The BiSBM

### How

By not letting the SBM place nodes of the same type in a cluster together, we force it to take a bipartite form.

### What changes

`\(k\)` groups `\(\to\)` `\(k_a\)` and `\(k_b\)` groups.

Full `\(\omega\)` matrix `\(\to\)` block matrix with upper-right partition `\(B\)`: `\((k_a \times k_b)\)`

### What doesn't change

Forcing this structure amounts to an infinitely strong prior that nodes of different types never share a group, so the posterior keeps the same form as the ordinary SBM's.

---

## BiSBM formulation
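
---

## BiSBM likelihood, sketched in code

With degree correction, the log-likelihood reduces to a sum over block edge counts. Below is a minimal Python sketch of that sum; the function name and data layout are my own for illustration, not the authors' implementation, and it assumes an unweighted edge list.

```python
import numpy as np

def bisbm_dc_log_likelihood(edges, labels_a, labels_b, k_a, k_b):
    """Degree-corrected bipartite SBM log-likelihood, up to constants.

    edges    : iterable of (i, j) pairs; i is a type-a node, j a type-b node
    labels_a : group of each type-a node, values in 0..k_a-1
    labels_b : group of each type-b node, values in 0..k_b-1
    """
    # m[r, s] counts edges between type-a group r and type-b group s
    m = np.zeros((k_a, k_b))
    for i, j in edges:
        m[labels_a[i], labels_b[j]] += 1

    # In a bipartite network a group's total degree is a row/column sum of m
    kappa_a = m.sum(axis=1)
    kappa_b = m.sum(axis=0)

    # L = sum_{r,s} m_rs * log(m_rs / (kappa_r * kappa_s)), skipping empty blocks
    ll = 0.0
    for r in range(k_a):
        for s in range(k_b):
            if m[r, s] > 0:
                ll += m[r, s] * np.log(m[r, s] / (kappa_a[r] * kappa_b[s]))
    return ll
```

Maximizing this score over label assignments is what the fitting loop on the next slide does.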
---

## How it's fit

![:space 3]

1. Assign a random group to every node
2. Take node `\(i\)` and move it to another group of the same type, recording the posterior probability
3. Repeat step 2 for every available group of the same type
4. Repeat steps 2 and 3 for every node in the network
5. Choose the move that yielded the best posterior probability (this may be a decline) and change group assignments accordingly
6. Repeat steps 2 through 5 until convergence

(A code sketch of this loop is in the appendix slide at the end.)

![:space 8]

> The algorithm should be run many times and the highest score from among these independent replicates selected.

---

# Results

### What

They tested the model on simulated data and on well-characterized real-world networks.

### How

Comparisons are made between the BiSBM and the plain SBM run on unipartite projections.

### Why

The way existing bipartite methods work meant they couldn't be compared directly, hence the projections.

---

## How they measured performance for simulated data

![:space 3]

- Used mutual information between truth and model results to measure the 'degree to which knowledge of one partition lets us predict the other partition'.

![:space 3]

- To remove the influence of group size they used a normalized version of mutual information.
  - `\(I_{\text{norm}}(X,Y) = 1 \text{ iff } X = Y\)`
  - `\(I_{\text{norm}}(X,Y) = 0\)` when `\(X\)` and `\(Y\)` are completely uncorrelated.

![:space 3]

- Compared the bipartite cluster choice for one type to the unipartite projection onto that type.

---
class: middle

---

## Easy simulation

- Non-overlapping structure with the same number of clusters for both types
- Uniform degree distribution among nodes

![:space 5]

---

## Hard simulation

- Overlapping structure with differing numbers of clusters for the two types
- Wide degree distribution among nodes

![:space 5]

---

## Real data

![:space 20]

- No 'truth' to compare to, so well-characterized networks are used
- Used three networks that differed greatly in size
- Results 'make sense'

---

## Southern Women Cohort

- __18__ women attended __14__ events
- The `hello_world` of bipartite networks
- Exactly matched the expert-derived partitions

![:space 7]

---

## Malaria genes

- __297__ genes and __806__ genetic substrings
- Wide degree distributions
- Clusters seemed biologically meaningful when degree correction was used

![:space 7]

---

## IMDB

- __53,158__ actors and __39,786__ movies
- Partitions correlated almost directly with language and genre

![:space 7]

---

# My thoughts

![:space 3]

- Many of the limitations in this work have since been fixed
  - The ability to let the model choose the number of partitions
  - Sampling inefficiency
  - Hierarchical structure
- The difficulty with networks (and all unsupervised learning, really) is determining how well they work in the real world
- Thinking about high-dimensional data like this gets around a lot of the problems we have with classic statistics methods
  - Curse of dimensionality
  - Interpretability...
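
---

## Appendix: the fitting loop in code

A minimal Python sketch of the greedy loop from the "How it's fit" slide, reusing the `bisbm_dc_log_likelihood` helper from the earlier code slide. It is deliberately simplified: it stops at the first non-improving move, and it recomputes the full likelihood per candidate move where the paper updates block counts incrementally. Treat it as an illustration, not the authors' algorithm.

```python
import random

def fit_bisbm(edges, n_a, n_b, k_a, k_b):
    """Greedy single-move search over bipartite group assignments."""
    # Step 1: random starting assignment for each node type
    labels_a = [random.randrange(k_a) for _ in range(n_a)]
    labels_b = [random.randrange(k_b) for _ in range(n_b)]

    def score():
        return bisbm_dc_log_likelihood(edges, labels_a, labels_b, k_a, k_b)

    best = score()
    while True:
        # Steps 2-4: try moving every node to every other group of its type
        best_move, best_gain = None, float("-inf")
        for labels, k, n in ((labels_a, k_a, n_a), (labels_b, k_b, n_b)):
            for i in range(n):
                old = labels[i]
                for g in range(k):
                    if g == old:
                        continue
                    labels[i] = g
                    gain = score() - best
                    if gain > best_gain:
                        best_gain, best_move = gain, (labels, i, g)
                labels[i] = old  # undo the trial move
        # Steps 5-6, simplified: stop on the first non-improving sweep.
        # The paper also accepts declining moves within a sweep, keeps the
        # best state seen, and reruns from many random initializations.
        if best_gain <= 0:
            break
        labels, i, g = best_move
        labels[i] = g
        best += best_gain
    return labels_a, labels_b, best
```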