Given a dataframe of edges with strengths between nodes, this function returns information on every subgraph state reached by adding edges in descending order of strength, with one state per unique strength value.

calculate_subgraph_structure(
  association_pairs,
  strength_column = "strength",
  return_subgraph_membership = FALSE
)

Arguments

association_pairs

Dataframe with columns a and b holding the ids of the variables or nodes, and a column strength holding a numeric indicator of the strength of association (higher = stronger).
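As a minimal illustration of the expected input shape, the following builds a small edge list in base R. The node ids and strength values are made up for this sketch; only the column names a, b, and strength come from the description above.

```r
# Hypothetical toy edge list: one row per pair of nodes.
toy_pairs <- data.frame(
  a = c("A", "A", "B", "D", "C"),
  b = c("B", "C", "C", "E", "D"),
  strength = c(0.9, 0.8, 0.7, 0.6, 0.2),
  stringsAsFactors = FALSE
)

# Basic checks on the shape described above.
stopifnot(
  all(c("a", "b", "strength") %in% names(toy_pairs)),
  is.numeric(toy_pairs$strength)
)
```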

strength_column

Name of the column that encodes the strength of association for each pair.

return_subgraph_membership

Should a column with a named integer vector of each node's subgraph membership be returned? This can be useful for comparing consistency of structure across different networks etc. but comes at the cost of speed and memory usage.

Value

Dataframe with the following columns for each unique subgraph state.

Column        Description
step          Integer step, i.e. the number of unique edge strengths in the network at this state
n_edges       How many edges have been added thus far
strength      The lowest strength of the edge(s) added
n_nodes_seen  How many unique nodes/variables are currently in the network
n_subgraphs   How many isolated subgraphs/components of size 2 or larger are in the current network
n_triples     How many isolated subgraphs of size 3 or larger are in the current network
max_size      Size, in number of nodes, of the largest current subgraph
rel_max_size  Proportion of all seen nodes (n_nodes_seen) that the largest subgraph includes. Large values indicate the presence of a giant component.
avg_size      Average size of subgraphs
avg_density   Average density of subgraphs. Greater than 0 and at most 1, where a density of 1 is a fully connected subgraph.
subgraphs     List column of summary stats for each subgraph at a given step. See the subgraph list column section for more info.
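To make the summary columns concrete, here is a rough base-R sketch that recomputes several of them on a toy edge list. It is not the package's implementation: the data, the helper names component_labels and summarise_step, and the density definition (edges divided by n(n - 1)/2 for a subgraph of n nodes) are all assumptions made for illustration.

```r
# Toy edges (hypothetical data), sorted by descending strength.
edges <- data.frame(
  a = c("A", "A", "B", "D", "C"),
  b = c("B", "C", "C", "E", "D"),
  strength = c(0.9, 0.8, 0.7, 0.6, 0.2),
  stringsAsFactors = FALSE
)

# Label connected components by merging labels across each edge in turn.
component_labels <- function(edge_df) {
  nodes <- unique(c(edge_df$a, edge_df$b))
  label <- setNames(seq_along(nodes), nodes)
  for (i in seq_len(nrow(edge_df))) {
    la <- label[[edge_df$a[i]]]
    lb <- label[[edge_df$b[i]]]
    if (la != lb) label[label == lb] <- la
  }
  label
}

# Summarise the network formed by the edges added so far.
summarise_step <- function(edge_df) {
  label <- component_labels(edge_df)
  sizes <- table(label)  # nodes per subgraph
  # Edges per subgraph, counted via the component of each edge's first node.
  edge_counts <- table(factor(label[edge_df$a], levels = names(sizes)))
  n <- as.numeric(sizes)
  density <- as.numeric(edge_counts) / (n * (n - 1) / 2)  # assumed definition
  data.frame(
    n_edges = nrow(edge_df),
    strength = min(edge_df$strength),
    n_nodes_seen = length(label),
    n_subgraphs = length(sizes),
    n_triples = sum(sizes >= 3),
    max_size = max(sizes),
    rel_max_size = max(sizes) / length(label),
    avg_size = mean(n),
    avg_density = mean(density)
  )
}

# One state per unique strength, from strongest to weakest.
steps <- sort(unique(edges$strength), decreasing = TRUE)
states <- do.call(rbind, lapply(seq_along(steps), function(i) {
  cbind(step = i, summarise_step(edges[edges$strength >= steps[i], ]))
}))
print(states)
```

In this toy network the final, weakest edge joins the two existing subgraphs, so the last state has a single subgraph containing every seen node and a lower average density than the states before it.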

subgraph list column

The subgraphs list column in the results describes the subgraphs present at each step. Each entry is a list with the following fields, and can easily be converted to a dataframe/tibble with dplyr::as_tibble() or as.data.frame().

Column      Description
id          Integer ID for the subgraph. Can be used to track subgraph evolution over steps.
size        How many variables/nodes the subgraph has
density     Density of the subgraph. Greater than 0 and at most 1, where a density of 1 is a fully connected subgraph.
strength    Strength value for the subgraph (presumably a summary of the strength of its edges)
first_edge  0-based integer index of the first edge that made up the subgraph. Used internally to match subgraphs in interactive visualizations with these results.
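As a small sketch of working with one entry of the list column, the following converts a stand-in list to a dataframe and picks out its largest subgraph. The values are invented, and the assumption that each entry is a list of equal-length vectors named as in the table above is inferred from the description rather than confirmed.

```r
# Hypothetical stand-in for one element of the subgraphs list column.
one_step <- list(
  id = c(1L, 2L),
  size = c(3L, 2L),
  density = c(1, 1),
  strength = c(0.8, 0.6),
  first_edge = c(0L, 3L)
)

# Convert to a dataframe: one row per subgraph.
subgraph_df <- as.data.frame(one_step)

# Pick out the largest subgraph at this step.
largest <- subgraph_df[which.max(subgraph_df$size), ]
print(largest)
```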

Examples

virus_associations <- dplyr::arrange(virus_net, dplyr::desc(strength))
calculate_subgraph_structure(head(virus_associations, 1000))
#> # A tibble: 327 x 11
#>     step n_edges strength n_nodes_seen n_subgraphs max_size rel_max_size
#>    <int>   <int>    <dbl>        <int>       <int>    <int>        <dbl>
#>  1     1       1   0.0916            2           1        2        1
#>  2     2       2   0.0836            3           1        3        1
#>  3     3       3   0.0750            5           2        3        0.6
#>  4     4       4   0.0729            6           2        3        0.5
#>  5     5       5   0.0728            6           2        3        0.5
#>  6     6       6   0.0704            7           2        4        0.571
#>  7     7       7   0.0689            8           2        4        0.5
#>  8     8       8   0.0676            9           2        5        0.556
#>  9     9      10   0.0651           12           3        5        0.417
#> 10    10      11   0.0623           14           4        5        0.357
#> # … with 317 more rows, and 4 more variables: avg_size <dbl>,
#> #   avg_density <dbl>, n_triples <int>, subgraphs <list>
# We can also return each node's membership at every step,
# although it will slow things down a bit
calculate_subgraph_structure(head(virus_associations, 1000), return_subgraph_membership = TRUE)
#> Warning: first element used of 'length.out' argument
#> Error in seq_len(nrow(res$subgraph_membership)): argument must be coercible to non-negative integer