R/subgraphs_functions.R
calculate_subgraph_structure.Rd
Given a dataframe of edges with association strengths between nodes, this function returns information on every subgraph state reached by adding edges one at a time in descending order of strength.
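The process described above can be sketched in a few lines of base R. This is a minimal illustration and not the package's implementation: the helper name `sketch_subgraph_steps` is hypothetical, the node columns are assumed to be called `a` and `b`, and a simple union-find structure is used here to track components. Edges that share a strength are added together as one step.

```r
# Hypothetical helper (not part of the package): tracks connected components
# while edges are added from strongest to weakest.
# Assumption: the two node columns are named `a` and `b`.
sketch_subgraph_steps <- function(pairs, strength_column = "strength") {
  pairs <- pairs[order(-pairs[[strength_column]]), ]
  nodes <- unique(c(as.character(pairs$a), as.character(pairs$b)))
  parent <- setNames(nodes, nodes)  # union-find parent pointers

  # Follow parent pointers until a node that is its own parent is reached
  find_root <- function(x) {
    while (parent[[x]] != x) x <- parent[[x]]
    x
  }

  seen <- character(0)
  steps <- list()
  for (s in unique(pairs[[strength_column]])) {
    batch <- pairs[pairs[[strength_column]] == s, ]
    for (i in seq_len(nrow(batch))) {
      a <- as.character(batch$a[i])
      b <- as.character(batch$b[i])
      root_a <- find_root(a)
      root_b <- find_root(b)
      if (root_a != root_b) parent[[root_a]] <- root_b  # merge components
      seen <- union(seen, c(a, b))
    }
    # Component sizes among the nodes seen so far
    sizes <- table(vapply(seen, find_root, character(1)))
    steps[[length(steps) + 1]] <- data.frame(
      step = length(steps) + 1,
      n_edges = sum(pairs[[strength_column]] >= s),
      strength = s,
      n_nodes_seen = length(seen),
      n_subgraphs = length(sizes),
      max_size = max(sizes)
    )
  }
  do.call(rbind, steps)
}

# Made-up edges for illustration only
example_pairs <- data.frame(
  a = c("A", "B", "D", "A"),
  b = c("B", "C", "E", "C"),
  strength = c(0.9, 0.8, 0.7, 0.7)
)
res <- sketch_subgraph_steps(example_pairs)
res
```

The two edges with strength 0.7 enter in the same step, so the sketch reports three steps for four edges.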
calculate_subgraph_structure( association_pairs, strength_column = "strength", return_subgraph_membership = FALSE )
Argument | Description
---|---
association_pairs | dataframe with columns
strength_column | Name of the column that encodes the strength of the association for each pair
return_subgraph_membership | Should a column with a named integer vector of each node's subgraph membership be returned? This can be useful for comparing the consistency of structure across different networks, but comes at the cost of speed and memory usage.
Dataframe with the following columns for each unique subgraph state.
Column | Description
---|---
step | Integer step, i.e. the number of unique edge strengths in the network at this state
n_edges | How many edges have been added thus far
strength | The lowest strength of the edge(s) added
n_nodes_seen | How many unique nodes/variables are currently in the network
n_subgraphs | How many isolated subgraphs/components of size 2 or larger are in the current network
n_triples | How many isolated subgraphs of size 3 or larger are in the current network
max_size | Size, in number of nodes, of the largest current subgraph
rel_max_size | Proportion of all seen nodes (n_nodes_seen) that the largest subgraph includes. Large values indicate the presence of a giant component.
avg_size | Average size of subgraphs
avg_density | Average density of subgraphs. Ranges from just above 0 to 1, where a density of 1 is a fully connected subgraph.
subgraphs | List column of summary stats for each subgraph at a given step. See the subgraph list column section for more info.
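As a worked illustration of the density and relative-size columns, the snippet below assumes the standard density of an undirected simple graph (edges present divided by edges possible). The helper name `subgraph_density` is hypothetical, and the package may compute these quantities differently.

```r
# Hypothetical helper: density of an undirected subgraph, assuming the
# standard definition edges_present / edges_possible.
subgraph_density <- function(n_edges, n_nodes) {
  n_edges / (n_nodes * (n_nodes - 1) / 2)
}

triangle_density <- subgraph_density(n_edges = 3, n_nodes = 3)  # fully connected
path_density <- subgraph_density(n_edges = 2, n_nodes = 3)      # three nodes in a chain

# Relative size of the largest subgraph: max_size / n_nodes_seen.
# With max_size = 3 and n_nodes_seen = 5 this is 3 / 5.
rel_size <- 3 / 5

c(triangle = triangle_density, path = path_density, rel_size = rel_size)
```

A single-edge subgraph of two nodes always has density 1 under this definition, which is why the scale never reaches 0.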
subgraph list column
The subgraphs list column in the results contains information on the subgraphs present at each step. Each entry is a list with the following format, and it can easily be turned into a dataframe/tibble with dplyr::as_tibble or as.data.frame.
Column | Description
---|---
id | Integer ID for subgraph. Can be used to track subgraph evolution over steps.
size | How many variables/nodes subgraph has
density | Density of subgraph. Scale from >0 - 1. Where a density of 1 is a fully-connected subgraph.
strength | How many unique nodes/variables are currently in network
first_edge | 0-based integer index of first edge that made up subgraph. Used internally to match subgraphs in interactive visualizations with these results.
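The conversion mentioned above might look like the following sketch. It assumes each entry of the list column is a named list of equal-length vectors, one per column in the table; the values are made up for illustration and are not output from the package.

```r
# Illustrative stand-in for one entry of the subgraphs list column.
# Assumption: a named list of equal-length vectors; all values are invented.
subgraphs_at_step <- list(
  id = c(1L, 2L),
  size = c(3L, 2L),
  density = c(2 / 3, 1),
  strength = c(0.08, 0.075),
  first_edge = c(0L, 2L)
)

# Base R conversion; dplyr::as_tibble(subgraphs_at_step) would give a tibble
subgraphs_df <- as.data.frame(subgraphs_at_step)

# Largest subgraph first
subgraphs_df[order(-subgraphs_df$size), ]
```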
virus_associations <- dplyr::arrange(virus_net, dplyr::desc(strength))
calculate_subgraph_structure(head(virus_associations, 1000))
#> # A tibble: 327 x 11
#>     step n_edges strength n_nodes_seen n_subgraphs max_size rel_max_size
#>    <int>   <int>    <dbl>        <int>       <int>    <int>        <dbl>
#>  1     1       1   0.0916            2           1        2        1
#>  2     2       2   0.0836            3           1        3        1
#>  3     3       3   0.0750            5           2        3        0.6
#>  4     4       4   0.0729            6           2        3        0.5
#>  5     5       5   0.0728            6           2        3        0.5
#>  6     6       6   0.0704            7           2        4        0.571
#>  7     7       7   0.0689            8           2        4        0.5
#>  8     8       8   0.0676            9           2        5        0.556
#>  9     9      10   0.0651           12           3        5        0.417
#> 10    10      11   0.0623           14           4        5        0.357
#> # … with 317 more rows, and 4 more variables: avg_size <dbl>,
#> #   avg_density <dbl>, n_triples <int>, subgraphs <list>

# We can also return each node's membership at every step,
# although it will slow things down a bit
calculate_subgraph_structure(head(virus_associations, 1000), return_subgraph_membership = TRUE)
#> Warning: first element used of 'length.out' argument
#> Error in seq_len(nrow(res$subgraph_membership)): argument must be coercible to non-negative integer