Source: R/setse_bicomp.R (setse_bicomp.Rd)
Embeds/smooths a feature network using the SETSe algorithm, automatically finding convergence parameters using a grid search. In addition, it breaks the network into bi-connected components, solves each sub-component individually, and re-assembles them into a single network. This is the most reliable method for performing SETSe embeddings and can be substantially quicker on certain network topologies.
setse_bicomp( g, force = "force", distance = "distance", edge_name = "edge_name", k = "k", tstep = 0.02, tol = 0.01, max_iter = 20000, mass = NULL, sparse = FALSE, sample = 100, static_limit = NULL, hyper_iters = 100, hyper_tol = 0.1, hyper_max = 30000, drag_min = 0.01, drag_max = 100, tstep_change = 0.2, verbose = FALSE, noisy_termination = TRUE )
Argument | Description |
---|---|
g | An igraph object. |
force | A character string. This is the node attribute that contains the force the nodes exert on the network. |
distance | A character string. The edge attribute that contains the original/horizontal distance between nodes. |
edge_name | A character string. The edge attribute that contains the names of the edges. |
k | A character string. The edge attribute that contains the spring constant k. For the moment, do not change it. |
tstep | A numeric. The time interval used to iterate through the network dynamics. |
tol | A numeric. The tolerance factor for early stopping. |
max_iter | An integer. The maximum number of iterations before stopping. Larger networks usually need more iterations. |
mass | A numeric. The mass constant of the nodes in normalised networks. Defaults to NULL, in which case mass_adjuster is called to set the mass for each biconnected component. |
sparse | Logical. Whether sparse matrices will be used. This becomes valuable for larger networks. |
sample | Integer. The dynamics are stored only if the iteration number is a multiple of sample. This can greatly reduce the size of the results for large numbers of iterations. max_iter must be a multiple of sample. |
static_limit | Numeric. The maximum value the static force can reach before the algorithm terminates early. This prevents calculation in a diverging system. The value should be set to some multiple, greater than one, of the force in the system. If left as NULL, the static limit is the absolute mean force of the system. |
hyper_iters | Integer. The hyperparameter that determines the number of iterations allowed to find an acceptable convergence value. |
hyper_tol | Numeric. The convergence tolerance when trying to find the minimum value. |
hyper_max | Integer. The maximum number of iterations that SETSe will go through whilst searching for the minimum. |
drag_min | Numeric. A power of ten. The lowest drag value to be used in the search. |
drag_max | Numeric. A power of ten. If the drag exceeds this value, the tstep is reduced. |
tstep_change | Numeric. A value between 0 and 1 that determines how much the time step will be reduced. The default value is 0.2. |
verbose | Logical. Sets whether messages generated during the process are suppressed or not. |
noisy_termination | Logical. Whether to stop the process if the static force does not monotonically decrease. |
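The relationship between sample and max_iter can be checked before starting a long run. The sketch below assumes only the rule stated for sample (dynamics are recorded on iterations that are multiples of sample); it is a standalone check using the default values from the signature, not code taken from the package.

```r
# Default values from the setse_bicomp() signature
max_iter <- 20000
sample <- 100

# max_iter should be an exact multiple of sample
stopifnot(max_iter %% sample == 0)

# Iterations in 1..max_iter that are multiples of sample, i.e. the
# candidate iterations at which the dynamics would be recorded
recorded_iters <- seq(from = sample, to = max_iter, by = sample)
length(recorded_iters)  # 200
```

Raising sample therefore shrinks the recorded dynamics proportionally, which is why it matters for large max_iter values.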
A list containing five data frames:
The node embeddings. Includes all data on the nodes, the forces exerted on them, and their position and dynamics at simulation termination.
The network dynamics, describing several key figures of the network during the convergence process, including the static_force.
memory_df: a data frame recording the iteration history of the convergence of each component.
Time taken: a data frame giving the time taken for the simulation, as well as the number of nodes and edges. Node and edge counts are given because they may differ from the total number of nodes and edges in the network, depending on the method used for convergence. For example, if setse_bicomp is used, some simulations may contain as few as two nodes and one edge.
The edge embeddings. Includes all data on the edges, as well as the strain and tension values.
Embedding the network by solving each bi-connected component and then re-assembling can be faster for larger graphs, graphs with many nodes of degree 2, or networks with a low clustering coefficient. This is because, although SETSe is very efficient, the topology of larger graphs makes them more difficult to converge. Large graphs tend to be made of one very large biconnected component and many very small biconnected components. As the mass of the system is concentrated in the major biconnected component, the smaller ones can be knocked around by minor movements of the largest component. This can lead to long convergence times. By solving all biconnected components separately and then reassembling the block tree at the end, the system can be converged considerably faster.
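The decomposition into biconnected components can be illustrated with igraph's biconnected_components(). This is a minimal sketch on a hypothetical toy graph (a triangle with a short path of degree-2 nodes hanging off it); it shows what the blocks and the articulation points that join them look like, and is not a description of how setse_bicomp() performs the split internally.

```r
library(igraph)

# Hypothetical toy graph: triangle A-B-C plus the path C-D-E of low-degree nodes
g <- graph_from_literal(A - B, B - C, C - A, C - D, D - E)

bc <- biconnected_components(g)

# Number of biconnected components (blocks): the triangle and two single edges
bc$no

# Articulation (cut) points where the blocks are joined in the block tree
as_ids(bc$articulation_points)

# Size of each block; the triangle is the "large" block here
sapply(bc$components, length)
```

Each small block (here the single edges C-D and D-E) can be solved on its own and then re-attached to the rest of the network at its articulation point.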
Setting mass to the absolute system force divided by the total number of nodes often leads to faster convergence. As such, when mass is left at the default of NULL, the mean absolute force value is used.
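The mass rule described above is simple arithmetic. The sketch below uses hypothetical, illustrative node forces and only restates the rule from the text; it is not the implementation of mass_adjuster.

```r
# Hypothetical node forces for a small component (illustrative values only)
force <- c(0.5, -0.25, 0.25, -0.5)

# Absolute system force divided by the total number of nodes
mass <- sum(abs(force)) / length(force)
mass  # 0.375, identical to mean(abs(force))
```

Because the rule is a mean, it scales with the forces in each component, which is why it can be applied per biconnected component.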
Other setse: setse_auto_hd(), setse_auto(), setse_expanded(), setse()
library(rsetse)
library(magrittr) # provides the %>% pipe

set.seed(234) # set the random seed for generating the network
g <- generate_peels_network(type = "E")
embeddings <- g %>%
  prepare_edges(k = 500, distance = 1) %>%
  # prepare the network for a binary embedding
  prepare_categorical_force(., node_names = "name", force_var = "class") %>%
  # embed the network
  setse_bicomp(., force = "class_A")