Title: | Network Analysis and Community Detection |
---|---|
Description: | Provides tools for network data analysis and community detection, including multiple methods for fitting, model selection and goodness-of-fit testing in degree-corrected stochastic block models. Most of the computations are fast and scalable for sparse networks, especially for Poisson versions of the models. Implements the following: Amini, Chen, Bickel and Levina (2013) <doi:10.1214/13-AOS1138> Bickel and Sarkar (2015) <doi:10.1111/rssb.12117> Lei (2016) <doi:10.1214/15-AOS1370> Wang and Bickel (2017) <doi:10.1214/16-AOS1457> Zhang and Amini (2020) <arXiv:2012.15047> Le and Levina (2022) <doi:10.1214/21-EJS1971>. |
Authors: | Arash A. Amini [aut, cre] , Linfan Zhang [aut] |
Maintainer: | Arash A. Amini <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.0.0 |
Built: | 2024-11-16 05:38:01 UTC |
Source: | https://github.com/aaamini/nett |
The adjusted spectral goodness-of-fit test based on Poisson DCSBM.
The test is a natural extension of Lei's work on testing goodness-of-fit for SBM. The residual matrix is computed from A and its estimated expectation under the fitted DCSBM. To speed up computation, the residual matrix uses the Poisson variance instead. Specifically, each entry of A is centered at its estimated expectation and scaled using the corresponding Poisson variance, where the estimates of theta and B are computed using estim_dcsbm if not provided.
Adjusted spectral test
adj_spec_test(A, K, z = NULL, DC = TRUE, theta = NULL, B = NULL, cluster_fct = spec_clust, ...)
A |
adjacency matrix. |
K |
number of communities. |
z |
label vector for rows of the adjacency matrix. If not given, it is estimated by cluster_fct (spectral clustering by default). |
DC |
whether or not to include degree correction in the parameter estimation. |
theta |
the propensity parameter, if given directly. |
B |
the connectivity matrix, if given directly. |
cluster_fct |
community detection function used to get z when it is not given, by default spec_clust. |
... |
additional arguments for cluster_fct. |
Adjusted spectral test statistics.
Details of the modification can be found in Adjusted chi-square test for degree-corrected block models, Linfan Zhang, Arash A. Amini, arXiv preprint arXiv:2012.15047, 2020.
The original spectral test is from A goodness-of-fit test for stochastic block models, Lei, Jing, Ann. Statist. 44 (2016), no. 1, 401–424. doi:10.1214/15-AOS1370.
Estimate the number of communities under block models by using the spectral properties of the network's Bethe-Hessian matrix with moment correction.
bethe_hessian_select(A, Kmax)
A |
adjacency matrix. |
Kmax |
the maximum number of communities to check. |
A list of results
K |
estimated number of communities |
rho |
eigenvalues of the Bethe-Hessian matrix |
Estimating the number of communities in networks by spectral methods, Can M. Le, Elizaveta Levina, arXiv preprint arXiv:1507.00827, 2015
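As a rough illustration of the idea behind this selector, the sketch below builds a Bethe-Hessian matrix in base R and counts its negative eigenvalues. The helper name bethe_hessian_sketch, the choice r = sqrt(average degree) and the dense eigen-decomposition are assumptions made for illustration; the moment correction used by the package is not reproduced here.

```r
# Hypothetical helper for illustration; not the package implementation.
bethe_hessian_sketch <- function(A, Kmax) {
  d <- rowSums(A)
  r <- sqrt(mean(d))  # assumed choice of r; no moment correction here
  H <- (r^2 - 1) * diag(nrow(A)) - r * A + diag(d)
  ev <- eigen(H, symmetric = TRUE, only.values = TRUE)$values
  rho <- sort(ev)[seq_len(Kmax)]  # the Kmax smallest eigenvalues
  list(K = sum(rho < 0), rho = rho)
}

# Toy network: two disjoint cliques of 10 nodes each
clique <- matrix(1, 10, 10) - diag(10)
A <- rbind(cbind(clique, matrix(0, 10, 10)),
           cbind(matrix(0, 10, 10), clique))
out <- bethe_hessian_sketch(A, Kmax = 5)
```

For this toy network the sketch finds two negative eigenvalues, matching the two cliques.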
Compute the block sum of an adjacency matrix given a label vector.
compute_block_sums(A, z)
A |
adjacency matrix. |
z |
label vector. |
A K x L matrix with (k,l)-th element equal to the sum of A[i, j] over all pairs (i, j) with z[i] = k and z[j] = l
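A block sum can be written as a product with a one-hot membership matrix. The following base R sketch shows this; block_sums_sketch is a hypothetical name, and the dense matrix product is an assumption (the package is designed for sparse matrices).

```r
# Hypothetical helper for illustration; not the package implementation.
block_sums_sketch <- function(A, z) {
  K <- max(z)
  # n x K one-hot membership matrix: Z[i, k] = 1 if z[i] == k
  Z <- outer(z, seq_len(K), "==") * 1
  # (k, l) entry sums A[i, j] over i in community k and j in community l
  t(Z) %*% A %*% Z
}

# Path graph 1 - 2 - 3 - 4 with communities {1, 2} and {3, 4}
A <- matrix(0, 4, 4)
A[cbind(c(1, 2, 3), c(2, 3, 4))] <- 1
A <- A + t(A)
z <- c(1, 1, 2, 2)
S <- block_sums_sketch(A, z)
```

Each within-community edge is counted twice on the diagonal of S because A is symmetric.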
Compute confusion matrix
compute_confusion_matrix(z, y, K = NULL)
z |
a label vector |
y |
a label vector |
K |
number of labels in both z and y |
A K x K confusion matrix between z and y
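A confusion matrix between two label vectors is a cross-tabulation of the labels. The sketch below uses base R's table; confusion_sketch is a hypothetical name, and labels are assumed to be integers in 1, ..., K.

```r
# Hypothetical helper for illustration; not the package implementation.
confusion_sketch <- function(z, y, K = NULL) {
  if (is.null(K)) K <- max(z, y)
  # Fixed factor levels keep unused labels as all-zero rows or columns
  table(factor(z, levels = seq_len(K)), factor(y, levels = seq_len(K)))
}

z <- c(1, 1, 2, 2, 3)
y <- c(1, 2, 2, 2, 3)
M <- confusion_sketch(z, y)
```

Entry M[k, l] counts the nodes labeled k by z and l by y.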
Compute the normalized mutual information (NMI) between two label vectors with the same number of clusters
compute_mutual_info(z, y)
z |
a label vector |
y |
a label vector |
NMI between z and y
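NMI has several common normalizations. The sketch below divides the mutual information by the geometric mean of the two label entropies, which is an assumption; the package may normalize differently. The name nmi_sketch is hypothetical.

```r
# Hypothetical helper for illustration; the normalization is an assumption.
nmi_sketch <- function(z, y) {
  P <- table(z, y) / length(z)        # joint distribution of the two labelings
  pz <- rowSums(P)                    # marginal distribution of z
  py <- colSums(P)                    # marginal distribution of y
  nz <- P > 0                         # skip empty cells to avoid log(0)
  mi <- sum(P[nz] * log(P[nz] / outer(pz, py)[nz]))
  hz <- -sum(pz * log(pz))
  hy <- -sum(py * log(py))
  mi / sqrt(hz * hy)
}

same  <- nmi_sketch(c(1, 1, 2, 2), c(2, 2, 1, 1))  # same partition, relabeled
indep <- nmi_sketch(c(1, 1, 2, 2), c(1, 2, 1, 2))  # independent partitions
```

Relabeling the clusters does not change the value; if either vector has a single cluster, its entropy is zero and this sketch returns NaN.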
Estimate the parameters of a DCSBM (connectivity matrix B and node propensities theta) from an adjacency matrix and a label vector.
estim_dcsbm(A, z)
A |
adjacency matrix. |
z |
label vector. |
The estimates are built from two quantities: the sum of the elements of A in block (k, l) specified by the labels z, and the number of nodes in community k according to z.
A list of results
B |
estimated connectivity matrix. |
theta |
estimated node propensity parameter. |
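One plausible form of such plug-in estimates is sketched below in base R. The specific formulas (block sum divided by the product of community sizes for B, and degrees rescaled to average 1 within each community for theta) are assumptions for illustration and may differ from what estim_dcsbm computes; estim_dcsbm_sketch is a hypothetical name.

```r
# Hypothetical helper for illustration; the estimator form is an assumption.
estim_dcsbm_sketch <- function(A, z) {
  K <- max(z)
  Z <- outer(z, seq_len(K), "==") * 1   # n x K membership matrix
  N <- t(Z) %*% A %*% Z                 # block sums
  ns <- colSums(Z)                      # community sizes
  B <- N / outer(ns, ns)                # assumed: block sum / (n_k * n_l)
  d <- rowSums(A)                       # node degrees
  clust_deg <- as.vector(t(Z) %*% d)    # total degree of each community
  theta <- ns[z] * d / clust_deg[z]     # assumed: averages to 1 per community
  list(B = B, theta = theta)
}

# Path graph 1 - 2 - 3 - 4 with communities {1, 2} and {3, 4}
A <- matrix(0, 4, 4)
A[cbind(c(1, 2, 3), c(2, 3, 4))] <- 1
A <- A + t(A)
z <- c(1, 1, 2, 2)
fit <- estim_dcsbm_sketch(A, z)
# Fitted mean matrix theta_i * theta_j * B[z_i, z_j]
P <- outer(fit$theta, fit$theta) * fit$B[z, z]
```

With these formulas the row sums of the fitted mean matrix P reproduce the observed degrees, which is a convenient sanity check.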
Compute the BIC score when fitting a DCSBM to network data
eval_dcsbm_bic(A, z, K, poi)
A |
adjacency matrix |
z |
label vector |
K |
number of communities in z |
poi |
whether to use Poisson version of likelihood |
the BIC score is calculated by -2*log likelihood minus
BIC score
The BIC score was originally proposed in Likelihood-based model selection for stochastic block models, Wang, YX Rachel, Peter J. Bickel, The Annals of Statistics 45, no. 2 (2017): 500-528.
The details of the modified implementation can be found in Adjusted chi-square test for degree-corrected block models, Linfan Zhang, Arash A. Amini, arXiv preprint arXiv:2012.15047, 2020.
eval_dcsbm_like, eval_dcsbm_loglr
Compute the log likelihood of a DCSBM, using estimated parameters B, theta based on the given label vector
eval_dcsbm_like(A, z, poi = TRUE, eps = 1e-06)
A |
adjacency matrix |
z |
label vector |
poi |
whether to use Poisson version of likelihood |
eps |
truncation threshold for the Bernoulli likelihood, used when parameter phat is close to 1 or 0. |
The log likelihood is evaluated at the estimated parameters: B and theta are calculated by estim_dcsbm, and the community prior is the proportion of nodes in community k.
log likelihood of a DCSBM
eval_dcsbm_loglr, eval_dcsbm_bic
Computes the log-likelihood ratio of one DCSBM relative to another, using estimated parameters B and theta based on the given label vectors.
eval_dcsbm_loglr(A, labels, poi = TRUE, eps = 1e-06)
A |
adjacency matrix |
labels |
a matrix with two columns representing two different label vectors |
poi |
whether to use Poisson version of likelihood (instead of Bernoulli) |
eps |
truncation threshold for the Bernoulli likelihood, used when parameter phat is close to 1 or 0. |
The log-likelihood ratio is computed between two DCSBMs specified by the columns of labels. The function computes the log-likelihood ratio of the model with labels[ , 2] w.r.t. the model with labels[ , 1]. This is often used with two label vectors fitted using different numbers of communities (say K and K+1).
When poi is set to TRUE, the function uses fast sparse matrix computations and is scalable to large sparse networks.
log-likelihood ratio
eval_dcsbm_like, eval_dcsbm_bic
Extract the largest connected component of a network
extract_largest_cc(gr, mode = "weak")
gr |
The network as an igraph object |
mode |
Type of connected component ("weak"|"strong") |
An igraph object
Extract a low-degree connected component of a network
extract_low_deg_comp(g, deg_prec = 0.75, verb = FALSE)
g |
The network as an igraph object |
deg_prec |
The cut-off degree percentile |
verb |
Whether to be verbose (TRUE|FALSE) |
An igraph object
The Conditional Pseudo-Likelihood (CPL) algorithm for fitting degree-corrected block models
fast_cpl(Amat, K, ilabels = NULL, niter = 10)
Amat |
adjacency matrix of the network |
K |
desired number of communities |
ilabels |
initial label vector (if not provided, initial labels are estimated using spec_clust) |
niter |
number of iterations |
The function implements the CPL algorithm as described in the paper below. It relies on the mixtools package for fitting a mixture of multinomials to a block compression of the adjacency matrix based on the estimated labels and then reiterates.
Technically, fast_cpl fits a stochastic block model (SBM) conditional on the observed node degrees, to account for the degree heterogeneity within communities that is not modeled well in SBM. CPL can also be used to effectively estimate the parameters of the degree-corrected block model (DCSBM).
The code is an adaptation of the original R code by Aiyou Chen with slight simplifications.
Estimated community label vector.
For more details, see Pseudo-likelihood methods for community detection in large sparse networks, A. A. Amini, A. Chen, P. J. Bickel, E. Levina, Annals of Statistics 2013, Vol. 41 (4), 2097-2122.
head(fast_cpl(igraph::as_adj(polblogs), 2), 50)
Samples an adjacency matrix from a stochastic block model (SBM)
fast_sbm(z, B)
z |
Node labels (n x 1) |
B |
Connectivity matrix (K x K) |
The function implements a fast algorithm for sampling sparse SBMs, by only sampling the necessary nonzero entries. This function is adapted almost verbatim from the original code by Aiyou Chen.
An adjacency matrix following SBM
B = pp_conn(n = 10^4, oir = 0.1, lambda = 7, pri = rep(1,3))$B
head(fast_sbm(sample(1:3, 10^4, replace = TRUE), B))
Creates a randomly permuted DCPP connectivity matrix with a given average expected degree
gen_rand_conn(n, K, lambda, gamma = 0.3, pri = rep(1, K)/K, theta = rep(1, n))
n |
number of nodes |
K |
number of communities |
lambda |
expected average degree |
gamma |
a measure of out-in-ratio (convex combination parameter) |
pri |
the prior on community labels |
theta |
node connection propensity parameter of DCSBM, by default E(theta) = 1 |
The connectivity matrix is a convex combination of a random symmetric permutation matrix and the matrix of all ones, with weights gamma and 1-gamma.
connectivity matrix B of the desired DCSBM.
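The convex combination described above can be sketched in a few lines of base R. The fixed permutation matrix (the package draws it at random), the variable names, and the rescaling by (n - 1) * pri' B pri with E(theta) = 1 are assumptions for illustration.

```r
K <- 3; gamma <- 0.3; n <- 1000; lambda <- 10
pri <- rep(1, K) / K

# A fixed symmetric permutation matrix (swaps labels 1 and 2, fixes 3);
# an illustrative stand-in for a randomly drawn one.
P <- matrix(0, K, K)
P[cbind(c(1, 2, 3), c(2, 1, 3))] <- 1

# Convex combination with weights gamma and 1 - gamma
B0 <- gamma * P + (1 - gamma) * matrix(1, K, K)

# Assumed scaling so that the expected average degree equals lambda
scale <- lambda / ((n - 1) * as.numeric(t(pri) %*% B0 %*% pri))
B <- scale * B0
```

Because both ingredients are symmetric, the resulting B is symmetric as well.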
Calculate the expected average degree of a DCSBM
get_dcsbm_exav_deg(n, pri, B, ex_theta = 1)
n |
number of nodes |
pri |
distribution of node labels (K x 1) |
B |
connectivity matrix (K x K) |
ex_theta |
expected value of theta |
expected average degree of a DCSBM
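Under a DCSBM with independent labels drawn from pri, independent propensities with mean ex_theta, and edge means theta_i * theta_j * B[z_i, z_j], the expected degree of a node is (n - 1) * ex_theta^2 * pri' B pri. The sketch below evaluates this expression; exav_deg_sketch is a hypothetical name, and the package may use a slightly different convention (for example n instead of n - 1).

```r
# Hypothetical helper for illustration; the exact convention is an assumption.
exav_deg_sketch <- function(n, pri, B, ex_theta = 1) {
  (n - 1) * ex_theta^2 * as.numeric(t(pri) %*% B %*% pri)
}

deg <- exav_deg_sketch(n = 101, pri = c(0.5, 0.5),
                       B = matrix(c(0.2, 0.1, 0.1, 0.2), 2))
```

Here pri' B pri = 0.15, so the expected average degree is 100 * 0.15 = 15.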
Convert label matrix to vector
label_mat2vec(Z)
Z |
a cluster assignment matrix |
A label vector that follows the assignment matrix
Convert label vector to matrix
label_vec2mat(z, K = NULL, sparse = FALSE)
z |
a label vector |
K |
number of labels in z |
sparse |
whether the output should be sparse matrix |
A cluster assignment matrix that follows from the label vector z
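The two label conversions are inverses of each other. A minimal base R sketch, assuming labels are integers in 1, ..., K and using hypothetical function names, is:

```r
# Hypothetical helpers for illustration; not the package implementation.
vec2mat_sketch <- function(z, K = max(z)) {
  Z <- matrix(0, nrow = length(z), ncol = K)
  Z[cbind(seq_along(z), z)] <- 1   # one entry equal to 1 in each row
  Z
}

# Column index of the single 1 in each row
mat2vec_sketch <- function(Z) max.col(Z, ties.method = "first")

z <- c(2, 1, 3, 2)
Z <- vec2mat_sketch(z)
z_back <- mat2vec_sketch(Z)
```

Converting a label vector to a matrix and back returns the original labels.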
The NAC test to measure the goodness-of-fit of the DCSBM to network data.
The function computes the NAC+ or NAC statistics in the paper below. Label vectors, if not provided, are estimated using spec_clust by default, but one can also use any other community detection algorithm through cluster_fct. Note that the function has to have A and K as its first two arguments; additional arguments can be provided through the ... argument.
nac_test(A, K, z = NULL, y = NULL, plus = TRUE, cluster_fct = spec_clust, ...)
A |
adjacency matrix. |
K |
number of communities. |
z |
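label vector for rows of A. |
y |
label vector for columns of A. |
plus |
whether or not to use a column label vector with (K+1) communities (NAC+) instead of K (NAC). |
cluster_fct |
community detection function to get z and y, by default using spec_clust. |
... |
additional arguments for cluster_fct. |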
A list of results
stat |
NAC or NAC+ test statistic. |
z |
row label vector. |
y |
column label vector. |
Adjusted chi-square test for degree-corrected block models, Linfan Zhang, Arash A. Amini, arXiv preprint arXiv:2012.15047, 2020.
A <- sample_dcpp(500, 10, 4, 0.1)$adj
nac_test(A, K = 4)$stat
nac_test(A, K = 4, cluster_fct = fast_cpl)$stat
Plot the degree distribution of a network on log scale
plot_deg_dist(gr, logx = TRUE)
gr |
the network as an igraph object |
logx |
whether the degree is in log scale. |
Histogram of the degree of 'gr'.
Plot a network using degree-modulated node sizes, community colors and other enhancements
plot_net(gr, community = NULL, color_map = NULL, extract_lcc = TRUE, heavy_edge_deg_perc = 0.97, coord = NULL, vsize_func = function(deg) log(deg + 3) * 1, vertex_border = FALSE, niter = 1000, vertex_alpha = 0.4, remove_loops = TRUE, make_simple = FALSE, ...)
gr |
the network as an igraph object |
community |
community assignment; vector of node labels |
color_map |
color palette for clusters in 'gr' |
extract_lcc |
Extract largest connected component or not |
heavy_edge_deg_perc |
Degree percentile threshold for determining heavy edges |
coord |
Optional starting positions for the vertices. If this argument is not NULL then it should be an appropriate matrix of starting coordinates. |
vsize_func |
function mapping node degree to node size |
vertex_border |
whether to show the border of vertex or not |
niter |
number of iterations for FR layout computation |
vertex_alpha |
factor modifying the opacity alpha of vertex; typically in [0,1] |
remove_loops |
whether to remove loops in the network |
make_simple |
whether to simplify edge weight calculation |
... |
other settings |
A network plot
Plot ROC curves given results from simulate_roc.
plot_roc(roc_results, method_names = NULL, font_size = 16)
roc_results |
data frame |
method_names |
a list of method names |
font_size |
font size of the plot |
ROC plot based on results from simulate_roc
Plot the smooth community profiles based on a resampled statistic
plot_smooth_profile(tstat, net_name = "", trunc_type = "none", spar = 0.3, plot_null_spar = TRUE, alpha = 0.3, base_font_size = 12)
tstat |
dataframe that has a column 'value' as statistic in the plot and a column 'K' as its corresponding community number |
net_name |
name of network |
trunc_type |
method to round the dip/elbow point as the estimated community number |
spar |
the sparsity level used when fitting a spline to the values in tstat |
plot_null_spar |
whether to plot the spline with zero sparsity |
alpha |
transparency of the points in the plot |
base_font_size |
font size of the plot |
smooth profile plot of a network
This is a directed network of hyperlinks between blogs about politics in the United States of America.
data(polblogs)
An igraph object with 1490 nodes and 19090 edges
Data source. Original paper: Adamic, Lada A., and Natalie Glance. "The political blogosphere and the 2004 US election: divided they blog." Proceedings of the 3rd international workshop on Link discovery. 2005.
Create a degree-corrected planted partition connectivity matrix with a given average expected degree.
pp_conn(n, oir, lambda, pri, theta = rep(1, n), normalize_theta = FALSE, d = rep(1, length(pri)))
n |
the number of nodes |
oir |
out-in-ratio |
lambda |
the expected average degree |
pri |
the prior on community labels |
theta |
node connection propensity parameter of DCSBM |
normalize_theta |
whether to normalize theta so that max(theta) == 1 |
d |
diagonal of the connectivity matrix. An all-one vector by default. |
The connectivity matrix B of the desired DCSBM.
The usual "printf" function
printf(...)
... |
printing object |
the value of the printing object
Generate a random symmetric permutation matrix (recursively)
rsymperm(K)
K |
size of the matrix |
A random K x K symmetric permutation matrix
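A symmetric permutation matrix corresponds to a permutation that is its own inverse, that is, a set of fixed points and disjoint swaps. The sketch below builds one by repeatedly pairing the first remaining index with a random remaining index. It is an iterative illustration under that assumption, with a hypothetical name, and it need not match the sampling distribution used by rsymperm.

```r
# Hypothetical helper for illustration; not the package implementation.
rsymperm_sketch <- function(K) {
  partner <- integer(K)
  remaining <- seq_len(K)
  while (length(remaining) > 0) {
    i <- remaining[1]
    # Pick a partner among the remaining indices; picking i itself
    # leaves i as a fixed point
    j <- remaining[sample.int(length(remaining), 1)]
    partner[i] <- j
    partner[j] <- i
    remaining <- setdiff(remaining, c(i, j))
  }
  P <- matrix(0, K, K)
  P[cbind(seq_len(K), partner)] <- 1
  P
}

set.seed(1)
P <- rsymperm_sketch(5)
```

Every row and column of P contains exactly one 1, and P equals its transpose.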
Sample an adjacency matrix from a degree-corrected Erdős–Rényi model (DCER).
sample_dcer(theta)
theta |
Node connectivity propensity vector (n x 1) |
An adjacency matrix following the DCER model
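A DCLVM with K clusters generates edges through a Gaussian kernel applied to latent node positions, which are built from the cluster labels z and the basis vectors of the latent space, together with the degree parameters theta. The proportionality constant is chosen such that the overall network has expected average degree lambda. To calculate the scaling constant, the required expectation is approximated by generating npairs random pairs and averaging over them.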
sample_dclvm(z, lambda, theta, npairs = NULL)
z |
a vector of cluster labels |
lambda |
desired average degree of the network |
theta |
degree parameter |
npairs |
number of random pairs used to approximate the scaling constant |
Sample from a degree-corrected latent variable model with Gaussian kernel
Adjacency matrix of DCLVM
Sample from a degree-corrected planted partition model
sample_dcpp(n, lambda, K, oir, theta = NULL, pri = rep(1, K)/K, normalize_theta = FALSE)
n |
number of nodes |
lambda |
average degree |
K |
number of communities |
oir |
out-in ratio |
theta |
propensity parameter; if not given, it will be sampled from a Pareto distribution with scale parameter 2/3 and shape parameter 3 |
pri |
prior distribution of node labels |
normalize_theta |
whether to normalize theta so that max(theta) == 1 |
an adjacency matrix following a degree-corrected planted partition model
Sample an adjacency matrix from a degree-corrected block model (DCSBM)
sample_dcsbm(z, B, theta = 1)
z |
Node labels (n x 1) |
B |
Connectivity matrix (K x K) |
theta |
Node connectivity propensity vector (n x 1) |
An adjacency matrix following DCSBM
sample_dcpp, fast_sbm, sample_tdcsbm
B = pp_conn(n = 10^3, oir = 0.1, lambda = 7, pri = rep(1,3))$B
head(sample_dcsbm(sample(1:3, 10^3, replace = TRUE), B, theta = rexp(10^3)))
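To make the sampling model concrete, the following dense base R sketch draws a symmetric adjacency matrix with independent Bernoulli edges of mean theta_i * theta_j * B[z_i, z_j]. The Bernoulli choice, the capping of probabilities at 1, the absence of self-loops and the name sample_dcsbm_sketch are assumptions for illustration; the package uses faster sparse routines and may differ in these details.

```r
# Hypothetical helper for illustration; not the package implementation.
sample_dcsbm_sketch <- function(z, B, theta = 1) {
  n <- length(z)
  theta <- rep(theta, length.out = n)
  # Edge probabilities theta_i * theta_j * B[z_i, z_j], capped at 1 (assumption)
  P <- pmin(outer(theta, theta) * B[z, z], 1)
  A <- matrix(0, n, n)
  up <- upper.tri(A)
  A[up] <- rbinom(sum(up), 1, P[up])   # sample each pair i < j once
  A + t(A)                             # symmetrize; the diagonal stays zero
}

set.seed(1)
z <- sample(1:2, 50, replace = TRUE)
B <- matrix(c(0.5, 0.05, 0.05, 0.5), 2)
A <- sample_dcsbm_sketch(z, B, theta = runif(50, 0.5, 1.5))
```

This dense version costs O(n^2) time and memory, so it is only suitable for small networks.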
Sample an adjacency matrix from a truncated degree-corrected block model (DCSBM) using a fast algorithm.
sample_tdcsbm(z, B, theta = 1)
z |
Node labels (n x 1) |
B |
Connectivity matrix (K x K) |
theta |
Node connectivity propensity vector (n x 1) |
The function samples an adjacency matrix from a truncated DCSBM, whose entries have Bernoulli distributions. The approach uses the masking idea of Aiyou Chen, leading to fast sampling for sparse networks. The masking, however, involves a truncation to at most 1, hence we refer to the model as the truncated DCSBM.
An adjacency matrix following DCSBM
B = pp_conn(n = 10^4, oir = 0.1, lambda = 7, pri = rep(1,3))$B
head(sample_tdcsbm(sample(1:3, 10^4, replace = TRUE), B, theta = rexp(10^4)))
Simulate data from the null and alternative distributions to estimate ROC curves for a collection of methods.
simulate_roc(apply_methods, gen_null_data, gen_alt_data, nruns = 100, core_count = parallel::detectCores() - 1, seed = NULL)
apply_methods |
a function that returns a data.frame with columns "method", "tstat" and "twosided" |
gen_null_data |
a function that generates data under the null model |
gen_alt_data |
a function that generates data under the alternative model |
nruns |
number of simulated data from the null/alternative model |
core_count |
number of cores used in parallel computing |
seed |
seed for random simulation |
A list of results
roc |
A data frame used to plot ROC curves with columns: method, whether a two sided test, false positive rate (FPR), and true positive rate (TPR) |
raw |
A data frame containing raw output from null and alternative models with columns: method, statistics value, whether a two sided test, and the type of hypothesis |
elapsed_time |
system elapsed time for generating ROC data |
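The FPR and TPR columns of such a ROC table can be obtained by sweeping a rejection threshold over the simulated statistics. The sketch below does this for a one-sided test that rejects for large values; roc_sketch and its arguments are hypothetical names, and the handling of two-sided tests is not covered.

```r
# Hypothetical helper for illustration; one-sided test, reject for large values.
roc_sketch <- function(null_stats, alt_stats) {
  thresholds <- sort(unique(c(null_stats, alt_stats)), decreasing = TRUE)
  data.frame(
    FPR = sapply(thresholds, function(t) mean(null_stats >= t)),  # false positives
    TPR = sapply(thresholds, function(t) mean(alt_stats >= t))    # true positives
  )
}

set.seed(1)
roc <- roc_sketch(null_stats = rnorm(200), alt_stats = rnorm(200, mean = 2))
```

Lowering the threshold can only increase both rates, so the curve runs monotonically up to the point (1, 1).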
Implements the Sinkhorn–Knopp algorithm for transforming a square matrix with positive entries to a stochastic matrix with given common row and column sums (e.g., a doubly stochastic matrix).
sinkhorn_knopp(A, sums = rep(1, nrow(A)), niter = 100, tol = 1e-08, sym = FALSE, verb = FALSE)
A |
input matrix |
sums |
desired row/column sums |
niter |
number of iterations |
tol |
convergence tolerance |
sym |
whether to compute symmetric scaling D A D |
verb |
whether to print the current change |
Computes diagonal matrices D1 and D2 to make D1AD2 into a matrix with given row/column sums. For a symmetric matrix A, one can set sym = TRUE to compute a symmetric scaling DAD.
Diagonal matrices D1 and D2 to make D1AD2 into a matrix with given row/column sums.
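The alternating row and column rescaling at the core of the Sinkhorn-Knopp algorithm can be sketched in base R as follows. The function name, the stopping rule and the returned list are assumptions for illustration, and the symmetric variant (sym = TRUE) is not covered.

```r
# Hypothetical helper for illustration; not the package implementation.
sinkhorn_sketch <- function(A, sums = rep(1, nrow(A)), niter = 100, tol = 1e-8) {
  d1 <- rep(1, nrow(A))
  d2 <- rep(1, ncol(A))
  for (it in seq_len(niter)) {
    d1 <- sums / as.vector(A %*% d2)       # match row sums given d2
    d2 <- sums / as.vector(t(A) %*% d1)    # match column sums given d1
    # Entry (i, j) of S is d1[i] * A[i, j] * d2[j]
    S <- d1 * A * rep(d2, each = nrow(A))
    if (max(abs(rowSums(S) - sums)) < tol) break
  }
  list(d1 = d1, d2 = d2, scaled = S)
}

A <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9.5), 3)
out <- sinkhorn_sketch(A)
```

After each pass the column sums are matched exactly, so the stopping rule only needs to check the row sums.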
Compute SNAC+ with resampling
snac_resample(A, nrep = 20, Kmin = 1, Kmax = 13, ncores = parallel::detectCores() - 1, seed = 1234)
A |
adjacency matrix |
nrep |
number of times SNAC+ is computed |
Kmin |
minimum community number to use in SNAC+ |
Kmax |
maximum community number to use in SNAC+ |
ncores |
number of cores to use in the parallel computing |
seed |
seed for random sampling |
A data frame with columns specifying the repetition cycle, the number of communities and the value of the SNAC+ statistic
Apply the SNAC+ test sequentially to estimate the number of communities of a network fit to a DCSBM
snac_select(A, Kmin = 1, Kmax, alpha = 1e-05, labels = NULL, cluster_fct = spec_clust, ...)
A |
adjacency matrix. |
Kmin |
minimum candidate community number. |
Kmax |
maximum candidate community number. |
alpha |
significance level for rejecting the null hypothesis. |
labels |
a matrix with each column being a row label vector for a candidate community number. If not provided, it is computed by cluster_fct. |
cluster_fct |
community detection function to get label vectors to compute SNAC+ statistics (in snac_test), by default using spec_clust. |
... |
additional arguments for cluster_fct. |
estimated community number.
A <- sample_dcpp(500, 10, 3, 0.1)$adj
snac_select(A, Kmax = 6)
The SNAC test to measure the goodness-of-fit of the DCSBM to network data.
The function computes the SNAC+ or SNAC statistics in the paper below.
The row label vector of the adjacency matrix can be given through z; otherwise it is estimated by cluster_fct. One can specify the ratio of nodes used to estimate the column label vector. If plus = TRUE, the column labels are estimated by spec_clust with (K+1) clusters, i.e. performing the SNAC+ test; otherwise with K clusters, i.e. the SNAC test.
One can also get multiple test statistics with repeated random subsampling on nodes.
snac_test(A, K, z = NULL, ratio = 0.5, fromEachCommunity = TRUE, plus = TRUE, cluster_fct = spec_clust, nrep = 1, ...)
A |
adjacency matrix. |
K |
desired number of communities. |
z |
label vector for rows of the adjacency matrix. If not provided, it is estimated by cluster_fct. |
ratio |
ratio of subsampled nodes from the network. |
fromEachCommunity |
whether to subsample from each estimated community or from the full network; the default is TRUE. |
plus |
whether or not to use a column label vector with (K+1) communities (SNAC+) instead of K (SNAC). |
cluster_fct |
community detection function to estimate label vectors, by default using spec_clust. The first two arguments have to be A and K. |
nrep |
number of times the statistics are computed. |
... |
additional arguments for cluster_fct. |
A list of results
stat |
SNAC or SNAC+ test statistic. |
z |
row label vector. |
Adjusted chi-square test for degree-corrected block models, Linfan Zhang, Arash A. Amini, arXiv preprint arXiv:2012.15047, 2020.
A <- sample_dcpp(500, 10, 4, 0.1)$adj
snac_test(A, K = 4, niter = 3)$stat
Perform spectral clustering (with regularization) to estimate communities
spec_clust(A, K, type = "lap", tau = 0.25, nstart = 20, niter = 10, ignore_first_col = FALSE)
A |
Adjacency matrix (n x n) |
K |
Number of communities |
type |
("lap" | "adj" | "adj2") Whether to use Laplacian or adjacency-based spectral clustering |
tau |
Regularization parameter for the Laplacian |
nstart |
argument from function 'kmeans' |
niter |
argument from function 'kmeans' |
ignore_first_col |
whether to ignore the first eigenvector when doing spectral clustering |
A label vector of size n x 1 with elements in 1,2,...,K
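The Laplacian-based variant can be sketched in base R as a regularized normalized adjacency matrix, followed by an eigen-decomposition and k-means on the leading eigenvectors. The regularization form (adding tau times the average degree divided by n to every entry), the dense eigen-decomposition and the name spec_clust_sketch are assumptions for illustration and may differ from the package.

```r
# Hypothetical helper for illustration; not the package implementation.
spec_clust_sketch <- function(A, K, tau = 0.25, nstart = 20, niter = 10) {
  n <- nrow(A)
  # Assumed regularization: add tau * (average degree) / n to every entry
  A_tau <- A + tau * mean(rowSums(A)) / n
  d <- rowSums(A_tau)
  L <- A_tau / sqrt(outer(d, d))        # D^{-1/2} A_tau D^{-1/2}
  # Eigenvectors for the K largest eigenvalues (n x K spectral embedding)
  U <- eigen(L, symmetric = TRUE)$vectors[, seq_len(K), drop = FALSE]
  kmeans(U, centers = K, nstart = nstart, iter.max = niter)$cluster
}

# Toy network: two 5-node cliques joined by a single edge
clique <- matrix(1, 5, 5) - diag(5)
A <- rbind(cbind(clique, matrix(0, 5, 5)),
           cbind(matrix(0, 5, 5), clique))
A[5, 6] <- A[6, 5] <- 1
set.seed(1)
z <- spec_clust_sketch(A, K = 2)
```

The cluster numbering returned by k-means is arbitrary, so only the partition itself is meaningful.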
Provides a spectral representation of the network (with regularization) based on the adjacency or Laplacian matrices
spec_repr(A, K, type = "lap", tau = 0.25, ignore_first_col = FALSE)
A |
Adjacency matrix (n x n) |
K |
Number of communities |
type |
("lap" | "adj" | "adj2") Whether to use Laplacian or adjacency-based spectral clustering |
tau |
Regularization parameter for the Laplacian |
ignore_first_col |
whether to ignore the first eigenvector |
The n x K matrix resulting from a spectral embedding of the network into R^K