Title: | Latent Order Logistic Graph Models |
---|---|
Description: | Estimation of Latent Order Logistic (LOLOG) Models for Networks. LOLOGs are a flexible and fully general class of statistical graph models. This package provides functions for performing MOM, GMM and variational inference. Visual diagnostics and goodness of fit metrics are provided. See Fellows (2018) <arXiv:1804.04583> for a detailed description of the methods. |
Authors: | Ian E. Fellows [aut, cre], Mark S. Handcock [ctb] |
Maintainer: | Ian E. Fellows <[email protected]> |
License: | MIT + file LICENCE |
Version: | 1.3.2 |
Built: | 2024-11-06 05:37:43 UTC |
Source: | https://github.com/statnet/lolog |
indexing
indexing
indexing
indexing
## S4 method for signature 'Rcpp_DirectedNet,ANY,ANY,ANY' x[i, j, ..., maskMissing = TRUE, drop = TRUE] ## S4 method for signature 'Rcpp_UndirectedNet,ANY,ANY,ANY' x[i, j, ..., maskMissing = TRUE, drop = TRUE] ## S4 replacement method for signature 'Rcpp_DirectedNet,ANY,ANY,ANY' x[i, j, ...] <- value ## S4 replacement method for signature 'Rcpp_UndirectedNet,ANY,ANY,ANY' x[i, j, ...] <- value
## S4 method for signature 'Rcpp_DirectedNet,ANY,ANY,ANY' x[i, j, ..., maskMissing = TRUE, drop = TRUE] ## S4 method for signature 'Rcpp_UndirectedNet,ANY,ANY,ANY' x[i, j, ..., maskMissing = TRUE, drop = TRUE] ## S4 replacement method for signature 'Rcpp_DirectedNet,ANY,ANY,ANY' x[i, j, ...] <- value ## S4 replacement method for signature 'Rcpp_UndirectedNet,ANY,ANY,ANY' x[i, j, ...] <- value
x |
object |
i |
indices |
j |
indices |
... |
unused |
maskMissing |
should missing values be masked by NA |
drop |
unused |
value |
values to assign |
data(ukFaculty) net <- as.BinaryNet(ukFaculty) #dyad Extraction net[1:2,1:5] net$outNeighbors(c(1,2,3)) #dyad assignment net[1,1:5] <- rep(NA,5) net[1:2,1:5] net[1:2,1:5,maskMissing=FALSE] #remove the mask over missing values and see #nothing was really changed #node variables net$variableNames() net[["Group"]] net[["rnorm"]] <- rnorm(net$size()) net[["rnorm"]]
data(ukFaculty) net <- as.BinaryNet(ukFaculty) #dyad Extraction net[1:2,1:5] net$outNeighbors(c(1,2,3)) #dyad assignment net[1,1:5] <- rep(NA,5) net[1:2,1:5] net[1:2,1:5,maskMissing=FALSE] #remove the mask over missing values and see #nothing was really changed #node variables net$variableNames() net[["Group"]] net[["rnorm"]] <- rnorm(net$size()) net[["rnorm"]]
Convert to either an UndirectedNet or DirectedNet object
as.BinaryNet(x, ...)
as.BinaryNet(x, ...)
x |
the object |
... |
unused |
Converts network objects to BinaryNets. This function also converts other graph formats, such as igraph and tidygraph, utilizing intergraph::asNetwork.
either an Rcpp_UndirectedNet or Rcpp_DirectedNet object
data(ukFaculty) net <- as.BinaryNet(ukFaculty) net
data(ukFaculty) net <- as.BinaryNet(ukFaculty) net
Convert to either an UndirectedNet or DirectedNet object
## Default S3 method: as.BinaryNet(x, ...)
## Default S3 method: as.BinaryNet(x, ...)
x |
the object |
... |
unused |
Converts network objects to BinaryNets. This function also converts other graph formats, such as igraph and tidygraph, utilizing intergraph::asNetwork.
either an Rcpp_UndirectedNet or Rcpp_DirectedNet object
data(ukFaculty) net <- as.BinaryNet(ukFaculty) net
data(ukFaculty) net <- as.BinaryNet(ukFaculty) net
Convert a DirectedNet to a network object
## S3 method for class 'Rcpp_DirectedNet' as.network(x, ...)
## S3 method for class 'Rcpp_DirectedNet' as.network(x, ...)
x |
the object |
... |
unused |
A network object
el <- matrix(c(1,2),ncol=2) #make an UndirectedNet with one edge and 5 nodes net <- new(UndirectedNet, el, 5L) nw <- as.network(net) nw
el <- matrix(c(1,2),ncol=2) #make an UndirectedNet with one edge and 5 nodes net <- new(UndirectedNet, el, 5L) nw <- as.network(net) nw
Convert a UndirectedNet to a network object
## S3 method for class 'Rcpp_UndirectedNet' as.network(x, ...)
## S3 method for class 'Rcpp_UndirectedNet' as.network(x, ...)
x |
the object |
... |
unused |
A network object
el <- matrix(c(1,2),ncol=2) #make an UndirectedNet with one edge and 5 nodes net <- new(UndirectedNet, el, 5L) net[1:5,1:5] nw <- as.network(net) nw
el <- matrix(c(1,2),ncol=2) #make an UndirectedNet with one edge and 5 nodes net <- new(UndirectedNet, el, 5L) net[1:5,1:5] nw <- as.network(net) nw
BinaryNet
Rcpp_DirectedNet and Rcpp_UndirectedNet are the native network classes for the lolog package. They are designed for algorithmic performance, and are thin wrappers for an underlying C++ object. These network objects can be passed back and forth between R and C++ with little overhead. Because they are pointers to C++ objects, serialization via 'save' or 'dput' are not supported
Calculate network statistics from a formula
calculateStatistics(formula)
calculateStatistics(formula)
formula |
A lolog formula (See |
data(ukFaculty) calculateStatistics(ukFaculty ~ edges + mutual + triangles)
data(ukFaculty) calculateStatistics(ukFaculty ~ edges + mutual + triangles)
Extracts estimated model coefficients.
## S3 method for class 'lolog' coef(object, ...)
## S3 method for class 'lolog' coef(object, ...)
object |
A 'lolog' object. |
... |
unused |
# Extract parameter estimates as a numeric vector: data(ukFaculty) fit <- lolog(ukFaculty ~ edges) coef(fit)
# Extract parameter estimates as a numeric vector: data(ukFaculty) fit <- lolog(ukFaculty ~ edges) coef(fit)
Creates a model
createCppModel(formula, cloneNet = TRUE, theta = NULL)
createCppModel(formula, cloneNet = TRUE, theta = NULL)
formula |
the model formula |
cloneNet |
create a deep copy of the network within the model object |
theta |
the model parameters. |
Creates a C++ Model object. In general this isn't needed by most users of the package.
data(ukFaculty) model <- createCppModel(ukFaculty ~ edges) model$calculate() model$statistics()
data(ukFaculty) model <- createCppModel(ukFaculty ~ edges) model$calculate() model$statistics()
Creates a probability model for a latent ordered network model
createLatentOrderLikelihood(formula, theta = NULL)
createLatentOrderLikelihood(formula, theta = NULL)
formula |
A LOLOG formula. See |
theta |
Parameter values. |
An Rcpp object representing the likelihood model
# See the methods of the objects returned by this function UndirectedLatentOrderLikelihood DirectedLatentOrderLikelihood # A Barabasi-Albert type graph model with 1000 vertices el <- matrix(0, nrow=0, ncol=2) net <- new(UndirectedNet, el, 1000L) lolik <- createLatentOrderLikelihood(net ~ preferentialAttachment(), theta=1) banet <- lolik$generateNetwork()$network # generate a random network from the model degrees <- banet$degree(1:1000) hist(degrees, breaks=100) # plot the degree distribution order <- banet[["__order__"]] # The vertex inclusion order # Earlier nodes have higher degrees library(ggplot2) qplot(order, degrees, alpha=I(.25)) + geom_smooth(method="loess")
# See the methods of the objects returned by this function UndirectedLatentOrderLikelihood DirectedLatentOrderLikelihood # A Barabasi-Albert type graph model with 1000 vertices el <- matrix(0, nrow=0, ncol=2) net <- new(UndirectedNet, el, 1000L) lolik <- createLatentOrderLikelihood(net ~ preferentialAttachment(), theta=1) banet <- lolik$generateNetwork()$network # generate a random network from the model degrees <- banet$degree(1:1000) hist(degrees, breaks=100) # plot the degree distribution order <- banet[["__order__"]] # The vertex inclusion order # Earlier nodes have higher degrees library(ggplot2) qplot(order, degrees, alpha=I(.25)) + geom_smooth(method="loess")
Conduct goodness of fit diagnostics
gofit(object, ...)
gofit(object, ...)
object |
the object to evaluate |
... |
additional parameters |
see gofit.lolog
Goodness of Fit Diagnostics for a LOLOG fit
## S3 method for class 'lolog' gofit(object, formula, nsim = 100, ...)
## S3 method for class 'lolog' gofit(object, formula, nsim = 100, ...)
object |
the object to evaluate |
formula |
A formula specifying the statistics on which to evaluate the fit |
nsim |
The number of simulated statistics |
... |
additional parameters |
library(network) data(ukFaculty) # Delete vertices missing group delete.vertices(ukFaculty, which(is.na(ukFaculty %v% "Group"))) # A dyad independent model fitind <- lolog(ukFaculty ~ edges() + nodeMatch("GroupC") + nodeCov("GroupC")) summary(fitind) # Check gof on degree distribution (bad!) gind <- gofit(fitind, ukFaculty ~ degree(0:50)) gind plot(gind) #check gof on esp distribution (bad!) gind <- gofit(fitind, ukFaculty ~ esp(0:25)) gind plot(gind) ## Not run: #include triangles and 2-stars (in and out) fitdep <- lolog(ukFaculty ~ edges() + nodeMatch("GroupC") + nodeCov("GroupC") + triangles + star(2, direction="in") + star(2, direction="out"), nsamp=1500) summary(fitdep) # Check gof on (in + out) degree distribution (good!) gdep <- gofit(fitdep, ukFaculty ~ degree(0:50)) gdep plot(gdep) #check gof on esp distribution (good!) gdep <- gofit(fitdep, ukFaculty ~ esp(0:25)) gdep plot(gdep) ## End(Not run)
library(network) data(ukFaculty) # Delete vertices missing group delete.vertices(ukFaculty, which(is.na(ukFaculty %v% "Group"))) # A dyad independent model fitind <- lolog(ukFaculty ~ edges() + nodeMatch("GroupC") + nodeCov("GroupC")) summary(fitind) # Check gof on degree distribution (bad!) gind <- gofit(fitind, ukFaculty ~ degree(0:50)) gind plot(gind) #check gof on esp distribution (bad!) gind <- gofit(fitind, ukFaculty ~ esp(0:25)) gind plot(gind) ## Not run: #include triangles and 2-stars (in and out) fitdep <- lolog(ukFaculty ~ edges() + nodeMatch("GroupC") + nodeCov("GroupC") + triangles + star(2, direction="in") + star(2, direction="out"), nsamp=1500) summary(fitdep) # Check gof on (in + out) degree distribution (good!) gdep <- gofit(fitdep, ukFaculty ~ degree(0:50)) gdep plot(gdep) #check gof on esp distribution (good!) gdep <- gofit(fitdep, ukFaculty ~ esp(0:25)) gdep plot(gdep) ## End(Not run)
An lolog plug-in for easy C++ prototyping and access
The inline plug-in for lolog
inlineLologPlugin(...) inlineLologPlugin
inlineLologPlugin(...) inlineLologPlugin
... |
plug-in arguments |
The lolog Rcpp plugin allows for the rapid prototyping of compiled code.
new functions can be registered and exposed using cppFunction
and new statistics can be compiled and registered using sourceCpp
.
cppFunction
, sourceCpp
, cppFunction
## Not run: # This creates a function in C++ to create an empty network of size n # and expose it to R. src <- " lolog::BinaryNet<lolog::Directed> makeEmptyNetwork(const int n){ Rcpp::IntegerMatrix tmp(0,2); lolog::BinaryNet<lolog::Directed> net(tmp, n); return net; } " Rcpp::registerPlugin("lolog",inlineLologPlugin) emptyNetwork <- cppFunction(src,plugin="lolog") net <- emptyNetwork(10L) net[1:10,1:10] ## End(Not run)
## Not run: # This creates a function in C++ to create an empty network of size n # and expose it to R. src <- " lolog::BinaryNet<lolog::Directed> makeEmptyNetwork(const int n){ Rcpp::IntegerMatrix tmp(0,2); lolog::BinaryNet<lolog::Directed> net(tmp, n); return net; } " Rcpp::registerPlugin("lolog",inlineLologPlugin) emptyNetwork <- cppFunction(src,plugin="lolog") net <- emptyNetwork(10L) net[1:10,1:10] ## End(Not run)
This data set comes from a network study of corporate law partnership that was carried out in a Northeastern US corporate law firm, referred to as SG&R, 1988-1991 in New England.
data(lazega)
data(lazega)
CC BY 4.0. When publishing results obtained using this
data set, the original authors (Lazega, 2001)
should be cited, along with this R
package.
Creative Commons Attribution-Share Alike 4.0 International License, see https://creativecommons.org/licenses/by/4.0/ for details.
See http://elazega.fr/?page_id=609 and https://www.stats.ox.ac.uk/~snijders/siena/Lazega_lawyers_data.htm
Lazega, Emmanuel (2001), The Collegial Phenomenon: The Social Mechanisms of Cooperation among Peers in a Corporate Law Partnership, Oxford: Oxford University Press
lolog
is used to fit Latent Order Logistic Graph (LOLOG) models. LOLOG models are
motivated by the idea of network growth where the network begins empty, and edge variables
are sequentially 'added' to the network with an either unobserved, or partially observed
order . Conditional upon the inclusion order, the probability of an edge has a
logistic relationship with the change in network statistics.
lolog( formula, auxFormula = NULL, theta = NULL, nsamp = 1000, includeOrderIndependent = TRUE, targetStats = NULL, weights = "full", tol = 0.1, nHalfSteps = 10, maxIter = 100, minIter = 2, startingStepSize = 0.1, maxStepSize = 0.5, cluster = NULL, verbose = TRUE )
lolog( formula, auxFormula = NULL, theta = NULL, nsamp = 1000, includeOrderIndependent = TRUE, targetStats = NULL, weights = "full", tol = 0.1, nHalfSteps = 10, maxIter = 100, minIter = 2, startingStepSize = 0.1, maxStepSize = 0.5, cluster = NULL, verbose = TRUE )
formula |
A lolog formula for the sufficient statistics (see details). |
auxFormula |
A lolog formula of statistics to use for moment matching. |
theta |
Initial parameters values. Estimated via |
nsamp |
The number of sample networks to draw at each iteration. |
includeOrderIndependent |
If TRUE, all order independent terms in formula are used for moment matching. |
targetStats |
A vector of network statistics to use as the target for the moment equations.
If |
weights |
The type of weights to use in the GMM objective. Either 'full' for the inverse of the full covariance matrix or 'diagonal' for the inverse of the diagonal of the covariance matrix. |
tol |
The Hotelling's T^2 p-value tolerance for convergence for the transformed moment conditions. |
nHalfSteps |
The maximum number of half steps to take when the objective is not improved in an iteration. |
maxIter |
The maximum number of iterations. |
minIter |
The minimum number of iterations. |
startingStepSize |
The starting dampening of the parameter update. |
maxStepSize |
The largest allowed value for dampening. |
cluster |
A parallel cluster to use for graph simulation. |
verbose |
Level of verbosity 0-3. |
LOLOG represents the probability of a tie, given the network grown up to a time point as
where is the growth order of the network up to time
,
is the
state of the graph at time
.
is a vector
representing the change in graph statistics from time
to
if an edge is present, and
is a vector of parameters.
The motivating growth order proceeds 'by vertex.' The network begins 'empty' and then vertices are 'added' to the network sequentially. The order of vertex inclusion may be random or fixed. When a vertex 'enters' the network, each of the edge variables connecting it and vertices already in the network are considered for edge creation in a completely random order.
LOLOG formulas contain a network, DirectedNet or UndirectedNet object on the left hand side. the right hand side contains the model terms used. for example,
net ~ edges
represents and Erdos-Renyi model and
net ~ edges + preferentialAttachment()
represents a Barabasi-Albert model. See lolog-terms
for a list of allowed model statistics
Conditioning on (partial) vertex order can be done by placing an ordering variable on the right hand side of the '|' operator, as in
net ~ edges + preferentialAttachment() | order
'order' should be a numeric vector with as many elements as there are vertices in the network. Ties are allowed. Vertices with higher order values will always be included later. Those with the same values will be included in a random order in each simulated network.
offsets and constraints are specified by wrapping them with either offset()
or constraint()
,
for example, the following specifies an Erdos-Renyi model with the constraint that degrees must be less
that 10
net ~ edges + constraint(boundedDegree(0L, 10L))
If the model contains any order dependent statistics, additional moment constraints
must be specified in auxFormula
. Ideally these should be chosen to capture
the features modeled by the order dependent statistics. For example, preferentialAttachment
models the degree structure, so we might choose two-stars as a moment constraint.
lolog(net ~ edges + preferentialAttachment(), net ~ star(2))
will fit a Barabasi-Albert model with the number of edges and number of two-stars as moment constraints.
An object of class 'lolog'. If the model is dyad independent, the returned object will
also be of class "lologVariational" (see lologVariational
, otherwise it will
also be a "lologGmm" object.
lologGmm objects contain:
method |
"Method of Moments" for order independent models, otherwise "Generalized Method of Moments" |
formula |
The model formula |
auxFormula |
The formula containing additional moment conditions |
theta |
The parameter estimates |
stats |
The statistics for each network in the last iteration |
estats |
The expected stats (G(y,s)) for each network in the last iteration |
obsStats |
The observed h(y) network statistics |
targetStats |
The target network statistics |
obsModelStats |
The observed g(y,s) network statistics |
net |
A network simulated from the fit model |
grad |
The gradient of the moment conditions (D) |
vcov |
The asymptotic covariance matrix of the parameter estimates |
likelihoodModel |
An object of class *LatentOrderLikelihood at the fit parameters |
library(network) set.seed(1) data(flo) flomarriage <- network(flo,directed=FALSE) flomarriage %v% "wealth" <- c(10,36,27,146,55,44,20,8,42,103,48,49,10,48,32,3) # A dyad independent model fit <- lolog(flomarriage ~ edges + nodeCov("wealth")) summary(fit) # A dyad dependent model with 2-stars and triangles fit2 <- lolog(flomarriage ~ edges + nodeCov("wealth") + star(2) + triangles, verbose=FALSE) summary(fit2) ## Not run: # An order dependent model fit3 <- lolog(flomarriage ~ edges + nodeCov("wealth") + preferentialAttachment(), flomarriage ~ star(2:3), verbose=FALSE) summary(fit3) # Try something a bit more real data(ukFaculty) # Delete vertices missing group delete.vertices(ukFaculty, which(is.na(ukFaculty %v% "Group"))) fituk <- lolog(ukFaculty ~ edges() + nodeMatch("GroupC") + nodeCov("GroupC") + triangles + star(2)) summary(fituk) plot(fituk$net, vertex.col= ukFaculty %v% "Group" + 2) ## End(Not run)
library(network) set.seed(1) data(flo) flomarriage <- network(flo,directed=FALSE) flomarriage %v% "wealth" <- c(10,36,27,146,55,44,20,8,42,103,48,49,10,48,32,3) # A dyad independent model fit <- lolog(flomarriage ~ edges + nodeCov("wealth")) summary(fit) # A dyad dependent model with 2-stars and triangles fit2 <- lolog(flomarriage ~ edges + nodeCov("wealth") + star(2) + triangles, verbose=FALSE) summary(fit2) ## Not run: # An order dependent model fit3 <- lolog(flomarriage ~ edges + nodeCov("wealth") + preferentialAttachment(), flomarriage ~ star(2:3), verbose=FALSE) summary(fit3) # Try something a bit more real data(ukFaculty) # Delete vertices missing group delete.vertices(ukFaculty, which(is.na(ukFaculty %v% "Group"))) fituk <- lolog(ukFaculty ~ edges() + nodeMatch("GroupC") + nodeCov("GroupC") + triangles + star(2)) summary(fituk) plot(fituk$net, vertex.col= ukFaculty %v% "Group" + 2) ## End(Not run)
LOLOG Model Terms
edges
(dyad-independent) (order-independent) (directed) (undirected)Edges: This term adds one network statistic equal to the number of edges (i.e. nonzero values) in the network.
star(k, direction="in")
(order-independent) (directed) (undirected)The k
argument is a vector of distinct integers.
This term adds one network statistic to the model for each element in k
.
The th such statistic counts the number of distinct
k[i]
-stars in the network,
where a -star is defined to be a node
and a set of
different nodes
such that the ties
exist for
.
For directed networks, direction indicates whether the count is of in-stars (direction="in")
or out-stars (direction="out")
triangles()
(order-independent) (directed) (undirected)This term adds one statistic to the model equal to the number of triangles
in the network. For an undirected network, a triangle is defined to be any
set of three edges. For a directed network, a
triangle is defined as any set of three edges
and
and either
or
.
clustering()
(order-independent) (undirected)The global clustering coefficient, defined as the number of triangles over the number of possible triangles https://en.wikipedia.org/wiki/Clustering_coefficient, or 3 * triangles / 2-stars.
transitivity()
(order-independent) (undirected)The Soffer-Vazquez transitivity. This is clustering metric that adjusts for large degree differences and is described by C in Equation 6 of #' https://pubmed.ncbi.nlm.nih.gov/16089694/. Note The approximation of the number of possible shared neighbors between node i and j of min(d_i,d_j) - 1 in this implementation.
mutual()
(order-independent) (directed) A count of the number of pairs of actors
and
for which
and
both exist.
nodeMatch(name)
(dyad-independent) (order-independent) (directed) (undirected)For categorical network nodal variable 'name,' the number of edges between nodes with the same variable value.
nodeMix(name)
(dyad-independent) (order-independent) (directed) (undirected)For categorical network nodal variable 'name,' adds one statistic for each combination of levels of the variable equal to the count of edges between those levels.
degree(d, direction="undirected", lessThanOrEqual=FALSE)
(order-independent) (directed)
(undirected)The d
argument is a vector of distinct integers. This term adds one
network statistic to the model for each element in d
; the th
such statistic equals the number of nodes in the network of degree
d[i]
, i.e. with exactly d[i]
edges.
For directed networks if direction="undirected"
degree is counted as the sum of the in and out degrees of a node. If direction="in" then in-degrees are
used and direction="out" indicates out-degrees.
If lessThanOrEqual=TRUE, then the count is the number of nodes with degree less than or equal to d.
twoPath
(order-independent) (directed) (undirected)This term adds one statistic to the model, equal to the number of 2-paths in
the network. For a directed network this is defined as a pair of edges
, where
and
must be distinct. That is, it is a directed path of length 2 from
to
via
. For directed networks a 2-path is also a
mixed 2-star.
For undirected networks a twopath is defined as a pair of edges
. That is, it is an undirected path of length 2 from
to
via
, also known as a 2-star.
degreeCrossProd() (order-independent) (undirected)
This term adds one network statistic equal to the mean of the cross-products of the degrees of all pairs of nodes in the network which are tied.
nodeCov(name)
(dyad-independent) (order-independent) (directed) (undirected)The name
argument is a character string giving the name of a
numeric attribute in the network's vertex attribute list.
This term adds a single network statistic to the model equaling the sum of
name(i)
and name(j)
for all edges in the
network. For categorical variables, levels are coded as 1,..,nlevels'.
edgeCov(x, name=NULL)
(dyad-independent) (order-independent) (directed) (undirected)The x
argument is a square matrix of covariates, one for each possible edge in the network.
This term adds one statistic to the model, equal to the sum
of the covariate values for each edge appearing in the network. The
edgeCov
term applies to both directed and undirected networks. For
undirected networks the covariates are also assumed to be undirected.
If present, the name
argument is a character string providing a
name for the edgeCov
term. The name will be "edgeCov.<name>". It is
recommended that all edgeCov
terms be given explicit names. In particular,
if two unnamed edgeCov
terms are supplied an error will occur (as they will
have the same default name "edgeCov.".
edgeCovSparse(x, name=NULL)
(dyad-independent) (order-independent) (directed) (undirected)Identical to edgeCov, except x
should be a sparse matrix. This is especially useful for larger networks,
where passing a dense matrix to edgeCov
is too memory intensive.
gwesp(alpha)
(order-independent) (directed) (undirected)This term is just like gwdsp
except it adds a statistic equal to the
geometrically weighted edgewise (not dyadwise) shared partner
distribution with decay parameter
alpha
parameter, which should be non-negative.
gwdegree(alpha, direction="undirected")
(order-independent) (directed) (undirected)This term adds one network statistic to the model equal to the weighted
degree distribution with decay controlled by the decay
parameter.
The alpha
parameter is the same as theta_s in equation (14) in Hunter (2007).
For directed networks if direction="undirected" degree is counted as the sum of the in and out degrees of a node. If direction="in" then in-degrees are used ans direction="out" indicates out-degrees.
gwdsp(alpha)
(order-independent) (directed) (undirected)This term adds one network statistic to the model equal to the geometrically
weighted dyadwise shared partner distribution with decay parameter
decay
parameter, which should be non-negative.
esp(d, type=2)
(order-independent) (directed) (undirected)This term adds one network
statistic to the model for each element in d
where the th such
statistic equals the number of edges (rather than dyads) in the
network with exactly
d[i]
shared partners. This term can be used with
directed and undirected networks. For directed networks the count depends on type:
type = 1 : from -> to -> nbr -> from
type = 2 : from -> to <- nbr <- from (homogeneous)
type = 3 : either type 1 or 2
type = 4 : all combinations of from -> to <-> nbr <-> from
geoDist(long, lat, distCuts=Inf)
(dyad-independent) (order-independent) (undirected)given nodal variables for longitude and latitude, calculates the sum of the great circle distance between connected nodes. distCuts splits this into separate statistics that count the sum of the minimum of the cut point and the distance.
dist(names
(dyad-independent) (order-independent) (undirected)Calculates a statistic equal to the sum of the euclidean distances between connected nodes on the numeric nodal variables specified in names.
preferentialAttachment(k=1, direction="in")
(directed) (undirected)An order dependent preferential attachment term. For each edge, adds
log( (k+degree) / (n * (meanDegree + k)))
where degree is the current degree of the acting node, n is the network size, and meanDegree is the mean degree of the network. This depends upon the order in which edges are added. For directed networks, if direction="in" the in-degrees are used. If it is "out" the out degrees are used, otherwise "undirected" means that the sum of the in and out degrees are used.
sharedNbrs(k=1)
(undirected)for each edge adds
log(k + shared / minDeg)
where shared is the current number of shared neighbors between the two nodes, and minDeg is the minimum of the current degrees of the two nodes (i.e. the number of possible shared neighbors).
nodeLogMaxCov(name)
(order-independent) (undirected)For each edge (i,j) and nodal variable variable, add to the statistic
log(max(variable[i],variable[j]))
If the variable is a (partial) rank order of nodal inclusion into the network, this statistic can be useful in modeling the mean degree over the course of the growth process.
nodeFactor(name, direction="undirected")
(order-independent) (undirected) (directed)The name
argument is a character vector giving
one or more names of categorical attributes in the network's vertex
attribute list. This term adds multiple network statistics to the
model, one for each of (a subset of) the unique values of the
attrname
attribute (or each combination of the attributes
given). Each of these statistics gives the number of times a node
with that attribute or those attributes appears in an edge in the
network. In particular, for edges whose endpoints both have the same
attribute values, this value is counted twice. For directed networks, if direction="in" then in-edges are
used and direction="out" indicates out-edges.
absDiff(name, power=1)
(order-independent) (undirected) (directed)The name
argument is a character string giving the name
of one or mode quantitative attribute in the network's vertex attribute
list. This term adds one network statistic to the model equaling the
sum of sum(abs(name[i]-name[j])^pow)
for all edges (i,j)
in the network.
boundedDegree(lower,upper)
(order-independent) (undirected)Adds a constraint that the degrees for the network must be between lower and upper.
Create a skeleton for a package extending lolog
lologPackageSkeleton(path = ".")
lologPackageSkeleton(path = ".")
path |
where to create the package |
lolog is a modular package, and can be extended at both the R and C++ level. This function will build a package skeleton that can be used as a starting point for development. To create the package in the current directory run:
lologPackageSkeleton()
Build and install the package from the command line with
R CMD build LologExtension
R CMD INSTALL LologExtension_1.0.tar.gz
## Not run: #install package lologPackageSkeleton() system("R CMD build LologExtension") system("R CMD INSTALL LologExtension_1.0.tar.gz") library(LologExtension) #Load package # Run model with new minDegree statistic library(network) m <- matrix(0,20,20) for(i in 1:19) for(j in (i+1):20) m[i,j] <- m[j,i] <- rbinom(1,1,.1) g <- network(m, directed=FALSE) fit <- lologVariational(g ~ edges() + minDegree(1L)) summary(fit) ## End(Not run)
## Not run: #install package lologPackageSkeleton() system("R CMD build LologExtension") system("R CMD INSTALL LologExtension_1.0.tar.gz") library(LologExtension) #Load package # Run model with new minDegree statistic library(network) m <- matrix(0,20,20) for(i in 1:19) for(j in (i+1):20) m[i,j] <- m[j,i] <- rbinom(1,1,.1) g <- network(m, directed=FALSE) fit <- lologVariational(g ~ edges() + minDegree(1L)) summary(fit) ## End(Not run)
Fits a latent ordered network model using Monte Carlo variational inference
lologVariational( formula, nReplicates = 5L, dyadInclusionRate = NULL, edgeInclusionRate = NULL, targetFrameSize = 5e+05 )
lologVariational( formula, nReplicates = 5L, dyadInclusionRate = NULL, edgeInclusionRate = NULL, targetFrameSize = 5e+05 )
formula |
A lolog formula. See |
nReplicates |
An integer controlling how many dyad ordering to perform. |
dyadInclusionRate |
Controls what proportion of non-edges in each ordering should be dropped. |
edgeInclusionRate |
Controls what proportion of edges in each ordering should be dropped. |
targetFrameSize |
Sets dyadInclusionRate so that the model frame for the logistic regression will have on average this amount of observations. |
This function approximates the maximum likelihood solution via a variational inference on the graph (y) over the latent edge variable inclusion order (s). Specifically, it replaces the conditional probability p(s | y) by p(s). If the LOLOG model contains only dyad independent terms, then these two probabilities are identical, and thus variational inference is exactly maximum likelihood inference. The objective function is
This can be approximated by drawing samples from p(s) to approximate the expectation. The number of samples is controlled by the nReplicates parameter. The memory required is on the order of nReplicates * (# of dyads). For large networks this can be impractical, so adjusting dyadInclusionRate and edgeInclusionRate allows one to down sample the # of dyads in each replicate. By default these are set attempting to achieve as equal a number of edges and non-edges as possible while targeting a model frame with targetFrameSize number of rows.
If the model is dyad independent, replicates are redundant, and so nReplicates is set to 1 with a note.
The functional form of the objective function is equivalent to logistic regression, and so
the glm
function is used to maximize it. The asymptotic covariance of the parameter
estimates is calculated using the methods of Westling (2015).
An object of class c('lologVariationalFit','lolog','list') consisting of the following items:
formula |
The model formula |
method |
"variational" |
theta |
The fit parameter values |
vcov |
The asymptotic covariance matrix for the parameter values. |
nReplicates |
The number of replicates |
dyadInclusionRate |
The rate at which non-edges are included |
edgeInclusionRate |
The rate at which edges are included |
allDyadIndependent |
Logical indicating model dyad independence |
likelihoodModel |
An object of class *LatentOrderLikelihood at the fit parameters |
outcome |
The outcome vector for the logistic regression |
predictors |
The change statistic predictor matrix for the logistic regression |
Westling, T., & McCormick, T. H. (2015). Beyond prediction: A framework for inference with variational approximations in mixture models. arXiv preprint arXiv:1510.08151.
library(network) data(ukFaculty) # Delete vertices missing group delete.vertices(ukFaculty, which(is.na(ukFaculty %v% "Group"))) fit <- lologVariational(ukFaculty ~ edges() + nodeMatch("GroupC"), nReplicates=1L, dyadInclusionRate=1) summary(fit)
library(network) data(ukFaculty) # Delete vertices missing group delete.vertices(ukFaculty, which(is.na(ukFaculty %v% "Group"))) fit <- lologVariational(ukFaculty ~ edges() + nodeMatch("GroupC"), nReplicates=1L, dyadInclusionRate=1) summary(fit)
Plots a gofit object
## S3 method for class 'gofit' plot( x, y, type = c("line", "box"), scaling = c("none", "std", "sqrt"), lineAlpha = 0.06, lineSize = 1, ... )
## S3 method for class 'gofit' plot( x, y, type = c("line", "box"), scaling = c("none", "std", "sqrt"), lineAlpha = 0.06, lineSize = 1, ... )
x |
the gofit object |
y |
unused |
type |
type of plot, boxplot or lineplot |
scaling |
type of scaling of the network statistics. If "std", network statistics are scaling by subtracting off the observed statistics and scaling by the standard deviation. If "sqrt", network statistics are plotted on the square root scale (The square root is the variance stabilizing transformation for a Poisson random variable). The default is "none", where by the network statistics are not scaled. |
lineAlpha |
The transparency of the simulated statistics lines |
lineSize |
The width of the lines |
... |
passed to either boxplot or geom_line |
library(network) data(ukFaculty) # Delete vertices missing group delete.vertices(ukFaculty, which(is.na(ukFaculty %v% "Group"))) # A dyad independent model fitind <- lolog(ukFaculty ~ edges() + nodeMatch("GroupC") + nodeCov("GroupC")) summary(fitind) # Check gof on degree distribution (bad!) gind <- gofit(fitind, ukFaculty ~ degree(0:50)) plot(gind) plot(gind, type="box")
library(network) data(ukFaculty) # Delete vertices missing group delete.vertices(ukFaculty, which(is.na(ukFaculty %v% "Group"))) # A dyad independent model fitind <- lolog(ukFaculty ~ edges() + nodeMatch("GroupC") + nodeCov("GroupC")) summary(fitind) # Check gof on degree distribution (bad!) gind <- gofit(fitind, ukFaculty ~ degree(0:50)) plot(gind) plot(gind, type="box")
This function creates simple diagnostic plots for MC sampled statistics produced from a lolog fit.
## S3 method for class 'lologGmm' plot(x, type = c("histograms", "target", "model"), ...)
## S3 method for class 'lologGmm' plot(x, type = c("histograms", "target", "model"), ...)
x |
A model fit object to be diagnosed. |
type |
The type of diagnostic plot. "histograms", the default, produces histograms of the sampled output statistic values with the observed statistics represented by vertical lines. "target" produces a pairs plot of the target output statistic values with the pairs of observed target statistics represented by red squares. output statistic values with the observed statistics represented by vertical lines. "model" produces a pairs plot of the sampled output statistic values with the pairs of observed statistics represented by red squares. |
... |
Additional parameters. Passed to geom_histogram if type="histogram" and pairs otherwise. |
Plots are produced that represent the distributions of the output sampled statistic values or the target statistics values. The values of the observed target statistics for the networks are also represented for comparison with the sampled statistics.
library(network) set.seed(1) data(flo) flomarriage <- network(flo,directed=FALSE) flomarriage %v% "wealth" <- c(10,36,27,146,55,44,20,8,42,103,48,49,10,48,32,3) # An order dependent model fit3 <- lolog(flomarriage ~ edges + nodeCov("wealth") + preferentialAttachment(), flomarriage ~ star(2:3), verbose=FALSE) plot(fit3) plot(fit3, "target") plot(fit3, "model")
library(network) set.seed(1) data(flo) flomarriage <- network(flo,directed=FALSE) flomarriage %v% "wealth" <- c(10,36,27,146,55,44,20,8,42,103,48,49,10,48,32,3) # An order dependent model fit3 <- lolog(flomarriage ~ edges + nodeCov("wealth") + preferentialAttachment(), flomarriage ~ star(2:3), verbose=FALSE) plot(fit3) plot(fit3, "target") plot(fit3, "model")
plot an DirectedNet object
## S3 method for class 'Rcpp_DirectedNet' plot(x, ...)
## S3 method for class 'Rcpp_DirectedNet' plot(x, ...)
x |
the Rcpp_DirectedNet object |
... |
additional parameters for plot.network |
This is a thin wrapper around plot.network
.
data(ukFaculty) net <- as.BinaryNet(ukFaculty) plot(net, vertex.col=net[["Group"]]+1)
data(ukFaculty) net <- as.BinaryNet(ukFaculty) plot(net, vertex.col=net[["Group"]]+1)
Plot an UndirectedNet object
## S3 method for class 'Rcpp_UndirectedNet' plot(x, ...)
## S3 method for class 'Rcpp_UndirectedNet' plot(x, ...)
x |
the object |
... |
additional parameters for plot.network |
This is a thin wrapper around plot.network
.
el <- matrix(c(1,2),ncol=2) net <- new(UndirectedNet, el, 5L) net[1,5] <- 1 net[2,5] <- 1 plot(net)
el <- matrix(c(1,2),ncol=2) net <- new(UndirectedNet, el, 5L) net[1,5] <- 1 net[2,5] <- 1 plot(net)
prints a gofit object
## S3 method for class 'gofit' print(x, ...)
## S3 method for class 'gofit' print(x, ...)
x |
The object |
... |
passed to print.data.frame |
Print a 'lolog' object
## S3 method for class 'lolog' print(x, ...)
## S3 method for class 'lolog' print(x, ...)
x |
the object |
... |
additional parameters (unused) |
Print of a lologVariationalFit object
## S3 method for class 'lologVariationalFit' print(x, ...)
## S3 method for class 'lologVariationalFit' print(x, ...)
x |
the object |
... |
additional parameters (unused) |
Register Statistics
registerDirectedStatistic
registerDirectedStatistic
Generates BinaryNetworks from a fit lolog object
## S3 method for class 'lolog' simulate(object, nsim = 1, seed = NULL, convert = FALSE, ...)
## S3 method for class 'lolog' simulate(object, nsim = 1, seed = NULL, convert = FALSE, ...)
object |
A 'lolog' object. |
nsim |
The number of simulated networks |
seed |
Either NULL or an integer that will be used in a call to set.seed before simulating |
convert |
convert to a network object#' |
... |
unused |
A list of BinaryNet (or network if convert=TRUE) objects. Networks contain an additional vertex covariate "__order__" that indicates the sequence order in which the vertex was 'added' into the network.
library(network) data(flo) flomarriage <- network(flo,directed=FALSE) flomarriage %v% "wealth" <- c(10,36,27,146,55,44,20,8,42,103,48,49,10,48,32,3) fit <- lolog(flomarriage ~ edges + nodeCov("wealth")) net <- simulate(fit)[[1]] plot(net)
library(network) data(flo) flomarriage <- network(flo,directed=FALSE) flomarriage %v% "wealth" <- c(10,36,27,146,55,44,20,8,42,103,48,49,10,48,32,3) fit <- lolog(flomarriage ~ edges + nodeCov("wealth")) net <- simulate(fit)[[1]] plot(net)
Summary of a 'lolog' object
## S3 method for class 'lolog' summary(object, ...)
## S3 method for class 'lolog' summary(object, ...)
object |
the object |
... |
additional parameters (unused) |
data(lazega) fit <- lologVariational(lazega ~ edges() + nodeMatch("office") + triangles, nReplicates=50L, dyadInclusionRate=1) summary(fit)
data(lazega) fit <- lologVariational(lazega ~ edges() + nodeMatch("office") + triangles, nReplicates=50L, dyadInclusionRate=1) summary(fit)
The personal friendship network of a faculty of a UK university, consisting of 81 vertices (individuals) and 817 directed and weighted connections. The school affiliation of each individual is stored as a vertex attribute. The survey contained missing data for the school of two individuals.
data(ukFaculty)
data(ukFaculty)
When publishing results obtained using this
data set, the original authors (Nepusz T., Petroczi A., Negyessy L., Bazso F. 2008)
should be cited, along with this R
package.
Creative Commons Attribution-Share Alike 2.0 UK: England & Wales License, see http://creativecommons.org/licenses/by-sa/2.0/uk/ for details.
The data set was originally reported by Nepusz et. al. (2008) and was subsequently processed and included by the igraphdata package. We have simply converted their network from an igraph to a network object.
Nepusz T., Petroczi A., Negyessy L., Bazso F.: Fuzzy communities and the concept of bridgeness in complex networks. Physical Review E 77:016107, 2008.