package version 0.3.5 built 2021-10-31
Skye Bender deMoll ([email protected]) and the statnet team (https://statnet.org). Major contributions from James Moody, Martina Morris, Carter Butts, Steve Goodreau, Samuel Jenness, Li Wang and Kirk Li. This work was supported by grant R01HD68395 from the National Institute of Health.
Documentation compiled Mon Nov 11 03:13:51 2024
This package provides tools for working with dynamic networks (or processes on networks) in which the ordering and timing of ties is important. These approaches are needed when the phenomena of interest involve networks having densities and edge-turnover rates in a parameter space such that the time-collapsed aggregate network give too great a distortion of the toplology connectivity for useful analysis.
Most tsna
functions accept as their input continuous-
and discrete-time longitudinal networks having vertex, edge, and
attribute dynamics stored in the networkDynamic
format.
tsna
includes tools for applying “traditional” static
Social Network Analysis (SNA) metrics at multiple time points, as well
as temporal extensions of SNA metrics using forwards- and backwards-path
routines. The initial version of tsna
is primarily focused
on working with networks of the type that might be generated from
discrete time simulations of network evolution.
Research on formal methods for analysis of longitudinal networks is a relatively new area of study. Many of the measures included in this package may be formally identical to measures already published with different names in literature of diverse research fields. We will attempt to find and cite such previous work, but if you encounter any omissions, please let us know.
This document assumes familiarity with general concepts of SNA,
R
, and the statnet
suite of R packages. For
more background on the network
and
networkDynamic
data structures and functions, please see
appropriate package documentation and tutorials.
Loading required package: network
'network' 1.18.2 (2023-12-04), part of the Statnet Project
* 'news(package="network")' for changes since last version
* 'citation("network")' for citation information
* 'https://statnet.org' for help, support, and other information
Loading required package: networkDynamic
'networkDynamic' 0.11.4 (2023-12-10?), part of the Statnet Project
* 'news(package="networkDynamic")' for changes since last version
* 'citation("networkDynamic")' for citation information
* 'https://statnet.org' for help, support, and other information
As always in R, a general help file for the package can be displayed, and the individual arguments for each function are documented in more detail in the function’s help page.
This vignette also makes use of various example data sets provided by
the networkDynamicData
package, and optionally employs
static Social Network Analysis measures provided by the sna
package.
Loading required package: statnet.common
Attaching package: 'statnet.common'
The following objects are masked from 'package:base':
attr, order
sna: Tools for Social Network Analysis
Version 2.8 created on 2024-09-07.
copyright (c) 2005, Carter T. Butts, University of California-Irvine
For citation information, type citation("sna").
Type help(package="sna") to get started.
Most of the tsna
package function assume that their
input is formatted as a networkDynamic
data structure. The
networkDynamic
package provides utilities
(networkDynamic()
) for converting data from various formats
(such as timed edge-lists, or lists of matrices) as well as functions
for manipulating the data structures.
The data structure provided by networkDynamic
objects
assumes that the vertices and (directed or non-directed) edges of a
network have multiple ‘activity spells’ associated with them indicating
when they are ‘active’ or exist within the observation period. Each
spell is an interval with an onset and terminus time.
Each edge or vertex can activate and deactivate multiple times during
the period over which the network is observed. Please see
?activate
for additional details.
As an example, consider the included moodyContactSim
example object (Moody 2008). We will first plot it as a time-aggregated
network, ignoring the temporal info and showing all of the edges that
are ever active.
We can view the activity spells associated with the network’s edges
onset terminus tail head onset.censored terminus.censored duration edge.id
1 40 72 10 4 FALSE FALSE 32 1
2 214 247 1 11 FALSE FALSE 33 2
3 224 256 7 10 FALSE FALSE 32 3
4 453 479 13 4 FALSE FALSE 26 4
5 494 524 13 2 FALSE FALSE 30 5
6 575 599 2 16 FALSE FALSE 24 6
7 583 615 1 16 FALSE FALSE 32 7
8 621 651 1 12 FALSE FALSE 30 8
9 634 660 13 3 FALSE FALSE 26 9
10 665 692 4 14 FALSE FALSE 27 10
11 674 701 1 9 FALSE FALSE 27 11
12 701 733 16 6 FALSE FALSE 32 12
13 709 740 2 15 FALSE FALSE 31 13
14 712 739 13 5 FALSE FALSE 27 14
15 719 745 8 13 FALSE FALSE 26 15
16 748 782 4 16 FALSE FALSE 34 16
17 749 793 11 8 FALSE FALSE 44 17
18 769 795 13 7 FALSE FALSE 26 18
Since this is a fairly simple network, we could plot the times on the aggregate network diagram as edge labels.
coords<-plot(moodyContactSim,
displaylabels=TRUE,
label.cex=0.8,
label.pos=5,
vertex.col='white',
vertex.cex=3,
edge.label=sapply(get.edge.activity(moodyContactSim),function(e){
paste('(',e[,1],'-',e[,2],')',sep='')
}),
edge.label.col='blue',
edge.label.cex=0.7
)
In situations when we are considering transmission processes on a dynamic network, the aggregate network is not a good representation because it exaggerates the connectivity in the network by ignoring the importance of the order of edge events (Moody, 2002). For example, the apparent 8 to 11 to 1 path appears to be the shortest path traverseable in the static aggregate network. But when we consider the constraints of edge activity, the edge between 1 and 11 becomes inactive long before the edge between 8 and 11 activates, there is no such path in the dynamic network.
We could go to the other extreme, and instead of looking at the aggregate network we view only the network that is actually active at a single point in time. These networks will be much less connected than time-collapsed network. We see the edge between 1 and 11, and the edge between 11 and 8, but is difficult to determine any connectivity at all for the network as a whole.
par(mfcol=c(1,2))
plot(network.extract(moodyContactSim,at=215),
main='network at time 215',
displaylabels=TRUE,
label.cex=0.6,
label.pos=5,
vertex.col='white',
vertex.cex=3,
coord=coords)
plot(network.extract(moodyContactSim,at=750),
main='network at time 750',
displaylabels=TRUE,
label.cex=0.6,
label.pos=5,
vertex.col='white',
vertex.cex=3,
coord=coords)
A goal of the tsna
package is to allow users to flexibly
explore network data and apply metrics at multiple temporal resolutions.
We also provide some techniques for extracting networks of potential
temporal paths through the network as a means of examining transmission
potential without introducing aggregation biases.
One of the key features of this package is the ability to calculate temporal paths in networks. A forward temporal path is a sequence of vertices and directed edges such that the start times of successive elements are greater than or equal than those of the previous. The path is a directed traversal of the network that respects the constraints of edge activity spells and permits ‘waiting’ at intermediate vertices for ‘future’ edges to form.
To give a concrete example, if a network describes the time-evolution of contacts between people during which disease transmission could occur, a sequence of infection events spreading out through network from a single person would be a temporal path. Likewise, a series of messages forwarding a cute cat picture would be a temporal path in an email network. In most cases we assume that once a path (message) reaches a vertex, that vertex remains ‘infected’ and can transmit to all of its future contacts.
Depending on network topology, there may be lots of possible temporal
paths between a single pair of vertices. Bui Xuan, et al (2002) call a
permissable time-respecting path between a pair of vertices a
journey. The earliest arriving forward temporal path
from vertex i
to vertex j
can be thought of as
the soonest possible time a message sent from i
to
j
could possibly arrive. Because it depends on the sequence
of edge timing, earliest forward path is not usually equal to
the shortest path geodesic in a static network and may not be
the same as the shortest temporal path. However, we can calculate the
earliest path with relative efficiency using algorithms similar to those
used to calculate the shortest geodesic path (Bui Xuan, et al. 2002).
Please see the package documentation for for further algorithm
details.
The forward reachable set (FRS) from vertex v
at time t
is the set of vertices reachable by forward
temporal paths from v
, begining at time t
.
Also known as temporal out-component (Nicosia, et al 2013)
For a more tangible example, we can use tPath
to extract
a forward temporal path starting from vertex 10 in the Moody example
network.
$tdist
[1] 583 494 634 40 712 701 224 719 674 0 749 621 453 665 709 575
$previous
[1] 16 13 13 10 13 16 10 13 1 0 8 1 4 4 2 2
$gsteps
[1] 5 3 3 1 3 5 1 3 6 0 4 6 2 2 4 4
$start
[1] 0
$end
[1] Inf
$direction
[1] "fwd"
$type
[1] "earliest.arrive"
attr(,"class")
[1] "tPath" "list"
The tPath
function returns a tPath
object
which is a list with several components. The $tdist
component indicates the time (elapsed from start
) at which
each vertex is reached from the starting vertex v
. The
$previous
component gives, for each vertex index, the
vertex id of the previous vertex along path. As we will see, this path
information can be used to construct a new network object containing
only the tPath tree. $gsteps
gives the number of steps
(graph hops) in the path from v
for each vertex. The
remaining elements store the query parameters for later use by other
functions.
There is a special plot function for tPath
objects,
which essentially sets some useful presets for
plot.network
.
The tsna
package also provides a utility function to
plot the path on top of the static aggregate network, along with the
distance (transmission time) for each edge. We can use it to visually
compare two paths.
Extract an alternate path, this time starting from vertex 1 instead of 10
Plot both paths side-by-side.
Notice that paths are directed (even if the underlying network is not) and not always symmetric. For example vertex 10 can reach 1, but 1 cannot reach 10.
It is also possible to pass in a list of multiple paths and draw them on the same network.
# or draw both plots on the them both on the same network
plotPaths(moodyContactSim,coord=coords,list(v10path,v1path))
The forward reachable set is not the same as the backwards
reachable set (BRS) : The set of vertices that v
can
reach along paths traveling backwards in time. Or perhaps more
intuitively, the set of vertices that can reach v
along
forward paths.
plotPaths(moodyContactSim, list(
tPath(moodyContactSim,v=10,direction='bkwd',type='latest.depart'),
tPath(moodyContactSim,v=10)))
In most of the examples, we’ve been assuming that the the start of the tPath search begins at the earliest time observed in the network. But that doesn’t have to be true, a tPath can begin at any time on the network, and the size of the reachable set will often depend on the time at which the search began.
par(mfcol=1:2)
plot(tPath(moodyContactSim,v=1,start=0),coord=coords,
main='tPath from v1 @ t=0')
plot(tPath(moodyContactSim,v=1,start=500),coord=coords,
main='tPath from v1 @ t=500')
In the plots above, both tPaths started at vertex 1, but the one that started later (at t=500) doesn’t include vertices 11 and 8 because the 1-11 edge had already ended before the path search begain.
Here is a trivial example that illustrates several types of potential 2-step temporal paths from vertex A to vertex G.
First we construct a directed network with edge timing very specifically constructed to make only certain paths feasible.
pathCompare<-network.initialize(7)
network.vertex.names(pathCompare)<-LETTERS[1:7]
add.edges.active(pathCompare,tail=c(1,2),head=c(2,7),onset=c(1,4),terminus=c(2,5))
add.edges.active(pathCompare,tail=c(1,3),head=c(3,7),onset=c(0,6),terminus=c(2,7))
add.edges.active(pathCompare,tail=c(1,4),head=c(4,7),onset=c(4,5),terminus=c(5,6))
add.edges.active(pathCompare,tail=c(1,5),head=c(5,7),onset=c(6,9),terminus=c(7,10))
add.edges.active(pathCompare,tail=c(1,6),head=c(6,7),onset=c(4,10),terminus=c(5,11))
as.data.frame(pathCompare)
onset terminus tail head onset.censored terminus.censored duration edge.id
1 1 2 1 2 FALSE FALSE 1 1
2 4 5 2 7 FALSE FALSE 1 2
3 0 2 1 3 FALSE FALSE 2 3
4 6 7 3 7 FALSE FALSE 1 4
5 4 5 1 4 FALSE FALSE 1 5
6 5 6 4 7 FALSE FALSE 1 6
7 6 7 1 5 FALSE FALSE 1 7
8 9 10 5 7 FALSE FALSE 1 8
9 4 5 1 6 FALSE FALSE 1 9
10 10 11 6 7 FALSE FALSE 1 10
Now plot it, with each edge labeled with onset and terminus times, and each path type in a different color.
# pre-define some coords for arbitrary positioning
coords<-cbind(c(0,0.5,0.5,0.5,0.5,0.5,1),c(0.3,0.15,0.3,0.45,0.65,0.8,0.7))
# do the plot
plot(pathCompare,
coord=coords,jitter=FALSE,
#mode='circle',
displaylabels=TRUE, vertex.col='white',
edge.label=get.edge.activity(pathCompare),edge.label.cex=0.8,
edge.lwd=4,
edge.col=c('blue','blue','red','red','green','green','orange','orange','pink','pink'),
main='Comparison of fwd temporal path types from A to G')
# plot a legend
legend(-0.3,1,legend = c('earliest leaving (ACG @ 6)',
'earliest arriving (ABG @ 4)',
'latest leaving (AEG @ 10)',
'quickest (ADG @ 5)',
'latest arriving (AFG @ 11)'),
fill=c('red','blue','orange','green','pink'),
cex=0.8)
Most of these other path types would be useful for specific applications, but we do not yet have algorithms implemented for effiently calculating them on non-trivial networks.
Again, it is important to rember that the earliest temporal path is not always the same as the shortest (fewest step) temporal path.
For transmission processes, the forward reachable set is a theoretical upper bound (for a perfectly infectious process) on the possible number of vertices that can be reached from a source vertex within the time bounds. Hence, if we calculate the forward reachable set from every vertex, we would know the distribution of maximum possible epidemic sizes in the dynamic network. Can we characterize networks by looking at the distribution of sizes of fwd reachable sets?
This example makes use of the ConcurrencyComparisonNets
example data provided in the networkDynamicData
package.
When loaded, this brings in three discrete time networks, simulated via a tergm process, each of which is 100 time steps in duration. All three networks were parameterized to have the same size, relationship duration distribution and cross-sectional mean degree, but different cross-sectional degree distributions.
The tReach
function computes the sizes of forward
reachable sets. Since these are large-ish networks (1000 vertices), we
will only calculate the paths from a sample of 25 seed vertices to save
time.
baseTrees<-tReach(base,sample=25)
monogTrees<-tReach(monog,sample=25)
middleTrees<-tReach(middle,sample=25)
If we peek at the forward reachable set sizes we observe in the
base
network (the network with a Poisson cross-sectional
degree distribution) we see a large number of FRS reaching the majority
of the network, and a few cases the didn’t spread far at all.
[1] 468 770 162 833 799 796 381 785 559 750 785 700 5 556 519 397 406 751 780
[20] 371 1 769 726 739 627
Contrast this with the monog
network has Bernoulli (0,1)
cross-section degree distribution, meaning nobody has more than one
partner at a time. This greatly reduces the spreading potential of the
network, so almost all of the FRS are quite small.
[1] 20 23 19 20 42 22 11 27 21 2 8 10 8 10 35 43 24 8 6 11 12 4 20 31 31
We can compare the three distributions using box plots.
boxplot(cbind(baseTrees,monogTrees,middleTrees),
main='fwd-reachable set size distributions for nets of varying concurrency')
For a bit more detail on the distribution of FRS sizes, we can plot some overlapping histograms.
hist(baseTrees, main='fwd-reach size distributions',
ylim=c(0,50),xlim=c(0,1000),
breaks=seq(from=0,to=1000,by=50),
col='#55000033',xlab='reachable set size')
hist(monogTrees,ylim=c(0,50),xlim=c(0,1000),
breaks=seq(from=0,to=1000,by=50),
col='#00550033',add=TRUE)
hist(middleTrees,ylim=c(0,50),xlim=c(0,1000),
breaks=seq(from=0,to=1000,by=50),
col='#00005533',add=TRUE)
legend(800,50,legend=c('base','monog','middle'),
fill=c('#55000033','#00550033','#00005533'))
These networks would clearly have very different spreading potential, despite having nearly identical aggregate mean degrees and densities – properties which would often be used to predict the size of expected ‘giant component’ in a static network.
[1] 7.664
[1] 7.688
[1] 7.68
Dynamic networks – especially those that are constructed from multiple waves of static networks – can be be converted into a time-projected “multi-slice” static network representation which preserves the potential paths the network. This type of network is conceptually similar to the representation used internally for the forward path calculations and be very helpful for testing and understanding flow through temporal networks.
Create four random ‘slice’ networks and plot them
nets4<-replicate(4,list(network(matrix(rbinom(16,5,0.1),ncol=4,nrow=4))))
par(mfcol=c(2,2))
plot(nets4[[1]],displaylabels=TRUE,main='t0')
plot(nets4[[2]],displaylabels=TRUE,main='t1')
plot(nets4[[3]],displaylabels=TRUE,main='t2')
plot(nets4[[4]],displaylabels=TRUE,main='t3')
Aggregate them into a networkDynamic object, and project it in time by creating “identity arcs” bentween the vertices’ realizations in sucessive time slices.
Neither start or onsets specified, assuming start=0
Onsets and termini not specified, assuming each network in network.list should have a discrete spell of length 1
Argument base.net not specified, using first element of network.list instead
Created net.obs.period to describe network
Network observation period info:
Number of observation spells: 1
Maximal time range observed: 0 until 4
Temporal mode: discrete
Time unit: step
Suggested time increment: 1
The projected network will have size equal the the number of vertices in the original network times the number of time slices, with each vertex now appearing multiple times.
[1] 16
[1] 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4
The new network will have an edge attribute named
edge.type
added to distinguish the original “within_slice”
ties from the between-slice identity arcs. We can use this control edge
color to make the plot a little clearer.
plot(nets4Projected,
displaylabels=TRUE,
mode='kamadakawai',
edge.col=ifelse(nets4Projected%e%'edge.type'=='identity_arc','gray','black'))
For a temporally sparse network like moodyContactSim this can start to get a bit croweded on the plot.
changes<-get.change.times(moodyContactSim)
moodyProj<-timeProjectedNetwork(moodyContactSim,onsets=changes,termini=changes)
plot(moodyProj,
mode='kamadakawai',
vertex.cex=0.3,
arrowhead.cex=0.1,
edge.col=ifelse(moodyProj%e%'edge.type'=='identity_arc','gray','black'))
Note that we computed a network slice at every time point that the network changed. The produced quite a lote of “extra” vertices, but does make it so that we are sure that the temporal connectivity of the original dynamic network is completly represented in the projected one.
The functions tEdgeFormation
and
tEdgeDissolution
evaluate a network object at multiple time
points and return counts of the number of edges forming (edge onset at
time point) and dissolving (edge terminus at time point). These
functions are intended to provide descriptive stats about momentary
rates of change in the network.
Both measures also include the ability to output the results as
fractions instead of a raw counts. For edge dissolution, it then returns
the the fraction of previously-active edges that dissolve at the
specified time point. We can also use the start
,
end
and time.interval
parameters to adjust how
frequently the evaluations are made.
Time Series:
Start = 0
End = 100
Frequency = 0.1
[1] 0.00000000 0.04314721 0.01891892 0.04810127 0.05329949 0.06122449
[7] 0.04282116 0.01955990 0.04018913 0.02689487 0.05822785
For formation, the fraction is defined as the number of edges that form divided by the number of possible empty dyads (pairs of vertices that do not have an edge between them) that could possibly form ties.
Time Series:
Start = 0
End = 100
Frequency = 0.1
[1] 0.000000e+00 2.203817e-05 2.804762e-05 3.606188e-05 3.405831e-05
[6] 2.604454e-05 2.804858e-05 3.806715e-05 3.205706e-05 1.602853e-05
[11] 2.804813e-05
For sparse networks, these numbers will tend to be very, very small.
Many authors have described useful generalizations of traditional
network statistics by simply applying static graph metrics at multiple
time points in order to characterize changes in the network over time.
The tsna
package provides utilities to draw on the static
metrics offered by the sna
(Butts 2014) and
ergm
(Handcock et. al. 2015) packages.
Compute graph transitivity for all time points
Time Series:
Start = 0
End = 6
Frequency = 1
Series 1
[1,] 0.8591549
[2,] 0.6374046
[3,] 0.6666667
[4,] 0.6016949
[5,] 0.6141176
[6,] 0.4492754
[7,] NA
The results are returned as a time series object, which is a special type of matrix in which the rows correspond to regularly spaced time increments and the columns are variables of interest.
Other measures, such as triad.census
, return multiple
columns, one for each statistic (counts of the the various triad
types)
# compute triad census scores for each time point
tSnaStats(harry_potter_support,snafun='triad.census')
Time Series:
Start = 0
End = 6
Frequency = 1
003 012 102 021D 021U 021C 111D 111U 030T 030C 201 120D 120U 120C 210 300
0 40488 222 911 0 2 0 20 0 0 0 0 0 1 0 0 20
1 38582 942 1901 1 32 6 66 8 0 0 37 6 18 4 28 33
2 38726 879 1850 0 20 2 54 20 1 0 36 3 15 1 16 41
3 40378 556 668 0 5 5 21 11 0 0 4 1 4 1 0 10
4 37581 594 3145 0 22 8 106 16 0 0 93 2 7 0 12 78
5 39638 939 969 2 2 10 23 33 3 0 22 0 9 0 4 10
6 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
For vertex-level measures such as betweenness
we will
get back one column for each vertex indicating how its betweenness score
changes over time. This doesn’t fit well on the page, so we will not
print it all out.
[1] 7
[1] 64
Time Series:
Start = 0
End = 6
Frequency = 1
Harry James Potter
[1,] 15.00000
[2,] 127.75000
[3,] 50.66190
[4,] 42.16667
[5,] 252.00000
[6,] 50.08333
[7,] NA
[1] "mts" "ts" "matrix" "array"
Since a time series (ts
) object really is just a fancy
matrix, we can still use matrix functions on the rows and columns.
Compute the mean (over time) betweenness of each vertex
Adrian Pucey Alicia Spinnet Angelina Johnson
0.00000000 1.03214286 0.78214286
Anthony Goldstein Blaise Zabini C. Warrington
0.00000000 0.00000000 0.00000000
Cedric Diggory Cho Chang Colin Creevey
1.33333333 0.00000000 0.00000000
Cormac McLaggen Dean Thomas Demelza Robins
0.00000000 6.79047619 0.00000000
Dennis Creevey Draco Malfoy Eddie Carmichael
0.00000000 0.00000000 0.00000000
Eleanor Branstone Ernie Macmillan Euan Abercrombie
0.00000000 5.66666667 0.00000000
Fred Weasley George Weasley Ginny Weasley
6.05714286 4.47380952 0.93055556
Graham Pritchard Gregory Goyle Hannah Abbott
0.00000000 0.00000000 0.00000000
Harry James Potter Hermione Granger Jimmy Peakes
89.61031746 8.77222222 0.00000000
Justin Finch-Fletchley Katie Bell Kevin Whitby
5.66666667 0.00000000 0.00000000
Lavender Brown Leanne Lee Jordan
0.00000000 0.00000000 0.66666667
Lucian Bole Luna Lovegood Malcolm Baddock
0.00000000 0.04166667 0.00000000
Mandy Brocklehurst Marcus Belby Marcus Flint
0.00000000 0.00000000 0.00000000
Michael Corner Miles Bletchley Millicent Bulstrode
0.00000000 0.00000000 0.00000000
Natalie McDonald Neville Longbottom Oliver Wood
0.00000000 1.70000000 0.00000000
Orla Quirke Owen Cauldwell Padma Patil
0.00000000 0.00000000 0.00000000
Pansy Parkinson Parvati Patil Penelope Clearwater
0.33333333 0.00000000 0.00000000
Percy Weasley Peregrine Derrick Roger Davies
0.00000000 0.00000000 0.00000000
Romilda Vane Ronald Weasley Rose Zeller
0.00000000 22.58015873 0.00000000
Seamus Finnigan Stewart Ackerley Susan Bones
2.64603175 0.00000000 0.00000000
Terry Boot Theodore Nott Vincent Crabbe
0.41666667 0.00000000 0.00000000
Zacharias Smith
0.00000000
Or look at it the other way and compute mean betweenness of all vertices at each timepoint
[1] 0.312500 3.828125 2.546875 0.984375 5.781250 1.500000 NA
Not surprisingly, HP has the highest betweeness.
Since the sna
metrics are usually not cheap to
calculate, we may not want to evaluate them at every single time step
for larger longer-duration networks. If we just need a sense of the
trend in the network, we can use the time.interval
parameter to sample.
And notice that we also passed in the rescale=TRUE
argument used by the prestige
function. If the effective
size of the network is changing due to vertex activity, this would allow
the sna metric to renormalize with network size to reduce possible
variations due to making comparisons between networks of varying
size.
Each of the model terms provided by the ergm
package
(and its various add-ons) provide a ‘change statistic’ for evaluating
the effect of changing a single tie on network structure. These terms
can be effectively used as descriptive statistic for static networks.
For the static case, this is done by passing a formula naming the
desired statistics to ergm’s summary
function.
The tsna
package provides a wrapper to evaluate the
statistics at multiple time points and return the result as a time
series (ts
) object with a value for each statistic at each
time point.
Time Series:
Start = 0
End = 100
Frequency = 0.1
edges concurrent
0 371 186
10 377 186
20 363 154
30 376 162
40 373 166
50 368 169
60 380 181
70 401 203
80 406 191
90 398 189
100 372 166
The ts
object has a really handy print function that can
display the time series for each statistic side-by-side or overlapped on
a single plot.
Note that just like for the sna
functions, some
ergm
terms (such as degree
) produce multiple
statistics in the output. Also, there is some overlap with functionality
appearing in the sna package i.e. (sna’s degree
vs ergm
sociality
).
The tsna
package currently contains several functions
for reporting on the durations of the activity events (spells)
associated with a network. Most of these functions have options for
varying the units of aggregation or analysis. For example, the edge
measures can be adjusted from focusing on an ‘edge’ (aggregating across
all of the activity spells associated with that edge) or a ‘spell’ (a
single interval of contiguous time during which an edge is
activated).
The edgeDuration
function returns the activity durations
associated with the edges of the network.
[1] 32 33 32 26 30 24 32 30 26 27 27 32 31 27 26 34 44 26
This output means that the first edge was active for 32 time units, the second for 33, etc. This useful when we want to find out what the distribution of edge durations is shaped like.
Min. 1st Qu. Median Mean 3rd Qu. Max.
24.00 26.25 30.00 29.94 32.00 44.00
Durations for empty dyads (non existing edges) are not reported.
For a more realistic example, we can compare the distributions of
several networks simulated using the tergm
package. (Please
see ?concurrencyComparisonNets
for more background on these
networks) We would expect that since the concurrency simulations all had
the same dissolution parameter in the model, which was simply a uniform
probability that an edge would be deactivated at each time step, they
should all have similarly shaped distributions of edge durations.
The durations functions also have the ability to count the number of spells instead of aggregating their durations. This is mostly useful for working with event-based continuous data (see below) but also makes it so we can quickly determine if the network has re-occurring ties. In other words, which edges have more than one activity spell associated with them?
[1] 105 366 888
integer(0)
All of the edges in the moodyContactSim
network are only
active once (making many times of calculations much simpler), where in
the monog
network, there are a few edges that toggle on and
off and then back on again. By default, the edgeDuration
aggregates at the level of subject='edge'
summing the
durations together. Setting subject='spells'
will return /
count each activity spell independently. For example, we just query the
duration of the 105th valid edge id which was reported as having
multiple spells.
[1] 31
[1] 13 18
For networks in which edges rarely re-activate, there may be little
difference. But we were trying to determine something like mean values
of activity durations, we’d probably want to use
subject='spells'
to count each edges’ spells
separately.
[1] 20.15911
[1] 20.1485
For networks with a changing set of active vertices, the durations
can be summarized using the vertexDuration
function. The
windsurfers
data set only records interactions between the
surfers observed on the specific beach. Since most of the surfers don’t
make it to beach every day and can’t be observed, they are considered
inactive.
[1] 27 13 19 22 5 8 5 2 1 4 5 9 7 8 7 18 9 4 2 1 5 3 6 6 4
[26] 6 7 6 14 5 4 1 1 2 4 1 8 3 14 8 8 3 3 7 3 3 1 8 2 3
[51] 6 4 3 1 2 3 2 4 7 7 2 3 3 2 5 6 5 3 6 7 2 1 1 14 2
[76] 2 2 5 3 2 2 2 2 4 3 1 3 2 3 2 1 2 1 1 1
1 2 3 4 5 6 7 8 9 13 14 18 19 22 27
14 19 16 8 8 7 7 6 2 1 3 1 1 1 1
Again, setting subject='spells'
changes the
interpretation from “give me the total number of days each vertex was
observed to be active in the dataset” to “give me the number of
consecutive days in each spell of days the vertices were observed to be
active”
The tiedDuration
function measures the total amount of
time each vertex has ties during the network observation period.
Although it is possible to represent networks in which edges link to
inactive vertices, we assume that the situations are rare enough that we
have not implemented a specific measure to describe it. (For data
cleaning purposes, such cases can be discovered using
network.dynamic.check
.)
The McFarland classroom interaction dataset is a collection of time-stamped speech acts among teachers and students during 40 minutes of classroom time. How much total talking does each person in the room do?
[1] 29 0 27 63 25 23 164 19 18 32 19 39 20 81 20 26 32 29 0
[20] 25
Notice that since the edges in this network are coded has events with no duration, we actually asked it to consider counts instead of durations. So there is one person (7) who talks a lot more, and two people who don’t talk at all (2 and 19). Could this be related to their roles?
[1] "grade_11" "grade_11" "grade_11" "grade_12" "grade_12"
[6] "grade_11" "instructor" "grade_11" "grade_12" "grade_11"
[11] "grade_12" "grade_11" "grade_12" "instructor" "grade_11"
[16] "grade_11" "grade_11" "grade_11" "grade_11" "grade_11"
Well, at least the talker is one of the teachers. Since this is a directed network, it might also be interesting to look at it in terms of who people talk to
[1] 38 10 37 55 37 32 32 29 28 47 29 49 30 46 30 36 42 40 10 34
Or perhaps a ratio of speaking vs. being spoken to.
plot(tiedDuration(cls33_10_16_96, mode='counts',neighborhood = 'out'),
tiedDuration(cls33_10_16_96, mode='counts',neighborhood = 'in'),
xlab='# speaking events',ylab='# spoken to events',main='McFarland classroom network, speaking vs. spoken to' )
text(tiedDuration(cls33_10_16_96, mode='counts',neighborhood = 'out'),
tiedDuration(cls33_10_16_96, mode='counts',neighborhood = 'in'),
label=cls33_10_16_96%v%'type',cex=0.8,pos=4)
If we turn to the concurrency comparison networks, it is interesting
to compare the distributions of connected duration for the three
scenarios. We might assume that vertices that are connected longer are
more likely to be reached (or infected) because, ignoring topology, they
simply have more opertunities to transmit or recieve. Since this is an
undirected network, tiedDuration
totals the amount of time
that each vertex appears as the tail or head of each edge.
plot(sort(tiedDuration(base)),type='l',ylim=c(0,400),
main='sorted tiedDuration for concurrency scenearios',
xlab='sorted vertices',ylab='duration that each vertex is connected', col ='#55000033',lwd=4)
points(sort(tiedDuration(monog)),type='l',col='#00550033',lwd=4)
points(sort(tiedDuration(middle)),type='l',col='#00005533',lwd=4)
legend(200,300,legend=c('base','monog','middle'),
fill=c('#55000033','#00550033','#00005533'))
[1] 76.524
[1] 76.102
[1] 73.804
The means for the networks are quite similar, as we would expect
since they were simulated to have the same momentary mean degree, but
the shape of the distributions are somewhat different. There is very
little variation in the monog
version, but the
middle
and base
networks each have some
vertices have much higher connection rates.
Like many of the temporal measures, for truly random graphs we might expect a strong correlation between the number of people contacted (degree) and the total contact time (tiedDuration). But of course there is always the possibility of someone having a very few long-duration ties, or very many short ties.
Indeed, plotting aggregate degree against connected duration in the simulated networks suggests they are pretty tightly correlated.
par(mfcol=c(1,3))
plot(degree(as.network(base)),tiedDuration(base),xlim=c(0,25),ylim=c(0,350),main='base')
plot(degree(as.network(middle)),tiedDuration(middle),xlim=c(0,25), ylim=c(0,350),main='middle')
plot(degree(as.network(monog)),tiedDuration(monog),xlim=c(0,25),ylim=c(0,350),main='monog')
We can compare the measures on observations for each person in the real-world observations of the hospital contact network.
Lets put a bunch of the example networks together on a list so we apply the metrics to all of them at once
data(moodyContactSim)
data(harry_potter_support)
data(onlineNetwork)
data(vanDeBunt_students)
data(McFarland_cls33_10_16_96)
data(windsurfers)
data(hospital_contact)
data(concurrencyComparisonNets)
nets<-list(
moodyContactSim=moodyContactSim,
hospital=hospital,
base=base,
monog=monog,
harry_potter=harry_potter_support,
onlineNet=onlineNet,
vanDeBunt=vanDeBunt_students,
McFarland=cls33_10_16_96,
windsurfers=windsurfers)
Plot histograms for the tie durations for 9 example networks
par(mfcol=c(3,3))
for (n in seq_along(nets)){
hist(edgeDuration(nets[[n]]),main=names(nets)[n],xlab='duration')
}
Notice that onlineNet, vanDeBunt and McFarland’s classroom all have momentary events, so durations are not a very useful metric. Hospital has very short durations (20 seconds) compared to the overall time (~350,000 seconds). Really should be corrected to measure as if it was discrete time with 20 sec time steps. And since we are not considering the actual units of time, doesn’t really make sense to put them on the same plot
If we look at it by event counts per edge instead of durations, the momentary event networks rank much higher.
The tEdgeDensity
function is a network-level metric to
compute the temporal density of a networkDynamic object. It can
be thought of as a crude measure of the total relational activity in the
network. It is expressed as a measure of the total amount of time
vertices are tied by active edges in the network, divided by the amount
of time they could possibly be tied.
For all the dyads which are ever observed to have edges, what fraction of the time are they tied?
[1] 0.1996973
So for the base network, if an edge is observed, it is likely to be
active 19% of the time. This may be useful, but it doesn’t really tell
us how connected the overall network is because we could get this value
for a network with only a single edge that is active for 19% of the
observational period. We can change the aggregation unit from
'edge'
to 'dyad'
so that the denominator will
be the total amount of time dyads could possibly be connected.
For all the possible dyads in the network, what fraction of time are they actually tied?
[1] 0.000766006
So, for the base
network, if we peek at a randomly
chosen dyad at randomly chosen time in the observation period there is
only an 8 in 10,000 chance that they will be connected by an active
edge! But of course, just like static density measures, this form of
density is dependent on network size.
What fraction of time the observed edges in the example networks active?
edd<-sapply(nets,tEdgeDensity)
plot(edd,main='edge duration density',xaxt='n',xlab='networks')
text(edd,label=names(edd),pos=4)
If we look at it by events, the momentary event networks rank much higher.
How many events are there in a unit of time? (of course this depends on having sensible units of time)
eed<-sapply(nets,tEdgeDensity,mode='event')
plot(eed,main='edge event density',xaxt='n',xlab='networks')
text(eed,label=names(eed),pos=4)
Compute duration of active ties as fraction of the available dyads?
ddd<-sapply(nets,tEdgeDensity,agg.unit='dyad')
plot(ddd,main='dyad duration density',xaxt='n',xlab='networks')
text(ddd,label=names(ddd),pos=4)
Windsurfers should probably have a correction for dyad duration density, since the vertex set varies, so not not all dyads are always available.
The problem with these measures is that, like density, the values will be very, very low for real world networks so kind of hard to compare.
In addition to measures of momentary structure, dynamic or
longitudinal networks also give us the ability to explore sequences or
patterns of events. The package relevent
(Butts 2008)
includes some statistical techniques for modeling relational event
sequences, and the tsna
package borrows the participation
shift summary statistics from it.
The function pShiftCount
computes counts of dyadic
turn-taking events using a typology outlined by Gibson (2003).
Essentially, it looks at the sequence of events (rather than the exact
timing) and counts up instances of various types of ‘turn taking’ speech
acts. For example, two edges in sequence, ‘i’ talks to ‘j’, and then ‘j’
responds to ‘i’, is counted as one type of event, where ‘i’ talks to ‘j’
and then ‘j’ talks to ‘k’ would be another type of event. The set of
events Gibson was interested in is certainly not an exhaustive list of
potential sequences, but may still be an interesting approach for
characterizing dynamic networks.
We can apply the the pShiftCount
metric to the McFarland
dataset of conversational speech acts in a classroom.
Loading required namespace: relevent
AB-BA AB-B0 AB-BY A0-X0 A0-XA A0-XY AB-X0 AB-XA AB-XB AB-XY A0-AY AB-A0
691 247 2 45 3 2 5 4 7 8 155 0 1
AB-AY
691 29
By default, it gives us aback a vector with the counts of each type
participation shifts. After consulting the docs
(?pShiftcount
) we see that there is a lot of call and
response ‘Turn Receiving’ (‘AB-BA’) as well as quite a bit of ‘Turn
Usurping’ (‘AB-XY’) events.
Since we don’t really know what these numbers mean in absolute terms, it would be interesting to see if the counts change at various segments of time. So divide the time up into 5-minute intervals and compute counts for each
sliceCounts<- lapply(seq(from = 0,to=45,by = 5),function(onset){
pShiftCount(network.extract(cls33_10_16_96,onset,length = 5))
})
# convert to a matrix
sliceCounts<-do.call(rbind,sliceCounts)
sliceCounts
AB-BA AB-B0 AB-BY A0-X0 A0-XA A0-XY AB-X0 AB-XA AB-XB AB-XY A0-AY AB-A0
67 10 1 1 1 0 1 0 3 0 2 0 0
134 21 0 2 1 1 3 3 0 0 11 0 1
46 23 0 4 0 0 0 0 0 1 13 0 0
47 23 0 4 0 0 0 0 3 0 15 0 0
103 31 0 5 1 0 1 1 0 1 23 0 0
71 36 0 6 0 0 0 0 1 2 24 0 0
74 37 0 11 0 0 0 0 0 0 22 0 0
76 38 0 7 0 0 0 0 0 1 26 0 0
73 25 1 5 0 1 0 0 0 3 15 0 0
AB-AY
67 11
134 0
46 4
47 1
103 3
71 1
74 2
76 3
73 3
# plot
plot(sliceCounts[,'AB-BA'],type='b',col='blue',ylim=c(0,170),
main='pShift counts for 5-min intervals of cls33',
ylab='count of selected pShift',xlab='slice index')
points(sliceCounts[,'AB-AY'],type='b',col='red')
points(sliceCounts[,'AB-XA'],type='b',col='green')
Perhaps it is interesting that the ‘AB-BA’ call-and-response starts out low and increases for most of the class, in contrast to the where the ‘AB-AY’ counts? But I think we’d need to go back and look at the narrative accounts for the classroom.
This package is far from complete. Most functions do not yet include ability to bin times necessary for working with certain types of continuous time networks. Include vertex activity functions, and correctly account for vertex activity in edge functions.
To cite package 'tsna' in publications use:
Bender-deMoll S, Morris M (2021). _tsna: Tools for Temporal Social
Network Analysis_. R package version 0.3.5, <http://statnet.org/>.
A BibTeX entry for LaTeX users is
@Manual{,
title = {tsna: Tools for Temporal Social Network Analysis},
author = {Skye Bender-deMoll and Martina Morris},
year = {2021},
note = {R package version 0.3.5},
url = {http://statnet.org/},
}
Vincenzo Nicosia, John Tang, Cecilia Mascolo, Mirco Musolesi, Giovanni Russo, and Vito Latora (2013) Graph Metrics for Temporal Networks. https://arxiv.org/pdf/1306.0493v1.pdf
Skye Bender-deMoll (2014). networkDynamicData: dynamic network datasets. R package version 0.1.0. https://CRAN.R-project.org/package=networkDynamicData
B. Bui Xuan, Afonso Ferreira, Aubin Jarry. “Computing shortest, fastest, and foremost journeys in dynamic networks”. RR-4589, 2002. https://hal.inria.fr/inria-00071996/document
Carter T. Butts, Ayn Leslie-Cook, Pavel N. Krivitsky and Skye Bender-deMoll (2015). networkDynamic: Dynamic Extensions for Network Objects. R package version 0.8. https://statnet.org
Carter T. Butts (2008). A Relational Event Framework for Social Action. Sociological Methodology, 38(1), 155-200.
Carter T. Butts (2014). sna: Tools for Social Network Analysis. R package version 2.3-2. https://CRAN.R-project.org/package=sna
Gibson, D.R. (2003) ‘Participation Shifts: Order and Differentiation in Group Conversation’ Social Forces 81 (4): 1335-1380 https://academic.oup.com/sf/article-abstract/81/4/1335/2234586
Handcock M, Hunter D, Butts C, Goodreau S, Krivitsky P and Morris M (2015). ergm: Fit, Simulate and Diagnose Exponential-Family Models for Networks. The Statnet Project (<URL: https://statnet.org>). R package version 3.4.0, <URL: https://CRAN.R-project.org/package=ergm>.
Moody, James. (2002) “The Importance of Relationship Timing for Diffusion.” Social Forces 81:25-56
Moody, James (2008) “Static Representations of Dynamic Networks” Duke Population Research Institute On-line Working Paper Series. http://www.soc.duke.edu/~jmoody77/StatDyn_5.pdf