Reproducing Goeyvaerts et al. (2018) using ergm.multi

library(ergm.multi)
library(dplyr)
library(purrr)
library(tibble)
library(ggplot2)

Obtaining data

The list of networks studied by Goeyvaerts et al. (2018) is included in this package:

data(Goeyvaerts)
length(Goeyvaerts)
## [1] 318

An explanation of the networks, including a list of their network (%n%) and vertex (%v%) attributes, can be obtained via ?Goeyvaerts. A total of 318 complete networks were collected, then two were excluded due to “nonstandard” family composition:

Goeyvaerts %>% discard(`%n%`, "included") %>% map(as_tibble, unit="vertices")
## [[1]]
## # A tibble: 4 × 5
##   vertex.names   age gender na    role       
## *        <int> <int> <chr>  <lgl> <chr>      
## 1            1    32 F      FALSE Mother     
## 2            2    48 F      FALSE Grandmother
## 3            3    32 M      FALSE Father     
## 4            4    10 F      FALSE Child      
## 
## [[2]]
## # A tibble: 3 × 5
##   vertex.names   age gender na    role  
## *        <int> <int> <chr>  <lgl> <chr> 
## 1            1    29 F      FALSE Mother
## 2            2    28 F      FALSE Mother
## 3            3     0 F      FALSE Child

To reproduce the analysis, exclude them as well:

G <- Goeyvaerts %>% keep(`%n%`, "included")

Data summaries

Obtain weekday indicator, network size, and density for each network, and summarize them as in Goeyvaerts et al. (2018) Table 1:

G %>% map(~list(weekday = . %n% "weekday",
                n = network.size(.),
                d = network.density(.))) %>% bind_rows() %>%
  group_by(weekday, n = cut(n, c(1,2,3,4,5,9))) %>%
  summarize(nnets = n(), p1 = mean(d==1), m = mean(d)) %>% kable()
weekday n nnets p1 m
FALSE (1,2] 3 1.0000000 1.0000000
FALSE (2,3] 19 0.7368421 0.8771930
FALSE (3,4] 48 0.8541667 0.9618056
FALSE (4,5] 18 0.7777778 0.9500000
FALSE (5,9] 3 1.0000000 1.0000000
TRUE (1,2] 9 1.0000000 1.0000000
TRUE (2,3] 53 0.9056604 0.9622642
TRUE (3,4] 111 0.7747748 0.9279279
TRUE (4,5] 39 0.6410256 0.8974359
TRUE (5,9] 13 0.4615385 0.8454212

Reproducing ERGM fits

We now reproduce the ERGM fits. First, we extract the weekday networks:

G.wd <- G %>% keep(`%n%`, "weekday")
length(G.wd)
## [1] 225

Next, we specify the multi-network model using the N(formula, lm) operator. This operator will evaluate the ergm formula formula on each network, weighted by the predictors passed in the one-sided lm formula, which is interpreted the same way as that passed to the built-in lm() function, with its “data” being the table of network attributes.

Since different networks may have different compositions, to have a consistent model, we specify a consistent list of family roles.

roleset <- sort(unique(unlist(lapply(G.wd, `%v%`, "role"))))

We now construct the formula object, which will be passed directly to ergm():

# Networks() function tells ergm() to model these networks jointly.
f.wd <- Networks(G.wd) ~
  # This N() operator adds three edge counts:
  N(~edges,
    ~ # one total for all networks  (intercept implicit as in lm),
      I(n<=3)+ # one total for only small households, and
      I(n>=5) # one total for only large households.
    ) +

  # This N() construct evaluates each of its terms on each network,
  # then sums each statistic over the networks:
  N(
      # First, mixing statistics among household roles, including only
      # father-mother, father-child, and mother-child counts.
      # Since tail < head in an undirected network, in the
      # levels2 specification, it is important that tail levels (rows)
      # come before head levels (columns). In this case, since
      # "Child" < "Father" < "Mother" in alphabetical order, the
      # row= and col= categories must be sorted accordingly.
    ~mm("role", levels = I(roleset),
        levels2=~.%in%list(list(row="Father",col="Mother"),
                           list(row="Child",col="Father"),
                           list(row="Child",col="Mother"))) +
      # Second, the nodal covariate effect of age, but only for
      # edges between children.
      F(~nodecov("age"), ~nodematch("role", levels=I("Child"))) +
      # Third, 2-stars.
      kstar(2)
  ) +
  
  # This N() adds one triangle count, totalled over all households
  # with at least 6 members.
  N(~triangles, ~I(n>=6))

See ergmTerm?mm for documentation on the mm term used above. Now, we can fit the model:

# (Set seed for predictable run time.)
fit.wd <- ergm(f.wd, control=snctrl(seed=123))
summary(fit.wd)
## Call:
## ergm(formula = f.wd, control = snctrl(seed = 123))
## 
## Monte Carlo Maximum Likelihood Results:
## 
##                                                         Estimate Std. Error MCMC % z value Pr(>|z|)    
## N(1)~edges                                               0.81062    0.53843      0   1.506 0.132190    
## N(I(n <= 3)TRUE)~edges                                   1.49860    0.45039      0   3.327 0.000877 ***
## N(I(n >= 5)TRUE)~edges                                  -0.80876    0.20790      0  -3.890 0.000100 ***
## N(1)~mm[role=Child,role=Father]                         -0.64067    0.48780      0  -1.313 0.189051    
## N(1)~mm[role=Child,role=Mother]                          0.12369    0.51625      0   0.240 0.810645    
## N(1)~mm[role=Father,role=Mother]                         0.24913    0.57329      0   0.435 0.663873    
## N(1)~F(nodematch("role",levels=I("Child")))~nodecov.age -0.07149    0.01640      0  -4.358  < 1e-04 ***
## N(1)~kstar2                                             -0.24476    0.21283      0  -1.150 0.250142    
## N(1)~triangle                                            2.04816    0.30939      0   6.620  < 1e-04 ***
## N(I(n >= 6)TRUE)~triangle                               -0.28750    0.11285      0  -2.548 0.010845 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 1975.5  on 1425  degrees of freedom
##  Residual Deviance:  610.1  on 1415  degrees of freedom
##  
## AIC: 630.1  BIC: 682.8  (Smaller is better. MC Std. Err. = 0.6593)

Similarly, we can extract the weekend network, and fit it to a smaller model. We only need one N() operator, since all statistics are applied to the same set of networks, namely, all of them.

G.we <- G %>% discard(`%n%`, "weekday")
fit.we <- ergm(Networks(G.we) ~
                 N(~edges +
                     mm("role", levels=I(roleset),
                        levels2=~.%in%list(list(row="Father",col="Mother"),
                                           list(row="Child",col="Father"),
                                           list(row="Child",col="Mother"))) +
                     F(~nodecov("age"), ~nodematch("role", levels=I("Child"))) +
                     kstar(2) +
                     triangles), control=snctrl(seed=123))
summary(fit.we)
## Call:
## ergm(formula = Networks(G.we) ~ N(~edges + mm("role", levels = I(roleset), 
##     levels2 = ~. %in% list(list(row = "Father", col = "Mother"), 
##         list(row = "Child", col = "Father"), list(row = "Child", 
##             col = "Mother"))) + F(~nodecov("age"), ~nodematch("role", 
##     levels = I("Child"))) + kstar(2) + triangles), control = snctrl(seed = 123))
## 
## Monte Carlo Maximum Likelihood Results:
## 
##                                                         Estimate Std. Error MCMC % z value Pr(>|z|)    
## N(1)~edges                                               2.13967    1.54965      0   1.381   0.1674    
## N(1)~mm[role=Child,role=Father]                         -1.14670    1.56584      0  -0.732   0.4640    
## N(1)~mm[role=Child,role=Mother]                          0.15849    1.71268      0   0.093   0.9263    
## N(1)~mm[role=Father,role=Mother]                        -0.73647    1.59575      0  -0.462   0.6444    
## N(1)~F(nodematch("role",levels=I("Child")))~nodecov.age -0.17544    0.05526      0  -3.175   0.0015 ** 
## N(1)~kstar2                                             -0.84210    0.35137      0  -2.397   0.0165 *  
## N(1)~triangle                                            3.53822    0.73617      0   4.806   <1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 802.7  on 579  degrees of freedom
##  Residual Deviance: 130.8  on 572  degrees of freedom
##  
## AIC: 144.8  BIC: 175.3  (Smaller is better. MC Std. Err. = 0.7861)

Diagnostics

Perform diagnostic simulation (Krivitsky et al. 2023), summarize the residuals, and make residuals vs. fitted and scale-location plots:

gof.wd <- gofN(fit.wd, GOF = ~ edges + kstar(2) + triangles)
summary(gof.wd)
## $`Observed/Imputed values`
##      edges            kstar2         triangle     
##  Min.   : 1.000   Min.   : 0.00   Min.   : 0.000  
##  1st Qu.: 3.000   1st Qu.: 3.00   1st Qu.: 1.000  
##  Median : 6.000   Median :12.00   Median : 4.000  
##  Mean   : 5.778   Mean   :13.55   Mean   : 4.324  
##  3rd Qu.: 6.000   3rd Qu.:12.00   3rd Qu.: 4.000  
##  Max.   :18.000   Max.   :78.00   Max.   :23.000  
##                   NAs    :9       NAs    :9       
## 
## $`Fitted values`
##      edges            kstar2          triangle     
##  Min.   : 0.800   Min.   : 2.600   Min.   : 0.830  
##  1st Qu.: 2.950   1st Qu.: 8.227   1st Qu.: 2.493  
##  Median : 5.590   Median :10.600   Median : 3.400  
##  Mean   : 5.809   Mean   :13.707   Mean   : 4.387  
##  3rd Qu.: 5.780   3rd Qu.:11.415   3rd Qu.: 3.723  
##  Max.   :14.780   Max.   :58.310   Max.   :19.190  
##                   NAs    :9        NAs    :9       
## 
## $`Pearson residuals`
##      edges               kstar2             triangle        
##  Min.   :-4.472091   Min.   :-4.043767   Min.   :-4.337050  
##  1st Qu.: 0.214840   1st Qu.: 0.203101   1st Qu.: 0.203101  
##  Median : 0.359188   Median : 0.397307   Median : 0.418088  
##  Mean   : 0.006399   Mean   :-0.001824   Mean   :-0.001926  
##  3rd Qu.: 0.476827   3rd Qu.: 0.511559   3rd Qu.: 0.534335  
##  Max.   : 0.884271   Max.   : 0.962211   Max.   : 1.018892  
##                      NAs    :9           NAs    :9          
## 
## $`Variance of Pearson residuals`
## $`Variance of Pearson residuals`$edges
## [1] 0.9500859
## 
## $`Variance of Pearson residuals`$kstar2
## [1] 0.9648911
## 
## $`Variance of Pearson residuals`$triangle
## [1] 0.9691854
## 
## 
## $`Std. dev. of Pearson residuals`
## $`Std. dev. of Pearson residuals`$edges
## [1] 0.9747235
## 
## $`Std. dev. of Pearson residuals`$kstar2
## [1] 0.9822887
## 
## $`Std. dev. of Pearson residuals`$triangle
## [1] 0.9844721

Variances of Pearson residuals substantially greater than 1 suggest unaccounted-for heterogeneity.

autoplot(gof.wd)
## [[1]]

## 
## [[2]]

## 
## [[3]]
## Warning: Removed 9 rows containing non-finite outside the scale range (`stat_smooth()`).
## Warning: Removed 9 rows containing missing values or values outside the scale range (`geom_point()`).
## Warning: Removed 9 rows containing missing values or values outside the scale range (`geom_text_repel()`).

## 
## [[4]]
## Warning: Removed 9 rows containing non-finite outside the scale range (`stat_smooth()`).
## Warning: Removed 9 rows containing missing values or values outside the scale range (`geom_point()`).
## Warning: Removed 9 rows containing missing values or values outside the scale range (`geom_text_repel()`).

## 
## [[5]]
## Warning: Removed 9 rows containing non-finite outside the scale range (`stat_smooth()`).
## Warning: Removed 9 rows containing missing values or values outside the scale range (`geom_point()`).
## Warning: Removed 9 rows containing missing values or values outside the scale range (`geom_text_repel()`).

## 
## [[6]]
## Warning: Removed 9 rows containing non-finite outside the scale range (`stat_smooth()`).
## Warning: Removed 9 rows containing missing values or values outside the scale range (`geom_point()`).
## Warning: Removed 9 rows containing missing values or values outside the scale range (`geom_text_repel()`).

The plots don’t look unreasonable.

Also make plots of residuals vs. square root of fitted and vs. network size:

autoplot(gof.wd, against=sqrt(.fitted))
## [[1]]

## 
## [[2]]

## 
## [[3]]
## Warning: Removed 9 rows containing non-finite outside the scale range (`stat_smooth()`).
## Warning: Removed 9 rows containing missing values or values outside the scale range (`geom_point()`).
## Warning: Removed 9 rows containing missing values or values outside the scale range (`geom_text_repel()`).

## 
## [[4]]
## Warning: Removed 9 rows containing non-finite outside the scale range (`stat_smooth()`).
## Warning: Removed 9 rows containing missing values or values outside the scale range (`geom_point()`).
## Warning: Removed 9 rows containing missing values or values outside the scale range (`geom_text_repel()`).

## 
## [[5]]
## Warning: Removed 9 rows containing non-finite outside the scale range (`stat_smooth()`).
## Warning: Removed 9 rows containing missing values or values outside the scale range (`geom_point()`).
## Warning: Removed 9 rows containing missing values or values outside the scale range (`geom_text_repel()`).

## 
## [[6]]
## Warning: Removed 9 rows containing non-finite outside the scale range (`stat_smooth()`).
## Warning: Removed 9 rows containing missing values or values outside the scale range (`geom_point()`).
## Warning: Removed 9 rows containing missing values or values outside the scale range (`geom_text_repel()`).

autoplot(gof.wd, against=ordered(n))
## [[1]]
## Warning: Computation failed in `stat_boxplot()`.
## Caused by error in `loadNamespace()`:
## ! there is no package called 'quantreg'
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : pseudoinverse used at 0.975
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : neighborhood radius 2.025
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : reciprocal condition number 6.9393e-17
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : There are other near singularities as well. 1

## 
## [[2]]
## Warning: Computation failed in `stat_boxplot()`.
## Caused by error in `loadNamespace()`:
## ! there is no package called 'quantreg'
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : pseudoinverse used at 0.975
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : neighborhood radius 2.025
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : reciprocal condition number 6.9393e-17
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : There are other near singularities as well. 1

## 
## [[3]]
## Warning: Removed 9 rows containing non-finite outside the scale range (`stat_boxplot()`).
## Warning: Computation failed in `stat_boxplot()`.
## Caused by error in `loadNamespace()`:
## ! there is no package called 'quantreg'
## Warning: Removed 9 rows containing non-finite outside the scale range (`stat_smooth()`).
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : pseudoinverse used at 1.98
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : neighborhood radius 1.02
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : reciprocal condition number 0
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : There are other near singularities as well. 1
## Warning: Removed 9 rows containing missing values or values outside the scale range (`geom_text_repel()`).

## 
## [[4]]
## Warning: Removed 9 rows containing non-finite outside the scale range (`stat_boxplot()`).
## Warning: Computation failed in `stat_boxplot()`.
## Caused by error in `loadNamespace()`:
## ! there is no package called 'quantreg'
## Warning: Removed 9 rows containing non-finite outside the scale range (`stat_smooth()`).
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : pseudoinverse used at 1.98
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : neighborhood radius 1.02
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : reciprocal condition number 0
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : There are other near singularities as well. 1
## Warning: Removed 9 rows containing missing values or values outside the scale range (`geom_text_repel()`).

## 
## [[5]]
## Warning: Removed 9 rows containing non-finite outside the scale range (`stat_boxplot()`).
## Warning: Computation failed in `stat_boxplot()`.
## Caused by error in `loadNamespace()`:
## ! there is no package called 'quantreg'
## Warning: Removed 9 rows containing non-finite outside the scale range (`stat_smooth()`).
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : pseudoinverse used at 1.98
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : neighborhood radius 1.02
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : reciprocal condition number 0
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : There are other near singularities as well. 1
## Warning: Removed 9 rows containing missing values or values outside the scale range (`geom_text_repel()`).

## 
## [[6]]
## Warning: Removed 9 rows containing non-finite outside the scale range (`stat_boxplot()`).
## Warning: Computation failed in `stat_boxplot()`.
## Caused by error in `loadNamespace()`:
## ! there is no package called 'quantreg'
## Warning: Removed 9 rows containing non-finite outside the scale range (`stat_smooth()`).
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : pseudoinverse used at 1.98
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : neighborhood radius 1.02
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : reciprocal condition number 0
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : There are other near singularities as well. 1
## Warning: Removed 9 rows containing missing values or values outside the scale range (`geom_text_repel()`).

It looks like network-size effects are probably accounted for.

References

Goeyvaerts, Nele, Eva Santermans, Gail Potter, et al. 2018. “Household Members Do Not Contact Each Other at Random: Implications for Infectious Disease Modelling.” Proceedings of the Royal Society B: Biological Sciences 285 (1893): 20182201. https://doi.org/10.1098/rspb.2018.2201.
Krivitsky, Pavel N., Pietro Coletti, and Niel Hens. 2023. “A Tale of Two Datasets: Representativeness and Generalisability of Inference for Samples of Networks.” Journal of the American Statistical Association 118 (544): 2213–24. https://doi.org/10.1080/01621459.2023.2242627.