This is a cheat sheet for the ninth homework on exponential random graph models. It works through each step in R.
First, you need the following packages.
library(statnet)
Next, we load the data, which contains the network object peoria.n. This is a statnet network object, so we can use summary() to inspect descriptive aspects of the network and plot() to see how it looks.
load("data/peoria.RDA")
summary(peoria.n, print.adj=FALSE)
Network attributes:
vertices = 117
directed = TRUE
hyper = FALSE
loops = FALSE
multiple = FALSE
bipartite = FALSE
total edges = 543
missing edges = 0
non-missing edges = 543
density = 0.04000884
Vertex attributes:
city:
integer valued attribute
117 values
time:
integer valued attribute
117 values
vertex.names:
character valued attribute
117 valid vertex names
No edge attributes
plot(peoria.n)
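As an optional visual check (the color mapping here is just illustrative), we can color the vertices by the time attribute to see whether any cohort structure is visible.
# color vertices by the integer-valued "time" attribute; plot.network maps it to a palette
plot(peoria.n, vertex.col = "time")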
Here we move through different types of exponential random graph models. We start by using ergm to fit a model with only an edges parameter.
model1 <- ergm(peoria.n ~ edges)
summary(model1)
Call:
ergm(formula = peoria.n ~ edges)
Maximum Likelihood Results:
Estimate Std. Error MCMC % z value Pr(>|z|)
edges -3.1778 0.0438 0 -72.56 <1e-04 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Null Deviance: 18815 on 13572 degrees of freedom
Residual Deviance: 4559 on 13571 degrees of freedom
AIC: 4561 BIC: 4569 (Smaller is better. MC Std. Err. = 0)
The Bernoulli model's large negative edges coefficient tells us that the Peoria network is sparse: a tie between any two nodes is far less likely than not. Recall that the inverse logit of the edges coefficient is the same as the network density.
plogis(coef(model1))
edges
0.04000884
network.density(peoria.n)
[1] 0.04000884
So there is a .04 probability of an edge between any two nodes in this graph.
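Equivalently, the logit of the observed density recovers the edges coefficient; this is a quick sanity check rather than part of the original workflow.
# qlogis() is the logit function; applied to the density it should return about -3.178
qlogis(network.density(peoria.n))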
Now we can look at the time variable.
model2 <- ergm(peoria.n ~ edges + nodematch("time"))
summary(model2)
Call:
ergm(formula = peoria.n ~ edges + nodematch("time"))
Maximum Likelihood Results:
Estimate Std. Error MCMC % z value Pr(>|z|)
edges -3.3327 0.0520 0 -64.091 <1e-04 ***
nodematch.time 0.6672 0.0969 0 6.886 <1e-04 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Null Deviance: 18815 on 13572 degrees of freedom
Residual Deviance: 4516 on 13570 degrees of freedom
AIC: 4520 BIC: 4535 (Smaller is better. MC Std. Err. = 0)
There is a positive and significant time-homophily effect, so there is a cohort effect in terms of who is connected to whom.
plogis(coef(model2))
edges nodematch.time
0.03446724 0.66087144
Matching on time increases the probability of tie formation; the inverse logit of the homophily coefficient is .66.
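To see what this means at the dyad level, we can compare the implied tie probabilities for a matched versus an unmatched pair; because ties are rare, matching on time roughly doubles the probability of a tie (a small sketch using the fitted coefficients).
# predicted probability of a tie when the pair does NOT match on time
plogis(coef(model2)["edges"])
# predicted probability of a tie when the pair DOES match on time
plogis(coef(model2)["edges"] + coef(model2)["nodematch.time"])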
But note that we aren't yet tapping into endogenous network effects, and the MCMC machinery isn't being deployed because both terms are dyad-independent.
Let’s look at some endogenous effects.
Now let’s see if there is local clustering. The obvious place to start is the triangle parameter.
model3 <- ergm(peoria.n ~ edges + nodematch("time") + triangle)
Error in ergm.MCMLE(init, nw, model, initialfit = (initialfit <- NULL), : Number of edges in a simulated network exceeds that in the observed by a factor of more than 20. This is a strong indicator of model degeneracy or a very poor starting parameter configuration. If you are reasonably certain that neither of these is the case, increase the MCMLE.density.guard control.ergm() parameter.
As is common, the ERGM has trouble fitting the triangle parameter, and the estimation has become degenerate. Let's use the geometrically weighted edgewise shared partner (GWESP) statistic to evaluate local clustering instead; GWESP is the most common alternative parameterization of local clustering.
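Before refitting, it can help to look at the observed values of these clustering statistics; the summary formula method computes network statistics without fitting a model (a diagnostic aside, not required for the homework).
# observed triangle count and edgewise shared partner statistic
summary(peoria.n ~ triangle + gwesp(0, fixed = TRUE))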
model3b <- ergm(peoria.n ~ edges + nodematch("time") + gwesp(0, fixed = TRUE))
summary(model3b)
Call:
ergm(formula = peoria.n ~ edges + nodematch("time") + gwesp(0,
fixed = TRUE))
Monte Carlo Maximum Likelihood Results:
Estimate Std. Error MCMC % z value Pr(>|z|)
edges -4.21308 0.06677 0 -63.095 <1e-04 ***
nodematch.time 0.50890 0.07912 0 6.432 <1e-04 ***
gwesp.fixed.0 1.24444 0.06517 0 19.095 <1e-04 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Null Deviance: 18815 on 13572 degrees of freedom
Residual Deviance: 4257 on 13569 degrees of freedom
AIC: 4263 BIC: 4285 (Smaller is better. MC Std. Err. = 0.6366)
The gwesp term is significant and positive, so local clustering affects tie formation. Let's take the inverse logit of the coefficients.
plogis(coef(model3b))
edges nodematch.time gwesp.fixed.0
0.01458491 0.62454873 0.77633618
This shows that if a pair of nodes “have any positive number of friends in common and each of them is in at least one other triangle with each of those friends,” then the probability of a tie between them increases by roughly .78.
Let’s evaluate reciprocity in the Peoria doctors network. Because the graph is directed, we can use the mutual parameter to evaluate whether people who receive ties tend to send them back.
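As a baseline, the summary formula method again gives the observed count of reciprocated dyads (a quick check, not needed for the model itself).
# number of mutual (reciprocated) dyads in the observed network
summary(peoria.n ~ mutual)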
model4 <- ergm(peoria.n ~ edges + nodematch("time") + gwesp(0, fixed = TRUE) + mutual)
summary(model4)
Call:
ergm(formula = peoria.n ~ edges + nodematch("time") + gwesp(0,
fixed = TRUE) + mutual)
Monte Carlo Maximum Likelihood Results:
Estimate Std. Error MCMC % z value Pr(>|z|)
edges -4.17858 0.06685 0 -62.504 <1e-04 ***
nodematch.time 0.44043 0.07351 0 5.991 <1e-04 ***
gwesp.fixed.0 0.94272 0.07005 0 13.459 <1e-04 ***
mutual 1.97352 0.16302 0 12.106 <1e-04 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Null Deviance: 18815 on 13572 degrees of freedom
Residual Deviance: 4143 on 13568 degrees of freedom
AIC: 4151 BIC: 4181 (Smaller is better. MC Std. Err. = 0.9074)
Reciprocity has a very large positive effect on tie formation, improving the log-odds by nearly 2.
plogis(coef(model4))
edges nodematch.time gwesp.fixed.0 mutual
0.01508912 0.60836243 0.71964944 0.87798851
The probability of a tie increases by .88 given reciprocity.
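Another way to read the mutual term (a sketch based on the fitted coefficients, setting the homophily and clustering terms aside) is as the conditional probability that i nominates j when j already nominates i.
# probability of a tie when the other direction is absent
plogis(coef(model4)["edges"])
# probability of the same tie when the other direction is already present
plogis(coef(model4)["edges"] + coef(model4)["mutual"])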
Now that we have built a model, let's evaluate its goodness of fit and MCMC diagnostics. We can check how well the model reproduces the observed network with gof() and then plot the results.
gof4 <- gof(model4)
plot(gof4)
This mostly looks good, although the degree distributions seem a little off. I don’t think this is too much of a concern.
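If degree is the main worry, the GOF statistics can be chosen explicitly via the GOF formula argument; the object name and the particular terms below are only illustrative.
# focus the goodness-of-fit checks on the in- and out-degree distributions
gof_degree <- gof(model4, GOF = ~ idegree + odegree)
plot(gof_degree)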
Last, we can run mcmc.diagnostics() on the fitted model.
mcmc.diagnostics(model4)
Sample statistics summary:
Iterations = 1474560:28672000
Thinning interval = 32768
Number of chains = 1
Sample size per chain = 831
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
edges 7.833 57.84 2.0063 2.8264
nodematch.time 2.544 22.67 0.7864 1.1483
gwesp.fixed.0 6.999 59.72 2.0717 3.0155
mutual 2.593 14.64 0.5080 0.7176
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
edges -98.75 -33.0 5 47.0 123.0
nodematch.time -40.00 -12.5 2 17.5 48.0
gwesp.fixed.0 -107.25 -35.5 4 45.0 128.2
mutual -25.00 -8.0 2 13.0 31.0
Are sample statistics significantly different from observed?
edges nodematch.time gwesp.fixed.0 mutual (Omni)
diff. 7.832731649 2.54392298 6.99879663 2.5932611312 NA
test stat. 2.771271283 2.21531368 2.32092193 3.6140393472 2.837370e+01
P-val. 0.005583788 0.02673854 0.02029106 0.0003014633 1.433938e-05
Sample statistics cross-correlations:
edges nodematch.time gwesp.fixed.0 mutual
edges 1.0000000 0.7937411 0.9639281 0.8843498
nodematch.time 0.7937411 1.0000000 0.7860627 0.7401637
gwesp.fixed.0 0.9639281 0.7860627 1.0000000 0.9060772
mutual 0.8843498 0.7401637 0.9060772 1.0000000
Sample statistics auto-correlation:
Chain 1
edges nodematch.time gwesp.fixed.0 mutual
Lag 0 1.00000000 1.000000000 1.000000000 1.00000000
Lag 32768 0.32075760 0.293727485 0.338520904 0.36286072
Lag 65536 0.17812164 0.154438265 0.184712750 0.17974642
Lag 98304 0.07871944 0.055639889 0.094710200 0.06100919
Lag 131072 -0.02937337 -0.007445629 -0.014591771 -0.01610558
Lag 163840 -0.01273936 -0.017151413 0.009154743 0.02303434
Sample statistics burn-in diagnostic (Geweke):
Chain 1
Fraction in 1st window = 0.1
Fraction in 2nd window = 0.5
edges nodematch.time gwesp.fixed.0 mutual
-0.64367813 0.06732323 -0.56652064 -0.08585720
Individual P-values (lower = worse):
edges nodematch.time gwesp.fixed.0 mutual
0.5197842 0.9463244 0.5710399 0.9315799
Joint P-value (lower = worse): 0.3807258
Note: MCMC diagnostics shown here are from the last round of
simulation, prior to computation of final parameter estimates.
Because the final estimates are refinements of those used for this
simulation run, these diagnostics may understate model performance.
To directly assess the performance of the final model on in-model
statistics, please use the GOF command: gof(ergmFitObject,
GOF=~model).
And these diagnostics, with quickly decaying autocorrelation and non-significant Geweke statistics, look like what we would anticipate for a successful simulation.
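Following the note at the end of the diagnostics output, we can also assess the final estimates directly on the in-model statistics.
# goodness of fit on the model's own sufficient statistics, as the diagnostics note suggests
gof_model <- gof(model4, GOF = ~ model)
plot(gof_model)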