This is a cheat sheet for the ninth homework on exponential random graph models. It works through each step in R.
First, you need the following packages.
library(statnet)
Next, we load the data, which contains the network object peoria.n. This is a statnet network object, so we can use summary() to inspect descriptive aspects of the network and plot() to see how it looks.
load("data/peoria.RDA")
summary(peoria.n, print.adj=FALSE)
Network attributes:
vertices = 117
directed = TRUE
hyper = FALSE
loops = FALSE
multiple = FALSE
bipartite = FALSE
total edges = 543
missing edges = 0
non-missing edges = 543
density = 0.04000884
Vertex attributes:
city:
integer valued attribute
117 values
time:
integer valued attribute
117 values
vertex.names:
character valued attribute
117 valid vertex names
No edge attributes
plot(peoria.n)
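As an optional visual check (the color mapping here is just illustrative), we can color the vertices by the time attribute to see whether any cohort structure is visible.
# color vertices by the integer-valued "time" attribute; plot.network maps it to a palette
plot(peoria.n, vertex.col = "time")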
Here we move through different types of exponential random graph models. We start by using ergm to fit a model with only an edges parameter.
model1 <- ergm(peoria.n ~ edges)
summary(model1)
Call:
ergm(formula = peoria.n ~ edges)
Maximum Likelihood Results:
Estimate Std. Error MCMC % z value Pr(>|z|)
edges -3.1778 0.0438 0 -72.56 <1e-04 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Null Deviance: 18815 on 13572 degrees of freedom
Residual Deviance: 4559 on 13571 degrees of freedom
AIC: 4561 BIC: 4569 (Smaller is better. MC Std. Err. = 0)
The Bernoulli model's large negative edges coefficient tells us that the Peoria network is sparse: a tie between any two nodes is far less likely than not. Recall that the inverse logit of the edges coefficient is the same as the network density.
plogis(coef(model1))
edges
0.04000884
network.density(peoria.n)
[1] 0.04000884
So there is a .04 probability of an edge between any two nodes in this graph.
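Equivalently, the logit of the observed density recovers the edges coefficient; this is a quick sanity check rather than part of the original workflow.
# qlogis() is the logit function; applied to the density it should return about -3.178
qlogis(network.density(peoria.n))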
Now we can look at the time variable.
model2 <- ergm(peoria.n ~ edges + nodematch("time"))
summary(model2)
Call:
ergm(formula = peoria.n ~ edges + nodematch("time"))
Maximum Likelihood Results:
Estimate Std. Error MCMC % z value Pr(>|z|)
edges -3.3327 0.0520 0 -64.091 <1e-04 ***
nodematch.time 0.6672 0.0969 0 6.886 <1e-04 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Null Deviance: 18815 on 13572 degrees of freedom
Residual Deviance: 4516 on 13570 degrees of freedom
AIC: 4520 BIC: 4535 (Smaller is better. MC Std. Err. = 0)
There is a positive and significant time-homophily effect, so there is a cohort effect in terms of who is connected to whom.
plogis(coef(model2))
edges nodematch.time
0.03446724 0.66087144
Matching on time increases the probability of tie formation; the inverse logit of the homophily coefficient is .66.
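To see what this means at the dyad level, we can compare the implied tie probabilities for a matched versus an unmatched pair; because ties are rare, matching on time roughly doubles the probability of a tie (a small sketch using the fitted coefficients).
# predicted probability of a tie when the pair does NOT match on time
plogis(coef(model2)["edges"])
# predicted probability of a tie when the pair DOES match on time
plogis(coef(model2)["edges"] + coef(model2)["nodematch.time"])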
But note that we aren't yet tapping into endogenous network effects, and the MCMC machinery isn't being deployed because both terms are dyad-independent.
Let’s look at some endogenous effects.
Now let’s see if there is local clustering. The obvious place to start is the triangle parameter.
model3 <- ergm(peoria.n ~ edges + nodematch("time") + triangle)
Error in ergm.MCMLE(init, nw, model, initialfit = (initialfit <- NULL), : Number of edges in a simulated network exceeds that in the observed by a factor of more than 20. This is a strong indicator of model degeneracy or a very poor starting parameter configuration. If you are reasonably certain that neither of these is the case, increase the MCMLE.density.guard control.ergm() parameter.
As is common, the ERGM has trouble fitting the triangle parameter, and the estimation has become degenerate. Let's use the geometrically weighted edgewise shared partner (GWESP) statistic to evaluate local clustering instead; GWESP is the most common alternative parameterization of local clustering.
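Before refitting, it can help to look at the observed values of these clustering statistics; the summary formula method computes network statistics without fitting a model (a diagnostic aside, not required for the homework).
# observed triangle count and edgewise shared partner statistic
summary(peoria.n ~ triangle + gwesp(0, fixed = TRUE))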
model3b <- ergm(peoria.n ~ edges + nodematch("time") + gwesp(0, fixed = TRUE))
summary(model3b)
Call:
ergm(formula = peoria.n ~ edges + nodematch("time") + gwesp(0,
fixed = TRUE))
Monte Carlo Maximum Likelihood Results:
Estimate Std. Error MCMC % z value Pr(>|z|)
edges -4.21308 0.06677 0 -63.095 <1e-04 ***
nodematch.time 0.50890 0.07912 0 6.432 <1e-04 ***
gwesp.fixed.0 1.24444 0.06517 0 19.095 <1e-04 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Null Deviance: 18815 on 13572 degrees of freedom
Residual Deviance: 4257 on 13569 degrees of freedom
AIC: 4263 BIC: 4285 (Smaller is better. MC Std. Err. = 0.6366)
The gwesp term is significant and positive, so local clustering affects tie formation. Let's take the inverse logit of the coefficients.
plogis(coef(model3b))
edges nodematch.time gwesp.fixed.0
0.01458491 0.62454873 0.77633618
This shows that if a pair of nodes “have any positive number of friends in common and each of them is in at least one other triangle with each of those friends,” then the probability of a tie between them increases by roughly .78.
Let’s evaluate reciprocity in the Peoria doctors network. Because the graph is directed, we can use the mutual parameter to evaluate whether people who receive ties tend to send them back.
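As a baseline, the summary formula method again gives the observed count of reciprocated dyads (a quick check, not needed for the model itself).
# number of mutual (reciprocated) dyads in the observed network
summary(peoria.n ~ mutual)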
model4 <- ergm(peoria.n ~ edges + nodematch("time") + gwesp(0, fixed = TRUE) + mutual)
summary(model4)
Call:
ergm(formula = peoria.n ~ edges + nodematch("time") + gwesp(0,
fixed = TRUE) + mutual)
Monte Carlo Maximum Likelihood Results:
Estimate Std. Error MCMC % z value Pr(>|z|)
edges -4.17858 0.06685 0 -62.504 <1e-04 ***
nodematch.time 0.44043 0.07351 0 5.991 <1e-04 ***
gwesp.fixed.0 0.94272 0.07005 0 13.459 <1e-04 ***
mutual 1.97352 0.16302 0 12.106 <1e-04 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Null Deviance: 18815 on 13572 degrees of freedom
Residual Deviance: 4143 on 13568 degrees of freedom
AIC: 4151 BIC: 4181 (Smaller is better. MC Std. Err. = 0.9074)
Reciprocity has a very large positive effect on tie formation, improving the log-odds by nearly 2.
plogis(coef(model4))
edges nodematch.time gwesp.fixed.0 mutual
0.01508912 0.60836243 0.71964944 0.87798851
The probability of a tie increases by .88 given reciprocity.
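Another way to read the mutual term (a sketch based on the fitted coefficients, setting the homophily and clustering terms aside) is as the conditional probability that i nominates j when j already nominates i.
# probability of a tie when the other direction is absent
plogis(coef(model4)["edges"])
# probability of the same tie when the other direction is already present
plogis(coef(model4)["edges"] + coef(model4)["mutual"])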
Now that we have built a model, let's evaluate its goodness of fit and MCMC diagnostics. We can check how well the model reproduces the observed network with gof() and then plot the results.
gof4 <- gof(model4)
plot(gof4)
This mostly looks good, although the degree distributions seem a little off. I don’t think this is too much of a concern.
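If degree is the main worry, the GOF statistics can be chosen explicitly via the GOF formula argument; the object name and the particular terms below are only illustrative.
# focus the goodness-of-fit checks on the in- and out-degree distributions
gof_degree <- gof(model4, GOF = ~ idegree + odegree)
plot(gof_degree)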
Last, we can run mcmc.diagnostics() on the fitted model.
mcmc.diagnostics(model4)
Sample statistics summary:
Iterations = 1474560:28672000
Thinning interval = 32768
Number of chains = 1
Sample size per chain = 831
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
edges 7.833 57.84 2.0063 2.8264
nodematch.time 2.544 22.67 0.7864 1.1483
gwesp.fixed.0 6.999 59.72 2.0717 3.0155
mutual 2.593 14.64 0.5080 0.7176
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
edges -98.75 -33.0 5 47.0 123.0
nodematch.time -40.00 -12.5 2 17.5 48.0
gwesp.fixed.0 -107.25 -35.5 4 45.0 128.2
mutual -25.00 -8.0 2 13.0 31.0
Are sample statistics significantly different from observed?
edges nodematch.time gwesp.fixed.0 mutual (Omni)
diff. 7.832731649 2.54392298 6.99879663 2.5932611312 NA
test stat. 2.771271283 2.21531368 2.32092193 3.6140393472 2.837370e+01
P-val. 0.005583788 0.02673854 0.02029106 0.0003014633 1.433938e-05
Sample statistics cross-correlations:
edges nodematch.time gwesp.fixed.0 mutual
edges 1.0000000 0.7937411 0.9639281 0.8843498
nodematch.time 0.7937411 1.0000000 0.7860627 0.7401637
gwesp.fixed.0 0.9639281 0.7860627 1.0000000 0.9060772
mutual 0.8843498 0.7401637 0.9060772 1.0000000
Sample statistics auto-correlation:
Chain 1
edges nodematch.time gwesp.fixed.0 mutual
Lag 0 1.00000000 1.000000000 1.000000000 1.00000000
Lag 32768 0.32075760 0.293727485 0.338520904 0.36286072
Lag 65536 0.17812164 0.154438265 0.184712750 0.17974642
Lag 98304 0.07871944 0.055639889 0.094710200 0.06100919
Lag 131072 -0.02937337 -0.007445629 -0.014591771 -0.01610558
Lag 163840 -0.01273936 -0.017151413 0.009154743 0.02303434
Sample statistics burn-in diagnostic (Geweke):
Chain 1
Fraction in 1st window = 0.1
Fraction in 2nd window = 0.5
edges nodematch.time gwesp.fixed.0 mutual
-0.64367813 0.06732323 -0.56652064 -0.08585720
Individual P-values (lower = worse):
edges nodematch.time gwesp.fixed.0 mutual
0.5197842 0.9463244 0.5710399 0.9315799
Joint P-value (lower = worse): 0.3807258
Note: MCMC diagnostics shown here are from the last round of
simulation, prior to computation of final parameter estimates.
Because the final estimates are refinements of those used for this
simulation run, these diagnostics may understate model performance.
To directly assess the performance of the final model on in-model
statistics, please use the GOF command: gof(ergmFitObject,
GOF=~model).
And these diagnostics, with quickly decaying autocorrelation and non-significant Geweke statistics, look like what we would anticipate for a successful simulation.
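Following the note at the end of the diagnostics output, we can also assess the final estimates directly on the in-model statistics.
# goodness of fit on the model's own sufficient statistics, as the diagnostics note suggests
gof_model <- gof(model4, GOF = ~ model)
plot(gof_model)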