HW5 Cheat Sheet

This is a cheat sheet for the fifth homework on centrality. It works through each step in R.

First, you need the following packages.

Code

library(igraph)
library(ggplot2)
library(dplyr)

Next, we bring the data into igraph. There are two networks as there are two towns.

Code

k1 <- read.graph("data/Korea1.net", format="pajek")

k2 <- read.graph("data/Korea2.net", format="pajek")

Now we can load attributes data - (1=adopted modern methods, 2=did not adopt modern methods) - and store the data in the igraph objects.

Code

adopt1 <- as.matrix(read.table("data/Korea1_adopters.clu", skip=1))

adopt2 <- as.matrix(read.table("data/Korea2_adopters.clu", skip=1))


V(k1)$adopt <- adopt1

V(k2)$adopt <- adopt2

Let’s graph the two networks with node color varying by adoption or not.

Code

plot(k1, vertex.color=as.factor(V(k1)$adopt))

Code

plot(k2, vertex.color=as.factor(V(k2)$adopt))

Let’s use degree(g), closeness(g), betweenness(g), and eigen_centrality(g) to calculate various centrality measures for each network.

Code

k1.df <- data.frame(adopt=V(k1)$adopt, deg=degree(k1), close=closeness(k1), btwn=betweenness(k1, normalized=T), 
                    eig=eigen_centrality(k1)$vector)

k2.df <- data.frame(adopt=V(k2)$adopt, deg=degree(k2), close=closeness(k2), btwn=betweenness(k2, normalized=T), 
                    eig=eigen_centrality(k2)$vector)

Let’s use betweenness centralization to compare across the entire graphs. This offers a glimpse of centralization tendencies for the whole thing as opposed to nodal characteristics. I picked this measure because it is a medial one and therefore speaks to diffusion. You can choose another measure, but be sure to have an explanation for your choice.

Code

centr_betw(k1)

$res
 [1]   4.293651   4.400000   0.000000 102.996825   0.000000  27.798413
 [7]   0.000000  27.798413  93.133333   1.333333  51.800000  13.400000
[13]  15.159524 278.847619  10.377778  22.344444   5.050000  10.266667
[19]   0.000000   0.000000  60.000000   0.000000   0.000000   0.000000
[25]   0.000000 221.000000   0.000000 112.500000  31.000000   0.000000
[31]  31.500000   0.000000   0.000000   0.000000   0.000000   0.000000
[37]   0.000000   0.000000   0.000000

$centralization
[1] 0.3649793

$theoretical_max
[1] 26714

Code

mean(adopt1)

[1] 0.5641026

Code

centr_betw(k2)

$res
 [1]   0.0000000   9.6031191   6.2115440  30.9618853  16.5078255  12.9347375
 [7]  13.2977273   1.2837662   0.5075758  12.6121573   0.0000000  33.0000000
[13]  65.1206349  73.3106172  23.3077922   6.5066101   0.1666667 112.8511128
[19] 131.6335442  52.6718615  98.0078921 136.2270646  71.0416667   8.0711538
[25]   7.8595238 122.0000000   2.0000000  13.7787296  13.9910256  20.5011905
[31]  75.0325758  93.0000000  65.0000000   0.0000000   0.0000000   0.0000000
[37]   0.0000000   0.0000000   0.0000000

$centralization
[1] 0.1491299

$theoretical_max
[1] 26714

Code

mean(adopt2)

[1] 0.2564103

We can make a plot that compares how betweenness is distributed between the two villages.

Code

ggplot(k1.df, aes(x=as.factor(adopt), y=btwn)) + 
  geom_boxplot() +
  labs(title="Village 1",
        x ="Adopt Practice = 1", y = "Betweenness") +
  theme_bw()

Code

ggplot(k2.df, aes(as.factor(adopt), btwn)) + 
  geom_boxplot() +
labs(title="Village 2",
     x ="Adopt Practice = 1", y = "Betweenness") +
theme_bw()

We can make this graph more appealing by paneling them.

Code

#Let's combine the data frames and use facet() to "panelize" the figures

k1.df$village <- "One"
k2.df$village <- "Two"

ktot.df <- rbind(k1.df, k2.df)

ggplot(ktot.df, aes(x=as.factor(adopt), y=btwn)) + 
  geom_boxplot() +
  labs(title="Comparing Villages",
       x ="Adopt Practice = 1", y = "Betweenness") +
  theme_bw()+
  facet_wrap(~ village)

We can also construct some formal tests.

Code

t.test(btwn ~ adopt, data = ktot.df,
       var.equal = TRUE, alternative = "less")


    Two Sample t-test

data:  btwn by adopt
t = -1.7741, df = 76, p-value = 0.04002
alternative hypothesis: true difference in means between group 0 and group 1 is less than 0
95 percent confidence interval:
         -Inf -0.001819706
sample estimates:
mean in group 0 mean in group 1 
     0.03259951      0.06222427

We could even construct a logistic regression predicting likelihood of adopting the new practice.

Code

areg <- glm(adopt~btwn + as.factor(village), data=ktot.df, family="binomial")

summary(areg)


Call:
glm(formula = adopt ~ btwn + as.factor(village), family = "binomial", 
    data = ktot.df)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.4348  -1.0151  -0.6326   1.1255   1.8483  

Coefficients:
                       Estimate Std. Error z value Pr(>|z|)   
(Intercept)           -0.008883   0.348900  -0.025  0.97969   
btwn                   8.092282   4.347251   1.861  0.06268 . 
as.factor(village)Two -1.499363   0.525654  -2.852  0.00434 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 105.60  on 77  degrees of freedom
Residual deviance:  93.44  on 75  degrees of freedom
AIC: 99.44

Number of Fisher Scoring iterations: 4