HW5 Cheat Sheet

This is a cheat sheet for the fifth homework on centrality. It works through each step in R.

First, you need the following packages.

Code
library(igraph)
library(ggplot2)
library(dplyr)

Next, we bring the data into igraph. There are two networks as there are two towns.

Code
k1 <- read.graph("data/Korea1.net", format="pajek")

k2 <- read.graph("data/Korea2.net", format="pajek")

Now we can load attributes data - (1=adopted modern methods, 2=did not adopt modern methods) - and store the data in the igraph objects.

Code
adopt1 <- as.matrix(read.table("data/Korea1_adopters.clu", skip=1))

adopt2 <- as.matrix(read.table("data/Korea2_adopters.clu", skip=1))


V(k1)$adopt <- adopt1

V(k2)$adopt <- adopt2

Let’s graph the two networks with node color varying by adoption or not.

Code
plot(k1, vertex.color=as.factor(V(k1)$adopt))

Code
plot(k2, vertex.color=as.factor(V(k2)$adopt))

Let’s use degree(g), closeness(g), betweenness(g), and eigen_centrality(g) to calculate various centrality measures for each network.

Code
k1.df <- data.frame(adopt=V(k1)$adopt, deg=degree(k1), close=closeness(k1), btwn=betweenness(k1, normalized=T), 
                    eig=eigen_centrality(k1)$vector)

k2.df <- data.frame(adopt=V(k2)$adopt, deg=degree(k2), close=closeness(k2), btwn=betweenness(k2, normalized=T), 
                    eig=eigen_centrality(k2)$vector)

Let’s use betweenness centralization to compare across the entire graphs. This offers a glimpse of centralization tendencies for the whole thing as opposed to nodal characteristics. I picked this measure because it is a medial one and therefore speaks to diffusion. You can choose another measure, but be sure to have an explanation for your choice.

Code
centr_betw(k1)
$res
 [1]   4.293651   4.400000   0.000000 102.996825   0.000000  27.798413
 [7]   0.000000  27.798413  93.133333   1.333333  51.800000  13.400000
[13]  15.159524 278.847619  10.377778  22.344444   5.050000  10.266667
[19]   0.000000   0.000000  60.000000   0.000000   0.000000   0.000000
[25]   0.000000 221.000000   0.000000 112.500000  31.000000   0.000000
[31]  31.500000   0.000000   0.000000   0.000000   0.000000   0.000000
[37]   0.000000   0.000000   0.000000

$centralization
[1] 0.3649793

$theoretical_max
[1] 26714
Code
mean(adopt1)
[1] 0.5641026
Code
centr_betw(k2)
$res
 [1]   0.0000000   9.6031191   6.2115440  30.9618853  16.5078255  12.9347375
 [7]  13.2977273   1.2837662   0.5075758  12.6121573   0.0000000  33.0000000
[13]  65.1206349  73.3106172  23.3077922   6.5066101   0.1666667 112.8511128
[19] 131.6335442  52.6718615  98.0078921 136.2270646  71.0416667   8.0711538
[25]   7.8595238 122.0000000   2.0000000  13.7787296  13.9910256  20.5011905
[31]  75.0325758  93.0000000  65.0000000   0.0000000   0.0000000   0.0000000
[37]   0.0000000   0.0000000   0.0000000

$centralization
[1] 0.1491299

$theoretical_max
[1] 26714
Code
mean(adopt2)
[1] 0.2564103

We can make a plot that compares how betweenness is distributed between the two villages.

Code
ggplot(k1.df, aes(x=as.factor(adopt), y=btwn)) + 
  geom_boxplot() +
  labs(title="Village 1",
        x ="Adopt Practice = 1", y = "Betweenness") +
  theme_bw()

Code
ggplot(k2.df, aes(as.factor(adopt), btwn)) + 
  geom_boxplot() +
labs(title="Village 2",
     x ="Adopt Practice = 1", y = "Betweenness") +
theme_bw()

We can make this graph more appealing by paneling them.

Code
#Let's combine the data frames and use facet() to "panelize" the figures

k1.df$village <- "One"
k2.df$village <- "Two"

ktot.df <- rbind(k1.df, k2.df)

ggplot(ktot.df, aes(x=as.factor(adopt), y=btwn)) + 
  geom_boxplot() +
  labs(title="Comparing Villages",
       x ="Adopt Practice = 1", y = "Betweenness") +
  theme_bw()+
  facet_wrap(~ village)

We can also construct some formal tests.

Code
t.test(btwn ~ adopt, data = ktot.df,
       var.equal = TRUE, alternative = "less")

    Two Sample t-test

data:  btwn by adopt
t = -1.7741, df = 76, p-value = 0.04002
alternative hypothesis: true difference in means between group 0 and group 1 is less than 0
95 percent confidence interval:
         -Inf -0.001819706
sample estimates:
mean in group 0 mean in group 1 
     0.03259951      0.06222427 

We could even construct a logistic regression predicting likelihood of adopting the new practice.

Code
areg <- glm(adopt~btwn + as.factor(village), data=ktot.df, family="binomial")

summary(areg)

Call:
glm(formula = adopt ~ btwn + as.factor(village), family = "binomial", 
    data = ktot.df)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.4348  -1.0151  -0.6326   1.1255   1.8483  

Coefficients:
                       Estimate Std. Error z value Pr(>|z|)   
(Intercept)           -0.008883   0.348900  -0.025  0.97969   
btwn                   8.092282   4.347251   1.861  0.06268 . 
as.factor(village)Two -1.499363   0.525654  -2.852  0.00434 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 105.60  on 77  degrees of freedom
Residual deviance:  93.44  on 75  degrees of freedom
AIC: 99.44

Number of Fisher Scoring iterations: 4