Code
library(igraph)
library(ggplot2)
library(dplyr)
This is a cheat sheet for the fifth homework on centrality. It works through each step in R.
First, you need the following packages.
library(igraph)
library(ggplot2)
library(dplyr)
Next, we bring the data into igraph. There are two networks as there are two towns.
<- read.graph("data/Korea1.net", format="pajek")
k1
<- read.graph("data/Korea2.net", format="pajek") k2
Now we can load attributes data - (1=adopted modern methods, 2=did not adopt modern methods) - and store the data in the igraph objects.
<- as.matrix(read.table("data/Korea1_adopters.clu", skip=1))
adopt1
<- as.matrix(read.table("data/Korea2_adopters.clu", skip=1))
adopt2
V(k1)$adopt <- adopt1
V(k2)$adopt <- adopt2
Let’s graph the two networks with node color varying by adoption or not.
plot(k1, vertex.color=as.factor(V(k1)$adopt))
plot(k2, vertex.color=as.factor(V(k2)$adopt))
Let’s use degree(g), closeness(g), betweenness(g), and eigen_centrality(g) to calculate various centrality measures for each network.
<- data.frame(adopt=V(k1)$adopt, deg=degree(k1), close=closeness(k1), btwn=betweenness(k1, normalized=T),
k1.df eig=eigen_centrality(k1)$vector)
<- data.frame(adopt=V(k2)$adopt, deg=degree(k2), close=closeness(k2), btwn=betweenness(k2, normalized=T),
k2.df eig=eigen_centrality(k2)$vector)
Let’s use betweenness centralization to compare across the entire graphs. This offers a glimpse of centralization tendencies for the whole thing as opposed to nodal characteristics. I picked this measure because it is a medial one and therefore speaks to diffusion. You can choose another measure, but be sure to have an explanation for your choice.
centr_betw(k1)
$res
[1] 4.293651 4.400000 0.000000 102.996825 0.000000 27.798413
[7] 0.000000 27.798413 93.133333 1.333333 51.800000 13.400000
[13] 15.159524 278.847619 10.377778 22.344444 5.050000 10.266667
[19] 0.000000 0.000000 60.000000 0.000000 0.000000 0.000000
[25] 0.000000 221.000000 0.000000 112.500000 31.000000 0.000000
[31] 31.500000 0.000000 0.000000 0.000000 0.000000 0.000000
[37] 0.000000 0.000000 0.000000
$centralization
[1] 0.3649793
$theoretical_max
[1] 26714
mean(adopt1)
[1] 0.5641026
centr_betw(k2)
$res
[1] 0.0000000 9.6031191 6.2115440 30.9618853 16.5078255 12.9347375
[7] 13.2977273 1.2837662 0.5075758 12.6121573 0.0000000 33.0000000
[13] 65.1206349 73.3106172 23.3077922 6.5066101 0.1666667 112.8511128
[19] 131.6335442 52.6718615 98.0078921 136.2270646 71.0416667 8.0711538
[25] 7.8595238 122.0000000 2.0000000 13.7787296 13.9910256 20.5011905
[31] 75.0325758 93.0000000 65.0000000 0.0000000 0.0000000 0.0000000
[37] 0.0000000 0.0000000 0.0000000
$centralization
[1] 0.1491299
$theoretical_max
[1] 26714
mean(adopt2)
[1] 0.2564103
We can make a plot that compares how betweenness is distributed between the two villages.
ggplot(k1.df, aes(x=as.factor(adopt), y=btwn)) +
geom_boxplot() +
labs(title="Village 1",
x ="Adopt Practice = 1", y = "Betweenness") +
theme_bw()
ggplot(k2.df, aes(as.factor(adopt), btwn)) +
geom_boxplot() +
labs(title="Village 2",
x ="Adopt Practice = 1", y = "Betweenness") +
theme_bw()
We can make this graph more appealing by paneling them.
#Let's combine the data frames and use facet() to "panelize" the figures
$village <- "One"
k1.df$village <- "Two"
k2.df
<- rbind(k1.df, k2.df)
ktot.df
ggplot(ktot.df, aes(x=as.factor(adopt), y=btwn)) +
geom_boxplot() +
labs(title="Comparing Villages",
x ="Adopt Practice = 1", y = "Betweenness") +
theme_bw()+
facet_wrap(~ village)
We can also construct some formal tests.
t.test(btwn ~ adopt, data = ktot.df,
var.equal = TRUE, alternative = "less")
Two Sample t-test
data: btwn by adopt
t = -1.7741, df = 76, p-value = 0.04002
alternative hypothesis: true difference in means between group 0 and group 1 is less than 0
95 percent confidence interval:
-Inf -0.001819706
sample estimates:
mean in group 0 mean in group 1
0.03259951 0.06222427
We could even construct a logistic regression predicting likelihood of adopting the new practice.
<- glm(adopt~btwn + as.factor(village), data=ktot.df, family="binomial")
areg
summary(areg)
Call:
glm(formula = adopt ~ btwn + as.factor(village), family = "binomial",
data = ktot.df)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.4348 -1.0151 -0.6326 1.1255 1.8483
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.008883 0.348900 -0.025 0.97969
btwn 8.092282 4.347251 1.861 0.06268 .
as.factor(village)Two -1.499363 0.525654 -2.852 0.00434 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 105.60 on 77 degrees of freedom
Residual deviance: 93.44 on 75 degrees of freedom
AIC: 99.44
Number of Fisher Scoring iterations: 4