This is a cheat sheet for the fourth homework. It works through each step in R.
First, you need the following packages.
library(igraph)
library(ggplot2)
library(netseg)
Next, we bring the data into igraph. This school data is stored as a pajek file.
sch23 <- read.graph("data/sch23.net", format="pajek")
Now, we can load attribute data for this school using read.table and store the attributes in the igraph object (Note that 1=Girl and 2=Boy).
att <- read.table("data/sch23_attr.txt", header=T, sep=" ")
V(sch23)$sex <- att$sex
V(sch23)$race <- att$race
V(sch23)$grade <- att$grade
V(sch23)$school <- att$school
Let’s examine some basic features of the network before turning to the local context.
is.directed(sch23)
[1] TRUE
is.connected(sch23)
[1] FALSE
vcount(sch23)
[1] 679
ecount(sch23)
[1] 3783
The graph is not connected. Let’s see what the components look like.
comps <- components(sch23)
comps$csize
[1] 667 1 1 1 1 1 1 1 1 1 1 1 1
We can see that there are 12 isolates, that is, nodes with degree 0. These could be socially isolated students, or students who were absent when the survey was administered or who did not finish it. It is hard to tell from the data alone. For our purposes, let’s assume the isolates are a product of the survey rather than isolation in a social sense, and remove them.
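As a quick check on that count, the component sizes printed above can be tallied directly. This is a minimal sketch in base R: the vector is typed in by hand from the csize output, and it assumes that a component of size 1 is an isolate.

```r
# Component sizes copied by hand from the comps$csize output above
csize <- c(667, rep(1, 12))

n_isolates <- sum(csize == 1)  # components of size 1
giant_size <- max(csize)       # size of the largest component
n_total    <- sum(csize)       # should equal vcount(sch23)

c(isolates = n_isolates, giant = giant_size, total = n_total)
```

The total should match the 679 nodes reported by vcount(sch23), and the giant component is what remains once the isolates are dropped.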
isolates <- which(degree(sch23)==0)
school <- delete.vertices(sch23, isolates)
The school graph is the giant connected component of the sch23 graph. Everything that follows is computed on this graph.
First, we compute the local network size of each node, that is, its degree (total, in, and out).
v_deg <- degree(school)
v_indeg <- degree(school, mode="in")
v_outdeg <- degree(school, mode="out")
mean(v_deg)
[1] 11.34333
Next, we can look at density. Let’s look at the density for the entire graph.
graph.density(school)
[1] 0.008516012
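Both summary numbers so far can be reproduced by hand from the node and edge counts. This is a minimal sketch in base R, assuming the usual definitions for a directed graph without loops: density is m / (n(n-1)) and mean total degree is 2m / n. The value n = 667 is the 679 original nodes minus the 12 isolates; removing isolates leaves the 3783 edges unchanged.

```r
n <- 679 - 12   # nodes left after dropping the 12 isolates
m <- 3783       # directed edges (isolates contribute none)

dens     <- m / (n * (n - 1))  # share of possible ordered pairs that are tied
mean_deg <- 2 * m / n          # each edge adds one out-degree and one in-degree

dens
mean_deg
```

These should agree with the graph.density(school) and mean(v_deg) values shown above.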
Now we get the density of each ego network. Note that density returns NaN for isolates.
sch_ego_dens <- make_ego_graph(school, 1) %>%
  vapply(graph.density, numeric(1))
head(sch_ego_dens)
[1] 0.2619048 0.2252747 0.2678571 0.2523810 0.2500000 0.4666667
We can also look at ego network size for each of the students in the school.
ego.size <- ego_size(school)
mean(ego.size)
[1] 9.686657
For future use, we can store as a data frame.
den.df <- data.frame(density=sch_ego_dens, size=ego.size)
Now we can calculate transitivity both for the entire network and for each ego (type="local"). We store the local values as a data frame as well.
transitivity(school)
[1] 0.2128278
transitive.sch <- transitivity(school, type="local")
tr.df <- data.frame(transitivity=transitive.sch, size=ego.size)
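To make the local measure concrete, here is a minimal sketch in base R on a made-up four-node undirected graph (the toy edges and the helper local_transitivity are hypothetical, not part of the homework data). It assumes the standard definition: the number of ties among a node's neighbors divided by the number of possible ties among them, which is undefined when a node has fewer than two neighbors.

```r
# Toy undirected graph (hypothetical): edges 1-2, 1-3, 1-4, 2-3
A <- matrix(0, 4, 4)
edges <- rbind(c(1, 2), c(1, 3), c(1, 4), c(2, 3))
A[edges] <- 1
A <- A + t(A)  # symmetrize the adjacency matrix

# Hypothetical helper: local transitivity of node i
local_transitivity <- function(A, i) {
  nbrs <- which(A[i, ] > 0)
  k <- length(nbrs)
  if (k < 2) return(NaN)            # undefined with fewer than 2 neighbors
  links <- sum(A[nbrs, nbrs]) / 2   # ties among the neighbors
  links / (k * (k - 1) / 2)         # divided by possible ties among them
}

lt <- vapply(1:4, function(i) local_transitivity(A, i), numeric(1))
lt
```

Node 1 has three neighbors with one tie among them, nodes 2 and 3 each sit in a closed triangle, and node 4 has a single neighbor, so its value is NaN. Rows like that are the ones ggplot drops when plotting.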
Here we calculate constraint (Note that constraint is undefined for isolates) and store it.
constraint.sch <- constraint(school)
con.df <- data.frame(constraint=constraint.sch, size=ego.size)
First let’s construct a mixing matrix for sex.
mixingm(school, "sex")
alter
ego 1 2
1 1076 863
2 762 1082
What about mixing across grades?
mixingm(school, "grade")
alter
ego 0 7 8 9 10 11 12
0 1 1 13 4 1 0 0
7 3 72 15 1 1 1 2
8 16 14 84 8 2 6 1
9 3 3 0 954 71 49 21
10 1 0 0 65 756 60 44
11 0 0 1 27 52 603 74
12 2 0 0 9 30 78 634
We can also look at assortativity.
assortativity(school, V(school)$sex)
[1] 0.1417051
assortativity(school, V(school)$grade)
[1] 0.72985
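One way to see what the sex coefficient measures is to rebuild it from the sex mixing matrix above. This is a minimal sketch in base R, assuming that for a directed graph assortativity() is the Pearson correlation between the ego's value and the alter's value taken across edges. The cell counts are typed in by hand from the mixingm output.

```r
# Cell counts from the sex mixing matrix, in order:
# (ego 1, alter 1), (ego 1, alter 2), (ego 2, alter 1), (ego 2, alter 2)
counts <- c(1076, 863, 762, 1082)

# Expand the matrix back into one row per edge
ego_sex   <- rep(c(1, 1, 2, 2), times = counts)
alter_sex <- rep(c(1, 2, 1, 2), times = counts)

r_sex <- cor(ego_sex, alter_sex)  # Pearson correlation across edges
r_sex
```

Under that assumption the result should land close to the 0.1417 reported for sex. The same expansion would work for the grade matrix, only with more cells.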
And we can look at the odds of a within group tie.
orwg(school, "sex")
[1] 1.31704
orwg(school, "grade")
[1] 16.95481
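The odds ratio for sex can also be worked out by hand. This is a minimal sketch in base R, assuming orwg compares the odds of a tie among within-group ordered pairs to the odds among between-group ordered pairs. The tie counts come from the sex mixing matrix above, and the group sizes (361 and 306) are the counts of each sex among the 667 nodes of the school graph.

```r
# Sex mixing matrix, typed in from the mixingm output
mix <- matrix(c(1076, 863, 762, 1082), nrow = 2, byrow = TRUE,
              dimnames = list(ego = c("1", "2"), alter = c("1", "2")))
n <- c("1" = 361, "2" = 306)  # group sizes in the school graph

ties_within   <- sum(diag(mix))
ties_between  <- sum(mix) - ties_within
dyads_within  <- sum(n * (n - 1))                      # ordered same-sex pairs
dyads_between <- sum(n) * (sum(n) - 1) - dyads_within  # ordered cross-sex pairs

odds_within  <- ties_within  / (dyads_within  - ties_within)
odds_between <- ties_between / (dyads_between - ties_between)
or_wg <- odds_within / odds_between
or_wg
```

Under that assumption the result should be close to the 1.317 returned by orwg(school, "sex"): same-sex ties are roughly 1.3 times as likely as cross-sex ties, in odds terms.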
Bonus: Build random graph for quick comparison
Here we use erdos.renyi.game to build a graph with the same number of nodes and edges, but with the edges drawn at random.
rg <- erdos.renyi.game(length(V(school)), length(E(school)), type="gnm", directed=T)
To assign a comparable attribute to the random graph, we first need the distribution of girls and boys.
table(V(school)$sex)
1 2
361 306
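Now we can use this distribution to randomly assign a sex variable to the nodes of the random graph. The probabilities follow the order of the values: 361 of the 667 students are coded 1 and 306 are coded 2.
V(rg)$sex <- sample(c(1,2), length(V(rg)), replace=T, prob=c(361/667, 306/667))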
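Because sample() is random, each run of the assignment gives a slightly different split, so the random-graph mixing matrix will not match exactly from one run to the next. This is a minimal sketch in base R, assuming you want a reproducible draw. The seed value is arbitrary and not part of the original homework, and the counts are taken from the table output above.

```r
set.seed(42)  # arbitrary seed, chosen only for reproducibility

sex_counts <- c("1" = 361, "2" = 306)  # observed counts from table(V(school)$sex)
n_nodes <- sum(sex_counts)

# Draw one value per node with probabilities proportional to the observed counts
sim_sex <- sample(c(1, 2), n_nodes, replace = TRUE,
                  prob = sex_counts / n_nodes)

prop.table(table(sim_sex))  # simulated shares, to compare with 361/667 and 306/667
```

The simulated shares will only approximate the observed ones. If an exact match matters, shuffling a vector with exactly 361 ones and 306 twos is an alternative.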
How do the mixing matrices compare?
mixingm(sch23, "sex", use.density=F)
alter
ego 1 2
1 1076 863
2 762 1082
mixingm(rg, "sex", use.density=F)
alter
ego 1 2
1 798 929
2 948 1108
Plot degree distribution
sch23.degree <- data.frame(v_deg, v_indeg, v_outdeg)
ggplot(data=sch23.degree, aes(v_deg)) +
  geom_histogram(breaks=seq(2,14, by=1),
                 col="black",
                 fill="black",
                 alpha = .2)+
  labs(title="Total Degree Distribution") +
  labs(x="Degree", y="Count")
Plot degree of graph and sex
V(school)$degree <- v_deg
plot.igraph(school, vertex.size=V(school)$degree/5, vertex.color=V(school)$sex, edge.arrow.size=.01,
            vertex.label=NA)
Plot Transitivity by Size
ggplot(data=tr.df,aes(size, transitivity))+
  geom_point()+
  geom_smooth(method='lm', level=.95, formula=y~x)+
  theme(panel.background = element_blank())
Plot Density by Size (Note error related to isolates)
ggplot(data=den.df, aes(size, density))+
  geom_point()+
  geom_smooth(method='lm', level=.95, formula=y~x)+
  theme(panel.background = element_blank())
Plot Constraint by Size (Note error related to isolates)
ggplot(data=con.df, aes(size, constraint))+
  geom_point()+
  geom_smooth(method='lm', level=.95, formula=y~x)+
  theme(panel.background = element_blank())