HW4 Cheat Sheet

This is a cheat sheet for the fourth homework. It works through each step in R.

First, you need the following packages.

Code
library(igraph)
library(ggplot2)
library(netseg)

Next, we bring the data into igraph. The school network is stored as a Pajek file.

Code
sch23 <- read.graph("data/sch23.net", format="pajek")

Now we can load the attribute data for this school with read.table and store each attribute on the vertices of the igraph object (note that 1 = girl and 2 = boy).

Code
att <- read.table("data/sch23_attr.txt", header=TRUE, sep=" ")

V(sch23)$sex <- att$sex
V(sch23)$race <- att$race
V(sch23)$grade <- att$grade
V(sch23)$school <- att$school
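
Assigning a column straight onto V(sch23) only works if the attribute rows are in the same order as the vertices. The file layout is not shown here, so that ordering is an assumption. The sketch below uses a made-up toy graph and a hypothetical att_toy table (neither is part of the homework data) to show a quick length check before assignment.

```r
library(igraph)

# Toy stand-ins: a 4-node directed ring and a made-up attribute table
g_toy <- make_ring(4, directed = TRUE)
att_toy <- data.frame(id = 1:4, sex = c(1, 2, 2, 1))

# One attribute row per vertex is required for a positional assignment
stopifnot(nrow(att_toy) == vcount(g_toy))

# Rows are assumed to be in vertex order, so row i describes vertex i
V(g_toy)$sex <- att_toy$sex

# Tabulate the result, including any missing values
table(V(g_toy)$sex, useNA = "ifany")
```

The same check on the real data would compare nrow(att) with vcount(sch23).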

1. Compute Local Network Composition Measures

Let’s examine some basic features of the network before turning to the local context.

Code
is.directed(sch23)
[1] TRUE
Code
is.connected(sch23)
[1] FALSE
Code
vcount(sch23)
[1] 679
Code
ecount(sch23)
[1] 3783

The graph is not connected. Let’s see what the components look like.

Code
comps <- components(sch23)

comps$csize
 [1] 667   1   1   1   1   1   1   1   1   1   1   1   1

We can see that there are 12 isolates (nodes with degree 0). They could be socially isolated students, students who were absent when the survey was administered, or students who did not finish the survey. It is hard to tell which. For our purposes, let’s assume the isolates are a product of the survey rather than isolation in a social sense, and remove them.

Code
isolates <- which(degree(sch23)==0) 

school <- delete.vertices(sch23, isolates)

The school graph is the giant connected component of the sch23 graph. Every step below is carried out on this graph.
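
As a small illustration of the same steps, the sketch below builds a made-up five-node graph with two isolates, inspects its components, and drops the isolates. The toy graph and its sex values are invented for the example. It uses delete_vertices, the current igraph name for delete.vertices.

```r
library(igraph)

# Toy directed graph: a path 1 -> 2 -> 3 plus two isolates (4 and 5)
g_toy <- make_graph(c(1, 2, 2, 3), n = 5, directed = TRUE)
V(g_toy)$sex <- c(1, 2, 1, 2, 1)   # made-up attribute values

# One component of size 3 and two singletons
components(g_toy)$csize

# Drop every vertex with total degree 0
isolates_toy <- which(degree(g_toy) == 0)
g_clean <- delete_vertices(g_toy, isolates_toy)

# Attributes stay attached to the vertices that remain
vcount(g_clean)
V(g_clean)$sex
```

Note that the remaining vertices are renumbered from 1, so vertex ids in the cleaned graph no longer match ids in the original.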

First, we can compute the local network size for each node, that is, each node's degree (total, in, and out).

Code
v_deg<- degree(school)
v_indeg <- degree(school, mode="in")                    
v_outdeg <- degree(school, mode="out")  

mean(v_deg)
[1] 11.34333

Next, we can look at density. Let’s look at the density for the entire graph.

Code
graph.density(school)
[1] 0.008516012
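
Both numbers can be checked by hand from the counts reported earlier: 679 nodes minus 12 isolates, and 3783 edges. The sketch below assumes a directed graph without self-loops, so there are n(n-1) possible ties. The variable names are illustrative.

```r
# Counts taken from the output above
n <- 679 - 12   # nodes left after removing the 12 isolates
m <- 3783       # directed edges

# Each edge adds one out-tie and one in-tie, so total degree sums to 2m
mean_degree <- 2 * m / n

# Directed density: observed ties over the n * (n - 1) possible ordered pairs
dens <- m / (n * (n - 1))

round(mean_degree, 5)
round(dens, 9)
```

The rounded results should agree with mean(v_deg) and graph.density(school) above.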

Now we compute density for each ego network. Note that density returns NaN for isolates, because a one-node ego network has no possible ties.

Code
sch_ego_dens <- make_ego_graph(school, 1) %>%
  vapply(graph.density, numeric(1))

head(sch_ego_dens)
[1] 0.2619048 0.2252747 0.2678571 0.2523810 0.2500000 0.4666667
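
To see what these ego densities measure, here is a minimal sketch on a made-up four-node directed graph. Each ego network contains the ego, its direct neighbors (ignoring tie direction, the default), and every tie among them. The toy graph is invented for the example, and edge_density is the current igraph name for graph.density.

```r
library(igraph)

# Toy directed graph: 1 <-> 2, 1 -> 3, 3 -> 4
g_toy <- make_graph(c(1, 2, 2, 1, 1, 3, 3, 4), directed = TRUE)

# One ego network per vertex, one step out
egos <- make_ego_graph(g_toy, order = 1)

# Density of each ego network
ego_dens <- vapply(egos, edge_density, numeric(1))
ego_dens
```

For vertex 1 the ego network holds nodes 1, 2, and 3 with three of six possible directed ties, so its density is 0.5. Vertex 2's ego network is just the reciprocated pair, so its density is 1.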

We can also look at ego network size for each of the students in the school.

Code
ego.size <- ego_size(school)

mean(ego.size)
[1] 9.686657
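
The mean ego network size (about 9.69) is smaller than the mean total degree (about 11.34) even though ego_size counts the ego itself. One reason is that total degree counts a reciprocated friendship twice, while ego_size counts each distinct neighbor once. The made-up toy graph below illustrates the difference.

```r
library(igraph)

# Toy directed graph: 1 <-> 2 (reciprocated) and 1 -> 3
g_toy <- make_graph(c(1, 2, 2, 1, 1, 3), directed = TRUE)

# Total degree counts every incoming and outgoing tie
deg_toy <- degree(g_toy)

# Ego size counts the ego plus its distinct neighbors
size_toy <- ego_size(g_toy)

data.frame(vertex = 1:3, degree = deg_toy, ego_size = size_toy)
```

Vertex 1 has total degree 3 but only two distinct neighbors, so its ego network has three members including itself.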

For future use, we can store the ego densities and sizes in a data frame.

Code
den.df<-data.frame(density=sch_ego_dens, size=ego.size)

Now we can calculate transitivity both for the entire network and for each ego (type="local"). We store the local values in a data frame as well.

Code
transitivity(school)
[1] 0.2128278
Code
transitive.sch <- transitivity(school, type="local") 

tr.df <- data.frame(transitivity=transitive.sch, size=ego.size)
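
A small undirected example makes the two versions of transitivity concrete. The sketch below uses a made-up graph: a triangle with one pendant node attached. Keep in mind that igraph's transitivity treats directed graphs as undirected, and that local transitivity is NaN for nodes with fewer than two neighbors.

```r
library(igraph)

# Toy undirected graph: triangle 1-2-3 plus a pendant node 4 attached to 3
g_toy <- make_graph(c(1, 2, 2, 3, 1, 3, 3, 4), directed = FALSE)

# Global transitivity: closed triples over all connected triples
tr_global <- transitivity(g_toy)

# Local transitivity: share of each node's neighbor pairs that are tied
tr_local <- transitivity(g_toy, type = "local")

tr_global
tr_local
```

Nodes 1 and 2 sit in a closed triangle, node 3 has three neighbors of which only one pair is tied, and node 4 has a single neighbor and therefore no defined value.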

Here we calculate constraint (Note that constraint is undefined for isolates) and store it.

Code
constraint.sch <- constraint(school)

con.df <- data.frame(constraint=constraint.sch, size=ego.size)
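
As a rough intuition for constraint, the sketch below compares the center and the leaves of a made-up star graph. The center's contacts are not tied to one another, so it should be less constrained than a leaf whose only contact is the center. This is an illustration on toy data, not part of the homework.

```r
library(igraph)

# Toy undirected star: vertex 1 is the center, vertices 2 to 5 are leaves
g_star <- make_star(5, mode = "undirected")

# Burt's constraint for every vertex
con_star <- constraint(g_star)

# Compare the center with one of the leaves
c(center = con_star[1], leaf = con_star[2])
```

Lower values indicate more structural holes around the node, which is why constraint is often plotted against ego network size.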

2. Evaluate Local Variation in the Graph

First let’s construct a mixing matrix for sex.

Code
mixingm(school, "sex")
   alter
ego    1    2
  1 1076  863
  2  762 1082

What about mixing across grades?

Code
mixingm(school, "grade")
    alter
ego    0   7   8   9  10  11  12
  0    1   1  13   4   1   0   0
  7    3  72  15   1   1   1   2
  8   16  14  84   8   2   6   1
  9    3   3   0 954  71  49  21
  10   1   0   0  65 756  60  44
  11   0   0   1  27  52 603  74
  12   2   0   0   9  30  78 634
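
One quick way to read these tables is the share of ties that fall on the diagonal, meaning ties where ego and alter belong to the same category. The sketch below retypes the two matrices printed above and computes that share with a small helper, share_within, which is a name made up for this example.

```r
# Mixing matrices copied from the output above (rows = ego, columns = alter)
sex_mix <- matrix(c(1076,  863,
                     762, 1082),
                  nrow = 2, byrow = TRUE)

grade_mix <- matrix(c( 1,  1, 13,   4,   1,   0,   0,
                       3, 72, 15,   1,   1,   1,   2,
                      16, 14, 84,   8,   2,   6,   1,
                       3,  3,  0, 954,  71,  49,  21,
                       1,  0,  0,  65, 756,  60,  44,
                       0,  0,  1,  27,  52, 603,  74,
                       2,  0,  0,   9,  30,  78, 634),
                    nrow = 7, byrow = TRUE)

# Hypothetical helper: proportion of ties on the diagonal
share_within <- function(m) sum(diag(m)) / sum(m)

share_within(sex_mix)
share_within(grade_mix)
```

Roughly 57% of ties are same-sex and roughly 82% are within the same grade, which already suggests that grade structures friendships much more strongly than sex does.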

We can also look at assortativity.

Code
assortativity(school, V(school)$sex)
[1] 0.1417051
Code
assortativity(school, V(school)$grade)
[1] 0.72985

And we can look at the odds ratio of within-group ties, which compares the odds of a tie within a group to the odds of a tie between groups.

Code
orwg(school, "sex")
[1] 1.31704
Code
orwg(school, "grade")
[1] 16.95481
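
The value for sex can be reproduced by hand from numbers already shown: the sex mixing matrix and the counts of 361 girls and 306 boys in the school graph. The sketch below assumes the odds are taken over ordered (directed) pairs of distinct students. That reading matches the reported 1.31704, but it is an assumption about how orwg works rather than something taken from the netseg source.

```r
# Group sizes in the school graph (1 = girl, 2 = boy)
n_girls <- 361
n_boys  <- 306

# Tie counts from the sex mixing matrix
ties_within  <- 1076 + 1082   # girl-girl plus boy-boy
ties_between <- 863 + 762     # girl-boy plus boy-girl

# Ordered pairs of distinct students, within and between groups
dyads_within  <- n_girls * (n_girls - 1) + n_boys * (n_boys - 1)
dyads_between <- 2 * n_girls * n_boys

# Odds of a tie = tied pairs over untied pairs
odds_within  <- ties_within  / (dyads_within  - ties_within)
odds_between <- ties_between / (dyads_between - ties_between)

or_sex <- odds_within / odds_between
round(or_sex, 5)
```

An odds ratio above 1 means ties are more likely within groups than between them. The much larger value for grade points the same way as the mixing matrix.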

Bonus: Build random graph for quick comparison

Here we use erdos.renyi.game to build a random graph with the same number of nodes and edges as the school graph, but with the edges placed at random.

Code
rg <- erdos.renyi.game(length(V(school)), length(E(school)), type="gnm", directed=T)

To give the random graph a comparable attribute, we first need the distribution of girls and boys in the school graph.

Code
table(V(school)$sex)

  1   2 
361 306 

Now we can use this distribution to randomly assign a sex variable.

Code
# Probabilities follow the table order: 361 girls (1) and 306 boys (2) out of 667 nodes
V(rg)$sex <- sample(c(1,2), length(V(rg)), replace=TRUE, prob=c(361/667, 306/667))  
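
Because both the random graph and the random attribute change on every run, it can help to fix a seed and to derive the sampling probabilities from the observed table instead of typing them in. The sketch below is one way to do that. The seed value is arbitrary, sex_obs is a stand-in for V(school)$sex built from the counts above, and sample_gnm is the current igraph name for the gnm version of erdos.renyi.game.

```r
library(igraph)

set.seed(42)   # arbitrary seed so the random draw can be repeated

# Same size as the school graph: 667 nodes and 3783 directed edges
n_nodes <- 667
n_edges <- 3783
rg_toy <- sample_gnm(n_nodes, n_edges, directed = TRUE)

# Stand-in for V(school)$sex, rebuilt from the observed counts
sex_obs <- c(rep(1, 361), rep(2, 306))

# Observed proportions, in the same order as the category labels
p_sex <- prop.table(table(sex_obs))

V(rg_toy)$sex <- sample(as.numeric(names(p_sex)), n_nodes,
                        replace = TRUE, prob = as.numeric(p_sex))

table(V(rg_toy)$sex)
```

Taking the labels and the probabilities from the same table keeps them aligned, which avoids pairing a category with the wrong proportion.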

How do the two mixing matrices compare? Keep in mind that the counts for the random graph change with every run.

Code
mixingm(school, "sex", use.density=F)
   alter
ego    1    2
  1 1076  863
  2  762 1082
Code
mixingm(rg, "sex", use.density=F)
   alter
ego    1    2
  1  798  929
  2  948 1108

3. Graph and Interpret Results

Plot degree distribution

Code
sch23.degree <- data.frame(v_deg, v_indeg, v_outdeg)

ggplot(data=sch23.degree, aes(v_deg)) + 
  # binwidth=1 gives one bar per degree value across the full observed range
  geom_histogram(binwidth=1,
                 col="black", 
                 fill="black", 
                 alpha = .2)+
  labs(title="Total Degree Distribution", x="Degree", y="Count")

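Plot the graph with nodes sized by degree and colored by sex

Code
# v_deg was computed on school (667 nodes), so attach it to school rather than sch23 (679 nodes)
V(school)$degree <- v_deg

plot.igraph(school, vertex.size=V(school)$degree/5, vertex.color=V(school)$sex, edge.arrow.size=.01,
            vertex.label=NA)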

Example Rough Bonus Plots

Plot Transitivity by Size

Code
ggplot(data=tr.df,aes(size, transitivity))+
  geom_point()+
  geom_smooth(method='lm', level=.95, formula=y~x)+
  theme(panel.background = element_blank())

Plot Density by Size (Note that ggplot2 drops any rows with missing values, such as those produced for isolates, and issues a warning)

Code
ggplot(data=den.df, aes(size, density))+
  geom_point()+
  geom_smooth(method='lm', level=.95, formula=y~x)+
  theme(panel.background = element_blank())

Plot Constraint by Size (Note that ggplot2 drops any rows with missing values, such as those produced for isolates, and issues a warning)

Code
ggplot(data=con.df, aes(size, constraint))+
  geom_point()+
  geom_smooth(method='lm', level=.95, formula=y~x)+
  theme(panel.background = element_blank())
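
If the warnings about missing values are distracting, one option is to filter out non-finite values before plotting. The sketch below shows the idea on a made-up data frame with the same columns as tr.df. The names tr_toy and tr_clean are hypothetical.

```r
# Toy stand-in for tr.df: the first node has too few neighbors, so its value is NaN
tr_toy <- data.frame(size = c(2, 3, 4, 5),
                     transitivity = c(NaN, 1, 0.5, 0.3))

# Keep only rows with a finite value (drops NaN, NA, and Inf)
tr_clean <- tr_toy[is.finite(tr_toy$transitivity), ]

nrow(tr_toy)
nrow(tr_clean)
```

Filtering explicitly also makes it easy to report how many nodes were left out of each plot.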