While many social scientists are easily seduced by the massive networks that we can observe and analyze, local networks remain a central concern of social scientists. As Smith (2020) describes, there are at least five reasons why local or ego networks remain important:
There are several ways that we can bring ego network data into R for analysis. Much of the data may look like a standard data frame with one row per respondent: respondent/ego ID, Alter1ID…Alter5ID, Alter1_Race…Alter5_Race, Alter1-Alter2_Edge…Alter4-Alter5_Edge. Many statistics can be developed by summing over the alter characteristics and dividing by network size, or by indexing the edge columns.
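As a minimal sketch of the "sum over alter columns and divide by network size" idea, the block below uses a hypothetical wide data frame with up to three alters per respondent. The column names and values are invented for illustration and do not come from any dataset used in this tutorial.

```r
# Hypothetical wide-format ego data: one row per respondent (ego).
# All column names and values are invented for illustration.
ego_df <- data.frame(
  ego_id      = c(1, 2, 3),
  alter1_race = c("white", "black", "white"),
  alter2_race = c("white", "white", NA),   # NA = no second alter named
  alter3_race = c("black", NA, NA),
  stringsAsFactors = FALSE
)

race_cols <- c("alter1_race", "alter2_race", "alter3_race")

# Network size: the number of alters each ego actually named.
ego_df$net_size <- rowSums(!is.na(ego_df[race_cols]))

# Proportion of white alters: sum over the alter columns / network size.
ego_df$prop_white <- rowSums(ego_df[race_cols] == "white", na.rm = TRUE) /
  ego_df$net_size

ego_df[c("ego_id", "net_size", "prop_white")]
```

The same pattern (a logical test across the alter columns, then rowSums() divided by network size) extends to any categorical alter attribute.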
SNA scholars have also developed packages to simplify these tasks, although the data structures they expect are not always obvious. egor is a recent package that looks particularly promising for these tasks; it is derived from the once-popular, now deprecated egonetR. (Note that the main importing functions, onefile_to_egor, twofile_to_egor, and threefile_to_egor, each have different data structure expectations.)
For this tutorial we use igraph to observe local network characteristics within a whole graph. We also load the netseg package for mixing matrices and measures of network segregation, as well as ggplot2.
library(igraph)
library(netseg)
library(ggplot2)
For this workshop we will return to the Valente school classroom. Remember that the Valente graph is a fifth grade class. We are uploading the graph from a pajek file (.net). We also bring in gender as a key attribute in the classroom. Note that 0s are boys and 1s are girls.
valente <- read.graph("data/valente.net", format="pajek")
gender <- as.matrix(read.table("data/valente.clu", skip=1))
V(valente)$gender <- as.vector(gender)
There are a couple of different ways of learning basic information about this graph. Let’s use functions from igraph to check whether it is directed, whether it is connected or has multiple components (including isolates), the number of nodes, and the number of edges.
is.directed(valente)
## [1] TRUE
is.connected(valente)
## [1] TRUE
vcount(valente)
## [1] 37
ecount(valente)
## [1] 145
The network is directed and connected; it has 37 students/nodes and 145 connections or ties between the students. If it were not connected, we could use components() to identify the number of components and $csize to see the size of those components, like this for the valente graph:
cmps <- components(valente)
cmps$csize is a vector of component sizes. If every size is 1 except for one large component, you can remove the isolates as follows:
isolates <- which(degree(valente)==0)
val2 <- delete.vertices(valente, isolates)
This identifies the isolates (nodes with a degree of 0), stores them as isolates, and then deletes them from the valente graph, storing the new igraph object as val2. Important note: if your graph has attributes, do not remove isolates until after storing those attributes in the igraph object with all of the nodes.
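Because the Valente graph is connected, the steps above do nothing visible there. The sketch below runs them on a small toy graph (invented for illustration) that has an isolate. It uses delete_vertices(), the current igraph name for delete.vertices(), and the gender values are arbitrary.

```r
library(igraph)

# Toy undirected graph: a triangle (1-2-3), a pair (4-5), and one isolate (6).
g <- make_graph(c(1, 2, 2, 3, 3, 1, 4, 5), n = 6, directed = FALSE)

# Store attributes while every node is still present.
V(g)$gender <- c(0, 1, 1, 0, 1, 0)

cmps <- components(g)
cmps$no      # number of components
cmps$csize   # size of each component

# Drop nodes with degree 0; the stored attribute stays aligned with the
# remaining nodes because it was attached before deletion.
isolates <- which(degree(g) == 0)
g2 <- delete_vertices(g, isolates)
V(g2)$gender
```

Attaching the attribute first matters because a vector read from a separate file would otherwise no longer line up with the node order once nodes are removed.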
Node degree is the most basic and one of the most widely used network measures. It is inherently a local network measure and each node has a degree score, but we can also talk about the average degree for the network as a whole.
Node degree is the number of edges adjacent to a node. It is a basic measure of connectivity and “importance.” In directed graphs we may consider in-degree (the number of edges sent TO a node), out-degree (the number of edges sent FROM a node), or total degree, which is their sum.
In igraph, degree(g) returns a vector with each node's undirected (or total) degree.
v_deg <- degree(valente)
What is the mean degree for the classroom?
mean(v_deg)
## [1] 7.837838
We can find the in-degree - or the number of received nominations - or the out-degree - the number of sent nominations by specifying the mode.
v_indeg <- degree(valente, mode="in")
v_outdeg <- degree(valente, mode="out")
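To make the relationship between the three degree vectors concrete, here is a small sketch on a toy directed graph (invented for illustration). Every edge contributes one out-degree and one in-degree, so the mean total degree equals twice the number of edges divided by the number of nodes; for the classroom that is 2 × 145 / 37 ≈ 7.84, the mean reported above.

```r
library(igraph)

# Toy directed graph with edges 1->2, 1->3, 2->3, 3->1.
g <- make_graph(c(1, 2, 1, 3, 2, 3, 3, 1), directed = TRUE)

deg_in    <- degree(g, mode = "in")    # nominations received
deg_out   <- degree(g, mode = "out")   # nominations sent
deg_total <- degree(g, mode = "all")   # the default: in + out

data.frame(node = seq_len(vcount(g)), deg_in, deg_out, deg_total)

# Mean total degree computed two ways.
mean(deg_total)
2 * ecount(g) / vcount(g)
```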
Let’s see how degree is distributed in the Valente data by plotting the distribution using ggplot.
valente.degree <- data.frame(v_deg, v_indeg, v_outdeg)
ggplot(data=valente.degree, aes(v_deg)) +
geom_histogram(breaks=seq(2,14, by=1),
col="black",
fill="black",
alpha = .2)+
labs(title="Total Degree Distribution") +
labs(x="Degree", y="Count")
We can look at the distribution of in-degree.
ggplot(data=valente.degree, aes(v_indeg)) +
geom_histogram(breaks=seq(-1,14, by=1), col="black", fill="black", alpha = .2)+
labs(title="In-Degree Distribution") +
labs(x="In-Degree", y="Count")
And the distribution of out-degree.
We can compare in-degree and out-degree.
ggplot(data=valente.degree,aes(v_indeg,v_outdeg))+
geom_point(aes(colour=factor(V(valente)$gender+1)))+
geom_smooth(method='lm', level=.95, formula=y~x)+
theme(panel.background = element_blank())
We can construct a variety of measures of network composition. For example, we might first just want the percentage of nodes with a particular characteristic.
girl <- ifelse(gender==1,1,0)
boy <- ifelse(gender==0, 1, 0)
mean(girl)
## [1] 0.5945946
mean(boy)
## [1] 0.4054054
We may want to know how many in-group or out-group ties there are. In sociology, this offers a good indication of social closure. We may also think about it as one indication of network segregation.
plot.igraph(valente, vertex.color=V(valente)$gender)
mixingm(valente, 'gender') # the density option would provide proportional representation
##    alter
## ego  0  1
##   0 60  0
##   1  3 82
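A mixing matrix is simply a cross-tabulation of the sender's attribute by the receiver's attribute over all edges. The sketch below rebuilds one by hand with base R's table() on a toy graph (invented for illustration); it is meant to show the logic, not to reproduce netseg's implementation.

```r
library(igraph)

# Toy directed graph: two same-gender triads joined by one cross-gender tie.
g <- make_graph(c(1, 2, 2, 3, 3, 1, 4, 5, 5, 6, 6, 4, 3, 4), directed = TRUE)
V(g)$gender <- c(0, 0, 0, 1, 1, 1)

# Edge list: column 1 = sender (ego), column 2 = receiver (alter).
el <- as_edgelist(g, names = FALSE)

mix <- table(ego   = V(g)$gender[el[, 1]],
             alter = V(g)$gender[el[, 2]])
mix

# Share of ties that stay within gender (the diagonal of the matrix).
sum(diag(mix)) / sum(mix)
```

Applied to the classroom matrix above, the same diagonal share is (60 + 82) / 145, or about 98 percent of ties.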
Now let’s compare to a random graph with the same number of nodes and edges. We use the erdos.renyi.game() function to make this random graph.
rg <- erdos.renyi.game(length(V(valente)), length(E(valente)), type="gnm", directed=T) # a random graph with the same number of nodes & edges as the Valente graph above
V(rg)$gender <- sample(c(0,1), length(V(rg)), replace=T, prob=c(15/37, 22/37)) # randomly assigning gender, matching the proportions in the Valente graph (15 boys coded 0, 22 girls coded 1)
plot(rg, vertex.label= NA, edge.arrow.size=0.02,vertex.size = 5, vertex.color=V(rg)$gender*2, xlab = "Erdos-Renyi Random Graph")
mixingm(rg, 'gender')
##    alter
## ego  0  1
##   0 27 34
##   1 38 46
Note that your results will differ from mine. These are random draws.
We can compare the Valente graph and the random graph based on assortativity - a measure of homophily developed by Mark Newman that indexes the number of edges within groups over the variance as a whole. It is essentially Pearson’s correlation coefficient computed across edges, and it similarly varies from perfectly disassortative (-1) to perfectly assortative (1).
See:
Newman, M. E. J. (2003). Mixing patterns in networks. Physical Review E, 67(2), 026126.
assortativity(valente, V(valente)$gender)
## [1] 0.9585236
assortativity(rg, V(rg)$gender)
## [1] -0.009686431
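To see the "Pearson's correlation" interpretation directly, the sketch below correlates the sender's and receiver's attribute values across the edges of a toy directed graph (invented for illustration) and compares the result with igraph's assortativity(). It assumes the directed, normalized form of the coefficient, which is igraph's default.

```r
library(igraph)

# Toy directed graph: two same-gender triads joined by one cross-gender tie.
g <- make_graph(c(1, 2, 2, 3, 3, 1, 4, 5, 5, 6, 6, 4, 3, 4), directed = TRUE)
gender <- c(0, 0, 0, 1, 1, 1)

# One row per edge: sender in column 1, receiver in column 2.
el <- as_edgelist(g, names = FALSE)

# Pearson correlation between sender and receiver attribute values.
r_manual <- cor(gender[el[, 1]], gender[el[, 2]])

# igraph's assortativity coefficient for the same attribute.
r_igraph <- assortativity(g, gender, directed = TRUE)

c(manual = r_manual, igraph = r_igraph)
```

The two values agree here because each edge is treated as one (sender, receiver) observation; a single cross-gender tie out of seven is enough to pull the coefficient below 1.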
One way of thinking about how a network is segregated is to calculate the odds of a within-group tie relative to the odds of a between-group tie. The netseg package has a function - orwg - to construct this odds ratio.
orwg(simplify(valente), "gender")
## [1] 58.15254
orwg(simplify(rg), "gender")
## [1] 1.036343
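So in the Valente classroom the odds of a same-gender tie are about 58 times greater than the odds of a tie between boys and girls, whereas in the random graph the two odds are nearly equal (a ratio of about 1).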
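The odds ratio can be worked out by hand from counts of ties and dyads. The sketch below does so on a toy directed graph (invented for illustration), assuming that dyads are ordered pairs of distinct nodes and that there are no loops or duplicate edges; netseg's orwg() may handle such details differently.

```r
library(igraph)

# Toy directed graph: two same-gender triads joined by one cross-gender tie.
g <- make_graph(c(1, 2, 2, 3, 3, 1, 4, 5, 5, 6, 6, 4, 3, 4), directed = TRUE)
gender <- c(0, 0, 0, 1, 1, 1)
n <- vcount(g)

# Classify each observed tie as within-group or between-group.
el <- as_edgelist(g, names = FALSE)
same <- gender[el[, 1]] == gender[el[, 2]]
ties_within  <- sum(same)
ties_between <- sum(!same)

# Count the ordered dyads that could have held each kind of tie.
n_group <- as.numeric(table(gender))
dyads_within  <- sum(n_group * (n_group - 1))
dyads_between <- n * (n - 1) - dyads_within

# Odds = ties present / ties absent; the ratio compares the two tie types.
odds_within  <- ties_within  / (dyads_within  - ties_within)
odds_between <- ties_between / (dyads_between - ties_between)
or_within <- odds_within / odds_between
or_within
```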
There are numerous other strategies for evaluating network segregation (e.g., Coleman’s index of segregation, the index of qualitative variation, etc.). Several of these are available in the netseg package.
Network density provides an indication of the strong ties within the network.
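It is the number of observed ties divided by the number of possible ties. In a directed graph with n nodes there are n(n-1) possible ties, so for the classroom the density is 145/(37 × 36). We can find the measure for the entire graph.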
graph.density(valente)
## [1] 0.1088589
We can also compute the density for each ego network within a graph.
If you have an igraph object, you can pull out each ego network using make_ego_graph.
valente_ego_dens <- make_ego_graph(valente, 1) %>%
  vapply(graph.density, numeric(1))
head(valente_ego_dens)
## [1] 0.5833333 0.2321429 0.4761905 0.3928571 0.5476190 0.8000000
We may want to know the relationship between ego network density and ego network size. Are larger networks more dense?
We can use the ego_size() function to calculate the size of each ego’s local network.
ego.size <- ego_size(valente)
And we can check out the average ego network size now by just using the mean() function in base R.
mean(ego.size)
## [1] 6.945946
We can store these in a data frame for easy access.
den.df <- data.frame(density=valente_ego_dens, size=ego.size)
And we can plot the relationship in ggplot.
ggplot(data=den.df, aes(size, density))+
geom_point()+
geom_smooth(method='lm', level=.95, formula=y~x)+
theme(panel.background = element_blank())
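The ego-level density and size calculations can be checked by hand on a small case. The sketch below uses a toy directed graph (invented for illustration) and edge_density(), the current igraph name for graph.density(). Note that ego_size() counts the ego along with its direct neighbors.

```r
library(igraph)

# Toy directed graph: two triads joined by the tie 3->4.
g <- make_graph(c(1, 2, 2, 3, 3, 1, 4, 5, 5, 6, 6, 4, 3, 4), directed = TRUE)

# One igraph object per ego: the ego, its direct neighbors (in or out),
# and all ties among them.
ego_nets <- make_ego_graph(g, order = 1)

ego_dens <- vapply(ego_nets, edge_density, numeric(1))
ego_n    <- ego_size(g, order = 1)

data.frame(node = seq_len(vcount(g)), size = ego_n, density = ego_dens)
```

Node 1's ego network holds nodes 1, 2, and 3 with three of six possible directed ties, while node 3's larger ego network (four nodes) holds four of twelve, which illustrates how density tends to fall as ego networks grow.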
Transitivity is a measure of local clustering. It is the proportion of closed triads among all possible triads (connected triples). By default igraph returns a single value for the whole graph, which indicates how much local clustering is occurring in the graph as a whole; edge directions are ignored in this calculation. By specifying type="local" we instead calculate transitivity over each respondent's ego network.
transitivity(valente)
## [1] 0.5142857
transitive.valente <- transitivity(valente, type="local")
ego.size <- ego_size(valente)
tr.df <- data.frame(transitivity=transitive.valente, size=ego.size)
We can plot transitivity by ego network size.
ggplot(data=tr.df,aes(transitivity,size))+
geom_point()+
geom_smooth(method='lm', level=.95, formula=y~x)+
theme(panel.background = element_blank())
And plot transitivity as node size in a network.
plot.igraph(valente, vertex.size=transitive.valente*20, edge.arrow.size=.5, vertex.color=V(valente)$gender)
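The difference between global and local transitivity is easy to verify on a graph small enough to count by hand. The sketch below uses a toy undirected graph (invented for illustration): a triangle with one pendant node attached.

```r
library(igraph)

# Triangle 1-2-3 plus a pendant node 4 attached to node 3.
g <- make_graph(c(1, 2, 2, 3, 3, 1, 3, 4), directed = FALSE)

# Global: 3 * (number of triangles) / (number of connected triples).
# Here there is 1 triangle and 5 connected triples.
transitivity(g)

# Local: for each node, the share of its neighbor pairs that are tied.
# Node 3 has three neighbor pairs, only one of which (1-2) is connected.
# Node 4 has a single neighbor, so its local transitivity is undefined (NaN).
local_tr <- transitivity(g, type = "local")
local_tr
```

The NaN for low-degree nodes is worth remembering when local transitivity is used as a plotting size or in a regression, since those nodes carry no value.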
Another local measure is the extent to which an actor may be constrained by local clustering. Constraint captures the extent to which a node’s local network is redundant. In other words, small networks with high redundancy will have high constraint scores. The formula itself has been called “daunting,” but translates to a combination of size, density, and hierarchy.
“Network constraint measures the extent to which your time and energy are concentrated in a single group” (Burt)
See:
Burt, R. S. (2015). Reinforced structural holes. Social Networks, 43, 149-161.
Burt, R. S. (1997). A note on social capital and network content. Social Networks, 19(4), 355-373.
constraint.valente <- constraint(valente)
ego.size <- ego_size(valente)
con.df <- data.frame(constraint=constraint.valente, size=ego.size)
We can look at the relationship between size and constraint.
ggplot(data=con.df,aes(constraint,size))+
geom_point()+
geom_smooth(method='lm', level=.95, formula=y~x)+
theme(panel.background = element_blank())
And plot as node size.
l <- layout_with_kk(valente)
plot.igraph(valente, vertex.size=con.df$constraint*10, edge.arrow.size=.01, layout=l, vertex.color=V(valente)$gender)
The inverse of constraint is one way to think about brokerage. The maximum value of Burt’s constraint is 1.125, so subtracting constraint from 1.125 gives one brokerage score.
con.df$inv <- 1.125-con.df$constraint
plot.igraph(valente, vertex.size=con.df$inv*10, edge.arrow.size=.5, vertex.color=V(valente)$gender)
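The 1.125 ceiling and the role of network size can be seen in fully connected toy graphs, where every contact is tied to every other contact. The sketch below is an illustration on cliques of three, four, and five nodes, not part of the classroom analysis. In a triangle each node invests half of its ties in each contact and reaches that contact indirectly through the other one, giving 2 × (1/2 + 1/4)² = 1.125.

```r
library(igraph)

# Fully connected (maximally redundant) networks of increasing size.
cliques <- list(k3 = make_full_graph(3),
                k4 = make_full_graph(4),
                k5 = make_full_graph(5))

# Constraint of the first node in each clique (all nodes are equivalent).
clique_constraint <- vapply(cliques,
                            function(g) as.numeric(constraint(g))[1],
                            numeric(1))
clique_constraint

# The brokerage score used above: 1.125 minus constraint.
1.125 - clique_constraint
```

Constraint falls as the clique grows even though redundancy stays complete, which is why small, dense ego networks produce the highest scores.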
Small world graphs have nodes with high transitivity - triads tend to close - and a short average path length overall. The shortest path length, or geodesic, is the smallest number of edges that must be traversed to get from one node to another. You can use the shortest.paths function to get the matrix of shortest paths. A small world graph is the kind of graph that we would expect given the strength of weak ties hypothesis. Recall that transitivity of the graph can be found by transitivity(g).
We can build a random small world graph using watts.strogatz.game and compare it to the previously simulated random graph and the Valente graph.
See: Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of 'small-world' networks. Nature, 393(6684), 440-442.
Watts, D. J. (2004). Small worlds: The dynamics of networks between order and randomness. Princeton University Press.
sim <- watts.strogatz.game(1, 37, mean(degree(valente)), 0.5, loops = FALSE, multiple = FALSE)
plot(sim, vertex.label= NA, edge.arrow.size=0.02,vertex.size = 5, xlab = "Small world model")
mean(shortest.paths(sim))
## [1] 1.567568
mean(shortest.paths(rg))
## [1] 1.92111
mean(shortest.paths(valente))
## [1] 2.726077
transitivity(sim)
## [1] 0.3642919
transitivity(rg)
## [1] 0.222561
transitivity(valente)
## [1] 0.5142857
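The small world logic can also be illustrated without the classroom data by comparing a ring lattice, a lightly rewired lattice, and a random graph of the same size. The sketch below uses sample_smallworld() and sample_gnm(), the current igraph names for watts.strogatz.game() and erdos.renyi.game(). The seed, the neighborhood size of 2, and the rewiring probability of 0.1 are arbitrary choices for illustration. Because mean_distance() averages only over pairs of distinct nodes, its values are not directly comparable with mean(shortest.paths(g)) above, which also averages in the zero diagonal.

```r
library(igraph)
set.seed(42)  # arbitrary seed so the sketch is reproducible

n <- 37

# Ring lattice: each node tied to its 2 nearest neighbors on each side.
lattice  <- sample_smallworld(dim = 1, size = n, nei = 2, p = 0)
# Same lattice with 10% of edge endpoints rewired at random.
small_w  <- sample_smallworld(dim = 1, size = n, nei = 2, p = 0.1)
# Random graph with the same number of nodes and edges.
random_g <- sample_gnm(n, ecount(lattice), directed = FALSE)

# Average path length and global transitivity for one graph.
summarise_graph <- function(g) {
  c(mean_distance = mean_distance(g), transitivity = transitivity(g))
}

rbind(lattice     = summarise_graph(lattice),
      small_world = summarise_graph(small_w),
      random      = summarise_graph(random_g))
```

The pattern to look for is the one described above: the rewired graph should keep much of the lattice's clustering while its average path length moves toward that of the random graph.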