Let’s get a quick look at our network for this week: Zachary’s Karate Club. The nodes are members of a karate club and edges are interactions. The network occurs just prior to a split between the Teacher (node 1) and the club president (node 34). This is the most common exemplar network. Its popularity inspired me to include the network as the front image of the Oxford Handbook of Social Networks.
I have stored the network as an R object that we can simply
load
.
As always we should examine the plot. I want to look at a few things
when thinking about groups. For example, the density of the network
might be a good initial look at group formation. We also leverage one
way of thinking about groups - as ego networks - using the
make_ego_graph(g)
function that we set to one for
first-order neighbors.
Also note that we store the layout of the graph using the
layout_with_kk(g)
function in igraph so
that each network throughout the workshop can have the same layout by
using the layout=
function when plotting networks.
layout.kk <- layout_with_kk(karate)
kdens <- make_ego_graph(karate, 1) %>%
vapply(graph.density, numeric(1))
plot.igraph(karate, layout=layout.kk, vertex.size=kdens*10)
Let’s start with several basic approaches to locating groups: components, bicomponents, cliques, and cores.
These are basic because there are not estimated by a community detection algorithm, but are more simply defined by node and path characteristics.
A component is a maximally connected subgraph where every node can be reached. Components could be isolates, dyads, triads, are larger structures, but they are otherwise disconnected from other parts of the network. Strong components are directed and weak components are undirected. The giant connected component is the largest maximally connected subgraph.
A bicomponent is a maximally connected subgraph with at least two paths between each node. Every bicomponent will have at minimum a degree of 2.
We can use components(g)
&
biconnected.components(g)
in igraph.
First we calculate components.
## [1] 34
Next, we can calculate bicomponents and plot the bicomponent.
biconnected.components
results in several objects, but to
pick the subgraph we can select from the list of bicomponents stored as
$components
. If we want to identify the largest bicomponent
we can use max(length)
. So, if we store the results of
biconnected.components
in an object called biconnected, we
can select the largest bicomponent by:
bicomponent.org$components[[max(length(bicomponent.org$components))]]
Next, we use the induced.subgraph(g, vids=x)
function in
igraph where g is the graph and x
equals the list of vertices that you want to use in the subgraph. Tihs
is the results of the above example.
bikarate <- lapply(biconnected_components(karate)$components,
length) %>% which.max()
bikaratevs <- biconnected_components(karate)$components[[bikarate]]
bikarate.g <- induced_subgraph(graph=karate, vids=bikaratevs)
plot.igraph(bikarate.g)
This is a densely connected graph, so it doesn’t change to much. Note that the leaf node, 12, is removed.
We may want to know groups of actors that really hang close to one another. Cliques capture one way of thinking about this groupiness. A clique is a maximally connected subgraph (e.g., every node is connected to one another) of at least three nodes. These are essentially groups of closed triads, so they are dense connections.
In igraph:
cliques(g)
| max_cliques(g)
|
largest_cliques(g)
cliques <- largest_cliques(karate)
c1 <- cliques[[max(length(cliques))]]
clique2 <- induced.subgraph(graph=karate,vids=c1)
plot(clique2)
k-Cores are more relaxed way of thinking about groups. These are subgraphs where each node is maximally connected to at least k neighboring nodes. Some scholars also think of coreness as a proxy for centrality in very large networks.
coreness(g)
in igraph cacluates cores.
Each node is assigned a k for its largest k-core. We can quickly examine
the distribution of largest cores using table
in base
R.
## kcore
## 1 2 3 4
## 1 11 12 10
We can store the core for each node as a vertex or nodal attribute
called “core” or whatever using the usual format
V(g)$core <- kcore
where kcore is the output for
coreness(g)
.
We can assign a color to each core in several ways. An effective
solution is to use brewer.pal
from
RColorBrewer to index the core groups as factors and
storing them as a vector in these steps:
Grab four colors as there are four core groups.
thing <- brewer.pal(4, "Set1")
Index to the four as stored as a vertex attribute in the graph. We
need this to be a as.factor
.
my_color <- thing[as.numeric(as.factor(V(g)$core))]
Then we plot the network in igraph using
vertex.color=my_color
. Note there is also a legend example
included here.
V(karate)$core <- kcore
coul <- brewer.pal(4, "Set1")
my_color <- coul[as.numeric(as.factor(V(karate)$core))]
plot.igraph(karate, , layout=layout.kk, vertex.color=my_color)
legend("bottomleft", legend=levels(as.factor(V(karate)$core)) ,
col = coul , bty = "n", pch=20 , pt.cex = 3, cex = 1.5,
horiz = FALSE, inset = c(0.1, 0.1))
K-components are defined by: - (a) the number of nodes that must be removed from the graph to disconnect the components. - (b) the number of ties that must be removed from the graph to disconnect the components.
Mathematically, a & b are equivalent.
To locate k-components we first identify cut-points/articulation points. These are nodes whose removal results in the graph splitting apart. These can also be seen as “important” nodes.
articulation.points(g)
in igraph
identify cut points.
Modularity maximization is a poular way of thinking about “groupiness” within a complete network.
Maximization is: - generally thought of as describing a tendency towards having more in-group than out-group ties - in-/out- groups are NOT defined on node characteristics (ala homophily), but on tie-patterns - typically estimated through divisive/agglomerative assignment or looking at the consequences of iteratively removing nodes.
Newman-Girvan and Louvain community detection both use modularity maximization.
Newman-Girvan is a divisive approach:
How to define “fitness”?
fg_comms <- cluster_fast_greedy(karate)
V(karate)$fg_comms <- fg_comms$membership
plot.igraph(karate, layout=layout.kk, vertex.color=V(karate)$fg_comms)
Louvain (Blondel et al. 2008) is a modularity optimization approach:
-Optimize modularity on all nodes then iterate - Does switching the node to another group increase modularity? - stop at solution that produces the largest increase in modularity - This has been optimized in the Leiden approach (see Leiden pacakge).
lv_comms <- cluster_louvain(karate)
V(karate)$lv_comms <- lv_comms$membership
plot.igraph(karate, layout=layout.kk, vertex.color=V(karate)$lv_comms)
The Leiden algorithm is a variation on Louvain that allows you to adjust a refinement parameter or to generate bigger or smaller groups.
cluster_leiden(g, resolution_parameter=#)
in
igraph with 1 as the default resolution parameter
estimates this solution. A higher resolution locates smaller
communities, while a lower resolution locates larger groups.
Here we explore a large and a small resolution.
leid_large_comms <- cluster_leiden(karate, resolution_parameter = 3)
V(karate)$leid_large_comms <- leid_large_comms$membership
plot.igraph(karate, layout=layout.kk, vertex.color=V(karate)$leid_large_comms)
leid_small_comms <- cluster_leiden(karate, resolution_parameter = .25)
V(karate)$leid_small_comms <- leid_small_comms$membership
plot.igraph(karate, layout=layout.kk, vertex.color=V(karate)$leid_small_comms)
You can see that the higher resolution is starting to parse smaller bits of the network, here, individual nodes. A very high resolution will basically just put every node into its own community.
Walktrap community detection locates subgraphs via random walks. Nodes that are connected via short random walks will be clustered together.
A random walk is exactly as it sounds: Imagine traversing nodes randomly along available edges.
Infomap community detection also uses random walks.
cluster_walktrap(g)
in igraph detects
communities useing walktrap community detection.
wt_comms <- cluster_walktrap(karate)
V(karate)$wt_comms <- wt_comms$membership
plot.igraph(karate, layout=layout.kk, vertex.color=V(karate)$wt_comms)
Another way of thinking about groups is to consider how actors or nodes are related based on the patterns of ties that they share. So, instead of being grouped by some type of proximity, groups are formed based on similarity. For example, we could use structural equivalence as a criteria for groupiness. Two nodes are in the same group if they have the exact same ties to one another, line cooks may have the exact same ties to chefs, managers, and other line cooks. Groups based on tie patterns are called blockmodels and this form of blockmodel based on structural equivalence is its most restrictive form. Obviously, these kinds of groups are pretty rare.
Most common blockmodels are relaxed forms of structural equivalence and allow us to ask which nodes are in similar positions. Blockmodels based on regular equivalence capture groups of nodes that share similar patterns of nodes, but not the exact same nodes such that a prep cook and a line cook could be in the same group if they are connected to similarly structured others (e.g., the same manager, different supervising chef, other prep or line cooks). This has implications for hierachies and for the distributions of resources, such as information.
The hierarchical aspect has been particularly useful in understanding organziations and the world system.
igraph doesn’t implement blockmodels. So we will use sna package instead.
First, we use asNetwork(g)
from
intergraph to switch from igraph to
sna.
Blockmodeling involves two major steps.
After plotting the network in sna using the
plot.network
function, we can use
equiv.clust(g)
to locate clusters. equiv.clust
“computes the distances between all pairs of positions” and
plot
the solution which is a dendogram. We can inspect
logical places to slice the solution.
After inspecting the dendogram, we can construct our blockmodel with a 4 block solution given the break around height=25.
blockmodel(g, k=[# of Blocks])
in sna
constructs blockmodels.
##
## Network Blockmodel:
##
## Block membership:
##
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
## 1 2 2 2 3 3 3 2 3 3 3 2 2 2 3 3 3 2 3 2 3 2 3 3 3 3
## 27 28 29 30 31 32 33 34
## 3 3 3 3 3 3 4 4
##
## Reduced form blockmodel:
##
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
## Block 1 Block 2 Block 3 Block 4
## Block 1 NaN 1.00000000 0.28571429 0.0000000
## Block 2 1.0000000 0.28888889 0.02380952 0.1500000
## Block 3 0.2857143 0.02380952 0.07619048 0.5714286
## Block 4 0.0000000 0.15000000 0.57142857 1.0000000
We can also plot the block in igraph.
blks <- data.frame(id=kbm$order.vector, block=kbm$block.membership)
blks <- blks[order(blks$id),]
V(karate)$blocks <- blks$block
plot(karate, layout=layout.kk, vertex.color=V(karate)$blocks)
How do we evaluate solutions in terms of community detection. While
some of the algorithms intentionally maximize modularity, they may not
be the most maximizing. We can compare modularity across solutions using
modularity(g, membership, weights = NULL, ...)
in
igraph.
mod.df <- data.frame(louvain=modularity(karate, lv_comms$membership), fast_greedy=modularity(karate, fg_comms$membership), leiden_large=modularity(karate, leid_large_comms$membership), leiden_small=modularity(karate, leid_small_comms$membership), walktrap=modularity(karate, wt_comms$membership), blocks=modularity(karate, kbm.mem))
knitr::kable(mod.df)
louvain | fast_greedy | leiden_large | leiden_small | walktrap | blocks |
---|---|---|---|---|---|
0.4151052 | 0.3806706 | -0.0498028 | 0.3739316 | 0.3532216 | 0.1245069 |
Next, we can include a couple of examples that put community detection to good use. There are, of course, many examples at this point.