Background: At the individual level, one dimension of position in the network can be captured through centrality.
Conceptually, centrality is fairly straight forward: we want to identify which nodes are in the ‘center’ of the network. In practice, identifying exactly what we mean by ‘center’ is somewhat complicated.
Node Leve Measures: Degree, Closeness, Betweenness, Power
Graph Level measures: Centralization
Borgatti (2003; 2005) discusses centrality in terms of two key dimensions:
Substantively, the key question for centrality is knowing what is flowing through the network. The key features are:
Radial: Whether the actor retains the good to pass to others (Information, Diseases). Nodes are the endpoint. Medial: Whether they pass the good and then lose it (physical objects) - Nodes are itermediaries. Distance: Whether the ket factor is distance - typically paths separating nodes. Frequency: Whether the key factor is the number of alters one encounters (information)
The off-the-shelf measures do not always match the social process of interest, so researchers need to be mindful of this and the future of centrality may include induced measures that are more theoretically precise.
For the purposes of this workshop, we will need several packages
starting with igraph for most of the network analysis,
ggplot2 for some graphs, dplyr for
data wrangling, Hmisc for constructing a correlation
matrix, CINNA just to show the centrality options
available, intergraph to move from
igraph to sna and sna
for looking at brokerage. We use import::from
and select
only the brokerage
function from sna to
avoid confusing dependencies.
library(igraph)
library(ggplot2)
library(dplyr)
library(Hmisc)
library(CINNA)
library(intergraph)
::from(sna, brokerage=brokerage) import
Let’s get a quick look at our exemplar data set:
The Florentine Marriage Network. Here, the nodes are families and the edges are marital ties. Florence around 1430 was experiencing rapid economic and political transformation and we see two points of struggle between the Medicis and the Strozzis.
We have this stored as a pajek file and can load it in the usual way into igraph.
<- as.undirected(read.graph("data/flomarriage.net", format="pajek"))
flomarriage
<- as.matrix(read.table("data/flowealth.clu", skip=1))
wealth V(flomarriage)$wealth <- as.vector(wealth)
<- as.matrix(read.table("data/floparty.clu"))
party V(flomarriage)$party <- as.vector(party)
<- delete_vertices(simplify(flomarriage), degree(flomarriage)==0)
flomarriage
plot.igraph(flomarriage, vertex.color=as.factor(V(flomarriage)$party))
If we want a method that allows us to distinguish “important” actors, intuitively we may rely on popularity: The actor with the most ties is the most important: So our old friend degree is back!
Recall degree is the number of edges incident to a node - in an
undirected graph this is the number of alters that you have. In
igraph we use the degree
function.
<- degree(flomarriage) degree_centrality
plot.igraph(flomarriage, vertex.size=degree_centrality*5,
edge.arrow.size=.5, vertex.color=as.factor(V(flomarriage)$party))
We can also construct degree as a graph property capturing the extent
to which the graph is controlled by a single node:
centralization.degree
in igraph.
centralization.degree(flomarriage, normalized=T)
## $res
## [1] 1 3 2 3 3 1 4 1 6 1 3 3 2 4 3
##
## $centralization
## [1] 0.2380952
##
## $theoretical_max
## [1] 210
Here $res
are the degree scores for each node,
centralization
is the graph level index, and the
max
is the theoretical maximum for the unnormalized graph
if everyone was connected to everyone else and is the denominator in the
normalization indicated by normalized=T
.
A second measure of centrality is closeness centrality. An actor is considered important if they are relatively close to all other actors.
Closeness (closeness
in igraph) is
based on the inverse of the geodesic path - or shortest distance - of
each actor to every other actor in the network. This is typically
normalized.
We can also calculate centralization for closeness:
centralization.closeness
in igraph.
<- closeness(flomarriage, normalized=TRUE, mode="ALL")
close_centrality
<- centralization.closeness(flomarriage)
cc
$centralization cc
## [1] 0.3224523
plot.igraph(flomarriage, vertex.size=close_centrality*50,
edge.arrow.size=.5,vertex.color=as.factor(V(flomarriage)$party))
Borgatti and Everett (2020) call degree and closeness radial measures
of centrality because they focus upon each node as an endpoint. Some
questions ask us to consider the potential for nodes to mediate the flow
of bits (e.g. information, disease, etc.). Betweenness centrality
betweenness
in igraph is a medial measure
as it asks which nodes sit between other nodes. The score is the number
of geodesics that a node sits on in a graph. This measure is frequently
used to capture “structural holeness” of nodes.
<- betweenness(flomarriage) between_centrality
plot.igraph(flomarriage, vertex.size=between_centrality/2,
edge.arrow.size=.5, vertex.color=as.factor(V(flomarriage)$party))
We may also conceive of power as a measure of proximity. The powerful are connected to other powerful people. There are a series of “beta” centrality scores that weigh the power of proximate nodes: 1-step nodes power is weighed more heavily than 2-step nodes and so on.
Bonacich Power Centrality power_centrality
in
igraph captures how an actor’s centrality (prestige) is
equal to a function of the prestige of those they are connected to.
Thus, actors who are tied to very central actors should have higher
prestige/ centrality than those who are not.
Eigenvector centrality eigen_centrality
in
igraph is often conceptualized as a capturing
influence. Eigenvector centrality is a relative measure where
connections to high-scoring nodes are more highly weighted. PageRank is
a version of eigenvector centrality.
<- power_centrality(flomarriage)
bonacich_centrality
<- eigen_centrality(flomarriage)
eig_centrality
<- centr_eigen(flomarriage)
ce
print(ce$centralization)
## [1] 0.5277086
This can be an interesting way to think about kind of subgraph groupiness as a type of centrality. The k-core is the maximal subgraph where each vertex has at least k degrees. This is useful for very large graphs like Gonzales-Bailon et al. (2011).
coreness
in igraph decomposes
cores.
coreness(flomarriage)
## Acciaiuoli Albizzi Barbadori Bischeri Castellani Ginori
## 1 2 2 2 2 1
## Guadagni Lamberteschi Medici Pazzi Peruzzi Ridolfi
## 2 1 2 1 2 2
## Salviati Strozzi Tornabuoni
## 1 2 2
median(coreness(flomarriage))
## [1] 2
First we can look at some bivariate relationships.
<- eig_centrality$vector
ecent
<- data.frame(V(flomarriage)$name, V(flomarriage)$party, V(flomarriage)$wealth,
flo.df degree(flomarriage), closeness(flomarriage), betweenness(flomarriage), eig_centrality$vector)
<- rename(flo.df, c(family = 1, party=2, wealth=3, degree=4, close=5, between=6, eig=7))
flo.df
<- flo.df %>% mutate(medici=recode(party, 'Medici'='1', "Split"='0', 'Oligarch'='0'))
flo.df
<- flo.df[,3:8]
nflo.df
<- rcorr(as.matrix(nflo.df))
cmat
::corrplot.mixed(cmat$r, order = 'AOE') corrplot
We can also evaluate the relationship between wealth and centrality
using OLS regression using lm
in base R.
<- lm(wealth~medici + between, data=flo.df)
wlth.reg
summary(wlth.reg)
##
## Call:
## lm(formula = wealth ~ medici + between, data = flo.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -49.625 -19.556 -1.115 10.776 101.038
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 36.4184 14.8810 2.447 0.0307 *
## medici1 -1.8467 19.8027 -0.093 0.9272
## between 0.9154 0.8069 1.134 0.2788
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 37.54 on 12 degrees of freedom
## Multiple R-squared: 0.097, Adjusted R-squared: -0.0535
## F-statistic: 0.6445 on 2 and 12 DF, p-value: 0.5422
Regressions of this type violate the independence assumption of standard regressions and, while somewhat commonly used, you may want to bootstrap standard errors. You can use the boot package for this among a few other strategies.
Next, we can graphically compare centrality scores to how constrained families are within the network. First, we compare betweenness with constraint. Recall that constraint is Burt’s measure of redundancy in an ego’s network.
library(ggplot2)
library(ggrepel)
$cons <- constraint(flomarriage)
flo.df
$prty <- as.numeric(as.factor(flo.df$party))
flo.df
ggplot(flo.df, aes(x=cons, y=between, label=family)) +
geom_point() + geom_label_repel(aes(label=family, fill = factor(prty+4)), max.overlaps=Inf)+
geom_smooth(method='lm') +
scale_fill_manual(values=c("tomato", "darkgoldenrod", "deepskyblue"),
name="Party",
breaks=c("5", "6", "7"),
labels=c("Medici", "Oligarchs", "Split Loyalists"))+
theme_bw()
ggplot(flo.df, aes(x=cons, y=degree, label=family)) +
geom_point() + geom_label_repel(aes(label=family, fill = factor(prty+4)), max.overlaps=Inf)+
geom_smooth(method='lm') +
scale_fill_manual(values=c("tomato", "darkgoldenrod", "deepskyblue"),
name="Party",
breaks=c("5", "6", "7"),
labels=c("Medici", "Oligarchs", "Split Loyalists"))+
theme_bw()
ggplot(flo.df, aes(x=wealth, y=eig, label=family)) +
geom_point() +
geom_label_repel(aes(label=family, fill = factor(prty+4)), max.overlaps=Inf) +
geom_smooth(method='lm') +
scale_fill_manual(values=c("tomato", "darkgoldenrod", "deepskyblue"),
name="Party",
breaks=c("5", "6", "7"),
labels=c("Medici", "Oligarchs", "Split Loyalists"))+
theme_bw()
Check out the CINNA package for an off-the-shelf centralities that you read about and want to explore. It has a lot of options!
Gould and Fernandez recognize - kind of following Borgatti and Brass’s (2019) claim that nodes matter for structure and not just structure for nodes - that nodes are playing different local roles especially as it pertains to mediation across groups. Their theoretical ideas about brokerage have had lasting influence, while their actual measurement strategies have maybe received less attention than deserved. The sna package has a function to calculate the brokerage roles that Gould and Fernandez identify.
First, we need to bring port the network from an igraph object to an
sna object and we can use the asNetwork
function in
intergraph for that switch.
<- asNetwork(flomarriage) flonet
Now we can use the brokerage
function in
sna to calculate the distribution of each brokerage. We
need the network and the grouping attribute of interest (recall that
Gould and Fernandez are explicitly interested in how nodes mediate
across groups).
The matrix of observed brokerage scores is stored as
$raw.nli
and the aggregated observed scores is stored as
raw.gli
.
<- brokerage(flonet, party)
flobroker
::kable(flobroker$raw.nli, caption="Node-Level Brokerage Table") knitr
w_I | w_O | b_IO | b_OI | b_O | t | |
---|---|---|---|---|---|---|
Acciaiuoli | 0 | 0 | 0 | 0 | 0 | 0 |
Albizzi | 0 | 2 | 0 | 0 | 4 | 6 |
Barbadori | 0 | 0 | 0 | 0 | 2 | 2 |
Bischeri | 2 | 0 | 1 | 1 | 0 | 4 |
Castellani | 0 | 2 | 1 | 1 | 0 | 4 |
Ginori | 0 | 0 | 0 | 0 | 0 | 0 |
Guadagni | 6 | 0 | 3 | 3 | 0 | 12 |
Lamberteschi | 0 | 0 | 0 | 0 | 0 | 0 |
Medici | 2 | 6 | 8 | 8 | 4 | 28 |
Pazzi | 0 | 0 | 0 | 0 | 0 | 0 |
Peruzzi | 2 | 0 | 0 | 0 | 0 | 2 |
Ridolfi | 0 | 0 | 2 | 2 | 0 | 4 |
Salviati | 2 | 0 | 0 | 0 | 0 | 2 |
Strozzi | 0 | 2 | 3 | 3 | 0 | 8 |
Tornabuoni | 0 | 0 | 2 | 2 | 0 | 4 |
::kable(flobroker$raw.gli, caption="Aggregate Brokerage Table") knitr
x | |
---|---|
w_I | 14 |
w_O | 12 |
b_IO | 20 |
b_OI | 20 |
b_O | 10 |
t | 76 |
What do the brokerage strategies of the Florentine families suggest about how power is distributed in this network?