Set up
library(igraph)
##
## Attaching package: 'igraph'
## The following objects are masked from 'package:stats':
##
## decompose, spectrum
## The following object is masked from 'package:base':
##
## union
load('data/example-data-compcog.RData')
There should be 3 objects loaded into your R workspace:
pnet
refers to a section of the phonological network
first described by Vitevitch (2008). The nodes represent English words,
and edges connect words that are phonological neighbors of each other
based on the 1-edit distance metric computed on their phonological
transcriptions (Luce & Pisoni, 1998). E.g., /k@t/–/k@p/ are
neighboring nodes. Specifically, this network is the 2-hop network of
the word ‘speech’ - in addition to ‘speech’ itself, its immediate
phonological neighbors and the neighbors of its neighbors are included
in this representation.
pnet_edgelist
is the raw data that is used to create
pnet
. It is an edgelist whereby each row depicts one edge
in the network.
snet
refers to a section of the word association network
using data from De Deyne et al. (2020). The nodes represent English
words, and edges connect words that are produced as free associations of
other words. E.g., “cat”–“dog” are neighboring nodes. This network has
edges with 2 interesting properties. (i) Each edge has a
weight
attribute that corresponds to the associative
strength of two nodes, or the proportion of participants who provided a
specific response to the cue word. (ii) The edges are also
directed
such the direction goes from the cue word
to the response word, i.e., “cat”->“dog”. Specifically, this
network is the 1-hop network of the word ‘cheese’ - in addition to
‘cheese’ itself, its immediate associates and the cue words that led to
the response ‘cheese’ are included in this representation. See https://smallworldofwords.org/en/project/research for
more information.
summary(pnet)
## IGRAPH ddad942 UN-- 39 121 --
## + attr: name (v/c)
summary(snet)
## IGRAPH 9a00b73 DNW- 211 1324 --
## + attr: name (v/c), weight (e/n)
Notice the two numbers on the first row: The first tells you the
number of nodes or vertices in the network, the second tells you the
number of edges or links.
There are also some cyptic looking characters. Let’s unpack that. The
first character is either “U” or “D” and tells you whether the edges are
“undirected” or “directed”. The second character “N” (if present) tells
you that the nodes have “names”. The third character “W” (if present)
tells you that the edges are weighted. The final character is blank
unless you have a special type of network known as a bipartite network
(in which case the fourth character is “B”).
On the second row, additional information about node or edge
attributes can be found if they have been specified in the construction
of the network. Both networks have node names, and the semantic network
has an edge attribute known as weight
.
Loading a network into igraph
There are a number of different ways to convert your raw data into a
network representation that igraph
can analyze. This data
first needs to be organized as an edgelist or an adjacency matrix or an
adjacency list and then loaded into RStudio. Below I provide a quick
example of how an edgelist can be converted into a
network.
head(pnet_edgelist)
## [,1] [,2]
## [1,] "biC;beach" "iC;each"
## [2,] "biC;beach" "liC;leach"
## [3,] "iC;each" "liC;leach"
## [4,] "pi;p" "pis;peace"
## [5,] "biC;beach" "piC;peach"
## [6,] "iC;each" "piC;peach"
pnet2 <- graph_from_edgelist(pnet_edgelist,
directed = F)
summary(pnet2)
## IGRAPH 2e75c97 UN-- 39 121 --
## + attr: name (v/c)
There are many useful functions in igraph
(those that
begin with graph_from_*
) for converting various kinds of
datasets into networks. The igraph
manual is a good place
to start learning about these functions and their arguments to specify
edge or node attributes of the network. Outside of igraph
you will probably have to spend some time curating and wrangling your
data so that it is in the right formats for igraph
.
Visualizations
Here are visualizations of the networks we will play around with
today.
plot(pnet,
vertex.frame.color = 'white',
vertex.label.color = 'grey30',
edge.color = 'black',
vertex.label = gsub('[[:print:]]+;', '', V(pnet)$name),
layout = layout_with_lgl,
frame = TRUE,
margin = c(0,0,0,0),
main = 'A partial phonological network of English words')

plot(snet,
vertex.frame.color = 'white',
vertex.label.color = 'grey30',
edge.color = 'black',
vertex.color = 'seagreen',
vertex.size = 5,
edge.arrow.size = 0.4,
layout = layout_with_graphopt,
frame = TRUE,
margin = c(0,0,0,0),
main = 'A partial semantic network')

Disclaimers
Due to limited time, I assume that you have some familiarity with R
and RStudio. A gentle introduction to R programming can be found here:
https://psyr.djnavarro.net/
I also choose to not cover in detail a number of important topics
such as how to convert your raw data into an igraph
network
object, network visualization in igraph
(for a great
introduction to this topic, see https://kateto.net/network-visualization), and other
methods of quantifying the community structure of the network (i.e.,
meso-level). Some of these topics are briefly touched on in a different
tutorial and you can find the materials here: https://vpf-netsci.netlify.app/part2-demo.html
Measuring the network
Once we have a network representation, the tools of network science
can be applied to analyze the networks in different ways. In this
tutorial we focus on a descriptive analysis of the network and
review various network measures that can be used to describe or quantify
network structure at three different levels of the network: the
micro-level (referring to the local structure and other properties of
individual nodes), the meso-level (subgroups or clusters of nodes), and
the macro-level (referring to the overall or global structure of the
network).

Micro-level (node-level)
Micro-level network measures provide you with information about
specific nodes in the network. These are generally known as centrality
measures in the network science literature. Centrality is the network
scientist’s way of quantifying the relative “importance” of a given node
relative to other nodes in the network. There are many different
definitions of what counts as “central”, as you will see in the
following subsections. There is no single “correct” or “best” metric -
which metrics are most useful to you will depend on the nature of the
system that you are modeling as well as the network behavior that you
interested in.
Degree (unweighted edges)
The degree of node i refers to the number
of edges or links connected to that node.
If your network has directed edges, in-degree refers to the
number of edges that are going towards the target node, whereas
out-degree refers to the number of edges that are going away
from the target node.

degree(graph = pnet)
## xpik;apeak biC;beach sid;cede iC;each liC;leach pi;p
## 2 5 4 5 5 9
## p@C;patch pis;peace piC;peach pik;peak pil;peal piz;pease
## 6 9 22 12 9 9
## pit;peat pin;peen pip;peep piv;peeve pRC;perch pIC;pitch
## 9 9 9 9 6 6
## poC;poach pWC;pouch priC;preach pUC;putsch riC;reach sik;seek
## 6 6 2 6 6 5
## slik;sleek snik;sneak sped;spade spik;speak spEk;speck spEd;sped
## 3 3 3 11 5 4
## spiC;speech spid;speed spYk;spike spok;spoke spuk;spook sp^d;spud
## 3 8 4 4 4 3
## stid;steed swid;swede tiC;teach
## 3 3 5
degree(graph = snet, v = 'cheese')
## cheese
## 231
degree(graph = snet, v = 'cheese', mode = 'in')
## cheese
## 185
degree(graph = snet, v = 'cheese', mode = 'out')
## cheese
## 46
degree(graph = snet, v = 'cheese', mode = 'all')
## cheese
## 231
Strength (weighted edges)
The strength of node i refers to the sum of
its adjacent edge weights. Only applicable to weighted
networks.
strength(graph = snet) |> head(5)
## age aged ages aging american
## 0.57 0.55 0.04 0.17 0.36
strength(graph = snet, v = 'age', mode = 'in')
## age
## 0.38
strength(graph = snet, v = 'age', mode = 'out')
## age
## 0.19
strength(graph = snet, v = 'age', mode = 'all')
## age
## 0.57
Relevant finding in psychology: De Deyne et al. (2019) found that
strength (operationalized as association frequency) was superior to, or
at least similar to, word frequency as a predictor of lexical decision
and semantic decision performance.
Local Clustering Coefficient (unweighted)
The local clustering coefficient, C, of
node i measures the ratio of the actual number of edges
existing among nodes directly connected to the target node i to
the number of all possible edges among these nodes.
C ranges from 0 to 1. When C = 0, none of the
neighbors of a target node are neighbors of each other. When C
= 1, every neighbor is also a neighbor of all the other neighbors of a
target word.
You can think of the local clustering coefficient as providing a
measure of the level of interconnectivity among the local neighborhood
of the node.

Both words have the same number of neighbors, but different local
clustering coefficients.
Relevant finding in psychology: Chan & Vitevitch (2009) showed
that words with higher clustering coefficients were responded to more
slowly than words with lower clustering coefficient in spoken word
recognition tasks (even after controling for the size of the
phonological neighborhood.
transitivity(graph = pnet, type = 'local', vids = 'spik;speak') |> round(3)
## [1] 0.218
A couple of things to note:
It is important to specify type = local
for local
clustering coefficients, as compared to the global clustering
coefficient of the entire graph (this is a macro-level measure that we
will visit later)
Many of these functions contain additional arguments for
indicating whether to consider the directionality and weights of the
edges. If your graph is undirected and unweighted, these are ignored by
default. If your graph is directed and weighted, you can indicate
whether to include or exclude this information for the computation of
the network measure.
Local Clustering Coefficient (weighted)
If you have a weighted network, you can compute local clustering
coefficients using Barrat et al.’s (2004) generalization of transitivity
to weighted networks by specifying type = 'weighted'
. If
your network is unweighted, the generalization will return the
unweighted C (see example of ‘speak’ below).
transitivity(graph = snet, type = 'local', vids = 'cheese') |> round(3)
## [1] 0.043
transitivity(graph = snet, type = 'weighted', vids = 'cheese') |> round(3)
## [1] 0.056
transitivity(graph = pnet, type = 'local', vids = 'spik;speak') |> round(3)
## [1] 0.218
transitivity(graph = pnet, type = 'weighted', vids = 'spik;speak') |> round(3)
## [1] 0.218
Closeness Centrality
Closeness centrality of node i is the inverse of the average
of the length of the shortest path between node i and all other
nodes in the network. If a node has high closeness centrality, it means
that on average, it takes few steps to travel from that node to all
other nodes in the network. If a node has low closeness centrality, it
means that on average, it takes more steps to travel from that node to
all other nodes in the network.
Closeness centrality is commonly viewed as an indicator of the
accessibility of a node in the network from all other locations
in the network.
This is a famous network (Krackhardt’s Kite) that nicely illustrates
the differences between degree, closeness, and betweenness
centrality.
closeness(graph = snet, normalized = T, mode = 'all', weights = NA) |> head()
## age aged ages aging american appetizer
## 0.5060241 0.5072464 0.5023923 0.5060241 0.5134474 0.5121951
closeness(graph = snet, normalized = T, mode = 'in', weights = NA) |> head()
## age aged ages aging american appetizer
## 0.6666667 0.6666667 NaN 0.5000000 0.2790698 0.3279743
closeness(graph = snet, normalized = T, mode = 'out', weights = NA) |> head()
## age aged ages aging american appetizer
## 0.3380282 0.3503650 0.3317422 0.3333333 0.3520408 0.3650794
closeness(graph = snet, normalized = T, mode = 'all') |> head()
## age aged ages aging american appetizer
## 18.46966 14.54294 18.60053 18.48592 19.84877 17.90281
Note that closeness centrality can only be meaningfully computed for
connected graphs (so that a path exists between any pair of nodes). If
there are distinct network components, this means that for some sets of
node pairs, the path between them does not exist and closeness cannot be
computed. Usually, network scientists focus their analysis on the
largest connected component of the network and ignore the smaller
connected components (viewed as outliers).
It is typical to have normalized = T
so that the values
are normalized with respect to the size of the network. As usual, you
can specify the mode
and weights
arguments
accordingly if you have directed/weighted networks to get the
corresponding versions of closeness centrality computed. However,
caution is needed as the interpretation of weights
in this
context is to interpret them as distances rather than
connection strengths: higher weights = longer distances (From
igraph
manual: “If the graph has a weight edge attribute,
then this is used by default. Weights are used for calculating weighted
shortest paths, so they are interpreted as distances.”). It is highly
recommended to read the manual carefully to understand the measures that
are being computed.
Relevant finding in psychology: Siew (2018) showed that the
closeness centrality of words in an orthographic similarity network
influenced word naming and visual lexical decision performance
differently. Words of higher closeness centralities were responded to
more quickly in lexical decision, but were named more slowly.
Betweenness Centrality
Betweenness centrality is a measure of the degree to which nodes
stand in between each other. A node with a high betweenness centrality
is a node that is frequently found in the short paths of other pairs of
nodes in the network. In contrast, a node with a low betweenness
centrality is a node that is not usually found in the short paths of
node pairs. Betweeenness can be viewed as an indicator if whether a node
represents a “bottleneck” in the system.
betweenness(graph = pnet, normalized = T, weights = NA, directed = F) |> head()
## xpik;apeak biC;beach sid;cede iC;each liC;leach pi;p
## 0.00000000 0.00000000 0.01730678 0.00000000 0.00000000 0.00000000
betweenness(graph = snet, normalized = T, weights = NULL, directed = T) |> head()
## age aged ages aging american appetizer
## 0.0002050581 0.0009881305 0.0000000000 0.0001367054 0.0103941672 0.0098454470
The same considerations (about connected graphs, additional arguments
for weighted and directed graphs, normalization, interpretation of
weights as distances) from the closeness centrality section applies to
this section as well.
Page Rank Centrality
PageRank is a centrality measure developed by Google to rank webpages
(the historic paper describing the algorithm can be viewed here). The
general idea is that a random walker will traverse the network space and
their paths are biased by the link connectivity structure of the
network. The random walker restarts the walk after some time (simulating
“boredom” of the surfer). The number of visits received by a node
provides an indicator of its importance in the network. Intuitively, we
expect that nodes have a high PageRank if there are many nodes that
point to it, or if there are nodes that point to it that themselves have
a high PageRank.
page_rank(graph = pnet, directed = F, weights = NA)$vector |> head()
## xpik;apeak biC;beach sid;cede iC;each liC;leach pi;p
## 0.01085805 0.02109911 0.02350068 0.02109911 0.02109911 0.02849254
page_rank(graph = snet, directed = T, weights = NULL)$vector |> head()
## age aged ages aging american appetizer
## 0.0016034240 0.0012433152 0.0007545644 0.0009712994 0.0011928822 0.0008034249
The weights
and directed
arguments can be
adjusted depending on your graph type. It is important to note that the
interpretation of edge weights here is that of “connection strength”
(from igraph
manual: “This function interprets edge weights
as connection strengths. In the random surfer model, an edge with a
larger weight is more likely to be selected by the surfer.”). This is
different from the “distance” interpretation of edge weights by
closeness and betweenness.
Relevant finding in psychology: Griffiths et al. (2007) showed that
Page Rank centralities of words in a word association network provided
good predictions for the words generated by participants in a letter
fluency task.
Macro-level (network-level)
In this section, we will review network science measures that
describe the overall or global structure of the entire network. You can
think of these measures as providing a “bird’s eye view” of your
network, and they are useful for comparing different network
representations.
Average Shortest Path Length
Average shortest path length (ASPL) refers to the
mean of the shortest possible path between all possible pairs of nodes
in the network. (This loosely corresponds to the idea of “six degrees of
separation” in social networks.)

Example depicting the shortest path between nodes 25 and
16.
average.path.length(graph = pnet)
## [1] 2.557355
mean_distance(graph = pnet)
## [1] 2.557355
mean_distance(graph = snet, weights = NULL, directed = T)
## [1] 0.1090017
mean_distance(graph = snet, weights = NULL, directed = F)
## [1] 0.06239856
mean_distance(graph = snet, weights = NA, directed = T)
## [1] 2.926504
Relevant finding in psychology: Siew (2018) showed that concept
networks (constructed from concept maps generated by students) with
larger average shortest path lengths were associated with higher quiz
scores, after controlling for network size.
Global Clustering Coefficient
Global clustering coefficient refers to the number
of closed triangles in the network relative to the number of possible
triangles. It is a measure of overall level of local
connectivity among nodes in the network.
A simple way of thinking about this concept is that it is measuring
the probability that each pair of “friends” of a given node are also
friends with each other.
transitivity(graph = pnet, type = 'global')
## [1] 0.6805869
transitivity(graph = snet, type = 'global')
## [1] 0.1588113
Small World Index
The term “small world” has a specific meaning in network science as
compared to the layperson’s. A network is considered to have small world
characteristics if (i) its ASPL is shorter than that of a
randomly generated network with the same number of nodes and edges, and
(ii) its global C is larger than that of a randomly generated
network with the same number of nodes and edges. There are various ways
to compute a value that quantifies the “small worldness” of a network,
although we do not cover them here (see Humphries and Gurney, 2008, for
an example, and Neal, 2017, for a comparison of different methods).
The main take home message is that a small world network has high
levels of local clustering (nodes whose neighbors are also neighbors of
each other), but there also exists a number of shortcuts that
drastically reduces the overall distances/path lengths between nodes.
See below for an illustration of this idea.

Network Density
Network density refers to the ratio of the number of
(existing) edges and the number of possible edges among nodes in the
network.

Simple example of networks with lower and higher network
densities.
graph.density(graph = pnet)
## [1] 0.1632928
graph.density(graph = snet)
## [1] 0.02988039
Network Diameter
Network diameter refers to length of the longest
shortest path between nodes in the network. Instead of getting the mean
of all the shortest paths as you did in ASPL, what is the
maximum length of those short paths?

Simple example of networks with higher and lower network
diameters
diameter(graph = pnet)
## [1] 4
diameter(graph = snet, directed = T, weights = NULL)
## [1] 0.88
diameter(graph = snet, directed = F, weights = NULL)
## [1] 0.56
diameter(graph = snet, directed = T, weights = NA)
## [1] 7
References
Barrat, A., Barthélemy, M., Pastor-Satorras, R., & Vespignani, A.
(2004). The architecture of complex weighted networks. Proceedings of
the National Academy of Sciences, 101(11), 3747–3752. https://doi.org/10.1073/pnas.0400087101
Blondel, V. D., Guillaume, J. L., Lambiotte, R., & Lefebvre, E.
(2008). Fast unfolding of communities in large networks. Journal of
Statistical Mechanics: Theory and Experiment, 2008(10), P10008.
Chan, K. Y., & Vitevitch, M. S. (2009). The influence of the
phonological neighborhood clustering coefficient on spoken word
recognition. Journal of Experimental Psychology: Human Perception and
Performance, 35(6), 1934–1949. https://doi.org/10.1037/a0016902
De Deyne, S., Navarro, D. J., Perfors, A., Brysbaert, M., &
Storms, G. (2019). The “Small World of Words” English word association
norms for over 12,000 cue words. Behavior Research Methods, 51,
987–1006.
Fortunato, S. (2010). Community detection in graphs. Physics Reports,
486(3-5), 75-174.
Humphries, M. D., & Gurney, K. (2008). Network
‘small-world-ness’: A quantitative method for determining canonical
network equivalence. PloS One, 3(4).
Griffiths, T. L., Steyvers, M., & Firl, A. (2007). Google and the
Mind: Predicting Fluency With PageRank. Psychological Science, 18(12),
1069–1076. https://doi.org/10.1111/j.1467-9280.2007.02027.x
Luce, P. A., & Pisoni, D. B. (1998). Recognizing spoken words:
The Neighborhood Activation Model. Ear and Hearing, 19(1), 1–36.
Neal, Z. P. (2017). How small is it? Comparing indices of small
worldliness. Network Science, 5(1), 30–44. https://doi.org/10.1017/nws.2017.5
Newman, M. E. (2006). Modularity and community structure in networks.
Proceedings of the National Academy of Sciences, 103(23), 8577-8582.
Siew, C. S. Q. (2013). Community structure in the phonological
network. Frontiers in Psychology, 4, 553.
Siew, C. S. Q. (2018). The orthographic similarity structure of
English words: Insights from network science. Applied Network Science,
3(1), 13.
Siew, C. S. Q. (2018). Using network science to analyze concept maps
of psychology undergraduates. Applied Cognitive Psychology.
Vitevitch, M. S. (2008). What can graph theory tell us about word
learning and lexical retrieval? Journal of Speech, Language, and Hearing
Research, 51(2), 408–422. https://doi.org/10.1044/1092-4388(2008/030)
---
title: "CompCog 2023 Network Analysis Tutorial"
subtitle: "Modeling psychological data with cognitive networks"
author: "Cynthia Siew"
date: "1st Feb 2023"
output:
  html_document:
    toc: TRUE
    toc_float: TRUE
    df_print: paged
    code_download: true
---
# Set up 

```{r set-up}
library(igraph)

load('data/example-data-compcog.RData')
```

There should be 3 objects loaded into your R workspace:

- `pnet`
- `snet` 
- `pnet_edgelist`

`pnet` refers to a section of the phonological network first described by Vitevitch (2008). The nodes represent English words, and edges connect words that are phonological neighbors of each other based on the 1-edit distance metric computed on their phonological transcriptions (Luce & Pisoni, 1998). E.g., /k\@t/--/k\@p/ are neighboring nodes. Specifically, this network is the 2-hop network of the word 'speech' - in addition to 'speech' itself, its immediate phonological neighbors and the neighbors of its neighbors are included in this representation.

`pnet_edgelist` is the raw data that is used to create `pnet`. It is an edgelist whereby each row depicts one edge in the network.

`snet` refers to a section of the word association network using data from De Deyne et al. (2020). The nodes represent English words, and edges connect words that are produced as free associations of other words. E.g., "cat"--"dog" are neighboring nodes. This network has edges with 2 interesting properties. (i) Each edge has a `weight` attribute that corresponds to the associative strength of two nodes, or the proportion of participants who provided a specific response to the cue word. (ii) The edges are also `directed` such the direction goes *from* the cue word *to* the response word, i.e., "cat"->"dog". Specifically, this network is the 1-hop network of the word 'cheese' - in addition to 'cheese' itself, its immediate associates and the cue words that led to the response 'cheese' are included in this representation. See https://smallworldofwords.org/en/project/research for more information. 

```{r preview}
summary(pnet) # undirected, unweighted edges 

summary(snet) # directed, weighted edges
```

Notice the two numbers on the first row: The first tells you the number of nodes or vertices in the network, the second tells you the number of edges or links. 

There are also some cyptic looking characters. Let's unpack that. The first character is either "U" or "D" and tells you whether the edges are "undirected" or "directed". The second character "N" (if present) tells you that the nodes have "names". The third character "W" (if present) tells you that the edges are weighted. The final character is blank unless you have a special type of network known as a bipartite network (in which case the fourth character is "B"). 

On the second row, additional information about node or edge attributes can be found if they have been specified in the construction of the network. Both networks have node names, and the semantic network has an edge attribute known as `weight`.   

## Loading a network into `igraph`

There are a number of different ways to convert your raw data into a network representation that `igraph` can analyze. This data first needs to be organized as an edgelist or an adjacency matrix or an adjacency list and then loaded into RStudio. Below I provide a quick example of how an **edgelist** can be converted into a network. 

```{r}
head(pnet_edgelist) # preview first few rows of the edgelist 

pnet2 <- graph_from_edgelist(pnet_edgelist,
                             directed = F) # the edges are undirected in this network 

summary(pnet2)
```

There are many useful functions in `igraph` (those that begin with `graph_from_*`) for converting various kinds of datasets into networks. The `igraph` manual is a good place to start learning about these functions and their arguments to specify edge or node attributes of the network. Outside of `igraph` you will probably have to spend some time curating and wrangling your data so that it is in the right formats for `igraph`. 

## Visualizations 

Here are visualizations of the networks we will play around with today. 

```{r}
plot(pnet,
     vertex.frame.color = 'white',
     vertex.label.color = 'grey30',
     edge.color = 'black',
     vertex.label = gsub('[[:print:]]+;', '', V(pnet)$name), # regex to only print the word names and not transcriptions
     layout = layout_with_lgl,
     frame = TRUE,
     margin = c(0,0,0,0),
     main = 'A partial phonological network of English words')
```

```{r}
plot(snet,
     vertex.frame.color = 'white',
     vertex.label.color = 'grey30',
     edge.color = 'black',
     vertex.color = 'seagreen',
     vertex.size = 5,
     edge.arrow.size = 0.4,
     layout = layout_with_graphopt,
     frame = TRUE,
     margin = c(0,0,0,0),
     main = 'A partial semantic network')
```

## Disclaimers

Due to limited time, I assume that you have some familiarity with R and RStudio. A gentle introduction to R programming can be found here: https://psyr.djnavarro.net/

I also choose to not cover in detail a number of important topics such as how to convert your raw data into an `igraph` network object, network visualization in `igraph` (for a great introduction to this topic, see https://kateto.net/network-visualization), and other methods of quantifying the community structure of the network (i.e., meso-level). Some of these topics are briefly touched on in a different tutorial and you can find the materials here: https://vpf-netsci.netlify.app/part2-demo.html 

# Measuring the network 

Once we have a network representation, the tools of network science can be applied to analyze the networks in different ways. In this tutorial we focus on a *descriptive* analysis of the network and review various network measures that can be used to describe or quantify network structure at three different levels of the network: the micro-level (referring to the local structure and other properties of individual nodes), the meso-level (subgroups or clusters of nodes), and the macro-level (referring to the overall or global structure of the network). 

![](https://www.mdpi.com/education/education-10-00101/article_deploy/html/images/education-10-00101-g001.png)

## Micro-level (node-level)

Micro-level network measures provide you with information about specific nodes in the network. These are generally known as centrality measures in the network science literature. Centrality is the network scientist's way of quantifying the relative "importance" of a given node relative to other nodes in the network. There are [many](http://schochastics.net/sna/periodic.html) different definitions of what counts as "central", as you will see in the following subsections. There is no single "correct" or "best" metric - which metrics are most useful to you will depend on the nature of the system that you are modeling as well as the network behavior that you interested in. 

### Degree (unweighted edges)

The **degree** of node *i* refers to the number of edges or links connected to that node.

If your network has directed edges, *in-degree* refers to the number of edges that are going towards the target node, whereas *out-degree* refers to the number of edges that are going away from the target node. 

![](https://www.tlab.it/en/allegati/help_en_online/tlab_image/in_out_degree.jpg)

```{r}
# undirected network
degree(graph = pnet) # for all nodes in the network

degree(graph = snet, v = 'cheese') # for a specific node in the network 

# directed network
degree(graph = snet, v = 'cheese', mode = 'in') # in-degree = incoming edges
degree(graph = snet, v = 'cheese', mode = 'out') # out-degree = outgoing edges 
degree(graph = snet, v = 'cheese', mode = 'all') # in-degree + out-degree
```

### Strength (weighted edges)

The **strength** of node *i* refers to the sum of its adjacent edge *weights*. Only applicable to weighted networks. 

```{r}
strength(graph = snet) |> head(5)

# for directed networks
strength(graph = snet, v = 'age', mode = 'in') # in-degree = incoming edges
strength(graph = snet, v = 'age', mode = 'out') # out-degree = outgoing edges 
strength(graph = snet, v = 'age', mode = 'all') # in-degree + out-degree
```

<p style="color:blue;">*Relevant finding in psychology: De Deyne et al. (2019) found that strength (operationalized as association frequency) was superior to, or at least similar to, word frequency as a predictor of lexical decision and semantic decision performance.*</p>

### Local Clustering Coefficient (unweighted)

The **local clustering coefficient**, *C*, of node *i* measures the ratio of the actual number of edges existing among nodes directly connected to the target node *i* to the number of all possible edges among these nodes. 

*C* ranges from 0 to 1. When *C* = 0, none of the neighbors of a target node are neighbors of each other. When *C* = 1, every neighbor is also a neighbor of all the other neighbors of a target word.

You can think of the local clustering coefficient as providing a measure of the level of interconnectivity among the local neighborhood of the node. 

![](img/hcc-lcc.jpg)

*Both words have the same number of neighbors, but different local clustering coefficients.*

<p style="color:blue;">*Relevant finding in psychology: Chan & Vitevitch (2009) showed that words with higher clustering coefficients were responded to more slowly than words with lower clustering coefficient in spoken word recognition tasks (even after controling for the size of the phonological neighborhood.*</p>

```{r}
transitivity(graph = pnet, type = 'local', vids = 'spik;speak') |> round(3) # for a specific node in the network 

# if you do not specify the vids argument then you get a value for all nodes in the network, ordered by V(pnet)$name 
```

A couple of things to note:

1. It is important to specify `type = local` for local clustering coefficients, as compared to the global clustering coefficient of the entire graph (this is a [macro-level measure](#global-clustering-coefficient) that we will visit later) 

2. Many of these functions contain additional arguments for indicating whether to consider the directionality and weights of the edges. If your graph is undirected and unweighted, these are ignored by default. If your graph is directed and weighted, you can indicate whether to include or exclude this information for the computation of the network measure.

### Local Clustering Coefficient (weighted)

If you have a weighted network, you can compute local clustering coefficients using Barrat et al.'s (2004) generalization of transitivity to weighted networks by specifying `type = 'weighted'`. If your network is unweighted, the generalization will return the unweighted C (see example of 'speak' below). 

```{r}
# weighted network 
transitivity(graph = snet, type = 'local', vids = 'cheese') |> round(3) 

transitivity(graph = snet, type = 'weighted', vids = 'cheese') |> round(3) 

# unweighted network 
transitivity(graph = pnet, type = 'local', vids = 'spik;speak') |> round(3)

transitivity(graph = pnet, type = 'weighted', vids = 'spik;speak') |> round(3) # no difference with the previous
```

### Closeness Centrality 

Closeness centrality of node *i* is the inverse of the average of the length of the shortest path between node *i* and all other nodes in the network. If a node has high closeness centrality, it means that on average, it takes few steps to travel from that node to all other nodes in the network. If a node has low closeness centrality, it means that on average, it takes more steps to travel from that node to all other nodes in the network.

Closeness centrality is commonly viewed as an indicator of the *accessibility* of a node in the network from all other locations in the network. 

![](https://www.reliantsproject.com/wp-content/uploads/2020/06/reliants_keyconcepts-04.png)
*This is a famous network (Krackhardt's Kite) that nicely illustrates the differences between degree, closeness, and betweenness centrality.*

```{r}
# closeness centralities for directed networks, ignoring weights  
closeness(graph = snet, normalized = T, mode = 'all', weights = NA) |> head() # both in- and out-
closeness(graph = snet, normalized = T, mode = 'in', weights = NA) |> head() # only incoming edges considered 
closeness(graph = snet, normalized = T, mode = 'out', weights = NA) |> head() # only outgoing edges considered

# weights are considered by default if graph has a weight attribute 
closeness(graph = snet, normalized = T, mode = 'all') |> head()
```

Note that closeness centrality can only be meaningfully computed for connected graphs (so that a path exists between any pair of nodes). If there are distinct network components, this means that for some sets of node pairs, the path between them does not exist and closeness cannot be computed. Usually, network scientists focus their analysis on the largest connected component of the network and ignore the smaller connected components (viewed as outliers). 

It is typical to have `normalized = T` so that the values are normalized with respect to the size of the network. As usual, you can specify the `mode` and `weights` arguments accordingly if you have directed/weighted networks to get the corresponding versions of closeness centrality computed. However, caution is needed as the interpretation of `weights` in this context is to interpret them as **distances** rather than *connection strengths*: higher weights = longer distances (From `igraph` manual: "If the graph has a weight edge attribute, then this is used by default. Weights are used for calculating weighted shortest paths, so they are interpreted as distances."). It is highly recommended to read the manual carefully to understand the measures that are being computed. 

<p style="color:blue;">*Relevant finding in psychology: Siew (2018) showed that the closeness centrality of words in an orthographic similarity network influenced word naming and visual lexical decision performance differently. Words of higher closeness centralities were responded to more quickly in lexical decision, but were named more slowly.*</p>

### Betweenness Centrality 

Betweenness centrality is a measure of the degree to which nodes stand in between each other. A node with a high betweenness centrality is a node that is frequently found in the short paths of other pairs of nodes in the network. In contrast, a node with a low betweenness centrality is a node that is not usually found in the short paths of node pairs. Betweeenness can be viewed as an indicator if whether a node represents a "bottleneck" in the system. 

```{r}
# undirected, unweighted network 
betweenness(graph = pnet, normalized = T, weights = NA, directed = F) |> head()

# directed, weighted network 
betweenness(graph = snet, normalized = T, weights = NULL, directed = T) |> head() # use weight and direction
```

The same considerations (about connected graphs, additional arguments for weighted and directed graphs, normalization, interpretation of weights as distances) from the closeness centrality section applies to this section as well. 

### Page Rank Centrality

PageRank is a centrality measure developed by Google to rank webpages (the historic paper describing the algorithm can be viewed [here](http://infolab.stanford.edu/~backrub/google.html)). The general idea is that a random walker will traverse the network space and their paths are biased by the link connectivity structure of the network. The random walker restarts the walk after some time (simulating "boredom" of the surfer). The number of visits received by a node provides an indicator of its importance in the network. Intuitively, we expect that nodes have a high PageRank if there are many nodes that point to it, or if there are nodes that point to it that themselves have a high PageRank. 

```{r}
# undirected, unweighted network 
page_rank(graph = pnet, directed = F, weights = NA)$vector |> head()

# directed, weighted network 
page_rank(graph = snet, directed = T, weights = NULL)$vector |> head()
```

The `weights` and `directed` arguments can be adjusted depending on your graph type. It is important to note that the interpretation of edge weights here is that of "connection strength" (from `igraph` manual: "This function interprets edge weights as connection strengths. In the random surfer model, an edge with a larger weight is more likely to be selected by the surfer."). This is different from the "distance" interpretation of edge weights by closeness and betweenness. 

<p style="color:blue;">*Relevant finding in psychology: Griffiths et al. (2007) showed that Page Rank centralities of words in a word association network provided good predictions for the words generated by participants in a letter fluency task.*</p>

## Meso-level (community structure)

A common feature of many real-world networks is that they have **community structure**. Nodes are considered to be part of the same community if the density of connections among those nodes is relatively higher than the density of connections between nodes from different communities (Newman, 2006).

**Modularity, Q**, is a measure of the density of links inside communities in relation to the density of links between communities (Fortunato, 2010). Networks with higher Q are said to show strong evidence of community structure. 

![](img/karate-communities.png)

*Communities are depicted in different colors from another famous network: Zachary's Karate Club Network*

**How do network scientists "find" communities in networks?** 

Many community detection methods have been developed by network scientists to detect communities in networks. Each differs in their implementation, and reflects the creator's implicit definition of what is a community. In this tutorial we go through only one of these methods (Louvain) to demonstrate an example of community detection.

If you are interested to learn more about community detection, check out Fortunato (2010) who provided a comprehensive comparison of various community detection techniques. 

### Louvain method ("greedy, maximization method")

The core idea behind this method is that communities are essentially “mergers” of small communities (Blondel et al., 2008), reflecting the self-similar nature of complex networks.

1. Each node is assigned to one community such that there are as many communities as there are nodes. Then remove node *i* from its community and placing it in the community of the neighbor which yields the greatest gain in modularity. 
  - repeat for all nodes in the network

2. A new network is built where nodes are the *communities found in the previous phase*. Repeat Step 1. 
  - repeat Step 1 and 2 until it is not possible to further increase the value of Q

```{r}
set.seed(8)

# run the community detection algorithm 
results_louvain <- cluster_louvain(graph = pnet)

# overall results 
modularity(results_louvain)
sizes(results_louvain)

# specific community membership for each node 
cbind(
  results_louvain$names,
  results_louvain$membership
) 
```

Saving the community detection results as a `communities` object enables the use of special functions like `modularity()` and `sizes()` to obtain the modularity of the network and its community sizes. I have also included code that shows how to extract the community memberships of all nodes in the network for further analysis. This applies to the other community detection algorithms as well. 

<p style="color:blue;">*Relevant finding in psychology: Siew (2013) found robust community structure in the phonological network, and words in the same community tended to share similar lexical and phonological properties.*</p>

## Macro-level (network-level)

In this section, we will review network science measures that describe the overall or global structure of the entire network. You can think of these measures as providing a "bird's eye view" of your network, and they are useful for comparing different network representations. 

### Average Shortest Path Length 

**Average shortest path length** (ASPL) refers to the mean of the shortest possible path between all possible pairs of nodes in the network. (This loosely corresponds to the idea of "six degrees of separation" in social networks.)  

![](https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Ftse2.mm.bing.net%2Fth%3Fid%3DOIP.7M5pmG4w5Nnp610i1j7rLQHaFv%26pid%3DApi&f=1)

*Example depicting the shortest path between nodes 25 and 16.*

```{r}
# undirected, unweighted network 
average.path.length(graph = pnet) 

# an alternative function - both give the same result 
mean_distance(graph = pnet)

# directed, weighted network 
mean_distance(graph = snet, weights = NULL, directed = T) # consider weights and direction
mean_distance(graph = snet, weights = NULL, directed = F) # ignore direction 
mean_distance(graph = snet, weights = NA, directed = T) # ignore weights 
```

<p style="color:blue;">*Relevant finding in psychology: Siew (2018) showed that concept networks (constructed from concept maps generated by students) with larger average shortest path lengths were associated with higher quiz scores, after controlling for network size.*</p>

### Global Clustering Coefficient  

**Global clustering coefficient** refers to the number of closed triangles in the network relative to the number of possible triangles. It is a measure of overall level of *local* connectivity among nodes in the network. 

A simple way of thinking about this concept is that it is measuring the probability that each pair of "friends" of a given node are also friends with each other.

```{r}
transitivity(graph = pnet, type = 'global')
transitivity(graph = snet, type = 'global')
```

### Small World Index 

The term "small world" has a specific meaning in network science as compared to the layperson's. A network is considered to have small world characteristics if (i) its ASPL is *shorter* than that of a randomly generated network with the same number of nodes and edges, and (ii) its global C is *larger* than that of a randomly generated network with the same number of nodes and edges. There are various ways to compute a value that quantifies the "small worldness" of a network, although we do not cover them here (see Humphries and Gurney, 2008, for an example, and Neal, 2017, for a comparison of different methods).

The main take home message is that a small world network has high levels of local clustering (nodes whose neighbors are also neighbors of each other), but there also exists a number of shortcuts that drastically reduces the overall distances/path lengths between nodes. See below for an illustration of this idea. 

![](img/small-world.gif)

### Network Density

**Network density** refers to the ratio of the number of (existing) edges and the number of possible edges among nodes in the network. 

![](img/density.png)

*Simple example of networks with lower and higher network densities.*

```{r}
graph.density(graph = pnet)
graph.density(graph = snet)
```

### Network Diameter 

**Network diameter** refers to length of the longest shortest path between nodes in the network. Instead of getting the mean of all the shortest paths as you did in ASPL, what is the *maximum* length of those short paths? 

![](img/diameter.png)

*Simple example of networks with higher and lower network diameters*

```{r}
# undirected, unweighted graph 
diameter(graph = pnet)

# directed, weighted graph 
diameter(graph = snet, directed = T, weights = NULL) # consider weights and direction
diameter(graph = snet, directed = F, weights = NULL) # ignore direction 
diameter(graph = snet, directed = T, weights = NA) # ignore weights 
```

# Additional Resources 

Ognyanova, K. (2021) Network visualization with R. Retrieved from www.kateto.net/network-visualization. https://kateto.net/network-visualization

The official `igraph` manual (v.1.3.4). https://igraph.org/r/doc/ 

Gephi: A multi-platform, free to download GUI app for network analysis and visualization. https://gephi.org/

# References 

Barrat, A., Barthélemy, M., Pastor-Satorras, R., & Vespignani, A. (2004). The architecture of complex weighted networks. Proceedings of the National Academy of Sciences, 101(11), 3747–3752. https://doi.org/10.1073/pnas.0400087101

Blondel, V. D., Guillaume, J. L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10), P10008.

Chan, K. Y., & Vitevitch, M. S. (2009). The influence of the phonological neighborhood clustering coefficient on spoken word recognition. Journal of Experimental Psychology: Human Perception and Performance, 35(6), 1934–1949. https://doi.org/10.1037/a0016902

De Deyne, S., Navarro, D. J., Perfors, A., Brysbaert, M., & Storms, G. (2019). The “Small World of Words” English word association norms for over 12,000 cue words. Behavior Research Methods, 51, 987–1006.

Fortunato, S. (2010). Community detection in graphs. Physics Reports, 486(3-5), 75-174.

Humphries, M. D., & Gurney, K. (2008). Network ‘small-world-ness’: A quantitative method for determining canonical network equivalence. PloS One, 3(4).

Griffiths, T. L., Steyvers, M., & Firl, A. (2007). Google and the Mind: Predicting Fluency With PageRank. Psychological Science, 18(12), 1069–1076. https://doi.org/10.1111/j.1467-9280.2007.02027.x

Luce, P. A., & Pisoni, D. B. (1998). Recognizing spoken words: The Neighborhood Activation Model. Ear and Hearing, 19(1), 1–36.

Neal, Z. P. (2017). How small is it? Comparing indices of small worldliness. Network Science, 5(1), 30–44. https://doi.org/10.1017/nws.2017.5

Newman, M. E. (2006). Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 103(23), 8577-8582.   

Siew, C. S. Q. (2013). Community structure in the phonological network. Frontiers in Psychology, 4, 553.

Siew, C. S. Q. (2018). The orthographic similarity structure of English words: Insights from network science. Applied Network Science, 3(1), 13.

Siew, C. S. Q. (2018). Using network science to analyze concept maps of psychology undergraduates. Applied Cognitive Psychology.

Vitevitch, M. S. (2008). What can graph theory tell us about word learning and lexical retrieval? Journal of Speech, Language, and Hearing Research, 51(2), 408–422. https://doi.org/10.1044/1092-4388(2008/030)

Copyright © 2023 CSQ Siew. All rights reserved.

This work is licensed under a
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.