#this is an example code script
plot(1:10,1:10)
title("Example plot in a code chunk")
E2M2 2022: Network Tutorial
This is a Quarto document. It allows us to combine background text, code, and its output in one document. It can be “rendered” into an HTML file that can be read in any browser. All code chunks look like the following below, and can be run in RStudio just like a line in a standard .R
script:
A “code-only” version of this tutorial is available in the network-tutorial.R
script.
Background on the tutorial
This is a tutorial that was developed as part of the E2M2 workshop held in December 2022. It is based on code developed by Fidisoa Rasambainarivo, further developed by Michelle Evans. If you have any questions, please send an email to Michelle at mv.evans.phd@gmail.com.
The tutorial follows an example scenario when network analysis may be needed by public health authorities. In this example, an outbreak of an unknown parasite community has occured near the great Republic of E2M2. Public health officials want to evaluate the risks of disease spreading in a small community. They have collected interaction data from a group of 79 scientists from 3 institutions. This dataset records whether each person spent time with another person during the past 3 days.
Packages we will need
Here are the packages we’ll need for this tutorial. I also choose to set some base configurations that I like, such as never reading in strings as factors and setting a limit for the number of digits to include when printing to the console.
options(stringsAsFactors = F, scipen = 999, digits = 4)
#networks
library(igraph)
library(asnipe)
#data manipulation
library(tidyr)
library(dplyr)
Loading the datasets
Network Data
First, we will explore the dataset to get a feel for its structure and format.
<- read.csv("edgelist-e2m2.csv")
edgelist head(edgelist)
Node1 Node2 Link
1 BET.14.01 BET.14.01 0
2 BET.14.01 BET.14.02 0
3 BET.14.01 BET.14.03 0
4 BET.14.01 BET.14.04 0
5 BET.14.01 BET.15.01 0
6 BET.14.01 BET.15.02 0
str(edgelist)
'data.frame': 6241 obs. of 3 variables:
$ Node1: chr "BET.14.01" "BET.14.01" "BET.14.01" "BET.14.01" ...
$ Node2: chr "BET.14.01" "BET.14.02" "BET.14.03" "BET.14.04" ...
$ Link : int 0 0 0 0 0 0 0 0 0 0 ...
In its current structure, each potential pair is represented by row in the dataframe. Whether or not that pair of nodes is connected is described by the column Link
, with a 0
corresponding to no social contact over the prior 3 days and a 1
indicating contact.
We want to “widen” this dataset so that it becomes a matrix where each row or column represents an individual node and the value of the cell for column i
and row j
is the value of the edge between individual i
and j
.
<- pivot_wider(edgelist, names_from = Node2, values_from = Link)
network #move name of node1 to the rowname and turn into a matrix
<- select(network, -Node1) %>%
network.mat as.matrix()
rownames(network.mat) <- network$Node1
#set diagonal (contact with oneself) to 0
diag(network.mat) <- 0
#investigate small portion of this matrix to see what it looks like
1:6,1:6] network.mat[
BET.14.01 BET.14.02 BET.14.03 BET.14.04 BET.15.01 BET.15.02
BET.14.01 0 0 0 0 0 0
BET.14.02 0 0 0 0 0 0
BET.14.03 0 0 0 0 0 0
BET.14.04 0 0 0 0 0 0
BET.15.01 0 0 0 0 0 0
BET.15.02 0 0 0 0 0 0
Metadata
The public health officials also have information, or metadata, about each citizen (node). This includes their sex, institution, and what their superpower is.
<- read.csv("nodeinfo_all.csv",header = T)
nodeinfo str(nodeinfo)
'data.frame': 79 obs. of 4 variables:
$ CitizenID : chr "BET.14.01" "BET.14.02" "BET.14.03" "BET.14.04" ...
$ Institution: chr "student" "student" "student" "student" ...
$ Sex : chr "F" "M" "F" "F" ...
$ superpower : chr "Mogul" "Mogul" "Mogul" "Wizard" ...
head(nodeinfo)
CitizenID Institution Sex superpower
1 BET.14.01 student F Mogul
2 BET.14.02 student M Mogul
3 BET.14.03 student F Mogul
4 BET.14.04 student F Wizard
5 BET.15.01 student F Mogul
6 BET.15.02 student M Mogul
Visualizing the network
If we want to visualize the network as series of nodes and edges, we need to turn it into a graph
object using the igraph
package. It can then be plotted automatically via the plot
function
<- graph.adjacency(network.mat, mode = "undirected", diag = FALSE, weighted = TRUE)
net plot(net)
We can use the metadata about the nodes to make this graph more informative by assigning it to the nodes/vertices defined as V(net)
:
#first check that the order of nodes in the nodeinfo matches that in the network
$CitizenID == V(net)$name #all should be TRUE nodeinfo
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[46] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[61] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[76] TRUE TRUE TRUE TRUE
#if it is not true, then you need to reorder the nodeinfo based on order of names in network
<- nodeinfo[match(V(net)$name,nodeinfo$CitizenID),]
nodeinfo
# shape is dependent on sex
V(net)$shape <- ifelse(nodeinfo$Sex == "F", "circle", "square")
#color is dependent on institution
#create a named color palette
<- c("instructor" = "blue", "mentor" = "yellow", "student" = "red")
color.pal
#assign color by matching institution to this palette
V(net)$color = color.pal[match(nodeinfo$Institution, names(color.pal))]
#plot with legend
<- layout_nicely(net, dim=2) #uses a specific algorithm to draw the network
fr plot(net,
vertex.label=NA,
vertex.size=4,
layout=fr,
edge.lty=1,edge.width=1,
margin=c(-0.1,-0.1,0,0))
legend(x=0, y=-0.9, c("Male","Female"),
pch=c(0,1), pt.cex=1.5, cex=.8, bty="n", ncol=1,y.intersp = 1.2)
legend(x=-0.9, y=-0.9, c("student", "mentor", "instructor"),
pch=16, col=c("red","yellow","blue"), bg=c("red","yellow","blue"), pt.cex=1.5, cex=.8, bty="n", ncol=1, y.intersp= 1.2)
It is also possible to plot social networks using ggplot
via the ggraphs
package. For more information, refer to the ggplot2 book.
Calculating network characteristics
There are many existing functions in the igraph
package for calculating characteristics of individual nodes and and at the level of the network
Node characteristics
First, we will calculate two characteristics for each node : degree and betweenness. We can add these columns to our dataframe of node metadata and also visualize these characteristics on the network
$deg <- degree(net)
nodeinfo$bet <- betweenness(net, directed=F)
nodeinfohead(nodeinfo)
CitizenID Institution Sex superpower deg bet
1 BET.14.01 student F Mogul 3 29.123
2 BET.14.02 student M Mogul 2 1.727
3 BET.14.03 student F Mogul 1 0.000
4 BET.14.04 student F Wizard 6 122.116
5 BET.15.01 student F Mogul 3 10.862
6 BET.15.02 student M Mogul 4 18.817
# add the degree as a characteristic of the node to the network
V(net)$size <- nodeinfo$deg/2
plot(net,
vertex.label.cex=0.5,
vertex.label.dist=0,
layout=fr,
edge.lty=1, edge.width=1,
margin=c(-0.1,-0.1,0,0))
legend(x=0, y=-0.9, c("Male","Female"),
pch=c(0,1), pt.cex=1.5, cex=.8, bty="n", ncol=1,y.intersp = 1.2)
legend(x=-0.9, y=-0.9, c("student", "mentor", "instructor"),
pch=16, col=c("red","yellow","blue"), bg=c("red","yellow","blue"), pt.cex=1.5, cex=.8, bty="n", ncol=1, y.intersp= 1.2)
Network-level characteristics
Mean Distance: What is the average number of steps between any two pairs of individuals?
mean_distance(net, directed=FALSE)
[1] 2.823
Edge Density: What is the proportion of realized links compared to all possible links? (How dense is the network?)
edge_density(net)
[1] 0.0727
Transitivity: What is the probability that the adjacent vertices of a focal vertex are also connected? (I.e. are my friends also friends with each other?)
transitivity(net)
[1] 0.001388
Multiple Regression Quadratic Assignment Procedures (MRQAP)
The public health officials had the following question:
Are individuals of the same institution or sex more likely to interact with each other than predicted by chance?
We can answer this question via MRQAP from the asnipe
package. This tests whether y (interaction between two nodes) is related to x1 (sex similarity) while controlling for x2 (group belonging) and vice versa.
Create matrices of dependent variables
First, we need to create matrices that describe an individual’s membership of our groups (sex and institution):
<- select(nodeinfo, ID = CitizenID, Group = Sex)
sex.info <- get_group_by_individual(sex.info, data_format = "individuals")
gbs <- get_network(gbs, data_format = "GBI") net.sx
Let’s compare how the information is reported in each of our datasets to better understand what the above code does:
1:6,] sex.info[
ID Group
1 BET.14.01 F
2 BET.14.02 M
3 BET.14.03 F
4 BET.14.04 F
5 BET.15.01 F
6 BET.15.02 M
1:6,1:6] net.sx[
BET.14.01 BET.14.02 BET.14.03 BET.14.04 BET.15.01 BET.15.02
BET.14.01 0 0 1 1 1 0
BET.14.02 0 0 0 0 0 1
BET.14.03 1 0 0 1 1 0
BET.14.04 1 0 1 0 1 0
BET.15.01 1 0 1 1 0 0
BET.15.02 0 1 0 0 0 0
It has created a new matrix of connections between individuals in row i
and column j
, where a value of 1
signifies that individuals i
and j
are in the same group.
<- select(nodeinfo, ID = CitizenID, Group = Institution)
inst.info <- get_group_by_individual(inst.info, data_format = "individuals")
gbs <- get_network(gbs, data_format = "GBI") net.inst
Statistical Model
We can then use these matrices as dependent variables in our statistical model to explain the similarity in our contact matrix as a function of shared group membership. We have set the number of randomisations low (99) to speed up the model, but if you were running this analysis for real you would want to use 999.
<- mrqap.dsp(network.mat ~ net.sx + net.inst,
net.model intercept = TRUE,
directed = "undirected",
diagonal = F, test.statistic = "beta",
randomisations = 99)
You can call the model object to look at its summary
net.model
MRQAP with Double-Semi-Partialing (DSP)
Formula: network.mat ~ net.sx + net.inst
Coefficients:
Estimate P(β>=r) P(β<=r) P(|β|<=|r|)
intercept 0.021732 1.0000 0.0000 0.0000
net.sx -0.001244 0.3131 0.6869 0.6768
net.inst -0.021127 0.0000 1.0000 0.0000
Residual standard error: 0.1099 on 3078 degrees of freedom
F-statistic: 13.85 on 2 and 3078 degrees of freedom, p-value: 1.028e-06
Multiple R-squared: 0.008919 Adjusted R-squared: 0.008275
AIC: -839.2
It shows the overall estimate of the coefficent (Estimate
) as well as a comparison of that estimate to the probability that a similar pattern could be found randomly.
The estimate for the coefficient of the effect of shared sex is very small and doesn’t have a high probability of being less than or greater than random.
The estimate for the coefficient of the effect of a shared institute has a high probability of being less than random, suggesting that being of the same institute means individuals are less likely to interact. However, note that this coefficient is small, and the overall R-squared of the model very low, so it doesn’t seem to have a strong effect.
As a sanity check, we can make a new network where all of the mentors are connected and see how this changes our statistical output.
<- nodeinfo$Institution=="mentor"
ind.mentor <- network.mat
net2 <- 1
net2[ind.mentor, ind.mentor]
<- mrqap.dsp(net2 ~ net.sx + net.inst,
net.model2 intercept = TRUE,
directed = "undirected",
diagonal = F, test.statistic = "beta",
randomisations = 99)
net.model2
MRQAP with Double-Semi-Partialing (DSP)
Formula: net2 ~ net.sx + net.inst
Coefficients:
Estimate P(β>=r) P(β<=r) P(|β|<=|r|)
intercept 0.0213387 0.7778 0.2222 0.2222
net.sx -0.0004561 0.4545 0.5455 0.9192
net.inst 0.4706864 1.0000 0.0000 0.0000
Residual standard error: 0.3407 on 3078 degrees of freedom
F-statistic: 714.1 on 2 and 3078 degrees of freedom, p-value: 0
Multiple R-squared: 0.3169 Adjusted R-squared: 0.3165
AIC: -839.2
In this new dataset, we can see that being in the same institution greatly increases a dyad’s probality of interacting, given by the much higher coefficient and R-squared.
Simulations on networks
The igraph
package comes with a function that runs simulations of an SIR model on a network graph. You can change the values of beta and gamma to see how that changes the epidemic dynamics on the network.
Calling plot
will automatically plot all of the simulations, plus a median line and confidence intervals.
<- sir(net, beta = 0.2, gamma = 0.2 , no.sim = 1000)
net.sims
#number of susceptible
plot(net.sims, comp = "NS")
#number of infected
plot(net.sims, comp = "NI")
#number of recovered
plot(net.sims, comp = "NR")