Visualising network communities in R

One of the most natural ways to visualise a social network is a network diagram which consists of a series of dots representing the entity in the network and lines representing the relationships between the entities. This type of visualisation looks as follows:

Drawing

We can encode community information to this type of plot by colouring the nodes according to the communitiy memberships.

As Mike Bostock showed, another interesting way to visualise a network is with the adjacency matrix. As a reminder the Adjacency matrix is defined as follows:

\begin{equation*} A_{ij} = \left\{ \begin{array}{lr} 1 & : \text{if }i\text{ connected to }j\\ 0 & : \text{otherwise} \end{array} \right. \end{equation*}

The usefulness of the adjacency matrix visualisation depends largely on the order of the rows and columns in the plot. By finding communities in the network, we can reorder the rows and columns of the adjacency matrix accordingly to produce a plot that looks as follows:

Drawing

In this post I show how to create the above plot using atomic plot functions in R. The coloured blocks down the diagonal of the plot represent the different communities in the network.

To get started, the first thing we need is some data in a matrix object. We load the same Les Misérables data as used for the visualisation here:

library(jsonlite)
library(igraph)

json_data <- fromJSON("https://bost.ocks.org/mike/miserables/miserables.json")

The fromJSON function reads the les miserables data into a list containing information about the nodes and edges in the graph. We extract two fields from the list, a vector of character names and the edge data

characters <- json_data[[1]]$name
edges <- json_data[[2]]

We can view these two objects:

> head(characters)
[1] "Myriel"          "Napoleon"        "Mlle.Baptistine" "Mme.Magloire"
[5] "CountessdeLo"    "Geborand"

> head(edges)
  source target value
1      1      0     1
2      2      0     8
3      3      0    10
4      3      2     6
5      4      0     1
6      5      0     1

The edges object contains information about which node (source column) is connected to which (target column) and the co-occurence number (value column). This a succint way to represent a matrix as we only have rows for the non-zero matrix elements. Representing a matrix in this way is known as Triplet representation.

Next, we convert each of the numeric columns of edges into character names by iterating through each row of edges and replacing the source and target numbers with the name of the character at that index of the characters array. We add one to the index to account for the fact that R indexes from one not zero.

edges_w_names <- t(apply(edges, MARGIN=1, function(x) {
  src_idx <- as.numeric(x["source"]+1)
  dest_idx <- as.numeric(x["target"]+1)
  c(characters[src_idx], characters[dest_idx], x["value"])
}))
edges_w_names <- as.data.frame(edges_w_names, stringsAsFactors=F)

colnames(edges_w_names) <- c("source", "target", "value")
edges_w_names$source <- as.character(edges_w_names$source)
edges_w_names$target <- as.character(edges_w_names$target)
edges_w_names$value <- as.numeric(edges_w_names$value)

edges_w_names is a three column data.frame that looks as follows:

> head(edges_w_names)
           source          target value
1        Napoleon          Myriel     1
2 Mlle.Baptistine          Myriel     8
3    Mme.Magloire          Myriel    10
4    Mme.Magloire Mlle.Baptistine     6
5    CountessdeLo          Myriel     1
6        Geborand          Myriel     1

Now we convert from the character name matrix in Triplet representation into a fully dense matrix object. In order to do this we make use of the graph.data.frame object from the igraph package. The get.adjacency function is builds a dense matrix from the igraph object. Because the edges in our network are un-directed the adjacency matrix will be a symmetric matrix

gdf <- graph.data.frame(edges_w_names, directed=F)
adj_mat <- get.adjacency(gdf, sparse=F, attr="value")

Now we have the igraph gdf object, we can use it to detect communities in our network. There are several clustering routines built into igraph. Here we use the cluster_infomap function which returns amongst other things an object containing membership attributes for each node in the graph

# cluster graph
members <- igraph::cluster_infomap(gdf)$membership
# how many communities are there
n_cluster <- length(unique(members))

# reorder accordign to cluster memberships
adj_mat <- adj_mat[order(members), order(members)]

Now we're ready to create the actual plot. I've added comments into the code below to try and better explain what's happening

# open a png handle
png("lesmis.png", width = 7000, height = 7000)

# set up plot parameters
par(bg = 'black',
    mar = c(50, 50, 50, 50),
    family = 'Roboto Light',
    col = 'white',
    col.main = 'white',
    ps = 300)

# create an empty plot (use axis=F so we can customise axis later)
plot(axes = F,
     0,
     type = 'n',
     xlim = c(1, ncol(adj_mat)),
     ylim = c(1, ncol(adj_mat)),
     main = "Les Misérables Co-occurance Matrix")

# define community colours
cluster_cols <- categorical_pal(n_cluster)

# iterate over each row and column of adjacency matrix and draw rectangles onto the plot
lapply(1:nrow(adj_mat), function(i) {
  lapply(1:ncol(adj_mat), function(j) {
    # each entry in matrix is an edge so if communities differ pick the smallest
    cluster_no <- min(sort(members)[j], sort(members)[i])

    # draw squares on the plot
    rect(
      xleft = i - .5,
      xright = i + .5,
      ytop = j + .5,
      ybottom = j - .5,
      pch = '+',
      cex = .5,
      col=ifelse(adj_mat[i,j]==0, 'black', cluster_cols[cluster_no])
    )})
})

# draw the x and y axis with the character names
axis(tick=T, side = 1, labels = colnames(adj_mat), at = 1:nrow(adj_mat), col = "white", col.ticks = "white", col.axis="white", las=2, cex.axis=0.2)
axis(tick=T, side = 2, labels = rownames(adj_mat), at = 1:ncol(adj_mat), col = "white", col.ticks = "white", col.axis="white", las=1, cex.axis=0.2)

# add a grid to the plot
abline(h=1:ncol(adj_mat), v=1:ncol(adj_mat), lty=3)

# close the plot device (i.e. write the png to disk)
dev.off()

We have shown how using the atomic plot functions in R we can create some nice looking visualisations of network data. The full code is available at this github repo