In this tutorial, I’m going to introduce you to two of my favorite packages for working with and visualizing networks: tidygraph and ggraph, both developed by Thomas Lin Pedersen.

These packages take igraph networks and use tools from the tidyverse to make them easier to manipulate and visualize. An igraph network is a complicated object. tidygraph extends the tidy paradigm to networks by representing each network as two tables: a table of nodes and node attributes, and a table of edges and edge attributes.

Loading packages

We’ll load all the packages we need:

library(igraph)
## 
## Attaching package: 'igraph'
## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum
## The following object is masked from 'package:base':
## 
##     union
library(tidygraph)
## 
## Attaching package: 'tidygraph'
## The following object is masked from 'package:igraph':
## 
##     groups
## The following object is masked from 'package:stats':
## 
##     filter
library(ggraph)
## Loading required package: ggplot2
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ lubridate 1.9.3     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     ✔ tidyr     1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ lubridate::%--%()      masks igraph::%--%()
## ✖ dplyr::as_data_frame() masks tibble::as_data_frame(), igraph::as_data_frame()
## ✖ purrr::compose()       masks igraph::compose()
## ✖ tidyr::crossing()      masks igraph::crossing()
## ✖ dplyr::filter()        masks tidygraph::filter(), stats::filter()
## ✖ dplyr::lag()           masks stats::lag()
## ✖ purrr::simplify()      masks igraph::simplify()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
set_graph_style() # This sets the default style to the graph style

Getting to the data

Creating a tidygraph network

This tutorial assumes that you know how to create an igraph network. Once you’ve got an igraph network object, convert it to a tidygraph network with as_tbl_graph(), like so:

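G <- sample_gnp(50, .4) # Random graph: 50 nodes, each pair linked with probability .4 (replaces the deprecated erdos.renyi.game())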
G <- as_tbl_graph(G)

We can then look at the tidygraph object and see the two data frames: the node table (empty for now, because these nodes have no attributes) and the edge table.

G
## # A tbl_graph: 50 nodes and 508 edges
## #
## # An undirected simple graph with 1 component
## #
## # Node Data: 50 × 0 (active)
## #
## # Edge Data: 508 × 2
##    from    to
##   <int> <int>
## 1     1     3
## 2     1     4
## 3     1     7
## # ℹ 505 more rows
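
To make the two-table idea concrete, here is a minimal sketch that builds a network directly from a node table and an edge table with tbl_graph(), then pulls each table back out as a tibble. The people and ties data are made up for illustration; from and to refer to row numbers in the node table.

```r
library(tidygraph)

# Hypothetical toy data: three people and two directed ties
people <- data.frame(name = c('Ana', 'Ben', 'Cy'), age = c(12, 13, 12))
ties <- data.frame(from = c(1, 2), to = c(2, 3), type = 'friendship')

# Build the network from the two tables
toy <- tbl_graph(nodes = people, edges = ties, directed = TRUE)

# Pull each table back out as an ordinary tibble
node_tbl <- toy |> activate(nodes) |> as_tibble()
edge_tbl <- toy |> activate(edges) |> as_tibble()

node_tbl
edge_tbl
```

Going through igraph first, as above, works just as well; this route is only a shortcut when the data already lives in two data frames.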

Mutating a table

Because a network is really composed of two tibbles, we can perform many tidyverse/dplyr operations on them. To tell tidygraph which table an operation should apply to, we first call activate(nodes) or activate(edges).

For example, the code below activates the nodes table and then uses mutate to create a variable called degree.

(Note that the code throughout this tutorial uses “pipes”. Pipes (|>) let you express a sequence of operations, by taking the output of the previous operation and using it as the input of the next operation.)

create_notable('zachary') |>
  activate(nodes) |>
  mutate(degree = centrality_degree())
## # A tbl_graph: 34 nodes and 78 edges
## #
## # An undirected simple graph with 1 component
## #
## # Node Data: 34 × 1 (active)
##    degree
##     <dbl>
##  1     16
##  2      9
##  3     10
##  4      6
##  5      3
##  6      4
##  7      4
##  8      4
##  9      5
## 10      2
## # ℹ 24 more rows
## #
## # Edge Data: 78 × 2
##    from    to
##   <int> <int>
## 1     1     2
## 2     1     3
## 3     1     4
## # ℹ 75 more rows
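
As a rough illustration of what the pipe is doing, the sketch below writes the same degree calculation twice, once piped and once as nested function calls, and then applies two more dplyr verbs to the active node table. The object names (karate, piped, nested, hubs) and the cutoff of 10 are arbitrary choices for this example.

```r
library(tidygraph)
library(dplyr)

karate <- create_notable('zachary')

# Piped form, as used throughout this tutorial
piped <- karate |>
  activate(nodes) |>
  mutate(degree = centrality_degree())

# Equivalent nested form: each call wraps the previous one
nested <- mutate(activate(karate, nodes), degree = centrality_degree())

# Compare the degree columns produced by the two forms
all.equal(as_tibble(piped)$degree, as_tibble(nested)$degree)

# Other dplyr verbs also work on the active table
hubs <- piped |>
  filter(degree >= 10) |>   # keep only well-connected nodes
  arrange(desc(degree))     # highest degree first
hubs
```

Note that filtering nodes also drops any edges attached to the removed nodes, so the result is still a valid network.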

Because the networks are stored as data frames, we can export either table as a tibble and then do things like use ggplot to graph attributes of a network. The code below creates an edge attribute called bw, which is a measure of edge betweenness, and then makes a histogram of the distribution of bw.

create_notable('zachary') |>
  activate(edges) |>
  mutate(bw = centrality_edge_betweenness()) |>
  as_tibble() |>
  ggplot() +
  geom_histogram(aes(x=bw)) +
  theme_minimal()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
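
The message above is ggplot2 pointing out that it fell back to its default of 30 bins. A possible variation, sketched here, sets the number of bins explicitly and labels the axes; the value of 20 and the axis labels are arbitrary choices, not recommendations from the original analysis.

```r
library(tidygraph)
library(dplyr)
library(ggplot2)

# Export the edge table, with edge betweenness, as a plain tibble
edge_tbl <- create_notable('zachary') |>
  activate(edges) |>
  mutate(bw = centrality_edge_betweenness()) |>
  as_tibble()

# Setting `bins` explicitly avoids the stat_bin() message
p <- ggplot(edge_tbl) +
  geom_histogram(aes(x = bw), bins = 20) +  # 20 is an arbitrary choice
  labs(x = 'Edge betweenness', y = 'Number of edges') +
  theme_minimal()
p
```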

Plots

The companion package to tidygraph is ggraph. ggraph is a set of tools based on ggplot2. The key idea behind both ggraph and ggplot2 is that you build a plot by adding layers according to a “grammar of graphics” that lets you add to and change things about the plot.

ggraph includes tons of really cool types of plots but for this tutorial I am going to focus on standard plots that show nodes as circles and edges as lines. There are three key components that should be part of any of these plots:

Node and Edge Aesthetics

There are lots of different “geoms” for displaying nodes and edges (the ggraph documentation has the full list). We are going to focus on two of the simplest: geom_node_point() and geom_edge_fan().

The primary way to gain understanding or make an argument through network plots is through changing the color, size, etc. of nodes and edges.

If you want an aesthetic to vary with a variable in the node or edge table, then you need to put it in a “mapping”. This is the first argument to the node or edge geom, and appears within aes(). Aesthetics that should be the same for all of the nodes or edges appear outside of aes().

For example, in this graph the geom_edge_fan has width and color set to .2 and 'lightblue', respectively. These apply to all of the edges.

On the other hand, the geom_node_point has color set to group. This means that the color should vary based on what the group variable is set to for each node.

create_notable('zachary') |>
  activate(nodes) |> 
  mutate(group = as.factor(group_infomap())) |> # Creates a `group` variable based on the infomap algorithm
  ggraph(layout = 'stress') +
  geom_edge_fan(width = .2, color = 'lightblue') + 
  geom_node_point(aes(color = group)) + 
  coord_fixed() + 
  theme_graph()
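
The same distinction applies to edges and to other aesthetics. As a sketch, the variation below maps edge width to edge betweenness and node size to degree inside aes(), while the two colors stay constant outside it. The column names degree and bw, the color 'steelblue', and the width range are choices made for this example.

```r
library(tidygraph)
library(ggraph)
library(dplyr)

p <- create_notable('zachary') |>
  activate(nodes) |>
  mutate(degree = centrality_degree()) |>           # node attribute
  activate(edges) |>
  mutate(bw = centrality_edge_betweenness()) |>     # edge attribute
  ggraph(layout = 'stress') +
  geom_edge_fan(aes(width = bw), color = 'lightblue') +       # width varies, color is constant
  geom_node_point(aes(size = degree), color = 'steelblue') +  # size varies, color is constant
  scale_edge_width(range = c(.2, 2)) +              # keep edges from getting too thick
  coord_fixed() +
  theme_graph()
p
```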

Colors

Often, we want to color things based on variables that already exist in our data. For these examples, let’s move to a new dataset. The following code loads in data from a Dutch school collected by Andrea Knecht and described here. I have cleaned it up a bit, keeping just Wave 2 of the data and converting it to two CSV files: one for the nodes and one for the edges.

This code downloads these CSV files and creates a network from them called G. If we look at the node data, we can see that there are a lot of attributes about each student that we might want to visualize in a plot.

nodes <- read_csv('https://raw.githubusercontent.com/jdfoote/Communication-and-Social-Networks/spring-2021/resources/school_graph_nodes.csv')
edges <- read_csv('https://raw.githubusercontent.com/jdfoote/Communication-and-Social-Networks/spring-2021/resources/school_graph_edges.csv')

# Build an igraph network from the two tables, then convert it to a tidygraph network
G <- graph_from_data_frame(d = edges, vertices = nodes) |> as_tbl_graph()

G
## # A tbl_graph: 26 nodes and 203 edges
## #
## # A directed multigraph with 1 component
## #
## # Node Data: 26 × 7 (active)
##    name  delinquency alcohol_use sex     age ethnicity religion
##    <chr>       <dbl>       <dbl> <chr> <dbl>     <dbl>    <dbl>
##  1 1               2           4 F        12         1        2
##  2 2              NA           2 F        12         1        2
##  3 3               2           1 F        12         2        3
##  4 4               2           1 M        12         1        2
##  5 5               1           1 M        12         1        2
##  6 6               1           1 F        12         1       NA
##  7 7               2           3 F        12         1        2
##  8 8               1           1 F        13         1        2
##  9 9               2           3 F        12         1        2
## 10 10              2           2 F        12         1        1
## # ℹ 16 more rows
## #
## # Edge Data: 203 × 3
##    from    to type      
##   <int> <int> <chr>     
## 1     1     3 friendship
## 2     1    12 friendship
## 3     3     1 friendship
## # ℹ 200 more rows

For example, we may want to visualize alcohol use. This is how you would change the color of nodes based on alcohol use. The scale_color_viridis() at the bottom switches from the default color scale to the viridis palette, which is prettier and easier to read.

G |>
  ggraph(layout = 'stress') +
  geom_edge_fan(width = .5, color = 'gray') +
  geom_node_point(aes(color=alcohol_use), size = 3) +
  scale_color_viridis()
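
Because alcohol_use is stored as a number, the plot above uses a continuous color gradient. If you would rather treat a numerically coded variable as categories, one option is to wrap it in as.factor() inside aes(). The sketch below shows the idea on a stand-in network so that it runs without downloading anything: the alcohol_use and sex values here are randomly generated, not the real school data, and scale_color_viridis_d() is ggplot2's discrete viridis scale.

```r
library(tidygraph)
library(ggraph)
library(dplyr)

set.seed(42)  # make the made-up attributes reproducible

# Stand-in network with made-up attributes (NOT the school data)
toy <- create_notable('zachary') |>
  activate(nodes) |>
  mutate(
    alcohol_use = sample(1:5, 34, replace = TRUE),  # made-up values
    sex = sample(c('F', 'M'), 34, replace = TRUE)   # made-up values
  )

p <- toy |>
  ggraph(layout = 'stress') +
  geom_edge_fan(width = .5, color = 'gray') +
  # as.factor() gives one distinct color per level; shape adds a second variable
  geom_node_point(aes(color = as.factor(alcohol_use), shape = sex), size = 3) +
  scale_color_viridis_d(name = 'alcohol_use') +
  theme_graph()
p
```

With the real data, the same geom_node_point() line should work on G unchanged, since it has columns with these names.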