R Data Visualisation Example using ggplot2

Load the libraries

library(ggplot2)
library(ggridges)

Generate sample data for plotting

# Sample data: first 100 rows of ggplot2's diamonds dataset, keeping color and depth
df <- diamonds[1:100, c("color", "depth")]

Ridgeline plots (joy plots) with geom_density_ridges

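The geom_density_ridges function from the ggridges package creates a ridgeline plot. Given a numerical variable (depth) and a categorical variable (color), a density estimate is calculated and displayed for each group. The package also provides geom_density_ridges2, which works the same way but draws each density as a closed polygon instead of a ridgeline, so the outline also runs along the baseline. Both variants are shown throughout.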

ggplot(df, aes(x = depth, y = color)) +
  geom_density_ridges() # ridgelines: outline along the top of each density
## Picking joint bandwidth of 0.678

ggplot(df, aes(x = depth, y = color)) +
  geom_density_ridges2() # closed polygons: outline also along the baseline
## Picking joint bandwidth of 0.678
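
The "Picking joint bandwidth" message appears because ggridges chooses a single bandwidth shared by all groups. As a minimal sketch, assuming you prefer to fix the amount of smoothing yourself, the bandwidth argument can be set explicitly. The value 0.678 simply reuses the one reported in the message above, and the name p_bw is only illustrative.

```r
library(ggplot2)
library(ggridges)

df <- diamonds[1:100, c("color", "depth")]

# Setting the bandwidth explicitly makes the smoothing reproducible,
# so ggridges no longer has to pick a joint bandwidth on its own.
p_bw <- ggplot(df, aes(x = depth, y = color)) +
  geom_density_ridges(bandwidth = 0.678)
p_bw
```

Larger bandwidths give smoother ridgelines and smaller ones follow the data more closely.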

Cut the trailing tails

The rel_min_height argument cuts the trailing tails: parts of a ridgeline lower than this fraction of the highest point among all ridgelines are removed. The right value depends on your data, so expect to fine-tune it.

ggplot(df, aes(x = depth, y = color)) +
  geom_density_ridges(rel_min_height = 0.005) 
## Picking joint bandwidth of 0.678

ggplot(df, aes(x = depth, y = color)) +
  geom_density_ridges2(rel_min_height = 0.005) 
## Picking joint bandwidth of 0.678
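
To get a feel for what a relative cutoff of 0.005 means, the idea can be sketched with base R. This is only an illustration under simplifying assumptions: it uses one pooled density of depth computed with density(), whereas ggridges applies the cutoff to the ridgelines it estimates itself. The names dens, cutoff and keep are illustrative.

```r
library(ggplot2)  # only needed here for the diamonds dataset

df <- diamonds[1:100, c("color", "depth")]

# Kernel density estimate of depth for the whole sample (base R)
dens <- density(df$depth)

# Relative cutoff: 0.5% of the highest point of the curve
rel_min_height <- 0.005
cutoff <- rel_min_height * max(dens$y)

# Points of the curve that are high enough to survive the trimming
keep <- dens$y >= cutoff

# Range of x values left once the low tails are cut
range(dens$x[keep])
```

Raising rel_min_height trims more of each tail, and a value of 0 keeps the full curve.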

Scale

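The scale argument controls the height of the ridgelines relative to the spacing between them. With scale = 1 the highest ridgeline just touches the baseline above it; larger values make the ridgelines overlap.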

ggplot(df, aes(x = depth, y = color)) +
  geom_density_ridges(scale = 3) 
## Picking joint bandwidth of 0.678

ggplot(df, aes(x = depth, y = color)) +
  geom_density_ridges2(scale = 3) 
## Picking joint bandwidth of 0.678
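
For comparison with scale = 3, here is a short sketch of the non-overlapping case. The values 1 and 0.9 and the object names are illustrative choices, not recommendations.

```r
library(ggplot2)
library(ggridges)

df <- diamonds[1:100, c("color", "depth")]

base_plot <- ggplot(df, aes(x = depth, y = color))

# scale = 1: the tallest ridgeline just reaches the baseline above it
p_touch <- base_plot + geom_density_ridges(scale = 1)

# scale < 1: a small gap is left between neighbouring ridgelines
p_apart <- base_plot + geom_density_ridges(scale = 0.9)

p_touch
p_apart
```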

Alternative stats

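The stat argument selects the statistical transformation applied to the data. For example, stat = "binline" draws histograms instead of density estimates; bins sets the number of bins and draw_baseline = FALSE removes the line drawn along bins with zero counts.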

ggplot(df, aes(x = depth, y = color)) +
  geom_density_ridges(stat = "binline", bins = 20, draw_baseline = FALSE) 

ggplot(df, aes(x = depth, y = color)) +
  geom_density_ridges2(stat = "binline", bins = 20, draw_baseline = FALSE) 
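
Another option is ggplot2's own density stat. The sketch below assumes a ggplot2 version that provides after_stat(); because stat = "density" does not compute a height on its own, the computed density has to be mapped to the height aesthetic explicitly.

```r
library(ggplot2)
library(ggridges)

df <- diamonds[1:100, c("color", "depth")]

# stat = "density" uses ggplot2's density estimate for each group,
# so the computed density must be mapped to the ridgeline height.
p_density <- ggplot(df, aes(x = depth, y = color,
                            height = after_stat(density))) +
  geom_density_ridges(stat = "density")
p_density
```

With this stat the bandwidth is chosen separately for each group, so no joint bandwidth is reported.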

Color customization

Fill color and transparency

The default gray fill of the ridgelines can be changed with the fill argument of geom_density_ridges, and alpha sets the level of transparency.

ggplot(df, aes(x = depth, y = color)) +
  geom_density_ridges(fill = "lightblue", alpha = 0.5) 
## Picking joint bandwidth of 0.678

ggplot(df, aes(x = depth, y = color)) +
  geom_density_ridges2(fill = "lightblue", alpha = 0.5) 
## Picking joint bandwidth of 0.678

Border color

The color argument controls the color of the border lines. As in other ggplot2 layers, you can also change the line type (linetype) and the line width (lwd).

ggplot(df, aes(x = depth, y = color)) +
  geom_density_ridges2(fill = "white",
                      color = 4,
                      linetype = 1,
                      lwd = 0.5) 
## Picking joint bandwidth of 0.678

Color based on group

You can also fill the densities based on the categorical variable, passing it to the fill argument of aes. The color palette can be changed with scale_fill_manual, for instance.

ggplot(df, aes(x = depth, y = color, fill = color)) +
  geom_density_ridges() 
## Picking joint bandwidth of 0.678
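
As a sketch of the scale_fill_manual option mentioned above, the following assigns one fill color to each level of color. The Viridis palette from hcl.colors() (base R 3.6 or later) and the names fill_values and p_manual are arbitrary choices for illustration.

```r
library(ggplot2)
library(ggridges)

df <- diamonds[1:100, c("color", "depth")]

# One fill color per factor level, named so that the mapping does not
# depend on the order in which the values are supplied
color_levels <- levels(df$color)
fill_values <- setNames(
  hcl.colors(length(color_levels), palette = "Viridis"),
  color_levels
)

p_manual <- ggplot(df, aes(x = depth, y = color, fill = color)) +
  geom_density_ridges(alpha = 0.7) +
  scale_fill_manual(values = fill_values, name = "Color")
p_manual
```

Naming the vector by factor level keeps each group tied to its color even if some levels are missing from the data.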

Cyclical color scales

The scale_fill_cyclical and scale_color_cyclical functions can be used to add cyclical fill and border colors to the density estimations.

ggplot(df, aes(x = depth, y = color, fill = color, color = color)) +
  geom_density_ridges() +
  scale_fill_cyclical(name = "Cycle", guide = "legend",
                      values = c("#99E6FF", "#4CA6FF")) +
  scale_color_cyclical(name = "Cycle", guide = "legend",
                       values = c(1, 4)) 
## Picking joint bandwidth of 0.678

Mapping the tail probabilities onto color

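Similarly, mapping an expression of after_stat(ecdf) to fill adds a gradient that displays the tail probabilities. Set calc_ecdf = TRUE so that the stat computes the ecdf variable, and use the density_ridges_gradient geom, which supports fills that vary along the x axis.

ggplot(df, aes(x = depth, y = color,
               fill = 0.5 - abs(0.5 - after_stat(ecdf)))) +
  stat_density_ridges(geom = "density_ridges_gradient", calc_ecdf = TRUE) +
  scale_fill_gradient(low = "white", high = "#87CEFF",
                      name = "Tail prob.")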
## Picking joint bandwidth of 0.678
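
The fill expression 0.5 - abs(0.5 - ecdf) folds the cumulative probability around the median, so that both tails receive small values and the center receives the largest. A small worked example with made-up cumulative probabilities (the vector p is purely illustrative):

```r
# Hypothetical cumulative probabilities, from the far left to the far right
p <- c(0.01, 0.25, 0.50, 0.75, 0.99)

# Fold around 0.5: the result is the distance to the nearer tail
tail_prob <- 0.5 - abs(0.5 - p)

data.frame(ecdf = p, tail_prob = tail_prob)
```

Here 0.01 and 0.99 both map to 0.01, while the median (0.50) maps to the maximum of 0.5, which is why the gradient is darkest in the middle of each density.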

Highlight the tails of the distribution

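The same approach can be used to highlight the tails of the distributions: map after_stat(quantile) to fill and pass the cut points through the quantiles argument. With quantiles = c(0.05, 0.95) each density is split into three regions: the lower 5%, the central 90% and the upper 5%.

ggplot(df, aes(x = depth, y = color, fill = after_stat(quantile))) +
  stat_density_ridges(quantile_lines = TRUE,
                      calc_ecdf = TRUE,
                      geom = "density_ridges_gradient",
                      quantiles = c(0.05, 0.95)) +
  scale_fill_manual(name = "Prob.", values = c("#E2FFF2", "white", "#B0E0E6"),
                    labels = c("(0%, 5%]", "(5%, 95%]", "(95%, 100%]"))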
## Picking joint bandwidth of 0.678
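
As a rough cross-check of where the highlighted regions begin, the 5% and 95% sample quantiles of depth can be computed per group with base R. This is a sketch only: these are quantiles of the raw data, whereas the shaded regions in the plot are derived from the estimated densities, so the boundaries need not coincide exactly. The name cut_points is illustrative.

```r
library(ggplot2)  # only needed here for the diamonds dataset

df <- diamonds[1:100, c("color", "depth")]

# 5% and 95% sample quantiles of depth within each color group;
# the result has one row per group and a two-column matrix "depth"
cut_points <- aggregate(depth ~ color, data = df,
                        FUN = quantile, probs = c(0.05, 0.95))
cut_points
```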