R Data Visualisation Example: Violin plot by group using ggplot2

Import library

library(ggplot2)

Sample data for plotting purposes

# Sample data set (warpbreaks ships with R's datasets package)
warpbreaks
##    breaks wool tension
## 1      26    A       L
## 2      30    A       L
## 3      54    A       L
## 4      25    A       L
## 5      70    A       L
## 6      52    A       L
## 7      51    A       L
## 8      26    A       L
## 9      67    A       L
## 10     18    A       M
## 11     21    A       M
## 12     29    A       M
## 13     17    A       M
## 14     12    A       M
## 15     18    A       M
## 16     35    A       M
## 17     30    A       M
## 18     36    A       M
## 19     36    A       H
## 20     21    A       H
## 21     24    A       H
## 22     18    A       H
## 23     10    A       H
## 24     43    A       H
## 25     28    A       H
## 26     15    A       H
## 27     26    A       H
## 28     27    B       L
## 29     14    B       L
## 30     29    B       L
## 31     19    B       L
## 32     29    B       L
## 33     31    B       L
## 34     41    B       L
## 35     20    B       L
## 36     44    B       L
## 37     42    B       M
## 38     26    B       M
## 39     19    B       M
## 40     16    B       M
## 41     39    B       M
## 42     28    B       M
## 43     21    B       M
## 44     39    B       M
## 45     29    B       M
## 46     20    B       H
## 47     21    B       H
## 48     24    B       H
## 49     17    B       H
## 50     13    B       H
## 51     15    B       H
## 52     15    B       H
## 53     16    B       H
## 54     28    B       H
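
Before plotting, it can help to confirm how the observations are distributed across the groups. The following is a minimal base R sketch (the name group_sizes is only illustrative); it shows that wool and tension are factors and that every wool/tension combination holds nine observations, which matches the listing above.

```r
# Inspect the structure: breaks is numeric, wool and tension are factors
str(warpbreaks)

# Count observations per wool/tension combination
group_sizes <- table(warpbreaks$wool, warpbreaks$tension)
group_sizes
```

Because tension is a factor with levels L, M and H, the violins in the plots below are drawn in that order.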

Basic Violin plot by group using ggplot2

A violin plot by group can be created in ggplot2 by passing the numerical variable (breaks) and the categorical variable (tension) to aes and adding geom_violin.

ggplot(warpbreaks, aes(x = tension, y = breaks)) +
  geom_violin() 

Horizontal violin plot

If you want a horizontal violin plot instead of a vertical one, you can either pass the categorical variable (tension) to y or use coord_flip, as in the example below.

ggplot(warpbreaks, aes(x = tension, y = breaks)) +
  geom_violin() +
  coord_flip() 
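
The first option mentioned above, mapping the categorical variable to y, could look like the sketch below. It assumes ggplot2 3.3.0 or later, where geoms can infer a horizontal orientation from the aesthetics; the object name p_horizontal is only illustrative.

```r
library(ggplot2)

# Map the numeric variable to x and the categorical variable to y;
# no coord_flip() is needed (assumes ggplot2 >= 3.3.0)
p_horizontal <- ggplot(warpbreaks, aes(x = breaks, y = tension)) +
  geom_violin()

p_horizontal
```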

Avoid trimming the tails

By default, the tails of the violins are trimmed to the range of the data. To avoid trimming, set trim = FALSE inside geom_violin.

ggplot(warpbreaks, aes(x = tension, y = breaks)) +
  geom_violin(trim = FALSE) 

Adding quantiles

The desired quantiles can be added by passing a vector of probabilities to the draw_quantiles argument, as in the example below.

ggplot(warpbreaks, aes(x = tension, y = breaks)) +
  geom_violin(trim = FALSE,
              draw_quantiles = c(0.25, 0.5, 0.75)) 
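
For comparison, the sample quartiles of each group can be computed directly in base R. This is only a sketch (the name quartiles_by_tension is illustrative). As far as the ggplot2 documentation describes it, draw_quantiles places the lines at quantiles of the density estimate, so the lines in the plot may differ slightly from these sample values.

```r
# Sample quartiles of breaks within each tension level (base R only)
quartiles_by_tension <- tapply(
  warpbreaks$breaks,
  warpbreaks$tension,
  quantile,
  probs = c(0.25, 0.5, 0.75)
)

quartiles_by_tension
```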

Adding box plots

You can also overlay box plots on the violin plots to show the median and the outliers. Remember to set a small width for the box plots so that they fit inside the violins.

ggplot(warpbreaks, aes(x = tension, y = breaks)) +
  geom_violin(trim = FALSE) +
  geom_boxplot(width = 0.07) 

Fill and border colors

Fill color by group

If you want to fill the violins by group, pass the categorical variable to the fill argument of aes.

ggplot(warpbreaks, aes(x = tension, y = breaks, fill = tension)) +
  geom_violin(trim = FALSE) +
  geom_boxplot(width = 0.07) 

Fill color by subgroup

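If you have another categorical variable, you can create subgroups and fill the areas based on these subgroups. The box plots are dodged by the same width as the violins so that each box sits inside its own violin.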

ggplot(warpbreaks, aes(x = tension, y = breaks, fill = wool)) +
  geom_violin(trim = FALSE) +
  geom_boxplot(width = 0.07, position = position_dodge(width = 0.9)) 

Color scale

The default colors can be changed. For instance, you can use a ColorBrewer palette with scale_fill_brewer as follows.

ggplot(warpbreaks, aes(x = tension, y = breaks, fill = tension)) +
  geom_violin(trim = FALSE) +
  geom_boxplot(width = 0.07) +
  scale_fill_brewer() 
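
scale_fill_brewer also accepts a palette argument for choosing a specific ColorBrewer palette. The sketch below uses "Set2" purely as an example choice; the object name p_brewer is illustrative.

```r
library(ggplot2)

# Pick a specific ColorBrewer palette by name ("Set2" is an arbitrary choice)
p_brewer <- ggplot(warpbreaks, aes(x = tension, y = breaks, fill = tension)) +
  geom_violin(trim = FALSE) +
  geom_boxplot(width = 0.07) +
  scale_fill_brewer(palette = "Set2")

p_brewer
```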

Custom colors

If you want to use your own color palette, you can use scale_fill_manual and pass the colors to the values argument.

ggplot(warpbreaks, aes(x = tension, y = breaks, fill = tension)) +
  geom_violin(trim = FALSE) + 
  geom_boxplot(width = 0.07) +
  scale_fill_manual(values = c("#BCE4D8", "#49A4B9", "#2C5985")) 
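
If the colors should be tied to specific groups rather than to their order, values can also be given as a named vector. A minimal sketch, assuming the factor levels L, M and H of tension; the name tension_colors is illustrative.

```r
library(ggplot2)

# Named vector: each color is matched to a tension level by name
tension_colors <- c(L = "#BCE4D8", M = "#49A4B9", H = "#2C5985")

p_named <- ggplot(warpbreaks, aes(x = tension, y = breaks, fill = tension)) +
  geom_violin(trim = FALSE) +
  geom_boxplot(width = 0.07) +
  scale_fill_manual(values = tension_colors)

p_named
```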

Fill transparency

The fill transparency can be modified with the alpha argument of the geom_violin function.

ggplot(warpbreaks, aes(x = tension, y = breaks, fill = tension)) +
  geom_violin(trim = FALSE, alpha = 0.5) +
  geom_boxplot(width = 0.07) 

Border color

The color argument of geom_violin can be used to change the color of the borders.

ggplot(warpbreaks, aes(x = tension, y = breaks, fill = tension)) +
  geom_violin(trim = FALSE, color = "blue") +
  geom_boxplot(width = 0.07) 

Border color by group

If you want to set the border color based on the groups instead, pass the categorical variable to the color argument of the aes function.

ggplot(warpbreaks, aes(x = tension, y = breaks, color = tension)) +
  geom_violin(trim = FALSE) +
  geom_boxplot(width = 0.07) 

Custom border colors

As with the fill colors, you can customize the border colors, in this case with scale_color_manual. The example below also flips the coordinates with coord_flip.

ggplot(warpbreaks, aes(x = tension, y = breaks, color = tension)) +
  geom_violin(trim = FALSE) + 
  geom_boxplot(width = 0.07) +
  scale_color_manual(values = c("#F4D166", "#EC6E1C", "#B71D3E")) +
  coord_flip() 

Legend customization

Legend title

The legend title displays the name of the categorical variable. To change this default title use the guides function as follows.

ggplot(warpbreaks, aes(x = tension, y = breaks, fill = tension)) +
  geom_violin(trim = FALSE) + 
  geom_boxplot(width = 0.07) +
  guides(fill = guide_legend(title = "Title")) 
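
A shorter alternative that should give the same legend title is labs, which sets the title of the fill scale directly. This is a sketch rather than the only way to do it; the object name p_title is illustrative.

```r
library(ggplot2)

# labs(fill = ...) renames the legend that belongs to the fill aesthetic
p_title <- ggplot(warpbreaks, aes(x = tension, y = breaks, fill = tension)) +
  geom_violin(trim = FALSE) +
  geom_boxplot(width = 0.07) +
  labs(fill = "Title")

p_title
```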

Key labels

The legend key labels are the names of the groups. They can be changed through the labels argument of scale_fill_manual, if you also set custom fill colors, or of scale_fill_hue, if you only want to change the labels.

ggplot(warpbreaks, aes(x = tension, y = breaks, fill = tension)) +
  geom_violin(trim = FALSE) + 
  geom_boxplot(width = 0.07) +
  scale_fill_hue(labels = c("G1", "G2", "G3")) 
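
The scale_fill_manual variant mentioned above could look like the following sketch, which sets custom colors and key labels in a single call. The colors are reused from the custom colors example; the object name p_labels is illustrative.

```r
library(ggplot2)

# Set both the fill colors and the legend key labels in one scale
p_labels <- ggplot(warpbreaks, aes(x = tension, y = breaks, fill = tension)) +
  geom_violin(trim = FALSE) +
  geom_boxplot(width = 0.07) +
  scale_fill_manual(values = c("#BCE4D8", "#49A4B9", "#2C5985"),
                    labels = c("G1", "G2", "G3"))

p_labels
```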

Remove legend

Finally, if you want to remove the default legend, you can set the legend position to "none" or add show.legend = FALSE to geom_violin and to geom_boxplot (if you added one).

ggplot(warpbreaks, aes(x = tension, y = breaks, fill = tension)) +
  geom_violin(trim = FALSE) + 
  geom_boxplot(width = 0.07) + 
  theme(legend.position = "none")
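
The second option, show.legend = FALSE, could be sketched as follows. It has to be set on every layer that would otherwise contribute to the legend, here both the violins and the box plots; the object name p_no_legend is illustrative.

```r
library(ggplot2)

# Suppress the legend layer by layer instead of through the theme
p_no_legend <- ggplot(warpbreaks, aes(x = tension, y = breaks, fill = tension)) +
  geom_violin(trim = FALSE, show.legend = FALSE) +
  geom_boxplot(width = 0.07, show.legend = FALSE)

p_no_legend
```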