R Data Visualisation Example: Histogram plot by group using ggplot2

Import library

library(ggplot2)

Generate Sample Data for Plotting purposes

# Fix the seed so the sample is reproducible
set.seed(7)

# Three groups of 200 normal draws with different means and spreads
x <- c(rnorm(200, mean = -2, sd = 1.5),
       rnorm(200, mean = 0, sd = 1),
       rnorm(200, mean = 2, sd = 1.5))
group <- c(rep("A", 200), rep("B", 200), rep("C", 200))
df <- data.frame(x, group)

head(df)
##           x group
## 1  1.430871     A
## 2 -3.795158     A
## 3 -3.041439     A
## 4 -2.618439     A
## 5 -3.456010     A
## 6 -3.420920     A

Basic Histogram plot using ggplot2

Default ggplot2 and base R histograms

The default histograms in ggplot2 and base R differ: ggplot2 uses 30 bins regardless of the data, while the base R hist function uses the Sturges method to choose the number of bins. As a result, default ggplot2 histograms often have more bins than the data warrant. You can set the bin width or the number of bins yourself with the binwidth or bins argument of geom_histogram.

# Default histogram ggplot2
ggplot(df, aes(x = x)) + 
  geom_histogram() 
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
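
As a minimal sketch of changing those defaults, the two plots below set the number of bins and the bin width explicitly. The values 15 and 0.5 are arbitrary choices for illustration, not recommendations, and the data simply mirrors the sample generated earlier.

```r
library(ggplot2)

# Illustrative data, mirroring the sample generated above
set.seed(7)
df <- data.frame(x = c(rnorm(200, mean = -2, sd = 1.5),
                       rnorm(200, mean = 0, sd = 1),
                       rnorm(200, mean = 2, sd = 1.5)))

# Fix the number of bins (15 is an arbitrary value for illustration)
p_bins <- ggplot(df, aes(x = x)) +
  geom_histogram(bins = 15)

# Or fix the width of each bin in data units (0.5 is also arbitrary)
p_width <- ggplot(df, aes(x = x)) +
  geom_histogram(binwidth = 0.5)

p_bins
p_width
```

Setting either argument also silences the `stat_bin()` message shown above, since ggplot2 no longer has to fall back on its default.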

The default histogram from the hist function often looks more appropriate, because the number of bins is derived from the sample size using the Sturges method.

# Default histogram base R
hist(x) 
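
To make the Sturges rule concrete, here is a short sketch of what it computes: for n observations it suggests ceiling(log2(n) + 1) classes, which is the value returned by nclass.Sturges. Note that hist treats this number only as a suggestion and passes it to pretty, so the final number of bins can differ. The variable names below are illustrative.

```r
# Illustrative data, mirroring the sample generated above
set.seed(7)
x <- c(rnorm(200, mean = -2, sd = 1.5),
       rnorm(200, mean = 0, sd = 1),
       rnorm(200, mean = 2, sd = 1.5))

# Sturges rule computed by hand from the sample size
n <- length(x)
sturges_by_hand <- ceiling(log2(n) + 1)

# The same value from the built-in helper in grDevices
sturges_builtin <- nclass.Sturges(x)

sturges_by_hand
sturges_builtin

# Number of bins hist() actually uses after "prettifying" the breaks
length(hist(x, plot = FALSE)$breaks) - 1
```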

Sturges method

To create a ggplot2 histogram that uses the Sturges method, calculate the breaks as follows and pass them to the breaks argument of geom_histogram.

# Data
set.seed(3)
x <- rnorm(450)
df <- data.frame(x)

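# Calculating the Sturges breaks.
# Keep them as a separate vector: there are far fewer breaks than rows
# in df, so they cannot be stored as a column of the data frame.
breaks <- pretty(range(x),
                 n = nclass.Sturges(x),
                 min.n = 1)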

# Histogram with Sturges method
ggplot(df, aes(x = x)) + 
  geom_histogram(color = 1, fill = "white",
                 breaks = breaks) +
  ggtitle("Sturges method")
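
As a quick sanity check, the sketch below compares the breaks computed this way with the ones base R's hist chooses by default for the same data. It assumes hist is called with its default breaks setting, and base_breaks is just an illustrative name.

```r
# Same data as in the Sturges example above
set.seed(3)
x <- rnorm(450)

# Breaks computed by hand, as passed to geom_histogram()
breaks <- pretty(range(x),
                 n = nclass.Sturges(x),
                 min.n = 1)

# Breaks chosen by base R's hist() with its default settings
base_breaks <- hist(x, plot = FALSE)$breaks

# Compare the two sets of breaks
all.equal(breaks, base_breaks)

# Number of bins implied by the breaks
length(breaks) - 1
```

If the two vectors agree, the ggplot2 histogram drawn with these breaks uses the same bins as the default base R histogram.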