Density plot with ggplot2

Density Plot

A density plot represents the distribution of a continuous variable. It can be thought of as a smoothed version of the histogram. Sometimes we draw both the density plot and the histogram on the same graph to better understand the distribution of a continuous variable.

In this tutorial, we will use the iris dataset from the datasets package, as shown below:

knitr::kable(head(iris))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
5.1 3.5 1.4 0.2 setosa
4.9 3.0 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
4.6 3.1 1.5 0.2 setosa
5.0 3.6 1.4 0.2 setosa
5.4 3.9 1.7 0.4 setosa

This dataset gives the measurements, in centimeters, of sepal length, sepal width, petal length and petal width for 50 flowers from each of 3 species of iris. It contains 5 variables and 150 observations. The Sepal.Length variable is continuous, and its primary descriptive statistics are:

summary(iris$Sepal.Length, digits = 3)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4.30    5.10    5.80    5.84    6.40    7.90

A density plot can be constructed by adding geom_density() to a ggplot() call from the ggplot2 package:

library(ggplot2)
ggplot(iris, aes(x = Sepal.Length)) + geom_density()
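To see what the curve represents, we can compute a kernel density estimate directly with base R's density() function. This is only a sketch: it assumes geom_density() uses the same defaults as density() (Gaussian kernel, "nrd0" bandwidth rule, 512 evaluation points), which may differ between ggplot2 versions.

```r
# Kernel density estimate of Sepal.Length with base R (stats package)
x <- iris$Sepal.Length
d <- density(x)   # Gaussian kernel and default bandwidth rule

# d$x is the evaluation grid, d$y the estimated density at each grid point
n_points <- length(d$x)
bw_used  <- d$bw

# A density curve should integrate to roughly 1 (simple Riemann sum)
area <- sum(d$y) * diff(d$x[1:2])

print(n_points)
print(bw_used)
print(area)
```

The area is only approximately 1 because the estimate is evaluated on a finite grid.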

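To adjust the bandwidth (smoothness) of the density plot, pass the adjust argument to geom_density(). It is a multiplier applied to the default bandwidth, so values below 1 give a less smooth curve and values above 1 a smoother one: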

ggplot(iris, aes(x = Sepal.Length)) + geom_density(adjust=0.5)
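The effect of adjust can be checked numerically with base R. The sketch below assumes the default bandwidth rule is bw.nrd0(), as it is for density(); it only illustrates how the multiplier works and is not taken from the ggplot2 internals.

```r
# How adjust scales the bandwidth in base R's density()
x <- iris$Sepal.Length

bw_default <- bw.nrd0(x)             # default bandwidth rule of density()
d_half     <- density(x, adjust = 0.5)

# The bandwidth actually used is adjust times the default bandwidth
print(bw_default)
print(d_half$bw)
print(isTRUE(all.equal(d_half$bw, 0.5 * bw_default)))
```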

Similarly, to change the outline color of the density plot, pass the color argument to geom_density():

ggplot(iris, aes(x = Sepal.Length)) + geom_density(adjust=0.5, color = "blue")

The inside color of the density plot can be changed by specifying the fill argument to geom_density() as:

ggplot(iris, aes(x = Sepal.Length)) + geom_density(adjust=0.5, color = "blue", fill = "pink")

To change the transparency of the inside color, we can use the alpha argument with geom_density() as:

ggplot(iris, aes(x = Sepal.Length)) + geom_density(adjust=0.5, color = "blue", fill = "pink", alpha = 0.5)

alpha takes values in [0, 1], where 0 is fully transparent and 1 is fully opaque.

Multiple density plots

If we want to draw a density plot for each category of the factor variable Species in the iris dataset, we map Species to the color aesthetic inside aes() in the ggplot() call, e.g.:

ggplot(iris, aes(x = Sepal.Length, color = Species)) + geom_density()

Further, to fill the above density plots with a color for each category of Species, we also map Species to the fill aesthetic inside aes():

ggplot(iris, aes(x = Sepal.Length, color = Species, fill = Species)) + geom_density()

Now, to make the above density plots transparent, use the alpha argument with geom_density():

ggplot(iris, aes(x = Sepal.Length, color = Species, fill = Species)) + geom_density(alpha = 0.5)

Multiple density plots using facets

We use the facet_grid() function to split the plot into multiple panels, here one row per species:

ggplot(iris, aes(x = Sepal.Length, color = Species, fill = Species)) + geom_density() + facet_grid(Species ~ .)
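The formula passed to facet_grid() controls the layout. As a small sketch, the variants below put the panels side by side, or let each panel use its own y-axis range. The object names p, p_cols and p_free are only illustrative.

```r
library(ggplot2)

# Base plot reused for both layouts
p <- ggplot(iris, aes(x = Sepal.Length, color = Species, fill = Species)) +
  geom_density(alpha = 0.5)

# One column per species instead of one row per species
p_cols <- p + facet_grid(. ~ Species)

# One row per species, each panel with its own y-axis range
p_free <- p + facet_grid(Species ~ ., scales = "free_y")

print(p_cols)
print(p_free)
```

Free y-axis scales make each curve easier to read but harder to compare across panels.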

Stacked density plot

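To create a stacked density plot, use position = "stack" in geom_density(). Note that stacking the default density loses the marginal densities; to preserve them, you probably want to plot the count variable (density * n) instead of the default density.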

ggplot(iris, aes(x = Sepal.Length, fill = Species)) + geom_density(position = "stack")
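A stacked plot based on count rather than density can be sketched as follows. It assumes ggplot2 3.3.0 or later, where after_stat() is available; the object names p_count and p_fill are only illustrative.

```r
library(ggplot2)

# Stack counts (density * n) so the marginal densities are preserved
p_count <- ggplot(iris, aes(x = Sepal.Length, y = after_stat(count), fill = Species)) +
  geom_density(position = "stack")

# position = "fill" rescales the stack to 1 at every x (conditional density)
p_fill <- ggplot(iris, aes(x = Sepal.Length, y = after_stat(count), fill = Species)) +
  geom_density(position = "fill")

print(p_count)
print(p_fill)
```

In iris every species has the same number of observations, so the count version mainly rescales the y-axis; the difference is more visible when group sizes differ.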

We can also apply a different theme, such as theme_bw(), to our density plot:

ggplot(iris, aes(x = Sepal.Length, color = Species, fill = Species)) + geom_density(alpha = 0.5) + theme_bw()

The ggplot2 package provides many built-in themes for plots in R.
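As a short sketch, a plot can be stored in an object and combined with different built-in themes. The object names below are only illustrative.

```r
library(ggplot2)

# Store the plot once, then try different built-in themes on it
p <- ggplot(iris, aes(x = Sepal.Length, color = Species, fill = Species)) +
  geom_density(alpha = 0.5)

p_minimal <- p + theme_minimal()   # light theme without a panel border
p_classic <- p + theme_classic()   # axis lines, no grid lines

print(p_minimal)
print(p_classic)
```

Calling theme_set(theme_bw()) once at the top of a script would apply the same theme to every later plot in that session.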

Pooling histogram and density plots

Sometimes we want to plot the histogram and the density plot together to fully grasp the distribution of a continuous variable. Here we check the distribution of Sepal.Length for each species, as defined by the Species variable of the iris dataset. These combined plots can be drawn as:

ggplot(iris, aes(x = Sepal.Length, color = Species, fill = Species)) + geom_histogram(aes(y = after_stat(density)), alpha = 0.5, position = "identity") + geom_density(alpha = 0.2)
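The histogram in the plot above uses the default number of bins. A variant with an explicit bin width might look like the sketch below; the value 0.2 is an arbitrary choice for illustration, and after_stat() assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Histogram on the density scale with an explicit bin width,
# overlaid with the per-species density curves
p_pooled <- ggplot(iris, aes(x = Sepal.Length, color = Species, fill = Species)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.2,
                 alpha = 0.5, position = "identity") +
  geom_density(alpha = 0.2)

print(p_pooled)
```

Mapping y to the density scale is what lets the histogram bars and the density curves share the same axis.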

This way we can make different types of density plots with the help of ggplot2 package in R.
