Bar chart with ggplot2

Bar chart

A bar chart presents categorical data with rectangular bars. The height of each bar (or its length, when the chart is drawn horizontally) is proportional to the number of cases in the corresponding group. In this tutorial, we will use the mpg dataset from the ggplot2 package, shown below.

library(ggplot2)
knitr::kable(head(mpg))
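| manufacturer | model | displ | year | cyl | trans      | drv | cty | hwy | fl | class   |
|:-------------|:------|------:|-----:|----:|:-----------|:----|----:|----:|:---|:--------|
| audi         | a4    |   1.8 | 1999 |   4 | auto(l5)   | f   |  18 |  29 | p  | compact |
| audi         | a4    |   1.8 | 1999 |   4 | manual(m5) | f   |  21 |  29 | p  | compact |
| audi         | a4    |   2.0 | 2008 |   4 | manual(m6) | f   |  20 |  31 | p  | compact |
| audi         | a4    |   2.0 | 2008 |   4 | auto(av)   | f   |  21 |  30 | p  | compact |
| audi         | a4    |   2.8 | 1999 |   6 | auto(l5)   | f   |  16 |  26 | p  | compact |
| audi         | a4    |   2.8 | 1999 |   6 | manual(m5) | f   |  18 |  26 | p  | compact |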

This dataset contains fuel economy data from 1999 and 2008 for 38 popular car models. It has 234 observations and 11 variables. The class variable is of character type and takes 7 distinct values, one for each type of car:

unique(mpg$class)
## [1] "compact"    "midsize"    "suv"        "2seater"    "minivan"   
## [6] "pickup"     "subcompact"

A bar plot can be constructed by adding geom_bar() to a ggplot() call. geom_bar() uses stat_count() by default, which means it counts the observations in each category of the supplied variable and uses that count as the bar height. Here, we plot the class variable as a vertical bar chart.

ggplot(mpg, aes(x = class)) + geom_bar()
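
To see what stat_count() does behind the scenes, the counts can be reproduced by hand and compared with the values ggplot2 computes for the bars. This is a minimal sketch; the object names class_counts, p and bar_data are our own, and layer_data() is used only to inspect the computed layer.

```r
library(ggplot2)

# Count the observations in each class; this mirrors what stat_count() computes
class_counts <- table(mpg$class)
print(class_counts)

# Build the same bar chart and extract the data ggplot2 uses to draw the bars
p <- ggplot(mpg, aes(x = class)) + geom_bar()
bar_data <- layer_data(p)

# The count column holds one value per bar, in the order of the x axis
print(bar_data$count)
```

The two sets of counts should agree, which confirms that the bar heights are simply the number of rows per class.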

Because geom_bar() computes the counts itself, we map only a single categorical variable to x in aes(); no y variable is needed.

To change the color of the bars we have to specify the fill argument to geom_bar() as:

ggplot(mpg, aes(x = class)) + geom_bar(fill = "blue")

Similarly, to change the color of the border of each bar, specify the color argument to geom_bar() as:

ggplot(mpg, aes(x = class)) + geom_bar(fill = "blue", color = "red")

Further, if we want a different color for each bar according to the class variable, we map fill to class inside aes() in the ggplot() call, e.g.:

ggplot(mpg, aes(x = class, fill = class)) + geom_bar() 

These vertical bar charts can also be drawn horizontally by adding coord_flip() after geom_bar() as:

ggplot(mpg, aes(x = class, fill = class)) + geom_bar() + coord_flip()
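
As an alternative sketch, recent versions of ggplot2 (3.3.0 and later, as far as we are aware) also draw horizontal bars when the categorical variable is mapped to y instead of x, so coord_flip() is not needed. The names p_flip and p_y are only illustrative.

```r
library(ggplot2)

# Horizontal bars by flipping the coordinate system
p_flip <- ggplot(mpg, aes(x = class)) + geom_bar() + coord_flip()

# Horizontal bars by mapping the categorical variable to y directly
# (assumes ggplot2 >= 3.3.0)
p_y <- ggplot(mpg, aes(y = class)) + geom_bar()

print(p_y)
```

Both plots show the same counts; the second form keeps the axis names in aes() consistent with what appears on screen.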

We can also apply a different theme, such as theme_classic(), to change the overall look of the plot.

ggplot(mpg, aes(x = class)) + geom_bar(fill = "red") + theme_classic()

The ggplot2 package provides several built-in themes, such as theme_bw(), theme_minimal() and theme_light().

The transparency of the fill color can also be changed by specifying the alpha argument to geom_bar() as:

ggplot(mpg, aes(x = class)) + geom_bar(fill = "red", alpha = 0.5) + theme_classic()

alpha takes values in [0, 1], where 0 is fully transparent and 1 is fully opaque.

Stacked bar chart

Suppose we want to count the number of cars of each class broken down by drive type (front-wheel drive, rear-wheel drive or four-wheel drive). In this case, we can use a stacked bar chart. This type of chart is produced by mapping the fill aesthetic to the drv variable in aes().

ggplot(mpg, aes(x = class, fill = drv)) + geom_bar()

The stacked bar chart can also be presented in terms of proportions by specifying position = 'fill' in geom_bar(). Each bar is then scaled to a total height of 1, so the y axis shows the share of each drive type within a class.

ggplot(mpg, aes(x = class, fill = drv)) + geom_bar(position = 'fill') + ylab("proportion")
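
The within-class shares that position = 'fill' displays can be checked with a cross-tabulation. This is a small sketch using base R; the names counts and props are our own.

```r
library(ggplot2)

# Cross-tabulate class (rows) against drive type (columns)
counts <- table(mpg$class, mpg$drv)

# Convert each row to proportions so that every class sums to 1,
# which is the scaling applied to each stacked bar
props <- prop.table(counts, margin = 1)
print(round(props, 2))
```

Each row of props corresponds to one bar in the filled chart, and each column to one colored segment.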

Instead of a stacked bar chart, we can place the bars for each level of the grouping variable drv side by side by specifying position = position_dodge() in geom_bar():

ggplot(mpg, aes(x = class, fill = drv)) + geom_bar(position = position_dodge())

Moreover, to leave a small gap between the bars within a category, specify position = position_dodge2() in geom_bar():

ggplot(mpg, aes(x = class, fill = drv)) + geom_bar(position = position_dodge2())
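
Not every class contains all three drive types, so with the default settings the bars in such classes are drawn wider to fill the available space. One possible remedy, sketched here, is the preserve = "single" option of position_dodge2(), which keeps every bar the same width. The name p_single is only illustrative.

```r
library(ggplot2)

# preserve = "single" keeps the width of each individual bar constant,
# even in classes where one or more drive types are absent
p_single <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(position = position_dodge2(preserve = "single"))

print(p_single)
```

Whether equal-width bars are preferable depends on the chart; they make missing combinations easier to spot at the cost of some empty space.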

Note

Sometimes we come across datasets that contain only summary measures such as counts or percentages. In this case, the bar chart can be plotted with geom_col(), which uses stat_identity() instead of stat_count() and therefore takes the bar heights directly from the data. Suppose that we have 3 apples, 5 mangoes and 7 oranges. We first create a dataset from this information:

library(tibble)
data <- tibble(
    fruit = c("apple", "mango", "orange"),
    count = c(3, 5, 7)
)
head(data)
## # A tibble: 3 x 2
##   fruit  count
##   <chr>  <dbl>
## 1 apple      3
## 2 mango      5
## 3 orange     7

The bar plot for the summary measure (count) of the fruit variable can be plotted with geom_col() as:

ggplot(data, aes(x = fruit, y = count)) + geom_col(fill = "blue")
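
For comparison, the same chart can be obtained from geom_bar() by overriding its default statistic with stat = "identity". The sketch below rebuilds the fruit data as a plain data frame so that it runs on its own; the names fruit_counts, p_col and p_identity are our own.

```r
library(ggplot2)

# Pre-summarised data: one row per fruit with its count
fruit_counts <- data.frame(
  fruit = c("apple", "mango", "orange"),
  count = c(3, 5, 7)
)

# geom_col() uses the y values as bar heights
p_col <- ggplot(fruit_counts, aes(x = fruit, y = count)) + geom_col()

# geom_bar() does the same once stat_count() is replaced by stat = "identity"
p_identity <- ggplot(fruit_counts, aes(x = fruit, y = count)) +
  geom_bar(stat = "identity")

print(p_identity)
```

geom_col() is the shorter and more readable choice; the stat = "identity" form mainly appears in older code.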

In this way, we can make different types of bar charts with the ggplot2 package in R.
