
How to make a barplot using ggplot


In this article, we will learn how to create a barplot using ggplot2 in R. We will also create basic and grouped bar charts using geom_bar().

A barplot, or bar graph, is one of the most commonly used plots for visualizing the relationship between a numerical and a categorical variable. Each level of the categorical variable is represented as a bar, and the height of the bar represents its numeric value.

A typical barplot looks as below.

[Figure: ggplot barplot example]

The above barplot shows the relationship between the categorical variable group and the numerical variable height. The height of each bar is proportional to its value, so the plot makes it easy to see which group has the highest or lowest height, how the height of group1 compares to group3, and so on.

To start with, let's create a basic bar chart using ggplot. I have also included reproducible code samples for each type.

Basic barplot

The data:

To create a barplot using ggplot, first install and load the ggplot2 package and create the dataset.
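The original data is not reproduced here, so the following is a minimal sketch with made-up values. Only the column names (name, value) and the categories A to E come from the article; the numbers are assumptions for illustration.

```r
# install.packages("ggplot2")  # run once if ggplot2 is not installed yet
library(ggplot2)

# Hypothetical example data: five categories with made-up values
df <- data.frame(
  name  = c("A", "B", "C", "D", "E"),
  value = c(12, 25, 18, 30, 22)
)
print(df)
```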

The above data set has two columns, name and value, where value is numeric and name is treated as a categorical variable.

To create a barplot, we use geom_bar() with the aesthetics aes(x=name, y=value) and stat="identity", passing the data as input to ggplot().
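A minimal sketch of the basic barplot, assuming a small made-up data frame (the values are illustrative, not the article's originals). Because y is mapped to a column, stat = "identity" is needed so the bar heights are taken directly from value.

```r
library(ggplot2)

# Hypothetical example data (values are made up)
df <- data.frame(
  name  = c("A", "B", "C", "D", "E"),
  value = c(12, 25, 18, 30, 22)
)

# One bar per name, with height taken from value
p <- ggplot(df, aes(x = name, y = value)) +
  geom_bar(stat = "identity")
print(p)  # draw the plot
```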

And the barplot looks as below.

[Figure: basic barplot]

To get a bar graph of counts, don't map a variable to y, and use stat="count" (which is the default for geom_bar(); older ggplot2 releases called it stat="bin") instead of stat="identity", as below.
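A short sketch of the count version, again on assumed example data. Only x is mapped, and geom_bar() counts the rows for each name.

```r
library(ggplot2)

# Hypothetical example data (values are made up)
df <- data.frame(
  name  = c("A", "B", "C", "D", "E"),
  value = c(12, 25, 18, 30, 22)
)

# No y aesthetic: the default stat counts the rows per category
p <- ggplot(df, aes(x = name)) +
  geom_bar()
print(p)
```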

And the barplot plotted against count looks as below.

[Figure: barplot plotted against count]

In the above barplot, all bars have the same height of 1. This is because there is only one entry for each of the name values A, B, C, D and E.

The bars of a barplot can be plotted vertically or horizontally. A vertical bar chart is sometimes called a column chart. We will see how to create a horizontal barplot later in this article.

Now let's see how to customize the above barplot by changing the theme, colors, title, labels, bar width, etc.

Customizing barplot

Changing theme

With ggplot it is possible to change the theme of a barplot to any of the available themes below.

[Figure: themes of barplot]

To change the theme of a barplot to a dark theme, add theme_dark() to the plot.

[Figure: barplot with a customized theme]
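As a sketch, the dark theme can be applied by adding theme_dark() as one more layer. The data frame is an assumed example; other built-in themes such as theme_bw() or theme_minimal() can be swapped in the same way.

```r
library(ggplot2)

# Hypothetical example data (values are made up)
df <- data.frame(
  name  = c("A", "B", "C", "D", "E"),
  value = c(12, 25, 18, 30, 22)
)

# Same basic barplot, drawn with the built-in dark theme
p <- ggplot(df, aes(x = name, y = value)) +
  geom_bar(stat = "identity") +
  theme_dark()
print(p)
```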

Barplot with labels

Adding labels to a bar graph helps you read the values correctly. You can label each bar with its actual value using geom_text(), adjusting the position and size of the labels with the vjust and size parameters.

For the above barplot, let's place the labels outside the bars by setting vjust=-0.3 and size=5.

Now let's create labels inside the bars by setting vjust=1.6 and size=5 as below.
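Both label placements can be sketched as follows, on assumed example data. The vjust and size values come from the text; the white label color for the inside variant is my own addition so the text stays readable on the default dark bars.

```r
library(ggplot2)

# Hypothetical example data (values are made up)
df <- data.frame(
  name  = c("A", "B", "C", "D", "E"),
  value = c(12, 25, 18, 30, 22)
)

base <- ggplot(df, aes(x = name, y = value)) +
  geom_bar(stat = "identity")

# Labels just above each bar
p_outside <- base +
  geom_text(aes(label = value), vjust = -0.3, size = 5)

# Labels just below the top edge, inside each bar
# (color = "white" is an assumption, not from the article)
p_inside <- base +
  geom_text(aes(label = value), vjust = 1.6, size = 5, color = "white")

print(p_outside)
print(p_inside)
```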

Change barplot line colors by group

Let’s see how to change the line colors of a barplot by coloring them based on their group.

Line colors can be set by group by mapping the color aesthetic inside aes(), as color=as.factor(name).
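A minimal sketch of outline colors by group, using assumed example data. The white fill is my own choice to make the colored outlines easier to see; it is not stated in the article.

```r
library(ggplot2)

# Hypothetical example data (values are made up)
df <- data.frame(
  name  = c("A", "B", "C", "D", "E"),
  value = c(12, 25, 18, 30, 22)
)

# Map the outline color to the category; fill = "white" is an assumption
p <- ggplot(df, aes(x = name, y = value, color = as.factor(name))) +
  geom_bar(stat = "identity", fill = "white")
print(p)
```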

Changing barwidth

The width of the bars can be changed using the width argument of geom_bar(). The width is usually set between 0 and 1, where 1 is the full width available to a category. Larger values make the bars wider, smaller values make them narrower, and the default bar width is 0.9.

Let's create a barplot with width=0.2 and see how it differs from the default barplot.

As we can see, changing the bar width to 0.2 has created narrower bars compared to the default barplot.
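The narrower bars can be sketched like this, on assumed example data:

```r
library(ggplot2)

# Hypothetical example data (values are made up)
df <- data.frame(
  name  = c("A", "B", "C", "D", "E"),
  value = c(12, 25, 18, 30, 22)
)

# width = 0.2 makes each bar much narrower than the 0.9 default
p <- ggplot(df, aes(x = name, y = value)) +
  geom_bar(stat = "identity", width = 0.2)
print(p)
```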

Reordering bars

When creating a bar plot with categorical labels, ggplot by default orders the bars by factor level, which is alphabetical for character labels. Bars can be reordered in ascending or descending order of their values using reorder().

To reorder the bars in ascending order of their height, use reorder(), passing name and value as arguments.

To reorder them in descending order, put a minus sign in front of the value variable, as in reorder(name, -value).
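Both orderings can be sketched as below, on assumed example data. The xlab("name") call is my own addition; without it the axis title would show the reorder() expression.

```r
library(ggplot2)

# Hypothetical example data (values are made up)
df <- data.frame(
  name  = c("A", "B", "C", "D", "E"),
  value = c(12, 25, 18, 30, 22)
)

# Ascending: shortest bar first
p_asc <- ggplot(df, aes(x = reorder(name, value), y = value)) +
  geom_bar(stat = "identity") +
  xlab("name")

# Descending: tallest bar first
p_desc <- ggplot(df, aes(x = reorder(name, -value), y = value)) +
  geom_bar(stat = "identity") +
  xlab("name")

print(p_asc)
print(p_desc)
```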

And the resulting barplot looks as below.


Choosing items to display

In ggplot it is possible to limit the categories displayed using the limits parameter of scale_x_discrete(). Let's say we want to display the bars for only A and B; this can be done by passing just A and B to the limits argument of scale_x_discrete(), as below.
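A minimal sketch of restricting the displayed categories, on assumed example data. ggplot2 typically prints a warning that the rows for the other categories were removed, which is expected here.

```r
library(ggplot2)

# Hypothetical example data (values are made up)
df <- data.frame(
  name  = c("A", "B", "C", "D", "E"),
  value = c(12, 25, 18, 30, 22)
)

# Only the categories listed in limits are drawn
p <- ggplot(df, aes(x = name, y = value)) +
  geom_bar(stat = "identity") +
  scale_x_discrete(limits = c("A", "B"))
print(p)
```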

Changing colors

For the barplots below, we use R's built-in mtcars dataset, with the categorical variable gear, and draw bars based on the count of each gear value.

The mtcars dataset looks as below.

Let's see how to change the bar colors using the fill argument of geom_bar().

In the above plot, color="blue" sets the outline color of the bars and fill=rgb(0.1,0.4,0.5,0.7) sets the color filled inside each bar.
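mtcars ships with base R, so it can be inspected directly. The snippet below is a small sketch that prints the first rows and the number of cars per gear value, which is what the bar heights in the following plots represent.

```r
# mtcars is a built-in dataset; no package is needed to load it
print(head(mtcars))

# Number of cars for each value of gear
gear_counts <- table(mtcars$gear)
print(gear_counts)
```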
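A sketch of a fixed fill and outline color on the gear counts. The color values are the ones quoted in the text; converting gear with as.factor() is an assumption about how the original plot was built.

```r
library(ggplot2)

# Count of cars per gear, with a blue outline and a semi-transparent teal fill
p <- ggplot(mtcars, aes(x = as.factor(gear))) +
  geom_bar(color = "blue", fill = rgb(0.1, 0.4, 0.5, 0.7))
print(p)
```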

Colors using scale_fill_hue:

Now let's use scale_fill_hue() to fill colors based on categories.

As we can see in the above barplot, each of the 3 categories has been filled with a different color.
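A minimal sketch, assuming fill is mapped to the gear category. The chroma setting c = 40 is only an illustrative choice to make the effect of the scale visible; it is not taken from the article.

```r
library(ggplot2)

# Fill mapped to the category; scale_fill_hue() picks evenly spaced hues
p <- ggplot(mtcars, aes(x = as.factor(gear), fill = as.factor(gear))) +
  geom_bar() +
  scale_fill_hue(c = 40)  # c = 40 (lower chroma) is an assumed setting
print(p)
```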

Colors using scale_fill_manual():

With ggplot it is also possible to fill the bars with manually chosen colors using scale_fill_manual(). Let's create a barplot by manually specifying the bar colors as red, green and blue.
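The manual colors can be sketched as follows, assuming fill is mapped to the gear category. The colors are assigned to the categories in level order.

```r
library(ggplot2)

# One manually chosen color per gear level (3, 4, 5)
p <- ggplot(mtcars, aes(x = as.factor(gear), fill = as.factor(gear))) +
  geom_bar() +
  scale_fill_manual(values = c("red", "green", "blue"))
print(p)
```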

Colors using scale_fill_brewer()

With ggplot it is also possible to create the above barplot using scale_fill_brewer() by setting the palette. Let's create a barplot with palette = "Set2".
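A short sketch of the Set2 palette, again assuming fill is mapped to the gear category:

```r
library(ggplot2)

# Fill colors taken from the ColorBrewer "Set2" palette
p <- ggplot(mtcars, aes(x = as.factor(gear), fill = as.factor(gear))) +
  geom_bar() +
  scale_fill_brewer(palette = "Set2")
print(p)
```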


Apart from Set2, the following palettes are available through RColorBrewer.
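Since the palette chart itself is not reproduced here, the names can be listed from R. This sketch assumes the RColorBrewer package is installed, which is normally the case because ggplot2 depends on it indirectly.

```r
library(RColorBrewer)

# Names of all palettes that can be passed to scale_fill_brewer(palette = ...)
palette_names <- rownames(brewer.pal.info)
print(palette_names)

# display.brewer.all()  # uncomment to draw a chart of every palette
```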

Change colors using greyscale:

Bars can be filled with greyscale values using scale_fill_grey(), as below.

In the above barplot, the bars have been filled with shades running from darker to lighter grey, one shade per category.
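A minimal greyscale sketch, assuming fill is mapped to the gear category:

```r
library(ggplot2)

# Each gear level gets its own shade of grey
p <- ggplot(mtcars, aes(x = as.factor(gear), fill = as.factor(gear))) +
  geom_bar() +
  scale_fill_grey()
print(p)
```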

Changing legend position

We can change the legend position of a barplot using the legend.position argument of theme().

Let's create a barplot with the legend placed at the top of the plot.

Note: The allowed values for legend.position include "left", "top", "right" and "bottom"; "none" hides the legend.
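The legend placement can be sketched like this, assuming fill is mapped to the gear category so that a legend is drawn at all:

```r
library(ggplot2)

# Move the fill legend from the default right side to the top
p <- ggplot(mtcars, aes(x = as.factor(gear), fill = as.factor(gear))) +
  geom_bar() +
  theme(legend.position = "top")
print(p)
```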

Horizontal barplot

In certain cases where the labels are long, it often makes sense to turn your barplot horizontal. With ggplot this can be done using coord_flip().
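A sketch of the horizontal version, using assumed example data for df (the values are made up):

```r
library(ggplot2)

# Hypothetical example data (values are made up)
df <- data.frame(
  name  = c("A", "B", "C", "D", "E"),
  value = c(12, 25, 18, 30, 22)
)

# coord_flip() swaps the axes, so the bars run horizontally
p <- ggplot(df, aes(x = name, y = value)) +
  geom_bar(stat = "identity") +
  coord_flip()
print(p)
```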

The horizontal barplot for the df dataset looks as below.

Grouped, stacked and percent stacked barplot

If the data contains several groups of categories it can be displayed in a bar graph in one of two ways.

You can either decide to show the bars in groups (grouped barplot) or you can choose to have them stacked (stacked barplot). Let’s see more about these grouped and stacked barcharts below.

Let's first create a dataset with groups to illustrate the barcharts below.

The dataset looks as below.

The dataset has three variables: the numeric value (value) and two categorical variables, one for the group (name) and one for the subgroup levels (group).
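The original grouped data is not reproduced here, so this is a sketch with made-up numbers. The column names and the four name values A to D with three groups each follow the article; the name df2, the group labels and the values are assumptions.

```r
# Hypothetical grouped data: 4 names x 3 groups, with made-up values
df2 <- data.frame(
  name  = rep(c("A", "B", "C", "D"), each = 3),
  group = rep(c("group1", "group2", "group3"), times = 4),
  value = c(10, 15, 8, 12, 9, 14, 7, 11, 13, 16, 6, 10)
)
print(df2)
```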

Grouped barchart

In a grouped bar chart, for each categorical group there are two or more bars. These bars are color-coded to represent a particular grouping.

For example, a business owner with two stores might make a grouped bar chart with different colored bars to represent each store. The horizontal axis would show the months of the year and the vertical axis would show the revenue, so the barplot as a whole can be used to visualize the revenue of the two stores for all the months of the year.

To create a grouped barplot, just set position="dodge" in the geom_bar() function and map the categorical variable group to fill, as below.
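A minimal grouped barplot sketch on assumed data (df2 and its values are hypothetical):

```r
library(ggplot2)

# Hypothetical grouped data: 4 names x 3 groups, with made-up values
df2 <- data.frame(
  name  = rep(c("A", "B", "C", "D"), each = 3),
  group = rep(c("group1", "group2", "group3"), times = 4),
  value = c(10, 15, 8, 12, 9, 14, 7, 11, 13, 16, 6, 10)
)

# position = "dodge" places the bars of each group side by side
p <- ggplot(df2, aes(x = name, y = value, fill = group)) +
  geom_bar(stat = "identity", position = "dodge")
print(p)
```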

The above grouped barchart has created bars for the three groups for each of the name values A, B, C and D.

In a grouped barchart, there is no space between bars within each group by default. However, some space can be added between bars within a group by making width smaller and setting the width of position_dodge() larger than it, as below.
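The spacing trick can be sketched as follows. The data and the specific values width = 0.5 and position_dodge(width = 0.7) are assumptions chosen for illustration; any pair where the dodge width exceeds the bar width leaves a gap.

```r
library(ggplot2)

# Hypothetical grouped data: 4 names x 3 groups, with made-up values
df2 <- data.frame(
  name  = rep(c("A", "B", "C", "D"), each = 3),
  group = rep(c("group1", "group2", "group3"), times = 4),
  value = c(10, 15, 8, 12, 9, 14, 7, 11, 13, 16, 6, 10)
)

# Dodge width (0.7) larger than bar width (0.5) leaves gaps inside each group
p <- ggplot(df2, aes(x = name, y = value, fill = group)) +
  geom_bar(stat = "identity", width = 0.5,
           position = position_dodge(width = 0.7))
print(p)
```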

And the grouped barchart with space added between the bars looks as below,

Stacked barchart

A stacked barplot is very similar to the grouped barplot. However, a stacked bar chart stacks bars that represent different groups on top of each other. The height of the resulting bar shows the combined result of the groups, i.e., the subgroups are displayed on top of each other rather than side by side as in a grouped barchart.

The only thing to change to get a stacked barchart is to switch the position argument to "stack".

In the above barchart the groups are stacked on top of each other for each of the name values A, B, C and D.

Note that stacked bar charts are not suited to data sets where some groups have negative values. In such cases, grouped bar charts are preferable.
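A minimal stacked barplot sketch on assumed data (df2 and its values are hypothetical):

```r
library(ggplot2)

# Hypothetical grouped data: 4 names x 3 groups, with made-up values
df2 <- data.frame(
  name  = rep(c("A", "B", "C", "D"), each = 3),
  group = rep(c("group1", "group2", "group3"), times = 4),
  value = c(10, 15, 8, 12, 9, 14, 7, 11, 13, 16, 6, 10)
)

# position = "stack" piles the groups on top of each other
p <- ggplot(df2, aes(x = name, y = value, fill = group)) +
  geom_bar(stat = "identity", position = "stack")
print(p)
```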

Percent stacked barplot

In a percent stacked barplot, the percentage of each subgroup is shown instead of the count or y value. This makes it possible to study how each subgroup's proportion of the whole varies across bars.

To create a percent stacked barplot, just switch to position="fill".

As we can see in the above plot, the bars are plotted against their percentage values. This makes it visually easier to interpret the proportion of groups within each bar for each value of the name variable.
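A percent stacked sketch on assumed data (df2 and its values are hypothetical). The percent axis labels via the scales package are my own optional addition; without that line the axis simply runs from 0 to 1.

```r
library(ggplot2)

# Hypothetical grouped data: 4 names x 3 groups, with made-up values
df2 <- data.frame(
  name  = rep(c("A", "B", "C", "D"), each = 3),
  group = rep(c("group1", "group2", "group3"), times = 4),
  value = c(10, 15, 8, 12, 9, 14, 7, 11, 13, 16, 6, 10)
)

# position = "fill" rescales every stack to a total height of 1
p <- ggplot(df2, aes(x = name, y = value, fill = group)) +
  geom_bar(stat = "identity", position = "fill") +
  scale_y_continuous(labels = scales::label_percent())  # optional, an assumption
print(p)
```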

Conclusion

So, in this article we have discussed how to create basic and grouped barplots using ggplot2, along with the various customization options. Hope this article helped you get a better understanding of ggplot2 barplots.

Do let us know your comments and feedback about this article below.
