dplyr rename() – For Renaming Columns

In this post, we will learn about the dplyr rename() function. rename() is used to change the column names of a dataframe or a tibble. dplyr is part of the tidyverse group of packages developed by Hadley Wickham. I have found rename(), like the other dplyr functions, to be the most intuitive and easiest way to do this.

 

dplyr rename

As a first step, let us install the dplyr and hflights packages. Please go through this post to learn the installation process.

After a successful installation, load dplyr and hflights in the RStudio console using the library() function.

We will use the tbl_df() function to generate a tibble called tbl from hflights. Note that tbl_df() is deprecated in current versions of dplyr; as_tibble() does the same job.

Before using rename(), we need to know the existing column names. This is done using colnames().

colnames() in R
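The steps above can be sketched roughly as follows. To keep the sketch runnable without the hflights package, it builds a small stand-in data frame with a handful of hflights-style column names; the name flights_df and all values are made up for illustration. It also uses as_tibble() rather than tbl_df().

```r
library(dplyr)

# Hypothetical stand-in for hflights (illustrative values only).
# With hflights installed you would instead run:
#   library(hflights); tbl <- as_tibble(hflights)
flights_df <- data.frame(
  Month = c(1L, 1L),
  DepTime = c(1400L, 1401L),
  ArrTime = c(1500L, 1501L),
  FlightNum = c(428L, 428L),
  TailNum = c("N576AA", "N557AA"),
  ActualElapsedTime = c(60L, 60L),
  AirTime = c(40L, 45L),
  ArrDelay = c(-10L, -9L),
  stringsAsFactors = FALSE
)

# Convert the data frame to a tibble called tbl
tbl <- as_tibble(flights_df)

# List the existing column names before renaming anything
colnames(tbl)
```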

dplyr rename column

Let’s use dplyr rename() to modify column names in a dataframe or a tibble.

We will now modify only those column names in tbl that end with the string “Time”.

First, let us select those specific columns and save them as tbl_times.

dplyr select()
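One way to write this selection is with the ends_with() helper inside select(). This is a sketch on a small made-up tibble with hflights-style column names, not the full hflights data.

```r
library(dplyr)

# Made-up one-row tibble with a few hflights-style columns
tbl <- tibble(
  Month = 1L, DepTime = 1400L, ArrTime = 1500L,
  FlightNum = 428L, ActualElapsedTime = 60L, AirTime = 40L
)

# Keep only the columns whose names end with "Time"
tbl_times <- select(tbl, ends_with("Time"))

colnames(tbl_times)
# "DepTime" "ArrTime" "ActualElapsedTime" "AirTime"
```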

Rename single column

Now, tbl_times contains four columns: DepTime, ArrTime, ActualElapsedTime and AirTime.

Let’s change the DepTime column name to DepartureTime using dplyr rename(). The new name goes on the left of the equals sign and the existing name on the right.

apply dplyr rename() on single column of a dataframe
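A minimal sketch of the call, assuming tbl_times holds the four columns listed above (the values here are made up):

```r
library(dplyr)

# Made-up stand-in for tbl_times
tbl_times <- tibble(
  DepTime = 1400L, ArrTime = 1500L,
  ActualElapsedTime = 60L, AirTime = 40L
)

# rename(data, new_name = old_name); the result is printed, not stored
rename(tbl_times, DepartureTime = DepTime)
```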

Verify the column names after applying the dplyr rename() function.

colnames after applying rename in R

Remember that dplyr operations return a new object and do not modify the original, so the change does not take effect unless you save the result back to a variable.

So, if you want the renamed column to stay in your tibble, you need to assign the result of rename() back to it.

verifying column names after applying dplyr rename
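The difference between calling rename() and saving its result can be sketched like this, again on a made-up stand-in for tbl_times. The helper variable names_before is introduced only for illustration.

```r
library(dplyr)

tbl_times <- tibble(
  DepTime = 1400L, ArrTime = 1500L,
  ActualElapsedTime = 60L, AirTime = 40L
)

# Calling rename() without assignment leaves tbl_times unchanged
rename(tbl_times, DepartureTime = DepTime)
names_before <- colnames(tbl_times)   # first name is still "DepTime"

# Assigning the result back makes the new name stick
tbl_times <- rename(tbl_times, DepartureTime = DepTime)
colnames(tbl_times)                   # first name is now "DepartureTime"
```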

Hope this dplyr rename example is clear.

Using dplyr select() to rename a column:

We can use dplyr select() to rename a dataframe column as well.

The main difference is that select() keeps only the variables you specify, while rename() keeps all variables of the dataframe intact.

What I mean is, if my dataframe has col1, col2, col3 and col4, and I rename col1 to column1 using select(), then only column1 will be present in the resulting dataframe.

If I use rename(), then column1, col2, col3 and col4 will all be present in the resulting dataframe. Columns that aren’t mentioned in the rename() call are simply left untouched.

See the example below. I have renamed ArrTime to ArrivalTime using select(), but tbl_times now contains ArrivalTime only! Hence, it is better to use rename() rather than select() to modify column names.

rename column using dplyr select
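The contrast between the two functions can be sketched side by side. The tibble is a made-up stand-in, and the result names via_select and via_rename are chosen here for illustration.

```r
library(dplyr)

tbl_times <- tibble(
  DepTime = 1400L, ArrTime = 1500L,
  ActualElapsedTime = 60L, AirTime = 40L
)

# select() renames ArrTime but drops every column not mentioned
via_select <- select(tbl_times, ArrivalTime = ArrTime)
colnames(via_select)   # "ArrivalTime"

# rename() renames ArrTime and keeps the other three columns
via_rename <- rename(tbl_times, ArrivalTime = ArrTime)
colnames(via_rename)   # "DepTime" "ArrivalTime" "ActualElapsedTime" "AirTime"
```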

dplyr rename columns

This is similar to the code for renaming a single column that we saw above, except that we now pass several new_name = old_name pairs, separated by commas.

rename multiple columns with dplyr rename

Let’s see the code for dplyr rename multiple columns in action.
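As a sketch, renaming two columns in one call could look like this. The choice of DepTime and ArrTime is an assumption made for illustration, and the tibble is a made-up stand-in.

```r
library(dplyr)

tbl_times <- tibble(
  DepTime = 1400L, ArrTime = 1500L,
  ActualElapsedTime = 60L, AirTime = 40L
)

# Several new_name = old_name pairs in a single rename() call
tbl_times <- rename(
  tbl_times,
  DepartureTime = DepTime,
  ArrivalTime = ArrTime
)

colnames(tbl_times)
# "DepartureTime" "ArrivalTime" "ActualElapsedTime" "AirTime"
```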

Imagine that you want to rename hundreds of columns at once. Typing every pair into rename() is not a good option in that scenario.

This is where the three variants of dplyr rename(), namely rename_all(), rename_if() and rename_at(), come in handy. In dplyr 1.0.0 and later these variants are superseded by rename_with(), but they still work.

rename_at()

Use rename_at() to rename multiple selected columns at once.
In this example, we want to modify the column names that start with “Arr” by replacing “Arr” with “Arrival”.

We select the columns whose names start with “Arr” inside the vars() function, and then use the str_replace() function from the stringr package to replace “Arr” with “Arrival” inside the funs() function. Note that funs() is deprecated since dplyr 0.8.0; current versions accept a function or a ~ formula in its place.

The dot passed as the first argument to str_replace() is a placeholder for the names of the columns returned by vars().

So, the ArrTime and ArrDelay columns will be renamed to ArrivalTime and ArrivalDelay.

Check out the result of applying rename_at() here:

rename_at example
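A sketch of this rename_at() call on a small made-up tibble. It passes a ~ formula instead of funs(), so it may differ from the code in the original screenshot; the result name tbl_arr is chosen for illustration.

```r
library(dplyr)
library(stringr)

# Made-up tibble with two "Arr" columns and two others
tbl <- tibble(DepTime = 1400L, ArrTime = 1500L, ArrDelay = -10L, DepDelay = 0L)

# vars() picks the columns; the formula receives their names as "."
tbl_arr <- rename_at(
  tbl,
  vars(starts_with("Arr")),
  ~ str_replace(., "Arr", "Arrival")
)

colnames(tbl_arr)
# "DepTime" "ArrivalTime" "ArrivalDelay" "DepDelay"
```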

rename_if()

Use rename_if() to change the names of dataframe columns that satisfy a logical condition.

Below is an example that picks the columns whose data type is numeric (such as integer and double) and applies the str_replace() function to their names.

All numeric columns whose names contain the string “Num” are renamed by this operation. For example, FlightNum is changed to FlightNumber!

rename_if example
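A sketch of the rename_if() call, using is.numeric as the condition and a ~ formula for the replacement. The tibble and the result name tbl_num are made up for illustration. TailNum is a character column, so it is left alone here.

```r
library(dplyr)
library(stringr)

# FlightNum and Distance are numeric, TailNum is character
tbl <- tibble(FlightNum = 428L, TailNum = "N576AA", Distance = 224L)

# Only columns for which is.numeric() is TRUE have their names rewritten
tbl_num <- rename_if(tbl, is.numeric, ~ str_replace(., "Num", "Number"))

colnames(tbl_num)
# "FlightNumber" "TailNum" "Distance"
```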

rename_all()

Use rename_all() to change the names of all dataframe columns, without any logical condition.

For example, suppose you would like to change column names regardless of whether the columns are numeric: wherever a name contains “Num”, you want to replace it with “Number”.

After this operation, you can see that TailNum changed to TailNumber. FlightNumber, however, became FlightNumberber: it had already been renamed in the rename_if() step, and the “Num” inside “FlightNumber” was matched and replaced a second time.

rename_all example
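The behaviour described above can be reproduced on a made-up tibble that already contains a FlightNumber column. The second call is an optional variation, not from the original post: anchoring the pattern with "$" so that only names ending in “Num” are changed.

```r
library(dplyr)
library(stringr)

# FlightNum was already renamed to FlightNumber in an earlier step
tbl <- tibble(FlightNumber = 428L, TailNum = "N576AA", Distance = 224L)

# Every column name is passed through str_replace()
tbl_all <- rename_all(tbl, ~ str_replace(., "Num", "Number"))
colnames(tbl_all)
# "FlightNumberber" "TailNumber" "Distance"

# Variation: "Num$" matches only at the end of a name
tbl_safe <- rename_all(tbl, ~ str_replace(., "Num$", "Number"))
colnames(tbl_safe)
# "FlightNumber" "TailNumber" "Distance"
```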

Rename columns with base R functions

Along with dplyr rename(), you can also rename columns of a dataframe in base R, using a logical vector or an index on the names vector.

Let us now change the column name “Month” of hflights to “month” using a logical vector.

  1. Generate a logical vector by comparing the names vector to the target name.
  2. Use the logical vector as an index to assign the new column name to the matching element of the names vector.

rename dataframe column using boolean index selection
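The two steps can be sketched in base R as follows. The data frame df is a made-up stand-in for hflights, and is_month is a helper name chosen for illustration.

```r
# Made-up stand-in for hflights
df <- data.frame(Year = 2011L, Month = 1L, Distance = 224L)

# Step 1: TRUE where the column name equals "Month", FALSE elsewhere
is_month <- names(df) == "Month"

# Step 2: assign the new name to the element selected by the logical vector
names(df)[is_month] <- "month"

names(df)
# "Year" "month" "Distance"
```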

Another approach to rename columns of a dataframe is by using the appropriate index on the names vector.

Let us now modify the column name “Distance” to “distance”. Since the column “Distance” has an index of 16, assign the new column name “distance” to the element of the names vector selected using the index.

rename column using index of dataframe
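A sketch of the index-based approach. In this made-up three-column stand-in, Distance sits at position 3; in hflights itself the text above gives its position as 16, so the index there would be 16.

```r
# Made-up stand-in for hflights; Distance is the 3rd column here
df <- data.frame(Year = 2011L, Month = 1L, Distance = 224L)

# Assign the new name to the element of names(df) at that position
names(df)[3] <- "distance"

names(df)
# "Year" "Month" "distance"
```

Hard-coded positions break silently if the column order changes, so looking the position up by name first is often the safer choice.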

That’s it for now on dplyr rename(). We will learn about advanced topics of dplyr in my data science online course.
