Intro to Python for Data Science

The aim of this post is to provide a quick intro to Python for Data Science. Python is a general-purpose programming language and is not specific to data science. Let us start learning the basic concepts in Python needed to perform tasks for any typical data science project.

Operators

An operator in a programming language is a symbol that tells the compiler or interpreter to perform a specific mathematical, relational or logical operation and produce the final result.

Python has multiple operators that allow you to express calculations between variables.

Let’s discuss arithmetic operators first.

Below is a list of the main arithmetic operators in Python.

  • Addition +
  • Subtraction -
  • Multiplication *
  • Division /
  • Modulo %
  • Exponent (or power) **

Let us try some simple expressions that use each of these operators, and see the result of each expression when I run it in the Python console.
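As a minimal sketch of such expressions (the operand values 7 and 2 are arbitrary choices for illustration), each operator can be tried like this:

```python
# Each arithmetic operator applied to two sample integers
a = 7
b = 2

total = a + b        # addition
difference = a - b   # subtraction
product = a * b      # multiplication
quotient = a / b     # division returns a float in Python 3
remainder = a % b    # modulo: the remainder after division
power = a ** b       # exponent

print(total, difference, product, quotient, remainder, power)
# prints: 9 5 14 3.5 1 49
```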

When we have multiple arithmetic operators in the same expression, how does the Python interpreter process it?

The Python interpreter needs to determine the order in which the calculations will be performed. The following are two different ways that someone may try to calculate an average (but only one is correct):
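A sketch of two such attempts, using three made-up numbers (10, 20 and 30); the variable names are illustrative only:

```python
# Attempt 1: without parentheses, only 30 is divided by 3
attempt_one = 10 + 20 + 30 / 3

# Attempt 2: parentheses force the sum to be computed first
attempt_two = (10 + 20 + 30) / 3

print(attempt_one)  # 40.0
print(attempt_two)  # 20.0, the actual average
```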

You may remember PEMDAS from your high school mathematics.

Python follows the PEMDAS rule to decide which parts of an expression take priority. It splits the expression into subexpressions and evaluates them in that order of priority; operators at the same level, such as multiplication and division, are evaluated from left to right. Here’s the ordering:

  • Parentheses -> Exponent -> Multiplication or Division -> Addition or Subtraction
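To see this ordering at work, here is a small sketch with arbitrary numbers; the comments trace each evaluation step:

```python
# Exponent first, then multiplication, then addition
no_parentheses = 2 + 3 * 4 ** 2      # 2 + 3 * 16 = 2 + 48 = 50

# Parentheses are evaluated before anything else
with_parentheses = (2 + 3) * 4 ** 2  # 5 * 16 = 80

# Division and multiplication share a level and run left to right
left_to_right = 20 / 4 * 5           # 5.0 * 5 = 25.0

print(no_parentheses, with_parentheses, left_to_right)
```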

Variables

Variables are named placeholders for values that we want to use in further processing within a program. To create a variable, we assign a value to a name using the assignment operator =.

The usual convention is to write variable names in lower case and to separate multiple words with underscores.
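A short sketch of variable assignment following this naming convention (the names and values are invented for illustration):

```python
# Lowercase names, with underscores separating words
age = 30
unit_price = 20.35
country_name = "Luxembourg"

# A variable can be reused in later expressions
total_price = unit_price * 2
print(total_price)  # 40.7
```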

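Please note that we cannot use Python’s reserved words (keywords such as for, if, or class) as variable names; doing so raises a SyntaxError. Names of built-in functions and types, such as list or str, can be reassigned without a syntax error, but it is recommended not to do so, because the built-in then becomes inaccessible under that name.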
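One way to check whether a name is reserved is the standard-library keyword module. A minimal sketch, with the two names being tested chosen only as examples:

```python
import keyword

# "for" is a reserved word, so it cannot be a variable name
print(keyword.iskeyword("for"))   # True

# "list" is a built-in name, not a reserved word
print(keyword.iskeyword("list"))  # False
```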

Datatypes

Let us now briefly discuss the data types in this Python data science tutorial.

  • Integer
  • Float
  • Boolean
  • String

Integer and Float are numerical data types. A float data type is used to represent fractional or decimal values, like 20.35. A Boolean data type can have only two values, True or False. A string data type can be created using single quotes (' '), double quotes (" "), or triple quotes (""" """). Among these three representations, double quotes are commonly used to create strings. Triple quotes are used when we need to create multi-line strings, like below.
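A small sketch of the three quoting styles; the variable names and text are placeholders:

```python
single = 'data'
double = "science"

# Triple quotes let a string continue across line breaks
multi_line = """This string
spans two lines."""

print(single, double)
print(multi_line)
```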

 

Note that Python is dynamically typed: the way in which you enter a value tells Python what data type it is. We don’t need to declare data types explicitly, as we often do in a statically typed language like Scala. Python will use the data type to determine how the value should be handled. For example, Python allows integer variables to be divided, but not string variables.
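The division example can be sketched as follows; count and label are hypothetical variable names:

```python
count = 10    # no quotes: Python treats this as an integer
label = "10"  # the quotes make this a string

print(count / 2)  # 5.0

try:
    label / 2  # division is not defined for strings
except TypeError as error:
    print("TypeError:", error)
```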

Python has a data type called Boolean that helps express conditional logic. There are only two Boolean values: True and False. Because they’re words, Boolean values may look like strings, but they’re an entirely separate class. For example, string operations like concatenation won’t work with Booleans.

The following code example assigns True to t and False to f:
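A minimal version of that assignment, with one comparison added as an illustration of where Boolean values come from:

```python
t = True
f = False

print(t, f)  # True False

# Comparisons also produce Boolean values
is_greater = 5 > 3
print(is_greater)  # True
```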

Often while working with Python in data science projects, we will deal with complex datasets. To process them, we will need to use compound data types, namely lists, tuples, and dictionaries.

Lists

No Python tutorial for beginners would be complete without discussing lists. A list is an ordered collection of data. It is similar to an array in other programming languages, with one important difference: a list can store values of multiple data types. So it is like a container that can hold multiple values of different data types.

To create an empty list, assign a pair of empty brackets [] to a variable. You may also create a list with initialization, as shown in the code snippet below:

A Python list can store multiple data types. For example, a single list can contain both a string and an int.
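A sketch of the three ways of creating a list described above; the list names and contents are made up for illustration:

```python
# An empty list
empty_list = []

# A list created with initial values
countries = ["India", "Japan", "Brazil"]

# A list holding a string and an int together
mixed = ["Luxembourg", 105803]

print(empty_list, countries, mixed)
```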

Access elements of a list

Now that we know how to create a list and add values to it, let’s learn how to access and work with the values in a list we have created. Each value in the list has an index, or position, associated with it. A list starts at index 0 and goes till one less than the length of the list. If you have a list with 5 values, for example, the indexes will range from 0 to 4.

To access the first element in a list, we use the index value 0. The second element is accessed with index value 1, the third element with index value 2, and so on. To return the value that has a given index, pass the integer for the index into bracket notation. In the following code, we access the first, second, and third elements in the list countries.

We assign each of the accessed values to new variables:
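A minimal sketch of this, assuming a countries list with five illustrative entries:

```python
countries = ["India", "Japan", "Brazil", "Kenya", "Canada"]

# Indexes start at 0, so index 0 is the first element
first_country = countries[0]
second_country = countries[1]
third_country = countries[2]

print(first_country, second_country, third_country)
# prints: India Japan Brazil
```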

The Python interpreter expects that the bracketed integer value will be within the list’s range of indexes.

Passing in a non-integer value or an integer value outside of the range of indexes (e.g. index 7 for a list only containing 5 elements) will result in an error.

We can also access elements of a list using negative indexes.

listname[-1] will return the last element of a list, listname[-2] will return the second-to-last element, and so on.
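The sketch below shows negative indexes alongside the two error cases mentioned earlier, again assuming an illustrative five-element countries list:

```python
countries = ["India", "Japan", "Brazil", "Kenya", "Canada"]

last_country = countries[-1]
second_to_last = countries[-2]
print(last_country, second_to_last)  # Canada Kenya

try:
    countries[7]  # only indexes 0 to 4 exist
except IndexError as error:
    print("IndexError:", error)

try:
    countries["1"]  # a string is not a valid list index
except TypeError as error:
    print("TypeError:", error)
```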

Slicing

If we have a list containing thousands of values and want to retrieve the ones between index 10 and 500, this would be a lot of work with what we know so far.

Lists have a feature called slicing that allows you to return all of the values between a starting index and an ending index.

When you slice a list, you return a new list containing just the values you’re interested in.

The value at the starting index and all of the values in between will be returned. The value at the ending index will not.

To slice a list, pass the starting and ending index positions into the brackets as integer values, separated by a colon :.

In the following code, we use the slice 2:4 to return a new list containing the values at indices 2 and 3:
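A sketch of that slice, assuming the same kind of illustrative countries list:

```python
countries = ["India", "Japan", "Brazil", "Kenya", "Canada"]

# Start at index 2, stop before index 4
selected = countries[2:4]

print(selected)        # ['Brazil', 'Kenya']
print(len(countries))  # 5, the original list is unchanged
```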

List methods

append()

Use the append() method to add a value to the end of a list.

The countries list will then contain the new value as its last element.
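A minimal sketch of append() in action, assuming a small illustrative countries list:

```python
countries = ["India", "Japan", "Brazil"]

# append() modifies the list in place
countries.append("Kenya")

print(countries)  # ['India', 'Japan', 'Brazil', 'Kenya']
```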

Suppose I have declared two lists, list1 = [1,2,3,4,5] and list2 = [6,7,8,9].

Note that list1.append(list2) will modify list1 to be a nested list containing the elements [1,2,3,4,5,[6,7,8,9]].
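The nesting can be checked with a short sketch; the starting values of list1 and list2 are inferred from the result stated above:

```python
list1 = [1, 2, 3, 4, 5]
list2 = [6, 7, 8, 9]

# The whole of list2 becomes a single element of list1
list1.append(list2)

print(list1)      # [1, 2, 3, 4, 5, [6, 7, 8, 9]]
print(list1[-1])  # [6, 7, 8, 9]
```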

extend()

If we want to add all elements of a list to another list, use the list.extend() method.

For example, if list3 is [1,2,3,4,5] and list4 is [6,7,8,9], then list3.extend(list4) will modify list3 to have the elements [1,2,3,4,5,6,7,8,9].
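A sketch of extend(), assuming starting values for list3 and list4 that produce the result described above:

```python
list3 = [1, 2, 3, 4, 5]  # assumed starting values
list4 = [6, 7, 8, 9]

# Each element of list4 is added to list3 individually
list3.extend(list4)

print(list3)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```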

len()

To find the number of elements in a list, use the len() function.
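For instance, with an illustrative five-element list:

```python
countries = ["India", "Japan", "Brazil", "Kenya", "Canada"]

country_count = len(countries)
print(country_count)  # 5
print(len([]))        # 0, an empty list has no elements
```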

membership operator

Python has the in operator, which allows us to check whether a list contains a specific element.

The in operator checks whether a certain value occurs within a list, and returns True if it does. If not, it returns False.

We can also use the in operator to check whether a key occurs in a dictionary.
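A short sketch of both uses; the list contents and the dictionary name country_values are hypothetical:

```python
countries = ["India", "Japan", "Brazil"]

print("Japan" in countries)   # True
print("France" in countries)  # False

# With a dictionary, in checks the keys, not the values
country_values = {"Luxembourg": 105803}
print("Luxembourg" in country_values)  # True
print(105803 in country_values)        # False
```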

Tuples

A Python tuple can be defined by enclosing values in parentheses. Unlike the elements of a list, the elements of a tuple cannot be changed after it is created.
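The difference from a list can be sketched like this (the names point and numbers are illustrative):

```python
point = (3, 4)
print(point[0])  # 3, tuples are indexed like lists

try:
    point[0] = 10  # tuples do not allow item assignment
except TypeError as error:
    print("TypeError:", error)

# The same assignment works on a list
numbers = [3, 4]
numbers[0] = 10
print(numbers)  # [10, 4]
```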

Dictionaries

A Python dictionary is a key-value data store. It is similar to a list in some ways. A dictionary has indexes, but the indexes are not sequential numbers. In a dictionary, we can create our own indexes (keys) using values of any hashable data type, such as strings or numbers.

To create an empty dictionary, assign a pair of curly brackets {} to a variable:

To add values to an existing dictionary, we specify the index to the left of the equals sign, and the value it should have on the right side. We use square brackets ([]) to specify the index.

An index and its value are together called a key/value pair. To look up the value for a key, we pass the key in square brackets after the dictionary name.

For a dictionary that stores the value 105803 under the key “Luxembourg”, such a lookup would return 105803.
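The steps above can be sketched as follows; the dictionary name country_values is an assumption, since only the key and value are given in the text:

```python
# Create an empty dictionary
country_values = {}

# Add a key/value pair: key in brackets, value on the right
country_values["Luxembourg"] = 105803

# Look up the value stored under a key
looked_up = country_values["Luxembourg"]
print(looked_up)  # 105803
```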

A dictionary key can be a string, integer, or float.

We can also create a dictionary and add elements to it in a single step. We do this by entering, inside the curly brackets, the dictionary key, then a colon (:), then the value. We separate each key/value pair with a comma.

We can modify the value we’ve associated with a key:
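A sketch of single-step creation followed by a modification; the dictionary name and the "Example" entry are placeholders:

```python
# Key/value pairs written as key: value, separated by commas
country_values = {"Luxembourg": 105803, "Example": 42}

# Assigning to an existing key replaces its value
country_values["Example"] = 43

print(country_values)  # {'Luxembourg': 105803, 'Example': 43}
```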

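Unlike lists, dictionaries are not indexed by position: values are looked up by key. Since Python 3.7, a dictionary does preserve the order in which its keys were inserted. Dictionaries are useful whenever we want the key to be something unique that we care about.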

The type() function

Use the type() function to find out the data type of a variable.

Let’s declare an integer variable and look at its type.

Now let’s invoke the type() function on a string variable.

This will return the string class ‘str’. If we check the data type of a Boolean variable, we’ll see class ‘bool’, shorthand for Boolean.
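A compact sketch covering the integer, string, and Boolean cases; the variable names and values are illustrative:

```python
age = 30
name = "Python"
is_ready = True

print(type(age))       # <class 'int'>
print(type(name))      # <class 'str'>
print(type(is_ready))  # <class 'bool'>
```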

Type conversions

We can convert between different data types using these type conversion functions in Python.

  • int(): convert to integer value
  • float(): convert to float value
  • str(): convert to string value
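A sketch of the three conversion functions, using arbitrary sample values:

```python
as_int = int("42")         # string to integer
truncated = int(3.9)       # float to integer drops the fractional part
as_float = float("20.35")  # string to float
as_text = str(100)         # integer to string

print(as_int, truncated, as_float)  # 42 3 20.35
print(as_text + " rows")            # 100 rows
```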

Python comments

You can organize your Python code by inserting comments. Comments are notes that help people understand the code.

The Python interpreter recognizes comments and ignores them; it won’t attempt to execute them along with the rest of the code.

These are the two main types of comments you can add to your code:

  • inline comment
  • single-line comment

An inline comment is useful whenever you want to add more detail to a specific statement.

To add an inline comment, at the end of a statement, start with the hash character (#) and then add your comment.

To add a single-line comment, start the line with a hash (#), write the comment, and end it with a line break.

A single-line comment spans the full line and is useful when you want to separate your code into sections.

While you don’t need to add a space after the hash character (#), this is considered a good style and makes your comments cleaner and easier to read.
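Both comment styles can be seen in this small sketch (the variable names are invented for the example):

```python
# Settings section: a single-line comment used as a section marker
row_limit = 100

row_count = row_limit * 2  # inline comment: double the limit

print(row_count)  # 200
```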

The concepts discussed so far provide a beginner-friendly intro to Python. Please practice all of these concepts using a Jupyter notebook.

File Handling

In data science, datasets are often represented in the form of files that we need to work with. So a common task for a data scientist is to work with files using Python. To open a file, use the open() function.

We will pass this function two arguments (inputs) in the parentheses, in the following order (the mode is optional and defaults to read mode):

  • the name of the file (as a string)
  • the mode of working with the file (as a string)

For example, to open a file named “file1.txt” in read mode, we write the following:

The open() function returns a file object, which we assign to a variable called fo. This object allows us to call methods specific to file objects. Use the read() method to read the contents of file1.txt and assign the result to a variable called data. The content of file1.txt will be stored in the variable data in the form of a string.
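A runnable sketch of these steps. As an assumption made only so the example works anywhere, it first writes a small sample file1.txt into a temporary folder; the sample contents are invented:

```python
import os
import tempfile

# Assumption: create a sample file so there is something to read
folder = tempfile.mkdtemp()
path = os.path.join(folder, "file1.txt")
with open(path, "w") as sample:
    sample.write("first line\nsecond line\n")

fo = open(path, "r")  # open the file in read mode
data = fo.read()      # read the whole file into one string
fo.close()            # release the file when finished

print(type(data))  # <class 'str'>
print(data)
```

Opening the file in a with block, as done for the sample file here, closes it automatically and is a common alternative to calling close() by hand.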

Strings

An intro to learning Python would be incomplete without knowing about strings and string manipulation. Let us learn some of the frequently used methods of the string class.

split()

We can use the split() method to turn a string object into a list of strings. The split() method takes a string input corresponding to the delimiter or separator. This delimiter determines how the string is split into elements in a list.

Note that if you don’t pass any delimiter, the split() method splits on whitespace: any run of consecutive spaces, tabs, or newlines is treated as a single separator.
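A sketch contrasting an explicit delimiter with the default behaviour; the sample strings are made up:

```python
row = "India,Japan,Brazil"
fields = row.split(",")
print(fields)  # ['India', 'Japan', 'Brazil']

# No argument: runs of whitespace count as one separator
sentence = "  data   science  with Python "
words = sentence.split()
print(words)  # ['data', 'science', 'with', 'Python']

# An explicit single space keeps the empty string between two spaces
pieces = "a b  c".split(" ")
print(pieces)  # ['a', 'b', '', 'c']
```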

upper()

The upper() method will generate a new string containing all the letters of the original string converted into upper case. Since strings are immutable, the original string will remain as is.

lower()

The lower() method will generate a new string containing all the letters of the original string converted into lower case.
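A short sketch of upper() and lower() together, showing that the original string is left untouched (the sample text is arbitrary):

```python
name = "Data Science"

upper_name = name.upper()
lower_name = name.lower()

print(upper_name)  # DATA SCIENCE
print(lower_name)  # data science
print(name)        # Data Science, the original is unchanged
```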

join()

The join() method takes a list, tuple, or string whose elements are joined into a single string. A simple example is ''.join(['d','e','f']).

This will generate the string 'def' from the list ['d','e','f'].

So the join method can be used to quickly convert a list of characters to a string.

Note that the string to the left of the join method acts as a separator while generating the output. So, if you use ','.join(['d','e','f']), it will generate the string 'd,e,f'.
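Both results can be reproduced with a few lines; the tuple example at the end is an extra illustration:

```python
letters = ["d", "e", "f"]

no_separator = "".join(letters)
comma_separated = ",".join(letters)

print(no_separator)     # def
print(comma_separated)  # d,e,f

# join() also accepts a tuple of strings
dashed = "-".join(("a", "b"))
print(dashed)  # a-b
```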

replace()

To replace specific parts of a string, we use the replace() method.
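A minimal sketch with an invented sentence; note that replace() returns a new string rather than changing the original:

```python
text = "I like Java"

# First argument: text to find; second argument: its replacement
new_text = text.replace("Java", "Python")

print(new_text)  # I like Python
print(text)      # I like Java
```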

format()

An alternative to using replace() would be to use the format() method. format() is a built-in string method.

With format(), we mark the locations in the string where values should be inserted.

In a placeholder, 0 corresponds to the first argument within the format method, 1 to the second, and so on.

To denote the location of an argument, we use curly braces {} around the argument index or name.

Instead of using 0 and 1, we can specify a named parameter within the format method.

A named parameter allows us to insert its value at every place where that name occurs in our string.
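A sketch of both placeholder styles; the sentences and the parameter names lang and task are invented for illustration:

```python
# Positional placeholders: {0} is the first argument, {1} the second
positional = "{0} is used for {1}".format("Python", "data science")
print(positional)  # Python is used for data science

# Named placeholders: {lang} is filled in at every occurrence
named = "{lang} is popular; I use {lang} for {task}".format(
    lang="Python", task="data science"
)
print(named)  # Python is popular; I use Python for data science
```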

string concatenation

We can concatenate two strings using the + operator.
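For example, with two arbitrary strings:

```python
first = "Data"
second = "Science"

# A space has to be added explicitly between the two strings
combined = first + " " + second

print(combined)  # Data Science
```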

Loops

Loops are used in Python to repeat an operation. For example, if you want to add 1 to each element of a list, the Python language allows you to do this with a for loop.

for loop

We can break the for loop down into two main components: the for loop itself, and the loop body that contains the code we want to run during each iteration.

  • Syntax – the words for and in need to be included in the statement
  • Iterator variable – the variable name you decide to use to refer to each element in the list
  • Sequence – the list (or other sequence) you want to iterate over
  • Colon – the for statement must end with a colon (:)

Loop Body

  • Indentation – every line of code within the loop body must be indented consistently; four spaces is the usual convention.
  • Logic – the actual code we want to execute for each element.

Python assigns the next element of the sequence to the iterator variable at the start of each iteration of the loop.

Note that we can have nested for loops within our Python scripts.
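A sketch of the add-1-to-each-element example, followed by a small nested loop; all names and values are illustrative:

```python
numbers = [1, 2, 3]
incremented = []

# number takes each value of numbers in turn
for number in numbers:
    incremented.append(number + 1)

print(incremented)  # [2, 3, 4]

# A nested loop: the inner loop runs fully for each outer element
pairs = []
for letter in ["a", "b"]:
    for digit in [1, 2]:
        pairs.append(letter + str(digit))

print(pairs)  # ['a1', 'a2', 'b1', 'b2']
```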

If elif else statement

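We can use an if statement to test whether a certain condition is True and then perform an action.

The condition in our if statement will evaluate to either True or False, and the specified code will run only when it is True. An elif branch tests a further condition when the earlier ones were False, and an else branch runs when none of the conditions is True. We can also have if statements nested within a for loop.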

We need to format if statements in the following way:

  • End the conditional statement with a colon (:)
  • Indent the code (that we want to run when True) below the conditional statement
  • if statements can contain multiple lines in the body, as long as their indentation aligns.
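A sketch of an if/elif/else chain inside a for loop; the scores and the thresholds 80 and 50 are arbitrary choices for the example:

```python
scores = [35, 60, 85]
labels = []

for score in scores:
    if score >= 80:
        labels.append("high")
    elif score >= 50:
        # Checked only when the first condition was False
        labels.append("medium")
    else:
        # Runs when none of the conditions above was True
        labels.append("low")

print(labels)  # ['low', 'medium', 'high']
```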

That’s all for now on this introductory tutorial to Python for data science. I hope you liked it. For learning advanced concepts of Python, please subscribe to my course.

Do you have any comments on this article? Feel free to share them.
