Intermediate Python for Data Science

In our previous post, we got started with an introduction to Python for data science.

In this post, we will go a bit deeper and learn the concepts of intermediate Python for data science.

We will discuss functions, modules, importing and exporting from Python, working with csv files, and important concepts in Python collections such as lists, sets and dictionaries.

While this article discusses these concepts, note that a quick way to learn them, along with other data science topics, is to attend my data science online course.

How to define functions in Python

A function is a reusable piece of code packaged together under a name. We can reuse a function by calling it with relevant parameters.

The parameters that a function takes are called the inputs of the function, and the result that it returns is called the output.

To define a function, we need to:

  1. use the def keyword
  2. provide a name for the function
  3. provide input arguments in parentheses, in case the function needs to accept arguments
  4. end the header line with a colon (:)
  5. provide the function body, indented under the header
  6. provide a return value, if the function should hand a result back to the caller

Let us declare a function that returns the simple interest, considering the points mentioned above.
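
A minimal sketch of such a function is given here. The parameter names (principal, rate, years) and the convention that rate is an annual percentage are assumptions made for illustration.

```python
def simple_interest(principal, rate, years):
    """Return the simple interest on principal at rate percent per year for years years."""
    interest = principal * rate * years / 100
    return interest
```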

We should take note of the indentation of the function. After the colon, we indent the remainder of the function by one level, which by convention is 4 spaces (most editors insert these when we press the Tab key). This tells Python which part of the code belongs to the function, so indentation changes how Python interprets the code.

Let us execute the simple_interest function.
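
A call might look like the sketch below. The function is repeated so that the snippet runs on its own, and the input values are made up.

```python
def simple_interest(principal, rate, years):
    return principal * rate * years / 100

# Call the function with positional arguments and store what it returns.
interest = simple_interest(1000, 5, 2)
print(interest)  # 100.0
```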

Advantages of using functions

Other than reusability, there are 3 main advantages of using functions:

  • They allow us to use other people’s code without needing a deep understanding of how it was written (e.g., we use the print() function without reading the code inside it). We call this information hiding.
  • They break down complex logic into smaller components or modules. Instead of writing very lengthy and complicated code, we can progress function by function. A large task written as a set of functions is easier to manage and to test than one long block of code with the same behavior. We refer to this as modularity. Modularity also makes it easier for someone else to read, understand, use, and build upon our code. There are two common strategies for modular programming:
    • Transforming our code into functions
    • Using object-oriented programming
  • They streamline our code and make it easier to maintain. Programmers reuse the same functions in multiple situations across a project. This means that they generalize each function as much as possible to maximize its usefulness. We call this process abstraction, which is an important part of reducing our code’s complexity, especially for larger projects.

How to define optional arguments in Python functions?

Suppose that, in the above function, I want to make rate optional with a default value of 8. Then I need to give the rate parameter a default value in the function definition. If I don’t pass rate while calling this function, it will be taken as 8. If I pass a rate of 9, then 9 will be used. Defining optional parameters this way gives developers more flexibility.
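
Here is one way this could look. The signature is an assumption: parameters with default values must come after those without, so rate is placed last in this sketch.

```python
def simple_interest(principal, years, rate=8):
    """rate is optional and defaults to 8 percent per year."""
    return principal * rate * years / 100

print(simple_interest(1000, 2))          # 160.0, rate falls back to 8
print(simple_interest(1000, 2, 9))       # 180.0, rate passed by position
print(simple_interest(1000, 2, rate=9))  # 180.0, rate passed by keyword
```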

Note that we can also call a function from another function.

What is a module in Python?

A module is a collection of functions and variables that have been bundled together in a single file. This file is generally centered around a specific theme. For example, math and csv are modules in the standard library, and we import libraries such as numpy and pandas in the same way.

Modules improve the readability of our code by abstracting away the implementation while still letting us understand what the code does. To load a module, we use the import statement.

For readability, it’s usually a good idea to import the modules we’ll need at the beginning of our script. Once we import a module, we get access to all the functions and variables within the module.
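
As a small illustration, the sketch below imports the standard-library math module (chosen here only because it needs no installation) and uses a function and a variable from it.

```python
import math

print(math.sqrt(16))  # 4.0
print(math.pi)        # the constant pi, stored as a module-level variable
```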

How to give an alias while importing a module

Sometimes, modules have long names. This means we have to type the full module name every time we use any of its objects. Instead, we can give the module name an alias:
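
The sketch below uses the standard-library statistics module and the alias stats, both picked only for illustration. The same pattern gives the familiar import numpy as np.

```python
import statistics as stats

marks = [2, 4, 9]
# The alias replaces the full module name in front of the dot.
print(stats.mean(marks))
print(stats.median(marks))
```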

To use a specific function within the math module, we use dot notation: the module name, a dot, and then the name of the function. Built-in functions are the exception; there’s no need to name a module to access them.

Popular Python modules have documentation describing the names of the functions and variables we could use within the module. Generally, whenever you use a built-in module, it’s good practice to reference the documentation.

To summarize:

  • Use the import statement to load a module.
  • Load modules at the beginning of a script.
  • To use a function within a module, remember module.function().
  • When using a built-in function like print() or type(), you do not need to add any module name in front of the function.

Importing specific functions from a module

If we are using only a few functions from a module, we can specify which ones we’d like to use in our import. Python still loads the whole module behind the scenes, so this does not save memory; the benefit is that only the names we ask for are added to our namespace.

For example, even though the numpy module has a lot of functions available, I can import only genfromtxt and the random submodule.

After importing these functions, we won’t need to include the module name when calling the function.

We can directly call genfromtxt and random within our program.

Generally, if we know what functions we want to use, it’s better practice to import specific function names.
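
The same pattern is sketched below with the standard-library math module, so that it runs without installing anything. With NumPy installed, the import described above would read from numpy import genfromtxt, random.

```python
from math import sqrt, pi

# No module prefix is needed for names imported this way.
print(sqrt(49))      # 7.0
print(round(pi, 2))  # 3.14
```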

How to work with csv files in Python?

The csv module is used to work with csv files in Python.

This module has a function called reader(), which takes a file object as an argument and returns an object that represents our data. Follow the steps below to load a csv file into a list:

  • Create a file object that points to the csv file.
  • Pass the file object to the csv.reader() function, which returns an object that represents our data.
  • Call the list() built-in function on the returned object to generate a list of lists.
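
The steps above can be sketched as follows. The file name and its contents are made up, and the snippet writes the sample file first so that it runs on its own.

```python
import csv
import os
import tempfile

# Create a small sample csv file (hypothetical data) to read back.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(sample_path, "w", newline="") as f:
    f.write("name,marks\nasha,82\nravi,75\n")

# 1. open the file, 2. pass the file object to csv.reader(), 3. call list() on the result
with open(sample_path, newline="") as f:
    rows = list(csv.reader(f))

print(rows)  # [['name', 'marks'], ['asha', '82'], ['ravi', '75']]
```

Note that every value comes back as a string, so numeric columns need to be converted before doing arithmetic on them.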

What is a namespace in Python?

A namespace is an area used to keep track of variables.

At any particular point in a Python program, there are several namespaces available.

Each function has its own namespace, called the local namespace, which keeps track of the function’s variables, including function arguments and locally defined variables.

Each module has its own namespace, called the global namespace, which keeps track of the module’s variables, including functions, classes, any other imported modules, and module-level variables and constants.

And there is the built-in namespace, accessible from any module, which holds built-in functions and exceptions.

When a line of code asks for the value of a variable x, Python will search for that variable in all the available namespaces, in order:

local namespace – specific to the current function or class method. If the function defines a local variable x, or has an argument x, Python will use this and stop searching.

global namespace – specific to the current module. If the module has defined a variable, function, or class called x, Python will use that and stop searching.

built-in namespace – global to all modules. As a last resort, Python will assume that x is the name of a built-in function or variable.

If Python doesn’t find x in any of these namespaces, it gives up and raises a NameError exception.
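
The search order can be seen in a short sketch. The function names and the variable undefined_name are hypothetical.

```python
x = "global x"  # lives in the module's global namespace

def show_local():
    x = "local x"  # a local x hides the global one inside this function
    return x

def show_global():
    return x  # no local x here, so Python falls back to the global namespace

print(show_local())   # local x
print(show_global())  # global x
print(len("abc"))     # 3, len is found in the built-in namespace

name_error_seen = False
try:
    print(undefined_name)  # not defined in any namespace
except NameError as err:
    name_error_seen = True
    print("NameError:", err)
```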

What is the dir() function?

Use the dir() function to see which variables and methods are available in a namespace.

When we import a module, say numpy, the module’s name is added to the global namespace, and all the variables and methods defined in the module become accessible through that name for the rest of the Python session.

Since print() is a built-in function, the interpreter makes it available automatically through the built-in namespace. As a result, we have access to print() anywhere in our script.

When we create a variable, the variable is also added to the namespace.
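
A quick way to check this is sketched below. The variable name marks is made up, and math stands in for any imported module.

```python
import builtins
import math

marks = [82, 75, 91]

names = dir()  # names available in the current namespace
print("math" in names)           # True, added by the import statement
print("marks" in names)          # True, added when we created the variable
print("print" in dir(builtins))  # True, print lives in the built-in namespace
```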

We can also use dir() to list the valid names for a specific variable or module in our workspace. Let’s use the dir() function on numpy:
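
The sketch below calls dir() on the standard-library math module instead of numpy, only so that it runs without extra packages. With NumPy installed, dir(numpy) works the same way.

```python
import math

module_names = dir(math)        # every name defined in the math module
print("sqrt" in module_names)   # True
print(len(module_names))        # how many names the module exposes
```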

Local and global variables in Python

Let’s take a look at the below code:
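
The listing below is a sketch consistent with the names used in this section (add(), total, count and l). The values in the list and the body of the loop are assumptions.

```python
l = [10, 20, 30, 40]  # defined outside any function

def add():
    total = 0  # defined inside the function
    count = 0  # defined inside the function
    for value in l:  # the function can read l
        total += value
        count += 1
    return total, count

print(add())  # (100, 4)
print(l)      # the list is still reachable out here

try:
    print(total)  # total only exists while add() is running
except NameError as err:
    print("NameError:", err)
```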

Here, we’ve defined two variables (total, count) of int type and one variable of list type (l). Notice the positioning of the total and count variables compared to the l list.

Variables declared inside a function definition are called local variables.

Let’s examine the total variable. We’ve defined total within the add() function, so within that function we can access it through the local namespace.

However, if we try to access this variable outside the function, Python raises a NameError. Local variables can’t be accessed outside the function in which they are defined.

On the other hand, we’ve defined l as a list of numbers.

We’ve defined it outside the function, which means this list lives in the global namespace and is accessible throughout the entire script.

We can access its values both inside and outside the function. This is called a global variable.

List comprehensions in Python

We learned how to iterate over multiple values using a for loop on a list.

To review, let’s look at a for loop in action:
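
A loop of this kind might look like the sketch below. Only the name abs_diff comes from this section; readings and target are made-up inputs.

```python
readings = [12, 7, 15, 9]
target = 10

abs_diff = []
for reading in readings:
    # Store the absolute difference between each reading and the target.
    abs_diff.append(abs(reading - target))

print(abs_diff)  # [2, 3, 5, 1]
```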


We can also rewrite this for loop to populate the abs_diff list in only one line of code, using a list comprehension.

A list comprehension is a concise way of creating lists. The equivalent of the above for loop, written as a list comprehension, looks like this:
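
A sketch with made-up inputs (readings and target are assumptions; abs_diff is the list named above):

```python
readings = [12, 7, 15, 9]
target = 10

# expression first, then the for clause that supplies each value
abs_diff = [abs(reading - target) for reading in readings]

print(abs_diff)  # [2, 3, 5, 1]
```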

Note that a list comprehension always builds a new list; we cannot use it to add elements to an existing list.

Dict comprehensions in Python

Similar to list comprehensions, we have dictionary comprehensions as well. They create a new dictionary in a concise manner. We can write a dict comprehension as shown below.
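
For instance, the following sketch maps each name in a made-up list to its length.

```python
names = ["asha", "ravi", "meena"]

# key: value first, then the for clause, all inside curly braces
name_lengths = {name: len(name) for name in names}

print(name_lengths)  # {'asha': 4, 'ravi': 4, 'meena': 5}
```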

Note that a dict comprehension always builds a new dictionary; we cannot use it to add keys to an existing dictionary.

How to count the number of occurrences of a list item

If you want only one item’s count, use the list.count() method. Don’t use this if you want to count multiple items.

Calling count() in a loop requires a separate pass over the list for every call, which can be catastrophic for performance on long lists.

If you want to count all items, or even just multiple items, use Counter from the collections module.

The returned Counter object behaves much like a dictionary, but offers other useful methods. To use Counter(), just pass any iterable object to its constructor.
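
Both approaches are sketched below on a made-up list of grades.

```python
from collections import Counter

grades = ["a", "b", "a", "c", "b", "a"]

# One item: list.count() is enough.
print(grades.count("a"))  # 3

# Many items: Counter tallies everything in a single pass.
counts = Counter(grades)
print(counts["b"])            # 2
print(counts.most_common(2))  # [('a', 3), ('b', 2)]
```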

How to convert a Python dictionary to a list

Let’s suppose I am working with a dictionary and want to convert it to a list. The process is as follows:

  • Invoke the items() method of the dictionary, and then
  • invoke the list() function on the result.

This generates a list of (key, value) tuples from the dictionary. Recall that a tuple behaves like a list, except that it is immutable.
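
A short sketch, using a made-up dictionary of marks:

```python
marks = {"asha": 82, "ravi": 75}

pairs = list(marks.items())  # items() yields (key, value) pairs

print(pairs)  # [('asha', 82), ('ravi', 75)]
```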

How to sort a list

To sort a list of values, use the list.sort() method.

When we call list.sort(), we should not assign the result back to a variable, as in list_marks = list_marks.sort().

This is because the method modifies the list in place and returns None instead of a new list object, so the assignment would replace our list with None.

If we pass reverse=True to the sort method, the list is sorted in descending order.
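
The sketch below shows both orders, and what sort() returns. The values in list_marks are made up.

```python
list_marks = [75, 91, 82, 60]

returned = list_marks.sort()  # sorts in place
print(returned)               # None
print(list_marks)             # [60, 75, 82, 91]

list_marks.sort(reverse=True)
print(list_marks)             # [91, 82, 75, 60]
```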

There is also a built-in function called sorted().

sorted() returns a new list containing the sorted items, and it does not modify the original list.
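
A short sketch with made-up values:

```python
list_marks = [75, 91, 82, 60]

ordered = sorted(list_marks)  # a new list is returned

print(ordered)     # [60, 75, 82, 91]
print(list_marks)  # [75, 91, 82, 60], the original is unchanged
```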

How to sort a list of lists
Using itemgetter

If we have a list of lists to be sorted, then we have to take a slightly different approach. We can use itemgetter from the operator module in that scenario, for example to sort a list of lists based on the second item in each list.
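
In the sketch below, each inner list holds a name and an integer mark (made-up data), and itemgetter(1) picks the second item as the sort key.

```python
from operator import itemgetter

students = [["asha", 82], ["ravi", 75], ["meena", 91]]

students.sort(key=itemgetter(1))  # index 1 is the second item of each inner list

print(students)  # [['ravi', 75], ['asha', 82], ['meena', 91]]
```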

Note that we can add the reverse=True parameter if we want to sort the list in descending order of the integer elements in this example.
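
With the same kind of made-up data, the descending variant could be written as:

```python
from operator import itemgetter

students = [["asha", 82], ["ravi", 75], ["meena", 91]]

students.sort(key=itemgetter(1), reverse=True)  # highest mark first

print(students)  # [['meena', 91], ['asha', 82], ['ravi', 75]]
```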

Using key parameter

Let’s suppose we have a list of lists containing integers. How do we sort this list based on the length of each member list?

Say the lengths of the member lists are 5, 3 and 4 respectively. We want to sort the list so that the member list with 3 elements appears first, then the member list with 4 elements, and so on.

To achieve this, we pass the len function as the value of the key parameter of the sort method.
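
A sketch with member lists of lengths 5, 3 and 4 (the values themselves are made up):

```python
nested = [[1, 2, 3, 4, 5], [6, 7, 8], [9, 10, 11, 12]]

nested.sort(key=len)  # len is called on each member list to get its sort key

print(nested)  # [[6, 7, 8], [9, 10, 11, 12], [1, 2, 3, 4, 5]]
```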

If we want to sort this list in descending order of the lengths of the member lists, then we add one more parameter, reverse=True:
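
Using the same made-up nested list, that could look like:

```python
nested = [[1, 2, 3, 4, 5], [6, 7, 8], [9, 10, 11, 12]]

nested.sort(key=len, reverse=True)  # longest member list first

print(nested)  # [[1, 2, 3, 4, 5], [9, 10, 11, 12], [6, 7, 8]]
```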

Please note that the key parameter will accept any function as its value. It can even accept a lambda function.

What is a lambda function in Python?

In Python, there are two ways of writing functions. We learned the first way, using def, earlier in this post.

For example, to write a function that takes an integer argument and returns twice its input, we can use def as below:
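
A minimal version, with the hypothetical name double:

```python
def double(n):
    """Return twice the input."""
    return n * 2

print(double(4))  # 8
```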

We can re-create the same function using the lambda keyword, without the def and return keywords, as shown below.
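
The sketch below writes the doubling function as a lambda and then passes another lambda as a sort key. The names double and pairs, and the data, are made up.

```python
# The expression after the colon is what the function returns.
double = lambda n: n * 2
print(double(4))  # 8

# A lambda is handy as a throwaway key function for sort().
pairs = [["asha", 82], ["ravi", 75]]
pairs.sort(key=lambda pair: pair[1])
print(pairs)  # [['ravi', 75], ['asha', 82]]
```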

Lambda functions can be used wherever functions are required. But remember that we can write only one expression within a lambda function.

Errors and exceptions in Python

Syntax errors are common in any programming language. When you write code that does not conform to the grammar rules enforced by Python’s parser, you get a syntax error.
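
A line with a missing colon is a typical case. In the sketch below, the faulty line is kept in a string and handed to the built-in compile() function, which is an approach chosen here only so that the snippet itself still runs. Typing the line directly into a script would stop Python before any code executes.

```python
# The colon after "while True" is missing, so this line cannot be parsed.
source = "while True print('Hello world')"

caught = False
try:
    compile(source, "<example>", "exec")
except SyntaxError as err:
    caught = True
    print("SyntaxError on line", err.lineno)
```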

In the above example, the error is detected at the function print(), since a colon (‘:’) is missing before it.

Even if a statement or expression is syntactically correct, it may cause an error when an attempt is made to execute it. Errors detected during execution are called exceptions. Refer to the example below, where an exception called ZeroDivisionError is raised due to division by zero.
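
In the sketch below, the division is wrapped in try and except only so that we can print the exception instead of halting the script. Without that wrapper, Python would stop and show a traceback ending in the same message.

```python
error_name = None
try:
    result = 10 / 0  # syntactically valid, but fails when executed
except ZeroDivisionError as err:
    error_name = type(err).__name__
    print(error_name, "-", err)  # ZeroDivisionError - division by zero
```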

How to handle exceptions in Python?

To handle exceptions, we use try, except, else and finally blocks. Refer to the example given below.

In this example, we wrote a function divide(x, y) which divides the first argument (x) by the second argument (y) inside the try block and stores the outcome in the result variable.

If y is passed as zero, the division raises a ZeroDivisionError. We handle this in the except block.

If y is not zero, ZeroDivisionError is not raised, and the result variable is printed in the else block.

The finally block is executed irrespective of whether the exception is raised or not.
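
A sketch of such a function follows. The printed messages are made up, and returning result at the end is an addition for convenience rather than part of the description above.

```python
def divide(x, y):
    result = None
    try:
        result = x / y  # may raise ZeroDivisionError
    except ZeroDivisionError:
        print("Cannot divide by zero")
    else:
        print("Result is", result)  # runs only when no exception was raised
    finally:
        print("Finished divide()")  # runs in both cases
    return result

divide(10, 2)
divide(1, 0)
```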

Do you have any comments on this article? Please feel free to add your comments/suggestions.
