This article is not about machine learning, but about a piece of software engineering that often comes in handy in data science practice. When writing code, everybody gets errors. Sometimes they are difficult to debug. Using a debugger may help, but can also be intimidating. This is a TL;DR tutorial on using pdb in IPython, focused on looking at variables inside functions.
The advantage of having global variables is that when a script stops with an error, you can look at the variables to discover the cause. With functions, no such luck: local variables are gone from the interactive namespace once the function has failed. And of course any non-trivial software needs functions.
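To make the problem concrete, here is a minimal sketch in Python 3. The function mean and the variable failed_locals are made up for illustration. After the error, total and count are not available as globals, but they still exist in the frame attached to the traceback, which is exactly where a debugger looks them up.

```python
import sys


def mean(values):
    # total and count are local: they live only in this call's frame.
    total = sum(values)
    count = len(values)
    return total / count


try:
    mean([])  # count == 0, so this raises ZeroDivisionError
except ZeroDivisionError:
    tb = sys.exc_info()[2]
    # Walk to the innermost traceback entry, i.e. the frame that failed.
    while tb.tb_next is not None:
        tb = tb.tb_next
    # Copy the locals of the failed frame so we can look at them afterwards.
    failed_locals = dict(tb.tb_frame.f_locals)

print(failed_locals)
```

Doing this by hand is tedious, which is the reason to let the debugger do it.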
This is where the debugger comes in. Python comes with a debugger called pdb. We will look at using it within IPython.
First, you turn it on:
In [1]: pdb
Automatic pdb calling has been turned ON
Better yet, you can choose to enter the debugger only after an error has occurred. In that case, you just type debug after the fact. Thanks to Ehud Ben-Reuven for pointing this out.
Both pdb and debug are IPython magic commands, so their official syntax is %pdb and %debug. The shorter forms work if there aren’t any globals of these names.
Now, let’s define a function:
def add_two( x ):
    return x + 2
When an error occurs, you’ll get a debugger prompt:
In [3]: add_two('two')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-9-a77015a373f8> in <module>()
----> 1 add_two('two')
<ipython-input-8-a95bdacf8a55> in add_two(x)
      1 def add_two( x ):
----> 2     return x + 2
TypeError: cannot concatenate 'str' and 'int' objects
> <ipython-input-8-a95bdacf8a55>(2)add_two()
      1 def add_two( x ):
----> 2     return x + 2
ipdb>
Or, with automatic pdb calling off (spot one difference):
In [3]: add_two('two')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-2-a77015a373f8> in <module>()
----> 1 add_two('two')
<ipython-input-1-3da9f0985321> in add_two(x)
      1 def add_two( x ):
----> 2     return x + 2
TypeError: cannot concatenate 'str' and 'int' objects
In [4]: debug
> <ipython-input-1-3da9f0985321>(2)add_two()
      1 def add_two( x ):
----> 2     return x + 2
ipdb>
In this example the error is obvious and spelled out in the message, so we don’t really need a debugger. The point is to get familiar with it for real-world usage.
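Outside IPython, a similar after-the-fact session can be started from a plain script with pdb.post_mortem from the standard library. The sketch below is written for Python 3; the wrapper run_with_post_mortem and its debug flag are hypothetical names used only for illustration, and the debugger opens only when debug is true.

```python
import pdb
import sys


def add_two(x):
    return x + 2


def run_with_post_mortem(value, debug=False):
    """Call add_two and optionally open pdb at the point of failure."""
    try:
        return add_two(value)
    except TypeError:
        if debug:
            # Opens an interactive prompt in the frame that raised the error.
            pdb.post_mortem(sys.exc_info()[2])
        raise


print(run_with_post_mortem(3))
```

Calling run_with_post_mortem('two', debug=True) would stop inside add_two, where p x works the same way as at the ipdb prompt.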
pdb accepts a number of commands, many of which are a single letter. You can list them with h.
For our purposes, the most important command is p, for “print”:
ipdb> h p
p expression
Print the value of the expression.
ipdb> p x
'two'
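Note that p takes any expression, not just a variable name, so p type(x) or p len(x) work too. As a rough mental model, and not pdb’s actual implementation, p evaluates the expression using the variables of the current frame and prints the repr of the result. The helper evaluate_like_p below is a hypothetical name for a Python 3 sketch of that idea.

```python
def evaluate_like_p(expression, frame_locals, frame_globals=None):
    """Evaluate an expression against a frame's variables and return its repr.

    A simplified model of the p command; the real debugger also
    handles errors raised while evaluating the expression.
    """
    if frame_globals is None:
        frame_globals = {}
    return repr(eval(expression, frame_globals, frame_locals))


# Pretend these are the locals of add_two at the moment of the error.
locals_of_add_two = {"x": "two"}

print(evaluate_like_p("x", locals_of_add_two))
print(evaluate_like_p("len(x)", locals_of_add_two))
print(evaluate_like_p("type(x).__name__", locals_of_add_two))
```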
Frames
When using an external library, the stack trace consists of many frames. In plain English, it means that a function called a function which called a function and so on, and somewhere in this stack the error occurred. Most often you are interested in finding an error in your own code, but the debugger lands at the innermost, deepest function. We need to move up the stack, and the command to do it is u. You type it until the familiar snippet of code shows. Now you can inspect the variables. If you go too far, d moves back down.
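The stack of frames can be illustrated with a small Python 3 sketch. The functions library_parse and library_load stand in for code from an external library and my_code for your own function; all three names are made up. The standard traceback module lists the frames from outermost to innermost, which is the same order you walk backwards through with u.

```python
import traceback


def library_parse(text):
    # Innermost frame: this is where the error is raised.
    return int(text)


def library_load(text):
    return library_parse(text) + 1


def my_code(raw):
    # Your own function: the frame you usually want to inspect.
    cleaned = raw.strip()
    return library_load(cleaned)


try:
    my_code(" two ")
except ValueError as exc:
    # One entry per frame, outermost call first.
    frame_names = [f.name for f in traceback.extract_tb(exc.__traceback__)]

print(frame_names)
```

In the debugger you would land in library_parse; typing u twice gets you to my_code, where p raw and p cleaned show what was passed in.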
Traces/breakpoints
You don’t need an error to stop a program’s execution and get into the debugger. It is enough to set a trace, or a breakpoint, somewhere in the code:
import pdb; pdb.set_trace()
When execution reaches this point, the program will stop and you’ll get a debugger prompt. To continue execution, use the c command.
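A trace inside a loop stops on every iteration, which gets tiresome quickly. One way around it, sketched below in Python 3, is to guard the call with a condition. The function running_mean and its debug_at parameter are hypothetical; with the default debug_at=None the debugger is never entered.

```python
import pdb


def running_mean(values, debug_at=None):
    """Return the mean of the first 1, 2, ... n values.

    If debug_at is set, stop in the debugger on that iteration only.
    """
    means = []
    total = 0.0
    for i, value in enumerate(values, start=1):
        if debug_at == i:
            # Stops here; inspect i, value, and total, then type c to go on.
            pdb.set_trace()
        total += value
        means.append(total / i)
    return means


print(running_mean([2, 4, 6]))
```

Calling running_mean([2, 4, 6], debug_at=2) would stop once, on the second iteration.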
You can also add a breakpoint from within the debugger with the b command.
Quitting
q quits the debugger. You can also exit with Ctrl-D.
When you’re done debugging and don’t want a debugger prompt on errors, you turn it off:
In [5]: pdb
Automatic pdb calling has been turned OFF
That’s the gist of it. Not exactly rocket science, is it?