Pickling in Python - The Very Basics
April 14, 2016
You just ran through a time-consuming process to load a bunch of data into a python object. Maybe you scraped data from thousands of websites. Maybe you computed a zillion digits of pi. If your laptop battery dies or if python crashes, your information will be lost.
Pickling allows you to save a python object as a binary file on your hard drive. After you pickle your object, you can kill your python session, reboot your computer if you want, and later load your object into python again.
You could back up your pickle file to Google Drive or DropBox or a plain old USB stick if you wanted. You could email it to a friend.
A word of warning: don't load pickles that you don't trust. Malicious people can make malicious pickles that may execute unexpected code on your computer (SQL injection, password brute forcing, etc). Stay away from bad pickles.
import pickle
# make an example object to pickle
some_obj = {'x':[4,2,1.5,1], 'y':[32,[101],17], 'foo':True, 'spam':False}
To save a pickle, use pickle.dump
.
A convention is to name pickle files *.pickle
, but you can name it whatever you want.
Make sure to open
the file in 'wb'
mode (write binary). This is more cross-platform friendly than 'w'
mode (write text) which might not work on Windows, etc.
with open('mypickle.pickle', 'wb') as f:
pickle.dump(some_obj, f)
# note that this will overwrite any existing file
# in the current working directory called 'mypickle.pickle'
For the purposes of demonstration, I'll delete the original object from memory to show you that it's really gone.
del some_obj
print some_obj
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-3-3080f97d7e85> in <module>()
1 del some_obj
2
----> 3 print some_obj
NameError: name 'some_obj' is not defined
Loading the pickled file from your hard drive is as simple as pickle.load
and specifying the file path:
with open('mypickle.pickle') as f:
loaded_obj = pickle.load(f)
print 'loaded_obj is', loaded_obj
loaded_obj is {'y': [32, [101], 17], 'x': [4, 2, 1.5, 1], 'foo': True, 'spam': False}
Pickling pandas DataFrames
Pandas has a very easy to use pickling functions. First we'll make an example dataframe:
import pandas as pd
df = pd.DataFrame([range(11), range(100,110)], columns=list('abcdefghijk'))
df
a | b | c | d | e | f | g | h | i | j | k | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
1 | 100 | 101 | 102 | 103 | 104 | 105 | 106 | 107 | 108 | 109 | NaN |
pandas.DataFrame.to_pickle
Save the dataframe to a pickle file called my_df.pickle
in the current working directory.
Then for the purposes of demonstration again, I'll delete the original DataFrame
df.to_pickle('my_df.pickle')
del df
pandas.DataFrame.read_pickle
To load the pickled dataframe, simply do:
df2 = pd.read_pickle('my_df.pickle')
df2
a | b | c | d | e | f | g | h | i | j | k | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
1 | 100 | 101 | 102 | 103 | 104 | 105 | 106 | 107 | 108 | 109 | NaN |
And that's all you need to know for simple pickling in python.
It's an easy way to back up important objects, pass objects between scripts, or even email a python object to another pythonista (but they shouldn't open it unless they're sure you haven't given them a malicious pickle...)