High Performance Python
Introduction
Python is easy to learn. You’re probably here because now that your code runs correctly,
you need it to run faster. You like the fact that your code is easy to modify and you can
iterate with ideas quickly. The trade-off between “easy to develop” and “runs as quickly as
I need” is a well-understood and often-bemoaned phenomenon. There are solutions.
Some people have serial processes that have to run faster. Others have problems that
could take advantage of multicore architectures, clusters, or graphics processing units.
Some need scalable systems that can process more or less as expediency and funds allow,
without losing reliability. Others will realize that their coding techniques, often borrowed
from other languages, perhaps aren’t as natural as examples they see from others.
In this book we will cover all of these topics, giving practical guidance for understanding
bottlenecks and producing faster and more scalable solutions. We also include some
war stories from those who went ahead of you, who took the knocks so you don’t
have to.
Python is well suited for rapid development, production deployments, and scalable
systems. The ecosystem is full of people who are working to make it scale on your behalf,
leaving you more time to focus on the more challenging tasks around you.
Who This Book Is For
You’ve used Python for long enough to have an idea about why certain things are slow
and to have seen technologies like Cython, numpy, and PyPy being discussed as possible
solutions. You might also have programmed with other languages and so know that
there’s more than one way to solve a performance problem.
While this book is primarily aimed at people with CPU-bound problems, we also look
at data-transfer and memory-bound problems. Typically these problems are faced by
scientists, engineers, quants, and academics.
We also look at problems that a web developer might face, including the movement of
data and the use of just-in-time (JIT) compilers like PyPy for easy-win performance
gains.
It might help if you have a background in C (or C++, or maybe Java), but it isn’t a prerequisite.
Python’s most common interpreter (CPython—the standard you normally
get if you type python at the command line) is written in C, and so the hooks and libraries
all expose the gory inner C machinery. There are lots of other techniques that we cover
that don’t assume any knowledge of C.
You might also have a lower-level knowledge of the CPU, memory architecture, and
data buses, but again, that’s not strictly necessary.
Who This Book Is Not For
This book is meant for intermediate to advanced Python programmers. Motivated novice
Python programmers may be able to follow along as well, but we recommend having
a solid Python foundation.
We don’t cover storage-system optimization. If you have a SQL or NoSQL bottleneck,
then this book probably won’t help you.
What You’ll Learn
Your authors have been working with large volumes of data, a requirement for “I want
the answers faster!” and a need for scalable architectures, for many years in both industry
and academia. We’ll try to impart our hard-won experience to save you from making
the mistakes that we’ve made.
At the start of each chapter, we’ll list questions that the following text should answer (if
it doesn’t, tell us and we’ll fix it in the next revision!).
We cover the following topics:
• Background on the machinery of a computer so you know what’s happening behind
the scenes
• Lists and tuples—the subtle semantic and speed differences in these fundamental
data structures
• Dictionaries and sets—memory allocation strategies and access algorithms in these
important data structures
• Iterators—how to write in a more Pythonic way and open the door to infinite data
streams using iteration
• Pure Python approaches—how to use Python and its modules effectively
• Matrices with numpy—how to use the beloved numpy library like a beast
• Compilation and just-in-time computing—processing faster by compiling down to
machine code, making sure you’re guided by the results of profiling
• Concurrency—ways to move data efficiently
• multiprocessing—the various ways to use the built-in multiprocessing library
for parallel computing, efficiently share numpy matrices, and some costs and benefits
of interprocess communication (IPC)
• Cluster computing—convert your multiprocessing code to run on a local or remote
cluster for both research and production systems
• Using less RAM—approaches to solving large problems without buying a humungous
computer
• Lessons from the field—lessons encoded in war stories from those who took the
blows so you don’t have to