High Performance Python

High Performance Python

Download

Introduction

Python is easy to learn. You’re probably here because now that your code runs correctly, you need it to run faster. You like the fact that your code is easy to modify and you can iterate with ideas quickly. The trade-off between easy to develop and runs as quickly as I need is a well-understood and often-bemoaned phenomenon. There are solutions.

Some people have serial processes that have to run faster. Others have problems that could take advantage of multicore architectures, clusters, or graphics processing units. Some need scalable systems that can process more or less as expediency and funds allow, without losing reliability. Others will realize that their coding techniques, often bor‐ rowed from other languages, perhaps aren’t as natural as examples they see from others.

In this book we will cover all of these topics, giving practical guidance for understanding bottlenecks and producing faster and more scalable solutions. We also include some war stories from those who went ahead of you, who took the knocks so you don’t have to.

Python is well suited for rapid development, production deployments, and scalable systems. The ecosystem is full of people who are working to make it scale on your behalf, leaving you more time to focus on the more challenging tasks around you

Who This Book Is For 
You’ve used Python for long enough to have an idea about why certain things are slow and to have seen technologies like Cython, numpy, and PyPy being discussed as possible solutions. You might also have programmed with other languages and so know that there’s more than one way to solve a performance problem.

While this book is primarily aimed at people with CPU-bound problems, we also look at data transfer and memory-bound solutions. Typically these problems are faced by scientists, engineers, quants, and academics

We also look at problems that a web developer might face, including the movement of data and the use of just-in-time (JIT) compilers like PyPy for easy-win performance gains.

It might help if you have a background in C (or C++, or maybe Java), but it isn’t a prerequisite. Python’s most common interpreter (CPython—the standard you normally get if you type python at the command line) is written in C, and so the hooks and libraries all expose the gory inner C machinery. There are lots of other techniques that we cover that don’t assume any knowledge of C.

You might also have a lower-level knowledge of the CPU, memory architecture, and data buses, but again, that’s not strictly necessary.

Who This Book Is Not For 
This book is meant for intermediate to advanced Python programmers. Motivated nov‐ ice Python programmers may be able to follow along as well, but we recommend having a solid Python foundation. We don’t cover storage-system optimization. If you have a SQL or NoSQL bottleneck, then this book probably won’t help you

What You’ll Learn 
Your authors have been working with large volumes of data, a requirement for I want the answers faster! and a need for scalable architectures, for many years in both industry and academia. We’ll try to impart our hard-won experience to save you from making the mistakes that we’ve made. At the start of each chapter, we’ll list questions that the following text should answer (if it doesn’t, tell us and we’ll fix it in the next revision!). We cover the following topics:
• Background on the machinery of a computer so you know what’s happening behind the scenes
• Lists and tuples—the subtle semantic and speed differences in these fundamental data structures
• Dictionaries and sets—memory allocation strategies and access algorithms in these important data structures
• Iterators—how to write in a more Pythonic way and open the door to infinite data streams using iteration
• Pure Python approaches—how to use Python and its modules effectively
• Matrices with numpy—how to use the beloved numpy library like a beast
• Compilation and just-in-time computing—processing faster by compiling down to machine code, making sure you’re guided by the results of profiling
• Concurrency—ways to move data efficiently
• multiprocessing—the various ways to use the built-in multiprocessing library for parallel computing, efficiently share numpymatrices, and some costs and benefits of interprocess communication (IPC)
• Cluster computing—convert your multiprocessing code to run on a local or re‐ mote cluster for both research and production systems
• Using less RAM—approaches to solving large problems without buying a humun‐ gous computer
• Lessons from the field—lessons encoded in war stories from those who took the blows so you don’t have to
Share This