The Python JITs are coming

By Jake Edge
June 15, 2016

Python Language Summit

Nathaniel Smith envisions a future where just-in-time (JIT) compiler techniques will be commonly used in Python, especially for scientific computing. He presented his ideas on where things are headed at the 2016 Python Language Summit. He currently works at the University of California, Berkeley on NumPy and other scientific Python projects. Part of what he has been doing is "working on the big picture of what JITs will mean for scientific computing".

The adoption of Python 3 in scientific computing has been quite slow since 3.0 was released in 2008. But it appears to be reaching an inflection point in 2016; everyone is teaching Python 3, he said, and one-third of all NumPy downloads are for Python 3. He is hearing "when do we drop support for Python 2?" from projects these days.

That is typical of various phenomena, where there is a long period with lots of work going on, but seemingly no change. At some point, the curve of adoption (for example) goes nearly vertical; it goes from 20% to 80% quickly. "Transitions are sneaky like that", Smith said, and you see the same curve in unrelated areas, like epidemiology, where virus propagation has a similar trend. The percentage of web browser users with JavaScript JITs over time exhibits a similar curve as well.
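Smith did not name a particular model for that curve, but the shape he described resembles the logistic (S-shaped) curve often used for adoption and epidemic spread. As a rough sketch under that assumption, with a hypothetical growth rate k and midpoint t0 chosen only for illustration, the following compares how long the slow early phase and the steep middle phase take:

```python
import math


def logistic(t, k=1.0, t0=0.0):
    """Adoption share at time t for a logistic curve (assumed model)."""
    return 1.0 / (1.0 + math.exp(-k * (t - t0)))


def time_to_reach(share, k=1.0, t0=0.0):
    """Invert the logistic curve: the time at which adoption equals `share`."""
    return t0 + math.log(share / (1.0 - share)) / k


# k and t0 are illustrative defaults, not values from the talk.
slow_phase = time_to_reach(0.20) - time_to_reach(0.01)  # 1% -> 20%
fast_phase = time_to_reach(0.80) - time_to_reach(0.20)  # 20% -> 80%

print(f"1% -> 20% takes {slow_phase:.2f} time units")
print(f"20% -> 80% takes {fast_phase:.2f} time units")
```

Under this model, the jump from 20% to 80% takes less time than the climb from 1% to 20%, which is the "sneaky" behavior described above.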

Smith has a hypothesis that in the next two to four years, there will be a significant transition to JIT-based Python, not to IronPython, Jython, or some other alternative implementation, but to a JIT for CPython itself. PyPy is ten years old at this point, but to a first approximation, no one is using it. PyPy has always been resource-limited, but there is growing interest in the industry in CPython JIT technology. There has been a "huge uptick" in companies investing in Python JITs and, at the same time, JIT technology is becoming a commodity. LLVM or libraries from Microsoft and IBM can be used to ease the building of JITs.

One of the major blockers is being resolved right now, he said. He called it "the PyPy problem": small programs don't need PyPy, but large programs can't use it because they need to access C-based extensions (e.g. NumPy). PyPy has learned "by bitter experience" that support for the C extension libraries is required.

But "whole-language JITs" have now arrived. That means that Pyjion can pass all of the NumPy tests and Pyston is nearly there (99.27% in early May), he said. PyPy is "holding [its] nose" and implementing the technique, which has allowed it to go from 92.4% of NumPy tests passing (using the standard NumPy) in April to 96.2% in May. Others are learning from PyPy, so the numbers are changing rapidly; by the end of May, Pyston was passing all of the tests, he said.

So a few months from now, we will go from zero "drop-in compatible JITs" for Python to three. They may not be ready for deployment in production quite yet, but they are getting there.

That transition will have consequences and it is worth thinking about what is needed to get ready for them. It will lead to changes in the Python ecosystem. He is organizing a Python compilers workshop in conjunction with SciPy, which will be held in Austin, Texas in mid-July. Some of the consequences will be discussed there.

The first consequence that Smith described is a "catch-22" for libraries like NumPy. For code to be fast on CPython, it has to be written in C, but for it to be fast under a JIT, it cannot be written in C. He showed a simple mysum() function that totaled up the elements in an iterable. If it is passed a Python object like list(range(N)), the JIT knows what it is and can do lots of optimizations. But if it is passed a NumPy array, which is "opaque C stuff", the JIT doesn't understand it, so it will have trouble even achieving the performance of a non-NumPy version on a JIT-less CPython.
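Smith's slide is not reproduced here, so what follows is only a minimal sketch of what such a mysum() might look like, assuming a plain Python loop. The value of N is arbitrary, and the NumPy call is skipped if NumPy is not installed.

```python
def mysum(iterable):
    """Total up the elements of any iterable with a plain Python loop."""
    total = 0
    for item in iterable:
        total += item
    return total


N = 1000  # arbitrary size, chosen only for illustration

# A plain list of Python ints: every object in the loop is visible to a JIT.
result_list = mysum(list(range(N)))
print(result_list)

# A NumPy array: each element is produced by NumPy's C code, so the loop
# above hands the interpreter (or a JIT) objects it cannot see inside.
try:
    import numpy as np
except ImportError:
    np = None

if np is not None:
    result_array = mysum(np.arange(N))
    print(result_array)
```

Both calls return the same total; the difference Smith pointed to is in how much of the work a JIT can see and optimize, not in the result.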

One way to handle that would be for the JITs to gain knowledge of the NumPy internals. "As a NumPy developer, you can imagine how I feel about this", he said. But there are lots of projects that already have that knowledge (e.g. Numba, PyPy, Cython, and more; he predicts Pyston will get it "any day now"). Those projects don't like reaching into NumPy either, but have to for performance reasons.

His dream is to have one codebase that can work in any of these environments. It could be based on Cython, since "we know it works". The code could be converted to C for CPython or used directly by PyPy, Numba, and others.

"JIT engines are viciously complicated beasts", Smith said. Another consequence of the shift to JIT-based Python will be that the development and maintenance of JIT implementations will require focused and sustained effort that only companies can provide—at least currently. There are two paths forward. One is that CPython will still be driven by a diverse set of volunteers and the JITs will be driven mostly or completely by dedicated corporate teams.

There are some reasons why that path might not be the right one, though. "Companies are great", Smith said, but only represent one slice of the community. For example, he works on Python packaging for scientific-computing packages because it is hard for companies to justify doing that kind of work upstream when they have a business model based on the pain of that packaging.

There is an alternative, emerging model that would add paid contributors who work for the community. A small number of them could make a big difference, as they could keep the big picture in mind and cover gaps that the companies are not filling. He pointed to the $6 million in funding for the Jupyter project as an example. Jupyter (formerly IPython) is "an overgrown REPL", but it was able to attract that kind of funding; a Python-JIT project could too.

"If that's what we want, we need to start planning now", Smith said. The Python Software Foundation (PSF) is not set up to handle that kind of mission. "Building that kind of institutional capacity takes time", he said, so work on that should start soon.

Index entries for this article
Conference	Python Language Summit/2016

The Python JITs are coming

Posted Aug 23, 2016 17:38 UTC (Tue) by leoluk (guest, #97665) [Link]

There's yet another solution for the NumPy issue: PyPy is re-implementing NumPy in pure Python so that it can be JITed - NumPyPy. No idea how that interacts with the C extensions like Pandas or SciPy.

That being said, JIT optimization for NumPy may not even be that important, since NumPy and its ecosystem are very fast already and, for many use cases, it is the slow parts written in Python (i.e. algorithms, tooling) that need optimization.

The Python JITs are coming

Posted Aug 23, 2016 21:36 UTC (Tue) by rrdharan (subscriber, #41452) [Link]

I believe "(formerly iPython)" is incorrect, or at least misleading, and a slightly better terse way of phrasing it would be "(formerly IPython Notebook)"? Note also that the I is capitalized.

In actuality the situation and relationship between IPython, IPython notebook, and which projects are moving under the Jupyter umbrella is complicated:
https://ipython.org/

The Python JITs are coming

Posted Aug 24, 2016 1:21 UTC (Wed) by jake (editor, #205) [Link]

> Note also that the I is capitalized.

That part, at least, we got right ... I believe that was how the speaker referred to it (i.e. "formerly IPython") but at this point I can't remember for sure. Hopefully, the general idea came across anyway.

jake