
Reducing Python's startup time


By Jake Edge
August 16, 2017

The startup time for the Python interpreter has been discussed by the core developers and others numerous times over the years; optimization efforts are made periodically as well. Startup time can dominate the execution time of command-line programs written in Python, especially if they import a lot of other modules. Python's startup time is worse than that of some other scripting languages, and more recent versions of the language take more than twice as long to start as earlier versions did (e.g. 3.7 versus 2.7). The most recent iteration of the startup time discussion has played out in the python-dev and python-ideas mailing lists since mid-July. This time, the focus has been on the collections.namedtuple() factory that is used in multiple places throughout the standard library and in other Python modules, but the discussion has been more wide-ranging than simply that.

A "named tuple" is a way to assign field names to elements in a Python tuple object. The canonical example is to create a Point class using the namedtuple() factory:

    Point = namedtuple('Point', ['x', 'y'])
    p = Point(1, 2)

The elements of the named tuple can then be accessed using the field names (e.g. p.x) in addition to the usual p[0] mechanism. A bug filed in November 2016 identified namedtuple() as a culprit in increasing the startup time for importing the functools standard library module. The suggested solution was to replace the namedtuple() call with its equivalent Python code that was copied from the _source attribute of a class created with namedtuple(). The _source attribute contains the pure Python implementation of the named tuple class, which eliminates the need to create and execute some of that code at import time (which is what namedtuple() does).
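A hand-written class gives a feel for what namedtuple() generates behind the scenes; the following is a simplified sketch of a pure-Python equivalent of the Point class above, not the exact code that namedtuple() emits or that _source contains:

```python
from operator import itemgetter

# A simplified, hand-written equivalent of namedtuple('Point', ['x', 'y']);
# the real generated code is more elaborate (it also defines _make(),
# _replace(), _asdict(), and so on).
class Point(tuple):
    __slots__ = ()

    def __new__(cls, x, y):
        return tuple.__new__(cls, (x, y))

    def __repr__(self):
        return 'Point(x=%r, y=%r)' % self

    # Field accessors equivalent to the generated properties
    x = property(itemgetter(0), doc='Alias for field number 0')
    y = property(itemgetter(1), doc='Alias for field number 1')

p = Point(1, 2)
print(p.x, p[0])   # both access the first field
```

Building a class like this by generating its source text and exec()ing it, once per namedtuple() call, is where the import-time cost comes from.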

There are a few problems with that approach, including the fact that any updates or fixes to what namedtuple() produces would not be reflected in functools. Beyond that, though, named tuple developer Raymond Hettinger was not convinced there was a real problem:

I would like to caution against any significant changes to save microscopic amounts of time. Twisting the code into knots for minor time savings is rarely worth it and it not what Python is all about.

Nick Coghlan agreed with Hettinger's assessment:

Caring about start-up performance is certainly a good thing, but when considering potential ways to improve the situation, structural enhancements to the underlying systems are preferable to ad hoc special cases that complicate future development efforts.

Hettinger closed the bug, though it was reopened in December to consider a different approach using Argument Clinic and subsequently closed again for more or less the same reasons. That's where it stood until mid-July, when Jelle Zijlstra added a comment that pointed to a patch to speed up named tuple creation by avoiding some of the exec() calls. It was mostly compatible with the existing implementation, though it did not support the _source attribute. That led to a classic "bug war", of sorts, where people kept reopening the bug, only to see it immediately closed again. It is clear that some felt the arguments for closing the bug were not particularly compelling.

After several suggestions that the proper way to override the bug-closing decisions made by Hettinger and Coghlan was to take the issue to python-dev, Antoine Pitrou did just that. According to Pitrou, the two main complaints about the proposed fix were that it eliminated the _source attribute and that "optimizing startup cost is supposedly not worth the effort". Pitrou argued that _source is effectively unused by any Python code that he could find and that startup optimizations are quite useful:

[...] startup time is actually a very important consideration nowadays, both for small scripts *and* for interactive use with the now very wide-spread use of Jupyter Notebooks. A 1 ms. cost when importing a single module can translate into a large slowdown when your library imports (directly or indirectly) hundreds of modules, many of which may create their own namedtuple classes.
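The per-class cost that Pitrou describes is easy to observe; a small sketch using the standard timeit module to measure class creation (the absolute numbers will vary by machine and Python version):

```python
import timeit

# Time how long it takes to create (not instantiate) a named tuple class;
# in the pre-optimization implementation, each creation built source text
# and exec()ed it.
n = 1000
seconds = timeit.timeit(
    "namedtuple('Point', ['x', 'y'])",
    setup='from collections import namedtuple',
    number=n,
)
print('%.3f ms per class creation' % (seconds / n * 1000))
```

Multiply that per-class figure by the hundreds of named tuple classes a large dependency tree may create and the cumulative effect on import time becomes clear.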

In addition, the _source attribute is something of an odd duck: its leading underscore suggests that it is part of the private interface, yet it is meant to be used as a learning tool, which is not typical for Python objects. The underscore was used so that source could still be used as a tuple field name but, as Hettinger noted, it probably should have been named differently (e.g. source_). He is adamant, though, that there are benefits to having the attribute, mostly from a learning and understanding standpoint.

Ever the pragmatist, Guido van Rossum offered something of a compromise. He agreed with Pitrou about the need to optimize named tuple class creation, but hoped that it would still be possible to support Hettinger's use case:

The cumulative startup time of large Python programs is a serious problem and namedtuple is one of the major contributors -- especially because it is so convenient that it is ubiquitous. The approach of generating source code and exec()ing it, is a cool demonstration of Python's expressive power, but it's always been my sense that whenever we encounter a popular idiom that uses exec() and eval(), we should augment the language (or the builtins) to avoid these calls -- that's for example how we ended up with getattr().
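The getattr() example Van Rossum mentions is a case where a builtin replaced a once-common eval() idiom; a small sketch contrasting the two styles (the Config class and attribute name are made up for illustration):

```python
# A hypothetical object whose attribute is looked up by name at runtime
class Config:
    host = 'localhost'
    port = 8080

name = 'port'

# The old idiom: build source text and evaluate it
value_via_eval = eval('Config.' + name)

# The builtin that made the eval() unnecessary
value_via_getattr = getattr(Config, name)

print(value_via_eval, value_via_getattr)  # both are 8080
```

The namedtuple() optimization follows the same principle: find a way to build the class directly rather than by generating and exec()ing source code.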

[...] Concluding, I think we should move on from the original implementation and optimize the heck out of namedtuple. The original has served us well. The world is constantly changing. Python should adapt to the (happy) fact that it's being used for systems larger than any of us could imagine 15 years ago.

As might be guessed, a pronouncement like that from Van Rossum, Python's benevolent dictator for life (BDFL), led Hettinger to reconsider: "Okay, then Nick and I are overruled. I'll move Jelle's patch forward. We'll also need to lazily generate _source but I don't think that will be hard." He did add "one minor grumble", however, regarding the complexity of the CPython code:

I think we need to give careful cost/benefit considerations to optimizations that complicate the implementation. Over the last several years, the source for Python has grown increasingly complicated. Fewer people understand it now. It is much harder to newcomers to on-ramp. [...] In the case of this named tuple proposal, the complexity is manageable, but the overall trend isn't good and I get the feeling the aggressive optimization is causing us to forget key parts of the zen-of-python.

That tradeoff between complexity and performance is one that has played out in many different development communities over the years—the kernel community faces it regularly. Part of the problem is that the negative effects of a performance optimization may not be seen for a long time. As Coghlan put it:

Unfortunately, these are frequently cases where the benefits are immediately visible (e.g. faster benchmark results, removing longstanding limitations on user code), but the downsides can literally take years to make themselves felt (e.g. higher defect rates in the interpreter, subtle bugs in previously correct user code that are eventually traced back to interpreter changes).

Van Rossum's pronouncement set off a predictable bikeshedding frenzy around named tuple enhancements that eventually moved to python-ideas and may be worthy of a further look at some point. But there was also some pushback regarding Hettinger's repeated contention that shaving a few milliseconds here and there from the Python startup time was not an important goal. As Barry Warsaw said:

[..] start up time *is* a serious challenge in many environments for CPython in particular and the perception of Python’s applicability to many problems. I think we’re better off trying to identify and address such problems than ignoring or minimizing them.

Gregory P. Smith pointed to the commonly mentioned command-line utilities as one place where startup time matters, but also described another problematic area:

I'll toss another where Python startup time has raised eyebrows at work: unittest startup and completion time. When the bulk of a processes time is spent in startup before hitting unittest.main(), people take notice and consider it a problem. Developer productivity is reduced. The hacks individual developers come up with to try and workaround things like this are not pretty.

[...] In real world applications you do not control the bulk of the code that has chosen to use namedtuple. They're scattered through 100-1000s of other transitive dependency libraries (not just the standard library), the modification of each of which faces hurdles both technical and non-technical in nature.

The discussion (and a somewhat dismissive tweet from Hettinger [Note: Hettinger strongly disclaims the "dismissive" characterization.]) led Victor Stinner to start a new thread on python-dev to directly discuss the interpreter startup time, separate from the named tuple issue. He collected some data that showed that the startup time for the in-development Python 3.7 is 2.3 times longer than Python 2.7. He also compared the startup of the Python-based Mercurial source code management system to that of Git (Mercurial is 45 times slower) as well as comparing the startup times of several other scripting languages (Python falls into the middle of the pack there). In the thread, Pitrou pointed out the importance of "anecdotal data", which Hettinger's tweet had dismissed:
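Measurements like Stinner's can be reproduced with a simple sketch that times a child interpreter running an empty program; absolute numbers depend heavily on the machine and on whether caches are warm:

```python
import subprocess
import sys
import time

# Time several starts of a child interpreter doing nothing; the minimum
# is the best estimate of bare startup cost on this machine.
timings = []
for _ in range(5):
    start = time.perf_counter()
    subprocess.run([sys.executable, '-c', 'pass'], check=True)
    timings.append(time.perf_counter() - start)

print('best of 5: %.1f ms' % (min(timings) * 1000))
```

Running the same loop against different interpreter versions (or against other scripting languages) gives comparisons of the kind Stinner collected.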

[...] We are engineers and have to make with whatever anecdotes we are aware of (be they from our own experiences, or users' complaints). We can't just say "yes, there seems be a performance issue, but I'll wait until we have non-anecdotal data that it's important". Because that day will probably never come, and in the meantime our users will have fled elsewhere.

Python has come a long way from its roots as a teaching language. There is clearly going to be some tension between the needs of languages geared toward teaching and those of languages used for production-quality applications of various kinds. That means there is a balance to be struck, which is something the core developers (and, in particular, Van Rossum) have been good at over the years. One suspects that startup time—and the named tuple implementation—can be optimized without sacrificing that.




Reducing Python's startup time

Posted Aug 17, 2017 7:56 UTC (Thu) by ovitters (guest, #27950) [Link]

I make use of Flexget (.com). It's written in Python and on my very slow (power efficient) CPU it takes a huge time to start (I mean 10 seconds+). It does support a "daemon" mode, though IMO it's just way easier to run it from cron. The startup is not just Python itself, obviously Flexget is inefficient but still, startup speed is pretty important.

Reducing Python's startup time

Posted Aug 17, 2017 20:51 UTC (Thu) by jaymell (subscriber, #106443) [Link]

Another use case where startup times have become important is 'serverless' code services like AWS Lambda where you are charged in 100-millisecond intervals for execution time. Slow startup times directly translate to increased costs when using these kinds of services.

Reducing Python's startup time

Posted Aug 18, 2017 12:23 UTC (Fri) by kugel (subscriber, #70540) [Link]

Sounds like an opportunity for Emacs' feature of dumping itself after bootstrap, and running from the dump afterwards.

Reducing Python's startup time

Posted Aug 18, 2017 21:54 UTC (Fri) by zenaan (guest, #3778) [Link]

Another example that may inspire someone depending on context:
Java "serverize" so that shell "Java scriptlets" bypass the Java Virtual Machine / JVM startup time (or rather, pay that cost only once):

Nailgun: Insanely Fast Java
http://martiansoftware.com/nailgun/

"Nailgun is a client, protocol, and server for running Java programs from the command line without incurring the JVM startup overhead. Programs run in the server (which is implemented in Java), and are triggered by the client (written in C), which handles all I/O."
...

One can imagine heisting the Nailgun C client code, renaming it to Pailgun or something :)

In fact the client-side code might simply be the same "Nailgun C client" turned into a multiplexer for both Java -and- Python.

How this could apply outside of shell script context, I'm not sure.

Reducing Python's startup time

Posted Aug 18, 2017 23:59 UTC (Fri) by vstinner (subscriber, #42675) [Link]

The daemon idea was discussed, but Python allows to play with signals, threads, fork, child processes, etc. Sharing a single daemon process with multiple clients is not a good idea. Or maybe the daemon should fork at each client request. I don't know.

Reducing Python's startup time

Posted Aug 19, 2017 13:05 UTC (Sat) by mathstuf (subscriber, #69389) [Link]

I feel like Bananagun (or maybe just Banana) would be a more Pythonic name.

Reducing Python's startup time

Posted Aug 22, 2017 10:08 UTC (Tue) by dgm (subscriber, #49227) [Link]

My vote is for Rubber Chicken Knight, if I can have a say.

Reducing Python's startup time

Posted Aug 18, 2017 15:19 UTC (Fri) by ehiggs (subscriber, #90713) [Link]

Startup time in Python on a busy NFS device or parallel filesystem (GPFS, Lustre, etc) is particularly painful. If you strace the startup you can watch it stat hundreds of files. If you start using pkg_resources, where it needs to crawl the filesystem to see what's available, then it goes to the thousands or tens of thousands depending on the size of the project and this results in a startup time in minutes on HPC systems.
$ strace -c -e open,stat  python3 -c "import numpy; import pkg_resources; print('hello')"
hello
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.000006           0      1193       129 stat
  0.00    0.000000           0       451         2 open
------ ----------- ----------- --------- --------- ----------------
100.00    0.000006                  1644       131 total
You would think that zipapp[1] fixes this by putting all the files into a single zipped executable, but last time I checked it only zips the application being distributed and not the dependencies.

[1] https://docs.python.org/3/library/zipapp.html

Reducing Python's startup time

Posted Aug 19, 2017 9:01 UTC (Sat) by gebi (guest, #59940) [Link]

maybe pex (python executable files)[0] can help here?

it creates a single executable including your python code and all dependencies.

[0]: https://github.com/pantsbuild/pex

Reducing Python's startup time

Posted Aug 21, 2017 16:35 UTC (Mon) by epa (subscriber, #39769) [Link]

I wonder if statting hundreds of files could be partly replaced by making a directory listing of the relatively small number of directories on the include path? That at least cuts out a stat() call to test if a file exists, though you do still need an NFS round trip to do anything further with it like find out its timestamp or open it.

Reducing Python's startup time

Posted Aug 21, 2017 19:50 UTC (Mon) by njs (subscriber, #40338) [Link]

Python 3's import system already does this optimization. pkg_resources might not.

Reducing Python's startup time

Posted Aug 24, 2017 13:19 UTC (Thu) by Wol (subscriber, #4433) [Link]

> We are engineers and have to make with whatever anecdotes we are aware of (be they from our own experiences, or users' complaints).

Multiple anecdotes make anecDATA. Okay, "I heard from a guy who heard it down the pub" isn't much use, but if I say to you "I find X is slower than it used to be" then that is a datum point. It's a hard example of a problem!

If the response of the developers is "oh, that's just an anecdote", then they are quite clearly applying the dictum "the exception proves the rule" - the more reports they get that it's slow, the more they take it as proof that it's fast.

Except that, when taken in context, "the exception proves the rule" is scientific speak for "if you can find JUST ONE exception to a scientific theory, then the scientists need to to back to the drawing board and rethink" - the rule has been proven to be *wrong*.

I'm sorry, but people who resort to saying it's just anecdotes are burying their heads in the sand ...

Cheers,
Wol

Reducing Python's startup time

Posted Aug 24, 2017 14:44 UTC (Thu) by mathstuf (subscriber, #69389) [Link]

> Multiple anecdotes make anecDATA.

IME, this is usually used as a derisive term where people rely only on anecdotes rather than actually looking into measurements of reported issues.

> Except that, when taken in context, "the exception proves the rule" is scientific speak for "if you can find JUST ONE exception to a scientific theory, then the scientists need to to back to the drawing board and rethink" - the rule has been proven to be *wrong*.

No, the original meaning is that the existence of an exception means that there is something to except out of.

> If the response of the developers is "oh, that's just an anecdote", then they are quite clearly applying the dictum "the exception proves the rule" - the more reports they get that it's slow, the more they take it as proof that it's fast.

It doesn't apply to this, IMO. This behavior is just "alternative fact"-ing (though an explanation of why there is a dismissal can get it out of that category).

> I'm sorry, but people who resort to saying it's just anecdotes are burying their heads in the sand ...

Depends on the anecdote. Those from contributors or familiar users are weightier than high-profile people who have little to no evidence of having used it other than for some flashy blog post.


Copyright © 2017, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds