Progress on the Gilectomy
At the 2016 Python Language Summit, Larry Hastings introduced Gilectomy, his project to remove the global interpreter lock (GIL) from CPython. The GIL serializes access to the Python interpreter, so it severely limits the performance of multi-threaded Python programs. At the 2017 summit, Hastings was back to update attendees on the progress he has made and where Gilectomy is headed.
He started out by stating his goal for the project. He wants to be able to run existing multi-threaded Python programs on multiple cores. He wants to break as little of the existing C API as possible. And he will have achieved his goal if those programs run faster than they do with CPython and the GIL—as measured by wall time. To that end, he has done work in four areas over the last year.
He noted that "benchmarks are impossible" by putting up a slide that showed the different CPU frequencies that he collected from his system. The Intel Xeon system he is using is constantly adjusting how fast the cores run for power and heat considerations, which makes it difficult to get reliable numbers. An attendee suggested he look into CPU frequency pinning on Linux.
Atomicity and reference counts
CPU cores have a little bus that runs between them, which is used for atomic updates among other things, he said. Gilectomy's garbage collection updates its reference counts with atomic increment and decrement instructions frequently, which causes a performance bottleneck because of the inter-core traffic needed to ensure cache consistency.
So he looked for another mechanism to maintain the reference counts without all of that overhead. He consulted The Garbage Collection Handbook, which had a section on "buffered reference counting". The idea is to push all of the reference count updating to its own thread, which is the only entity that can look at or change the reference counts. Threads write their reference count changes to a log that the commit thread reads and reflects those changes to the counts.
That works, but there is contention for the log between the threads. So he added a log per thread, but that means there is an ordering problem between operations on the same reference count. It turns out that three of the four possible orderings can be swapped without affecting the outcome, but an increment followed by a decrement needs to be done in order. If the decrement is processed first, it could reduce the count to zero, which might result in the object being garbage collected even though there should still be a valid reference.
He solved that with separate increment and decrement logs. The decrement log can only be processed after all of the increments. This implementation of buffered reference counting has been in Gilectomy since October and is now working well. He did some work on the Py_INCREF() and Py_DECREF() macros that are used all over the CPython code; the intent was to cache the thread-local storage (TLS) pointer and reuse it over multiple calls, rather than looking it up for each call.
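The scheme can be sketched in a few lines of Python (a toy model for illustration only; the real implementation lives in C inside the interpreter): worker threads append to per-thread logs without any atomic operations, and a single commit step applies all increments before any decrements so a reordered decrement can never prematurely free a live object.

```python
import threading
from collections import defaultdict

class BufferedRefCounts:
    """Toy model of buffered reference counting: worker threads append
    refcount changes to per-thread logs; only the commit step ever
    touches the real counts, so workers need no atomic operations."""

    def __init__(self):
        self.counts = defaultdict(int)   # owned by the commit step only
        self.local = threading.local()

    def _logs(self):
        if not hasattr(self.local, "incs"):
            self.local.incs, self.local.decs = [], []
        return self.local.incs, self.local.decs

    def incref(self, obj_id):
        self._logs()[0].append(obj_id)   # no lock, no atomic instruction

    def decref(self, obj_id):
        self._logs()[1].append(obj_id)

    def commit(self, all_logs):
        freed = []
        # Apply every increment before any decrement, so a decrement can
        # never drive a still-referenced object's count to zero.
        for incs, _ in all_logs:
            for obj_id in incs:
                self.counts[obj_id] += 1
        for _, decs in all_logs:
            for obj_id in decs:
                self.counts[obj_id] -= 1
                if self.counts[obj_id] == 0:
                    freed.append(obj_id)   # safe to reclaim now
        return freed
```

Note how the ordering problem disappears: even if a thread logged its decrement before its increment, the commit step still sees the increment first.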
Buffered reference counts have a weakness: they cannot provide realtime reference counts. It could be as long as a second or two before the reference count actually has the right value. That's fine for most code in Gilectomy, because that code cannot look at the counts directly.
But there are places that need realtime reference counts, the weakref module in particular. Weak references do not increment the reference count but can be used to reference an object (e.g. for a cache) until it is garbage collected because it has no strong references. Hastings tried to use a separate reference count to support weakref, but isn't sure that will work. Mark Shannon may have convinced him that resurrecting objects in __del__() methods will not work under that scheme; it may be a fundamental limitation that might kill Gilectomy, Hastings said.
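The behavior that demands accurate counts is easy to see from ordinary Python code: a weak reference can reach its target only while strong references keep it alive, so weakref needs to know the moment the count truly hits zero.

```python
import weakref

class Node:
    pass

obj = Node()
ref = weakref.ref(obj)   # does not increment the reference count
assert ref() is obj      # target still reachable through the weak reference

del obj                  # last strong reference dropped; in CPython the
                         # object is reclaimed immediately via refcounting
assert ref() is None     # the weak reference has been cleared
```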
More performance
Along the way, he came to the conclusion that the object allocation routines in obmalloc.c were too slow. The object allocation scheme has different classes for different sizes of objects, so he added per-class locks. When that was insufficient, he added two kinds of locks: a "fast" lock for when an object exists on the free list and a "heavy" lock when the allocation routines need to go back to the pool for more memory. He also added per-thread, per-class free lists. As part of that work, he added a fair amount of statistics-gathering code but went to some lengths to ensure that it had no performance impact when it was disabled.
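A toy Python model of the two-tier locking idea (class and method names here are invented for illustration; the real code is C in obmalloc.c): allocation first tries a per-thread free list for the size class, taking no lock at all, and only falls back to a lock-protected shared pool when that list runs dry.

```python
import threading

class SizeClassAllocator:
    """Toy model: a lock-free "fast" path through per-thread, per-size-class
    free lists, and a lock-protected "heavy" path back to a shared pool."""

    REFILL = 8   # blocks grabbed from the shared pool per heavy-path trip

    def __init__(self):
        self.lock = threading.Lock()
        self.next_block = 0              # stand-in for the shared pool
        self.local = threading.local()

    def _free_lists(self):
        if not hasattr(self.local, "free"):
            self.local.free = {}
        return self.local.free

    def alloc(self, size_class):
        free = self._free_lists().setdefault(size_class, [])
        if not free:                     # heavy path: refill under the lock
            with self.lock:
                start = self.next_block
                self.next_block += self.REFILL
            free.extend(range(start, start + self.REFILL))
        return free.pop()                # fast path: no lock taken

    def free(self, size_class, block):
        self._free_lists().setdefault(size_class, []).append(block)
```

Most allocations and frees touch only the current thread's lists, so the shared lock is contended only when a free list needs refilling.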
There are lots of places where things are pulled out of TLS; profiling the code showed 370 million calls to get the TLS pointer over a seven-to-eight-second run of his benchmark. In order to minimize that, he has added parameters to pass the TLS pointer down into the guts of the interpreter.
An attendee asked if it made sense to do that for the CPython mainline, but Hastings pointed out those calls come from what he has added; CPython with a GIL does not have that performance degradation. Another attendee thought it should only require one assembly instruction to get the TLS pointer and that there is a GCC extension to use that. Hastings said that he tried that, but could not get it to work; he would be happy to have help as it should be possible to make it faster.
The benchmark that he always uses is a "really bad recursive Fibonacci". He showed graphs of how various versions of Gilectomy fare versus CPython. Gilectomy is getting better, but is still well shy of CPython speed in terms of CPU time. But that is not what he is shooting for; when looking at wall time, the latest incarnation of Gilectomy is getting quite close to CPython's graph line. The "next breakthrough" may show Gilectomy as faster than CPython, he said.
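The article doesn't show the benchmark itself; a minimal reconstruction of a "really bad recursive Fibonacci" spread across threads might look like the following (the exact parameters Hastings used are not given).

```python
import threading

def fib(n):
    # Deliberately naive: exponential recursion, maximal interpreter work
    return n if n < 2 else fib(n - 1) + fib(n - 2)

def run(nthreads=4, n=20):
    results = [None] * nthreads

    def worker(i):
        results[i] = fib(n)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# Under the GIL, wall time barely improves as threads are added; the whole
# point of the Gilectomy is to let these workers run on separate cores.
```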
Next breakthrough
He has some ideas for ways to get that next breakthrough. For one, he could go to a fully per-thread object-allocation scheme. Thomas Wouters suggested looking at Thread-Caching Malloc (TCMalloc), but Hastings was a bit skeptical. The small-block allocator in Python is well tuned for the language, he said. But Wouters said that tests have been done and TCMalloc is no worse than Python's existing allocator, but has better fragmentation performance and is multi-threaded friendly. Hastings concluded that it was "worth considering" TCMalloc going forward.
He is thinking that storing the reference count separate from the object might be an improvement performance-wise. Changing object locking might also improve things, since most objects never leave the thread they are created in. Objects could be "pre-locked" to the thread they are created in and a mechanism for threads to register their interest in other threads' objects might make sense.
The handbook where he found buffered reference counting otherwise says little about reference counting; it is mostly focused on tracing garbage collection. So one thought he has had is to do a "crazy rewrite" of the Python garbage collector. That would be a major pain and break the C API, but he has ideas on how to fix that as well.
Guido van Rossum thought that working on a GIL-less Python and C API would be much easier in PyPy (which has no GIL), rather than CPython. Hastings said that he thought having a multi-threaded Python would be easier to do using CPython. Much of the breakage in the C API simply comes from adding multi-threading into the mix at all. If you want multi-core performance, those things are going to have to be fixed no matter what.
But Van Rossum is concerned that all of the C-based Python extensions will be broken in Gilectomy. Hastings thinks that overstates things and has some ideas on how to make things better. Someone had suggested only allowing one thread into a C extension at a time (so, a limited GIL, in effect), which might help.
The adoption of PyPy "has not been swift", Hastings said; he thinks that since CPython is the reference implementation of Python, it will be the winner. He does not know how far he can take Gilectomy, but he is sticking with it; he asked Van Rossum to "let me know if you switch to PyPy". But Van Rossum said that he is happy with CPython as it is. On the other hand, Wouters pointed out one good reason to stick with experimenting with CPython; since the implementation is similar to what the core developers are already knowledgeable about, they will be able to offer thoughts and suggestions.
Hastings also gave a talk about Gilectomy status a few days later at PyCon; a YouTube video is available for those interested.
[I would like to thank the Linux Foundation for travel assistance to Portland for the summit.]
Index entries for this article: Conference: Python Language Summit/2017
Progress on the Gilectomy
Posted May 24, 2017 22:12 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]
Whut?
PyPy most definitely has a GIL: http://doc.pypy.org/en/latest/faq.html#does-pypy-have-a-g...
It's implemented a bit differently from CPython, but it's there. There was a project to add STM to PyPy but it went nowhere.
Progress on the Gilectomy
Posted May 25, 2017 0:48 UTC (Thu) by jake (editor, #205) [Link]
so i see ... i must have misunderstood Guido's point somehow ... thanks for the correction ...
jake
Progress on the Gilectomy
Posted May 25, 2017 18:25 UTC (Thu) by lhastings (guest, #66451) [Link]
I couldn't tell you what I said in the rush of the moment, and I probably phrased it badly, so let me clarify. I don't think it'd be easier for CPython to get rid of its GIL per se. I think it'll be easier for CPython to get rid of its GIL *and still run Python extensions written in C*. It's easier for PyPy to get rid of its GIL, but it's enormously harder for them to run Python extensions written in C. If the end goal is "have a Python interpreter without a GIL that runs C extensions", then yes *that* is easier to achieve for CPython than it is for PyPy.
Progress on the Gilectomy
Posted May 25, 2017 16:45 UTC (Thu) by samlh (subscriber, #56788) [Link]
PALLOC2
Posted May 25, 2017 10:38 UTC (Thu) by linuxrocks123 (guest, #34648) [Link]
I did tests with Python when I was writing PALLOC2. I remember seeing modest gains.
PALLOC2
Posted May 25, 2017 11:25 UTC (Thu) by linuxrocks123 (guest, #34648) [Link]
Progress on the Gilectomy
Posted May 25, 2017 13:54 UTC (Thu) by Sesse (subscriber, #53779) [Link]
Also, having multithreaded Fibonacci as the only benchmark? And going a year down an optimization project not knowing you can turn off frequency scaling to get more reliable microbenchmarks? Seriously?
Progress on the Gilectomy
Posted May 25, 2017 18:33 UTC (Thu) by lhastings (guest, #66451) [Link]
However: IIUC there are some supported platforms where TLS is only available as an expensive function call, so it's better to cut down on the number of TLS lookups I need to do anyway, as that'll be a win on all systems.
> Also, having multithreaded Fibonacci as the only benchmark? And going a year down an optimization project not knowing you can turn off frequency scaling to get more reliable microbenchmarks? Seriously?
Yup. I'm actually a really bad person to undertake this project. It's just that I'm the only person who is both willing and able to find the time to do so. CPython is essentially an all-volunteer project, and we're just people doing the best we can.
Progress on the Gilectomy
Posted May 25, 2017 19:11 UTC (Thu) by linuxrocks123 (guest, #34648) [Link]
Are you _SURE_ it's actually making a system call? Just because you're calling pthreads doesn't mean glibc actually implements that function with a system call. If it's possible to do a MOV from FS/GS to implement the pthreads function, that's what glibc should be doing, because it's drastically more expensive to do a system call. If glibc is unnecessarily making a system call to do TLS on Linux, either glibc is broken, or we're all missing something.
In your position, I would verify glibc is doing something we think is stupid, and then ask the right mailing list "why is it doing this thing". Then you can either fix glibc, improving your own project and also every other one that uses thread-local storage, or you'll learn why glibc is implemented the way it is, and that will give you a deeper understanding of TLS that might affect how you proceed with Gilectomy. Either way, you win.
Progress on the Gilectomy
Posted May 25, 2017 19:56 UTC (Thu) by Sesse (subscriber, #53779) [Link]
Non-standard? They're in standard C11.
Progress on the Gilectomy
Posted May 26, 2017 10:26 UTC (Fri) by ehiggs (subscriber, #90713) [Link]
Microsoft claims that Visual Studio 2015 has support for C99[2] except for libraries which aren't used in C++ (e.g. tgmath.h). But again, this is C99 and not C11.
[1] https://mail.python.org/pipermail/python-dev/2008-July/08...
[2] https://msdn.microsoft.com/en-us/library/hh409293.aspx
Progress on the Gilectomy
Posted May 26, 2017 10:50 UTC (Fri) by Sesse (subscriber, #53779) [Link]
(Visual Studio supports nearly all of C99 in recent versions; strangely enough, things have happened since 2008. They also support thread_local through C++11, or you can do #define thread_local __declspec(thread) and be done with it. Python already uses MSVC-specific constructs such as __declspec(dllexport).)
Progress on the Gilectomy
Posted May 26, 2017 12:25 UTC (Fri) by mathstuf (subscriber, #69389) [Link]
Progress on the Gilectomy
Posted May 28, 2017 5:18 UTC (Sun) by gutworth (guest, #96497) [Link]
Progress on the Gilectomy
Posted Jun 2, 2017 3:06 UTC (Fri) by lhastings (guest, #66451) [Link]
Progress on the Gilectomy
Posted Jun 2, 2017 4:26 UTC (Fri) by gutworth (guest, #96497) [Link]
Progress on the Gilectomy
Posted May 25, 2017 22:22 UTC (Thu) by pboddie (subscriber, #50784) [Link]
Yup. I'm actually a really bad person to undertake this project. It's just that I'm the only person who is both willing and able to find the time to do so. CPython is essentially an all-volunteer project, and we're just people doing the best we can.
It's great that you are willing to spend your own time doing this work, but would it not be in the Python community's interest that such necessary work be funded? The Python Software Foundation has a fair budget to play with (noted in this article) but chooses to spend less than 10% of it - maybe not much more than 5% of it - on directly improving Python. Growing the community by funding Python events is not a bad thing, but it is all for nothing if Python itself has perceived deficiencies that cause people to use other things instead.
I'd also add that despite various companies relying on Python for their success, those companies - particularly the larger ones - always seem to hold back on contributing substantial improvements to Python, leaving various initiatives as personal or spare time projects of their employees which inevitably run out of steam because they are obviously not adequately resourced. I guess it just shows that merely entertaining corporate activity around a project like Python doesn't automatically translate to lots of benefits for the project, perhaps because it is easy for a corporate decision-maker to point at the volunteers doing all the necessary work for them for free already, and so they can justify investing nothing in the project themselves (beyond paying for a PSF sponsorship and thus buying more publicity for themselves).
And, ominously for Python, if one of those successful corporate users of Python feels that the volunteers aren't coming through with essential improvements, they can always migrate to something else. If such entities are supposed to drive Python development, there needs to be some way of getting them to invest properly in it. Otherwise, the PSF has to step into the vacuum and see that things get done regardless of whether they step on corporate toes in getting them done. But I don't see either of these things happening.
Progress on the Gilectomy
Posted May 26, 2017 1:57 UTC (Fri) by JdGordy (subscriber, #70103) [Link]
Progress on the Gilectomy
Posted May 26, 2017 11:17 UTC (Fri) by pboddie (subscriber, #50784) [Link]
Progress on the Gilectomy
Posted Jun 3, 2017 3:18 UTC (Sat) by njs (subscriber, #40338) [Link]
But:
- historically there's been a pretty strong split between the PSF as handling the legal/social side of things and python-dev handling technical matters, so the PSF has lots of expertise and infrastructure for supporting conferences and meetups but not as much for technical stuff. This is obviously fixable but as you know changing volunteer-run institutions is never simple.
- funding development is expensive and if you mess it up then it directly impacts people's lives. The PSF has a substantial budget relative to other similar organizations, but a "six-figure dollar total" isn't necessarily enough to pay for one developer! And the PSF has been extremely conservative about hiring, because they don't want to ever be in a position where they have to be like "oops, that sponsor dropped out so surprise, you don't get paycheck next month". (OTOH if you have an idea for something that can be done for a fixed chunk of money in the $1k-$10k range then you can totally apply for a grant; e.g. the PSF gave twisted a grant to help with their py3 porting. But I'm guessing $5k wouldn't make a difference to how much energy Larry has to put into the GILectomy work, given he already has a full time job.)
- Logistically it's not as simple as just throwing money at someone. There are community concerns about how hiring core developers could create perceptions of unfairness, drive away volunteers, etc.; everyone's heard scary stories about that time Debian tried it and it was a disaster. How do you provide oversight for the first technical employee – it's a bit problematic to ask volunteers to evaluate them in their spare time. And so forth. I think this is going away over time as things like the Django fellows program demonstrate success, but it's a thing.
- Probably moon-shot projects like the GILectomy would not be the first priority for funding anyway; more likely would be Python's packaging infrastructure, or stuff like bug/PR triage.
Progress on the Gilectomy
Posted Jun 3, 2017 20:00 UTC (Sat) by pboddie (subscriber, #50784) [Link]
This is an ongoing discussion in the community; in fact there's a PSF board election happening right now and several of the candidates have variants on "hey lets spend money on python development" in their platforms.
I'll have to take a look. I admit that I don't follow things like PSF board elections any more.
The PSF has a substantial budget relative to other similar organizations, but a "six-figure dollar total" isn't necessarily enough to pay for one developer! And the PSF has been extremely conservative about hiring, because they don't want to ever be in a position where they have to be like "oops, that sponsor dropped out so surprise, you don't get paycheck next month".
I don't think you'd ever see the PSF hire someone as a developer. Then again, the PSF does appear to have some full-time positions, some of them administrative, others related to the conference/event orientation of the organisation, and I don't know whether those positions are competitively paid or not.
(OTOH if you have an idea for something that can be done for a fixed chunk of money in the $1k-$10k range then you can totally apply for a grant; e.g. the PSF gave twisted a grant to help with their py3 porting. But I'm guessing $5k wouldn't make a difference to how much energy Larry has to put into the GILectomy work, given he already has a full time job.)
As I think I noted, grants take up a rather small proportion of the PSF's spending, and one can even argue that they only seem to be done out of necessity, such as fixing up the packaging infrastructure before something really bad happens, or to support the Python 3 migration vision that hasn't been realised all by itself. And then, as you note, you have people wanting to do the work having to fit that work in around their day job, leading up to the somewhat perverse example of various core developers being hired to "work on Python" actually not really spending their work time, or very much of it, "on Python" as such.
All of this suggests a structural problem with the way people expect stuff to get done. Which ends up being expressed as "we need more volunteers", but where those volunteers can only really nibble away at the smaller problems because the focus is on funding sprints or hackathons (or whatever) as opposed to projects, initiatives and, ultimately, people. And the solution that involves getting people hired to work on Python, thus avoiding awkward issues (see next paragraph), doesn't really seem to work out in general (see previous paragraph), or it involves other obligations like doing a doctorate which can be substantial distractions in their own right.
There are community concerns about how hiring core developers could create perceptions of unfairness, drive away volunteers, etc.
I think people are justifiably worried about the response to hiring people because there is a culture of everything having to be a zero-sum game in the Python community. Someone will gladly pipe up and say how their ultra-successful, heavily-promoted (at the expense of everything else even tangentially related) project is being undermined if the PSF opens its chequebook and it is not for their benefit. And there may actually be sponsors who don't want people to work on certain projects if those projects are in any way in competition with their products.
The PSF will have to face up to this eventually, though. The only question is whether it happens before its target audience have decided that they prefer something else.
Progress on the Gilectomy
Posted Jun 4, 2017 3:48 UTC (Sun) by njs (subscriber, #40338) [Link]
Progress on the Gilectomy
Posted Jun 4, 2017 14:20 UTC (Sun) by pboddie (subscriber, #50784) [Link]
Progress on the Gilectomy
Posted Jun 4, 2017 23:35 UTC (Sun) by njs (subscriber, #40338) [Link]
Progress on the Gilectomy
Posted Jun 5, 2017 14:16 UTC (Mon) by pboddie (subscriber, #50784) [Link]
PSF grants funding Python development in 2016 amounted to 6% of the total expenditure.
I rely on public knowledge to estimate how much time core developers spend working on, not working with, Python in their day jobs. And even if some people are doing so, other evidence of people's employment translating to significant progress on critical implementation issues (see the article for an example) is rather thin on the ground. (Making other people's lives harder is, apparently, a different matter.)
But these are merely "insinuations", apparently. That's why other people at those corporations consider going their own way instead. Nothing to see here, I guess.
(And as for the zero-sum game culture, I guess you've never encountered anyone who, upon being told that you're working on something similar to them and want to share perspectives, flat out asked you why you don't just work on their project instead. Start with the passive aggression towards Python 2 at the very top and then work your way down through all the people belittling each other's projects. There's plenty to see if you want to.)
Progress on the Gilectomy
Posted May 25, 2017 19:56 UTC (Thu) by excors (subscriber, #95769) [Link]
That sounds like it might produce quite misleading results for the new reference counting approach. Atomic increment/decrement can be relatively fast when the data stays in the core's cache, it only gets really expensive when ping-ponging between multiple cores. Writing all refcount operations to a big per-thread log will always be quite expensive, since that always has to go out to RAM, and I guess you need to write at least a 64-bit pointer each time.
On my CPU (4 cores plus hyperthreading) I get about 200M atomic increments per second with one thread, or 50M/sec with 8 contending threads (i.e. 32x slower per thread), or about 700M/sec with 8 non-contending threads (i.e. approximately scaling with number of physical cores used). I can write about 800M/sec 64-bit values to RAM with one thread, and about 1200M/sec with 8 threads. (These are all just very rough figures, not accurately measured at all.)
That suggests code that is constantly touching the same object in many threads will potentially be much faster with the log approach. But for single-threaded code, or multi-threaded code that mainly uses objects in a single thread at once, the difference is much less clear (especially since the log approach will slow everything else down by thrashing the cache (unless it's designed very carefully to avoid that), can use up a significant chunk of the system's memory bandwidth, and needs another thread to read and process the whole log perhaps every few msecs (if you don't want the log to grow to hundreds of megabytes when the system is very busy)).
I'd guess most real applications will be more like the second kind with only a limited amount of shared state (and most of that protected by mutexes anyway), because the only way to preserve your sanity while writing (and debugging) multi-threaded code is to avoid shared state, and they will behave very differently to microbenchmarks specifically designed to stress the refcounting system. (Or perhaps real Python applications do a lot of refcounting of read-only global class objects and function objects and stuff? I have no idea how that really works.)
Progress on the Gilectomy
Posted Jun 2, 2017 2:55 UTC (Fri) by lhastings (guest, #66451) [Link]
> That sounds like it might produce quite misleading results for the new reference counting approach. Atomic increment/decrement can be relatively fast when the data stays in the core's cache, it only gets really expensive when ping-ponging between multiple cores. Writing all refcount operations to a big per-thread log will always be quite expensive, since that always has to go out to RAM, and I guess you need to write at least a 64-bit pointer each time.
[...]
> That suggests code that is constantly touching the same object in many threads will potentially be much faster with the log approach. But for single-threaded code, or multi-threaded code that mainly uses objects in a single thread at once, the difference is much less clear
First, it'd be the rare Python program that didn't share objects across threads. Many built-in objects are singletons in Python: None, True, False, empty string, empty tuple, and the "small integers" (-5 through 256 inclusive). Also, modules, classes, and functions are all singletons, and they all have reference counts too. And then the constants *compiled into* those modules, classes, and functions are all shared and refcounted. I expect that even a program designed around running well in multiple threads, sharing as few objects across cores as possible, will still implicitly share a lot of objects under the hood. My bad benchmark is admittedly a pathological case of object sharing, but it's not a world of difference.
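For example, the implicit sharing is easy to observe from ordinary Python code (these are CPython implementation details, not language guarantees):

```python
# CPython caches the small integers (-5 through 256) and shares singletons
# like None, True, False, the empty string, and the empty tuple -- so even
# "independent" threads end up refcounting the same objects.
a, b = int("200"), int("200")   # built at runtime to defeat constant folding
assert a is b                   # same cached small-int object

c, d = int("1000"), int("1000")
assert c is not d               # large ints get fresh objects each time

e, f = tuple(), tuple()
assert e is f                   # the empty tuple is a singleton
```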
Second, while I concede I don't genuinely know how atomic incr/decr works inside the CPU, obviously there must be *some* synchronization going on under the covers. The core you're running on doesn't know in advance whether or not another core is examining the memory you want to atomically incr/decr, so it *has* to synchronize somehow with the other cores. That synchronization itself seems to be expensive, and the more cores you have the more synchronization it needs to do. I assume this synchronization is a primary cause of the worse-than-linear scaling I observed when benchmarking the Gilectomy. And that's why, even with its obvious overhead, reference count logging seems to be a big performance win.
Why am I so sure it's this synchronization rather than the cache invalidation that's important? Because the cache invalidation is *still happening* with the reference count logging. The commit thread for the reference count log is continually changing the reference counts on the shared objects: the small ints, the function, the code object, the module object, etc. Those writes invalidate the cache of those objects for all the other cores. And yet I observe a big improvement win with the reference count logger. It seems to me that all the reference count log is really doing is getting rid of the synchronization overhead.
If you'd like to test your theory that atomic incr/decr isn't really so bad, you could try it with the Gilectomy. Here's how I'd do it. For a test with N cores, I'd have N modules, each with their own separate implementation of fib(). That'd ensure they weren't sharing the functions and code objects, the constants inside the code objects, or the modules. I'd then change the function so the lowest it went was 377, aka fib(14). That'd cut down on use of the small integers, though the code would still use 1 and 2, and the fib recursion level (e.g. the 14 in fib(14)) would still be using small ints. That's about as good as you're going to get with fib in Python. I predict you'll see improved scaling over my original fib benchmark but it'll still be markedly worse-than-linear, and not as gentle as with the reference count log.
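One way to set that up without N separate source files is to compile private copies of fib, e.g. (a sketch; the early-exit constant follows the fib(14) = 377 suggestion above, so the results are no longer true Fibonacci numbers, only the interpreter workload matters):

```python
FIB_SRC = """
def fib(n):
    if n < 15:
        return 377   # fib(14)'s value; stop well above the small-int range
    return fib(n - 1) + fib(n - 2)
"""

def make_private_fib():
    # Each exec() builds a fresh namespace with its own function object,
    # code object, and constants, so threads running different copies
    # share nothing but the true singletons.
    ns = {}
    exec(compile(FIB_SRC, "<private-fib>", "exec"), ns)
    return ns["fib"]

fibs = [make_private_fib() for _ in range(4)]
assert fibs[0] is not fibs[1]                     # distinct function objects
assert fibs[0].__code__ is not fibs[1].__code__   # distinct code objects
assert fibs[0](20) == fibs[1](20)                 # identical behavior
```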
Progress on the Gilectomy
Posted Jun 3, 2017 7:22 UTC (Sat) by rghetta (subscriber, #39444) [Link]
Is there some convergence with Eric Snow's separate subinterpreters work [https://lwn.net/Articles/650489/] ? While the two projects approach the gil problem almost from the opposite direction, could one benefit from the work done in the other ?
Progress on the Gilectomy
Posted May 27, 2017 19:25 UTC (Sat) by flussence (subscriber, #85566) [Link]
Progress on the Gilectomy
Posted May 29, 2017 9:42 UTC (Mon) by Yhg1s (subscriber, #101968) [Link]
Progress on the Gilectomy
Posted Jun 2, 2017 3:12 UTC (Fri) by lhastings (guest, #66451) [Link]
Some clarification on this.
First: my approach for weakref works fine. I'm not sure how Jake got it in his notes that I wasn't sure. It's probably my fault, I'm sure I explained it badly!
Second, regarding resurrecting objects under __del__. The complicated problem was figuring out how, for a resurrected object, to know that we shouldn't free the object, and how we could safely call __del__ a second (and third...) time. I came up with a mildly complicated but safe / correct / workable scheme with relatively little overhead--at the very least, a good starting point. And then! It turns out that there are new semantics for this as of Python 3.4, courtesy of PEP 442 "Safe object finalization". This guarantees that __del__ will only be called for an object exactly once. If an object is resurrected inside __del__, the second time its reference count drops to zero, the object is simply freed--__del__ isn't called again. These semantics are actually quite easy to support, particularly for the Gilectomy. Long story short, resurrection under __del__ is no problem.
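These semantics are easy to verify directly in CPython 3.4 or later:

```python
# PEP 442 semantics: __del__ runs at most once per object, even if the
# object is resurrected inside it.
calls = []
keeper = None

class Phoenix:
    def __del__(self):
        global keeper
        calls.append("finalized")
        keeper = self           # resurrect: create a new strong reference

obj = Phoenix()
del obj                         # refcount hits zero -> __del__ runs, resurrects
assert calls == ["finalized"]
assert keeper is not None       # the object survived

keeper = None                   # refcount hits zero again...
assert calls == ["finalized"]   # ...but __del__ is NOT called a second time
```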