
Lockless algorithms for mere mortals


By Jonathan Corbet
July 28, 2020
Time, as some have said, is nature's way of keeping everything from happening at once. In today's highly concurrent computers, though, time turns out not to be enough to keep events in order; that task falls to an extensive set of locking primitives and, below those, the formalized view of memory known as the Linux kernel memory model. It takes a special kind of mind to really understand the memory model, though; kernel developers lacking that particular superpower are likely to make mistakes when working in areas where the memory model comes into play. Working at that level is increasingly necessary for performance purposes, however; a recent conversation points out ways in which the kernel could make that kind of work easier for ordinary kernel developers.

Concurrency comes into play when multiple threads of execution are accessing the same data at the same time. Even in a simple world, keeping everything coherent in a situation like this can be a challenging task. The kernel prevents the wrong things from happening at the same time with the use of spinlocks, mutexes, and other locking primitives that can control concurrency. Locks at this level can be thought of as being similar to traffic lights in cities: they prevent accidents as long as they are properly observed, but at the cost of stopping a lot of traffic. Time spent waiting for locks hurts; even the time bouncing lock data between memory caches can wreck scalability, so developers often look for ways to avoid locking.

Lockless list linking

Consider a highly simplified example: inserting an element into a singly-linked list. One possible solution would be to use a mutex to protect the entire list; any thread traversing the list must first acquire this mutex. If the thread inserting an element acquires this lock, it knows that no other thread will be traversing the list at the same time, so changes will be safe. But if this list is heavily used, this locking will quickly become expensive.
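
In rough outline, the locked version might look like this; it is a sketch with illustrative names (struct node, list_lock, insert_after_locked), not code from the kernel:

    struct node {
        struct node *next;
        int data;
    };

    static DEFINE_MUTEX(list_lock);

    /* Insert "new" after "prev"; traversals must take list_lock as well. */
    static void insert_after_locked(struct node *prev, struct node *new)
    {
        mutex_lock(&list_lock);
        new->next = prev->next;
        prev->next = new;
        mutex_unlock(&list_lock);
    }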

So one might consider a lockless alternative. If the list initially looks like this:

[Figure: the list before insertion]

One could start by linking the outgoing pointer from the new element to the existing element that will soon follow it in the list:

[Figure: the new element pointing to its eventual successor, but not yet reachable from the list]

At this point, the list still looks the same to any other thread that is traversing it. To complete the operation, the code will redirect the pointer from the preceding element in the list to the new element:

[Figure: the preceding element's pointer redirected to the new element, completing the insertion]

Now everybody sees the new list; no locking was required, and the view of the list was always consistent and correct. Or so one would hope.

The problem is that modern hardware makes things harder in the name of performance. The order in which a series of operations is executed may not be the order in which those operations are visible to other threads in the system. So, for example, it might well be that other threads in the system see the two pointer assignments above in the opposite order, with the result that, from their point of view, there is a window of time during which the list looks like this:

[Figure: the new element reachable from the list, but with its outgoing pointer not yet visible]

The outgoing pointer in the new element will contain whatever was there before the assignment happened, leading to a list that heads off into the weeds. There are few certainties in the world, but one can be reasonably confident in saying that little good will come from this situation.
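
In code (reusing the hypothetical node structure from the sketch above), the unsafe insertion amounts to two plain stores; nothing prevents the compiler or the CPU from letting other CPUs observe them in the wrong order:

    /* Broken: no ordering between the two stores as seen by other CPUs. */
    static void insert_after_broken(struct node *prev, struct node *new)
    {
        new->next = prev->next;   /* link the new element to its successor  */
        prev->next = new;         /* publish it -- but possibly seen first! */
    }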

A more complex view of memory

Another way to think about it is that locking provides a sort of Newtonian-physics view of the world; in a given situation, one always knows what's going to happen. At lower levels, life starts to resemble quantum physics, where surprising things can happen and few people can convincingly claim to understand it all. One can normally function quite well in this world without understanding quantum physics, but there are situations where it is good to have an expert around.

The Linux kernel has a few experts of this type; they have put a great deal of work over the years into the creation of the kernel's memory model, which is a description of how the kernel views memory and how to safely perform operations where concurrency may come into play. The result is the infamous memory-barriers.txt documentation file and a whole raft of supporting materials. From this documentation, one can learn a couple of useful things for the list-insertion example given above:

  • Code traversing the list should read the next-item pointers with an "acquire" operation, which is a special sort of barrier. It guarantees that any operation that happens after the barrier actually appears afterward elsewhere in the system. In this case, it would ensure that, if a traversal reads a pointer to a given element, the assignment of that element's "next" pointer will already be visible.
  • Code that sets the pointer should do so with a "release" operation, which ensures that any other operations done before the release are visible before the release store itself. That will ensure that a new element's "next" pointer is seen globally before the pointer to the element itself; both rules are applied in the sketch below.
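
Applied to the sketch above, and assuming that writers are already serialized against each other (by a lock, say) so that only the readers are lockless, those two rules translate into the kernel's smp_store_release() and smp_load_acquire() helpers; the function names here are illustrative:

    static struct node *list_head;

    static void insert_after(struct node *prev, struct node *new)
    {
        new->next = prev->next;
        /* Release: the store above is visible before the pointer is. */
        smp_store_release(&prev->next, new);
    }

    static bool list_contains(int key)
    {
        struct node *p;

        /* Acquire: seeing a pointer guarantees seeing what it points to. */
        for (p = smp_load_acquire(&list_head); p;
             p = smp_load_acquire(&p->next))
            if (p->data == key)
                return true;
        return false;
    }

(The kernel's own list code uses RCU primitives that provide equivalent ordering guarantees; the sketch simply spells out the article's two rules.)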

This example is complicated enough to explain, but it is as simple as these things get; most cases are rather more complex. To make things worse, optimizing compilers can create surprises of their own in pursuit of higher performance. The kernel's memory model strives to address this threat as well.

The problem with the memory model

Recently, Eric Biggers posted a patch fixing a perceived problem in the direct I/O code where a concurrent data-access situation lacked the appropriate barriers. There was some discussion about whether a bug actually existed or not; the problem according to Biggers is that this sort of concurrent access is deemed "undefined behavior", meaning that the compiler is granted license to pursue any evil agenda that might strike its fancy. The real dispute, though, was over the fix.

Dave Chinner, who is generally acknowledged as being a moderately competent kernel developer, complained that the resulting code was not something that could be readily understood:

I'm talking from self interest here: I need to be able to understand and debug this code, and if I struggle to understand what the memory ordering relationship is and have to work it out from first principles every time I have to look at the code, then *that is bad code*.

He was pointed to the memory-model documentation, but that did little to improve his view of the situation:

The majority of the _good_ programmers I know run away screaming from this stuff. As was said many, many years ago - understanding memory-barriers.txt is an -extremely high bar- to set as a basic requirement for being a kernel developer.

This documentation, he said, is aimed at people who spend their time thinking about memory-ordering issues. Everybody else is going to struggle with it, starting with the basic terminology used; even those who manage to understand it are likely to forget again once they go back to the problems they are actually trying to solve. Kernel developers would be better served, Chinner said, with a set of simple recipes showing how to safely code specific lockless patterns.

Biggers responded by posting a documented recipe for the "initialize once" pattern that was the source of the original problem in the direct-I/O subsystem. This pattern comes about when the initialization of a data structure is deferred until the structure is actually used, perhaps because that use may never actually happen. The initialization should be done exactly once; two racing threads should not both try to carry it out. The document provided several recipes of increasing complexity intended to match different performance needs.
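
As a rough illustration of the shape such a recipe takes (a generic sketch, not necessarily the text Biggers posted; struct foo and alloc_foo() are hypothetical), a lazily initialized pointer might be handled this way:

    static struct foo *global_foo;
    static DEFINE_MUTEX(foo_init_lock);

    struct foo *alloc_foo(void);        /* hypothetical constructor */

    struct foo *get_foo(void)
    {
        /*
         * Fast path: pairs with the release store below, so a non-NULL
         * pointer means the object it points to is fully initialized.
         */
        struct foo *p = smp_load_acquire(&global_foo);

        if (p)
            return p;

        mutex_lock(&foo_init_lock);
        p = global_foo;                 /* re-check under the lock */
        if (!p) {
            p = alloc_foo();
            if (p)
                smp_store_release(&global_foo, p);
        }
        mutex_unlock(&foo_init_lock);
        return p;
    }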

While the attempt to provide useful recipes was welcomed, it became clear that a number of people felt that the effort had missed the mark somewhat. Darrick Wong, for example, pointed out that language like:

Specifically, if all initialized memory is transitively reachable from the pointer itself, then there is no control dependency so the data dependency barrier provided by READ_ONCE() is sufficient.

is not immediately clear to a lot of developers. Alan Stern's attempt to clarify it read like this:

Specifically, if the only way to reach the initialized memory involves dereferencing the pointer itself then READ_ONCE() is sufficient. This is because there will be an address dependency between reading the pointer and accessing the memory, which will ensure proper ordering. But if some of the initialized memory is reachable some other way (for example, if it is global or static data) then there need not be an address dependency, merely a control dependency (checking whether the pointer is non-NULL). Control dependencies do not always ensure ordering -- certainly not for reads, and depending on the compiler, possibly not for some writes -- and therefore a load-acquire is necessary.

This was seen as driving home the point that started the whole discussion: most developers do not think of memory this way and would really rather not have to. They simply do not think in this kind of language. As long as lockless algorithms require this sort of understanding, they will be underused and many of the implementations that do show up are likely to be buggy in subtle ways.

An alternative, as suggested by Matthew Wilcox, is to define an API for this sort of on-the-fly initialization and hide the details behind it. Developers who understand memory models and enjoy thinking about them can concern themselves with optimizing the implementation, while the rest of the kernel-development community can simply use it with the knowledge that it works as intended. There followed a discussion with several different ideas about how this API should actually look, but nothing emerged that looks like it could find its way into wider use.
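
To make the idea concrete, such an interface might have looked something like the following; this is purely hypothetical, since nothing of the sort was actually merged, and all of the names are invented for illustration:

    struct once_ptr {
        void *p;               /* published object, NULL until ready */
        struct mutex lock;     /* serializes the first-use slow path */
    };

    /*
     * Return the object, calling init() at most once system-wide; the
     * implementation would look much like the recipe sketched earlier.
     */
    void *once_ptr_get(struct once_ptr *op, void *(*init)(void));

A caller would then write something like table = once_ptr_get(&table_once, build_table); and never see a barrier or an ordering rule at all.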

This particular discussion has faded away, but the underlying problem remains. Successful software development at all levels depends on the management of complexity, but that is not yet really happening at the memory-model level. Sooner or later, somebody will come along with the right skills to both understand the Linux kernel memory model and to hide it behind a set of APIs that other developers can safely use without having to understand that model. Until then, writing lockless code will continue to be a challenging task for many developers — and the people who have to review their code.

Index entries for this article
Kernel: Lockless algorithms
Kernel: Scalability



Lockless algorithms for mere mortals

Posted Jul 28, 2020 20:26 UTC (Tue) by warrax (subscriber, #103205) [Link]

Anybody who needs convincing that lock-free is horrifically difficult should probably watch a few of Tony van Eerd's talks (C++, but that's largely immaterial).

This is the level at which we actually *NEED* machine-checkable proof. It's just so insanely difficult to get right even for tiny little "algorithms" that are involved in simple data structures like deques and similar.

Lockless algorithms for mere mortals

Posted Jul 28, 2020 22:45 UTC (Tue) by excors (subscriber, #95769) [Link]

Machine-checkability does seem important. Recently I wanted to implement the widely-used Chase-Lev lock-free work-stealing queue in C++, but the original paper describes it with Java and completely ignores memory barriers. I found another paper with an optimised version using C11 atomic memory ordering, which spends five pages describing a proof of correctness, so it sounded like the authors knew what they were talking about and I implemented that. (The "optimised" part is valuable - their benchmarks show that using relaxed memory barriers where it's safe can double performance compared to using sequentially-consistent barriers everywhere.)

Then later I found a third paper about a tool that checks C/C++ atomic code against a specification, which mentions their tool found a bug in that other paper's C11 implementation. I think I ended up with the correct code, but it's really very tricky even when trying to follow supposed experts on a pretty simple algorithm, and it would be very nice if those checking tools were in more widespread use.

Lockless algorithms for mere mortals

Posted Jul 29, 2020 14:41 UTC (Wed) by bgamari (subscriber, #102670) [Link]

> Then later I found a third paper about a tool that checks C/C++ atomic code against a specification, which mentions their tool found a bug in that other paper's C11 implementation.

Do you have a reference to this paper? Indeed I have used the C11 paper's Chase-Lev implementation in the past so I'm very likely missing a barrier.

Lockless algorithms for mere mortals

Posted Jul 29, 2020 15:14 UTC (Wed) by excors (subscriber, #95769) [Link]

There's a corrected version of the buggy paper with C11 atomics: Correct and efficient work-stealing for weak memory models. The bug in the original published version was found by CDSChecker and CDSSpec, and their source repository has the original and bugfixed versions for comparison.

Lockless algorithms for mere mortals

Posted Jul 29, 2020 18:18 UTC (Wed) by jezuch (subscriber, #52988) [Link]

> but the original paper describes it with Java and completely ignores memory barriers.

Note that Java has a formal memory model (that's how they can do anything performant without horrible memory unsafety), so it's likely that these memory barriers were not ignored, just... assumed. (But yeah, they should have at least mentioned which features of the memory model they rely on.)

Lockless algorithms for mere mortals

Posted Jul 31, 2020 5:51 UTC (Fri) by ncm (guest, #165) [Link]

Originally Java's memory model was unimplementable on Power and ARM. They had to doctor it, and it requires more attention than it once did. It is very possible that not everybody has caught onto the actual requirements. There is probably a great deal of Java, and C and C++ and everything else that actually works reliably only on x86. If there.

Lockless algorithms for mere mortals

Posted Aug 2, 2020 7:02 UTC (Sun) by jezuch (subscriber, #52988) [Link]

It was probably very naive at the beginning, like a lot of Java. But I think the interesting thing is that, at least as I understand it, it's specified in very different terms than the C model, which uses esoteric jargon, while the Java model tried to make it at least understandable for mere mortals. Not sure how well it succeeds - usually you don't have to go into this level of detail unless you want to do something unusual, but at that point you're clearly very clever, so... ;)

Lockless algorithms for mere mortals

Posted Aug 12, 2020 18:49 UTC (Wed) by briangordon (guest, #135428) [Link]

It depends on what you're doing with it. If you're implementing a JVM I'm sure the technical details of the spec aren't easy to grapple with, but the practical implications are reasonably intuitive for developers. That's for isolated snippets of code though. In a complex system with lots of moving parts, it can still become ferociously difficult to maintain your invariants, so you usually want to work with abstractions that wrap the low-level primitives to make things as dead simple as possible. For example, the actor model has your code running in single-threaded actors that can only communicate by sending messages to each other - there's concurrent code under the hood, but you don't have to think about it.

Lockless algorithms for mere mortals

Posted Aug 1, 2020 17:23 UTC (Sat) by anton (subscriber, #25547) [Link]

Machine-checkability does seem important.
Or one might consider such a requirement (as well as the article) to be an indication that the concepts are too difficult to use. In other cases (e.g., wrt. page colouring), Linus Torvalds has written that he targets hardware that performs well without such complications, and that other hardware deserves what it gets. I think that is a sensible approach for memory ordering as well. Have a model that's easy to understand and performs ok on nice hardware (not sure if nice hardware already exists, but weak consistency certainly does not look nice to me); then hardware designers have an incentive to make their hardware nicer.

It seems to me that all this difficult-to-program memory ordering is there because multi-processors originally come from the supercomputing area, where hardware is still more expensive than software, and hardware designers can get away with making hardware that's hard to program correctly and efficiently.

Lockless algorithms for mere mortals

Posted Aug 3, 2020 10:12 UTC (Mon) by farnz (subscriber, #17727) [Link]

I would say that machine-checkability is necessary but not sufficient. If the rules are so complex to encode that a computer can't verify that you're getting them right, then they are also too complex for a human to get right, too, and they are certainly too complex for a human code reviewer to reliably check.

That said, this is a minimum requirement, as you can have rules that a computer can reliably verify, but that humans get wrong - see DEC Alpha ordering for the classic example.

Lockless algorithms for mere mortals

Posted Jul 29, 2020 2:50 UTC (Wed) by dgc (subscriber, #6611) [Link]

> This is the level at which we actually *NEED* machine-checkable proof.
> It's just so insanely difficult to get right even for tiny little "algorithms" that
> are involved in simple data structures like deque's and similar.

That's what the "litmus tests" the LKMM documentation talks about are for - it's a formal method of proving that your ordering construct behaves correctly within the constraints of the LKMM.

That is, of course, dependent on whether you can translate your desired code construct into the form that the litmus tools can use and it still means the same thing. That can be a challenge all in itself, especially as the litmus syntax is not defined anywhere in the LKMM. And if you think the LKMM documentation is impenetrable, you should go look at the documentation for the herd7 formal verification tools.....

-Dave.
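
(For readers who have never seen one, here is roughly what such a litmus test looks like; this example is modeled on the message-passing tests shipped in tools/memory-model/litmus-tests, with illustrative variable names. It asks whether P1() can observe the flag but miss the data; with only READ_ONCE()/WRITE_ONCE() and no barriers, the model answers that it can.)

    C MP+poonceonces

    (* Can the reader see flag==1 but data==0?  With no barriers, yes. *)

    {}

    P0(int *data, int *flag)
    {
        WRITE_ONCE(*data, 1);
        WRITE_ONCE(*flag, 1);
    }

    P1(int *data, int *flag)
    {
        int r0;
        int r1;

        r0 = READ_ONCE(*flag);
        r1 = READ_ONCE(*data);
    }

    exists (1:r0=1 /\ 1:r1=0)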

Lockless algorithms for mere mortals

Posted Jul 31, 2020 15:30 UTC (Fri) by PaulMcKenney (subscriber, #9624) [Link]

Thank you all for the bug report, and work has started on improving the documentation.

I would never have believed this two years ago, but it just might be possible to get to a point where someone reading only the tools/memory-model documentation might have a fighting chance of being able to make good use of LKMM. Here is hoping, anyway...

Lockless algorithms for mere mortals

Posted Jul 28, 2020 20:32 UTC (Tue) by rweikusat2 (subscriber, #117920) [Link]

The problem with memory-barriers.txt is that it's optimized for being unintelligible by relying on loads of contrived, complicated examples using single-letter variables. A much simpler version could be something like: Whenever two memory accesses need to become visible to other CPUs in a particular order, a barrier is required between them both in code which writes to and code which reads from memory.

A little "premature optimization is the root of all evil" also applies here: Start simple, make it more complicated when and if this becomes necessary. Ie, use locking or the strongest kind of memory barrier until this demonstrably causes an issue.

Lockless algorithms for mere mortals

Posted Jul 29, 2020 13:32 UTC (Wed) by Paf (subscriber, #91811) [Link]

Except your statement is far too simple to capture the actual detailed semantics the memory model is describing. The rules are much more complex, hence the detailed, formalistic explanations.

Leslie Lamport was asked about how RAFT is easier to understand than Paxos, and in his reply he noted people should be careful: it might be easier to understand because they had dropped rigor in the explanation.

Lockless algorithms for mere mortals

Posted Jul 29, 2020 21:55 UTC (Wed) by rweikusat2 (subscriber, #117920) [Link]

It doesn't say anything about different types of memory barriers and very intentionally so: It explains the basic problem which is supposed to be solved, including the need for barrier pairing, in sufficient detail that simple, lockless algorithms can be implemented based on this level of understanding. And even getting this out of memory-barriers.txt literally required a few years of head-scratching.

This kind of "formalistic explanations" aka "wild letter salad" are essentially write-only text. All of the constructs supposedly explained in the file are used in real code in the kernel. The documentation could be greatly improved by using real examples to explain the different semantics.

Lockless algorithms for mere mortals

Posted Jul 29, 2020 22:25 UTC (Wed) by Paf (subscriber, #91811) [Link]

Right, so it doesn’t give the required level of detail for optimized implementations, which is what memory-barriers.txt is doing.

I’m all for specific examples and I agree memory-barriers.txt is bordering on unusable, but what you suggested is still too simple to replace it.

Lockless algorithms for mere mortals

Posted Jul 31, 2020 16:56 UTC (Fri) by PaulMcKenney (subscriber, #9624) [Link]

One of the motivations for LKMM was the increasing difficulty of dealing with memory-barriers.txt. In fact, LKMM is intended to be an automated replacement for large portions of memory-barriers.txt.

Lockless algorithms for mere mortals

Posted Jul 30, 2020 9:33 UTC (Thu) by adobriyan (subscriber, #30858) [Link]

> The problem with memory-barriers.txt is that it's optimized for being unintelligible by relying on loads of contrived,
> complicated examples using single-letter variables.

memory-barriers.txt can look like a scientific paper to a layman indeed.

> A much simpler version could be something like: Whenever two memory accesses need to become visible to other CPUs
> in a particular order, a barrier is required between them both in code which writes to and code which reads from memory.

This can lead to cargo-cultish practices, people remembering 1 rule which doesn't quite extend to even simplest
complications -- what to do with 3 accesses.

I gave set of Linux lectures (including memory barriers!) and naturally had to do the exercises myself.

The picture with 2 cpus, 1 memory and the case on line 150 (2 cpus, 2 variables) is absolutely required
for understanding what's going on. It is like solving ax^2+bx+c=0. Do it from the beginning to the end once.
(Doesn't work and explain order 5 case, though).

There are 24 cases indeed and you _should_ write them all down and write all the outcomes.
There will be 4.

Now add one barrier, cross the cases which became impossible, there will be still 4.

Then add second pairing barrier, and there will be 3. This explains why 1 barrier is not enough.

This also explains why barriers are actually necessary -- to reduce the set of observable outcomes.

> A little "premature optimization is the root of all evil" also applies here: Start simple, make it more complicated when and
> if this becomes necessary.

Documentation/memory-barriers.txt could do better by moving "Device" down and "data dependency" to the bottom.
And there is "compiler barrier" section" in the document about memory barriers, OK.

Lockless algorithms for mere mortals

Posted Jul 31, 2020 18:02 UTC (Fri) by PaulMcKenney (subscriber, #9624) [Link]

Please feel free to send a patch implementing your reordering suggestions.

Lockless algorithms for mere mortals

Posted Jul 28, 2020 20:41 UTC (Tue) by rweikusat2 (subscriber, #117920) [Link]

A second remark: "Traffic light" is a really bad example for locking as "observing traffic lights" implies traffic being stopped regardless of concurrent, conflicting traffic. A better example would be a real lock (a river lock, not a door lock): Assuming a lock is presently unoccupied, it's a somewhat expensive delay for any individual boat which needs to pass it. Boats arriving while the lock is in use have to wait in a queue until it's their turn to enter it and can't do anything more useful than twiddling their thumbs until then.

Lockless algorithms for mere mortals

Posted Jul 28, 2020 21:03 UTC (Tue) by NYKevin (subscriber, #129325) [Link]

Then you have things like multiversion concurrency control, where the standard analogies don't really work at all ("It's like a traffic light, if you made all the lights green, and then retrospectively try to figure out whether a traffic accident happened, and then if one did, you... why are you looking at me like that?").

Lockless algorithms for mere mortals

Posted Jul 29, 2020 13:24 UTC (Wed) by rweikusat2 (subscriber, #117920) [Link]

:-)

This text has no function beyond getting the comment past the 10 letter minimum.

Lockless algorithms for mere mortals

Posted Jul 28, 2020 20:53 UTC (Tue) by roc (subscriber, #30627) [Link]

Rust has some good libraries for this sort of thing, like OnceCell for lazy initialization. The memory barriers are entirely under the hood and Rust's aliasing control prevents you from accessing memory without going through the necessary barriers (assuming you don't use "unsafe").

Lockless algorithms for mere mortals

Posted Jul 28, 2020 22:53 UTC (Tue) by gus3 (guest, #61103) [Link]

Wouldn't the example linked-list problem be solved with calls to mbarrier() after each pointer update? More generally, isn't this exactly what mbarrier() was created for?

Lockless algorithms for mere mortals

Posted Jul 28, 2020 23:09 UTC (Tue) by excors (subscriber, #95769) [Link]

Do you mean the kernel's "mb()"? That might work but I think it would have suboptimal performance, and you shouldn't be using lock-free algorithms unless you really care about performance. On x86, it looks like mb() is an MFENCE instruction while acquire/release are just compiler barriers (because the x86 memory model already guarantees the desired ordering), so it may be a very significant performance impact.

Lockless algorithms for mere mortals

Posted Jul 31, 2020 1:55 UTC (Fri) by gus3 (guest, #61103) [Link]

Yeah, I meant mb(). I was typing from memory.

But I don't see how it would be sub-optimal. It's a direct request to hardware, to slam things into place in RAM before proceeding. It also invalidates remote cache, so other CPU's/cores will re-fetch the proper contents.

Yes, there might also be other cache lines to be flushed. What is the harm in writing them out? Would that be a timing issue, if you're recording an X264 video?

Lockless algorithms for mere mortals

Posted Aug 1, 2020 2:26 UTC (Sat) by HenrikH (subscriber, #31152) [Link]

Two full mb()s for inserting a single item in a linked list will probably be slower than just using a mutex for the whole operation; as stated above, going lock-less is only of interest if you are after maximum possible performance, and calling mb() at each step is not that.

Lockless algorithms for mere mortals

Posted Jul 29, 2020 0:14 UTC (Wed) by ras (subscriber, #33059) [Link]

> Another way to think about it is that locking provides a sort of Newtonian-physics view of the world

That undersells it. It's not just Newtonian physics, it's all macroscopic physics. In relativistic space-time, about the only thing preserved is causation [0]. Another way of saying this: if event A causes event B, then that is how it appears in all frames of reference. It's not possible to agree on the time it happened, or speed, or the mass of the participants - but the fact that A caused B - yes we can agree on that.

I am no physicist, but causation looks to be the bare minimum that needs to be preserved to have an objective reality. By that I mean if it isn't possible to agree on A caused B, then it becomes impossible to have a shared truth. If we can not agree on what is true, there is no shared reality.

I don't think it's a coincidence that in our computer world the inability to decide the order two events happen leads to a special kind of hell. The kind that leads some of the best programmers I know to advise "just don't use threads". I suspect physicists get themselves into the same sort of hell when they imagine what happens when the light cones of two related events separate, and yet we know some things (like spin) are preserved between those two light cones, even though we don't know what the actual value is. Worse, some interpretations of quantum mechanics say the universe doesn't know either. If you substitute "the universe" for "the programmer who wrote the threaded code, who should know what is going on", you get the idea.

If you want to dive deeper into the rabbit hole, the most tantalising part is that energy is in some sense equivalent to information. So when we say energy is always conserved, we are in effect saying information is never created or destroyed. The observation that in our Universe every interaction is reversible is saying the same thing, as to be reversible all the information needed to describe the old state must be in the new state otherwise we could not reproduce the old state, and hence it would not be reversible. Objectively information is changing all the time so that's hard to accept at face value. It's meant in the sense that the capacity of the system to store information never changes. A somewhat poor analogy is it's similar to saying the RAM size of a computer never changes, and the only operations you can perform on that RAM are invertible - like XOR. You can of course store a mind boggling number of different configurations of information in that RAM (1<<bits_of_ram in fact, which for a computer with 8GB of RAM is a number that can reasonably be described as mind boggling). Noether's theorem says if something is conserved, there must be a symmetry. For example, for momentum to be conserved, space must be symmetric, that is the same everywhere. For energy the thing that must be symmetric is time. Since energy is information, that means at a very fundamental level information and time are related concepts - one can't exist without the other.

[0] That is an exaggeration. We have laws of conservation - energy (and by implication mass) is conserved (if you ignore dark energy), charge is conserved, spin is conserved, and probably some others things.

Lockless algorithms for mere mortals

Posted Jul 29, 2020 0:30 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]

> I am no physicist, but causation looks to be the bare minimum that needs to be preserved to have an objective reality. By that I mean if it isn't possible to agree on A caused B, then it becomes impossible to have a shared truth. If we can not agree on what is true, there is no shared reality.
That's not quite true in distributed systems. There's a class of data structures that achieve a consensus without strict ordering: https://en.wikipedia.org/wiki/Conflict-free_replicated_da...

Lockless algorithms for mere mortals

Posted Jul 29, 2020 1:09 UTC (Wed) by ras (subscriber, #33059) [Link]

Yes, they can reach a consensus. But they can't do that unless they exchange information after the fact. Until that point they have two sets of observations that don't agree. It's only after they meet that they can decide what they will agree on.

In contrast, if they independently observe A caused B they don't need to be in contact to know they agree on the ordering. They can be certain the other one saw that A caused B without ever meeting to agree on it.

The Wikipedia article you linked to gives an excellent example. The database they must agree on is a single bit. The update performed is to set the bit to 1 if an event occurs, but since it is a distributed system that event may not be seen by some databases, so some "observe" 0, and some observe a 1. Clearly that is different. To agree, the observers must meet and then follow the logic "since it is never reset to 0, and since one participant saw the event happen it must have happened, ergo after meeting we can all agree on 1".

Lockless algorithms for mere mortals

Posted Jul 29, 2020 17:58 UTC (Wed) by NYKevin (subscriber, #129325) [Link]

> In contrast, if they independently observe A caused B they don't need to be in contact to know they agree on the ordering. They can be certain the other one saw A caused B without every meeting to agree on it.

There's also the Spanner approach (put atomic clocks in all of your datacenters, get them all nice and synchronized, and then everyone can agree that A happens-before B because A and B are both timestamped and you know the resolution of your clocks is finer than the difference between those timestamps). But that requires hardware support, because regular clocks are insufficiently precise for timestamping to be useful in most realistic applications. Also, this is clearly useless for something like multicore concurrency within a single machine (you're not sticking an atomic clock on each core, that would be ridiculous).

Lockless algorithms for mere mortals

Posted Jul 29, 2020 19:40 UTC (Wed) by Wol (subscriber, #4433) [Link]

Except that time depends on velocity, and velocity depends on gravity, and and and. So the only way to synchronize time in all your datacentres is to have only one large datacentre.

Except that then you know that the 1st floor has time moving at a different speed to the ground floor, and probably the front wall is moving at a different speed to the back ...

Cheers,
Wol

Lockless algorithms for mere mortals

Posted Jul 29, 2020 21:53 UTC (Wed) by NYKevin (subscriber, #129325) [Link]

Spanner is not hypothetical. Google is both using it internally, and selling it on GCP (at an extortionate rate, but it is there if you really want it). It actually works as I described. The simple reality is that relativistic effects are too small to matter when you're operating on the scale of milliseconds, on a single planet. If you want to set up datacenters on Mars, or synchronize things at the nanosecond level, then you might have a problem.

(Disclaimer: I work for Google. I do *not* get paid to promote Spanner or any other Google product.)

Lockless algorithms for mere mortals

Posted Aug 1, 2020 16:35 UTC (Sat) by anton (subscriber, #25547) [Link]

All the data centers are at rest wrt each other, and wrt the events they observe (Earth rotation may be an issue, but not at the ms resolution they work with). Differences in the gravity field are minuscule, but even if they were significant, they would only change the time scale, not the order of events; and that can be corrected by scaling the measured time.

Lockless algorithms for mere mortals

Posted Jul 30, 2020 13:05 UTC (Thu) by sriram_srinivasan (guest, #140506) [Link]

Even Spanner's clocks have uncertainty in the millisecond range (7ms according to the paper), which is the reason for "commit wait". It has a considerable effect on throughput and latency. There are other options that rely on causal clocks and other ways of structuring transactions -- the Calvin database comes to mind -- that don't have to pay this penalty.

Lockless algorithms for mere mortals

Posted Jul 30, 2020 15:15 UTC (Thu) by NYKevin (subscriber, #129325) [Link]

Well sure, it's just software, not magical pixie dust. I was responding to the assertion that you need a causal relationship between A and B to order them, which is simply untrue.

Lockless algorithms for mere mortals

Posted Jul 29, 2020 2:40 UTC (Wed) by creemj (subscriber, #56061) [Link]

> In relativistic space-time, about the only thing preserved is causation [0]. Another way of saying this if event A causes event B, then that is how it appears in all frames of reference. It's not possible to agree on the time it happened, or speed, or the mass of the participants - but the fact that A caused B - yes we can agree on that.

It would be better named the "Theory of Invariants" than the "Theory of Relativity". If event A occurring at (x_a, t_a) causes event B occurring at (x_b, t_b) then all observers agree on the value of c^2 (t_b-t_a)^2 - (x_b-x_a)^2. That is invariant. Actually it is invariant whether there is a causal relationship or not between A and B. There are other invariants such as the ultimate speed c, the mass m which is the same as the Newtonian mass (about which the author of the quote above is misinformed), and invariants involving momentum with energy and wavenumber with frequency.

I am no expert on memory models and I am not convinced that this analogy necessarily transfers to memory models, but maybe it would be interesting or useful if a memory model could be expressed in terms of invariants rather than those things that can change?

> The observation that in our Universe every interaction is reversible

Really? I place a hot cube of water in contact with a cold cube of water, isolated as best I can from the environment, and they both end up at the same lukewarm temperature. But the reverse is never seen to occur!

Lockless algorithms for mere mortals

Posted Jul 29, 2020 3:42 UTC (Wed) by ras (subscriber, #33059) [Link]

> Really? I place a hot cube of water in contact with a cold cube of water, isolated as best I can from the environment, and they both end up at the same lukewarm temperature. But the reverse is never seen to occur!

Yes, but that is because it is statistically unlikely for the ice cube to reform, not because it can't. The individual changes that make up the transition are all reversible. In theory, the only reason you haven't seen it is because you haven't waited long enough. In practice, long enough is likely after the heat death of the universe.

Put in more concrete terms, all the possible combinations of position and vibrational levels of each of the molecules constitute the state space of the system. All possible arrangements of those states are equally likely. However, you can't perceive those individual arrangements; you take a far more macro view. In that macro view, far, far more of those states look like the macro state "ice cube melted in water" than the macro state "ice cube floating in warm water". The system is undergoing constant random thermally driven transitions between one state and another. Since there are far more "ice cube in water" states, over time that's where we end up.

The shorthand for all that is: the 2nd law of thermodynamics. In mathematical terms it just says that if you measure the outcomes of random experiments, the average of those outcomes will tend toward the mean as the number of experiments increases. Casinos are profitable because of the same thing. However, that does not mean someone occasionally can't get a big win at a casino; it's just statistically unlikely.

Lockless algorithms for mere mortals

Posted Jul 29, 2020 6:11 UTC (Wed) by madhatter (subscriber, #4665) [Link]

You seem to be proposing that the 2nd law of thermodynamics (2T) says that, when evaluated over the universe as a whole, entropy must always rise. You say that it's fine for it to temporarily drop in certain lucky locales, it's just that the number of spontaneously melting ice cubes in cups of tea significantly outweighs the number of spontaneously freezing ones in similar cups of tea, and so over the universe as a whole entropy still rises.

It's been a long time since I was a professional physicist, but as I recall 2T says that in any isolated system the entropy count can never decrease. 2T is a local, as well as a global, truth. That is to say, unless there's an interaction across the system border of any given nearly-melted ice cube in a cup of tea - eg, I stick a cooling coil in and shunt heat outside the local system - it can't spontaneously refreeze. A thermodynamically-exploitable temperature gradient will not spontaneously arise without intervention.

The casino analogy is a poor one because very few players in a casino are isolated. Any player who is - one playing solitaire in a broom cupboard, for example - will find that they never win big, because there's nobody to win from. Moreover, they can only not lose if they stop eating the chips (entropy count must rise in an isolated system unless all local processes are thermodynamically reversible).

Lockless algorithms for mere mortals

Posted Jul 29, 2020 7:43 UTC (Wed) by ras (subscriber, #33059) [Link]

> A thermodynamically-exploitable temperature gradient will not spontaneously arise without intervention.

My guess is that is demonstrably false. Let's say your system consists of 3 gas molecules in a 3m x 3m x 3m room. I doubt anyone's checked, but it seems certain those 3 molecules will end up on one side of the room at some point.

That is not terribly realistic of course. In real life a 3m x 3m x 3m room at 1atm has about 5e25 molecules in it. Statistically, I think we can say we are fairly safe from someone's ear drums bursting because all the air ended up on one side of the room. However, nothing has changed except the number of molecules. As far as I know nothing in physics says the universe behaves differently because that number has changed.

Your mind is now telling you that you can extract energy from that if it happened, so it must violate the 1st law of thermodynamics. I don't know the answer to that, but I don't feel too bad about it because it took about 50 years to figure out why Maxwell's daemon was a myth. I'm sure the universe has accounted for it somehow. It probably has to do with the fact that you have to wait until the universe goes dark to have a fair chance it will happen, and by that time the 2nd law has recovered the energy from you evaporating. [0]

> The casino analogy is a poor one because very few players in a casino are isolated

You play against the house, not the other players. Example: in Keno, the odds are published and fixed. So you can place a minimum bet and win the maximum even if no one else plays that round.

[0] That was a joke. Well part of it was. If it's not accounted for somewhere, it's where dark energy comes from [1].

[1] Also a joke. I really have no idea. [2]

[2] Actually, it could well be the Maxwell daemon thing again. Sitting there, actively monitoring the room so you know when all the gas molecules are on one side of the room takes an external energy source.

Lockless algorithms for mere mortals

Posted Sep 6, 2020 17:43 UTC (Sun) by nix (subscriber, #2304) [Link]

> [2] Actually, it could well be the Maxwell daemon thing again. Sitting there, actively monitoring the room so you know when all the gas molecules are on one side of the room takes an external energy source.

This was figured out a decade or so ago. You don't need the cost of monitoring: even if that's zero, the mere fact that the demon has to make a decision about whether to allow a given molecule through or not is enough to ensure that the entropy of the system (demon + box) always increases, given that the demon's memory capacity is finite such that it eventually has to erase the states of some of its memory (increasing its entropy) in order to make more decisions about whether to let molecules through.

Lockless algorithms for mere mortals

Posted Jul 29, 2020 7:46 UTC (Wed) by cyphar (subscriber, #110703) [Link]

I don't think you're disagreeing with @ras -- the reason why the second law of thermodynamics applies to any closed system is that it is so statistically unlikely as to be effectively impossible for a closed system to move towards a lower-entropy state. The second law of thermodynamics isn't magic and it isn't a force being applied to thermodynamic systems -- it's a consequence of the fact that thermodynamic systems have so many microstates that the probability that random perturbations will result in you moving to a lower-entropy microstate is effectively zero.

However, if you had a thermodynamic system with a handful of particles, you would observe the system move into lower entropy configurations purely by chance. All of the interactions occurring in thermodynamics are reversible, it's just that the likelihood of such interactions occurring is effectively zero. The same applies to even more mundane physics like the kinematics of billiard balls -- if you imagine a perfectly-spaced grid of one billion billiard balls being struck by another ball, while the interaction is entirely reversible it's incredibly unlikely that you'd be able to collide the balls in such a way that they would form a perfectly-spaced grid. The second law of thermodynamics is just a far more formalised version of that intuition.

Lockless algorithms for mere mortals

Posted Jul 29, 2020 8:31 UTC (Wed) by Wol (subscriber, #4433) [Link]

And this is where it gets messy. DEFINE ENTROPY.

Sorry, I can't quote the source beyond "I saw it in New Scientist", but there was a wonderful article where someone had two different (equally plausible) definitions of entropy, AND ONE OF THEM DECREASED.

The conclusion was the 2nd law was still valid, but it was no longer entropy it was measuring. It was information. Because iirc that made the correct selection between the two different definitions of entropy.

Isn't science wonderful :-)

Cheers,
Wol

Lockless algorithms for mere mortals

Posted Jul 29, 2020 9:02 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]

> DEFINE ENTROPY
Easy. It's the number of microstates of a system. There are other ways to re-formulate this definition, but they are equivalent to each other.

Lockless algorithms for mere mortals

Posted Jul 29, 2020 11:18 UTC (Wed) by Wol (subscriber, #4433) [Link]

How come those two states were equally plausible but different?

Mind you, your definition looks like the new version "information == entropy".

Cheers,
Wol

Lockless algorithms for mere mortals

Posted Jul 29, 2020 16:28 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]

It's not information per se. Entropy is simply the measure of the number of microstates of a system, and that's all it is.

Lockless algorithms for mere mortals

Posted Jul 30, 2020 1:55 UTC (Thu) by cyphar (subscriber, #110703) [Link]

In thermodynamics, S = kB ln Ω by definition. The term "entropy" is used in many different fields (such as computer science) and can mean different things (such as Shannon entropy), but in thermodynamics it has a specific meaning -- it is a measure of the number of microstates which appear as a single macrostate. Entropy increases because systems tend towards the macrostates which have the most microstates (again, purely because of statistics).

Lockless algorithms for mere mortals

Posted Jul 31, 2020 21:06 UTC (Fri) by NYKevin (subscriber, #129325) [Link]

> Entropy increases because systems tend towards the macrostates which have the most microstates (again, purely because of statistics).

The problem with this argument is that it is time-symmetric. In other words, you can just as easily argue that the present macrostate is more likely to have evolved from past macrostates which had more microstates. Observation tells us that the time-reversed argument is empirically wrong (because we observe entropy increasing as a function of time), which is a bit of a problem because it appears to be symmetric with the (empirically correct) non-reversed argument. The only (obvious) way to break this symmetry is to assert a boundary condition of low entropy at or near the beginning of the universe. This boundary condition, IMHO, is the real mystery: Why did the early universe have such low entropy? I don't think we have a working answer to that question (yet).

Lockless algorithms for mere mortals

Posted Aug 1, 2020 4:11 UTC (Sat) by ras (subscriber, #33059) [Link]

> In other words, you can just as easily argue that the present macro state is more likely to have evolved from past macro states which had more microstates.

Actually no, you can't. In the definition of entropy the number of microstates is fixed and you are measuring how many of those microstate arrangements look like a given macro state.

That aside, no one is disputing there is a thermodynamic arrow of time. The argument being made is that all transitions between microstates are reversible, lossless in terms of information and equally likely in both directions. It's possible for that to be true for micro states and yet there be preferred direction for the evolution of macro states. It's not only possible, it's what happens.

It's not only what happens, there is no mystery as to why it happens. For example, take a go board covered with black and white stones and assume we are near blind. If all black stones are on one side of the board and white on the other we can see that, so black/white is a macro state, but the rest of the possible microstate arrangements just look like grey to us, so grey is another macro state. Black/white corresponds to a few micro states where the black and white stones are poorly mixed. I think there are 361! / 181! possible board arrangements or about 10^431. Most of those states will look just grey - a random mixture of black and white. If the pieces are moving randomly, all of the microstates will be equally likely. This means the odds of seeing black/white instead of grey look to be about 1 in 10^245 if you assume 10% can be out of place. [0] If your go board starts in the black / white macro state, then is allowed to evolve through random 100% completely reversible symmetric swaps, you will see it change to grey, and stay that way.

That is the thermodynamic arrow of time. Yet it arose from a completely reversible process. Microstates and macro states are both reasonable ways of looking at the system. When we say time is reversible, we are talking about the former.

[0] Caveat: I am deriving the formulas for these probabilities in my head, as I write this. They are almost certainly wrong.

Lockless algorithms for mere mortals

Posted Jul 29, 2020 18:27 UTC (Wed) by ianmcc (subscriber, #88379) [Link]

The second law is about probabilities. For an isolated system, the probability that entropy decreases in some given time period is exponentially small in the number of degrees of freedom. For a sufficiently small system, this will happen quite often. For a macroscopic system the probability is so small that you will never see it in the lifetime of the universe. But in principle, every isolated system eventually returns to its initial state (this is the Poincaré recurrence theorem), so in that sense the entropy is cyclic: if you start from a very low entropy state then eventually it will return to that low entropy state again. But the time it takes to get there is again exponentially large in the number of particles, so the time required very quickly exceeds any sensible number (even using units of the lifetime of the universe).

Lockless algorithms for mere mortals

Posted Jul 29, 2020 10:18 UTC (Wed) by ibukanov (subscriber, #3942) [Link]

In a casino it is trivial to win if you have an unlimited supply of money. Just put X money on the black. If that loses, double the amount and put it on the black again (well, slightly more than double, to account for zero). Eventually you will win and the total win will be X. That is why a guy with a lot of money can cause trouble, as they may bankrupt the casino before they run out of money doubling the bets.

As for ice cubes it is more accurate to state that at the room temperature there are vastly more states of water that resemble liquid than the number of states that resemble liquid and an ice cube. And the reason for that is that while a few individual molecules in water can be ice-cold, one needs for all such molecules to come to one place, and that is extremely-extremely-extremely unlikely.

Lockless algorithms for mere mortals

Posted Jul 29, 2020 12:58 UTC (Wed) by ras (subscriber, #33059) [Link]

> In casino it is trivial to win if you have unlimited supply of money.

As I've grown older, I've become very wary of arguments that involve infinities. All sorts of weird things happen. For example we are finite beings, therefore we can only make finite changes. If we make a finite change to an infinity the relative change is infinitesimal, so it looks like nothing has changed. Therefore if there is an infinity out there we can perceive, we can't change it. So to us an infinity is not something immense or huge, it's a constant, unchangeable by us, in fact unchangeable by anything other than another infinity. If we do happen to see an infinity change an infinity what we will see is a constant changing a constant.

At this point I trust you understand my distrust in my understanding of infinities.

The final nail in the coffin in my confidence in my ability to reason about infinities was the proof of the halting problem. It looked so bloody simple when it was presented in my undergraduate course, the proof almost self evident once it was explained. Then 40 years later, I read an article proclaiming the halting problem and Gödel's incompleteness theorem are one and the same thing; mostly. It came as a surprise as I found the incompleteness theorem near inscrutable. Turned out that was because Gödel needed a computer language for his proof. Having none handy invented his own just for the job which was so butt ugly it was unrecognisable - to me anyway. People cleverer than me did spot it and blogged about it. It was a beautiful read. More pertinently, another gem dropped out of the argument the two were the same - and that was they involve an infinity. It's the tamest of infinities - countably infinite, which is a fancy way of saying no upper limit. But you see once you realise the halting problem is really saying you can't prove something will halt if it could take an arbitrarily long time to do so the proof loses most of its magic. In fact, once you put aside the obstacle the "halting theorem shows it's not possible" it isn't too difficult to discover you can prove whether any finite system will halt or not. You may need a lot of resources to do it, but the amount of resources is finite and bounded. Since I haven't seen too many constants changing constants (none whatsoever in fact), I'd say I live in a finite world. That means something I believed in for 40 years doesn't actually apply to me, and it's all because of a misunderstanding created by an infinity.

I guess you are saying that if you have an infinite amount of money, on each subsequent bet you place your losses to date + $1. Eventually you must win a bet, ergo eventually you must win $1. You probably think I am chafing at the infinite amount of money. I am, but there is a hidden second infinity. The only condition under which you are guaranteed to win a bet is if you are prepared to place an infinite number of bets. So now we have two infinities, and for your statement to be true you must always be able to place more bets than the infinite number of bets it takes to guarantee a win.

Clearly I'm no expert on the matter, but I vaguely recall infinities aren't comparable. It's sort of like asking if two constants fight, which one will be the victor.

Lockless algorithms for mere mortals

Posted Jul 29, 2020 13:44 UTC (Wed) by ibukanov (subscriber, #3942) [Link]

My education in physics made me very sceptical about infinities as well. I should have clarified that "unlimited" was used above in a sense like "above an implied relevant limit". That is, the amount sufficient to make the probability of losing money too small to be relevant.

Lockless algorithms for mere mortals

Posted Jul 30, 2020 4:31 UTC (Thu) by nash (subscriber, #50334) [Link]

> In a casino it is trivial to win if you have an unlimited supply of money. Just put X money on the black. If that loses, double the amount and put it on the black again (well, slightly more than double, to account for zero). Eventually you will win and the total win will be X. That is why a guy with a lot of money can cause trouble, as they may bankrupt the casino before they run out of money doubling the bets.

If you have an infinite amount of money, bankrupting a casino is a poor use of your time. And you need an infinite amount of time as well as money.

Wikipedia covers it reasonably well:
https://en.wikipedia.org/wiki/Martingale_(betting_system)

Lockless algorithms for mere mortals

Posted Jul 29, 2020 8:14 UTC (Wed) by joib (subscriber, #8541) [Link]

>> Another way to think about it is that locking provides a sort of Newtonian-physics view of the world

> That under sells it. It's not just Newtonian physics, it's all macroscopic physics.

An explanation I've read somewhere is that sequential consistency is like Newtonian physics. The model is that you have a switch that connects memory to one processor at a time, and only that one processor can interact with memory, and all effects have propagated when you turn the switch to another processor etc. In effect everything propagates instantaneously.

Relaxed memory models are like relativistic physics, in that each processor has its own view of memory (its own light-cone if you wish), effects propagate at finite speed and different processors may see events in different order.

I don't see the necessity to bring in quantum physics here. Certainly not the pop-sci "ooh, this is weird, thus it must have something to do with quantum mumble-mumble something". And yes, I'm a (former) physicist.

Lockless algorithms for mere mortals

Posted Jul 29, 2020 8:24 UTC (Wed) by Wol (subscriber, #4433) [Link]

> [0] That is an exaggeration. We have laws of conservation - energy (and by implication mass) is conserved (if you ignore dark energy), charge is conserved, spin is conserved, and probably some others things.

Well, actually, energy doesn't (or so it is believed) exist in net terms, in the sense that the sum of energy in the Universe is zero. It is a signed quantity, so gravitational potential energy is equal in magnitude but opposite in sign to mass energy. That's how black holes evaporate as the Universe gets colder ...

As for dark energy, this is merely my theory, but I believe it doesn't exist ... Just as we have Boyle's Ideal Gas Law, I think we have mucked up the formula for gravity. I think it's assumed to be F = G.m1.m2/d^2 (I've forgotten my Physics :-). But imho this is probably the IDEAL gravitational equation, and as we move into more open space gravitons coalesce into a "superfluid" and d^2 tends to d. This would explain the fact that gravity appears to get stronger the emptier space gets. In other words, gravitons behave exactly like gas molecules: as they get colder they move from the gaseous phase towards the solid phase.

Cheers,
Wol

Lockless algorithms for mere mortals

Posted Jul 29, 2020 13:41 UTC (Wed) by Paf (subscriber, #91811) [Link]

Wol,

Lots of versions of this have been formulated and tested against astronomical data; look up “modified Newtonian dynamics” for what I think is the most serious and current version. The conclusion of most physicists is that while this sort of thing is a compelling idea, it doesn’t match observations (and probably can’t ever). There are still a few modified gravity models out there clinging on with serious debate - not just cranks - over whether they match reality, I think generally under that modified Newtonian dynamics label.

Lockless algorithms for mere mortals

Posted Jul 29, 2020 14:01 UTC (Wed) by ibukanov (subscriber, #3942) [Link]

The problem is that dark matter is a natural consequence of using Newtonian mechanics with minimal relativistic corrections to model galaxies, as we cannot practically calculate the evolution of galaxies using General Relativity. Yet we do not have a proof that what is done is a valid approximation. Moreover, there are exact solutions of General Relativity that show that Newtonian approximations on a galaxy scale can be very wrong. So maybe what is attributed to dark matter (and even dark energy!) is just a calculation artifact.

Lockless algorithms for mere mortals

Posted Jul 29, 2020 22:29 UTC (Wed) by Paf (subscriber, #91811) [Link]

I’m pretty sure we have a pretty good idea of the scale of error the corrections introduce, probably a pretty tightly defined one, in fact. The results of our modeling suggest that some huge % of the matter and energy in the universe is dark. I think if that were in the realm of uncertainty introduced by the corrections the issue would get serious airtime in the mainstream physics community. That it does not seems indicative.

Lockless algorithms for mere mortals

Posted Jul 29, 2020 23:12 UTC (Wed) by ibukanov (subscriber, #3942) [Link]

We really do not know if the current computational methods for a galaxy, based on Newtonian mechanics with corrections to account for the finite speed of light and gravity, are valid approximations. Mathematically we cannot prove that such an approximation is sound due to the extreme non-linearity of the equations of General Relativity. Nor can we numerically compare results, since modeling a galaxy with General Relativity is not yet feasible.

And again, for extremely simplified toy models where General Relativity is solvable, the results indicate that the Newtonian approximation may not be valid and that General Relativity alone can explain the rotation curves of a galaxy without any dark matter. Now those toy models are too naive, but they emphasize that we cannot rule out that dark matter is an artifact of an unsound approximation.

Lockless algorithms for mere mortals

Posted Jul 30, 2020 4:06 UTC (Thu) by alison (subscriber, #63752) [Link]

ibukanov correctly refers to "Newtonian mechanics with minimal relativistic corrections". Problems with the correct order of events in physics have little to do with quantum mechanics and everything to do with special relativity. While quantum mechanics has its own paradoxes, special relativity brings the "relativity of simultaneity": https://en.wikipedia.org/wiki/Relativity_of_simultaneity

Lockless algorithms for mere mortals

Posted Jul 29, 2020 22:57 UTC (Wed) by Wol (subscriber, #4433) [Link]

Looking at the wikipedia page, it looks like my idea is roughly in line with the ideas that seem closest to reality.

Just like when you add velocities you have to add some relativistic correction as they get higher so that c+c=c, my theory is adding a correction as mass density tends to zero.

And surely it's not too outlandish to assume that gravity - which we now know is transmitted by gravitons - will be seriously affected if gravitons form a Bose-Einstein condensate in intergalactic space?

If dark matter doesn't exist, then we need some way of strengthening gravity as mass-density drops, and assuming gravitons condense and transmit it more efficiently just seems to fit with everything else we know about the Universe. We just need to get the maths right :-)

Cheers,
Wol

Lockless algorithms for mere mortals

Posted Jul 30, 2020 15:12 UTC (Thu) by joib (subscriber, #8541) [Link]

> gravity - which we now know is transmitted by gravitons

We do?

I mean, sure, quantum gravity is an active research topic, but AFAIK there's been no breakthrough recently. Did I miss something major?

Lockless algorithms for mere mortals

Posted Jul 30, 2020 18:17 UTC (Thu) by Wol (subscriber, #4433) [Link]

Theory says gravity is transmitted by gravitons, which are the particle equivalent of a gravitational wave (think photons/em-radiation).

So having detected gravitational waves, we know (assuming there isn't a major screw-up in the theory) that gravitons must exist.

Of course, just as with electromagnetics, I think it's the same problem of being able to know where it is and what it is, just not both at the same time :-)

Cheers,
Wol

Lockless algorithms for mere mortals

Posted Jul 30, 2020 20:50 UTC (Thu) by joib (subscriber, #8541) [Link]

No, this is putting the cart before the horse.

Gravitational waves, as e.g. detected by LIGO, are predicted by general relativity. No need for quantum gravity. And without quantisation, no graviton either.

That's not to say quantum gravity isn't a thing. But so far we have neither observations suggesting it nor any convincing theory for it.

Lockless algorithms for mere mortals

Posted Jul 31, 2020 8:47 UTC (Fri) by Wol (subscriber, #4433) [Link]

Are you saying that gravity is the only known wave without an associated particle?

A photon is an electro-magnetic wave. An electron is a wave (not sure what in :-). Symmetry says surely a gravitational wave has a graviton?

Cheers,
Wol

Lockless algorithms for mere mortals

Posted Jul 31, 2020 11:01 UTC (Fri) by joib (subscriber, #8541) [Link]

I'm saying we don't know. The standard model in particle physics encompasses every known interaction, except gravity. We don't have a functioning theory of quantum gravity. That doesn't mean that gravity isn't quantised, but it could alternatively mean that gravity is fundamentally different from the other interactions (say, describing instead the curvature of space-time like in GR). We just don't know yet.

Gravitational waves neither prove nor disprove that gravity is quantised, just as a plethora of electromagnetic wave phenomena neither prove nor disprove that the electromagnetic field is quantised.

And to be pedantic, a photon is a quantised excitation in the electromagnetic field (not the field itself). An electron is a quantised excitation in the electron-positron field. Sure, it would be nice and symmetrical if there were analogously a "graviton". Unfortunately nature doesn't much care for our notions of beauty and symmetry, and so far has resisted our efforts to crack this particular nut.

Lockless algorithms for mere mortals

Posted Sep 7, 2020 12:38 UTC (Mon) by nix (subscriber, #2304) [Link]

Quite. Another way of putting it is that if gravitons exist, gravity is transmitted by *virtual* gravitons. Virtual particles are not real, thus cannot form a Bose-Einstein condensate. (And if real gravitons exist, which is debatable, they are going to be massless, only appear when masses *move* (just like photons in the electromagnetic field only appear when charged particles move), and incredibly low-energy: the best upper bound on the graviton's Compton wavelength is over a light year! Using a massless ultra-low-energy particle to explain away the majority of the mass-energy density in the universe seems to me to be putting the cart so far before the horse that it's in a different solar system. And while a BEC of gravitons is theoretically possible -- a BEC of photons has been produced, after all -- it seems massively unlikely to ever happen in the real universe.)

Also, since both move at lightspeed, it makes about as much sense to say that gravitons can "get colder" as it does to say that light can "get colder".

Lockless algorithms for mere mortals

Posted Sep 7, 2020 16:46 UTC (Mon) by Wol (subscriber, #4433) [Link]

> Using a massless ultra-low-energy particle to explain away the majority of the mass-energy density in the universe seems to me to be putting the cart so far before the horse that it's in a different solar system.

Except that's not what I'm doing. Sorry if I've mangled my physics, but ?Boyle's Ideal Gas Law only applies to a hot gas where the atoms/molecules bounce off each other. As the gas cools, the Gas Law changes.

I'm saying that, just like with velocities where at slow speeds v1+v2 gives us the obvious answer but c+c gives us the extremely unintuitive answer of c, might not the gravitational law ALSO change with mass-energy density, so that gravity is stronger in regions of low mass density like interstellar and intergalactic space? After all, isn't that one possible explanation for what we observe?

Or, given that somebody said that gravitational waves travel at light speed, but light speed is not constant (c is a theoretical maximum), maybe the waves travel faster in low mass density?

Cheers,
Wol

Lockless algorithms for mere mortals

Posted Sep 7, 2020 19:19 UTC (Mon) by nix (subscriber, #2304) [Link]

> might not the gravitational law ALSO change with mass-energy density, so that gravity is stronger in regions of low mass density like interstellar and intergalactic space?

This is the basis of Mordehai Milgrom's MOND (and a number of other similar things, some of which, unlike MOND, are modifications of relativity rather than Newton). They're not widely accepted because while they get galaxies' rotational velocities right (it would be surprising if they didn't), unfortunately, if you attempt to simulate the structure and evolution of the large-scale universe using MOND or similar theories, the results look very very different from what we observe. The people involved in MOND are major figures (not least Jakob Bekenstein, who nobody could claim doesn't know his stuff where gravity is concerned), but MOND remains... problematic.

(But it has nothing to do with Boyle's law or with slowing of anything: MOND isn't even relativistic, let alone quantized, and doesn't mention gravitons at all. Its adjustments to Newton are ad-hoc to make the rotation curves of galaxies come out right.)

> Or, given that somebody said that gravitational waves travel at light speed, but light speed is not constant (c is a theoretical maximum), maybe the waves travel faster in low mass density?

Massless particles (including photons) travel at c, always: don't think of it as the speed of light, think of it as the speed of propagation of cause and effect, and massless particles max this out. In a medium, the apparent speed of light waves in particular (but no other massless particles) is reduced by coupling to the electromagnetic fields of charged particles in the medium (largely electrons): but this is a group effect on the wave as a whole, and the photons that make up the light are still moving at c!

I suppose the same class of effect in theory could apply to changes in gravity, since gravitational waves should I believe couple to all masses they pass in the same way and slow in regions with higher mass density, but gravity is such a weak force that the effect would be incredibly tiny: it would almost certainly be as unobservable as the graviton itself even if you were hanging out next to a black hole. (So far, nobody has thought up a way to produce a graviton detector which isn't so massive it collapses into a black hole. Gravity is *ridiculously* weak.)

Lockless algorithms for mere mortals

Posted Sep 7, 2020 20:07 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link]

> if you attempt to simulate the structure and evolution of the large-scale universe using MOND or similar theories, the results look very very different from what we observe
This is actually questionable.

Moreover, it has not been conclusively proven that galactic rotation curves can't be explained by general relativity alone. Exact solutions of the Einstein field equations are too simplistic for that, and computer simulation is way too complicated for something like a galaxy. There are people working on this, but it is a very un-glamorous area of research.

Lockless algorithms for mere mortals

Posted Sep 7, 2020 12:27 UTC (Mon) by nix (subscriber, #2304) [Link]

> That's how black holes evaporate as the Universe gets colder ...
For the record, this is completely wrong. Black hole evaporation is a consequence of the differing appearance of the weave of spacetime to observers very close to, versus far from, the event horizon: a local version of the Unruh effect, as it were (or, rather, the Unruh effect is the same thing as Hawking evaporation applied to accelerating observers rather than observers in a gravitational field). It depends only on the mass of the hole and its event horizon radius. It happens even when the universe is hot: it happens even when the hole is still in the middle of its exploding progenitor star. It's just that until the universe is very old and cold (or the hole is exceptionally small, thus with a high Hawking temperature), the hole will gain more mass through absorbing intercepted microwave background radiation than it loses through Hawking radiation. But it's still losing mass through this effect all the time, nonetheless.

The process has nothing to do with the sign of gravitational potential energy at all.

(Yes, the Unruh effect is named after the same Bill Unruh who used to hang out on uk.comp.os.linux.)

Lockless algorithms for mere mortals

Posted Jul 29, 2020 11:11 UTC (Wed) by excors (subscriber, #95769) [Link]

> If you want to dive deeper into the rabbit hole, the most tantalising part is energy is some sense equivalent to information. So when we say energy is always conserved, we are in effect saying information is never created or destroyed.

Some people have argued that will become a practical issue for computers. The standard AND/OR/XOR/etc gates destroy information by converting 2 bits of input to 1 bit of output, but information is entropy and entropy can't decrease in a closed system, so in a physical implementation the decrease in logical information has to be balanced out by emitting heat. That puts an ultimate limit on the power efficiency of a non-reversible computer. To beat that limit, it's necessary - and theoretically possible, though far from straightforward - to design computers with reversible logic gates instead.

(https://spectrum.ieee.org/computing/hardware/the-future-o...)

Lockless algorithms for mere mortals

Posted Jul 29, 2020 18:27 UTC (Wed) by NYKevin (subscriber, #129325) [Link]

It's unclear to me how you would implement free(3) with a reversible computer using a finite amount of RAM. If you want to reuse the memory for something else, won't it have to be erased at some point? Or is the whole idea to minimize erasure, so that you only erase things that actually need to be erased?

Lockless algorithms for mere mortals

Posted Jul 29, 2020 19:17 UTC (Wed) by excors (subscriber, #95769) [Link]

Yeah, that's why it's not straightforward :-) . As I understand it (which is admittedly not much), every reversible program with N bits of input will produce N bits of output. Typically the input would consist of some useful data, plus some bits that are required to have certain values for the algorithm to work correctly. The output would consist of some useful data, plus some 'garbage' that is only there to ensure reversibility.

You could extract the useful data and discard the garbage; but then you're destroying information and creating heat, so it's only helpful if you have an efficient algorithm that generates very little garbage, and is not generally worthwhile. Apparently the trick is to copy the useful output, then run the program in reverse to convert the full output (including garbage) back into the input (including constants), so now you have the input and output and constants and no garbage. If you then want to discard the input, I think the argument is that that's okay because the cost is bounded by the size of the input, not by the amount of computation.

You're probably not going to run Linux on a computer like that, you'll have to rethink everything from the ground up.

Lockless algorithms for mere mortals

Posted Jul 29, 2020 23:47 UTC (Wed) by ras (subscriber, #33059) [Link]

> If you want to reuse the memory for something else, won't it have to be erased at some point? Or is the whole idea to minimize erasure, so that you only erase things that actually need to be erased?

As it turns out, this has been explored, at least partially. Theoretically the destruction of information takes energy, and heat has always been a problem in computers. (I don't understand the mechanism. Perhaps because information can't actually be destroyed, it's turned into heat, which then has to be carted away.) In any case, one solution that was proposed was to make a memory cell using two capacitors rather than one. One is always charged, the other discharged, and changing the value is just a transfer of charge from one to the other. I don't know that it ever made it off the back of a napkin, mostly because the power lost by discharging the DRAM capacitors is the least of the power problems.

The more general solution has already been deployed, not by physicists or hardware engineers, but by us - the software people. It's log structured storage. From the perspective of the question you posed, the answer we developed is "don't do free()'s". Interestingly, as this conversation is about the connection between time and information, we use a technique that does not destroy information because there are places we have to roll back time in order to keep our data structures consistent.

Lockless algorithms for mere mortals

Posted Oct 13, 2020 15:28 UTC (Tue) by immibis (subscriber, #105511) [Link]

I'll try to give an intuitive explanation for why erasing data must cost energy. Premise: we know that the universe is reversible. If state A goes to state B, then state reverse(B) must go to state reverse(A).

Therefore you can't have states A and B which both go to C, because a universe in state reverse(C) wouldn't know whether to go to reverse(A) or reverse(B). Information cannot be deleted.

So how do computers delete information, then? Well, the universe is full of states we don't care about, so we just cheat and move the information to one of those. For example, writing 1 releases energy from a different point in the chip than writing 0, which causes a different vibration in the silicon lattice, which transfers to the plastic packaging, which causes an air molecule to bounce off the chip differently.

That means if there are 1 zillion different ways the air molecules would've bounced (if the chip was turned off), now there are 2 zillion, because all the bounces after 0 is written are different from all the bounces after 1 is written. Obviously air can bounce a lot of ways - but the problem is that all of the 1-zillion input states map to 1-zillion output states already. There's no way to get more output states than input states without giving the molecule more energy - which allows it to be in states it couldn't be in with its previous amount of energy. Therefore the chip must have transferred a bit of energy to the air molecule. There is no other way it could work.

All of this applies *on average*, by the way.

Or here's a different explanation: If you consider a computer that's known to be in 1 of 1000 states, and the air in the room that's in 1 of 1 zillion states, there are 1000 zillion states in total (the cartesian product). If the computer is in 1 of 500 states in the next timestep, there still must be 1000 zillion states for the computer+air system, because the universe is reversible. Therefore the air must be in 1 of 2 zillion states. This doesn't apply to reversible computers, because reversible computers *don't* decrease their number of states in any time step.

Lockless algorithms for mere mortals

Posted Jul 31, 2020 17:14 UTC (Fri) by PaulMcKenney (subscriber, #9624) [Link]

The complexity of memory ordering can be fully explained with normal Newtonian physics. For example, you can make good analogies between memory misordering and misordering of messages carried by sound waves.

Simple propagation delay is more than enough to make all these apparent reorderings happen.

But it is of course often more fun to think in terms of more trendy views of physics. ;-)

Lockless algorithms for mere mortals

Posted Aug 1, 2020 2:09 UTC (Sat) by ras (subscriber, #33059) [Link]

Defending myself, my point wasn't that you couldn't explain what happens in terms of Newtonian physics. It is that personally I find it a pretty unsatisfying explanation. In Newtonian physics time and information are a given, and causality is an assumption - usually just an unstated assumption. If you are going to rely on unstated beliefs to derive something, you may as well say God did it. In fact God is better in some ways - it at least makes the limits of your understanding plain.

But Newtonian physics is an emergent property of other sets of rules. In those rules time, information, and causality are fundamental building blocks. Information in particular goes pretty deep - for example, as others have stated here, the definition of thermodynamic entropy is actually not about all possible arrangements of particles, because there is no way to tell if you have swapped two identical particles, so such rearrangements can't be counted. How on earth does the Universe even know I can't distinguish between them - it's not as if it's looking through my eyes. But it does know, because if I don't discount the identical-looking states when I calculate the energy I can extract from a thermodynamic system, I get the wrong answer. It goes even deeper in quantum mechanics. The different outcomes when you do and don't know something become positively bizarre - plainly obvious interference patterns disappear just because I know something. [0]

It was partly observations like these, utterly inexplicable by Newtonian physics, that led to the development of quantum mechanics and, at the other end of the scale, relativity. If you are going to start making analogies between computer programmers' struggles with what can and can't be known and the order in which things must happen (aka causality), and what physics has to say on the matter, you are far better off starting with physics theories that feature those things front and centre, rather than one that ignores them completely.

[0] To this day I've yet to find an answer as to why these bizarre quantum mechanical effects happen, so this new physics is not _that_ satisfying. Instead, "all" the physicists have done is build mathematical models of our universe that assume it happens, and those models are insanely accurate. So there is no doubt you must take the relationship between time, information, and causality into account. But the mechanism driving that relationship appears to be a total mystery. At least that's my understanding, but I am just a fascinated bystander.

Lockless algorithms for mere mortals

Posted Aug 2, 2020 15:35 UTC (Sun) by PaulMcKenney (subscriber, #9624) [Link]

No two ways about it, analogies with relativistic and quantum physics are way more cool than analogies against Newtonian physics. ;-)

Lockless algorithms for mere mortals

Posted Mar 15, 2021 16:01 UTC (Mon) by alison (subscriber, #63752) [Link]

> It's not just Newtonian physics, it's all macroscopic physics. In relativistic space-time, about the only thing preserved is causation [0].

Sadly, Bell's Inequality shows that the combination of quantum mechanics and relativity spells trouble for causality, too. See, for example,

https://phys.org/news/2017-07-probability-quantum-world-l...

"By performing an essentially loophole-free Bell test, they have shown that two atoms separated by a distance of a quarter of a mile share correlations that should be impossible under the hypothesis of local realism, and are most likely explained by quantum entanglement."

"Local realism" here means that even Reality has only "eventual consistency." Senior, respected, widely published physicists believe this and have done so for decades. Quantum entanglement is harder to reason about then memory ordering and consistency. Einstein's famous comment that "God does not play dice" was in response to discussions of this topic.

Lockless algorithms for mere mortals

Posted Jul 29, 2020 15:12 UTC (Wed) by pbonzini (subscriber, #60935) [Link]

Lockless algorithms are definitely something that needs practice to understand, but it's possible to learn it. I would say that 99% of lock-free uses are covered by just five cases:

  • "message passing", where one side uses smp_wmb() or smp_store_release() and the other uses smp_rmb() or smp_load_acquire(). This is also the basis of RCU.

  • "cmpxchg for the fast path with fall back to locking", which is abstracted by functions such as refcount_dec_and_mutex_lock

  • statistics that you simply update or read outside a lock (ironically this is not as easy as it seems, and it is rarely thought through because usually the stats need not be perfectly accurate; but do recognize those cases and just use a lock!)

  • "Dekker synchronization", also known as "store-B/load-A vs. store-A/load-B", which is a bit more complicated (and explained in Documentation/virt/kvm/vcpu-requests.rst) but quite useful.

  • Multiple-producer/single-consumer stacks, which are abstracted by linux/llist.h functions

... and I included linked lists only because they are mentioned in the article and they are already nicely abstracted within Linux, but usually you do not even need to use the full llist API (for example llist_del_first is rarely needed). The first four are those you want to learn inside-out; once you have done that, you can say you already have a good understanding of *practical* lockless algorithms.
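To make the first ("message passing") pattern in the list above concrete, here is a minimal userspace sketch using the C11 equivalents of smp_store_release()/smp_load_acquire(); all names and values here are invented for illustration, not taken from any existing code:

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    static int payload;            /* plain data being handed over */
    static atomic_int ready;       /* flag that publishes the data */

    static void *producer(void *arg)
    {
        payload = 42;                                            /* 1: write the data  */
        atomic_store_explicit(&ready, 1, memory_order_release);  /* 2: then publish it */
        return NULL;
    }

    static void *consumer(void *arg)
    {
        while (!atomic_load_explicit(&ready, memory_order_acquire))
            ;                        /* 3: spin until the flag is seen */
        printf("%d\n", payload);     /* 4: guaranteed to print 42      */
        return NULL;
    }

    int main(void)
    {
        pthread_t p, c;

        pthread_create(&c, NULL, consumer, NULL);
        pthread_create(&p, NULL, producer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return 0;
    }

The release store cannot become visible before the write to payload, and the acquire load keeps the read of payload from being hoisted above it - the same ordering the list-insertion example in the article relies on.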

An important thing to notice is that "acquire" and "release" are not black magic and the same concept of acquire/release semantics can be applied to higher-level synchronization primitives as well. For example, here are two cases of lockless access that people will do without even thinking of them as lockless:

  • "pthread_create" has release semantics and synchronizes with the start of the new thread; this means that anything written before a thread is created can be accessed from the thread even without a lock

  • "pthread_join" has acquire semantics and synchronizes with the exiting of the thread; these means that anything written by the thread before it exits can be accessed after pthread_join even without a lock

It's quite an eye-opener to realize that these cases work in exactly the same way as those "sophisticated" atomic access primitives.
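As a minimal sketch of those two cases (nothing here beyond plain POSIX, and the variable names are made up): data written before pthread_create() is visible inside the new thread, and data the thread writes before exiting is visible after pthread_join(), all without explicit atomics or locks.

    #include <pthread.h>
    #include <stdio.h>

    static int in;    /* written before the thread starts  */
    static int out;   /* written by the thread before exit */

    static void *worker(void *arg)
    {
        out = in + 1;              /* sees in == 10 without any lock */
        return NULL;
    }

    int main(void)
    {
        pthread_t t;

        in = 10;                   /* "release" side of pthread_create() */
        pthread_create(&t, NULL, worker, NULL);
        pthread_join(t, NULL);     /* "acquire" side of pthread_join()   */
        printf("%d\n", out);       /* guaranteed to print 11             */
        return 0;
    }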

Another thing to know is to steer clear of the C11 "sequentially consistent" loads and stores: they are incredibly easy to get wrong despite the reassuring name. In this respect the LKMM is easier to use, but I have learned the hard way to confine myself to the above five patterns.

Lockless algorithms for mere mortals

Posted Jul 29, 2020 21:45 UTC (Wed) by itsmycpu (subscriber, #139639) [Link]

If pthread create and join count, then semaphore post and wait count as well. :)

Tools like "thread sanitizer" are a considerable help in finding thread-safety 'leaks', although sometimes it takes a while, and I wouldn't rely on it finding everything.

In my experience, using the C/C++ "relaxed" is rarely worth the extra headache; a big time saver is to just (almost) always use acquire/release (and acqrel for RMWs).

Lockless algorithms for mere mortals

Posted Jul 29, 2020 21:58 UTC (Wed) by pbonzini (subscriber, #60935) [Link]

Semaphore post and wait certainly count!

Lockless algorithms for mere mortals

Posted Jul 30, 2020 10:07 UTC (Thu) by farnz (subscriber, #17727) [Link]

It also depends which CPUs you're working with; on x86 and zArch, acquire and release are the semantics of all loads and stores, and only acqrel and seqcst require anything special. Thus, on x86, going weaker than acquire and release doesn't gain you much, if any performance, as the only effect is to allow a little more compiler optimisation than would otherwise be possible.

In contrast, on ARM or POWER, only relaxed uses "normal" load and store operations; acquire and release operations need instructions to tell the CPU to put the barrier in place. You can thus gain a significant increase in performance by carefully using relaxed instead of acquire/release, because not only do you enable a tiny amount of compiler optimization, but you also tell the CPU that it can continue with normal memory ordering.
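A tiny illustration of that difference (hypothetical function names, C11 atomics): the two loads below differ only in the ordering they request. On x86 both typically compile to a plain MOV, while on AArch64 the acquire load is typically an LDAR and the relaxed one a plain LDR, which is where the performance gap on weakly ordered CPUs comes from.

    #include <stdatomic.h>

    int load_acquire(atomic_int *p)
    {
        return atomic_load_explicit(p, memory_order_acquire);
    }

    int load_relaxed(atomic_int *p)
    {
        return atomic_load_explicit(p, memory_order_relaxed);
    }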

Lockless algorithms for mere mortals

Posted Jul 30, 2020 13:50 UTC (Thu) by pbonzini (subscriber, #60935) [Link]

There's actually an "even weaker than relaxed" mode for RMW operations, for example

    atomic_add_relaxed(&p, 3);

will actually compile to a locked-add on x86. But if you know you are the only writer (and for several of the idioms I listed above, you are) you can write this in a much more efficient way:

    WRITE_ONCE(p, p + 3);

(You don't even need READ_ONCE, because no data race is possible). And this is something I highly recommend because, in addition to being much more efficient, it documents the fact that you're the only writer and therefore (unlike relaxed atomics) it lowers the cognitive load of whoever reads the code.

Relaxed atomics can still be useful for statistics, though if you can afford the space and the reading cost of percpu values you probably should use them instead.

Lockless algorithms for mere mortals

Posted Jul 30, 2020 14:36 UTC (Thu) by itsmycpu (subscriber, #139639) [Link]

Regardless of the benefit, when does a variable need to be atomic but not need acquire/release? An example sometimes mentioned is a counter, but the compiler and/or the CPU/memory system may reorder things such that the thing being counted hasn't even happened yet, so to speak, or more things happen before the increment than you want. So you'd need to make sure even that is still a valid and useful operation. I think it makes things an order of magnitude more tricky, and they are tricky enough already.

I guess I am saying that when I use atomic variables, it's a situation where I don't want "normal memory ordering", which usually means "unknown memory ordering" from the point of view of another thread. When I started with atomic variables, I saw "relaxed" as the optimum; however, I repeatedly found myself replacing code that used "relaxed" because it was more complicated than necessary. Somehow the simpler solutions tended not to use "relaxed" in the first place.

So I would hope that the performance characteristics of a platform support these simpler solutions.

By the way, is there a way to mark a sequence of two instructions such that preemption/interrupts will postpone a few cycles until the second one is executed as well? :)

Lockless algorithms for mere mortals

Posted Jul 30, 2020 14:43 UTC (Thu) by itsmycpu (subscriber, #139639) [Link]

This was meant as a response to the previous comment about x86+zArch vs ARM+POWER CPUs...

Lockless algorithms for mere mortals

Posted Jul 30, 2020 15:39 UTC (Thu) by excors (subscriber, #95769) [Link]

> Regardless of the benefit, when does a variable need to be atomic, but does not need acquire/release ?

The ordering constraints could happen at a higher level than the atomic accesses. E.g. you have a lot of threads atomically incrementing lots of counters lots of times, and then you pthread_join() to wait for them all to finish before printing the values of the counters. The pthread_join() ensures correct ordering between the increments and reads. But it doesn't matter what order the increments happen in, so there's no need to prevent the compiler or CPU from reordering them relative to each other (or coalescing them into fewer bigger additions, etc), and you can save the overhead of acquire/release semantics on every atomic instruction.
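A minimal sketch of that scenario (all names invented): the increments are relaxed because pthread_join() already orders every increment before the final read.

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    #define NTHREADS 4
    #define NITERS   100000

    static atomic_long counter;

    static void *bump(void *arg)
    {
        for (int i = 0; i < NITERS; i++)
            atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NTHREADS];

        for (int i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, bump, NULL);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);   /* orders all increments before the read below */

        printf("%ld\n", atomic_load_explicit(&counter, memory_order_relaxed));
        return 0;
    }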

> By the way, is there a way to mark a sequence of two instructions such that preemption/interrupts will postpone a few cycles until the second one is executed as well? :)

I think all CPUs let you mask interrupts, to run arbitrary sequences atomically on single-core chips, but it won't be atomic on multi-core. Some (like ARM) instead let you execute a sequence of instructions, and then check afterwards whether it managed to run atomically or not (e.g. if there was an interrupt, or another core stole the cache line you were using) so you can retry until it succeeds.

Lockless algorithms for mere mortals

Posted Jul 30, 2020 16:27 UTC (Thu) by itsmycpu (subscriber, #139639) [Link]

>> The ordering constraints could happen at a higher level than the atomic accesses. E.g. you have a lot of threads atomically incrementing lots of counters lots of times, and then you pthread_join() to wait for them all to finish before printing the values of the counters.

Let's take that example. You don't need acquire/release, however you have a lot of RMWs locking and contending for the same shared cachelines or memory locations.

Define the counters in a struct, and let each thread accumulate its counts in a thread-local instance of that struct. Now it is a simple ADD instruction on local memory. At some point in time, at the end of each thread or perhaps every 5 minutes, they add their values to a global shared struct (using acqrel if they like, since it doesn't matter performance-wise, so coordination at a higher level isn't necessary.)
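A rough sketch of that idea (all names hypothetical): each thread counts in plain thread-local memory and only occasionally folds its total into the shared counter, so the shared cache line is touched rarely.

    #include <stdatomic.h>

    static atomic_long global_count;
    static _Thread_local long local_count;

    static void count_event(void)
    {
        /* plain add on thread-local memory: no atomics, no shared cache line */
        if (++local_count == 1024) {    /* flush with some chosen granularity */
            atomic_fetch_add_explicit(&global_count, local_count,
                                      memory_order_acq_rel);
            local_count = 0;
        }
    }

    static void count_thread_exit(void)
    {
        /* called at thread exit (or periodically) to add the remainder */
        atomic_fetch_add_explicit(&global_count, local_count,
                                  memory_order_acq_rel);
        local_count = 0;
    }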

>> I think all CPUs let you mask interrupts, to run arbitrary sequences atomically on single-core chips, but it won't be atomic on multi-core.

Yes, it would be nice if a userspace process could do this just for the duration of two instructions, since some algorithms would require other threads to wait if they get interrupted exactly between two instructions. It doesn't need to be atomic. For example lockless FIFO lists could then be built and used without need for list reversal. Another thread would just have to re-read once or twice if the "next" pointer contains a "please-wait" value (like (void*)1).

Currently the above mentioned lockless linux/llist.h appears to require list reversal if it is to be used in a FIFO fashion.

Lockless algorithms for mere mortals

Posted Jul 30, 2020 19:48 UTC (Thu) by pbonzini (subscriber, #60935) [Link]

> Define the counters in a struct, and let each thread accumulate its counts in a thread-local instance of that struct. Now it is a simple ADD instruction on local memory.

It still needs to be an atomic access, otherwise you have a data race with the readers. In that case you can use relaxed loads and stores (while on x86 atomic_add_relaxed would not buy you anything in terms of performance).

Relaxed loads and stores are also useful when using seqlocks (which probably should be in my list too!), and when using smp_mb() as is the case for Dekker synchronization.

I do tend to avoid relaxed RMW operations.
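Since seqlocks came up: here is a minimal single-writer seqlock sketch in C11 atomics, following the usual fence-based formulation (all names invented). The data accesses themselves are relaxed; the ordering is supplied by the sequence counter and the fences.

    #include <stdatomic.h>

    static atomic_uint seq;              /* even: stable, odd: write in progress */
    static atomic_int data_a, data_b;

    static void write_pair(int a, int b)        /* single writer */
    {
        unsigned s = atomic_load_explicit(&seq, memory_order_relaxed);

        atomic_store_explicit(&seq, s + 1, memory_order_relaxed);   /* now odd */
        atomic_thread_fence(memory_order_release);
        atomic_store_explicit(&data_a, a, memory_order_relaxed);
        atomic_store_explicit(&data_b, b, memory_order_relaxed);
        atomic_store_explicit(&seq, s + 2, memory_order_release);   /* even again */
    }

    static void read_pair(int *a, int *b)       /* any number of readers */
    {
        unsigned s1, s2;

        do {
            s1 = atomic_load_explicit(&seq, memory_order_acquire);
            *a = atomic_load_explicit(&data_a, memory_order_relaxed);
            *b = atomic_load_explicit(&data_b, memory_order_relaxed);
            atomic_thread_fence(memory_order_acquire);
            s2 = atomic_load_explicit(&seq, memory_order_relaxed);
        } while ((s1 & 1) || s1 != s2);          /* retry if a write intervened */
    }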

Lockless algorithms for mere mortals

Posted Jul 31, 2020 1:43 UTC (Fri) by itsmycpu (subscriber, #139639) [Link]

> It still needs to be an atomic access, otherwise you have a data race with the readers. In that case you can use relaxed loads and stores (while on x86 atomic_add_relaxed would not buy you anything in terms of performance).

You could have readers of thread-local data if you want to, in which case you are correct, that would require atomic stores. However the example as given does not require this, as it was said that collecting the values at the end of the thread is sufficient. So the thread itself can write the accumulated values to the shared global memory location. Or, for example, the thread writes to shared global memory each time its local counter reaches a multiple of 1,024. So the global counters are always updated with that granularity. And at the end of each thread (or some other, perhaps regular event), each thread adds the remainder.

So then, plain local non-atomic operations are sufficient most of the time, and global write access becomes so rare that its method is not performance relevant anymore, and can be acqrel RMW for a more defined behavior.

Nevertheless I agree that there are cases where "relaxed" is useful. I was also thinking of seqLocks, however I am not using them. I often use multiple copies of data (in various ways simpler than a RCU system). Like double-buffered framebuffers in graphics. One thread can locally work with one set of data, while the other thread works on another copy.

I am not familiar with the advantages of using smp_mb() in an algorithm (or Dekker synchronization). I would guess it is for cases where I would use an acqrel RMW. In the case of Linux kernel facilities, it is for me difficult to tell which ones are historically grown, and which ones might offer advantages compared to non-barrier C/C++ atomics.

Lockless algorithms for mere mortals

Posted Jul 31, 2020 18:01 UTC (Fri) by PaulMcKenney (subscriber, #9624) [Link]

We recently had to defend memory_order_relaxed in the committee. Here is Hans Boehm's and my A Relaxed Guide to memory_order_relaxed [PDF] which was the basis of that defense.

The “Dekker synchronization” refers to the core of this locking algorithm, which as Paolo said is set up so that if two threads each do a store to one variable followed by a load from the other, at least one of the threads sees the other's store. This is used heavily in the Linux kernel, for example, to correctly resolve wait/wakeup races.
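In C11 terms, a minimal sketch of that store-buffering pattern might look like the following (invented names, with a seq_cst fence standing in for the kernel's smp_mb()): if the two functions run in two threads, at least one of them is guaranteed to return 1, so a waker and a sleeper cannot both miss each other.

    #include <stdatomic.h>

    static atomic_int x, y;

    int thread0(void)
    {
        atomic_store_explicit(&x, 1, memory_order_relaxed);
        atomic_thread_fence(memory_order_seq_cst);   /* kernel: smp_mb() */
        return atomic_load_explicit(&y, memory_order_relaxed);
    }

    int thread1(void)
    {
        atomic_store_explicit(&y, 1, memory_order_relaxed);
        atomic_thread_fence(memory_order_seq_cst);   /* kernel: smp_mb() */
        return atomic_load_explicit(&x, memory_order_relaxed);
    }

    /* The full fences forbid the outcome where both functions return 0. */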

Lockless algorithms for mere mortals

Posted Jul 31, 2020 22:09 UTC (Fri) by itsmycpu (subscriber, #139639) [Link]

The PDF is an interesting read (though it will take me a few days to go through it), thank you for the reference. Also Dekker synchronization so far escaped my attention, at least by name, I will be looking into it.

My first impression regarding the described use cases of memory_order_relaxed is that it is worse than I thought. Not because I would doubt that there are algorithms that make good use of it (like seqLock), but because it seems really difficult to navigate the potential problems in most cases.

For example the use in reference counting increments. Even if decrements are non-relaxed, it isn't immediately obvious to me that increments can be relaxed. Although I am well aware that in the handover of a reference from thread A to thread B there needs to be a moment where an original reference is still in place, and that the atomic value itself is sequential for RMW operations like increment and decrement (which are already expensive enough), I would not be sure that thread B necessarily needs to execute the necessary barriers after incrementing the reference relaxed, and before starting to load values from the referenced object. This would seem to mean that the compiler (or the hardware) is free to move the effective load time to before the reference increment becomes effective. So the logic that prevents this is probably very tricky (at least from my point of view, unless I'm simply missing some principle that provides for verification to become much easier). My guess would be that the necessary barriers might be involved in assuring thread A that it can release its reference. But I'm not aware of a theoretical construct that would guarantee this expectation to always apply, in general, for unknown use cases.

Lockless algorithms for mere mortals

Posted Jul 31, 2020 22:47 UTC (Fri) by PaulMcKenney (subscriber, #9624) [Link]

There are many use cases for which memory_order_relaxed is reliable and useful, but there are dragons as well. Part of the problem is that C and C++ do not respect dependencies (with a very few exceptions), so the dragons are much more difficult than they are for things like READ_ONCE() and WRITE_ONCE() in the Linux kernel.

On the reference count increment, the trick is that although the increment can be relaxed, the act of passing it somewhere else must be properly ordered. This proper ordering is often supplied by something else, though, such as thread-creation primitives. Whether this is worthwhile can depend on the CPU.

Lockless algorithms for mere mortals

Posted Aug 1, 2020 4:24 UTC (Sat) by itsmycpu (subscriber, #139639) [Link]

Partial answer: In my most critical use case, I am using the reference counter also as a synchronization point, so I need acquire/release there. However in many cases making the reference count atomic will perhaps simply be a way to avoid the need to get exclusive write access.

For example if thread A already increases the count before passing the pointer, in advance of decreasing its own reference, of course relaxed is sufficient. If thread B is the one to increase the count, I'd think there needs to be a direct or indirect synchronization point between B's increment and A's decrement. Otherwise each increment is atomic, but one can't be sure which one happens first. However that synchronization point doesn't have to be the counter itself. If it isn't, then the question arises if decrementing can't be relaxed as well. :) Especially if the count reaching zero can be taken as an indication that all read/write access has already been given up.

So yes, I think that's a valid example for relaxed. Had to think about it, though. :-)
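For what it's worth, the usual shape of that pattern looks something like the sketch below (hypothetical names): the increment is relaxed because whoever hands out the pointer already holds a reference, while the decrement uses release plus an acquire fence so that the thread dropping the last reference sees all prior writes before freeing the object.

    #include <stdatomic.h>
    #include <stdlib.h>

    struct obj {
        atomic_int refs;
        /* ... payload ... */
    };

    static void obj_get(struct obj *o)
    {
        atomic_fetch_add_explicit(&o->refs, 1, memory_order_relaxed);
    }

    static void obj_put(struct obj *o)
    {
        if (atomic_fetch_sub_explicit(&o->refs, 1, memory_order_release) == 1) {
            atomic_thread_fence(memory_order_acquire);  /* see all prior writes */
            free(o);
        }
    }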

Here http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n...
you wrote that a compiler is allowed to replace

    while (tmp = atomic_load_explicit(a, memory_order_relaxed))
        do_something_with(tmp);

with

    while (tmp = atomic_load_explicit(a, memory_order_relaxed)) {
        do_something_with(tmp);
        do_something_with(tmp);
        do_something_with(tmp);
        do_something_with(tmp);
    }

If that is still the case according to current C/C++ definitions, why don't you argue that the definition of memory_order_relaxed be refined to READ_ONCE and WRITE_ONCE? Or are you already?

Lockless algorithms for mere mortals

Posted Aug 1, 2020 12:47 UTC (Sat) by itsmycpu (subscriber, #139639) [Link]

Something I noticed in your article http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p...
In 2.3, it describes an atomic_thread_fence with acquire:

> The atomic_thread_fence() function can be used to order multiple sets of accesses, for example, by replacing a series of acquire loads with relaxed loads followed by an atomic_thread_fence(memory_order_acquire)

As written, this brief sentence suggests the sequence:
1: A series of relaxed loads
2: An acquire fence

However reading https://en.cppreference.com/w/cpp/atomic/atomic_thread_fence suggests:

1: Reading a single synchronization variable (probably atomic relaxed, as in the example)
2: An acquire fence
3: A series of relaxed or non-atomic loads.

If you don't mind me pointing it out - it's probably just a typo.
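In code, that second sequence might look like this minimal sketch (all names invented): one relaxed load of the flag, an acquire fence, and then ordinary reads of the published data.

    #include <stdatomic.h>

    static atomic_int flag;
    static int a, b;

    static int try_consume(int *out_a, int *out_b)
    {
        if (!atomic_load_explicit(&flag, memory_order_relaxed))  /* step 1 */
            return 0;
        atomic_thread_fence(memory_order_acquire);               /* step 2 */
        *out_a = a;                                              /* step 3 */
        *out_b = b;
        return 1;
    }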

Lockless algorithms for mere mortals

Posted Aug 2, 2020 16:43 UTC (Sun) by PaulMcKenney (subscriber, #9624) [Link]

Hans and I were expecting people to refer to the cited sections of the working paper "N2153: A simple and efficient memory model for weakly-ordered architectures", which covers this in detail, including step 3. But yes, that expectation might not apply to people unfamiliar with the committee. Plus N2153 was written before the C11 atomic API had been finalized, so some mapping is required to understand it.

I have therefore expanded this section to make step 3 explicit. Thank you for pointing this out.

There will be a P2055R1 at some point, but in the meantime you can access the LaTeX at https://github.com/paulmckrcu/WG21-relaxedguide.git.

Lockless algorithms for mere mortals

Posted Aug 1, 2020 14:32 UTC (Sat) by itsmycpu (subscriber, #139639) [Link]

To complete my thoughts about reference counting above:

If the reference counter is incremented relaxed, (for common use cases) it implies that read/write access is synchronized separately. So I'd expect that decrementing can be relaxed as well if the thread that encounters a zero reference count still has read/write access (otherwise it should be enough to use a simple load-acquire on the synchronization variable).

However the (separate) synchronization of read/write access, which can't be relaxed, probably makes the gain look small in comparison, even on those platforms where it is larger. Which makes me wonder how large it is.

Lockless algorithms for mere mortals

Posted Aug 2, 2020 16:45 UTC (Sun) by PaulMcKenney (subscriber, #9624) [Link]

Try it and measure the results!

Of course, the results will likely vary not only across CPU families, but also within CPU families.

Lockless algorithms for mere mortals

Posted Aug 2, 2020 17:05 UTC (Sun) by itsmycpu (subscriber, #139639) [Link]

Currently working on a way to consistently precision-test execution times of short code fragments, as a side project...not as easy as it may sound. ;-)

Lockless algorithms for mere mortals

Posted Aug 2, 2020 17:27 UTC (Sun) by PaulMcKenney (subscriber, #9624) [Link]

No, it is not at all easy! Today's systems are quite complex, and there are variations from one system to other (ostensibly identical) systems, and variations in a single system over time.

There is a lot of existing code to do such measurements, but on the other hand, creating your own can be quite instructive.

Lockless algorithms for mere mortals

Posted Aug 2, 2020 16:56 UTC (Sun) by PaulMcKenney (subscriber, #9624) [Link]

Well, if the use of memory_order_relaxed was obvious and trivial, Hans and I probably would not have written P2055R0. :-)

It turns out that in C, all the atomic operations are already volatile, including atomic_load_explicit(), which is then pretty close to READ_ONCE().

In contrast, in C++, atomic_load_explicit() can take either a volatile or a non-volatile pointer, which means that use of C++ atomic_load_explicit() on non-volatile pointers (the common case) will be subject to the transformation called out in N4444. This working paper is for C++ rather than for C, in case you were wondering why it leaves out the C-language case.

And yes, JF and I are already calling for something like READ_ONCE() and WRITE_ONCE() in C++: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p...

Lockless algorithms for mere mortals

Posted Jul 31, 2020 17:55 UTC (Fri) by PaulMcKenney (subscriber, #9624) [Link]

Having per-subsystem documentation of the memory-ordering use cases of particular importance to that subsystem makes a lot of sense to me! Many of the five you call out are heavily used, but even given common documentation of a given use case, many subsystems might still want to cover special cases or how that use case interacts with the rest of the subsystem.

Naming is fun. Yet another common name for Dekker synchronization is store buffering. We probably need to list the common names for each use case in the LKMM documentation.

Good point on pthread_create() and pthread_join(). The same applies to things like mod_timer() and the resulting handler function.

Your point about C11 memory_order_seq_cst does me good, given how difficult it was for me and a few others to get the committee to accept that the standard should include anything in addition to sequential consistency. ;-)

Lockless algorithms for mere mortals

Posted Jul 31, 2020 21:11 UTC (Fri) by itsmycpu (subscriber, #139639) [Link]

On x86, memory_order_seq_cst is not of practical interest because of the performance impact of the mfence instruction required in the implementation of the store operation.

Maybe it's a thing for college beginner class. ;)

Lockless algorithms for mere mortals

Posted Jul 31, 2020 21:27 UTC (Fri) by PaulMcKenney (subscriber, #9624) [Link]

Music to my ears! And please don't keep your views secret!!! It is always better when I am not the only one defending non-memory_order_seq_cst atomics. :-)

Lockless algorithms for mere mortals

Posted Aug 1, 2020 9:25 UTC (Sat) by pbonzini (subscriber, #60935) [Link]

Paul, you don't know how much it means to me that you confirm that my understanding makes sense! It only took 10 years. :-)

Thank you very much!

Lockless algorithms for mere mortals

Posted Aug 2, 2020 15:39 UTC (Sun) by PaulMcKenney (subscriber, #9624) [Link]

My fond hope is that things like the Linux-kernel memory model can shorten that time in at least some cases. ;-)


Copyright © 2020, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds