Lockless algorithms for mere mortals
Time, as some have said, is nature's way of keeping everything from happening at once. In today's highly concurrent computers, though, time turns out not to be enough to keep events in order; that task falls to an extensive set of locking primitives and, below those, the formalized view of memory known as the Linux kernel memory model. It takes a special kind of mind to really understand the memory model, though; kernel developers lacking that particular superpower are likely to make mistakes when working in areas where the memory model comes into play. Working at that level is increasingly necessary for performance purposes, though; a recent conversation points out ways in which the kernel could make that kind of work easier for ordinary kernel developers.
Concurrency comes into play when multiple threads of execution are accessing the same data at the same time. Even in a simple world, keeping everything coherent in a situation like this can be a challenging task. The kernel prevents the wrong things from happening at the same time with the use of spinlocks, mutexes, and other locking primitives that can control concurrency. Locks at this level can be thought of as being similar to traffic lights in cities: they prevent accidents as long as they are properly observed, but at the cost of stopping a lot of traffic. Time spent waiting for locks hurts; even the time bouncing lock data between memory caches can wreck scalability, so developers often look for ways to avoid locking.
Lockless list linking
Consider a highly simplified example: inserting an element into a singly-linked list. One possible solution would be to use a mutex to protect the entire list; any thread traversing the list must first acquire this mutex. If the thread inserting an element acquires this lock, it knows that no other thread will be traversing the list at the same time, so changes will be safe. But if this list is heavily used, this locking will quickly become expensive.
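In code, the locked approach might look something like this user-space sketch (POSIX threads standing in for kernel locking primitives; the names are illustrative):

```c
#include <pthread.h>
#include <stddef.h>

struct node {
    int value;
    struct node *next;
};

static struct node *list_head;
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Insert new_node after prev, or at the head of the list if prev is
 * NULL.  The mutex excludes all traversals as well as other
 * inserters, so no partially-linked state is ever visible. */
static void list_insert_locked(struct node *prev, struct node *new_node)
{
    pthread_mutex_lock(&list_lock);
    if (prev) {
        new_node->next = prev->next;
        prev->next = new_node;
    } else {
        new_node->next = list_head;
        list_head = new_node;
    }
    pthread_mutex_unlock(&list_lock);
}
```

Every traversal would have to take the same lock, which is exactly the cost described above.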
So one might consider a lockless alternative. If the list initially looks like this (the original article's diagrams are reconstructed here in ASCII):

      A ---------> B
One could start by linking the outgoing pointer from the new element to the existing element that will soon follow it in the list:

      A ---------> B
                   ^
          NEW -----'
At this point, the list still looks the same to any other thread that is traversing it. To complete the operation, the code will redirect the pointer from the preceding element in the list to the new element:

      A ---> NEW ---> B
Now everybody sees the new list; no locking was required, and the view of the list was always consistent and correct. Or so one would hope.
The problem is that modern hardware makes things harder in the name of performance. The order in which a series of operations is executed may not be the order in which those operations are visible to other threads in the system. So, for example, it might well be that other threads in the system see the assignment of the two pointers above in the opposite order, with the result that, from their point of view, there is a window of time during which the list looks like this:

      A ---> NEW ---> ???
The outgoing pointer in the new element will contain whatever was there before the assignment happened, leading to a list that heads off into the weeds. There are few certainties in the world, but one can be reasonably confident in saying that little good will come from this situation.
A more complex view of memory
Another way to think about it is that locking provides a sort of Newtonian-physics view of the world; in a given situation, one always knows what's going to happen. At lower levels, life starts to resemble quantum physics, where surprising things can happen and few people can convincingly claim to understand it all. One can normally function quite well in this world without understanding quantum physics, but there are situations where it is good to have an expert around.
The Linux kernel has a few experts of this type; they have put a great deal of work over the years into the creation of the kernel's memory model, which is a description of how the kernel views memory and how to safely perform operations where concurrency may come into play. The result is the infamous memory-barriers.txt documentation file and a whole raft of supporting materials. From this documentation, one can learn a couple of useful things for the list-insertion example given above:
- Code traversing the list should read the next-item pointers with an "acquire" operation, which is a special sort of barrier. It guarantees that any operation that happens after the barrier actually appears afterward elsewhere in the system. In this case, it would ensure that, if a traversal reads a pointer to a given element, the assignment of that element's "next" pointer will already be visible.
- Code that sets the pointer should do so with a "release" operation, which ensures that any other operations done before the release operation are visible before that operation. That will ensure that a new element's "next" pointer is seen correctly globally before the pointer to the element itself.
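Putting the two rules together, the insertion and its matching traversal might be sketched as follows in user-space C11 atomics (the kernel would spell these smp_store_release() and smp_load_acquire(); a single inserting thread is assumed here):

```c
#include <stdatomic.h>
#include <stddef.h>

struct node {
    int value;
    _Atomic(struct node *) next;
};

/* Writer side: publish new_node between prev and its current
 * successor.  The release store guarantees that the assignment of
 * new_node->next is visible to all CPUs before new_node itself
 * becomes reachable.  This assumes only one thread inserts at a
 * time; concurrent inserters would need a compare-and-exchange
 * loop instead. */
static void list_insert(struct node *prev, struct node *new_node)
{
    struct node *succ = atomic_load_explicit(&prev->next,
                                             memory_order_relaxed);
    atomic_store_explicit(&new_node->next, succ, memory_order_relaxed);
    atomic_store_explicit(&prev->next, new_node, memory_order_release);
}

/* Reader side: the acquire load pairs with the release store above,
 * so a traversal that reads a pointer to an element is guaranteed to
 * also see that element's initialized next pointer. */
static struct node *list_next(struct node *n)
{
    return atomic_load_explicit(&n->next, memory_order_acquire);
}
```

No lock is taken on either side; the ordering guarantees come entirely from the release/acquire pairing.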
This example is complicated enough to explain, but it is as simple as these things get; most cases are rather more complex. To make things worse, optimizing compilers can create surprises of their own in pursuit of higher performance. The kernel's memory model strives to address this threat as well.
The problem with the memory model
Recently, Eric Biggers posted a patch fixing a perceived problem in the direct I/O code where a concurrent data-access situation lacked the appropriate barriers. There was some discussion about whether a bug actually existed or not; the problem according to Biggers is that this sort of concurrent access is deemed "undefined behavior", meaning that the compiler is granted license to pursue any evil agenda that might strike its fancy. The real dispute, though, was over the fix.
Dave Chinner, who is generally acknowledged as being a moderately competent kernel developer, complained that the resulting code was not something that could be readily understood:
He was pointed to the memory-model documentation, but that did little to improve his view of the situation:
This documentation, he said, is aimed at people who spend their time thinking about memory-ordering issues. Everybody else is going to struggle with it, starting with the basic terminology used; even those who manage to understand it are likely to forget again once they go back to the problems they are actually trying to solve. Kernel developers would be better served, Chinner said, with a set of simple recipes showing how to safely code specific lockless patterns.
Biggers responded by posting a documented recipe for the "initialize once" pattern that was the source of the original problem in the direct-I/O subsystem. This pattern comes about when the initialization of a data structure is deferred until the structure is actually used, perhaps because that use may never actually happen. The initialization should be done exactly once; two racing threads should not both try to carry it out. The document provided several recipes of increasing complexity intended to match different performance needs.
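The general shape of such a recipe (a user-space C11 sketch, not Biggers's actual text; struct foo is a hypothetical stand-in for the lazily-built object) is an acquire load on the fast path plus a compare-and-exchange to resolve initialization races:

```c
#include <stdatomic.h>
#include <stdlib.h>

struct foo { int ready; };          /* hypothetical lazily-built object */

static _Atomic(struct foo *) shared_foo;

/* Return the shared object, initializing it on first use.  If two
 * threads race, both build a candidate, exactly one installs its
 * copy, and the loser frees its own and uses the winner's.  The
 * readers' acquire load pairs with the winner's release store, so
 * nobody ever sees a half-initialized object. */
static struct foo *get_foo(void)
{
    struct foo *p = atomic_load_explicit(&shared_foo,
                                         memory_order_acquire);
    if (p)
        return p;                   /* fast path: already initialized */

    struct foo *candidate = calloc(1, sizeof(*candidate));
    candidate->ready = 1;

    struct foo *expected = NULL;
    if (atomic_compare_exchange_strong_explicit(&shared_foo, &expected,
                                                candidate,
                                                memory_order_acq_rel,
                                                memory_order_acquire))
        return candidate;

    free(candidate);                /* lost the race */
    return expected;                /* the winner's object */
}
```

The simpler recipes in the document trade some of this complexity away when, for example, redundant initialization is cheap enough to tolerate.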
While the attempt to provide useful recipes was welcomed, it became clear that a number of people felt that the effort had missed the mark somewhat. Darrick Wong, for example, pointed out that language like:
is not immediately clear to a lot of developers. Alan Stern's attempt to clarify it read like this:
This was seen as driving home the point that started the whole discussion: most developers do not think of memory this way and would really rather not have to. They simply do not think in this kind of language. As long as lockless algorithms require this sort of understanding, they will be underused and many of the implementations that do show up are likely to be buggy in subtle ways.
An alternative, as suggested by Matthew Wilcox, is to define an API for this sort of on-the-fly initialization and hide the details behind it. Developers who understand memory models and enjoy thinking about them can concern themselves with optimizing the implementation, while the rest of the kernel-development community can simply use it with the knowledge that it works as intended. There followed a discussion with several different ideas about how this API should actually look, but nothing emerged that looks like it could find its way into wider use.
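One can imagine what such an API might look like; the sketch below (name and signature invented for illustration, not taken from the discussion) buries both the barriers and the race resolution behind a single helper:

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical once-initialization helper: callers supply a slot and
 * a constructor, and never touch a memory barrier themselves. */
typedef void *(*once_ctor_t)(void);

static void *get_or_init_once(_Atomic(void *) *slot, once_ctor_t ctor)
{
    void *p = atomic_load_explicit(slot, memory_order_acquire);
    if (p)
        return p;

    void *candidate = ctor();
    void *expected = NULL;
    if (atomic_compare_exchange_strong_explicit(slot, &expected,
                                                candidate,
                                                memory_order_acq_rel,
                                                memory_order_acquire))
        return candidate;

    free(candidate);    /* assumes ctor() allocates with malloc() */
    return expected;
}

/* Example caller: a lazily-allocated counter. */
static void *make_counter(void)
{
    int *c = calloc(1, sizeof(*c));
    *c = 42;
    return c;
}

static _Atomic(void *) counter_slot;
```

A caller simply writes get_or_init_once(&counter_slot, make_counter) and need not know which barriers make it safe.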
This particular discussion has faded away, but the underlying problem
remains. Successful software development at all levels depends on the
management of complexity, but that is not yet really happening at the
memory-model level. Sooner or later, somebody will come along with the
right skills to both understand the Linux kernel memory model and to hide
it behind a set of APIs that other developers can safely use without
having to understand that model. Until then, writing lockless code will
continue to be a challenging task for many developers — and the people who
have to review their code.
Lockless algorithms for mere mortals
Posted Jul 28, 2020 20:26 UTC (Tue) by warrax (subscriber, #103205) [Link]
This is the level at which we actually *NEED* machine-checkable proof. It's just so insanely difficult to get right even for tiny little "algorithms" that are involved in simple data structures like deques and similar.
Lockless algorithms for mere mortals
Posted Jul 28, 2020 22:45 UTC (Tue) by excors (subscriber, #95769) [Link]
Then later I found a third paper about a tool that checks C/C++ atomic code against a specification, which mentions their tool found a bug in that other paper's C11 implementation. I think I ended up with the correct code, but it's really very tricky even when trying to follow supposed experts on a pretty simple algorithm, and it would be very nice if those checking tools were in more widespread use.
Lockless algorithms for mere mortals
Posted Jul 29, 2020 14:41 UTC (Wed) by bgamari (subscriber, #102670) [Link]
Do you have a reference to this paper? Indeed I have used the C11 paper's Chase-Lev implementation in the past so I'm very likely missing a barrier.
Lockless algorithms for mere mortals
Posted Jul 29, 2020 15:14 UTC (Wed) by excors (subscriber, #95769) [Link]
There's a corrected version of the buggy paper with C11 atomics: Correct and efficient work-stealing for weak memory models. The bug in the original published version was found by CDSChecker and CDSSpec, and their source repository has the original and bugfixed versions for comparison.
Lockless algorithms for mere mortals
Posted Jul 29, 2020 18:18 UTC (Wed) by jezuch (subscriber, #52988) [Link]
Note that Java has a formal memory model (that's how they can do anything performant without horrible memory unsafety), so it's likely that these memory barriers were not ignored, just... assumed. (But yeah, they should have at least mentioned which features of the memory model they rely on.)
Lockless algorithms for mere mortals
Posted Jul 31, 2020 5:51 UTC (Fri) by ncm (guest, #165) [Link]
Lockless algorithms for mere mortals
Posted Aug 2, 2020 7:02 UTC (Sun) by jezuch (subscriber, #52988) [Link]
Lockless algorithms for mere mortals
Posted Aug 12, 2020 18:49 UTC (Wed) by briangordon (guest, #135428) [Link]
Lockless algorithms for mere mortals
Posted Aug 1, 2020 17:23 UTC (Sat) by anton (subscriber, #25547) [Link]
Machine-checkability does seem important. Or one might consider such a requirement (as well as the article) to be an indication that the concepts are too difficult to use. In other cases (e.g., wrt. page colouring), Linus Torvalds has written that he targets hardware that performs well without such complications, and that other hardware deserves what it gets. I think that is a sensible approach for memory ordering as well: have a model that's easy to understand and performs OK on nice hardware (not sure if nice hardware already exists, but weak consistency certainly does not look nice to me); then hardware designers have an incentive to make their hardware nicer.
It seems to me that all this difficult-to-program memory ordering is there because multi-processors originally came from the supercomputing area, where hardware is still more expensive than software, and hardware designers can get away with making hardware that's hard to program correctly and efficiently.
Lockless algorithms for mere mortals
Posted Aug 3, 2020 10:12 UTC (Mon) by farnz (subscriber, #17727) [Link]
I would say that machine-checkability is necessary but not sufficient. If the rules are so complex to encode that a computer can't verify that you're getting them right, then they are also too complex for a human to get right, too, and they are certainly too complex for a human code reviewer to reliably check.
That said, this is a minimum requirement, as you can have rules that a computer can reliably verify, but that humans get wrong - see DEC Alpha ordering for the classic example.
Lockless algorithms for mere mortals
Posted Jul 29, 2020 2:50 UTC (Wed) by dgc (subscriber, #6611) [Link]
> It's just so insanely difficult to get right even for tiny little "algorithms" that
> are involved in simple data structures like deques and similar.
That's what the "litmus tests" the LKMM documentation talks about are for - they are a formal method of proving that your ordering construct behaves correctly within the constraints of the LKMM.
That is, of course, dependent on whether you can translate your desired code construct into the form that the litmus tools can use and have it still mean the same thing. That can be a challenge all in itself, especially as the litmus syntax is not defined anywhere in the LKMM. And if you think the LKMM documentation is impenetrable, you should go look at the documentation for the herd7 formal verification tools...
-Dave.
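For readers curious what such a test looks like, here is the message-passing pattern from the article expressed as a litmus test, essentially as it appears in the kernel's tools/memory-model/litmus-tests directory:

```
C MP+pooncerelease+poacquireonce

{}

P0(int *buf, int *flag)
{
	WRITE_ONCE(*buf, 1);
	smp_store_release(flag, 1);
}

P1(int *buf, int *flag)
{
	int r0;
	int r1;

	r0 = smp_load_acquire(flag);
	r1 = READ_ONCE(*buf);
}

exists (1:r0=1 /\ 1:r1=0)
```

Feeding this to the herd7 tool asks whether P1 can ever see the flag set but not the data; under the LKMM, the exists clause can never be satisfied.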
Lockless algorithms for mere mortals
Posted Jul 31, 2020 15:30 UTC (Fri) by PaulMcKenney (subscriber, #9624) [Link]
I would never have believed this two years ago, but it just might be possible to get to a point where someone reading only the tools/memory-model documentation might have a fighting chance of being able to make good use of LKMM. Here is hoping, anyway...
Lockless algorithms for mere mortals
Posted Jul 28, 2020 20:32 UTC (Tue) by rweikusat2 (subscriber, #117920) [Link]
A little "premature optimization is the root of all evil" also applies here: start simple, and make it more complicated when and if that becomes necessary. I.e., use locking or the strongest kind of memory barrier until it demonstrably causes an issue.
Lockless algorithms for mere mortals
Posted Jul 29, 2020 13:32 UTC (Wed) by Paf (subscriber, #91811) [Link]
Leslie Lamport was asked about how Raft is easier to understand than Paxos, and in his reply he noted that people should be careful: it might be easier to understand because the explanation had dropped rigor.
Lockless algorithms for mere mortals
Posted Jul 29, 2020 21:55 UTC (Wed) by rweikusat2 (subscriber, #117920) [Link]
These kinds of "formalistic explanations", aka "wild letter salad", are essentially write-only text. All of the constructs supposedly explained in the file are used in real code in the kernel. The documentation could be greatly improved by using real examples to explain the different semantics.
Lockless algorithms for mere mortals
Posted Jul 29, 2020 22:25 UTC (Wed) by Paf (subscriber, #91811) [Link]
I’m all for specific examples and I agree memory-barriers.txt is bordering on unusable, but what you suggested is still too simple to replace it.
Lockless algorithms for mere mortals
Posted Jul 31, 2020 16:56 UTC (Fri) by PaulMcKenney (subscriber, #9624) [Link]
Lockless algorithms for mere mortals
Posted Jul 30, 2020 9:33 UTC (Thu) by adobriyan (subscriber, #30858) [Link]
> complicated examples using single-letter variables.
memory-barriers.txt can indeed look like a scientific paper to a layman.
> A much simpler version could be something like: Whenever two memory accesses need to become visible to other CPUs
> in a particular order, a barrier is required between them both in code which writes to and code which reads from memory.
This can lead to cargo-cultish practices: people remembering one rule which doesn't quite extend to even the simplest complications -- what to do with 3 accesses?
I gave a set of Linux lectures (including memory barriers!) and naturally had to do the exercises myself.
The picture with 2 CPUs and 1 memory, and the case on line 150 (2 CPUs, 2 variables), is absolutely required
for understanding what's going on. It is like solving ax^2+bx+c=0: do it from the beginning to the end once.
(Doesn't work and explain order 5 case, though).
There are indeed 24 cases, and you _should_ write them all down and work out all the outcomes.
There will be 4.
Now add one barrier and cross out the cases which became impossible; there will still be 4.
Then add the second, pairing barrier, and there will be 3. This explains why 1 barrier is not enough.
This also explains why barriers are actually necessary -- to reduce the set of observable outcomes.
> A little "premature optimization is the root of all evil" also applies here: Start simple, make it more complicated when and
> if this becomes necessary.
Documentation/memory-barriers.txt could do better by moving "Device" down and "data dependency" to the bottom.
And there is a "compiler barrier" section in the document about memory barriers, OK.
Lockless algorithms for mere mortals
Posted Jul 31, 2020 18:02 UTC (Fri) by PaulMcKenney (subscriber, #9624) [Link]
Lockless algorithms for mere mortals
Posted Jul 28, 2020 20:41 UTC (Tue) by rweikusat2 (subscriber, #117920) [Link]
Lockless algorithms for mere mortals
Posted Jul 28, 2020 21:03 UTC (Tue) by NYKevin (subscriber, #129325) [Link]
Lockless algorithms for mere mortals
Posted Jul 29, 2020 13:24 UTC (Wed) by rweikusat2 (subscriber, #117920) [Link]
This text has no function beyond getting the comment past the 10 letter minimum.
Lockless algorithms for mere mortals
Posted Jul 28, 2020 20:53 UTC (Tue) by roc (subscriber, #30627) [Link]
Lockless algorithms for mere mortals
Posted Jul 28, 2020 22:53 UTC (Tue) by gus3 (guest, #61103) [Link]
Lockless algorithms for mere mortals
Posted Jul 28, 2020 23:09 UTC (Tue) by excors (subscriber, #95769) [Link]
Lockless algorithms for mere mortals
Posted Jul 31, 2020 1:55 UTC (Fri) by gus3 (guest, #61103) [Link]
But I don't see how it would be sub-optimal. It's a direct request to the hardware to slam things into place in RAM before proceeding. It also invalidates remote caches, so other CPUs/cores will re-fetch the proper contents.
Yes, there might also be other cache lines to be flushed. What is the harm in writing them out? Would that be a timing issue, if you're recording an X264 video?
Lockless algorithms for mere mortals
Posted Aug 1, 2020 2:26 UTC (Sat) by HenrikH (subscriber, #31152) [Link]
Lockless algorithms for mere mortals
Posted Jul 29, 2020 0:14 UTC (Wed) by ras (subscriber, #33059) [Link]
That undersells it. It's not just Newtonian physics, it's all macroscopic physics. In relativistic space-time, about the only thing preserved is causation [0]. Another way of saying this: if event A causes event B, then that is how it appears in all frames of reference. It's not possible to agree on the time it happened, or speed, or the mass of the participants - but the fact that A caused B - yes, we can agree on that.
I am no physicist, but causation looks to be the bare minimum that needs to be preserved to have an objective reality. By that I mean if it isn't possible to agree on A caused B, then it becomes impossible to have a shared truth. If we can not agree on what is true, there is no shared reality.
I don't think it's a coincidence that in our computer world the inability to decide the order two events happen leads to a special kind of hell. The kind that leads some of the best programmers I know to advise "just don't use threads". I suspect physicists get themselves into the same sort of hell when they imagine what happens when the light cones of two related events separate, and yet we know some things (like spin) are preserved between those two light cones, even though we don't know what the actual value is. Worse, some interpretations of quantum mechanics say the universe doesn't know either. If you substitute "the universe" for "the programmer who wrote the threaded code, who should know what is going on", you get the idea.
If you want to dive deeper into the rabbit hole, the most tantalising part is that energy is in some sense equivalent to information. So when we say energy is always conserved, we are in effect saying information is never created or destroyed. The observation that in our Universe every interaction is reversible is saying the same thing, as to be reversible all the information needed to describe the old state must be in the new state; otherwise we could not reproduce the old state, and hence it would not be reversible. Objectively, information is changing all the time, so that's hard to accept at face value. It's meant in the sense that the capacity of the system to store information never changes. A somewhat poor analogy: it's similar to saying the RAM size of a computer never changes, and the only operations you can perform on that RAM are invertible - like XOR. You can of course store a mind-boggling number of different configurations of information in that RAM (1<<bits_of_ram in fact, which for a computer with 8GB of RAM is a number that can reasonably be described as mind-boggling).

Noether's theorem says that if something is conserved, there must be a symmetry. For example, for momentum to be conserved, space must be symmetric, that is, the same everywhere. For energy, the thing that must be symmetric is time. Since energy is information, that means at a very fundamental level information and time are related concepts - one can't exist without the other.
[0] That is an exaggeration. We have laws of conservation - energy (and by implication mass) is conserved (if you ignore dark energy), charge is conserved, spin is conserved, and probably some other things.
Lockless algorithms for mere mortals
Posted Jul 29, 2020 0:30 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]
That's not quite true in distributed systems. There's a class of data structures that achieve a consensus without strict ordering: https://en.wikipedia.org/wiki/Conflict-free_replicated_da...
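A minimal illustration (not from the linked article) is the grow-only counter, one of the simplest CRDTs: each replica only increments its own slot, and merging takes the per-slot maximum, so merging is commutative, associative, and idempotent and replicas converge regardless of the order in which they exchange state:

```c
#include <stddef.h>

#define NREPLICAS 3

/* Grow-only counter (G-Counter) CRDT sketch. */
struct gcounter { unsigned long slot[NREPLICAS]; };

/* Each replica increments only its own slot. */
static void gcounter_inc(struct gcounter *c, size_t replica)
{
    c->slot[replica]++;
}

/* Merge src into dst by taking the per-slot maximum; order of
 * merges does not affect the result. */
static void gcounter_merge(struct gcounter *dst, const struct gcounter *src)
{
    for (size_t i = 0; i < NREPLICAS; i++)
        if (src->slot[i] > dst->slot[i])
            dst->slot[i] = src->slot[i];
}

/* The counter's value is the sum over all slots. */
static unsigned long gcounter_value(const struct gcounter *c)
{
    unsigned long sum = 0;
    for (size_t i = 0; i < NREPLICAS; i++)
        sum += c->slot[i];
    return sum;
}
```

After any sequence of pairwise merges, all replicas report the same value without ever having agreed on the order of the increments.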
Lockless algorithms for mere mortals
Posted Jul 29, 2020 1:09 UTC (Wed) by ras (subscriber, #33059) [Link]
In contrast, if they independently observe A caused B, they don't need to be in contact to know they agree on the ordering. They can be certain the other one saw A caused B without ever meeting to agree on it.
The Wikipedia article you linked to gives an excellent example. The database they must agree on is a single bit. The update performed is to set the bit to 1 if an event occurs, but since it is a distributed system that event may not be seen by some databases, so some "observe" 0, and some observe a 1. Clearly that is different. To agree, the observers must meet and then follow the logic "since it is never reset to 0, and since one participant saw the event happen, it must have happened, ergo after meeting we can all agree on 1".
Lockless algorithms for mere mortals
Posted Jul 29, 2020 17:58 UTC (Wed) by NYKevin (subscriber, #129325) [Link]
There's also the Spanner approach (put atomic clocks in all of your datacenters, get them all nice and synchronized, and then everyone can agree that A happens-before B because A and B are both timestamped and you know the resolution of your clocks is finer than the difference between those timestamps). But that requires hardware support, because regular clocks are insufficiently precise for timestamping to be useful in most realistic applications. Also, this is clearly useless for something like multicore concurrency within a single machine (you're not sticking an atomic clock on each core, that would be ridiculous).
Lockless algorithms for mere mortals
Posted Jul 29, 2020 19:40 UTC (Wed) by Wol (subscriber, #4433) [Link]
Except that then you know that the 1st floor has time moving at a different speed to the ground floor, and probably the front wall is moving at a different speed to the back ...
Cheers,
Wol
Lockless algorithms for mere mortals
Posted Jul 29, 2020 21:53 UTC (Wed) by NYKevin (subscriber, #129325) [Link]
(Disclaimer: I work for Google. I do *not* get paid to promote Spanner or any other Google product.)
Lockless algorithms for mere mortals
Posted Aug 1, 2020 16:35 UTC (Sat) by anton (subscriber, #25547) [Link]
All the data centers are at rest wrt each other, and wrt the events they observe (Earth's rotation may be an issue, but not at the ms resolution they work with). Differences in the gravity field are minuscule, but even if they were significant, they would only change the time scale, not the order of events; and the scale can be corrected by adjusting the measured time for this effect.
Lockless algorithms for mere mortals
Posted Jul 30, 2020 13:05 UTC (Thu) by sriram_srinivasan (guest, #140506) [Link]
Lockless algorithms for mere mortals
Posted Jul 30, 2020 15:15 UTC (Thu) by NYKevin (subscriber, #129325) [Link]
Lockless algorithms for mere mortals
Posted Jul 29, 2020 2:40 UTC (Wed) by creemj (subscriber, #56061) [Link]
It would be better named the "Theory of Invariants" than the "Theory of Relativity". If event A occurring at (x_a, t_a) causes event B occurring at (x_b, t_b) then all observers agree on the value of c^2 (t_b-t_a)^2 - (x_b-x_a)^2. That is invariant. Actually it is invariant whether there is a causal relationship or not between A and B. There are other invariants such as the ultimate speed c, the mass m which is the same as the Newtonian mass (about which the author of the quote above is misinformed), and invariants involving momentum with energy and wavenumber with frequency.
I am no expert on memory models and I am not convinced that this analogy necessarily transfers to memory models, but maybe it would be interesting or useful if a memory model could be expressed in terms of invariants rather than those things that can change?
> The observation that in our Universe every interaction is reversible
Really? I place a hot cube of water in contact with a cold cube of water, isolated as best I can from the environment, and they both end up at the same lukewarm temperature. But the reverse is never seen to occur!
Lockless algorithms for mere mortals
Posted Jul 29, 2020 3:42 UTC (Wed) by ras (subscriber, #33059) [Link]
Yes, but that is because it is statistically unlikely for the ice cube to reform, not because it can't. The individual changes that make up the transition are all reversible. In theory, the only reason you haven't seen it is because you haven't waited long enough. In practice, long enough is likely after the heat death of the universe.
Put in more concrete terms, all the possible combinations of position and vibrational levels of each of the molecules constitute the state space of the system. All possible arrangements of those states are equally likely. However, you can't perceive those individual arrangements; you take a far more macro view. In that macro view, far, far more of those states look like the macro state "ice cube melted in water" than the macro state "ice cube floating in warm water". The system is undergoing constant random thermally-driven transitions between one state and another. Since there are far more "ice cube melted in water" states, over time that's where we end up.
The shorthand for all that is: the 2nd law of thermodynamics. In mathematical terms it just says that if you measure the outcomes of random experiments, the average of those outcomes will tend toward the mean as the number of experiments increases. Casinos are profitable because of the same thing. However, that does not mean someone occasionally can't get a big win at a casino; it's just statistically unlikely.
Lockless algorithms for mere mortals
Posted Jul 29, 2020 6:11 UTC (Wed) by madhatter (subscriber, #4665) [Link]
You seem to be proposing that the 2nd law of thermodynamics (2T) says that, when evaluated over the universe as a whole, entropy must always rise. You say that it's fine for it to temporarily drop in certain lucky locales; it's just that the number of spontaneously melting ice cubes in cups of tea significantly outweighs the number of spontaneously freezing ones in similar cups of tea, and so over the universe as a whole entropy still rises.

It's been a long time since I was a professional physicist, but as I recall 2T says that in any isolated system the entropy count can never decrease. 2T is a local, as well as a global, truth. That is to say, unless there's an interaction across the system border of any given nearly-melted ice cube in a cup of tea - e.g., I stick a cooling coil in and shunt heat outside the local system - it can't spontaneously refreeze. A thermodynamically-exploitable temperature gradient will not spontaneously arise without intervention.
The casino analogy is a poor one because very few players in a casino are isolated. Any player who is - one playing solitaire in a broom cupboard, for example - will find that they never win big, because there's nobody to win from. Moreover, they can only not lose if they stop eating the chips (entropy count must rise in an isolated system unless all local processes are thermodynamically reversible).
Lockless algorithms for mere mortals
Posted Jul 29, 2020 7:43 UTC (Wed) by ras (subscriber, #33059) [Link]
My guess is that that is demonstrably false. Let's say your system consists of 3 gas molecules in a 3m x 3m x 3m room. I doubt anyone's checked, but it seems certain those 3 molecules will end up on one side of the room at some point.
That is not terribly realistic of course. In real life a 3m x 3m x 3m room at 1atm has about 5e25 molecules in it. Statistically, I think we can say we are fairly safe from someone's ear drums bursting because all the air ended up on one side of the room. However, nothing has changed except the number of molecules. As far as I know nothing in physics says the universe behaves differently because that number has changed.
Your mind is now telling you that you can extract energy from that if it happened, so it must violate the 2nd law of thermodynamics. I don't know the answer to that, but I don't feel too bad about it because it took about 50 years to figure out why Maxwell's daemon was a myth. I'm sure the universe has accounted for it somehow. It probably has to do with the fact that you have to wait until the universe goes dark to have a fair chance of it happening, and by that time the 2nd law has recovered the energy from you evaporating. [0]
> The casino analogy is a poor one because very few players in a casino are isolated
You play against the house, not the other players. Example: in Keno, the odds are published and fixed. So you can place a minimum bet and win the maximum even if no one else plays that round.
[0] That was a joke. Well part of it was. If it's not accounted for somewhere, it's where dark energy comes from [1].
[1] Also a joke. I really have no idea. [2]
[2] Actually, it could well be the Maxwell daemon thing again. Sitting there, actively monitoring the room so you know when all the gas molecules are on one side of the room takes an external energy source.
Lockless algorithms for mere mortals
Posted Sep 6, 2020 17:43 UTC (Sun) by nix (subscriber, #2304) [Link]
This was figured out a decade or so ago. You don't need the cost of monitoring: even if that's zero, the mere fact that the demon has to make a decision about whether to allow a given molecule through or not is enough to ensure that the entropy of the system (demon + box) always increases, given that the demon's memory capacity is finite such that it eventually has to erase the states of some of its memory (increasing its entropy) in order to make more decisions about whether to let molecules through.
Lockless algorithms for mere mortals
Posted Jul 29, 2020 7:46 UTC (Wed) by cyphar (subscriber, #110703) [Link]
However, if you had a thermodynamic system with a handful of particles, you would observe the system move into lower entropy configurations purely by chance. All of the interactions occurring in thermodynamics are reversible, it's just that the likelihood of such interactions occurring is effectively zero. The same applies to even more mundane physics like the kinematics of billiard balls -- if you imagine a perfectly-spaced grid of one billion billiard balls being struck by another ball, while the interaction is entirely reversible it's incredibly unlikely that you'd be able to collide the balls in such a way that they would form a perfectly-spaced grid. The second law of thermodynamics is just a far more formalised version of that intuition.
Lockless algorithms for mere mortals
Posted Jul 29, 2020 8:31 UTC (Wed) by Wol (subscriber, #4433) [Link]
Sorry, I can't quote the source beyond "I saw it in New Scientist", but there was a wonderful article where someone had two different (equally plausible) definitions of entropy, AND ONE OF THEM DECREASED.
The conclusion was that the 2nd law was still valid, but it was no longer entropy it was measuring. It was information. Because, iirc, that was what made the correct selection between the two different definitions of entropy.
Isn't science wonderful :-)
Cheers,
Wol
Lockless algorithms for mere mortals
Posted Jul 29, 2020 9:02 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]
Easy. It's the number of microstates of a system. There are other ways to re-formulate this definition, but they are equivalent to each other.
Lockless algorithms for mere mortals
Posted Jul 29, 2020 11:18 UTC (Wed) by Wol (subscriber, #4433) [Link]
Mind you, your definition looks like the new version "information == entropy".
Cheers,
Wol
Lockless algorithms for mere mortals
Posted Jul 29, 2020 16:28 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]
Lockless algorithms for mere mortals
Posted Jul 30, 2020 1:55 UTC (Thu) by cyphar (subscriber, #110703) [Link]
In thermodynamics, S = kB ln Ω by definition. The term "entropy" is used in many different fields (such as computer science) and can mean different things (such as Shannon entropy), but in thermodynamics it has a specific meaning -- it is a measure of the number of microstates which appear as a single macrostate. Entropy increases because systems tend towards the macrostates which have the most microstates (again, purely because of statistics).
Lockless algorithms for mere mortals
Posted Jul 31, 2020 21:06 UTC (Fri) by NYKevin (subscriber, #129325) [Link]
The problem with this argument is that it is time-symmetric. In other words, you can just as easily argue that the present macrostate is more likely to have evolved from past macrostates which had more microstates. Observation tells us that the time-reversed argument is empirically wrong (because we observe entropy increasing as a function of time), which is a bit of a problem because it appears to be symmetric with the (empirically correct) non-reversed argument. The only (obvious) way to break this symmetry is to assert a boundary condition of low entropy at or near the beginning of the universe. This boundary condition, IMHO, is the real mystery: Why did the early universe have such low entropy? I don't think we have a working answer to that question (yet).
Lockless algorithms for mere mortals
Posted Aug 1, 2020 4:11 UTC (Sat) by ras (subscriber, #33059) [Link]
Actually no, you can't. In the definition of entropy the number of microstates is fixed and you are measuring how many of those microstate arrangements look like a given macro state.
That aside, no one is disputing there is a thermodynamic arrow of time. The argument being made is that all transitions between microstates are reversible, lossless in terms of information, and equally likely in both directions. It's possible for that to be true for micro states and yet there be a preferred direction for the evolution of macro states. It's not only possible, it's what happens.
It's not only what happens, there is no mystery as to why it happens. For example, take a go board covered with black and white stones and assume we are near blind. If all black stones are on one side of the board and white on the other we can see that, so black/white is a macro state, but the rest of the possible microstate arrangements just look like grey to us, so grey is another macro state. Black/white corresponds to a few micro states where the black and white stones are poorly mixed. I think there are 361! / 181! possible board arrangements or about 10^431. Most of those states will look just grey - a random mixture of black and white. If the pieces are moving randomly, all of the microstates will be equally likely. This means the odds of seeing black/white instead of grey look to be about 1 in 10^245 if you assume 10% can be out of place. [0] If your go board starts in the black/white macro state, then is allowed to evolve through random, 100% completely reversible, symmetric swaps, you will see it change to grey, and stay that way.
That is the thermodynamic arrow of time. Yet it arose from a completely reversible process. Microstates and macro states are both reasonable ways of looking at the system. When we say time is reversible, we are talking about the former.
[0] Caveat: I am deriving the formulas for these probabilities in my head, as I write this. They are almost certainly wrong.
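The go-board evolution is easy to check numerically. Here is a minimal Python sketch (the board size and the crude "mixedness" measure are my own choices for illustration): start with all black stones on one side, apply completely reversible random swaps, and watch the board relax into the overwhelmingly more numerous "grey" arrangements.

```python
import random

def mixedness(board):
    # Count adjacent unlike pairs: a fully sorted board scores 1,
    # a well-mixed board scores roughly half the number of pairs.
    return sum(1 for a, b in zip(board, board[1:]) if a != b)

def evolve(board, swaps, rng):
    # Each swap is symmetric and 100% reversible, yet the macro state
    # still drifts one way: towards "grey".
    board = list(board)
    for _ in range(swaps):
        i = rng.randrange(len(board))
        j = rng.randrange(len(board))
        board[i], board[j] = board[j], board[i]
    return board

rng = random.Random(42)
sorted_board = [0] * 180 + [1] * 181   # black on one side, white on the other
mixed_board = evolve(sorted_board, 10_000, rng)

print(mixedness(sorted_board))  # 1: the ordered macro state
print(mixedness(mixed_board))   # ~180: the board has gone grey, and stays grey
```

Running the evolution further, or in "reverse", makes no difference: almost every microstate it can wander into looks grey, which is the whole argument.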
Lockless algorithms for mere mortals
Posted Jul 29, 2020 18:27 UTC (Wed) by ianmcc (subscriber, #88379) [Link]
A closed system will eventually return arbitrarily close to its initial state (the Poincaré recurrence theorem), so in that sense the entropy is cyclic: if you start from a very low entropy state then eventually it will return to that low entropy state again. But the time it takes to get there is again exponentially large in the number of particles, so the time required very quickly exceeds any sensible number (even using units of the lifetime of the universe).
Lockless algorithms for mere mortals
Posted Jul 29, 2020 10:18 UTC (Wed) by ibukanov (subscriber, #3942) [Link]
As for ice cubes it is more accurate to state that at the room temperature there are vastly more states of water that resemble liquid than the number of states that resemble liquid and an ice cube. And the reason for that is that while a few individual molecules in water can be ice-cold, one needs for all such molecules to come to one place, and that is extremely-extremely-extremely unlikely.
Lockless algorithms for mere mortals
Posted Jul 29, 2020 12:58 UTC (Wed) by ras (subscriber, #33059) [Link]
As I've grown older, I've become very wary of arguments that involve infinities. All sorts of weird things happen. For example we are finite beings, therefore we can only make finite changes. If we make a finite change to an infinity the relative change is infinitesimal, so it looks like nothing has changed. Therefore if there is an infinity out there we can perceive, we can't change it. So to us an infinity is not something immense or huge, it's a constant, unchangeable by us, in fact unchangeable by anything other than another infinity. If we do happen to see an infinity change an infinity what we will see is a constant changing a constant.
At this point I trust you understand my distrust in my understanding of infinities.
The final nail in the coffin of my confidence in my ability to reason about infinities was the proof of the halting problem. It looked so bloody simple when it was presented in my undergraduate course, the proof almost self-evident once it was explained. Then 40 years later, I read an article proclaiming the halting problem and Gödel's incompleteness theorem are one and the same thing; mostly. It came as a surprise, as I found the incompleteness theorem near inscrutable. Turned out that was because Gödel needed a computer language for his proof. Having none handy, he invented his own just for the job, which was so butt ugly it was unrecognisable - to me anyway. People cleverer than me did spot it and blogged about it. It was a beautiful read. More pertinently, another gem dropped out of the argument that the two were the same - and that was that they both involve an infinity. It's the tamest of infinities - countably infinite, which is a fancy way of saying no upper limit. But you see, once you realise the halting problem is really saying you can't prove something will halt if it could take an arbitrarily long time to do so, the proof loses most of its magic. In fact, once you put aside the objection that "the halting theorem shows it's not possible", it isn't too difficult to discover you can prove whether any finite system will halt or not. You may need a lot of resources to do it, but the amount of resources is finite and bounded. Since I haven't seen too many constants changing constants (none whatsoever, in fact), I'd say I live in a finite world. That means something I believed in for 40 years doesn't actually apply to me, and it's all because of a misunderstanding created by an infinity.
I guess you are saying that if you have an infinite amount of money, on each subsequent bet you stake your losses to date + $1. Eventually you must win a bet, ergo eventually you must win $1. You probably think I am chafing at the infinite amount of money. I am, but there is a hidden second infinity. The only condition under which you are guaranteed to win a bet is if you are prepared to place an infinite number of bets. So now we have two infinities, and for your statement to be true you must always be able to place more bets than the infinite number of bets it takes to guarantee a win.
Clearly I'm no expert on the matter, but I vaguely recall infinities aren't comparable. It's sort of like asking if two constants fight, which one will be the victor.
Lockless algorithms for mere mortals
Posted Jul 29, 2020 13:44 UTC (Wed) by ibukanov (subscriber, #3942) [Link]
Lockless algorithms for mere mortals
Posted Jul 30, 2020 4:31 UTC (Thu) by nash (subscriber, #50334) [Link]
If you have an infinite amount of money, bankrupting a casino is a poor use of your time. And you need an infinite amount of time as well as money.
Wikipedia covers it reasonably well:
https://en.wikipedia.org/wiki/Martingale_(betting_system)
Lockless algorithms for mere mortals
Posted Jul 29, 2020 8:14 UTC (Wed) by joib (subscriber, #8541) [Link]
> That under sells it. It's not just Newtonian physics, it's all macroscopic physics.
An explanation I've read somewhere is that sequential consistency is like Newtonian physics. The model is that you have a switch that connects memory to one processor at a time, and only that one processor can interact with memory, and all effects have propagated when you turn the switch to another processor etc. In effect everything propagates instantaneously.
Relaxed memory models are like relativistic physics, in that each processor has its own view of memory (its own light-cone if you wish), effects propagate at finite speed and different processors may see events in different order.
I don't see the necessity to bring in quantum physics here. Certainly not the pop-sci "ooh, this is weird, thus it must have something to do with quantum mumble-mumble something". And yes, I'm a (former) physicist.
Lockless algorithms for mere mortals
Posted Jul 29, 2020 8:24 UTC (Wed) by Wol (subscriber, #4433) [Link]
Well, actually, energy doesn't (or so it is believed) exist, in the sense that the sum of energy in the Universe is zero. It is a signed quantity, so gravitational potential energy is equal in magnitude but opposite in sign to mass energy. That's how black holes evaporate as the Universe gets colder ...
As for dark energy, this is merely my theory, but I believe it doesn't exist ... Just as we have Boyle's Ideal Gas Law, I think we have mucked up the formula for gravity. I think it's assumed to be F = G.m1.m2/d^2 (I've forgotten my Physics :-). But imho this is probably the IDEAL gravitational equation, and as we move into more open space gravitons coalesce into a "superfluid" and d^2 tends to d. This would explain the fact that gravity appears to get stronger, the emptier space gets. In other words, gravitons behave exactly like gas molecules, as they get colder they move from the gaseous phase towards the solid phase.
Cheers,
Wol
Lockless algorithms for mere mortals
Posted Jul 29, 2020 13:41 UTC (Wed) by Paf (subscriber, #91811) [Link]
Lots of versions of this have been formulated and tested against astronomical data; look up “modified Newtonian dynamics” for what I think is the most serious and current version. The conclusion of most physicists is that while this sort of thing is a compelling idea, it doesn’t match observations (and probably can’t ever). There are still a few modified gravity models out there clinging on with serious debate - not just cranks - over whether they match reality, I think generally under that modified Newtonian dynamics label.
Lockless algorithms for mere mortals
Posted Jul 29, 2020 14:01 UTC (Wed) by ibukanov (subscriber, #3942) [Link]
Lockless algorithms for mere mortals
Posted Jul 29, 2020 22:29 UTC (Wed) by Paf (subscriber, #91811) [Link]
Lockless algorithms for mere mortals
Posted Jul 29, 2020 23:12 UTC (Wed) by ibukanov (subscriber, #3942) [Link]
And again, for extremely simplified toy models where General Relativity is solvable, the results indicate that the Newtonian approximation may not be valid and that General Relativity alone can explain rotation curves for a galaxy without any dark matter. Now those toy models are too naive, but they emphasize that we cannot rule out that dark matter is an artifact of an unsound approximation.
Lockless algorithms for mere mortals
Posted Jul 30, 2020 4:06 UTC (Thu) by alison (subscriber, #63752) [Link]
Lockless algorithms for mere mortals
Posted Jul 29, 2020 22:57 UTC (Wed) by Wol (subscriber, #4433) [Link]
Just like when you add velocities you have to add some relativistic correction as they get higher so that c+c=c, my theory is adding a correction as mass density tends to zero.
And surely it's not too outlandish to assume that gravity - which we now know is transmitted by gravitons - will be seriously affected if gravitons form a Bose-Einstein condensate in intergalactic space?
If dark matter doesn't exist, then we need some way of strengthening gravity as mass-density drops, and assuming gravitons condense and transmit it more efficiently just seems to fit with everything else we know about the Universe. We just need to get the maths right :-)
Cheers,
Wol
Lockless algorithms for mere mortals
Posted Jul 30, 2020 15:12 UTC (Thu) by joib (subscriber, #8541) [Link]
We do?
I mean, sure, quantum gravity is an active research topic, but AFAIK there's been no breakthrough recently. Did I miss something major?
Lockless algorithms for mere mortals
Posted Jul 30, 2020 18:17 UTC (Thu) by Wol (subscriber, #4433) [Link]
So having detected gravitational waves, we know (assuming there isn't a major screw-up in the theory) that gravitons must exist.
Of course, just as with electromagnetics, I think it's the same problem of being able to know where it is and what it is, just not both at the same time :-)
Cheers,
Wol
Lockless algorithms for mere mortals
Posted Jul 30, 2020 20:50 UTC (Thu) by joib (subscriber, #8541) [Link]
Gravitational waves, as e.g. detected by LIGO, are predicted by general relativity. No need for quantum gravity. And without quantisation, no graviton either.
That's not to say quantum gravity isn't a thing. But so far we have neither observations suggesting it nor any convincing theory for it.
Lockless algorithms for mere mortals
Posted Jul 31, 2020 8:47 UTC (Fri) by Wol (subscriber, #4433) [Link]
A photon is an electro-magnetic wave. An electron is a wave (not sure what in :-). Symmetry says surely a gravitational wave has a graviton?
Cheers,
Wol
Lockless algorithms for mere mortals
Posted Jul 31, 2020 11:01 UTC (Fri) by joib (subscriber, #8541) [Link]
Gravity waves neither prove nor disprove that gravity is quantised, just like a plethora of electromagnetic wave phenomena that neither prove nor disprove that the electromagnetic field is quantised.
And to be pedantic, a photon is a quantised excitation in the electromagnetic field (not the field itself). An electron is a quantised excitation in the electron-positron field. Sure, it would be nice and symmetrical if there were, analogously, a "graviton". Unfortunately nature doesn't much care for our notions of beauty and symmetry, and so far has resisted our efforts to crack this particular nut.
Lockless algorithms for mere mortals
Posted Sep 7, 2020 12:38 UTC (Mon) by nix (subscriber, #2304) [Link]
Also, since both move at lightspeed, it makes about as much sense to say that gravitons can "get colder" as it does to say that light can "get colder".
Lockless algorithms for mere mortals
Posted Sep 7, 2020 16:46 UTC (Mon) by Wol (subscriber, #4433) [Link]
Except that's not what I'm doing. Sorry if I've mangled my physics, but Boyle's Ideal Gas Law only applies to a hot gas where the atoms/molecules bounce off each other. As the gas cools, the Gas Law changes.
I'm saying that, just like with velocities where at slow speeds v1+v2 gives us the obvious answer but c+c gives us the extremely unintuitive answer of c, might not the gravitational law ALSO change with mass-energy density, so that gravity is stronger in regions of low mass density like interstellar and intergalactic space? After all, isn't that one possible explanation for what we observe?
Or, given that somebody said that gravitational waves travel at light speed, but light speed is not constant (c is a theoretical maximum), maybe the waves travel faster in low mass density?
Cheers,
Wol
Lockless algorithms for mere mortals
Posted Sep 7, 2020 19:19 UTC (Mon) by nix (subscriber, #2304) [Link]
This is the basis of Mordehai Milgrom's MOND (and a number of other similar things, some of which, unlike MOND, are modifications of relativity rather than Newton). They're not widely accepted because while they get galaxies' rotational velocities right (it would be surprising if they didn't), unfortunately, if you attempt to simulate the structure and evolution of the large-scale universe using MOND or similar theories, the results look very very different from what we observe. The people involved in MOND are major figures (not least Jacob Bekenstein, who nobody could claim doesn't know his stuff where gravity is concerned), but MOND remains... problematic.
(But it has nothing to do with Boyle's law or with slowing of anything: MOND isn't even relativistic, let alone quantized, and doesn't mention gravitons at all. Its adjustments to Newton are ad-hoc to make the rotation curves of galaxies come out right.)
> Or, given that somebody said that gravitational waves travel at light speed, but light speed is not constant (c is a theoretical maximum), maybe the waves travel faster in low mass density?
Massless particles (including photons) travel at c, always: don't think of it as the speed of light, think of it as the speed of propagation of cause and effect, and massless particles max this out. In a medium, the apparent speed of light waves in particular (but no other massless particles) is reduced by coupling to the electromagnetic fields of charged particles in the medium (largely electrons): but this is a group effect on the wave as a whole, and the photons that make up the light are still moving at c!
I suppose the same class of effect in theory could apply to changes in gravity, since gravitational waves should I believe couple to all masses they pass in the same way and slow in regions with higher mass density, but gravity is such a weak force that the effect would be incredibly tiny: it would almost certainly be as unobservable as the graviton itself even if you were hanging out next to a black hole. (So far, nobody has thought up a way to produce a graviton detector which isn't so massive it collapses into a black hole. Gravity is *ridiculously* weak.)
Lockless algorithms for mere mortals
Posted Sep 7, 2020 20:07 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link]
This is actually questionable.
Moreover, it has not been conclusively proven that galactic rotation curves can't be explained by general relativity alone. Exact solutions of Einstein field equations are too simplistic for that, and computer simulation is way too complicated for something like a galaxy. There are people working on this, but it is a very un-glamorous area of research.
Lockless algorithms for mere mortals
Posted Sep 7, 2020 12:27 UTC (Mon) by nix (subscriber, #2304) [Link]
> That's how black holes evaporate as the Universe gets colder ...

For the record, this is completely wrong. Black hole evaporation is a consequence of the differing appearance of the weave of spacetime to observers very close to, versus far from, the event horizon: a local version of the Unruh effect, as it were (or, rather, the Unruh effect is the same thing as Hawking evaporation applied to accelerating observers rather than observers in a gravitational field). It depends only on the mass of the hole and its event horizon radius. It happens even when the universe is hot: it happens even when the hole is still in the middle of its exploding progenitor star. It's just that until the universe is very old and cold (or the hole is exceptionally small, thus with a high Hawking temperature), the hole will gain more mass through absorbing intercepted microwave background radiation than it loses through Hawking radiation. But it's still losing mass through this effect all the time, nonetheless.
The process has nothing to do with the sign of gravitational potential energy at all.
(Yes, the Unruh effect is named after the same Bill Unruh who used to hang out on uk.comp.os.linux.)
Lockless algorithms for mere mortals
Posted Jul 29, 2020 11:11 UTC (Wed) by excors (subscriber, #95769) [Link]
Some people have argued that this will become a practical issue for computers. The standard AND/OR/XOR/etc gates destroy information by converting 2 bits of input to 1 bit of output, but information is entropy and entropy can't decrease in a closed system, so in a physical implementation the decrease in logical information has to be balanced out by emitting heat. That puts an ultimate limit on the power efficiency of a non-reversible computer. To beat that limit, it's necessary - and theoretically possible, though far from straightforward - to design computers with reversible logic gates instead.
(https://spectrum.ieee.org/computing/hardware/the-future-o...)
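The information loss in ordinary gates is easy to see concretely. A minimal Python sketch: AND collapses three distinct inputs onto the same output, so the input cannot be recovered, while the standard reversible Toffoli (CCNOT) gate maps 3 bits to 3 bits, is its own inverse, and still computes AND (in the third output, when the third input is 0).

```python
from itertools import product

def and_gate(a, b):
    # Irreversible: 2 bits in, 1 bit out.
    return a & b

def toffoli(a, b, c):
    # Reversible: flips c iff a and b are both 1; with c = 0 the third
    # output is simply a AND b, but nothing is thrown away.
    return (a, b, c ^ (a & b))

# Three distinct inputs all map to 0, so information is destroyed:
preimages_of_zero = [(a, b) for a, b in product((0, 1), repeat=2)
                     if and_gate(a, b) == 0]
print(len(preimages_of_zero))  # 3

# The Toffoli gate is a bijection on the 8 possible states and self-inverse:
states = list(product((0, 1), repeat=3))
print(sorted(toffoli(*s) for s in states) == sorted(states))  # True
print(all(toffoli(*toffoli(*s)) == s for s in states))        # True
```

Because the Toffoli mapping is a bijection, no physical implementation of it is obliged to emit the Landauer heat that erasing a bit requires.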
Lockless algorithms for mere mortals
Posted Jul 29, 2020 18:27 UTC (Wed) by NYKevin (subscriber, #129325) [Link]
Lockless algorithms for mere mortals
Posted Jul 29, 2020 19:17 UTC (Wed) by excors (subscriber, #95769) [Link]
You could extract the useful data and discard the garbage; but then you're destroying information and creating heat, so it's only helpful if you have an efficient algorithm that generates very little garbage, which is not generally worthwhile. Apparently the trick is to copy the useful output, then run the program in reverse to convert the full output (including garbage) back into the input (including constants), so now you have the input and output and constants and no garbage. If you then want to discard the input, I think the argument is that that's okay because the cost is bounded by the size of the input, not by the amount of computation.
You're probably not going to run Linux on a computer like that; you'll have to rethink everything from the ground up.
Lockless algorithms for mere mortals
Posted Jul 29, 2020 23:47 UTC (Wed) by ras (subscriber, #33059) [Link]
As it turns out this has been explored, at least partially. Theoretically the destruction of information takes energy, and heat has always been a problem in computers. (I don't understand the mechanism. Perhaps because information can't actually be destroyed it's turned into heat, which then has to be carted away.) In any case, one solution that was proposed was to make a memory cell using two capacitors rather than one. One is always charged, the other discharged, and changing value is just a transfer of charge from one to the other. I don't know that it ever made it off the back of a napkin, mostly because the power lost by discharging the DRAM capacitors is the least of the power problems.
The more general solution has already been deployed, not by physicists or hardware engineers, but by us - the software people. It's log-structured storage. From the perspective of the question you posed, the answer we developed is "don't do free()'s". Interestingly, as this conversation is about the connection between time and information, we use a technique that does not destroy information because there are places where we have to roll back time in order to keep our data structures consistent.
Lockless algorithms for mere mortals
Posted Oct 13, 2020 15:28 UTC (Tue) by immibis (subscriber, #105511) [Link]
Therefore you can't have states A and B which both go to C, because a universe in state reverse(C) wouldn't know whether to go to reverse(A) or reverse(B). Information cannot be deleted.
So how do computers delete information, then? Well, the universe is full of states we don't care about, so we just cheat and move the information to one of those. For example, writing 1 releases energy from a different point in the chip than writing 0, which causes a different vibration in the silicon lattice, which transfers to the plastic packaging, which causes an air molecule to bounce off the chip differently.
That means if there are 1 zillion different ways the air molecules would've bounced (if the chip was turned off), now there are 2 zillion, because all the bounces after 0 is written are different from all the bounces after 1 is written. Obviously air can bounce a lot of ways - but the problem is that all of the 1-zillion input states map to 1-zillion output states already. There's no way to get more output states than input states without giving the molecule more energy - which allows it to be in states it couldn't be in with its previous amount of energy. Therefore the chip must have transferred a bit of energy to the air molecule. There is no other way it could work.
All of this applies *on average*, by the way.
Or here's a different explanation: If you consider a computer that's known to be in 1 of 1000 states, and the air in the room that's in 1 of 1 zillion states, there are 1000 zillion states in total (the cartesian product). If the computer is in 1 of 500 states in the next timestep, there still must be 1000 zillion states for the computer+air system, because the universe is reversible. Therefore the air must be in 1 of 2 zillion states. This doesn't apply to reversible computers, because reversible computers *don't* decrease their number of states in any time step.
Lockless algorithms for mere mortals
Posted Jul 31, 2020 17:14 UTC (Fri) by PaulMcKenney (subscriber, #9624) [Link]
Simple propagation delay is more than enough to make all these apparent reorderings happen.
But it is of course often more fun to think in terms of more trendy views of physics. ;-)
Lockless algorithms for mere mortals
Posted Aug 1, 2020 2:09 UTC (Sat) by ras (subscriber, #33059) [Link]
But, Newtonian physics is an emergent property of other sets of rules. In those rules time, information, and causality are fundamental building blocks. Information in particular goes pretty deep - for example, as others have stated here, the definition of thermodynamic entropy is actually not about all possible arrangements of particles, because there is no way to tell if you have swapped two identical particles, so such rearrangements can't be counted. How on earth does the Universe even know I can't distinguish between them - it's not as if it's looking through my eyes. But it does know, because if I don't discount the identical-looking states when I calculate the energy I can extract from a thermodynamic inequality, I get the wrong answer. It goes even deeper in quantum mechanics. The different outcomes when you do and don't know something become positively bizarre - plainly obvious interference patterns disappear just because I know something. [0]
It was partly observations like these, utterly inexplicable by Newtonian physics, that led to the development of quantum mechanics and, at the other end of the scale, relativity. If you are going to start making analogies between programmers' struggles with what can and can't be known and the order things must happen in (aka causality), and what physics has to say on the matter, you are far better off starting with physics theories that feature those things front and centre, rather than one that ignores them completely.
[0] To this day I've yet to find an answer as to why these bizarre quantum-mechanical effects happen, so this new physics is not _that_ satisfying. Instead "all" the physicists have done is build mathematical models of our universe that assume it happens, and those models are insanely accurate. So there is no doubt you must take the relationship between time, information, and causality into account. But the mechanism driving that relationship appears to be a total mystery. At least that's my understanding, but I am just a fascinated bystander.
Lockless algorithms for mere mortals
Posted Aug 2, 2020 15:35 UTC (Sun) by PaulMcKenney (subscriber, #9624) [Link]
Lockless algorithms for mere mortals
Posted Mar 15, 2021 16:01 UTC (Mon) by alison (subscriber, #63752) [Link]
Sadly, Bell's Inequality shows that the combination of quantum mechanics and relativity spells trouble for causality, too. See, for example,
https://phys.org/news/2017-07-probability-quantum-world-l...
"By performing an essentially loophole-free Bell test, they have shown that two atoms separated by a distance of a quarter of a mile share correlations that should be impossible under the hypothesis of local realism, and are most likely explained by quantum entanglement."
"Local realism" here means that even Reality has only "eventual consistency." Senior, respected, widely published physicists believe this and have done so for decades. Quantum entanglement is harder to reason about then memory ordering and consistency. Einstein's famous comment that "God does not play dice" was in response to discussions of this topic.
Lockless algorithms for mere mortals
Posted Jul 29, 2020 15:12 UTC (Wed) by pbonzini (subscriber, #60935) [Link]
Lockless algorithms are definitely something that needs practice to understand, but it's possible to learn it. I would say that 99% of lock-free uses are covered by just five cases:
"message passing", where one side uses smp_wmb() or smp_store_release() and the other uses smp_rmb() or smp_load_acquire(). This is also the basis of RCU.
"cmpxchg for the fast path with fall back to locking", which is abstracted by functions such as refcount_dec_and_mutex_lock
statistics that you simply update or read outside a lock (ironically this is not as easy as it seems, and it is rarely thought through because usually the stats need not be perfectly accurate; but do recognize those cases and just use a lock!)
"Dekker synchronization", also known as "store-B/load-A vs. store-A/load-B", which is a bit more complicated (and explained in Documentation/virt/kvm/vcpu-requests.rst) but quite useful.
Multiple-producer/single-consumer stacks, which are abstracted by linux/llist.h functions
... and I included linked lists only because they are mentioned in the article and they are already nicely abstracted within Linux, but usually you do not even need to use the full llist API (for example llist_del_first is rarely needed). The first four are those you want to learn inside-out; once you have done that, you can say you already have a good understanding of *practical* lockless algorithms.
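The first pattern is worth seeing in miniature. The sketch below is Python rather than kernel C, and a threading.Event stands in for the flag: event.set() plays the role of the release-side store (smp_store_release()) and event.wait() the acquire-side load (smp_load_acquire()). The point is the same: write the payload first, publish the flag second, and a consumer that observes the flag is guaranteed to observe the payload.

```python
import threading

payload = {}                 # the "message"
ready = threading.Event()    # the flag that publishes it
result = []

def producer():
    payload["value"] = 42    # plain store of the data first...
    ready.set()              # ...then the release-style store of the flag

def consumer():
    ready.wait()             # acquire-style: returns only once the flag is set
    result.append(payload["value"])   # guaranteed to see the payload

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t2.start()
t1.start()
t1.join()
t2.join()
print(result)  # [42]
```

The order of the two stores in producer() is the entire trick; swap them and the consumer can observe the flag without the payload.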
An important thing to notice is that "acquire" and "release" are not black magic and the same concept of acquire/release semantics can be applied to higher-level synchronization primitives as well. For example, here are two cases of lockless access that people will do without even thinking of them as lockless:
"pthread_create" has release semantics and synchronizes with the start of the new thread; this means that anything written before a thread is created can be accessed from the thread even without a lock
"pthread_join" has acquire semantics and synchronizes with the exiting of the thread; this means that anything written by the thread before it exits can be accessed after pthread_join even without a lock
It's quite an eye-opener to realize that these cases work in exactly the same way as those "sophisticated" atomic access primitives.
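A minimal sketch of that pthread_create()/pthread_join() case (variable and function names are hypothetical): no lock and no atomic is involved, yet both accesses are perfectly ordered.

```c
/* pthread_create() has release semantics: data written before it is
 * visible inside the thread. pthread_join() has acquire semantics:
 * data written by the thread is visible after it returns. */
#include <pthread.h>

static int in, out;   /* plain, non-atomic variables */

static void *join_worker(void *arg)
{
	(void)arg;
	out = in + 1;   /* guaranteed to see in == 10 */
	return NULL;
}

int run_join_demo(void)
{
	pthread_t t;
	in = 10;                      /* written before pthread_create() */
	pthread_create(&t, NULL, join_worker, NULL);
	pthread_join(t, NULL);        /* acquire: out is now safely readable */
	return out;
}
```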
Another thing to know is to steer clear of the C11 "sequentially consistent" loads and stores, they are incredibly easy to get wrong despite the reassuring name. In this respect the LKMM is easier to use, but I have learned the hard way to confine myself to the above five patterns.
Lockless algorithms for mere mortals
Posted Jul 29, 2020 21:45 UTC (Wed) by itsmycpu (subscriber, #139639) [Link]
Tools like "thread sanitizer" are a considerable help in finding thread-safety 'leaks', although sometimes it takes a while, and I wouldn't rely on it to find everything.
In my experience, using the C/C++ "relaxed" ordering is rarely worth the extra headache; a big time saver is to just (almost) always use acquire/release (and acqrel for RMWs).
Lockless algorithms for mere mortals
Posted Jul 29, 2020 21:58 UTC (Wed) by pbonzini (subscriber, #60935) [Link]
Lockless algorithms for mere mortals
Posted Jul 30, 2020 10:07 UTC (Thu) by farnz (subscriber, #17727) [Link]
It also depends which CPUs you're working with; on x86 and zArch, acquire and release are the semantics of all loads and stores, and only acqrel and seqcst require anything special. Thus, on x86, going weaker than acquire and release doesn't gain you much, if any, performance, as the only effect is to allow a little more compiler optimisation than would otherwise be possible.
In contrast, on ARM or POWER, only relaxed uses "normal" load and store operations; acquire and release operations need instructions to tell the CPU to put the barrier in place. You can thus gain a significant increase in performance by carefully using relaxed instead of acquire/release, because not only do you enable a tiny amount of compiler optimization, but you also tell the CPU that it can continue with normal memory ordering.
Lockless algorithms for mere mortals
Posted Jul 30, 2020 13:50 UTC (Thu) by pbonzini (subscriber, #60935) [Link]
There's actually an "even weaker than relaxed" mode for RMW operations, for example
atomic_add_relaxed(&p, 3);
will actually compile to a locked-add on x86. But if you know you are the only writer (and for several of the idioms I listed above, you are) you can write this in a much more efficient way:
WRITE_ONCE(p, p + 3);
(You don't even need READ_ONCE, because no data race is possible.) And this is something I highly recommend because, in addition to being much more efficient, it documents the fact that you're the only writer and therefore (unlike relaxed atomics) it lowers the cognitive load on whoever reads the code.
Relaxed atomics can still be useful for statistics, though if you can afford the space and the reading cost of percpu values you probably should use them instead.
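A sketch of that statistics case with C11 relaxed atomics (all names invented): the increments need atomicity but no ordering among themselves, and the ordering needed to read the final total is supplied by pthread_join().

```c
/* Relaxed atomic statistics: many threads bump a shared counter with
 * memory_order_relaxed; pthread_join() provides the happens-before
 * edge that makes the final read safe. */
#include <pthread.h>
#include <stdatomic.h>

#define NTHREADS 4
#define NITERS   10000

static _Atomic long hits;

static void *hit_counter(void *arg)
{
	(void)arg;
	for (int i = 0; i < NITERS; i++)
		/* relaxed: atomicity without any ordering constraint */
		atomic_fetch_add_explicit(&hits, 1, memory_order_relaxed);
	return NULL;
}

long run_stats(void)
{
	pthread_t t[NTHREADS];
	for (int i = 0; i < NTHREADS; i++)
		pthread_create(&t[i], NULL, hit_counter, NULL);
	for (int i = 0; i < NTHREADS; i++)
		pthread_join(t[i], NULL);   /* orders the read below */
	return atomic_load_explicit(&hits, memory_order_relaxed);
}
```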
Lockless algorithms for mere mortals
Posted Jul 30, 2020 14:36 UTC (Thu) by itsmycpu (subscriber, #139639) [Link]
I guess I am saying that when I use atomic variables, it's a situation where I don't want "normal memory ordering", which usually means "unknown memory ordering" from the point of view of another thread. When I started with atomic variables, I saw "relaxed" as the optimum; however, I repeatedly found myself replacing code that used "relaxed" because it was more complicated than necessary. Somehow the simpler solutions tended not to use "relaxed" in the first place.
So I would hope that the performance characteristics of a platform support these simpler solutions.
By the way, is there a way to mark a sequence of two instructions such that preemption/interrupts will postpone a few cycles until the second one is executed as well? :)
Lockless algorithms for mere mortals
Posted Jul 30, 2020 14:43 UTC (Thu) by itsmycpu (subscriber, #139639) [Link]
Lockless algorithms for mere mortals
Posted Jul 30, 2020 15:39 UTC (Thu) by excors (subscriber, #95769) [Link]
The ordering constraints could happen at a higher level than the atomic accesses. E.g. you have a lot of threads atomically incrementing lots of counters lots of times, and then you pthread_join() to wait for them all to finish before printing the values of the counters. The pthread_join() ensures correct ordering between the increments and reads. But it doesn't matter what order the increments happen in, so there's no need to prevent the compiler or CPU from reordering them relative to each other (or coalescing them into fewer bigger additions, etc), and you can save the overhead of acquire/release semantics on every atomic instruction.
> By the way, is there a way to mark a sequence of two instructions such that preemption/interrupts will postpone a few cycles until the second one is executed as well? :)
I think all CPUs let you mask interrupts, to run arbitrary sequences atomically on single-core chips, but it won't be atomic on multi-core. Some (like ARM) instead let you execute a sequence of instructions, and then check afterwards whether it managed to run atomically or not (e.g. if there was an interrupt, or another core stole the cache line you were using) so you can retry until it succeeds.
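The portable analogue of that run-then-check-then-retry sequence is a compare-and-swap loop; here is a sketch in C11 (the saturating-increment operation is just an arbitrary example of a read-modify-write that no single instruction provides):

```c
/* Portable analogue of ARM's load-linked/store-conditional: do the
 * computation, then check whether anyone interfered, and retry if so. */
#include <stdatomic.h>

long saturating_inc(_Atomic long *v, long max)
{
	long old = atomic_load_explicit(v, memory_order_relaxed);
	long new;

	do {
		/* arbitrary read-modify-write computed on the snapshot */
		new = old < max ? old + 1 : max;
		/* on failure, old is reloaded with the current value */
	} while (!atomic_compare_exchange_weak_explicit(v, &old, new,
			memory_order_acq_rel, memory_order_relaxed));
	return new;
}
```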
Lockless algorithms for mere mortals
Posted Jul 30, 2020 16:27 UTC (Thu) by itsmycpu (subscriber, #139639) [Link]
Let's take that example. You don't need acquire/release; however, you have a lot of RMWs locking and contending for the same shared cache lines or memory locations.
Define the counters in a struct, and let each thread accumulate its counts in a thread-local instance of that struct. Now it is a simple ADD instruction on local memory. At some point in time, at the end of each thread or perhaps every 5 minutes, they add their values to a global shared struct (using acqrel if they like, since it doesn't matter performance-wise, so coordination at a higher level isn't necessary.)
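A rough userspace sketch of this per-thread accumulation scheme, using pthreads and C11 atomics (all names hypothetical): each thread counts in private memory with plain adds and performs only one contended RMW at the end.

```c
/* Per-thread accumulation: plain ADDs on a private struct, then a
 * single flush into the shared global total per thread. */
#include <pthread.h>
#include <stdatomic.h>

struct stats { long hits; };      /* per-thread counters */

static _Atomic long global_hits;  /* shared, rarely touched */

static void *flush_worker(void *arg)
{
	struct stats local = { 0 };
	(void)arg;

	for (int i = 0; i < 100000; i++)
		local.hits++;         /* plain ADD on thread-local memory */

	/* one acqrel RMW per thread instead of one per increment */
	atomic_fetch_add_explicit(&global_hits, local.hits,
				  memory_order_acq_rel);
	return NULL;
}

long run_percpu_style(void)
{
	pthread_t t[4];

	for (int i = 0; i < 4; i++)
		pthread_create(&t[i], NULL, flush_worker, NULL);
	for (int i = 0; i < 4; i++)
		pthread_join(t[i], NULL);
	return atomic_load(&global_hits);
}
```

This is essentially what the kernel's percpu counters do, with the flush threshold playing the role of the "every 1,024" granularity mentioned below in the thread.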
> I think all CPUs let you mask interrupts, to run arbitrary sequences atomically on single-core chips, but it won't be atomic on multi-core.
Yes, it would be nice if a userspace process could do this just for the duration of two instructions, since some algorithms would require other threads to wait if they get interrupted exactly between two instructions. It doesn't need to be atomic. For example lockless FIFO lists could then be built and used without need for list reversal. Another thread would just have to re-read once or twice if the "next" pointer contains a "please-wait" value (like (void*)1).
Currently the above mentioned lockless linux/llist.h appears to require list reversal if it is to be used in a FIFO fashion.
Lockless algorithms for mere mortals
Posted Jul 30, 2020 19:48 UTC (Thu) by pbonzini (subscriber, #60935) [Link]
It still needs to be an atomic access, otherwise you have a data race with the readers. In that case you can use relaxed loads and stores (while on x86 atomic_add_relaxed would not buy you anything in terms of performance).
Relaxed loads and stores are also useful when using seqlocks (which probably should be in my list too!), and when using smp_mb() as is the case for Dekker synchronization.
I do tend to avoid relaxed RMW operations.
Lockless algorithms for mere mortals
Posted Jul 31, 2020 1:43 UTC (Fri) by itsmycpu (subscriber, #139639) [Link]
You could have readers of thread-local data if you want to, in which case you are correct, that would require atomic stores. However the example as given does not require this, as it was said that collecting the values at the end of the thread is sufficient. So the thread itself can write the accumulated values to the shared global memory location. Or, for example, the thread writes to shared global memory each time its local counter reaches a multiple of 1,024. So the global counters are always updated with that granularity. And at the end of each thread (or some other, perhaps regular event), each thread adds the remainder.
So then, plain local non-atomic operations are sufficient most of the time, and global write access becomes so rare that its method is not performance relevant anymore, and can be acqrel RMW for a more defined behavior.
Nevertheless I agree that there are cases where "relaxed" is useful. I was also thinking of seqLocks, however I am not using them. I often use multiple copies of data (in various ways simpler than a RCU system). Like double-buffered framebuffers in graphics. One thread can locally work with one set of data, while the other thread works on another copy.
I am not familiar with the advantages of using smp_mb() in an algorithm (or Dekker synchronization). I would guess it is for cases where I would use an acqrel RMW. In the case of Linux kernel facilities, it is for me difficult to tell which ones are historically grown, and which ones might offer advantages compared to non-barrier C/C++ atomics.
Lockless algorithms for mere mortals
Posted Jul 31, 2020 18:01 UTC (Fri) by PaulMcKenney (subscriber, #9624) [Link]
We recently had to defend memory_order_relaxed in the committee. Here is Hans Boehm's and my A Relaxed Guide to memory_order_relaxed [PDF], which was the basis of that defense.
The “Dekker synchronization” refers to the core of this locking algorithm, which, as Paolo said, is set up so that when two threads each do a store to one variable followed by a load from the other, at least one of the threads sees the other's store. This is used heavily in the Linux kernel, for example, to correctly resolve wait/wakeup races.
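A minimal store-buffering sketch in C11 (names invented), using sequentially consistent accesses so that the outcome where both loads see zero is forbidden:

```c
/* Dekker/store-buffering: each thread stores to its own flag and then
 * loads the other's. With seq_cst accesses, r0 == 0 && r1 == 0 is
 * impossible: at least one thread must see the other's store. */
#include <pthread.h>
#include <stdatomic.h>

static _Atomic int a, b;
static int r0, r1;

static void *dekker_t0(void *arg)
{
	(void)arg;
	atomic_store(&a, 1);      /* store-A */
	r0 = atomic_load(&b);     /* load-B */
	return NULL;
}

static void *dekker_t1(void *arg)
{
	(void)arg;
	atomic_store(&b, 1);      /* store-B */
	r1 = atomic_load(&a);     /* load-A */
	return NULL;
}

int run_dekker(void)
{
	pthread_t x, y;

	atomic_store(&a, 0);
	atomic_store(&b, 0);
	pthread_create(&x, NULL, dekker_t0, NULL);
	pthread_create(&y, NULL, dekker_t1, NULL);
	pthread_join(x, NULL);
	pthread_join(y, NULL);
	return r0 + r1;   /* always >= 1 under seq_cst */
}
```

In the kernel the same shape is written with smp_mb() between each store and load; with anything weaker than that (or than seq_cst here), both loads seeing zero becomes a legal outcome.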
Lockless algorithms for mere mortals
Posted Jul 31, 2020 22:09 UTC (Fri) by itsmycpu (subscriber, #139639) [Link]
My first impression regarding the described use cases of memory_order_relaxed is that it is worse than I thought. Not because I would doubt that there are algorithms that make good use of it (like seqLock), but because it seems really difficult to navigate the potential problems in most cases.
For example the use in reference counting increments. Even if decrements are non-relaxed, it isn't immediately obvious to me that increments can be relaxed. Although I am well aware that in the handover of a reference from thread A to thread B there needs to be a moment where an original reference is still in place, and that the atomic value itself is sequential for RMW operations like increment and decrement (which are already expensive enough), I would not be sure that thread B necessarily needs to execute the necessary barriers after incrementing the reference relaxed, and before starting to load values from the referenced object. This would seem to mean that the compiler (or the hardware) is free to move the effective load time to before the reference increment becomes effective. So the logic that prevents this is probably very tricky (at least from my point of view, unless I'm simply missing some principle that provides for verification to become much easier). My guess would be that the necessary barriers might be involved in assuring thread A that it can release its reference. But I'm not aware of a theoretical construct that would guarantee this expectation to always apply, in general, for unknown use cases.
Lockless algorithms for mere mortals
Posted Jul 31, 2020 22:47 UTC (Fri) by PaulMcKenney (subscriber, #9624) [Link]
On the reference count increment, the trick is that although the increment can be relaxed, the act of passing it somewhere else must be properly ordered. This proper ordering is often supplied by something else, though, such as thread-creation primitives. Whether this is worthwhile can depend on the CPU.
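A sketch of that refcounting discipline in C11 atomics: relaxed increment, release decrement, and an acquire fence before teardown. This mirrors what refcount implementations commonly do, but the names here are made up.

```c
/* Refcount sketch: the increment can be relaxed because the handoff of
 * the pointer to this thread must itself be properly ordered; the
 * decrement releases, and the last decrementer acquires before free. */
#include <stdatomic.h>
#include <stdbool.h>

struct obj { _Atomic long refs; };

void obj_get(struct obj *o)
{
	/* relaxed is enough: whoever gave us the pointer already
	 * ordered the publication of the object */
	atomic_fetch_add_explicit(&o->refs, 1, memory_order_relaxed);
}

/* Returns true if the caller holds the last reference and must free. */
bool obj_put(struct obj *o)
{
	if (atomic_fetch_sub_explicit(&o->refs, 1,
				      memory_order_release) == 1) {
		/* pair with the release decrements of all other users so
		 * their writes to the object are visible before teardown */
		atomic_thread_fence(memory_order_acquire);
		return true;
	}
	return false;
}
```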
Lockless algorithms for mere mortals
Posted Aug 1, 2020 4:24 UTC (Sat) by itsmycpu (subscriber, #139639) [Link]
For example if thread A already increases the count before passing the pointer, in advance of decreasing its own reference, of course relaxed is sufficient. If thread B is the one to increase the count, I'd think there needs to be a direct or indirect synchronization point between B's increment and A's decrement. Otherwise each increment is atomic, but one can't be sure which one happens first. However that synchronization point doesn't have to be the counter itself. If it isn't, then the question arises if decrementing can't be relaxed as well. :) Especially if the count reaching zero can be taken as an indication that all read/write access has already been given up.
So yes, I think that's a valid example for relaxed. Had to think about it, though. :-)
Here http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n...
you wrote that a compiler is allowed to replace
while (tmp = atomic_load_explicit(a, memory_order_relaxed))
        do_something_with(tmp);
with
while (tmp = atomic_load_explicit(a, memory_order_relaxed)) {
        do_something_with(tmp);
        do_something_with(tmp);
        do_something_with(tmp);
        do_something_with(tmp);
}
If that is still the case according to current C/C++ definitions, why don't you argue that the definition of memory_order_relaxed be refined to READ_ONCE and WRITE_ONCE? Or are you already?
Lockless algorithms for mere mortals
Posted Aug 1, 2020 12:47 UTC (Sat) by itsmycpu (subscriber, #139639) [Link]
In 2.3, it describes an atomic_thread_fence with acquire:
> The atomic_thread_fence() function can be used to order multiple sets of accesses, for example, by replacing a series of acquire loads with relaxed loads followed by an atomic_thread_fence(memory_order_acquire)
In its brevity, this sentence suggests the sequence:
1: A series of relaxed loads
2: An acquire fence
However reading https://en.cppreference.com/w/cpp/atomic/atomic_thread_fence suggests:
1: Reading a single synchronization variable (probably atomic relaxed, as in the example)
2: An acquire fence
3: A series of relaxed or non-atomic loads.
If you don't mind me pointing it out; it's probably more like a typo.
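The three-step sequence above can be sketched with C11 atomics (names invented for illustration): the acquire fence upgrades the earlier relaxed load, ordering it before the plain loads that follow.

```c
/* Fence-based acquire: relaxed load of the synchronization variable,
 * then atomic_thread_fence(memory_order_acquire), then plain loads. */
#include <pthread.h>
#include <stdatomic.h>

static int data1, data2;    /* plain, non-atomic payload */
static _Atomic int flag;

static void *publisher(void *arg)
{
	(void)arg;
	data1 = 1;
	data2 = 2;
	atomic_store_explicit(&flag, 1, memory_order_release);
	return NULL;
}

int run_fence_demo(void)
{
	pthread_t t;

	pthread_create(&t, NULL, publisher, NULL);
	/* 1: relaxed load of the synchronization variable */
	while (!atomic_load_explicit(&flag, memory_order_relaxed))
		;
	/* 2: acquire fence, pairing with the release store */
	atomic_thread_fence(memory_order_acquire);
	/* 3: plain loads, now ordered after the flag was seen */
	int sum = data1 + data2;
	pthread_join(t, NULL);
	return sum;
}
```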
Lockless algorithms for mere mortals
Posted Aug 2, 2020 16:43 UTC (Sun) by PaulMcKenney (subscriber, #9624) [Link]
I have therefore expanded this section to make step 3 explicit. Thank you for pointing this out.
There will be a P2055R1 at some point, but in the meantime you can access the LaTeX at https://github.com/paulmckrcu/WG21-relaxedguide.git.
Lockless algorithms for mere mortals
Posted Aug 1, 2020 14:32 UTC (Sat) by itsmycpu (subscriber, #139639) [Link]
If the reference counter is incremented relaxed, (for common use cases) it implies that read/write access is synchronized separately. So I'd expect that decrementing can be relaxed as well if the thread that encounters a zero reference count still has read/write access (otherwise it should be enough to use a simple load-acquire on the synchronization variable).
However the (separate) synchronization of read/write access, which can't be relaxed, probably makes the gain look small in comparison, even on those platforms where it is larger. Which makes me wonder how large it is.
Lockless algorithms for mere mortals
Posted Aug 2, 2020 16:45 UTC (Sun) by PaulMcKenney (subscriber, #9624) [Link]
Of course, the results will likely vary not only across CPU families, but also within CPU families.
Lockless algorithms for mere mortals
Posted Aug 2, 2020 17:05 UTC (Sun) by itsmycpu (subscriber, #139639) [Link]
Lockless algorithms for mere mortals
Posted Aug 2, 2020 17:27 UTC (Sun) by PaulMcKenney (subscriber, #9624) [Link]
There is a lot of existing code to do such measurements, but on the other hand creating your own can be quite instructive.
Lockless algorithms for mere mortals
Posted Aug 2, 2020 16:56 UTC (Sun) by PaulMcKenney (subscriber, #9624) [Link]
It turns out that in C, all the atomic operations are already volatile, including atomic_load_explicit(), which is then pretty close to READ_ONCE().
In contrast, in C++, atomic_load_explicit() can take either a volatile or a non-volatile pointer, which means that use of C++ atomic_load_explicit() on non-volatile pointers (the common case) will be subject to the transformation called out in N4444. This working paper is for C++ rather than for C, in case you were wondering why it leaves out the C-language case.
And yes, JF and I are already calling for something like READ_ONCE() and WRITE_ONCE() in C++: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p...
Lockless algorithms for mere mortals
Posted Jul 31, 2020 17:55 UTC (Fri) by PaulMcKenney (subscriber, #9624) [Link]
Having per-subsystem documentation of the memory-ordering use cases of particular importance to that subsystem makes a lot of sense to me! Many of the five you call out are heavily used, but even given common documentation of a given use case, many subsystems might still want to cover special cases or how that use case interacts with the rest of the subsystem.
Naming is fun. Yet another common name for Dekker synchronization is store buffering. We probably need to list the common names for each use case in the LKMM documentation.
Good point on pthread_create() and pthread_join(). The same applies to things like mod_timer() and the resulting handler function.
Your point about C11 memory_order_seq_cst does me good, given how difficult it was for me and a few others to get the committee to accept that the standard should include anything other than sequential consistency. ;-)
Lockless algorithms for mere mortals
Posted Jul 31, 2020 21:11 UTC (Fri) by itsmycpu (subscriber, #139639) [Link]
Maybe it's a thing for college beginner class. ;)
Lockless algorithms for mere mortals
Posted Jul 31, 2020 21:27 UTC (Fri) by PaulMcKenney (subscriber, #9624) [Link]
Lockless algorithms for mere mortals
Posted Aug 1, 2020 9:25 UTC (Sat) by pbonzini (subscriber, #60935) [Link]
Thank you very much!
Lockless algorithms for mere mortals
Posted Aug 2, 2020 15:39 UTC (Sun) by PaulMcKenney (subscriber, #9624) [Link]