Why Object-Oriented Languages Need Tail Calls

The Fortress blog has a recent post, Why Object-Oriented Languages Need Tail Calls, where Guy Steele argues for the necessity of proper tail call implementations without rehashing two of the classic arguments: state machines and the continuation passing style. It starts by mentioning William Cook's On Understanding Data Abstraction, Revisited:

In this blog post we extend one of his examples in order to make a completely different point: object-oriented programming languages need tail calls correctly implemented, not just as a "trivial and optional" optimization of the use of stack space that could be achieved by using iteration statements, but in order to preserve object-oriented abstractions.

The post also mentions other papers previously discussed on LtU: Automata as Macros, and A Tail-Recursive Machine with Stack Inspection.

So apparently they don't

So apparently they don't think TCO is a red herring...

haha

apparently not. ;-)

Similar

Matthias Felleisen gave a presentation in 2004 titled Functional Objects in which he argued something similar. Slide 59: "Object-oriented programming in languages that don't require tail-call optimizations makes no sense". He then quotes an email correspondence with Guy Steele and concludes that Scheme was led to require TCO by studying OO examples.

What does Tail Call Optimization mean to you?

I'm a bit curious as to how different minds understand 'Tail Call Optimization' (TCO).

Consider a few cases:

  • In what I'll call 'traditional TCO', activation records are allocated on a stack, and under certain conditions (tail calls) the current activation record is reused. Thus the stack doesn't actually grow (except perhaps when the new activation record is larger than the prior one). (See the sketch just after this list.)
  • As an alternative to the stack the activation records are created on the heap. For performance, this is usually in a clean nursery-region of the heap where one can increment a pointer to perform an allocation. One typically compiles into continuation-passing style, and each continuation therefore serves as a thread. A copying garbage collector ensures there is always clean space for a new nursery. For lack of a better name, I'll call this the 'generic no-stack' design.
  • From the generic no-stack design, one might express a 'generic TCO' which simply reuses the same activation record - no matter where it is allocated (i.e. on the stack or the heap). However, with CPS this isn't very useful (except to avoid a single allocation), since the continuation is no longer implicit in the stack.
  • In Chicken Scheme, the 'no-stack' approach is adapted to work with C as a target language. Data is allocated automatically on the C stack, which grows with each function call... but at a certain size (default 64 kB, IIRC, checked via a pointer comparison) a small copying GC moves the live items from the C stack to the heap, then a longjmp resets the stack with the new continuation. The C stack thus serves as a nursery, and plain C function calls implement the various compiled Scheme functions.
  • Another variation on the no-stack design is Stallman's 'Phantom Stacks'. Unlike the generic (CPS) variation, one will occasionally return to an older activation record before continuing the computation; thus old activation records represent both history and futures, as with a traditional 'stack'. If a dangerous reference is created (a pointer from heap to stack, or a pointer from earlier in the stack to later in the stack, as in the upward-funarg problem), a copying collection is performed immediately and a new stack is created. ... IIRC, Baker also produced something similar, using particular arrangements of pointers - critically, with the stack growing away from the heap - to reduce the detection of dangerous references to a simple comparison.
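
To make the first case concrete, here's a rough Java sketch (names invented) of what 'traditional TCO' buys: reusing the activation record makes the tail-recursive version behave just like the hand-written loop below it.

// Tail-recursive sum of 1..n. Each recursive call is a tail call:
// nothing remains for the caller to do after the callee returns.
static long sum(long n, long acc) {
    if (n == 0) return acc;
    return sum(n - 1, acc + n);
}

// Reusing the activation record amounts to overwriting the arguments
// in place and jumping back to the top - the stack never grows.
static long sumReused(long n, long acc) {
    while (n != 0) {
        acc = acc + n;
        n = n - 1;
    }
    return acc;
}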

There are probably many more alternatives sitting in the collective brains of the LtU community.

I'm a rather pedantic guy, so I don't consider any of these except 'traditional TCO' - and perhaps I'd grant the 'generic TCO' variation - to actually be TCO.

That is, I see TCO as a particular mechanism to support a useful performance property: partial decoupling of the space-performance from the time-performance of an activity.

I'm somewhat curious what other people would call it. In my view, TCO isn't necessary... because there are other approaches to achieving the same useful property. But I suspect others might say, 'meh, don't be so pedantic - you're getting TCO by another name...'.

But there are some other differences in the practical implementation properties. The no-stack approach with CPS can in some cases collect more space than does TCO on a stack, especially when working with conditional branches. Avoiding use of a stack also keeps the overhead for new continuations very small - i.e. a single activation record, added to the scheduler's queue - which is excellent for fine-grained task-parallelism.

Further, with the generic no-stack approach, a true history can be kept as a simple list of earlier activation records... i.e. the records are not serving multiple duties or being reused, and one can even remove mutable state from them.

Use of 'generic TCO' might save a bit on allocations and collections, and still might serve as a useful optimization... but is not nearly so useful as 'traditional TCO', and may be a bit difficult to reconcile with other advanced features (such as generational garbage collection, debugger histories, or persistent memory).

Good question

In fact, I think this is a great question. I would actually prefer a different definition, which I will call "semantic TCO":

A language has TCO if all storage allocated in the course of a function call can be reclaimed as soon as it becomes unreachable according to the language semantics. (A language fails to have TCO exactly when there is some such storage that cannot be reclaimed even though there is no way to reach it in any subsequent execution trace.) The mechanism by which this is achieved is irrelevant.

Note that this definition is fundamentally connected with garbage collection. This doesn't rule out its application to C-like languages, because I believe that C's stack discipline is indeed an extremely limited form of GC.

Languages like Java are interesting under this definition, because they do allow for various ways of accessing stack frames (security checks, reifying stack traces during exception handling, etc.), even though, according to the language semantics per se, everything in those stack frames is garbage. So are these stack frames garbage according to the language semantics or not? As far as I know, this question is basically the big sticking point for TCO on the JVM.
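
A throwaway Java example of that sticking point (method names are mine): every call below is a tail call, so each frame has no work left except to pass the callee's result along. Under the definition above the chain should run in constant space; the JVM retains every frame, so a large enough n throws StackOverflowError anyway.

static boolean isEven(int n) { return n == 0 ? true  : isOdd(n - 1); }
static boolean isOdd(int n)  { return n == 0 ? false : isEven(n - 1); }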

Semantic TCO FTW

"semantic TCO" ... The mechanism by which this is achieved is irrelevant.

I'm 100% with Matt on this one.

I've long argued that TCO is a legitimate language definition issue and not merely an implementational one.

Quite. But my feeling is

Quite. But my feeling is that this (semantic perspective) is clear to most of those arguing for TCO, even if not for the opposition.

Yet more pluralism

this (semantic perspective) is clear to most of those arguing for TCO, even if not for the opposition

I'm not sure that I interpret Tim Bray's comment as opposition to TCO as much as acceptance that its absence is a practical limitation that can be worked around.

He seems to be arguing (even in his comment on Guy's post) that, given the limitations of the JVM, Clojure (and a similar argument can be made for Scala) has made do pretty well without the world ending as some might predict.

Steele, Felleisen and Cook are presenting a "hard-line" definition of OO that isn't adhered to by most who practice it, and it feels like a bit of a straw man to me, even if I may agree with them that the PL world might be a better place if their perspective were adopted as canonical.

We all have to do the best that we can in the world we actually live in, and whereas I would consider rejecting the superiority of semantic TCO outright to be a sin, being prepared to work around not having it the best you can is not. ;-)

I can accept this. I don't

I can accept this. I don't see what you wrote as contradicting the quote you started your post with, though.

LtU's Most Wanted

I don't see what you wrote as contradicting the quote you started your post with, though.

It depends on who you meant as the "opposition" in that quote. ;-)

That's what I figured...

That's what I figured...

(more here)

(more here)

Semantic TCO

Well, that's a position I can understand and find rather agreeable, Matt.

That said, I don't believe 'semantic TCO' has ever been implemented, at least not as you define it. In the general case, we cannot know exactly when storage will become unreachable according to language semantics... all sorts of neat little mathematical theorems (Rice's theorem, Halting problem, etc.) tell us we cannot know.

And that makes semantic TCO 'idealistic' - which is probably why I find it agreeable. ;-)

Re: don't believe 'semantic TCO' has ever been implemented

Shao and Appel came close in Efficient and Safe-for-Space Closure Conversion with their definition of "safe for space complexity" and related heuristics for the SML/NJ compiler.

Clinger also describes safe-for-space in Proper Tail Recursion and Space Efficiency

Clinger

Thanks for the references. I'll note that Clinger's article makes a distinction between 'Proper Tail Recursion' and 'Tail Call Optimization'. From the abstract:

Scheme requires implementations to be properly tail recursive. This ensures that portable code can rely upon the space efficiency of continuation-passing style and other idioms. On its face, proper tail recursion concerns the efficiency of procedure calls that occur within a tail context. When examined closely, proper tail recursion also depends upon the fact that garbage collection can be asymptotically more space-efficient than Algol-like stack allocation. Proper tail recursion is not the same as ad hoc tail call optimization in stack-based languages. Proper tail recursion often precludes stack allocation of variables, but yields a well-defined asymptotic space complexity that can be relied upon by portable programs.

I think it quite legit to distinguish a guarantee on the space-efficiency of certain expressions from a particular peephole optimization. But, in this light, my initial question was a bit misleading. I asked about TCO, whereas the OP is almost certainly aimed at "[Proper] Tail Calls".

I'm a bit confused

I'm not sure I agree with this. Language semantics typically make it clear precisely which storage is reachable at any given time. Of course not all of that storage will be accessed in all possible executions, and of course we can't know which is which. But at any time, the potentially reachable storage is well-defined. (This is why the generic no-stack design works, isn't it?)

(Obviously this does not apply if the program does funny stuff like casting ints to pointers, but if the semantics allows such tricks, then all storage is always trivially reachable and no GC is ever safe.)

But I could be missing something, or misunderstanding you. And anyway this just puts it in the same boat as the rest of GC, which certainly works well enough for most purposes.

Not so obvious...

Language semantics typically make it clear precisely which storage is reachable at any given time.

Not quite. Language semantics are generally small-step, which considerably obfuscates exactly when storage cannot possibly be accessed.

This should be something you know trivially. Indeed, we can reduce a question of whether storage "cannot be reclaimed even though there is no way to reach it in any subsequent execution trace" to the Halting Problem. Quite easily, too:

{
 X = new Storage();          // allocated in the course of this call
 NobodyKnowsWhetherIHalt();  // may or may not ever return
 DoSomethingWith(X);         // X is needed only if control reaches here
}

The storage above (allocated over the course of the function call) includes both X and the stack-frame containing space for a reference to X. In general, neither can be collected, even if they are unreachable.

The generic no-stack approach isn't nearly as ambitious as your definition of semantic TCO requires.

Interestingly, your definition of semantic TCO doesn't mention anything about 'tail calls' and whatnot. I'll display your words again, with emphasis by an admitted pedant:

A language has [semantic] TCO if all storage allocated in the course of a function call can be reclaimed as soon as it becomes unreachable according to the language semantics. (A language fails to have TCO exactly when there is some such storage that cannot be reclaimed even though there is no way to reach it in any subsequent execution trace.)

This definition, in a quick summary, is: "Semantic TCO is perfect GC that also solves the halting problem." Can't get much more idealistic than that. ;-)

"live" vs "reachable"

I think this disagreement might just be over terminology. I've seen articles on garbage collection that use the term "live" to describe data that will be accessed in the future (i.e. an algorithm to determine if something is live would essentially be solving the halting problem).

The notion of reachability is then defined in some simple way (i.e. there is a path of pointers from the root set to this object) such that "live" is a subset.
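
A throwaway Java sketch of the difference (names are mine): 'big' below is dead after its last use, but it remains reachable from a frame slot, so a reachability-based collector must retain it until the method returns, while a liveness oracle could reclaim it before the loop.

static int example() {
    int[] big = new int[1_000_000];
    int sum = big.length;            // last use: 'big' is not live after this
    for (int i = 0; i < 1000; i++)   // ...but it is still reachable from the
        sum += i;                    // frame, so a reachability-based GC
    return sum;                      // must keep the whole array until here
}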

This is what I meant

I may not have said it right, but the second paragraph of Kannan's post is indeed what I had in mind. (Not that it necessarily has to be defined in terms of following pointers. The language semantics could define it in terms of visible names, or some other way.)

Scope adjustment

"Semantic TCO is perfect GC that also solves the halting problem."

I think you are attributing too large a scope to Matt's statement.
If you add the obvious qualification:

"even though there is provably no way to reach it in any subsequent execution trace"

it is clear it is a much more modest and practical claim than you are suggesting.

Granted.

Granted. But, of course, that raises the question: provable by which prover?

Example


{
 X = new Storage();          // allocated in the course of this call
 NobodyKnowsWhetherIHalt();  // may or may not ever return
 DoSomethingWith(X);         // X is needed only if control reaches here
}

This is a good example. In this case, the language semantics presumably say that a function call may (a) return, (b) terminate through an unconditional control transfer (e.g., throw an exception) or (c) fail to terminate. So we have a number of possible executions to consider. In one of these executions, X is used, so X is still reachable and is not garbage.

That's all I meant.

Interestingly, though, sometimes language semantics are enough to restrict this. For instance, Java uses this same set of properties to define which code is unreachable: the compiler will reject:

  int x = 0;
  return;
  System.out.println(x);  // error: unreachable statement

And in Scala, a call to a method whose result type is Nothing is known never to return normally. Therefore any subsequent code is dead, the stack frame is garbage, and the call could be compiled as a tail call. This is a nice example of the semantics determining the reachability property.

Anyway, I didn't have in mind anything as powerful as what you suggest. Sorry if I was unclear. Although as you mention it, this would be the perfect ideal definition of TCO, just as one could imagine a perfect oracular GC that retains only storage that will eventually be used.

Question begging

Language semantics typically make it clear precisely which storage is reachable at any given time.

What if a language defines entries on the stack to be reachable until the stack frame is popped, as C does? Does C do TCO under this definition?

Good point.

Good point. C is a funny case, because all storage is always reachable. So I suppose you could say that C has perfect GC as well, in a trivial sense, although I agree that this feels extremely unsatisfying.

But actually, I'm not sure if C really says this as part of the language semantics. Certainly there is no syntax to access that storage, although you can do things like this:

void test(int x) {
  int *back = &x - 1;     /* guess at a neighboring slot in the frame */
  int outer_int = *back;  /* attempt to read storage outside of x */
  ...
}

But I'm pretty sure that the outcome of that is actually undefined.

Come to think of it, I don't even think C says whether the stack grows up or down, and I definitely don't think the standard prescribes a layout for activation records... So I'm not really sure whether this is such a problem.

Not exactly my point

My point doesn't rely on sneaky pointer casting C code. It was that passing the buck to a language defined notion of "reachability" doesn't seem to accomplish much. If you start with a definition of reachability such that:

void foo() {
    SomeStruct s;
    bar();
} /* s becomes unreachable here, by definition */

then C is doing TCO in this situation. To fix this we must add the requirement that any definition of reachability should respect tail calls. And then we're back where we started.

I see what you mean

I see what you mean, but it feels like cheating. I would expect that reachability should be a property derived from the meanings of the actual language constructs, not something explicitly bolted onto the side.

Like, you can say that "s is reachable until here," but you don't actually provide any way to observe that. I'm not sure how to make the notion of observation precise in terms of denotations or whatever, because I am far from knowledgeable about formal semantics. But I hope you see what I mean.

But anyway, yes, I see what you're saying and I agree that it's a little weaselly to define TCO in this way. But I still like it, as a fuzzy notion, because it captures the idea that implementations without TCO are, in some sense, doing something extra, while implementations with TCO are, again in some sense, simply collecting all the garbage. Most other definitions make TCO seem like something extra, which is a crucial difference in viewpoint between advocates and detractors.

One more poke

I'm not sure that there's an easy way to formalize this, since reachability is a more or less ad hoc approximation to liveness. Here's another potential problem: does this definition require more than we normally expect out of TCO?

In a strict language, with a function definition such as,

foo x = let s = buildBigStruct x in foo (bar (baz s))

I'd expect TCO to only require freeing 's' before transferring control to 'foo'. But under your definition, we might expect TCO to require freeing it before the call to 'bar'.

Yes, it sets the bar higher.

Yes, it sets the bar higher than some of the other definitions. But I think some of the program transformations to implement TCO may already achieve this. The generic no-stack design with CPS, for example, might already consider s garbage before the call to bar. I haven't thought about it in detail.

I'm not sure that there's an easy way to formalize this, since reachability is a more or less ad hoc approximation to liveness.

I'm not sure... I mean, how is this different from GC in general? I'm looking through R5RS and I don't see any explicit definition of storage reachability, yet people do manage to correctly implement garbage-collected Schemes.

I'm genuinely surprised by your point of view. It has never occurred to me that the notion of what was/wasn't garbage was ill-defined or an approximation of anything. It's a superset, sure, of "stuff the program will actually use," but I wouldn't say it's an approximation. I still wonder if we're using "reachable" in the same way.

(Except in the case of conservative GC, but that's a different story. That only happens when you can't tell what is or isn't a pointer.)

Clinger's Paper

This discussion would be well-served by reading Will Clinger's paper "Proper tail recursion and space efficiency", which answers all of these questions in a simple way. Also, Morrisett, Felleisen and Harper's paper "Abstract Models of Memory Management" is on point as well.

Good references

I've seen both of those, but admit that it's been a while. Thanks for reminding me that speculation is the poor cousin of study. ;)

How is this different from GC in general?

The simplest explanation for when something is no longer reachable is when there is no way to reach it from the symbols that are in scope. This definition is not sufficient to explain TCO. You're arguing for garbage to be identified in some situations where symbols, though still in scope, are provably no longer used.

Edit: My use of 'in scope' is probably confusing here. I just mean that in e.g. Java symbols 'go out of scope' when the block they are in is exited.

I see what you mean.

I see what you mean.

Early GC

The generic no-stack design with CPS, for example, might already consider s garbage before the call to bar.

It certainly would.

Indeed, the generic no-stack design with CPS might consider part or all of s to be garbage long before completing the call to 'baz'. The ability to collect just the unused parts of s is especially impressive relative to normal TCO, though one could combine TCO with something like invalidating references that won't be used anymore to achieve similar effects.

The generic no-stack CPS approach can often collect far more than could traditional TCO.

The main problem with the no-stack approach: it is usually difficult to achieve equal performance to the stack-based design - due largely to CPU caching and locality of reference issues. One would need some way to GC just the nursery when it gets full, without tracing very much else, or alternatively GC the nursery whenever a reference is created from outside the nursery.

Right.

This whole issue has always seemed to me to be mostly about tradeoffs between the stackless CPS approach (which AFAICS has pretty much ideal semantics) and the traditional contiguous stack, which is almost forced on you by modern hardware.

Alternatively, you could separate return addresses from everything else like Forth... There's a price to pay for that, too, but it would be a shame not to mention it in this context. Forth continues to be worthy of study.

Reachability and Scoping Related?

But the value s could not be accessed by bar(). To me, 'reachable at a point' means that the value can be referred to by an expression at the point being discussed. Of course, a pointer to s being passed to bar complicates the matter. Does this somehow hint that scoping and reachability are related?

Reachability

I'm not aware of a formal or universal definition of reachability. My informal understanding is that it's an implementation property that answers whether you can get to a certain value by pointer chasing from the stack. This is why I see a circularity in defining TCO in terms of reachability - TCO is supposed to place restrictions on what can be on the stack.

Semantic TCO trouble

A language has TCO if all storage allocated in the course of a function call can be reclaimed as soon as it becomes unreachable according to the language semantics.

Careful with that "all"! Consider a closure that contains the sites binding variables i and j. Both are used in the lifetime of the closure, but the semantics of the language shows that the lifetime of i is short while the lifetime of j is long. Does semantic TCO require that i be freed while the closure is still needed? It sounds like more trouble than it could ever be worth.

Sometimes

Sometimes variables must be cleared to preserve the "Safe-for-Space" property, which is stronger than the "Properly tail recursive" property. There are a variety of complexity classes defined in Clinger's paper - I recommend it.
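
For illustration, here is roughly what such clearing looks like when done by hand, in Java (all names here are hypothetical): without the assignment to null, 'big' would remain reachable from every frame for the entire depth of the recursion.

// Hand-written variable clearing, of the sort a safe-for-space
// compiler performs automatically (buildBigStruct is a made-up name).
static int process(int n) {
    int[] big = buildBigStruct(n);
    int summary = big.length;   // last real use of 'big'
    big = null;                 // clear the dead reference, so this frame
                                // no longer retains the array during the call
    return n == 0 ? summary : process(n - 1);
}

static int[] buildBigStruct(int n) { return new int[1 << 20]; }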

I understand it as ensuring

I understand it as ensuring that a call in tail position always consumes asymptotically no space, i.e. an arbitrary number of tail calls can be executed sequentially. This is a slight weakening of how Scheme defines it.

Goto Start

From Tim Bray's blog:

Why I Hate Tail Call Recursion Optimization · It feels like a big fat lie. It looks like a subroutine call, but in the case where it occurs as the last thing in the routine, it magically, silently, and automatically gets turned into, now how did I put it? “A highly controlled and structured GOTO.”

If that’s what the programmer wants, then that’s what the programmer should bloody well ask for. I’ve always been irritated by the explanations, in the Erlang and Lisp introductory books, about how you must be careful to ensure that your recursive calls are the last thing executed, and I’ve observed this leading sometimes to some gnarly unintuitive if/else branch placement. Call me old-fashioned, but it just seems wrong when the semantics of two identical function invocations vary wildly as a function of whether there’s an executable statement between the invocation and the function exit.

So the discussion boils down to: 1) Should a language support TCO? 2) Should TCs be made explicit in the language?

My take on (1): put it in the language specification. My take on (2): Yeah, well, if you feel like it, but why not support "goto" then?

Why I hate TCO discussions: TCO is a minor compiler optimization in languages which use a stack to implement calls. That's it. The discussion should have been dropped around three centuries ago.

[Edit: Guy Steele makes an interesting point, but it seems to boil down to: Recursion is intuitively the best answer.]

Making TCO explicit

If you are a member of the "TCO is just an optimisation" camp, then there is no incentive to make it explicit.

On the other hand, if you believe that TCO is necessary for the semantics of the language (as argued in the original post), then it would make a lot of sense to make it explicit. If nothing else, it would allow the compiler/runtime to tell you "that call is marked as a tail call, but there is no way it could be one, so go and rewrite this code properly".

The recursive call syntax is so convenient though. I would not trade it for goto.

Explicit where?

Or the approach of Scheme could be taken, where tail position is very precisely defined and that is exactly what must be optimized. Checking annotations for tail position should not be what compilers are doing, because it's redundant: if someone doesn't know what is and isn't tail position, rewriting the code correctly to make it tail recursive is probably beyond their abilities. The optimization is explicit in the language definition, and so the code doesn't need to mention it.

The optimization is explicit

The optimization is explicit in the language definition, and so the code doesn't need to mention it.

The annotation should be there because it makes the code more robust to refactoring.

Two sides

As a property to be checked, I agree. As a pragma for optimization, I expect it would have the same future as inline.

Yes, I believe such an

Yes, I believe such an annotation is useful for documenting and ensuring the expected space usage of a call and maintaining this property in the presence of refactoring.

Disagree with your first point

If you are a member of the "TCO is just an optimisation" camp, then there is no incentive to make it explicit.

I disagree: it is a very fragile optimisation - if you don't put the operations in the 'right order' you'll blow the stack. An optimisation is only useful if it is a 'stable' optimisation.

So a keyword is useful for the compiler (as you said) *and* useful for documentation: when the next maintainer asks himself 'Why is this code unreadable?', he can see that it's because the original developer wanted TCO.

RE: Why not Goto?

Should TCs be made explicit in the language? Yeah, well, if you feel like it, but why not support "goto" then?

Unless you had something akin to 'goto funcName(ArgumentA,ArgumentB,...)', where the return-value from funcName becomes the return-value for the current procedure, gotos simply aren't as expressive as tail-calls. And, at that point, you might as well dodge a lot of confused C/C++ programmers and call it an explicit 'tail-call'.

A typical use of 'goto' in structured languages cannot take arguments and cannot leave a given scope. This severely limits its utility... e.g. you cannot use gotos for mutual recursion between first-class functions.

Using goto, for, or while for looping requires violating single-assignment for anything non-trivial. But violating single-assignment discipline on the stack harms other features - especially support for task-parallelism (i.e. spawning off an expression), lazy or delayed evaluations, and first-class closures. You are cut off from these other powerful features before you even discover you want them.

The explicit tail-calls do not suffer these problems.

goto considered helpful

Guy Steele proposed exactly the syntax in your first line for explicit proper tail calls in Java, in 1996. (This according to his comment here -- first comment to Bray's blog post.)

I would love to see this kind of syntax

Having spent a long time trying to shoehorn coroutines into C-style stack-oriented languages, I would love to see more languages support a function call mechanism that replaces the current stack frame with another one. Why didn't it make it into Java, I wonder?

Implicit v Explicit Tail Calls

Honestly, I'm somewhat agnostic regarding implicit versus explicit tail calls, although I lean towards implicit. I can certainly live with explicit tail calls in the case of a Python or Java-like syntax, which already require a "return" statement. However, having to explicitly mark my tail calls in Scheme or ML would drive me absolutely batty.

Tim Bray, GvR, and others have argued against proper handling of implicit tail calls because it's a "magical" compiler optimization that might not work when you need it. I really don't see how this part of the argument holds much water, because for me personally, 99% of the time it takes absolutely no effort to determine whether or not any given call is a tail call, and in 99% of the remaining 1%, it is still easy.

So I think that this aspect of the argument really stems from either 1) a lack of understanding of tail calls, 2) a language implementation, such as Scala or Kawa, whose object code targets an environment hostile to proper tail calls, such as the JVM, and as a result proper tail call handling is fragile, 3) language features, such as post-increment operators, that interact with tail calls in subtle ways, and/or 4) growing up around less than reliable compilers.

But of course, there are languages, such as ML and Scheme, where it's really pretty easy to see whether or not a call is a tail call, with compilers that handle tail calls robustly.

tail or fail?

OK, assume these forms are in tail position. Which of the f* are tail calls in Clojure?

(if (g a) (f1 a) (f2 a))

(when (g a) (f3 a) (f4 a))

(whenever (g a) (f5 a))

Hint: whenever is not defined in Clojure core.

Macros and tail position

f1, f2, and f4 are tail calls, but f3 is not. The first two lines take no effort at all.

I don't know about f5, because I don't know what the "whenever" macro does. But if I knew, then it would almost certainly be easy.

I do define and use macros in my own code, but I tend to do so rather sparingly. I admit that one can construct pathological macros, but that doesn't mean that one should use them in production code. :-)

OO does not need TCO

Steele's argument bugs me. I don't think it is quite right: we can construct a Cook-style OO language that does not have TCO yet can solve the integer set problem Steele poses.

One way to do that is with some notions of types, indefinitely delayed values, and automatic coercions. This amounts to a way to add explicit trampolines to the OO language:

Given a type T (an object providing the interface T) we can derive a second, recursive type: Delayed[T]. A delayed value can be produced by supplying an object, method, and parameters. An operator we can use is ForceStep which takes a delayed value and invokes the method. The method may return either a value of type T or of type Delayed[T]. The operator Force iterates the operator ForceStep until a value of type T is produced (and then returns that value).

We can add a rule of implicit coercion: if a value of type Delayed[T] is produced in a context that demands a value of type T, the Force operator is implicitly applied. Thus, for example, if my program says:

Boolean x = someset.member(y);

the "member" method of "someset" might include, as one branch of a conditional:

return delayed otherset.member(y);

and the effect of TCO is achieved but without TCO.

The effect of TCO is thus achieved, basically, by a trampoline. We have not added TCO to the language. Instead, we've added delayed values. Delayed values are interesting in so far as they are useful for more than just optimizing tail calls.
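
For concreteness, here is one way the sketch might be rendered in Java (every name below is my own invention, following the description above): Delayed<T> carries one suspended step, force() plays the role of the Force operator, and "return delayed otherset.member(y)" becomes wrapping the call in a thunk.

import java.util.function.Supplier;

// One possible rendering of Delayed[T] (all names invented).
abstract class Delayed<T> {
    abstract T value();          // defined only once the computation is done
    abstract Delayed<T> step();  // ForceStep: run one step; null means done

    // Force: iterate ForceStep until a plain T is produced.
    T force() {
        Delayed<T> d = this, next;
        while ((next = d.step()) != null) d = next;
        return d.value();
    }

    static <T> Delayed<T> done(T v) {
        return new Delayed<T>() {
            T value() { return v; }
            Delayed<T> step() { return null; }
        };
    }

    static <T> Delayed<T> delayed(Supplier<Delayed<T>> thunk) {
        return new Delayed<T>() {
            T value() { throw new IllegalStateException("not yet forced"); }
            Delayed<T> step() { return thunk.get(); }
        };
    }
}

// The integer-set flavor of the example: member delegates down the
// chain by returning a delayed call, so forcing the result runs in
// constant stack space no matter how long the chain is.
class Adjoin {
    final int elem;
    final Adjoin rest;  // null means the empty set

    Adjoin(int elem, Adjoin rest) { this.elem = elem; this.rest = rest; }

    Delayed<Boolean> member(int y) {
        if (elem == y)    return Delayed.done(true);
        if (rest == null) return Delayed.done(false);
        return Delayed.delayed(() -> rest.member(y));  // "return delayed ..."
    }
}

A context demanding a Boolean would then implicitly apply the force, as in: boolean b = set.member(5).force().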

I think that points to the weakness of even trying to make the argument Steele is trying to make: Cook's notion of OO is about what high-level, abstract concepts should be present in a language in order to use a certain programming style; TCO is about the low-level operational characteristics of implementations of function calls or method invocations. It would be surprising if the high-level concepts so constrained implementations that any language offering those high-level concepts simply *had* to include TCO. In the end, I think Steele is left with a circular argument: if you have an OO language that requires TCO as the only reasonable solution to the integer set problem, then it requires TCO.

One could counter-argue that delayed values are "ugly" whereas optimized tail calls are "elegant". In that sense, perhaps, OO "needs" TCO as in "makes programmers wish for it". That seems like a far less interesting claim, though, especially since delayed values strike me as not ugly and more general.

What the alternatives of TCO and delayed values suggest, however, is that in any event, if your language includes the concept of activation records, there should be some mechanism by which code can create a new activation record invoked after the current activation record is destroyed, yet whose activation produces the value expected to be returned and continues the computation expected of the current activation record. It's just not necessarily the case that this has to involve a compiler or interpreter recognizing tail calls and treating them specially.

It's the same

But if I then specify each tail call as being delayed, I achieve TCO. Meanwhile we need to dynamically check for Delay[T] and take care of it. To optimize this dynamic check out of the way, I would introduce a tail call primitive and have the Delay[T] use that when coercion is immediate. What's more, Delay[T] only has an effect in the semantics you gave when it is used in a tail call. In a fictitious typed Scheme:

x:[a]
y:Delay a
z=(cons y x)

y is forced at the cons. So what we have is a very restricted kind of laziness that is weaker than tail calls.

delayed vals != TCO

But if I then specify each tail call as being delayed, I achieve TCO.

Not so, I think. Consider a call:

    Any x = s.member(y);

If the Any type includes delayed values, there is no implicit coercion by Force, hence adding a delayed to a return statement in a member method is not the same as TCO.

Meanwhile we need to dynamically check for Delay[T] and take care of it.

Hmm. Well, not necessarily. A "return address" could be regarded as an array of code addresses, indexed by permissible types. If the callee statically knows what type it is returning, it simply returns to that particular piece of code - there is no run-time type check involved. Such a facility would indeed be an optimization, but not quite a TCO.

I would introduce a tail call primitive and have the Delay[T] use that when coercion is immediate.

In the system I've sketched, consider this code:

    return delayed otherset.member(y);

For some callers, otherset.member(y) must be immediately forced. For other callers, it is appropriate to return the delayed value (perhaps to be forced much later). Therefore, a "tail call primitive" would not do the trick here.

Indeed, in the system I've sketched there is nothing particularly special about tail position calls at all - which is perhaps why it is hard to see through TCO-colored goggles :-)

In a fictitious typed Scheme

x:[a]
y:Delay a
z=(cons y x)

y is forced at the cons. So what we have is a very restricted kind of laziness that is weaker than tail calls.

I see no good reason why cons should force the value of y here. If we're talking about something Schemish, I would expect cons to take two arguments of some Any type, where that type includes all delayed values.

It's true that delaying and forcing (implicit or otherwise) are about lazy computation - but I don't see how they are weaker than tail calls.

I do admit that there is a real question there. You could be right and I don't see it. I'm certainly handwaving about my type system, making up details as I go along to support my argument. I don't think I've made up any details that lead to an inconsistent type system. I'd say the burden of proof one way or the other is "shared" in the face of what I'm willing to call my "hypothesized" type system.

A flaw

You talk about objects of Delayed[T] having operators ForceStep and Force. Exposing ForceStep, though, qualitatively changes the interface. It allows you to ask encapsulation-breaking questions such as "Do I have a value after 5 iterations of ForceStep on the .member return value?"

So maybe you try to hide ForceStep and expose only Force. You could use the type system to hide ForceStep ADT-style, but then we're not using only pure objects in accordance with the rules of the exercise. So we really need to return an object that only supports Force, and not ForceStep, which is easily done ... with TCO :).

flaw?

Being paranoid that I'm just not "getting the joke", I half-suspect you are just being silly but, if not.... would it also be encapsulation-breaking if I allowed programs to reflect on the amount of measured CPU time a given method call takes? Does it break encapsulation if a method invoker can figure out whether an invoked method has any side effects? With delayed values I haven't exposed any of the "slot structure" internal to any objects, so I think I pass Cook's exercise.

Rhetorical answers

Would it also be encapsulation-breaking if I allowed programs to reflect on the amount of measured CPU time a given method call takes?

Yes. This breach of encapsulation is a common source of security vulnerabilities, too, though it is admittedly difficult to prevent in practice. One foolproof method that's occasionally practical: don't give the program a clock.

Does it break encapsulation if a method invoker can figure out if a method invoked has any side effects?

This presumes there are side-effecting functions to begin with. Or should I say, "with which to begin"?

It sounds like we're approaching this in different ways, but I wasn't joking or just being silly. I do consider this a theoretical flaw with your encoding. But if you're happy with it as a practical matter, who am I to judge? I agree it's pretty close :)

Proper Implementation of Tail Calls is a Semantic Property

You have just re-invented the trampoline, a method for implementing proper tail calls, along with an explicit notation for invoking them.

The fact that it uses a different implementation is irrelevant - proper implementation of tail calls is a property of the space usage of programs, not of stack layouts.

Wrong

TCO is there to reduce stack consumption on (recursive) calls. There's no need for TCO if your language translates to, for example, hypercombinators, since there is no (or need not be a) stack: it is unnecessary to push local contexts.

This is not correct

First, proper implementation of tail calls has no essential connection to recursion - consider CPS.

Second, the property is about the asymptotic space consumption of programs, not about stacks, contexts, or calls. See Will Clinger's paper on this, which I cited up-thread, which is explicitly cited by the RnRS as the definition of this property. A hypercombinator-based implementation strategy may or may not meet this requirement, regardless of stacks.

We Bad?

First, proper implementation of tail calls has no essential connection to recursion - consider CPS.

I agree, hence the parentheses.

Second, the property is about the asymptotic space consumption of programs, not about stacks, contexts, or calls. See Will Clinger's paper on this, which I cited up-thread, which is explicitly cited by the RnRS as the definition of this property. A hypercombinator-based implementation strategy may or may not meet this requirement, regardless of stacks.

I guess the asymptotic space consumption you refer to is with regard to local contexts built up? In that regard, I guess I agree with the other poster: it depends on our definitions - we are both right.

I will argue however that TCO should be defined as just a technique on stack machines (replacing a call with a goto when the local context can be discarded), and your definition is, well, something else?

But granted, I'll read your reference.

I agree with Sam, and besides...

"What is the right definition of TCO?" is pretty much the whole point of this discussion. So it's not really fair to say "wrong" by simply appealing to a different definition. It's sort of begging the question...

"reinventing" trampolines?

I don't think it fair to say I "have just re-invented the trampoline [....]" given that I started my explanation by saying "This amounts to a way to add explicit trampolines to the OO language:"

A fair question is to ask what distinction I am making between delayed values, with coercion by forcing, and TCO. There are multiple ways to answer, including: (1) TCO implies an interpreter (or compiler) treating calls in the tail position specially - not so with delayed values; (2) TCO means something along the lines of Clinger's definition whereas delayed values can be defined without reference to tail expressions at all; also, a language with delayed values is not required to be "properly tail recursive" (yet it solves the challenge problem Steele poses); (3) delayed values are useful elsewhere besides tail positions and otherwise besides being immediately forced when returned by a procedure - they are more general.

Thus, TCO and delayed values are very different language features. The demonstration that delayed values can conveniently solve the problem Steele poses is an existence proof that he is mistaken to say that "OO requires TCO".

Microsoft not interested for C#

Microsoft aren't planning to do this for C# and don't see why they should.

I'd like to be able to have recursive methods TCO'd (explicitly would be fine) simply to have the compiler do the work of making them iterative and allowing my code to remain readable.

Blub Paradox?

The page you linked made this argument (among others):

There are few real C# methods where it would be legal to emit tail calls (other languages encourage coding patterns which have more tail recursion, and many that rely heavily on tail call optimization actually do global re-writing (such as Continuation Passing transformations) to increase the amount of tail recursion).

This strikes me as a Blub Paradox argument against supporting a feature: Why would I need it? Where am I using it?

If C# programmers knew they could count upon tail-call optimization, I suspect you'd see a lot more real methods where it's legal to emit tail calls. The reason you don't see it as necessary? Perhaps because programmers are doing the extra labor of working around not having it.

C/C++ programmers often make a similar argument against garbage-collection, and do so even as they struggle to properly identify and collect garbage while integrating various forms of concurrency and callbacks from different frameworks. From the outside, this struggle is reason enough for the feature. From the inside, the programmers involved see the final victories of the struggle as proof they do not need it.

That isn't to say there aren't other decent arguments against emitting .tail from C#. The implementation is, as they note, often non-trivial in its overhead.

Mirror of the article

The original URL for the article seems to no longer be responding. I've managed to dig out an Internet Archive copy, however, and have put up a mirror of it here: http://www.eighty-twenty.org/index.cgi/tech/oo-tail-calls-20111001.html.