May 23, 2024
Lately I’ve been messing around with Python 3.12, discovering new features around typing and pattern matching. Combined with dataclasses, they provide support for a style of programming that I’ve employed in Kotlin and Typescript at work. That style in turn is based on what I’d do in OCaml or Haskell, like modelling data with algebraic data types. However, the more advanced concepts from Haskell — and OCaml too, I guess — don’t transfer that well to mainstream languages.
What I’m describing in this post is a trade-off that I find comfortable to use in Python, especially with the new features that I’ll describe. Much of this works nicely in Kotlin and Typescript, too, with minor adaptions.
The principles I try to use are:
map
, reduce
, or
for-comprehensions instead of for-loops
when
statements rather than
if instanceof(...)
or inheritance and dynamic dispatch
This list is not exhaustive, but I’m trying to keep this focused. Also, I’m intentionally not taking this in the direction of Haskell, with typeclass hierarchies, higher-kinded types, and so on. I don’t believe cramming such constructs in would benefit Python programs in practice. Furthermore, I won’t be talking about effects and hexagonal architecture in this post.
The examples are all type-annotated and checked with Pyright. You could do all of this without static type-checking, as far as I know.
Finally, note that I consider myself a Python rookie, as I mostly use it for small tools and scripts. The largest program I’ve written in Python is Quickstrom. This post is meant to inspire and trigger new ideas, not to tell anyone how to write Python code.
All right, let’s get started and see what’s possible!
First, let’s get some boilerplate stuff out of the way. I’m running Python 3.12. Some of the things I’ll show can be done with earlier versions, but not all.
We’ll not be using any external packages, only modules from the standard library:
from typing import *
from dataclasses import dataclass
You might not want to use wildcard imports in more serious programs, but it’s acceptable for these examples.
Let’s start with a classic example from the functional programming world: an evaluator for a simple expression-based language. It only supports a few operations in order to keep it simple. First, we the different types of expressions there are using dataclasses and a union type:
type Expr = int | bool | str | BinOp | Let | If
@dataclass
class BinOp:
"<"] | Literal[">"]
op: Literal[
lhs: Expr
rhs: Expr
@dataclass
class Let:
str
name:
value: Expr
body: Expr
@dataclass
class If:
cond: Expr
t: Expr f: Expr
The syntax for type aliases and the union type operator
(|
) are both new additions. You could create type aliases
before using regular top-level bindings, but mutually recursive types
required some types to be enclosed in strings. Otherwise, Python would
complain that the second type (e.g. BinOp
in the code
above) wasn’t defined. It’s a bit cleaner now.
Note that we existing primitive types from Python (int
,
bool
, and str
), combined with dataclasses for
complex expressions. The str
is interpreted as a reference
to a name bound in the lexical scope, not as a string literal, as we’ll
see in the following snippet.
The evaluator tracks bindings in the Env
. Evaluating an
expression results in a value that is either an integer or a
boolean.
type Env = dict[str, Value]
type Value = int | bool
Now, let’s look at the eval
function. Here we
pattern-match on the expression, which is a union. For literals, we just
return the value:
def eval(env: Env, expr: Expr) -> Value:
match expr:
case int() | bool():
return expr
...
References are looked up in the environment:
case str():
return env[expr]
Let-bindings create a new environment with the new binding:
case Let(name, value, body):
= env | {name: eval(env, value)}
new_env return eval(new_env, body)
Finally, the BinOp
and If
branches
pattern-match on the evaluated nested expressions to make sure they’re
of the correct types:
case BinOp(op, lhs, rhs):
= eval(env, lhs)
l = eval(env, rhs)
r match op, l, r:
case "<", int(), int():
return l < r
case ">", int(), int():
return l > r
case _:
raise ValueError(
f"Invalid binary operation {op} on {lhs} and {rhs}"
)
case If(cond, t, f):
match eval(env, cond):
case True:
return eval(env, t)
case False:
return eval(env, f)
case c:
raise ValueError(f"Expected bool condition, got: {c}")
All right, let’s try it out:
>>> example = Let("x", 1, If(BinOp("<", "x", 2), 42, 0))
>>> eval({}, example)
42
Nice! But this is far from a robust evaluator. If we run it with a
deep enough expression, we’d get a RecursionError
saying
that the maximum recursion depth was exceeded. This is a commonly
occurring problem when writing recursive functions in Python.1 The eval
function could
be rewritten with an explicit stack for operations and operands, but
it’s a bit fiddly.
In some cases, you can restructure a recursive function as
tail-recursive, and then manually convert it to a loop. Perhaps you
could automatically optimize tail-calls, or use a trampoline. Some
solutions avoid stack overflows, at the expense of increased heap memory
usage. In simpler cases, combinators like map
and
reduce
might suffice, instead of explicit recursion.
Either way, recursive functions and stack overflow is something to watch out for.
Since Python 3.12, it’s also much nicer to work with generic types. Previously, you had to define type variables before using them in type signatures. This felt very awkward to me.
Let’s look at an example that models a rose tree data type. To spice
it up a little, I’m including a map
function for both types
of the tree nodes. This is equivalent to fmap
in Haskell, but without the typeclass and higher-order types.
type RoseTree[T] = Branch[T] | Leaf[T]
@dataclass
class Branch[A]:
list[RoseTree[A]]
branches:
def map[B](self, f: Callable[[A], B]) -> Branch[B]:
return Branch([b.map(f) for b in self.branches])
@dataclass
class Leaf[A]:
value: A
def map[B](self, f: Callable[[A], B]) -> Leaf[B]:
return Leaf(f(self.value))
Let’s print these trees using pattern matching. Here’s a function that’s written as a loop, maintaining a list of remaining sub-trees to print:
def print_tree[T](tree: RoseTree[T]):
= [(tree, 0)]
trees while trees:
match trees.pop(0):
case Branch(branches), level:
print(" " * level * 2 + "*")
= [(branch, level + 1) for branch in branches] + trees
trees case Leaf(value), level:
print(" " * level * 2 + "- " + repr(value))
It could be even simpler using plain recursion, but then we could run into stack depth issues again. Anyway, let’s try it out:
= Branch(
example
[1),
Leaf(2),
Leaf(3), Leaf(4)]),
Branch([Leaf(5), Leaf(6)]),
Branch([Leaf(7),
Leaf(
]
)>>> print_tree(example.map(str))
*
- '1'
- '2'
*
- '3'
- '4'
*
- '5'
- '6'
- '7'
As you can see from the repr
being printed, all the
values are mapped to strings.
As a last example, I’d like to show how you can do basic structural subtyping using protocols. This is useful in cases where you don’t want to define all variants of a union in a single place. For instance, you might have many types of events that can be emitted in an application. Centrally defining each type of event adds unwanted coupling. Breaking apart a base class for events and the code that later on consumes the events decreases cohesion. In such cases, a protocol might be a better option.
We’ll need a new import:
from datetime import datetime
Now, consider the following events in module A:
@dataclass
class Increment[Time]:
id: str
time: Time
def description(self: Self) -> str:
return "Incremented counter"
@dataclass
class Reset[Time]:
id: str
time: Time
def description(self: Self) -> str:
return "Reset counter"
We made them generic just to showcase the combination of protocols
and generics. The Time
type parameter isn’t instantiated in
any other way than datetime
in this example.
In another module B — that doesn’t depend on A, and isn’t depended
upon by A — the protocol is defined, along with the
log_event
function:
class Event[Time](Protocol):
id: str
time: Time
def description(self: Self) -> str: ...
def log_event(event: Event[datetime]):
print(f"Got {event.id} at {event.time}: {event.description()}")
Increment
and Decrement
both implement the
Event
protocol by virtue of being structurally compatible.
They can both be passed to log_event
:
log_event("foo", datetime.now()),
Increment( )
If we annotate the Event
protocol with
@runtime_checkable
, we can check it with
isinstance
and use it in match cases:
@runtime_checkable
class Event[Time](Protocol):
...
def log(x: Any):
match x:
case Event() if isinstance(datetime, x.time):
log_event(x)case _:
print(x)
Pretty neat!
That’s all I have for now. Maybe more Python hacking and blog posts will pop up if there’s interest. I’m positive to the evolution of Python and functional programming, as it’s something I use quite regularly.
Join the discussion on Twitter, Hacker News, or Lobsters.
Thank you @tusharisanerd for reviewing a draft of this post.
See On Recursion, Continuations and Trampolines for more in-depth explanations of various solutions to recursive functions and the stack.↩︎