The Artima Developer Community

All Things Pythonic
Python 3000 Status Update (Long!)
by Guido van Rossum
June 19, 2007
Summary
Here's a long-awaited update on where the Python 3000 project stands. We're looking at a modest two months of schedule slip, and many exciting new features. I'll be presenting this in person several times over the next two months.

Project Overview

Early History

The first time I came up with the idea of Python 3000 was probably at a Python conference in the year 2000. The name was a take on Windows 2000. For a long time there wasn't much more than a list of regrets and flaws that were impossible to fix without breaking backwards compatibility. The idea was that Python 3000 would be the first Python release to give up backwards compatibility in favor of making it the best language going forward.

Recent History

Maybe a year and a half ago (not coincidentally around the time I started working for Google, which gave me more time for work on Python than I had had in a long time) I decided it was time to start designing and planning Python 3000 for real. Together with the Python developer and user community I came up with a Plan. We created a new series of PEPs (Python Enhancement Proposals) whose numbers started with 3000. There was a PEP 3000 already, maintained by others in the community, which was mostly a laundry list of ideas that had been brought up as suitable for implementation in Python 3000. This was renamed to PEP 3100; PEP 3000 became the document describing the philosophy and schedule of the project.

Since then, we have, well, perhaps not moved mountains, but certainly a lot of water has flowed under the bridge of the python-dev mailing list, and later the separate python-3000 mailing list.

Tentative Schedule

A schedule was first published around a year ago; we were aiming for a first 3.0 alpha release by the end of the first half of 2007, with a final 3.0 release a year later. (Python 3.0 will be the version when it is released; "Python 3000" or "Py3k" is the project's code name.)

This schedule has slipped a bit; we're now looking at a first alpha by the end of August, and the final release is pushed back by the same amount. (The schedule slip is largely due to the amount of work resulting from the transition to all-Unicode text strings and mutable raw bytes arrays. Perhaps I also haven't delegated enough of the work to other developers; that's a mistake I am frantically trying to correct.)

Python 2.6

There will be a "companion" release of Python 2.6, scheduled to be released a few months before 3.0, with an alpha release about 4 months before then (i.e., well after the first 3.0 alpha). The next two sections explain its role. If you're not interested in living on the bleeding edge, 2.6 is going to be the next version of Python you'll be using, and it will not be very different from 2.5.

Compatibility and Transition

Compatibility

Python 3.0 will break backwards compatibility. Totally. We're not even aiming for a specific common subset. (Of course there will be a common subset, probably quite large, but we're not aiming to make it convenient or even possible to write significant programs in this subset. It is merely the set of features that happen to be unchanged from 2.6 to 3.0.)

Python 2.6, on the other hand, will maintain full backwards compatibility with Python 2.5 (and previous versions to the extent possible), but it will also support forward compatibility, in the following ways:

  • Python 2.6 will support a "Py3k warnings mode" which will warn dynamically (i.e. at runtime) about features that will stop working in Python 3.0, e.g. assuming that range() returns a list.
  • Python 2.6 will contain backported versions of many Py3k features, either enabled through __future__ statements or simply by allowing old and new syntax to be used side-by-side (if the new syntax would be a syntax error in 2.5).
  • Complementary to the forward compatibility features in 2.6, there will be a separate source code conversion tool. This tool can do a context-free source-to-source translation. As a (very simple) example, it can translate apply(f, args) into f(*args). However, the tool cannot do data flow analysis or type inference, so it simply assumes that apply in this example refers to the old built-in function.
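To make the apply() example above concrete, here is a hypothetical before-and-after sketch of the kind of purely syntactic rewrite the tool performs (the function f and its arguments are made up for illustration):

```python
def f(a, b, c):
    return a + b + c

args = (1, 2, 3)

# Python 2 source:      result = apply(f, args)
# After conversion, the call becomes argument unpacking:
result = f(*args)
```

Because the translation is context-free, a program that rebinds the name apply to something else would be converted incorrectly; that is the trade-off of not doing data flow analysis.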

Transitional Development

The recommended development model for a project that needs to support Python 2.6 and 3.0 simultaneously is as follows:

  1. Start with excellent unit tests, ideally close to full coverage.
  2. Port the project to Python 2.6.
  3. Turn on the Py3k warnings mode.
  4. Test and edit until no warnings remain.
  5. Use the 2to3 tool to convert this source code to 3.0 syntax. Do not manually edit the output!
  6. Test the converted source code under 3.0.
  7. If problems are found, make corrections to the 2.6 version of the source code and go back to step 3.
  8. When it's time to release, release separate 2.6 and 3.0 tarballs (or whatever archive form you use for releases).

The conversion tool produces high-quality source code that, in many cases, is indistinguishable from manually converted code. Still, it is strongly recommended not to start editing the 3.0 source code until you are ready to reduce 2.6 support to pure maintenance (i.e. the moment when you would normally move the 2.6 code to a maintenance branch anyway).

Step (2) is expected to take the usual amount of effort of porting any project to a new Python version. We're trying to make the transition from 2.5 to 2.6 as smooth as possible.

If the conversion tool and the forward compatibility features in Python 2.6 work out as expected, steps (3) through (6) should not take much more effort than the typical transition from Python 2.x to 2.(x+1).

Status of Individual Features

There are too many changes to list them all here; instead, I will refer to the PEPs. However, I'd like to highlight a number of features that I find to be significant or expect to be of particular interest or controversial.

Unicode, Codecs and I/O

We're switching to a model known from Java: (immutable) text strings are Unicode, and binary data is represented by a separate mutable "bytes" data type. In addition, the parser will be more Unicode-friendly: the default source encoding will be UTF-8, and non-ASCII letters can be used in identifiers. There is some debate still about normalization, specific alphabets, and whether we can reasonably support right-to-left scripts. However, the standard library will continue to use ASCII only for identifiers, and limit the use of non-ASCII in comments and string literals to unit tests for some of the Unicode features, and author names.

We will use "..." or '...' interchangeably for Unicode literals, and b"..." or b'...' for bytes literals. For example, b'abc' is equivalent to creating a bytes object using the expression bytes([97, 98, 99]).
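The equivalence stated above can be checked directly; note also that indexing a bytes object yields the integer byte value, not a one-character string:

```python
b = b"abc"
same = bytes([97, 98, 99])  # the ASCII codes for 'a', 'b', 'c'
first = b[0]                # indexing yields an int, not a string
```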

We are adopting a slightly different approach to codecs: while in Python 2, codecs can accept either Unicode or 8-bit strings as input and produce either as output, in Py3k, encoding is always a translation from a Unicode (text) string to an array of bytes, and decoding always goes the opposite direction. This means that we had to drop a few codecs that don't fit in this model, for example rot13, base64 and bz2 (those conversions are still supported, just not through the encode/decode API).
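A short sketch of the resulting model, as it behaves in released Python 3: encode() goes text to bytes, decode() goes bytes to text, and a text-to-text transformation like rot13 is reached through the codecs module instead of the string methods:

```python
import codecs

data = "Pythön".encode("utf-8")   # text -> bytes
text = data.decode("utf-8")       # bytes -> text

# rot13 is str-to-str, so it no longer fits str.encode();
# codecs.encode() still exposes it:
rot = codecs.encode("abc", "rot_13")
```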

New I/O Library

The I/O library is also changing in response to these changes. I wanted to rewrite it anyway, to remove the dependency on the C stdio library. The new distinction between bytes and text strings required a (subtle) change in API, and the two projects were undertaken hand in hand. In the new library, there is a clear distinction between binary streams (opened with a mode like "rb" or "wb") and text streams (opened with a mode not containing "b"). Text streams have a new attribute, the encoding, which can be set explicitly when the stream is opened; if no encoding is specified, a system-specific default is used (which might use guessing when an existing file is being opened).

Read operations on binary streams return bytes arrays, while read operations on text streams return (Unicode) text strings; and similarly for write operations. Writing a text string to a binary stream or a bytes array to a text stream will raise an exception.

Otherwise, the API is kept pretty compatible. While there is still a built-in open() function, the full definition of the new I/O library is available from the new io module. This module also contains abstract base classes (see below) for the various stream types, a new implementation of StringIO, and a new, similar class BytesIO, which is like StringIO but implements a binary stream, hence reading and writing bytes arrays.
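The text/binary split described above can be sketched with the in-memory stream classes from the io module (the TypeError on mixed types is the behavior in released Python 3):

```python
import io

t = io.StringIO()            # a text stream: accepts and returns str
t.write("héllo\n")
text = t.getvalue()

b = io.BytesIO()             # a binary stream: accepts and returns bytes
b.write("héllo\n".encode("utf-8"))
raw = b.getvalue()

try:                         # mixing the two types is an error
    b.write("not bytes")
    mixed_rejected = False
except TypeError:
    mixed_rejected = True
```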

Printing and Formatting

Two more I/O-related features: the venerable print statement now becomes a print() function, and the quirky % string formatting operator will be replaced with a new format() method on string objects.

Turning print into a function usually makes some eyes roll. However, there are several advantages: it's a lot easier to refactor code using print() functions to use e.g. the logging package instead; and the print syntax was always a bit controversial, with its >>file and unique semantics for a trailing comma. Keyword arguments take over these roles, and all is well.
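The keyword arguments mentioned above look like this in practice (sep, end and file take over the roles of the old comma, trailing comma, and >>file syntax):

```python
import io
import sys

print("answer:", 42)                    # items separated by spaces
print("no newline yet", end="")         # end="" replaces the trailing comma
print()
print("diagnostics", file=sys.stderr)   # replaces `print >>sys.stderr, ...`

buf = io.StringIO()                     # any writable object works as a target
print("a", "b", sep="-", file=buf)
captured = buf.getvalue()
```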

Similarly, the new format() method avoids some of the pitfalls of the old % operator, especially the surprising behavior of "%s" % x when x is a tuple, and the oft-lamented common mistake of accidentally leaving off the final 's' in %(name)s. The new format strings use {0}, {1}, {2}, ... to reference positional arguments to the format() method, and {a}, {b}, ... to reference keyword arguments. Other features include {a.b.c} for attribute references and even {a[b]} for mapping or sequence access. Field lengths can be specified like this: {a:8}; this notation also supports passing on other formatting options.
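The format() features listed above, sketched with a made-up Point class for the attribute-access case:

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

p = Point(3, 4)
s_pos = "{0} and {1}".format("first", "second")   # positional arguments
s_kw = "{name}: {score}".format(name="Ada", score=10)  # keyword arguments
s_attr = "{0.x},{0.y}".format(p)                  # attribute references
s_item = "{0[1]}".format(["a", "b"])              # sequence access
s_width = "{0:8}".format("hi")                    # minimum field width of 8
```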

The format() method is extensible in a variety of dimensions: by defining a __format__() special method, data types can override how they are formatted, and how the formatting parameters are interpreted; you can also create custom formatting classes, which can be used e.g. to automatically provide local variables as parameters to the formatting operations.
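A minimal sketch of the __format__() hook, using a hypothetical Money class and a made-up "cents" format spec, to show that the type itself decides how the spec after the colon is interpreted:

```python
class Money:
    def __init__(self, amount):
        self.amount = amount

    def __format__(self, spec):
        # "cents" is an invented spec for this example, not a standard one
        if spec == "cents":
            return "{0}c".format(int(round(self.amount * 100)))
        # otherwise, fall back to numeric formatting (two decimals by default)
        return format(self.amount, spec or ".2f")

m = Money(1.5)
default_form = "{0}".format(m)     # uses the ".2f" fallback
cents_form = "{0:cents}".format(m)  # uses the custom spec
```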

Changes to the Class and Type System

You might have guessed that "classic classes" finally bite the dust. The built-in class object is the default base class for new classes. This makes room for a variety of new features.

  • Class decorators. These work just like function decorators:

    @art_deco
    class C:
        ...
    
  • Function and method signatures may now be "annotated". The core language assigns no meaning to these annotations (other than making them available for introspection), but some standard library modules may do so; for example, generic functions (see below) can use these. The syntax is easy to read:

    def foobar(a: Integer, b: Sequence) -> String:
        ...
    
  • New metaclass syntax. Instead of setting a variable __metaclass__ in the body of a class, you must now specify the metaclass using a keyword parameter in the class heading, e.g.:

    class C(bases, metaclass=MyMeta):
        ...
    
  • Custom class dictionaries. If the metaclass defines a __prepare__() method, it will be called before entering the class body, and whatever it returns will be used instead of a standard dictionary as the namespace in which the class body is executed. This can be used, among other things, to implement a "struct" type where the order in which elements are defined is significant.

  • You can specify the bases dynamically, e.g.:

    bases = (B1, B2)
    
    class C(*bases):
        ...
    
  • Other keyword parameters are also allowed in the class heading; these are passed to the metaclass' __new__ method.

  • You can override the isinstance() and issubclass() tests, by defining class methods named __instancecheck__() or __subclasscheck__(), respectively. When these are defined, isinstance(x, C) is equivalent to C.__instancecheck__(x), and issubclass(D, C) to C.__subclasscheck__(D).

  • Voluntary Abstract Base Classes (ABCs). If you want to define a class whose instances behave like a mapping (for example), you can voluntarily inherit from the class abc.Mapping. On the one hand, this class provides useful mix-in behavior, replacing most of the functionality of the old UserDict and DictMixin classes. On the other hand, systematic use of such ABCs can help large frameworks do the right thing with less guesswork: in Python 2, it's not always easy to tell whether an object is supposed to be a sequence or a mapping when it defines a __getitem__() method. The following standard ABCs are provided: Hashable, Iterable, Iterator, Sized, Container, Callable; Set, MutableSet; Mapping, MutableMapping; Sequence, MutableSequence; Number, Complex, Real, Rational, Integer. The io module also defines a number of ABCs, so for the first time in Python's history we will have a specification for the previously nebulous concept of a "file-like" object. The power of the ABC framework lies in the ability (borrowed from Zope interfaces) to "register" a concrete class X as "virtually inheriting from" an ABC Y, where X and Y are written by different authors and appear in different packages. (To clarify, when virtual inheritance is used, the mix-in behavior of class Y is not made available to class X; the only effect is that issubclass(X, Y) will return True.)

  • To support the definition of ABCs that require concrete classes to actually implement the full interface, the decorator @abc.abstractmethod can be used to declare abstract methods (only in classes whose metaclass is or derives from abc.ABCMeta).

  • Generic Functions. The inclusion of this feature, described in PEP 3124, is somewhat uncertain, as work on the PEP seems to have slowed down to a standstill. Hopefully the pace will pick up again. It supports function dispatch based on the type of all the arguments, rather than the more conventional dispatch based on the class of the target object (self) only.
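Several of the points above (the new metaclass keyword, abstract methods, and virtual registration) can be sketched together with the abc machinery as it exists in released Python 3; Shape, Square and Circle are invented for illustration:

```python
import abc

class Shape(metaclass=abc.ABCMeta):   # new metaclass syntax in the class heading
    @abc.abstractmethod
    def area(self):
        ...

class Square(Shape):                  # a concrete subclass must implement area()
    def __init__(self, side):
        self.side = side
    def area(self):
        return self.side ** 2

class Circle:                         # written independently, no inheritance
    def __init__(self, r):
        self.r = r
    def area(self):
        return 3.14159 * self.r ** 2

# "Virtual inheritance": only issubclass()/isinstance() are affected;
# Circle gains none of Shape's behavior.
Shape.register(Circle)
```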

Other Significant Changes

Just the highlights.

Exception Reform

  • String exceptions are gone (of course).
  • All exceptions must derive from BaseException and preferably from Exception.
  • We're dropping StandardError.
  • Exceptions no longer act as sequences. Instead, they have an attribute args which is the sequence of arguments passed to the constructor.
  • The except E, e: syntax changes to except E as e; this avoids the occasional confusion caused by except E1, E2:.
  • The variable named after as in the except clause is forcefully deleted upon exit from the except clause.
  • sys.exc_info() becomes redundant (or may disappear): instead, e.__class__ is the exception type, and e.__traceback__ is the traceback.
  • Two optional attributes are added: __context__ is set to the "previous" exception when an exception occurs in an except or finally clause; __cause__ can be set explicitly when re-raising an exception, using raise E1 from E2.
  • The old raise syntax variants raise E, e and raise E, e, tb are gone.
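The new as binding, explicit chaining, and the args attribute from the list above fit together like this (note that the cause and arguments are captured inside the except clause, since the variable bound by as is deleted on exit):

```python
try:
    try:
        {}["missing"]
    except KeyError as exc:
        # explicit chaining: `raise ... from ...` sets __cause__
        raise ValueError("lookup failed") from exc
except ValueError as err:
    cause = err.__cause__      # the original KeyError
    arguments = err.args       # constructor arguments, no longer via indexing
```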

Integer Reform

  • There will be only one built-in integer type, named 'int', whose behavior is that of 'long' in Python 2. The 'L' literal suffix disappears.
  • 1/2 will return 0.5, not 0. (Use 1//2 for that.)
  • Octal literal syntax changes to 0o777, to avoid confusing younger developers.
  • Binary literals: 0b101 == 5, bin(5) == '0b101'.
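The integer changes above in one small sketch:

```python
half = 1 / 2        # true division, even on ints
floored = 1 // 2    # floor division gives the old behavior
octal = 0o777       # new octal notation
binary = 0b101      # binary literals
big = 2 ** 100      # a single int type of arbitrary size; no 'L' suffix
```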

Iterators or Iterables instead of Lists

  • dict.keys() and dict.items() return sets (views, really); dict.values() returns an iterable container view. The iter*() variants disappear.
  • range() returns the kind of object that xrange() used to return; xrange() disappears.
  • zip(), map(), filter() return iterables (like their counterparts in itertools already do).
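A quick demonstration of the view and iterator behavior described above:

```python
d = {"a": 1, "b": 2}
keys = d.keys()                          # a set-like view, not a list
overlap = keys & {"a", "c"}              # views support set operations
r = range(3)                             # lazy, like the old xrange()
squares = list(map(lambda x: x * x, r))  # map() yields an iterator; list() forces it
```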

Miscellaneous

  • Ordering comparisons (<, <=, >, >=) will raise TypeError by default instead of returning arbitrary results. The default equality comparisons (==, !=, for classes that don't override __eq__) compare for object identity (is, is not). (The latter is unchanged from 2.x; comparisons between compatible types in general don't change, only the default ordering based on memory address is removed, as it caused irreproducible results.)
  • The nonlocal statement lets you assign to variables in outer (non-global) scopes.
  • New super() call: Calling super() without arguments is equivalent to super(<this_class>, <first_arg>). It roots around in the stack frame to get the class from a special cell named __class__ (which you can also use directly), and to get the first argument. __class__ is based on static, textual inclusion of the method; it is filled in after the metaclass created the class object (but before class decorators run). super() works in regular methods as well as in class methods.
  • Set literals: {1, 2, 3} and even set comprehensions: {x for x in y if P(x)}. Note that the empty set is set(), since {} is an empty dict!
  • reduce() is gone (moved to functools, really). This doesn't mean I don't like higher-order functions; it simply reflects that almost all code that uses reduce() becomes more readable when rewritten using a plain old for-loop.
  • lambda, however, lives.
  • The backtick syntax, often hard to read, is gone (use repr()), and so is the <> operator (use !=; it was too flagrant a violation of TOOWTDI).
  • At the C level, there will be a new, much improved buffer API, which will provide better integration with numpy. (PEP 3118)
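Three of the items above (zero-argument super(), the nonlocal statement, and set comprehensions) in a small sketch with made-up classes:

```python
class Base:
    def greet(self):
        return "base"

class Child(Base):
    def greet(self):
        # zero-argument super(): equivalent to super(Child, self)
        return "child+" + super().greet()

def make_counter():
    count = 0
    def bump():
        nonlocal count   # rebind the variable in the enclosing scope
        count += 1
        return count
    return bump

evens = {x for x in range(10) if x % 2 == 0}   # a set comprehension
```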

Library Reform

I don't want to say too much about the changes to the standard library, as this is a project that will only get under way for real after 3.0a1 is released, and I will not personally be overseeing it (the core language is all I can handle). It is clear already that we're removing a lot of unsupported or simply outdated cruft (e.g. many modules only applicable under SGI IRIX), and we're trying to rename modules with CapWords names like StringIO or UserDict, to conform with the PEP 8 naming standard for module names (which requires a short all-lowercase word).

And Finally

Did I mention that lambda lives? I still get the occasional request to preserve it, so I figured I'd mention it twice. Don't worry, that request has been granted for over a year now.

Speaking Engagements

I'll be presenting (or have presented) this material in person at several events.

The slides from OSCON are up: http://conferences.oreillynet.com/presentations/os2007/os_vanrossum.ppt (earlier versions were nearly identical).



About the Blogger

Guido van Rossum is the creator of Python, one of the major programming languages on and off the web. The Python community refers to him as the BDFL (Benevolent Dictator For Life), a title straight from a Monty Python skit. He moved from the Netherlands to the USA in 1995, where he met his wife. Until July 2003 they lived in the northern Virginia suburbs of Washington, DC with their son Orlijn, who was born in 2001. They then moved to Silicon Valley where Guido now works for Google (spending 50% of his time on Python!).

This weblog entry is Copyright © 2007 Guido van Rossum. All rights reserved.
