LuaJIT Roadmap 2012/2013
************************

This is the LuaJIT roadmap for 2012/2013, bringing you up to date on
the current and future developments around LuaJIT. I'm happy to answer
your questions here on the LuaJIT mailing list, on related news
aggregators or by mail.

* Status of LuaJIT 2.0, new features and release planning
* Plans for LuaJIT 2.1, new garbage collector and other features
* Call for Sponsors


LuaJIT 2.0
==========

Current status
--------------

Overall, the LuaJIT 2.0 code base is in good shape to become stable,
soon. The beta releases are already used in production by many
developers and in many different projects.

LuaJIT 2.0 has grown quite a few more architectural ports than expected
in the last roadmap from 2011. But this is a good thing: developers get
to use a stable VM for their target architectures *right now*. And it
gives me more leeway to introduce some major changes to the next
version.

LuaJIT 2.0 already runs on all major operating systems. Soon, it'll
support close to a dozen architectures or architectural variations.
This pretty much covers the complete desktop and server markets, almost
all of the smartphone market and a sizeable chunk of the 32 bit
embedded CPU market, too. Coverage will become even better over time,
due to expected market shake-outs.

LuaJIT is widely considered to be one of the fastest dynamic language
implementations. It features a compact, innovative top-of-the-line
just-in-time (JIT) compiler.

The integrated LuaJIT FFI library is a major additional benefit: it
largely obviates the need to write tedious manual bindings with the
classic Lua/C API. There's no need to learn a separate binding language
-- it parses plain C declarations! The JIT compiler is able to generate
code on par with a C compiler for access to native C data structures.
Calls to C functions can be inlined in JIT-compiled code.

LuaJIT 2.0 has extensive architecture-specific and OS-specific
customizations.
This, together with excellent cross-compilation support, makes LuaJIT
an ideal tool for developers who need to embed a nearly universally
portable, light-weight *and* high-speed dynamic VM into their projects.

[Phew! Enough of the marketing speak for now ... ;-) ]

What's next
-----------

Now that LuaJIT 2.0.0-beta10 is out, a couple of reorganizations will
happen in the source tree. After that, one new optimization and two new
ports will be added.

These are (probably) the last major changes to LuaJIT 2.0 before the
final (non-beta) release. All other planned features will have to wait
for LuaJIT 2.1.

Addition of a minified Lua interpreter
--------------------------------------

A customized, heavily stripped and minimized Lua interpreter will be
included to assist the build process. This weighs in at only 173 KB, or
45 KB compressed. It'll be compiled first during the build process (as
a host executable when cross-compiling).

The first use case is to run DynASM. This allows generating the
machine-specific files for the current target architecture at build
time. Which in turn allows the removal of various pre-translated files.

The addition of a minimal Lua interpreter opens up more options for
customizing and simplifying the build process in the future. E.g. most
of the C code, that's only used at build time, can be replaced with Lua
code.

The program to generate the (mostly illegible) minified C source code
for the Lua interpreter will be included. Security-conscious people can
check that it generates identical output, given the original Lua
sources. Or they may use the standard Lua 5.1/5.2 interpreter for the
build process (build option).

Removal of pre-generated buildvm_${arch}.h files
------------------------------------------------

The pre-generated, architecture-specific files buildvm_${arch}.h
contain the LuaJIT interpreter-generator for each architecture, ready
for consumption by a C compiler to generate the 'buildvm' executable.
The actual sources are in the buildvm_${arch}.dasc files. The assembler
source code of the interpreter needs to be translated with DynASM,
which is a Lua program. To avoid a chicken-and-egg situation, those
files had to be shipped pre-generated.

Due to the proliferation of architectures and architectural variations,
the pre-generated files have already grown to 844 KB. Compressed, this
adds only 133 KB to the released tar.gz files, but that's still too
much. And more is to come.

Also, even a single-line change in one of the *.dasc files triggers
lots of changes in the corresponding *.h file. This causes needlessly
big commits in the git repository.

The addition of a minified Lua interpreter solves this problem: the
pre-generated buildvm_${arch}.h files can be removed. Only the output
file for the selected target architecture will be translated with
DynASM at build time, utilizing the minified Lua interpreter.

Many more architectural variations can now be added with no concern
over the size of the intermediate *.h files.

In case you're following the git repository: it's recommended that you
do a 'make cleaner' [sic!] to clean up your build tree, right after the
big commits for this change arrive. It should still work without that
step, though.

Move lib/* to src/jit/*
-----------------------

The JIT-compiler-specific Lua modules currently shipped in lib/* need
to be installed in the package path, relative to a 'jit' directory,
before they can be used.

To allow testing of the un-installed command line executable from
within the 'src' directory, the modules will be moved to src/jit/*.
Other hierarchies (e.g. src/ffi/*) may be added in the future.

The 'install' target of the top-level Makefile will of course be
adjusted accordingly. Watch out if you've modified this file or if
you've automated the install process with other tools.

New optimization: Allocation sinking and store sinking
------------------------------------------------------

A corporate sponsor, who wishes to remain anonymous, has sponsored the
development of allocation sinking and store sinking optimizations for
LuaJIT.

Avoiding temporary allocations is an important optimization for
high-level languages. LuaJIT already eliminates many of these with
multiple techniques: e.g. floating-point numbers aren't boxed and the
JIT compiler eliminates allocations for most immutable objects. Alas,
traditional techniques to avoid the remaining allocations (escape
analysis and scalar replacement of aggregates) are ineffective for
dynamic languages.

The goal of this sponsorship is to research the combination of
store-to-load-forwarding (already implemented) with store sinking and
allocation sinking (to be implemented). This innovative approach is
highly effective in avoiding temporary allocations in the fast paths,
even in the presence of many slow paths to which the temporary object
may escape.

This approach is most effective for dynamic languages, but may be
successfully applied elsewhere, when the classic techniques fail.

Work for this feature is currently in progress.

New port: ARM VFP support and hard-float EABI support
-----------------------------------------------------

A corporate sponsor, who wishes to remain anonymous, has sponsored the
VFP support (hardware FPU) and the hard-float EABI support for the ARM
port.

After that work is complete, the ARM port of LuaJIT can be built for
three different CPU/ABI combinations:

* ARMv5+, soft-float EABI, soft-float FP operations (already exists)
* ARMv6+, soft-float EABI, VFPv2+ FP operations
* ARMv6+, hard-float EABI, VFPv2+ FP operations (e.g. Debian armhf)

Work on the VFP support and hard-float support for the ARM port is
scheduled for Q3 2012.
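
To make the allocation sinking and store sinking section further above
more concrete, here is a minimal sketch in plain Lua of the kind of
code such an optimization targets. The helper names (point, add, sum)
are made up for illustration and are not part of LuaJIT. Each loop
iteration creates two short-lived tables; the idea is that the
allocations and the stores into them are moved ("sunk") to the rarely
taken side exits, so the fast path only has to keep x and y around.
Whether a particular LuaJIT version actually sinks the allocations in
this exact loop is an assumption, not something this roadmap promises.

```lua
-- Hypothetical helpers, for illustration only.
local function point(x, y)
  return { x = x, y = y }          -- temporary allocation
end

local function add(a, b)
  return point(a.x + b.x, a.y + b.y)
end

local function sum(n)
  local acc = point(0, 0)
  for i = 1, n do
    -- Two temporary tables per iteration: point(i, 2*i) and the
    -- result of add(). Neither escapes the loop on the fast path,
    -- except for the final value of acc.
    acc = add(acc, point(i, 2 * i))
  end
  return acc.x, acc.y
end

print(sum(100))  -- prints 5050 and 10100
```

The code runs unchanged on any Lua 5.1-compatible VM; only the number
of allocations performed in the compiled loop would differ.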

New port: PPC32on64 interpreter for PS3 and XBox 360
----------------------------------------------------

Current-generation consoles based on PowerPC CPUs cannot run the
existing PPC port of LuaJIT. Several changes are needed:

* The JIT compiler must be disabled for the consoles, as the
  hypervisors do not allow execution of code generated at runtime.

* Changes to the LuaJIT interpreter to run as a 32 bit program on PPC64
  (PPC32on64). Registers are 64 bit wide, even though pointers are
  still 32 bit. This affects e.g. the carry bit and pointer addressing.
  The assembler code needs to be adapted.

* Some common PPC instructions are micro-coded on the console CPUs,
  which causes unwanted slow-downs. These instructions need to be
  replaced with other instruction sequences.

* Support for modified calling conventions.

These changes allow embedding the LuaJIT 2.0 interpreter in PS3 or
XBox 360 projects, with a substantial speedup compared to the standard
Lua 5.1 interpreter.

The console ports will be integrated some time after the build process
reorganizations are complete.

Minor new features
------------------

The following minor features are on my TODO list for LuaJIT 2.0:

- Add 'goto' statement and labels, compatible with Lua 5.2. This
  feature will also be available from the Lua 5.1 mode of LuaJIT 2.0,
  where 'goto' is not a keyword. The parser figures out whether it's a
  variable name or a statement.

- Support '%a' and '%A' for string.format and parse hexadecimal
  floating-point numbers (0x1.2a7p9 => 596.875) independent of the
  C99-conformance of the C library (works even with MSVCRT).

- Other Lua 5.2-compatibility features: Return result status for
  os.execute() and pipe close. Support extra format specifiers for
  io.lines() and fp:lines().

Feature freeze
--------------

After the above features have been implemented, beta11 will be released
and a feature freeze will be announced: no new features will be
accepted into the LuaJIT 2.0 code base.
Bug fixes to existing features will always be accepted, of course.

I'm willing to make small concessions for the FFI library, as it's
relatively young. Minor upwards-compatible features, that are important
for usability, might make it into the code base, even after the feature
freeze (e.g. backports from LuaJIT 2.1).

Release plans
-------------

After the feature freeze and a concerted cleanup effort, several
release candidates and the final 2.0.0 release will be put out. My goal
is to complete all of this before the end of 2012.

Bug fixes will be accumulated in the git repository, as usual. New dot
releases (2.0.x), which include all of these fixes, will be made
available at irregular intervals.

I'm planning to give LuaJIT 2.0 LONG-TERM SUPPORT, provided there's
sufficient interest in the community and continued sponsorship.

The LuaJIT 2.0 release will likely be maintained and supported for
several years. It will be updated to fix future incompatibilities, e.g.
with new toolchain or OS releases.


LuaJIT 2.1
==========

After LuaJIT 2.0 has become stable, work on LuaJIT 2.1 may begin. This
section is intended to give you a short overview of my plans for
LuaJIT 2.1.

Compatibility
-------------

A new release is always a good point to do some cleanup. LuaJIT has
accumulated quite a bit of slack during the 2.0 development phase. And
some of that has to go, e.g. the x87-compatibility in the interpreter
for x86 CPUs without SSE2.

Other features planned for removal will be announced in a separate
message, before work on LuaJIT 2.1 starts.

But there's one important message: compatibility with Lua 5.1 is there
to stay!

Many users of LuaJIT, especially those with big code bases, have a
heavy investment in Lua 5.1-compatible infrastructure, tools,
frameworks and in-house knowledge. Understandably, they don't want to
throw away their investment, but they still want to keep up with the
newest developments.

As I've previously said, Lua 5.2 provides few tangible benefits.
LuaJIT already includes the major new features, without breaking
compatibility. Upgrading to be compatible with 5.2, just for the sake
of a higher version number, is neither a priority nor a sensible move
for most LuaJIT users.

To protect the investment of my users and still provide them with new
features, LuaJIT 2.1 will stay compatible with Lua 5.1.

New garbage collector
---------------------

The garbage collector used by LuaJIT 2.0 is essentially the same as the
Lua 5.1 GC. The current garbage collector is relatively slow compared
to implementations for other language runtimes. It's not competitive
with top-of-the-line GCs, especially for large workloads.

The main innovation in LuaJIT 2.1 is a complete redesign of the garbage
collector from scratch: the new garbage collector will be an
arena-based, quad-color incremental, generational, non-copying,
high-speed, cache-optimized garbage collector.

You can read more about the design of the new GC here:

  http://wiki.luajit.org/New-Garbage-Collector

Note: this page is a work-in-progress! More details will be added and
the gaps will be filled in over time.

Planned features
----------------

Based on recognized needs and suggestions from LuaJIT users, here are
some other features, that I'd like to work on. Hopefully, many of them
will make it into LuaJIT 2.1 or future versions. The list is in no
particular order:

- Metatable/__index specialization

  Accesses to metatables and __index tables with constant keys are
  already specialized by the JIT compiler to use optimized hash lookups
  (HREFK). This is based on the assumption that individual objects
  don't change their metatable (once assigned) and that neither the
  metatable nor the __index table are modified. This turns out to be
  true in practice, but those assumptions still need to be checked at
  runtime, which can become costly for OO-heavy programming.
  Further specialization can be obtained by strictly relying on these
  assumptions and omitting the related checks in the generated code.
  In case any of the assumptions are broken (e.g. a metatable is
  written to), the previously generated code must be invalidated or
  flushed.

  Different mechanisms for detecting broken assumptions and for
  invalidating the generated code should be evaluated.

  This optimization works at the lowest implementation level for
  metatables in the VM. It should equally benefit any code that uses
  metatables, not just the typical frameworks that implement a
  class-based system on top of it.

- Value-range propagation (VRP)

  Value-range propagation is an optimization for the JIT compiler: by
  propagating the possible ranges for a value, subsequent code may be
  optimized or conditionals may be eliminated. Constant propagation
  (already implemented) can be seen as a special case of this
  optimization.

  E.g. if a number is known to be in the range 0 <= x < 256 (say it
  originates from string.byte), then a later mask operation
  bit.band(x, 255) is redundant. Similarly, a subsequent test for
  x < 0 can be eliminated.

  Note that even though few programmers would explicitly write such a
  series of operations, this can easily happen after inlining of
  functions combined with constant propagation.

- Hyperblock scheduling

  Producing good code for unbiased branches is a key problem for trace
  compilers. This is the main cause for "trace explosion" and bad
  performance with certain types of branchy code.

  Hyperblock scheduling promises to solve this nicely at the price of
  a major redesign of the compiler: selected traces are woven together
  to a single hyper-trace. This would also pave the way for emitting
  predicated instructions, which benefits some CPUs (e.g. ARM) and is
  a prerequisite for efficient vectorization.

- FFI C pre-processor

  The integrated C parser of the FFI library currently doesn't support
  #define or other C pre-processor features.
  To support the full range of C semantics, an integrated C
  pre-processor is needed. This would provide a nice solution to the
  C re-declaration problem for FFI modules, too.

- Partial C++ support for the FFI

  Full C++ support for the FFI is not feasible, due to the sheer
  complexity of the task: one would need to write more or less a
  complete C++ compiler. However, a limited number of C++ features can
  certainly be supported.

  Of course, one could argue, anything but full support doesn't make
  sense. But you'll never know, unless you try ...

  It would be an interesting task to evaluate what subset of C++ can
  be supported with reasonable effort or which C++ libraries can be
  successfully bound via the FFI. Basically: how far can C++ support
  go, how much effort would be needed and does it really pay off in
  practice?

  Such a project should be split into the evaluation phase and an
  implementation phase, which implements the C++ subset, based on the
  prior evaluation.

- User-definable intrinsics for the FFI

  This is a low-level equivalent to GCC inline assembler: given a C
  function declaration and a machine code template, an intrinsic
  function (builtin) can be constructed and later called. This allows
  generating and executing arbitrary instructions supported by the
  target CPU. The JIT compiler inlines the intrinsic into the
  generated machine code for maximum performance.

  Developers usually shouldn't need to write machine code templates
  themselves. Common libraries of intrinsics for different purposes
  should be provided or contributed by experts.

- Vector/SIMD data type support for the FFI

  Currently, vector data types may be defined with the FFI, but you
  really can't do much with them. The goal of this project is to add
  full support for vector data types to the JIT compiler and the
  CPU-specific backends (if the target CPU has a vector extension).

  A new "ffi.vec" module declares standard vector types and attaches
  the machine-specific SIMD intrinsics as (meta)methods.
  Prerequisites for this project are allocation sinking, the
  user-definable intrinsics and the new garbage collector.

  More about the last two features can be read here:

  http://lua-users.org/lists/lua-l/2012-02/msg00207.html

Most of these features are still in an early planning stage. I'm sure
the community will come up with many more interesting ideas. Which of
these will become a reality depends on the interest in the community
and on sponsorships (see below).


Call for Sponsors
=================

First, I'd like to say a BIG THANK YOU to all LuaJIT sponsors!

Almost all of the recent work on LuaJIT 2.0 has been sponsored by
various corporate sponsors. The full track record is here:

  http://luajit.org/sponsors.html

All of those architectural ports and new features wouldn't have been
possible without your sponsorships!

I think this sends a happy message to the greater open source
community: the open source development model *does* work out and it can
be a sustainable (side) business for its creators!

Nonetheless, I have to look forward: as you've seen above, I've got big
plans for LuaJIT 2.1. In fact, the plans are so big that I fear it may
be hard to get enough sponsorships to cover just the work on the one
major feature, the new garbage collector.

For LuaJIT 2.0, the ports to the various architectures made most of the
money. The companies sponsoring them had a genuine, often urgent,
business need for these ports. Sadly, this source is drying up, as the
major architectures are well covered.

The new garbage collector is certainly a desirable feature and IMHO the
correct next evolutionary step for LuaJIT. Alas, developers have
learned to work around the deficiencies of the current GC (by carefully
avoiding allocations). The benefits of a new garbage collector are hard
to quantify, without actually implementing it. And that's *a lot of
work*, which makes it not exactly cheap. Maybe too expensive for a
single company. It'll be a tough sell in any case.

So far, I've relied exclusively on corporate sponsorships for various
legal and administrative reasons. Ok, so the recent trend towards crowd
funding got me thinking ...

But let's be realistic: the Lua community is small, the LuaJIT
community is even smaller -- it's growing fast, though. I simply don't
know whether it's possible to gather enough people and enough money to
finance the continued development of LuaJIT.

And there's another issue: to me, it looks like the whole crowd funding
idea is rapidly deteriorating into an arms race of marketing experts.
So many people are jumping on that bandwagon now ... you'll never make
it, unless you permanently stay on the front pages somehow. Alas, I'm
not good at marketing and a garbage collector is a very technical and
*very* unsexy project (for most people, anyway). But then, I'd really
love to be proven wrong ...

To be fair, I have to make this statement: I'd really like to work on
LuaJIT and I'd like to continue shaping its future. However, I fear,
without sponsorships I'd have to do more work as a consultant (in
unrelated jobs). That doesn't leave me enough spare time to do a
significant amount of work on LuaJIT.

Therefore, I cannot start working on LuaJIT 2.1, before I've got full
covenants for a) maintaining two major code bases, b) the ground work
to clean up the code base and prepare it for c) the work on the new
garbage collector for LuaJIT 2.1. I estimate this to be worth on the
order of EUR 80K+ ($100K+), only for the near future after the release
of LuaJIT 2.0.

We're not in a hurry, though. I'd like to publicly discuss all options
thoroughly with the LuaJIT community and beyond. I'll open a new topic
on the LuaJIT mailing list right after this posting.

If you require anonymity, please write to me by mail, see:

  http://luajit.org/sponsors.html

Thank you!

[Important note: please do NOT send money, checks or anything like that
to me at this time!
If there's a crowd funding effort or a corporate funding pool, this
will be announced separately.]

--Mike