What Makes a (Graphics) Systems Paper Beautiful

Kayvon Fatahalian, Stanford University

Addendum: I've added a few additional tips for conference papers chairs, papers committee members, and papers sorters.

"SIGGRAPH hates systems papers," I've heard frustrated researchers say.

During a SIGGRAPH PC meeting, I heard a committee member comment disparagingly: "I don't believe there is novelty here, maybe it's a good systems paper."

And from a recent SIGGRAPH post-rebuttal discussion post: "This is clearly a systems paper rather than a research contribution."

This is not an issue of whether systems research has a place in the graphics community. (It does!) Rather, these comments suggest that both graphics paper authors and reviewers hold a misunderstanding of the intellectual value of graphics systems research. Understanding the key principles and values of systems thinking is important both to system designers who wish to communicate their findings and to reviewers evaluating those results. Improving this understanding will make the graphics community better equipped to evaluate and disseminate its valuable systems results.

With this article, I hope to contribute to our shared understanding of (graphics) systems research principles. I suspect that most computer graphics researchers, even those who do not explicitly aim to create new systems, can benefit from applying systems thinking principles to their work.

This article is not an attempt to provide a comprehensive guide to writing systems papers; many excellent guides already exist. For example, I recommend Jennifer Widom's Tips for Writing Technical Papers [Widom 2006] or "How (and How Not) to Write a Good Systems Paper" [Levin and Redell 1983].

For those preferring a TL;DR version: Good systems thinking involves establishing defensible answers to questions such as:

- What are the system's goals, non-goals, and constraints?
- What organizing insight about the structure of the problem makes meeting these requirements possible?
- What were the key design decisions, and why were they preferable to the alternatives?
- Is there evidence that these decisions, and not other factors, were responsible for meeting the stated goals?
- Does the system introduce new capabilities to the field?

I recommend that graphics systems paper authors (and paper reviewers) consider this checklist when describing and evaluating work.

Graphics systems research is important to SIGGRAPH.

Advances in graphics systems have long played an important part in SIGGRAPH's history. In fact, systems work has historically had disproportionate impact, not only on the progress of computer graphics, but on a broad range of technical fields. It is easy to assemble an impressive list of examples.

In recent years we've seen SIGGRAPH papers that describe the HDR photo processing pipeline in modern smartphone cameras [Hasinoff 16][Wadhwa 18], practical VR video generation pipelines [Andersen 16], the design decisions behind the implementation of the widely used OpenVDB library [Museth 13], and the design of systems for large-scale video processing [Poms 18]. SIGGRAPH 2016 had an entire technical papers session on domain-specific languages (DSLs) for graphics and simulation, and in 2018 an entire special issue of TOG was dedicated to the design of modern production rendering systems.

Papers that focus on systems contributions are in the minority at SIGGRAPH, but they are frequently featured. Without question these efforts have carved out an important place in the graphics community, as well as in the broader tech world.

What constitutes a research contribution?

Ask researchers throughout computer graphics to name the defining characteristics of their favorite papers. Regardless of area, I suspect the lists would be similar. Great papers teach us something we didn't know about our field: they contain an idea we had never thought about, make us think differently (or more clearly) about a concept we thought we knew, or introduce a hard-to-create artifact (or tool) that is useful for making future progress. We tend to call these "contributions", a term I've increasingly come to appreciate because it embodies what a good paper does for the field. For example:

The reader learned something new:

- an idea or technique the field had never considered before, or
- a perspective that makes us think differently (or more clearly) about a concept we thought we knew.

Perhaps less surprising, but generally useful to the progress of others:

- a hard-to-create artifact (or tool) that future work can build on.

Graphics systems research, like any good research, is not excused from making intellectual contributions.

As with all good research papers, the intellectual contribution of a systems paper lies much more in the ideas and wisdom the paper contains than in the specific artifact (the implementation) it describes.

However, those not accustomed to assessing systems research can struggle to identify its intellectual contributions because they are often not written down explicitly in the form of an algorithm or equation. Conversely, it is common for paper authors, having spent considerable time implementing and engineering a system (and maybe even deploying it with success to users), to erroneously think that simply describing the system's capabilities and implementation constitutes a "good" systems paper.

Problem definition: a good systems paper describes a system's requirements in terms of goals, non-goals, and constraints.

In some areas of computer graphics, it is common for problem inputs and outputs to be well defined. The challenge is that the problem itself is hard to solve! In contrast, articulating and defining the right problem to solve is a critical part of systems research --- often this is the hardest part. Good systems thinking requires an architect to internalize a complex set of (potentially conflicting) design goals, design constraints, and potential solution strategies. Therefore, in order to assess the contribution of a systems paper, it is essential to clearly understand these problem requirements.

Problem characterization requires a system architect to spend considerable time understanding an application domain and to develop knowledge of hardware, software, human factors, and algorithmic techniques in play. Therefore, the work put into clearly establishing and articulating system requirements is itself part of the intellectual contribution of a graphics systems paper.

In other words, good systems papers often put forth an argument of the flavor:

We are looking to achieve goals A, B, and C under constraints X, Y, and Z. No system existed to do so when we set out on our efforts (otherwise we probably would have used it!), so we have distilled our experiences building systems in this area into a specific set of system requirements. These requirements may not be obvious to readers who have not spent considerable time thinking about or building applications in this space.

This setup is critical for several reasons: it surfaces the problem characterization itself as part of the paper's contribution, it scopes which solution strategies and design alternatives are worth considering, and it establishes the criteria against which the system's design decisions and evaluation will later be judged.

Let's take a look at some examples.

Example 1: Section 1 of the Reyes paper dedicated an entire column of text to Pixar's need to handle unlimited scene complexity, which they defined in terms of scene geometry and texture data. The column also established global lighting effects as an important non-goal, since the authors' experiences at Pixar suggested that global effects could often be reasonably approximated with local computations using texture mapping.

[Figure: goals excerpt from the Reyes paper]

Example 2: A decade later Bill Mark articulated a complex set of design goals for Cg. To improve developer productivity, there was a need to raise the level of abstraction for programming early GPU hardware. Doing so was challenging because developers expected the performance transparency of low-level abstractions (the reason to use GPUs was performance!) and GPU vendors wanted flexibility to rapidly evolve GPU hardware implementations. These goals were quite different from those facing the creators of prior shading languages for offline renderers and were ultimately addressed with an approach that was basically "C for graphics hardware", instead of a graphics-specific shading language like RSL.

[Figure: goals excerpt from the Cg paper]

Example 3: Google's recent HDR+ paper expresses its system requirements as a set of guiding principles for determining which algorithms were candidates for inclusion in a modern smartphone camera application: run fast enough to provide immediate feedback, require no human tweaking, and never produce output that looks worse than a traditional photo.

[Figure: goals excerpt from the HDR+ paper]

Example 4: In Yong He's recent work on the Slang shader programming language, problem requirements were established via a detailed background section (Section 2) that describes how the goal of maintaining modular shader code was at odds with the goal of writing high-performance shaders in modern game engines. This section served both to articulate goals and to discuss design alternatives.

Note to paper reviewers: When serving on the SIGGRAPH PC, I have observed reviewers request that exposition about problem characterization be shortened because the text did not directly describe the proposed system. While all paper writing should strive to be appropriately concise, these suggestions fail to recognize the technical value of accurately characterizing the problem to be solved. Such feedback essentially asks the authors to remove exposition about the paper's key intellectual contribution to make room for additional system implementation detail.

The key insight: many good systems papers put forth a novel organizing observation about the structure of a problem.

Given stated goals and constraints, a systems paper will often propose a formulation of a problem that facilitates meeting these requirements. In other words, many systems papers make the argument:

It is useful to think about the problem in terms of these structures (e.g., using these abstractions, these representations, or by factoring the problem into these modules), because when one does, there are compelling benefits.

Benefits might take the form of: improved system performance (or scaling), enhanced programmer productivity, greater system extensibility/generality, or the ability to provide new application-level capabilities that have not been possible before.

Identifying useful problem structure often forms the central intellectual idea of a systems paper. As in other areas of computer graphics, these elegant conceptual insights can often be summarized in just a few sentences.

Note to paper reviewers: Once an organizing principle for a system is identified, the details of the solution are often quite simple.

For example, consider the Frankencamera [Adams 2010] (a hypothetical sketch of the idea follows below). A timeline is certainly a well-known abstraction for describing sequences of events, but it would be erroneous to judge the Frankencamera paper in terms of the novelty of this abstraction. The contribution of the paper was the observation that circa-2010 camera hardware interfaces were misaligned with the requirements of the computational photography algorithms of the time, and that the two could be reasonably aligned using a timeline-like abstraction backed by a "best-effort" system implementation.

A common error when judging the magnitude of systems contributions is to focus on the novelty or sophistication of individual pieces of the solution. This is a methods-centric view of contributions. Instead, assessment should focus on whether a unique perspective or insight made a simple solution possible. The sophistication of the decision process used to select the right method (or to make tweaks to it) can be more important than the sophistication of the methods ultimately used.
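
To make the timeline idea concrete, here is a minimal hypothetical sketch of what programming against such an abstraction might look like. This is illustrative only: the names (Shot, Sensor, Frame) and their behavior are assumptions for exposition, not the actual Frankencamera/FCam API.

```cpp
// Hypothetical sketch of a timeline-style capture API (illustrative names,
// not the real FCam interface).
#include <cstdio>
#include <initializer_list>
#include <vector>

struct Shot {
    float exposure_ms = 10.0f;  // requested exposure time for this frame
    float gain = 1.0f;          // requested analog gain for this frame
};

struct Frame {
    Shot request;  // the parameters this frame was captured with
    // ... pixel data would live here ...
};

class Sensor {
public:
    // Enqueue a shot on the sensor's timeline: parameters are bound to a
    // specific future frame rather than applied to global sensor state.
    void capture(const Shot& s) { pending_.push_back(s); }

    // Block until the next requested frame arrives. A real implementation is
    // "best-effort": the returned frame records the settings actually achieved.
    Frame getFrame() {
        Frame f{pending_.front()};
        pending_.erase(pending_.begin());
        return f;
    }

private:
    std::vector<Shot> pending_;
};

int main() {
    // An HDR burst: each shot on the timeline carries its own exposure.
    Sensor sensor;
    for (float e : {5.0f, 10.0f, 20.0f}) {
        Shot s;
        s.exposure_ms = e;
        sensor.capture(s);
    }
    for (int i = 0; i < 3; i++) {
        Frame f = sensor.getFrame();
        std::printf("frame exposed for %.1f ms\n", f.request.exposure_ms);
    }
    return 0;
}
```

The point of the abstraction is that capture parameters travel with each request on the timeline, rather than being applied to global sensor state "whenever they take effect" --- exactly what per-frame-control algorithms like an HDR burst require.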

Good systems papers highlight key design decisions (and discuss alternatives to those decisions).

Given a set of requirements, a systems architect is usually faced with a variety of solution strategies. For example, a performance requirement could be solved through algorithmic innovation, through the design of new specialized hardware, or both (modifying an existing algorithm to better map to existing parallel hardware). Alternatively, the path to better performance might best go through narrowing system scope to a smaller domain of tasks. A productivity goal might be achieved through the design of new programming abstractions, which might be realized as a new domain specific language, or via a library implemented in an existing system.
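
To illustrate the last alternative, here is a minimal sketch of how a productivity abstraction might be realized as a library embedded in an existing language rather than as a standalone DSL. Everything here (Pipeline, Stage, the toy image representation) is hypothetical, invented for exposition:

```cpp
// Hypothetical sketch: raising the level of abstraction with a library
// embedded in an existing language (one alternative to building a new DSL).
#include <cstdio>
#include <functional>
#include <utility>
#include <vector>

using Image = std::vector<float>;                  // toy image: a flat pixel array
using Stage = std::function<Image(const Image&)>;  // one image-to-image operation

// Users describe *what* to compute as a composition of stages; the library
// retains the freedom to decide *how* to execute the composition.
class Pipeline {
public:
    Pipeline& then(Stage s) {
        stages_.push_back(std::move(s));
        return *this;
    }
    Image run(Image img) const {
        for (const auto& stage : stages_) img = stage(img);
        return img;
    }

private:
    std::vector<Stage> stages_;
};

int main() {
    auto brighten = [](const Image& in) {
        Image out = in;
        for (float& p : out) p *= 1.2f;
        return out;
    };
    Image result = Pipeline().then(brighten).then(brighten).run(Image(16, 0.5f));
    std::printf("first pixel: %.2f\n", result[0]);  // 0.5 * 1.2 * 1.2 = 0.72
    return 0;
}
```

An embedded library inherits its host language's tooling and interoperability for free, while a standalone DSL buys more room for domain-specific optimization; which path is preferable depends on the stated goals and constraints.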

As a result, a systems paper author must identify the key choices made in architecting their system and elaborate on the rationale for these design decisions. Doing so typically involves discussing the palette of potential alternatives and providing an argument for why the choices made are a preferred way to meet the stated requirements and design goals. It is not sufficient to merely describe the path that was taken without saying why it was deemed a good one.

Discussion of key design decisions provides wisdom and guidance for future system implementors. By understanding the rationale for a system designer's choices, a reader will be able to better determine which decisions made in the paper may be applicable to their own requirements.

While reflecting on their design decisions, researchers should consider the following:

Differentiate key decisions from implementation details. A systems architect should clearly indicate which decisions are carefully considered and central to the system's design (the contributions of the paper they want credit for) and which were made "just to get something working". When less critical decisions must be mentioned for completeness of exposition, it is useful to clarify that "algorithm X was used, but the choice was not deemed fundamental and many other options would likely work equally well."

Make an argument for what "should" be done, not what "can" be done. In other words, good systems architects strive to argue why the decisions made are the preferred way to meet the specified system goals. One common property of a preferred solution is simplicity. Therefore, a common question when assessing the quality of design decisions is whether the prescribed solutions are the simplest approach to achieving the desired goals. Presenting a more elaborate method can be tempting, particularly for students, but a well-architected system eschews complexity that is not fundamental to meeting its goals.

Identify cross-cutting issues. Many important design decisions are informed by cross-cutting issues that only reveal themselves when building an end-to-end system. If only a single design goal or a single aspect of the system were considered, multiple solutions might appear equally viable; it is the end-to-end view of the problem that motivates the architect's choice. For example, in the HDR+ system discussed above, an algorithm that produces the highest-quality output in isolation might still be excluded from the pipeline because it is too slow to provide immediate feedback to the photographer.

Discussion of cross-cutting issues is an important aspect of systems thinking (one that is less common in methods-centric research). Cross-cutting and end-to-end issues are often the reason that more sophisticated techniques from the research community prove less desirable in effective systems. New methods are often (wisely) developed under simplifying assumptions that facilitate the exploration of new techniques; systems thinking must consider the complete context in which those techniques are used in practice.

Note to paper authors: Failure to describe (and subsequently evaluate) design decisions is the most common pitfall in systems paper writing. I have seen submissions that describe intriguing systems be justly rejected because the exposition did not reflect on what had been done and why. These papers failed to provide general systems-building wisdom for the community and read more like enumerations of features or system documentation.

The evaluation: were the key design decisions responsible for meeting the stated requirements and goals?

If a paper clearly describes a system's goals and constraints, as well as articulates key system design decisions, then the strategy for evaluating the system is to provide evidence that the described decisions were responsible for meeting the stated goals.

Particularly when a system's evaluation focuses on performance, it is tempting to compare the proposed system's end-to-end performance against that of competing alternative systems. While such an evaluation demonstrates that performance goals were met, it is equally (and sometimes more) important to conduct experiments that specifically assess the benefit of key optimizations and design decisions. Evaluating why success was achieved is necessary to verify that the central claims of the paper are sound. Failing to perform this evaluation leaves open the possibility that the system's success is due to factors other than the proposed key ideas (e.g., high-quality software engineering).
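
As a sketch of what such an experiment might look like, the hypothetical harness below times the same system with a single design decision toggled on and off (here an assumed "tiling" optimization, with a stand-in workload), rather than only comparing end-to-end numbers against other systems:

```cpp
// Hypothetical ablation harness: measure the system with a key design
// decision enabled and disabled. The workload is a stand-in; a real study
// would run the actual system on representative inputs.
#include <chrono>
#include <cstdio>

struct Config {
    bool use_tiling = true;  // the design decision under evaluation (assumed)
};

double run_pipeline(const Config& cfg) {
    // Stand-in for the real system; pretend tiling halves the work performed.
    long iters = cfg.use_tiling ? 50000000L : 100000000L;
    double acc = 0.0;
    for (long i = 0; i < iters; i++) acc += 1.0 / double(i + 1);
    return acc;  // returned so the compiler cannot discard the loop
}

double time_ms(const Config& cfg, double* sink) {
    auto t0 = std::chrono::steady_clock::now();
    *sink = run_pipeline(cfg);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    Config with_opt;
    Config without_opt;
    without_opt.use_tiling = false;
    double sink = 0.0;
    std::printf("with tiling:    %8.1f ms\n", time_ms(with_opt, &sink));
    std::printf("without tiling: %8.1f ms\n", time_ms(without_opt, &sink));
    return 0;
}
```

If the two configurations perform similarly, the claim that the decision in question is responsible for the system's success is not yet supported by evidence.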

The evaluation: does the system introduce new capabilities to the field?

SIGGRAPH PC meetings often begin with a discussion of the importance of promoting "new ideas", and there is broad agreement that the metrics for evaluation change when papers introduce a new topic area or break new ground. For example, systems that introduce new experiences or new ways of thinking about data (e.g., Photo Tourism [Snavely 2006] or AverageExplorer [Zhu 2014]) have no obvious numerical comparison to prior systems. Evaluating this type of work typically involves demonstrating the new experiences or insights made possible by the new capability. (In these cases, the choice of methods was "good" insofar as it enabled the new capability to be achieved.)

When assessing the merit of a systems paper, it is important for reviewers to consider the extent to which the system introduces new capability to the field, and what the implications of these new capabilities are. While new interaction techniques, such as the two examples cited above, are often easier to identify as providing new capabilities, it can also be the case that dramatic improvements in performance, scale of operation, or programmer productivity transport the field into a "new capability" regime. For example, the ability to provide real-time performance for the first time (enabling interactive applications or human-in-the-loop experiences), or the ability to write applications that leverage increasingly large image databases, could be considered new capabilities if it was previously difficult or impossible for programmers to attempt these tasks.

In these cases, reviewers should be mindful that extensive quantitative comparison to prior systems or methods may have limited value, because prior systems that meet the stated constraints may not exist. Similarly, "user studies" might be less valuable than understanding the extent to which a system allowed its authors (who might themselves be practicing experts in a domain) to perform tasks that had never been performed by the community before. When apples-to-apples evaluations are not realistic to provide, responsibility lies with paper authors to make a clear argument for why the provided evaluation is sufficient to lend scientific credibility to the proposed ideas, and with reviewers to carefully consider the implications of the proposed work. Requests for lengthy numerical evaluation should not be used as a substitute for author and reviewer thought and judgment.

Final Thoughts

I hope this article has highlighted the depth of thought required for good systems research and good systems paper writing. Architecting good systems is a challenging, thoughtful task that involves understanding a complex set of factors, balancing conflicting goals and constraints, assessing a myriad of potential solutions to produce a single working system, and measuring the effects of these ideas.

The approach "we have a result, now just write it up" rarely turns out well when writing a systems paper. Since there is typically not a new proof, equation, or algorithm pseudocode to point to as an explicitly identifiable contribution, the intellectual value in systems work is conveyed through careful exposition that documents wisdom gained from the design process. Personally, I find the act of writing to be a valuable mechanism to achieve clarity about what my work has accomplished. As I attempt to make a case for a system's design, more alternatives and evaluation questions come to mind. (Are we sure we can't take this feature out and get the same result? How do we know we really need this?)

On the flip side, reviewing a systems paper requires considerable thought and judgment. The reviewer must assess their level of agreement with the stated goals, requirements, and design decisions. They must weigh the value of the capabilities afforded by these design decisions and consider their utility and significance to users. Last, they must determine whether there is evidence that the proposed decisions were actually responsible for the outcomes. Since the true test of good systems work is whether its ideas achieve adoption by the broader community over time, a reviewer must draw on their own taste and experience to predict the likelihood of that adoption and to weigh how much wisdom the paper has given them.

I wish everyone good luck with future graphics systems work!


Acknowledgments: Thanks to Andrew Adams, Maneesh Agrawala, Fredo Durand, Bill Mark, Morgan McGuire, Jonathan Ragan-Kelley, Matt Pharr, Peter-Pike Sloan, and Jean Yang for helpful feedback.