Truism: flash is not the same as disk. So why don’t we take advantage of that – rather than hiding it?

Partly it is the human SOP: first build the old thing out of the new stuff. Not to mention the commercial allure of hundreds of millions of SATA interfaces in the wild.

Helping us move on is Transactional Flash (pdf), a paper by Microsoft Research's Vijayan Prabhakaran, Thomas L. Rodeheffer and Lidong Zhou. Vijayan has also co-authored flash papers with Ted Wobber et al., noted elsewhere on StorageMojo.

Flash is a good fit.
The authors note that the essence of all transactional constructs is avoiding in-place data modification, which enables rollback to a known state. Since flash SSDs can’t rewrite data in place anyway, TransFlash makes a virtue of flash necessity.

Flash SSD architectures also have a great deal of internal parallelism: many flash chips, each with multiple planes and blocks, and multiple I/O paths. That parallelism already supports garbage collection and wear-leveling, and can now support WriteAtomic as well.

Finally, the data scattering caused by avoiding in-place data rewrites – typically through copy-on-write strategies – is not the problem for flash that it is for disks: flash excels at fast random reads.

What is TransFlash?
TransFlash is a flash SSD with 3 important features:

  • It exports a transactional interface, WriteAtomic.
  • The flash controller implements cyclic commit, which records commit information in flash’s per-page metadata area (typically 128 bytes) instead of writing the usual separate commit record.
  • Both of these features are implemented in the flash translation layer controller firmware – no hardware engineering required.
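The cyclic commit idea can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the paper's protocol: each page's metadata (a stand-in for the per-page spare area) holds the page's own (page id, version) and a next-link to the next page of the same transaction, with the last page linking back to the first. Recovery treats a transaction as committed only if the links close into a complete cycle, so no separate commit record is needed. The names `PageMeta`, `write_transaction` and `is_committed` are hypothetical, and the sketch ignores garbage collection and aborted transactions, which the real protocols must handle.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# A page version is identified by (logical page id, version number).
PageKey = Tuple[int, int]


@dataclass(frozen=True)
class PageMeta:
    """Hypothetical per-page metadata, standing in for the flash spare area."""
    key: PageKey        # this page's (page_id, version)
    next_link: PageKey  # next page of the same transaction


def write_transaction(flash: Dict[PageKey, PageMeta], pages: List[PageKey]) -> None:
    """Write each page with a next-link; the last page links back to the first.
    No separate commit record is written."""
    for i, key in enumerate(pages):
        flash[key] = PageMeta(key, pages[(i + 1) % len(pages)])


def is_committed(flash: Dict[PageKey, PageMeta], start: PageKey) -> bool:
    """Follow next-links from `start`; committed only if the cycle closes."""
    seen = set()
    cur = start
    while cur not in seen:
        meta = flash.get(cur)
        if meta is None:  # a linked page never reached flash: broken cycle
            return False
        seen.add(cur)
        cur = meta.next_link
    return cur == start


flash: Dict[PageKey, PageMeta] = {}
write_transaction(flash, [(1, 1), (2, 1), (3, 1)])
print(is_committed(flash, (1, 1)))  # True

# Simulate a crash partway through: only 2 of 3 pages reached flash.
partial = [(1, 2), (2, 2), (3, 2)]
for i, key in enumerate(partial[:2]):
    flash[key] = PageMeta(key, partial[(i + 1) % len(partial)])
print(is_committed(flash, (1, 2)))  # False
```

The commit decision falls out of data the device writes anyway, which is why the cyclic approach saves the extra commit-record write.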

The authors named their invention TxFlash, but I like TransFlash better since Tx also abbreviates transmit. It also sounds sexier, a rare quality in computer science naming. Really guys, it will help commercial adoption.

WriteAtomic model
The key API construct is described thusly:

TxFlash exports a new interface, WriteAtomic (p1 . . . pn), which allows an application to specify a transaction with a set of page writes, p1 to pn. TxFlash ensures atomicity, i.e., either all the pages are written or none are modified. TxFlash further provides isolation among multiple WriteAtomic calls. Before it is committed, a WriteAtomic operation can be aborted by calling an Abort. By ensuring atomicity, isolation, and durability, TxFlash guarantees consistency for transactions with WriteAtomic calls.
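The WriteAtomic semantics quoted above can be modeled in a short, hypothetical Python class. Assumptions, loudly: `TxFlashSketch`, `write_atomic`, `commit`, `abort` and `read` are illustrative names, not TxFlash's interface; commit is split out as its own call so there is a window in which an abort can happen; durability and crash recovery are not modeled; and isolation is approximated by rejecting any transaction that touches pages another open transaction holds. What it does show is the copy-on-write core: new page versions are written out of place, and only commit switches the logical-to-physical mapping.

```python
import threading
from typing import Dict, List, Optional


class TxFlashSketch:
    """Hypothetical in-memory model of a WriteAtomic-style device (not the paper's code).

    Physical pages are append-only, mimicking flash's no-overwrite rule:
    staged page versions go out of place, and only commit() switches the
    logical-to-physical mapping, so readers never see a partial transaction.
    """

    def __init__(self) -> None:
        self._physical: List[bytes] = []               # append-only "flash" pages
        self._mapping: Dict[int, int] = {}             # logical page -> live physical index
        self._pending: Dict[int, Dict[int, int]] = {}  # tx id -> staged mapping
        self._locked: set = set()                      # logical pages held by open transactions
        self._next_tx = 0
        self._mutex = threading.Lock()

    def read(self, page: int) -> Optional[bytes]:
        with self._mutex:
            idx = self._mapping.get(page)
            return None if idx is None else self._physical[idx]

    def write_atomic(self, writes: Dict[int, bytes]) -> int:
        """Stage all writes in fresh physical pages; return a transaction id."""
        pages = set(writes)
        with self._mutex:
            if self._locked & pages:  # crude isolation: reject overlapping transactions
                raise RuntimeError("page is locked by another transaction")
            self._locked |= pages
            staged = {}
            for page, data in writes.items():
                self._physical.append(data)  # never overwrite in place
                staged[page] = len(self._physical) - 1
            tx = self._next_tx
            self._next_tx += 1
            self._pending[tx] = staged
            return tx

    def commit(self, tx: int) -> None:
        """Make every staged page visible at once."""
        with self._mutex:
            staged = self._pending.pop(tx)
            self._mapping.update(staged)
            self._locked -= set(staged)

    def abort(self, tx: int) -> None:
        """Drop the staged mapping; the old versions stay live."""
        with self._mutex:
            staged = self._pending.pop(tx)
            self._locked -= set(staged)


dev = TxFlashSketch()
t1 = dev.write_atomic({0: b"inode v1", 1: b"bitmap v1"})
dev.commit(t1)

t2 = dev.write_atomic({0: b"inode v2", 1: b"bitmap v2"})
dev.abort(t2)
print(dev.read(0), dev.read(1))  # b'inode v1' b'bitmap v1'
```

Because the aborted versions were never mapped, reads still return the first transaction's data; on a real device the orphaned physical pages would be reclaimed later by garbage collection.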

The authors compare 3 commit protocols (traditional commit, simple cyclic commit, and back pointer cyclic commit) and evaluate their resource requirements. The table shows that both cyclic protocols reduce I/O overhead; they differ mainly in how they handle aborted transactions.
[Table: TransFlash commit protocols compared]
Simple cyclic commit must erase the pages of an aborted transaction before a newer version of any of those pages can be written. Because that erase has to finish first, response times could suffer if aborted transactions are common.

Compared to traditional commit, the new protocols double transaction throughput because they need neither an additional commit write nor the write ordering that goes with it. This matters most for small transactions; for large transactions, data transfer time dominates, so saving the commit write helps less.
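The throughput argument can be checked with a back-of-the-envelope latency model. Everything here is an assumption for illustration: each ordered batch of writes pays a fixed setup cost plus a per-page transfer cost, traditional commit needs a second ordered batch for the commit record while cyclic commit needs only one, and one transaction is in flight at a time, so throughput is the inverse of latency. The function names and parameter values are made up, not measurements from the paper.

```python
def traditional_commit_time(n_pages: int, setup: float, per_page: float) -> float:
    """Data pages in one batch, then (only after they are durable) a commit record."""
    data_phase = setup + n_pages * per_page
    commit_phase = setup + per_page  # write ordering: must wait for data_phase
    return data_phase + commit_phase


def cyclic_commit_time(n_pages: int, setup: float, per_page: float) -> float:
    """Commit information rides in per-page metadata: one batch, no commit record."""
    return setup + n_pages * per_page


# Hypothetical, purely illustrative parameters (microseconds).
SETUP_US = 200.0    # fixed cost per ordered write batch
PER_PAGE_US = 20.0  # incremental transfer cost per page

for n in (1, 4, 64, 1024):
    trad = traditional_commit_time(n, SETUP_US, PER_PAGE_US)
    cyc = cyclic_commit_time(n, SETUP_US, PER_PAGE_US)
    # With one transaction in flight, throughput ~ 1 / latency,
    # so this latency ratio is also the throughput gain.
    print(f"{n:5d} pages: cyclic commit is {trad / cyc:.2f}x faster")
```

With these numbers a one-page transaction sees exactly a 2x gain, and the gain shrinks toward 1x as the per-page transfer cost of large transactions swamps the saved commit write.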

End-to-end benefit
The authors’ simulations, using a pseudo-device driver under various workloads, found that TransFlash adds minimal overhead. The big win is in file system complexity, which:

. . . can be reduced by using the transactional primitives from the storage system. For example, the journaling module of TxExt3 contains about 3300 LOC when compared to 7900 LOC in Ext3. Most of the reduction were due to the absence of recovery and revoke features and journal-specific abstraction.

The StorageMojo take
TransFlash works on multiple levels:

  • It offers a simple answer to a longstanding problem, atomic multi-page updates, with little added device investment.
  • It creates a high-value storage interface – with its attendant margin enhancement opportunities – for an industry whose current margin cows will soon die.
  • It reduces file system complexity – an under-appreciated issue – while improving performance for small write transactions.

History will favor back pointer cyclic commit (BPCC) as Moore’s Law drives flash translation layer controller performance up and flash storage costs down. Unless someone comes up with something even better.

Whether or not TransFlash ever sees the light of day, the paper is a welcome reminder of the benefits of pushing the envelope. With all the new storage technologies coming online, we’ll have many opportunities to change the I/O landscape in the coming years.

Courteous comments welcome, of course.