
The SSD Endurance Experiment: Testing data retention at 300TB

Geoff Gasior

Solid-state drives are everywhere, and we shouldn’t be surprised. SSDs have long been much faster than mechanical hard drives, and the difference is striking enough for even casual users to perceive. The major holdup was pricing, which has become much more reasonable in recent years. Most modern SSDs slip under the arbitrary dollar-per-gigabyte threshold, and many good ones can be had for 70 cents per gig or less.

Higher bit densities are largely responsible for driving down SSD prices. As flash manufacturers transition to finer fabrication techniques, they’re able to cram more gigabytes onto each silicon wafer. This lowers the per-gig cost for SSD makers, but it also has an undesirable side effect. The higher the bit density, the lower the endurance. The very process that’s making SSDs more affordable is also shortening their life spans.

All flash memory is living on borrowed time. Writing data breaks down the physical structure of individual NAND cells until they’re no longer viable and have to be retired. SSDs have overprovisioned “spare area” to stand in for failed flash, but that runs out eventually, and then what? More importantly, how many writes can current drives take before they fail?

Seeking answers to those questions, we started our SSD Endurance Experiment. This long-term test is in the midst of hammering six SSDs with an unrelenting stream of writes. We won’t stop until all the drives are dead, but we’re pausing at regular intervals to monitor health and performance. Our subjects have now reached the 300TB mark, so it’s time for another check-up—and a new wrinkle. We’ve added an unpowered retention test to see if the drives can hold data when left unplugged for a few days.

If you’re unfamiliar with our experiment, I suggest reading our introductory article on the subject. It outlines the specifics of our setup and subjects in far more detail than I’ll indulge here.

The basics are pretty simple. Our subjects include five different models: Corsair’s Neutron GTX 240GB, Intel’s 335 Series 240GB, Kingston’s HyperX 3K 240GB, and Samsung’s 840 Series 250GB and 840 Pro 256GB. Anvil’s Storage Utilities software provides the endurance test, which writes a series of incompressible files to each drive. We’re also testing a second HyperX SSD with the software’s 46% incompressible “applications” setting to gauge the impact of the write compression tech built into SandForce controllers.

With the exception of the Samsung 840 Series, all of the SSDs have MLC flash with two bits per cell. The 840 Series has TLC NAND, which delivers a 50% boost in storage density by packing an extra bit into each cell. The extra bit makes verifying the contents of the cell more difficult, especially as write cycling takes its toll. That’s why TLC flash typically has lower endurance than its MLC counterpart.

We expected the 840 Series to be the first to show weakness, and that’s exactly what happened. After 100TB of writes, we noticed the first evidence of flash failures in the drive’s SMART attributes. The attribute covering reallocated sectors tallies the number of flash blocks that have been retired and replaced by reserves in the overprovisioned area. There were only a few reallocated sectors at first, but the number grew dramatically on the way to 200TB, and the pace quickened on the path to 300TB.

At our most recent milestone, the 840 Series reports 833 reallocated sectors. Samsung remains tight-lipped about the size of each sector, but if AnandTech’s 1.5MB estimate is accurate, our drive has used 1.2GB of its spare area to replace retired sectors. The 840 Series still has lots of overprovisioned flash in reserve, and it still offers the same user-accessible capacity as it did fresh out of the box. That said, its flash is clearly degrading at a much higher rate than the MLC NAND in the other SSDs—no surprise there.
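
The spare-area figure above is simple arithmetic, and it can be sketched as follows. The 1.5MB sector size is AnandTech's estimate rather than a number confirmed by Samsung, and the function name is only illustrative.

```python
# Figures quoted in the article; the sector size is an unconfirmed estimate.
REALLOCATED_SECTORS = 833
SECTOR_SIZE_MB = 1.5


def spare_area_used_gb(sectors, sector_size_mb, mb_per_gb=1024):
    """Return the spare area consumed by retired sectors, in gigabytes."""
    return sectors * sector_size_mb / mb_per_gb


used_gb = spare_area_used_gb(REALLOCATED_SECTORS, SECTOR_SIZE_MB)
print(f"Spare area used: {used_gb:.1f} GB")
```

The result rounds to 1.2GB whether a gigabyte is counted as 1,024MB or 1,000MB.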

Only two other SSDs have registered reallocated sectors thus far. The HyperX drive we’re testing with incompressible data reported four reallocated sectors after 200TB of writes. That number hasn’t changed since. However, the HyperX has been joined by the Intel 335 series, which now has one reallocated sector.

The Corsair Neutron GTX, Samsung 840 Pro, and Kingston HyperX drive with compressible data are the only ones that remain free of bad blocks after 300TB. Of course, the HyperX has written only 215TB to the flash thanks to its compression mojo.
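
To put the compression savings in perspective, here is a minimal sketch of the ratio between flash writes and host writes for the HyperX running the compressible workload. It assumes the host has written a round 300TB, and the variable names are our own.

```python
# Approximate totals at the 300TB milestone, as quoted in the article.
HOST_WRITES_TB = 300   # data sent to the drive by the test rig
FLASH_WRITES_TB = 215  # data the drive actually committed to NAND

# Ratio of flash writes to host writes; below 1.0 means compression is helping.
flash_to_host_ratio = FLASH_WRITES_TB / HOST_WRITES_TB
savings_pct = (1 - flash_to_host_ratio) * 100

print(f"Flash-to-host write ratio: {flash_to_host_ratio:.2f}")
print(f"Writes avoided by compression: {savings_pct:.0f}%")
```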

In addition to tracking reallocated sectors, we’re monitoring each drive’s health using the included utility software. Samsung’s SSD Magician app reports that the 840 Series and 840 Pro are both in “good” health despite the former’s high reallocated sector count. Intel’s SSD Toolbox says the 335 Series is in good health, as well. Corsair’s SSD utility doesn’t have a general health indicator, and Kingston’s software doesn’t cooperate with the Intel storage drivers on our test rigs. However, we can get health estimates for all the drives using Hard Disk Sentinel, which makes its own judgments based on SMART data.

Drive                              100TB   200TB   300TB
Corsair Neutron GTX 240GB          100%    100%    100%
Intel 335 Series 240GB             88%     73%     58%
Kingston HyperX 3K 240GB           100%    98%     98%
Kingston HyperX 3K 240GB (Comp)    100%    100%    100%
Samsung 840 Pro 256GB              78%     51%     26%
Samsung 840 Series 250GB           66%     19%     1%

Well, that’s not very helpful. HD Sentinel seems to assess health using different SMART attributes for each SSD. The ratings for the 335 Series correspond to the “estimated life remaining” values produced by Intel’s own software. There’s no correlation between the Samsung software and HD Sentinel’s assessment of the 840 Series and 840 Pro, though. It’s unclear why HD Sentinel has so little faith in the 840 Pro, which hasn’t suffered any flash failures. Even the low health ratings for the 840 Series seem a tad pessimistic given the amount of spare area in reserve.

The lack of standardization for wear- and health-related attributes seems to be part of the problem here. Each SSD maker exposes a different mix of variables, making comparisons difficult. We’d like to see SSD vendors agree to offer a common set of attributes covering reallocated sectors, accumulated errors, overall health, and both host and flash writes. Some SSDs don’t even have SMART attributes to track total writes. The Crucial M500 is one example, and we left that drive out of the endurance experiment as a result.

Data retention
The primary goal of this experiment is to see how many writes each SSD can take before it dies. Problems may crop up before the drives stop responding completely, though. We need to know if the SSDs are still viable—not just if they’re still alive.

Anvil’s endurance benchmark has an integrated MD5 test that provides some help on this front. We have it configured to verify the integrity of a 700MB video file pre-loaded on each drive. The file is part of 10GB of static data that sits on the SSDs during the endurance test. Even though that data isn’t disturbed as the endurance test runs, wear-leveling algorithms should move it around in the flash as writes accumulate.
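
For readers who want to run a similar integrity check by hand, the sketch below hashes a file in chunks and compares the result to a known-good MD5 value. It is not Anvil's implementation, which we haven't seen; the function names and the 1MB chunk size are assumptions chosen for illustration.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks so large files need not fit in RAM."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, expected_hex, passes=3):
    """Hash the file several times; return one True/False result per pass."""
    expected = expected_hex.lower()
    return [md5_of_file(path) == expected for _ in range(passes)]
```

Calling `verify_file` with the path to the static file and the hash recorded when it was first copied returns a list such as `[True, True, True]` when every pass matches. Keep in mind that repeated reads may be served from the operating system's cache rather than the drive, so a reboot or a cache flush between passes gives a more meaningful result.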

Thus far, the built-in hash check hasn’t reported any errors. As several of our readers have pointed out, though, the integrated test doesn’t tell us whether data is retained accurately when the system is powered off. We actually considered making unpowered retention testing a staple of our regular check-ups. However, that kind of testing involves days of inactive downtime that we’d rather spend writing to the drives.

With our 840 Series sample clearly wilting, we decided it was worth sacrificing some time on an unplugged retention test. Our 700MB movie file is relatively small, so we swapped in a 200GB TrueCrypt file nearly large enough to fill each drive. Then something odd happened. While running an initial MD5 check on the file we copied, the 840 Series produced an unexpectedly incorrect result. We hashed the file again, and the result was still incorrect. This time, the hash test produced an entirely different string. Third time’s the charm? Nope. Strike three, and another different result.

All the other SSDs passed the initial MD5 check, so we started over with the 840 Series. We re-copied our TrueCrypt file, and the results were correct the first, second, and third time we hashed it. So we repeated the process again. Once more, the 840 Series passed three times in a row. We couldn’t reproduce the initial mismatches.

Puzzled, we shut down our test systems and proceeded with the unpowered portion of the retention test. Five days later, we fired them up again and checked the files. All the drives passed, including the 840 Series.

For a moment, I thought I’d imagined those initial errors. But no, I took screenshots. The SMART attributes also provide corroborating evidence. Before the retention test, the 840 Series’ unrecoverable error count was zero. The drive now says it’s suffered 172 unrecoverable errors. Something went seriously wrong, and Samsung’s error correction mechanism was unable to compensate.

Even though our 840 Series drive appears to have rebounded, it suffered a serious failure. In a normal desktop system, unrecoverable errors could result in permanent file corruption and data loss. I certainly wouldn’t trust our test subject with my own data anymore. Since the drive appears to be operating normally again, we’ll keep it in the experiment, albeit with a black mark on its record.

Disappointingly, only the Samsung and Kingston SSDs have SMART attributes that track unrecoverable errors. So far, the 840 Pro and HyperX drives are free of unrecoverable errors. The Corsair Neutron GTX only tallies “soft ECC correction” events, and it doesn’t report any of those. We’re in the dark with the Intel 335 Series, whose SMART attributes are devoid of error-related variables.

Performance
We benchmarked all the SSDs before we began our endurance experiment, and we’ve gathered more performance data at every milestone since. It’s important to note that these tests are far from exhaustive. Our in-depth SSD reviews are a much better resource for comparative performance data. What we’re looking for here is how each SSD’s benchmark scores change as the writes add up.

Apart from a few anomalies tied to the HyperX drives in the 4KB random read test, all the SSDs are maintaining reasonably consistent performance as the endurance experiment progresses. Even the Samsung 840 Series shows no ill effects.

These tests were conducted with the SSDs connected to the same SATA port in the same system. The drives were secure-erased before testing, giving us a nice apples-to-apples comparison. We also have performance data from the endurance test itself. These numbers track the speed of each loop, which writes about 190GB to the drives. The results are somewhat less reliable, because the endurance test is running simultaneously on six drives split between two test machines. The Corsair, Intel, and Samsung SSDs are connected to 6Gbps SATA ports, while the Kingston drives are limited to 3Gbps connectivity. Keeping those caveats in mind, we can still get a sense of how each SSD’s write speed changes over the course of the experiment.

The Samsung 840 Pro’s write speed spiked dramatically in the first run after our 200TB check-up. Since we secure-erase the drives after each threshold, that result isn’t unexpected. Performance typically increases after a secure erase, and some of the other SSDs exhibit similar behavior. The 840 Pro spiked higher than it did previously because it only wrote 145GB during that first run. There’s no indication of why the Anvil test stopped short of the prescribed 190GB, and there were no issues with subsequent runs. The 840 Pro’s SMART attributes don’t report any errors or programming failures, either.

Apart from that outlier, there’s little change from our post-200TB results. All the SSDs are running the endurance test at about the same speed as they were at the last milestone.

Lessons learned so far
The most important thing to take away from our experiment is that modern SSDs can survive an awful lot of writes without issue. We’re up to 300TB, and all the drives remain functional. The MLC-based models are holding up nicely, with only a handful of bad blocks between them. The TLC NAND in the Samsung 840 Series is degrading much faster, which we expected given the flash’s higher bit density. However, the drive still has plenty of overprovisioned spare area in reserve. And, like the other SSDs, the 840 Series has maintained largely consistent performance overall.

Only a couple of the SSDs have published endurance specifications, and we’ve already blown past those figures. The Kingston HyperX 3K is rated for only 192TB of total writes, while the Intel 335 Series is good for 20GB of “typical client” writes per day for three years, or just 22TB overall. We’ve also far exceeded the volume of writes I’d expect my own SSD to endure over its useful lifetime. The solid-state system drive in my primary desktop has logged a mere 1.3TB of writes since I installed it 18 months ago.

To be fair, our endurance experiment has lower write amplification than typical client workloads. Anvil’s test is composed almost entirely of sequential writes, while real-world desktop activity involves a lot of random I/O. There isn’t a whole lot of data on the typical write amplification for client workloads, but everything I’ve seen and heard from SSD makers suggests a multiplication factor below 10X. If we take my personal usage patterns as an example and use 10X write amplification as a worst-case scenario, it would take nearly 35 years to write 300TB to the flash.
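
The 35-year estimate can be reproduced with a few lines of arithmetic. This sketch uses the figures quoted above (1.3TB of host writes in 18 months, 10X write amplification) and assumes the write rate stays constant; the function name is our own.

```python
def years_to_reach(target_flash_tb, host_tb_written, months_elapsed,
                   write_amplification):
    """Estimate the years needed to write target_flash_tb to the NAND."""
    host_tb_per_year = host_tb_written / (months_elapsed / 12)
    flash_tb_per_year = host_tb_per_year * write_amplification
    return target_flash_tb / flash_tb_per_year


# 1.3TB of host writes over 18 months, with worst-case 10X amplification.
years = years_to_reach(300, 1.3, 18, 10)
print(f"{years:.1f} years to write 300TB to the flash")
```

Lowering the amplification factor stretches the timeline proportionally, so a 5X factor would double the estimate.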

So, yeah, that’s why we’re not using real-world I/O in our endurance experiment. We wouldn’t be able to get results within a reasonable timeframe.

The data we’ve collected suggests that modern SSDs can easily survive many years of typical desktop use. Even TLC-based offerings should have more than enough endurance to handle what the vast majority of consumers will throw at them. That said, mounting flash failures appear to be responsible for the data integrity errors we encountered on the 840 Series. I would have no qualms about using TLC-based SSDs in my own systems, but I would check the SMART attributes periodically to keep an eye out for reallocated sectors. If those start piling up, it’s a good idea to replace the drive. As we saw with the 840 Series, error correction can’t necessarily keep up as flash failures accelerate.
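
One way to keep an eye on reallocated sectors is to log the SMART attribute table periodically and compare raw values over time. The sketch below parses text in the column layout printed by smartmontools' `smartctl -A`. The sample table is hypothetical, not output from any of our drives, and attribute names vary by vendor, so `Reallocated_Sector_Ct` is an assumption that may not match every SSD.

```python
# Hypothetical sample in the `smartctl -A` column layout; not real drive data.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       4
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1234
"""


def parse_smart_attributes(text):
    """Map attribute name to integer raw value for each attribute row."""
    attributes = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attributes[fields[1]] = int(fields[9])
    return attributes


def reallocated_growth(previous, current, name="Reallocated_Sector_Ct"):
    """Return how many sectors were reallocated between two snapshots."""
    return current.get(name, 0) - previous.get(name, 0)


snapshot = parse_smart_attributes(SAMPLE)
print(snapshot["Reallocated_Sector_Ct"])
```

Saving a snapshot every few weeks and passing consecutive pairs to `reallocated_growth` shows whether the count is stable or climbing.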

From the beginning, we knew the 840 Series would be at a disadvantage versus its MLC-based rivals. The results bear that out, and they indicate we probably have a long way to go before the other SSDs start to falter. That’s good news overall, but it means there’s much more writing to do. Stay tuned.
