Here's an idea for penalising senders who use out of date mailing lists: temporarily reject recipients depending on the relative proportion of valid and invalid recipients. The more invalid addresses they try to send mail to, the slower they'll be able to deliver mail.
This isn't suitable for use in all situations, because there are badly-run lists that nonetheless carry legitimate and desirable traffic. There are also well-run lists that will be unreasonably penalised by my site's annual bulk cancellations of departing users. But it might be useful in borderline situations, e.g. when the sender has passed the Spamhaus check but failed a DNSWL check.
The way to implement this is using my rate measurement equation to keep track of each sender's valid and invalid recipients, two rates per sender. When a sender addresses an invalid recipient, update their invalid recipient rate and also calculate their decayed valid recipient rate (which is just a * rold). When a sender addresses a valid recipient, do the complementary calculations. The proportion of valid recipients is rok / (rbad + rok). Pick a random number between 0 and 1 and if it's greater than that proportion, return a 450 (temporary rejection) otherwise return 250 (ok) or 550 (bad).
If you are feeling mean you can crank up the deferral rate faster than the invalid recipient rate. The rationale for this is that a list with a 50% validity rate is shockingly bad and probably deserves more like a 90% deferral rate (say). So calculate (rok / (rbad + rok))n for some n that increases with your meanness, and use the result for comparing with the random number.
If you are feeling kind you can pick your random numbers between 0 and slightly less than 1, so senders don't have to be whiter than white to avoid penalties. This tweak might make the technique less troublesome.
(I initially thought about deferring just valid recipients, but that has the effect of cleaning the invalid recipients out of the sender's queue, so when they retry the deferred valid recipients, the proportion of validity we measure for them will go up. Deferring both valid and invalid recipients solves this problem and makes the implementation simpler and more pleasingly symmetrical.)
This is actually fairly similar to some of the logic in SAUCE. It keeps a single annoyance number per sender which gets increased when they do something bad and decreased when they do something right, and which decays exponentially so past behaviour fades from memory. SAUCE's main penalty is teergrubing, though it will defer incoming mail if provoked enough, or if it is greylisting the sender. I don't think there's much point in teergrubing except to trigger bugs in spam bots (hence sendmail has a simple greet_pause feature, but no general teergrubing). However there might be some benefit from more intelligent dynamic greylisting.