Three Years of @fewerror

@fewerror, my most popular bot by a very wide margin, turned three years old earlier this week.

’Twas the night before I moved out of my place in Cambridge, to start a new job in London 10 days later. I couldn’t sleep, and decided the time had finally come to automate a running joke about hypercorrection: it’s not funny to correct people’s use of “less” rather than “fewer” on prescriptivist grounds, but it is funny to do so in totally inappropriate contexts. I feel like I’d been talking about automating this for an extremely long time, fretting over the details, but in a moment of clarity I realised it was relatively simple:

  • add part-of-speech tags to the input
  • match the pattern “less w”, where w is basically any part of speech except a plural noun
  • correct the speaker to “fewer w”
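The three steps above could be sketched roughly as follows. This is a toy illustration, not the bot’s actual code: a hardcoded lexicon stands in for a real Penn Treebank part-of-speech tagger, and the function names are invented for the example.

```python
# Toy lexicon standing in for a real POS tagger (tags follow the
# Penn Treebank convention: NN = singular/mass noun, NNS = plural noun).
TOY_TAGS = {
    "less": "JJR",
    "sleep": "NN",    # mass noun: an "inappropriate" (i.e. funny) target
    "bees": "NNS",    # plural noun: the genuinely "wrong" usage, so skip it
}

def tag(words):
    """Stand-in for a real tagger: look each word up, default to NN."""
    return [(w, TOY_TAGS.get(w.lower(), "NN")) for w in words]

def correction(text):
    """Return a 'fewer w' correction, or None (erring on false negatives)."""
    tagged = tag(text.split())
    for (w1, _t1), (w2, t2) in zip(tagged, tagged[1:]):
        # Match "less w" for any w that is not a plural noun.
        if w1.lower() == "less" and t2 != "NNS":
            return "I think you mean \u201cfewer %s\u201d." % w2
    return None

print(correction("I need less sleep"))  # → I think you mean “fewer sleep”.
print(correction("less bees please"))   # → None: a plural noun, no joke here
```

In the toy version, unknown words default to NN and therefore trigger the joke; a production bot would do the opposite and stay quiet when the tagger is unsure.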

There are a few linguistic details you need to work out (the de-facto standard Penn Treebank part-of-speech tags do not distinguish singular nouns from mass nouns; correcting “less adj noun” to “fewer adj” may not be correct) but if you err on the side of false negatives, you get pretty good results from not much code. You may or may not agree that the joke is funny (many people I admire and respect are very firmly in the second camp), but it has 25% more followers than me so it must add some value to the world. I maintain a gallery of praise for @fewerror, not all of which is necessarily sincere…

Over the past three years I have discovered that the hardest part of “automating smug men getting things wrong” is not actually the natural language processing: it is making it behave appropriately and respect the social norms. In some ways this is made harder by the bot’s entire premise being to interrupt with an annoying joke.

  • Initially, the bot searched the public timeline for the word “less” to find its victims, and was not properly rate-limited. This got it suspended within a few hours. Do not underestimate the rate at which people tweet, even in 2013.
  • Then, it only replied to tweets by people it follows – and would automatically follow back if you follow it. But I didn’t implement the converse: if you unfollow it, it should stop talking to you. This Hotel California behaviour is very poor form. It also took me a long time to realise it should un-CC any users mentioned in the tweet who do not also follow it.
  • On the other hand, my poor bot ethics did allow me to manually opt-in Richard Dawkins, leading to the bot’s finest hour. I’m afraid I’m still pleased with myself for doing this, and I justify it to myself as punching up.
  • Even though it tweeted a racist slur within six months of its launch, it took me until March 2016 (over two years! ugh) to add a bad words filter.
  • You have to jump through hoops to handle retweets (native and manual) in a sensible way.
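The follow-back and un-CC rules above amount to a small recipient filter. Here is a sketch of how that rule could look; the `followers` set and the function shape are assumptions for illustration, not the bot’s real interface.

```python
# A sketch of the reply-recipient rule: only talk to people who follow
# the bot, and drop everyone if the author has unfollowed (or never
# followed) — no Hotel California behaviour.
def reply_recipients(author, mentioned, followers):
    """Return the users a reply should address, possibly none at all."""
    if author not in followers:
        return []  # unfollowing (or never following) means opting out
    # Un-CC mentioned users who haven't opted in by following the bot.
    return [author] + [u for u in mentioned if u in followers]

print(reply_recipients("alice", ["bob", "carol"], {"alice", "carol"}))
# → ['alice', 'carol']: bob never followed the bot, so he is un-CC'd
print(reply_recipients("dave", ["alice"], {"alice"}))
# → []: dave doesn't follow the bot, so it stays silent
```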

I feel quite bad about some of the past bad behaviour, most of which I could have anticipated if I had tried. I think it is well-behaved these days, but if you disagree in a way that can’t be solved by unfollowing it, please let me know on Twitter or GitHub. My goal is that no-one should feel the need to block it.

The other unexpected timesink of running a relatively simple Twitter bot is that you get to be your own ops team, dealing with infinite loops in the libraries you use and setting up automatic deployment after tests pass in CI and all these good things. If you’re thinking of running a bot, I’m afraid it’s worth getting deployment and monitoring working from the start: it’ll save you time in the long run. If at all possible, consider outsourcing the hard work with v21’s Cheap Bots Done Quick.

Three years and three jobs later, I’d like to believe this project is done. Maybe one day I’ll make some procedural “art” which is less divisive, but for now I’ll raise a mug to an artefact that still seems to find new fans!
