thinking stiff

Cursebird Leaderboard

Posted in twitter by thinkingstiff on 2009/02/04

The programmer of Cursebird, @richardhenry, has released a preliminary leaderboard (that would have saved me a lot of time three weeks ago!). One interesting thing I see on the list is that @bollocks gets every one of his tweets counted because his username is one of the tracked words. He doesn’t even have to bother cussing.

Leaderboard Problems

Most people have wanted to see a leaderboard, but they haven’t necessarily thought through what that means. As I was working with @ThinkingStiff, I thought a lot about the problems of creating one. Most people want bots excluded, and that seems easy at first. Bots like @ThinkingStiff and @Fuckbot are obviously there just to get to the top. But since Twitter doesn’t forbid bots, the job of excluding them falls to the programmer. A first-pass cleanup would be easy enough, but I imagine new bots will keep coming, and weeding them out will become a daily task.

You also have bots that repost articles, scan websites for keywords, or aggregate data. These are useful outside of the Cursebird universe, and many of them are interesting and have many followers. Should they be excluded? After that come real people who are cussing their hearts out just to make it to the top. They have real accounts, real friends, and are really cussing. As soon as you have a leaderboard, people are going to compete to get to the top. That is just human nature. So pretty soon, you can be sure, the leaderboard will be made up entirely of people who are trying to be there. It’s not easy to tell who is just trying to get on the list and who simply cusses a lot.

And once Cursebird has a list with rules about who can be on it, there will inevitably be “good citizens” out to rid the world of wrongdoers, scanning everyone’s tweets for “fakes” and emailing the programmer to complain. Keeping the leaderboard clean is not a job I would want to tackle manually every day.

Another problem I thought of is the effect of a leaderboard on website performance. Once there is a leaderboard, everyone will be clicking on the people in it to see who they are. Currently, a tweeter’s page displays every swear they have ever made. For someone like @ThinkingStiff, at over 7000 swears, the page takes a long time to load. Displaying only the most recent 500 might help.
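A minimal sketch of that idea, assuming swears are stored in a SQL table. The table name `swears`, its columns, and the index are hypothetical (Cursebird’s actual schema isn’t known), and SQLite stands in for whatever database the site really uses. The point is that an index on (user, id) plus `ORDER BY ... LIMIT` lets the page fetch only the newest rows instead of all 7000.

```python
import sqlite3


def recent_swears(conn: sqlite3.Connection, user: str, limit: int = 500) -> list:
    """Return at most `limit` of a user's swears, newest first."""
    # Hypothetical table: swears(id INTEGER PRIMARY KEY, user TEXT, text TEXT).
    cur = conn.execute(
        "SELECT id, text FROM swears WHERE user = ? ORDER BY id DESC LIMIT ?",
        (user, limit),
    )
    return cur.fetchall()


def demo() -> None:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE swears (id INTEGER PRIMARY KEY, user TEXT, text TEXT)")
    # Composite index so the newest-first lookup for one user avoids a full scan.
    conn.execute("CREATE INDEX idx_swears_user_id ON swears (user, id)")
    conn.executemany(
        "INSERT INTO swears (user, text) VALUES (?, ?)",
        [("ThinkingStiff", f"swear {i}") for i in range(7000)],
    )
    rows = recent_swears(conn, "ThinkingStiff")
    print(len(rows), rows[0])


if __name__ == "__main__":
    demo()
```

A full history could still be offered behind an “older” link that pages with `WHERE id < ?`, so no single request has to load thousands of rows.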

Possible Solutions

It seems like all the manual solutions suck. I thought of a few automated ways to make bots less likely, but all of them cost database processing time, and I don’t know how much spare capacity Cursebird has in a day.

1. Limit the total daily swears to something more human, say 50. Twitter’s limit is 1000 tweets a day, of which about 500 get counted (see my earlier post for the explanation). Counting only the first 50 swears each day would make bots much less useful.

2. Require an account to have at least one tweet over a month old. When someone starts to write a bot, they usually use a new account. If they have to wait a month to see any results, they will probably lose interest.

3. Require an account to have at least 10 followers. Very few people want to follow a bot. Some may follow one at first, but they quickly stop.

4. Exclude duplicate swears. This is a big one and would eliminate most common bots. A bot tweets either random words, which people quickly notice, or a finite list of tweets that get repeated.
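Suggestion #1 could be enforced as swears arrive. Here is a minimal in-memory sketch, assuming the counter sees each captured swear along with its user and day. The class name and the default cap of 50 (the figure suggested above) are illustrative, and a real site would keep these counts in its database and prune old days.

```python
from collections import defaultdict
from datetime import date


class DailySwearCounter:
    """Counts swears per user, ignoring any beyond `cap` in a single day."""

    def __init__(self, cap: int = 50) -> None:
        self.cap = cap
        self.per_day = defaultdict(int)  # (user, day) -> swears counted that day
        self.totals = defaultdict(int)   # user -> all-time counted swears

    def record(self, user: str, day: date) -> bool:
        """Record one captured swear; return True if it counted toward the total."""
        key = (user, day)
        if self.per_day[key] >= self.cap:
            return False  # over the daily cap: seen, but not counted
        self.per_day[key] += 1
        self.totals[user] += 1
        return True


counter = DailySwearCounter()
today = date(2009, 2, 4)
# A bot that gets ~500 swears counted per day now contributes only 50.
counted = sum(counter.record("somebot", today) for _ in range(500))
print(counted, counter.totals["somebot"])
```

A human who cusses a dozen times a day never notices the cap, while a bot running flat out loses about nine tenths of its daily count.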

Suggestion #1 could be done in real time or as a batch job over everyone. Suggestions #2, #3, and #4 would only need to be run on maybe the top 100 cussers when building the leaderboard. Of course, a bot writer can find ways around #3 and #4, but #1 is a big roadblock.
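The batch part (suggestions #2 to #4, run over roughly the top 100) might look like the sketch below. The `Account` fields are hypothetical stand-ins for data Cursebird would have to fetch from Twitter, and treating two swears as duplicates when they match after lowercasing and collapsing whitespace is an assumption; a bot that varies its text slightly would still slip through.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Account:
    # Hypothetical record; field names are illustrative, not Cursebird's schema.
    name: str
    swear_count: int            # raw count, used only to pick candidates
    first_tweet_at: datetime
    followers: int
    swear_texts: list = field(default_factory=list)


def eligible(acct: Account, now: datetime) -> bool:
    old_enough = now - acct.first_tweet_at >= timedelta(days=30)  # suggestion #2
    followed = acct.followers >= 10                                # suggestion #3
    return old_enough and followed


def distinct_swears(acct: Account) -> int:
    # Suggestion #4: each distinct text counts once (case/whitespace-insensitive).
    return len({" ".join(text.lower().split()) for text in acct.swear_texts})


def build_leaderboard(accounts, now: datetime, candidates: int = 100, size: int = 10):
    """Return (distinct_swears, name) pairs for the top `size` eligible accounts."""
    top = sorted(accounts, key=lambda a: a.swear_count, reverse=True)[:candidates]
    scored = [(distinct_swears(a), a.name) for a in top if eligible(a, now)]
    # Highest distinct count first; ties broken alphabetically by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:size]
```

Limiting the expensive checks to `candidates` accounts is what keeps this cheap: the raw counts already exist, so only about 100 accounts need their history and follower counts examined.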

Conclusion

I’ll admit that a bot-free list would be interesting. Tweeters like @mollena are amazing. She just really cusses (and tweets) all the time.


I cursebird more than you do.

Posted in twitter by thinkingstiff on 2009/02/02

Discovering Cursebird

If you follow me on Twitter (and you should, I’m fucking awesome: twitter.com/mattwalton), you know that my fellow tweeters and I recently discovered cursebird.com. Cursebird is a website that tracks, in real time, people who are cussing on Twitter. If you tweet “This has been a long fucking day!”, your tweet will show up on Cursebird’s website for all to see.

The site tracks the following words (and variants): fuck, shit, cunt, dick, cock, twat, bloody, bastard, bollocks. It shows the percentage each word is used and whether its usage is increasing or decreasing. It also shows four pieces of information for an individual tweeter: number of swears, a score out of 100, rank compared to others, and a statement of who you cuss like.

One of the first things people do upon discovering this site is to start cussing up a storm on Twitter and posting their rank. It’s fun to watch your rank and score go up. Strangely, most of my friends were in the top 1000, and several were in the top 500. I guess we’re just a filthy bunch.

After playing with it for a bit, others and I had questions about how the scoring worked, what was being counted (tweets or words), who was number one, and how many swears they had. While most people gave up and went back to their regular tweeting, I created a second Twitter account, @ThinkingStiff, and started trying to figure it all out.

Researching Cursebird

The first thing I wanted to figure out was what it was counting: tweets with cussing in them, the number of cuss words, or some algorithm that judged how “well” you cussed. So I started sending one-word tweets that just said “fuck.” I quickly learned that Twitter doesn’t allow this: if you send the same tweet as your last one, it doesn’t even bother posting it. Then I switched to alternating fuck, shit, fuck, shit… That got past Twitter’s restriction, and I was steadily climbing the charts on Cursebird. At this point it was obvious that Cursebird just counts the number of tweets with a cuss word in them. It doesn’t care how many cuss words a tweet contains or which words you used. So, one question was answered.
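In other words, the count behaves like the sketch below: a tweet adds one to your total if it contains at least one tracked word, no matter how many. The word list is the one Cursebird tracks, as listed earlier; matching a “variant” as any word that starts with a tracked stem is an assumption about how Cursebird matches, and it has obvious false positives (“cocktail”, “Dickens”).

```python
import re

# The nine tracked words; "variants" are assumed to be any word that begins
# with one of these stems (e.g. "fucking", "shitty").
TRACKED = ["fuck", "shit", "cunt", "dick", "cock", "twat", "bloody", "bastard", "bollocks"]
SWEAR = re.compile(r"\b(?:" + "|".join(TRACKED) + r")\w*", re.IGNORECASE)


def swear_count(tweets: list) -> int:
    """One point per tweet containing any tracked word; extra swears add nothing."""
    return sum(1 for tweet in tweets if SWEAR.search(tweet))


print(swear_count(["fuck", "shit", "fuck shit fuck", "long day"]))  # 3
```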

I then learned that Twitter has a limit on how many tweets you can post. When you hit this limit you get this message:

Wow, that’s a lot of Twittering! You have reached your limit of updates for the hour. Try again later.

After some research I discovered it’s not really an hourly limit; it’s a daily limit of 1000 tweets. They limit you in three-hour blocks, so you can send 125 tweets every three hours (8 blocks × 125 = 1000 a day). Once you’ve sent that many, you have to wait until the three hours are up to send more.
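A minimal model of the limit as described above: a fixed budget of 125 posts per three-hour block. Where the blocks start is an assumption (here they are aligned to midnight; Twitter may have anchored them differently), and `FixedWindowLimiter` is a hypothetical helper for reasoning about the limit, not part of any Twitter library.

```python
from datetime import datetime, timedelta


class FixedWindowLimiter:
    """Allows `per_window` posts per fixed block; blocks start at midnight.

    `window` should divide 24 hours evenly (3 hours gives 8 blocks a day).
    Calls are assumed to arrive in non-decreasing time order.
    """

    def __init__(self, per_window: int = 125, window: timedelta = timedelta(hours=3)) -> None:
        self.per_window = per_window
        self.window = window
        self.block_start = None
        self.used = 0

    def _block_for(self, t: datetime) -> datetime:
        midnight = t.replace(hour=0, minute=0, second=0, microsecond=0)
        return midnight + ((t - midnight) // self.window) * self.window

    def try_post(self, t: datetime) -> bool:
        """Return True if a post at time `t` is allowed, and record it."""
        block = self._block_for(t)
        if block != self.block_start:  # new block: the budget resets
            self.block_start = block
            self.used = 0
        if self.used >= self.per_window:
            return False  # "You have reached your limit of updates..."
        self.used += 1
        return True
```

Feeding it one post per minute for a whole day lets exactly 1000 through: 125 in each of the 8 blocks, with the rest rejected.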

The next question I had was how the score worked. It’s out of 100, so my first guess was that it was a percentile. But I was confused, because the same swear count was producing different scores. Then I saw someone’s score of “Not sure of 100.” This was a clue that the processing of the rank and score might be delayed (I’m a database programmer, so this made sense to me). Sure enough, it takes about 15 to 20 minutes for the rank and score to update. You start out at “Lame of 100” with zero swears and move to “Not sure of 100” on your first swear, until your score is processed for the first time. After that it’s a percentile, so a “50 of 100” means you have cussed more than 50 percent of the people tracked on Cursebird. With that, my second question was answered.
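A sketch of the score as described above. The exact comparison (strictly fewer swears) and the rounding are guesses, and `score_label` with its `processed` flag is a hypothetical stand-in for the delayed processing.

```python
from bisect import bisect_left


def percentile(my_count: int, all_counts: list) -> int:
    """Percent of tracked tweeters with strictly fewer swears, rounded to a whole number."""
    ordered = sorted(all_counts)
    fewer = bisect_left(ordered, my_count)  # how many counts are < my_count
    return round(100 * fewer / len(ordered))


def score_label(my_count: int, all_counts: list, processed: bool) -> str:
    if my_count == 0:
        return "Lame of 100"
    if not processed:  # the 15 to 20 minute delay before the first score
        return "Not sure of 100"
    return f"{percentile(my_count, all_counts)} of 100"
```

If most tracked tweeters have only a handful of swears, a few dozen is enough to rank above nearly all of them, which would be consistent with scores reaching “100 of 100” at around 50 swears.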

Along with the score is a statement of who you swear like. You start at “swears like a Mute” with zero swears and work your way up to “swears like a George Carlin Wannabe.” Neither the score nor the statement was that interesting to me, because once you have more than about 50 swears your score is “100 of 100” and your statement stays at “swears like a George Carlin Wannabe.” The programmer (twitter.com/richardhenry) said that it is possible to get “101 of 100,” but I’m pretty sure he was kidding. Or maybe he picked some arbitrarily large number, and if you ever get that high you get the magic score.

Climbing The Ranks

The only remaining questions I had were who was ranked number one and how many swears they had. I was quickly climbing the ranks and guessed it would take me a couple of nights to reach number one. So for a couple of nights, while watching movies, I typed “shit” and “fuck” over and over. I was careful to space them out to one every few minutes because I wasn’t sure what Twitter’s policy was on things like this, and I didn’t want my account closed before I finished my tests. One thing I noticed while doing this was that Cursebird doesn’t pick up all of your tweets, only about half of them. So if you cuss ten times, Cursebird may only see five. I’ll go into this more later.

Something I didn’t expect, which started on day two, was people complaining. I had no followers, because every time someone tried to follow me I blocked them. I did this specifically so no one would have to see my tweets every few minutes. That meant the only place to see my tweets was on Cursebird. And since the whole point of Cursebird was to track cussing, it didn’t make any sense to me that people would care, but they did. One girl told me I was ruining Twitter. One guy reported me as spam. Another guy called me a “motherfucking cheap bastard asshole.” So apparently I wasn’t cussing “right.”

That’s when I started trying to look like a real cussing tweeter, whatever that is. The simple “fuck” and “shit” tweets turned into more elaborate, stream-of-consciousness tweets, all with the word fuck or shit in them. Each one was unique and was just whatever I was thinking about at the time. Coming up with them took more concentration, so I couldn’t really do anything else at the same time. I did this for hours at a time while watching the live Cursebird stream of everyone else cussing. After a while, watching and producing all this banal word vomit was making me feel crazy.

I changed my picture every day so people wouldn’t see too many tweets from the same person. I also searched Twitter (http://search.twitter.com/search?q=%40ThinkingStiff) to make sure no one was complaining about my excessive tweets. It seemed to be working. My new tweets were not offending people (even though many of them were pretty vulgar) and some people even found them interesting.

The Bot

I did this every night for a week and was up to almost 700 swears, but I still wasn’t in 1st. In fact, I couldn’t seem to get past 3rd. It was taking way longer than I expected and I was getting totally sick of it. I started thinking that maybe the programmer, while testing, had created accounts with many thousands of swears that I would never be able to pass. I also thought about writing a bot (a bot, or robot, is a piece of software that automates a task) that could tweet all day long. I was watching the live stream, so I knew there weren’t any other bots running at the time, or I’d have seen their swears.

The very next day I discovered someone else had written a bot (@fuckbot) and was quickly gaining on me. He was 400 swears behind me and, at the rate he was going, would pass me in about four hours. Fuck. So I quickly wrote a bot. It took me about two hours to get it running the way I wanted. I used the 500 or so tweets that I’d written manually, and the bot picked one of them at random and tweeted it. Because I babysat/tweaked/monitored my bot constantly, it was less like a bot and more like a sentient artificial life form. @Fuckbot was now 200 swears behind me. I knew we were both constrained by the same limit of 1000 tweets a day, so I figured that, unless my bot crashed, we’d stay at about the same pace.

And my bot crashed. So I fixed the bug to make sure it wouldn’t happen again and restarted it. He was now 180 behind me. His bot was spewing out random racist crap that looked nothing like real tweets, so I knew he would be getting complaints about it (I was, and I simply said “fuck” and “shit” over and over). He also had a big red Fuckbot avatar. If Twitter had a problem with cussing bots, I would know soon.

Several days passed with both of us running our bots, and Twitter still hadn’t canceled his account. I played with the time between my tweets to see if I could gain any ground on him. I started at 60 seconds and slowly increased it. I was hoping that at some point Cursebird would capture more of my tweets, or that I’d get more in sync with Twitter’s maximum and avoid the downtime every three hours when Twitter starts limiting me. That failed miserably, and @Fuckbot started closing the gap: he was now 140 swears behind me. So I put it back at 60 and haven’t touched it since. That seems to be the best interval, although I haven’t tried going below 60.
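The arithmetic behind that experiment, assuming the fixed three-hour blocks described earlier and a bot that starts posting at the top of each block (both assumptions): any interval up to roughly 86 seconds still hits the 125-post cap in every block, and anything much slower simply posts less. At 60 seconds the 125 posts are used up about 125 minutes into the block, so the last 55 minutes or so are the downtime. This ignores how many posts Cursebird actually captures, which is what the interval experiment was hoping to improve.

```python
WINDOW_SECONDS = 3 * 60 * 60  # one three-hour block
PER_WINDOW = 125              # posts allowed per block
BLOCKS_PER_DAY = 8


def posts_per_block(interval_s: int) -> int:
    """Posts made in one block when posting every `interval_s` seconds from its start."""
    attempts = -(-WINDOW_SECONDS // interval_s)  # ceiling division: posts at 0, i, 2i, ...
    return min(attempts, PER_WINDOW)


def posts_per_day(interval_s: int) -> int:
    return BLOCKS_PER_DAY * posts_per_block(interval_s)


def break_even_interval() -> float:
    """Interval whose steady rate exactly matches the limit (no downtime at all)."""
    return WINDOW_SECONDS / PER_WINDOW


for interval in (60, 80, 90, 120):
    print(interval, posts_per_day(interval))
print(break_even_interval())
```

Under these assumptions, slowing down from 60 seconds can only keep the daily total the same or lower it, never raise it.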

I’m #1

At around 1019 swears (that’s when I noticed), I took the number two position. Soon, @Fuckbot took 3rd. And that would be the end of @Fuckbot’s journey. Twitter blocked him from the public timeline, so while he can still tweet, Cursebird can’t see his tweets. His grand total was 1621 swears (cursebird.com/fuckbot).

With over 1000 swears and @Fuckbot out of the race, I was much less worried about someone overtaking me. But I was still a little worried that I might suffer the same fate as @Fuckbot. I started adding new tweets to my collection on a daily basis to keep it interesting and less repetitive. A few days later, at 2814 swears and after almost two weeks, I noticed I was in 1st place!

I’ve kept the bot running, and as of this writing I have almost 6300 swears (cursebird.com/ThinkingStiff). Now that I’m not worried about getting my account closed, I stopped blocking people and changing my picture. People are talking about @ThinkingStiff a little more often (http://search.twitter.com/search?q=%40ThinkingStiff), but less of it is hateful now.

Another bot has come on the scene recently, @shiteatingfuck. He’s 5500 swears behind me, so he probably won’t catch up until I stop mine. He started out using @Fuckbot’s random-cuss-word method but has since switched to movie quotes, so he’ll probably be around for a while. He’ll take 2nd in less than a week.

Results

Here are my results climbing up the ranks. Your results will be different because there’s been a lot of cussing since I started.

Swears: Rank

  • 60: 471st
  • 70: 345th
  • 80: 252nd
  • 90: 209th
  • 100: 171st
  • 150: 70th
  • 200: 34th
  • 250: 23rd
  • 300: 13th
  • 350: 9th
  • 400: 8th
  • 450: 4th
  • 500: 3rd
  • 700: 3rd <— wrote bot
  • 1019: 2nd
  • 2814: 1st

Here are the people I know of in the top 10. There is no published top 10 list, so this is just from seeing people come up on the live feed. Many people have asked for a top 10 list (see my thoughts on a leaderboard). I wouldn’t mind seeing one either, but after staring at Cursebird for hundreds of hours, I’ve learned that its most fascinating part is the live feed (like Twistori). It’s amazing how it all starts to look the same after a while. Everyone complains in the same way about the same stuff. Just a slice of life.

Top 10 (Feb 2 2009)

  1. @ThinkingStiff
  2. @geofftest <— account closed
  3. @Fuckbot
  4. @shiteatingfuck
  5. @redchinese19 (Thanks @gocards300)
  6. @Mollena

UPDATE (2/4/2009): @richardhenry released a preliminary leaderboard.

Technical

I used TwitterLib, from the Witty project, to access the Twitter API. I randomly pick one of about 500 tweets to post every 60 seconds, and I check that I haven’t posted the same tweet in the last two hours, to reduce the appearance of repetition. Cursebird misses tweets, but as far as I can tell it’s Twitter’s fault: not every tweet shows up in the public timeline or the search API. The search doesn’t seem to match substrings (“contains” searching), so adding “ass” wouldn’t pick up words like “class”; it would be nice to have it on the list of tracked words, or at least “asshole.” If you’re writing your own bot and want to play with the 60-second interval, remember that you can’t get around the limit of 1000 tweets a day (125 every 3 hours). So your goal should be to get as many of your tweets as possible to show up in Twitter Search.
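The selection logic described above can be sketched in Python. The bot itself used TwitterLib, and the actual posting call is deliberately left out here rather than guessing at that library’s API; `TweetPicker` and its parameters are hypothetical names. With about 500 tweets and at most one post a minute, no more than 120 tweets are ever on cooldown, so there are always plenty to choose from.

```python
import random
from datetime import datetime, timedelta


class TweetPicker:
    """Picks a random canned tweet that hasn't been posted within `cooldown`."""

    def __init__(self, tweets, cooldown: timedelta = timedelta(hours=2), rng=None) -> None:
        self.tweets = list(tweets)
        self.cooldown = cooldown
        self.rng = rng or random.Random()
        self.last_posted = {}  # tweet text -> time it was last picked

    def pick(self, now: datetime) -> str:
        fresh = [
            t for t in self.tweets
            if t not in self.last_posted or now - self.last_posted[t] >= self.cooldown
        ]
        if not fresh:
            raise RuntimeError("every tweet was posted within the cooldown window")
        choice = self.rng.choice(fresh)
        self.last_posted[choice] = now
        return choice
```

In the bot’s loop you would call `pick(datetime.now())`, hand the text to your own posting code, and sleep 60 seconds. Keeping the timestamps per tweet (rather than a fixed-length history) makes the two-hour rule hold even if the interval changes.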

