Was It A Failure? My Twitter App

by dez on April 25, 2010 · 9 comments

I’ve been telling a few people about a Twitter application that I was creating. The idea was (and is) a good one, but the problem is that it can’t be scaled up enough to turn loose on the public.

Idea:

  • Index the followers of an account to find out when they’re updating their status
  • Only index status updates that are made from sources that promote interactivity (aka seeing their own timeline).
    • Twitterfeed: Ignored
    • TXT: Ignored
    • Web: Indexed
    • Tweetdeck: Indexed
  • Manually verify sources: 2,548 total sources identified, 1,461 verified, 252 marked as ‘index’. Tweets from unverified sources are still stored until that source is manually verified.
  • Chart by day and by 30 minute increment
  • Deliver trending reports for subscribers on a comparative basis versus previous weeks
  • Allow subscription accounts to have an archive of reports and continually base trending statistics on stored data to allow for more accurate forecasting.
  • Allow one-off accounts to get a quick picture of what’s going on with their followers, show stats for their followers that have already been indexed, and deliver a full report a few days later
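
As a rough illustration of the filtering and charting ideas above, here is a minimal Python sketch. It is not the app's actual code or schema: the `INDEXED_SOURCES` set, the function names, and the sample tweets are all assumptions made up for the example.

```python
from collections import Counter
from datetime import datetime

# Hypothetical whitelist: sources marked 'index' after manual verification.
INDEXED_SOURCES = {"web", "TweetDeck"}


def half_hour_bucket(ts):
    """Return (ISO date, start of the 30-minute slot) for a timestamp."""
    minute = 0 if ts.minute < 30 else 30
    return ts.date().isoformat(), f"{ts.hour:02d}:{minute:02d}"


def chart_activity(tweets):
    """Count tweets per (day, 30-minute slot), skipping non-indexed sources."""
    counts = Counter()
    for source, created_at in tweets:
        if source in INDEXED_SOURCES:
            counts[half_hour_bucket(created_at)] += 1
    return counts


# Made-up sample data: (source, created_at) pairs.
sample = [
    ("web", datetime(2010, 4, 19, 9, 5)),
    ("TweetDeck", datetime(2010, 4, 19, 9, 25)),
    ("twitterfeed", datetime(2010, 4, 19, 9, 10)),  # ignored source
    ("web", datetime(2010, 4, 19, 9, 45)),
]
activity = chart_activity(sample)
```

Comparing the same slot across several weeks of counts like these is what a trending report would be built on.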

——–

I’ve got all but the reporting via email portion built. My server’s IP address and my username are whitelisted (20k calls per hour). I’m able to generate multiple reports to a user without crashing my server (massive learning on indexing was required).  I have 1.7M tweets indexed and reportable. I’m halfway done with an @reply, RT, Retweet button feature that would have allowed for delving into a report to see if there are conversations going on.

And I stopped. The project doesn’t scale to more than a few hundred users. Why? I’m allowed 480,000 API calls to Twitter per day from my server (20k per hour × 24 hours). This means that at MOST I can index 480,000 users per day, at one call each. Twitter has a nasty habit of responding with an empty set of data for a user timeline when you ask for more than 50 of their previous tweets (it’s like a holy crap difference in good responses). If a call comes back empty (but really shouldn’t have), it still counts against my allowed API calls. With only one call per user per day once the system has 480,000 users to index, I can only grab 20 tweets at a time (at that size, an empty response most likely means the user’s timeline is, in fact, empty). I’ve accounted for empty accounts in another table and have also recorded how many times I’ve checked them. Just to answer your question: Yes, I was only trying to index a user once per cycle, even if they were following more than one of the accounts I was providing reports for.
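
The arithmetic behind that ceiling can be sketched in a few lines of Python. The rate limit is the figure quoted above; the function names and the sample follower IDs are hypothetical and only there for illustration. Every call counts, including ones that come back empty, so one call per user per day is the best case.

```python
CALLS_PER_HOUR = 20_000              # whitelisted rate limit quoted above
CALLS_PER_DAY = CALLS_PER_HOUR * 24  # 480,000


def users_indexable_per_day(calls_per_user):
    """How many users fit in one day's budget at a given cost per user."""
    return CALLS_PER_DAY // calls_per_user


def unique_followers(followers_by_account):
    """Union of follower IDs so each user is fetched only once per cycle."""
    unique = set()
    for follower_ids in followers_by_account.values():
        unique.update(follower_ids)
    return unique


# Made-up follower IDs for two hypothetical subscriber accounts.
followers = {"account_a": [1, 2, 3], "account_b": [3, 4]}
to_index = unique_followers(followers)  # follower 3 is fetched only once
```

If a user ever needs a second call (a retry after a bogus empty response, say), the number of users that fit in a day halves.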

I have tweaked my script and made it fast enough to make 4 calls per second to Twitter (I saw a max of 5/s) when I’m asking for 20 updates (the standard response to a user timeline request). By the way, to max out your hourly calls to Twitter you’d have to make 5.55 calls per second, but you’re at the mercy of Twitter’s response time as well, which is usually pretty fast, but can hang periodically.
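
For anyone checking the math, here is a small Python sketch of the throughput numbers. The 20k/hour limit and the 4 calls per second come from the text above; the variable and function names are made up for the example.

```python
CALLS_PER_HOUR = 20_000   # whitelisted rate limit
SECONDS_PER_HOUR = 3_600

# Sustained rate needed to use every allowed call: about 5.56 per second
# (5.55 if you truncate instead of rounding).
rate_to_max_out = CALLS_PER_HOUR / SECONDS_PER_HOUR


def calls_per_day(calls_per_second):
    """Total calls made in 24 hours at a steady rate."""
    return int(calls_per_second * SECONDS_PER_HOUR * 24)


at_four_per_second = calls_per_day(4)                     # 345,600 calls
share_of_budget = at_four_per_second / (CALLS_PER_HOUR * 24)  # 0.72
```

So at a steady 4 calls per second, the script would use roughly 72% of the daily allowance, assuming Twitter never hangs.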

With this information in mind I stopped. The point of this application was to make sure that I got a complete history of these accounts. Not a spotty record (that wouldn’t make for good statistics).

While this attempt at a Twitter application may have failed, what I learned in the process makes it a real success for me personally.

By tweaking my code to make it as fast as possible I learned a LOT about script memory management, arrays of all types, database efficiency, and productive indexing.

The majority of my code is made using functions instead of duplicating code, but I can clean it up even more. I will probably do this to create my own Twitter API usage library for use by me in the future.

I’m still going to explore other interesting things to do with Twitter data in the future. I need to learn OAuth for most of those (this app didn’t require OAuth since I was making all the calls unauthenticated, i.e., I didn’t care about private accounts).

I did learn a few very important pieces of information during this though:

  • Apps that can tell you how many of your followers are online right now and claim to be accurate are lying to you.
    • This would require a feature like Facebook Chat which Twitter doesn’t have (nor do I see them ever having, which is a good thing)
  • Bug testing while code writing is extremely inefficient
    • I’m a software tester. I had to continually stop myself from constantly checking my code as it resulted in a very slow development pace
  • Comments and descriptive variable/function names are your own Table of Contents
    • I’m going to assume this goes for seasoned developers as well, but when you’re trying to debug something you’re writing from scratch it helps to know what you were thinking when you wrote it.
  • Writing a language doc for yourself makes everything easier
    • I fell into this trap almost right away. I was calling something A in one place and B in another. While they were both descriptive of the goal, it made my timeline reading for debugging nearly impossible

While the goal of this particular Twitter app failed because of scope, what I learned in the process was anything but a failure.

Thank you to Kurt, Marty, and Jachin for your time helping me through a few issues that a few hours of Googling couldn’t answer.

Failure should be our teacher, not our undertaker. Failure is delay, not defeat. It is a temporary detour, not a dead end. Failure is something we can avoid only by saying nothing, doing nothing, and being nothing.

– Denis Waitley

There will be more.

–dez

  • Dr. MOM

    Ever onward. As we all have learned from history, many acclaimed people failed more than succeeded…but because of each person and his or her tenacity to persevere…success for the final product occurred. Or, some other success was realized that was not planned. You are on the right track and will find something that will work to help yourself and others.

    • http://iamdez.com dez

      Thanks, mom :-)

  • http://www.astheria.com Kyle Meyer

    Dez,

    If you would have maxed out at 480,000 users:

    - Charge for the service: limits amount of users using the service by reducing conversion rate
    - Cap the service: doing so at even a few hundred users (a feat if it is a pay service), may put you at capacity, but you are profiting within the boundaries presented to you. Charge on a subscription basis and allow new users if some ‘cancel’ their subscription.

    You make money, and if you hit the cap (meaning you are making a ridiculous amount of money), it would be what most people call “a good problem to have.” Perhaps, if in such a situation, Twitter would make some exceptions. I suspect Twitterific, Tweetie, and more make more than 5.55 requests/second.

    • http://iamdez.com dez

      If I was just indexing the person who signed up for the service this wouldn’t be a problem. However I’m indexing the followers of the person who signs up for the service. This means that it may not even take a few hundred accounts to max out the number of followers indexed.

      I like the scheme, but when there’s a cap to the number of users your ability to continually provide better service is also limited. As the base of this service would be to index member accounts’ followers there’s a problem with success of those accounts pushing me over the limit.

      However you’ve got my brain chugging on this so I’m going to try to figure out something.

      Also, apps don’t make more calls than that… the users that use those services/apps are the ones that are using the API calls under authentication. I’m sure Tweetdeck, etc are using their calls for something else, but from what I know there are only 8 companies that Twitter has partnered with to give them the firehose… also, if I had the firehose, I’d need a LOT more storage space. The limited information I’m keeping and the 1.7M tweets are taking up 70MB of storage. However, yes, that is a good problem to have and storage is cheap.

  • http://www.famzoo.com Bill at FamZoo

    Greetings from the land of “failure is our teacher” – Silicon Valley. I second your Mom’s sentiment!

    Just listened to your discussion on Albert Maruggi’s podcast and enjoyed it.

    Very cool idea and valuable exercise.

    As a developer, was kinda surprised by the comment “Bug testing while code writing is extremely inefficient” if I understand it properly. I think testing each little code iteration along the way is a best practice because the mind is still fresh about all of the subtle permutations.

    Cheers,
    Bill

    Keep on truckin’

    • http://iamdez.com dez

      I agree, each iteration for testing is a good idea. I don’t think I explained myself correctly, however. I was committing and testing nearly every line written instead of a feature at a time.

  • dave

    480K users per whitelisted IP … get more IPs. Step 1= proof of concept with 1 IP. Step 2 = get money/funding Step 3 = scale!!

    • http://iamdez.com dez

      More IP addresses won’t work since they are all grouped in under the same authorization for 20k/hour.

  • Pingback: Social Utilities: They’re Everywhere… — iamdez
