I’ve been telling a few people about a Twitter application that I was building. The idea was, and still is, a good one, but the problem is that it can’t be scaled enough to turn loose on the public.
Idea:
- Index the followers of an account to find out when they’re updating their status
- Only index status updates that are made from sources that promote interactivity (i.e., sources where the user is actually seeing their own timeline).
- Twitterfeed: Ignored
- TXT: Ignored
- Web: Indexed
- Tweetdeck: Indexed
- Manually verify sources: 2,548 total sources identified, 1,461 verified, 252 marked as ‘index’. Tweets from unverified sources are still stored until that source has been manually verified.
- Chart by day and by 30 minute increment
- Deliver trending reports for subscribers on a comparative basis versus previous weeks
- Allow subscription accounts to have an archive of reports and continually base trending statistics on stored data to allow for more accurate forecasting.
- Allow one-off accounts to get a quick picture of what’s going on with their followers, show stats for their followers that have already been indexed, and deliver a full report a few days later
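The source filter and the 30-minute charting from the list above can be sketched roughly like this. It is a minimal illustration, not the app’s actual code: the `SOURCE_POLICY` table, the function names, and the `(source, timestamp)` tuple shape are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime

# Hypothetical policy table; the real list had 2,548 sources, most of them
# verified by hand. Sources missing from the table count as "not verified yet".
SOURCE_POLICY = {
    "twitterfeed": "ignore",
    "txt": "ignore",
    "web": "index",
    "tweetdeck": "index",
}


def half_hour_bucket(ts):
    """Return (ISO date, slot) where slot runs 0..47 in 30-minute steps."""
    return ts.date().isoformat(), ts.hour * 2 + ts.minute // 30


def chart_counts(tweets):
    """Count indexable tweets per half-hour slot.

    `tweets` is an iterable of (source, timestamp) pairs. Tweets from
    unverified sources are returned separately so they can be stored
    until someone reviews that source.
    """
    counts = Counter()
    held = []
    for source, ts in tweets:
        policy = SOURCE_POLICY.get(source.lower())
        if policy == "index":
            counts[half_hour_bucket(ts)] += 1
        elif policy is None:
            held.append((source, ts))
        # policy == "ignore": dropped on purpose
    return counts, held
```

Keeping unverified tweets in a separate list mirrors the idea of storing them until the source is checked, so they can be counted later if the source ends up marked as ‘index’.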
——–
I’ve got all but the reporting via email portion built. My server’s IP address and my username are whitelisted (20k calls per hour). I’m able to generate multiple reports to a user without crashing my server (massive learning on indexing was required). I have 1.7M tweets indexed and reportable. I’m halfway done with an @reply, RT, Retweet button feature that would have allowed for delving into a report to see if there are conversations going on.
And I stopped. The project doesn’t scale to more than a few hundred users. Why? I’m allowed 480,000 API calls to Twitter per day from my server (20k per hour × 24 hours). At one call per user, this means that at MOST I can index 480,000 users per day. Twitter has a nasty habit of responding with an empty set of data for a user timeline when you ask for more than 50 of their previous tweets (it’s a holy-crap difference in the number of good responses). If I make a call that comes back empty (but really shouldn’t be), that call still counts against my allowed API calls. Once my system has exactly 480,000 users to index, I’d only be able to index each user once per day, so I can’t afford wasted calls. That means grabbing only 20 tweets at a time, where an empty response most likely means that the user’s timeline is, in fact, empty. I’ve accounted for empty accounts in another table and have also recorded how many times I’ve checked them. Just to answer your question: yes, I was only trying to index a user once per cycle, even though they may have been following more than one of the accounts I was trying to provide reports for.
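Indexing each user once per cycle, even when they follow several reported accounts, comes down to de-duplicating the follower lists before spending any API calls. Here is a minimal sketch of that idea; the function names and the dictionary shape are hypothetical, and the 480,000 default is just the daily budget mentioned above.

```python
def build_index_queue(followers_by_account):
    """Merge follower lists into one queue with each user ID appearing once.

    `followers_by_account` maps an account name to a list of follower IDs.
    A user who follows several reported accounts is queued a single time.
    """
    seen = set()
    queue = []
    for account in sorted(followers_by_account):
        for user_id in followers_by_account[account]:
            if user_id not in seen:
                seen.add(user_id)
                queue.append(user_id)
    return queue


def fits_daily_budget(queue, calls_per_day=480_000):
    """True if one timeline call per queued user fits in the daily allowance."""
    return len(queue) <= calls_per_day
```

For example, `build_index_queue({"a": [1, 2, 3], "b": [3, 4]})` queues user 3 only once, so four calls cover both accounts instead of five.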
I have tweaked my script and made it fast enough to make 4 calls per second to Twitter (I saw a max of 5/s) when I’m asking for 20 updates (the standard response to a user timeline request). By the way, to max out your hourly calls to Twitter you’d have to make roughly 5.56 calls per second (20,000 calls / 3,600 seconds), but you’re at the mercy of Twitter’s response time as well, which is usually pretty fast, but can hang periodically.
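The arithmetic behind those numbers is easy to check. This small sketch uses only the figures from the post (20k calls per hour, 4 to 5 calls per second observed); the variable and function names are made up for the example.

```python
CALLS_PER_HOUR = 20_000          # whitelisted allowance from the post
SECONDS_PER_HOUR = 3_600
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR

# Daily allowance: 20,000 calls/hour over 24 hours.
calls_per_day = CALLS_PER_HOUR * 24

# Sustained rate needed to use every call in the hourly allowance.
required_rate = CALLS_PER_HOUR / SECONDS_PER_HOUR


def daily_calls_at(calls_per_second):
    """Calls completed in a day at a steady rate, ignoring hangs and retries."""
    return round(calls_per_second * SECONDS_PER_DAY)


print(calls_per_day)             # daily allowance
print(round(required_rate, 2))   # calls/second needed to max it out
print(daily_calls_at(4))         # what 4 calls/second delivers per day
```

At a steady 4 calls per second the script would finish 345,600 calls in a day, well short of the 480,000 allowance, and that is before counting calls wasted on empty responses.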
With this information in mind I stopped. The point of this application was to make sure that I got a complete history of these accounts. Not a spotty record (that wouldn’t make for good statistics).
While this attempt at a Twitter application may have failed, what I learned in the process makes it a success for me personally.
By tweaking my code to make it as fast as possible I learned a LOT about script memory management, arrays of all types, database efficiency, and productive indexing.
The majority of my code is made using functions instead of duplicating code, but I can clean it up even more. I will probably do this to create my own Twitter API usage library for use by me in the future.
I’m still going to explore other interesting things to do with Twitter data in the future. I need to learn OAuth for most of those (this app didn’t require OAuth since I was making all the calls unauthenticated; in other words, I didn’t care about private accounts).
I did learn a few very important pieces of information during this though:
- Apps that can tell you how many of your followers are online right now and claim to be accurate are lying to you.
- This would require a feature like Facebook Chat which Twitter doesn’t have (nor do I see them ever having, which is a good thing)
- Bug testing while code writing is extremely inefficient
- I’m a software tester. I had to continually stop myself from constantly checking my code as it resulted in a very slow development pace
- Comments and descriptive variable/function names are your own Table of Contents
- I’m going to assume this goes for seasoned developers as well, but when you’re trying to debug something you’re writing from scratch it helps to know what you were thinking when you wrote it.
- Writing a language doc for yourself makes everything easier
- I fell into this trap almost right away. I was calling something A in one place and B in another. While they were both descriptive of the goal, it made my timeline reading for debugging nearly impossible
While the goal of this particular Twitter app failed because of scope, what I learned in the process was anything but a failure.
Thank you to Kurt, Marty, and Jachin for your time helping me through a few issues that a few hours of Googling couldn’t answer.
Failure should be our teacher, not our undertaker. Failure is delay, not defeat. It is a temporary detour, not a dead end. Failure is something we can avoid only by saying nothing, doing nothing, and being nothing.
– Denis Waitley
There will be more.
–dez