on February 28, 2010 by Justin in Uncategorized, Comments (6)

TweetMiner Degraded Performance – Sun 28th Feb 2010

TweetMiner is experiencing degraded performance on Sun 28th Feb 2010.

Here’s why…

Connections between “Rackspace Cloud Sites” and “Twitter” are being randomly denied. This means that when TweetMiner tries to connect to Twitter to get your tweets, the connection is sometimes denied, seemingly at random.
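For anyone curious what “randomly denied” looks like when measured, here is a rough sketch (not TweetMiner’s actual code) that attempts a number of TCP connections and reports the fraction that failed. The host name, port, and function names are illustrative assumptions, not details from our setup.

```python
import socket


def try_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds, else False."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def failure_rate(attempts, connect=try_connect, host="api.twitter.com", port=443):
    """Fraction of failed connection attempts out of `attempts` tries.

    `host` and `port` are placeholder defaults (assumptions for this sketch).
    `connect` can be swapped out, which makes the function easy to test.
    """
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    failures = sum(1 for _ in range(attempts) if not connect(host, port))
    return failures / attempts
```

Calling `failure_rate(20)` from the affected server and from an unaffected machine would give a crude comparison. A rate well above zero on one side only suggests the problem is tied to that network path or host.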

Neither Rackspace nor Twitter is doing this. For some reason, a node “out there” on the internet between Rackspace and Twitter is causing the problem. When we run a traceroute, we find that hop #5 is the problem. Hop #5 is owned by ATT.

We think this has occurred because every site on “Rackspace Cloud Sites” connects to the outside world via a single IP address. So, along with TweetMiner, there are many other websites all using the same IP address to connect to Twitter from Rackspace.

“In theory” it’s possible that the ATT server (hop #5) is treating so many requests from the same IP to the same server as some kind of “denial of service” attack. If so, it would act in exactly this way, arbitrarily closing down connections.

How can TweetMiner or Rackspace (or Twitter) resolve this? The answer is that none of these services can. The issue lies with the ATT server at hop #5.

So what the heck is TweetMiner going to do about it?

  1. There is a possibility that ATT will fix this tomorrow (Monday). That would make for a nice, fast fix.
  2. Even if this issue does resolve itself, TweetMiner can’t allow this kind of single point of failure to exist within its business infrastructure. We will therefore spend the next week moving TweetMiner off “Rackspace Cloud Sites” and onto a provider with a non-shared IP (server provider to be established).
  3. Since we can’t quantify exactly how long the TweetMiner degradation might last, the best we can say is that we will resolve it as quickly as possible and find a way to credit our customers for the downtime.

It just remains to be said that we are as frustrated about this as you are, and we’re working as hard as possible to fix the issue!

Update (Mon 01 March 2AM PST):

I’ve been working on this all day (Sunday) and built a Twitter API proxy in order to avoid the hop issue above. I installed it, but the server still showed the exact same errors. So I called Rackspace and explained this, and it now turns out that this is probably a Rackspace issue at the infrastructure level. The whole ATT thing was probably a red herring. So, finally, they have acknowledged the problem (after a lot of arguing) and are now looking into it. I will find out more tomorrow (Mon) and post updates here.

Update (Mon 01 March 8:40AM PST):

TweetMiner is back online. The real issue was that another Rackspace customer on the same shared server as TweetMiner had configured PHP not to close sessions. As a result, they had 210,000 open sessions in a single directory. This caused problems for Apache because disk I/O seek times are severely impaired when a directory has more than 10,000 files. The knock-on effect was that the shared hosting resources were being depleted, and TweetMiner’s disk access was being suffocated by the other site.
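As a rough illustration of how this kind of problem could be spotted early, here is a minimal sketch that counts the files in a directory and flags it once it passes a limit. The function names are made up for this example, and the 10,000 default is simply the figure mentioned above, not a universal threshold.

```python
import os


def count_files(directory):
    """Count regular files directly inside `directory` (no recursion)."""
    with os.scandir(directory) as entries:
        return sum(1 for entry in entries if entry.is_file(follow_symlinks=False))


def is_overloaded(directory, limit=10_000):
    """True if the directory holds more files than `limit`.

    The default limit is an assumption taken from the figure quoted in
    this post; the right value depends on the filesystem and workload.
    """
    return count_files(directory) > limit
```

Pointing `is_overloaded` at the PHP session directory (wherever `session.save_path` is set to on a given server) from a scheduled job would give a simple warning before a directory grows to hundreds of thousands of files.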

The good news is it’s up and running again, and I am now seeking alternative hosting arrangements to put TweetMiner on a more dedicated, standalone system.

Update (Tue 02 March 12:18AM PST):

The slow effects have been completely reversed and TweetMiner is running blindingly fast again.

6 Comments

  1. Luigi

    February 28, 2010 @ 8:01 pm

    Thanks for the update Justin. These problems happen :D i don’t doubt you/them will resolve this pretty quickly.

  2. Jenna

    February 28, 2010 @ 8:14 pm

    Justin,
    Thank you for the update and I hope that TweetMiner can resolve this issue as soon as possible and in the best interest of all. Your fans and users are all pulling for you! Thanks for the outstanding service you have provided and I look forward to being able to use TweetMiner again soon.

  3. Imad Naffa

    March 1, 2010 @ 12:46 am

    Thanks for the update Justin. As of 10 pm PST, I was not able to use TM. Will check again in the morning. Imad

  4. jonathan

    March 1, 2010 @ 3:22 am

    Yes – thanks for the update. Hope you resolve it quickly. Good service..

  5. Jennifer

    March 1, 2010 @ 8:32 am

    Thanks Justin! Much appreciated!!!! Thanks for all the hard work!

  6. Neville

    March 1, 2010 @ 8:58 am

    Thanks for working so hard on this. You’re a regular Sherlock Holmes, sniffing out the culprit. Cheers mate.
