" Thanks for that info. "
anytime :)
Post edited April 16, 2010 by StealthKnight
From one point of view, this is somewhat reasonable; I think it could show a piece of this age's culture (taken in a VERY broad meaning, in which the collective textual production of a generation is called 'culture'). However, archiving EVERY SINGLE tweet is really a waste and ridiculous. If they have so much time on their hands (good for them!), they could take a sample of it in some manner.
Arkose: [...]Twitter was full of, er... twits. ;)

And my hovercraft is full of eels! :-) (Sorry, I just had to post this.)
Many of the comments in this very thread (at least the snarky ones) are about 140 characters long.
If we're going to make fun of Twitter (a medium that I think has potential) then we might try to rise above it when we don't have character limits and we claim we're better. Otherwise we're the ones being the twits.
Maybe I'm wrong.
The amusing thing is that most of the posters here don't have a problem with Twitter but with stupid Twitter users; guess what, there are stupid users on message boards, IRC and everywhere else, so don't bash a medium because of what its users are doing. This goes doubly when you are the one who decides who is worthy of being followed, unlike a message board without an ignore function where you get to read all the posts no matter who made them.
As for the archive, it also seems that you don't get that the context for a period is very important: it will be very useful when looking back at these times to be able to have as much data as possible from as many sources as possible in order to establish that context (for example compare the opinions the press had when Obama was elected with those posted on social media services). This goes hand in hand with the severe lack of context-generating information for the past, where we've just started digging for more data to make sense of it all (from letters sent by a mill owner to receipt stubs).
Also, 140 characters occupy roughly 4KB of space on disk (really this is just 140 bytes so the following math is severely inaccurate):
4 KB per tweet * 75 million (the current number of Twitter users) = 300 GB
We now need to extrapolate how many accounts are active and tweeting, which is a little harder to do. A study done by Sysomos showed that 5% of users accounted for 75% of all activity (of which 32% were bots, but we'll ignore that for now).
That means that the total number of active users is actually around:
5/100 * 75 million = 3 750 000
So let's say an average of about 10 tweets per day for these active users; that means
4 KB * 10 * 3 750 000 = 150 GB/day
Assuming constant tweeting (which never happens, but let's just assume), we get
4 KB * 10 * 3 750 000 * 365 = 54.75 TB per year for these 3 750 000 users.
This is a highly inflated number because we will store those tweets as just 140 bytes and not 4 KB (this number was me writing a 140 character text and saving it in notepad as UNICODE); if we were to just do that math, we'd end up with:
140 bytes * 10 * 3 750 000 * 365 = 1.916 TB per year for the top posters.
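For anyone who wants to replay the arithmetic above, here is a rough Python sketch. The function name and the parameter names are made up for illustration, 4 KB is taken as 4000 bytes (an assumption that matches the round figures above), and the tweets-per-day rate is left as a parameter because the result scales linearly with it.

```python
# Back-of-the-envelope yearly storage estimate.
# yearly_bytes and its parameter names are hypothetical, chosen for this sketch.

def yearly_bytes(total_users, active_percent, tweets_per_day, bytes_per_tweet, days=365):
    """Bytes stored per year by the active slice of the user base."""
    # Integer math keeps the result exact: 5% of 75 million = 3 750 000
    active_users = total_users * active_percent // 100
    return active_users * tweets_per_day * bytes_per_tweet * days

TOTAL_USERS = 75_000_000   # user count quoted in the post
ACTIVE_PERCENT = 5         # share of very active users from the Sysomos study

for rate in (1, 10):                   # tweets per active user per day
    for size in (140, 4000):           # raw text vs. the 4 KB file on disk (assumed 4000 bytes)
        total = yearly_bytes(TOTAL_USERS, ACTIVE_PERCENT, rate, size)
        print(f"{rate:>2} tweets/day at {size:>4} bytes: {total / 1e9:,.1f} GB per year")
```

At one tweet per active user per day the 140-byte estimate comes out to about 191.6 GB per year; at ten tweets per day it is about 1.9 TB. Either way it fits on a handful of ordinary drives.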
With this in mind I seriously have a problem understanding how StealthKnight managed to get to those numbers:
StealthKnight: If every user was to post every day and send 55 million messages then it would equal 772092.7897 TB a day. All the messages per day probably figure around to 7.52974 GB. If this were done every day, congress would need 2748.3551 GB for a years worth of twitters. They would have to dedicate a room for twitter messages that are collected in a lifetime.

I would love to hear the math behind that and the numbers it is based on.
And yes, I am one of the people that find a lot of value in Twitter and defend it whenever I find useless bashing.
The thing I hate the most is that people don't say that they don't get Twitter, it's that they say it's useless, without appending the magical words "for me", as in "Twitter is useless for me".
Hey, I LOVE Twitter. Not sure why though.
Anyway speaking of Twitter they're having ads (I mean "promoted tweets"). I know they need to achieve profitability but I don't like tweets from people I don't follow in my feed (the new retweet was a nightmare because of that).
According to their own latest published documents ( http://blog.twitter.com/2010/02/measuring-tweets.html ), they're at 50 million tweets a day. According to the Chirp conference /today/, they're at 55 million tweets a day. They've been around since the beginning of 2007. At a constant 55 million a day since then, with no growth, this makes it 60.225 billion tweets.
If you want a better figure, GigaTweet http://popacular.com/gigatweet/ estimates the current number of tweets at around 12.276 billion at the time of posting. That number's the most realistic IMO.
Assume all messages are 140 characters long, at one byte per character. There's another 20 characters of data necessary for user identification. 160 bytes for each of 12.276 billion tweets = 1,829GiB of data. That's a single 2TB hard drive.
Except this is all just text, not random data. Text compresses /very/ well. Given an average-but-well-written algorithm, even chucking in error-correction and seek-ability indexes, you can easily get 4:1 with just text. That's about 457GiB.
In other words, you could fit /all/ of Twitter on a single laptop hard drive. Newegg has a Western Digital 500GB laptop drive for $75 http://www.newegg.com/Product/Product.aspx?Item=N82E16822136314 or a Seagate 500GB regular hard drive for $55 http://www.newegg.com/Product/Product.aspx?Item=N82E16822148395 .
Worth doing? That's another issue. Implementing it? Possibly costly depending on how much work Twitter does for them. But the storage of that much data isn't expensive today. :P
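The estimate above is easy to replay in a few lines of Python. This is only a sketch: the constant names are invented for illustration, one byte per character is assumed, and the 4:1 compression ratio is the guess from the post, not a measured figure.

```python
# Rough size of the whole Twitter archive, using the numbers from the post.
# All names here are hypothetical; the inputs are the post's own estimates.

TWEETS = 12_276_000_000        # GigaTweet estimate at the time of posting
BYTES_PER_TWEET = 140 + 20     # message text plus user identification, 1 byte per character (assumed)
COMPRESSION_RATIO = 4          # assumed 4:1 for plain text

GIB = 2**30                    # bytes in one GiB

raw_bytes = TWEETS * BYTES_PER_TWEET
raw_gib = raw_bytes / GIB
compressed_gib = raw_gib / COMPRESSION_RATIO

# Upper bound: a constant 55 million tweets a day for three years (1095 days)
upper_bound_tweets = 55_000_000 * 365 * 3

print(f"raw:        {raw_gib:,.0f} GiB")
print(f"compressed: {compressed_gib:,.0f} GiB")
print(f"upper bound on tweet count: {upper_bound_tweets:,}")
```

Swapping in the 60.225 billion upper bound for TWEETS gives roughly five times the storage, which is still only a few drives.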
AndrewC:

Looking back, I have no clue how I managed to get those numbers. Mine was an estimate anyway, based on the numbers in the article. Yours is better.
riumplus: Blah blah blah Twitter blah blah blah math blah blah blah LAPTOP HARD DRIVE RECOMMENDATIONS

I love you.