The New York Times is running an interesting piece about the ever-growing glut of data. The article details IBM's and Google's concern over the data glut and asks whether new and upcoming students are being trained to handle the explosion of data. It is quite a fascinating piece.
At the heart of this concern is data. Researchers and workers in fields as diverse as biotechnology, astronomy, and computer science will soon find themselves overwhelmed with information. Better telescopes and genome sequencers are as much to blame for this data glut as are faster computers and bigger hard drives.
Please click through and read the whole article. It is very good and very true. This topic should be at the forefront for anyone who works in the computer/technology field. First there is the problem of how to store this much data. Currently I work for a small publisher (O'Reilly Media). It is easy to think that a small publisher probably doesn't have huge storage needs. But in the time I've worked here (one full year, going on my second) we just ordered our second storage shelf, this time at almost 14TB. The new shelf has yet to be installed, but the other day my IT coworker was talking to management in a meeting. Our last shelf was around 1TB, and it lasted less than a year. He said that at almost 14TB this one should last us a long time, but then added, "But we say this every time." It is so true, especially with storage so cheap and drives so big. It reminds me of my first computer in the mid-'90s with 10GB of storage. I told my parents I'd never need a bigger hard drive. Then I went away for my freshman year of college and filled it right up with stupid pictures and movie files.
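That "But we say this every time" remark has a simple arithmetic explanation: if storage use grows exponentially, a shelf fourteen times bigger does not last fourteen times as long. Here is a back-of-the-envelope sketch; the growth rate is an illustrative assumption, not a figure from the post.

```python
# If usage grows exponentially, time-to-full scales with the
# logarithm of capacity, not with capacity itself.
import math

def years_to_fill(capacity_tb, current_tb, annual_growth):
    """Years until exponentially growing usage reaches capacity."""
    return math.log(capacity_tb / current_tb) / math.log(1 + annual_growth)

# Starting from 1 TB of data and doubling every year (100% growth),
# a 14 TB shelf fills in under four years, not fourteen:
print(round(years_to_fill(14, 1, 1.0), 1))
```

So even a big jump in capacity buys surprisingly little time once growth compounds, which is exactly why "this should last us a long time" keeps turning out wrong.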
When I worked for the University of Illinois Engineering department the problems were worse. One research group that I worked for had one professor and maybe five students (including undergrads). They were relatively new, so there was no infrastructure or file server, and there really wasn't much money for it anyway. One day I went to the professor's office. He must have had at least 30 hard drives, each at least 500GB if not 1TB. And those were just the hard drives he had; his students carried around a handful themselves. Another research group, with decades of history, started a scanning project. They would scan hundreds of slides at once, each producing around 1MB of data. We installed a file array starting off at 4TB, but expandable to 14. Unfortunately I left and am not sure what they have or need now. My point is that data storage is a huge problem, and it is growing extremely fast. The article mentioned Facebook's one petabyte of photos (I'm guilty of quite a few of those), but that is just one company; many more could have been mentioned. Finally there is even personal storage. Since I got my new camera I am looking at more storage for home, shopping for personal NAS boxes. So I see the basic point: the future of IT is data and what to do with it.
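The scanning project's numbers make for a quick capacity estimate. This is a sketch using the figures from the post (roughly 1MB per slide, an array starting at 4TB); the batch size and scanning rate are made-up assumptions for illustration.

```python
# Back-of-the-envelope estimate of how long a storage array lasts
# at a steady scanning rate. Only the per-slide size (~1 MB) and
# array capacity (4 TB) come from the post; the rest is assumed.

def weeks_until_full(capacity_tb, slides_per_batch, batches_per_week,
                     mb_per_slide=1):
    """Weeks until the array fills at a constant scan rate."""
    capacity_mb = capacity_tb * 1024 * 1024            # TB -> MB
    mb_per_week = slides_per_batch * batches_per_week * mb_per_slide
    return capacity_mb / mb_per_week

# A 4 TB array, 500 slides per batch, 10 batches a week:
print(round(weeks_until_full(4, 500, 10)))
```

The useful part of the exercise isn't the exact number; it's that a group can plug in its own rates and see how quickly "expandable to 14" stops sounding like plenty.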
Computer scientists and, for that matter, any scientists need to pay special attention. Not only do we need a way to store all this data, but, probably more importantly, we need to do something with it. A lot of this will rest on programmers, but it isn't limited to them. When I worked at the U of I the students worked on a cluster I built for them. They would code in C, tweaking their algorithms to save every last processor cycle. These students weren't in computer science. This summer I took a course at Boston University. One of my classmates was clearly not a computer person. I asked her why she took the course. She was a statistician heading to grad school for statistics, and the school asked her to take programming courses so she could analyze data sets. And of course there are the computer scientists, and our future depends upon analyzing such data.
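For a taste of why a statistics program would ask for this, even a few lines of code turn a raw data set into summary numbers. This is a minimal sketch with made-up readings, using only the standard library.

```python
# Summarize a column of measurements with the stdlib statistics
# module: the kind of basic data-set analysis a statistician
# might be asked to script. The readings are illustrative.
import statistics

readings = [12.1, 9.8, 11.4, 10.2, 13.0, 9.5, 10.9]

summary = {
    "n": len(readings),
    "mean": statistics.mean(readings),
    "stdev": statistics.stdev(readings),
    "median": statistics.median(readings),
}
print(summary)
```

Scale the list from seven values to seven million and the same few lines still work, which is exactly the leverage programming gives over data that no spreadsheet or hand calculation can match.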
The future is big data, and lots of it. And it is no longer just Google and IBM storing and analyzing it. Now even the smallest of research groups or a little publisher can generate mounds of information. Time to start paying very close attention.