About 1.8 zettabytes (1.8 trillion gigabytes) of data is being created every year. In all this data there are answers to problems we have been wondering about for ages. It’s just how you can process the information most efficiently and derive correlations from the complexity of the data on the internet. You may not be able to prove anything scientifically, but you may be able to prove hypotheses statistically with huge amounts of data which is hidden somewhere in this intimidating data set. So is it possible to mine hidden information from these huge scales? Can one use existing technologies such as Apache Hadoop, Nutch, Map Reduce, and Google API to develop an engine that can derive comprehendible correlational data autonomously and efficiently?
With all this data being produced every year, finding a radical and innovative way of processing large and complex data sets is a need that is unfulfilled. For any computer, processing unstructured data is a very arduous and long process (all the internet’s data is unstructured). This exercise of an engine implementation is an attempt at combining multiple high-end technologies to work in unison to crutch and sift through large and complex data sets to Read More »