Big Data Correlation: Purpose

Question

About 1.8 zettabytes (1.8 trillion gigabytes) of data are created every year. Somewhere in all this data lie answers to problems we have been wondering about for ages; the challenge is processing the information efficiently and deriving correlations from the complexity of the data on the internet. You may not be able to prove anything scientifically, but with enough data you may be able to support hypotheses statistically, and that evidence is hidden somewhere in this intimidating data set. So is it possible to mine hidden information at this scale? Can existing technologies such as Apache Hadoop, Nutch, MapReduce, and the Google API be combined into an engine that derives comprehensible correlational data autonomously and efficiently?
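To make the idea concrete, here is a minimal, local Python sketch of the MapReduce pattern the question alludes to: a mapper emits (term-pair, 1) records for terms that co-occur in a document, and a reducer sums them into co-occurrence counts that a correlation engine could later score. The sample documents and function names are illustrative assumptions on my part, not part of any actual Hadoop job described in the post.

```python
from collections import defaultdict
from itertools import combinations

# Toy stand-in for crawled documents (in a real system these would come
# from a Nutch crawl stored on HDFS). Purely illustrative data.
documents = [
    "coffee consumption and heart health",
    "coffee prices and global trade",
    "heart health and exercise habits",
]

def map_phase(doc):
    """Mapper: emit ((term_a, term_b), 1) for every pair of terms in a document."""
    terms = sorted(set(doc.split()))
    for pair in combinations(terms, 2):
        yield pair, 1

def reduce_phase(pairs):
    """Reducer: sum the counts for each term pair."""
    counts = defaultdict(int)
    for pair, count in pairs:
        counts[pair] += count
    return counts

# Run the two phases locally; Hadoop would shard this work across a cluster.
emitted = (kv for doc in documents for kv in map_phase(doc))
co_occurrence = reduce_phase(emitted)

for (a, b), n in sorted(co_occurrence.items(), key=lambda kv: -kv[1])[:5]:
    print(f"{a!r} + {b!r}: {n}")
```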

Purpose

With all this data being produced every year, there is an unfulfilled need for a radical, innovative way of processing large and complex data sets. For any computer, processing unstructured data is an arduous and lengthy task (and virtually all of the internet's data is unstructured). This engine implementation is an attempt to combine multiple high-end technologies to work in unison to crunch and sift through large and complex data sets to Read More »
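As a rough illustration of the "derive comprehensible correlational data" step, the sketch below scores term pairs by pointwise mutual information (PMI) over document co-occurrence counts like those produced above. PMI is only one possible statistic and is my own assumption; the post does not say which measure the engine would actually use.

```python
import math
from collections import defaultdict

def pmi_scores(documents):
    """Score term pairs by pointwise mutual information (PMI).

    PMI compares how often two terms appear together with how often they
    would co-occur if they were independent across the document set.
    """
    n_docs = len(documents)
    term_docs = defaultdict(int)   # term -> number of documents containing it
    pair_docs = defaultdict(int)   # (a, b) -> number of documents containing both

    for doc in documents:
        terms = sorted(set(doc.split()))
        for t in terms:
            term_docs[t] += 1
        for i, a in enumerate(terms):
            for b in terms[i + 1:]:
                pair_docs[(a, b)] += 1

    scores = {}
    for (a, b), joint in pair_docs.items():
        p_joint = joint / n_docs
        p_a = term_docs[a] / n_docs
        p_b = term_docs[b] / n_docs
        scores[(a, b)] = math.log(p_joint / (p_a * p_b))
    return scores

# Illustrative data only; a real engine would feed in crawled web documents.
documents = [
    "coffee consumption and heart health",
    "coffee prices and global trade",
    "heart health and exercise habits",
]
for pair, score in sorted(pmi_scores(documents).items(), key=lambda kv: -kv[1])[:5]:
    print(pair, round(score, 3))
```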


Big Data Clustering: Introduction & Topic

The past few years have posed new challenges to the advancement of human intelligence. Trillions of gigabytes of data are produced every year, and the cumulative power of all the computers in existence today could process only about half of that amount using traditional database systems to crunch raw data. This very problem has created a new industry we now know as Big Data. According to Wikipedia, the term big data is used “for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.”
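The post breaks off before naming a specific method, but to give a feel for what "clustering" means in this context, here is a small, self-contained k-means sketch. The choice of k-means and the toy 2-D points are assumptions for illustration only; at big-data scale the same assignment/update loop would be distributed, for example as a MapReduce job.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """A toy k-means loop over 2-D points: assign, then update centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)

    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: (p[0] - centroids[i][0]) ** 2
                                                  + (p[1] - centroids[i][1]) ** 2)
            clusters[nearest].append(p)

        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:
                centroids[i] = (sum(p[0] for p in cluster) / len(cluster),
                                sum(p[1] for p in cluster) / len(cluster))
    return centroids, clusters

# Illustrative points with three obvious groups.
points = [(1.0, 1.1), (0.9, 1.0), (5.0, 5.2), (5.1, 4.9), (9.0, 0.2), (8.8, 0.1)]
centroids, clusters = kmeans(points, k=3)
print(centroids)
```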

Read More »