What is big data?
Big data is commonly defined as “the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications”2. In a sense this is true if we consider the internet itself to be such a collection of large data sets. This emerging industry already has a few key players that have developed technologies suited to their own purposes of sifting or crunching through data. The tools we will use in this exercise were developed by Apache, Google, and Hortonworks, but the engine that makes these technologies work in unison is the original idea to be created in this exercise.
About 1.8 zettabytes (1.8 trillion gigabytes) of data are created every year. Buried in all this data are answers to problems we have wondered about for ages; the challenge is processing the information efficiently and deriving correlations from the complexity of the data on the internet. You may not be able to prove anything scientifically, but with huge amounts of data you may be able to support hypotheses statistically, using evidence hidden somewhere in this intimidating data set. So is it possible to mine hidden information at this scale? Can one use existing technologies such as Apache Hadoop, Nutch, MapReduce, and the Google APIs to build an engine that derives comprehensible correlational data autonomously and efficiently?
With all this data being produced every year, there is an unfulfilled need for a radical and innovative way of processing large and complex data sets. For any computer, processing unstructured data is an arduous and lengthy task, and virtually all of the internet’s data is unstructured. This exercise is an attempt to implement an engine that combines multiple high-end technologies, working in unison, to crunch and sift through large and complex data sets.
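To make the processing model concrete, here is a minimal sketch of the MapReduce idea that Hadoop popularized, reduced to plain Python. This is only an illustration of the map-then-reduce pattern, not the engine described above; the function names and toy corpus are hypothetical.

```python
# Illustrative sketch of the MapReduce model (not the proposed engine).
# The mapper turns raw, unstructured text into (word, 1) pairs; the
# reducer groups pairs by key and sums the counts. Hadoop distributes
# these two phases across a cluster; here both run in one process.
from collections import defaultdict

def map_phase(lines):
    """Emit (key, value) pairs from unstructured text lines."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1

def reduce_phase(pairs):
    """Group pairs by key and aggregate their values."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

corpus = ["big data is big", "data about data"]  # hypothetical input
print(reduce_phase(map_phase(corpus)))
# {'big': 2, 'data': 3, 'is': 1, 'about': 1}
```

The appeal of the pattern is that the mapper and reducer are independent of the data's structure, which is why the same model can be pointed at arbitrary unstructured text at scale.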
The past few years have brought new obstacles to the advancement of human knowledge. Trillions of gigabytes of data are produced every year, and the combined power of all the computers in existence today could crunch barely half that amount using traditional database systems. This very problem has created the new industry we now know as Big Data. According to Wikipedia, the term is used “for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications”.