What is big data?
Big data is defined as “the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications”2. In a sense this is true if we consider the internet to be a collection of large data sets. This emerging industry already has a couple of key miners that have developed technologies suited to their purposes of either sifting or crunching through data. The tools we will be using in this exercise were developed by Apache, Google, and Hortonworks, but the engine that uses these tools in unison is the proprietary idea that will be created in this exercise.