Extracting Intelligence From the Internet's Collective Consciousness


All technical jargon aside, I set out to build an engine that could draw correlative conclusions from the unstructured data sets found on the internet. The engine used Hadoop-based technologies to make the most of limited hardware, sorting unstructured data without requiring large computational capacity. Its distributed file system allowed data to be processed in parallel rather than linearly, substantially increasing throughput. The engine was tested on three queries whose answers were already known and supported by extensive prior research.
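The parallel processing described above follows the classic MapReduce pattern that Hadoop implements. The sketch below is an illustrative stand-in, not the engine's actual code: a mapper and reducer in the style of Hadoop Streaming, with the shuffle phase simulated locally by a sort.

```python
from itertools import groupby

# Illustrative MapReduce word count; the real pipeline ran on Hadoop,
# so these functions are hypothetical stand-ins in the Streaming style.

def mapper(lines):
    """Emit (word, 1) pairs for every token in the input lines."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1

def reducer(pairs):
    """Sum counts per word; pairs must arrive grouped/sorted by key."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

def word_count(lines):
    # On a cluster the framework sorts and partitions between map and
    # reduce; here a plain sort plays that role.
    return dict(reducer(sorted(mapper(lines))))

if __name__ == "__main__":
    print(word_count(["the quick fox", "the lazy dog"]))
```

On a real cluster each mapper runs on its own block of the distributed file system, which is what lets the engine scale past a single machine.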

The engine first collected relevant URLs using the Google API, then used a Hadoop-based crawler to gather the textual data from those URLs. An initial word count projected a structure onto the unstructured data set for manipulation by an algorithm. The neural-network-inspired algorithm assigned weights to quantify confidence levels, with the weights determined by a logarithmic regression model. The final output consisted of a structured list of ten words with their respective analytical data (e.g., weights), along with graphs of comparative patterns and logarithmic regressions. The algorithm manipulated the projected structure to reach correlative conclusions with 96% accuracy and a ±3% margin of error. All in all, the engine satisfied all of its parameters, using Hadoop technologies to process large unstructured data sets in parallel; moreover, it could autonomously and efficiently derive intelligence from the collective consciousness we know as the internet.
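The weighting stage described above can be sketched as follows. Since the exact confidence model is not specified here, this normalized log-frequency weighting is an assumption for illustration, not the engine's actual algorithm; it shows how raw word counts could be turned into the ten-word output with weights in (0, 1].

```python
import math
from collections import Counter

# Hedged sketch of the weighting stage: the abstract describes confidence
# weights derived from a logarithmic regression model; the simple
# log-frequency scaling below is an illustrative assumption.

def top_ten_with_weights(word_counts):
    """Return the ten highest-weighted words with a confidence weight."""
    # Logarithmic scaling dampens very frequent words before normalizing.
    scores = {w: math.log(1 + c) for w, c in word_counts.items()}
    peak = max(scores.values())
    weights = {w: s / peak for w, s in scores.items()}
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:10]

if __name__ == "__main__":
    counts = Counter("the quick brown fox jumps over the lazy dog the fox".split())
    for word, weight in top_ten_with_weights(counts):
        print(f"{word}: {weight:.2f}")
```

The top-ranked word always receives weight 1.0 under this scheme; the real engine would instead fit its regression over the full distribution before ranking.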