Fluid Intelligence: Introduction


Fluid intelligence: the capacity to think logically and solve problems in novel situations, independent of acquired knowledge

Psychology has found the basis of fluid intelligence in the juxtaposition of layered memory and application as a means to essentially “connect two fluid ideas with an abstractly analogous property”. A mathematical design for this would therefore have to derive temporal relationships, with weighted bonds, between two coherent yet disparate concepts by way of their similar properties. These properties would have to be self-defined and self-propagated within each idea type.


In pursuit of a truly dynamic artificial intelligence, it is necessary to establish a recurrent method to decipher the presence of concrete yet abstract entities (“ideas”) independent of a related and coherent topic set.
A considerable amount of work in this field has culminated in the prevalence of statistical methods that extract probabilistic models from large amounts of unstructured data. These Bayesian data-analytic techniques often result in an understanding that is superficial compared with a true relational understanding. Furthermore, this “bag-of-words” approach to unstructured data (quantifiable by the correct relationships derived between the idea nodes) often yields only a one-dimensional understanding of the topics at hand. Traditionally, when these topics are transformed, it is difficult to extract hierarchy and queryable relations from the derived data set using matrix transformations.

The project that I will be describing in the subsequent posts is an effort to change the approach from which dynamic fluid intelligence is derived, finding a backbone in streaming big data. Ideally, this model would take a layered, multi-dimensional approach to autonomously identifying the properties of dynamically changing ideas from portions of that data set. It would also find types of relationships, deriving a set of previously undefined relational schemas through unsupervised machine learning techniques, and would ultimately allow for a queryable graph whose properties and nodes were initially undefined.


Big Data Correlation: Hadoop

Hadoop Distributed File System (HDFS)

The Hadoop Distributed File System is the foundation for any Hadoop cluster or single-node implementation. HDFS is the underlying difference between a normal MySQL database and a Hadoop implementation. This small change in approaching the data makes all the difference.

A standard MySQL server serves the purpose for any small endeavor and can support an infrastructure about the size of Apple’s database with no problems. The method for processing data usually follows a linear thought pattern. Take the example of the phrase “Hello world”. In a very rough representation, a MySQL server would save the entire phrase on one hard disk. Then, when the data is needed, the CPU sends a request for it, the hard disk spins, and the data is read and processed.


This traditional approach to managing a database hits a few key problems with no rational and affordable solution. The largest problem faced in this system is a mechanical one. At a certain point of complexity and size, a single hard disk can no longer physically spin fast enough to keep up with the requests of a single CPU. This problem leads to two solutions: make a better hard disk, or rethink the way data is processed.

Hadoop takes the second route. A Hadoop cluster implements a parallel computing cluster using inexpensive, standard pieces of hardware, distributed among many servers running in parallel. The philosophy behind Hadoop is basically to bring the computing to the data. To implement this successfully, the system has to distribute pieces of the same block of data among multiple servers. Each data node holds part of the overall data and processes only the small portion that it holds. This scheme pays off when the system is scaled up to an infrastructure of Google’s size. The system no longer has the physical barrier of the spinning disks but rather a problem of storage capacity (which is a very solvable and good problem to have).
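The idea of bringing the computing to the data can be sketched in a few lines of plain Python. This is a toy illustration, not Hadoop's API: the function names, the round-robin placement, the tiny block size, and the character-count task are all assumptions made for the example, and real HDFS blocks are far larger and are replicated across nodes.

```python
from collections import Counter


def split_into_blocks(text, block_size):
    """Cut the input into fixed-size blocks, as a distributed file system would."""
    return [text[i:i + block_size] for i in range(0, len(text), block_size)]


def assign_blocks(blocks, node_count):
    """Place blocks on nodes round-robin (a hypothetical placement policy)."""
    nodes = [[] for _ in range(node_count)]
    for index, block in enumerate(blocks):
        nodes[index % node_count].append(block)
    return nodes


def count_locally(node_blocks):
    """Work done on one node: count characters only in the blocks it holds."""
    counts = Counter()
    for block in node_blocks:
        counts.update(block)
    return counts


def count_characters(text, block_size=4, node_count=3):
    """Distribute the text, process each node's share, then merge the results."""
    nodes = assign_blocks(split_into_blocks(text, block_size), node_count)
    # On a real cluster these partial counts would be computed in parallel.
    partial_counts = [count_locally(node) for node in nodes]
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


blocks = split_into_blocks("Hello world", 4)
totals = count_characters("Hello world")
```

No single node ever reads the whole phrase; each one scans only its own blocks, and only the small partial results travel across the network to be merged.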



In the Comparison of Genetic Operators For Solving the Traveling Salesman Problem: Selection

In comparing selection methods, it was in our best interest to leave as little as possible to randomness outside of the selection method itself. The mutation method was the center inverse mutation throughout all the trials, and a center mutation point was chosen every time. The cutoff percentage was the same (30%) for each trial and the number of generations was fixed at 5000.

The numbers displayed below are the average of 10 trials conducted with the same input graph but a different initial population for each trial.

Selection Comparison

In the Comparison of Genetic Operators For Solving the Traveling Salesman Problem: Mutation

In an attempt to statistically compare the operators, the input graph and the initial population were kept the same for each trial. The numbers displayed below are the average of 10 trials conducted with the same input graph but a different initial population. The algorithm was run with an input graph consisting of 26 static nodes and approximately 4.03E26 possible orderings. Each trial ran 5000 generations with an input population of 5000 chromosomes. The cutoff percentage was 30% in every trial.
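The figure of roughly 4.03E26 matches 26!, the number of ways to order 26 nodes. A quick check (the variable names are just for illustration):

```python
import math

node_count = 26
# Number of distinct orderings of 26 nodes.
orderings = math.factorial(node_count)
formatted = f"{orderings:.2e}"
print(formatted)  # 4.03e+26
```

Fixing the start node and ignoring direction would shrink the count of truly distinct tours, but the search space stays far too large to enumerate.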

Mutation Operators and Crossover Point

In this trial the method of selection was kept standard using the percentage cutoff method to avoid any influence from the selection method.

Mutation Operator            Random Crossover Point    Center Crossover Point
Reverse Sequence Mutation    336                       414
Center Inverse Mutation      253                       310

The representation of each mutation operator over iterations was tested with a constant center crossover point.

Mutation Operator Comparison
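The posts name the two mutation operators without spelling them out. The sketch below follows their usual definitions in the TSP literature and is an assumption about what was run here: reverse sequence mutation reverses the genes between two random positions, and center inverse mutation splits the chromosome at a point and reverses each section in place. The tour is assumed to be stored as a list of distinct nodes, with the return to the start implied.

```python
import random


def reverse_sequence_mutation(tour, rng=random):
    """Reverse the genes between two randomly chosen positions (inclusive)."""
    i, j = sorted(rng.sample(range(len(tour)), 2))
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]


def center_inverse_mutation(tour, point=None):
    """Split the tour at `point` (the center by default) and reverse each part."""
    if point is None:
        point = len(tour) // 2
    return tour[:point][::-1] + tour[point:][::-1]


tour = list("ABCDEF")
centered = center_inverse_mutation(tour)
reversed_part = reverse_sequence_mutation(tour, random.Random(0))
```

Both operators only rearrange existing genes, so a valid tour stays valid after mutation.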

Genetic Algorithm: Selection

In every generation, a selection agent comes into play and sifts the fit chromosomes from the unfit ones. The selection agent “kills off” a user-specified percentage of organisms in the population. However, it is at the discretion of the selection agent to determine which chromosomes to kill. As mentioned earlier, fitness in the TSP is defined by having the lowest weight. However, selection need not be based on fitness alone. This can be seen when comparing the two most prevalent types of selection operators:
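A minimal sketch of the percentage cutoff idea, under the assumption that the population is a list of (weight, path) pairs and that the given fraction is the share killed off. The function name and the extra organisms in the sample are hypothetical.

```python
def cutoff_selection(population, kill_fraction):
    """Keep the fittest organisms and drop the heaviest `kill_fraction` of them."""
    killed = int(len(population) * kill_fraction)
    # Lower weight means higher fitness, so sort ascending by weight.
    ranked = sorted(population, key=lambda organism: organism[0])
    return ranked[:len(population) - killed]


population = [
    (210, list("ABJDCEIFHGA")),
    (143, list("ACJDGHEBFIA")),
    (305, list("AJIHGFEDCBA")),  # weight made up for illustration
    (180, list("ABCDEFGHIJA")),  # weight made up for illustration
]
survivors = cutoff_selection(population, 0.5)
survivor_weights = [weight for weight, _ in survivors]
```

This version is fully deterministic; a selection agent that leaves some survival to chance would replace the sort-and-slice step.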

Genetic Algorithms: Crossover

The method of crossover remains fairly constant regardless of the problem and scope. Crossover is achieved by first selecting a crossover point within a pair of defined and unique organisms P1 and P2 (which are the equivalent of parents for the crossed-over children). The chromosomes are then split at the selected crossover point. The second half of P2 (P2H2) is then appended to the first half of P1 (P1H1) to make one child chromosome (C1). The second child (C2) is made by appending the second half of P1 (P1H2) to the first half of P2 (P2H1).
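The splitting and appending described above can be sketched as follows. The function name and the sample parents are assumptions for illustration.

```python
def single_point_crossover(p1, p2, point):
    """Return two children made by swapping the halves after `point`."""
    c1 = p1[:point] + p2[point:]  # P1H1 + P2H2
    c2 = p2[:point] + p1[point:]  # P2H1 + P1H2
    return c1, c2


p1 = list("ABCDEF")
p2 = list("FEDCBA")
c1, c2 = single_point_crossover(p1, p2, point=3)
```

With these parents the children come out as ABCCBA and FEDDEF, which repeat some nodes and skip others. For the TSP, plain single-point crossover therefore generally needs a repair step or a permutation-preserving variant; the excerpt does not say which was used.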

Genetic Algorithm Definitions for TSP

A genetic algorithm is a type of evolutionary algorithm, so the TSP must be framed to satisfy the constraints a genetic algorithm needs. An organism in the sense of TSP can be defined as a viable path that visits every node in the graph. Each path must start at a node, visit every other node in the graph, and then return to the node it started from. An example of a viable path with an input graph of 10 vertices is shown below, with each letter representing a node in the input graph:

{A, C, J, D, G, H, E, B, F, I, A}

The population in TSP can be defined as a set of unique paths. Fitness can be defined in terms of the weight, or distance, of the path: a lower weight results in higher fitness and vice versa. A sample population of two organisms is shown below. In front of each organism is its weight, which for the purposes of this exercise determines its fitness:

set{ 143 : {A, C, J, D, G, H, E, B, F, I, A} , 210 : {A , B, J, D, C, E, I, F, H, G, A} }
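These definitions translate directly into code. The sketch below uses a made-up four-node graph, since the weights of the ten-node example are not given; `path_weight`, `is_viable`, and the distance table are illustrative names and values.

```python
def path_weight(path, distance):
    """Sum the edge weights along consecutive nodes of the path."""
    return sum(distance[a][b] for a, b in zip(path, path[1:]))


def is_viable(path, nodes):
    """A viable path ends where it starts and visits every node exactly once."""
    return path[0] == path[-1] and sorted(path[:-1]) == sorted(nodes)


# Hypothetical symmetric distances between four nodes.
distance = {
    "A": {"B": 1, "C": 4, "D": 3},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"A": 3, "B": 5, "C": 1},
}
nodes = ["A", "B", "C", "D"]
paths = [list("ABCDA"), list("ACBDA")]

# Population as (weight, path) pairs, fittest (lowest weight) first.
population = sorted((path_weight(p, distance), p) for p in paths if is_viable(p, nodes))
```

Storing the weight next to each path mirrors the `weight : path` notation above and avoids recomputing fitness every time the population is ranked.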

Genetic Algorithms: Intro

In this exercise, we attempt to use genetic algorithms to find a near-optimal, though not necessarily perfect, solution to the traveling salesman problem. A genetic algorithm emulates nature in its optimization process. Nature uses several mechanisms that have led to the emergence of new species ever better adapted to their environments. The laws that govern the evolution of species are known from the research of Charles Darwin in the nineteenth century. Genetic algorithms are powerful optimization methods, modeled on the evolution of species, that use these rules to find a pseudo-optimal answer. They draw on mechanisms from genetics such as selection, crossover, and mutation.

Introduction to the Traveling Salesman Problem

The Problem
The traveling salesman problem (TSP) is a typical example of a very hard combinatorial optimization problem. The problem is to find the shortest tour that passes through each vertex in a given graph exactly once and returns to the start. The TSP is NP-hard, and its decision version is NP-complete. There are some intuitive methods for attacking the problem, but all known exact methods have exponential complexity: they take too much computing time or require too much memory. Mathematically, TSP can be expressed as:
min f(T), where T = (T[1], T[2], T[3], …, T[n])