Big Data Clustering: Introduction & Topic

The past few years have entailed newer problems to the advancement of human intelligence. Trillions of gigabytes of data are being produced every year, and the total cumulative power of all the computers in existence today can merely compute half that amount using a traditional database system to crunch sheer data. This very problem has created a new industry we now know as Big Data. According to Wikipedia, big data is used “for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications”

Read More »

Advertisements

In the Comparison of Genetic Operators For Solving the Traveling Salesman Problem: Selection

In comparing selection methods, for the sake of comparison it was in our best interest to leave the least to randomness except in the selection method. The mutation method was the center inverse mutation throughout all the trials and a center mutation point was chosen every time. The cutoff percentage was the same (30%) for each trial and the number of generations was a fixed 5000.

The numbers displayed below are the average of 10 trials conducted with the same input graph but a different initial population for each trial.Selection Comparison

Genetic Algorithm: Selection

In every generation, a selection agent comes to play which sifts out the fit chromosomes from the unfit chromosomes. The selection agent “kills off” a user specified percentage of organisms in the population.However, it is under the discretion of the selection agent in determining which chromosomes to kill. As mentioned earlier, fitness is defined by having the lowest weight in the circumstances put forth by the TSP. However selection may not necessarily be only off of that. This can be seen when comparing the two most prevalent types of selection operators:
Read More »

Genetic Algorithm: Mutation

During the progression of a genetic algorithm, the population can hit a local optima (or extrema). Nature copes for this local optima by adding random genetic diversity to the population set “every-so-often” with the help of mutation. Our genetic algorithm accomplishes this via the mutation operator. Although there are a plethora of mutation types our GA focused on a select two:

1. Reverse Sequence Mutation – In the reverse sequence mutation operator, we take a random point in the sequence or organism. We split the path (P1) at the selected point. The second half of the split path (P1H2) is then inverted and appended to the end of the first half (P1H1) with the necessary corrections made to make sure the last node is the same as the start node to get a final mutated path (M1).

P1: {A, C, J | D, G, H, E, B, F, I, A}  ⇒ M1: {A, C, J, I, F, B, E, H, G, D, A}

2. Center Inverse Mutation – The chromosome (path or organism) is divided into two sections at the middle of the chromosome. Each of these sections are then inverted and added to a new path. The order of each of these halves remains constant, meaning the first inverted half remains the first half in the mutated path. The necessary corrections are made to amend the mutated path into a viable path so solve the TSP.

P1: {A, C, J, D, G, H, E | B, F, I, A}  ⇒ M1: {A, E, H, G, D, J, C, I, F, B, A}

Genetic Algorithms: Crossover

The method of crossover remains fairly constant regardless of the problem and scope. Crossover is achieved by first selecting a crossover point within a pair of defined and unique organisms P1 and P2 (which are the equivalent of parents for the crossed over parent). The chromosomes are then split at the selected crossover point. The second half of P2 (P2H2)  is then appended to the first half of P`1  (P1H1) to make one child chromosome (C1). The second child (C2) is made by appending the second half of P1 (P1H2) to the first half of P2 (P2H1).Read More »

Genetic Algorithm Definitions for TSP

A genetic algorithm is a type of evolutionary algorithm and therefore TSP must be fit to fill all the constraints necessary to execute a genetic algorithm. An organism in the sense of TSP can be defined as a viable path that visits every node in the graph. Each path must start with a node, visit all the nodes present in the graph, and then return to the same node that it started with. An example of a viable path with an input graph of 10 vertices is shown below with each letter representing a node in the input graph:

{A, C, J, D, G, H, E, B, F, I, A}

The population in TSP can be defined as a set of unique paths. Fitness can be defined as the weight or distance of the path. Thus, a lower weight will result in higher fitness and vice versa. A sample population of two organisms is shown below. In front of each organism is its weight or for the cases of this exercise— its fitness:

set{ 143 : {A, C, J, D, G, H, E, B, F, I, A} , 210 : {A , B, J, D, C, E, I, F, H, G, A} }

Genetic Algorithms: Intro

In this exercise, we attempt to utilize genetic algorithms to find an optimal, but not perfect, solution to the traveling salesman problem. A genetic algorithm emulates nature in its optimization process. Nature uses several mechanisms which have led to the emergence of new species and still better adapted to their environments. The laws which react to species evolution have been known by the research of Charles Darwin in the last century: Genetic algorithms are powerful methods of optimization that utilize these rules defined by evolution in their process to find a pseudo-optimal answer. These algorithms were modeled on the evolution of species. The genetic algorithm utilizes the properties of genetics such as selection, crossover, mutation.

Introduction to the Traveling Salesman Problem

The Problem
The traveling salesman problem (TSP) is a typical example of a very hard combinatorial optimization problem. The problem is to find the shortest tour that passes through each vertex in a given graph exactly once. The TSP problem is classified as an NP-complete problem. There are some intuitive methods to find the approximate solutions, but all of these methods have exponential complexity, they take too much computing time or require too much memory. Mathematically TSP can be expressed as:
min [f(T),T = (T[1],T[2],T[3],… …,T[n])]
Read More »