The Butterfly Effect

for the idea in all of us

Part 2: Sparse Learning in Hippocampal Simulations

In this series, we are examining the semantic errors of a hippocampal simulation of neurointerfaces and the sampling grid approach used to model its unsupervised feature maps. This section gets into the linear algebra and calculus behind the sampling grids and how they relate to variate error in the final system.

Parametrized Sampling Grid

A sampling grid, neuroanatomically a receptive network, will be parameterized to allow the mutation of various neurobiological parameters, such as dopamine, oxytocin, or adrenaline, and to produce a synthetic, reactionary response in the neurointerface stack. A modulation of the initial sampling grid will be used to classify the transformations to their respective locations in the comprehensive memory field. To perform the spatial transform of the normalized input feature map, a sampler must sample a set of parameters from { \tau }_{ \theta }({ G }_{ i }), where G represents a static translational grid of the applied transforms. The input feature map U, the raw equivalent of the receptive fields, will be accounted for in the translational grid as well, along with its primed resultant V from the function { f }_{ loc }(x) = V. Each coordinate in G is represented as { \left( { x }_{ j }^{ s },{ y }_{ j }^{ s } \right) }_{ j }, giving a gradient dimensionality j to the spatial grid input. A gradient dimensionality allows the sparse network to have an infinite number of spatial perspectives, which will matter for the concentric bias simulation for mental illnesses that I will be posting about soon.

Each coordinate in { \tau }_{ \theta }({ G }_{ i }) represents a spatial location in the input where the sampling kernel can concentrically be applied to get a projected and subsequent value in V. This, for stimuli transforms, can be written as:

{ V }_{ i }^{ c }(j)=\frac { \sum _{ n }^{ H }{ \sum _{ m }^{ W }{ { U }_{ nm }^{ c }\, k\left( { x }_{ i }^{ s }-m;{ \Phi }_{ x } \right) k\left( { y }_{ i }^{ s }-n;{ \Phi }_{ y } \right) } } }{ \left< j|{ H }^{ ' }|{ W }^{ ' } \right> } \quad \forall i\in \left[ 1\dots { H }^{ ' }{ W }^{ ' } \right] ,\quad \forall c\in \left[ 1\dots C \right]

Here, \Phi represents the parameterized potential of the sampling kernel of the spatial transformer which will be used to forward neuroanatomical equivalences through recall gradients.

The choice of sampling kernel can vary as long as all levels of gradients can be simplified to functions of { \left( { x }_{ j }^{ s },{ y }_{ j }^{ s } \right) }_{ j }. For the purposes of our experimentation, a bilinear sampling kernel will be used to process inputs in parallel, allowing for a larger parametrization of learning transforms. To allow backpropagation of loss through this sampling mechanism, gradients must be defined with respect to U and G. This observation was initially established as a means to allow sub-differentiable sampling in a similar bilinear sampling method:

\frac { \partial { V }_{ i }^{ c } }{ \partial { U }_{ nm }^{ c } } =\sum _{ n }^{ H }{ \sum _{ m }^{ W }{ \max _{ j }{ (0,1-\left| { x }_{ i }^{ s }-m \right| ) } \max _{ j }{ (0,1-\left| { y }_{ i }^{ s }-n \right| ) } } }

\frac { \partial { V }_{ i }^{ c } }{ \partial { x }_{ i }^{ s } } =\sum _{ n }^{ H }{ \sum _{ m }^{ W }{ { U }_{ nm }^{ c }\max _{ j }{ (0,1-\left| { y }_{ i }^{ s }-n \right| ) } \begin{cases} 0 & \text{if } \left| m-{ x }_{ i }^{ s } \right| \ge 1 \\ 1 & \text{if } m\ge { x }_{ i }^{ s } \\ -1 & \text{if } m<{ x }_{ i }^{ s } \end{cases} } }

Therefore, loss gradients can be attributed not only to the spatial transformers, but also to the input feature map, the sampling grid, and, finally, back to the parameters \Phi and \theta. The bilinear sampler has been slightly modified in this case to allow for concentric recall functions to be applied to its resultant fields. It is worth noting that, due to this feature, the spatial network’s representation of the learned behavior is unique in the rate and method of preservation, much like how each person is unique in their ability to learn and process information. The observable synthetic activation complexes can also be modeled through the monitoring of these parameters as they elastically adapt to the stimulus. The knowledge of how to transform is encoded in localization networks, which fundamentally are non-static as well.
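To make the sampling step concrete, here is a minimal NumPy sketch of the plain bilinear kernel max(0, 1-|x-m|) max(0, 1-|y-n|) and its sub-gradient with respect to the x coordinate, following the two gradient equations above. It is an illustration under stated assumptions, not the modified sampler described in this post: the normalizing denominator and the gradient dimension j are left out, and the function names `bilinear_sample` and `bilinear_grad_x` are hypothetical.

```python
import numpy as np

def bilinear_sample(U, xs, ys):
    """Sample feature map U (H, W, C) at real-valued source coords (xs, ys).

    xs and ys are 1-D arrays of length N in pixel units (x runs over
    columns m, y runs over rows n). Returns V with shape (N, C).
    """
    H, W, _ = U.shape
    m = np.arange(W)  # column indices
    n = np.arange(H)  # row indices
    # Triangular kernel weights max(0, 1 - |x - m|), shapes (N, W) and (N, H)
    kx = np.maximum(0.0, 1.0 - np.abs(xs[:, None] - m[None, :]))
    ky = np.maximum(0.0, 1.0 - np.abs(ys[:, None] - n[None, :]))
    # V[i, c] = sum_n sum_m U[n, m, c] * kx[i, m] * ky[i, n]
    return np.einsum("nmc,im,in->ic", U, kx, ky)

def bilinear_grad_x(U, xs, ys):
    """Sub-gradient of each sampled value with respect to its x coordinate."""
    H, W, _ = U.shape
    m = np.arange(W)
    n = np.arange(H)
    dx = m[None, :] - xs[:, None]
    # Piecewise term: 0 if |m - x| >= 1, +1 if m >= x, -1 if m < x
    gx = np.where(np.abs(dx) >= 1.0, 0.0, np.where(dx >= 0.0, 1.0, -1.0))
    ky = np.maximum(0.0, 1.0 - np.abs(ys[:, None] - n[None, :]))
    return np.einsum("nmc,im,in->ic", U, gx, ky)
```

Because the kernel is zero outside a one-pixel neighbourhood, each output value only depends on the four input pixels surrounding its source coordinate, which is what keeps the gradient sparse.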


Sparse Learning in Hippocampal Simulations

Sparse Learning Recall Networks

Recall-based functions are classically indicative of a mirror neuron system in which each approximation of the neural representation remains equally utilized, functioning as a load balancing mechanism. Commonly attributed to the preemptive execution of a planned task, the retention of memory in mirror neural systems tends to be modular in persistence and metaphysical in nature. Sparse neural systems interpret signals from cortical portions of the brain, allowing learned behaviors from multiple portions of the brain to execute simultaneously as observed in Fink’s studies on cerebral memory structures. It is theorized that the schematic representation of memory in these portions of the brain exists in memory fields only after a number of transformations have occurred in response to the incoming stimulus. Within these transformations lies the inherent differentiating factor in functional learning behavior: specifically, those which cause the flawed memory functions in the patients of such mental illnesses.

Semantic Learning Transformation

Now, similar to my fluid intelligence paper, we will need to semantically represent all types of ideas in a way that most directly allows for future transformations and biases to be included. For this, we will use a mutated version of the semantic lexical transformations.

The transformation of raw stimulus, in this case a verbal and unstructured story-like input, to a recallable and normalized memory field will be simulated by a spatial transformer network. These mutations in raw input are the inherent reason for differentiated recall mechanisms between all humans. An altered version of the spatial transformer network, as developed in \cite{JaderbergSpatialNetworks} in Google’s DeepMind initiative, will be used to explicitly allow the spatial manipulation of data within the neural stack. Recall gradients mapped from our specialized network find their activation complexes similar to those of the prefrontal cortex in the brain, tasked with directing and encoding raw stimulus.

The Spatial Transformer Network (Unsupervised)

Originally designed for pixel transformations inside a neural network, the sampling grid or the input feature map will be parameterized to fit the translational needs of comprehension. The formulation of such a network will incorporate an elastic set of spatial transformers, each with a localisation network and a grid generator. Together, these will function as the receptive fields interfacing with the hypercolumns.

These transformer networks allow us to parameterize any type of raw stimulus so that it can be parsed and propagated through a more abstracted and generalized network capable of modulating fluid outputs.

The localisation network will take a mutated input feature map U\in { \textbf{R}}^{ { H }_{ i }\times { W }_{ i }\times { C }_{ i } }, with width W, height H, and C channels, and output {\theta }_{i }. Here i represents a differentiated gradient-dimensional resultant prioritized for storage in the stack. This net feature map allows the convolution of learned transformations to a neural stack in a compartmentalized system. A key characteristic of this modular transformation, as noted in Jaderberg’s spatial networks, is that the number of parameters of the transformation applied to the input feature map, and hence the size of \theta, can vary depending on the transformation type. This allows the sparse network to easily retain the elasticity needed to react to any type of stimulus, giving opportunity for compartmentalized learning space. The net parameters of the transformation { \tau }_{ \theta } on the feature map can be represented as \theta ={ f }_{ loc }\left( x \right) . In any case, { f }_{ loc }\left( \right) can take any form, especially that of a learning network. For example, for a simple affine transformation, \theta will be 6-dimensional, and { f }_{ loc }\left( \right) will take the form of a convolutional network or a fully connected network (\cite{AndrewsIntegratingRepresentations}). The form of { f }_{ loc }\left( \right) is unbounded and nonrestrictive in domain, allowing all forms of memory persistence to coexist in the spatial stack.
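As an illustration of the 6-dimensional case, the sketch below shows how a grid generator could turn an affine \theta into source sampling coordinates. It assumes the usual spatial transformer convention of a 2x3 matrix acting on target coordinates normalized to [-1, 1]; the function name `affine_grid` and its arguments are hypothetical, and in a full network \theta would come from the localisation network rather than being passed in by hand.

```python
import numpy as np

def affine_grid(theta, out_h, out_w):
    """Map a regular target grid through a 2x3 affine matrix theta.

    Target coordinates are normalized to [-1, 1]. Returns the source
    coordinates (xs, ys), each of shape (out_h * out_w,).
    """
    theta = np.asarray(theta, dtype=float).reshape(2, 3)
    xt, yt = np.meshgrid(np.linspace(-1.0, 1.0, out_w),
                         np.linspace(-1.0, 1.0, out_h))
    # Homogeneous target coordinates, shape (3, out_h * out_w)
    G = np.stack([xt.ravel(), yt.ravel(), np.ones(out_h * out_w)])
    # Each column of G is mapped to one (x^s, y^s) source coordinate
    xs, ys = theta @ G
    return xs, ys

# The identity transform leaves the grid unchanged
xs, ys = affine_grid([1, 0, 0, 0, 1, 0], out_h=2, out_w=3)
```

The six entries of \theta cover scaling, rotation, shear, and translation; a transformation type with more or fewer degrees of freedom would simply change the shape of the matrix applied to the grid.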




TEDx Talk

If you didn’t get a chance to see my TEDx talk live, the video has just been produced and uploaded to the TEDx channel on YouTube (below).

The talk is about some of my work in artificial intelligence: specifically the results we’ve observed in our research in synthetic neurointerfaces. Our goal was to functionally and synthetically model the human neocortical columns in an artificial intelligence to give a more differentiable insight into the cognitive behaviors we, as humans, exhibit on a daily basis.

If you would like to know more, I have published the working paper here.

Please let me know what you all think in the comments section below or on YouTube. I would love all the feedback I can get!

Synthetic Neurointerfaces: Abstract

Earlier this week, I published my working paper on simulating synthetic neurointerfaces. It’s been quite a journey getting here, and I apologize for the delay in posting about it. I’m going to submit the paper to the 2017 International Conference on Learning Representations (ICLR). What I have posted is a working paper, meaning that there will be more drafts and revisions to come before January. If you have any questions, please feel free to contact me. I would also like to give a disclaimer that my work comes purely from a mathematics and computer science background. This is a draft, and there are field experts who helped me with the computational neuroscience portion of this project. In the end, my goal was to make the brain itself a formal system, and I have treated the brain as such throughout.

I’m very excited about this project not only because of its potential but because of what it has already shown us. We are now able to get some basic neural representations of simple cognitive functions and modulate the functional anatomy of a synthetic neocortical column with ease, a step that we couldn’t achieve otherwise.

In this study, we explore the potential of an unbounded, self-organizing spatial network to simulate translational awareness lent by the brain’s neocortical hypercolumns as a means to better understand the nature of awareness and memory. We modularly examine the prefrontal cortical function, amygdalar responses, and cortical activation complexes to model a synthetic recall system capable of functioning as a compartmentalized and virtual equivalent of the human memory functions. The produced neurointerfaces are able to consistently reproduce the reductive learning quotients of humans in various learning complexities and increase generalizing potentials across all learned behaviors. The cognitive system is validated by examining its persistence under the induction of various mental illnesses and mapping the synthetic changes to their equivalent neuroanatomical mutations. The resultant set of neurointerfaces is a form of artificial general intelligence that produces wave forms empirically similar to that of a patient’s brain. The interfaces also allow us to pinpoint, geometrically and neuroanatomically, the source of any functional behavior.

The rest of the paper can be found here:



Coming Soon: Synthetic Neurointerfaces

I’m getting ready to release my work on persisting synthetic neurointerfaces in unbounded spatial networks. I truly believe that computational tools such as this can be used to study the structure of intelligent computation in high-dimensional neural systems. What I tried to emulate in this project was a neuron-by-neuron representation of some basic cognitive functions by persisting a memory field in which self-organizing neocortical hypercolumns could be functionally represented. The project was inspired by biological neural dynamical systems and foundationally rooted in some of the brilliant work Google’s DeepMind project has been doing. Before I publish any results, I would like to give a special thanks to my mentor and longtime friend, Dr. Celia Rhodes Davis. I would also like to especially thank the Stanford Department of Computational Neuroscience (Center for Brain, Mind & Computation) for functioning as an advisory board throughout my independent research and as a sounding board for general guidance.

Below are the problem definition, the goals, and a small sneak peek at the immediate potential and execution of my project:


The interface between the neuroanatomical activation of neocortical hypercolumns and their expressive function is a realm largely unobserved, due to the inability to efficiently and ethically study causal relationships between phenomena that were previously only observed in isolation. The field of general neuroscience explores the anatomical significance of cortical portions of the brain, extending anatomy as a means to explain the persistence of various nervous and physically expressive systems. Psychological approaches focus purely on expressive behaviors as a means to extend, with greater fidelity, the existence and constancy of the brain-mind interface. The interface between the anatomical realms of the mind and their expressive behaviors is a field widely unexplored, with surgeries such as the lobotomy and other controversial, experimental, and life-threatening procedures at the forefront of such study. However, the understanding of these neurological interfaces has the potential to function as a window into the neural circuitry of mental illnesses, opening the door for cures and an ultimately more complete understanding of our brain.


We propose a method to simulate unbounded memory fields upon which recall functions can be parameterized. This model will be able to simulate cortical functions of the amygdala in its reaction to various unfiltered stimuli. An observer network will be created in parallel to analyze geometric anomalies in the neuroanatomical interface in memory recall functions, and to extend equivalences between recall function parameters and memory recall gradients. This enables it to extend hypotheses to neuroanatomical functions.

Semantic Lexical Representations

In order to understand the evolutionary imperative of a fluid intelligent cognitive system, it is necessary to examine the function of artificial neural networks (ANN) as they stand today. Broadly defined, artificial neural networks are models of sets of neurons used to estimate or approximate functions that can depend on a large number of inputs and are generally unknown.

This approach thus far has resulted in a standard design of ANNs being persisted in a two dimensional model, and this fundamental structure is used for all variants of the neural network family including deep learning and convolutional learning models.

This approach is fundamentally restrictive in the sense that all learned attributes lie on the same plane, meaning all regressive learned attributes, when compared mathematically, persist as functions of a singular dimensionality. The function of this system is therefore limited to a single type of learned regression with strong biases against learning new regressions.

The capacity for fluid intelligent intuition in humans allows us to compartmentalize these discrete learned attributes and fluidly find relations between them. This capacity is especially critical in finding unsupervised intelligence from polymorphic unstructured data. Simply put, if we as humans learned with the same characteristics as an existing ANN model, the result would be an intrinsically stovepiped way of learning. However, humans have a much more sophisticated fluid intelligent capacity. This project is an attempt at creating a fundamentally new way of designing cognitive systems: one that attempts to mimic and enhance human learning patterns.

Idea Disparity

The process of node generation from unstructured data requires a foundation to find statistical distributions of words of a set A consisting of each of the documents aggregated. The dynamic set A will be a finite and elastic set of documents that will serve the purpose of representing the first layer of temporal memory without any sub-categorizations.
Using a hybrid version of the collapsed Gibbs sampler, we are able to integrate out a set of variables into which we can assign distributions of words. Hierarchical Bayesian models yield multimodal distributions of words.

[Image: Screen Shot 2016-08-09 at 2.26.00 PM.png]

This bag-of-words approach allows us to view the words of each subset distribution as statistical members of a larger set rather than lexical members of a semantic set. The equivalence is set up as x~y between a permutation of possible node types. We begin by tokenizing the documents within A as inputs to our Bayesian Gibbs sampler. As an initial dimension to work off of, the derived distributions function similarly to those generated by Latent Dirichlet allocation (LDA) methods. We use the same kind of LDA model that has been used to find topic distributions in social media data. In essence, this approach is a hybrid of the LDA classifier method. Instead of topic distributions, we are able to find probabilities of each word given each node type. The sampler is able to find the following conditional probabilities using the bag-of-words approach, in which each word is viewed as a statistical element within a vocabulary rather than a coherent part of a larger context.

In the figure above, we demonstrate the hybrid Latent Dirichlet allocation classifier as it finds the probability of a statistical element within a subset, Z, of the population’s set of documents, A.

[Image: Screen Shot 2016-08-09 at 2.26.48 PM.png]

Each significant subset, Z, of our document collection, A, now becomes a contender for becoming a node within our graph.
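For readers who have not seen a collapsed Gibbs sampler, here is a minimal pure-Python sketch of the standard version for LDA, with node types playing the role of topics. It is not the hybrid sampler used in this project: the function name `collapsed_gibbs_lda`, the symmetric priors `alpha` and `beta`, and the iteration count are all assumptions made for illustration.

```python
import random

def collapsed_gibbs_lda(docs, num_types, vocab_size, iters=50,
                        alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents.

    docs is a list of lists of integer word ids. Returns phi, where
    phi[k][w] estimates P(word w | node type k).
    """
    rng = random.Random(seed)
    n_dk = [[0] * num_types for _ in docs]               # doc-type counts
    n_kw = [[0] * vocab_size for _ in range(num_types)]  # type-word counts
    n_k = [0] * num_types                                # tokens per type
    z = []                                               # type of each token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_types)  # random initial assignment
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional with the multinomials integrated out
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                    / (n_k[t] + vocab_size * beta)
                    for t in range(num_types)
                ]
                k = rng.choices(range(num_types), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return [
        [(n_kw[k][w] + beta) / (n_k[k] + vocab_size * beta)
         for w in range(vocab_size)]
        for k in range(num_types)
    ]
```

Each row of the returned table is a probability distribution over the vocabulary, which is the "probability of each word given each node type" described above.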

Unsupervised Multinetwork

The topic distributions of the current snapshot of nodes (of intermixed types) are then forwarded to an unsupervised neural network with a range of 10-20 hidden layers. A flexible preconditioned version of the conjugate gradient back-propagation method is used:

[Image: Screen Shot 2016-08-09 at 2.28.46 PM.png]
Alpha is the next optimal location vector relative to its position in the gradient of the linearized distribution sets, where the trained value would be a set of vectors of magnitude determining the distance of each distribution from the others from the subset. The hybrid gradient descent algorithm helps minimize the cross-entropy values during its classification. A separate and adequate network is trained and maintained for each subset of the original document set.
The distributions with the greatest distance are then passed to another unique clustering algorithm based around minimizing the Davies–Bouldin index between cluster components but still maintaining the statistical significance between cluster distributions derived in the LDA phase.

[Image: Screen Shot 2016-08-09 at 2.29.14 PM.png]

Here n is the number of clusters, c_x is the centroid of cluster x, \sigma_x is the average distance of all elements in cluster x to centroid c_x, and d(c_i, c_j) is the distance between centroids c_i and c_j.
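For reference, the standard Davies–Bouldin index is DB = \frac{1}{n} \sum_{i=1}^{n} \max_{j \ne i} \frac{\sigma_i + \sigma_j}{d(c_i, c_j)}, and lower values mean tighter, better separated clusters. The sketch below computes that standard definition with NumPy; it is not the custom clustering algorithm described in this post, the function name `davies_bouldin` is hypothetical, and it assumes Euclidean distance, at least two clusters, and distinct centroids.

```python
import numpy as np

def davies_bouldin(points, labels):
    """Standard Davies-Bouldin index for a labelled set of points."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)
    centroids = np.array([points[labels == k].mean(axis=0) for k in ids])
    # sigma[x]: mean distance of the members of cluster x to its centroid
    sigma = np.array([
        np.linalg.norm(points[labels == k] - c, axis=1).mean()
        for k, c in zip(ids, centroids)
    ])
    n = len(ids)
    total = 0.0
    for i in range(n):
        # Worst-case similarity between cluster i and any other cluster
        ratios = [
            (sigma[i] + sigma[j]) / np.linalg.norm(centroids[i] - centroids[j])
            for j in range(n) if j != i
        ]
        total += max(ratios)
    return total / n

# Two tight clusters far apart give a small index
score = davies_bouldin([[0, 0], [0, 2], [10, 0], [10, 2]], [0, 0, 1, 1])
```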

Fluid Intelligence: Introduction


Fluid intelligence: the capacity to think logically and solve problems in novel situations, independent of acquired knowledge

Psychology has found the basis of fluid intelligence in the juxtaposition of layered memory and application as a means to essentially “connect two fluid ideas with an abstractly analogous property”. Such a mathematical design would therefore have to be able to derive temporal relationships with weighted bonds between two coherently disparate concepts by means of similar properties. These properties within node types will have to be self-defined and self-propagated within idea types.


In a pursuit towards a truly dynamic artificial intelligence, it is necessary to establish a recurrent method to decipher the presence of concrete yet abstract entities (“ideas”) independent of a related and coherent topic set.
A considerable amount of work venturing into this field has culminated in the prevalence of statistical methods to extract probabilistic models dependent on large amounts of unstructured data. These Bayesian data analytic techniques often result in an understanding that is superficial in the context of a true relational understanding. Furthermore, this “bag-of-words” approach, when looking at amounts of unstructured data (quantifiable by correct relationships derived between the idea nodes), often relates to a single-dimensional understanding of the topics at hand. Traditionally, when these topics are transformed, it is difficult to extract hierarchy and queryable relations using matrix transformations from a derived data set.

The project that I will be describing in the subsequent posts is an effort to change the approach from which dynamic fluid intelligence is derived, finding a backbone in streaming big data. Ideally, this model would be able to take a layered, multi-dimensional approach to autonomous identification of properties of dynamically changing ideas from portions of said data set. It would also be able to find types of relationships, ultimately deriving a set of previously undefined relational schemas through unsupervised machine learning techniques that would ultimately allow for a queryable graph with properties and nodes initially undefined.

Big Data Correlation: Hadoop Stack


The Apache Hive project gives a Hadoop developer a view of the data in the Hadoop Distributed File System (HDFS). It is essentially a data warehouse layer for Hadoop. Using a SQL-like language, HiveQL, Hive lets you create summarizations of your data, run ad-hoc queries, and analyze large datasets in the Hadoop cluster. The overall approach with Hive is to project a table structure onto the dataset and then manipulate it with HiveQL. The table structure effectively projects structure onto unstructured data. Because our data lives in HDFS, our operations can be scaled across all the data nodes and we can manipulate huge datasets.



The function of Apache HCatalog is to hold location and metadata about the data in a Hadoop single-node system or cluster. This allows scripts and MapReduce jobs to be decoupled from data location and metadata. Basically, this project catalogs the data and sets pointers to the pieces of data held on different nodes. In our “Hello World” analogy, HCatalog would tell on which node “Hello” is stored and on which node “World” is stored. Since HCatalog can be used with other Hadoop technologies like Pig and Hive, it can also help those tools catalog and index their data. For our purposes, we can now reference data by name, and we can share or inherit the location and metadata between nodes and Hadoop sub-units.


Apache Pig is a high-level scripting platform whose language, Pig Latin, expresses data analysis and infrastructure processes. When a Pig script is executed, it is translated into a series of MapReduce jobs, which are then sent to the Hadoop infrastructure (single node or cluster) through the MapReduce program. Pig’s user-defined functions can be written in Java. This is the final layer of the cake on top of MapReduce, giving the developer more control and a higher level of precision in creating the MapReduce jobs that later translate into data processing in a Hadoop cluster.


Apache Ambari is an operational framework for provisioning and managing Hadoop clusters of multiple nodes or single nodes. Ambari is an effort to clean up the messy scripts and views of Hadoop and give a clean look for management and incubating.


YARN is the resource management layer introduced in Hadoop 2.0, taking over cluster management from the original MapReduce. It is the Hadoop operating system that is overlaid on top of the system’s base operating system (CentOS). YARN provides a global ResourceManager and a per-application ApplicationMaster. The idea behind this redesign is to split up the two functions of the JobTracker, resource management and job scheduling/monitoring, into two separate parts. This results in tighter control of the system and ultimately in more efficiency and ease of use. The illustration shows that an application run natively in Hadoop can utilize YARN as a cluster resource management tool, along with its MapReduce 2.0 features, as a bridge to the HDFS.



Apache Oozie is effectively just a calendar for running Hadoop processes. It is a system for managing workflows, in which the Oozie Coordinator triggers workflow jobs that run on MapReduce or YARN. Like Hadoop and its other sub-products, Oozie is a scalable system. Its workflow scheduler runs on top of the Hadoop operating system (YARN) and takes commands from user programs.



Big Data Correlation: Hadoop

Hadoop Distributed File System (HDFS)

The Hadoop Distributed File System is the foundation for any Hadoop cluster or single-node implementation. HDFS is the underlying difference between a normal MySQL database and a Hadoop implementation. This small change in approaching the data makes all the difference.

A standard MySQL server serves the purpose for any small endeavor and can support an infrastructure about the size of Apple’s database with no problems. The method for processing data usually follows a linear thought pattern. Take the example of the phrase “Hello world”. In a very rough representation, a MySQL server would save the entire phrase on one hard disk. Then, when the data is needed, the CPU sends a request for the data, the hard disk spins, and the data is read and processed.


This traditional approach to managing a database hits a few key problems with no rational and affordable solution. The largest problem faced in this system is a mechanical one. At a certain point of complexity and size, a single hard disk can no longer physically spin fast enough to keep up with the seek capabilities of a single CPU. This problem leads to two possible solutions: make a better hard disk, or rethink the way data is processed. Hadoop takes the second route in a radical new way. A Hadoop cluster implements a parallel computing cluster using inexpensive and standard pieces of hardware. The cluster is distributed among many servers running in parallel. The philosophy behind Hadoop is basically to bring the computing to the data. To successfully implement this, the system has to distribute pieces of the same block of data among multiple servers. Each data node therefore holds part of the overall data and can process the small portion that it holds. The benefit of this distributed layout becomes visible when the system is scaled up to an infrastructure of Google’s size. The system no longer has the physical barrier of the spinning disks but rather a problem of just storage capacity (which is a very solvable and good problem to have).
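The idea of bringing the computing to the data can be sketched in a few lines of plain Python. This is a single-process simulation under stated assumptions, not Hadoop code: it uses no Hadoop API, the "nodes" are just lists, and the helper names `split_into_blocks`, `map_block`, and `reduce_counts` are hypothetical.

```python
from collections import Counter
from functools import reduce

def split_into_blocks(words, num_nodes):
    """Deal words round-robin across num_nodes hypothetical data nodes."""
    return [words[i::num_nodes] for i in range(num_nodes)]

def map_block(block):
    """Work done locally on one node: count the words it holds."""
    return Counter(block)

def reduce_counts(partials):
    """Merge the per-node partial counts into one result."""
    return reduce(lambda a, b: a + b, partials, Counter())

words = "hello world hello hadoop".split()
blocks = split_into_blocks(words, num_nodes=2)
totals = reduce_counts(map_block(b) for b in blocks)
```

Each node only ever touches its own block, so adding nodes shrinks the amount of data any single disk has to read, and only the small partial counts travel between machines.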


