In this paper, a neural network is derived from first principles, assuming only that each layer begins with a linear dimension-reducing transformation. The approach appeals to the principle of maximum entropy (MaxEnt) to find the posterior distribution of each layer's input data, conditioned on the layer's output variables. This posterior has a well-defined mean, the conditional mean estimator, which is computed by a type of neural network whose theoretically derived activation functions resemble the sigmoid, softplus, and ReLU; this implicitly provides a theoretical justification for their use. A new theorem is proposed that gives the conditional distribution and conditional mean estimator under the MaxEnt prior, unifying results for special cases. Combining layers yields an auto-encoder with a conventional feed-forward analysis network and a type of linear Bayesian belief network in the reconstruction path.
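As an illustrative sketch only (not the paper's exact construction), the following Python/NumPy code shows the shape of one such auto-encoder layer: a linear dimension-reducing analysis transform followed by an activation standing in for the conditional mean estimator, with a linear reconstruction path. The softplus activation, the tied encoder/decoder weights, and all names here are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def softplus(z):
    # Numerically stable softplus, used here as a stand-in for a
    # MaxEnt-derived conditional mean estimator (illustrative choice).
    return np.maximum(z, 0.0) + np.log1p(np.exp(-np.abs(z)))

class MaxEntLayer:
    """Sketch of one auto-encoder layer: a linear dimension-reducing
    analysis transform, and a linear reconstruction path in the spirit
    of a linear Bayesian belief network. Tied weights W / W.T are an
    assumption of this sketch."""

    def __init__(self, in_dim, out_dim):
        assert out_dim < in_dim, "analysis transform must reduce dimension"
        self.W = rng.standard_normal((out_dim, in_dim)) / np.sqrt(in_dim)

    def analyze(self, x):
        # Linear transform y = W x, then the activation playing the role
        # of the conditional mean estimator of the layer input given y.
        return softplus(self.W @ x)

    def reconstruct(self, h):
        # Linear reconstruction path: xhat = W^T h.
        return self.W.T @ h

# Usage: stack layers into a feed-forward analysis network, then run
# the linear reconstruction path in reverse order.
layers = [MaxEntLayer(64, 32), MaxEntLayer(32, 16)]
x = rng.standard_normal(64)
h = x
for layer in layers:
    h = layer.analyze(h)
xhat = h
for layer in reversed(layers):
    xhat = layer.reconstruct(xhat)
print("code dim:", h.shape, "reconstruction dim:", xhat.shape)
```

In this toy setup, each analysis step mirrors the abstract's premise (a linear dimension-reducing transform per layer), while the activation occupies the place where the paper's theorem supplies the true conditional mean estimator under the MaxEnt prior.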