Plenary Session: PLEN-2: Sergios Theodoridis - "Deep Neural Networks: A Nonparametric Bayesian View with Local Competition"
Sergios Theodoridis, National and Kapodistrian University of Athens, Greece
SPS
IEEE Members: $11.00
Non-members: $15.00
Length: 01:06:45
In this talk, a fully probabilistic approach to the design and training of deep neural networks will be presented. The framework is that of nonparametric Bayesian learning. Both fully connected and convolutional neural networks (CNNs) will be discussed. The structure of the networks is not chosen a priori. With nonparametric priors for infinite binary matrices, such as the Indian Buffet Process (IBP), the number of weights as well as the number of nodes or kernels (in CNNs) are estimated via the resulting posterior distributions. The training revolves around variational Bayesian arguments. Besides the probabilistic arguments that are followed for the inference of the involved parameters, the nonlinearities used are neither squashing functions nor rectified linear units (ReLUs), which are typically used in standard networks. Instead, inspired by neuroscientific findings, the nonlinearities comprise units of probabilistically competing linear neurons, in line with what is known as the local winner-take-all (LWTA) strategy. In each node, only one neuron fires to provide the output. Thus, the neurons in each node are laterally (same layer) related and only one "survives"; yet, this takes place in a probabilistic context based on an underlying distribution that relates the neurons of the respective node. Such a rationale more closely mimics the way that the neurons in our brain cooperate. The experiments, over a number of standard data sets, verify that highly efficient (compressed) structures are obtained in terms of the number of nodes, weights, and kernels, as well as in terms of bit precision requirements, at no sacrifice to performance compared to previously published state-of-the-art research. Besides efficient modelling, such networks turn out to exhibit much higher resilience to attacks by adversarial examples, as is demonstrated by extensive experiments and substantiated by some theoretical arguments.
The presentation mainly focuses on the concepts and the rationale behind the methodology and less on the mathematical details.
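As a rough illustration of the local winner-take-all idea in the abstract, the following minimal Python sketch shows one node made of competing linear neurons, where a single winner is sampled and only that neuron passes its linear response on. It is not the speaker's implementation: the softmax over the linear responses is an assumed choice of "underlying distribution", and the function names, weights, and inputs are hypothetical.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def lwta_node(x, weights, biases, rng):
    """One node of K competing linear neurons; exactly one neuron fires.

    Assumption: the winner is sampled from a softmax over the linear
    responses. The abstract only states that the competition is
    probabilistic, not which distribution is used.
    """
    # Linear response of each competing neuron: h_k = w_k . x + b_k
    responses = [
        sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
        for w, b in zip(weights, biases)
    ]
    probs = softmax(responses)
    # Sample a single winner index from the categorical distribution.
    winner = rng.choices(range(len(responses)), weights=probs, k=1)[0]
    # Only the winner passes its linear response on; the others output zero.
    output = [0.0] * len(responses)
    output[winner] = responses[winner]
    return output, winner, probs


# Hypothetical toy values, chosen only to exercise the function.
rng = random.Random(0)
x = [1.0, -2.0, 0.5]
weights = [[0.2, 0.1, -0.3], [-0.5, 0.4, 0.9]]  # K = 2 neurons in this node
biases = [0.0, 0.1]
output, winner, probs = lwta_node(x, weights, biases, rng)
```

Because the winner is sampled rather than chosen by a hard maximum, repeated calls with the same input can activate different neurons, which is one way to read the "probabilistic context" mentioned in the abstract.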