% intro3.tex
\section*{Chapter overview}
\addcontentsline{toc}{section}{Chapter overview}
Our goal in this chapter is to understand how neural networks can be extended to domains other than those for which they were originally designed. To this end, in \secref{sec:rep}, we interpret the linear algebra that underpins a layer in order to build intuition. We first elaborate on the interpretation of a tensor space as a neural space, and on how to move between tensors and signals. We then propose a representation based on graphs: between two layers, a propagation graph describes how information is propagated, while on the input layer the neurons can carry an underlying graph structure. We show a relation between these two graphs, which holds if and only if the local receptive fields of the neurons are intertwined. By introducing the notion of weight sharing into our analysis, we find that a layer on any domain can be expressed as a linear ternary operation, which we call the \emph{neural contraction}. Its operands are the \emph{input signal} $X$, the \emph{weight kernel} $\Theta$, and the \emph{weight sharing scheme} $S$; we denote it $\wideparen{\Theta S X}$.

We study this operation in \secref{sec:ternary}. We show that it is more generic than related formulations, and we propose an efficient implementation. An experiment based on it illustrates how exploiting symmetries is beneficial, which justifies the use of convolutions. Through further experiments, we explore randomization-based ideas for applying it to general graphs. In \secref{sec:learningscheme}, we study the effect of learning how the weights are shared, which amounts to learning both $S$ and $\Theta$. We explore this avenue for graph domains, with experiments on grids, covariance graphs, and citation networks. Finally, in \secref{sec:trans}, we investigate an example of a CNN architecture for graph signals, whose convolution is based on translations on graphs that define the weight sharing scheme $S$ of the convolutional layer. We present the model of translations and the approximation we use, the subsampling layer, the data augmentation, and experiments on grid graphs and on graphs resembling grids.
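To fix ideas before the formal treatment, here is one plausible indexed sketch of such a ternary operation; the shapes, the index convention, and the assumption that $S$ is binary are illustrative choices on our part, and the precise definition is the one given in \secref{sec:ternary}:
% Illustrative sketch only: indices and shapes below are assumptions,
% not the chapter's formal definition of the neural contraction.
\[
  \bigl(\wideparen{\Theta S X}\bigr)_{j}
  \;=\;
  \sum_{i}\sum_{k} S_{jik}\,\Theta_{k}\,X_{i},
\]
where $X \in \mathbb{R}^{n_{\mathrm{in}}}$ would be the input signal, $\Theta \in \mathbb{R}^{K}$ the kernel of $K$ shared weights, and $S \in \{0,1\}^{n_{\mathrm{out}} \times n_{\mathrm{in}} \times K}$ the weight sharing scheme, with $S_{jik}$ indicating whether the shared weight $\Theta_{k}$ is applied on the connection from input neuron $i$ to output neuron $j$. In this view, the expression is linear in each of its three operands, and a convolutional layer corresponds to a particular choice of $S$ in which the sharing pattern is given by translations.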