-% In this manuscript, after presenting the domains related to our subject, we built a theory of convolutions on graphs, with a view to using them in CNNs on graph domains. On Euclidean domains, convolutional layers take advantage of the translational equivariance of the convolution. Therefore, our construction on graph domains depends on a set of transformations of the vertex set to which the resulting operation is also equivariant. More precisely, we wanted to define a class of convolutional operators that is exactly the class of linear operators equivariant to this set of transformations. We demonstrated that this characterization holds provided this set has the algebraic structure of a group, a groupoid or a path groupoid. In particular, we proved that this amounts to searching for Cayley subgraphs. We also saw that the possible abelianity of these structures is linked to the property that the convolution is locally supported. Then, we studied neural networks intended for graph domains. We adopted an approach based on graph representations of the propagation between layers of neurons. We proved that if the local receptive fields of the neurons are intertwined and their inputs have a graph structure, then that structure can be used to define the propagation. We also discovered that the linear part of a layer can be expressed by an operator that involves three operands: the input signal, the weight kernel and the weight sharing scheme. We called it \emph{neural contraction}, in reference to the term \emph{tensor contraction}. We showed that it is associative, commutative, and generic in the sense that it can represent any kind of layer. We used this representation to study the influence of symmetries present in the structure of the data. We conducted experiments to learn how the weights are shared in addition to learning the weights themselves. In the case of convolutions, this amounts in a sense to trying to learn the set of transformations to which they are equivariant. We saw that this approach attains performances similar to other state-of-the-art models. Then we ran experiments with a CNN whose convolution is based on a set of translations on graphs, which defines how the weights are shared. We proposed an algorithm to find these translations. We defined downscaling and data augmentation from them, and used a residual network architecture. We showed that this model recovers the performances of CNNs without being fed the underlying structure of images, and that it attains strong performances on graph signal datasets.
+In this manuscript, after presenting the relevant fields of research, we developed a theory of convolution for graph signals and proposed new models that extend deep learning to graph domains.
-% In conclusion, we proposed a novel layer representation for extending CNN architectures to input domains other than those for which they were intended. As pointed out in the introduction, this participates in making them more generic, and thus applicable to a broader range of real-world problems. In the process, we also advanced our understanding of convolutions. We hope that the reader enjoyed reading this manuscript and that it gave them ideas and shed new light. Let us strive for a continuous effort to collectively advance the boundaries of human knowledge, at our scale and beyond!
+In \chapref{chap:2}, we formulated two constructions of convolutions of signals defined on a vertex set $V$, based on a group $\Gamma$ acting on $V$. The $\varphi$-convolution can be employed when $\Gamma$ and $V$ are in one-to-one correspondence via an equivariant map $\varphi$, while the $\M$-convolution is a more convenient formulation that can be employed when $\Gamma$ is abelian. We proved that the characterization by equivariance to $\Gamma$, inherited from group convolutions, holds. Then we introduced two properties that bind these convolutions to the edge set $E$: edge-constraint (EC) and locality-preservation (LP). With a view to describing the operators used in deep learning, we proposed formulations with kernels of smaller support, and proved that the weight sharing property holds. We demonstrated that the existence of convolutions on a graph can be characterized by the existence of Cayley subgraphs. For some graphs, the Cayley subgraphs might not be representative of their topology. Therefore, we suggested a few strategies and extended the previous results to convolutions based on groupoids rather than groups. We constructed two types of groupoids, from partial transformations and from paths, and were able to extend the results, albeit with some limitations: with the first type of groupoid, this almost amounted to partitioning the graph into Cayley subgraphs, whereas with the second, it included degenerate cases.
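For concreteness, one natural way to write the first construction (a sketch only; the exact convention of \chapref{chap:2} may differ) is to transport the group convolution on $\Gamma$ to $V$ through a bijective equivariant map $\varphi : \Gamma \to V$: for signals $s, w \in \mathbb{R}^V$,
\[
\big( (w \ast_\varphi s) \circ \varphi \big)(g) \;=\; \sum_{h \in \Gamma} (w \circ \varphi)(h) \, (s \circ \varphi)(h^{-1}g), \qquad g \in \Gamma,
\]
so that equivariance to the action of $\Gamma$ on $V$ is inherited directly from the equivariance of the group convolution.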
-In this manuscript, we proposed a novel layer representation for extending CNN architectures to input domains other than those for which they were intended. This representation is based on a ternary operation that we called \emph{neural contraction}, from which we derived new models: \emph{Monte-Carlo Neural Networks}, \emph{Graph Contraction Networks}, \emph{Translation-Convolutional Neural Networks}; and a new technique: \emph{graph dropout}. We also showed how to represent related models from the literature with neural contractions. As pointed out in the introduction, this work participates in making CNNs more generic, and thus applicable to a broader range of real-world problems. In the process, we also advanced our understanding of convolutions, providing a thorough description with a set of expressions, mathematical results and theorems about how to extend them to graph domains while preserving key properties, and how to characterize them.
+In \chapref{chap:3}, we proposed a novel layer representation for extending CNN architectures to other input domains. This representation is based on a ternary operation that we called \emph{neural contraction}, from which we derived new models: \emph{Monte-Carlo Neural Networks} (MCNN), \emph{Graph Contraction Networks} (GCT), \emph{Translation-Convolutional Neural Networks} (TCNN); and a new technique: \emph{graph dropout}. We also showed how to represent related models from the literature with neural contractions. The MCNN is a first idea exploiting the neural contraction: roughly speaking, it randomizes the structure being leveraged, then averages. We therefore did not expect much from it, but it fared well on a text categorization task. The GCT is based on the idea of learning how the weights are shared while learning the weights themselves. It set new state-of-the-art performances on semi-supervised classification of nodes in citation networks, although only by a small margin that is not statistically significant. We also observed that graph dropout significantly improved the results of alternative models on this type of task. The TCNN relies on constructing a convolution based on graph translations. It set a new state of the art on the classification of scrambled images by a large margin, and performs well on an fMRI dataset structured by a graph resembling a grid graph.
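For concreteness, the linear part of such a layer can be sketched as a single ternary contraction. The indexing below is only one possible convention and is not necessarily the one of \chapref{chap:3}: a scheme $S$ indexed by (kernel weight, input neuron, output neuron), a kernel $W$ indexed by (kernel weight, input channel, output channel), and batched input signals $X$.
\begin{verbatim}
import numpy as np

def neural_contraction(X, S, W):
    # X: (batch, n_in, c_in)   input signals
    # S: (k, n_in, n_out)      weight sharing scheme (allocation of kernel weights)
    # W: (k, c_in, c_out)      weight kernel
    # Y[b, j, q] = sum over i, w, p of S[w, i, j] * X[b, i, p] * W[w, p, q]
    return np.einsum('wij,bip,wpq->bjq', S, X, W)
\end{verbatim}
With a one-hot $S$ encoding a circulant sharing scheme, this recovers an ordinary (circular) convolutional layer, while a dense learnable $S$ corresponds to learning how the weights are shared, which is the idea behind the GCT.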
+We tested these models on two types of tasks: supervised classification of graph signals and semi-supervised classification of nodes. The first task is historically the one that gave visibility to CNNs. However, we do not know of a dataset for the first task that comes with a truly unusual graph structure; and when such a graph exists, it might not be the best one to describe the underlying structure. This is why, in practice, experiments for the supervised task are done with graphs that resemble grid graphs to some extent. This is not the case for the semi-supervised task.
-\paragraph{\h{0}}
-We hope that the reader had pleasure reading this manuscript and that it gave them ideas and shed new light. Let us strive for a continuous effort to collectively advance the boundaries of human knowledge, at our scale and beyond!
+In the end, both tasks can be abstracted into a more general one. For example, let us consider a dataset represented by a matrix $X$ of shape $B \times N$, where $B$ is the number of instances and $N$ is the number of features. The linear part of the GCN layer formulation $Y = A X \Theta$ exploits both a graph on the rows (which, as we saw, can be learned with the GCT) and a complete graph on the columns. Thus, this layer amounts to two half-layers, a sparse one and a dense one. This idea can be generalized to a dataset represented by a tensor $X$ of rank~$r$. The formulation would then be $Y = g(X, A_1, A_2, \ldots, A_r)$, where $g$~realizes the tensor contractions between $X$ and each $A_i$ along the corresponding rank. In that case, each $A_i$ can be seen as a learnable normalized adjacency matrix corresponding to a graph structure on the $i\powth$ rank. This idea gives rise to a class of neural networks that could be called \emph{multiary neural networks}.
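A minimal sketch of what such a multiary layer could compute, under the assumption (for illustration only) that each $A_i$ is stored with its rows indexing the current $i\powth$ mode and its columns indexing the new one:
\begin{verbatim}
import numpy as np

def multiary_layer(X, mats):
    # X: rank-r data tensor; mats[i]: matrix of shape (current size of mode i,
    # new size of mode i), e.g. a learnable normalized adjacency for that mode.
    Y = X
    for i, A in enumerate(mats):
        # contract mode i of Y with the rows of A,
        # then put the resulting new mode back at position i
        Y = np.moveaxis(np.tensordot(Y, A, axes=(i, 0)), -1, i)
    return Y
\end{verbatim}
For $r = 2$, with a symmetric normalized adjacency $A_1 = \hat{A}$ on the rows and a dense $A_2 = \Theta$ on the columns, this reduces to $\hat{A} X \Theta$, i.e. the linear part of the GCN layer above.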
+
+As pointed out in the introduction, this work participates in making CNNs more generic, and thus applicable to a broader range of real-world problems. In the process, we also advanced our understanding of convolutions, providing a thorough description with a set of expressions, mathematical results and theorems about how to extend them to graph domains while preserving key properties, and how to characterize them. We hope that the reader enjoyed reading this manuscript and that it gave them ideas and shed new light!%. Let us strive for a continuous effort to collectively advance the boundaries of human knowledge, at our scale and beyond!