@@ -85,25 +85,6 @@ <h1 class="title is-2 publication-title"><i>FaceXFormer</i>: A Unified Transfo
 
 <div class="column has-text-centered">
   <div class="publication-links">
-    <!-- PDF Link. -->
-    <!-- <span class="link-block">
-      <a href=""
-        class="external-link button is-normal is-rounded is-dark" target="_blank">
-        <span class="icon">
-          <i class="fas fa-file-pdf"></i>
-        </span>
-        <span>Paper</span>
-      </a>
-    </span> -->
-    <!-- <span class="link-block">
-      <a href=""
-        class="external-link button is-normal is-rounded is-dark">
-        <span class="icon">
-          <i class="fas fa-file-pdf"></i>
-        </span>
-        <span>Supplementary material</span>
-      </a>
-    </span> -->
-
    <span class="link-block">
      <a href="https://arxiv.org/abs/2403.12960"
        class="external-link button is-normal is-rounded is-dark">
@@ -139,7 +120,7 @@ <h2 class="title is-3">Motivation & Contribution</h2>
 
 <h2 class="title is-3">Motivation & Contribution</h2>
 <img src="./static/images/intro.png" alt="" border="0" height="600" width="1500">
-<img src="./static/images/intro_table.png" alt="" style="border:0; height:500px; width:1500px;">
+<img src="./static/images/intro_table.png" alt="" style="border:0; height:1200px; width:1500px;">
 <div class="content has-text-justified">
   <p>Comparison with representative methods under different task settings.
     <i>FaceXformer</i>
@@ -158,7 +139,8 @@ <h2 class="title is-3">Motivation & Contribution</h2>
   </li>
   <li><i>FaceXformer</i> is an end-to-end unified model capable of handling a comprehensive
     range of facial analysis tasks such as face parsing, landmark detection, head pose
-    estimation, attributes prediction, and estimation of age, gender, race, expression and face
+    estimation, attributes prediction, and estimation of age, gender, race, expression and
+    face
     visibility.
   </li>
   <li>It leverages a transformer-based encoder-decoder architecture where
@@ -174,9 +156,6 @@ <h2 class="title is-3">Motivation & Contribution</h2>
       </div>
     </div>
   </div>
-  <!--/ Abstract. -->
-
-  <!-- Paper video. -->
 
 <section class="section">
   <div class="container is-max-desktop">
@@ -187,19 +166,21 @@ <h2 class="title is-3"><i>FaceXformer</i> Framework</h2>
 <div class="content has-text-justified">
   <h5 class="subtitle has-text-centered"></h5>
 
-  <img src="./static/images/main_archi.png" alt="" border=0 height=500 width=1500></img></
-  <p>
-    Overview of <i>FaceXformer</i> framework. It employs an encoder-decoder
-    architecture, extracting multi-scale features from the input face image <b>I</b>, and
-    fusing them into a unified representation <b>F</b> via MLP-Fusion. Task tokens <b>T</b>
-    are processed alongside face representation <b>F</b> in the decoder, resulting in
-    refined
-    task-specific tokens <b><span style="position: relative; display: inline-block;">
-      T
-      <span
-        style="position: absolute; top: -7px; left: 0.6px; right: 0; font-size: smaller;">^</span>
-    </span></b>. These refined tokens are then used for
-    task-specific predictions by passing through the unified head.
+  <img src="./static/images/main_archi.png" alt="" border=0 height=500 width=1500></img>
+  <p>
+    Overview of <i>FaceXformer</i> framework. It employs an encoder-decoder
+    architecture, extracting multi-scale features from the input face image <b>I</b>,
+    and
+    fusing them into a unified representation <b>F</b> via MLP-Fusion. Task tokens
+    <b>T</b>
+    are processed alongside face representation <b>F</b> in the decoder, resulting in
+    refined
+    task-specific tokens <b><span style="position: relative; display: inline-block;">
+      T
+      <span
+        style="position: absolute; top: -7px; left: 0.6px; right: 0; font-size: smaller;">^</span>
+    </span></b>. These refined tokens are then used for
+    task-specific predictions by passing through the unified head.
   </p>
 
 
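A note on the caption above: it compresses the whole pipeline into a few clauses, so a minimal sketch of the data flow it describes may help. The PyTorch-style code below is a hypothetical illustration assembled only from the caption's terms (image I, fused representation F, task tokens T, refined tokens T̂, MLP-Fusion, unified head); the class names, shapes, and hyperparameters are assumptions, not the released FaceXFormer implementation.

# Hypothetical sketch of the captioned flow: image I -> multi-scale features
# -> MLP-Fusion -> unified representation F; task tokens T + F -> decoder
# -> refined tokens T_hat -> unified head -> task-specific predictions.
import torch
import torch.nn as nn

class MLPFusion(nn.Module):
    # Projects each backbone scale to a common width, then concatenates tokens.
    def __init__(self, in_dims, dim):
        super().__init__()
        self.proj = nn.ModuleList([nn.Linear(c, dim) for c in in_dims])

    def forward(self, feats):  # feats: list of (B, N_i, C_i) token sequences
        return torch.cat([p(f) for p, f in zip(self.proj, feats)], dim=1)

class FaceXFormerSketch(nn.Module):
    def __init__(self, backbone, in_dims, dim=256, num_tasks=8):
        super().__init__()
        self.backbone = backbone                      # assumed to return multi-scale features of I
        self.fusion = MLPFusion(in_dims, dim)         # builds unified representation F
        self.task_tokens = nn.Parameter(torch.randn(num_tasks, dim))  # learnable T, one per task
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.unified_head = nn.Linear(dim, dim)       # stand-in for the unified head

    def forward(self, image):                         # image I: (B, 3, H, W)
        feats = self.backbone(image)                  # list of (B, N_i, C_i)
        F = self.fusion(feats)                        # F: (B, sum N_i, dim)
        T = self.task_tokens.expand(image.size(0), -1, -1)  # T: (B, num_tasks, dim)
        T_hat = self.decoder(T, F)                    # task tokens attend to F
        return self.unified_head(T_hat)               # refined tokens -> predictions

In the real model the per-task outputs necessarily differ in shape (parsing masks, landmark coordinates, pose angles), so the single Linear above only marks where the unified head sits before task-specific readout.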
@@ -217,88 +198,105 @@ <h5 class="subtitle has-text-centered"></h5>
 <h2 class="title is-3">Quantitative Results</h2>
 <div class="content has-text-justified">
   <h5 class="subtitle has-text-centered"></h5>
-  <img src="./static/images/parsing.png" alt="" border=0 height=500
-    width=1500></img></div>
+  <img src="./static/images/parsing.png" alt="" border=0 height=500 width=1500></img>
   <p>
     Comparison with specialized models and existing multi-task networks on Face Parsing.
   </p>
-  <img src="./static/images/hpe_lnd_attr" alt="" border=0 height=500
-    width=1500></img></div>
-  <p>
-    Comparison with specialized models and existing multi-task networks on Headpose Estimation, Landmarks Detection and Attributes Prediction.
-  </p>
-  <img src="./static/images/results_others" alt="" border=0 height=500
-    width=1500></img></div>
+    </div>
+  </div>
+</div>
+<div class="columns is-centered has-text-centered">
+  <div class="column is-four-fifths">
+    <div class="content has-text-justified">
+      <h5 class="subtitle has-text-centered"></h5>
+      <img src="./static/images/hpe_lnd_attr.png" alt="" border=0 height=500 width=1500></img>
       <p>
-        Comparison with specialized models and existing multi-task networks on Facial Expression Recognition, Face Visibility, and Age Estimation.
+        Comparison with specialized models and existing multi-task networks on Headpose
+        Estimation,
+        Landmarks Detection and Attributes Prediction.
       </p>
       </div>
     </div>
   </div>
-</section>
-
-<section class="section">
-  <div class="container is-max-desktop">
-    <!-- Abstract. -->
 <div class="columns is-centered has-text-centered">
   <div class="column is-four-fifths">
-    <h2 class="title is-3">Qualitative Results</h2>
     <div class="content has-text-justified">
       <h5 class="subtitle has-text-centered"></h5>
-      <img src="./static/images/qualitative.png" alt="" border=0 height=500
-        width=1500></img><p>
-        Qualitative results of <i>FaceXFormer</i>
+      <img src="./static/images/results_others.png" alt="" border=0 height=500
+        width=1500></img>
+      <p>
+        Comparison with specialized models and existing multi-task networks on Facial
+        Expression
+        Recognition, Face Visibility, and Age Estimation.
       </p>
     </div>
   </div>
 </div>
 </div>
-</section>
+</div>
+</section>
+<section class="section">
+  <div class="container is-max-desktop">
+    <!-- Abstract. -->
+    <div class="columns is-centered has-text-centered">
+      <div class="column is-four-fifths">
+        <h2 class="title is-3">Qualitative Results</h2>
+        <div class="content has-text-justified">
+          <h5 class="subtitle has-text-centered"></h5>
+          <img src="./static/images/qualitative.png" alt="" border=0 height=500 width=1500></img>
+          <p>
+            Qualitative results of <i>FaceXFormer</i>
+          </p>
+        </div>
+      </div>
+    </div>
+  </div>
+</section>
 
 
-<section class="section" id="BibTeX">
-  <div class="container content is-max-desktop">
-    <h2 class="title">BibTeX</h2>
-    <pre><code>@article{narayan2024facexformer,
+<section class="section" id="BibTeX">
+  <div class="container content is-max-desktop">
+    <h2 class="title">BibTeX</h2>
+    <pre><code>@article{narayan2024facexformer,
 title={FaceXFormer: A Unified Transformer for Facial Analysis},
 author={Narayan, Kartik and VS, Vibashan and Chellappa, Rama and Patel, Vishal M},
 journal={arXiv preprint arXiv:2403.12960},
 year={2024}
 }
 </code></pre>
-  </div>
-</section>
+  </div>
+</section>
 
-<section class="section">
-  <div class="container is-max-desktop content">
-    <h5 class="title" style="font-size: 10px;">Acknowledgement: The website template is taken from
-      <span class="author-block">
-        <a href="https://nerfies.github.io/" target="_blank">Nerfies</a>
-    </h5>
+<section class="section">
+  <div class="container is-max-desktop content">
+    <h5 class="title" style="font-size: 10px;">Acknowledgement: The website template is taken from
+      <span class="author-block">
+        <a href="https://nerfies.github.io/" target="_blank">Nerfies</a>
+    </h5>
 
-  </div>
-</section>
+  </div>
+</section>
 
-<script>
-  const viewers = document.querySelectorAll(".image-compare");
-  viewers.forEach((element) => {
-    let view = new ImageCompare(element, {
-      hoverStart: true,
-      addCircle: true
-    }).mount();
-  });
-
-  $(document).ready(function() {
-    var editor = CodeMirror.fromTextArea(document.getElementById("bibtex"), {
-      lineNumbers: false,
-      lineWrapping: true,
-      readOnly: true
-    });
-    $(function() {
-      $('[data-toggle="tooltip"]').tooltip()
-    })
-  });
-</script>
+<script>
+  const viewers = document.querySelectorAll(".image-compare");
+  viewers.forEach((element) => {
+    let view = new ImageCompare(element, {
+      hoverStart: true,
+      addCircle: true
+    }).mount();
+  });
+
+  $(document).ready(function() {
+    var editor = CodeMirror.fromTextArea(document.getElementById("bibtex"), {
+      lineNumbers: false,
+      lineWrapping: true,
+      readOnly: true
+    });
+    $(function() {
+      $('[data-toggle="tooltip"]').tooltip()
+    })
+  });
+</script>
 </body>
 
-</html>
+</html>