<!DOCTYPE html>
<html lang="en" dir="ltr">
<head>
<meta charset="utf-8">
<title>Du Tran | Publications</title>
<link href="css/style.css" rel="stylesheet" type="text/css">
<link href="favicon.ico" rel="shortcut icon" type="image/x-icon">
<style>
</style>
</head>
<body>
<header>
<nav>
<ul>
<li><a href="index.html">Home</a></li>
<li class="selected">Publications</li>
</ul>
</nav>
</header>
<h1 class="title">Publications</h1>
<p><b>2025</b></p>
<table>
<tr>
<td width="25%" valign="center" align="left"><a href=""><img src="images/seal.png" alt="sym" width="200px" height="110px" style="padding-top:0px;padding-bottom:0px;border-radius:15px;"></a></td>
<td valign="top">
SEAL: Semantic Attention Learning for Long Video Representation<br>
<span class="author">Lan Wang</span>, <span class="author">Yujia Chen</span>, <span class="author">Du Tran</span>, <span class="author">Vishnu Boddeti</span>, <span class="author">Wen-Sheng Chu</span><br>
<i>IEEE Computer Vision and Pattern Recognition (CVPR)</i>, 2025. <br>
(<b><span style="color:#a61111">oral</span></b>: <span style="color:#a61111">acceptance rate 0.7% [96 / 13008], top 3.3% of accepted papers [96 / 2878]</span>). <br>
[<a href="https://arxiv.org/pdf/2412.01798">paper</a>] [code] [<a href="https://seal-lvu.github.io/">project</a>]<br>
</td>
</tr>
</table>
<p><b>2024</b></p>
<table>
<tr>
<td width="25%" valign="center" align="left"><a href=""><img src="images/UDOS.gif" alt="sym" width="200px" height="110px" style="padding-top:0px;padding-bottom:0px;border-radius:15px;"></a></td>
<td valign="top">
Open-world Instance Segmentation: Top-down Learning with Bottom-up Supervision<br>
<span class="author">Tarun Kalluri</span>, <span class="author">Weiyao Wang</span>, <span class="author">Heng Wang</span>, <span class="author">Manmohan Chandraker</span>, <span class="author">Lorenzo Torresani</span>, <span class="author">Du Tran</span><br>
<i>CVPR 2024 L3D-IVU Workshop</i>, 2024. <br>
[<a href="https://arxiv.org/pdf/2303.05503.pdf">paper</a>] [<a href="https://tarun005.github.io/UDOS/">project</a>]<br>
</td>
</tr>
</table>
<p><b>2023</b></p>
<table>
<tr>
<td width="25%" valign="center" align="left"><a href=""><img src="images/ANT.gif" alt="sym" width="200px" height="110px" style="padding-top:0px;padding-bottom:0px;border-radius:15px;"></a></td>
<td valign="top">
Learning Space-Time Semantic Correspondences<br>
<span class="author">Du Tran</span>, <span class="author">Jitendra Malik</span><br>
<i>arXiv</i>, 2023. <br>
[<a href="https://arxiv.org/pdf/2306.10208.pdf">paper</a>] [datasets] [project]<br>
</td>
</tr>
<tr>
<td width="25%" valign="center" align="left"><a href=""><img src="images/MINOTAUR.png" alt="sym" width="200px" height="110px" style="padding-top:0px;padding-bottom:0px;border-radius:15px;"></a></td>
<td valign="top">
MINOTAUR: Multi-task Video Grounding From Multimodal Queries<br>
<span class="author">Raghav Goyal</span>, <span class="author">Effrosyni Mavroudi</span>, <span class="author">Xitong Yang</span>, <span class="author">Sainbayar Sukhbaatar</span>, <span class="author">Leonid Sigal</span>, <span class="author">Matt Feiszli</span>, <span class="author">Lorenzo Torresani</span>, <span class="author">Du Tran</span><br>
<i>arXiv</i>, 2023. <br>
[<a href="https://arxiv.org/pdf/2302.08063.pdf">paper</a>] [project]<br>
</td>
</tr>
<tr>
<td width="25%" valign="center" align="left"><a href=""><img src="images/ReST.png" alt="sym" width="200px" height="110px" style="padding-top:0px;padding-bottom:0px;border-radius:15px;"></a></td>
<td valign="top">
Relational Space-Time Query in Long-Form Videos<br>
<span class="author">Xitong Yang</span>, <span class="author">Fu-Jen Chu</span>, <span class="author">Matt Feiszli</span>, <span class="author">Raghav Goyal</span>, <span class="author">Lorenzo Torresani</span>, <span class="author">Du Tran</span><br>
<i>IEEE Computer Vision and Pattern Recognition (CVPR)</i>, 2023. <br>
(<b><span style="color:#a61111">highlight</span></b>: <span style="color:#a61111">acceptance rate 2.5%</span>). <br>
[<a href="https://openaccess.thecvf.com/content/CVPR2023/papers/Yang_Relational_Space-Time_Query_in_Long-Form_Videos_CVPR_2023_paper.pdf">paper</a>] [<a href="https://drive.google.com/drive/folders/1MgiNyGANDukGpKe2KOxy5CBYd1bLvpP4?usp=share">datasets</a>] [project]<br>
</td>
</tr>
<tr>
<td width="25%" valign="center" align="left"><a href="https://tarun005.github.io/FLAVR/"><img src="images/flavr.gif" alt="sym" width="200px" height="110px" style="padding-top:0px;padding-bottom:0px;border-radius:15px;"></a></td>
<td valign="top">
FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation<br>
<span class="author">Tarun Kalluri</span>, <span class="author">Deepak Pathak</span>, <span class="author">Manmohan Chandraker</span>, <span class="author">Du Tran</span><br>
<i>IEEE Winter Conference on Applications of Computer Vision (WACV)</i>, 2023. <br>
(<b><span style="color:#a61111">Best Paper Finalist</span></b>: <span style="color:#a61111">12 out of 641 accepted papers</span>). <br>
[<a href="https://arxiv.org/pdf/2012.08512.pdf">paper</a>] [<a href="https://youtu.be/TcQd0LCLCzo">demo</a>] [<a href="https://tarun005.github.io/FLAVR/">project</a>]<br>
</td>
</tr>
</table>
<p><b>2022</b></p>
<table>
<tr>
<td width="25%" valign="center" align="left"><a href="https://sites.google.com/view/generic-grouping/"><img src="images/PA.png" alt="sym" width="200px" height="110px" style="padding-top:0px;padding-bottom:0px;border-radius:15px;"></a></td>
<td valign="top">
Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity<br>
<span class="author">Weiyao Wang</span>, <span class="author">Matt Feiszli</span>, <span class="author">Heng Wang</span>, <span class="author">Jitendra Malik</span>, <span class="author">Du Tran</span><br>
<i>IEEE Computer Vision and Pattern Recognition (CVPR)</i>, 2022. <br>
[<a href="https://arxiv.org/pdf/2204.06107.pdf">paper</a>]
[<a href="https://sites.google.com/view/generic-grouping/">project</a>]
[<a href="https://github.com/facebookresearch/Generic-Grouping">code</a>]<br>
</td>
</tr>
<tr>
<td width="25%" valign="center" align="left"><a href=""><img src="images/LSTCL.png" alt="sym" width="200px" height="110px" style="padding-top:0px;padding-bottom:0px;border-radius:15px;"></a></td>
<td valign="top">
Long-short Temporal Contrastive Learning of Video Transformers<br>
<span class="author">Jue Wang</span>, <span class="author">Gedas Bertasius</span>, <span class="author">Du Tran</span>, <span class="author">Lorenzo Torresani</span><br>
<i>IEEE Computer Vision and Pattern Recognition (CVPR)</i>, 2022. <br>
[<a href="https://arxiv.org/pdf/2106.09212.pdf">paper</a>]<br>
</td>
</tr>
</table>
<p><b>2021</b></p>
<table>
<tr>
<td width="25%" valign="center" align="left"><a href="https://sites.google.com/view/unidentified-video-object"><img src="images/uvo.gif" alt="sym" width="200px" height="110px" style="padding-top:0px;padding-bottom:0px;border-radius:15px;"></a></td>
<td valign="top">
Unidentified Video Objects: A Benchmark for Dense, Open-World Segmentation<br>
<span class="author">Weiyao Wang</span>, <span class="author">Matt Feiszli</span>, <span class="author">Heng Wang</span>, <span class="author">Du Tran</span><br>
<i>International Conference on Computer Vision (ICCV)</i>, 2021. <br>
[<a href="https://arxiv.org/pdf/2104.04691.pdf">paper</a>] [<a href="https://youtu.be/7NFfEYZEyyY">video</a>] [<a href="https://sites.google.com/view/unidentified-video-object">project</a>]<br>
</td>
</tr>
</table>
<p><b>2020</b></p>
<table>
<tr>
<td width="25%" valign="center" align="left"><a href="http://humamalwassel.com/publication/xdc/"><img src="images/xdc.png" alt="sym" width="200px" height="110px" style="padding-top:0px;padding-bottom:0px;border-radius:15px;"></a></td>
<td valign="top" align="left">
Self-Supervised Learning by Cross-Modal Audio-Video Clustering.<br>
<span class="author">Humam Alwassel</span>, <span class="author">Dhruv Mahajan</span>, <span class="author">Bruno Korbar</span>, <span class="author">Lorenzo Torresani</span>, <span class="author">Bernard Ghanem</span>, <span class="author">Du Tran</span>.<br>
<i>Neural Information Processing Systems (NeurIPS)</i>, 2020. <br>
(<b><span style="color:#a61111">spotlight</span></b>: <span style="color:#a61111">acceptance rate 4.1%</span>). <br>
[<a href="https://arxiv.org/pdf/1911.12667.pdf">paper</a>] [<a href="https://github.com/HumamAlwassel/XDC">models</a>] [<a href="http://humamalwassel.com/publication/xdc/">project</a>]<br>
</td>
</tr>
<tr>
<td width="25%" valign="center" align="left"><a href=""><img src="images/gblend.png" alt="sym" width="200px" height="110px" style="padding-top:0px;padding-bottom:0px;border-radius:15px;"></a></td>
<td valign="top" align="left">
What Makes Training Multi-modal Classification Networks Hard?<br>
<span class="author">Weiyao Wang</span>, <span class="author">Du Tran</span>, <span class="author">Matt Feiszli</span>.<br>
<i>IEEE Computer Vision and Pattern Recognition (CVPR)</i>, 2020. <br>
[<a href="https://arxiv.org/pdf/1905.12681.pdf">paper</a>]
[<a href="https://github.com/facebookresearch/VMZ">code</a>]<br>
</td>
</tr>
<tr>
<td width="25%" valign="center" align="left"><a href=""><img src="images/corr.png" alt="sym" width="200px" height="110px" style="padding-top:0px;padding-bottom:0px;border-radius:15px;"></a></td>
<td valign="top" align="left">
Video Modeling with Correlation Networks.<br>
<span class="author">Heng Wang</span>, <span class="author">Du Tran</span>, <span class="author">Lorenzo Torresani</span>, <span class="author">Matt Feiszli</span>.<br>
<i>IEEE Computer Vision and Pattern Recognition (CVPR)</i>, 2020. <br>
[<a href="https://arxiv.org/pdf/1906.03349.pdf">paper</a>]
[code]<br>
</td>
</tr>
<tr>
<td width="25%" valign="center" align="left"><a href=""><img src="images/faster.png" alt="sym" width="200px" height="110px" style="padding-top:0px;padding-bottom:0px;border-radius:15px;"></a></td>
<td valign="top" align="left">
FASTER Recurrent Networks for Efficient Video Classification.<br>
<span class="author">Linchao Zhu</span>, <span class="author">Laura Sevilla-Lara</span>, <span class="author">Du Tran</span>, <span class="author">Matt Feiszli</span>, <span class="author">Heng Wang</span>.<br>
<i>AAAI Conference on Artificial Intelligence (AAAI)</i>, 2020. <br>
[<a href="https://arxiv.org/pdf/1906.04226.pdf">paper</a>]
[code]<br>
</td>
</tr>
</table>
<p><b>2019</b></p>
<table>
<tr>
<td width="26%" valign="center" align="left"><a href=""><img src="images/csn.gif" alt="sym" width="200px" height="110px" style="padding-top:0px;padding-bottom:0px;border-radius:15px;"></a></td>
<td valign="top" align="left">
Video Classification with Channel-Separated Convolutional Networks.<br>
<span class="author">Du Tran</span>, <span class="author">Heng Wang</span>, <span class="author">Lorenzo Torresani</span>, <span class="author">Matt Feiszli</span>.<br>
<i>International Conference on Computer Vision (ICCV)</i>, 2019. <br>
[<a href="https://arxiv.org/pdf/1904.02811.pdf">paper</a>]
[<a href="https://github.com/facebookresearch/VMZ">code</a>]<br>
</td>
</tr>
<tr>
<td width="25%" valign="center" align="left"><a href=""><img src="images/scsampler.png" alt="sym" width="200px" height="110px" style="padding-top:0px;padding-bottom:0px;border-radius:15px;"></a></td>
<td valign="top" align="left">
SCSampler: Sampling Salient Clips from Video for Efficient Action Recognition.<br>
<span class="author">Bruno Korbar</span>, <span class="author">Du Tran</span>, <span class="author">Lorenzo Torresani</span>.<br>
<i>International Conference on Computer Vision (ICCV)</i>, 2019. <br>
(<b><span style="color:#a61111">oral</span></b>: <span style="color:#a61111">acceptance rate 4.3%</span>).<br>
[<a href="https://arxiv.org/pdf/1904.04289.pdf">paper</a>]
[code]<br>
</td>
</tr>
<tr>
<td width="25%" valign="center" align="left"><a href=""><img src="images/distinit.png" alt="sym" width="200px" height="110px" style="padding-top:0px;padding-bottom:0px;border-radius:15px;"></a></td>
<td valign="top" align="left">
DistInit: Learning Video Representations without a Single Labeled Video.<br>
<span class="author">Rohit Girdhar</span>, <span class="author">Du Tran</span>, <span class="author">Lorenzo Torresani</span>, <span class="author">Deva Ramanan</span>.<br>
<i>International Conference on Computer Vision (ICCV)</i>, 2019. <br>
[<a href="https://arxiv.org/pdf/1901.09244.pdf">paper</a>]
[code]<br>
</td>
</tr>
<tr>
<td width="25%" valign="center" align="left"><a href=""><img src="images/posewarper.png" alt="sym" width="200px" height="110px" style="padding-top:0px;padding-bottom:0px;border-radius:15px;"></a></td>
<td valign="top" align="left">
Learning Temporal Pose Estimation from Sparsely-Labeled Videos.<br>
<span class="author">Gedas Bertasius</span>, <span class="author">Christoph Feichtenhofer</span>, <span class="author">Du Tran</span>, <span class="author">Jianbo Shi</span>, <span class="author">Lorenzo Torresani</span>.<br>
<i>Neural Information Processing Systems (NeurIPS)</i>, 2019. <br>
[<a href="https://arxiv.org/pdf/1906.04016.pdf">paper</a>]
[<a href="https://github.com/facebookresearch/PoseWarper">code</a>]<br>
</td>
</tr>
<tr>
<td width="25%" valign="center" align="left"><a href=""><img src="images/actionanticipation.png" alt="sym" width="200px" height="110px" style="padding-top:0px;padding-bottom:0px;border-radius:15px;"></a></td>
<td valign="top" align="left">
Leveraging the Present to Anticipate the Future in Videos.<br>
<span class="author">Antoine Miech</span>, <span class="author">Ivan Laptev</span>, <span class="author">Josef Sivic</span>, <span class="author">Heng Wang</span>, <span class="author">Lorenzo Torresani</span>, <span class="author">Du Tran</span>.<br>
<i>IEEE Computer Vision and Pattern Recognition (CVPR) Precognition Workshop</i>, 2019. <br>
(<span style="color:#a61111">2nd place at CVPR'19 EPIC-KITCHEN Challenge</span>). <br>
[<a href="https://research.fb.com/wp-content/uploads/2019/05/Leveraging-the-Present-to-Anticipate-the-Future-in-Videos.pdf">paper</a>]
[code]<br>
</td>
</tr>
<tr>
<td width="25%" valign="center" align="left"><a href=""><img src="images/igkinetics.png" alt="sym" width="200px" height="110px" style="padding-top:0px;padding-bottom:0px;border-radius:15px;"></a></td>
<td valign="top" align="left">
Large-scale Weakly-Supervised Pre-training for Video Action Recognition.<br>
<span class="author">Deepti Ghadiyaram</span>, <span class="author">Matt Feiszli</span>, <span class="author">Du Tran</span>, <span class="author">Xueting Yan</span>, <span class="author">Heng Wang</span>, <span class="author">Dhruv Mahajan</span>.<br>
<i>IEEE Computer Vision and Pattern Recognition (CVPR)</i>, 2019. <br>
(<span style="color:#a61111">2nd place at CVPR'19 EPIC-KITCHEN Challenge</span>). <br>
[<a href="https://research.fb.com/wp-content/uploads/2019/05/Large-scale-weakly-supervised-pre-training-for-video-action-recognition.pdf">paper</a>]
[<a href="https://github.com/facebookresearch/VMZ">code</a>]<br>
</td>
</tr>
</table>
<p><b>2018</b></p>
<table>
<tr>
<td width="25%" valign="center" align="left"><a href=""><img src="images/avts.png" alt="sym" width="200px" height="110px" style="padding-top:0px;padding-bottom:0px;border-radius:15px;"></a></td>
<td valign="top" align="left">
Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization.<br>
<span class="author">Bruno Korbar</span>, <span class="author">Du Tran</span>, <span class="author">Lorenzo Torresani</span>.<br>
<i>Neural Information Processing Systems (NeurIPS)</i>, 2018. <br>
[<a href="https://arxiv.org/pdf/1807.00230.pdf">paper</a>] [code]<br>
</td>
</tr>
<tr>
<td width="25%" valign="center" align="left"><a href=""><img src="images/soa.png" alt="sym" width="200px" height="110px" style="padding-top:0px;padding-bottom:0px;border-radius:15px;"></a></td>
<td valign="top" align="left">
Scenes-Objects-Actions: A Multi-Task, Multi-Label Video Dataset.<br>
<span class="author">Jamie Ray</span>, <span class="author">Heng Wang</span>, <span class="author">Du Tran</span>, <span class="author">Yufei Wang</span>, <span class="author">Matt Feiszli</span>, <span class="author">Lorenzo Torresani</span>, <span class="author">Manohar Paluri</span>.<br>
<i>European Conference on Computer Vision (ECCV)</i>, 2018. <br>
[<a href="http://openaccess.thecvf.com/content_ECCV_2018/html/Heng_Wang_Scenes-Objects-Actions_A_Multi-Task_ECCV_2018_paper.html">paper</a>]
[code]<br>
</td>
</tr>
<tr>
<td width="25%" valign="center" align="left"><a href=""><img src="images/closerlook.png" alt="sym" width="200px" height="110px" style="padding-top:0px;padding-bottom:0px;border-radius:15px;"></a></td>
<td valign="top" align="left">
A Closer Look at Spatiotemporal Convolutions for Action Recognition.<br>
<span class="author">Du Tran</span>, <span class="author">Heng Wang</span>, <span class="author">Lorenzo Torresani</span>, <span class="author">Jamie Ray</span>, <span class="author">Yann LeCun</span>, <span class="author">Manohar Paluri</span>.<br>
<i>IEEE Computer Vision and Pattern Recognition (CVPR)</i>, 2018. <br>
[<a href="https://arxiv.org/pdf/1711.11248.pdf">paper</a>]
[<a href="https://github.com/facebookresearch/VMZ">code</a>]<br>
</td>
</tr>
<tr>
<td width="25%" valign="center" align="left"><a href="https://github.com/facebookresearch/DetectAndTrack"><img src="images/detect_track.gif" alt="sym" width="200px" height="110px" style="padding-top:0px;padding-bottom:0px;border-radius:15px;"></a></td>
<td valign="top" align="left">
Detect-and-Track: Efficient Pose Estimation in Videos.<br>
<span class="author">Rohit Girdhar</span>, <span class="author">Georgia Gkioxari</span>, <span class="author">Lorenzo Torresani</span>, <span class="author">Manohar Paluri</span>, <span class="author">Du Tran</span>.<br>
<i>IEEE Computer Vision and Pattern Recognition (CVPR)</i>, 2018. <br>
(<span style="color:#a61111">1st place at ICCV'17 PoseTrack Challenge</span>). <br>
[<a href="https://arxiv.org/pdf/1712.09184.pdf">paper</a>]
[<a href="https://github.com/facebookresearch/DetectAndTrack">code</a>]<br>
</td>
</tr>
</table>
<p><b>2017</b></p>
<table>
<tr>
<td width="25%" valign="center" align="left"><a href="https://github.com/facebookresearch/DetectAndTrack"><img src="images/detect_and_track.png" alt="sym" width="200px" height="110px" style="padding-top:0px;padding-bottom:0px;border-radius:15px;"></a></td>
<td valign="top" align="left">
Simple, Efficient and Effective Keypoint Tracking.<br>
<span class="author">Rohit Girdhar</span>, <span class="author">Georgia Gkioxari</span>, <span class="author">Lorenzo Torresani</span>, <span class="author">Deva Ramanan</span>, <span class="author">Manohar Paluri</span>, <span class="author">Du Tran</span>.<br>
<i>International Conference on Computer Vision (ICCV) PoseTrack Workshop</i>, 2017. <br>
[<a href="https://posetrack.net/workshops/iccv2017/pdfs/ProTracker.pdf">paper</a>]
[<a href="https://github.com/facebookresearch/DetectAndTrack">code</a>]<br>
</td>
</tr>
</table>
<p><b>2016</b></p>
<table>
<tr>
<td width="26%" valign="center" align="left"><a href=""><img src="images/v2v.gif" alt="sym" width="200px" height="110px" style="padding-top:0px;padding-bottom:0px;border-radius:15px;"></a></td>
<td valign="top" align="left">
Deep End2End Voxel2Voxel Prediction.<br>
<span class="author">Du Tran</span>, <span class="author">Lubomir Bourdev</span>, <span class="author">Rob Fergus</span>, <span class="author">Lorenzo Torresani</span>, <span class="author">Manohar Paluri</span>.<br>
<i>IEEE Computer Vision and Pattern Recognition (CVPR) DeepVision Workshop</i>, 2016. <br>
[<a href="papers/cvpr16w_voxel.pdf">paper</a>]
[<a href="https://github.com/facebook/C3D">code</a>]<br>
</td>
</tr>
<tr>
<td width="26%" valign="center" align="left"><a href=""><img src="images/exmove.png" alt="sym" width="200px" height="110px" style="padding-top:0px;padding-bottom:0px;border-radius:15px;"></a></td>
<td valign="top" align="left">
EXMOVES: Mid-level Features for Efficient Action Recognition and Video Analysis.<br>
<span class="author">Du Tran</span>, <span class="author">Lorenzo Torresani</span>.<br>
<i>International Journal of Computer Vision (IJCV)</i>, 2016. <br>
[<a href="papers/ijcv16.pdf">paper</a>]
[<a href="http://vlg.cs.dartmouth.edu/exmoves">code</a>]<br>
</td>
</tr>
</table>
<p><b>2015</b></p>
<table>
<tr>
<td width="25%" valign="center" align="left"><a href=""><img src="images/c3d.gif" alt="sym" width="200px" height="110px" style="padding-top:0px;padding-bottom:0px;border-radius:15px;"></a></td>
<td valign="top" align="left">
Learning Spatiotemporal Features with 3D Convolutional Networks.<br>
<span class="author">Du Tran</span>, <span class="author">Lubomir Bourdev</span>, <span class="author">Rob Fergus</span>, <span class="author">Lorenzo Torresani</span>, <span class="author">Manohar Paluri</span>.<br>
<i>International Conference on Computer Vision (ICCV)</i>, 2015. <br>
(<b><span style="color:#a61111">the 3rd most cited paper of ICCV'15</span></b> <a href="https://www.aminer.org/bestpaper/5eeb1307b5261c744f15bd9c">link</a>, <a href="https://scholar.google.com/citations?hl=en&vq=en&view_op=list_hcore&venue=uDnJSYNMB80J.2020">link</a>). <br>
[<a href="papers/c3d_video.pdf">paper</a>]
[<a href="https://github.com/facebook/C3D">code</a>]<br>
</td>
</tr>
</table>
<p><b>2014</b></p>
<table>
<tr>
<td width="25%" valign="center" align="left"><a href=""><img src="images/exmove.png" alt="sym" width="200px" height="110px" style="padding-top:0px;padding-bottom:0px;border-radius:15px;"></a></td>
<td valign="top" align="left">
EXMOVES: Classifier-based Features for Scalable Action Recognition.<br>
<span class="author">Du Tran</span>, <span class="author">Lorenzo Torresani</span>.<br>
<i>International Conference on Learning Representations (ICLR)</i>, 2014. <br>
[<a href="https://arxiv.org/pdf/1312.5785.pdf">paper</a>]
[<a href="http://vlg.cs.dartmouth.edu/exmoves">code</a>]<br>
</td>
</tr>
<tr>
<td width="25%" valign="center" align="left"><a href=""><img src="images/max_path_pami.png" alt="sym" width="200px" height="110px" style="padding-top:0px;padding-bottom:0px;border-radius:15px;"></a></td>
<td valign="top" align="left">
Video Event Detection: from Subvolume Localization to Spatio-Temporal Path Search.<br>
<span class="author">Du Tran</span>, <span class="author">Junsong Yuan</span>, <span class="author">David Forsyth</span>.<br>
<i>IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)</i>, 2014. <br>
[<a href="papers/pami13.pdf">paper</a>]
[<a href="code/MaxPath_Cpp.zip">code</a>]<br>
</td>
</tr>
</table>
<p><b>2012 and before</b></p>
<table>
<tr>
<td width="25%" valign="center" align="left"><a href=""><img src="images/detect_kiss.gif" alt="sym" width="200px" height="110px" style="padding-top:0px;padding-bottom:0px;border-radius:15px;"></a></td>
<td valign="top" align="left">
Max-Margin Structured Output Regression for Spatio-Temporal Action Localization.<br>
<span class="author">Du Tran</span>, <span class="author">Junsong Yuan</span>.<br>
<i>Neural Information Processing Systems (NIPS)</i>, 2012. <br>
[<a href="papers/nips12.pdf">paper</a>]<br>
</td>
</tr>
<tr>
<td width="25%" valign="center" align="left"><a href=""><img src="images/maxpath_cvpr.gif" alt="sym" width="200px" height="110px" style="padding-top:0px;padding-bottom:0px;border-radius:15px;"></a></td>
<td valign="top" align="left">
Optimal Spatio-Temporal Path Discovery for Video Event Detection.<br>
<span class="author">Du Tran</span>, <span class="author">Junsong Yuan</span>.<br>
<i>IEEE Computer Vision and Pattern Recognition (CVPR)</i>, 2011. <br>
[<a href="papers/cvpr11.pdf">paper</a>]
[<a href="code/MaxPath_Cpp.zip">code</a>]
[<a href="https://www.dropbox.com/s/8moajye83bkstv8/YoutubeWalking.zip">data</a>]<br>
</td>
</tr>
<tr>
<td width="25%" valign="center" align="left"><a href=""><img src="images/motion_context.png" alt="sym" width="200px" height="110px" style="padding-top:0px;padding-bottom:0px;border-radius:15px;"></a></td>
<td valign="top" align="left">
Human Activity Recognition with Metric Learning.<br>
<span class="author">Du Tran</span>, <span class="author">Alexander Sorokin</span>.<br>
<i>European Conference on Computer Vision (ECCV)</i>, 2008. <br>
[<a href="papers/eccv08.pdf">paper</a>]
[<a href="https://www.dropbox.com/s/puoo2o8huekcpu8/mcontext.zip">code</a>]<br>
</td>
</tr>
</table>
<footer>
<h5>This website was designed by my son.</h5>
</footer>
</body>
</html>