Gradient Descent


Jinho D. Choi edited this page Jan 1, 2017 · 5 revisions

Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms, Collins, EMNLP, 2002.
Solving Large Scale Linear Prediction Problems Using Stochastic Gradient Descent Algorithms, Zhang, ICML, 2004.
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, Duchi et al., JMLR, 2011.
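As a rough illustration of two update rules studied in the papers above (a sketch, not code taken from any of them), the Python below compares plain stochastic gradient descent with the diagonal form of AdaGrad (Duchi et al., 2011) on a toy least-squares problem. The function names, the squared loss, the toy data, and the learning rates are all assumptions chosen for illustration.

```python
import random


def squared_loss_grad(w, x, y):
    """Gradient of 0.5 * (w . x - y)^2 with respect to w, for one example."""
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [err * xi for xi in x]


def mean_loss(w, data):
    """Average squared loss 0.5 * (w . x - y)^2 over a dataset."""
    return sum(0.5 * (sum(wi * xi for wi, xi in zip(w, x)) - y) ** 2
               for x, y in data) / len(data)


def sgd_step(w, g, lr):
    """Plain SGD: one shared learning rate for every coordinate."""
    return [wi - lr * gi for wi, gi in zip(w, g)]


def adagrad_step(w, g, G, lr, eps=1e-8):
    """Diagonal AdaGrad: add g_i^2 to the running sum G_i (updated in place),
    then scale coordinate i's step by 1 / (sqrt(G_i) + eps)."""
    for i, gi in enumerate(g):
        G[i] += gi * gi
    return [wi - lr * gi / (Gi ** 0.5 + eps) for wi, gi, Gi in zip(w, g, G)]


def train(data, dim, epochs=100, lr=0.1, method="sgd", seed=0):
    """Run per-example (stochastic) updates over shuffled data for a fixed number of epochs."""
    rng = random.Random(seed)
    data = list(data)      # copy so the caller's list is not reordered
    w = [0.0] * dim
    G = [0.0] * dim        # accumulated squared gradients (used by AdaGrad only)
    for _ in range(epochs):
        rng.shuffle(data)  # visit examples in a random order each epoch
        for x, y in data:
            g = squared_loss_grad(w, x, y)
            if method == "sgd":
                w = sgd_step(w, g, lr)
            elif method == "adagrad":
                w = adagrad_step(w, g, G, lr)
            else:
                raise ValueError(f"unknown method: {method}")
    return w


# Hypothetical noiseless toy data generated from y = 2*x0 - 3*x1.
data = [((1.0, 0.0), 2.0), ((0.0, 1.0), -3.0), ((1.0, 1.0), -1.0),
        ((2.0, 1.0), 1.0), ((1.0, 2.0), -4.0)]
w_sgd = train(data, dim=2, method="sgd", lr=0.1)
w_ada = train(data, dim=2, method="adagrad", lr=0.5)
print(w_sgd, mean_loss(w_sgd, data))
print(w_ada, mean_loss(w_ada, data))
```

On this realizable data, SGD with a small constant rate should recover weights close to (2, -3). AdaGrad is given a larger base rate here because each coordinate's step is divided by the square root of its accumulated squared gradients, so its effective rate shrinks as training proceeds.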

Supplementary

Stochastic Gradient Descent Training for L1-regularized Log-linear Models with Cumulative Penalty, Tsuruoka et al., ACL, 2009.
Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization, Xiao, JMLR, 2010.
ADADELTA: An Adaptive Learning Rate Method, Zeiler, arXiv:1212.5701, 2012.
Adam: A Method for Stochastic Optimization, Kingma and Ba, ICLR, 2015.
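For a concrete picture of the Adam update (Kingma and Ba, 2015), here is a minimal Python sketch, not the authors' code. The defaults for beta1, beta2, eps, and lr follow the values suggested in the paper; the function signature, the `grad` callback, and the toy quadratic objective are assumptions made for illustration.

```python
import math


def adam(grad, w0, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimize a function with Adam, given a callable grad(w) -> list of partials.

    Default hyperparameters follow the values suggested by Kingma and Ba.
    """
    w = list(w0)
    m = [0.0] * len(w)  # exponential moving average of gradients (first moment)
    v = [0.0] * len(w)  # exponential moving average of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad(w)     # gradient at the current point, computed once per step
        for i, gi in enumerate(g):
            m[i] = beta1 * m[i] + (1 - beta1) * gi
            v[i] = beta2 * v[i] + (1 - beta2) * gi * gi
            # Bias correction: m and v start at zero, so early estimates are too small.
            m_hat = m[i] / (1 - beta1 ** t)
            v_hat = v[i] / (1 - beta2 ** t)
            w[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


# Hypothetical toy objective f(w) = (w0 - 3)^2 + 10 * (w1 + 1)^2, minimized at (3, -1).
def quad_grad(w):
    return [2.0 * (w[0] - 3.0), 20.0 * (w[1] + 1.0)]


w_adam = adam(quad_grad, [0.0, 0.0], lr=0.05, steps=3000)
print(w_adam)
```

Because each coordinate's step is normalized by the square root of its own second-moment estimate, the two coordinates move at comparable speeds even though their curvatures differ by a factor of 10.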

Copyright © 2015-2019 Emory University - All Rights Reserved.