Structured Perceptron (Collins, 2002). |
Loss |
- |
Update Rule |
$w_{t+1} = w_t + \phi(x_t, y_t) - \phi(x_t,\hat{y}^0_t)$ |
Parameters |
- |
Structured SVM (Tsochantaridis et al., 2005). |
Loss |
$max_{\hat{y}} ~[ \ell(y,\hat{y}) - w^\top\phi(x,y) + w^\top\phi(x,\hat{y})]$ |
Update Rule |
$w_{t+1} = (1-\eta_t \lambda)w_t + \eta_t( \phi(x_t, y_t) - \phi(x_t,\hat{y}^1_t))$ |
Parameters |
- eta: learning rate
- lambda: regularization
|
Passive Aggressive (Crammer et al., 2006). |
Loss |
- |
Update Rule |
$w_{t+1} = w_{t} + \tau_{t}(\phi(x_t,y_t)) - \phi(x_t,\hat{y}^1_t))$ |
Parameters |
- C: trade-off parameter
|
CRF (Lafferty et al., 2001). |
Loss |
$-\ln P_{w}(y \,|\, x)$, where $P_{w}(y \,|\, x) = \frac{1}{Z_{w}(x)} \exp\{w^\top \phi(x, y)\}$ |
Update Rule |
$w_{t+1} = (1-\eta_t \lambda)\,w_{t} + \eta_t\Big( \phi(x_{j_t},y_{j_t}) - \mathbb{E}_{y'\sim P_{w}(y' \,|\, x)}[ \phi(x_{j_t},y')] \Big)$ |
Parameters |
- eta: learning rate
- lambda: regularization
|
Direct Loss Minimization (McAllester et al., 2010). |
Loss |
$\ell (y,\hat{y})$ |
Update Rule |
$w_{t+1} = w_{t} + \frac{\eta_t}{\epsilon}(\phi(x_t,\hat{y}^{-\epsilon}_t) - \phi(x_t,\hat{y}^0_{t}))$ |
Parameters |
- eta: learning rate
- epsilon
|
Structured Ramp Loss (McAllester and Keshet, 2011). |
Loss |
$\max_{\hat{y}} [ \ell(y,\hat{y}) + w^\top \phi(x,\hat{y})] - \max_{\tilde{y}} [w^\top \phi(x,\tilde{y})]$ |
Update Rule |
$w_{t+1} = (1-\eta_t\lambda)\,w_{t} + \eta_t(\phi(x_t,\hat{y}^0_t) - \phi(x_t,\hat{y}^1_t))$ |
Parameters |
- eta: learning rate
- lambda: regularization
|
Probit Loss (Keshet et al., 2011). |
Loss |
$\mathbb{E}_{\gamma \sim \mathcal{N}(0,I)}[\ell (y,\hat{y}_{w+\gamma})]$ |
Update Rule |
$w_{t+1} = (1-\eta_t\lambda)\,w_t + \eta_t \mathbb{E}_{\gamma \sim \mathcal{N}(0,I)}[ \gamma \, \ell (y_t,\hat{y}^0_{w+\gamma})]$ |
Parameters |
- eta: learning rate
- lambda: regularization
- num_of_iteration: the number of times to generation noise
- noise_all_vector: boolean (1/0) indicates whether to generate noise through all the weight vector or just in one random place
- noise_mean: the mean of the noise to be generated(we draw the noise from a normal distribution)
- noise_std: the standard deviation of the noise to be generated(we draw the noise from a normal distribution)
|