\( \newcommand{\oneinf}{\ell_{1,\infty}} \newcommand{\onetwo}{\ell_{1,2}} \)

Algorithms


Here we describe the learning algorithms implemented in the StructED package and their mandatory parameters. These parameters must be set before using any StructED model: pass them to the StructED model constructor in the exact order listed for each algorithm below. Throughout, $\hat{y}^{c}_t$ denotes loss-augmented inference with coefficient $c$ on the task loss, $\hat{y}^{c}_t = \arg\max_{\hat{y}} \big[ w^\top \phi(x_t, \hat{y}) + c\, \ell(y_t, \hat{y}) \big]$, so that $\hat{y}^0_t$ is the standard prediction.
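For example, for Structured SVM the order is (1) eta, then (2) lambda. A minimal sketch of building such an ordered parameter list (the commented-out constructor call is an assumption about the API shape, not verbatim StructED code):

```java
import java.util.ArrayList;

public class ParameterOrderExample {
    public static void main(String[] args) {
        // Parameters are passed as an ordered list; for Structured SVM the
        // order is (1) eta, (2) lambda, as documented below.
        ArrayList<Double> arguments = new ArrayList<Double>();
        arguments.add(0.1);   // 1. eta: learning rate
        arguments.add(0.01);  // 2. lambda: regularization
        // Hypothetical constructor call (names are placeholders, not verbatim API):
        // StructEDModel model = new StructEDModel(w, new SVM(), taskLoss,
        //         inference, null, featureFunctions, arguments);
    }
}
```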

Structured Perceptron (Collins, 2002).
Loss none (the Perceptron is mistake-driven and does not minimize an explicit surrogate loss)
Update Rule $w_{t+1} = w_t + \phi(x_t, y_t) - \phi(x_t,\hat{y}^0_t)$
Parameters none
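The update needs no tuning and reduces to one additive step over feature vectors. A minimal sketch, assuming dense double[] vectors in place of the package's own feature representation:

```java
// Illustrative Structured Perceptron step; dense double[] vectors are an
// assumption standing in for StructED's actual feature representation.
class PerceptronStep {
    /** w_{t+1} = w_t + phi(x_t, y_t) - phi(x_t, yhat0), with yhat0 the current argmax. */
    static double[] update(double[] w, double[] phiGold, double[] phiPredicted) {
        double[] next = w.clone();
        for (int i = 0; i < next.length; i++) {
            next[i] += phiGold[i] - phiPredicted[i]; // move toward gold, away from prediction
        }
        return next;
    }
}
```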
Structured SVM (Tsochantaridis et al., 2005).
Loss $\max_{\hat{y}} ~[ \ell(y,\hat{y}) - w^\top\phi(x,y) + w^\top\phi(x,\hat{y})]$
Update Rule $w_{t+1} = (1-\eta_t \lambda)w_t + \eta_t( \phi(x_t, y_t) - \phi(x_t,\hat{y}^1_t))$
Parameters
  1. eta: learning rate
  2. lambda: regularization
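This is a Pegasos-style stochastic subgradient step. A minimal sketch under the same dense-array assumption:

```java
// Illustrative Structured SVM step over dense arrays.
class SvmStep {
    /** w_{t+1} = (1 - eta*lambda) w_t + eta (phi(x_t,y_t) - phi(x_t,yhat1)),
     *  where yhat1 is the loss-augmented argmax (coefficient 1 on the loss). */
    static double[] update(double[] w, double[] phiGold, double[] phiAugmented,
                           double eta, double lambda) {
        double[] next = new double[w.length];
        for (int i = 0; i < w.length; i++) {
            next[i] = (1.0 - eta * lambda) * w[i]             // shrink: regularization
                    + eta * (phiGold[i] - phiAugmented[i]);   // subgradient step
        }
        return next;
    }
}
```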
Passive Aggressive (Crammer et al., 2006).
Loss $\max_{\hat{y}} ~[ \ell(y,\hat{y}) - w^\top\phi(x,y) + w^\top\phi(x,\hat{y})]$ (the structured hinge loss, as in Structured SVM)
Update Rule $w_{t+1} = w_{t} + \tau_{t}\big(\phi(x_t,y_t) - \phi(x_t,\hat{y}^1_t)\big)$, where $\tau_t = \min\Big\{C,\; \frac{\ell(y_t,\hat{y}^1_t) - w_t^\top\big(\phi(x_t,y_t) - \phi(x_t,\hat{y}^1_t)\big)}{\|\phi(x_t,y_t) - \phi(x_t,\hat{y}^1_t)\|^2}\Big\}$
Parameters
  1. C: trade-off parameter
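A minimal sketch of one step, computing $\tau_t$ from the hinge term and the squared norm of the feature difference as in the rule above (dense arrays are again an assumption):

```java
// Illustrative Passive-Aggressive step over dense arrays.
class PassiveAggressiveStep {
    /** w_{t+1} = w_t + tau_t * (phi(x_t,y_t) - phi(x_t,yhat1)), with
     *  tau_t = min{ C, hinge / ||phi(x_t,y_t) - phi(x_t,yhat1)||^2 }. */
    static double[] update(double[] w, double[] phiGold, double[] phiAugmented,
                           double taskLoss, double C) {
        double hinge = taskLoss;  // ell(y_t, yhat1) - w^T (phiGold - phiAugmented)
        double sqNorm = 0.0;
        for (int i = 0; i < w.length; i++) {
            double d = phiGold[i] - phiAugmented[i];
            hinge -= w[i] * d;
            sqNorm += d * d;
        }
        double tau = Math.min(C, Math.max(0.0, hinge) / (sqNorm + 1e-12)); // guard zero norm
        double[] next = w.clone();
        for (int i = 0; i < w.length; i++) {
            next[i] += tau * (phiGold[i] - phiAugmented[i]);
        }
        return next;
    }
}
```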
CRF (Lafferty et al., 2001).
Loss $-\ln P_{w}(y \,|\, x)$, where $P_{w}(y \,|\, x) = \frac{1}{Z_{w}(x)} \exp\{w^\top \phi(x, y)\}$
Update Rule $w_{t+1} = (1-\eta_t \lambda)\,w_{t} + \eta_t\Big( \phi(x_t,y_t) - \mathbb{E}_{y'\sim P_{w}(\cdot \,|\, x_t)}[ \phi(x_t,y')] \Big)$
Parameters
  1. eta: learning rate
  2. lambda: regularization
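In practice the expected feature vector is computed with dynamic programming over the structure; the sketch below instead enumerates a small explicit candidate set, an assumption made only to keep the example self-contained:

```java
// Illustrative CRF stochastic gradient step with an explicitly enumerated
// candidate label set (real implementations use dynamic programming).
class CrfStep {
    static double[] update(double[] w, double[] phiGold, double[][] phiCandidates,
                           double eta, double lambda) {
        int n = w.length, k = phiCandidates.length;
        // Scores w^T phi(x, y') per candidate, with max-subtraction for stability.
        double[] p = new double[k];
        double max = Double.NEGATIVE_INFINITY;
        for (int j = 0; j < k; j++) {
            double s = 0.0;
            for (int i = 0; i < n; i++) s += w[i] * phiCandidates[j][i];
            p[j] = s;
            if (s > max) max = s;
        }
        double z = 0.0;
        for (int j = 0; j < k; j++) { p[j] = Math.exp(p[j] - max); z += p[j]; }
        // Expected feature vector under P_w(y' | x_t).
        double[] expected = new double[n];
        for (int j = 0; j < k; j++)
            for (int i = 0; i < n; i++)
                expected[i] += (p[j] / z) * phiCandidates[j][i];
        double[] next = new double[n];
        for (int i = 0; i < n; i++)
            next[i] = (1.0 - eta * lambda) * w[i] + eta * (phiGold[i] - expected[i]);
        return next;
    }
}
```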
Direct Loss Minimization (McAllester et al., 2010).
Loss $\ell (y,\hat{y})$
Update Rule $w_{t+1} = w_{t} + \frac{\eta_t}{\epsilon}(\phi(x_t,\hat{y}^{-\epsilon}_t) - \phi(x_t,\hat{y}^0_{t}))$
Parameters
  1. eta: learning rate
  2. epsilon: the perturbation magnitude $\epsilon$ used in the update rule
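A minimal sketch of one step; the two feature vectors are assumed precomputed, with $\hat{y}^{-\epsilon}_t$ obtained by running inference with the task loss weighted by $-\epsilon$:

```java
// Illustrative Direct Loss Minimization step over dense arrays.
class DirectLossStep {
    /** w_{t+1} = w_t + (eta/epsilon) * (phi(x_t, yhatMinusEps) - phi(x_t, yhat0)),
     *  where yhatMinusEps is the argmax with the loss weighted by -epsilon. */
    static double[] update(double[] w, double[] phiLossAdjusted, double[] phiPredicted,
                           double eta, double epsilon) {
        double[] next = w.clone();
        for (int i = 0; i < w.length; i++) {
            next[i] += (eta / epsilon) * (phiLossAdjusted[i] - phiPredicted[i]);
        }
        return next;
    }
}
```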
Structured Ramp Loss (McAllester and Keshet, 2011).
Loss $\max_{\hat{y}} [ \ell(y,\hat{y}) + w^\top \phi(x,\hat{y})] - \max_{\tilde{y}} [w^\top \phi(x,\tilde{y})]$
Update Rule $w_{t+1} = (1-\eta_t\lambda)\,w_{t} + \eta_t(\phi(x_t,\hat{y}^0_t) - \phi(x_t,\hat{y}^1_t))$
Parameters
  1. eta: learning rate
  2. lambda: regularization
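A minimal sketch of one step, combining the plain argmax $\hat{y}^0_t$ and the loss-augmented argmax $\hat{y}^1_t$ (dense arrays are an assumption):

```java
// Illustrative Structured Ramp Loss step over dense arrays.
class RampStep {
    /** w_{t+1} = (1 - eta*lambda) w_t + eta (phi(x_t, yhat0) - phi(x_t, yhat1)):
     *  yhat0 is the plain argmax, yhat1 the loss-augmented argmax. */
    static double[] update(double[] w, double[] phiPredicted, double[] phiAugmented,
                           double eta, double lambda) {
        double[] next = new double[w.length];
        for (int i = 0; i < w.length; i++) {
            next[i] = (1.0 - eta * lambda) * w[i] + eta * (phiPredicted[i] - phiAugmented[i]);
        }
        return next;
    }
}
```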
Probit Loss (Keshet et al., 2011).
Loss $\mathbb{E}_{\gamma \sim \mathcal{N}(0,I)}[\ell (y,\hat{y}^0_{w+\gamma})]$
Update Rule $w_{t+1} = (1-\eta_t\lambda)\,w_t + \eta_t \mathbb{E}_{\gamma \sim \mathcal{N}(0,I)}[ \gamma \, \ell (y_t,\hat{y}^0_{w+\gamma})]$
Parameters
  1. eta: learning rate
  2. lambda: regularization
  3. num_of_iteration: the number of noise samples drawn when estimating the expectation
  4. noise_all_vector: boolean (1/0) indicating whether noise is added to every element of the weight vector or only to a single randomly chosen element
  5. noise_mean: the mean of the normal distribution from which the noise is drawn
  6. noise_std: the standard deviation of the normal distribution from which the noise is drawn
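A minimal sketch of one step, estimating the expectation by Monte Carlo exactly as the parameters above describe; the LossAt callback, which runs inference under the perturbed weights and returns the task loss, is a hypothetical interface introduced only for this example:

```java
import java.util.Random;

// Illustrative Probit Loss step; Monte Carlo estimate of
// E_gamma[ gamma * ell(y_t, yhat0 under w + gamma) ].
class ProbitStep {
    /** Hypothetical callback: run inference with perturbed weights, return the task loss. */
    interface LossAt { double apply(double[] perturbedWeights); }

    static double[] update(double[] w, double eta, double lambda, int numOfIteration,
                           boolean noiseAllVector, double noiseMean, double noiseStd,
                           LossAt lossAt, Random rng) {
        int n = w.length;
        double[] estimate = new double[n];
        for (int t = 0; t < numOfIteration; t++) {
            double[] gamma = new double[n];
            if (noiseAllVector) {   // noise in every coordinate of the weight vector...
                for (int i = 0; i < n; i++) gamma[i] = noiseMean + noiseStd * rng.nextGaussian();
            } else {                // ...or in a single random coordinate
                gamma[rng.nextInt(n)] = noiseMean + noiseStd * rng.nextGaussian();
            }
            double[] perturbed = new double[n];
            for (int i = 0; i < n; i++) perturbed[i] = w[i] + gamma[i];
            double loss = lossAt.apply(perturbed);  // ell(y_t, yhat0 under w + gamma)
            for (int i = 0; i < n; i++) estimate[i] += gamma[i] * loss / numOfIteration;
        }
        double[] next = new double[n];
        for (int i = 0; i < n; i++) {
            next[i] = (1.0 - eta * lambda) * w[i] + eta * estimate[i]; // sign as in the rule above
        }
        return next;
    }
}
```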