\( \newcommand{\oneinf}{\ell_{1,\infty}} \newcommand{\onetwo}{\ell_{1,2}} \)

Algorithms


Here we describe the learning algorithms implemented in the StructED package and their mandatory parameters. These parameters must be set before using any StructED model: pass them to the StructED model constructor in the exact order listed for each algorithm below. Throughout, $\hat{y}^{c}_t$ denotes loss-augmented inference with coefficient $c$ on the task loss, $\hat{y}^{c}_t = \arg\max_{\hat{y}} \big[ w^\top \phi(x_t, \hat{y}) + c\, \ell(y_t, \hat{y}) \big]$, so that $\hat{y}^0_t$ is the standard prediction.
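For example, for Structured SVM the order is (1) eta, then (2) lambda. A minimal sketch of building such an ordered parameter list (the commented-out constructor call is an assumption about the API shape, not verbatim StructED code):

```java
import java.util.ArrayList;

public class ParameterOrderExample {
    public static void main(String[] args) {
        // Parameters are passed as an ordered list; for Structured SVM the
        // order is (1) eta, (2) lambda, as documented below.
        ArrayList<Double> arguments = new ArrayList<Double>();
        arguments.add(0.1);   // 1. eta: learning rate
        arguments.add(0.01);  // 2. lambda: regularization
        // Hypothetical constructor call (names are placeholders, not verbatim API):
        // StructEDModel model = new StructEDModel(w, new SVM(), taskLoss,
        //         inference, null, featureFunctions, arguments);
    }
}
```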

Structured Perceptron (Collins, 2002).
Loss none (the Perceptron is mistake-driven and does not minimize an explicit surrogate loss)
Update Rule $w_{t+1} = w_t + \phi(x_t, y_t) - \phi(x_t,\hat{y}^0_t)$
Parameters none
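The update needs no tuning and reduces to one additive step over feature vectors. A minimal sketch, assuming dense double[] vectors in place of the package's own feature representation:

```java
// Illustrative Structured Perceptron step; dense double[] vectors are an
// assumption standing in for StructED's actual feature representation.
class PerceptronStep {
    /** w_{t+1} = w_t + phi(x_t, y_t) - phi(x_t, yhat0), with yhat0 the current argmax. */
    static double[] update(double[] w, double[] phiGold, double[] phiPredicted) {
        double[] next = w.clone();
        for (int i = 0; i < next.length; i++) {
            next[i] += phiGold[i] - phiPredicted[i]; // move toward gold, away from prediction
        }
        return next;
    }
}
```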
Structured SVM (Tsochantaridis et al., 2005).
Loss $\max_{\hat{y}} ~[ \ell(y,\hat{y}) - w^\top\phi(x,y) + w^\top\phi(x,\hat{y})]$
Update Rule $w_{t+1} = (1-\eta_t \lambda)w_t + \eta_t( \phi(x_t, y_t) - \phi(x_t,\hat{y}^1_t))$
Parameters
  1. eta: learning rate
  2. lambda: regularization
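This is a Pegasos-style stochastic subgradient step. A minimal sketch under the same dense-array assumption:

```java
// Illustrative Structured SVM step over dense arrays.
class SvmStep {
    /** w_{t+1} = (1 - eta*lambda) w_t + eta (phi(x_t,y_t) - phi(x_t,yhat1)),
     *  where yhat1 is the loss-augmented argmax (coefficient 1 on the loss). */
    static double[] update(double[] w, double[] phiGold, double[] phiAugmented,
                           double eta, double lambda) {
        double[] next = new double[w.length];
        for (int i = 0; i < w.length; i++) {
            next[i] = (1.0 - eta * lambda) * w[i]             // shrink: regularization
                    + eta * (phiGold[i] - phiAugmented[i]);   // subgradient step
        }
        return next;
    }
}
```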
Passive Aggressive (Crammer et al., 2006).
Loss $\max_{\hat{y}} ~[ \ell(y,\hat{y}) - w^\top\phi(x,y) + w^\top\phi(x,\hat{y})]$ (the structured hinge loss, as in Structured SVM)
Update Rule $w_{t+1} = w_{t} + \tau_{t}\big(\phi(x_t,y_t) - \phi(x_t,\hat{y}^1_t)\big)$, where $\tau_t = \min\Big\{C,\; \frac{\ell(y_t,\hat{y}^1_t) - w_t^\top\big(\phi(x_t,y_t) - \phi(x_t,\hat{y}^1_t)\big)}{\|\phi(x_t,y_t) - \phi(x_t,\hat{y}^1_t)\|^2}\Big\}$
Parameters
  1. C: trade-off parameter
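A minimal sketch of one step, computing $\tau_t$ from the hinge term and the squared norm of the feature difference as in the rule above (dense arrays are again an assumption):

```java
// Illustrative Passive-Aggressive step over dense arrays.
class PassiveAggressiveStep {
    /** w_{t+1} = w_t + tau_t * (phi(x_t,y_t) - phi(x_t,yhat1)), with
     *  tau_t = min{ C, hinge / ||phi(x_t,y_t) - phi(x_t,yhat1)||^2 }. */
    static double[] update(double[] w, double[] phiGold, double[] phiAugmented,
                           double taskLoss, double C) {
        double hinge = taskLoss;  // ell(y_t, yhat1) - w^T (phiGold - phiAugmented)
        double sqNorm = 0.0;
        for (int i = 0; i < w.length; i++) {
            double d = phiGold[i] - phiAugmented[i];
            hinge -= w[i] * d;
            sqNorm += d * d;
        }
        double tau = Math.min(C, Math.max(0.0, hinge) / (sqNorm + 1e-12)); // guard zero norm
        double[] next = w.clone();
        for (int i = 0; i < w.length; i++) {
            next[i] += tau * (phiGold[i] - phiAugmented[i]);
        }
        return next;
    }
}
```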
CRF (Lafferty et al., 2001).
Loss $-\ln P_{w}(y \,|\, x)$, where $P_{w}(y \,|\, x) = \frac{1}{Z_{w}(x)} \exp\{w^\top \phi(x, y)\}$
Update Rule $w_{t+1} = (1-\eta_t \lambda)\,w_{t} + \eta_t\Big( \phi(x_t,y_t) - \mathbb{E}_{y'\sim P_{w}(\cdot \,|\, x_t)}[ \phi(x_t,y')] \Big)$
Parameters
  1. eta: learning rate
  2. lambda: regularization
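In practice the expected feature vector is computed with dynamic programming over the structure; the sketch below instead enumerates a small explicit candidate set, an assumption made only to keep the example self-contained:

```java
// Illustrative CRF stochastic gradient step with an explicitly enumerated
// candidate label set (real implementations use dynamic programming).
class CrfStep {
    static double[] update(double[] w, double[] phiGold, double[][] phiCandidates,
                           double eta, double lambda) {
        int n = w.length, k = phiCandidates.length;
        // Scores w^T phi(x, y') per candidate, with max-subtraction for stability.
        double[] p = new double[k];
        double max = Double.NEGATIVE_INFINITY;
        for (int j = 0; j < k; j++) {
            double s = 0.0;
            for (int i = 0; i < n; i++) s += w[i] * phiCandidates[j][i];
            p[j] = s;
            if (s > max) max = s;
        }
        double z = 0.0;
        for (int j = 0; j < k; j++) { p[j] = Math.exp(p[j] - max); z += p[j]; }
        // Expected feature vector under P_w(y' | x_t).
        double[] expected = new double[n];
        for (int j = 0; j < k; j++)
            for (int i = 0; i < n; i++)
                expected[i] += (p[j] / z) * phiCandidates[j][i];
        double[] next = new double[n];
        for (int i = 0; i < n; i++)
            next[i] = (1.0 - eta * lambda) * w[i] + eta * (phiGold[i] - expected[i]);
        return next;
    }
}
```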
Direct Loss Minimization (McAllester et al., 2010).
Loss $\ell (y,\hat{y})$
Update Rule $w_{t+1} = w_{t} + \frac{\eta_t}{\epsilon}(\phi(x_t,\hat{y}^{-\epsilon}_t) - \phi(x_t,\hat{y}^0_{t}))$
Parameters
  1. eta: learning rate
  2. epsilon: the perturbation magnitude $\epsilon$ used in the update rule
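A minimal sketch of one step; the two feature vectors are assumed precomputed, with $\hat{y}^{-\epsilon}_t$ obtained by running inference with the task loss weighted by $-\epsilon$:

```java
// Illustrative Direct Loss Minimization step over dense arrays.
class DirectLossStep {
    /** w_{t+1} = w_t + (eta/epsilon) * (phi(x_t, yhatMinusEps) - phi(x_t, yhat0)),
     *  where yhatMinusEps is the argmax with the loss weighted by -epsilon. */
    static double[] update(double[] w, double[] phiLossAdjusted, double[] phiPredicted,
                           double eta, double epsilon) {
        double[] next = w.clone();
        for (int i = 0; i < w.length; i++) {
            next[i] += (eta / epsilon) * (phiLossAdjusted[i] - phiPredicted[i]);
        }
        return next;
    }
}
```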
Structured Ramp Loss (McAllester and Keshet, 2011).
Loss $\max_{\hat{y}} [ \ell(y,\hat{y}) + w^\top \phi(x,\hat{y})] - \max_{\tilde{y}} [w^\top \phi(x,\tilde{y})]$
Update Rule $w_{t+1} = (1-\eta_t\lambda)\,w_{t} + \eta_t(\phi(x_t,\hat{y}^0_t) - \phi(x_t,\hat{y}^1_t))$
Parameters
  1. eta: learning rate
  2. lambda: regularization
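A minimal sketch of one step, combining the plain argmax $\hat{y}^0_t$ and the loss-augmented argmax $\hat{y}^1_t$ (dense arrays are an assumption):

```java
// Illustrative Structured Ramp Loss step over dense arrays.
class RampStep {
    /** w_{t+1} = (1 - eta*lambda) w_t + eta (phi(x_t, yhat0) - phi(x_t, yhat1)):
     *  yhat0 is the plain argmax, yhat1 the loss-augmented argmax. */
    static double[] update(double[] w, double[] phiPredicted, double[] phiAugmented,
                           double eta, double lambda) {
        double[] next = new double[w.length];
        for (int i = 0; i < w.length; i++) {
            next[i] = (1.0 - eta * lambda) * w[i] + eta * (phiPredicted[i] - phiAugmented[i]);
        }
        return next;
    }
}
```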
Probit Loss (Keshet et al., 2011).
Loss $\mathbb{E}_{\gamma \sim \mathcal{N}(0,I)}[\ell (y,\hat{y}^0_{w+\gamma})]$
Update Rule $w_{t+1} = (1-\eta_t\lambda)\,w_t + \eta_t \mathbb{E}_{\gamma \sim \mathcal{N}(0,I)}[ \gamma \, \ell (y_t,\hat{y}^0_{w+\gamma})]$
Parameters
  1. eta: learning rate
  2. lambda: regularization
  3. num_of_iteration: the number of noise samples drawn when estimating the expectation
  4. noise_all_vector: boolean (1/0) indicating whether noise is added to every element of the weight vector or only to a single randomly chosen element
  5. noise_mean: the mean of the normal distribution from which the noise is drawn
  6. noise_std: the standard deviation of the normal distribution from which the noise is drawn
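A minimal sketch of one step, estimating the expectation by Monte Carlo exactly as the parameters above describe; the LossAt callback, which runs inference under the perturbed weights and returns the task loss, is a hypothetical interface introduced only for this example:

```java
import java.util.Random;

// Illustrative Probit Loss step; Monte Carlo estimate of
// E_gamma[ gamma * ell(y_t, yhat0 under w + gamma) ].
class ProbitStep {
    /** Hypothetical callback: run inference with perturbed weights, return the task loss. */
    interface LossAt { double apply(double[] perturbedWeights); }

    static double[] update(double[] w, double eta, double lambda, int numOfIteration,
                           boolean noiseAllVector, double noiseMean, double noiseStd,
                           LossAt lossAt, Random rng) {
        int n = w.length;
        double[] estimate = new double[n];
        for (int t = 0; t < numOfIteration; t++) {
            double[] gamma = new double[n];
            if (noiseAllVector) {   // noise in every coordinate of the weight vector...
                for (int i = 0; i < n; i++) gamma[i] = noiseMean + noiseStd * rng.nextGaussian();
            } else {                // ...or in a single random coordinate
                gamma[rng.nextInt(n)] = noiseMean + noiseStd * rng.nextGaussian();
            }
            double[] perturbed = new double[n];
            for (int i = 0; i < n; i++) perturbed[i] = w[i] + gamma[i];
            double loss = lossAt.apply(perturbed);  // ell(y_t, yhat0 under w + gamma)
            for (int i = 0; i < n; i++) estimate[i] += gamma[i] * loss / numOfIteration;
        }
        double[] next = new double[n];
        for (int i = 0; i < n; i++) {
            next[i] = (1.0 - eta * lambda) * w[i] + eta * estimate[i]; // sign as in the rule above
        }
        return next;
    }
}
```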