ATRCell
RecurrentLayers.ATRCell — Type

ATRCell(input_size => hidden_size;
    init_kernel = glorot_uniform,
    init_recurrent_kernel = glorot_uniform,
    bias = true)
Addition-subtraction twin-gated recurrent cell [Zhang2018]. See ATR
for a layer that processes entire sequences.
Arguments
- input_size => hidden_size: input and inner dimension of the layer.
Keyword arguments
- init_kernel: initializer for the input to hidden weights. Default is glorot_uniform.
- init_recurrent_kernel: initializer for the hidden to hidden weights. Default is glorot_uniform.
- bias: include a bias or not. Default is true.
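A minimal construction sketch, assuming RecurrentLayers.jl is loaded alongside Flux (which provides glorot_uniform); the sizes are illustrative and the keyword defaults are written out explicitly:

```julia
using Flux, RecurrentLayers

# Cell mapping 4-dimensional inputs to an 8-dimensional hidden state.
atrcell = ATRCell(4 => 8;
    init_kernel = glorot_uniform,
    init_recurrent_kernel = glorot_uniform,
    bias = true)
```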
Equations
\[\begin{aligned} \mathbf{p}(t) &= \mathbf{W}_{ih} \mathbf{x}(t) + \mathbf{b}, \\ \mathbf{q}(t) &= \mathbf{W}_{hh} \mathbf{h}(t-1), \\ \mathbf{i}(t) &= \sigma\left( \mathbf{p}(t) + \mathbf{q}(t) \right), \\ \mathbf{f}(t) &= \sigma\left( \mathbf{p}(t) - \mathbf{q}(t) \right), \\ \mathbf{h}(t) &= \mathbf{i}(t) \circ \mathbf{p}(t) + \mathbf{f}(t) \circ \mathbf{h}(t-1). \end{aligned} \]
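The update above can also be written out directly. The following is a sketch of the math only, with illustrative variable names; it is not the package's internal implementation:

```julia
using Flux  # provides σ (the logistic sigmoid)

# One ATR step: p and q are the input and recurrent projections,
# i and f the addition/subtraction twin gates.
function atr_update(Wih, Whh, b, x, hprev)
    p = Wih * x .+ b              # p(t) = W_ih x(t) + b
    q = Whh * hprev               # q(t) = W_hh h(t-1)
    i = σ.(p .+ q)                # i(t) = σ(p(t) + q(t))
    f = σ.(p .- q)                # f(t) = σ(p(t) - q(t))
    return i .* p .+ f .* hprev   # h(t) = i(t) ∘ p(t) + f(t) ∘ h(t-1)
end
```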
Forward
atrcell(inp, state)
atrcell(inp)
Arguments
- inp: The input to the atrcell. It should be a vector of size input_size or a matrix of size input_size x batch_size.
- state: The hidden state of the ATRCell. It should be a vector of size hidden_size or a matrix of size hidden_size x batch_size. If not provided, it is assumed to be a vector of zeros, initialized by Flux.initialstates.
Returns
- A tuple (output, state), where both elements are given by the updated state new_state, a tensor of size hidden_size or hidden_size x batch_size.
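A usage sketch for the forward pass, assuming the shapes described above (the dimensions and batch size are illustrative):

```julia
using Flux, RecurrentLayers

input_size, hidden_size, batch_size = 4, 8, 3
atrcell = ATRCell(input_size => hidden_size)

inp   = rand(Float32, input_size, batch_size)
state = zeros(Float32, hidden_size, batch_size)

output, new_state = atrcell(inp, state)  # both are hidden_size x batch_size
output2, _        = atrcell(inp)         # zero initial state from Flux.initialstates
```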
- [Zhang2018] Zhang, B. et al. "Simplifying Neural Machine Translation with Addition-Subtraction Twin-Gated Recurrent Networks." EMNLP 2018.