Functions

Activation

Merlin.creluMethod.
crelu(x::Var)

Concatenated Rectified Linear Unit. The output is twice the size of the input.

\[f(x) = (\max(0,x), \max(0,-x))\]
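
A minimal usage sketch; the output stacks $\max(0,x)$ and $\max(0,-x)$, so it is twice the size of the input:

x = Var(rand(Float32,10,5))
y = crelu(x)    # holds both max(0,x) and max(0,-x), twice the size of x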

source
Merlin.eluMethod.
elu(x::Var)

Exponential Linear Unit.

\[f(x) = \begin{cases} x & x > 0 \\ \alpha (e^{x}-1) & x\leq0 \end{cases}\]

where $\alpha=1$.
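
A minimal usage sketch; positive entries pass through unchanged, negative entries are mapped to $\alpha(e^{x}-1)$:

x = Var(rand(Float32,10,5))
y = elu(x)    # x for x > 0, e^x - 1 for x <= 0 (alpha = 1)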

source
Merlin.leaky_reluFunction.
leaky_relu(x::Var, alpha::Float64=0.2)

Leaky Rectified Linear Unit.

\[f(x) = \begin{cases} x & x > 0 \\ \alpha x & x \leq 0 \end{cases}\]
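
A short usage sketch with the negative-side slope given explicitly:

x = Var(rand(Float32,10,5))
y = leaky_relu(x, 0.2)    # negative entries are multiplied by alpha = 0.2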

source
Merlin.reluMethod.
relu(x::Var)

Rectified Linear Unit.

\[f(x) = \max(0, x)\]
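
A short usage sketch; negative entries are clamped to zero:

x = Var(rand(Float32,10,5))
y = relu(x)    # elementwise max(0, x)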
source
Merlin.seluMethod.
selu(x::Var)

Scaled Exponential Linear Unit.

\[f(x) = \lambda \begin{cases} x & x > 0 \\ \alpha e^{x}-\alpha & x\leq0 \end{cases}\]

where $\lambda=1.0507$ and $\alpha=1.6733$.
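
A short usage sketch using the fixed constants $\lambda$ and $\alpha$ above:

x = Var(rand(Float32,10,5))
y = selu(x)    # λ*x for x > 0, λ*α*(e^x - 1) for x <= 0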

References

Klambauer et al., "Self-Normalizing Neural Networks", NIPS 2017.

source
Merlin.sigmoidMethod.
sigmoid(x)

Sigmoid logistic function.

\[f(x) = (1 + \exp(-x))^{-1}\]
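
A short usage sketch; each entry is mapped into (0, 1):

x = Var(rand(Float32,10,5))
y = sigmoid(x)    # elementwise 1 / (1 + exp(-x))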
source
Merlin.SwishType.
Swish

Swish activation function.

\[f(x) = x \cdot \sigma (\beta x)\]

where $\beta$ is a learnable parameter.
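
As a plain-Julia illustration of the formula only (not the Merlin Swish functor, whose constructor is not documented here), with $\beta$ fixed to 1:

swish(x, β) = x .* (1 ./ (1 .+ exp.(-β .* x)))    # f(x) = x · σ(βx) on raw arrays
y = swish(rand(Float32,10,5), 1.0f0)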

source
Base.tanhMethod.
tanh(x::Var)

Hyperbolic tangent function.
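
Applied elementwise to a Var, like the other activations on this page:

x = Var(rand(Float32,10,5))
y = tanh(x)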

source

Convolution

Merlin.Conv1DType.
Conv1D(T, ksize, insize, outsize, pad, stride, [dilation=1, init_W=Xavier(), init_b=Fill(0)])

1-dimensional convolution function.

T = Float32
x = Var(rand(T,10,5))          # input: insize=10 features × 5 positions
f = Conv1D(T, 5, 10, 3, 2, 1)  # ksize=5, insize=10, outsize=3, pad=2, stride=1
y = f(x)
source

Loss

Merlin.l2Function.
l2(x::Var, lambda::Float64)

L2 regularization.

\[y = \frac{\lambda}{2}\left\Vert \mathbf{x} \right\Vert ^{2}\]
x = Var(rand(Float32,10,5))
y = l2(x, 0.01)
source
Merlin.crossentropyFunction.
crossentropy(p, q)

Cross-entropy function between p and q.

\[f(x) = -\sum_{x} p(x) \log q(x)\]
  • p::Var: Var of Vector{Int} or Matrix{Float}. If p is a Vector{Int} and p[i] == 0, that element contributes 0 to the loss (i.e. it is ignored).

  • q::Var: Var of Matrix{Float}

p = Var(rand(0:10,5))                   # gold labels (0 = ignore)
q = softmax(Var(rand(Float32,10,5)))    # predicted distribution
y = crossentropy(p, q)
source
Merlin.mseFunction.
mse(x1, x2)

Mean Squared Error function between x1 and x2. The mean is calculated over the minibatch. Note that the error is not scaled by 1/2.
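
A short usage sketch with two variables of the same shape:

x1 = Var(rand(Float32,10,5))
x2 = Var(rand(Float32,10,5))
y = mse(x1, x2)    # mean over the minibatch, not scaled by 1/2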

source
Merlin.softmax_crossentropyFunction.
softmax_crossentropy(p, x)

Cross-entropy function between p and $softmax(x)$.

\[f(x) = -\sum_{x} p(x) \log q(x)\]

where $q = softmax(x)$

  • p: Var of Vector{Int} or Matrix{Float}

  • x: Var of Matrix{Float}

p = Var(rand(0:10,5))
x = Var(rand(Float32,10,5))
y = softmax_crossentropy(p, x)
source

Math

Base.broadcastFunction.
.+(x1::Var, x2::Var)
source
.-(x1::Var, x2::Var)
source
.*(x1::Var, x2::Var)
source
Base.:+Function.
+(x1::Var, x2::Var)
+(a::Number, x::Var)
+(x::Var, a::Number)
source
Base.:-Function.
-(x1, x2)
source
Base.:*Function.
*(A::Var, B::Var)
source
Base.:/Function.
/(x1::Var, a)
source
Base.:^Function.
^(x::Var, a::Number)
source
Base.transposeFunction.
transpose(x)
source
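
A combined sketch of the operators above applied to Var objects (the shapes are illustrative):

T = Float32
x1 = Var(rand(T,10,5))
x2 = Var(rand(T,10,5))
a = x1 + x2       # elementwise addition
b = x1 .* x2      # broadcasted multiplication
W = Var(rand(T,3,10))
h = W * x1        # matrix multiplication: (3,10) * (10,5) -> (3,5)
xt = transpose(x1)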

Random

Merlin.dropoutFunction.
dropout(x::Var, rate::Float64, train::Bool)

If train is true, drops elements randomly with probability $rate$ and scales the other elements by factor $1 / (1 - rate)$. Otherwise, it just returns x.
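
A short usage sketch; the train flag switches between training behaviour and the identity:

x = Var(rand(Float32,10,5))
ytrain = dropout(x, 0.5, true)    # drops elements with probability 0.5, scales the rest by 2
ytest  = dropout(x, 0.5, false)   # returns x unchanged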

source

Recurrent

Merlin.BiLSTMType.
BiLSTM(::Type{T}, insize::Int, outsize::Int, [init_W=Uniform(0.001), init_U=Orthogonal()])

Bi-directional Long Short-Term Memory network. See LSTM for more details.
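
A usage sketch mirroring the LSTM example below (note that the forward and backward outputs may be concatenated, so the output size can differ from outsize):

T = Float32
x = Var(rand(T,100,10))
f = BiLSTM(T, 100, 100)
h = f(x)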

source
Merlin.LSTMType.
LSTM(::Type{T}, insize::Int, outsize::Int, [init_W=Uniform(0.001), init_U=Orthogonal()])

Long Short-Term Memory network.

\[\begin{align*} \mathbf{f}_{t} & =\sigma_{g}(W_{f}\mathbf{x}_{t}+U_{f}\mathbf{h}_{t-1}+\mathbf{b}_{f})\\ \mathbf{i}_{t} & =\sigma_{g}(W_{i}\mathbf{x}_{t}+U_{i}\mathbf{h}_{t-1}+\mathbf{b}_{i})\\ \mathbf{o}_{t} & =\sigma_{g}(W_{o}\mathbf{x}_{t}+U_{o}\mathbf{h}_{t-1}+\mathbf{b}_{o})\\ \mathbf{c}_{t} & =\mathbf{f}_{t}\odot\mathbf{c}_{t-1}+\mathbf{i}_{t}\odot\sigma_{c}(W_{c}\mathbf{x}_{t}+U_{c}\mathbf{h}_{t-1}+\mathbf{b}_{c})\\ \mathbf{h}_{t} & =\mathbf{o}_{t}\odot\sigma_{h}(\mathbf{c}_{t}) \end{align*}\]
  • $x_t \in R^{d}$: input vector to the LSTM block

  • $f_t \in R^{h}$: forget gate's activation vector

  • $i_t \in R^{h}$: input gate's activation vector

  • $o_t \in R^{h}$: output gate's activation vector

  • $h_t \in R^{h}$: output vector of the LSTM block

  • $c_t \in R^{h}$: cell state vector

  • $W \in R^{h \times d}$, $U \in R^{h \times h}$ and $b \in R^{h}$: weight matrices and bias vectors

  • $\sigma_g$: sigmoid function

  • $\sigma_c$: hyperbolic tangent function

  • $\sigma_h$: hyperbolic tangent function

👉 Example

T = Float32
x = Var(rand(T,100,10))   # input: 100-dim features × 10 time steps
f = LSTM(T, 100, 100)     # insize=100, outsize=100
h = f(x)
source

Reduction

Base.maxFunction.
max(x::Var, dim::Int)

Returns the maximum value over the given dimension.

👉 Example

x = Var(rand(Float32,10,5))
y = max(x, 1)
source
Merlin.max_batchFunction.
max_batch(x::Var, dims::Vector{Int})
source

Misc

argmax
batchsort
concat
getindex
Linear
logsoftmax
lookup
reshape
softmax
standardize
window1d