Functions

Activation

Merlin.creluMethod.
crelu(x::Var)

Concatenated Rectified Linear Unit. The output is twice the size of the input.

\[f(x) = (\max(0,x), \max(0,-x))\]
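
A minimal usage sketch; the output stacks $\max(0,x)$ and $\max(0,-x)$, so it is twice the size of the input:

x = Var(rand(Float32,10,5))
y = crelu(x)    # holds both max(0,x) and max(0,-x), twice the size of x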

source
Merlin.eluMethod.
elu(x::Var)

Exponential Linear Unit.

\[f(x) = \begin{cases} x & x > 0 \\ \alpha (e^{x}-1) & x\leq0 \end{cases}\]

where $\alpha=1$.
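
A minimal usage sketch; positive entries pass through unchanged, negative entries are mapped to $\alpha(e^{x}-1)$:

x = Var(rand(Float32,10,5))
y = elu(x)    # x for x > 0, e^x - 1 for x <= 0 (alpha = 1)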

source
Merlin.leaky_reluFunction.
leaky_relu(x::Var, alpha::Float64=0.2)

Leaky Rectified Linear Unit.

\[f(x) = \begin{cases} x & x > 0 \\ \alpha x & x \leq 0 \end{cases}\]
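
A short usage sketch with the negative-side slope given explicitly:

x = Var(rand(Float32,10,5))
y = leaky_relu(x, 0.2)    # negative entries are multiplied by alpha = 0.2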

source
Merlin.reluMethod.
relu(x::Var)

Rectified Linear Unit.

\[f(x) = \max(0, x)\]
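
A short usage sketch; negative entries are clamped to zero:

x = Var(rand(Float32,10,5))
y = relu(x)    # elementwise max(0, x)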
source
Merlin.seluMethod.
selu(x::Var)

Scaled Exponential Linear Unit.

\[f(x) = \lambda \begin{cases} x & x > 0 \\ \alpha e^{x}-\alpha & x\leq0 \end{cases}\]

where $\lambda=1.0507$ and $\alpha=1.6733$.
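
A short usage sketch using the fixed constants $\lambda$ and $\alpha$ above:

x = Var(rand(Float32,10,5))
y = selu(x)    # λ*x for x > 0, λ*α*(e^x - 1) for x <= 0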

References

Klambauer et al., "Self-Normalizing Neural Networks", NIPS 2017.

source
Merlin.sigmoidMethod.
sigmoid(x)

Sigmoid logistic function.

\[f(x) = (1 + \exp(-x))^{-1}\]
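
A short usage sketch; each entry is mapped into (0, 1):

x = Var(rand(Float32,10,5))
y = sigmoid(x)    # elementwise 1 / (1 + exp(-x))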
source
Merlin.SwishType.
Swish

Swish activation function.

\[f(x) = x \cdot \sigma (\beta x)\]

where $\beta$ is a learnable parameter.
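
As a plain-Julia illustration of the formula only (not the Merlin Swish functor, whose constructor is not documented here), with $\beta$ fixed to 1:

swish(x, β) = x .* (1 ./ (1 .+ exp.(-β .* x)))    # f(x) = x · σ(βx) on raw arrays
y = swish(rand(Float32,10,5), 1.0f0)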

source
Base.tanhMethod.
tanh(x::Var)

Hyperbolic tangent function.
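
Applied elementwise to a Var, like the other activations on this page:

x = Var(rand(Float32,10,5))
y = tanh(x)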

source

Convolution

Merlin.Conv1DType.
Conv1D(T, ksize, insize, outsize, pad, stride, [dilation=1, init_W=Xavier(), init_b=Fill(0)])

1-dimensional convolution function.

T = Float32
x = Var(rand(T,10,5))          # input: insize=10 features × 5 positions
f = Conv1D(T, 5, 10, 3, 2, 1)  # ksize=5, insize=10, outsize=3, pad=2, stride=1
y = f(x)
source

Loss

Merlin.l2Function.
l2(x::Var, lambda::Float64)

L2 regularization.

\[y = \frac{\lambda}{2}\left\Vert \mathbf{x} \right\Vert ^{2}\]
x = Var(rand(Float32,10,5))
y = l2(x, 0.01)
source
Merlin.crossentropyFunction.
crossentropy(p, q)

Cross-entropy function between p and q.

\[f(x) = -\sum_{x} p(x) \log q(x)\]
  • p::Var: Var of Vector{Int} or Matrix{Float}. If p is a Vector{Int} and p[i] == 0, that element contributes 0 to the loss (i.e. it is ignored).

  • q::Var: Var of Matrix{Float}

p = Var(rand(0:10,5))                   # gold labels (0 = ignore)
q = softmax(Var(rand(Float32,10,5)))    # predicted distribution
y = crossentropy(p, q)
source
Merlin.mseFunction.
mse(x1, x2)

Mean Squared Error function between x1 and x2. The mean is calculated over the minibatch. Note that the error is not scaled by 1/2.
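
A short usage sketch with two variables of the same shape:

x1 = Var(rand(Float32,10,5))
x2 = Var(rand(Float32,10,5))
y = mse(x1, x2)    # mean over the minibatch, not scaled by 1/2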

source
Merlin.softmax_crossentropyFunction.
softmax_crossentropy(p, x)

Cross-entropy function between p and $softmax(x)$.

\[f(x) = -\sum_{x} p(x) \log q(x)\]

where $q = softmax(x)$

  • p: Var of Vector{Int} or Matrix{Float}

  • x: Var of Matrix{Float}

p = Var(rand(0:10,5))
x = Var(rand(Float32,10,5))
y = softmax_crossentropy(p, x)
source

Math

Base.broadcastFunction.
.+(x1::Var, x2::Var)
source
.-(x1::Var, x2::Var)
source
.*(x1::Var, x2::Var)
source
Base.:+Function.
+(x1::Var, x2::Var)
+(a::Number, x::Var)
+(x::Var, a::Number)
source
Base.:-Function.
-(x1, x2)
source
Base.:*Function.
*(A::Var, B::Var)
source
Base.:/Function.
/(x1::Var, a)
source
Base.:^Function.
^(x::Var, a::Number)
source
Base.transposeFunction.
transpose(x)
source
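
A combined sketch of the operators above applied to Var objects (the shapes are illustrative):

T = Float32
x1 = Var(rand(T,10,5))
x2 = Var(rand(T,10,5))
a = x1 + x2       # elementwise addition
b = x1 .* x2      # broadcasted multiplication
W = Var(rand(T,3,10))
h = W * x1        # matrix multiplication: (3,10) * (10,5) -> (3,5)
xt = transpose(x1)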

Random

Merlin.dropoutFunction.
dropout(x::Var, rate::Float64, train::Bool)

If train is true, drops elements randomly with probability $rate$ and scales the other elements by factor $1 / (1 - rate)$. Otherwise, it just returns x.
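
A short usage sketch; the train flag switches between training behaviour and the identity:

x = Var(rand(Float32,10,5))
ytrain = dropout(x, 0.5, true)    # drops elements with probability 0.5, scales the rest by 2
ytest  = dropout(x, 0.5, false)   # returns x unchanged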

source

Recurrent

Merlin.BiLSTMType.
BiLSTM(::Type{T}, insize::Int, outsize::Int, [init_W=Uniform(0.001), init_U=Orthogonal()])

Bi-directional Long Short-Term Memory network. See LSTM for more details.
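
A usage sketch mirroring the LSTM example below (note that the forward and backward outputs may be concatenated, so the output size can differ from outsize):

T = Float32
x = Var(rand(T,100,10))
f = BiLSTM(T, 100, 100)
h = f(x)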

source
Merlin.LSTMType.
LSTM(::Type{T}, insize::Int, outsize::Int, [init_W=Uniform(0.001), init_U=Orthogonal()])

Long Short-Term Memory network.

\[\begin{align*} \mathbf{f}_{t} & =\sigma_{g}(W_{f}\mathbf{x}_{t}+U_{f}\mathbf{h}_{t-1}+\mathbf{b}_{f})\\ \mathbf{i}_{t} & =\sigma_{g}(W_{i}\mathbf{x}_{t}+U_{i}\mathbf{h}_{t-1}+\mathbf{b}_{i})\\ \mathbf{o}_{t} & =\sigma_{g}(W_{o}\mathbf{x}_{t}+U_{o}\mathbf{h}_{t-1}+\mathbf{b}_{o})\\ \mathbf{c}_{t} & =\mathbf{f}_{t}\odot\mathbf{c}_{t-1}+\mathbf{i}_{t}\odot\sigma_{c}(W_{c}\mathbf{x}_{t}+U_{c}\mathbf{h}_{t-1}+\mathbf{b}_{c})\\ \mathbf{h}_{t} & =\mathbf{o}_{t}\odot\sigma_{h}(\mathbf{c}_{t}) \end{align*}\]
  • $x_t \in R^{d}$: input vector to the LSTM block

  • $f_t \in R^{h}$: forget gate's activation vector

  • $i_t \in R^{h}$: input gate's activation vector

  • $o_t \in R^{h}$: output gate's activation vector

  • $h_t \in R^{h}$: output vector of the LSTM block

  • $c_t \in R^{h}$: cell state vector

  • $W \in R^{h \times d}$, $U \in R^{h \times h}$ and $b \in R^{h}$: weight matrices and bias vectors

  • $\sigma_g$: sigmoid function

  • $\sigma_c$: hyperbolic tangent function

  • $\sigma_h$: hyperbolic tangent function

👉 Example

T = Float32
x = Var(rand(T,100,10))   # input: 100-dim features × 10 time steps
f = LSTM(T, 100, 100)     # insize=100, outsize=100
h = f(x)
source

Reduction

Base.maxFunction.
max(x::Var, dim::Int)

Returns the maximum value over the given dimension.

👉 Example

x = Var(rand(Float32,10,5))
y = max(x, 1)
source
Merlin.max_batchFunction.
max_batch(x::Var, dims::Vector{Int})
source

Misc

argmax
batchsort
concat
getindex
Linear
logsoftmax
lookup
reshape
softmax
standardize
window1d