Artificial neuron is simply a computational unit, which makes a particular computation based on other units it is connected to.

Pre-activation (or input activation): a(x) = b + ∑wi + xi

Neuron (output) activation: h(x) = g(a(x))

where, g(.) is called activation function, b is called neuron bias and w is the connection weights.

Activation functions:

  • Linear activation function

    g(a) = a

    Performs no input squashing

  • Sigmoid activation function

    g(a) = sigm(a) = 1/1 + exp(-a)

    Squashes the neuron’s pre-activation between 0 and 1.

    Always positive

    Bounded

    Strictly increasing

  • Hyperbolic tangent(tanh) activation function

    g(a) = tanh(a)

    Squashes the neuron’s pre-activation between -1 and 1

    Can be positive or negative

    Bounded

    Strictly increasing

  • Rectified Linear Activation function

    g(a) = reclin(a) = max(0,a)

    Bounded below by 0 and always non-negative

    Not upper bounded

    Monotomically increasing

    Tends to give neurons sparse activities. That is, the value is zero across many different sets of inputs.

Capacity of a single neuron

Capacity essentially means, the complexity of the computation that a neuron can perform.

  • It could do binary classification with Sigmoid, which can interpret neuron as estimating p(y=1|x). That is, the probability of input ‘x’ belonging to class 1. (Since, we are using sigmoid activation function the output is bounded b/w 0 and 1). It is also known as logistic regression classifier. If the value is greater than 0.5, predict class 1, otherwise, predict 0.

  • The classifier is performing linear classification, so, if we have a problem where we want to classify objects described by input vectors into different classes, if we can draw a hyperplane or a line (in case of 2d) between these two types of objects, then a single artificial neuron could do that for us!

  • However, a single neuron cannot model a non-linearly seperable output.

If the input is transformed in a better representation, we can actually have a linear seperable.

Thereby, complicated computations cannot be modelled by single neuron. This is the main reason behind why we go for multilayer neural network.