Automatic Differentiation with autograd

MXNet crash course

this page replicates

"Automatic Differentiation ..." by MXNet
YouTube video by Thom Lane (AWS AI)
using AI::MXNet by Sergey Kolychev

script for this page

mx-03_autograd.pl

outline of this page

To minimize a loss function during model training, we iteratively compute the gradient of the loss function with respect to the weights and then update those weights. For that purpose, this page explains how to compute the gradient with autograd.
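In symbols, one common form of this update is plain gradient descent. The loss $L$, the weights $w$ and the learning rate $\eta$ are notation assumed here for illustration; this page does not name them:

$$ w \leftarrow w - \eta \, \nabla_{w} L(w) $$

Autograd supplies the $\nabla_{w} L(w)$ term; choosing $\eta$ and applying the update is the job of the training loop.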

To get started with autograd, the only module that we must explicitly load is:

use AI::MXNet qw(mx);

which imports the AI::MXNet::AutoGrad module, so it is not necessary to list it separately.

basic usage

For a simple example, we will differentiate the function $f(x) = 2 x^{2}$ with respect to $x$. Let's start by assigning an initial value to $x$:

my $xx = nd->array([[1, 2], [3, 4]]);

print $xx->aspdl;

which returns:

[
[1 2]
[3 4]
]

Next, let's invoke the attach_grad method, which tells MXNet that we plan to store the gradient of $f(x)$ with respect to $x$:

$xx->attach_grad;

Next, we record the definition $y = f(x)$ inside an mx->autograd->record scope, so that MXNet can compute its gradient later:

my $yy;

mx->autograd->record(sub {
    $yy = 2 * $xx * $xx;
});

Next, we invoke back propagation:

$yy->backward;

and check the result. Since $y = 2 x^{2}$, the derivative $\frac{dy}{dx} = 4 x$, evaluated at our values of $x$, should be [[4, 8], [12, 16]].

print $xx->grad->aspdl;

yields:

[
[ 4 8]
[12 16]
]

which matches our expectation.
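As an independent sanity check that does not use MXNet at all, the same numbers can be compared against a central finite-difference approximation. The plain-Perl sketch below is an illustration only: the helper names f and numeric_grad and the step size 1e-4 are assumptions introduced here, not part of AI::MXNet.

```perl
use strict;
use warnings;

# f(x) = 2 x^2, the same function that was differentiated with autograd
sub f {
    my ($x) = @_;
    return 2 * $x * $x;
}

# Central-difference approximation of df/dx.
# $h is an assumed step size (hypothetical default, not an MXNet setting).
sub numeric_grad {
    my ( $func, $x, $h ) = @_;
    $h //= 1e-4;
    return ( $func->( $x + $h ) - $func->( $x - $h ) ) / ( 2 * $h );
}

# Compare the numeric estimate with the analytic derivative 4x
for my $x ( 1, 2, 3, 4 ) {
    printf "x = %d  numeric = %.4f  analytic = %.4f\n",
        $x, numeric_grad( \&f, $x ), 4 * $x;
}
```

For a quadratic function the central difference agrees with $4 x$ up to floating-point rounding, so the two columns should match the gradient printed by autograd.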

using control flows

Sometimes we want to write dynamic programs whose execution path depends on values that are only known at run time. MXNet records the execution trace that was actually taken and computes the gradient along it.

For example, the function below doubles its input until the Euclidean norm reaches 1000. It then selects one element, depending on the sign of the sum of the elements.

sub ff {
    my $aa = $_[0];
    my $bb = $aa * 2;

    # keep doubling until the norm reaches 1000
    while ( pdlmnorm($bb) < 1000 ) {
        $bb = $bb * 2;
    }

    # select one element, depending on the sign of the sum
    my $cc;
    if ( $bb->sum >= 0 ) {
        $cc = $bb->slice('0');
    } else {
        $cc = $bb->slice('1');
    }
    return nd->array($cc);
}

To keep the code concise, we can separately define a pdlmnorm subroutine:

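sub pdlmnorm {
    require PDL::LinearAlgebra;
    my $invec = $_[0];
    # convert the NDArray to a PDL object and compute its norm
    my $mnorm = PDL::LinearAlgebra::mnorm( $invec->aspdl );
    # reduce the one-element PDL object to a plain Perl scalar
    $mnorm = PDL::sclr($mnorm);
    return $mnorm;
}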
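To see what ff computes without involving MXNet or PDL, the same control flow can be mirrored on an ordinary Perl list. This is only a sketch: euclid_norm and ff_trace are hypothetical helpers introduced here, the input (0.3, 0.4) is an arbitrary example, and a nonzero input is assumed (as with ff itself, an all-zero input would never leave the loop).

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Euclidean norm of a plain Perl list
sub euclid_norm {
    return sqrt( sum( map { $_ * $_ } @_ ) );
}

# Plain-Perl mirror of ff. Returns the selected value, its index,
# and the total scaling factor that was applied to the input.
sub ff_trace {
    my @a      = @_;
    my $factor = 2;                          # mirrors: $bb = $aa * 2
    my @b      = map { $_ * $factor } @a;

    while ( euclid_norm(@b) < 1000 ) {       # same stopping rule as ff
        $factor *= 2;
        @b = map { $_ * $factor } @a;
    }

    my $idx = sum(@b) >= 0 ? 0 : 1;          # same branch as ff
    return ( $b[$idx], $idx, $factor );
}

my ( $c, $idx, $factor ) = ff_trace( 0.3, 0.4 );
print "picked index $idx, value $c, factor $factor\n";
```

The returned factor is the total scaling applied to the input. Because the selected output is just that factor times one input entry, the factor is also the value the gradient should take for the selected entry.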

As before, we record the trace, feed in random values and invoke back propagation:

my $aa = nd->random->uniform(shape=>2);

$aa->attach_grad;

my $cc;

mx->autograd->record(sub {
    $cc = ff($aa);
});

$cc->backward;

Because $bb is a linear function of $aa and because $cc is chosen from $bb, the gradient with respect to $aa will either be [$cc/$aa->slice(0), 0] or [0, $cc/$aa->slice(1)], depending on which element from $bb was picked.

print $aa->grad->aspdl;

my $alt = $cc/$aa;

print $alt->aspdl;

which yields:

[2048 0]
[2048 1895.89]

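The first line is the gradient and the second line is $cc/$aa. The gradient is nonzero only in its first entry, where it equals the first entry of $cc/$aa, which shows that the first element of $bb was picked.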
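The value 2048 can also be traced by hand. Writing $k$ for the total number of doublings applied to $a$ (a symbol introduced here for illustration), every entry of $b$ equals $2^{k} a_{j}$. If entry $i$ is the one selected, then:

$$ c = 2^{k} a_{i}, \qquad \frac{\partial c}{\partial a_{i}} = 2^{k} = \frac{c}{a_{i}}, \qquad \frac{\partial c}{\partial a_{j}} = 0 \quad (j \neq i) $$

Here $k$ is treated as a constant, which holds because the number of loop passes does not change under small perturbations of $a$. In the run shown above, $2^{k} = 2048 = 2^{11}$, which corresponds to the initial multiplication by 2 followed by ten passes through the while loop.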
