MXNet in Perl

automatic differentiation with autograd
To minimize a loss function during model training, we iteratively compute the gradient of the loss function with respect to the weights and then update those weights. This page explains how to compute that gradient with autograd.

To get started with autograd, the only module that we must explicitly list is:

    use AI::MXNet qw(mx);

which imports the AI::MXNet::AutoGrad module, so it is not necessary to list it separately.

basic usage

For a simple example, we will differentiate the function $f(x)=2x^{2}$ with respect to $x$. So let's start by assigning an initial value to $x$:

    my $xx = nd->array([[1, 2], [3, 4]]);
    print $xx->aspdl;

which returns:
Next, let's invoke the attach_grad method, which tells MXNet to allocate storage for the gradient of $f(x)$ with respect to $x$:

    $xx->attach_grad;

And let's record the definition $y=f(x)$ inside an mx->autograd->record scope, so that we can compute its gradients later.

    my $yy;
    mx->autograd->record(sub {
        $yy = 2 * $xx * $xx;
    });

Next, we invoke back propagation:

    $yy->backward;

and check the result. Since $y=2x^{2}$ is applied elementwise, the derivative $\frac{dy}{dx}=4x$ should be: [[ 4, 8],[12,16]].

    print $xx->grad->aspdl;

yields:
which matches our expectation.

using control flows

Sometimes we want to write dynamic programs where the execution depends on some real-time values. MXNet will record the execution trace and compute the gradient. For example, the function below doubles its input until the Euclidean norm of the result reaches 1000. Then it selects one element depending on the sum of the elements.

    sub ff { }

To keep the code concise, we can separately define a pdlmnorm subroutine:

    sub pdlmnorm { }

As before, we record the trace, feed in random values and invoke back propagation:

    my $aa = nd->random->uniform(shape=>2);
    $aa->attach_grad;
    my $cc;
    mx->autograd->record(sub {
        $cc = ff($aa);
    });
    $cc->backward;

Because $bb is a linear function of $aa and because $cc is chosen from $bb, the gradient with respect to $aa will either be [$cc/$aa->slice(0), 0] or [0, $cc/$aa->slice(1)], depending on which element from $bb was picked.

    print $aa->grad->aspdl;
    my $alt = $cc/$aa;
    print $alt->aspdl;

which yields:
and shows that the first element of $bb was picked.

Copyright © 2002-2020 Eryk Wdowiak
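A note on the control-flow example: the gradient pattern described there can be verified by hand. As a sketch, write $a$, $b$ and $c$ for $aa, $bb and $cc, and assume the loop in ff performs $n$ doublings in total before element $i$ is selected (the symbols $n$, $i$ and $j$ are notation introduced here, not names from the script):

$$b = 2^{n} a, \qquad c = b_{i} = 2^{n} a_{i}$$

$$\frac{\partial c}{\partial a_{j}} = \begin{cases} 2^{n} = c / a_{i} & \text{if } j = i \\ 0 & \text{otherwise} \end{cases}$$

Although $n$ and $i$ depend on the values in $a$, they are constant along the recorded execution trace, so back propagation treats them as fixed. The nonzero entry of the printed gradient should therefore equal the matching entry of $cc/$aa.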
