Train the Neural Network

MXNet crash course

script for this page

mx-04_train-nn.pl

This page of the MXNet tutorial explains how to train a neural network on the Fashion-MNIST dataset. It also shows how to use the Gluon Data API to load data and return it in batches.

The script for this page is partly based on Sergey Kolychev's mnist.pl. It has been rewritten to serve as a template for new projects.

To get started, we need to use a few modules:

use AI::MXNet qw(mx);
use AI::MXNet::Gluon qw(gluon);
use AI::MXNet::Gluon::NN qw(nn);
use AI::MXNet::AutoGrad qw(autograd);
use AI::MXNet::Base;
use Getopt::Long;

Getopt::Long allows us to set the training hyperparameters at the command line. For example, to set the learning rate:

./mx-04_train-nn.pl --lr 0.3

The script also has the following default values:

my $lr = 0.1;
my $momentum = 0.0;
my $batch_size = 256;
my $epochs = 10;
my $cuda = 0;
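As a minimal sketch of how such defaults can be wired to the command line, the block below parses an argument list with Getopt::Long. Only `--lr` appears in the text above; the other option names (`--momentum`, `--batch-size`, `--epochs`, `--cuda`) and the `parse_args` helper are assumptions made for illustration and may differ from the actual script.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Parse hyperparameters from an argument list, starting from the defaults.
# Only --lr is shown in the text; the other option names are assumptions.
sub parse_args {
    my @args = @_;
    my %opt  = (lr => 0.1, momentum => 0.0, batch_size => 256,
                epochs => 10, cuda => 0);
    GetOptionsFromArray(\@args,
        'lr=f'         => \$opt{lr},
        'momentum=f'   => \$opt{momentum},
        'batch-size=i' => \$opt{batch_size},
        'epochs=i'     => \$opt{epochs},
        'cuda!'        => \$opt{cuda},
    ) or die "could not parse command-line options\n";
    return \%opt;
}

# Example: override two values and keep the remaining defaults.
my $opt = parse_args('--lr', '0.3', '--epochs', '5');
printf "lr=%s epochs=%d batch_size=%d\n",
    $opt->{lr}, $opt->{epochs}, $opt->{batch_size};
```

In a script, calling `GetOptions` with the same option specification would read directly from `@ARGV`; the array form is used here only so that the example can be run without command-line arguments.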

get the data

Our first step is to download the data with Gluon:

my $mnist_train = gluon->data->vision->FashionMNIST(
    root=>'./data/fashion-mnist', train=>1, transform=>\&transformer);

my $mnist_valid = gluon->data->vision->FashionMNIST(
    root=>'./data/fashion-mnist', train=>0, transform=>\&transformer);

Each example in the dataset is a [28,28,1] NDArray, representing the height, width and channel of a 28x28 greyscale image. Each element of the image is an integer between 0 and 255. Each label is a scalar that takes integer values between 0 and 9.

The data will remain in that shape until it is loaded. At that time, the transformer subroutine will reshape each example to [1,28,28] (channel, height, width), convert the elements to floating point values between 0 and 1, and then normalize them using the known mean of 0.13 and standard deviation of 0.31.

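sub transformer {
    my ($data, $label) = @_;
    ## reshape from [28,28,1] to [1,28,28]
    $data = $data->reshape([1,28,28]);
    ## scale the integer pixel values to floats between 0 and 1
    $data = $data->astype('float32')/255;
    ## normalize with mean 0.13 and standard deviation 0.31
    $data = ( $data - 0.13 ) / 0.31;
    return( $data , $label);
}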
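To make the arithmetic of the transformation concrete, here is a small sketch that applies the same scaling and normalization to single pixel values in plain Perl. The `normalize_pixel` helper is hypothetical and stands in for the element-wise NDArray operations; it uses the mean of 0.13 and standard deviation of 0.31 quoted in the text.

```perl
use strict;
use warnings;

# Scale a raw pixel (0..255) to [0,1], then standardize it with the
# mean (0.13) and standard deviation (0.31) quoted in the text.
sub normalize_pixel {
    my ($pixel) = @_;
    my $scaled = $pixel / 255;
    return ($scaled - 0.13) / 0.31;
}

# Black, mid-grey and white pixels.
printf "%3d -> %.3f\n", $_, normalize_pixel($_) for (0, 128, 255);
```

A black pixel (0) maps to about -0.419 and a white pixel (255) to about 2.806, so the normalized values are centered near zero rather than confined to the range between 0 and 1.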

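Our final data preparation step is to load the data. Here, we use Gluon's DataLoader to create a pair of loaders, one for the training data and one for the validation data. Each loader returns the transformed examples sequentially in batches, where every batch is a [data, label] pair. Only the training loader shuffles the examples.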

my $train_data = gluon->data->DataLoader($mnist_train,
    batch_size=>$batch_size, shuffle=>1);

my $valid_data = gluon->data->DataLoader($mnist_valid,
    batch_size=>$batch_size, shuffle=>0);
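The batching idea can be sketched in plain Perl. The `make_batches` helper below is hypothetical and only illustrates how shuffled indices are cut into batches; it assumes that a final, smaller batch is kept rather than dropped, which may not match the loader's behavior in every configuration.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Split the indices 0..$n-1 into consecutive batches of at most
# $batch_size, optionally shuffling first. The final batch may be
# smaller than the others.
sub make_batches {
    my ($n, $batch_size, $do_shuffle) = @_;
    my @idx = $do_shuffle ? shuffle(0 .. $n - 1) : (0 .. $n - 1);
    my @batches;
    push @batches, [ splice(@idx, 0, $batch_size) ] while @idx;
    return \@batches;
}

# The Fashion-MNIST training set has 60,000 examples.
my $batches = make_batches(60_000, 256, 1);
printf "%d batches, last one holds %d examples\n",
    scalar @$batches, scalar @{ $batches->[-1] };
```

With a batch size of 256, the 60,000 training examples give 234 full batches plus one batch of 96, for 235 batches per epoch under the assumption above.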

define the model

Here, we train the same LeNet network that we saw before:

my $net = nn->Sequential();

$net->name_scope(sub {
    $net->add(nn->Conv2D(channels=>6, kernel_size=>5, activation=>'relu'));
    $net->add(nn->MaxPool2D(pool_size=>2, strides=>2));
    $net->add(nn->Conv2D(channels=>16, kernel_size=>3, activation=>'relu'));
    $net->add(nn->MaxPool2D(pool_size=>2, strides=>2));
    $net->add(nn->Flatten());
    $net->add(nn->Dense(120, activation=>"relu"));
    $net->add(nn->Dense(84, activation=>"relu"));
    $net->add(nn->Dense(10));
});
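It can help to trace how the image size changes through these layers. The sketch below does that arithmetic in plain Perl, assuming the convolutions use a stride of 1 and no padding (neither is set in the code above, so these are taken to be the defaults). The `out_size` helper is hypothetical.

```perl
use strict;
use warnings;

# Output width of a convolution or pooling window with no padding.
sub out_size {
    my ($in, $kernel, $stride) = @_;
    return int(($in - $kernel) / $stride) + 1;
}

my $size = 28;                     # input image is 28x28
$size = out_size($size, 5, 1);     # Conv2D, kernel 5       -> 24
$size = out_size($size, 2, 2);     # MaxPool2D, pool 2 / 2  -> 12
$size = out_size($size, 3, 1);     # Conv2D, kernel 3       -> 10
$size = out_size($size, 2, 2);     # MaxPool2D, pool 2 / 2  -> 5
my $flat = 16 * $size * $size;     # 16 channels, flattened -> 400
print "feature map ${size}x${size}, flattened length $flat\n";
```

Under those assumptions, the Flatten layer passes a vector of 400 values to the first Dense layer, which maps it to 120 units.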

Next, we must define an initialization method. Here, we will use Xavier.

$net->initialize( mx->init->Xavier(), ctx=>$ctx);

And we must define a loss function. Instead of squared error, we use SoftmaxCrossEntropyLoss, which first applies the softmax function to obtain predicted probabilities, which are then compared with the labels to compute the cross entropy loss.

my $loss = gluon->loss->SoftmaxCrossEntropyLoss();
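The two steps described above, softmax followed by cross entropy, can be written out for a single example in plain Perl. This is an illustrative sketch, not the library's implementation; the helper names and the scores are made up.

```perl
use strict;
use warnings;
use List::Util qw(max sum);

# Softmax with the usual max-subtraction for numerical stability.
sub softmax {
    my @x = @_;
    my $m = max @x;
    my @e = map { exp($_ - $m) } @x;
    my $s = sum @e;
    return map { $_ / $s } @e;
}

# Cross entropy for one example: negative log-probability of the true class.
sub softmax_cross_entropy {
    my ($scores, $label) = @_;
    my @p = softmax(@$scores);
    return -log($p[$label]);
}

my @scores = (2.0, 0.5, -1.0);   # made-up raw outputs for three classes
printf "loss if label is 0: %.3f\n", softmax_cross_entropy(\@scores, 0);
printf "loss if label is 2: %.3f\n", softmax_cross_entropy(\@scores, 2);
```

The loss is small when the network assigns a high score to the correct class and grows as that score falls. For an untrained network that scores all ten classes equally, the loss is log(10), or about 2.303.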

Finally, the optimization method that we will use is sgd, stochastic gradient descent.

my $trainer = gluon->Trainer(
    $net->collect_params(), 'sgd',
    {learning_rate => $lr, momentum => $momentum});

Note that $net->collect_params() hands the network's parameters to $trainer. To update those parameters after each backward pass, we will call the trainer's step method.
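As a rough sketch of what one update does, the block below applies stochastic gradient descent with momentum to plain Perl arrays. It assumes that the gradients are summed over the batch and that the batch size passed to step is used to average them; the `sgd_step` helper and the numbers are hypothetical and are not taken from the library.

```perl
use strict;
use warnings;

# One SGD-with-momentum update on plain Perl arrays.
# $grad holds gradients summed over the batch, so they are divided by
# the batch size first (the assumed role of the argument to step).
sub sgd_step {
    my ($weights, $grad, $velocity, %h) = @_;
    for my $i (0 .. $#$weights) {
        my $g = $grad->[$i] / $h{batch_size};
        $velocity->[$i] = $h{momentum} * $velocity->[$i] - $h{lr} * $g;
        $weights->[$i] += $velocity->[$i];
    }
    return $weights;
}

my @w = (1.0, -2.0);   # made-up weights
my @v = (0.0, 0.0);    # momentum state starts at zero
sgd_step(\@w, [256, -512], \@v, lr => 0.1, momentum => 0.0, batch_size => 256);
printf "updated weights: %.2f %.2f\n", @w;
```

With the default momentum of 0.0, each weight simply moves against its averaged gradient by the learning rate, so the weights here become 0.90 and -1.80.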

train and save the model

We will need an auxiliary function to calculate model accuracy:

sub test {
    my $ctx = shift;
    my $metric = mx->metric->Accuracy();

    while( defined( my $d = <$valid_data>)) {
        my ($data, $label) = @$d;
        $data = $data->as_in_context($ctx);
        $label = $label->as_in_context($ctx);

        ## run data through the network
        my $output = $net->($data);
        $metric->update( [$label], [$output]);
    }
    return $metric->get;
}
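The accuracy computation itself is simple: for each example, take the class with the highest output and compare it with the label. The plain-Perl sketch below shows the idea on made-up values; the `argmax` and `accuracy` helpers are hypothetical and are not the metric's actual implementation.

```perl
use strict;
use warnings;

# Index of the largest score in an array reference.
sub argmax {
    my ($scores) = @_;
    my $best = 0;
    for my $i (1 .. $#$scores) {
        $best = $i if $scores->[$i] > $scores->[$best];
    }
    return $best;
}

# Fraction of examples whose highest-scoring class equals the label.
sub accuracy {
    my ($labels, $outputs) = @_;
    my $correct = grep { argmax($outputs->[$_]) == $labels->[$_] } 0 .. $#$labels;
    return $correct / scalar @$labels;
}

my @labels  = (0, 2, 1);   # made-up labels
my @outputs = ([3.1, 0.2, -1.0], [0.1, 0.4, 2.2], [1.5, 0.9, -0.3]);
printf "accuracy: %.3f\n", accuracy(\@labels, \@outputs);
```

Here two of the three predictions match their labels, so the accuracy is 0.667.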

And we will need a function to implement the training loop:

sub train {
    my ($epochs, $ctx) = @_;

    ## Collect all parameters from net and its children, then initialize them.
    $net->initialize( mx->init->Xavier(), ctx=>$ctx);

    ## The Trainer updates the parameters using their gradients.
    my $trainer = gluon->Trainer(
        $net->collect_params(), 'sgd',
        {learning_rate => $lr, momentum => $momentum});

    ## metric and loss
    my $metric = mx->metric->Accuracy();
    my $loss = gluon->loss->SoftmaxCrossEntropyLoss();

    for my $epoch (0..$epochs-1) {

        ## set scalars to hold time and mean loss
        my $time = time();
        my $llm;

        ## reset the metric at the beginning of each epoch
        $metric->reset();
        enumerate( sub {
            my ($i, $d) = @_;
            my ($data, $label) = @$d;
            $data = $data->as_in_context($ctx);
            $label = $label->as_in_context($ctx);

            ## Start recording computation graph with record() section.
            ## Recorded graphs can then be differentiated with backward.
            my $output;
            my $LL;
            autograd->record( sub{
                $output = $net->($data);
                $LL = $loss->($output, $label);});
            $LL->backward;

            ## capture the mean loss of the current batch
            ## (the value from the last batch is the one printed below)
            $llm = PDL::sclr($LL->mean->aspdl);

            ## take a gradient step, passing the batch size: $data->shape->[0]
            $trainer->step($data->shape->[0]);

            ## update metric
            $metric->update( [$label], [$output]);

        }, \@{ $train_data });
        ## end of epoch
        ## the argument to the subroutine is: \@{$train_data}

        ## get training accuracy
        my ($trn_name, $trn_acc) = $metric->get();

        ## get validation accuracy
        my ($val_name, $val_acc) = test($ctx);

        ## capture time interval
        my $nowtime = time();
        my $interval = $nowtime - $time;

        ## print the result
        my $ottxt;
        $ottxt .= "Epoch ". $epoch .":  ";
        $ottxt .= "loss ". sprintf("%.3f", $llm) .",  ";
        $ottxt .= "train acc ". sprintf("%.3f", $trn_acc) .",  ";
        $ottxt .= "test acc ". sprintf("%.3f", $val_acc) ."  ";
        $ottxt .= "in " . $interval ." secs";
        print $ottxt ."\n";
    }

    ## done training, so save parameters
    print "done training. saving parameters."."\n";
    $net->save_parameters($param_file);
    print "\n";
}
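The enumerate call in the training loop passes an index and a batch to the callback for every batch in the loader. A plain-Perl stand-in makes that calling convention visible. The `enumerate_sketch` helper below is hypothetical, handles a single array only, and is not the AI::MXNet::Base implementation; the strings stand in for NDArrays.

```perl
use strict;
use warnings;

# Hypothetical stand-in for enumerate: call $code with (index, item)
# for every element of the array reference $items.
sub enumerate_sketch {
    my ($code, $items) = @_;
    my $i = 0;
    $code->($i++, $_) for @$items;
    return;
}

my @seen;
enumerate_sketch(sub {
    my ($i, $batch) = @_;
    my ($data, $label) = @$batch;   # same unpacking as in the training loop
    push @seen, "$i:$data/$label";
}, [ ['d0', 'l0'], ['d1', 'l1'] ]);
print join(', ', @seen), "\n";
```

Each callback invocation receives the batch number and one [data, label] pair, which is why the loop body can unpack `@$d` into `$data` and `$label`.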

Finally, we run the train subroutine, which trains the model and saves the parameters of the trained model.

my $ctx = $cuda ? mx->gpu(0) : mx->cpu();

train($epochs, $ctx);

which prints:

Epoch 0: loss 0.475, train acc 0.683, test acc 0.793 in 72 secs
Epoch 1: loss 0.370, train acc 0.814, test acc 0.830 in 73 secs
Epoch 2: loss 0.295, train acc 0.839, test acc 0.850 in 73 secs
Epoch 3: loss 0.359, train acc 0.853, test acc 0.842 in 74 secs
Epoch 4: loss 0.393, train acc 0.863, test acc 0.850 in 75 secs
Epoch 5: loss 0.392, train acc 0.867, test acc 0.869 in 76 secs
Epoch 6: loss 0.216, train acc 0.872, test acc 0.870 in 75 secs
Epoch 7: loss 0.347, train acc 0.878, test acc 0.858 in 77 secs
Epoch 8: loss 0.216, train acc 0.880, test acc 0.881 in 83 secs
Epoch 9: loss 0.309, train acc 0.885, test acc 0.878 in 109 secs
done training. saving parameters.
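The loss column does not fall steadily because $llm holds the mean loss of the last batch in each epoch, not an average over the whole epoch. If a smoother figure is wanted, one option is to keep a running mean weighted by batch size. The following is a sketch of that alternative in plain Perl, with a hypothetical `make_loss_tracker` helper and made-up batch losses; it is not part of the original script.

```perl
use strict;
use warnings;

# Accumulate a batch-size-weighted mean over per-batch mean losses.
sub make_loss_tracker {
    my ($total, $count) = (0, 0);
    return {
        add  => sub {
            my ($mean_loss, $n) = @_;
            $total += $mean_loss * $n;
            $count += $n;
        },
        mean => sub { $count ? $total / $count : 0 },
    };
}

my $tracker = make_loss_tracker();
$tracker->{add}->(0.50, 256);   # made-up batch losses and sizes
$tracker->{add}->(0.30, 256);
$tracker->{add}->(0.10, 128);
printf "epoch mean loss: %.3f\n", $tracker->{mean}->();
```

In the training loop, the add callback would be called once per batch with the batch's mean loss and its size, and the mean would be read at the end of the epoch.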

And on the next page, we will use those saved parameters to make predictions.
