Quickstart Tutorial

This tutorial walks you through the basic functionality of ForwardModeAD. The library allows you to compute derivatives of functions using forward-mode automatic differentiation. Specifically, it achieves this by implementing dual numbers and overloading the arithmetic operations and elementary functions for them.
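As a quick illustration of the idea (a standard dual-number identity, not library code): a dual number has the form \(a + b\epsilon\) with \(\epsilon^2 = 0\), so arithmetic on dual numbers carries derivatives along automatically. For instance,

\((x_0 + \epsilon)^2 + 2(x_0 + \epsilon) + 1 = (x_0^2 + 2x_0 + 1) + (2x_0 + 2)\epsilon,\)

where the real part is the function value and the coefficient of \(\epsilon\) is exactly the derivative.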

Setup

Using Mason

If you are writing your application with Mason, all you have to do is run

mason add ForwardModeAD

to add the library as a dependency.
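This records the library under the [dependencies] section of your project's Mason.toml, roughly as below; the version shown here is only a placeholder for whichever release mason resolves:

[dependencies]
ForwardModeAD = "x.y.z"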

To use the library you will need to import it with

use ForwardModeAD;

and you are ready to go.
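From there, the standard Mason workflow applies: mason build compiles your application with the dependency on the module path, and mason run executes it.

mason build
mason run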

Installing manually

First, clone the repository.

You can then use the library in your Chapel code with

use ForwardModeAD;

Finally, compile your code as

chpl mycode.chpl -M $prefix/ForwardModeAD/src/

where $prefix is the path into which you cloned the repository.
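As a quick end-to-end check, here is a minimal sketch of a complete mycode.chpl (using only the initdual, value, and derivative functions covered below) that compiles with the command above:

use ForwardModeAD;

proc f(x) {
    return x**2 + 2*x + 1;
}

// Evaluate f at a dual number seeded at 0.0 to get value and derivative together.
var valder = f(initdual(0.0));
writeln(value(valder), " ", derivative(valder)); // prints: 1.0 2.0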

Derivatives in one dimension

Suppose we have a function

proc f(x) {
    return x**2 + 2*x + 1;
}

and that we want to evaluate it at the point \(x_0 = 0\). We can convert the point to the appropriate dual number using the initdual function and evaluate f at the resulting dual number.

var valder = f(initdual(0.0));

The resulting variable valder is an object of type dual, and the function value and derivative can be accessed with the functions value and derivative, respectively.

writeln(value(valder));      // value of f(0)
writeln(derivative(valder)); // value of f'(0)
1.0
2.0
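These match a check by hand: \(f(0) = 1\), and since \(f'(x) = 2x + 2\), \(f'(0) = 2\).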

Alternatively, you can pass a function and a value (the point at which to evaluate the derivative) directly to derivative. Note that the function passed to derivative must be concrete (with its type signature specified) and take a dual as input. If your function is generic (as in the example above), you can achieve this by passing a lambda, as the example below demonstrates.

var dfx0 = derivative(lambda(x : dual) {return f(x);}, 0.0);
writeln(dfx0);
2.0

Computing the gradient

The gradient of a multivariate function \(f : \mathbb{R}^n \rightarrow \mathbb{R}\) can be computed in the same way as the derivative, using initdual. The only difference is that the input is now initialized to an array of multidual and the gradient is extracted using gradient. In the following example, we compute the gradient of \(h(x, y) = x^2 + 3xy\) at the point \((1, 2)\). Note in the implementation below that the function takes a single array as input.

proc h(x) {
    return x[0] ** 2 + 3 * x[0] * x[1];
}

var valgrad = h(initdual([1.0, 2.0]));
writeln(value(valgrad));    // prints the value of h(1.0, 2.0)
writeln(gradient(valgrad)); // prints the value of ∇h(1.0, 2.0)
7.0
8.0 3.0
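This matches the computation by hand: \(\nabla h = (2x + 3y,\; 3x)\), so \(\nabla h(1, 2) = (8, 3)\).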

Similarly to the previous example, gradient can also take a function as input. In this case, you first need to specify the domain of the function as a type alias. If your function has \(n\) variables, the alias is [0..#n] multidual; for our two-variable h, this is

type D = [0..#2] multidual;

Next, we can compute the gradient similarly to before

var dh = gradient(lambda(x : D){return h(x);}, [1.0, 2.0]);
writeln(dh);
8.0 3.0

Computing the Jacobian

For vector-valued functions of several variables \(F : \mathbb{R}^m \rightarrow \mathbb{R}^n\), we can compute the Jacobian \(J_F\). Both methods described so far still apply.

Using initdual, the strategy is very similar to before, except that the Jacobian should now be extracted using the jacobian function.

proc F(x) {
    return [x[0] ** 2 + x[1] + 1, x[0] + x[1] ** 2 + x[0] * x[1]];
}

var valjac = F(initdual([1.0, 2.0]));
writeln(value(valjac), "\n");
writeln(jacobian(valjac));
4.0 7.0

2.0 1.0
3.0 5.0
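As a check by hand, differentiating each component gives

\(J_F(x_0, x_1) = \begin{pmatrix} 2x_0 & 1 \\ 1 + x_1 & 2x_1 + x_0 \end{pmatrix},\)

which at \((1, 2)\) evaluates to the matrix printed above.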

Note that the function should take an array as input and return an array as output.

Alternatively, jacobian can take the function and the point as input and return the Jacobian at that point. The same restrictions as for gradient apply:

  • The function must be concrete, taking an array of multidual as input.

  • The array type must be written explicitly as a type alias.

Using the example function above

type D = [0..#2] multidual;

var J = jacobian(lambda(x : D){return F(x);}, [1.0, 2.0]);
writeln(J);
2.0 1.0
3.0 5.0

Computing the directional derivative and the Jacobian-vector product

In some applications, instead of the full gradient, one may need the directional derivative, that is, the dot product \(\nabla f(\mathbf{x}) \cdot \mathbf{v}\), where \(\mathbf{v}\) is the direction vector. Instead of computing the gradient and the dot product separately, one can compute the directional derivative directly by evaluating \(f\) over the vector of dual numbers \([x_1 + v_1\epsilon, \ldots, x_n + v_n\epsilon]^T\) and taking the dual part of the result.
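The identity behind this trick: for differentiable \(f\),

\(f(x_1 + v_1\epsilon, \ldots, x_n + v_n\epsilon) = f(\mathbf{x}) + (\nabla f(\mathbf{x}) \cdot \mathbf{v})\epsilon,\)

so a single evaluation over dual numbers yields the directional derivative in the dual part.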

In practice, this is achieved by passing both the point x and the direction v to initdual. The directional derivative can be extracted using directionalDerivative.

proc f(x) {
    return x[0] ** 2 + 3 * x[0] * x[1];
}

var dirder = f(initdual([1.0, 2.0], [0.5, 2.0]));

writeln(value(dirder));
writeln(directionalDerivative(dirder));
7.0
10.0
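This agrees with computing the two pieces separately: \(\nabla f(1, 2) = (8, 3)\), and \((8, 3) \cdot (0.5, 2) = 4 + 6 = 10\).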

Similarly, for many-valued functions, one may compute the Jacobian-vector product (JVP) \(J_F\mathbf{v}\) directly using the same strategy. The JVP can be extracted using the jvp function.

proc F(x) {
  return [x[0] ** 2 + x[1] + 1, x[0] + x[1] ** 2 + x[0] * x[1]];
}

var valjvp = F(initdual([1.0, 2.0], [0.5, 2.0]));

writeln(value(valjvp), "\n");
writeln(jvp(valjvp));
4.0 7.0

3.0 11.5
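Again this matches multiplying the Jacobian computed earlier by the direction vector: \(J_F(1, 2)\,\mathbf{v} = (2 \cdot 0.5 + 1 \cdot 2,\; 3 \cdot 0.5 + 5 \cdot 2) = (3, 11.5)\).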

As in the previous cases, directionalDerivative and jvp can also take a function as input. As with gradient and jacobian, the function passed cannot be generic, and the domain must be written down as a type alias; note that here the element type is dual rather than multidual.

type D = [0..#2] dual;

var dirder = directionalDerivative(lambda(x: D) {return f(x);}, [1.0, 2.0], [0.5, 2.0]),
    Jv     = jvp(lambda(x: D) {return F(x);}, [1.0, 2.0], [0.5, 2.0]);

writeln(dirder, "\n");
writeln(Jv);
10.0

3.0 11.5