6  Approximate derivatives in Julia


A notebook for this material: ipynb (Pluto html) (With commentary)

6.1 Introduction

We load our packages to begin.

using MTH229
using Plots
plotly()
Plots.PlotlyBackend()

Single-variable calculus has two main concepts: the derivative and the integral, both defined in terms of a third important concept: the limit. The derivative is what we discuss here. The derivative allows us to talk about a tangent line to a curve. This is often called a line that touches the curve at just one point, but that isn’t quite correct, though often the case locally. A better idea would be to say it is the line that best approximates the curve at that point. That is, if we were to zoom in to the graph it might appear to look straight, and that straightness would have a slope that would match the tangent line. This idea leads to a primary use of the derivative – to approximate functions with simpler lines. Of course, this intuition is informal, a definition is in terms of the slope of approximating secant lines.

Historically, Fermat, in a treatise on finding maxima and minima, approached the problem of finding the tangent line by comparing the value of \(f(x)\) to a nearby value \(f(x+h)\). Working with polynomials, meant that subtracting \(f(x+h) - f(x)\) led to a polynomial in \(h\). Dividing by \(h\) and then setting \(h=0\) yields an answer for the slope.

The more modern approach (well Cauchy in the 1820s) defines the derivative of a function, \(f\), at a point \(c\) as the slope of the tangent line, defined in terms of the limit of the slopes of approximating secant lines. The notation is:

\[ f'( c ) = \lim_{h \rightarrow 0} \frac{f(c + h) - f( c)}{h}.\quad\text{Derivative at a point} \]

As mentioned, intuitively, the tangent line is the best straight-line approximation to a function near the point \((c, f( c))\) and would have slope given by the derivative.

This graph shows \(f(x) = 2 - x^2\) and various secant lines when \(c=-0.75\).

f(x) = 2 - x^2
c = -0.75
sec_line(h) = secant(f, c, c+h)                # secant in MTH229 package
plot(f, -1, 1, legend=false)
plot!(sec_line(1))
plot!(sec_line(0.75))
plot!(sec_line(0.5))
plot!(sec_line(0.25))

As the value of \(h\) goes towards 0 along the path \(1\), \(0.75\), \(0.5\), \(0.25\), … the slope of the secant line heads towards the slope of the tangent line, in this case \(2\cdot 0.75 = 1.5\).

Using the idea of a derivative at a point, one defines the derivative of the function \(f\) to be the function which for each \(x\) returns the derivative of \(f\) at the point \(x\). Notationally, this just replaces \(c\) above with \(x\), but conceptually there is a bit more to it.

6.2 Forward difference quotients, Slope of the tangent line

The rules of derivatives allow for a mechanical approach to taking derivatives with the power rule, chain rule and some special cases. In this project we look at approaching the derivative numerically. We start by investigating approximate derivatives computed with finite difference quotients.

The most naive approximation is simply to assume \(h\) is some small number and use that to approximate the limit above:

f(x) = x^2 - 2x; fp(x) = 2x - 2
h = .001
c = 3
( f(c + h) - f(c) ) / h
4.000999999999699

This is known as the forward difference quotient approximation to the derivative. For this example, the difference between the approximation and the actual slope is:

( f(c + h) - f(c) ) / h - fp(c)
0.0009999999996992415

That is not too close, just to 3 digits of accuracy.

Notationally, we may write this as:

\[ f'( c) \approx \frac{f(c + h) - f( c) }{h} \]

For some small value of \(h\). The equals sign is replaced by an approximation symbol, as there is no limit expression written.

Example Derivative at a point

Let’s try the derivative of some well known function. We know the derivative of \(f(x) = \sin(x)\) is \(f'(x) = \cos(x)\). Let’s see if the above works well:

f(x) = sin(x); fp(x) = cos(x)
c = pi/4; h = 0.0001
( f(c + h) - f(c) )/h - fp(c)
-3.535651724428934e-5

Not as good as we can get – the error is like 1e-5, but not too bad already.

Example Finding profit

Suppose the tractor company John Deere models its profit per units sold through the function \(P(x) = 50 e^{x/250}/(1 - e^{x/250})\). Find the marginal profit for \(x=200\).

The marginal profit is the change in profit per unit sold – or the derivative – at \(x=200\). We can find this quite simply by either differentiating directly, or as we do here, approximating the answer with julia. We first note that the function above, is a composition, so it may be written using two functions:

f1(x) = exp(x)/(1 - exp(x))
f(x) = 50 * f1(x/250)
c = 200; h = 0.0001;
res = (f(c+h) - f(c))/h
0.29635326299626286

If \(50\) is the maximum profit, this is a percentage increase of:

(res/50) * 100
0.5927065259925257

or a bit more than half a percent.

Example Finding the tangent line at a point

Let \(f(x) = x^x\). Find the tangent line at \(c=2\). Compare the difference between the value of the tangent line and the function at \(x=2.1\).

The tangent line is most easily expressed in terms of the point-slope formula for a line where the point is \((c, f( c))\) and the slope is \(f'( c)\). This gives:

\[ y = f( c) + f'( c)\cdot(x - c) \]

In the following, the slope of the tangent line will be approximated using a numeric derivative. We use \(h=0.0001\):

f(x) = x^x
c = 2; h = 0.0001
m = ( f(c + h) - f(c) ) / h
tangent_line(x) = f(c) + m * (x - c)
tangent_line (generic function with 1 method)

To compare the difference, we have:

f(2.1) - tangent_line(2.1)
0.07231187980696063

A graph shows this difference:

plot(f, 1.85, 2.15)
plot!(tangent_line)

6.2.1 Practice

Question

Find the slope of the tangent line of \(f(x) = x\log(x)\) at the value \(x=1\) using \(h=0.0001\) and the forward difference quotient method:


Question

Find the slope of the tangent line of \(f(x) = x-\log(x)\) at the value \(x=1\) using \(h=0.0001\) and the forward difference quotient method:


Question

The built-in airyai function, when called as airyai(x), returns the so-called Airy function, a special function named after

George Biddell Airy.

Compute the derivative at \(c=-3\) using \(h=0.0001\).


Question

Let \(f(x) = \sin(x)\). The slope of the tangent line at \(c=\pi/4\) is well known to have slope \(\sqrt{2}/2\).

Compute the forward difference quotient for \(f\) at \(c\) using \(h=0.01\). What is the order of the difference between the two values?

Select an item

Question

For the function \(f(x) = x^3 - 5x\) use a forward difference quotient with \(h=0.0001\) to approximate the slope of the tangent line at \(c=1\). Use this value to create a tangent line at (c, f( c)). Plot both the function and the tangent. Estimate graphically where the tangent line crosses the \(x\) axis. The value is near:

Select an item

6.3 Derivative of a function

We might be interested not in the derivative at a point, but in the derivative as a function. This is how we think of the derivative when we say if \(f(x) = \sin(x)\), the \(f'(x) = \cos(x)\). Mathematically, from this viewpoint, the derivative is referred to as an operator, as it takes a function and returns a function.

We can reproduce this behavior easily enough with julia, as functions are first class objects: they can be passed as arguments and returned as values.

First, let’s define a function to find the derivative at a point using the “forward difference quotient”:

forward_difference(f, x0, h) = (f(x0 + h) - f(x0))/h
forward_difference (generic function with 1 method)

We need three arguments of course as we have three ingredients to worry about: the function, the point and the size of \(h\).

To make an operator that takes \(f\) and returns a function (an anonymous function in this case) that computes our approximation to \(f'\) we can do the following:

Df(f; h=1e-8) = x -> forward_difference(f, x, h)
Df (generic function with 1 method)

We specified a default value of \(h\) for convenience, but allow it to be varied if desired.

To see this work, we can find and plot the derivative of \(\sin(x)\):

f(x) = sin(x)
fp(x) = Df(f)(x)
plot(f, 0, 2pi)
plot!(fp)

Well, we already knew that should just be \(\cos(x)\). The point is we can easily make the an approximate derivative function from f with the definition fp(x) = Df(f)(x).

Parsing Df(f)(x)

Seeing the pair of parenthesis can lead to confusion, as it isn’t common math notation. Here parentheses are not used for grouping, but rather function application. There are two function applications and the associativity rules for function application are left to right, as with most – but not all – operations. So Df(f) is done first. This returns a function (the approximate derivative) which is then evaluated at x.

Defining fp or fp(x)

We could also have just plotted f and Df(f) without naming fp. As well, could have written fp = Df(f), but that creates fp as an anonymous function and we then couldn’t redefine it through fp(x) = ..., which would attempt to make it a generic function.

Let’s look at a different function, where we don’t know in our heads the answer.

f(x) = exp(x)/(1 + exp(x))
plot(f, 0, 5)
plot!(Df(f))

If we look, we can see from the graph that f is increasing and Df(f) is positive – this is no coincidence, of course.

Example: Critical points

A function’s critical points are where its derivative is \(0\) or undefined. We can examine these by graphing the function’s derivative. Let’s do so for the polynomial \(f(x) = x^3 - 5x + 4\):

f(x) = x^3 -5x + 4
fp(x) = Df(f)(x)
plot(fp, -5, 5)

You can check the zeroes graphically, say by zooming in a bit and adding the line \(y=0\):

plot(fp, -2, 2)
plot!(zero)

If you want, we can find the roots numerically.

find_zero(fp, (-2, -1)), find_zero(f', (1, 2))
(-1.2909944653511047, 1.2909944487358056)
Example: When is a function increasing?

Let \(f(x) = \cos(x)\) over the interval \([0, 360]\) (in degrees). When is the function increasing?

From the relationship of the derivative and the function, we know the answer is when \(f'(x) > 0\). For this example, we can solve this directly as \(f'(x) = -\sin(x)\) and we know when that is positive (well you are supposed to anyways: when the angle is in the third and fourth quadrants). Let’s see how we would do this task with julia and compare.

First, we only need the derivative, so we just ask:

f(x) = cosd(x)          # using degrees
fp(x) = Df(f)(x)        # use default h
fp (generic function with 1 method)

Then we wish to indicate on the graph where fp(x) > 0. We can do this by defining a function that is \(0\) when that is the case and NaN otherwise (so that those points are not plotted). We do so below using the plotif function from the MTH229 package

plotif(f, fp, 0, 360)  # second color when fp > 0

6.3.1 Practice

Question

Let \(f(x) = e^x\). Using a forward difference quotient and \(h=0.001\) graph both the function and its approximate derivative over the interval \([0, 3]\). Estimate graphically the largest difference between the function and its approximate derivative.

Select an item

Question

Let \(f(x) = x^2 \exp(-x)\). Using \(h=0.0001\) and an approximate derivative, estimate graphically where \(f'(x)\) is positive on the interval \([0,3]\).

Select an item

Question

The critical points of a function are where its derivative is undefined or where its derivative is \(0\). For the function \(f(x) = 3x^4 -28x^3 + 96x^2 - 144x\) graphically find any critical points over the interval \([0,10]\)

Select an item

6.4 Improvements to the basic forward difference quotient

The error in the approximation of the derivative depends on the size of \(h\). Mathematically, as \(h\) gets smaller we get better approximations, though with the computer other complications arise. To see mathematically that this is the case, let’s look at the difference between the approximate numeric derivative from the forward difference quotient and a known derivative.

f(x) = sin(x)
fp(x) = cos(x)
[ Df(f,h=h)(.5) - fp(.5) for h=[.1, .01, .001, .0001, .00001] ]
5-element Vector{Float64}:
 -0.0254132139820491
 -0.0024117340199198978
 -0.00023985901305767499
 -2.3972739644051444e-5
 -2.3971472203898614e-6

It gets better as \(h\) gets smaller. In fact, it looks like when \(h\) gets smaller by a factor of \(1/10\) so does the error. Let’s look a little deeper though. Rather than type in the values of \(h\) as above, let’s use an expression to compute them. Here we find the powers \(10^{-1}, \dots, 10^{-16}\) at once and then compute the differences:

hs = [(1/10)^i for i in 1:16];
out = [ Df(f,h=h)(.5) - fp(.5) for h in hs ];
[hs out]
16×2 Matrix{Float64}:
 0.1      -0.0254132
 0.01     -0.00241173
 0.001    -0.000239859
 0.0001   -2.39727e-5
 1.0e-5   -2.39715e-6
 1.0e-6   -2.3969e-7
 1.0e-7   -2.4695e-8
 1.0e-8    2.85032e-10
 1.0e-9    2.85032e-10
 1.0e-10  -1.10737e-7
 1.0e-11  -1.22096e-6
 1.0e-12  -6.77208e-6
 1.0e-13   4.87391e-5
 1.0e-14  -0.000506372
 1.0e-15   0.0105959
 1.0e-16   0.23264

When we look, we see that for awhile the approximation gets better (to about 1e-9), but then does a U-turn and starts getting worse. The mathematical approximation gets better and better, but what happens is the computational error gets worse. We’ll see below how to pick the \(h\) that best balances these off, but first lets look at how using different approximations for the derivative can improve the “mathematical” error.

6.4.1 Central difference quotient

It turns out that just by looking to the left and the right we can improve the mathematical error from getting better at a rate of \(h\) to a rate of \(h^2\), provided our function is smooth enough and we don’t have issues on the boundaries. The formula, called the central difference quotient approximation to the derivative is:

\[ f'(x) \approx \frac{ f(x + h) - f(x - h) }{2h} \]

For this the mathematical error is like \(h^2\), not \(h\).

Let’s compare. To make our life easier we again create some functions, as we did with Df above.

central_difference(f, x0, h) = (f(x0 + h) - f(x0 - h)) / (2h)
Dc(f; h=0.0001) = x -> central_difference(f, x, h)
Dc (generic function with 1 method)

Now to see whether a forward difference quotient or central difference quotient works better. We can do so with a table. Again with \(f(x) = \sin(x)\)

f(x) = sin(x)
fp(x) = cos(x)
using_D =  [ Df(f,h=h)(.5) - fp(.5) for h in hs ];
using_Dc = [ Dc(f,h=h)(.5) - fp(.5) for h in hs ];
[hs using_D using_Dc]
16×3 Matrix{Float64}:
 0.1      -0.0254132    -0.00146191
 0.01     -0.00241173   -1.46263e-5
 0.001    -0.000239859  -1.46264e-7
 0.0001   -2.39727e-5   -1.46274e-9
 1.0e-5   -2.39715e-6   -1.75036e-11
 1.0e-6   -2.3969e-7     7.47635e-12
 1.0e-7   -2.4695e-8    -2.70079e-10
 1.0e-8    2.85032e-10   2.85032e-10
 1.0e-9    2.85032e-10   2.85032e-10
 1.0e-10  -1.10737e-7   -1.10737e-7
 1.0e-11  -1.22096e-6   -1.22096e-6
 1.0e-12  -6.77208e-6   -6.77208e-6
 1.0e-13   4.87391e-5    4.87391e-5
 1.0e-14  -0.000506372  -0.000506372
 1.0e-15   0.0105959     0.0105959
 1.0e-16   0.23264       0.23264

The errors for the central difference quotient are either much smaller for the same size \(h\) or the same. We see that we can use a larger \(h\) to get the most accuracy in this example.

Example: When does the tangent line hit 0?

Let \(f(x) = 10/(1+x^2) - 10\exp(-(1/2)x^2)\). The tangent line at \(x=c\) is given by

\[ y = f( c) - f'( c)(x - c) \]

and this intersects the \(x\) axis when \(y=0\). Solving this gives:

\[ c - f( c)/f'( c) \]

Our goal is to compute this value for any \(c > 0\).

Doing so is easy:

f(x) = 10/(1+x^2) - 10*exp(-(1/2)*x^2)
fp(x) = Dc(f)(x)
intersection_point(c) = c - f(c)/fp(c)
intersection_point (generic function with 1 method)

For example, when \(c=1\) we have:

c = 1;
intersection_point(c)
2.000000018978514

You can tell from the graph of \(f(x)\) that this value should be more than 1, as it is.

plot(f, .5, 2.1, legend=false)
plot!(zero)
plot!(x -> f(c) + fp(c)*(x-c))

We could have defined intersection_point to accept a function with:

intersection_point(f, c) = c  - f(c)/Dc(f)(c)

Then it could easily be reused for other problems.

6.4.2 Practice

Question

Let

f(x) = besselj(1, x)

This defines \(f\) as a Bessel function of the first kind with order \(\alpha=1\). Plot the approximate derivative over the interval \([0,5]\). There is one zero. What is its approximate value:


Question

Let \(f(x) = \sin(\cos(x))\). Plot the approximate derivative of \(f\). How many zeros are there between \(5\) and \(20\)?


Question

Let \(f(x) = |x^2 - 2|\). Plot the derivative of \(f(x)\) over the interval \([-2,2]\). How many critical points do you see? (A critical point is where the derivative is \(0\) or undefined.)


6.5 Automatic differentiation

We discuss now Forward Mode Automatic Differentiation.

Automatic differentiation avoids the numeric instability issues of finite difference quotients by using a different approach. Whereas finite difference quotients have a long history, automatic differentiation only dates back to the 50s.

The ForwardDiff package implements automatic differentiation.

The MTH229 package overloads the idea of ', to use automatic differentiation when the notation f' is encountered.

That means, we can find derivatives, as familiarly as:

f(x) = x^x
f'(1)
1.0

Here f is a function, f' is a derived function, and f'(1) is this derived function evaluated at 1 – just as with the use within a math text book.

Unlike, finite difference quotients automatic differentiation does not have issues arising from the loss of precision encountered when subtracting two like-sized numbers. Unlike symbolic derivatives (such as those found by the Wolfram Alpha website used by your iPhone’s Siri) they can be computed quickly, even if the computations defining the function involve many steps.

So what is automatic differentiation? The idea of the forward mode is quite intuitive. The tangent line approximation of a function at the point \(c\) becomes

\[ f(x) = f(c) + f'( c)(x-c) + O((x-c)^2) \]

Where the notation \(O(x-c)^2\) means that the last term is not precisely spelled out, but is like some constant times the difference \((x-c)^2\). This has many applications, but for this one it reduces the knowledge of the function values at this level to two values \((f(c), f'(c))\)

Now consider the product of two functions \(f(x)\) and \(g(x)\)? The above formula applied to \(f \cdot g\) is:

\[ (f\cdot g)(x) = f(c) \cdot g(c) + (f'( c)g( c) + f( c)g'( c))(x-c) + O((x-c)^2) \]

This follows by the product rule. Again we have that two values determine the product \(f\cdot g\) at this level, these being \((f(c) \cdot g(c)\), and \(f'( c)g( c) + f( c)g'( c)\). But, this combination can be made from the values \((f(c), f'(c))\) and \((g(c), g'(c))\) which represent \(f\) and \(g\). Hence it is enough to know the two values for \(f\) and \(g\) to compute the two values for \(f\cdot g\).

In fact, the rules of derivatives allow us to say the same for addition, subtraction, multiplication, division, powers and composition. For composition this is from the chain rule:

\[ (f \circ g)(x) = (f \circ g)(c) + f'( g(c)) g'(c) (x-c) + O((x-c)^2). \]

Automatic differentiation can be implemented by extending how we store a value, storing both \((x, dx)\) as above. With this, the rules of differentials inform us how we combine these values. For example:

  • \[ (x, dx) + (y, dy) = (x + y, d(x+y)) = (x + y, 1dx + 1dy) \]
  • \[ (x, dx) \cdot (y, dy) = (xy, d(xy)) = (xy, y dx + x dy) \]
  • \[ (x, dx)^n = (x^n, d(x^n)) = (x^n, n x^{n-1} dx) \]

etc.

For functions we need to use the chain rule:

  • \[ f((x, dx)) = (f(x), d(f(x))) = (f(x), f'(x) dx ) \]

6.5.1 Examples

Using ', we see that plotting a function and its derivative is straightforward:

f(x) = exp(-x)*sin(x)
plot(f, 0, 2pi)
plot!(f')

Second derivatives are also available:

f(x) = x^2 - 2x
plot(f, -2, 2)
plot!(f')
plot!(f'')

Here is an example where we plot the tangent line and the function. The MTH229 package provides this function to compute a tangent line function:

tangent(f, c) = x -> f(c) + f'(c)*(x-c)

This is employed as follows:

f(x) = x^x
c = 1
plot(f, 1/2, 2)
plot!(tangent(f, c))

Example: AP Calculus question

The 2014 AP Calculus exam included this question:

Grass clippings are placed in a bin, where they decompose. For \(0 \leq t \leq 30\), the amount of grass clippings remaining in the bin is modeled by \(A(t) = 6.687(0.931)^t\). Where \(A(t)\) is measured in pounds and \(t\) is measured in days.

  • Find the average rate of change of \(A(t)\) over the interval of \(0 \leq t \leq 30\). Indicate units of measure.
  • Find the value of \(A'(15)\). Using correct units, interpret the meaning of the value in the context of the problem.
  • Find the time \(t\) for which the amount of grass clippings in the bin is equal to the average amount of grass clippings in the bin over the interval \(0 \leq t \leq 30\).
  • For \(t > 30\), \(L(t)\), the linear approximation to \(A\) at \(t=30\) is a better model for the amount of grass clippings remaining in the bin. Use \(L(t)\) to predict the time at which there will be 0.5% pound of grass clippings remaining in the bin. Show the work that leads to your answer.

We can do all this relatively quickly with the find_zero function from the Roots package.

First, we define \(A(t)\):

A(t) = 6.687 * (0.931)^t
A (generic function with 1 method)

The first question is answered by the slope of the secant line connecting \((0,A(0))\) and \((30, A(30))\):

m = (A(30) - A(0)) / (30 - 0)
-0.1968024043540685

The units are in pounds per day.

The second question is done through the derivative, as follows:

A'(15)
-0.1635905803884566

This also has units in pounds per day. This is the instantaneous rate of decomposition, and is less (in absolute value) than the average decay rate.

The mean value theorem tells us that there is at least one time where the instantaneous decomposition rate equals the average rate over the interval \([0,30]\). Thus the endpoints form a bracketing interval for the equation \(A'(t) = m\), which we solve with:

find_zero(t -> A'(t) - m, (0, 30))
12.414773526372592

So at 12.4 days, the instantaneous rate is equal to the average rate of decomposition over the first 30 days.

But this doesn’t really answer the question asked, which is about the average amount of grass clippings in the bin. For this we need the integral form of the mean value theorem. Skipping ahead a few lessons to solve this problem, we have:

a, err = quadgk(A, 0, 30)   # area under A from 0,30
avg = a/(30-0)
find_zero(t -> A(t) - avg, (0, 30))
12.414773526372596

We get the same answer – think about whether that is a coincidence.

Finally, we can define \(L(t)\) using the point-slope form of the line. The point is \((30, A(30))\) and the slope is \(A'(30)\):

L(t) = A(30) + A'(30) * (t - 30) # t > 30
L (generic function with 1 method)

We could restrict \(L\) to insist that \(t > 30\), but this isn’t necessary. We are asked to solve this. Using find_zero requires a bracket. We check that \(L(30) > 0.5\) and that \(L(100)\) is negative:

L(30), L(100)
(0.7829278693779448, -3.135406990501517)

So, we can use \([30, 100]\) as a bracket for \(L(t) - 0.5 = 0\):

find_zero(t -> L(t) - 0.5, (30, 100))
35.054430405946825

So a bit over 35 days to decompose to half a pound.

Here is a picture, showing the decay:

f(t) = t <= 30 ? A(t) : L(t)
plot(f, 0, 40)

6.5.2 Practice

Question

Which definition would be right for \((x,dx) - (y, dy)\)?

Select an item

Question: find tangent line intersection

Redefine intersection_point as:

intersection_point(f, c) = c - f(c)/f'(c)
intersection_point (generic function with 2 methods)

Now for \(f(x) = 10/(1+x^2) - 10 \cdot \exp(-(1/2) \cdot x^2)\) and \(c=1\) find the intersection point.


Question: what is the size of the error?

We can compare the error at many different points with the following construct

f(x) = exp(-x)*sin(x)
fp(x) = exp(-x)*(cos(x) - sin(x))
[abs(f'(x) - fp(x) ) for x in range(0, pi, 25)]

What is the order (exponent) of the largest difference above?

Select an item

6.6 Error analysis

We end with a digression on how big h should be to have the approximate derivatives as close as can be.

The choice of \(h\) in the derivative approximations above was done somewhat arbitrarily, with only the intuition that smaller \(h\)’s should produce more accurate approximations (which of course may be way off, as we are subtracting like-sized values in the derivative formula). Here we look a bit more closely at how to choose \(h\).

As mentioned, with a bit of work from the second semester of calculus one can learn that the mathematical error in the forward difference quotient is “order \(h\)” whereas the mathematical error in the central difference quotient is “order \(h^2\). This means that as \(h\) gets small the approximation error in the first is a multple of \(h\), but for the second a multiple of \(h^2\) – a much smaller error in general.

However there is also error due to floating point approximation. Such error is different in that it gets bigger as \(h\) gets smaller. So one error gets bigger, the other gets smaller. So clearly if \(h\) gets too small, the floating point error will dominate and the overall error will not get smaller, rather larger.

So, how big is the floating point error? For any given number, it can be roughly as big as the the machine precision amount, \(\epsilon\). So in the forward-difference quotient approximation we could have errors in both terms \(f(x+h)\) and \(f(x)\) so the total error could be as big as \(2\epsilon\). But this is divided by \(h\) as we have:

\[ \frac{f(x+h) - f(x)}{h} = \frac{float(f(x +h)) + \epsilon - float(f(x_) + \epsilon}{h} = \frac{float(f(x+h))-float(f(x))}{h} + \frac{2\epsilon}{h}. \]

The errors may or may not cancel so the algebra with \(\epsilon\) is unusual to the untrained eye. It basically takes into account the worse case.

The key is the \(2\epsilon/h\) term – it gets bigger as \(h\) gets smaller.

So if each floating point approximation is no more off than \(\epsilon\), we have this bound on the error:

\[ \text{error} \leq \frac{2\epsilon}{h} + (M/2)h \]

Where \(M\) is a constant that depends on the maximum absolute value of the second derivative and the \(1/2\) comes from second semester calculus. Choosing \(h\) to make this as small as possible is possible with some calculus involving the derivative, but for now we simply note the answer is \(h=((2\epsilon)/(M/2))^{1/2}\).

Taking \(\epsilon\) as machine tolerance and (conveniently) \(M=1\) we get

\[ h_{min} = \sqrt((2\epsilon)/(1/2)) \approx 10^{-8} \]

Or more precisely:

sqrt( (2*eps())/(1/2) )
2.9802322387695312e-8

We can check this is about right, by graphing on a different scale:

f(h; M=1) = 2*eps()/h + (M/2)*h
g(x) = log10(f(10.0^(-x)))
plot(g, 1, 10)          # min near 8, which means h like 1e-8

Numerically, we can check how this basically works for some test function:

f(x) = exp(x) - x^3
fp(x) = exp(x) - 3x^2
[hs [Df(f,h=h)(.5) - fp(.5) for h in hs] ]
16×2 Matrix{Float64}:
 0.1      -0.074746
 0.01     -0.00682885
 0.001    -0.000676365
 0.0001   -6.75712e-5
 1.0e-5   -6.75649e-6
 1.0e-6   -6.75645e-7
 1.0e-7   -6.65768e-8
 1.0e-8    4.47752e-9
 1.0e-9   -1.73158e-7
 1.0e-10  -1.73158e-7
 1.0e-11   4.26773e-6
 1.0e-12  -0.000106755
 1.0e-13   0.000559379
 1.0e-14  -0.0105429
 1.0e-15  -0.0105429
 1.0e-16   1.32172

We see the best case is around \(h=10^{-8}\), as expected by the theory.

6.6.1 Approximation errors for the central difference quotient

For the central difference quotient, the errors are different. The error in \(h\) becomes:

\[ \text{error} \leq (2\epsilon)/(2h) + (M/6) h^2 \]

This gives \((3\epsilon/M)^{1/3}\) as the choice for \(h\) that minimizes the right-hand expression.

Taking \(\epsilon\) as the machine tolerance and (conveniently) \(M=3\) we get

\[ h_{min} \approx 10^{-6} \]

We can again check how this works for our function and point:

[hs [ Dc(f,h=h)(.5) - fp(.5) for h in hs] ]
16×2 Matrix{Float64}:
 0.1      -0.00725076
 0.01     -7.25212e-5
 0.001    -7.25213e-7
 0.0001   -7.2531e-9
 1.0e-5   -8.54954e-11
 1.0e-6    3.6629e-11
 1.0e-7    3.6629e-11
 1.0e-8    4.47752e-9
 1.0e-9   -6.21359e-8
 1.0e-10  -1.73158e-7
 1.0e-11   4.26773e-6
 1.0e-12  -0.000106755
 1.0e-13   0.000559379
 1.0e-14  -0.0105429
 1.0e-15   0.100479
 1.0e-16   1.32172

6.6.2 Practice

Question: best depends on the function and value

A subtle point above is that we are minimizing an upper bound in \(h\) with an assumption, not the actual error. The actual answer may differ as it depends on a function and the point of evaluation. Look at the function \(f(x) = x^2 - 2x\) and compare the values at \(0.5\) as above:

hs = [(1/10)^i for i in 1:16]
f(x) = x^2 - 2x
fp(x) = 2x - 2
[hs [Dc(f,h=h)(.5) - fp(.5) for h in hs] ]
16×2 Matrix{Float64}:
 0.1       2.22045e-16
 0.01     -6.66134e-16
 0.001    -6.66134e-16
 0.0001    1.10356e-13
 1.0e-5   -9.99867e-13
 1.0e-6   -2.87552e-11
 1.0e-7   -2.87552e-11
 1.0e-8   -5.02476e-9
 1.0e-9   -2.72292e-8
 1.0e-10  -8.27404e-8
 1.0e-11  -8.27404e-8
 1.0e-12   2.21217e-5
 1.0e-13   0.000244166
 1.0e-14   0.000799278
 1.0e-15   0.000799278
 1.0e-16  -0.110223

What is the best value of \(h\)?

For which of these values of \(h\) is the error smallest?

Select an item

Question: error analysis

Consider these commands:

f(x) = sin(x)
fp(x) = cos(x)
[hs [ Dc(f,h=h)(.5) - fp(.5) for h in hs] ]

Do they show the smallest error around \(10^{-6}\) as expected?

Select an item

Julia version:

VERSION
v"1.10.2"

Packages and versions:

using Pkg
Pkg.status()
Status `~/work/mth229.github.io/mth229.github.io/Project.toml`
  [7073ff75] IJulia v1.24.2
  [b964fa9f] LaTeXStrings v1.3.1
  [ebaf19f5] MTH229 v0.2.12
  [91a5bcdd] Plots v1.40.4
  [f27b6e38] Polynomials v4.0.7
  [438e738f] PyCall v1.96.4
  [612c44de] QuizQuestions v0.3.21
  [295af30f] Revise v3.5.14
  [f2b01f46] Roots v2.1.5
  [24249f21] SymPy v2.0.1
  [56ddb016] Logging