
A notebook for this material: ipynb

Newton’s method is an old method for *approximating* a zero of a function, \(f(x)\):

\[ f(x) = 0 \]

Previously we discussed the *bisection method* which applied for some continuous function \(f(x)\) which changed signs *between* \(a\) and \(b\), points which bracket a zero. Not only did we need to find these bracketing points – which wasn’t hard from a graph, more importantly the actual algorithm is pretty slow.

If the function \(f\) is sufficiently differentiable, then Newton’s method may work to find a zero. Unlike the bisection method which is slow but guaranteed to find a root by the intermediate value theorem, Newton’s method is fast (once it is close) but has no such guarantee of converging.

Newton’s method is useful to rapidly find a zero when there is a *nearby* initial guess (and a good function); the bisection method is useful when there is a zero *between* two values, which need not be nearby.

In this section, we’ll see how to implement the algorithm, try some examples, and then look at what can go wrong.

The `MTH229` package provides a function `newton` for easily performing Newton's method, utilizing a function from the `Roots` package. More usefully, we will see that `find_zero`, which we used for bisection, can also be used for root finding with an algorithm that is a bit more robust than Newton's method.

To begin, we load `MTH229` (and `Plots`); in the background loading `MTH229` loads `Roots` and its functions.

```
using MTH229
using Plots
plotly()
```

`Plots.PlotlyBackend()`

The idea behind Newton’s method is simple – *linear approximation*.

That is, most functions at any given point are well approximated by the tangent line at that point. If we have a good, *nearby*, guess \(x_n\), then we can improve this guess by replacing it with the easy to find zero, \(x_{n+1}\), of the tangent line at \((x_n, f(x_n))\).

One step of Newton’s method illustrating how each step of the algorithm improves the previous one.

A simple picture shows that we have a triangle with base \(x_{n+1} - x_{n}\), rise \(0 - f(x_n)\) and slope \(f'(x_n)\), using the “rise over run” formula:

\[ f'(x_n) = \frac{-f(x_n)}{x_{n+1} - x_{n}}. \]

The basic algorithm of Newton's method solves this to get:

\[ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}. \]

Some books write the right-hand side as \(x_n - f'(x_n)^{-1} f(x_n)\), a form that generalizes to different settings.

Like the bisection method, Newton’s method is an *iterative method*. One begins with a (suitable, usually nearby) guess \(x_1\). From that the algorithm produces \(x_2\) which is used to produce \(x_3\), etc. The idea is that the algorithm eventually will settle on a value \(x_n\) sufficiently close to the desired root. We then call \(x_n\) an *approximate* zero.

Mathematically, the indices indicate that the right hand side is computed and assigned to the left hand side. This is exactly what is done in assignment within `Julia`, so the above simply becomes:

`x = x - f(x)/f'(x)`

Here `f(x)` is the function and `f'(x)` its derivative (in this case found by automatic differentiation). This line starts with a previously defined value of `x` and updates it accordingly.

The updating is continued – by executing the exact same command – until either the algorithm has gotten close enough to an answer (i.e., it has converged) or we have given up on it converging.

Here is an example to find a zero of the function: \(f(x) = x^3 - 2x - 5\).

A quick graph shows a root near 2:

```
f(x) = x^3 - 2x - 5
plot(f, -3, 3)
```

Here we improve the estimate for the root near 2 using Newton’s method. We will need the first derivative, which we denote `fp`:

```
fp(x) = 3x^2 - 2
x = 2              # starting value, x_1
x = x - f(x)/fp(x) # new value, x_2
x, f(x)
```

`(2.1, 0.06100000000000083)`

We can visualize our progress as follows, noting that `x` holds \(x_2\), and zooming into the domain \([1.9, 2.2]\):

```
x1, x2 = 2, x
plot(f, 1.9, 2.2, legend=false)
plot!(zero)
plot!([x1, x1, x2], [0, f(x1), 0]) # why these values?
scatter!([x1, x2], [0, 0])
```

Continuing, though without visualizing the progress, this next step will compute \(x_3\):

```
x = x - f(x)/fp(x)
x, f(x)
```

`(2.094568121104185, 0.00018572317327247845)`

This next step will compute \(x_4\):

```
x = x - f(x)/fp(x)
x, f(x)
```

`(2.094551481698199, 1.7397612239733462e-9)`

This next step will compute \(x_5\):

```
x = x - f(x)/fp(x)
x, f(x)
```

`(2.0945514815423265, -8.881784197001252e-16)`

This next step will compute \(x_6\):

```
x = x - f(x)/fp(x) # x stopped changing
x, f(x)
```

`(2.0945514815423265, -8.881784197001252e-16)`

We see that \(x_5=x_6\), so the algorithm has stabilized. We also see that \(f(x_5)\) is basically \(0\). (Recall, `eps()`, the machine precision, is the size of the gap between \(1.0\) and the next floating point value, roughly \(10^{-16}\), which is the size of \(f(x_5)\).)
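These claims about `eps()` are easy to check directly (a quick aside, not part of the algorithm):

```julia
# eps() is the gap between 1.0 and the next representable Float64
h = eps()        # roughly 2.2e-16
1.0 + h > 1.0    # the gap is detectable at 1.0...
1.0 + h/2 == 1.0 # ...but half the gap rounds away
```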

You can see in this case that the convergence happens quickly as soon as the algorithm gets close.

The approximate root is \(x_5\). It is important to realize that the actual, *exact*, answer is not likely to be the value computed by Newton’s method, which we call `xstar` at times. In most cases, the true answer will be irrational and `xstar` a floating point number, which ultimately can never be better than an approximation to an irrational number.

The above example iterated until it was clear the algorithm does not improve itself, as the values returned did not change. This need not be the case for every problem.

Rather, we can determine two ways that the number is close enough to the answer:

- the sequence of `x`’s stops changing by much
- the values `f(x)` get close enough to zero

In the above, the first one was used. In either case, rather than look for values to be exactly equal (e.g. \(x_{n+1} = x_{n}\) or \(f(x_n) = 0\)), we look at whether these two things are close enough to be so. That means for some tolerance, we stop when the change in the `x`’s is smaller than the tolerance or when `f(x)` is smaller – in absolute value – than the tolerance.

The above approach – basically repeating steps – can be tedious. There will be a function to do this for you (`newton`). One can use copy and paste to do much of this though:

```
f(x) = x^3 - 2x - 5
x = 2
x = x - f(x)/f'(x)
x = x - f(x)/f'(x)
x = x - f(x)/f'(x)
x = x - f(x)/f'(x)
x = x - f(x)/f'(x)
(x, f(x))
```

`(2.0945514815423265, -8.881784197001252e-16)`

Note

Newton looked at this same example in 1699 (B.T. Polyak, *Newton’s method and its use in optimization*, European Journal of Operational Research. 02/2007; 181(3):1086-1096.) though his technique was slightly different as he did not use the derivative, *per se*, but rather an approximation based on the fact that his function was a polynomial (though identical to the derivative). Raphson (1690) proposed the general form, hence the usual name of Newton-Raphson method.

Using Newton’s method to find \(\sqrt{k}\) (by solving for roots of \(f(x) = x^2 - k\)) is also referred to as the Babylonian method, due to its origins. The resulting method

\[ x_{n+1} = \frac{1}{2}(x_n + \frac{k}{x_n}) \]

is described by the first-century Greek mathematician Hero of Alexandria.
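This update is easy to iterate. Here is a minimal sketch (the function name `babylonian` is our own, not from any package), applied to \(k=2\):

```julia
# repeat the Babylonian (Heron) update n times starting from x
function babylonian(k, x, n)
    for _ in 1:n
        x = (x + k/x) / 2
    end
    x
end

babylonian(2, 1.0, 5)   # very close to sqrt(2)
```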

Let \(k=15\) and \(x_1\) be \(4\). What is the value of \(x_4\)?


The function \(f(x) = \sin(x)\) has derivative \(f'(x) = \cos(x)\). Use Newton’s method to solve \(f(x) = 0\) starting at \(3\). Repeat 5 times. What value do you get for `x`

?

(This can be used to compute \(\pi\) numerically, as the convergence is very fast. Here it takes 4 steps to verify.)

Let \(f(x) = x^2 - 3^x\). This has derivative \(2x - 3^x \cdot \log(3)\). Starting with \(x_1=0\), what does Newton’s method converge on?

For iterative algorithms it is better to repeat the expression until something happens – not a fixed number of times. In this case, we need a criterion to decide if the algorithm has converged. We shall use the following:

- If \(|x_{n+1} - x_n| < tol\), the algorithm has converged
- If \(|f(x_n)| < tol\), the algorithm has converged
- If neither has happened by \(n=100\), the algorithm fails to converge

This isn’t perfect, but will be sufficient. (Well, in fact no stopping rule can be perfect, but this one doesn’t account for the relative size of the \(x_n\)s which can be important.)

The first two steps require a tolerance. We will use `1e-14` for this. This is about 100 times the machine precision, `eps()`, which is sufficient when the answers are moderate in size. This is not very good if the answers are very large.

A basic algorithm is to repeat a step of Newton’s method until the above occurs. We wrap this up in a function for reuse, and employ a `while` loop to repeat the update step until something happens:

```
function nm(f, fp, x)
    xnew, xold = x, Inf
    fn, fo = f(xnew), Inf

    tol = 1e-14
    ctr = 1

    while (ctr < 100) && (abs(xnew - xold) > tol) && (abs(fn - fo) > tol)
        x = xnew - f(xnew)/fp(xnew) # update step
        xnew, xold = x, xnew
        fn, fo = f(xnew), fn
        ctr = ctr + 1
    end

    if ctr == 100
        error("Did not converge in 100 steps")
    else
        xnew, ctr
    end
end
```

`nm (generic function with 1 method)`

Here we use the `nm` function to find a zero of this polynomial:

```
f(x) = x^3 - 5x + 1
fp(x) = 3x^2 - 5
xstar, ctr = nm(f, fp, 0) # takes 6 steps
```

`(0.20163967572340466, 6)`

However, the `MTH229` package provides the `newton` function, so we shall use that in the sequel. To see the number of steps, the argument `verbose=true` may be given.

We revisit a problem from a previous project, finding zeroes of the function \(f(x) = \exp(x) - x^4\). We know from previous work that there are three of them. Let’s find one near \(x=2\):

```
f(x) = exp(x) - x^4
x = 2
xstar = newton(f, 2) # newton will use automatic differentiation for the derivative
```

`1.4296118247255556`

It took 8 steps and we are this close:

`xstar, f(xstar)`

`(1.4296118247255556, 0.0)`

In this case, the answer is exact up to floating point round off.

Repeat the problem of finding a root of \(f(x) = \exp(x) - x^4\) starting at \(x=2\) (`newton(f, 2, verbose=true)`). How many iterations does it take with the default tolerances?

If we repeat with \(f(x) = \exp(x) - x^4\) but start now at \(x=8\) where does the algorithm converge?

Let \(f(x) = \sin(x) - \cos(4\cdot x)\).

Starting at \(\pi/8\), solve for the root returned by Newton’s method

In order to use Newton’s method we need to evaluate \(f'(x)\). We have used automatic differentiation above through `f'(x)`. Automatic differentiation returns a numerically accurate value for the derivative.

However, Newton’s method is actually fairly robust to using other values related to the derivative. That is, the method will still converge, though perhaps not as fast as with the actual derivative.

The secant method is perhaps the oldest numerical root-finding method, dating back over 3000 years, well before Newton’s method. Rather than use the derivative at \(x_i\) to compute \(x_{i+1}\), the slope of the secant line between \((x_{i-1}, f(x_{i-1}))\) and \((x_i, f(x_i))\) is used. This method will also converge to a zero given a good starting point, though not nearly as quickly as Newton’s method.

You can check – if you want – by repeating the last command until the change in `x2` is within your tolerance:

```
x2, x1 = 1, 2 # initial guesses
f(x) = x^2 - 2 # some function
fp(x1, x2) = (f(x1) - f(x2))/(x1 - x2)
x2, x1 = x2 - f(x2)/fp(x1, x2), x2 # update step
```

`(1.3333333333333333, 1)`

We can repeat via copy and paste:

```
x2, x1 = x2 - f(x2)/fp(x1, x2), x2
x2, x1 = x2 - f(x2)/fp(x1, x2), x2
x2, x1 = x2 - f(x2)/fp(x1, x2), x2
x2, x1 = x2 - f(x2)/fp(x1, x2), x2
x2, x1 = x2 - f(x2)/fp(x1, x2), x2

x2, f(x2)
```

`(1.4142135623730947, -8.881784197001252e-16)`

The last line shows the algorithm has basically converged, as `x2` and `x1` agree to about \(10^{-14}\) and \(f(x_2)\) is essentially \(0\). We have an approximation to \(\sqrt{2}\).
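The copy-and-paste repetition can also be rolled into a loop. Here is a hand-written sketch (the name `secant` is our own, not the `Roots` version):

```julia
function secant(f, x1, x2; tol=1e-14, maxsteps=100)
    for _ in 1:maxsteps
        m = (f(x2) - f(x1)) / (x2 - x1)  # slope of the secant line
        x1, x2 = x2, x2 - f(x2)/m        # update step
        abs(x2 - x1) < tol && break      # stop once x stops changing
    end
    x2
end

secant(x -> x^2 - 2, 2, 1)   # approximates sqrt(2)
```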

Recall the forward difference approximation to the derivative:

\[ f'(x) \approx \frac{f(x + h) - f(x)}{h} \]

for some small \(h\) (with \(h=10^{-8}\) a reasonable choice for many functions). This approximation can be used in place of \(f'(x)\) in Newton's formula. Again, this won't be as fast as using the actual derivative.

The update step \(x - f(x)/f'(x)\) becomes

\[ x - \frac{h \cdot f(x)}{f(x+h) - f(x)}. \]
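As a sketch (with \(h\), the loop count, and the name `approx_newton` being our own choices, not from the packages), this update might be coded as:

```julia
# Newton-like update using a forward-difference approximation to f'
function approx_newton(f, x; h=1e-8, n=10)
    for _ in 1:n
        x = x - h * f(x) / (f(x + h) - f(x))
    end
    x
end

approx_newton(x -> x^3 - 2x - 5, 2.0)   # near the zero found earlier by Newton's method
```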

The issue with this approximation is that when the estimated value gets close to the actual root, the fixed value of \(h\) becomes relatively too large. Steffensen’s method replaces \(h\) with \(f(x)\), which for values close to a root gets quite small. This improves the convergence rate to be on par with Newton’s method. In this case, the update step looks like

\[ x - \frac{f(x)^2}{f(x+ f(x)) - f(x)}. \]
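A sketch of this update (again, the name `steffensen`, the loop count, and the small-value guard are our own choices):

```julia
function steffensen(f, x; n=20, tol=1e-12)
    for _ in 1:n
        fx = f(x)
        abs(fx) < tol && break            # already close; avoid dividing 0 by 0
        x = x - fx^2 / (f(x + fx) - fx)   # Steffensen update
    end
    x
end

steffensen(x -> x^3 - 2x - 5, 2.0)   # the same zero Newton's method found
```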

Use the secant method to find a root to \(f(x) = \cos(x) - x^3\) starting with \(x_1=1/2\) and \(x_2=1\).

Use the secant method to find a root to \(f(x) = x^5 + x - 1\) starting with \(x_1=1/2\) and \(x_2=1\).

There are also very fast algorithms which do not require a derivative. The `Roots` package provides an interface to these through the `find_zero` function.

The `find_zero` function has two interfaces:

- when called with a bracketing interval, as in `find_zero(f, (a,b))`, it will use a bisection method to find a root.
- when called with a starting point, as in `find_zero(f, a)`, it will use an iterative algorithm to search for a root.

Many bracketing methods (like bisection) are guaranteed to converge, but can be slow. The iterative algorithm used by default with `find_zero` tries to speed the convergence up, but if along the way it finds a bracketing interval, that will guarantee convergence.

We focus on the simplest usage of `find_zero`, where an initial guess is supplied and the default order is used. Here is an example to find \(-\sqrt{2}\):

```
f(x) = x^2 - 2
find_zero(f, -1)
```

`-1.4142135623730951`

Let’s look, again, at the task of finding all zeros of the function \(e^x - x^4\), using `find_zero` and a graph. We follow a standard approach:

- graph the function to roughly identify potential zeros
- use these as starting points for `find_zero` to improve

The following graph suggests, perhaps, there may be \(3\) zeros, one near \(9\), one near \(2\) and one near \(-1\).

```
f(x) = exp(x) - x^4
plot(f, -5, 9)
```

We can improve these guesses with

`find_zero(f, 9), find_zero(f, 2), find_zero(f, -1)`

`(8.6131694564414, 1.4296118247255556, -0.8155534188089607)`

Avoiding repetition

The above can be written without repeating `find_zero` by using a comprehension:

`[find_zero(f, x) for x in [9, 2, -1]]`

```
3-element Vector{Float64}:
8.6131694564414
1.4296118247255556
-0.8155534188089607
```

Or even more compactly, using the broadcast notation:

`find_zero.(f, [-1, 2, 9])`

```
3-element Vector{Float64}:
-0.8155534188089607
1.4296118247255556
8.6131694564414
```

As another illustration, let \(f(x) = \cos^2(x^2)\) on \([-1,2]\). Find all the zeros of the derivative of \(f(x)\).

We graph the derivative to identify starting points:

```
f(x) = cos(x^2)^2
plot(f', -1, 2)
```

We see there are 3 potential zeros: one near \(0\), one near \(1.2\), and one near \(1.7\). Here we improve our guesses:

`xs = find_zero.(f', [0, 1.2, 1.7]) # or [find_zero(f', x) for x in [0, 1.2, 1.7]]`

```
3-element Vector{Float64}:
0.0
1.2533141373155003
1.772453850905516
```

The function values at these points can be found with

`f.(xs) # or map(f, xs) or [f(x) for x in xs]`

```
3-element Vector{Float64}:
1.0
2.5860584564030066e-32
1.0
```

Solving \(f(x) = g(x)\) requires representing the difference of the two sides. For example, find the zero of \(\cos(x) = 7x\) in \((0, \pi/2)\). A graph will show a zero left of \(0.2\), which we take as an initial value:

```
f(x) = cos(x)
g(x) = 7x
eqn(x) = f(x) - g(x)
find_zero(eqn, 0.2)
```

`0.14143076140757285`

Let

\[ f(x) = 4x^4 -5x^3 + 4x^2 -20x - 6 \]

Apply Newton’s method with \(x_1=0\) using an automatic derivative. What value does it converge to?

Let’s try with a function where the derivative is not known easily. If we set

\[ f(x) = x^x - 2 \]

Can we find a root using Newton’s method, where \(x > 0\)?

We graph the function to see, using a smallish interval at first:

```
f(x) = x^x - 2
plot(f, 0, 2)
plot!(zero)
```

Eyeing this, we pick an initial point, \(1\), for Newton’s method (`newton(f, 1)`) to the right of the minimum, which appears to be around \(x=0.35\).

What is the value of the approximate zero?

Use `find_zero` to find the one root of `x^5 - x - 1`. First plot to get an estimate.

Let \(f(x) = 5/\sin(x) + 8/\cos(x)\). Starting at \(x=\pi/4\), use `find_zero` to find a root of the derivative of \(f(x)\), given by `f'`.

The tangent line of `f` at `c` can be computed using the point-slope form, \(y = f(c) + f'(c) \cdot (x - c)\). However, we employ the `tangent(f,c)` function of the `MTH229` package, which returns a function that represents the tangent line.
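For intuition, a hand-rolled version, `tangentline` (our own name), taking an explicit derivative, unlike the package's `tangent(f, c)` which differentiates automatically, could look like:

```julia
# return the function whose graph is the tangent line to f at c
tangentline(f, fp, c) = x -> f(c) + fp(c) * (x - c)

g(x) = x^2
gp(x) = 2x
t = tangentline(g, gp, 1)  # tangent to x^2 at x=1: 1 + 2(x-1)
t(1) == g(1), t(3)         # touches the graph at c; t(3) == 5
```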

Let \(f(x) = x^2 - 3x + 5\). Use `find_zero` to find the intersection point of the tangent line at \(1\) and the tangent line at \(3\). Where does this happen?

(Hint: let `t1` and `t3` be defined using `tangent(f,c)` for the appropriate `c`, and then apply `find_zero` to `eqn(x) = t1(x) - t3(x)` starting at \(1\).)

As great as Newton’s method is, it won’t always work for various reasons, some of which are described in the following. Here is what you need to keep in mind. Newton’s method works well if

- The zero is a simple root – that is of multiplicity 1
- The magnitude, \(|f'(x)|\), is not too small (If the tangent line is nearly flat, the next guess is far from the previous)
- The magnitude, \(|f''(x)|\), is not too big (function doesn’t have so much curve that the tangent line is a poor approximation)
- The initial guess is not too far from a zero

The above points come from the following formula which you can find in many texts.

\[ \Delta x_{i+1} = \frac{f''(\alpha)}{2f'(\alpha)}(\Delta x_i)^2 + \text{error} \]

which is valid when \(f(x)\) satisfies \(f(\alpha) = 0\), the third derivative exists near \(\alpha\), and \(\Delta x_i = x_i - \alpha\) is the error between the zero and the \(i\) th estimate. When the derivative at \(\alpha\) is non-zero, the error is basically a constant times \(\Delta x_i^2\). This is interpreted as saying there is quadratic convergence in the error, as the next one is related to the previous one squared.
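This quadratic convergence is visible numerically: the error at each step is roughly a constant times the square of the previous error, so the number of correct digits roughly doubles. A quick illustration (a sketch of our own, not from the packages) with \(f(x) = x^2 - 2\), whose positive zero is \(\sqrt{2}\):

```julia
# track the error |x_n - alpha| over a few Newton steps
function newton_errors(f, fp, x, alpha, n)
    errs = Float64[]
    for _ in 1:n
        x = x - f(x)/fp(x)          # Newton update
        push!(errs, abs(x - alpha)) # error after this step
    end
    errs
end

newton_errors(x -> x^2 - 2, x -> 2x, 1.5, sqrt(2), 4)
```

Each entry is smaller than the square of the one before it, as the formula predicts.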

Now we look at some cases where the above points do not hold.

Let \(f(x) = \sin(x) - x/4\) and \(x_1 = 2\pi\). This value is deliberately a poor choice:

```
f(x) = sin(x) - x/4
fp(x) = cos(x) - 1/4
newton(f, fp, 2pi, verbose=true)
```

```
Results of univariate zero finding:
* Converged to: -2.4745767873698292
* Algorithm: Roots.Newton()
* iterations: 22
* function evaluations ≈ 44
* stopped as |f(x_n)| ≤ max(δ, |x|⋅ϵ) using δ = atol, ϵ = rtol
Trace:
x₁ = 6.2831853071795862, fx₁ = -1.5707963267948968
x₂ = 8.3775804095727828, fx₂ = -1.2283696986087573
x₃ = 6.7397541447611076, fx₃ = -1.2440675381169748
x₄ = 8.6608848638032576, fx₄ = -1.4734831418043957
x₅ = 7.145187249669843, fx₅ = -1.0271496267351363
x₆ = 9.7071751202999792, fx₆ = -2.7054524356150202
x₇ = 7.471984835940221, fx₇ = -0.94007407756685935
x₈ = 15.128927112593999, fx₈ = -3.2350143158338276
x₉ = 12.152806896784888, fx₉ = -3.4400768736735046
x₁₀ = 17.320458159488304, fx₁₀ = -5.3292452806856589
x₁₁ = -8.262352431213138, fx₁₁ = 1.1478190870135017
x₁₂ = -6.4886031684043539, fx₁₂ = 1.4181745383583302
x₁₃ = -8.4340373897987018, fx₁₃ = 1.2720772543609207
x₁₄ = -6.8400965825145015, fx₁₄ = 1.1814574151203185
x₁₅ = -8.8128360727706223, fx₁₅ = 1.6287509745643769
x₁₆ = -7.2885505274866542, fx₁₆ = 0.97777993115770767
x₁₇ = -10.709994864744841, fx₁₇ = 3.6369972987156078
x₁₈ = -3.8698495348738184, fx₁₈ = 1.6330320788034982
x₁₉ = -2.230811533392838, fx₁₉ = -0.23228002463635777
x₂₀ = -2.499925490828641, fx₂₀ = 0.026449537717568972
x₂₁ = -2.4747617934295065, fx₂₁ = 0.00019161604875239657
x₂₂ = -2.474576797589688, fx₂₂ = 1.058441445600522e-08
x₂₃ = -2.4745767873698292, fx₂₃ = 2.2204460492503131e-16
```

`-2.4745767873698292`

Though `Julia` makes this happen fast, it takes more than 20 steps before converging, and the answer is nowhere near the initial guess. The trace above might show why.

Newton, way off

When \(|f'(x)|\) is too close to \(0\), the path can jump a lot. In the figure, what was the longest jump?

The method did find a zero, but the initial guess was nowhere near the final zero. How close was the closest zero to the initial guess?

Let \(f(x) = x^{1/3}\). We know the root is \(0\). Let’s see what happens if we use Newton’s method. We have to be careful though, as `Julia` thinks that cube roots of negative numbers (via `x^(1/3)`) are `NaN`, not a number. (You can avoid this by making your number complex, e.g. `x + 0*im`, but then the real answer is not given as an answer; it is just one of three and not the chosen one.)

So we define our function using `Julia`’s `cbrt` function, which works as we desire for negative numbers, as follows:

```
f(x) = cbrt(x)
xstar = newton(f, 2)
```

`LoadError: Roots.ConvergenceFailed("Algorithm failed to converge")`

Still an issue. Why?

Despite all our care with the derivative, the method did not converge in \(200\) steps. Can you see why?


For \(f(x) = x^{1/3}\), simplify the expression by hand:

`x - f(x)/f'(x)`

What do you get?

Apply Newton’s method to \(f(x) = (x-2)^7 \cdot (x-3) \cdot (x-4)\) starting at 1.9981. The algorithm does not converge to 2 – an obvious root. From a plot of \(f(x)\) explain why not:

The function `f(x) = atan(x)` is a strictly increasing function with one zero, \(0\). Yet it can pose problems with Newton’s method. For which values of \(x\) does Newton’s method converge?

Sometimes, the process can cycle even for reasonable functions.

Let \(f(x) = x^3 - 5x\). Starting with \(x_1=1\), compute three steps of Newton’s method. What are the terms in the series produced?

Here is a pathological example where the value always cycles no matter where you start unless you start at \(0\).

Let \(f(x) = \sqrt{|x|}\). This is the one-sided square root function turned into an even function. We could also have defined it by:

`f(x) = x >= 0 ? sqrt(x) : sqrt(-x)`

`f (generic function with 1 method)`

where the ternary operator `a ? b : c` looks at `a` and, if true, will execute `b`, otherwise `c`.
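For example (a throwaway illustration of the operator; `parity` is our own name):

```julia
parity(n) = iseven(n) ? "even" : "odd"   # pick one of two values based on a condition
parity(3), parity(4)                     # ("odd", "even")
```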

This makes it easier to write the derivative of the function in `Julia`:

`fp(x) = x >=0 ? (1/2)*sqrt(x)/x : -(1/2)*sqrt(-x)/(-x)`

`fp (generic function with 2 methods)`

To see what happens when using Newton’s method, let’s start at \(x=2\):

```
x = 2

x = x - f(x)/fp(x)
x = x - f(x)/fp(x)
x = x - f(x)/fp(x)
x
```

`-2.0`

Try again with \(x=3.0\). What sequence do you get?

Environment of `Julia` when generated

`Julia` version:

`VERSION`

`v"1.10.4"`

Packages and versions:

```
using Pkg
Pkg.status()
```

```
Status `~/work/mth229.github.io/mth229.github.io/Project.toml`
[a2e0e22d] CalculusWithJulia v0.2.6
[7073ff75] IJulia v1.25.0
[b964fa9f] LaTeXStrings v1.3.1
[ebaf19f5] MTH229 v0.3.2
[91a5bcdd] Plots v1.40.4
[f27b6e38] Polynomials v4.0.11
[438e738f] PyCall v1.96.4
[612c44de] QuizQuestions v0.3.23
[295af30f] Revise v3.5.14
[f2b01f46] Roots v2.1.5
[24249f21] SymPy v2.1.1
[56ddb016] Logging
```