A notebook for this material: ipynb
Newton’s method is an old method for approximating a zero of a function, \(f(x)\):
\[ f(x) = 0 \]
Previously we discussed the bisection method, which applies to a continuous function \(f(x)\) that changes sign between \(a\) and \(b\), points which bracket a zero. Not only did we need to find these bracketing points – which wasn’t hard from a graph – but, more importantly, the actual algorithm is pretty slow.
If the function \(f\) is sufficiently differentiable, then Newton’s method may work to find a zero. Unlike the bisection method which is slow but guaranteed to find a root by the intermediate value theorem, Newton’s method is fast (once it is close) but has no such guarantee of converging.
Newton’s method is useful to rapidly find a zero when there is a nearby initial guess (and a good function); the bisection method is useful when there is a zero between two values, which need not be nearby.
In this section, we’ll see how to implement the algorithm, try some examples, and then look at what can go wrong.
The MTH229
package provides a function newton
for easily performing Newton’s method, utilizing a function from the Roots
package. More usefully, we will see that find_zero
, which we used for bisection, can also be used for root finding with an algorithm that is a bit more robust than Newton’s method.
To begin, we load MTH229
(and Plots
); in the background loading MTH229
loads Roots
and its functions.
using MTH229
using Plots
plotly()
Plots.PlotlyBackend()
The idea behind Newton’s method is simple – linear approximation.
That is, most functions at any given point are well approximated by the tangent line at that point. If we have a good, nearby, guess \(x_n\), then we can improve this guess by replacing it with the easy to find zero, \(x_{n+1}\), of the tangent line at \((x_n, f(x_n))\).
Figure: One step of Newton’s method, illustrating how each step of the algorithm improves the previous one.
A simple picture shows that we have a triangle with base \(x_{n+1} - x_{n}\), rise \(0 - f(x_n)\) and slope \(f'(x_n)\), using the “rise over run” formula:
\[ f'(x_n) = \frac{-f(x_n)}{x_{n+1} - x_{n}}. \]
The basic algorithm of Newton’s method solves this to get:
\[ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}. \]
Some books write the right-hand side as \(x_n - f'(x_n)^{-1} f(x_n)\), a form that generalizes to different settings.
Like the bisection method, Newton’s method is an iterative method. One begins with a (suitable, usually nearby) guess \(x_1\). From that the algorithm produces \(x_2\) which is used to produce \(x_3\), etc. The idea is that the algorithm eventually will settle on a value \(x_n\) sufficiently close to the desired root. We then call \(x_n\) an approximate zero.
Mathematically, the indices indicate that the right hand side is computed and assigned to the left hand side. This is exactly what is done in assignment within Julia
, so the above simply becomes:
x = x - f(x)/f'(x)
Here f(x) is the function and f'(x) its derivative (in this case found by automatic differentiation). This line starts with a previously defined value of x and updates it accordingly.
The updating is continued – by executing the exact same command – until either the algorithm has gotten close enough to an answer (i.e., it has converged) or we have given up on it converging.
Here is an example to find a zero of the function: \(f(x) = x^3 - 2x - 5\).
A quick graph shows a root near 2:
f(x) = x^3 - 2x - 5
plot(f, -3, 3)
Here we improve the estimate for the root near 2 using Newton’s method. We will need the first derivative, which we denote fp
fp(x) = 3x^2 - 2
x = 2              # starting value, x_1

x = x - f(x)/fp(x) # new value, x_2
x, f(x)
(2.1, 0.06100000000000083)
We can visualize our progress as follows, noting that x
holds \(x_2\), and zooming into the domain \([1.9, 2.2]\):
x1, x2 = 2, x

plot(f, 1.9, 2.2, legend=false)
plot!(zero)
plot!([x1, x1, x2], [0, f(x1), 0]) # why these values?
scatter!([x1, x2], [0, 0])
Continuing, though without visualizing the progress, this next step will compute \(x_3\):
x = x - f(x)/fp(x)
x, f(x)
(2.094568121104185, 0.00018572317327247845)
This next step will compute \(x_4\):
x = x - f(x)/fp(x)
x, f(x)
(2.094551481698199, 1.7397612239733462e-9)
This next step will compute \(x_5\):
x = x - f(x)/fp(x)
x, f(x)
(2.0945514815423265, -8.881784197001252e-16)
This next step will compute \(x_6\):
x = x - f(x)/fp(x) # x stopped changing
x, f(x)
(2.0945514815423265, -8.881784197001252e-16)
We see that \(x_5=x_6\), so the algorithm has stabilized. We also see that \(f(x_5)\) is basically \(0\). (Recall, eps(), the machine precision – the size of the difference between adjacent floating point values at \(1.0\) – is basically \(10^{-16}\), the size of \(f(x_5)\).)
You can see in this case that the convergence happens quickly as soon as the algorithm gets close.
The approximate root is \(x_5\). It is important to realize that the actual, exact, answer is not likely to be the value computed by Newton’s method, which we call xstar
at times. In most cases, the true answer will be irrational and xstar
a floating point number, which ultimately can never be better than an approximation to an irrational number.
The above example iterated until it was clear the algorithm does not improve itself, as the values returned did not change. This need not be the case for every problem.
Rather, there are two ways we can determine that the number is close enough to the answer:

- the x’s stop changing by much
- f(x) gets close enough to zero.

In the above, the first one was used. In either case, rather than look for values to be equal (e.g. \(x_{n+1} = x_{n}\) or \(f(x_n) = 0\)), we look at whether these two things are close enough to be so. That means for some tolerance, we stop when the change in the x’s is smaller than the tolerance or f(x) is smaller – in absolute value – than the tolerance.
The above approach – basically repeating steps – can be tedious. There will be a function to do this for you (newton
). One can use copy and paste to do much of this though:
f(x) = x^3 - 2x - 5
x = 2

x = x - f(x)/f'(x)
x = x - f(x)/f'(x)
x = x - f(x)/f'(x)
x = x - f(x)/f'(x)
x = x - f(x)/f'(x)

(x, f(x))
(2.0945514815423265, -8.881784197001252e-16)
Newton looked at this same example in 1699 (B.T. Polyak, Newton’s method and its use in optimization, European Journal of Operational Research. 02/2007; 181(3):1086-1096.) though his technique was slightly different as he did not use the derivative, per se, but rather an approximation based on the fact that his function was a polynomial (though identical to the derivative). Raphson (1690) proposed the general form, hence the usual name of Newton-Raphson method.
Using Newton’s method to find \(\sqrt{k}\) (by solving for roots of \(f(x) = x^2 - k\)) is also referred to as the Babylonian method, due to its origins. The resulting method
\[ x_{n+1} = \frac{1}{2}(x_n + \frac{k}{x_n}) \]
is described by the first-century Greek mathematician Hero of Alexandria.
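To make the iteration concrete, here is a minimal sketch in the same copy-and-paste style used above; the value \(k=2\) and the starting guess are arbitrary illustrative choices (the question below uses different values):
k = 2
x = 1.0                  # an arbitrary positive starting guess
x = (1/2) * (x + k/x)    # one Babylonian step
x = (1/2) * (x + k/x)
x = (1/2) * (x + k/x)
x, sqrt(k)               # compare with the built-in square root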
Let \(k=15\) and \(x_1\) be \(4\). What is the value of \(x_4\)?
The function \(f(x) = \sin(x)\) has derivative \(f'(x) = \cos(x)\). Use Newton’s method to solve \(f(x) = 0\) starting at \(3\). Repeat 5 times. What value do you get for x
?
(This can be used to compute \(\pi\) numerically, as the convergence is very fast. Here it takes 4 steps to verify.)
Let \(f(x) = x^2 - 3^x\). This has derivative \(2x - 3^x \cdot \log(3)\). Starting with \(x_1=0\), what does Newton’s method converge on?
For iterative algorithms it is better to repeat the expression until something happens – not a fixed number of times. In this case, we need a criterion to decide if the algorithm has converged. We shall use the following:

- the x values stop changing by much,
- the f(x) values stop changing by much, or
- we give up after too many steps (say 100).
This isn’t perfect, but will be sufficient. (Well, in fact no stopping rule can be perfect, but this one doesn’t account for the relative size of the \(x_n\)s which can be important.)
The first two criteria require a tolerance. We will use 1e-14
for this. This is about 100 times the machine precision, eps()
, which is sufficient when the answers are moderate in size. This is not very good if the answers are very large.
A basic algorithm is to repeat a step of Newton’s method until the above occurs. We wrap this up in a function for reuse, and employ a while
loop to repeat the update step until something happens:
function nm(f, fp, x)
    xnew, xold = x, Inf
    fn, fo = f(xnew), Inf
    tol = 1e-14
    ctr = 1

    while (ctr < 100) && (abs(xnew - xold) > tol) && ( abs(fn - fo) > tol )
        x = xnew - f(xnew)/fp(xnew) # update step
        xnew, xold = x, xnew
        fn, fo = f(xnew), fn
        ctr = ctr + 1
    end

    if ctr == 100
        error("Did not converge in 100 steps")
    else
        xnew, ctr
    end
end
nm (generic function with 1 method)
Here we use the nm
function to find a zero of this polynomial:
f(x) = x^3 - 5x + 1
fp(x) = 3x^2 - 5
xstar, ctr = nm(f, fp, 0) # takes 6 steps
(0.20163967572340466, 6)
However, the MTH229
package provides the newton
function. So we shall use that in the sequel. To see the number of steps, the argument verbose=true
may be given.
We revisit a problem from a previous project, finding zeroes of the function \(f(x) = \exp(x) - x^4\). We know from previous work that there are three of them. Let’s find one near \(x=2\):
f(x) = exp(x) - x^4
x = 2
xstar = newton(f, 2) # newton will use automatic differentiation for the derivative
1.4296118247255556
It took 8 steps and we are this close:
xstar, f(xstar)
(1.4296118247255556, 0.0)
In this case, the answer is exact up to floating point round off.
Repeat the problem of finding a root of \(f(x) = \exp(x) - x^4\) starting at \(x=2\). (newton(f, 2, verbose=true)
). How many iterations does it take with the default tolerances?
If we repeat with \(f(x) = \exp(x) - x^4\) but start now at \(x=8\) where does the algorithm converge?
Let \(f(x) = \sin(x) - \cos(4\cdot x)\).
Starting at \(\pi/8\), solve for the root returned by Newton’s method.
In order to use Newton’s method we need to evaluate \(f'(x)\). We have used automatic differentiation above through f'(x)
. Automatic differentiation returns a numerically accurate value for the derivative.
However, Newton’s method is actually fairly robust to using other values related to the derivative. That is, the method will converge, though perhaps not as fast as with the actual derivative.
The secant method is perhaps the oldest numerical root-finding method, dating back over 3000 years, well before Newton’s method. Rather than use the derivative at \(x_i\) to compute \(x_{i+1}\), the secant line is used between \(x_{i-1}\) and \(x_i\). This method will also converge to a zero with a good starting point, though not nearly as quickly as Newton’s method.
You can check – if you want – by repeating the last command until the change in x2
is within your tolerance:
x2, x1 = 1, 2    # initial guess of 2
f(x) = x^2 - 2   # some function
fp(x1, x2) = (f(x1) - f(x2))/(x1 - x2)
x2, x1 = x2 - f(x2)/fp(x1, x2), x2   # update step
(1.3333333333333333, 1)
We can repeat via copy and paste:
x2, x1 = x2 - f(x2)/fp(x1, x2), x2
x2, x1 = x2 - f(x2)/fp(x1, x2), x2
x2, x1 = x2 - f(x2)/fp(x1, x2), x2
x2, x1 = x2 - f(x2)/fp(x1, x2), x2
x2, x1 = x2 - f(x2)/fp(x1, x2), x2

x2, f(x2)
(1.4142135623730947, -8.881784197001252e-16)
The last line shows the algorithm has basically converged, as the values agree to within \(10^{-14}\).
Recall the forward difference approximation to the derivative:
\[ f'(x) \approx \frac{f(x + h) - f(x)}{h} \]
for some small \(h\) (with \(h=10^{-8}\) a reasonable choice for many functions). This approximation can be used in place of \(f'(x)\) in the formula. Again, this won’t be as fast.
The update step \(x - f(x)/f'(x)\) becomes
\[ x - \frac{h \cdot f(x)}{f(x+h) - f(x)}. \]
The issue with this approximation is when the estimated value gets close to the actual root, the value of \(h\) becomes too large. Steffenson’s method replaces \(h\) with \(f(x)\), which for values close to a root gets quite small. This improves the convergence rate to be on par with Newton’s method. In this case, the update step looks like
\[ x - \frac{f(x)^2}{f(x+ f(x)) - f(x)}. \]
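To make these update steps concrete, here is a minimal sketch in the same copy-and-paste style; the function and starting value reuse the earlier example \(f(x) = x^3 - 2x - 5\) with a guess of \(2\), and \(h = 10^{-8}\) is just the suggested choice from above:
f(x) = x^3 - 2x - 5
h = 1e-8

x = 2.0                                  # forward-difference version of the update
x = x - h * f(x) / (f(x + h) - f(x))
x = x - h * f(x) / (f(x + h) - f(x))

x = 2.0                                  # Steffensen's version of the update
x = x - f(x)^2 / (f(x + f(x)) - f(x))
x = x - f(x)^2 / (f(x + f(x)) - f(x))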
Use the secant method to find a root to \(f(x) = \cos(x) - x^3\) starting with \(x_1=1/2\) and \(x_2=1\).
Use the secant method to find a root to \(f(x) = x^5 + x - 1\) starting with \(x_1=1/2\) and \(x_2=1\).
The find_zero function

There are also very fast algorithms which do not require a derivative. The Roots package provides an interface to these through the find_zero function.
The find_zero function has two interfaces:

- called as find_zero(f, (a,b)), it will use a bisection method to find a root.
- called as find_zero(f, a), it will use an iterative algorithm to search for a root.

Many bracketing methods (like bisection) are guaranteed to converge, but can be slow. The iterative algorithm used by default with find_zero tries to speed the convergence up, but if along the way it finds a bracketing interval, that will guarantee convergence.
We focus on the simplest usage of find_zero
where an initial guess is supplied and the default order is used. Here is an example to find \(-\sqrt{2}\):
f(x) = x^2 - 2
find_zero(f, -1)
-1.4142135623730951
find_zero and a graph to find all roots

Let’s look, again, at the task of finding all zeros to the function \(e^x - x^4\). We follow a standard approach:

- graph the function to identify how many zeros there are and roughly where they lie
- use find_zero to improve these graphical estimates

The following graph suggests, perhaps, there may be \(3\) zeros, one near \(9\), one near \(2\) and one near \(-1\).
f(x) = exp(x) - x^4
plot(f, -5, 9)
We can improve these guesses with
find_zero(f, 9), find_zero(f, 2), find_zero(f, -1)
(8.6131694564414, 1.4296118247255556, -0.8155534188089607)
The above can be written without repeating find_zero
by using a comprehension:
[find_zero(f, x) for x in [9, 2, -1]]
3-element Vector{Float64}:
8.6131694564414
1.4296118247255556
-0.8155534188089607
Or even more compactly, using the broadcast notation:
find_zero.(f, [-1, 2, 9])
3-element Vector{Float64}:
-0.8155534188089607
1.4296118247255556
8.6131694564414
As another illustration, let \(f(x) = \cos^2(x^2)\) on \([-1,2]\). Find all the zeros of the derivative of \(f(x)\).
We graph the derivative to identify starting points:
f(x) = cos(x^2)^2
plot(f', -1, 2)
We see there are 3 potential zeros: one near 0, one near 1.2, and one close to 1.7. Here we improve our guesses:
xs = find_zero.(f', [0, 1.2, 1.7]) # or [find_zero(f', x) for x in [0, 1.2, 1.7]]
3-element Vector{Float64}:
0.0
1.2533141373155003
1.772453850905516
The function values at these points can be found with
f.(xs) # or map(f, xs) or [f(x) for x in xs]
3-element Vector{Float64}:
1.0
2.5860584564030066e-32
1.0
Solving \(f(x) = g(x)\) requires representing the difference of the two sides. For example, find the solution of \(\cos(x) = 7x\) in \((0, \pi/2)\). A graph will show a solution left of \(0.2\), which we take as an initial value:
f(x) = cos(x)
g(x) = 7x
eqn(x) = f(x) - g(x)
find_zero(eqn, 0.2)
0.14143076140757285
Let
\[ f(x) = 4x^4 -5x^3 + 4x^2 -20x - 6 \]
Apply Newton’s method with \(x_1=0\) using an automatic derivative. What value does it converge to?
Let’s try with a function where the derivative is not known easily. If we set
\[ f(x) = x^x - 2 \]
Can we find a root using Newton’s method, where \(x > 0\)?
We graph the function to see, using a smallish interval at first:
f(x) = x^x - 2
plot(f, 0, 2)
plot!(zero)
Eyeing this, we pick an initial point, \(1\), for Newton’s method (newton(f, 1)
) to the right of the minimum, which appears to be around \(x=0.35\).
What is the value of the approximate zero?
Use find_zero
to find the one root of x^5 - x - 1
. First plot to get an estimate.
Let \(f(x) = 5/\sin(x) + 8/\cos(x)\), Starting at \(x=\pi/4\), use find_zero
to find a root of the derivative of \(f(x)\) given by f'
.
The tangent line of f
at c
can be computed using the point-slope form: \(tl(x) = f(c) + f'(c) \cdot (x - c)\). However, we employ the tangent(f,c)
function of the MTH229
package, which returns a function that represents the tangent line.
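For instance, here is a minimal check of how tangent works, using an arbitrary function (not the one in the exercise) and assuming only the behavior just described:
g(x) = x^2            # an illustrative function, not the exercise's f
tl = tangent(g, 1)    # the tangent line to g at c = 1, i.e. tl(x) = 1 + 2(x - 1)
tl(0), tl(2)          # expect (-1.0, 3.0)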
Let \(f(x) = x^2 - 3x + 5\). Use find_zero
to find the intersection point of the tangent line at \(1\) and the tangent line at \(3\). Where does this happen?
(Hint, let t1
and t3
be defined using tangent(f,c)
for the appropriate c
and then apply find_zero
to eqn(x) = t1(x) - t3(x)
starting at \(1\).)
As great as Newton’s method is, it won’t always work for various reasons, some of which are described in the following. Here is what you need to keep in mind. Newton’s method works well if:

- the initial guess is not too far from the zero,
- the derivative \(f'(x)\) is not too close to \(0\) along the way (the tangent lines are not too flat), and
- the function is smooth enough near the zero (roughly, the second derivative exists and is not too large).
The above points come from the following formula which you can find in many texts.
\[ \Delta x_{i+1} = \frac{f''(\alpha)}{2f'(\alpha)}(\Delta x_i)^2 + \text{error} \]
which is valid when \(f(x)\) satisfies \(f(\alpha) = 0\), the third derivative exists near \(\alpha\), and \(\Delta x_i = x_i - \alpha\) is the error between the zero and the \(i\) th estimate. When the derivative at \(\alpha\) is non-zero, the error is basically a constant times \(\Delta x_i^2\). This is interpreted as saying there is quadratic convergence in the error, as the next one is related to the previous one squared.
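A minimal numeric illustration of this quadratic convergence, using \(f(x) = x^2 - 2\) (so \(\alpha = \sqrt{2}\)) and a starting value chosen just for illustration:
f(x) = x^2 - 2
fp(x) = 2x
alpha = sqrt(2)

x = 2.0
x = x - f(x)/fp(x); abs(x - alpha)   # error after one step
x = x - f(x)/fp(x); abs(x - alpha)   # roughly a constant times the previous error squared
x = x - f(x)/fp(x); abs(x - alpha)   # and again: the number of correct digits about doubles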
Now we look at some cases where the above three points do not hold.
Let \(f(x) = \sin(x) - x/4\) and \(x_1 = 2\pi\). This value is deliberately a poor choice:
f(x) = sin(x) - x/4
fp(x) = cos(x) - 1/4
newton(f, fp, 2pi, verbose=true)
Results of univariate zero finding:
* Converged to: -2.4745767873698292
* Algorithm: Roots.Newton()
* iterations: 22
* function evaluations ≈ 44
* stopped as |f(x_n)| ≤ max(δ, |x|⋅ϵ) using δ = atol, ϵ = rtol
Trace:
x₁ = 6.2831853071795862, fx₁ = -1.5707963267948968
x₂ = 8.3775804095727828, fx₂ = -1.2283696986087573
x₃ = 6.7397541447611076, fx₃ = -1.2440675381169748
x₄ = 8.6608848638032576, fx₄ = -1.4734831418043957
x₅ = 7.145187249669843, fx₅ = -1.0271496267351363
x₆ = 9.7071751202999792, fx₆ = -2.7054524356150202
x₇ = 7.471984835940221, fx₇ = -0.94007407756685935
x₈ = 15.128927112593999, fx₈ = -3.2350143158338276
x₉ = 12.152806896784888, fx₉ = -3.4400768736735046
x₁₀ = 17.320458159488304, fx₁₀ = -5.3292452806856589
x₁₁ = -8.262352431213138, fx₁₁ = 1.1478190870135017
x₁₂ = -6.4886031684043539, fx₁₂ = 1.4181745383583302
x₁₃ = -8.4340373897987018, fx₁₃ = 1.2720772543609207
x₁₄ = -6.8400965825145015, fx₁₄ = 1.1814574151203185
x₁₅ = -8.8128360727706223, fx₁₅ = 1.6287509745643769
x₁₆ = -7.2885505274866542, fx₁₆ = 0.97777993115770767
x₁₇ = -10.709994864744841, fx₁₇ = 3.6369972987156078
x₁₈ = -3.8698495348738184, fx₁₈ = 1.6330320788034982
x₁₉ = -2.230811533392838, fx₁₉ = -0.23228002463635777
x₂₀ = -2.499925490828641, fx₂₀ = 0.026449537717568972
x₂₁ = -2.4747617934295065, fx₂₁ = 0.00019161604875239657
x₂₂ = -2.474576797589688, fx₂₂ = 1.058441445600522e-08
x₂₃ = -2.4745767873698292, fx₂₃ = 2.2204460492503131e-16
-2.4745767873698292
Though Julia makes this happen fast, it takes more than 20 steps before converging and the answer is nowhere near the initial guess. This trace might show why:
Figure: Newton, way off.
When \(|f'(x)|\) is too close to \(0\), the path can jump a lot. In the figure, what was the longest jump?
The method did find a zero, but the initial guess was nowhere near the final zero. How close was the closest zero to the initial guess?
Let \(f(x) = x^{1/3}\). We know the root is 0. Let’s see what happens if we use Newton’s method. We have to be careful though, as Julia thinks that cube roots of negative numbers (via x^(1/3)) are NaN, not a number. (You can avoid this by making your number complex, e.g. x + 0*im, but then the real answer is not given as an answer; it is just one of three and not the chosen one.)
So we define our function using Julia
’s cbrt
function, which works as we desire for negative numbers, as follows:
f(x) = cbrt(x)
xstar = newton(f, 2)
LoadError: Roots.ConvergenceFailed("Algorithm failed to converge")
Still an issue. Why?
Despite all our care with the derivative, the method did not converge in \(200\) steps. Can you see why?
For \(f(x) = x^{1/3}\), simplify the expression by hand:
x - f(x)/f'(x)
What do you get?
Apply Newton’s method to \(f(x) = (x-2)^7 \cdot (x-3) \cdot (x-4)\) starting at 1.9981. The algorithm does not converge to 2 – an obvious root. From a plot of \(f(x)\) explain why not:
The function f(x) = atan(x)
is a strictly increasing function with one zero, \(0\). Yet it can pose problems with Newton’s method. For which values of \(x\) does Newton’s method converge:
Sometimes, the process can cycle even for reasonable functions.
Let \(f(x) = x^3 - 5x\). Starting with \(x_1=1\), compute three steps of Newton’s method. What are the terms in the series produced?
Here is a pathological example where the value always cycles no matter where you start unless you start at \(0\).
Let \(f(x) = \sqrt{|x|}\). This is the one-sided square root function turned into an even function. We could also have defined it by:
f(x) = x >= 0 ? sqrt(x) : sqrt(-x)
f (generic function with 1 method)
where the ternary operator a ? b : c looks at a and, if true, will execute b, otherwise c.
This makes it easier to write the derivative of the function in Julia
:
fp(x) = x >=0 ? (1/2)*sqrt(x)/x : -(1/2)*sqrt(-x)/(-x)
fp (generic function with 2 methods)
To see what happens when using Newton’s method, let’s start at \(x=2\):
x = 2

x = x - f(x)/fp(x)
x = x - f(x)/fp(x)
x = x - f(x)/fp(x)
-2.0
Try again with \(x=3.0\). What sequence do you get?
Julia when generated

Julia version:
VERSION
v"1.10.4"
Packages and versions:
using Pkg
Pkg.status()
Status `~/work/mth229.github.io/mth229.github.io/Project.toml`
[a2e0e22d] CalculusWithJulia v0.2.6
[7073ff75] IJulia v1.25.0
[b964fa9f] LaTeXStrings v1.3.1
[ebaf19f5] MTH229 v0.3.2
[91a5bcdd] Plots v1.40.4
[f27b6e38] Polynomials v4.0.11
[438e738f] PyCall v1.96.4
[612c44de] QuizQuestions v0.3.23
[295af30f] Revise v3.5.14
[f2b01f46] Roots v2.1.5
[24249f21] SymPy v2.1.1
[56ddb016] Logging