A tutorial on centered paraxial optical systems, including Smith matrices

Paraxial ray tracing is a method of analysing optical systems that has roots in antiquity but was developed fully only after the 17th century. This type of analysis is valuable for understanding how optical instruments function, and for arranging systems from components. It is not adequate for the design of optical components of wide aperture or field, high power, or high imaging fidelity, for which accurate ray tracing and wave considerations are necessary. The treatment here is restricted to spherical surfaces, the only type of accurate surface that can be easily manufactured, and to ray paths without reflection (dioptric systems) in homogeneous media.

The imaging properties of a sequence of spherical surfaces separating media of different indexes of refraction, with the centres of all the surfaces lying on the same straight line (the *axis* of the system), are very simple provided the rays are all of small height and inclination; that is, for *paraxial* rays. In this case a bundle of rays leaving one point, the *object*, all pass through a second point, the *image*. These points are *conjugate*: because light rays are reversible, the roles of object and image may be interchanged. The imaging is said to be *stigmatic*. This is only an approximation, but it is a very useful one. Conventionally, we consider the rays to move from left to right. When it is said that a bundle of rays passes through a point, this means that the physical rays would actually pass through the point, forming a *real* object or image, or would pass through the point if extended, forming a *virtual* object or image. It must be stressed that the paraxial imaging properties are only approximate, and a more detailed analysis is necessary for accurate results.

The imaging properties of the system, no matter how many surfaces it comprises, can be expressed by the locations of six *cardinal* points on the axis, shown in the figure below. Points belonging to the object space, in front of the first surface, are unprimed, and the index of refraction in this region is n. Points belonging to the image space, behind the last surface, are primed, and the index of refraction in this region is n'. F is the first focal point. A bundle of rays leaving this point, or any point on a plane perpendicular to the axis at this point, is imaged into parallel rays. F' is the second focal point. A bundle of parallel rays in the object space is imaged into a point lying in a plane perpendicular to the axis at this point. H and H' are a pair of conjugate points, called the *principal points*. Any point on a plane perpendicular to the axis passing through H is imaged into a point at the same height on a plane perpendicular to the axis through H'. These are points of unit lateral magnification. K and K' are another pair of conjugate points, called the *nodal points*. Any ray passing through K also passes through K' at the same angle to the axis. These are points of unit angular magnification.

The distance FH is called the first focal length f, and the distance H'F' is the second focal length f'. These quantities satisfy the relation n'f = nf'. The distances from the focal points to the nodal points are also equal to f and f', but interchanged: FK = f' and K'F' = f. When n = n', the focal lengths are equal, and the principal and nodal points coincide. For given n and n', and apart from the position of the system along the axis, everything depends on only two independent quantities, which can be taken as a focal length and the distance between the principal (or nodal) points.

Suppose ab is any ray in the object space. We have extended the ray, if necessary, so that it intersects the first focal plane at a, and the first principal plane at b. We also draw the line aK to the first nodal point. Let c be the point on the second principal plane at the same height as b. In the image space, the ray must pass through c (unit lateral magnification), and it must be parallel to aK: since a is in the focal plane, all rays from a emerge parallel to one another, and the ray through K emerges from K' with its inclination unchanged (unit angular magnification). The ray cd can now be drawn in the image space. By this means any ray ab in the first medium can be projected into ray cd in the last medium, and any point in the object space can be mapped into its conjugate point in the image space. When drawing a ray diagram, such as the one here, transverse distances are greatly exaggerated.

The Figure shows an object of height y and an image of height y'. If x is the distance of the object from F and x' the distance of the image from F', the two are related by the Newtonian lens formula xx' = ff', which is easily derived from similar triangles. This formula is valid for thick lenses, as well as for the thin lenses to which the familiar Gaussian lens formula applies. If the object and image media are the same, then f = f', and the focal length is the mean proportional (geometric mean) between the object and image distances measured from the focal points.
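
As a quick numerical check of the Newtonian formula against the Gaussian one, here is a minimal Python sketch. It assumes the same medium on both sides (so f = f'), and measures x from F to the object and x' from F' to the image, both positive for a real object and a real image, so that the Gaussian distances are s = f + x and s' = f + x'. The function names are illustrative only.

```python
def newtonian_image_distance(x, f, f_prime):
    """x' from Newton's formula x * x' = f * f' (x measured from F, x' from F')."""
    return f * f_prime / x


def gaussian_image_distance(s, f):
    """s' from the thin-lens formula 1/s + 1/s' = 1/f (same medium on both sides)."""
    return 1.0 / (1.0 / f - 1.0 / s)


f = 250.0   # focal length in mm; n = n', so f = f'
x = 100.0   # object 100 mm in front of F, so s = f + x = 350 mm

x_prime = newtonian_image_distance(x, f, f)
s_prime = gaussian_image_distance(f + x, f)

print(x_prime)                # 625.0
print(s_prime - f)            # the same x', up to rounding
print((x * x_prime) ** 0.5)   # f, the mean proportional of x and x'
```

Both routes locate the same image; the Newtonian form simply measures from the focal points instead of from the lens.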

The locations of the focal, principal, and nodal points of an actual system of spherical surfaces can be found by tracing paraxial rays through the system, using the law of refraction at each surface. This can also be done by multiplying 2x2 matrices representing each surface and each distance between surfaces. The relations expressed graphically here can also be expressed algebraically, in which case the algebraic signs of the quantities must be handled carefully.

Let's now do a paraxial ray trace for the basic case of a spherical interface between media of index n on the left and index n' on the right. In the diagram, C is the centre of the sphere, and V its *vertex*, where it intersects the axis. A ray is drawn from any point O on the axis, a distance s in front of V, intersecting the surface at point P, a distance h from the axis, where it is refracted and directed toward point I, a distance s' behind V. Since h is small compared with r, the radius of the sphere, and with s and s', we may assume that the sine and tangent of a small angle are equal to the angle itself, and that point P is approximately above V. Then the angles can be expressed as shown. The law of refraction is then n(h/r + h/s) = n'(h/r - h/s'). Remarkably, h cancels out, showing that all rays from O meet at I, to this approximation, so that O and I are conjugate points. This equation can then be rearranged into the usual Gaussian form n/s + n'/s' = (n' - n)/r. The quantity on the right, the *power* of the surface, is proportional to the difference of the indexes and inversely proportional to the radius of the surface. If r is measured in metres, the power is in *diopters*, abbreviated D.
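
The claim that h cancels can be checked by tracing a few rays numerically. This sketch applies the angle relations stated above (angle of incidence h/s + h/r, angle of refraction h/r - h/s') at several heights and compares the result with the Gaussian form. The example values (air to glass of index 1.5, r = 100 mm, s = 500 mm) are assumptions chosen for illustration, and the function names are hypothetical.

```python
def trace_axial_ray(h, s, r, n, n_prime):
    """Paraxial trace of one ray from the axial point O, meeting the surface at height h.

    s is the distance OV; r is the radius, positive when C lies to the right of V.
    Returns s', the distance from V at which the refracted ray crosses the axis.
    """
    i = h / s + h / r              # angle of incidence, measured from the normal CP
    i_prime = n * i / n_prime      # paraxial law of refraction: n i = n' i'
    # From the figure, i' = h/r - h/s', so h/s' = h/r - i'.
    return h / (h / r - i_prime)


def gaussian_image_distance(s, r, n, n_prime):
    """s' from n/s + n'/s' = (n' - n)/r."""
    return n_prime / ((n_prime - n) / r - n / s)


n, n_prime, r, s = 1.0, 1.5, 100.0, 500.0   # assumed example, distances in mm
for h in (0.1, 1.0, 5.0):
    print(h, trace_axial_ray(h, s, r, n, n_prime))   # about 500 mm for every height
print(gaussian_image_distance(s, r, n, n_prime))
print((n_prime - n) / (r / 1000.0), "D")             # surface power, with r in metres
```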

From this formula, we find the focal point F by setting s' equal to infinity, with the result that f = nr/(n' - n). Setting s equal to infinity locates F': f' = n'r/(n' - n). These satisfy n'f = nf', as they must. Both principal points must coincide at the vertex V in this approximation, since the ray passes from object to image space by refraction at this single location. The nodal points lie a distance f' - f = r behind V, since FK = f' and F is a distance f in front of V; they are therefore both located at C. Any ray directed towards C passes through the surface without refraction, retaining its inclination, which is quite clear without calculation. We have now determined the cardinal points of a single spherical refracting surface. If the centre of curvature C is on the other side of the vertex V, we need only change the sign of r in the formulas.
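
The cardinal points of a single surface follow directly from these formulas. In this sketch, positions are measured from V and taken positive towards the image space; the function name and the example (air to glass of index 1.5, r = 100 mm) are illustrative assumptions.

```python
def single_surface_cardinal_points(n, n_prime, r):
    """Axial positions of the cardinal points of one spherical refracting surface.

    Positions are measured from the vertex V, positive to the right (image side).
    r is positive when the centre C lies to the right of V; change its sign otherwise.
    """
    f = n * r / (n_prime - n)              # first focal length: F lies f in front of V
    f_prime = n_prime * r / (n_prime - n)  # second focal length: F' lies f' behind V
    return {
        "F": -f,
        "F'": f_prime,
        "H": 0.0,               # both principal points at the vertex
        "H'": 0.0,
        "K": f_prime - f,       # both nodal points at f' - f = r, i.e. at C
        "K'": f_prime - f,
    }


points = single_surface_cardinal_points(1.0, 1.5, 100.0)
print(points)   # {'F': -200.0, "F'": 300.0, 'H': 0.0, "H'": 0.0, 'K': 100.0, "K'": 100.0}
```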

Next to the plane mirror, the most familiar optical device is the lens. A simple lens has two spherical surfaces bounding a medium of index n, and is normally used with air on both sides. The result of the preceding paragraphs can be applied to the two surfaces in succession, taking the image formed by the first surface as the object for the second. This is straightforward, but involves some algebra and interpretation to find the cardinal points. Things become much simpler if the two surfaces are considered to be located at the same point as far as measurements along the axis are concerned. This approximation is called the *thin lens*, and is rough but qualitatively useful. Since the object and image space indexes are the same (unity) and the refracting surfaces are located at a single point V, the principal and nodal points coincide at V, and all measurements are made from this point. Applying the equation for a spherical surface, we have 1/s + n/x = (n - 1)/r1 and n/(-x) + 1/s' = (1 - n)/r2, where r1 and r2 are the two radii, and x is the image distance for the first surface, which serves as a (virtual) object at distance -x for the second. Adding the two equations, we have the familiar Gaussian lens formula 1/s + 1/s' = (n - 1)(1/r1 - 1/r2) = P. This is commonly called the *lensmaker's formula*. An equiconvex lens of index 1.5 and radii of 250 mm has a power of 4 D, or focal length 250 mm. If it is 50 mm in diameter, each surface has a sag of about 1.25 mm, so even with a knife edge the lens must be about 2.5 mm thick at the centre; a centre thickness of 5 mm leaves an edge about 2.5 mm thick.
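
The figures quoted for the equiconvex lens can be reproduced with a few lines of Python. This is a sketch with hypothetical function names; the sag function uses the exact circle geometry rather than the paraxial approximation, and the 5 mm centre thickness is the value used for the thick lens later on.

```python
def thin_lens_power(n, r1, r2):
    """Lensmaker's formula for a thin lens in air: P = (n - 1)(1/r1 - 1/r2)."""
    return (n - 1.0) * (1.0 / r1 - 1.0 / r2)


def sag(radius, semi_aperture):
    """Depth of a spherical surface across the given semi-aperture (exact geometry)."""
    return abs(radius) - (radius ** 2 - semi_aperture ** 2) ** 0.5


P = thin_lens_power(1.5, 0.25, -0.25)   # radii in metres, so P is in diopters
print(P, "D, f =", 1000.0 / P, "mm")     # 4.0 D, f = 250.0 mm

s1 = sag(250.0, 25.0)                    # 50 mm diameter, radii in mm
print(round(s1, 3), "mm per surface")    # 1.253 mm per surface
print(round(5.0 - 2 * s1, 3), "mm edge for a 5 mm centre thickness")   # 2.494
```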

Suppose we have a thin lens in the wall of a tank of water, so that the final index is not unity, but n". Taking this into account in the formulas, we find instead that P' = 1/s + n"/s' = P + (n" - 1)/r2. The result depends not only on P but also on the radius of the second surface. For an equiconvex lens of glass (n = 1.5, r1 = -r2 = r), with water (n" = 4/3) in the tank, P' = 1/r - 1/(3r) = 2/(3r). If the lens is surrounded by water on both sides, its focal length becomes 4r, so that 1/f = 1/(4r) (the index of glass relative to water is about 9/8). Its power in air is, of course, P = 1/r. These simple formulas are good for understanding such behaviour qualitatively, even if they are not suitable for optical design.
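
The tank-wall and immersed-lens results can be checked in units of r. This sketch assumes a thin equiconvex glass lens with n = 1.5 and takes 4/3 for water, as in the text; both function names are hypothetical.

```python
def tank_wall_power(n_glass, r1, r2, n_out):
    """1/s + n_out/s' for a thin lens with air in front and index n_out behind it."""
    p_air = (n_glass - 1.0) * (1.0 / r1 - 1.0 / r2)   # power in air (lensmaker's formula)
    return p_air + (n_out - 1.0) / r2


def reciprocal_focal_length_in_medium(n_glass, r1, r2, n_medium):
    """1/f for a thin lens immersed in one medium, using the relative index n_glass/n_medium."""
    return (n_glass / n_medium - 1.0) * (1.0 / r1 - 1.0 / r2)


r = 1.0            # work in units of r, as in the text
water = 4.0 / 3.0
print(tank_wall_power(1.5, r, -r, 1.0))                      # 1/r in air
print(tank_wall_power(1.5, r, -r, water))                    # about 2/(3r)
print(reciprocal_focal_length_in_medium(1.5, r, -r, water))  # about 1/(4r)
```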

We note that the power depends only on the difference of the curvatures of the two surfaces, not on the individual curvatures. This implies that a lens of given power can have a range of shapes, so the shape can be chosen to give the best image for a given object distance. Adding the same curvature to both surfaces (or subtracting it from both) is called *bending* the lens. Lens shape is actually quite a significant factor. In general, the best image is obtained when the refraction is divided equally between the two surfaces. For object and image at about the same distance, an equiconvex lens is best. If the object is at infinity (parallel light incident), a plano-convex lens is best, with the convex side facing the parallel light. The lens is turned around if the parallel light exits, as in a magnifier. An eyeglass lens is often a *meniscus*, a lens whose surfaces have curvatures of the same sign, so that it can be placed closer to the eye and be satisfactory for different directions of viewing.
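
That bending leaves the power unchanged is easy to confirm: each shape below is the 4 D equiconvex lens with the same curvature (in 1/m) added to both surfaces. The shape labels are illustrative, and the sketch says nothing about which shape gives the best image.

```python
def power_from_curvatures(n, c1, c2):
    """Thin-lens power in air from surface curvatures c = 1/r (in 1/m): P = (n - 1)(c1 - c2)."""
    return (n - 1.0) * (c1 - c2)


n = 1.5
shapes = {
    "equiconvex": (4.0, -4.0),                     # r1 = -r2 = 250 mm
    "plano-convex, convex side first": (8.0, 0.0), # bent by +4 /m
    "plano-convex, flat side first": (0.0, -8.0),  # bent by -4 /m
    "meniscus": (14.0, 6.0),                       # bent by +10 /m: curvatures of the same sign
}
powers = {name: power_from_curvatures(n, c1, c2) for name, (c1, c2) in shapes.items()}
print(powers)   # every shape gives 4.0 D
```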

A simple lens with only one free parameter (its shape, once the power is fixed) offers very little scope for the reduction of lens *aberrations* (failure to focus to a point). Chromatic aberration is caused by the variation of index with wavelength, or *dispersion*, and cannot be reduced at all by bending the lens. Improvements are possible only by using more surfaces and more media. The simplest case, that of two different glasses and three surfaces, turns out to be very satisfactory, so long as the proper glasses are used. Now there are two free parameters. One can be chosen to minimise chromatic aberration (discussed below), the other to minimise spherical aberration. The result is called an *achromat*. Any lens used in an optical system for polychromatic light should be an achromat, designed for the particular object and image distances under which it is used. Simple lenses can be used for eyeglasses and similar low-power applications because the eye is tolerant of chromatic aberration unless it is excessive. In telescopes and microscopes, where high power is necessary, chromatic aberration is extremely annoying. Newton devised the reflecting telescope to avoid chromatic aberration. John Dollond made the first commercially successful achromats in the 18th century, when suitable glass was finally available.

The relaxed human eye, as described by Gullstrand, has the following parameters. The index of refraction of the image space is 1.336, the first focal length is 17.055 mm, and the second focal length is 22.785 mm. H is 1.348 mm behind the anterior surface of the cornea, H' is 1.602 mm behind it, and the fovea is 24.0 mm behind it.
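
The remaining cardinal points of this schematic eye follow from the relations given earlier (FH = f, H'F' = f', FK = f', K'F' = f). This sketch simply applies them to the quoted data, with positions in mm measured from the anterior surface of the cornea; the variable names are illustrative.

```python
# Gullstrand's relaxed eye as quoted above; positions in mm behind the anterior
# surface of the cornea (negative values lie in front of it).
n, n_prime = 1.0, 1.336
f, f_prime = 17.055, 22.785
H, H_prime = 1.348, 1.602
fovea = 24.0

F = H - f                          # F lies f in front of H
F_prime = H_prime + f_prime        # F' lies f' behind H'
K = H + (f_prime - f)              # FK = f', so K lies f' - f behind H
K_prime = H_prime + (f_prime - f)  # K'F' = f, so K' lies f' - f behind H'

print(f"n'f = {n_prime * f:.3f}, nf' = {n * f_prime:.3f}")        # consistent with n'f = nf'
print(f"F = {F:.3f}, F' = {F_prime:.3f}, K = {K:.3f}, K' = {K_prime:.3f}")
# F = -15.707, F' = 24.387, K = 7.078, K' = 7.332
print(f"F' - fovea = {F_prime - fovea:.3f} mm")
```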

Many optical instruments have two or more lenses one after the other along the optic axis, with their centres on the axis. We already know how to handle such a system when the locations of the cardinal points are known, so the problem is to find the cardinal points when the indexes and radii are known. This can be done graphically, or by calculation with the formulas we already have, but the work proves very tedious. There is a very powerful way to handle this problem, based on the fact that the change in direction of a ray at an interface, and the change in its height between interfaces, are linear functions of the inclination and height of the ray. For example, a ray with inclination a and height h1 at one plane perpendicular to the optic axis has the same inclination at a second plane a distance d farther on, but a new height h2 = h1 + ad at that plane. Linear substitutions like this are elegantly expressed by means of matrices, and successive transformations by matrix multiplication. Here we need only 2x2 matrices, so they are easy to handle. The inclination and height of a ray can be expressed by a column matrix whose upper element is the product of the index of refraction and the ray inclination (positive upwards to the right), and whose lower element is the height. Including the index of refraction merely makes the matrices simpler. These two quantities do not form a 2-dimensional vector, because they are of different kinds and units; we are using only the property of matrices as descriptions of linear transformations.
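
Here is a minimal sketch of the ray column (na, h) and its transformations, using plain tuples rather than a matrix library. The translation matrix implements h2 = h1 + ad; the refraction matrix is an assumption derived from the single-surface relation above, in the form n'a' = na - Ph with P = (n' - n)/r. The helper names, example ray and surface are illustrative.

```python
def translation(d, n):
    """Transfer over an axial distance d in a medium of index n: h2 = h1 + (n*a) * d/n."""
    return ((1.0, 0.0),
            (d / n, 1.0))


def refraction(n, n_prime, r):
    """Paraxial refraction at a spherical surface: n'a' = n*a - P*h, with P = (n' - n)/r."""
    power = (n_prime - n) / r
    return ((1.0, -power),
            (0.0, 1.0))


def apply(m, ray):
    """Apply a 2x2 matrix to the ray column (n*a, h)."""
    (m11, m12), (m21, m22) = m
    na, h = ray
    return (m11 * na + m12 * h, m21 * na + m22 * h)


ray0 = (1.0 * 0.01, 2.0)                      # in air: inclination 0.01 rad, height 2 mm
ray1 = apply(translation(100.0, 1.0), ray0)   # 100 mm further on
print(ray1)                                   # (0.01, 3.0): same inclination, new height
ray2 = apply(refraction(1.0, 1.5, 50.0), ray1)  # into glass at a surface of radius 50 mm
print(ray2, "inclination in glass:", ray2[0] / 1.5)
```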

The graphic shows how to multiply 2x2 matrices, and how a ray is transformed by them. The matrices for the surfaces and translations are written from right to left in the order that the ray encounters them (this is the reverse of the order you may expect). The fundamental matrices are the ones for a translation and a spherical surface. Look how easy it is to get the lensmaker's formula for a thin lens! The matrix for the complete system can be interpreted as shown. Writing it with A and B in the upper row and C and D in the lower row, the element B is minus the power of the system, so it gives the focal lengths f = -n/B and f' = -n'/B, from which the principal and nodal points are found. The back focal length (bfl) and front focal length (ffl), the distances of F' from the last surface and of F from the first surface, are then given by the matrix elements D and A: bfl = Df' and ffl = Af. The remaining element C is fixed by the requirement that the determinant of the matrix, AD - BC, be unity. (Every translation and surface matrix has unit determinant, and the determinant of a product of matrices is the product of the determinants of the factor matrices.) The matrix method is well adapted to numerical computation as well as to algebraic analysis.
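
The interpretation of the system matrix can be sketched as follows, assuming the layout [[A, B], [C, D]] acting on the column (na, h), with B equal to minus the power. Multiplying the two surface matrices of a thin lens with no gap between them reproduces the lensmaker's formula. The helper names and the example lens (n = 1.5, radii of 250 mm) are illustrative.

```python
def matmul(m1, m2):
    """Product m1 @ m2 of 2x2 matrices stored as nested tuples."""
    (a, b), (c, d) = m1
    (e, f), (g, h) = m2
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))


def refraction(n, n_prime, r):
    """Spherical surface: n'a' = n*a - P*h with P = (n' - n)/r."""
    return ((1.0, -(n_prime - n) / r), (0.0, 1.0))


def system(*elements):
    """Multiply elements given in the order the ray meets them; the first ends up rightmost."""
    m = ((1.0, 0.0), (0.0, 1.0))
    for element in elements:
        m = matmul(element, m)
    return m


def interpret(m, n=1.0, n_prime=1.0):
    """Read power, focal lengths, ffl and bfl from M = [[A, B], [C, D]]."""
    (A, B), (C, D) = m
    power = -B
    f, f_prime = n / power, n_prime / power
    return {"power": power, "f": f, "f'": f_prime,
            "ffl": A * f,          # F lies this far in front of the first surface
            "bfl": D * f_prime,    # F' lies this far behind the last surface
            "det": A * D - B * C}


m = system(refraction(1.0, 1.5, 250.0), refraction(1.5, 1.0, -250.0))   # mm
print(m)              # B = -(P1 + P2), the lensmaker's formula
print(interpret(m))   # f = ffl = bfl = 250 mm, det = 1
```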

To see more clearly how the matrices operate, consider a plane interface between media of indexes n and n'. Since r is infinite, the power (n' - n)/r is zero, and the matrix is simply the unit matrix, with 1's on the diagonal and 0's elsewhere. The ray transformation is n'a' = na, and h' = h. The first equality is just the law of refraction for small angles, where sin a = a, and the second is obvious. The matrix methods can be extended to systems with reflection, but I shall not do this here.

Now we can easily solve the problem of two spaced thin lenses by multiplying three 2x2 matrices. The focal length of the combination is found from 1/f = 1/f1 + 1/f2 - d/(f1 f2), where d is the separation of the lenses. If d = 0, that is, if the lenses are in contact, the resultant power is just the sum of the separate powers of the lenses. If the lenses are separated by the sum of their focal lengths, the power is zero, since the matrix element B is zero. However, the matrix element A then becomes -f1/f2, which is the *angular magnification*, the ratio of the inclination of the image ray to the inclination of the object ray. A system for which B = 0 is said to be *telescopic*, and indeed two thin lenses separated by the sum of their focal lengths form a telescope. If both lenses are positive, the result is a Keplerian or astronomical telescope. If one lens is negative, the result is a Galilean telescope. The sum of the focal lengths must be greater than zero, of course, since we cannot separate the lenses by a negative distance.
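
The two-lens formula and the telescopic case can be checked numerically. This sketch builds the system matrix for a thin lens of focal length f1, a gap d in air and a thin lens of focal length f2; the helper names and the focal lengths used are arbitrary illustrative values.

```python
def matmul(m1, m2):
    """Product m1 @ m2 of 2x2 matrices stored as nested tuples."""
    (a, b), (c, d) = m1
    (e, f), (g, h) = m2
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))


def thin_lens(f):
    """Thin lens of focal length f in air (power 1/f)."""
    return ((1.0, -1.0 / f), (0.0, 1.0))


def gap(d):
    """Translation by d in air (n = 1)."""
    return ((1.0, 0.0), (d, 1.0))


def two_lenses(f1, f2, d):
    """Lens f1, then gap d, then lens f2, written right to left."""
    return matmul(thin_lens(f2), matmul(gap(d), thin_lens(f1)))


(A, B), (C, D) = two_lenses(100.0, 50.0, 30.0)
print(-B, 1 / 100.0 + 1 / 50.0 - 30.0 / (100.0 * 50.0))   # the two powers agree

# Telescopes: d = f1 + f2 makes B = 0, and A is the angular magnification -f1/f2.
for name, f1, f2 in (("Keplerian", 500.0, 25.0), ("Galilean", 500.0, -25.0)):
    (A, B), _ = two_lenses(f1, f2, f1 + f2)
    print(name, "B =", B, "A =", A)
```

The Keplerian pair gives A close to -20 (an inverted image), the Galilean pair A close to +20.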

Since the index of refraction varies with wavelength, so does the focal length of a lens. This causes the images in different colours to be of different sizes and at different positions, so images are coloured and a little fuzzy at the edges, a defect called *chromatic aberration*. If two thin lenses made from the same glass are separated by a distance equal to half the sum of their focal lengths, the focal length of the combination will not vary with wavelength, to a good approximation, eliminating much of the chromatic aberration. To show this, write the equation for two separated thin lenses as P = P' + P" - gP'P", where g is the distance between the lenses, P = 1/f is the power of the combination, and P' and P" are the powers of the two lenses. Then dP = dP' + dP" - g(P'dP" + P"dP') = (1 - gP')dP" + (1 - gP")dP'. From the lensmaker's formula, dP' = P'dn/(n - 1), and similarly for dP", so we find that dP = (P' + P" - 2gP'P")dn/(n - 1). dP will vanish if 2gP'P" = P' + P", or g = (P' + P")/(2P'P") = (1/P' + 1/P")/2 = (f' + f")/2, where f' and f" are the focal lengths of the two lenses, which is what was to be shown. This property is used in the Huygens eyepiece, still in use because of its cheapness and simplicity. Chromatic aberration is especially annoying in eyepieces, which usually have high power.
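
The achromatic spacing can also be checked numerically. This sketch assumes two thin lenses of the same glass with focal lengths of 75 mm and 25 mm at an index of 1.5 (illustrative values, not taken from any particular eyepiece), and estimates dP/dn by a central difference at two separations; the function names are hypothetical.

```python
def combined_power(n, k1, k2, g):
    """P = P' + P" - g P' P" for two thin lenses of the same glass separated by g.

    k1 and k2 are the curvature differences 1/r1 - 1/r2 of each lens, so P' = (n - 1) k1.
    """
    p1, p2 = (n - 1.0) * k1, (n - 1.0) * k2
    return p1 + p2 - g * p1 * p2


def dP_dn(n, k1, k2, g, dn=1e-6):
    """Central-difference estimate of the change of combined power with index."""
    return (combined_power(n + dn, k1, k2, g) - combined_power(n - dn, k1, k2, g)) / (2.0 * dn)


n0 = 1.5                       # assumed index at the design wavelength
f1, f2 = 75.0, 25.0            # assumed focal lengths at n0, in mm
k1 = 1.0 / ((n0 - 1.0) * f1)   # lens shapes giving these focal lengths
k2 = 1.0 / ((n0 - 1.0) * f2)

g_achromatic = (f1 + f2) / 2.0   # half the sum of the focal lengths
for g in (g_achromatic, 30.0):
    f = 1.0 / combined_power(n0, k1, k2, g)
    print(f"g = {g:.1f} mm: f = {f:.2f} mm, dP/dn = {dP_dn(n0, k1, k2, g):.6f}")
```

At g = 50 mm the derivative vanishes to rounding error, while at g = 30 mm it does not.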

The thick lens is another application of considerable interest. It is just like the two thin lenses, except that the first and last matrices are for the spherical surfaces, and the translation between them is through glass, so the translation matrix uses d/n, where n is the index of the glass. Now the thickness of the lens is taken into account, so a more accurate estimate of the power can be obtained, as well as the location of the cardinal points with respect to the lens. If this is done for the equiconvex lens of 4 D and 5 mm thickness that was mentioned above, we find about 250.8 mm for the focal length instead of 250 mm, a back focal length of about 249.2 mm, and principal points about 1.67 mm inside the lens, H behind the first vertex and H' in front of the second (roughly a third of the thickness in from each surface). The construction of the matrices, their multiplication, and their interpretation are left as an exercise for the reader.
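
For readers who want to check their answer to the exercise, here is a sketch of the thick-lens calculation. It assumes the equiconvex lens above (n = 1.5, radii of 250 mm, centre thickness 5 mm) in air, and the matrix conventions described earlier on this page (ray column (na, h), B equal to minus the power). The helper names are hypothetical.

```python
def matmul(m1, m2):
    """Product m1 @ m2 of 2x2 matrices stored as nested tuples."""
    (a, b), (c, d) = m1
    (e, f), (g, h) = m2
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))


def refraction(n, n_prime, r):
    """Spherical surface: n'a' = n*a - P*h with P = (n' - n)/r."""
    return ((1.0, -(n_prime - n) / r), (0.0, 1.0))


def translation(d, n):
    """Transfer over distance d inside a medium of index n (note the d/n)."""
    return ((1.0, 0.0), (d / n, 1.0))


n_glass, r, t = 1.5, 250.0, 5.0   # the equiconvex lens above, in mm
m = matmul(refraction(n_glass, 1.0, -r),
           matmul(translation(t, n_glass), refraction(1.0, n_glass, r)))
(A, B), (C, D) = m
f = -1.0 / B     # air on both sides, so f = f' = 1/P
ffl = A * f      # F to the first vertex
bfl = D * f      # last vertex to F'
print(f"f = {f:.2f} mm, bfl = {bfl:.2f} mm, ffl = {ffl:.2f} mm")   # f = 250.84, bfl = ffl = 249.16
print(f"H is {f - ffl:.2f} mm behind the first vertex, H' {f - bfl:.2f} mm in front of the second")
print(f"det = {A * D - B * C:.12f}")
```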

All of our results are for a centred paraxial system. It is very important that the centre of each interface lie accurately on the axis for a good image to be formed. If any surface is out of adjustment, the image quality falls off very rapidly. Therefore, in constructing an optical system, alignment is crucial to proper functioning. In fact, observation of the image can itself aid adjustment of the system.

There are many good references for further study of this material, which is called *geometrical optics* because of the typical ray diagrams. Two classic works are the Optics texts by Jenkins and White, and Hecht and Zajac. Some texts on Physical Optics do not treat optical systems or geometrical optics. The matrix method is rather recent, and will not be found in earlier texts. Texts on Practical Optics also contain much useful information, as well as laboratory procedures. There are proper ways to do everything in Optics, some of which are not obvious.


Composed by J. B. Calvert

Created 11 April 2000

Last revised 23 June 2000