The Beautiful Theory

The principal results of Analytical Mechanics, the key to Quantum Mechanics and Relativity


  1. Introduction
  2. Coordinates and Work
  3. Variations
  4. Dynamics
  5. The Canonical Equations
  6. Simple Examples
  7. Canonical Transformations
  8. Hamilton-Jacobi Theory
  9. Mechanics and Waves
  10. The Continuum
  11. Appendix: Another Look
  12. References

Introduction

The analytical mechanics of Euler, Lagrange, Hamilton and Jacobi was the outstanding theoretical invention of the 18th and early 19th centuries. It is an invention, a product of the mind, not a discovery or 'uncovering' of something that existed previously. The word 'analytical' means that it is an application of the infinitesimal calculus invented by Newton and Leibniz at the end of the 17th century. Its seed was Leibniz's inchoate concept of energy and work, which extended the Newtonian ideas of momentum and force. The concept of energy was not complete until internal energy was recognized later in the 19th century, when the general principle of the conservation of energy was at last established. In the present essay, energy is simply a constant of the motion under certain conditions, but it is still very significant and useful. Analytical mechanics is essentially a mathematical, not a physical, development, though it is completely equivalent to Newton's theory of motion. Beyond this, questions of its 'validity' do not arise. It is, however, a model to which physical theories are referred. The most significant of these theories are the modern Quantum Mechanics of Heisenberg and Schrödinger and the Theory of Relativity of Einstein, as well as the classical theory of elasticity. Quantum Mechanics is presented in terms of analytical mechanics in its most advanced form, built on the Hamiltonian function, and Quantum Field Theory uses the Lagrangian density to achieve covariance. An appreciation of analytical mechanics is necessary to the understanding of any of these subjects.

An excellent exposition of the subject is the classic text of Lanczos (Ref. 1). Lanczos deeply admires the beauty and elegance of the theory, and this shows in his gentle and thorough exposition, in which nothing necessary to understanding is omitted. There are even exercises, which are interesting and informative, and give the reader confidence in his or her understanding. There is really no competition for this book, so if you want to learn the foundations of analytical mechanics, this is the place. In this essay, I shall not give the derivations in any detail; for these the reader is always referred to Lanczos' lucid expositions. The analytical mechanics of particles was an important product of that rare age of cosmic thought and Liberalism, the 18th century.

As already mentioned, analytical mechanics uses the calculus, with differentiation and integration at every turn, so facility with the calculus is a prerequisite. It is characterized by the employment of certain functions of a set of variables that describes the state of the system in the necessary detail, but not beyond that, and by the use of variational principles as the fundamental statement of the theory. A typical variational principle might say that a certain integral takes an extremal or stationary value for the actual motion, or that a certain algebraic expression vanishes for an arbitrary variation in the state of the system. The variational principles are used to derive the algebraic and differential equations that must then be solved to determine the motion of the system. The beauty of the theory lies in its generality and symmetry, which were of inestimable value in the development of quantum mechanics and relativity, to name only the outstanding examples.

One may estimate that no more than one person in 100,000 has heard of this theory and knows more than simply its name, and perhaps only 1 in 1,000,000 understands what it is about and could give a connected summary of its principal results. It is not a collection of empirical facts or worse, as so much 'scientific' knowledge is today, but an intellectual exercise of high order, of which there are few or none taking place at the present day. Indeed, fewer students are now equipped to scale its heights than previously, as school mathematics concentrates on arithmetic and tricks. It is sobering to realize how fragile the hold on such knowledge is, how easily it could disappear, and how difficult it would be to recover.

Because of the limitations of the internet browser, partial derivatives are not explicitly indicated. Any derivative should be assumed to be a partial unless it is clear that it is not. Also, the mention of a single quantity, such as a coordinate q, stands for the whole set of such quantities, and any product of such quantities should be assumed to be a sum, which will sometimes be explicitly pointed out with a Σ, but not always. For the same reasons, integrals are represented by an I operator whose arguments are the limits and the differential of the integration variable.

Coordinates and Work

In vectorial mechanics, one uses rectangular or polar coordinates, or the like, which are good enough for specifying the location of points but ill-adapted to many problems. A physical pendulum is a simple case. One parameter, the angle of rotation, completely describes the state of the system, instead of the 3N coordinates of the points of the pendulum, conceived as an assembly of N mass points. Even more difficulty is encountered when continuous systems are considered. Of course, you know how this is overcome in elementary mechanics, but the methods are neither general nor elegant. If we are interested in the movements of the N molecules of a solid body, the difficulties of a vectorial treatment become overwhelming.

In analytical mechanics, one can use any set of parameters that fully describe the state of the system, and call them the generalized coordinates q. For simplicity, we shall suppress a subscript running from 1 to N, where N is the number of coordinates in the set. N is called the number of degrees of freedom of the system. When these coordinates change by infinitesimal amounts dq, the work done on the system is dW = Σ Fdq. The coefficient F of each dq is the generalized force associated with q. In most cases treated by analytical mechanics, this expression is an exact differential (the cross partial derivatives of the F's are equal, dF_i/dq_j = dF_j/dq_i), and can be integrated to give a function W(q), called the work function by Lanczos, with the property that F = dW/dq (of course, this is a partial derivative for the coordinate concerned). By comparing these expressions with those for the work done by the Newtonian forces, the generalized forces can be expressed in explicit form. For example, the generalized force associated with a rotation is the torque about the axis of rotation. W(q) encapsulates all the force information for any configuration in a single function of the coordinates.
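
As a small numerical sketch of F = dW/dq, assume a simple pendulum of mass m on a light rod of length l, with θ measured from the downward vertical (these names and the values below are illustrative, not from the text). Up to an additive constant the work function of gravity is W(θ) = mgl cos θ, and its derivative should reproduce the gravitational torque about the pivot, -mgl sin θ. The derivative is estimated by central differences.

```python
import math

# Illustrative pendulum parameters (assumed; not from the text).
MASS, GRAV, LENGTH = 2.0, 9.81, 0.5

def work_function(theta):
    """W(θ) = m g l cos θ: the work function of gravity, up to an additive
    constant, with θ measured from the downward vertical."""
    return MASS * GRAV * LENGTH * math.cos(theta)

def generalized_force(w, q, h=1e-6):
    """F = dW/dq, estimated by a central difference."""
    return (w(q + h) - w(q - h)) / (2 * h)

def gravity_torque(theta):
    """Torque of gravity about the pivot, positive in the sense of increasing θ."""
    return -MASS * GRAV * LENGTH * math.sin(theta)

ANGLES = (0.0, 0.3, 1.2, 2.5)
for theta in ANGLES:
    print(f"θ = {theta:.1f}:  dW/dθ = {generalized_force(work_function, theta):+.6f},"
          f"  torque = {gravity_torque(theta):+.6f}")
```

The two columns agree, which is the statement that the generalized force conjugate to a rotation angle is the torque.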

If it is convenient, one can use a greater number of parameters than necessary, and then reduce the number of degrees of freedom by postulating relations of constraint between the q's, perhaps in the form Σ Adq = 0. As in the case of work, this expression may be integrable, so that the constraint may be expressed as f(q) = 0. The use of constraints may result in greater symmetry in the equations.

Time may appear explicitly in any of these relations as a parameter, and if it does so, the system is called rheonomic. If time does not appear, the system is scleronomic.

Forces that can be derived from some W(q) are called monogenic, and constraints of the form f(q) = 0 are called holonomic. Although the theory can be adapted to polygenic forces (friction), non-holonomic constraints (rolling) and explicit dependence on time, it is simplest for monogenic, holonomic and scleronomic systems. In such systems, energy is conserved, so they are also called conservative.

For conservative, scleronomic systems, the potential energy V(q) = -W(q) is defined so that the sum of the kinetic energy T and potential energy V is a constant: T + V = E, the total energy. In some cases, the work function can be extended to depend on the time derivatives of q, which we shall write as q' here, so that W = W(q,q'), and now V = [Σ(dW/dq')q'] - W. The reason for this will be seen in connection with Lagrange's Equation.

The kinetic energy represents the inertial term in analytical mechanics. It is 2T = Σ mv² in vectorial mechanics. Using the expressions for the rectangular coordinates x as functions of the q, we find that 2T = Σ a_ij(q) q'_i q'_j. This is a positive definite quadratic form, just like the square of the line element, ds², in Riemannian geometry. It can be interpreted as the metric in some Riemannian configuration space of N dimensions. Indeed, the a_ij(q) form the components of a symmetric metric tensor. The system can be regarded as a single point of unit mass moving in this generally non-Euclidean space, a point of view adopted in General Relativity, since this picture is invariant under coordinate transformations of the q's.
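
A minimal check of the quadratic form, assuming a single particle in a plane described by polar coordinates q = (r, θ) (an illustrative choice): here a_rr = m, a_θθ = mr² and a_rθ = 0, so 2T = m(r'² + r²θ'²). The sketch compares Σ a_ij(q) q'_i q'_j with Σ mv² computed from the Cartesian velocity components of x = r cos θ, y = r sin θ.

```python
import math

MASS = 1.5   # assumed particle mass

def metric(r, theta):
    """Coefficients a_ij(q) of 2T for plane polar coordinates q = (r, θ)."""
    return [[MASS, 0.0],
            [0.0, MASS * r * r]]

def two_T_quadratic_form(q, qdot):
    """2T = Σ a_ij(q) q'_i q'_j."""
    a = metric(*q)
    return sum(a[i][j] * qdot[i] * qdot[j] for i in range(2) for j in range(2))

def two_T_cartesian(q, qdot):
    """2T = m(x'² + y'²) with x = r cos θ, y = r sin θ differentiated in time."""
    r, theta = q
    rdot, thetadot = qdot
    xdot = rdot * math.cos(theta) - r * math.sin(theta) * thetadot
    ydot = rdot * math.sin(theta) + r * math.cos(theta) * thetadot
    return MASS * (xdot ** 2 + ydot ** 2)

q, qdot = (2.0, 0.7), (0.3, -1.1)   # sample configuration and velocities
print(two_T_quadratic_form(q, qdot), two_T_cartesian(q, qdot))
```

The metric depends on the configuration (through r), which is why the configuration space is in general not Euclidean in these coordinates.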

Variations

Variational methods are at the heart of analytical mechanics. An excellent example of their power is furnished by Statics, which gives the requirements for equilibrium. Everything is contained in the simple statement that δW = 0 for arbitrary, reversible variations δq of the q's, which is dignified by the title Principle of Virtual Work. This implies that F = 0 for each degree of freedom, which we obtain by varying only the q concerned. If we instead consider, for example, a variation in the x-coordinate, δx, the corresponding variations in the q's can be found in terms of δx, and we find that some linear combination of the F's must now be zero, and this linear combination is just the sum of the x-components of the applied forces, so we have the familiar requirement that ΣF = 0 for equilibrium. Should we consider a rotation about some axis, we find that ΣM = 0, or the sum of the moments about that axis vanishes. We do not get new conditions for equilibrium from the variational method, but we are able to express them easily for any coordinate system. If the forces are monogenic, we find that the work function W is stationary at the equilibrium values of the q's. The equilibrium is stable if this point is a maximum of W, and unstable if it is a minimum. When W = -V, we get the usual statement.
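
For a concrete case, assume again a pendulum with W(θ) = cos θ in units where mgl = 1 (an illustrative choice). The sketch below tests whether a given θ is an equilibrium (dW/dθ = 0) and classifies it by the sign of d²W/dθ², both estimated by finite differences: a maximum of W, that is a minimum of V, is stable.

```python
import math

def W(theta):
    """Pendulum work function W = cos θ (units with m g l = 1, assumed)."""
    return math.cos(theta)

def first_derivative(f, x, h=1e-5):
    """Central-difference estimate of df/dx."""
    return (f(x + h) - f(x - h)) / (2 * h)

def second_derivative(f, x, h=1e-4):
    """Central-difference estimate of d²f/dx²."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

def classify(f, x, tol=1e-6):
    """Classify x using the work function f (stable where f has a maximum)."""
    if abs(first_derivative(f, x)) > tol:
        return "not an equilibrium"
    curvature = second_derivative(f, x)
    if curvature < 0:
        return "stable (maximum of W, minimum of V)"
    if curvature > 0:
        return "unstable (minimum of W, maximum of V)"
    return "undetermined at second order"

for theta in (0.0, math.pi, 1.0):
    print(f"θ = {theta:.4f}: {classify(W, theta)}")
```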

The distinction between the δ for a variation and d for an infinitesimal change can be significant. Both are small and approach zero in the limit, but their interpretations are quite different. δW can be a linear combination of δq's, and does not imply that there is necessarily some function W(q) of which this is a change. On the other hand, dW does imply that there is a function W(q), and that it can be expressed as a sum of partial derivatives times the dq's. d(δW) is the differential of a variation in W, while δ(dW) is the variation of a differential change in W. These quantities are equal, so that δ commutes with d, but this has to be proved. The distinction between δq and dq is less clear-cut, but the first is a kind of 'mathematical experiment' according to Lanczos, while the latter is an actual displacement of the system (a change in any q is called a 'displacement').

The methods of finding stationary points of a function are familiar from calculus: the partial derivatives with respect to each independent coordinate are set equal to zero. It may be convenient to use a larger number of coordinates constrained by some relations like f(q) = 0. Of course, we may use the constraint to solve for one coordinate in terms of the others, and then eliminate this coordinate from the function, so that the stationary point can be found in the usual way. Lagrange devised a remarkable method for attacking this problem without removing a variable and perhaps destroying some symmetry in the problem. His method, well explained by Lanczos, is to form the augmented function W'(q) = W(q) + λf(q), and then to consider all the coordinates as independent. There is one undetermined multiplier λ for each constraint. If we have N coordinates and M constraints (so N - M degrees of freedom), we now have a problem in N + M unknowns, the q's and the λ's. Setting the N partial derivatives of W' equal to zero gives N equations containing the λ's; together with the M constraint equations, these determine the N q's of the stationary point and the M λ's, and all is known. To see why this works, note that the difference between W' and W, namely λf, vanishes when the constraint holds.
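
A minimal sketch of the multiplier method, assuming the purely illustrative work function W(x, y) = -(x² + y²) and the single constraint f = x + y - 1 = 0. Setting the derivatives of W' = W + λf to zero, together with f = 0, gives three linear equations for x, y and λ, solved here by Gaussian elimination; the answer is then compared with a direct search along the constraint line y = 1 - x.

```python
def solve_linear(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def W(x, y):
    """Illustrative work function (assumed): W = -(x² + y²)."""
    return -(x * x + y * y)

# dW'/dx = -2x + λ = 0,  dW'/dy = -2y + λ = 0,  and the constraint x + y = 1
A = [[-2.0, 0.0, 1.0],
     [0.0, -2.0, 1.0],
     [1.0, 1.0, 0.0]]
B = [0.0, 0.0, 1.0]
x, y, lam = solve_linear(A, B)
print("multiplier method: x =", x, " y =", y, " λ =", lam)

# Check by eliminating y = 1 - x and scanning W along the constraint line.
grid = [-2.0 + 1e-4 * i for i in range(40001)]
x_direct = max(grid, key=lambda xv: W(xv, 1.0 - xv))
print("direct elimination: x ≈", x_direct)
```

Both routes give x = y = 1/2; the multiplier method also yields λ, which is not needed for the stationary point itself.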

Dynamics

d'Alembert made a crucial step when he allowed for inertia by including the negative of the force causing acceleration in the equations of statics. The virtual work of this inertial force is -ΣmA·δR, where the sum is over the particles, of mass m, in the system, A is the acceleration and δR is the virtual displacement of the position vector R. The variational principle -ΣmA·δR + δW = 0 is called d'Alembert's Principle. If the constraints do not depend on the time, we may choose for δR the actual displacement in time dt, δR = v dt; then ΣmA·v dt = d(Σmv²/2) = dT, where T = Σmv²/2 is the kinetic energy. If the forces are monogenic, then δW = dW, and in both cases these are the changes in the actual motion. We then have dT - dW = dT + dV = d(T + V) = dE = 0, from which E = constant.

When d'Alembert's principle is applied in an accelerated reference frame, the inertial forces appear explicitly. If we are in an accelerated frame, and are considering dynamics within it without knowledge of its acceleration, certain forces appear that are proportional to mass. The gravitational force is indistinguishable from the apparent force resulting from an upwards acceleration g in the absence of the earth; both are of amount mg directed downwards. In a rotating system, these extra forces are the Coriolis force, proportional to velocity, the centrifugal force, independent of velocity, and a third force arising from the angular acceleration of the reference frame, which Lanczos calls the Euler force. These forces must be added to the applied forces to determine the acceleration of a body relative to the frame. Contrary to the opinion in School Physics, centrifugal force is a quite proper concept, usually experienced directly when one is in the accelerated system. School Physics has enough trouble making the concept of an inertial frame clear and teaching the free-body diagram without wandering into accelerated reference frames; yet it is 'centripetal' force that does not actually exist as a separate force, since it is only a name for the resultant of the real forces that produce the inward acceleration.

Let's find a new way to write Newton's equation of motion. If T = mv²/2, then dT/dv = mv, the momentum. Then (d/dt)dT/dv = m dv/dt = F = dW/dx = -dV/dx, so we have (d/dt)dT/dv = -dV/dx. Now, we note that dT/dx = 0 and dV/dv = 0, so if we introduce a new function L = T - V, we can write the equation of motion in the form (d/dt)dL/dv = dL/dx. The derivatives with respect to v and x should be written as partials, since now L = L(x,v) = L(x,x'), where the prime stands for time derivative. This is the form of what is called Lagrange's Equation, but this demonstration shows none of the great advantages that are obtained by expressing all the dynamics of a system in one function of the coordinates and their velocities, the Lagrangian L. It is not hard to see, however, that T and V can be written using generalized coordinates, and then Lagrange's Equation will yield the equations of motion, but this could be equally well done without introducing L at all. The justification for L arises in a different way.

Because of browser limitations, let us write the definite integral between limits t = a and t = b as the operator I(a,b,dt). Then, if we write L = L(q,q',t), we assert that the motion of the system is such as to make δI(a,b,dt)L(q,q',t) = 0, where there is no variation of q at the end points a and b. This is called Hamilton's Principle, and is equivalent to earlier variational principles of 'least action' proposed by different investigators. The main problem is finding the form of L(q,q',t) appropriate to the system. For conservative systems, L = T - V suffices, but the applicability of the principle is much wider. Hamilton's principle is manifestly invariant under coordinate transformation, since only the value of L matters. This gives a wide applicability to all its consequences, which are many.
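
To see Hamilton's Principle at work numerically, here is a sketch under stated assumptions: a harmonic oscillator with L = mq'²/2 - kq²/2 (m = k = 1), the actual motion q = cos t on 0 ≤ t ≤ 1, and a variation ε sin(πt) that vanishes at both end points. The integral is approximated by a sum over small time steps. The first-order change in the integral should vanish; because the interval is shorter than half a period, the integral is here actually a minimum, so the second-order change is positive.

```python
import math

MASS, K = 1.0, 1.0                # assumed oscillator parameters (ω = 1)
T_END, STEPS = 1.0, 2000          # interval 0 ≤ t ≤ 1, shorter than half a period
DT = T_END / STEPS
TIMES = [i * DT for i in range(STEPS + 1)]

def lagrangian(q, qdot):
    """L = T - V for the harmonic oscillator."""
    return 0.5 * MASS * qdot ** 2 - 0.5 * K * q ** 2

def action(path):
    """Approximate I(a,b,dt)L: differences for q', midpoint values for q."""
    total = 0.0
    for i in range(STEPS):
        qdot = (path[i + 1] - path[i]) / DT
        qmid = 0.5 * (path[i] + path[i + 1])
        total += lagrangian(qmid, qdot) * DT
    return total

def varied_path(eps):
    """Actual motion cos t plus eps·sin(πt/T), a variation vanishing at both ends."""
    return [math.cos(t) + eps * math.sin(math.pi * t / T_END) for t in TIMES]

eps = 1e-3
first_variation = (action(varied_path(eps)) - action(varied_path(-eps))) / (2 * eps)
second_order_change = action(varied_path(eps)) - action(varied_path(0.0))
print("first-order change per unit ε:", first_variation)   # close to zero
print("I(ε) - I(0):", second_order_change)                  # small and positive
```

Replacing cos t by a path that does not satisfy the equation of motion would leave a first-order change that does not vanish.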

The integral is not used directly. Instead, it implies differential equations that the coordinates must satisfy. These are the Euler-Lagrange equations, of exactly the form of Lagrange's Equation derived for a very special case above. Now we know that they are much more general, applying also when T depends on q and W on q', and when either depends explicitly on the time. By separating the T and V parts of the equation, we can identify an inertial side depending on T, and a force side depending on V, with more general definitions of force and momentum than in Newtonian vector mechanics. The variational principle is used to determine the equations of motion; solving them is another thing. However, the generality gained allows us to use as many tricks of coordinate transformation as we can. Note that whatever the coordinate transformation, the new equations of motion have the same form as the old (i.e., Lagrange's Equation).

The value of δI(a,b,dt)L can be expressed by partial integration as a part that vanishes for any δq between a and b when the Euler-Lagrange equations hold, plus a term depending on the variations at the limits, Σ(dL/dq')δq evaluated at t = b minus the same quantity evaluated at t = a. The quantity dL/dq' is represented by p, and is called the generalized momentum. In Hamilton's Principle, δq is required to vanish at these points, but here we drop that requirement and consider the particular variation obtained by shifting the actual motion forward in time by a small interval ε, so that δq = q'ε. If L does not depend explicitly on t, i.e., for a scleronomic system, the shifted integral differs from the original only at the ends of the interval, so δI(a,b,dt)L = [L(b) - L(a)]ε. The boundary term gives δI(a,b,dt)L = [Σpq'(b) - Σpq'(a)]ε. Equating the two, Σpq'(b) - L(b) = Σpq'(a) - L(a). The quantity Σpq' - L is, therefore, a constant of the motion that can be identified with the total energy for a conservative system. If the system is not conservative, it is still a constant, but not the total energy. A full derivation is given in Lanczos.
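
A numerical sketch of this conservation law, assuming a simple pendulum with L = ml²θ'²/2 + mgl cos θ (illustrative values m = l = 1, g = 9.81). Lagrange's equation θ'' = -(g/l) sin θ is integrated with the classical Runge-Kutta method (our choice of integrator), and the quantity h = pθ' - L, with p = ml²θ', is monitored along the motion. Here h works out to ml²θ'²/2 - mgl cos θ, the total energy T + V, and its variation should be limited to the small integration error.

```python
import math

MASS, GRAV, LENGTH = 1.0, 9.81, 1.0   # illustrative pendulum parameters

def lagrangian(theta, omega):
    """L = T - V for a simple pendulum, θ measured from the downward vertical."""
    return 0.5 * MASS * LENGTH ** 2 * omega ** 2 + MASS * GRAV * LENGTH * math.cos(theta)

def momentum(omega):
    """Generalized momentum p = dL/dθ' = m l² θ'."""
    return MASS * LENGTH ** 2 * omega

def energy_function(theta, omega):
    """h = p θ' - L, the quantity shown above to be constant."""
    return momentum(omega) * omega - lagrangian(theta, omega)

def rhs(state):
    """Lagrange's equation θ'' = -(g/l) sin θ as a first-order system."""
    theta, omega = state
    return (omega, -(GRAV / LENGTH) * math.sin(theta))

def rk4_step(state, dt):
    """One step of the classical fourth-order Runge-Kutta method."""
    k1 = rhs(state)
    k2 = rhs(tuple(s + 0.5 * dt * d for s, d in zip(state, k1)))
    k3 = rhs(tuple(s + 0.5 * dt * d for s, d in zip(state, k2)))
    k4 = rhs(tuple(s + dt * d for s, d in zip(state, k3)))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

state = (2.0, 0.0)          # released from rest at a large angle (radians)
h0 = energy_function(*state)
max_drift = 0.0
for _ in range(10_000):     # 10 s in steps of 1 ms
    state = rk4_step(state, 1e-3)
    max_drift = max(max_drift, abs(energy_function(*state) - h0))
print("h(0) =", h0, " max |h(t) - h(0)| =", max_drift)
```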

The Canonical Equations

The conserved quantity we have just found, pq' - L, has a very suggestive form (the sum over pq' is understood). Legendre's Dual Transformation starts with a function F(u), where u represents a set of variables. For each u there is a v = dF/du (in general this is a partial derivative). The new function G = uv - F, where uv represents the sum over all the variables, is the Legendre dual of F. We imagine it expressed in terms of the v's alone, G(v). Working out its variation, δG = (dG/dv)δv = uδv + vδu - δF = uδv + (v - dF/du)δu = uδv. By good luck, the coefficient of δu is zero, so we do not have to work out δu in terms of δv. Remarkably, we find that u = dG/dv, just as v = dF/du. If there is some coordinate x that does not share in the transformation, we have dG/dx = -dF/dx.
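
A small sketch of the dual transformation, assuming the illustrative convex function F(u) = u⁴/4. Then v = dF/du = u³, so u = v^(1/3) and G(v) = uv - F = (3/4)|v|^(4/3). The code builds G numerically from F and checks the reciprocal relation u = dG/dv by central differences.

```python
def F(u):
    """Illustrative convex function F(u) = u⁴/4 (assumed for this sketch)."""
    return u ** 4 / 4

def dF(u):
    """v = dF/du = u³."""
    return u ** 3

def u_of_v(v):
    """Invert v = u³ with a real cube root that keeps the sign of v."""
    return abs(v) ** (1 / 3) * (1.0 if v >= 0 else -1.0)

def G(v):
    """Legendre dual G = uv - F(u), with u eliminated in favour of v."""
    u = u_of_v(v)
    return u * v - F(u)

def derivative(f, x, h=1e-6):
    """Central-difference estimate of df/dx."""
    return (f(x + h) - f(x - h)) / (2 * h)

SAMPLES = (0.5, 1.0, -1.7)
for u in SAMPLES:
    v = dF(u)
    print(f"u = {u:+.3f}  v = {v:+.4f}  dG/dv = {derivative(G, v):+.6f}  "
          f"G = {G(v):.6f}  (3/4)|v|^(4/3) = {0.75 * abs(v) ** (4 / 3):.6f}")
```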

The function H(p,q) = pq' - L(q,q') is just the Legendre transform of the Lagrangian, with the property that dH/dp = q'. The q did not partake of the transformation, so dH/dq = -dL/dq = -p' (by Lagrange's Equation). The equations of motion are now first order, not second, and the number of variables has been increased from N q's to 2N p's and q's. H is called the Hamiltonian, and q' = dH/dp, p' = -dH/dq are called the canonical equations. We have nothing new physically at all, but the new description is remarkably symmetrical. The first-order, symmetrical form of the equations of motion is what made the Hamiltonian formalism so suggestive and valuable in quantum mechanics.

Now we can invert the transformation to obtain the variational principle δI(a,b,dt)[pq' - H(p,q)] = 0, where δp and δq vanish at the end points. The Euler-Lagrange equations for this problem in 2N variables are the canonical equations. The integrand is expressed in terms of q and q', p and p'; in fact, p' does not appear. The first term, a sum with N terms, is the 'kinetic' part of the new Lagrangian, and is a linear function of the velocities q'. The second term, the Hamiltonian, is the 'force' part, and depends only on the 'coordinates' q and p. If we define complex coordinates √2 u = q + ip and √2 u* = q - ip, the canonical equations become du/dt = -i(dH/du*).
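
For the harmonic oscillator H = (p² + q²)/2 (assuming m = k = 1), the complex coordinate u gives H = uu*, so dH/du* = u and the complex canonical equation is simply du/dt = -iu. The sketch integrates this one complex equation with a Runge-Kutta step (our choice) and recovers q and p as √2 times the real and imaginary parts of u, for comparison with the familiar solution of the real canonical equations.

```python
import math

def du_dt(u):
    """du/dt = -i dH/du* with H = u u* (harmonic oscillator, m = k = 1 assumed)."""
    return -1j * u

def rk4(u, dt, steps):
    """Integrate the complex equation with the classical Runge-Kutta method."""
    for _ in range(steps):
        k1 = du_dt(u)
        k2 = du_dt(u + 0.5 * dt * k1)
        k3 = du_dt(u + 0.5 * dt * k2)
        k4 = du_dt(u + dt * k3)
        u += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return u

q0, p0, t = 1.0, 0.5, 2.0
u = rk4((q0 + 1j * p0) / math.sqrt(2), dt=1e-3, steps=2000)   # integrate to t = 2
q, p = math.sqrt(2) * u.real, math.sqrt(2) * u.imag
print("q =", q, " expected", q0 * math.cos(t) + p0 * math.sin(t))
print("p =", p, " expected", p0 * math.cos(t) - q0 * math.sin(t))
```

The exact solution is u(t) = u(0)e^(-it): the point u simply rotates in the complex plane.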

We have dH/dt = (dH/dq)q' + (dH/dp)p' = (dH/dq)(dH/dp) - (dH/dp)(dH/dq) = 0, so H is a constant of the motion if it does not explicitly contain the time. H(p,q) = E defines a surface in the 2N-dimensional phase space with coordinates p and q, and the system remains on this surface as time passes. A point representing the state of the system has a velocity given by the canonical equations, which does not depend explicitly on the time, so the movement is analogous to the steady or streamline motion of a fluid. Liouville's Theorem is the statement that this fluid is incompressible, or that phase-space volumes are preserved. Liouville's theorem is the analogue of div v = 0, or (d/dq)q' + (d/dp)p' = (d/dq)(dH/dp) - (d/dp)(dH/dq) = 0, by the equality of cross second (partial) derivatives.
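
Liouville's theorem can be checked numerically. The sketch assumes the pendulum Hamiltonian H = p²/2 - cos q (units with m = g = l = 1), integrates the canonical equations with the classical Runge-Kutta method (our choice), and estimates by finite differences the Jacobian determinant of the map that carries (q, p) at t = 0 into (q, p) at t = 3. Area preservation means this determinant should be 1; the constancy of H is checked as well.

```python
import math

def velocity(q, p):
    """Canonical equations for H = p²/2 - cos q: q' = dH/dp, p' = -dH/dq."""
    return p, -math.sin(q)

def flow(q, p, t=3.0, steps=3000):
    """Carry (q, p) forward by time t using classical RK4 steps."""
    dt = t / steps
    for _ in range(steps):
        k1 = velocity(q, p)
        k2 = velocity(q + 0.5 * dt * k1[0], p + 0.5 * dt * k1[1])
        k3 = velocity(q + 0.5 * dt * k2[0], p + 0.5 * dt * k2[1])
        k4 = velocity(q + dt * k3[0], p + dt * k3[1])
        q += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        p += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return q, p

def jacobian_determinant(q, p, h=1e-5):
    """det of d(q_t, p_t)/d(q_0, p_0), estimated by central differences."""
    qa, pa = flow(q + h, p)
    qb, pb = flow(q - h, p)
    qc, pc = flow(q, p + h)
    qd, pd = flow(q, p - h)
    dq_dq0, dp_dq0 = (qa - qb) / (2 * h), (pa - pb) / (2 * h)
    dq_dp0, dp_dp0 = (qc - qd) / (2 * h), (pc - pd) / (2 * h)
    return dq_dq0 * dp_dp0 - dq_dp0 * dp_dq0

def hamiltonian(q, p):
    return 0.5 * p * p - math.cos(q)

q0, p0 = 1.0, 0.8
print("det J =", jacobian_determinant(q0, p0))
print("H before:", hamiltonian(q0, p0), " after:", hamiltonian(*flow(q0, p0)))
```

The individual entries of the Jacobian change considerably (a small square of initial states is sheared), but its determinant, the area ratio, stays at 1.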

If a certain coordinate q does not appear in the Hamiltonian, the canonical equations give p' = 0 or p = C, a constant, for the momentum conjugate to q. In the variational principle, the term pq' becomes Cq' and integrates immediately, giving no contribution to the variation. In H, p is replaced by the constant C. Such variables are called ignorable, and are easily handled in the Hamiltonian formalism.

Simple Examples

Suppose a particle of mass m is attracted to the origin by a force of magnitude ΓMm/r². Choose polar coordinates r, θ. The kinetic energy can easily be written down: 2T = m(r'² + r²θ'²). The potential energy is V = -ΓMm/r. The momenta are p_r = mr' and p_θ = mr²θ'. The coordinate θ does not appear in the Hamiltonian, and so it is ignorable. We set p_θ = L, the constant angular momentum (not to be confused with the Lagrangian), and so H = p²/2m + L²/2mr² - ΓMm/r, where p = p_r is the radial momentum. The canonical equation for r' just gives us back the definition of the radial momentum, but the one for p gives p' = -dH/dr = L²/mr³ - ΓMm/r² = mr'', which is the differential equation for r(t). In the case of a circular orbit of radius r = a, p' = 0, so L² = m²ΓMa; the angular momentum is also L = 2πma²/T, where T here denotes the period of the orbit. Combining these two equations, we find a³/T² = ΓM/4π², which is Kepler's third law. The constancy of L = mr²θ' gives Kepler's second law, since the areal velocity r²θ'/2 = L/2m is constant.
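
The circular-orbit argument can be checked with a few lines of arithmetic, assuming units with ΓM = 1 and m = 1 (illustrative). For each radius a, the sketch takes L² = m²ΓMa, confirms that p' = L²/mr³ - ΓMm/r² vanishes at r = a, obtains the period from L = 2πma²/T, and compares a³/T² with ΓM/4π².

```python
import math

GM, MASS = 1.0, 1.0   # assumed units: ΓM = 1, m = 1

def radial_force(r, ang_mom):
    """p' = -dH/dr for H = p²/2m + L²/2mr² - ΓMm/r (L = angular momentum)."""
    return ang_mom ** 2 / (MASS * r ** 3) - GM * MASS / r ** 2

results = []
for a in (0.5, 1.0, 4.0):
    ang_mom = MASS * math.sqrt(GM * a)              # L² = m²ΓMa
    period = 2 * math.pi * MASS * a ** 2 / ang_mom  # from L = 2πma²/T
    results.append((a, radial_force(a, ang_mom), a ** 3 / period ** 2))

for a, force, kepler in results:
    print(f"a = {a}: p' = {force:.1e}, a³/T² = {kepler:.6f}")
print(f"ΓM/4π² = {GM / (4 * math.pi ** 2):.6f}")
```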

Suppose a particle of mass m is attracted to the origin along a straight line with a force -kx, where x is the displacement. Now 2T = mx'², and 2V = kx², so L(x,x') = (mx'² - kx²)/2. The momentum p = dL/dx' = mx', the usual linear momentum. Then H = px' - L = (px' + kx²)/2 = p²/2m + kx²/2. H is a constant of the motion. The canonical equations are x' = p/m and p' = -kx. These two first-order equations give us mx'' + kx = 0, which we recognize as the equation of motion of the harmonic oscillator.
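
A sketch of the canonical equations x' = p/m, p' = -kx integrated step by step, assuming m = 2, k = 8 (so ω = √(k/m) = 2) and the initial state x = 1, p = 0. We use the leapfrog (Störmer-Verlet) scheme, a common choice for Hamiltonian problems because each step is itself a canonical transformation; this choice is ours, not the text's. The result is compared with the exact x = x₀ cos ωt.

```python
import math

MASS, K = 2.0, 8.0                # assumed values; ω = √(k/m) = 2
OMEGA = math.sqrt(K / MASS)

def leapfrog(x, p, dt, steps):
    """Integrate x' = dH/dp = p/m, p' = -dH/dx = -kx for H = p²/2m + kx²/2."""
    for _ in range(steps):
        p -= 0.5 * dt * K * x     # half step in p
        x += dt * p / MASS        # full step in x
        p -= 0.5 * dt * K * x     # second half step in p
    return x, p

def hamiltonian(x, p):
    return p * p / (2 * MASS) + 0.5 * K * x * x

x0, p0, dt, steps = 1.0, 0.0, 1e-3, 5000
x, p = leapfrog(x0, p0, dt, steps)
t = dt * steps                    # t = 5
print("leapfrog x:", x, " exact x0 cos ωt:", x0 * math.cos(OMEGA * t))
print("change in H:", hamiltonian(x, p) - hamiltonian(x0, p0))
```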

It is clear that the advantages of the Hamiltonian formalism lie in the formulation of the equations of motion, not in their solution. That this help is not to be despised is shown by the problem of small oscillations. We can start by describing the state of the system in terms of any convenient set of generalized coordinates. If we expand the potential energy about the equilibrium position, the constant term may be dropped and the linear terms vanish because the position is one of equilibrium, so the first non-zero terms are the quadratic ones. We keep only these terms, and the potential energy becomes a symmetric quadratic form. The kinetic energy is a positive definite symmetric quadratic form, with its coefficients evaluated at equilibrium. It is possible to find a transformation of coordinates that simultaneously reduces the kinetic energy to a sum of squares, 2T = Σq'², and the potential energy to diagonal form, 2V = Σλq². Now the Hamiltonian is simply the sum of harmonic oscillator Hamiltonians, one for each degree of freedom. Each degree of freedom is now a normal mode, with its characteristic frequency √λ, and the general motion is a superposition of these normal modes. A little linear algebra is a small price to pay for untangling all of this.
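
A minimal sketch of the normal-mode calculation for two degrees of freedom, assuming an illustrative system: two equal masses m, each tied to a fixed wall by a spring k and to each other by a spring κ. Then 2T = m(q₁'² + q₂'²) and 2V = kq₁² + kq₂² + κ(q₂ - q₁)², and the values λ of the normal modes are the roots of det(K - λM) = 0, where M and K are the matrices of the two quadratic forms. For a 2×2 problem this determinant is a quadratic in λ.

```python
import math

def normal_mode_values(mass, stiff):
    """Roots λ of det(K - λM) = 0 for symmetric 2×2 mass and stiffness matrices."""
    (m11, m12), (_, m22) = mass
    (k11, k12), (_, k22) = stiff
    a = m11 * m22 - m12 ** 2
    b = -(k11 * m22 + k22 * m11 - 2 * k12 * m12)
    c = k11 * k22 - k12 ** 2
    disc = math.sqrt(b * b - 4 * a * c)
    return sorted([(-b - disc) / (2 * a), (-b + disc) / (2 * a)])

m, k, kappa = 1.0, 4.0, 1.5                    # assumed values
M = [[m, 0.0], [0.0, m]]                       # from 2T
K = [[k + kappa, -kappa], [-kappa, k + kappa]] # from 2V

lams = normal_mode_values(M, K)
freqs = [math.sqrt(lam) for lam in lams]       # characteristic frequencies √λ
print("λ:", lams)
print("ω:", freqs)
```

For these values the roots are λ = k/m = 4 and (k + 2κ)/m = 7: in the lower mode the two masses move together and the coupling spring is not stretched; in the upper mode they move in opposite directions.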

Canonical Transformations

Hamilton's Principle, in the form δI(a,b,dt)L(q,q') = 0, shows that if we make any transformation of coordinates q = f(Q), we still get exactly the same Euler-Lagrange equations, but now expressed in the new coordinates Q and their velocities Q'. This is some help, but it is difficult to improve the solubility of the problem by coordinate transformation in the Lagrange formalism.

In the Hamiltonian formalism, we have δI(a,b,dt)[pq' - H(p,q)] = 0. Any transformation of the 2N variables p,q would still give the Euler-Lagrange equations for this problem, but these would not, in general, reduce to the canonical equations, which depend on the form of the integrand. A canonical transformation is a transformation that preserves the form of the integrand, and therefore the canonical equations. A point transformation q = f(Q) of the coordinates alone, like that used with the Lagrangian, turns out to be canonical when the momenta are transformed accordingly, but this is a very limited family of transformations. It is far more useful to mix the p's and q's in the transformation, so that the new coordinates Q are functions of both p and q, as are the new momenta P. This can be done by requiring the form pδq to be invariant, that is, to become the form PδQ in the transformed coordinates. This is still not the most general canonical transformation, however.

If pδq - PδQ = δS(q,Q) the transformation will be canonical, since the term in S will integrate and give only boundary terms, which will vanish since the variation at the limits is zero. Since δS = (dS/dq)δq + (dS/dQ)δQ, and the variations are independent, we find that P = -dS/dQ and p = dS/dq. [don't forget that these expressions are really sums over all the coordinates] This gives us 2N equations defining the transformation implicitly. The canonical transformation is completely specified by the generating function S(q,Q). For example, if S = qQ, we find P = -q and p = Q. In this transformation, the coordinates and momenta are interchanged.

It is always fruitful to look for invariants under transformation. They will give us properties that are not accidental, but persistent and characteristic. Here, they will turn out to be the constants of the motion. When the form pδq is integrated around a closed curve in phase space, the value of this integral is invariant, because δS integrates to zero around a closed curve. This is something like the conservation of circulation in hydrodynamics. We can appeal to an analogue of Stokes's Theorem to convert the line integral around the closed curve to an area integral over a surface bounded by the curve. We require two parameters to specify location on the surface, say u and v. The result is that Σ[(dq/du)(dp/dv) - (dp/du)(dq/dv)] is the quantity that must be integrated over an area to give the line integral of pδq. We have indicated the sum to make it clear that it runs over all N degrees of freedom; the derivatives are actually partials. This quantity is abbreviated [u,v], and is called the Lagrange bracket of u and v. It must be an invariant under canonical transformation, independent of the coordinate system used to specify it.

Suppose we have a canonical transformation from q,p to Q,P. Consider the Lagrange bracket [Q,Q] (these are two different Q's, or the same one). The bracket can easily be evaluated in the new system, and is zero because all dP/dQ are zero. Similarly, [P,P] = 0. [Q,P] = 0 as well, except when Q and P are conjugate, and then it equals unity (from (dQ/dQ)(dP/dP) = 1). Since these values are invariant, the same values must be obtained when they are evaluated in the old system, using the transformation equations Q = f(p,q) and P = g(p,q). This may be used as a test to see if a given transformation is canonical. If it is, the brackets will be correct.

If we consider u and v as functions of q,p instead of the other way around, we can form the expression {u,v} = Σ[(du/dq)(dv/dp) - (dv/dq)(du/dp)], called the Poisson bracket of u and v. This is also an invariant, and has the property that {Q,Q} = 0, {P,P} = 0, and {Q,P} = 0 except when Q,P are conjugate, when it equals unity. The Poisson bracket can also be used to test for a canonical transformation. For the transformation generated by S = qQ, we have Q = p and P = -q, so {Q,Q} = (dQ/dq)(dQ/dp) - (dQ/dp)(dQ/dq) = 0, {P,P} = 0, obviously, and {Q,P} = (dQ/dq)(dP/dp) - (dP/dq)(dQ/dp) = 0·0 - (-1)(1) = 1, so the transformation is canonical.
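
The bracket test is easy to automate for one degree of freedom, where {Q,Q} = {P,P} = 0 hold trivially and only {Q,P} = 1 needs checking. The sketch below (its function names are our own) evaluates the Poisson bracket by central differences at a few sample points, for the transformation generated by S = qQ, for a rotation in phase space, and for the scaling Q = 2q, P = p, which fails the test.

```python
import math

def poisson_bracket(u, v, q, p, h=1e-6):
    """{u,v} = (du/dq)(dv/dp) - (dv/dq)(du/dp) for one degree of freedom."""
    du_dq = (u(q + h, p) - u(q - h, p)) / (2 * h)
    du_dp = (u(q, p + h) - u(q, p - h)) / (2 * h)
    dv_dq = (v(q + h, p) - v(q - h, p)) / (2 * h)
    dv_dp = (v(q, p + h) - v(q, p - h)) / (2 * h)
    return du_dq * dv_dp - dv_dq * du_dp

def is_canonical(Q, P, points, tol=1e-6):
    """With one degree of freedom only {Q,P} = 1 needs to be tested."""
    return all(abs(poisson_bracket(Q, P, q, p) - 1.0) < tol for q, p in points)

POINTS = [(0.3, -1.2), (1.0, 0.5), (-2.0, 2.5)]
ANGLE = 0.7   # arbitrary angle for the phase-space rotation

TRANSFORMS = {
    # generated by S = qQ: p = dS/dq = Q, P = -dS/dQ = -q
    "S = qQ": (lambda q, p: p, lambda q, p: -q),
    # rotation of the (q, p) plane, as in the oscillator's time evolution
    "rotation": (lambda q, p: q * math.cos(ANGLE) + p * math.sin(ANGLE),
                 lambda q, p: p * math.cos(ANGLE) - q * math.sin(ANGLE)),
    # stretches q without compensating p, so {Q,P} = 2
    "scaling": (lambda q, p: 2 * q, lambda q, p: p),
}

for name, (Q, P) in TRANSFORMS.items():
    print(f"{name}: {{Q,P}} = {poisson_bracket(Q, P, 1.0, 0.5):.6f}, "
          f"canonical: {is_canonical(Q, P, POINTS)}")
```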

The antisymmetry of the brackets, {u,v} = -{v,u}, and their invariance under canonical transformation suggested their replacement by commutators of operators in quantum mechanics, as well as the concept of conjugate operators. In fact, this is a valid way to 'quantize' a mechanical system.

Hamilton-Jacobi Theory

If a canonical transformation T is generated by S, and another, T*, by S*, then the product transformation, T followed by T*, is generated by S + S*, as can be seen by adding the corresponding forms for the successive transformations: pδq - PδQ = δS and PδQ - P*δQ* = δS* give pδq - P*δQ* = δ(S + S*). In general the order matters, and T followed by T* is not the same transformation as T* followed by T. S = 0 generates the identity transformation, and S* = -S the inverse of T. Since we are dealing with successive transformations, the associative law (TT*)T' = T(T*T') holds. Therefore, canonical transformations form a group, though not a commutative one.
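
For one degree of freedom, linear canonical transformations of (q, p) are 2×2 matrices of unit determinant, which makes the group properties easy to see. The sketch assumes two illustrative examples, a rotation R in the (q, p) plane and a 'squeeze' D: Q = 2q, P = p/2. Both products are again canonical, D followed by its inverse is the identity, and the two orders of composition give different transformations.

```python
import math

def matmul(a, b):
    """Product of 2×2 matrices; matmul(a, b) applies b first, then a."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det(a):
    """Determinant of a 2×2 matrix; equal to 1 for a canonical linear map."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

c, s = math.cos(0.4), math.sin(0.4)
R = [[c, s], [-s, c]]               # rotation: Q = cq + sp, P = -sq + cp
D = [[2.0, 0.0], [0.0, 0.5]]        # squeeze: Q = 2q, P = p/2
D_INV = [[0.5, 0.0], [0.0, 2.0]]

RD, DR = matmul(R, D), matmul(D, R)
print("det RD =", det(RD), " det DR =", det(DR))   # closure: both are 1
print("D D⁻¹ =", matmul(D, D_INV))                 # identity
print("RD =", RD)
print("DR =", DR)                                  # not equal to RD
```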

Suppose S(Q,q,t) takes the initial Q,P into q,p that change with time. In the small interval of time Δt, an infinitesimal canonical transformation takes q into the neighbouring value q + Δq, and p into p + Δp, where Δpδq - Δqδp = δ[(dS/dt)Δt]. At any time, p = dS/dq, and we can use this relation to eliminate Q from dS/dt, obtaining a function that we shall temporarily call -B(p,q). In terms of this function, Δpδq - Δqδp = -[(dB/dq)δq + (dB/dp)δp]Δt. On equating coefficients of the variations, we have Δq = (dB/dp)Δt and Δp = -(dB/dq)Δt. In the limit as Δt goes to zero, this gives q' = dB/dp, p' = -dB/dq (primes are total time derivatives). These are the canonical equations!

Therefore, we can identify B = H(p,q), the Hamiltonian, and obtain the differential equation dS/dt + H(dS/dq,q) = 0. The solution of this equation gives the generating function S(t) that describes the evolution of the system from Q,P at t = 0 to q,p at time t. This remarkable theory was first brought forward by Hamilton, but Jacobi related it to canonical transformations and gave it its final form. The equation dS/dt + H = 0 is called the Hamilton-Jacobi equation, and is the final pinnacle in the theory.

Further consideration of the time dependence leads to the following recipe for solving a Hamiltonian problem. Write the Hamiltonian H(p,q) and equate it to the energy E. Replace p by dS/dq, and find a general solution for S containing, besides E, N - 1 constants of integration α. For each constant α, set dS/dα = β, where β is a new constant, and write down dS/dE = t - τ, where τ is also a constant. These equations can be solved for q = q(E,α,β,t - τ), which is the desired solution.

To make this clearer, let us look at a harmonic oscillator with H = (p² + q²)/2. For convenience, we have set m = k = 1, which gives ω = 1 as well. The analysis can be carried out for other m and k in exactly the same way. The equation to be solved is (dS/dq)² + q² = 2E. This can be integrated directly to give S = E[cos⁻¹v - v√(1 - v²)], where v = q/√(2E). As N = 1 in this case, no constants α other than E are needed. Now dS/dE = cos⁻¹v = t - τ, so q = √(2E) cos(t - τ), which we recognize as the solution we know so well.
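
The solution can be verified numerically. Assuming the illustrative values E = 0.8 and q = 0.4 (any |q| < √(2E) would do), the sketch checks by central differences that p = dS/dq satisfies (p² + q²)/2 = E, that dS/dE = cos⁻¹v, and that inverting dS/dE = t - τ returns the original q.

```python
import math

def S(q, E):
    """S = E[cos⁻¹v - v√(1 - v²)] with v = q/√(2E), for H = (p² + q²)/2."""
    v = q / math.sqrt(2 * E)
    return E * (math.acos(v) - v * math.sqrt(1 - v * v))

def partial(f, x, h=1e-6):
    """Central-difference estimate of df/dx."""
    return (f(x + h) - f(x - h)) / (2 * h)

E, q = 0.8, 0.4                          # assumed values, |q| < √(2E)
p = partial(lambda x: S(x, E), q)        # p = dS/dq
print("(p² + q²)/2 =", (p * p + q * q) / 2, " E =", E)

dS_dE = partial(lambda e: S(q, e), E)    # dS/dE
print("dS/dE =", dS_dE, " cos⁻¹v =", math.acos(q / math.sqrt(2 * E)))

tau = 0.0                                # choose the constant τ = 0
t = dS_dE + tau                          # dS/dE = t - τ
print("√(2E) cos(t - τ) =", math.sqrt(2 * E) * math.cos(t - tau), " q =", q)
```

This S gives the negative root p = -√(2E - q²); the positive root, on the other half of the orbit, corresponds to the other sign of the square root.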

The canonical transformation that carries the initial values into the values at time t is q = P sin t + Q cos t, p = P cos t - Q sin t, a rotation in phase space, where P and Q are the initial values of the momentum and displacement. For general m and k the phase-space orbit is an ellipse, but here it is a circle because of the simple choice of parameters. Since N = 1, the motion is determined by a single constant, the energy, apart from the phase τ. When N > 1, we need additional constants. If the Hamiltonian can be expressed as a sum of harmonic oscillator Hamiltonians, as in the theory of small vibrations, then the constants can be the separate energies of the oscillators, and S is a sum of terms, one for each oscillator, each representing a normal mode of vibration.

Mechanics and Waves

Let's study the motion of a particle in a potential energy V(x,y,z). The energy equation is (px² + py² + pz²)/2m + V(x,y,z) = E, so the Hamilton-Jacobi equation becomes (dS/dx)² + (dS/dy)² + (dS/dz)² = 2m(E − V). This says that |grad S| = √[2m(E − V)] = mv, the magnitude of the momentum. Hence the surface S = C + dC is found by going a distance ds = dC/|grad S| normal to the surface S = C, and, starting from any surface, we can fill space with surfaces of constant S. Now p = grad S, so the trajectories of the particle are the normals to the surfaces of constant S. This is a consequence of the variational principle; in general, an arbitrary family of curves need not be orthogonal to any family of surfaces.
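
In one dimension the surfaces of constant S reduce to points, and the construction above becomes a simple march: each step raises S by dC and moves the point a distance dC/|grad S|. The sketch assumes an illustrative potential V(x) = x²/2 with m = E = 1 (our choice, not from the text) and compares the result with the closed-form integral of √(2m(E − V)).

```python
import math

M_PART, E_TOT = 1.0, 1.0   # illustrative mass and total energy

def V(x):
    """Illustrative potential V(x) = x**2 / 2 (any V < E would do)."""
    return 0.5 * x * x

def grad_S(x):
    """|grad S| = sqrt(2 m (E - V)), the magnitude of the momentum."""
    return math.sqrt(2.0 * M_PART * (E_TOT - V(x)))

def march(S_target, dC=1e-4):
    """Step from the 'surface' S = 0 at x = 0 until S = S_target,
    moving ds = dC / |grad S| at each step."""
    x = 0.0
    for _ in range(round(S_target / dC)):
        x += dC / grad_S(x)
    return x

def S_exact(x):
    """Integral of sqrt(2 - x**2) from 0 to x (the case m = E = 1)."""
    return 0.5 * x * math.sqrt(2.0 - x * x) + math.asin(x / math.sqrt(2.0))

x_half = march(0.5)
print(x_half, S_exact(x_half))   # S at the marched position is very close to 0.5
```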

The analogy to geometrical optics is striking. In optics, the rays are the orthogonal trajectories of the wavefronts, so we expect the phase φ to be analogous to S. For light, the phase velocity is v = c/n, where n(x,y,z) is the index of refraction, which may vary with position. If the wave is of the form A exp[i(ωt − φ)], then, following a point of constant phase, time and phase differences are related by ω dt = dφ. The Hamilton-Jacobi equation now becomes (dφ/dx)² + (dφ/dy)² + (dφ/dz)² = (ωn/c)² = (2π/λ)². It was this analogy between mechanics and ray optics that led de Broglie to the concept of matter waves, with λ = h/mv.
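
To give a sense of scale for λ = h/mv, the following sketch evaluates it for an electron accelerated through 100 volts, using the non-relativistic momentum mv = √(2mT). The example energy is our choice; the constants are the SI and CODATA values.

```python
import math

H_PLANCK = 6.62607015e-34      # J s (exact SI value)
M_ELECTRON = 9.1093837015e-31  # kg (CODATA 2018)
E_CHARGE = 1.602176634e-19     # C (exact SI value)

def de_broglie_wavelength(kinetic_energy_j, mass_kg):
    """lambda = h / (m v), with m v = sqrt(2 m T) for a non-relativistic particle."""
    momentum = math.sqrt(2.0 * mass_kg * kinetic_energy_j)
    return H_PLANCK / momentum

lam = de_broglie_wavelength(100.0 * E_CHARGE, M_ELECTRON)  # electron through 100 V
print(f"{lam * 1e9:.4f} nm")  # about 0.1226 nm, comparable to atomic spacings
```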

In the scalar wave theory of light, the wave function must satisfy the wave equation. If we write the wave function in the form exp(iωt)ψ(x,y,z), the amplitude ψ must satisfy d²ψ/dx² + d²ψ/dy² + d²ψ/dz² + (2π/λ)²ψ = 0 (Helmholtz's Equation). If ψ = exp(−iφ), we find that φ satisfies |grad φ|² = (2π/λ)² − i[d²φ/dx² + d²φ/dy² + d²φ/dz²]. As λ becomes small, the ratio of the first term on the right to the second becomes large. The second term is proportional to the curvature of the wavefront. When the radius of curvature of the wavefront is much larger than the wavelength, the second term may be neglected and we recover the Hamilton-Jacobi equation for φ, showing that geometrical optics is the limit of wave optics at small wavelengths. In optics, φ is known as the eikonal, perhaps with a different normalization.
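
The size of the neglected term can be seen numerically. The sketch takes, as an illustrative choice of ours, the spherical wavefront φ = kr with k = 2π/λ, which satisfies |grad φ|² = k² exactly, and evaluates (∇²ψ + k²ψ)/(k²ψ) for ψ = exp(−iφ) by finite differences. By the relation above this should equal −i∇²φ/k² = −2i/(kr), which shrinks in proportion to λ/r.

```python
import cmath
import math

def laplacian(f, x, y, z, h):
    """Second-order central-difference Laplacian of f at (x, y, z)."""
    return (f(x + h, y, z) + f(x - h, y, z)
            + f(x, y + h, z) + f(x, y - h, z)
            + f(x, y, z + h) + f(x, y, z - h)
            - 6.0 * f(x, y, z)) / (h * h)

def helmholtz_defect(wavelength, r):
    """Relative defect (lap psi + k^2 psi) / (k^2 psi) for psi = exp(-i k r).

    phi = k r satisfies the eikonal equation |grad phi|^2 = k^2 exactly, so the
    defect measures only the neglected term -i lap(phi) / k^2 = -2i / (k r).
    """
    k = 2.0 * math.pi / wavelength

    def psi(x, y, z):
        return cmath.exp(-1j * k * math.sqrt(x * x + y * y + z * z))

    h = wavelength * 1e-3          # step well below the wavelength
    value = psi(r, 0.0, 0.0)       # evaluate on the x-axis at distance r
    return (laplacian(psi, r, 0.0, 0.0, h) + k * k * value) / (k * k * value)

for lam in (1.0, 0.1, 0.01):
    predicted = 2.0 / (2.0 * math.pi / lam * 10.0)   # |-2i/(k r)| at r = 10
    print(lam, abs(helmholtz_defect(lam, 10.0)), predicted)
```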

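If we introduce the de Broglie wavelength λ = h/mv into Helmholtz's Equation, so that (2π/λ)² = m²v²/ħ² with ħ = h/2π, and substitute m²v² = 2m(E − V), we find Schrödinger's Equation, d²ψ/dx² + d²ψ/dy² + d²ψ/dz² + (2m/ħ²)(E − V)ψ = 0.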
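
As a small check of the resulting equation, take units with ħ = m = 1 and the oscillator potential V = x²/2 used earlier (m = k = 1, so ω = 1); in one dimension the equation then reads ψ'' + 2(E − V)ψ = 0. The sketch, with names of our own, confirms by finite differences that the Gaussian ψ = exp(−x²/2) satisfies it for E = 1/2 (that is, ħω/2) and fails for a nearby energy.

```python
import math

# Assumed units: hbar = m = 1 and omega = 1, so the equation is
# psi'' + 2 (E - V) psi = 0 with V = x**2 / 2.

def V(x):
    return 0.5 * x * x

def psi0(x):
    """Trial ground state of the oscillator, a Gaussian."""
    return math.exp(-0.5 * x * x)

def residual(psi, E, x, h=1e-4):
    """psi'' + 2 (E - V) psi, with psi'' from central differences."""
    d2 = (psi(x + h) - 2.0 * psi(x) + psi(x - h)) / (h * h)
    return d2 + 2.0 * (E - V(x)) * psi(x)

xs = [-2.0, -0.5, 0.0, 0.7, 1.5]
print(max(abs(residual(psi0, 0.5, x)) for x in xs))   # essentially zero: E = 1/2 works
print(max(abs(residual(psi0, 0.6, x)) for x in xs))   # clearly nonzero for E = 0.6
```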

The Continuum

Up to now, we have been considering systems with a finite number of degrees of freedom, the coordinates describing the state of the system. The variational principle is expressed as an integral over the parameter time, and yields the trajectory of the system, giving the coordinates as functions of time. Fluids, elastic solids, electromagnetic fields in space, and other such systems cannot be described by a finite set of coordinates, and require a different approach. In these cases, we have a set of field quantities instead, say φα(x,y,z,t). It is convenient for the field quantities to form tensors under the rotation group. This is easier than it sounds: it means only that the components transform among themselves in a definite way when the axes are rotated, so that all directions in space are equivalent. A field with a single component that is unchanged by rotation is a scalar; three components are needed for a vector, and two for a spinor. This gives the general idea. The variational principle will then show how the field components vary in time and space, in terms of the parameters x, y, z and t.

The principle will be that the variation of the integral of some Lagrangian (strictly, a Lagrangian density) with respect to the field variables, taken over a finite region of space and time, is zero, the variations being required to vanish on the boundary. The description of the system is in the field variables, not the coordinates, which are now purely parameters. Just as the Lagrangian depended on q and q' = dq/dt in the discrete case, it will now depend on φ, dφ/dx, dφ/dy, dφ/dz and dφ/dt, and perhaps on x, y, z and t as well. Where previously we used integration by parts to eliminate the δq', we will now use the divergence theorem as well. The partial differential equations that result are the field equations of the problem.
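
For a single field component φ, the calculation just described can be written out as follows (the standard result, in our own notation, with ∂ for partial derivatives and ℒ for the Lagrangian density). The divergence theorem has already turned the terms in δ(∂φ) into a boundary integral, which vanishes. The last line applies the result to an illustrative density, chosen by us rather than taken from the text, that yields the ordinary wave equation.

```latex
\delta \int \mathcal{L}\,dx\,dy\,dz\,dt
  = \int \Big[ \frac{\partial \mathcal{L}}{\partial \phi}
      - \frac{\partial}{\partial t}\frac{\partial \mathcal{L}}{\partial(\partial_t \phi)}
      - \sum_{k=x,y,z} \frac{\partial}{\partial k}\frac{\partial \mathcal{L}}{\partial(\partial_k \phi)}
    \Big]\,\delta\phi\;dx\,dy\,dz\,dt = 0

\Longrightarrow\quad
\frac{\partial \mathcal{L}}{\partial \phi}
  = \frac{\partial}{\partial t}\frac{\partial \mathcal{L}}{\partial(\partial_t \phi)}
  + \sum_{k=x,y,z} \frac{\partial}{\partial k}\frac{\partial \mathcal{L}}{\partial(\partial_k \phi)}

\mathcal{L} = \tfrac12\Big[\tfrac{1}{c^2}(\partial_t\phi)^2 - (\partial_x\phi)^2 - (\partial_y\phi)^2 - (\partial_z\phi)^2\Big]
\;\Longrightarrow\;
\frac{1}{c^2}\frac{\partial^2\phi}{\partial t^2} - \nabla^2\phi = 0
```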

Just finding the field equations is not the reason for having a variational principle; indeed, the field equations are usually known first and are used to find the Lagrangian in the variational integral. From the variational principle flows every sort of delight: conservation laws, opportunities for quantization, relativistic invariance and the forms of interactions, to mention a few. We can illustrate this with the electromagnetic field.

To do this properly, we must proceed in a relativistically conscious way, but for our purposes this can be done without much trouble. We use coordinates x^α, where α = 0,1,2,3, with x^0 = ct, x^1 = x, x^2 = y, x^3 = z. Superscripts denote contravariant quantities, subscripts covariant ones. A Greek index repeated as a subscript and a superscript in the same term is summed from 0 to 3, and a repeated Latin one from 1 to 3. Lowering or raising a 1, 2 or 3 index changes the sign of the component, but lowering or raising a 0 does nothing. Hence x^α x_α = c²t² − r² = c²τ², where τ is the proper time; this is a relativistic invariant. A Lorentz transformation is a linear transformation that preserves this quantity for any 4-vector x^α. If you have studied relativity, these matters will be old friends.
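
The index rules above can be tried out in a few lines of plain Python. This sketch assumes the metric diag(+1, −1, −1, −1) implied by the sign rules, and uses a boost along x with speed 0.6c as an example of a Lorentz transformation; the function names are ours.

```python
import math

# Metric diag(+1, -1, -1, -1): lowering index 0 leaves a component alone,
# lowering index 1, 2 or 3 flips its sign.
G = (1.0, -1.0, -1.0, -1.0)

def lower(x_up):
    """Covariant components x_alpha from contravariant x^alpha."""
    return [g * x for g, x in zip(G, x_up)]

def interval(x_up):
    """x^alpha x_alpha = (ct)^2 - r^2, summed over alpha = 0..3."""
    return sum(a * b for a, b in zip(x_up, lower(x_up)))

def boost_x(x_up, beta):
    """Illustrative Lorentz transformation: boost along x with speed beta*c."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    ct, x, y, z = x_up
    return [gamma * (ct - beta * x), gamma * (x - beta * ct), y, z]

event = [5.0, 1.0, 2.0, 3.0]           # (ct, x, y, z) in arbitrary units
print(interval(event))                 # 25 - 1 - 4 - 9 = 11.0
print(interval(boost_x(event, 0.6)))   # same value, up to rounding
```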

The electromagnetic field is described by the 4-vector potential φ^α = (Φ, A), where Φ is the scalar (electric) potential and A is the vector (magnetic) potential. It is required to satisfy the Lorentz condition dφ^α/dx^α = 0 (the derivatives are partials, of course), a condition on the choice of potentials that is relativistically invariant, by which we mean only that it is not changed by a Lorentz transformation. The field tensor is F_αβ = dφ_α/dx^β − dφ_β/dx^α. It is antisymmetric, so its diagonal components are zero. The electric field is represented by F_0i, and the magnetic field by F_ij with i ≠ j. These six components are transformed into one another by a Lorentz transformation, so the separation into electric and magnetic fields is not relativistically invariant.

A suitable relativistically invariant Lagrangian must be a scalar, and the simplest one that can be formed from F_αβ is F_αβ F^αβ = 2(B² − E²). It is customary to take −1/4 of this, (E² − B²)/2, for the electromagnetic Lagrangian; a constant factor does not affect the field equations. This is a purely kinetic Lagrangian, depending only on the derivatives of the field, and is called the free Lagrangian. The analogue of the Euler-Lagrange equations yields the field equations dF^αβ/dx^β = 0. If you write them out using the Lorentz condition, you will find that they are wave equations for the components of the potential φ^α. If you write them out in terms of B and E, you will find curl B − (1/c)dE/dt = 0 and div E = 0, which are the Maxwell equations that usually have source terms in J and ρ in place of the zeros.
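
As a numerical check of F_αβ F^αβ = 2(B² − E²) and of its invariance, the sketch below builds F^αβ from arbitrary E and B, lowers both indices with the metric diag(+1, −1, −1, −1), and applies a boost along x. The placement of E and B inside F^αβ follows one common convention (ours, not necessarily the author's); an overall change of sign, as between conventions, leaves the quadratic invariant unchanged. The helper names and the boost speed 0.6c are illustrative.

```python
import math

G = (1.0, -1.0, -1.0, -1.0)  # metric diag(+1, -1, -1, -1)

def field_tensor(E, B):
    """Contravariant F^{ab} from E and B (one common sign convention)."""
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    return [
        [0.0, -Ex, -Ey, -Ez],
        [Ex, 0.0, -Bz, By],
        [Ey, Bz, 0.0, -Bx],
        [Ez, -By, Bx, 0.0],
    ]

def invariant(F):
    """F_ab F^ab: lower both indices with the metric and sum over a, b."""
    return sum(G[a] * G[b] * F[a][b] * F[a][b] for a in range(4) for b in range(4))

def boost_matrix(beta):
    """Lambda^a_b for a boost along x with speed beta*c (illustrative choice)."""
    g = 1.0 / math.sqrt(1.0 - beta * beta)
    return [[g, -g * beta, 0.0, 0.0],
            [-g * beta, g, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def transform(F, L):
    """F'^{ab} = Lambda^a_c Lambda^b_d F^{cd}."""
    return [[sum(L[a][c] * L[b][d] * F[c][d] for c in range(4) for d in range(4))
             for b in range(4)] for a in range(4)]

E, B = (1.0, 2.0, 0.5), (0.3, -1.0, 2.0)   # arbitrary field values
F = field_tensor(E, B)
E2 = sum(e * e for e in E)
B2 = sum(b * b for b in B)
print(invariant(F), 2.0 * (B2 - E2))        # both about -0.32

Fp = transform(F, boost_matrix(0.6))
print(F[0][2], Fp[0][2])                    # the E_y component changes...
print(invariant(Fp))                        # ...but F_ab F^ab does not
```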

The interaction with charged matter is represented by adding a term proportional to J^α φ_α to the Lagrangian. This term is Lorentz invariant, being the inner product of two 4-vectors, the current J^α = (cρ, J) and the potential φ_α, and again is the simplest form possible with this restriction. Now the Maxwell equations pick up their familiar source terms. Incidentally, the remaining Maxwell equations, div B = 0 and curl E + (1/c)dB/dt = 0, which never have source terms, do not come from the variational principle at all: they hold identically because the field tensor is built from derivatives of the potential, so that dF_αβ/dx^γ + dF_βγ/dx^α + dF_γα/dx^β = 0.

In quantum field theory, one begins with the free Lagrangians for the fields involved, and introduces plausible interaction terms whose validity is confirmed (or refuted) by experiment. In the present example, if we added the free electron Lagrangian, and expressed the current in terms of the electron field (which is very easy to do), we would have the foundations of quantum electrodynamics, the most precise physical theory to date. Without analytical mechanics, this would have been impossible.

Appendix: Another Look

Let's take another look at the transformation between the Lagrangian and Hamiltonian formalisms, as Lanczos also does in an Appendix. This can serve as a review of an important point in the theory, and reinforce our understanding. The action is the integral of a certain function L of the coordinates and velocities between two definite times, and is stationary for the actual trajectory q(t) of the system, where q stands for the set of generalized coordinates. The variation of the action consists of two parts, one depending on the variation of the coordinates δq, and the other on the variation of the velocities, δq'. Through an integration by parts, the second part also becomes dependent on δq. If the coefficients of the δq are set equal to zero, we find the Euler-Lagrange system of differential equations for q(t). The integration by parts introduces the 'surface term' that vanishes if δq vanishes at the limits.
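
For one coordinate (in general, sum over the coordinates), the two parts of the variation and the integration by parts read as follows, with q̇ standing for q' and δq̇ = d(δq)/dt. This is the standard textbook computation, shown only to make the 'surface term' explicit.

```latex
\delta \int_{t_1}^{t_2} L(q,\dot q)\,dt
  = \int_{t_1}^{t_2} \Big( \frac{\partial L}{\partial q}\,\delta q
      + \frac{\partial L}{\partial \dot q}\,\frac{d}{dt}\delta q \Big)\,dt
  = \int_{t_1}^{t_2} \Big( \frac{\partial L}{\partial q}
      - \frac{d}{dt}\frac{\partial L}{\partial \dot q} \Big)\,\delta q\,dt
    + \Big[ \frac{\partial L}{\partial \dot q}\,\delta q \Big]_{t_1}^{t_2}
```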

Now we look at what happens if we make certain changes of variables in the integral. These are rather special changes. If we define new functions, the momenta p, as the derivatives of L with respect to the velocities, then these defining equations can be used to eliminate the velocities q' in favour of the momenta p. We cannot simply eliminate the q' in L and proceed as if we had twice as many variables, p's as well as q's, since the variations are not independent. One way around this, mentioned previously, is the Legendre transformation; another is to use the undetermined multipliers of Lagrange to make the variations independent, and this is what will be done here.

Let us, then, substitute new variables w for q' in L, with the subsidiary conditions w = q', or q' − w = 0. Now we can multiply the conditions by undetermined multipliers p and add the result to L, giving L' = L(q,w) + p(q' − w). The variations in q, w and p are now independent. No time derivatives of w appear, so its Euler-Lagrange equation is dL'/dw = 0, or dL/dw = p, which can be solved for the w's in terms of p and q. Now L' = pq' − (pw − L) = pq' − H(p,q), where H(p,q) = pw − L(q,w) with the w's eliminated in favour of p and q. The variations are now in p and q. This is exactly the required Hamiltonian form, whose Euler-Lagrange system is the canonical equations.
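
The multiplier recipe can also be carried out numerically. In this sketch (plain Python; all names are ours) we take, purely as an example, the free relativistic particle with L = −mc²√(1 − w²/c²) and m = c = 1, solve dL/dw = p for w by bisection, and form H = pw − L. The result can be compared with the known √(p²c² + m²c⁴), and a finite-difference dH/dp should return the velocity, as the canonical equation q' = dH/dp requires.

```python
import math

M, C = 1.0, 1.0  # illustrative units: particle mass and speed of light set to 1

def lagrangian(q, w):
    """Free relativistic particle, L(q, w) = -m c^2 sqrt(1 - w^2/c^2); q does not appear."""
    return -M * C * C * math.sqrt(1.0 - (w / C) ** 2)

def dL_dw(q, w, h=1e-6):
    """Central-difference estimate of dL/dw."""
    return (lagrangian(q, w + h) - lagrangian(q, w - h)) / (2.0 * h)

def solve_for_w(q, p):
    """Solve the multiplier equation dL/dw = p for w by bisection.

    dL/dw increases with w for this L, so bisection on |w| < 0.999 c is
    safe for moderate momenta (|p| below about 22 m c).
    """
    lo, hi = -0.999 * C, 0.999 * C
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if dL_dw(q, mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def hamiltonian(q, p):
    """H(p, q) = p w - L(q, w), with w eliminated in favour of p and q."""
    w = solve_for_w(q, p)
    return p * w - lagrangian(q, w)

for p in (0.0, 0.75, 3.0):
    exact = math.sqrt((p * C) ** 2 + (M * C * C) ** 2)
    print(p, hamiltonian(0.0, p), exact)

# Canonical equation: dH/dp should return the velocity w = q' (0.6 for p = 0.75).
dp = 1e-5
velocity = (hamiltonian(0.0, 0.75 + dp) - hamiltonian(0.0, 0.75 - dp)) / (2.0 * dp)
print(velocity)
```

Nothing in the construction of H uses the explicit form of L beyond its values; the bisection step, however, relies on dL/dw increasing with w, which holds for this L but would have to be checked for others.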

It is equally possible to go in the other direction, eliminating p in favour of q and q'. Then L' reduces to the original Lagrangian L. We can see clearly that it is all a matter of transforming the variables in which the action integral is expressed, and of treating the connection between δq' and δq either by integration by parts or, by Lagrange's method, by introducing new variables whose derivatives do not appear.

References

The subject of the present essay is treated in any graduate-level text in Mechanics, and forms an early part of the studies of any student working toward an advanced degree in Physics. Classic texts include those by Whittaker, Goldstein, Corben and Stehle, Sommerfeld, and Landau and Lifshitz. The Lagrangian formalism is often presented in undergraduate courses in intermediate mechanics and theoretical physics.

  1. C. Lanczos, The Variational Principles of Mechanics, 4th ed. (Toronto: U. of Toronto Press, 1970)

Composed by J. B. Calvert
Created 17 July 2000
Last revised 14 August 2000