Including maxima and minima and Lagrange's undetermined multipliers
The references give excellent treatments of partial derivatives and transformations, with all the details. In this article, I want to relate the mathematical treatment of partial derivatives to the discussions seen in thermodynamics texts, as well as making the meaning and manipulations of the Jacobian clearer. Because of the limitations of HTML, determinants are represented with the rows on the same line, separated by commas. The reader is encouraged to write out these determinants in the usual form, and to make sketches clarifying the analysis by appeal to spatial intuition.
We'll be talking mainly about functions of two variables here. Most of the results are easily extended to a greater number of variables, though often at the cost of additional complexity. If the dependent variable u is a function of the independent variables x and y, we may write u = f(x,y). For any permissible values of x and y, this function determines a unique value u. In this case, f(x,y) can be imagined to be a surface in three dimensions and our visual intuition can be a great aid. This is not possible for a greater number of independent variables.
A function of a single variable, u = f(x), is continuous at x = a if lim(x→a) f(x) = f(a), and differentiable if the limit lim(Δx→0) [f(x + Δx) - f(x)]/Δx exists. This limit is the derivative du/dx, df/dx, Df or f'(x). Our intuition would say that if a function is continuous, it would also be differentiable, but this is not true. Crafty mathematicians have exhibited continuous functions that nowhere have a derivative. For the usual sort of physical functions a continuous function is differentiable, but these functions are a special class.
For a function u = f(x,y) continuity is defined in much the same way, but the manner in which Δx and Δy approach zero must not be restricted in the approach to the limit. The partial derivative ∂f/∂x is the limit as Δx→0 of [f(x+Δx,y) - f(x,y)]/Δx, and the partial derivative ∂f/∂y is defined in a similar way. For f(x,y) to be differentiable, these two derivatives of course must exist, but in addition the expression Δu = (∂f/∂x)Δx + (∂f/∂y)Δy must also be valid, to first order, for arbitrary small Δx and Δy. For all physical functions, this will be true, but crafty mathematicians can find functions for which the partial derivatives exist, but the expression for Δu is invalid. The word "partial" is perhaps badly chosen. These are the usual derivatives, but with the other independent variables treated as constants. They are not partially derivatives, but are only a part of all possible derivatives.
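The definitions above can be tried out numerically. The following is only a minimal sketch: the test function f(x,y) = x²y + sin y, the point (1.5, 0.7) and the step sizes are arbitrary choices made for illustration, and the partials are estimated by central differences rather than by taking a true limit.

```python
import math

def partial_x(f, x, y, h=1e-6):
    """Central-difference estimate of the partial of f with respect to x (y held constant)."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def partial_y(f, x, y, h=1e-6):
    """Central-difference estimate of the partial of f with respect to y (x held constant)."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

def f(x, y):
    # Illustrative function only; any smooth function would do.
    return x**2 * y + math.sin(y)

x0, y0 = 1.5, 0.7
fx = partial_x(f, x0, y0)   # exact value is 2*x*y
fy = partial_y(f, x0, y0)   # exact value is x**2 + cos(y)

# Compare the actual change in u with the first-order expression for small steps.
dx, dy = 1e-3, -2e-3
actual = f(x0 + dx, y0 + dy) - f(x0, y0)
linear = fx * dx + fy * dy

print("fx =", fx, " fy =", fy)
print("actual change =", actual, " first-order estimate =", linear)
```

The two changes agree up to terms of second order in the steps, which is what differentiability requires.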
The partial derivatives of u = f(x,y) may be denoted as above, with the curly ∂ instead of the d. Another notation is ∂f/∂x = f1(x,y) and ∂f/∂y = f2(x,y). Generally, a subscript n refers to the partial with respect to the nth variable in the variable list. Alternatively, we may write fx and fy for the same derivatives. The subscripts are repeated for derivatives of higher order.
It is almost always true that ∂²f/∂x∂y = ∂²f/∂y∂x, or that "cross derivatives are equal." It is certainly true if the second derivatives of f(x,y) exist and are continuous, but is also true under weaker hypotheses. Functions do exist for which the cross derivatives are not equal at certain points, but such functions are not encountered in physics. This property can be extended to guarantee that the order of differentiation is immaterial for any number of differentiations, so long as the derivatives involved are continuous.
When we write u = f(x,y) and then the partial derivative ∂f/∂x it is clear that the other independent variable is y because the function is defined as f(x,y). If we write simply ∂u/∂x, we do not show exactly which function is meant, so we may write (∂u/∂x)y to show that we are considering a function of x and y, and that y is being held constant in the differentiation. In mathematics, there is usually no ambiguity, but in thermodynamics the independent variables must be made clear. (∂S/∂T)v is, in general, different from (∂S/∂T)p, since the entropy S may be expressed either as S(T,v) or S(T,p) and the functions are not the same. The sudden appearance of these subscripts after the student has studied partial derivatives in mathematics often causes anxiety. They are, however, nothing more than a specification of the function to be differentiated.
Recall from the study of functions of a single variable that if y = f(x), then under certain conditions an inverse function x = g(y) exists, and that dg/dy = 1/(df/dx). We usually require that these functions are defined so that they are single-valued. For example, if y = tan x, then dy/dx = 1/cos²x, so that d(tan⁻¹y)/dy = dx/dy = cos²x = 1/(1 + tan²x) = 1/(1 + y²). The derivative of the inverse function is simply the reciprocal of the derivative of the direct function.
Also, if y = f(w) and w = g(x), or y = f[g(x)], we recall the chain rule, or dy/dx = (df/dw)(dg/dx) = f'g'. This principle can clearly be extended to functions of an arbitrary number of variables, and to an arbitrary "depth". We shall use the chain rule very often in this article.
Before going on, we shall stop to consider a function u = f(x,y) that is defined implicitly, that is, by an equation F(u,x,y) = 0. Even if this equation cannot be solved conveniently in the form u = f(x,y), we can still find the partial derivatives of u if we know the function F. Here F1, F2 and F3 denote the partials of F with respect to its first, second and third arguments u, x and y. Taking the derivative with respect to x, the chain rule gives F1(∂u/∂x) + F2 = 0, so ∂u/∂x = -F2/F1. Similarly, the derivative with respect to y gives F1(∂u/∂y) + F3 = 0, or ∂u/∂y = -F3/F1.
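As a check on these formulas, here is a small sketch using an implicit relation chosen purely for illustration, F(u,x,y) = u³ + u - xy = 0. It has a unique real root u for any x and y, but is awkward to solve by hand. The Newton solver `solve_u` and the evaluation point are assumptions of this example, not part of the discussion above.

```python
def F(u, x, y):
    """Illustrative implicit relation F(u, x, y) = 0 defining u as a function of x and y."""
    return u**3 + u - x * y

def solve_u(x, y):
    """Solve F(u, x, y) = 0 for u by Newton's method (dF/du = 3u^2 + 1 is never zero)."""
    u = 0.0
    for _ in range(100):
        step = F(u, x, y) / (3 * u**2 + 1)
        u -= step
        if abs(step) < 1e-14:
            break
    return u

x0, y0 = 2.0, 1.0
u0 = solve_u(x0, y0)

# Partials of F with respect to its first, second and third arguments.
F1 = 3 * u0**2 + 1
F2 = -y0
F3 = -x0

du_dx_formula = -F2 / F1
du_dy_formula = -F3 / F1

# Independent check: differentiate the numerically solved u(x, y) directly.
h = 1e-5
du_dx_numeric = (solve_u(x0 + h, y0) - solve_u(x0 - h, y0)) / (2 * h)
du_dy_numeric = (solve_u(x0, y0 + h) - solve_u(x0, y0 - h)) / (2 * h)

print(du_dx_formula, du_dx_numeric)
print(du_dy_formula, du_dy_numeric)
```

The formula needs only the partials of F at the point, while the direct check has to solve the equation several times.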
Here we have assumed that u is the dependent, and x,y the independent variables. This is not necessary, for F(u,x,y) = 0 equally well defines the additional functions x = g(u,y) and y = h(u,x). It should be clear how to find derivatives like ∂x/∂u or ∂y/∂x by differentiating F and using the chain rule. Subscripts to show the other independent variable are welcome here, but not absolutely necessary. This is one of the advantages of the implicit definition, that any two of the variables may be considered independent, while the remaining one is the dependent variable.
An excellent example is the thermodynamic equation of state of a simple substance, which may be written F(p,v,T) = 0. Here, p is the pressure, v the volume per mole, and T the absolute temperature. In any problem, we choose two of these as the independent variables. For an ideal gas, the equation of state is very simple: pv - RT = 0. This can easily be solved for p, v or T and so is an excellent example for us. For example, p = RT/v, so that (∂p/∂T)v = R/v. From the implicit function, F1(∂p/∂T)v + F3 = 0. Therefore, (∂p/∂T)v = -F3/F1 = -(-R)/v = R/v, the same result without solving the equation.
Let us now find (∂T/∂p)v from the implicit function. We have F1 + F3(∂T/∂p)v = 0 on differentiating with respect to p, holding v constant. Therefore, (∂T/∂p)v = -F1/F3 = 1/(∂p/∂T)v. This is a general result. We may invert a partial in this way, but only when the variable held constant is the same. This is one of the relations frequently used in thermodynamics.
It is easier to find the partials by differentiating the implicit function than by remembering formulas. The reader may find for himself that (∂T/∂v)p = -F2/F3 and that (∂v/∂p)T = -F1/F2. Multiplying these two partials and (∂p/∂T)v together, we find that their product is -1. This is a relation between three partials that also is frequently used in thermodynamics. The subscripts are not really necessary here, but are a help in understanding. The reader should work these out for the ideal gas, and see that their product is indeed -1. We can always write down the three-partial product by "cancelling" the differentials in sequence, as (∂T/∂v)(∂v/∂p)(∂p/∂T) = -1. The subscripts can then be filled in easily.
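The ideal-gas case can also be worked numerically. This is a sketch under stated assumptions: the helper `partials`, the rounded value of R, and the state point (300 K, 101325 Pa) are choices made for this example only.

```python
R = 8.314  # molar gas constant in J/(mol K), rounded

def F(p, v, T):
    """Ideal-gas equation of state in implicit form, F(p, v, T) = 0."""
    return p * v - R * T

def partials(func, point, rel_step=1e-6):
    """Central-difference partials of func with respect to each of its arguments."""
    result = []
    for i, value in enumerate(point):
        h = rel_step * abs(value)
        up = list(point)
        down = list(point)
        up[i] = value + h
        down[i] = value - h
        result.append((func(*up) - func(*down)) / (2 * h))
    return result

T0, p0 = 300.0, 101325.0
v0 = R * T0 / p0                      # molar volume satisfying F = 0
F1, F2, F3 = partials(F, (p0, v0, T0))

dp_dT = -F3 / F1   # (dp/dT) at constant v
dT_dv = -F2 / F3   # (dT/dv) at constant p
dv_dp = -F1 / F2   # (dv/dp) at constant T

product = dT_dv * dv_dp * dp_dT
print("(dp/dT)_v =", dp_dT, " compared with R/v =", R / v0)
print("product of the three partials =", product)
```

The product comes out as -1 to within rounding, whatever state point is chosen.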
Another way to find these relations uses differentials. If x, y and z are connected by a single relation, we may regard x as a function of y and z, and z as a function of x and y. Then dx = (∂x/∂y)dy + (∂x/∂z)dz, and dz = (∂z/∂x)dx + (∂z/∂y)dy. Eliminating dz, we have dx[1 - (∂z/∂x)(∂x/∂z)] = dy[(∂x/∂y) + (∂x/∂z)(∂z/∂y)]. Since dx and dy are independent, their coefficients must vanish, so that (∂z/∂x)(∂x/∂z) = 1 and (∂x/∂y) = -(∂x/∂z)(∂z/∂y). These are the same two relations as derived above another way. If this is applied to thermodynamics, it is easy to fill in the subscripts giving the variables held constant by inspection.
Any function u = f(x,y, ...) is called an algebraic function if it can be defined implicitly by an equation F(u,x,y, ...) = 0 where F is a polynomial in u,x,y, ... . This includes rational fractional functions and functions involving radicals. If a function cannot be described in this way, it is called transcendental. Elementary functions include algebraic functions as well as the transcendental exponential, logarithmic, trigonometric and hyperbolic functions. What is called elementary is more determined by convention than by mathematical properties.
Let's now go back to u = f(x,y) and add a companion function v = g(x,y) with a new dependent variable v. Under certain conditions (which we shall investigate) these functions can be considered to map a point P(x,y) in the xy-plane onto a point S(u,v) in the uv-plane. If the functions are continuous and differentiable, a neighbourhood of P is mapped into a neighbourhood of S. We clearly have four partials, those of u and v with respect to x and y. Just as P(x,y) determines a point S(u,v), so a point S(u,v) determines a point P(x,y). This is the inverse transformation. There are four first partials relating to this transformation, those of x,y with respect to u,v. If the equations can be solved to yield x = F(u,v) and y = G(u,v) the derivatives can be found by direct differentiation, as in the case of the direct transformation.
However, finding the inverse transformation may be difficult or inconvenient. The partials can still be found by using the chain rule. Differentiating u = f(x,y) and v = g(x,y) with respect to u, we find 1 = f1(∂x/∂u) + f2(∂y/∂u), and 0 = g1(∂x/∂u) + g2(∂y/∂u). These are simultaneous linear equations for ∂x/∂u and ∂y/∂u, two of the desired derivatives. The solutions are ∂x/∂u = g2/J and ∂y/∂u = -g1/J, where J is the determinant |f1 f2, g1 g2| = |∂u/∂x ∂u/∂y, ∂v/∂x ∂v/∂y| = ∂(u,v)/∂(x,y). A determinant whose elements are partial derivatives in this pattern is called a Jacobian, usually represented as in the preceding sentence. Similar formulas can be found for ∂x/∂v and ∂y/∂v in the same way, differentiating with respect to v instead of u.
For these partials to exist, the Jacobian must not vanish. This is an important restriction on the functions f and g that may define a transformation. Let us suppose there is a functional relationship between f and g. That is, there is some function H such that v = H(u), or g(x,y) = H[f(x,y)]. The Jacobian in this case is |f1 f2, H'f1 H'f2|, which vanishes because two rows are proportional. Therefore, there must be no functional relationship between f(x,y) and g(x,y). Although functional dependence causes the Jacobian to vanish everywhere, a Jacobian that vanishes only at particular points is not sufficient for functional dependence.
The Jacobian for the inverse transformation is ∂(x,y)/∂(u,v) = |∂x/∂u ∂x/∂v, ∂y/∂u ∂y/∂v| = |g2/J -f2/J, -g1/J f1/J| = 1/J, where J = ∂(u,v)/∂(x,y). We may now write ∂x/∂u = (∂v/∂y)[∂(x,y)/∂(u,v)]. The partial on the left-hand side can be obtained by "cancelling" v and y on the right-hand side. Similarly, (∂v/∂x)[∂(x,y)/∂(u,v)] = -(∂y/∂u). Note how the minus sign arises when the first of (x,y) and the second of (u,v) are cancelled. If both first, or both second, variables are cancelled, the sign is positive. This is a very useful property of Jacobians, suggesting many relations.
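These formulas can be checked on a transformation whose inverse happens to be known. The sketch below assumes the illustrative pair u = x² - y², v = 2xy, which is (x + iy)² written out, so that on the half-plane x > 0 the inverse is the principal complex square root. The function names and the sample point are choices made for this example.

```python
import cmath

def forward(x, y):
    """Illustrative transformation u = x^2 - y^2, v = 2xy."""
    return x * x - y * y, 2 * x * y

def inverse(u, v):
    """Inverse for x > 0, using the principal square root of u + iv."""
    z = cmath.sqrt(complex(u, v))
    return z.real, z.imag

x0, y0 = 1.2, 0.5
u0, v0 = forward(x0, y0)

# Partials of the direct transformation, written out by hand.
f1, f2 = 2 * x0, -2 * y0     # du/dx, du/dy
g1, g2 = 2 * y0, 2 * x0      # dv/dx, dv/dy
J = f1 * g2 - f2 * g1        # d(u,v)/d(x,y)

# Partials of the inverse transformation from the chain-rule formulas.
x_u, y_u = g2 / J, -g1 / J
x_v, y_v = -f2 / J, f1 / J

# Independent check: central differences on the explicit inverse.
h = 1e-6
xp, yp = inverse(u0 + h, v0)
xm, ym = inverse(u0 - h, v0)
x_u_num = (xp - xm) / (2 * h)
y_u_num = (yp - ym) / (2 * h)

J_inv = x_u * y_v - x_v * y_u   # d(x,y)/d(u,v)
print(x_u, x_u_num)
print(y_u, y_u_num)
print("J * J_inv =", J * J_inv)
```

The inverse partials are obtained here without ever differentiating the inverse functions, which is the point of the method.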
The expression ∂(x,y)/∂(u,v)dudv represents the area in the xy-plane corresponding to the intervals du and dv. Writing this out, the expression is |∂x/∂u ∂x/∂v, ∂y/∂u ∂y/∂v|dudv = |(∂x/∂u)du (∂x/∂v)dv, (∂y/∂u)du (∂y/∂v)dv| = |dx dx', dy dy'|, where dx, dy are the distances corresponding to du, and dx',dy' those corresponding to dv. This is just the z-component of the cross product of the two displacement vectors in the xy-plane, which is the element of area. If this expression is in an integral, we may replace it by dxdy, or ∂(x,y)/∂(u,v)dudv = dxdy.
A good example is the transformation between rectangular and polar coordinates. We write x = r cos θ, y = r sin θ for the transformation from (r,θ) to (x,y). It is easy to find the inverse transformation in this case. We have r = √(x² + y²) and θ = tan⁻¹(y/x). We may represent the xy and rθ planes as rectangular coordinates. In this case, the whole xy-plane is represented by the strip between θ = 0 and θ = 2π (or any strip of the same width). That is, θ is multiple-valued unless we restrict it to such a vertical strip. It is more usual to superimpose the xy and rθ planes, where the coordinates in the rθ-plane are represented by circles of radius r and radial lines at angles θ measured anticlockwise from 0 along the x-axis.
The Jacobian ∂(x,y)/∂(r,θ) is |cos θ -r sin θ, sin θ r cos θ| = r. The Jacobian ∂(r,θ)/∂(x,y) is easily found directly to be 1/r, as expected. This means that the transformation of the area element is r dr dθ = dx dy, as is well known.
As an example of finding partials with the aid of the Jacobian, suppose that we have the transformation x = r cos θ, y = r sin θ and wish to know the derivative ∂r/∂x. We write ∂r/∂x = [∂(r,θ)/∂(x,y)](∂y/∂θ) = (1/r)(r cos θ) = cos θ. It's that easy! The result is correct, of course, as can be found from the inverse transformation r = √(x² + y²).
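The polar results are easy to confirm numerically. In this sketch the Jacobian ∂(x,y)/∂(r,θ) is estimated by central differences, and ∂r/∂x is found two ways. The helper names, the step size and the sample point (r, θ) = (2, 0.6) are assumptions of the example.

```python
import math

def to_xy(r, theta):
    """Polar to rectangular coordinates."""
    return r * math.cos(theta), r * math.sin(theta)

def jacobian_xy_rtheta(r, theta, h=1e-6):
    """Central-difference estimate of the Jacobian d(x,y)/d(r,theta)."""
    x_rp, y_rp = to_xy(r + h, theta)
    x_rm, y_rm = to_xy(r - h, theta)
    x_tp, y_tp = to_xy(r, theta + h)
    x_tm, y_tm = to_xy(r, theta - h)
    x_r, y_r = (x_rp - x_rm) / (2 * h), (y_rp - y_rm) / (2 * h)
    x_t, y_t = (x_tp - x_tm) / (2 * h), (y_tp - y_tm) / (2 * h)
    return x_r * y_t - x_t * y_r

r0, theta0 = 2.0, 0.6
J = jacobian_xy_rtheta(r0, theta0)

# dr/dx by the cancellation rule: [d(r,theta)/d(x,y)] * (dy/dtheta), with d(r,theta)/d(x,y) = 1/J.
dr_dx_rule = (1.0 / J) * (r0 * math.cos(theta0))

# dr/dx directly from r = sqrt(x^2 + y^2), holding y constant.
x0, y0 = to_xy(r0, theta0)
h = 1e-6
dr_dx_direct = (math.hypot(x0 + h, y0) - math.hypot(x0 - h, y0)) / (2 * h)

print("Jacobian =", J, " r =", r0)
print("dr/dx by rule =", dr_dx_rule, " directly =", dr_dx_direct, " cos(theta) =", math.cos(theta0))
```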
It is necessary to take some care in cancelling variables in Jacobians, because sometimes a -1 may creep in. Suppose u and v are defined implicitly by two equations F(u,v,x,y) = 0 and G(u,v,x,y) = 0. To find the partials of u and v with respect to x, we differentiate: F1(∂u/∂x) + F2(∂v/∂x) + F3 = 0, and G1(∂u/∂x) + G2(∂v/∂x) + G3 = 0. These two simultaneous linear equations are then solved for the partials. The results are ∂u/∂x = -[∂(F,G)/∂(x,v)]/[∂(F,G)/∂(u,v)] = -[∂(F,G)/∂(x,v)][∂(u,v)/∂(F,G)]. The result is found by cancelling F, G and v, but there is also a minus sign. This comes from the difference between u = f(x,y) and u - f(x,y) = 0; that is, in the definitions of F and G.
An important application of partial differentiation is finding maxima and minima. As we know, for a function of one variable y = f(x) there is a horizontal tangent at a maximum or minimum, so that the point x can be found by setting f'(x) = 0. If the point is a maximum, then f''(x) < 0, while if it is a minimum, f''(x) > 0. If f''(x) = 0 at the point, further investigation is necessary. If d³f/dx³ ≠ 0, we have a point of inflection where the curve crosses its tangent. If this derivative is zero, we can proceed this way until we find a nonzero derivative and can then identify the behavior of the curve at the point. We exclude the maxima and minima that occur at the boundary of a region of definition, of course, which cannot be found in this way.
For a function u = f(x,y), it is clear that maxima and minima must occur at points where f1 = f2 = 0. The classification of the behavior of the function at such points is more involved than for one independent variable. We may have not only maxima and minima where the value of u decreases or increases in all directions about the point (x,y), but saddle points as well, where there is an increase in some directions and a decrease in others. If the expression formed from the second derivatives f12² - f11f22 < 0, then we have a maximum at (x,y) if f11 < 0, and a minimum if f11 > 0 (the derivatives with respect to y could also be used here). This expression is the negative of the Jacobian ∂(f1,f2)/∂(x,y) = f11f22 - f12². If the expression is, on the other hand, > 0, so that the Jacobian is negative, then we have a saddle point. If it vanishes, the test is inconclusive and further investigation is necessary.
Try this out on the functions x² - y² and x² + y². The first has a saddle point at (0,0), while the second has a minimum at (0,0). Also, look at the function y² - x³. Sketch perspective views of these functions, or graph them on a graphing calculator, such as the HP-48G.
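For readers who would rather compute than sketch, the second-derivative test can be tried on these three functions as follows. This is a minimal sketch that assumes the point supplied is already a critical point, estimates the second partials by finite differences, and uses the expression f12² - f11f22. The helper names and the tolerance are choices of this example.

```python
def second_partials(f, x, y, h=1e-3):
    """Finite-difference estimates of f11, f22 and f12 at (x, y)."""
    f11 = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    f22 = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    f12 = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return f11, f22, f12

def classify(f, x, y, tol=1e-6):
    """Classify a critical point (x, y) of f using the sign of f12^2 - f11*f22."""
    f11, f22, f12 = second_partials(f, x, y)
    d = f12**2 - f11 * f22
    if d > tol:
        return "saddle"
    if d < -tol:
        return "maximum" if f11 < 0 else "minimum"
    return "inconclusive"

print(classify(lambda x, y: x**2 - y**2, 0.0, 0.0))   # saddle
print(classify(lambda x, y: x**2 + y**2, 0.0, 0.0))   # minimum
print(classify(lambda x, y: y**2 - x**3, 0.0, 0.0))   # inconclusive
```

For y² - x³ the expression vanishes at the origin, so the test says nothing, and a sketch of the surface is the quickest way to see what is happening there.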
Suppose we want to maximize or minimize a function u = f(x,y) where the two variables are connected by a relation g(x,y) = 0. If we can solve g(x,y) = 0 for either y or x, then this variable can be eliminated in u = f(x,y) and the extremum found in the usual way for one variable. Alternatively, we can proceed as follows. Differentiating g(x,y) = 0, we find that g1dx + g2dy = 0. If either g1 ≠ 0, or g2 ≠ 0, we can solve for one differential in terms of the other. Let's assume g2 ≠ 0, so that dy = -(g1/g2)dx. We then assume x is the independent variable, and set du/dx = f1 - f2(g1/g2) = 0. If we multiply by g2, we see that an equivalent condition is f1g2 - f2g1 = 0, or ∂(f,g)/∂(x,y) = 0. The location of an extremum (x,y) can be found by solving this equation simultaneously with g(x,y) = 0.
As a simple example, consider a rectangle of sides x and y. Its area is xy and its perimeter is 2x + 2y. Let us maximize the area under the condition that the perimeter be constant at c. Then f(x,y) = xy and g(x,y) = 2x + 2y - c = 0. f1 = y, f2 = x, g1 = 2, g2 = 2. So, ∂(f,g)/∂(x,y) = 2y - 2x = 0 gives x = y. That is, the rectangle of maximum area is a square. Each side is then of length c/4.
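The rectangle example can be confirmed by brute force. The sketch below walks along the constraint for an assumed perimeter c = 20, picks the point of largest area from a grid, and checks that the Jacobian condition holds there. The grid size and the function names are choices made for this illustration.

```python
def area(x, y):
    """f(x, y) = xy, the quantity to be maximized."""
    return x * y

def constraint(x, y, c):
    """g(x, y) = 2x + 2y - c, which must vanish."""
    return 2 * x + 2 * y - c

def jacobian_fg(x, y):
    """d(f,g)/d(x,y) = f1*g2 - f2*g1 for the f and g above."""
    f1, f2 = y, x
    g1, g2 = 2.0, 2.0
    return f1 * g2 - f2 * g1

c = 20.0
n = 2000

# Walk along the constraint, where y = c/2 - x, and keep the x giving the largest area.
best_x = max((i * (c / 2) / n for i in range(1, n)),
             key=lambda x: area(x, c / 2 - x))
best_y = c / 2 - best_x

print("best x, y =", best_x, best_y)
print("Jacobian at that point =", jacobian_fg(best_x, best_y))
```

The search lands on the square of side c/4, where the Jacobian vanishes as the condition requires.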
If we do not want to choose either x or y as the independent variable, we may follow Lagrange's procedure. We introduce a new constant λ, form the function F = f(x,y) + λg(x,y), and seek the extrema of F on the assumption that the variables x and y may now vary independently. This gives two conditions f1 + λg1 = 0 and f2 + λg2 = 0. When these are combined with g(x,y) = 0, we have sufficient equations to determine x, y and λ. If we had more variables and more conditions, we would introduce a different undetermined multiplier for each condition, and then regard the variables as independent.
In the preceding example, we may write the condition as x + y - c/2 = 0 and seek the extremum of xy + λ(x + y - c/2). This gives y + λ = 0 and x + λ = 0. These two equations yield x = y. Since x + y = c/2, x = y = c/4. The multiplier λ = -c/4, but we do not require its value.
Suppose we wish to find the extrema of a function of three variables u = f(x,y,z). If the variables are independent, we solve the system f1 = f2 = f3 = 0. If there is one condition g(x,y,z) = 0, we may use Lagrange's method and find the extrema of f(x,y,z) + λg(x,y,z). From this we find f1 + λg1 = 0, f2 + λg2 = 0, and f3 + λg3 = 0. One of these conditions must be soluble for λ. Suppose it is the first equation; then λ = -f1/g1. If this value is substituted in the remaining two conditions, we have f2 - f1g2/g1 = 0, or ∂(f,g)/∂(x,y) = 0, and f3 - f1g3/g1 = 0, or ∂(f,g)/∂(x,z) = 0. These two equations, solved concurrently with g = 0, locate the extrema. Solving either of the other equations for λ gives similar expressions. The condition for a solution is that at least one of the partials of g is nonzero.
If there are two conditions, g(x,y,z) = 0 and h(x,y,z) = 0, Lagrange's method easily leads us to a solution. There are now two multipliers, λ and μ, which satisfy f1 + λg1 + μh1 = 0, f2 + λg2 + μh2 = 0, and f3 + λg3 + μh3 = 0. Some pair of these three equations must be soluble for λ and μ. Suppose it is the first two equations; then λ = -[∂(f,h)/∂(x,y)]/[∂(g,h)/∂(x,y)] and μ = -[∂(g,f)/∂(x,y)]/[∂(g,h)/∂(x,y)]. If these values are substituted in the third equation, and the result is multiplied by ∂(g,h)/∂(x,y), we find f3[∂(g,h)/∂(x,y)] - g3[∂(f,h)/∂(x,y)] - h3[∂(g,f)/∂(x,y)] = 0. This is just the expansion of a 3 x 3 Jacobian with respect to the third column, so the condition becomes ∂(f,g,h)/∂(x,y,z) = 0. Take care with the signs in working this out. In particular, ∂(g,f)/∂(x,y) = -∂(f,g)/∂(x,y). Jacobians are very useful in expressing the results of these operations. They arise quite naturally in the solution of simultaneous equations whose coefficients are partial derivatives.
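Since the determinants cannot be set out properly here, the expansion by the third column is written below in LaTeX for the reader who wishes to check the signs. The subscripts 1, 2, 3 denote partials with respect to x, y, z as in the text; nothing new is assumed.

```latex
\frac{\partial(f,g,h)}{\partial(x,y,z)}
= \begin{vmatrix} f_1 & f_2 & f_3 \\ g_1 & g_2 & g_3 \\ h_1 & h_2 & h_3 \end{vmatrix}
= f_3 \begin{vmatrix} g_1 & g_2 \\ h_1 & h_2 \end{vmatrix}
- g_3 \begin{vmatrix} f_1 & f_2 \\ h_1 & h_2 \end{vmatrix}
+ h_3 \begin{vmatrix} f_1 & f_2 \\ g_1 & g_2 \end{vmatrix}
= f_3 \frac{\partial(g,h)}{\partial(x,y)}
- g_3 \frac{\partial(f,h)}{\partial(x,y)}
- h_3 \frac{\partial(g,f)}{\partial(x,y)}
```

The last step uses the interchange of two rows, which reverses the sign of the 2 x 2 determinant multiplying h3.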
D. V. Widder, Advanced Calculus, 2nd ed. (New York: Dover, 1989). Chapter I. Chapter IV treats maxima and minima, Lagrange's multipliers and other applications.
R. Courant, Differential and Integral Calculus, 2nd ed., Vol. I (London: Blackie and Son, 1937). Chapter 10.
Composed by J. B. Calvert
Created 9 December 2004
Last revised 12 December 2004