Classical intermediate algebra is the toolbox of mathematical analysis

- Introduction
- Expressions and Polynomials
- Identity Transformations
- Partial Fractions and Root Transformations
- Equations
- Approximate Solutions
- Congruences
- Groups, Rings, Fields and Euclid's Algorithm
- Inequalities
- Matrices and Determinants
- Eigenvalues, Eigenvectors and Diagonalization
- Solving Simultaneous Linear Equations
- Vectors
- Exponents and Logarithms
- Ratio and Proportion
- Progressions, Averages and Finance
- Factorials, Permutations and Combinations
- Binomial Theorem
- Mathematical Induction
- Complex Numbers
- Analytic Geometry
- Tables and Interpolation
- Algebraic Heuristics
- Algebra Problems
- References

Arithmetic, although very useful and the foundation of mathematics, is not suited to making general statements about numbers, nor to reasoning about their abstract properties. Geometry, however, powerfully extends arithmetic to these areas. A geometric figure, for example a triangle, does not refer to any specific triangle, but to any triangle satisfying the conditions of the problem. Conclusions drawn on the basis of these diagrams refer to all examples. Another way to extend arithmetic is to introduce symbols that stand for any number, just as a triangle in geometry stands for any triangle. This step appears to have been taken by Diophantus in the 3rd century, and re-introduced to Europe via the Arabs about a thousand years later together with algorism, which is arithmetic using written symbols, the Hindu-Arabic digits. This reasoning was called by the Italian word *algebra*, derived from the Arabic *al-jabr*, a "putting together." Its operations were very suited to calculations on paper, and it developed rapidly in the 16th and 17th centuries, applied at first mainly to the solution of equations.

Although algebra was reintroduced to Europe around the 13th century, notably by Leonardo Pisano, alias Fibonacci, in 1202, it was originally treated in words as *rhetorical algebra*, much like geometry, and did not make rapid progress. François Viète (1540-1603) introduced a better notation, with unknowns represented by vowels and given quantities by consonants, but it was not until the 17th century that most of the modern notation was in use, including exponents, which were introduced by René Descartes (1596-1650). Coordinate representations, or analytic geometry, were introduced jointly by Descartes and Pierre de Fermat (1601-1665). The growth of mathematics during the 17th and 18th centuries was explosive. It is astonishing that Leonhard Euler (1707-1783) came only a century after the first steps by Descartes and Fermat.

As an example of the difference between geometry and algebra, consider the algebraic expression (a + b)(a - b), representing the product of the sum and difference of two numbers a and b. By the ordinary rules of arithmetic, this expression can be "multiplied out" as a^{2} - ab + ba - b^{2}. Then, since ab = ba, it can be simplified to a^{2} - b^{2}. We conclude that, in general, the difference of the squares of two numbers is equal to the product of their sum and their difference. This problem can also be treated in geometry; in fact, it is proposition 5 of book 2 of Euclid: "If a straight line be divided into two equal parts, and also into two unequal parts, the rectangle contained by the unequal parts, together with the square on the line between the points of section, is equal to the square on half the line." An editor has added at the end of the proposition: "From this proposition it is manifest that the difference of the squares on two unequal straight lines, AC, CD, is equal to the rectangle contained by their sum and difference." Both proofs are equally valid, but the algebraic proof is much easier to picture in the mind, and very simple.

The essence of algebra, then, is the use of literal symbols to stand for general numbers or other quantities. If these numbers are the real numbers, then classical algebra is algebra over the field of reals, where the symbols behave like real numbers. This can be extended, with great benefits, to the field of complex numbers. What is called "algebra" is limited to finite operations. That is, infinite series and limiting processes are excluded, and are classified as *calculus*. On the other hand, many useful procedures related more or less distantly to working with symbols are studied with algebra.

If a, b and c are any real or complex numbers, then addition "a + b" and multiplication "ab" have the properties a + b = b + a, ab = ba (commutativity), (a + b) + c = a + (b + c) and (ab)c = a(bc) (associativity). Multiplication is distributive over addition: a(b + c) = ab + ac. Parentheses are used to determine the order of operation, as here.

What is called *modern algebra* works with symbols that may obey different rules of composition or operations than the familiar ones of real numbers that we have just presented. In these systems, there are generally two operations, analogous to addition and multiplication, that obey certain rules. A familiar example is Boolean algebra, where the symbols represent the binary values 0 and 1 or T and F, and the operations are OR (analogous to +) and AND (analogous to ×). In Boolean algebra, addition (OR) is distributive over multiplication (AND) as well: a + bc = (a + b)(a + c). Also, if complementation (0 ↔ 1) is represented by a prime, (a + b)' = a'b' and (ab)' = a' + b', which are De Morgan's theorems connecting the two operations. In matrix algebra, multiplication is not commutative. We also have the *laws of signs* in products and quotients, that (-)(-) = (-)/(-) = (+) and (+)(-) = (+)/(-) = (-)/(+) = (-).

In this article, we shall be concerned mainly with classical algebra over the real or complex numbers. What I am trying to do here is to review the material that appears in a high-school or undergraduate algebra course (perhaps extended a little) to lay the foundation for more advanced work. I have tried to include everything that was included in a standard algebra course, and more, but I have not included much on graphs and plotting, assuming the reader is well-acquainted with such things. I doubt that this valuable material is well presented in today's high school, and it has more or less vanished from the undergraduate curriculum as an integrated course. None of this material should be scorned as too simple or too obvious; I use it continually in my daily work, hardly remembering where I first encountered it. I certainly cannot say that I learned much of it in school!

Algebra is usually presented at three levels: elementary or high school, intermediate or college, and advanced or higher algebra. Analytic geometry and linear algebra are closely related subjects studied at an intermediate level, following college algebra. Higher algebra includes the theory of numbers, which is interesting but seldom used in applications. This article deals mainly with intermediate algebra, which is a necessary mathematical foundation for scientific studies.

In the 19th century, school mathematics in America meant arithmetic. Elementary arithmetic taught addition, subtraction, multiplication and division. "Higher" arithmetic was devoted to many useful computational skills, such as fractions, percentage, ratio and proportion, measures and mensuration, arithmetic and geometric progressions, raising to powers and extracting roots, accounts, annuities, permutations and combinations, and similar matters. Algebra and geometry were more advanced subjects, involving theory as well as practical matters, and were taught in high school and college. These subjects tended to absorb some of the matter previously part of higher arithmetic, while arithmetic descended into a struggle with innumeracy, where it appears to languish today.

The "HP 48" mentioned below at several points is the plotting scentific pocket calculator made by Hewlett-Packard. It is an extremely useful accessory for numerical calculations. It can also plot functions, a valuable aid to visualization.

A *monomial* is written cx^{k}, where c is a real or complex number, the integer k is the *exponent* and x is the *variable*. It stands for the value of x multiplied by itself k times, times the real or complex number c. A *polynomial* is a sum of monomials, each called a *term*, generally written in the order of decreasing k, as c_{n}x^{n} + c_{n-1}x^{n-1} + ... + c_{1}x + c_{0}. *Collecting terms* in an expression means combining terms differing only in the numerical coefficient. The highest exponent n is the *order* of the polynomial. Often, c_{n} is taken as 1, since this is easily arranged by dividing by the highest-order coefficient. Such an *expression* is also called an entire rational algebraic function, since it involves only addition, multiplication and raising to a power. For any value of x, it represents a number f(x), a function of x. It is a very particular and restricted kind of function, of course, but one about which we can make important statements. One is that its value is finite for all finite values of x, but is infinite as x → ∞ for n ≥ 1. This holds for x complex as well as real.

Another property of a polynomial is that its derivative, df(x)/dx, is also a polynomial, since dx^{n}/dx = nx^{n-1}, and that all its derivatives exist and are polynomials down to the n^{th}, which is a constant, and beyond that are zero. Consideration of conclusions from calculus departs from the field of algebra, strictly speaking, but is often of utility. We'll stay away from applications of calculus in this article, except here and in the Newton-Raphson approximation method. However, transcendental functions such as the trigonometric functions, logarithms and exponents are fair game.

*Algebraic* numbers are numbers that are the solution of an algebraic equation, as x = √2 is a solution of x^{2} = 2. Rational numbers are, of course, algebraic, since they are the solutions of linear equations. The roots of quadratic, cubic and quartic equations are algebraic. Numbers that are not algebraic are called *transcendental*. The base of natural logarithms, e = lim(x→0) (1 + x)^{1/x} = 2.718..., is transcendental, as is π = 3.1416... . Real numbers are the rational numbers and all limits of sequences of rational numbers (Dedekind cuts), and are in one-to-one correspondence with the points on a line. Higher algebra investigates these matters.

A *root* x_{j} of a polynomial is a value of x for which the value of the polynomial is zero, f(x_{j}) = 0. If the algebra is over the complex numbers, then a polynomial of order n has exactly n roots, which may be repeated. This result is so useful that it is called The Fundamental Theorem of Algebra. (It can be derived from the theorem that any polynomial of positive degree has at least one root, which is also called the Fundamental Theorem of Algebra.) Any polynomial can, it implies, be expressed as the product of linear factors like (x - x_{j})^{s_{j}} for each root x_{j} repeated s_{j} times. If we restrict ourselves to reals, then any polynomial is the product of terms like this for real roots, and quadratic expressions for complex roots. Complex roots always occur in pairs with conjugate values if the coefficients of the polynomial are real.

Descartes' Sign Rule can give us a hint of how many positive or negative real roots a polynomial has. A *sign change* occurs when successive terms have opposite signs. For example, the polynomial f(x) = 2x^{5} + 4x^{3} + 3x^{2} - 1 has only one change of sign (between the last two terms). Descartes' Sign Rule says that the number of positive roots cannot exceed the number of sign changes of f(x), and the number of negative roots cannot exceed the number of sign changes of f(-x). In fact, the number of roots can only be less than the maximum by an even integer. In this case, then, f(x) = 0 has one positive root (it is about 0.4484). f(-x) has two variations in sign, so the number of negative roots is either two or zero. In fact, f(x) has no negative roots, since it has two pairs of complex roots in addition to the positive root.
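The counting is easy to mechanize. Here is a short Python sketch (the helper name `sign_changes` is just illustrative) that checks the example above:

```python
# Count sign changes in a coefficient list (highest power first),
# skipping zero coefficients, as Descartes' Sign Rule requires.
def sign_changes(coeffs):
    signs = [c > 0 for c in coeffs if c != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)

# f(x) = 2x^5 + 4x^3 + 3x^2 - 1; missing powers have coefficient 0
f = [2, 0, 4, 3, 0, -1]
# f(-x): negate the coefficients of the odd powers
f_neg = [-c if (len(f) - 1 - i) % 2 else c for i, c in enumerate(f)]

print(sign_changes(f))      # 1 -> at most one positive root
print(sign_changes(f_neg))  # 2 -> two or zero negative roots
```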

A more general rational function is the ratio of two polynomials, a numerator P(x) of order n, and a denominator Q(x) of order m. This function is infinite at the zeros of Q(x), at which it is said to have a *pole* of order s_{j} at the zero x_{j}, which is repeated s_{j} times. Such a rational fractional function is finite except at its poles. As x → ∞, it is infinite if n > m, zero if n < m, and approaches a constant otherwise. If n > m, the function can be expressed as an integral polynomial of order n - m and a sum of *partial fractions* representing the poles. The methods for doing this will be described below.

The HP-48 can find all the roots of a polynomial for you. First do → SOLVE, then ↓ ↓ and press OK. Suppose you want to find the roots of x^{2} + x - 2. Press ← [] to get the square brackets indicating a vector object. Type in 1 SPC 1 SPC 2 +/-, then OK. Finally, press SOLVE. The roots then appear as a vector after ROOTS:, in this case [1 -2]. This polynomial is (x - 1)(x + 2), expressed as linear factors. Complex and irrational roots will be displayed as well. For example, find the roots of x^{2} - 2, by entering coefficients [1 0 -2] instead, and also the roots of x^{2} + 2. To evaluate a polynomial, press ← [], enter the coefficients, and press ENTER. Then type in the value of x for which the polynomial is to be evaluated and press ENTER. Finally type ← SOLVE (note ←!) then POLY (to get this menu), and PEVAL to evaluate the polynomial.

When evaluating a polynomial manually, it is best to use Horner's Method rather than by evaluating each term separately and adding. Start by multiplying the value of x by c_{n}, then add c_{n-1}. Multiply the result by x and add c_{n-2}, and repeat until the last term is reached. For example, to evaluate x^{2} + x - 2 at x = 3, do 1 times 3 (3), add 1 (4), times 3 (12), minus 2 (10). There are fewer multiplications, and less roundoff error, this way.
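In Python, Horner's Method is a three-line loop; this sketch (the name `horner` is illustrative) evaluates the example above:

```python
# Horner's Method: evaluate c_n x^n + ... + c_0 with only n multiplications.
def horner(coeffs, x):
    """coeffs listed from the highest power down to the constant term."""
    result = 0
    for c in coeffs:
        result = result * x + c
    return result

# x^2 + x - 2 at x = 3: 1*3 = 3, +1 = 4, *3 = 12, -2 = 10
print(horner([1, 1, -2], 3))  # 10
```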

A monomial may contain more than one variable; for example, 2xy or -4x^{2}y. From these we can find general rational functions of more than one variable. There are certain special cases that have interesting properties, such as functions where the sum of the exponents is constant, such as ax^{2} + bxy + cy^{2}, which is a *homogeneous* function of order 2, so that f(kx,ky) = k^{2}f(x,y). Otherwise, it is difficult to make general statements.

If P(x) and Q(x) are two algebraic functions of x, the statement P(x) = Q(x) may be either an *identity*, if it is true for any value of x, or just an *equation* otherwise. In that case, it may be true for certain *roots* of the equation, or may never be true, in which case the equation is called *inconsistent*. A great deal of practical algebra is devoted either to finding identities, or to finding the roots of equations. An identity is sometimes expressed by the symbol ≡ if this is to be emphasized. This symbol is more often used for a *congruence* (see below).

We have already seen that an identity can be obtained by multiplying out indicated multiplications, as in (a + b)(a - b). The inverse process, which is not as straightforward, is called *factoring*. It is, nevertheless, a powerful tool in algebraic manipulations. A polynomial can be factored into linear factors if we know its roots. Most factoring is done, however, by recognizing familiar patterns and exercising ingenuity. For example, the difference of squares factors into the product of their sum and difference, while (x^{3} - y^{3}) = (x - y)(x^{2} + xy + y^{2}). The terms of a polynomial may be grouped so that common factors, often linear ones, appear, and can be "factored out." Sometimes a term must be added and subtracted (which does not change the value of the expression) in order to do this, as in completing the square. An introduction to algebra must spend a lot of time on these matters, to develop the student's facility in working with symbolic expressions.

Suppose we have a rational fractional expression F(x) = P(x)/Q(x), where the order n of the numerator is greater than the order m of the denominator Q(x). We can express this function as an entire rational expression plus partial fractions. The first step is to find the entire rational expression by a formal division of P(x) by Q(x). The quotient will be the desired rational expression, and the remainder R(x) will give us a function R(x)/Q(x) that can be expressed as the sum of partial fractions. Also, we have expressed P(x) as P(x) = Q(x)F(x) + R(x).

The division is carried out in the form of a long division in arithmetic, but working with the symbols instead. This is best shown by example. Let F(x) = (x^{3} - 2x^{2} + x -5)/(x^{2} + 2x + 1). We write x^{2} + 2x + 1 ) x^{3} - 2x^{2} + x - 5. The divisor "goes" x times. We write this down as the first part of the quotient, multiply the divisor to give x^{3} + 2x^{2} + x and subtract from the dividend, with the result -4x^{2} - 5. The divisor goes -4 times into this, making the quotient x - 4, while multiplying the divisor gives -4x^{2} - 8x - 4. Subtracting this, we find 8x - 1. This is of lower order than the divisor, so it is the remainder. The answer is F(x) = (x - 4) + (8x - 1)/(x^{2} + 2x + 1).
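The same long division can be sketched in Python, working on lists of coefficients with the highest power first; `poly_divmod` is an illustrative name, not a library routine:

```python
# Polynomial long division: divide P by Q, coefficients highest power first.
def poly_divmod(P, Q):
    P = list(P)
    quotient = []
    while len(P) >= len(Q):
        factor = P[0] / Q[0]       # leading-term ratio: next quotient coefficient
        quotient.append(factor)
        for i, q in enumerate(Q):  # subtract factor * Q, aligned at the front
            P[i] -= factor * q
        P.pop(0)                   # the leading term is now zero
    return quotient, P             # what is left of P is the remainder

# (x^3 - 2x^2 + x - 5) / (x^2 + 2x + 1)
q, r = poly_divmod([1, -2, 1, -5], [1, 2, 1])
print(q)  # [1.0, -4.0]  i.e. x - 4
print(r)  # [8.0, -1.0]  i.e. 8x - 1
```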

The denominator of the fractional part has a double root at x = -1. We suppose that the fraction can be written A/(x + 1) + B/(x + 1)^{2}, where A and B are constants to be determined. If this is so, then multiplying through by (x + 1)^{2} gives 8x - 1 = A(x + 1) + B. Equating the coefficients of equal powers on both sides of this identity, we find A = 8, B = -9. Therefore, F(x) = (x - 4) + 8/(x + 1) - 9/(x + 1)^{2}. Any rational fractional function can be expressed in this canonical form.

If we divide a polynomial f(x) by a linear factor (x - a) and obtain a quotient q(x) and remainder r independent of x, so that f(x) = (x - a)q(x) + r, then it is clear that f(a) = r, which proves the Remainder Theorem that the remainder after division of f(x) by (x - a) is the value of f(x) at x = a.

Division of polynomials can be carried out in an abbreviated form using only the coefficients. This procedure is called *synthetic division*.

Given a polynomial f(x), other polynomials can be formed that have roots shifted by a certain constant b, multiplied by a constant k, or that are reciprocals of the original roots. Let's show how to do this with a simple example, x^{2} + 2x - 3, which is just (x - 1)(x + 3), so the roots are +1 and -3. To form a polynomial with reciprocal roots, just take the coefficients in opposite order: 1 + 2x - 3x^{2}. Factored, this is (-3)(x - 1)(x + 1/3), so the roots are now +1 (1/1) and -1/3 (1/-3). To multiply the roots by k, multiply each coefficient by a power of k, starting with k^{0}, then k^{1}, and so on. The resulting polynomial is x^{2} + (2)(2)x - (2^{2})(3), or x^{2} + 4x - 12. This factors to (x - 2)(x + 6), so the roots are +2 and -6, twice the original roots. These rules can be proved by substitution for x in the original polynomial.
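These coefficient manipulations are easily checked numerically. The following Python sketch (with an illustrative quadratic root-finder) verifies the reciprocal and scaled roots for the example above:

```python
# Transforming the roots of x^2 + 2x - 3 (roots +1 and -3), checked numerically.
def roots_quadratic(a, b, c):
    d = (b * b - 4 * a * c) ** 0.5   # assumes real roots
    return sorted([(-b + d) / (2 * a), (-b - d) / (2 * a)])

original = [1, 2, -3]
# reciprocal roots: reverse the coefficient list
reciprocal = original[::-1]                          # -3x^2 + 2x + 1
# roots multiplied by k: multiply c_i by k^i, k^0 for the leading coefficient
k = 2
scaled = [c * k**i for i, c in enumerate(original)]  # x^2 + 4x - 12

print(roots_quadratic(*original))    # [-3.0, 1.0]
print(roots_quadratic(*reciprocal))  # [-0.3333333333333333, 1.0]
print(roots_quadratic(*scaled))      # [-6.0, 2.0]
```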

To shift a root by b, we want to express the polynomial as a_{0} + a_{1}(x - b) + a_{2}(x - b)^{2} + ... + a_{m}(x - b)^{m}. This can be written f(x) = a_{0} + (x - b)q(x). Therefore, we can find a_{0} by dividing f(x) by (x - b) and taking the remainder. The next coefficient can be found in the same way by dividing q(x) by (x - b) and taking the remainder, and so on.

Let's shift the roots of x^{2} + 2x - 3 by -1, so that b = 1. Dividing the polynomial by (x - 1) gives q(x) = x + 3 and remainder 0 (= a_{0}). Dividing (x + 3) by (x - 1) gives q(x) = 1 (= a_{2}) and remainder 4 (= a_{1}). Therefore, the new polynomial is x^{2} + 4x, or x(x + 4), with roots 0, -4 as desired. To shift the other way, divide by (x + 1) instead. The result is x^{2} - 4, with roots +2 and -2.
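The repeated division can be sketched in Python, using synthetic division at each stage; `shift_roots` is an illustrative name:

```python
# Shift the roots of f(x) by -b: expand f in powers of (x - b) by
# repeated synthetic division, collecting the remainders a_0, a_1, ...
def shift_roots(coeffs, b):
    coeffs = list(coeffs)            # highest power first
    new = []
    for _ in range(len(coeffs)):
        # synthetic division of coeffs by (x - b)
        quotient = [coeffs[0]]
        for c in coeffs[1:]:
            quotient.append(quotient[-1] * b + c)
        new.append(quotient.pop())   # remainder = next coefficient a_j
        coeffs = quotient
    return new[::-1]                 # back to highest-power-first order

# x^2 + 2x - 3 with b = 1: roots 1, -3 shift to 0, -4
print(shift_roots([1, 2, -3], 1))   # [1, 4, 0], i.e. x^2 + 4x
```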

It is very useful to be able to find the roots of a general function F(x), which are the solutions of the equation F(x) = 0, whose *degree* is the largest power of x involved. If F(x) is a polynomial P(x), we have seen that the HP-48 can give us these roots without difficulty for any polynomial. However, the HP-48 uses an approximate method of solution that can be refined to any degree of precision, but is not expressed as an exact number. If the order of the polynomial is 1, then the equation ax + b = 0 is very easily solved in the form x = -b/a. Indeed, if a and b are rational numbers, then the solution is also rational.

If P(x) is of second order, ax^{2} + bx + c = 0, the situation becomes much more complicated. Now the solutions cannot be expressed, in general, as rational numbers, but square roots, and even square roots of negative numbers, are required. In fact, the *quadratic formula* x = [-b ±√(b^{2} - 4ac)]/2a, that can be obtained by completing the square, shows this explicitly. The quantity (b^{2} - 4ac) is called the *discriminant*. If it is greater than zero, the roots are real and distinct; if it is zero, the roots are real and equal; and if it is negative, the roots are complex. Solution of the quadratic equation requires the introduction of not only real (irrational) numbers, or *surds*, but even complex numbers.

By completing the square, we mean that we make an identity transformation of x^{2} + (b/a)x + (c/a) = 0 by adding and subtracting (b/2a)^{2}, and factoring as (x + b/2a)^{2} + (c/a) - (b/2a)^{2} = 0. Then, x + b/2a = ±√(b^{2}/4a^{2} - c/a), and the quadratic formula follows immediately. When using the quadratic formula, it is best to find the root that involves the sum of numerical values, and then to find the second root by using x_{1} + x_{2} = -b/a or x_{1}x_{2} = c/a, thereby avoiding roundoff error when one root is much smaller than the other.
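As a sketch in Python of this numerically careful procedure (assuming real roots and c ≠ 0; `solve_quadratic` is an illustrative name):

```python
# Quadratic formula, avoiding subtraction of nearly equal numbers:
# take the root where -b and the radical add in magnitude, then get
# the other root from the product x1*x2 = c/a.
import math

def solve_quadratic(a, b, c):
    d = math.sqrt(b * b - 4 * a * c)   # assumes real roots
    x1 = (-b - d) / (2 * a) if b > 0 else (-b + d) / (2 * a)
    x2 = c / (a * x1)                  # assumes c != 0
    return x1, x2

print(solve_quadratic(1, 1, -2))  # (-2.0, 1.0)
```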

To appreciate this consideration of round-off error, suppose we have two numbers, a = 1.001 and b = 1.000, and want the difference a^{2} - b^{2}. If our calculator gives 7 or more significant figures, then we find a^{2} = 1.002001 and b^{2} = 1.000000, so that the difference is 0.002001. Although this result is exact, it has only four significant figures. Three significant figures have been lost in the subtraction. If we wanted even greater accuracy, then the squaring would have to be carried out beyond 7 figures. However, algebra tells us that a^{2} - b^{2} = (a - b)(a + b). Now, using the factored form, the difference of the squares is (0.001)(2.001) = 0.002001, again the correct result. However, note that a calculator with only four significant figures would also give four significant figures in the answer; no precision would be lost.

The algebraic solution of cubic equations, a problem of long standing, was one of the challenges of Renaissance algebra. Niccolò Tartaglia (1500-1557) seems to have been the first to solve the general cubic, around 1535, and he kept the method a trade secret. The solution was revealed by Girolamo Cardano (1501-1576) in 1545. The canonical form is ax^{3} + bx^{2} + cx + d = 0. If we divide by a, and introduce y = x + b/3a, this becomes y^{3} + 3py + 2q = 0, where 3p = (3ac - b^{2})/3a^{2} and 2q = 2b^{3}/27a^{3} - bc/3a^{2} + d/a. If D = q^{2} + p^{3} > 0, then there are one real and two complex roots; if D < 0, there are three real roots. If D = 0, two of the three real roots coincide. The degenerate case of p = q = 0 is trivial. Then, if u = [-q + √D]^{1/3} and v = [-q - √D]^{1/3}, the solutions are y_{1} = u + v, y_{2} = eu + e'v, y_{3} = e'u + ev, where e,e' = -1/2 ± (i/2)√3. Some other ways of expressing these roots exist. This method is more trouble than it is worth with the availability of easy approximate methods.
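For the case D > 0, where there is a single real root, the formulas can be sketched in Python; the helper names are illustrative, and the real cube root must be taken with care for negative arguments:

```python
# Cardano's solution of y^3 + 3py + 2q = 0 in the D = q^2 + p^3 > 0 case
# (one real root y = u + v); a sketch of the formulas in the text.
import math

def cbrt(x):
    # real cube root, valid for negative x as well
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

def real_root_depressed_cubic(p, q):
    D = q * q + p ** 3
    assert D > 0, "this sketch handles only the one-real-root case"
    u = cbrt(-q + math.sqrt(D))
    v = cbrt(-q - math.sqrt(D))
    return u + v

# x^3 + 3x - 4 = 0 is already depressed, with 3p = 3, 2q = -4; its real root is 1
print(round(real_root_depressed_cubic(1.0, -2.0), 9))  # 1.0
```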

There are also methods for obtaining the roots of fourth-order equations in terms of radicals, but they are as inconvenient as Cardano's formulas. The roots of fifth- and higher-order equations are in general not expressible in terms of radicals, as was finally proved by Niels Abel (1802-1829) in the early 19th century. As a practical matter, note that it is not difficult to find numerical values for roots of equations of any degree by approximate methods.

Simultaneous equations are sets of equations in more than one variable that are satisfied for certain values of the variables. Systems of linear equations are a particular case for which a large body of algebra exists, including matrices and determinants, called *linear algebra*. The simplest case is the system ax + by = e, cx + dy = f. If we solve the second equation for y, y = (f - cx)/d, and substitute in the first, we have ax + (b/d)(f - cx) = e, or (a - bc/d)x = e - bf/d, or x = (ed - fb)/(ad - cb), from which it follows that y = (af - ce)/(ad - cb). We can see here the application of the familiar Cramer's Rule using 2 x 2 determinants. A solution is not possible if ad = cb, or a/c = b/d, unless the numerators also vanish, which implies that a/c = e/f as well, and the equations are multiples of each other, or that e and f vanish, in which case only the ratio of y to x is determined. These matters are further discussed in the section on Matrices and Determinants.
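The 2 x 2 formulas just derived can be checked with a short Python sketch (`solve_2x2` is an illustrative name):

```python
# Solve ax + by = e, cx + dy = f by the formulas above (Cramer's Rule).
def solve_2x2(a, b, e, c, d, f):
    det = a * d - c * b
    if det == 0:
        raise ValueError("no unique solution")
    x = (e * d - f * b) / det
    y = (a * f - c * e) / det
    return x, y

# 2x + y = 5, x - y = 1  ->  x = 2, y = 1
print(solve_2x2(2, 1, 5, 1, -1, 1))  # (2.0, 1.0)
```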

Equations, and systems of equations, can be handled by certain "rules" that simply follow from the properties of arithmetic. Equal quantities can be added to, subtracted from, or multiplied with, both sides of an equation. The sides of an equation may be divided by equal quantities, provided the quantities are not zero. The result is not valid for values of a variable that cause such quantities to become zero. Inadvertent division by zero is a constant danger in algebra, where a quantity may be zero without appearing to be. Corresponding sides of equations may be added. This means that any multiple of one equation may be added to another equation, side to side, without changing the equality. These *row* operations may be used to solve linear equations. A term can be moved from one side of an equation to the other, provided that its sign is changed. We can solve an equation like 3x + 8 = 4x - 2 by first moving the 8 and the 4x, to get 3x - 4x = -2 - 8, or -x = -10, or, finally, x = 10.

Numerical methods are the most practical way to find roots of an equation f(x) = 0. The first step is to find a value near the root, or better, to bracket the root by two values at which the signs of the function are opposite. This can be done by plotting y = f(x), or by making trial evaluations until the general pattern can be discerned. Then, the approximate root is refined by an iterative procedure until the desired accuracy is attained. We'll only mention ones here that can be used for any kind of equation, not just for finding roots of polynomials; for others, see a book on the theory of equations.

A basic method of refinement is called *regula falsi*, the "rule of false position," also known as trial-and-error. It is essentially interpolation between known values of the function f(a) and f(b) at two locations a and b. The estimated root is then x = [af(b) - bf(a)]/[f(b) - f(a)]. Of course, this is best if f(a) and f(b) have opposite signs, so that x lies between a and b. This determines a new position, which can then be combined with one of the original ones to get a still better approximation. To see this, suppose f(b) = -f(a). Then x = (a + b)/2. If f(x) is linear, this will be the desired root. If f(x) is nonlinear, it will be closer to the actual root than either of the previous values. If f(a) and f(b) do not bracket a root, the method is sometimes called the *secant method*, but there really is no difference.
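As a Python sketch of regula falsi (names illustrative), keeping the two points on opposite sides of the root:

```python
# Regula falsi: interpolate between (a, f(a)) and (b, f(b)), then replace
# the endpoint whose function value has the same sign as the new estimate.
def regula_falsi(f, a, b, tol=1e-9, max_iter=100):
    fa, fb = f(a), f(b)
    assert fa * fb < 0, "a and b should bracket a root"
    x = a
    for _ in range(max_iter):
        x = (a * fb - b * fa) / (fb - fa)
        fx = f(x)
        if abs(fx) < tol:
            break
        if fa * fx < 0:
            b, fb = x, fx
        else:
            a, fa = x, fx
    return x

# the positive root of 2x^5 + 4x^3 + 3x^2 - 1, near 0.448
f = lambda x: 2*x**5 + 4*x**3 + 3*x**2 - 1
print(round(regula_falsi(f, 0.0, 1.0), 4))  # 0.4484
```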

Newton's Method (also called the Newton-Raphson method) extrapolates along the tangent to the curve f(x) at the point x = a. The next approximation is at x = a - f(a)/f'(a). If conditions are favorable, that is, if f'(a) >> f(a), then convergence on the root is very rapid. By considering a nonlinear function, it can be seen that the next approximations by Newton's Method and regula falsi bracket the actual root, which is sometimes of service. This method involves the derivative, which is outside of algebra, but for polynomials is very easy to obtain.
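A minimal Python sketch of Newton's Method, with the derivative supplied explicitly (names illustrative):

```python
# Newton's Method: x_{n+1} = x_n - f(x_n)/f'(x_n); for a polynomial
# the derivative is itself a polynomial and easy to write down.
def newton(f, fprime, x, tol=1e-12, max_iter=50):
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# root of x^2 - 2 starting from x = 1: converges to sqrt(2) in a few steps
r = newton(lambda x: x*x - 2, lambda x: 2*x, 1.0)
print(abs(r - 2 ** 0.5) < 1e-12)  # True
```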

A third method, that of simple iteration, is less general but very convenient when it is applicable. The equation is written in the form x = f(x). Successive approximations are found by substituting the previous approximation in f(x). As we shall see, this works only when |f'(x)| < 1. When this is not true for f(x), it often works to consider the inverse equation x = f^{-1}(x) instead. For example, consider x = tan(x). Here, f'(x) = 1 + tan^{2}(x) ≥ 1, so we can't use iteration directly. However, for x = tan^{-1}(x), f'(x) = 1/(1 + x^{2}), which is indeed < 1, and we can iterate.

In this equation, we must be careful to stay on the correct branch of tan^{-1}x. The calculator returns the principal value, Tan^{-1}(x), which lies between ±(π/2). This branch does not intersect y = x except at the origin, x = 0. The next branch in the positive direction is found by adding π, so the equation we actually iterate is y = Tan^{-1}(x) + π to find the intersection with this branch. Put the HP-48 in the RAD mode. Then, with a starting value of x = 4.7, the successive iterations give 4.5027, 4.4938, 4.4934 and 4.4934. If we want 5 significant figures, we can stop here. To solve x = tan(ax), iterate x = (1/a)(Tan^{-1}(x) + π) instead. Where it works, this method is stable and self-correcting, probably the easiest to use.
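The same iteration can be reproduced in Python with the standard library's arctangent:

```python
# Fixed-point iteration for x = tan(x) on the branch near 3*pi/2,
# iterating x <- arctan(x) + pi as described above.
import math

x = 4.7
for _ in range(20):
    x = math.atan(x) + math.pi
print(round(x, 4))  # 4.4934
```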

To see better what is going on, suppose f(x) = c + ax. It is clear that the solution of x = f(x) is x = c/(1 - a) in this case. Take any starting value x_{0}. Then x_{1} = c + ax_{0}, x_{2} = c + ac + a^{2}x_{0}, and so on. In general, x_{n} = c(1 + a + a^{2} + a^{3} + ... + a^{n-1}) + a^{n}x_{0}. If |a| < 1, then the contribution of x_{0} goes to zero, and x_{n} goes to c/(1 - a), which is the correct result. This shows clearly why |a| must be less than unity for convergence.

Suppose we have an integral count of hours. Then, two values separated by a multiple of 24 will represent the same time of day. If a and b are the two values, this is expressed as a ≡ b (mod 24), read "a is congruent to b mod 24," and meaning that a and b give the same remainder when divided by 24, or in general that a = b + km for modulus m, where k is some positive or negative integer. That is, 7 ≡ 2 (mod 5) and 12 ≡ -1 (mod 13), for example. In the degenerate case of m = 0, congruence reduces to ordinary equality. Congruence is an *equivalence relation*, since a ≡ a (mod m), if a ≡ b (mod m) then b ≡ a (mod m), and if a ≡ b (mod m) and b ≡ c (mod m), then a ≡ c (mod m).

It is not hard to prove that if a ≡ b (mod m) and c ≡ d (mod m), then a ± c ≡ b ± d (mod m), ac ≡ bd (mod m) and a^{n} ≡ b^{n} (mod m). Also, if a ≡ b (mod m) and f(x) is a polynomial with integral coefficients, then f(a) ≡ f(b) (mod m). Every integer is congruent to some integer in the range 0, 1, ..., m-1. That is, for days (mod 7), any day is congruent mod 7 to mon = 0, tue = 1, wed = 2, thu = 3, fri = 4, sat = 5 or sun = 6. Therefore, if we have a product ab = c, then this also holds for the integers congruent to a, b and c.

Any decimal integer can be written a_{0} + 10 a_{1} + 100 a_{2} + ... + 10^{n} a_{n} + ... . Now, 10 ≡ 1 (mod 9), so 10^{n} ≡ 1 (mod 9) in general. Therefore, this number is congruent to a_{0} + a_{1} + a_{2} + ..., which is just the sum of its digits. The resulting number can be treated the same way, so that it is possible to find an integer from 0 to 9 congruent to any integer by simply adding digits. This is the basis of the method of checking arithmetical operations by *casting out nines*. Each number is converted to a single digit to which it is congruent mod 9, and the operation is easily done with the single digits. It should, of course, give the same results.

For example, is 37 x 58 = 2146 correct? 3 + 7 = 10, 1 + 0 = 1, so 37 ≡ 1 (mod 9), and 5 + 8 = 13, 1 + 3 = 4, so 58 ≡ 4 (mod 9). Their product should be ≡ 4 (mod 9). 2 + 1 + 6 = 9, which we can cast out, leaving 4, so 2146 ≡ 4 (mod 9), and the arithmetic checks. This check does not catch a simple transposition of adjacent digits, since that does not change the congruent number.
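Casting out nines can be sketched in Python (names illustrative); note that the check also passes for a transposed product, illustrating the weakness mentioned above:

```python
# Casting out nines: reduce each number to its repeated digit sum,
# a single digit congruent to it mod 9, and compare products.
def digit_root(n):
    while n > 9:
        n = sum(int(d) for d in str(n))
    return n

def check_product_mod9(a, b, claimed):
    return digit_root(digit_root(a) * digit_root(b)) == digit_root(claimed)

print(check_product_mod9(37, 58, 2146))  # True: the arithmetic checks
print(check_product_mod9(37, 58, 2416))  # True too: a transposition slips through
```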

Since 10 ≡ -1 (mod 11), it is also easy to find the smallest congruent integer by adding the digits multiplied by (-1)^{n}, which is *casting out elevens*. Casting out elevens will catch transpositions, unlike casting out nines, but it is a little harder to do. If we had 37 x 58 = 2416, casting out nines would not catch the error. However, 37 ≡ 4 (mod 11), since 7 - 3 = 4, and 58 ≡ 3 (mod 11), since 8 - 5 = 3, so the product should be ≡ 12 ≡ 1 (mod 11). 2416, however, is ≡ 7 (mod 11), so the error is found. The correct answer, 2146, is indeed ≡ 1 (mod 11).
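These checks are mechanical enough to automate. Here is a short sketch in Python (the function names are my own), implementing casting out nines and elevens and the product check used in the examples above:

```python
def cast_out_nines(n):
    # Repeatedly sum the decimal digits; the result is congruent to n (mod 9).
    while n > 9:
        n = sum(int(d) for d in str(n))
    return n % 9          # map a final 9 to 0

def cast_out_elevens(n):
    # Alternating digit sum, units digit positive, since 10 is -1 (mod 11).
    s, sign = 0, 1
    for d in reversed(str(n)):
        s += sign * int(d)
        sign = -sign
    return s % 11

def check_product(a, b, c):
    # Does the claimed product a x b = c survive both casts?
    ok9 = (cast_out_nines(a) * cast_out_nines(b)) % 9 == cast_out_nines(c)
    ok11 = (cast_out_elevens(a) * cast_out_elevens(b)) % 11 == cast_out_elevens(c)
    return ok9, ok11
```

check_product(37, 58, 2146) passes both casts, while the transposed 2416 passes nines but fails elevens, just as in the text.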

Congruences can be used to predict the even divisibility of numbers. If a ≡ 0 (mod m), then a is divisible by m without a remainder. We have already seen that finding congruences mod 9 is very easy. $2,211,111 ≡ 0 (mod 9), so it is evenly divisible by 9. Since 10 ≡ 1 (mod 3), any number is congruent to the sum of its digits mod 3, and so is divisible by 3 if the sum of its digits is. For example, $510 is evenly divisible by 3 ($170). Since 10 ≡ 0 (mod 2), a number is even or odd as its units digit is even or odd (no surprise).

Algebra can be developed logically from postulates and using set theory. If we begin with the positive integers 1, 2, 3, ..., we can create a set of elements closed under addition (and subtraction) by adding 0 and the negative integers. Addition is associative, 0 acts as an identity, since a + 0 = 0 + a = a, and for every element a there is an inverse -a. Such a set of elements is called a *group*, and has valuable properties, largely coming from closure and the existence of inverses. The group of integers is a commutative group, since addition is commutative. Many interesting groups are not commutative. If we define a second operation on the elements, which we may choose for integers to be multiplication, the set is also closed under multiplication, we have an identity for multiplication, 1, and the multiplication is associative and distributive with respect to addition. Such a set is called a *ring*. It is necessary that addition be commutative, but commutative multiplication is not required, nor is a unit for multiplication (the even integers have no such element). The integers are a good example of a ring, as are the even integers alone. Polynomials also form a ring, which gives them properties analogous to those of integers. We shall not consider polynomials here, but stick to integers; the analogous properties are interesting, however.

If there is a unit element for multiplication in the ring, multiplication is commutative, and--most importantly--a multiplicative inverse for every nonzero element is in the set, the ring becomes a *field*. The rational numbers form a field, as do the real numbers and the complex numbers. The field of rationals is usually denoted Q, of reals R, and of complex numbers C = R(i). To get the complex number field, we *adjoin* to the reals the element i, which satisfies the algebraic equation x^{2} + 1 = 0. If the ring has everything needed to be a field except the inverses, and has no divisors of zero, it is called an *integral domain*. There is much more to be said about these things; this is only the briefest mention of the definitions.

A *divisor* c of an integer a is an integer that divides a exactly; that is, a/c has no remainder. The *greatest common divisor* (GCD) of two integers a,b is the largest integer c that divides both a and b, and is written c = (a,b). This was also long known as the greatest common measure, GCM. We can always write any integer as a product of prime factors, and if we know the factors, the GCD is easily determined. For example, 24 = 2^{3}.3 and 18 = 2.3^{2}, so the GCD is 2.3 = 6. But what about (3565,1426)? To avoid having to find the prime factors, we can use *Euclid's Algorithm*. First, we divide 3565 by 1426 and find the remainder, 713. Now we divide 1426 by the new remainder, and find that 1426 = 2.713 + 0; it divides evenly. This means that 713 is the GCD of 3565 and 1426. Indeed, 3565 = 5.713 and 1426 = 2.713.

Now let's find (2148,1633). The remainders are 515, 88, 75, 13, 10, 3, 1 and finally 3 = 3.1 + 0. The zero remainder comes only when we have worn out the number, so that (2148,1633) = 1, and the two numbers are said to be *relatively prime*. The GCD of two polynomials can be found in exactly the same way. It is much easier to understand after doing the arithmetic than simply by reading about it.

We might also be interested in the *least common multiple* (LCM) m = ka = jb, where k and j are integers. This is denoted m = [a,b]. Considering the prime factorization, it is clear that [a,b] = ab/(a,b), so we can find the LCM by Euclid's Algorithm without finding the prime factorization. Again, polynomials behave the same way. For example, [3565,1426] = 5.2.713 = 7130, which is 5 times the smaller number and 2 times the larger. On the other hand, [2148,1633] = 3507684.
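Euclid's Algorithm is only a few lines in any language. Here is a sketch in Python (function names mine) that also gives the LCM by the relation [a,b] = ab/(a,b):

```python
def gcd(a, b):
    # Euclid's Algorithm: replace (a, b) by (b, a mod b) until the
    # remainder vanishes; the last nonzero remainder is the GCD.
    while b != 0:
        a, b = b, a % b
    return a

def lcm(a, b):
    # [a, b] = ab / (a, b), so no prime factorization is required.
    return a * b // gcd(a, b)
```

gcd(3565, 1426) gives 713 and gcd(2148, 1633) gives 1, with lcm(3565, 1426) = 7130, as found above.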

The basic inequalities are >, "greater than," and <, "less than." A simple inequality may be specified by ≠, or limit points may be added to the basic inequalities as ≥ and ≤. The basic inequalities have these properties: (a) if a>b, then b<a; (b) if a>b and b>c, then a>c; (c) if a>b, then a+c>b+c and a-c>b-c; (d) if a>b and c<d, then a-c>b-d. Two inequalities in the same direction may not be subtracted; (e) if a>b and c>0, then ac>bc and a/c>b/c. However, if c<0, then ac<bc and a/c<b/c. If both sides of an inequality are multiplied by a negative number, the direction of the inequality is reversed.

Using these rules, inequalities may be manipulated much like equations. An inequality that holds for all values of the variables, such as (a - b)^{2} ≥ 0, is an identical inequality; two inequalities that can be transformed into one another are equivalent, as x < 2 and x - 2 < 0 are. An inequality may be solved for the variable. For example, 5x + 3 < 8x + 1 becomes 5x - 8x < 1 - 3, or -3x < -2, and dividing by -3 reverses the inequality, giving x > 2/3. Such first degree inequalities often appear, and solving them gives a clearer picture.

For any two numbers a and b, |a + b| ≤ |a| + |b|, which is called the *triangle inequality*, since for vectors it states that one side of a triangle is no longer than the sum of the other two. It can be extended to |a + b + c + ...| ≤ |a| + |b| + |c| + ... . For real a, b, etc., the equality holds only if all the numbers have the same sign. We also have |a| + |b| ≥ |a - b| ≥ |a| - |b|. The Cauchy inequality is (a_{1}b_{1} + a_{2}b_{2} + ... + a_{n}b_{n})^{2} ≤ (a_{1}^{2} + a_{2}^{2} + ... + a_{n}^{2})(b_{1}^{2} + b_{2}^{2} + ... + b_{n}^{2}). For n = 3, this states that the scalar product of two vectors is less than or equal to the product of the lengths of the vectors, but it is true for any n. The arithmetic mean of n nonnegative quantities is greater than or equal to their geometric mean, or (a_{1} + a_{2} + ... + a_{n})/n ≥ (a_{1}a_{2}...a_{n})^{1/n}. Also, the absolute value of the arithmetic mean is less than or equal to the square root of the average of the squares of the quantities (the rms value).

Inequalities play an important role in mathematical analysis. For example, a function f(x) is *continuous* at x = a if for any positive ε there is a δ such that |f(x) - f(a)| < ε whenever |x - a| < δ.

Inequalities may be used to solve problems of maxima or minima, where the usual approach is to use calculus. For example, to show that the square is the rectangle of maximum area for a given perimeter, consider a rectangle of sides a and b. Then, we wish to show that [(a + b)/2]^{2} ≥ ab, or that the area of the square is greater than or equal to the area of a rectangle of the same perimeter, the equality holding only for a = b. Multiplying this out, we find that a^{2} + 2ab + b^{2} ≥ 4ab. Then, a^{2} - 2ab + b^{2} ≥ 0, or (a - b)^{2} ≥ 0, which is certainly the case, the equality holding only for a = b. The assertion is proved. We have also shown that when the sum of two quantities is constant, their product is a maximum when the quantities are equal, a conclusion often of use. This method was first used by Pierre de Fermat. The figure of maximum area for a given perimeter is a circle. When Dido was granted as much land as could be covered by the hide of an ox for her city of Carthage, she cut the hide into thin strips, sewed them together, and marked out a semicircle on the beach.

A *matrix* is a rectangular array of elements a_{ij} where this element belongs to the i^{th} row and j^{th} column of a total of m rows and n columns that constitute the m x n matrix, generally written within square brackets. A *square* matrix has m = n. A matrix may be represented by a single capital letter, as A. The elements may be thought of as real numbers, though more general elements are possible. These numbers may represent the coefficients in a system of linear equations, a transformation of coordinates, or a quadratic form, with many applications in physics. There is a great amount to say about matrices and related subjects. Only some of the most important properties can be mentioned here. Matrices were introduced by Cayley in 1858.

We have already used the *index notation* to describe arrays of quantities. An index is written as a subscript, and in any particular problem is assumed to vary in a certain *range*, normally 1, 2, ..., n. The symbol a_{i} stands at once for all the n values. Such an index is called a *free* index. Any particular value is denoted by a_{(i)}, where the index is put in parentheses. This symbol stands for one of the n values, and may be explicit, as a_{(2)}. A repeated index in quantities standing together, as a_{i}b_{i} implies a summation over the n values of the index. That is, if n = 3, then a_{i}b_{i} = a_{1}b_{1} + a_{2}b_{2} + a_{3}b_{3}. An index used this way is called a summation or *dummy* index, since the letter used for the index is inconsequential and can be changed at will. No more than two indices can be the same in a term. If this happens, change the dummy index to something different. If no summation is intended, then the indices are put in parentheses, as a_{(i)}b_{(i)}, which represents only one of the three terms in the summation above. This notation is more than a convenience; it is a powerful method of "doing algebra" on such array quantities, and can be used to prove all of the properties of matrices. On the other hand, the representation of matrix algebra with capital letters is only suggestive, and is not nearly as powerful as index notation. For more information on indices and how to use them, see Euclidean Tensors.
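In a program, a dummy index is simply a summation variable, which is one way to see why the letter used for it is inconsequential. A minimal Python sketch of a_{i}b_{i} for n = 3:

```python
n = 3
a = [1, 2, 3]
b = [4, 5, 6]

# The repeated (dummy) index i in a_i b_i implies a sum over its range.
s = sum(a[i] * b[i] for i in range(n))

# Renaming the dummy index changes nothing: the same sum, now over k.
s2 = sum(a[k] * b[k] for k in range(n))
```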

Two matrices are equal, A = B, only if they are the same shape (have equal numbers of rows, and of columns), and if corresponding elements are equal: a_{ij} = b_{ij}. Matrices of the same shape can be added or subtracted, element by element: C = A + B means c_{ij} = a_{ij} + b_{ij}. A matrix is multiplied by a number λ by multiplying each element by λ: C = λA means c_{ij} = λa_{ij}. It is seen now that we can make linear combinations of matrices. The matrix product C = AB means c_{ij} = a_{ik}b_{kj} (note the sum over k!). Clearly, the number of columns in the prefactor must equal the number of rows in the postfactor (though most matrices encountered are square). This rule of multiplication means that matrix multiplication is equivalent to the successive performance of linear transformations. Unlike the multiplication of real numbers, matrix multiplication does not commute in general; that is, AB ≠ BA. The unit matrix I has all elements on its diagonal 1, all off-diagonal elements zero; in indices I = δ_{ij}, called the *Kronecker delta*. Note that δ_{ik}δ_{kj} = δ_{ij}. Since the deltas are symmetric, the indices may be exchanged if desired. Summing over a Kronecker delta simply turns one free index into another. A constant, or *scalar*, matrix can be written λI. Using index notation, it is easy to prove that a scalar matrix commutes with any matrix, since (λI)A = λA = A(λI), each element being simply multiplied by the real number λ, and real numbers commute. The null matrix N is characterized by n_{ij} = 0. Note that this is m x n equations, written very concisely.
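The index rule c_{ij} = a_{ik}b_{kj} translates directly into code. Here is a sketch in Python with plain nested lists (the function name is mine):

```python
def matmul(A, B):
    # c_ij = a_ik b_kj: the repeated index k is a dummy index, summed over.
    rows, inner, cols = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

I = [[1, 0], [0, 1]]          # the unit matrix, delta_ij
A = [[2, 1], [-1, 1]]
B = [[0, 1], [1, 0]]
```

Multiplying A by I in either order returns A, while matmul(A, B) and matmul(B, A) differ, illustrating noncommutativity.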

Vectors are represented by matrices with one column or one row, and symbolically by boldface small letters. In the index notation, both are written simply a_{i}. If we need to distinguish between them, {a_{i}} is a column vector, and [a_{i}] is a row vector. A matrix M represents a linear transformation **b** = M**a** of a column vector **a** to a column vector **b**. In indices, b_{i} = M_{ij}a_{j}. If this is written out as arrays, the vectors are column vectors. Alternatively, we could write b_{i} = a_{j}M_{ji}, and as arrays the vectors would be row vectors. These are, of course, not the same thing in general. Summing over the first index in M can give different results than summing over the second index. In the index notation, a_{i} can be written on either side of M; what is important is the summation index. There is no confusion in the index notation, but one must be careful when writing arrays in the usual way.

A matrix is *transposed* by interchanging rows and columns, written M' if m'_{ij} = m_{ji}. Sometimes a superscript T is used instead of the prime. The matrices transforming column and row vectors in the preceding paragraph are transposes of each other. It is easy to prove that (AB)' = B'A' using indices. For, (AB)'_{ij} = (AB)_{ji} = a_{jk}b_{ki} = b'_{ik}a'_{kj} = (B'A')_{ij}. If A = A', or a_{ij} = a_{ji}, the matrix A is *symmetric*. If a_{ij} = -a_{ji}, the matrix A is *antisymmetric*. Note that in this case the diagonal elements will vanish.

The *complex conjugate* of a matrix A is the matrix A*, with elements a*_{ij}. Clearly, (AB)* = A*B*. The Hermitian conjugate of a matrix A or a_{ij} is the matrix A^{†} with elements a*_{ji}. A matrix is Hermitian if A^{†} = A.

Values are associated with a block of elements of a matrix by means of *determinants*, which are *square* arrays of elements written between vertical bars. Determinants were introduced by Vandermonde in 1771, antedating matrices, but the present notation is due to Cayley (1841). They were originally called *eliminants*. If the elements are thought of as the coefficients of a system of linear equations, the value of the determinant is unchanged by the row and column operations that leave the solutions of the equations unchanged. In particular, a multiple of any row or column may be added to or subtracted from another without change in the value of the determinant, and consequently if any row or column is proportional to another, the value of the determinant is zero. If two rows or two columns are exchanged, the value of the determinant is multiplied by -1. If rows and columns are interchanged, or *transposed*, the value of the determinant is unchanged. The determinant of the product of two matrices is the product of the determinants: det (AB) = (det A)(det B).

A *partition* of a matrix consists of those elements simultaneously in certain rows and columns, not necessarily adjacent ones, though in most cases they are. If a matrix is partitioned by imagining dotted lines separating the partitions, it can be thought of as consisting of such super-elements, which can take part as units in working with the matrix. A partition can consist of as little as one element, or as much as the whole matrix. A partition with an equal number of rows and columns has a determinant. There is nothing special about partitions; they are simply collections of elements identified as a block.

Because I cannot represent a determinant in the usual form easily in HTML, the rows will be written on one line, separated by commas, as in D = |a b, c d|, where the first row is (a b), and the second is (c d). The value of this 2 x 2 determinant is D = ad - bc, the difference between the products of the diagonals. A 3 x 3 determinant E = |a b c, d e f, g h j| is most easily evaluated by expanding in minors of any row or column. To find the minor of a certain element, we strike out all elements in the same row and column. Expanding E by minors of the first row, we have a(ej - hf) - b(dj - gf) + c(dh - ge). Note that we must prefix each term with a sign ±1, alternating as we count from the element at the upper left-hand corner, which takes +. If you are not familiar with this, try expanding by minors of, say, the second row (signs will be - + -). There is another method (Sarrus's scheme), in which the first two columns are written again to the right, and diagonal lines of 3 elements are multiplied, + when running from upper left to lower right, and - when running from lower left to upper right.

However you expand the 3 x 3 determinant, you will get aej + bgf + cdh - ahf - bdj - cge, the products of one element from each row and column, with even permutations of the indices (even number of transpositions) prefixed by +, and odd permutations by -. A general 3 x 3 determinant may be expressed as D = ε_{ijk}a_{1i}a_{2j}a_{3k}, where ε_{ijk} = +1 if i,j,k is an even permutation of 1,2,3, -1 if an odd permutation, and 0 otherwise. Similar expressions exist for any dimension n. Try it for a 2 x 2 and it will be clear what is going on. All the algebraic properties of determinants can be worked out by algebra using this expression. Again in 3 dimensions for clarity, ε_{rst}b_{ir}b_{js}b_{kt} = det B ε_{ijk}. This can be used to prove that det (AB) = det A det B. First, write c_{ij} = a_{ik}b_{kj} and then det C = ε_{ijk} a_{1r}b_{ri} a_{2s}b_{sj} a_{3t}b_{tk} = a_{1r}a_{2s}a_{3t} ε_{rst} det B = det A det B.
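The epsilon expression, generalized to any n, can be computed directly: sum sign(p) times one element from each row, the columns given by the permutation p. A sketch in Python (function names mine), practical only for small n:

```python
from itertools import permutations

def sign(p):
    # Parity of a permutation: +1 for an even number of transpositions, -1 for odd.
    p = list(p)
    s = 1
    for i in range(len(p)):
        while p[i] != i:
            j = p[i]
            p[i], p[j] = p[j], p[i]
            s = -s
    return s

def det(M):
    # det M = sum over permutations p of sign(p) * M[0][p[0]] * ... * M[n-1][p[n-1]],
    # which is the epsilon expression written out for dimension n.
    n = len(M)
    total = 0
    for p in permutations(range(n)):
        term = sign(p)
        for i in range(n):
            term *= M[i][p[i]]
        total += term
    return total
```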

It is very difficult to expand even a 3 x 3 determinant by hand without making some arithmetic error, so determinants are not a very practical calculational tool, however powerful they are in theory. In solving simultaneous linear equations, many good ways have been worked out that avoid determinants (for example, pivotal condensation or Gaussian elimination). The HP-48 will, however, evaluate determinants accurately for you. All you have to do is enter the coefficients and press a button.

The *inverse* of a matrix A is a matrix A^{-1} such that AA^{-1} = A^{-1}A = I. A square matrix A has an inverse A^{-1} provided its determinant, det A, is nonzero. If we have a system of equations **b** = A**x** and the matrix of the coefficients A has an inverse A^{-1}, then the solution for **x** is **x** = A^{-1}**b**, after multiplying both sides by the inverse. The elements of the inverse matrix are the *cofactors* of each element a_{ij} divided by det A, and put in the transposed position. The cofactor of any element is the determinant of order n - 1 formed by striking out the row and column of the element, times the sign (-1)^{i+j}. For example, let A = [2 1, -1 1]. Then, det A = 3. The cofactor of the 2 is 1, of the (1,2) 1 is -1 x -1 = 1, of the -1 is -1 x 1 = -1, and of the (2,2) 1 is 2. Therefore, A^{-1} = (1/3)[1 -1, 1 2]. Multiplying A by A^{-1} in either order will show that we have found the inverse. The HP-48 will find an inverse matrix if you simply press the 1/x key!
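The 2 x 2 case above can be checked with a few lines of Python (the function name is mine): the cofactors, put in transposed position and divided by det A, give the inverse.

```python
def inverse2(A):
    # Cofactors in transposed position, divided by det A: adj(A) / det(A).
    (a, b), (c, d) = A
    det = a * d - b * c           # must be nonzero for the inverse to exist
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[2, 1], [-1, 1]]
Ainv = inverse2(A)                # (1/3)[1 -1, 1 2], as found above
```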

If we write out the solution for simultaneous linear equations using the inverse matrix, we will find that it is just the familiar Cramer's Rule, as follows. Suppose we have n equations in n unknowns, written in the usual form with the constant terms on the right of the equal signs. Let the determinant D be the determinant of the coefficients on the left of the equal signs. If D ≠ 0, then the equations have a unique solution. Let D_{x} be the determinant in which the column corresponding to the unknown x in D has been replaced by the column of the constant terms. Then the value of x that satisfies the equations is x = D_{x}/D, and similarly for the other unknowns. This solution is practically useful only for small n, because of the difficulty of evaluating determinants. If you try it for the simple 2 x 2 case, it will be clear that it is equivalent to solving for y in one equation, and substituting it in the other to find x.
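Cramer's Rule is easy to write down in code; here is a sketch in Python (names mine), using expansion by minors for the determinants, which is exactly what limits the rule to small n:

```python
def det(M):
    # Expansion by minors of the first row, with alternating signs.
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def cramer(A, b):
    # x_k = D_k / D, where D_k is D with column k replaced by the constants b.
    D = det(A)
    return [det([row[:k] + [b[i]] + row[k + 1:] for i, row in enumerate(A)]) / D
            for k in range(len(A))]
```

For the small (hypothetical) system 2x + y = 5, -x + y = 1, cramer([[2, 1], [-1, 1]], [5, 1]) returns x = 4/3, y = 7/3.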

A real matrix Q whose inverse is its transpose is said to be *orthogonal*. If the vectors **x** and **y** are transformed by such a matrix, then x'_{i} = q_{ij}x_{j} and y'_{i} = q_{ij}y_{j}. The scalar product of these vectors is x'_{i}y'_{i} = q_{ij}q_{ik}x_{j}y_{k} (note that the second dummy variable has been changed from j to k to avoid confusion with the first). Now, if q_{ik} = q^{-1}_{ki} (transpose equals inverse), the sum over i gives δ_{kj}. Summing over either k or j gives x'_{i}y'_{i} = x_{k}y_{k}. Therefore, an orthogonal transformation does not change the scalar product. It represents a rotation of the coordinate axes only. For complex matrices, the equivalent is a matrix whose inverse equals its Hermitian conjugate, since in the scalar product the first vector is conjugated. Such a matrix is called *unitary*.

Let's suppose we have a matrix A, and think of it as specifying a transformation of a vector x into a vector y symbolically by y = Ax. If for some value of the number λ we can find a vector x such that Ax = λx, then x is called an *eigenvector* of A belonging to the *eigenvalue* λ. Taking the complex conjugate, we have A*x* = λ*x*. Transposing, we have x*'A*' = λ*x*'. From the first equation, x*'Ax = λx*'x, and from the second, x*'A*'x = λ*x*'x. Subtracting these two equations, we find that 0 = (λ* - λ)x*'x, provided A satisfies the condition A = A*'. Therefore, if A is either real and symmetric, or complex and Hermitian, any eigenvalue λ is real, since the norm of a vector, x*'x, is not zero unless x itself is zero. Furthermore, if μ is a different eigenvalue, corresponding to a different eigenvector y, so that Ay = μy, then the same reasoning gives 0 = (λ - μ)y*'x. Since the eigenvalues are different by hypothesis, we must have y*'x = 0, or the eigenvectors belonging to different eigenvalues are *orthogonal*. In what follows, we shall assume that A is real and symmetric, so that its eigenvalues and eigenvectors are also real. The reader can fill in the exactly similar reasoning for a Hermitian A.

Let us make a linear coordinate transformation specified by X = Bx, where X are the new coordinates and x the old. If y = Ax, then B^{-1}Y = AB^{-1}X, or Y = BAB^{-1}X. This is called a *similarity transformation*. Transformations B that preserve the scalar product of two vectors are of special significance, since they connect equivalent coordinate systems. In this case, X'Y = (Bx)'(By) = x'B'By = x'y, provided B'B = I, or B' = B^{-1}. A matrix B whose transpose is also its inverse is called an *orthogonal* matrix. Since det B = det B', det(B'B) = det B det B' = |det B|^{2} = det I = 1, so |det B| = 1 and det B = ±1. If det B = 1, then B is a *rotation*, while if det B = -1, it is a rotation combined with an inversion x → -x, y → -y. In the complex domain, B*'B = I, or B^{†} = B^{-1}, and B is called *unitary*. In two dimensions, such a transformation is a rotation about the origin, so that B = [cos θ sin θ, -sin θ cos θ]. It is easily seen that det B = 1 in this case.

Since Ax = λx, (A - λI)x = 0. The condition that this system of homogeneous linear equations have a solution is |A - λI| = 0. For each root of this equation, we can find a corresponding direction, a vector whose length is unspecified. If A is n x n, there are, then, in general n eigenvalues, which are real if A is symmetric (Hermitian), and the eigenvectors are unit vectors in the corresponding directions that are normal to each other. If we rotate the coordinates to these new directions, then A becomes diagonal, and the diagonal elements are the eigenvalues. The transformation B that diagonalizes A is a matrix whose rows are the eigenvectors. The columns of B' = B^{-1} are, likewise, the eigenvectors, and BAB^{-1} will be a diagonal matrix. The eigenvalue equation shows that the sum of the diagonal elements of a matrix, its *trace*, is preserved under similarity transformation, as is the determinant.

This can be illustrated by a problem in 2 dimensions, and the reader interested in diagonalization is encouraged to work through it in detail. The HP-48 makes a useful assistant. Let's start with the symmetric matrix A = [2 1, 1 1]. Its determinant is +1, and its trace is 3, which will be the sum of the two eigenvalues. The eigenvalue equation is the quadratic λ^{2} - 3λ + 1 = 0. Hence, λ = (3 ±√5)/2, or 2.6180, 0.38197, approximately. The sum of these is, of course, 3. Putting the first eigenvalue in the equation Av - λv = 0 (we want to use x,y for the coordinates, so we change to v for the eigenvector), we find that v = (x,y) must have y/x = 0.618 (both equations that we get say this). A unit vector in this direction is v_{1} = (0.85066, 0.52571). This vector makes an angle 31.716° with the x-axis. Using the other eigenvalue, we find v_{2} = (-0.52571, 0.85066). The matrix B is [0.85066 0.52571, - 0.52571 0.85066], which is easily seen to be orthogonal. If we form BAB^{-1}, perhaps with the aid of the HP-48, we find [2.6180 0, 0 0.38196], just what we would expect. With the approximate numbers, the zeros actually turn out to be about 5 x 10^{-5}. The sum of the eigenvalues of B is 2 cos θ = e^{iθ} + e^{-iθ}. The eigenvalues of an orthogonal matrix are always of this form, so that |λ| = 1.
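The 2 x 2 example can be checked in a few lines of Python (pure arithmetic, no matrix library; the variable names are mine):

```python
import math

A = [[2, 1], [1, 1]]
tr = A[0][0] + A[1][1]                        # trace = 3
d = A[0][0] * A[1][1] - A[0][1] * A[1][0]     # determinant = 1

# Characteristic equation: lambda^2 - tr*lambda + d = 0.
disc = math.sqrt(tr * tr - 4 * d)             # sqrt(5)
lam1, lam2 = (tr + disc) / 2, (tr - disc) / 2

# Unit eigenvector for lam1: (2 - lam1)x + y = 0 gives y/x = lam1 - 2.
slope = lam1 - 2                              # 0.618...
norm = math.hypot(1, slope)
v1 = (1 / norm, slope / norm)                 # approximately (0.8507, 0.5257)

# Check that A v1 = lam1 v1.
Av1 = (A[0][0] * v1[0] + A[0][1] * v1[1], A[1][0] * v1[0] + A[1][1] * v1[1])
```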

Diagonalization has many uses in physics as well as in mathematics, especially in describing properties that vary with direction. A symmetric matrix A = a_{ij} defines a *quadratic form* f(x) = a_{ij}x_{i}x_{j} in a rectangular coordinate system, or a vector relation y_{i} = a_{ij}x_{j}. The matrix may describe moments and products of inertia, elasticity, polarizability, or any of many other properties that vary with direction. If we rotate the coordinates, the values of the coefficients a_{ij} change, but the value of f(x) at the same physical point does not. In a coordinate system that diagonalizes A, called the *principal axes*, the quadratic form becomes a sum of squares, while the vector transformation reduces to stretches along the axes, y_{(i)} = a_{(i)}x_{(i)} (no summation). The diagonal elements are called the *principal values*. Often it is enough just to know the eigenvalues, which are easily obtained with the HP-48. In many cases, symmetry may help in determining the principal axes.

Quadric surfaces in three dimensions may be classified on the basis of the principal values of the coefficient matrix (linear terms are supposed to have been eliminated). If they are of the same sign and nonzero, the surface is *elliptic*. If the signs differ, the surface is *hyperbolic*. If one (or more) of the principal values is zero, the surface is *parabolic*. In each case, the surfaces f(x) = constant are surfaces of the corresponding type. If a general quadric in two dimensions is written Ax^{2} + Bxy + Cy^{2} + Dx + Ey + F = 0, then the quantity Δ = B^{2} - 4AC is invariant under the transformations of rotation and displacement that may eliminate the terms in B, D and E. This has the same form as the discriminant of a quadratic equation, but the symbols have a different meaning. If Δ < 0, then the curve represented is an ellipse; if Δ = 0, it is a parabola, and if Δ > 0, it is a hyperbola. A degenerate parabola may be represented by parallel lines, a degenerate hyperbola by intersecting lines. The ellipse may be null or imaginary.

We now return to the problem of solving the system a_{ij}x_{j} = b_{i}. If the b_{i} are not all identically zero, then a unique solution for the x_{i} exists only if det A = |a_{ij}| ≠ 0. Cramer's method is not a satisfactory method of solution for n > 3. If we can invert the matrix A, then x_{i} = a^{-1}_{ij}b_{j}. If we have a lot of different b_{i}, then it may be worth the effort to find A^{-1}.

Otherwise, the best method is Gaussian elimination. In this method, we use the first equation to eliminate x_{1} in all the other equations. This is done by multiplying the first equation by a_{i1}/a_{11} and subtracting it from equation i, for i = 2, 3, ..., n. The element a_{11} is called the *pivot* in this elimination, and it is best to rearrange the rows and columns to obtain the largest possible pivot, especially if the problem is "stiff" and sensitive to round-off error. This process is then repeated for the next equation, eliminating x_{2} in all the rest, and so on, until finally the n^{th} equation has only one coefficient. What has been done is that A has been converted into an equivalent upper triangular matrix T. Now, starting with x_{n}, the x_{i} are determined in reverse order. This process is called *back substitution*. This is a very practical method even when carried out by hand for fairly large n. It is not as simple to program as might appear, so computer programs for Gaussian elimination have to be cunningly written. The reader is encouraged to carry this out for n = 4, with simple coefficients, to see how practical it is.
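The procedure just described can be sketched in Python (function name mine), working on the augmented matrix [A|b] and choosing the largest pivot in each column:

```python
def gauss_solve(A, b):
    # Gaussian elimination with partial pivoting, then back substitution.
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix [A|b]
    for col in range(n):
        # Take the largest available pivot in this column to limit round-off.
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution, starting with x_n.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x
```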

When A has been expressed as an upper triangular matrix T, it is easy to find its determinant, which is easily seen to be just the product of the diagonal elements. There are procedures for factoring a matrix into a lower triangular matrix L and an upper triangular matrix U ("LU decomposition") that also lead to an easy solution of the system. If A is banded, that is, if its nonzero elements are confined to the diagonal and a few parallel diagonals on each side, there are very efficient methods for solving the system.

In Gauss-Jordan elimination, we eliminate variables not only in the equations below, but also in the equations above. This is a lot more work, but it does turn A into a diagonal unit matrix. If we craftily augment A with an n x n unit matrix on the right (this means just to write it there, so we have an n x 2n matrix), and carry out Gauss-Jordan elimination including this augmentation, then when we are finished and the left n x n is a unit matrix, the right n x n is magically just the inverse matrix A^{-1}. If we need the inverse, then Gauss-Jordan elimination is just what we want.
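A sketch of this trick in Python (function name mine): augment with the unit matrix, reduce the left half to I, and read off A^{-1} on the right.

```python
def gj_inverse(A):
    # Gauss-Jordan elimination on A augmented with the unit matrix;
    # when the left n x n block is I, the right block is A^{-1}.
    n = len(A)
    M = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]          # normalize the pivot row
        for r in range(n):
            if r != col:                          # eliminate above AND below
                f = M[r][col]
                M[r] = [v - f * w for v, w in zip(M[r], M[col])]
    return [row[n:] for row in M]
```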

A third variant is an iterative process, called Gauss-Seidel. First, we solve equation (i) for x_{i}, so that each x is expressed in terms of the others. We make starting guesses for all the x's except x_{1}, and find a value for x_{1} from the first equation. In the second equation, we use this value for x_{1} and the estimates of the others to find a new value for x_{2}. When we have found x_{n}, we have completed the first iteration. Now we go back and do it all again with the new estimates. For a well-conditioned system (a diagonally dominant one, for example), this process will converge to the solution. It is best to have a computer do the tedious iteration, but the programs are well-behaved.
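The iteration is only a few lines; a sketch in Python (names mine), tried on a small diagonally dominant system of my own invention:

```python
def gauss_seidel(A, b, sweeps=50):
    # Solve equation i for x_i, then sweep repeatedly, always using
    # the newest estimates as soon as they are available.
    n = len(A)
    x = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
    return x

# Hypothetical example: 4x + y = 9, x + 3y = 7, solution (20/11, 19/11).
x = gauss_seidel([[4, 1], [1, 3]], [9, 7])
```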

There is a problem with four equations in four unknowns at the end of this article. The fun with simultaneous equations really does not begin until the 4 x 4 problem. Try several methods of solution! Solving it by determinants would involve evaluating five 4 x 4 determinants, or twenty 3 x 3 determinants. Such problems are not often encountered in elementary algebra texts because of the great chance of error in the tedious arithmetic. If you can evaluate a 3 x 3 correctly 19 out of 20 times, the chance of at least one error in solving a system of 4 equations this way is about 64%. The HP-48 can solve the problem quickly with SOLVE, but row operations are available to perform Gaussian elimination as well, using the matrix of the coefficients augmented by the constant terms. The command RCIJ multiplies row i by a constant c and adds the result to row j, saving a lot of tedious calculation. Columns can be swapped with CSWP to select the largest pivot. The command RREF performs a Gauss-Jordan elimination directly. These resources are instructive and should be investigated.

A discussion of vectors is not usually included in an algebra course, but it is really a part of algebra, so a few words will be said about it here. We have already called 1 x n and n x 1 matrices vectors, and indeed vectors obey the usual laws of matrix algebra. The case n = 3 is especially important in applications. Vectors are n-tuples of real or complex numbers that can be added or multiplied by scalars to form *vector spaces*, but we will not be concerned with the consequences of this important part of advanced mathematics here. If we define a *scalar product* of two vectors a_{i} and b_{i} to be the scalar c = a*_{i}b_{i}, then the *norm* of a vector a_{i} is the real number a*_{i}a_{i}. Vectors for which a scalar product is defined form an *inner product space* (and, with the norm, a normed vector space), with considerably more interesting properties than a general vector space. If the a_{i} are real, as they are with the normal 3-dimensional vectors commonly met with in physics, then the complex conjugation is superfluous.

Ordinary vector analysis is just a shorthand notation for what is represented more exactly by the index notation. That is, **a** = a_{i}, i = 1,2,3. We write **a** + **b** and λ**a** to stand for the equivalent index expressions. The *length* of **a** is the square root of its norm, and can be written |**a**| or simply a, like any scalar. The *direction* of **a** is specified by the *unit vector* **e** = **a**/a. We may define unit vectors **i**, **j** and **k** along the x, y and z coordinate axes of a right-handed (a rotation from +x to +y advances a right-handed screw in the +z direction) coordinate system, and write any vector as a linear combination **a** = x**i** + y**j** + z**k**. The components of a unit vector are the *direction cosines* of the direction represented, and the sum of their squares, the length of the unit vector, is unity. The value of this notation lies entirely in its brevity; otherwise, it is rather inconvenient to work with, and fails completely when there are two or more indices. Nevertheless, a great deal of algebra can be done with only a few rules.

The scalar product is represented as **a**·**b** = ab cos θ, where a and b are the lengths of the vectors and θ is the angle between their positive directions. We are quite familiar with the expression in terms of indices. The *vector product* is a peculiarity of three dimensions. It is really an antisymmetric quantity with two indices, a *pseudovector*, and can be written **a** X **b** = c_{i} = ε_{ijk}a_{j}b_{k}, in terms of the antisymmetric epsilon quantity we defined above. The expression in terms of the components is like a determinant, and can be evaluated as one. The vector product is normal to the plane of the two vectors, and its magnitude is ±ab sin θ. The scalar triple product V = (**a** X **b**)·**c** is easily written as ε_{ijk}a_{i}b_{j}c_{k}, the value of a determinant with the three vectors as rows. This happens to be the volume of the parallelepiped with the given vectors as sides. The formula for the contraction (scalar product, or summation over one index) of two epsilons for three dimensions is ε_{ijk}ε_{irs} = δ_{jr}δ_{ks} - δ_{js}δ_{kr}. This formula can be used to work out the meaning of the vector triple product (**a** X **b**) X **c**. Indeed, this product is ε_{ris}ε_{ijk}a_{j}b_{k}c_{s} = -ε_{irs}ε_{ijk}a_{j}b_{k}c_{s}. Expressing the contraction of the epsilons by the delta functions, the product is -(δ_{rj}δ_{sk} - δ_{rk}δ_{sj})a_{j}b_{k}c_{s} = a_{s}b_{r}c_{s} - a_{r}b_{s}c_{s}, or, in vector notation, **b**(**a**·**c**) - **a**(**b**·**c**). Other vector identities can be worked out in the same fashion. Note that exchanging any two of the indices of an epsilon changes its sign, but a cyclic permutation does not. Be careful of the order of indices. Of course, if you work in the index notation from the first, you do not need any of these identities, which are not nearly as important as they may seem.
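
These identities are easy to check by brute force. The sketch below (plain Python, with illustrative vectors of my own choosing) defines the epsilon and delta quantities, verifies the contraction formula over all index values, and confirms the triple-product identity:

```python
def eps(i, j, k):
    """Levi-Civita symbol for indices 0..2: +1 for a cyclic order,
    -1 for an anticyclic order, 0 when any index repeats."""
    return {(0, 1, 2): 1, (1, 2, 0): 1, (2, 0, 1): 1,
            (0, 2, 1): -1, (2, 1, 0): -1, (1, 0, 2): -1}.get((i, j, k), 0)

def delta(i, j):
    return 1 if i == j else 0

# contraction formula: eps_ijk eps_irs = d_jr d_ks - d_js d_kr
for j in range(3):
    for k in range(3):
        for r in range(3):
            for s in range(3):
                lhs = sum(eps(i, j, k) * eps(i, r, s) for i in range(3))
                assert lhs == delta(j, r) * delta(k, s) - delta(j, s) * delta(k, r)

def cross(a, b):
    # c_i = eps_ijk a_j b_k
    return [sum(eps(i, j, k) * a[j] * b[k]
                for j in range(3) for k in range(3)) for i in range(3)]

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

# vector triple product: (a x b) x c = b(a.c) - a(b.c)
a, b, c = [1.0, 2.0, -1.0], [0.5, -3.0, 2.0], [4.0, 1.0, 1.0]
lhs = cross(cross(a, b), c)
rhs = [b[i] * dot(a, c) - a[i] * dot(b, c) for i in range(3)]
```

This is the index notation made executable: once eps and delta are in hand, no memorized vector identities are needed.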

A method of expressing quantities with two indices, which is easily done in the index notation as a_{ij}, in the vector notation in terms of objects like **a****b** or **i****j**, where two vectors stand together but are not multiplied, was once proposed. In this way, two directions could be associated with a single quantity. These objects were called *dyadics*. The element a_{23} in the index notation would be expressed as a_{23}**j****k** in dyadics, and all terms had to be explicitly written. This was a very cumbersome notation, with no advantages, and has been totally abandoned in favor of indices.

The square and cube of a variable a were written "a square" and "a cube" at first, as in geometry. Then they might be abbreviated as "as" and "ac," and finally written, logically, as "aa" and "aaa." This last notation could be extended beyond the possibilities of three-dimensional space, to give "aaaa" and "aaaaa" and so on. Descartes (or someone he overheard) invented the concise index notation, so that aa = a^{2} and aaa = a^{3}, where the superscript integer was the *exponent* of a. We immediately see that a^{n}a^{m} = a^{m+n}, and a^{n}/a^{m} = a^{n-m}. We can extend the exponent to negative numbers in this way, finding that a^{-n} = 1/a^{n}, and also to zero, since if m = n, then a^{0} = 1. Also, (a^{n})^{m} = a^{nm}, so the idea of exponent can even be extended to fractional and rational exponents, since (a^{1/n})^{n} = a^{1} = a implies that a^{1/n} is the n^{th} root of a. We note that (ab)^{n} = a^{n}b^{n} and (a/b)^{n} = a^{n}/b^{n}. All these relations constitute the "laws of exponents" of elementary algebra and are easily proved for integral exponents.

The arithmetical process of finding powers is called *involution*, and the process of "extracting" roots is called *evolution*. These terms are not much used at present. Involution and evolution are easily carried out with logarithms or a pocket calculator. The process for finding square roots numerically is explained in Square Roots by Hand.

From the fact that rational exponents can be given meaning in this way, it will come as no surprise that the concept of exponent can be extended to real numbers. The exponential function y = e^{x} can help us do this, since it is defined for arbitrary real, and indeed complex, x. The basic property e^{u}e^{v} = e^{u+v} can be proved by actually multiplying the defining series for each and simplifying, using the Binomial theorem. Strictly, however, this lies outside algebra since infinite series are involved. Nevertheless, we may work with exponential functions in algebra to good effect.

The *logarithm* is the inverse of the exponential: x = log_{b} y if y = b^{x}. The number b is the *base* of the logarithm, and can be any positive real number except 1. Since the exponential function y = e^{x} has so many useful properties (it is its own derivative and integral, for example), logarithms to the base e = 2.718281828... are very important, and may be denoted for brevity as ln x instead of log_{e} x. They are called natural or Napierian. Though Napier invented logarithms, they were not natural logarithms (exponents had not yet been invented). The other base commonly used is 10, and the logarithms are called, not surprisingly, common logarithms. They are also known as Briggsian, from their inventor. Logarithms to base 10 are specially suited to calculations with decimal numbers. Logarithms to the base 2 are used in connection with binary numbers. Taking the natural logarithm of the definition, ln y = x ln b, or x = log_{b} y = ln y / ln b. The factor (1/ln b) converts natural logarithms to logarithms to the base b. In particular, ln 10 = 2.302585093 and 1/ln 10 = 0.4342944819, which are useful for converting between natural and common logarithms. The properties log 1 = 0, log (1/a) = -log a, log_{b} b = 1, log 0 = -∞ can be deduced directly from the definition.
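
A short numerical check of the change-of-base rule, using Python's math library; the constants quoted above come out to the digits given:

```python
import math

# natural log of 10 and its reciprocal, as quoted in the text
assert abs(math.log(10) - 2.302585093) < 1e-9
assert abs(1 / math.log(10) - 0.4342944819) < 1e-9

def log_base(y, b):
    """log_b y = ln y / ln b: the factor 1/ln b converts natural
    logarithms to logarithms to the base b."""
    return math.log(y) / math.log(b)

# common and binary logarithms as special cases
assert abs(log_base(1000, 10) - 3) < 1e-12
assert abs(log_base(8, 2) - 3) < 1e-12
```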

The word *logarithm* comes from the Greek for "calculating-number,"
very appropriate since they were invented to facilitate multiplication, division, raising to powers and finding roots with the new Arabic numbers, which are in general very tedious and error-prone procedures. From the properties of exponents, it is easy to see that log (ab) = log a + log b, log (a/b) = log a - log b, log (a^{n}) = n log a, and log (a^{1/n}) = (1/n) log a. Logarithms were of very great use with trigonometry, and are generally studied in the trigonometry course at an elementary level. Logarithmic scales are the basis for the slide rule.

Plotting graphs on forms with logarithmic scales should be noted here. There are accurate printed forms with scales graduated proportional to logarithms, so that the logarithms do not have to be looked up. If you do not have such forms with logarithmic scales, the L scale of a slide rule is a quick way to get logarithms with sufficient accuracy for plotting. The variation y = ax^{b} becomes log y = log a + b log x on log-log paper, a straight line. The variation y = ae^{bx} becomes ln y = ln a + bx, a straight line on semilog paper, which has a linear scale for x and a logarithmic scale for y. The applications of log plots are very widespread, from determining power laws to studying radioactive disintegration.

If a power law y = ax^{n} + b is definitely known, then the constants a and b can be determined by plotting y against x^{n}. The plot will be a straight line with slope a and intercept b. The same principle can be used in other cases. Note that the log-log plot determines n as well as a, and b is assumed to be zero.
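
Determining a power law from a log-log line can be sketched numerically. The data below are synthetic, generated from assumed values a = 2.5 and b = 1.5; a least-squares line through (log x, log y) then recovers the slope b and the intercept log a:

```python
import math

# synthetic data from an assumed power law y = a x^b, a = 2.5, b = 1.5
a_true, b_true = 2.5, 1.5
xs = [1.0, 2.0, 4.0, 8.0, 16.0]
ys = [a_true * x**b_true for x in xs]

# on log-log axes the law becomes the line log y = log a + b log x,
# so an ordinary straight-line fit in the logs recovers b and log a
lx = [math.log(x) for x in xs]
ly = [math.log(y) for y in ys]
n = len(xs)
mx, my = sum(lx) / n, sum(ly) / n
b_fit = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) \
        / sum((u - mx)**2 for u in lx)
a_fit = math.exp(my - b_fit * mx)
```

With real (noisy) data the fitted slope and intercept are only estimates, but the principle is the same as reading them off graph paper.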

The *ratio* of two quantities is, in algebra, the quotient of their numerical values, written a/b or a:b, where a is called the *antecedent*, and b the *consequent*. The symbol : means the same as /, and often has a fraction bar drawn through it. In Latin, *ratio* means "reason," and I do not know how it came to be applied to the quotient of similar magnitudes. The quantities should have the same units, so that the ratio is a pure, or dimensionless, number. Otherwise, it is just the quotient of the numerical values, perhaps a *rate*, but not a ratio. We can increase or decrease numbers in a certain ratio by multiplying by the ratio. Ratio played a very important role in pre-algebraic mathematics.

A *proportion* is an equality between ratios, as a/b = c/d, also written a:b::c:d. The name is appropriate, since if a,b and c,d are the widths and lengths of two buildings, then the buildings have proportionate shapes. Geometry established rules dealing with proportions. For example, if a/b = c/d, then (a + b)/b = (c + d)/d. This was proved geometrically, though we see with algebra that it is just the result of adding unity to each side of the proportion, which does not change it. In a proportion, a and d are called the *extremes*, and b and c the *means*. Then, we can say the product of the extremes is equal to the product of the means. Using algebra, of course, this is obvious. Also, d was called the *fourth proportional* to a.

The *Rule of Three* long played an important role in the arithmetic of proportions. This rule is: To find an unknown mean, divide the product of the extremes by the known mean; To find an unknown extreme, divide the product of the means by the known extreme. Of course, this is an easy consequence of ad = bc, since then b = ad/c and a = bc/d.

If a/b = c/d, then certain other relations between quantities exist, which were given Latin names that are now rarely used. That a/c = b/d was *alternando*, b/a = d/c *invertendo*, (a + b)/b = (c + d)/d *componendo*, (a - b)/b = (c - d)/d *dividendo*, and (a + b)/(a - b) = (c + d)/(c - d) was *componendo et dividendo*. These equalities are rather obvious using algebra, but in older work after establishing a/b = c/d, you simply said: "and dividendo, (a - b)/b = (c - d)/d" without doing any algebra. Also, if a/b was a ratio, then a^{2}/b^{2} was the *duplicate ratio* and a^{3}/b^{3} the *triplicate ratio*. √(a/b) was the *subduplicate ratio*. For example, the gravitational force is in inverse duplicate ratio of the distances. The product of the ratios (a/b)(c/d) was called the *compound* ratio of a/b and c/d. A *continued proportion* was a/b = c/d = e/f = ... . Knowing these terms may help in understanding some older references.

If a/b = b/c, then b was called the *mean proportional* between a and c, and c was the *third proportional* to a. Also, b was the *geometric mean* between a and c. In algebra, we see that this is simply b^{2} = ac, so that finding a mean proportional is finding the side of a square that is of the same area as a given rectangle. Geometrically, we would lay a and c end to end, bisect the total distance and draw a circle with this radius, then erect a perpendicular to the circle from the point common to a and c. The length of this perpendicular would then be the mean proportional, since it is an altitude of a right triangle and makes two proportional right triangles, from which the result follows. [An angle in a semicircle is a right angle.] Algebraically, of course, we simply take the square root of ac.

Suppose we divide a line of length a into two parts, x and a - x. If, then, a/x = x/(a - x), x is a mean proportional between a and a - x. This is also called dividing a line in *extreme and mean* ratios (which are equal). Algebraically, we see that a(a - x) = x^{2}, or x^{2} + ax - a^{2} = 0. The root of this quadratic equation is x = [-a ±√(a^{2} + 4a^{2})]/2, or (x/a) = [√5 - 1]/2 = 0.618034, discarding the negative root. This is called the *golden section*, supposed to have esthetic beauty. x is also the side of a regular decagon inscribed in a circle of radius a. There is an easy geometric construction for the golden ratio, shown in the figure at the right. The reader will find it easy to prove that x has the length stated above. The golden ratio is used in the construction of a regular pentagon, and plays a role also in the geometry of the dodecahedron and icosahedron. If the vertices of a regular pentagon are connected to form a pentagonal star, then these lines are divided in the golden ratio.

Finite sequences, called *progressions*, are also considered part of algebra. Consider the *arithmetic progression* a, a + d, a + 2d, ..., a + (n - 1)d, where each term is larger than the last by the constant difference d. The sum of these terms is (n/2)(first term + last term) or na + n(n - 1)d/2. The second term in this expression is just the sum of the series d[1 + 2 + ... + (n - 1)]. In the progression a, ar, ar^{2}, ..., ar^{n-1}, the values increase (or decrease) by the same factor r. The sum of this *geometric progression* is a(1 - r^{n})/(1 - r). This follows from the factorization 1 - r^{n} = (1 - r)(1 + r + r^{2} + ... + r^{n-1}). We will give below a method of summing the series directly. If |r| < 1, then r^{n} becomes smaller and smaller as n increases. As n approaches infinity, the sum approaches Σ(0,∞)ar^{k} = a/(1 - r), so the infinite geometric series *converges*.
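
Both closed forms are easy to confirm against brute-force sums; the values below are illustrative:

```python
def arithmetic_sum(a, d, n):
    # (n/2)(first term + last term)
    return n * (a + (a + (n - 1) * d)) / 2

def geometric_sum(a, r, n):
    # a(1 - r^n)/(1 - r)
    return a * (1 - r**n) / (1 - r)

# check both closed forms against term-by-term addition
assert arithmetic_sum(3, 4, 10) == sum(3 + 4 * k for k in range(10))
assert abs(geometric_sum(2, 0.5, 8) - sum(2 * 0.5**k for k in range(8))) < 1e-12

# for |r| < 1 the partial sums approach a/(1 - r)
assert abs(geometric_sum(2, 0.5, 60) - 2 / (1 - 0.5)) < 1e-9
```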

Financial calculations may involve progressions, and are usually included in intermediate algebra. The basic idea is that of the time value of money. An amount P received now, a present value, will be paid back one time interval later, conventionally a year, as a future value F = P(1 + i), where i is the *interest rate* per period (normally expressed as a percentage). After n periods, the future value F = (1 + ni)P. This is called *simple interest*. If the interest is *compounded* for n periods, then the total future amount is F = P(1 + i)^{n}. If the interest is compounded m times in one period, the interest rate for each subperiod is i/m, where i is called the *nominal interest rate* per period, usually one year. Then for one full period F = P(1 + i/m)^{m}, and for n periods F = P(1 + i/m)^{mn}. The *effective interest rate* is (1 + i/m)^{m} - 1. For example, 6% compounded semiannually is an effective rate of 6.09%. Using the effective rate allows easy comparison between rates.
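
These formulas are one-liners in Python; the sketch below checks the 6% semiannual example, and previews the $500 loan of the next paragraph:

```python
def future_value(P, i, n, m=1):
    # F = P(1 + i/m)^(m n): nominal rate i per period, compounded m times
    return P * (1 + i / m)**(m * n)

def effective_rate(i, m):
    # effective rate per period: (1 + i/m)^m - 1
    return (1 + i / m)**m - 1

# 6% compounded semiannually is an effective rate of 6.09%
assert abs(effective_rate(0.06, 2) - 0.0609) < 1e-12

# $500 at 3% compounded semiannually, after 3 months (n = 1/4 year)
f = future_value(500, 0.03, 0.25, 2)   # about 503.74
assert abs(f - 503.74) < 0.01
```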

The same formulas can be used for fractional periods. For example, if $500 is borrowed at 3% compounded semiannually, the amount after 3 months will be $500(1 + 0.03/2)^{1/2} = $503.74. This differs very little from $500(1 + 0.03/4) = $503.75, which is 3% compounded quarterly.

Calculus shows that if an amount of money F grows continually at a rate proportional to F, then dF/dt = rF, where r is the nominal interest rate. Integrating, F = Pe^{rt}. In one time period, P increases by a factor e^{r}, which is the effective interest rate plus 1. For example, if r = 0.05 per annum, e^{r} - 1 = 0.0513. This *continuous compounding* is not usual in finance. Monthly compounding has about the same effect as continuous compounding, and avoids exponentials.

Suppose an amount P is borrowed at t = 0, and we wish to pay it back in n equal installments A, which is an *annuity*. This is one example of a *sinking fund* for the repayment of a debt; our problem is finding the payment A. At the end of the first interval, the amount remaining is P(1 + i) - A, since a payment of A is made at that time and we have had to pay interest on the principal. At the end of the second interval, we owe [P(1 + i) - A](1 + i) - A, and so on, until the n^{th} interval, when the debt is extinguished. Multiplying this all out, we have at this time P(1 + i)^{n} - A[(1 + i)^{n - 1} + (1 + i)^{n - 2} + ... + (1 + i) + 1] = 0. The trick of summing this series is to multiply it by (1 + i) and subtract the original series from the result. All the terms in the sum cancel except the first and the last, so P[(1 + i)^{n + 1} - (1 + i)^{n}] = A[(1 + i)^{n} - 1]. The same trick is used in summing the geometric progression. Therefore, A = iP(1 + i)^{n}/[(1 + i)^{n} - 1]. The factor multiplying P is called the *capital recovery factor*. Since F = P(1 + i)^{n}, we also have A = iF/[(1 + i)^{n} - 1] in terms of the future value F of P. Regular deposits of the amount A will result in an accumulation of F in n periods. We have assumed that the first payment occurs at the end of the first period, and the last payment at the end of the n^{th} period. It is easy to make adjustments for other arrangements when necessary by working from first principles. The HP-48 makes calculations easy, so that tables are no longer required.

As an example, suppose we borrow $10,000 at 5%, and wish to pay it off in 10 equal installments. If the interest rate were 0%, then we see that A = P/10, as expected (expand (1 + i)^{n} by the binomial theorem and keep only 1 + ni). At 5%, the capital recovery factor is 0.1295, so the payments will be $1295 per year, a total of $12,950.
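
A sketch of the calculation, which also replays the amortization period by period to confirm that the debt is extinguished:

```python
def capital_recovery_factor(i, n):
    # A/P = i(1 + i)^n / ((1 + i)^n - 1)
    g = (1 + i)**n
    return i * g / (g - 1)

# the example above: $10,000 at 5% repaid in 10 equal installments
crf = capital_recovery_factor(0.05, 10)   # about 0.1295
payment = 10000 * crf                     # about $1295 per year

# replay the amortization: each year interest accrues, then A is paid
balance = 10000.0
for _ in range(10):
    balance = balance * 1.05 - payment
# balance is now zero (to rounding): the debt is extinguished
```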

The average, or arithmetic mean, of n quantities a_{k} is the sum of the quantities divided by n. The geometric mean of the same n quantities is the n^{th} root of their product. The harmonic mean is the reciprocal of the average value of the reciprocals of the quantities. If the quantities are all equal, then these three means are the same, and equal to the quantity. If they are not the same, the means are, in general, different. The arithmetic mean is greater than or equal to the geometric mean, which is greater than or equal to the harmonic mean (the equalities holding only if the quantities are equal).

As an illustration, consider the problem of finding the average speed during a journey, which is the total distance divided by the total time. If we divide the journey into n equal time intervals, with an average speed of v_{i} in each interval, then it is easy to show that the average speed for the entire journey is v_{av} = (1/n)Σ v_{i}, so that the average speed is the arithmetic mean. On the other hand, if the journey is divided into n equal space intervals (which is often more convenient), then 1/v_{av} = (1/n)Σ(1/v_{i}). That is, the average speed is the harmonic mean. The reader may derive formulas for the average speed when the intervals of time or space are not equal. These will be *weighted* means in which the intervals have different weights in the summation.
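
The three means, and the two-leg journey, can be sketched directly; the 30 and 60 mph legs below are a hypothetical example:

```python
import math

def arithmetic_mean(v):
    return sum(v) / len(v)

def geometric_mean(v):
    return math.prod(v) ** (1 / len(v))

def harmonic_mean(v):
    return len(v) / sum(1 / x for x in v)

# hypothetical journey: two legs at 30 and 60 mph
speeds = [30.0, 60.0]
am = arithmetic_mean(speeds)   # 45: the average speed for equal TIMES
gm = geometric_mean(speeds)    # about 42.4
hm = harmonic_mean(speeds)     # 40: the average speed for equal DISTANCES

# AM >= GM >= HM, with equality only when all the quantities coincide
assert am >= gm >= hm
assert abs(hm - 40.0) < 1e-9
```

Note that equal-distance legs give 40 mph, not 45: the slow leg takes longer and so carries more weight.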

In statistics, a mean is called a *measure of location*. For its proper appreciation, a mean should be accompanied by a *measure of dispersion* as well, such as the average square of the deviations from the mean. For detailed information on such statistical averages, refer to statistics texts.

Factorial n, written n!, is the product of the integers from 1 to n: n! = (1)(2)(3)...(n-1)n. For example, 4! = 4 x 3 x 2 x 1 = 24. When it appears in formulas, 0! = 1 is the normal meaning. In older American and British work, factorial n was denoted by ⌞n, the n enclosed by an angle mark, but the continental notation n! is now universal. The factorial function can be extended to an arbitrary complex variable as the Gamma function Γ(x), where Γ(n) = (n - 1)!, from the property xΓ(x) = Γ(x + 1). The gamma function, since it is usually expressed as an integral, is beyond the scope of algebra.

n! increases very rapidly with n, roughly as n^{n}. For large n, n! is asymptotically approximated by Stirling's formula, n! = (n/e)^{n}√(2πn)[1 + 1/(12n) + 1/(288n^{2}) + ...]. This is normally used to express the logarithm of n!, so that ln n! = (n + 1/2)ln n - n + ln √(2π). When used for really large numbers (as in statistical mechanics, where even the logarithms are really large numbers), everything can be neglected except ln n! = n ln n - n.
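
A quick numerical look at how good the approximation is, through the 1/(288n²) term:

```python
import math

def stirling(n):
    """Stirling's asymptotic series for n!, through the 1/(288 n^2) term."""
    correction = 1 + 1 / (12 * n) + 1 / (288 * n**2)
    return (n / math.e)**n * math.sqrt(2 * math.pi * n) * correction

# the relative error is already small for modest n and shrinks as n grows
for n in (5, 10, 20):
    exact = math.factorial(n)
    assert abs(stirling(n) - exact) / exact < 1e-4

# for very large n, ln n! is dominated by n ln n - n
n = 10**6
lngamma = math.lgamma(n + 1)          # exact ln n!
assert abs((n * math.log(n) - n) / lngamma - 1) < 1e-1
```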

If we have a bag of n different objects, we can pull them out and set them in a row in many ways. The first object can be any of n, the second any of the n - 1 remaining, and so on, so the number of distinct arrangements in a row is n(n - 1)(n - 2)...2 1 = n!. This is the number of *permutations* of n distinct objects, P_{n}. If some of the objects are identical, and cannot be distinguished, there are fewer ways. If there are n_{k} objects of type k, and Σn_{k} = n, then the number of distinct arrangements is n!/Π(1,k)n_{j}!. Π(1,k) means the product for j = 1 to k. If we make arrangements of only p objects out of n distinct objects, then the number of arrangements is n(n - 1)(n - 2)...(n - p + 1) = n!/(n - p)!. If the objects are not distinct, the problem is much more difficult and will not be worked out here. Finally, if we want to know how many sets of p objects can be selected from a bag of n distinct objects, in any order, we must divide by p!, since in all possible selections the p objects selected will be in any order. The result is C(n,p) = n!/p!(n - p)!, usually called the combinations of n things taken p at a time. These countings have many uses in probability theory, and often appear there. Probability theory is another of the contributions of Pierre de Fermat, in addition to analytic geometry, the theory of maxima and minima, and much number theory. Fermat's celebrated Last Theorem is that x^{n} + y^{n} = z^{n} has no integer solutions for x,y,z when n > 2. There was not room on the margin of the page where he recorded it for the proof; a generally accepted proof was finally given by Andrew Wiles in the 1990s.
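
All of these counts can be verified by exhaustive enumeration for small n; the word "BANANA" below is an illustrative choice for the repeated-objects case:

```python
import math
from itertools import combinations, permutations

# P_n = n!: arrangements of n distinct objects in a row
assert len(list(permutations("abcd"))) == math.factorial(4)

# arrangements of p out of n distinct objects: n!/(n - p)!
n, p = 5, 3
assert len(list(permutations(range(n), p))) == math.factorial(n) // math.factorial(n - p)

# combinations C(n,p) = n!/(p!(n - p)!): order ignored
assert len(list(combinations(range(n), p))) == math.comb(n, p)

# with repeated objects, divide by the factorials of the repeat counts
word = "BANANA"   # 3 A's, 2 N's, 1 B
distinct = len(set(permutations(word)))
formula = math.factorial(len(word))
for letter in set(word):
    formula //= math.factorial(word.count(letter))
assert distinct == formula   # both give 6!/(3! 2! 1!) = 60
```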

Newton's Binomial Theorem, a remarkable result with many useful applications, dates from 1665 or 1666. It is easily seen to be valid for integral powers, when it is a finite series, but is also valid for arbitrary powers, when it does not terminate. The formula for integral powers is (a + b)^{n} = Σ(0,n)C(n,k)a^{n-k}b^{k}. The coefficients C(n,k) = n!/k!(n-k)! are the binomial coefficients. Setting a = b = 1, we find that Σ(0,n)C(n,k) = 2^{n}. The coefficients may be written in the symmetrical Pascal's triangle where each coefficient is the sum of the two on either side in the row above.

For fractional exponents, the terms in the resulting infinite series are written out formally, from 1, n, n(n - 1)/2!, n(n - 1)(n - 2)/3!, and so on. For example, (1 + x)^{1/2} = 1 + (1/2)x - (1/8)x^{2} + (1/16)x^{3} - ... , or (1 + x)^{-1/2} = 1 - (1/2)x + (3/8)x^{2} - (5/16)x^{3} + ... . These two series are particularly useful in applications, converging for |x| < 1.
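
The generalized coefficients can be built up by the recurrence C(n,k+1) = C(n,k)(n - k)/(k + 1), which works for integral and fractional n alike. A sketch, checking the two series above numerically at x = 0.1:

```python
import math

def binomial_series(n, x, terms):
    """Partial sum of (1 + x)^n using the generalized binomial
    coefficients n(n-1)...(n-k+1)/k!; converges for |x| < 1."""
    total, coeff = 0.0, 1.0
    for k in range(terms):
        total += coeff * x**k
        coeff *= (n - k) / (k + 1)   # next generalized coefficient
    return total

# integral exponents terminate: the series reproduces (1 + 1)^5 = 2^5
assert binomial_series(5, 1.0, 6) == 2**5

# the two fractional series quoted above, checked at x = 0.1
x = 0.1
assert abs(binomial_series(0.5, x, 40) - math.sqrt(1 + x)) < 1e-12
assert abs(binomial_series(-0.5, x, 40) - 1 / math.sqrt(1 + x)) < 1e-12
```

For integral n the coefficient (n - k)/(k + 1) eventually hits zero and the series terminates; for fractional n it never does, which is why convergence requires |x| < 1.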

The binomial theorem with integral exponents was usually the final topic in a school algebra course. The binomial theorem can be proved by Mathematical Induction for integral values of n.

If we can show that if an assertion is assumed true for the n^{th} member of a sequence, then it is true for the (n + 1)^{st} member, and in addition it is true for the first member of the sequence, then it is true for all n. For example, is the sum of the first n integers, 1 + 2 + 3 + ... + n = n(n + 1)/2? If this is true, then the sum of the first n + 1 integers is n(n + 1)/2 + (n + 1) = (n + 1)(1 + n/2) = (n + 1)(n + 2)/2, the same formula with (n + 1) substituted for n. Also, 1 = (1)(2)/2 = 1, and this also is given by the formula for n = 1. Therefore, the assertion is proved, and the sum of the first n integers is n(n + 1)/2. This method of proof is called *mathematical induction*, not a particularly good name. It is also called *complete induction*, since all cases are considered, in contrast to ordinary inductive reasoning.

The reader may find it interesting to prove that Σ(1,n) k^{3} = [n(n + 1)/2]^{2} (sum of the cubes of numbers from 1 to n) by the method of mathematical induction.
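
A brute-force check of both closed forms for the first hundred n is no substitute for the induction proof, but it is reassuring:

```python
# check the closed forms numerically; the induction argument then
# guarantees them for every n
for n in range(1, 100):
    assert sum(range(1, n + 1)) == n * (n + 1) // 2
    assert sum(k**3 for k in range(1, n + 1)) == (n * (n + 1) // 2) ** 2
```

Note the pleasant corollary: the sum of the first n cubes is the square of the sum of the first n integers.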

A brief familiarization with complex numbers should certainly go somewhere in early mathematical training. If it is not presented in algebra, it must be mentioned in trigonometry, and is essential for calculus. Complex numbers can be treated by introducing the imaginary unit i = √(-1) as another element of algebra, with a complex number written as z = x + iy. The *conjugate* z* = x - iy, and zz* = z*z = x^{2} + y^{2} = |z|^{2}. Complex numbers are number pairs that obey the algebraic rules for real numbers, if the operations are properly defined. Simple algebra gives us (x + iy)(u + iv) = (xu - yv) + i(xv + yu), the rule for multiplication, and also the rule for the division of complex numbers. Of course, these numbers arose from investigations of the solution of the quadratic equation, to explain absurd things like √(-5), which turns out to be just i√(5).

Complex numbers can be expressed as vectors in the complex plane, and the polar representation can be introduced through x = |z| cos θ and y = |z| sin θ, or x + iy = |z|(cos θ + i sin θ). Polar to rectangular, and rectangular to polar, conversion is a useful tool. In the HP-48, it is simply a matter of choosing one or the other form of display. By considering the multiplication of two complex numbers, we can see that wz = |w||z|[cos (θ + φ) + i sin (θ + φ)], where z = |z|∠θ and w = |w|∠φ, which certainly suggests Euler's formula e^{iθ} = cos θ + i sin θ, and the writing of a complex number as z = |z|e^{iθ}. All this involves a little trigonometry, of course. Trigonometry itself is a fertile field for algebraic manipulation.
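
Python's built-in complex type and cmath library make these facts easy to check; z = 3 + 4i and a second factor w are illustrative choices:

```python
import cmath
import math

z = 3 + 4j
# conjugate and modulus: z z* = x^2 + y^2 = |z|^2
assert (z * z.conjugate()).real == abs(z)**2 == 25

# polar representation: z = |z|(cos t + i sin t) = |z| e^(i t)
r, t = cmath.polar(z)
assert abs(r * cmath.exp(1j * t) - z) < 1e-12

# multiplying complex numbers multiplies moduli and adds angles
w = cmath.rect(2.0, math.pi / 6)   # |w| = 2, angle 30 degrees
prod = w * z
assert abs(abs(prod) - 2 * 5) < 1e-12
assert abs(cmath.phase(prod) - (cmath.phase(w) + cmath.phase(z))) < 1e-12
```

(The last assertion needs care in general: phases add modulo 2π, and here the sum happens to stay below π.)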

Sets of four numbers may also be made to behave like real numbers, and so are hypercomplex numbers. For more information, see Hamilton's Quaternions.

Analytic geometry is the solution of geometric problems by the methods of algebra; it is really just algebraic geometry. The fundamental idea is the representation of a point in the plane or space by coordinates, usually rectangular coordinates x,y or x,y,z. This idea has already been used in several connections above. It is easy to extend analytic geometry to any number of dimensions, though visualization is still difficult. The quadratic plane curves known as Ellipses, Parabolas and Hyperbolas are discussed in their own articles. It is much easier to study these curves with algebra than geometrically as conic sections. In three dimensions, quadratic forms can be classified as ellipsoidal, paraboloidal and hyperboloidal, as we discussed above.

In the plane, we may define a line by its endpoints P(x_{1},y_{1}) and Q(x_{2},y_{2}). The length of the line is d = √[(x_{2} - x_{1})^{2} + (y_{2} - y_{1})^{2}], and its slope m = (y_{2} - y_{1})/(x_{2} - x_{1}) = tan θ, where θ is the angle the line makes with the x-axis. The angle between two lines is φ = tan^{-1}[(m_{1} - m_{2})/(1 + m_{1}m_{2})], from the formula for the tangent of the difference of two angles. If two lines are perpendicular, m_{1} = -1/m_{2}, from the vanishing of the denominator in this case.

There are several ways to represent a line by an equation. If the slope, and one point P(x_{1},y_{1}) are known, then (y - y_{1})/(x - x_{1}) = m represents the line. This reduces to y = mx + b, where b is the y-intercept of the line (the value of y when x = 0). Another standard equation is Ax + By + C = 0. The preceding line has A = m, B = -1 and C = b. Any multiple of A,B,C describes the same line. Two lines Ax + By + C = 0 and A'x + B'y + C' = 0 are parallel or coincident if the determinant |A B, A' B'| = 0, intersecting if the determinant is nonzero. If two points are known on the line, then (y - y_{1})/(y_{2} - y_{1}) = (x - x_{1})/(x_{2} - x_{1}). If the x- and y-intercepts a and b are known, then (y - b)/(x - 0) = (0 - b)/(a - 0), or x/a + y/b = 1.

If **e** is a unit vector from the origin perpendicular to the line, and p is the perpendicular distance from the origin to the line, then the equation of the line is **r**·**e** = x cos θ + y sin θ = p, where cos θ and sin θ are the components of the unit vector. This is very easy to see if a diagram is drawn. In terms of the standard equation, p = -C/D, cos θ = A/D and sin θ = B/D, where D = ±√(A^{2} + B^{2}). The sign is to be taken opposite to the sign of C if C ≠ 0, and if C = 0, the same as that of B. The perpendicular distance from the line to the point (x_{1},y_{1}) is d = x_{1} cos θ + y_{1} sin θ - p, which again is easy to show.
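
The normal form and the point-line distance can be sketched in a few lines; the line 3x + 4y - 10 = 0 is an illustrative choice with a unit normal (3/5, 4/5):

```python
import math

def normal_form(A, B, C):
    """Convert Ax + By + C = 0 to x cos t + y sin t = p, choosing the sign
    of D = ±sqrt(A^2 + B^2) opposite to C (or matching B when C = 0)."""
    D = math.hypot(A, B)
    if C > 0 or (C == 0 and B < 0):
        D = -D
    return A / D, B / D, -C / D   # cos t, sin t, p

def point_distance(A, B, C, x1, y1):
    # signed distance d = x1 cos t + y1 sin t - p
    c, s, p = normal_form(A, B, C)
    return x1 * c + y1 * s - p

# the line 3x + 4y - 10 = 0: unit normal (3/5, 4/5), p = 2 from the origin
c, s, p = normal_form(3, 4, -10)
assert (c, s, p) == (0.6, 0.8, 2.0)
# distance from (3, 4) to the line: (9 + 16 - 10)/5 = 3
assert abs(point_distance(3, 4, -10, 3, 4) - 3.0) < 1e-12
```

The distance is signed: points on the opposite side of the line from the origin come out positive, points on the origin's side negative.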

In three dimensions, the equation of a line joining two given points is (y - y_{1})/(y_{2} - y_{1}) = (x - x_{1})/(x_{2} - x_{1}) = (z - z_{1})/(z_{2} - z_{1}). However, the equation x/a + y/b + z/c = 1 now describes a plane, not a line, with intercepts a,b,c. A vector equation is **r** X **e** = **n**p, where **e** is a unit vector in the direction of the line, and **n** is a unit normal to the plane containing the line and the origin.

The ratio of division r of a line PQ divided internally at a point S is PS/SQ. When r = 1, S is the midpoint of PQ. r varies from 0 at P to ∞ at Q. If S is on PQ extended, it is said to divide PQ externally. r then varies from -∞ at Q to -1 at infinity. Ratios like (x - x_{1})/(x_{2} - x_{1}) are equal to r/(1 + r).

Polar plots are often useful and entertaining. They can be made on prepared forms, like log plots, or can almost as easily be made on ordinary paper with a protractor and scale. The functions plotted are of the form r = f(θ). In polar plots, the radius r may be negative, and the angle may be multiple-valued, multiples of 2π giving the same information. The function r = 2a cos θ is a circle passing through the origin with centre on the θ = 0 axis at radius a. The angle between the tangent to a curve and the radius vector is given by tan ψ = r(dθ/dr), and the element of area swept out with an increment Δθ by the radius vector is (r^{2}/2)dθ, which the reader can easily verify with simple diagrams. For the first, let a point P' approach the point P on the curve; the secant joining P' with P is the tangent in the limit, and there is a little right triangle there with sides rdθ and dr and an angle ψ in the limit.

As an interesting example of a polar plot, we shall consider the *trisectrix of Maclaurin*. This plot solves the famous problem of trisecting an angle by using a curve whose points can be constructed with compass and straightedge. Since the intersection necessary for trisection cannot be constructed, and must be estimated by eye, purists do not consider this a trisection with compass and straightedge alone, but it is close. Drawing the curve accurately is a challenge to my graphics tools, so I encourage the reader to make an accurate drawing and prove that it works. Draw a circle of radius 2a as shown in the figure, and the line x = a as well. To plot a point P on the curve, draw any line OQ from the origin to the circle, cutting the line x = a at point S. Now, using dividers, make OP equal to SQ. P' at (3a,0) and P" at (0,0) are points on the curve. It is only necessary to draw the upper loop from P' to P", but the curve can be continued beyond P" to approach the asymptote x = -a. To use the curve, place the angle to be trisected φ at the centre of the circle. It will intersect the curve at some point P, and the angle θ at O will be φ/3. You have to have the curve already drawn before you trisect an angle, but an angle can be tripled straightforwardly. A parametric equation of the curve is x = a(4 cos^{2}θ - 1), y = x tan θ, or x = a(3 - t^{2})/(1 + t^{2}), y = tx, with t = tan θ, or x^{3} - 3ax^{2} + xy^{2} + ay^{2} = 0. The last equation can be solved easily for y, so the curve can be plotted by the HP-48. It should not be difficult to prove algebraically that this curve actually does trisect angles, since θ = tan^{-1}(y/x) and φ = tan^{-1}[y/(x - 2a)].
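
The trisection property, and the Cartesian equation, are easy to check numerically from the parametric form (scale a = 1; the sample angles are arbitrary):

```python
import math

def trisectrix_point(theta, a=1.0):
    # x = a(3 - t^2)/(1 + t^2), y = t x, with t = tan(theta)
    t = math.tan(theta)
    x = a * (3 - t * t) / (1 + t * t)
    return x, t * x

for theta in (0.1, 0.3, 0.5, 0.9):
    x, y = trisectrix_point(theta)
    # the point satisfies x^3 - 3ax^2 + xy^2 + ay^2 = 0
    assert abs(x**3 - 3 * x**2 + x * y * y + y * y) < 1e-9
    # the angle phi subtended at the circle's centre (2a, 0) is 3*theta
    phi = math.atan2(y, x - 2.0)
    assert abs(phi - 3 * theta) < 1e-9
```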

The trisection of any angle, the area of a circle of any radius (determination of π), and the doubling of the cube (finding 2^{1/3}) were classic problems in ancient mathematics, and all were solved at that time by geometry, if not by "compass and straightedge." Of course, their practical solution was always easy, as it is now. Archimedes found the useful everyday approximation to π, 22/7. Archimedes did the hard part of squaring the circle by proving that its area is equal to that of a right-angled triangle in which one leg was equal to the radius, the other to the circumference, of the circle. In the proportion 1/x = x/y = y/2, x = 2^{1/3}, so doubling the cube was a matter of finding two mean proportionals. This is an extension of the fact that a mean proportional between 1 and x is √x, so there is an easy geometrical way to find square roots by drawing a semicircle with diameter 1 + x, and erecting a perpendicular at the point of meeting. Angles were trisected first by the *quadratrix* of Hippias of Elis (420 BCE), which was criticized as assuming the result, but later by the conchoid of Nicomedes, which avoided this criticism. Maclaurin's solution was not known in ancient mathematics.

It is often convenient to tabulate values of a function if the function cannot be evaluated rapidly when required, as in astronomical tables. Before pocket calculators, tables of trigonometric functions and logarithms were very commonly used. It is very inefficient to tabulate a function so finely that all values can be taken directly from the table. Instead, for values of the variable between those tabulated, the functional value is *interpolated*, by assuming how the function varies between tabulated values. *Inverse interpolation* is finding the value of t when f(t) is given. It should be clear that interpolation cannot yield more than the basic accuracy of the table.

Let us first assume the function f(t) is tabulated for t at equally-spaced values differing by h. Such a tabulation is shown in the figure, where the function is actually sin(t), and t is in degrees. In this table, h = 1°. The columns labelled δ, δ^{2} and δ^{3} are the first, second and third *differences*. This scheme can be extended indefinitely. The superscript here is not an exponent! The differences are identified by subscripts, as shown. Note that the differences decrease rapidly as the order increases, which is characteristic of a *smooth* function. Differencing is one way to detect errors in tabulated values of a continuous function.
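The differencing scheme is simple to reproduce. A minimal Python sketch, using the text's example f(t) = sin t tabulated at 1° steps (the variable names are mine):

```python
import math

# Forward-difference table for f(t) = sin t, tabulated every h = 1 degree,
# as in the table described in the text.
ts = [28, 29, 30, 31, 32]
f = [math.sin(math.radians(t)) for t in ts]

d1 = [f[i+1] - f[i] for i in range(len(f)-1)]      # first differences
d2 = [d1[i+1] - d1[i] for i in range(len(d1)-1)]   # second differences
d3 = [d2[i+1] - d2[i] for i in range(len(d2)-1)]   # third differences

# For a smooth function the differences shrink rapidly with order:
assert max(map(abs, d2)) < max(map(abs, d1))
assert max(map(abs, d3)) < max(map(abs, d2))
```

Printing the columns side by side reproduces the table; an error in a single tabulated value would show up as a conspicuous ripple in the higher differences.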

In such a table, intermediate values may be estimated from Bessel's interpolation formula, which is f(p) = f(0) + pδ_{1/2} + [p(p-1)/4](δ_{0}^{2} + δ_{1}^{2}) + ... , where p = (t - t_{0})/h. The first two terms give the familiar *linear interpolation*, where the function is assumed to vary linearly between the tabulated values. Most trigonometric tables were tabulated finely enough that their full accuracy could be realized with linear interpolation. The first differences varied so slowly that *tables of proportional parts*, which are nothing more than p times the difference between tabulated values, eased the computation.

Let us evaluate f(29.5) from the table, using Bessel's formula. Linear interpolation gives 0.4848096 + (0.5)(0.0151904) = 0.4924048. The precise result is 0.4924236. Linear interpolation gives an approximately correct result, but only to four digits, not to the full 7 digits of the table. The next term, including second order differences, adds (-0.0625000)(-0.0002999) = 0.0000187, so that second-order interpolation gives 0.4924235, differing only by 1 in the 7^{th} digit, close to the full accuracy of the table. A table of trigonometric functions tabulated at 1° intervals (which is not a very big table), together with second-order interpolation, gives results as precise as a much larger table with linear interpolation. The reader can determine if a 7-place table to 0.1° would give 7-place accuracy with linear interpolation. I feel that tabulation to 0.01° would be necessary for this, a table 100 times bigger. Interpolation can be used to calculate intermediate values for refining a table.
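This worked example can be verified in a few lines of Python. The sketch below (helper name sind and the variable names are mine) carries Bessel's formula through second differences and confirms that linear interpolation is good to about four places, second-order to about seven:

```python
import math

def sind(t):
    """Sine with the argument in degrees."""
    return math.sin(math.radians(t))

# Bessel's formula through second differences, step h, with p = (t - t0)/h:
#   f(p) ~ f(0) + p*d_half + [p(p-1)/4]*(d2_0 + d2_1)
t0, h, t = 29.0, 1.0, 29.5
f_m1, f0, f1, f2 = sind(t0-h), sind(t0), sind(t0+h), sind(t0+2*h)

p = (t - t0) / h
d_half = f1 - f0               # central first difference between f0 and f1
d2_0 = f1 - 2*f0 + f_m1        # second difference at t0
d2_1 = f2 - 2*f1 + f0          # second difference at t0 + h

linear = f0 + p*d_half
bessel = linear + (p*(p-1)/4) * (d2_0 + d2_1)

assert abs(linear - sind(t)) > 1e-5   # linear: only ~4-digit accuracy
assert abs(bessel - sind(t)) < 2e-7   # second order: ~7-digit accuracy
```

At p = 1/2 the third-difference term of Bessel's formula vanishes, which is why the midpoint result is so close to the true value.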

For inverse interpolation, p can be estimated by linear interpolation to be p = [f(t) - f(0)]/δ_{1/2}. Then, this value can be refined by iterating p' = [f(t) - f(0)]/δ_{1/2} - [p(p - 1)/4](δ_{0}^{2} + δ_{1}^{2})/δ_{1/2} until it converges.
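As a sketch of this iteration in Python (variable names mine; note that the fixed first term is the linear estimate [f(t) − f(0)]/δ_{1/2}, so the iteration converges to the value of p that satisfies Bessel's formula):

```python
import math

def sind(t):
    return math.sin(math.radians(t))

# Inverse interpolation: given y = f(t), recover p = (t - t0)/h.
t0, h = 29.0, 1.0
f_m1, f0, f1, f2 = sind(28), sind(29), sind(30), sind(31)
d_half = f1 - f0
d2_0 = f1 - 2*f0 + f_m1
d2_1 = f2 - 2*f1 + f0

y = sind(29.5)              # pretend y is given; we should recover p = 0.5
p = (y - f0) / d_half       # first estimate by linear interpolation
for _ in range(10):         # refine: correct the linear estimate by the
    p = (y - f0)/d_half - (p*(p-1)/4)*(d2_0 + d2_1)/d_half  # 2nd-order term

assert abs(p - 0.5) < 1e-4
```

The correction term is only about a thousandth here, so two or three iterations already settle the value.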

If the tabular values are not equally spaced, Lagrange's interpolation formula may be used. In this formula, interpolation through n points, by a polynomial of degree n - 1, is given by f(x) = Σ(i: 1,n)L_{i}(x)f(x_{i}), where the coefficients L_{i} are the products Π(j: 1,n, j ≠ i)(x - x_{j})/(x_{i} - x_{j}). This may look complicated, but it is easy to write out the coefficients after the pattern is perceived. For n = 3, f(x) = [(x - x_{2})(x - x_{3})/(x_{1} - x_{2})(x_{1} - x_{3})]f(x_{1}) + [(x - x_{1})(x - x_{3})/(x_{2} - x_{1})(x_{2} - x_{3})]f(x_{2}) + [(x - x_{1})(x - x_{2})/(x_{3} - x_{1})(x_{3} - x_{2})]f(x_{3}). The sum of the coefficients of the function values is unity, of course (consider a constant function!). Of course, this formula works for equally-spaced values as well.

Lagrange's interpolation formula for n = 3, applied to finding sin 29.5°, gives f(29.5) = 0.3750 f(29) + 0.7500 f(30) - 0.1250 f(31) = 0.49242, where 5-place values were used for the tabulated values. In Lagrange's method, a polynomial of degree n - 1 is fitted to n consecutive points. A formula using four consecutive points would be roughly equivalent to Besselian interpolation using second differences.
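The formula translates directly into a short function. A Python sketch (the function name lagrange and the variable names are mine) that reproduces the three-point example above:

```python
import math

def sind(t):
    return math.sin(math.radians(t))

def lagrange(xs, ys, x):
    """Evaluate the Lagrange polynomial through points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        L = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                L *= (x - xj) / (xi - xj)   # the coefficient L_i(x)
        total += L * yi
    return total

# The text's example: sin 29.5 from 5-place values of sin 29, 30, 31 degrees
xs = [29.0, 30.0, 31.0]
ys = [round(sind(x), 5) for x in xs]
val = lagrange(xs, ys, 29.5)

assert abs(val - 0.49242) < 5e-6
# The coefficients sum to 1: a constant function is reproduced exactly.
assert abs(lagrange(xs, [1.0, 1.0, 1.0], 29.5) - 1.0) < 1e-12
```

Nothing in the function requires the xs to be equally spaced, which is the whole point of Lagrange's form.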

Higher-order interpolation formulas, such as Lagrange's, are very good with accurate function values. If there is random error in the tabulated values, however, such methods may fail badly as the interpolation function tries to follow the errors. There are *spline* interpolation formulas that mimic the action of an elastic spline and do not try to go exactly through points that may contain errors. These are much better with approximate tables. In general, only linear interpolation can be trusted in these cases unless elaborate means are used to handle the errors.

If there is experimental or other random error in the function values, they may be represented by a simple curve that minimizes the total error. If the simple curve g(x) that approximates f(x) is chosen so that Σ[f(x) - g(x)]^{2} is minimized, then g(x) is a *least-squares* approximation to f(x), and can be used to predict its values just like interpolation in a table. For a linear g(x) = mx + b, the result is called *linear regression*, which is studied in mathematical statistics, and is beyond the scope of the present article, though well worth a look.
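For the straight-line case the minimization has a closed form, the so-called normal equations. A minimal Python sketch (the data points here are invented for illustration, roughly y = 2x with small errors):

```python
# Least-squares straight line g(x) = m*x + b through noisy data, from the
# normal equations for linear regression.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 2.1, 3.9, 6.2, 7.9]   # roughly y = 2x, with random errors

n = len(xs)
sx, sy = sum(xs), sum(ys)
sxx = sum(x*x for x in xs)
sxy = sum(x*y for x, y in zip(xs, ys))

m = (n*sxy - sx*sy) / (n*sxx - sx*sx)   # slope minimizing the squared error
b = (sy - m*sx) / n                     # intercept: the line passes through
                                        # the mean point (sx/n, sy/n)
assert abs(m - 2.0) < 0.1
assert abs(b) < 0.2
```

The fitted line does not pass through any of the data points exactly, which is precisely what protects it from the random errors that derail high-order interpolation.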

*Heuristic*, or *ars inveniendi*, is the art of finding solutions to problems. The name comes from the Greek for "find," related to the shout "Eureka!" which should really be "Heurika!" There is no general procedure or method for this purpose, only a recognition of the mental processes that are found to be used with advantage. Polya's book is a good introduction, but only the solving of many problems can give real insight. Heuristic is as important in algebra as in geometry.

Four procedures that can often be applied with profit are: generalization, specialization, variation and analogy. Indeed, the very principle of using symbols to represent general quantities in algebra may make a problem easier to solve. Algebraic results may be tested by restricting them to cases where the outcome is simple and known. Variation of a problem may be a restatement of the problem in different terms, or working it out by different means. Finally, analogy is applying a related problem.

Two general methods of proof are *synthesis* and *analysis*, or "putting together" and "taking apart." Analysis now means something rather different, in phrases like "chemical analysis." In our context, analysis was starting with the answer and working backwards to the conditions of the problem. This approach is very often successful when it is unclear how to proceed from the statement of the problem. When this has been done, and the problem understood and solved, one can then work forward logically from the beginning by *synthesis*. Euclid's *Elements* are a perfect example of logical synthesis, and were intended to be a model of it, not a manual on how to solve problems. In a synthesis, *one* path from the statement of the problem to its proof is sufficient, while in solving a problem we consider alternatives at every step.

Methods of proof include not only direct proof, but indirect proof as well, in which all possible alternatives are proved false. Then, what remains must be
true. *Reductio ad absurdum* is a popular method of indirect proof. One starts with an assumption contrary to what is to be proved, and then shows that this assumption leads to an absurdity.

The answer to one problem may provide answers to related problems with very little additional work. Such results are called *corollaries*. The usual American pronunciation of this word is curious, since the word comes from *corolla*, not from *Larry*. A diagram may be a great help in algebraic work to clarify the conditions of a problem and to suggest solutions, just as in geometry. Unlike in geometry, a diagram is not required, however. The main error committed in mathematical reasoning is the failure to examine major premises and to change them if necessary.

A glance at an elementary algebra book will show that it is largely devoted to the solution of "word problems" in which relations between unknown quantities are formulated and solved. This has always been an important aim of elementary algebra, even though such problems are seldom met with in everyday life, and are really in the nature of puzzles. However, one should not disdain these exercises, since they are a vivid way to teach algebraic manipulation, much more interesting than exercises in the bare operations of algebra. They teach algebraic *reasoning*, not just applying formulas and rules.

Here are just a few algebra word problems for the entertainment of the reader.

- A man in a balloon observes the angle of depression of an object on the ground, bearing south, to be 35° 30'; the balloon drifts 2.5 miles east at the same height, when the angle of depression of the same object is 23° 14'. Find the height of the balloon. (Ans. 7096 ft) [Wentworth and Smith]
- A 100-ft flagpole stands 10 ft from a 10-ft pole. The flagpole breaks and falls over without parting, touching both the top of the 10-ft pole and the ground. At what height did the flagpole break? (Ans. 49.21 ft) [Meserve]
- From a rectangular piece of metal 28" x 16" it is desired to cut two equal circular pieces of the largest possible diameter. Assuming no loss of material in cutting, what will be the diameter of these pieces? (Ans. 14.067") [Rosenbach and Whitman]
- I have a cistern with three unequal cocks containing 60 pipes of water (a pipe is roughly 3 barrels, about 0.5 m^{3}). If the greatest cock be opened, the cistern will empty in one hour; at the second it will empty in two hours, and at the third in three hours. In what time will it empty if all the cocks be opened? (Ans. 6/11 hour) [Rosenbach and Whitman, from 1659/1680]
- In pursuing a tortoise, Achilles in the first minute reduces the distance between them to half what it was; in the next half-minute reduces the remaining distance to half of itself; in the next quarter-minute reduces the still remaining distance by half, and so on. How long does it take him to overtake the tortoise? (Ans. 2 minutes) [Rosenbach and Whitman]
- If, in the preceding problem, Achilles halves the distance in the first minute, then halves the remaining distance in the second minute, and so on, how long will it take him to overtake the tortoise? (Ans. he never will overtake the tortoise.) [Rosenbach and Whitman]
- To meet the cost of replacing in eight years a machine costing $1200, equal payments are made at the ends of the years. If these payments earn 3% per annum compounded annually, find the amount of the payments. (Ans. $134.95) [Rosenbach and Whitman]
- I have 32 U.S. coins in my pocket--pennies, nickels, dimes and quarters--with a total value of $2.61. They weigh a total of 111.9 g, and placed side by side cover a distance of 62.9 cm. How many of each denomination do I have? [the weights and diameters are: 2.6 g, 19 mm; 5.0 g, 21 mm; 2.2 g, 17 mm; and 5.7 g, 24 mm] (Ans. 11 pennies, 7 nickels, 9 dimes and 5 quarters).
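The last problem is a neat example of four linear equations in four unknowns, one equation each for count, value, weight, and total diameter. A Python sketch solving it by Gaussian elimination (the helper function solve is mine; the coefficients are the data given in the problem):

```python
# The coin problem as a 4x4 linear system in pennies p, nickels n,
# dimes d, quarters q, solved by Gauss-Jordan elimination with
# partial pivoting.
def solve(A, b):
    """Solve the square linear system A x = b (lists of lists/values)."""
    size = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for col in range(size):
        piv = max(range(col, size), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]            # partial pivoting
        for r in range(size):
            if r != col:                            # clear the column
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor*c for a, c in zip(M[r], M[col])]
    return [M[i][size] / M[i][i] for i in range(size)]

A = [[1,    1,    1,    1],     # number of coins
     [1,    5,   10,   25],     # value in cents
     [2.6,  5.0,  2.2,  5.7],   # weight in grams
     [19,  21,   17,   24]]     # diameter in mm
b = [32, 261, 111.9, 629]

p, n, d, q = solve(A, b)
assert [round(v) for v in (p, n, d, q)] == [11, 7, 9, 5]
```

Checking the stated answer against all four equations (32 coins, 261 cents, 111.9 g, 629 mm) confirms it is consistent.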

Any algebra textbook for high school or college use will make a good reference, with many exercises. There are many such books, most of them uninspired. If you are learning algebra, there is nothing like exercises for gaining confidence in your powers and understanding. Also, consider internet resources for learning algebra. BBC Learning has, appallingly, almost nothing about algebra (or any other mathematics). This is quite typical of the United States and Britain, where mathematics has nearly vanished amongst the populace. I have not made a search for elementary algebra resources, which the reader may find an interesting task.

J. B. Rosenbach and E. A. Whitman, *College Algebra*, 3rd ed. (New York: Ginn & Co., 1949). A classic college algebra with, of course, many exercises. Shows the scope of "intermediate algebra." Neither calculus nor iterative methods are included, and linear algebra is restricted to the solution of simultaneous linear equations.

Bronshtein and Semendyayev, *A Guide Book To Mathematics* (Zürich: Harri Deutsch, 1973). This excellent and useful little yellow book was once available. I referred to it constantly while preparing this article, and used some of its examples. If it is still in print, I recommend obtaining a copy.

______________, *HP 48G Series User's Guide* (Corvallis, OR: Hewlett-Packard Company, 1994). Part Number 00048-90126. A pocket calculator of this quality is very useful for algebra, as is mentioned several places in the text. Of course, it can plot things like polynomials and do many other useful things. This is the best pocket calculator on the market, and is well worth its price.

A. C. Aitken, *Determinants and Matrices*, 9th ed. (Edinburgh: Oliver and Boyd, 1958). There is an especially lucid treatment of permutations.

B. E. Meserve, *Fundamental Concepts of Algebra* (Cambridge, MA: Addison-Wesley, 1953).

B. Greenleaf, *The National Arithmetic* (Boston: R. S. Davis & Co., 1862) and W. J. Milne *Standard Arithmetic* (New York: American Book Co., 1895). Examples of American 19th-century "higher arithmetic" texts that included much that was later part of intermediate algebra, if it dealt with computation rather than theory. An exercise from Milne: "How many cents are there in 3 two-cent pieces?"

Sir T. Heath, *A History of Greek Mathematics*, Vol. I (New York: Dover, 1981). Chapter VII treats the three classical problems of geometry.

G. Polya, *How To Solve It*, 2nd ed. (Garden City, NY: Doubleday-Anchor, 1957).

Composed by J. B. Calvert

Created 19 January 2004

Last revised 4 February 2004