## Normal Correlation

A good application of the plotting facility of the HP-48

### Introduction

The theory of correlation between two variates, linear regression and curve fitting form an important part of mathematical statistics. The article on Linear Regression shows how correlation can be based on least-squares curve fitting, and extended to obtain the correlation coefficient r. This theory does not depend on the distribution of the variates, only on the data sample itself. To assess the significance of the correlation coefficient, some assumption about the distribution must be made, and it is usually assumed that the variates are described by the bivariate normal distribution, which applies approximately in a great number of cases by the nature of the normal distribution.

Variates that are described by the bivariate normal distribution are said to be normally correlated. The "normal" is not the opposite of "abnormal," but refers to the distribution. In this article, I will use the bivariate normal distribution to study correlation. This, in fact, is the way the theory began, and many of its terms and definitions originated this way. Normally correlated variates have certain properties in addition to those arising from correlation. For example, variates that are independently distributed have zero correlation, but zero correlation does not, in general, mean that the variates are independent. However, if two normally correlated variates are uncorrelated (a curious statement, but you know what I mean) then it follows that they are independently distributed.

### The Bivariate Normal Distribution

The bivariate normal distribution is an obvious extension of the familiar univariate normal distribution. To review, the variate x is said to be normally distributed with mean M and variance V if the probability that the variate lies in the interval x,x+dx is f(x)dx, where f(x) = [1/√(2πV)]exp[−(x − M)²/2V]. This function is normalized so that ∫(−∞,+∞)f(x)dx = 1. Definite integrals of powers of x times this function can be found in Dwight, 860.11-860.17. The standard deviation is S = √V. The variance is the integral of (x − M)²f(x) from −∞ to +∞, and measures the "spread" of the distribution, while M measures its "location."
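These definitions are easy to check numerically. Here is a small Python sketch (the function name normal_pdf and the crude Riemann sum are my own choices, purely for illustration):

```python
import math

def normal_pdf(x, M=0.0, V=1.0):
    """Normal density with mean M and variance V: exp[-(x-M)^2/2V]/sqrt(2*pi*V)."""
    return math.exp(-(x - M) ** 2 / (2 * V)) / math.sqrt(2 * math.pi * V)

# A crude Riemann sum over [-8, 8] should reproduce the normalization integral,
# since the tails beyond +/-8 standard deviations are negligible.
dx = 0.001
total = sum(normal_pdf(-8 + i * dx) * dx for i in range(int(16 / dx)))
```

The sum comes out very close to 1, confirming the normalization.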

Instead of continuous variates, we can also speak in terms of a large sample of measurements of size N. Then, Nf(x) is the number of measurements that fall in x,x+dx, or the frequency. A frequency distribution is fully analogous to a probability distribution, and we should be ready to work with one or the other.

Our algebra will be considerably simplified by making the means of all variates equal to zero. This can always be done by adding a constant, and will not restrict us in any way. The definite integral ∫x^p f(x)dx is called the pth moment about the origin, m_p. The variance, for example, is m_2. All odd moments about the mean are zero by symmetry, since f(−x) = f(x).

A bivariate distribution f(x,y) gives the probability that x lies in x,x+dx while y simultaneously lies in y,y+dy. The moments m_pq are ∫∫x^p y^q f(x,y)dxdy. Moments with p + q odd are zero. We will use mainly m_20, m_02 and m_11. The first two are the variances Vx and Vy, while the third is the covariance P. These quantities are easily found from a sample of size N of variates with nonzero means X, Y by the familiar formulas Vx = (1/N)Σx² − X², analogously for y, and P = (1/N)Σxy − XY. These are estimates of population parameters. Vx and Vy are slightly better estimators if N is replaced by N − 1, but we will use N consistently.
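For readers who would rather check these sample formulas on a computer than a calculator, a minimal Python sketch (all names are mine; N, not N − 1, in the denominators, as in the text):

```python
def sample_moments(xs, ys):
    """Vx, Vy and covariance P by the N-denominator formulas of the text."""
    N = len(xs)
    X = sum(xs) / N            # mean of x
    Y = sum(ys) / N            # mean of y
    Vx = sum(x * x for x in xs) / N - X * X
    Vy = sum(y * y for y in ys) / N - Y * Y
    P = sum(x * y for x, y in zip(xs, ys)) / N - X * Y
    return Vx, Vy, P

# For perfectly correlated data (y exactly 2x), P^2 = Vx*Vy.
Vx, Vy, P = sample_moments([1, 2, 3, 4], [2, 4, 6, 8])
```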

The bivariate normal distribution is f(x,y) = [1/2πSxSy√(1 − r²)] exp{−[1/2(1 − r²)][(x/Sx)² − 2rxy/SxSy + (y/Sy)²]}. The variates are x,y with standard deviations Sx and Sy, and each variate enters divided by its standard deviation. r is the correlation coefficient, −1 < r < 1. It is good to form a visual appreciation of this function, but it is difficult to draw as a .GIF file for use in this article. Sketches may be an aid, or the use of a plotting calculator such as the HP-48. Yule and Kendall give a good picture. The surface is peaked at the origin, and falls off symmetrically along two orthogonal directions. We'll look at some special cases in a minute.
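For numerical experiments away from the calculator, the density is only a few lines of Python (the function name is mine; unlike the HP-48 expression used later, this one keeps the normalizing factor in front):

```python
import math

def bivariate_normal_pdf(x, y, sx=1.0, sy=1.0, r=0.0):
    """Bivariate normal density, zero means, std devs sx and sy, correlation r."""
    q = (x / sx) ** 2 - 2 * r * x * y / (sx * sy) + (y / sy) ** 2
    norm = 2 * math.pi * sx * sy * math.sqrt(1 - r * r)
    return math.exp(-q / (2 * (1 - r * r))) / norm
```

At r = 0 it factors into a product of two univariate normal densities, and for any r it is symmetric under (x,y) → (−x,−y), as the formula shows.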

The moments of normal distributions can be found easily by differentiation under the integral sign in the normalization integral. Each differentiation brings down one power of the variate (since the exponent is quadratic) and leaves the distribution, the exponential, unchanged. To find m_11 = <xy> for the bivariate normal distribution, differentiate once with respect to x and then once with respect to y. The polynomial in front of the exponential is [(1 + r²)xy/Sx²Sy² − rx²/Sx³Sy − ry²/SxSy³]/(1 − r²) + r/SxSy. Taking averages (just replace any term by its average; it's no work, just letting the integral act) and noting that the result must equal zero gives [<xy>(1 + r²)/SxSy − 2r]/(1 − r²) = −r. From this, we find that r = <xy>/SxSy. This gives the familiar formula for calculating the coefficient of correlation of data points, r = Σxy/(√Σx² √Σy²). Remember that x and y are deviations from the mean in this formula.
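The final formula translates directly into code. A Python sketch (function name mine; the data are first reduced to deviations from their means, as the formula requires):

```python
import math

def corr(xs, ys):
    """Correlation coefficient r = Σxy / (√Σx² √Σy²), on mean-deviations."""
    N = len(xs)
    mx = sum(xs) / N
    my = sum(ys) / N
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / (math.sqrt(sum(a * a for a in dx)) *
                  math.sqrt(sum(b * b for b in dy)))

r_perfect = corr([1, 2, 3], [2, 4, 6])   # exact linear relation gives r = 1
```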

If r = 0, f(x,y) = g(x)h(y), where g(x) and h(y) are normal distributions. The variates x and y are then distributed independently, each with its own variance. The exponent is proportional to (x/Sx)² + (y/Sy)², so the probability is constant along contour lines (x/Sx)² + (y/Sy)² = C², which are ellipses with major axis 2CSx along the x-axis, and minor axis 2CSy along the y-axis. Each cross section parallel to an axis is a normal probability curve with variance Sx² or Sy². The covariance P is zero, of course, by symmetry.

If |r| → 1, the factor 1/(1 − r²) becomes 1/2δ, where δ = 1 − |r|. The other factor in the exponent approaches [(x/Sx)² ∓ 2xy/SxSy + (y/Sy)²], with the sign following r → ±1. Let's suppose that the relation between y and x in the limit is y = cx. The line perpendicular to this line is y = −x/c. Take new variables, x′ along the line y = cx, and y′ along y = −x/c. This is a rotation of variables by an angle θ = tan⁻¹c. Along the new y′ axis, x = −y′ sinθ and y = y′ cosθ. Putting these variables in the exponent, we find y′²(1/4δ)[sin²θ/Sx² + 2 sinθ cosθ/SxSy + cos²θ/Sy²] = y′²/2S², where the standard deviation S is proportional to √δ. Therefore, as |r| → 1, the distribution shrinks down to the line y = cx, so that in the limit, it becomes this line. This is the case of perfect correlation.

By making a coordinate rotation to eliminate the cross term in xy, we find contours of probability that are ellipses. The symmetry axes of the ellipses are at an angle θ = (1/2) tan⁻¹[2rSxSy/(Sx² − Sy²)] with the coordinate axes. If the variances are equal, we see that the angle is 45°.

Now suppose that Sx = Sy = S. The distribution is then f(x,y) = [1/2πS²√(1 − r²)] exp{−[1/2S²(1 − r²)][x² − 2rxy + y²]}. The contours of equal probability are not circles, but ellipses with major and minor axes at 45° to x and y. Let x = (x′ − y′)/√2 and y = (x′ + y′)/√2, so that we take new axes x′,y′ along the symmetry axes. When this is substituted in the exponent, we find x′²/2(1 + r)S² + y′²/2(1 − r)S². The new variates are independent, and have standard deviations S√(1 + r) and S√(1 − r). If r = 0, the variates are independent and normal with standard deviation S. If they are perfectly correlated, then either the sum x + y or the difference y − x is sharp, while the standard deviation of the other is 2S.
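The new standard deviations S√(1 + r) and S√(1 − r) can be confirmed by plain covariance algebra, without any sampling. A short Python sketch (names mine) for x′ = (x + y)/√2 and y′ = (y − x)/√2, when both variances are S² and the covariance is rS²:

```python
import math

def rotated_variances(S, r):
    """Variances of x' = (x+y)/sqrt(2) and y' = (y-x)/sqrt(2), by bilinearity
    of covariance, for Var(x) = Var(y) = S^2 and Cov(x,y) = r*S^2."""
    V = S * S
    P = r * V
    var_xp = (V + V + 2 * P) / 2     # Var((x+y)/sqrt(2)) = S^2 (1 + r)
    var_yp = (V + V - 2 * P) / 2     # Var((y-x)/sqrt(2)) = S^2 (1 - r)
    return var_xp, var_yp

vxp, vyp = rotated_variances(2.0, 0.5)
```

At r = 1 the y′ variance vanishes (the difference is sharp), and the standard deviation of the sum x + y = √2·x′ works out to 2S, as stated.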

### Linear Regression

Now consider the distribution of y for a fixed value of x. In the exponent of the distribution, we complete the square on y to find [y − (rSy/Sx)x]²/[2Sy²(1 − r²)] + x²/2Sx². The second term is a constant for fixed x, so the quantity (y − bx) is distributed normally about 0 with variance Sy²(1 − r²), which is smaller the greater |r| is. This also shows that |r| cannot exceed 1, and |r| = 1 corresponds to a perfect fit, y = bx, where b = rSy/Sx. The line y = bx is the line of regression of y on x.

The same can be done with the distribution of x with y fixed. The line of regression is now x = b′y with b′ = rSx/Sy, and variance Sx²(1 − r²). Therefore, bb′ = r².
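Both slopes, and the relation bb′ = r², are quickly verified from data, since b = rSy/Sx = P/Vx and b′ = rSx/Sy = P/Vy. A Python sketch (helper names mine):

```python
def regression_slopes(xs, ys):
    """Slopes b (y on x) and b' (x on y), computed as b = P/Vx, b' = P/Vy,
    using deviations from the means throughout."""
    N = len(xs)
    mx = sum(xs) / N
    my = sum(ys) / N
    Vx = sum((x - mx) ** 2 for x in xs) / N
    Vy = sum((y - my) ** 2 for y in ys) / N
    P = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / N
    return P / Vx, P / Vy

b, bp = regression_slopes([0, 1, 2, 3], [1, 2, 2, 4])
r2 = b * bp   # equals the squared correlation coefficient P^2/(Vx*Vy)
```

For perfectly linear data the product bb′ comes out exactly 1, matching |r| = 1.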

These results are exactly the same as we found by a least-squares fit of a straight line to the data, which shows that the two theories are equivalent in this respect. The bivariate normal distribution made it easier to derive the results, and gave a model for the distribution of the variates. In many cases, normal correlation is to be expected. When we perform measurements, we expect the results to be normally distributed about the mean value, and with the same variance at different points, which is characteristic of normal correlation. We often do not think of the independent variable as a variate, but for the purposes of curve-fitting, it does not matter if it is a random sample or not.

The lines of regression are conjugate diameters of the elliptical probability contours. A diameter of an ellipse is a line bisecting parallel chords, and in this case the chords are vertical and horizontal. This makes it easy to draw the lines of regression if you have a plot of the contours. For circular contours, this reduces to the vertical and horizontal diameters of the circles, and for ellipses lying along the axes, to the major and minor axes, which are, of course, diameters. To draw a diameter of an ellipse, draw two parallel chords, join their mid-points, and extend this line as needed.

### Evaluating and Plotting the Bivariate Distribution

The HP-48 can help greatly in evaluating the bivariate normal distribution numerically, and in visualizing how it appears. The use of the HP-48 is explained in several other pages, so here I'll assume that the reader is moderately familiar with the calculator and has one before him. I'll presume that the reader is not familiar with the plotting feature, however, and will provide some details. I have considered the plotting feature as a relatively useless frill up to now, but here it can be a good visualization tool, and I am glad to have found an application for it.

The value of the bivariate normal distribution is a function of the variables X and Y, and of the parameters A (standard deviation of x), B (standard deviation of y) and the correlation coefficient R. Variables should be created for these quantities, and it is convenient to name them with the single capital letters shown. This is done by putting an initial value on the stack, then pressing ', α, and the letter, followed by pressing STO. Test to see that the values are returned when the menu keys are pressed.

Next, a function must be created that can be evaluated and used in plotting routines. This can be an equation of the form Z = f(X,Y), an expression f(X,Y), or a program in << >> delimiters. The most convenient for the present purpose seems to be an expression, which will appear between ' ' delimiters when called up. This is most easily created in the EquationWriter application, entered with ← EQUATION, and described in Chapter 7 of the User's Guide. It acts like most equation formatters, using an open rectangle as a cursor, and using the cursor movement keys. The → key usually leaves one item and goes on to the next. Parentheses are automatically closed when it is pressed a second time at the end of an item. Fractions are started with ↑, the numerator is entered, then ↓ takes you to the denominator. → ends entry of the denominator and another → completes the fraction. Powers are entered with the y^x key, the exponent is entered, and then → leaves the item. Enter any of the variables and parameters by pressing the corresponding menu keys. I did not use the normalizing factor in front of the exponential, so the maximum value of the expression will be 1. When you are finished, press ENTER to put the expression on the stack. Enter the name after ', and press STO. I used 'BN' for "bivariate normal."

Clear the stack and recall the expression by pressing the BN menu key. This will check that it has been properly stored. Store some reasonable values in X, Y, A, B and R. To evaluate the expression using these values, just press EVAL. Check that the number returned is correct. For later use, store A = B = 1 and R = 0.8 in the variables. The easy way to do this is to type in the new value on the command line, press the magenta ←, and then the menu key for the variable.

The plotting routines that are best to use here are Wireframe and Ps-Contour. Press → PLOT to get the PLOT form. With the cursor on TYPE:, CHOOS whichever of the two you want, and press OK. Let's begin with Wireframe. Moving the cursor to EQ:, CHOOS the function BN from the list presented, and press OK. The beginning of the expression will then appear in the field. Move the cursor to INDEP. If X is not there, press ', α and X, then press ENTER and X will appear. Now do the same for DEPND:, but see that it contains Y. The prompt DEPND seems to ask for a dependent variable, but that is not the case with these plots, which take two independent variables.

To the right of the variables are STEPS: fields. These are not actually steps, but the number of points in the X direction (horizontally, across the display) and the Y direction (vertically). I recommend 11 and 9 here, which will give 99 points in the sampling grid. Each point corresponds to a value of X and Y, for which the expression will be evaluated. The values corresponding to the point are not determined by the numbers entered here.

Now press OPTS. This form contains the actual ranges of the variables in the "plotting box" whose floor is the X,Y plane and whose height is Z. The projection of this box is what is plotted on the screen. The picture plane is vertical, parallel to X,Z, and 1 unit toward the eye from the X,Z side of the box. The eye, from which the projection is made, is at XE, YE and ZE. YE must be beyond the picture plane, so YE can be at most Y-NEAR minus 1. It is best to put it farther away than that. For a start, try XE = 0, YE = -8, ZE = 6. Set the x range to -3 to +3, the y range from -1.5 to +1.5. The plotting screen has an aspect ratio of 2:1. There is no need for the x and y ranges to correspond. Try to picture the relation of the plotting box and its coordinates, the picture plane, and the eye position, so that you can interpret the view correctly. Press OK to go back to the PLOT form.
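What the calculator evaluates at each grid point can be mimicked in a few lines of Python (a sketch under the settings above: A = B = 1, R = 0.8, an 11×9 grid on x from −3 to 3 and y from −1.5 to 1.5; like the stored expression BN, the normalizing factor is omitted, so the peak value is 1):

```python
import math

def bn(x, y, a=1.0, b=1.0, r=0.8):
    """Unnormalized bivariate normal: peak value 1 at the origin."""
    q = (x / a) ** 2 - 2 * r * x * y / (a * b) + (y / b) ** 2
    return math.exp(-q / (2 * (1 - r * r)))

xs = [-3 + i * 0.6 for i in range(11)]      # 11 points across, step 0.600
ys = [-1.5 + j * 0.375 for j in range(9)]   # 9 points, step 0.375
grid = [[bn(x, y) for x in xs] for y in ys]
peak = max(max(row) for row in grid)        # 1.0, at the origin
```

The 0.600 and 0.375 steps here are the same ones the TRACE cursor reports on the calculator.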

To leave this form and return to the stack, press CANCEL (this will not erase any parameters). To get back, press → PLOT. The two rightmost menu buttons are labelled ERASE and DRAW, and they do just that. It is a good habit to press ERASE before DRAWing a plot, since we don't want one plot on top of another here. Plotting does not use the stored variables X and Y, only the parameters R, A and B. If they are set as recommended above, after you press DRAW you will see the surface z = f(x,y) drawn as you watch. This shows the probability surface that is approaching a y = x line with correlation coefficient 0.8.

Note the + cursor. To make it move from point to point of the sampling grid, press TRACE. TRAC with a little square shows that you are in TRACE mode. Now press (X,Y) and the menu line will change to INPUT { -3.0000 1.5000 }, giving the coordinates of the point at minimum X and maximum Y, the far left-hand corner of the sampling grid. Every time you press a cursor movement key, the cursor will move 0.375 along Y or 0.600 along X, depending on the key pressed. If you press ENTER, the values of X and Y will be pushed on the stack. NXT brings you back to the menu, but does not cancel the mode. EDIT allows you to decorate the plot, which we won't do here. Return to the plot by pressing PICT. If you press MENU in EDIT, the menu disappears, leaving the unencumbered plot. CANCL takes you back to the PLOT form, and NXT CANCL returns you to the stack. Navigation can be a little confusing, but one gets used to it quickly. ← PICTURE will take you back to the plot so you can enjoy it.

Pressing ENTER in TRACE mode puts INPUT: { X-val Y-val } on the stack. PRG LIST OBJ→ puts the list on level 2 and INPUT: in level 1. Do a ← DROP and repeat the OBJ→ command. Now X will be on level 3, Y on level 2, and a 3 on level 1. Drop the stack again, and you are ready to store the values in Y and X. Then press the menu key for BN, and EVAL to find the corresponding value. There should be an easier way to do this, but this is straightforward.

Now let's look at the case of uncorrelated variates. While looking at the plot, press CANCL. This leaves PICTURE and returns you to the stack. If you get a form, just press CANCL again. You can use the one on the menu or on the keyboard. Key in 0, ENTER, then α, R, and STO. Now press → PLOT to go to the plot form, ERASE, and then DRAW. A nice, symmetrical hump appears that is typical of uncorrelated variates. Now you can have hours of fun by changing the parameters and seeing what comes out.

The Ps-Contour plot gives a better picture of the shape of the figure. Here you are looking directly downwards on the X-Y plane, and at each sample point a short line is plotted that is tangent to the contour through that point. The plot gives no information as to value, just direction. Press CANCL to get back to the PLOT form. Change TYPE: to Ps-Contour (move cursor, CHOOS, OK). Then, ERASE and DRAW as usual. A Ps-Contour plot takes a long time to draw, much longer than a Wireframe, and the short lines will appear one by one in order, one at each sample point. The shape of the probability surface is easy to appreciate in this kind of plot.

### References

C. E. Weatherburn, A First Course in Mathematical Statistics (Cambridge: Cambridge University Press, 1968). Chapter IV.

R. A. Fisher, Statistical Methods for Research Workers, 14th ed. (Edinburgh: Oliver and Boyd, 1970). Chapter VI.

G. U. Yule and M. G. Kendall, An Introduction to the Theory of Statistics, 14th ed. (London: Charles Griffin & Co., 1950). Chapter 10.

H. B. Dwight, Tables of Integrals and Other Mathematical Data, 4th ed. (New York: Macmillan, 1961). Regrettably, out of print, but perhaps substitutes can be found.