Three different kinds of Latin Square are presented, and then we explain Analysis of Variance and how it can help us grow beans.
A graffito from Roman times is shown at the right. This was a kind of "Kilroy Was Here" message. Until recently, only in the Graeco-Roman world was the general public able to read and appreciate graffiti. This one has been found from Hadrian's Wall in England to the desert wastes of Arabia. It says: "The sower Arepo holds carefully the wheels" when read in any direction. Pure nonsense, it was apparently designed to excite the superstitious. It is in Latin, and is a Square, but is not a Latin Square, at least not in the strict sense, because the rows and columns contain different letters, not simply the same letters rearranged. However, it may have provided the name Latin Square.
The table at the left contains the numbers 1, i, -1 and -i, where i is the imaginary unit (i×i = -1). It is a multiplication table, giving the result C of the product AB, where A is the first element in a row, and B the first element in a column. The first row and first column serve as labels as well as table members here, since it is unnecessary to repeat them for this purpose. This is also the "multiplication table" for any group whose four members correspond to 1, i, -1 and -i, though they may not be numbers. In fact, group members are usually transformations, and "multiplication" means performing two transformations in a row. A group is a set of elements that is closed under whatever "multiplication" is defined for them. This means that the result of any multiplication is again a member of the group. Furthermore, the group must contain the identity element (corresponding to 1) and the inverse Y to any element X, such that XY = YX = 1. Y is usually written X⁻¹. Quite importantly, the multiplication must be associative. That is, (AB)C = A(BC) for any three members A, B, C of the group. The requirements for the identity and the inverse mean that no element is repeated in any row or column, so that each row or column contains each element once and once only. This is the strict definition of a Latin Square.
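The table described above can be rebuilt in a few lines of Python, as a minimal sketch. The names `elements`, `labels`, `product_label` and `is_latin_square` are chosen here for illustration only. The code forms every product AB of the four numbers and then checks that each row and each column contains each element once and once only.

```python
# The four group members as Python numbers; 1j is the imaginary unit i.
elements = [1, 1j, -1, -1j]
labels = ["1", "i", "-1", "-i"]


def product_label(a, b):
    """Return the label of the product a*b, which is again one of the four."""
    return labels[elements.index(a * b)]


# Row A, column B holds the product AB.
table = [[product_label(a, b) for b in elements] for a in elements]


def is_latin_square(square):
    """True if every row and every column contains each symbol exactly once."""
    n = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all(set(col) == symbols for col in zip(*square))
    return len(symbols) == n and rows_ok and cols_ok


for row in table:
    print(" ".join(f"{entry:>2}" for entry in row))
print("Latin Square:", is_latin_square(table))
```

The first row and first column of the printed table repeat the labels, which is why they can serve as headings as well as table members.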
The order n of a group is the number of elements it contains, and so the number of rows and columns in the multiplication table. Since there are n! ways to arrange n elements, there are many possible rows and columns, many more than are necessary to make one multiplication table. However, the requirement that the multiplication be associative greatly restricts the number of possible multiplication tables. In fact, there are really only a few. One must even go to n = 6 before a table is found in which, for some elements, AB ≠ BA. That is, some elements do not commute, and the group is noncommutative. There are many concrete realizations of a group with the same multiplication table, all sharing the same group structure.
The multiplication table for the (only) noncommutative group of order 6 is shown at the right. The element I is the identity. One realization is the group of symmetry operations in space called C3v, which has a 3-fold axis and 3 reflection planes passing through the axis. Then, A and B are rotations of 120° and 240°, while C, D and E are reflections in the vertical planes. The multiplication table is a Latin Square with six rows and six columns. Only one other Latin Square with these dimensions is also a possible group multiplication table, for a group such as C6 with a single six-fold rotation axis of symmetry.
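One way to check these statements is to realize the six operations as permutations of the three corners of a triangle. The sketch below does this. The particular assignment of permutations to the letters is an assumption made here for illustration, so the printed table may differ from the one in the figure by a relabelling of the reflections C, D and E. The product XY is taken to mean "perform Y first, then X".

```python
from itertools import combinations

# Each operation is a permutation p of the corners 0, 1, 2; corner k goes to p[k].
# This labelling is an illustrative assumption, not taken from the figure.
perms = {
    "I": (0, 1, 2),  # identity
    "A": (1, 2, 0),  # rotation by 120 degrees
    "B": (2, 0, 1),  # rotation by 240 degrees
    "C": (0, 2, 1),  # reflection fixing corner 0
    "D": (2, 1, 0),  # reflection fixing corner 1
    "E": (1, 0, 2),  # reflection fixing corner 2
}
names = {p: name for name, p in perms.items()}
order = "IABCDE"


def multiply(x, y):
    """Product XY: perform Y first, then X."""
    p, q = perms[x], perms[y]
    return names[tuple(p[q[k]] for k in range(3))]


table = [[multiply(x, y) for y in order] for x in order]

# Each row and each column must contain every element exactly once.
latin = (all(sorted(row) == sorted(order) for row in table)
         and all(sorted(col) == sorted(order) for col in zip(*table)))

# Brute-force check of (XY)Z = X(YZ) over all triples.
associative = all(
    multiply(multiply(x, y), z) == multiply(x, multiply(y, z))
    for x in order for y in order for z in order
)

# Unordered pairs for which XY differs from YX.
non_commuting = [(x, y) for x, y in combinations(order, 2)
                 if multiply(x, y) != multiply(y, x)]

for row in table:
    print(" ".join(row))
print("Latin Square:", latin, " associative:", associative)
print("pairs that do not commute:", non_commuting)
```

The two rotations commute with each other, but no rotation commutes with a reflection, and no two different reflections commute.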
We now turn to a most important application of the Latin Square, to the design of statistical experiments. Suppose we are interested in growing beans, and want the greatest yield possible. To this end, we selectively breed different seeds, apply different fertilizers and disease controls, and grow them using different techniques. These are all called treatments. We want to know what treatments are effective in increasing yield, and by how much, and which are worthless, so we can weigh the economic alternatives. If we naively compare the yields on different plots, we may be misled into selecting worthless treatments, or into missing valuable treatments that have a small, but rewarding, effect. In everyday life, we see examples of this futile activity. Just because treatment A produced 17.3 bushels, while treatment B produced only 15.7 bushels, there is no certainty that A is 10.2% better than B. We need to know more.
Everyone who grows beans knows that the yield under apparently the same conditions varies more or less widely. If we assemble the yields from N plots under similar conditions, we can find an average yield X = (Σ x)/N, where x represents the yield from one of the N plots. The quantity V = Σ(x - X)² is never negative, and gives an indication of how much the individual values differ from the mean. It is called the variance. If the variation is due to random causes, it is quite possible that the different yields are statistically distributed according to the normal distribution, the bell-shaped curve that is so familiar. If so, then the average of the normal distribution is somewhere around X, and its standard deviation σ is given approximately by σ² = V/(N - 1). The accuracy of these estimates increases as N increases. N - 1 is called the number of degrees of freedom associated with the variance.
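As a small numerical illustration of these formulas, the sketch below computes X, V and the estimate of σ² for six yields. The yields are made up for the purpose and are not data from any experiment. The standard library's `statistics.variance` uses the same divisor N - 1, so it serves as a cross-check.

```python
import statistics

# Hypothetical yields (bushels) from N plots grown under similar conditions.
yields = [15.7, 17.3, 16.1, 14.9, 16.5, 15.5]

N = len(yields)
X = sum(yields) / N                      # average yield
V = sum((x - X) ** 2 for x in yields)    # sum of squared deviations from X
sigma2 = V / (N - 1)                     # estimate of the population variance
sigma = sigma2 ** 0.5                    # estimate of the standard deviation

print(f"X = {X:.2f}, V = {V:.2f}, degrees of freedom = {N - 1}")
print(f"sigma^2 = {sigma2:.3f}, sigma = {sigma:.3f}")
print("matches statistics.variance:",
      abs(sigma2 - statistics.variance(yields)) < 1e-9)
```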
Now suppose we have six treatments. We take the bean field and divide it into squares, six rows and six columns, and assign treatments to the subplots so that each row and each column contains a given treatment only once. The result is, of course, a Latin Square. The object is only to distribute the treatments somewhat evenly over the test plot, so we do not need a group multiplication table, and any of the many Latin Squares can be used. We now grow our beans, and associate a yield x with each of the 36 plots. The total variance V is due partly to the random effects on yield that would occur with any treatment, and partly to the differences due to the treatments. It is possible to separate the total variance V into components due to rows, columns, treatments, and "error". This is called Analysis of Variance, AOV. The variance due to rows is n (here, n = 6) times the sum of the squares of the deviations of the row averages from the grand average, and similarly for the column and treatment averages. The total variance, less these three partial variances, gives the residual, or error, variance.
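A field layout of this kind can be produced by a short program. The sketch below is one simple approach, assumed here for illustration: it starts from the cyclic square, in which cell (r, c) holds symbol (r + c) mod n, and then shuffles the rows, the columns and the treatment labels. This does not pick uniformly among all possible Latin Squares, but it does spread the treatments over the plot. The function name `random_latin_square` is hypothetical.

```python
import random


def random_latin_square(treatments, rng):
    """Return an n x n layout with each treatment once per row and per column.

    Built by shuffling the rows, columns and labels of the cyclic square,
    so only some of the possible Latin Squares can be produced.
    """
    n = len(treatments)
    rows = list(range(n))
    cols = list(range(n))
    labels = list(treatments)
    rng.shuffle(rows)
    rng.shuffle(cols)
    rng.shuffle(labels)
    return [[labels[(r + c) % n] for c in cols] for r in rows]


rng = random.Random(2000)  # fixed seed so the layout can be reproduced
layout = random_latin_square("ABCDEF", rng)
for row in layout:
    print(" ".join(row))
```

Permuting rows or columns of a Latin Square always gives another Latin Square, which is why the shuffled layout still has each treatment once in every row and column.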
If random error were the only reason for the differences in yield, then the yields could be assumed to be distributed according to the same normal distribution, with the same standard deviation. Then, all the estimates of the standard deviation would be about the same, whether from the row, column, treatment, or error variances. The row, column and treatment variances have n - 1 (here, 5) degrees of freedom, and the error has (n - 1)(n - 2) degrees of freedom (here, 20). The sum of all these degrees of freedom is N - 1 (here, 35), the number of degrees of freedom of the overall variance. If we now divide each variance by its degrees of freedom we get the estimates of the population variance σ².
These estimates will not all be the same. Not only are they just estimates, subject to statistical error, but the treatments, for example, might actually be effective. Whether the differences in the estimates are simply due to chance can be investigated by dividing the row, column and treatment estimates by the error estimate. This statistic is called F, and there are tables showing how large F can be just due to chance. One usually takes a value that is exceeded by chance only 1% of the time as the criterion of significance. If the treatments really do have an effect, it will probably show up quite distinctly in the F values. The significant values of F depend on the degrees of freedom of the two estimates used, here 5 and 20. Statistical tables give the 1% value of F as 4.10 in this case. The 5% value is only 2.71. If the ratio of the estimates is larger than 4.1 for the treatments, we can be fairly sure that they have a real effect on the yield of beans. Then we can enquire further into how the yield is affected by the treatments with some appreciation of the reliability of our conclusions. The treatment averages are, of course, the raw materials for this investigation.
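How large F can be "just due to chance" can also be explored by simulation instead of tables. The sketch below is a rough Monte Carlo estimate, not the method used to compute published tables. It relies on the standard fact that a variance estimate with k degrees of freedom, taken from normal data, is distributed as a chi-square variable divided by k, and it draws the chi-square values with `random.gammavariate` (shape k/2, scale 2). The function names, the seed and the number of trials are arbitrary choices.

```python
import random


def simulated_f_ratios(df1, df2, trials, seed):
    """Simulate F ratios when both estimates measure the same variance."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(trials):
        # Chi-square with k degrees of freedom is gamma(shape=k/2, scale=2).
        numerator = rng.gammavariate(df1 / 2, 2.0) / df1
        denominator = rng.gammavariate(df2 / 2, 2.0) / df2
        ratios.append(numerator / denominator)
    ratios.sort()
    return ratios


def upper_point(sorted_ratios, tail):
    """Value exceeded by roughly the fraction `tail` of the simulated ratios."""
    index = int((1 - tail) * len(sorted_ratios))
    return sorted_ratios[min(index, len(sorted_ratios) - 1)]


ratios = simulated_f_ratios(5, 20, trials=200_000, seed=1)
print("approximate 5% value:", round(upper_point(ratios, 0.05), 2))
print("approximate 1% value:", round(upper_point(ratios, 0.01), 2))
```

With this many trials the two estimates should land close to the tabulated 2.71 and 4.10, though the last digit will wander with the seed.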
An example, taken from Weatherburn (Reference 1), is given here. It may be difficult for the reader to obtain a copy of this book. There are five treatments, A, B, C, D and E, and a 5 × 5 Latin Square is used. Yields of beans are as follows:
| A 7.4 | D 8.9 | E 5.8 | B 12.0 | C 14.3 |
| C 11.8 | B 6.5 | A 8.7 | E 7.6 | D 7.9 |
| D 10.1 | C 17.9 | B 9.0 | A 8.5 | E 7.1 |
| E 8.8 | A 10.1 | C 15.7 | D 11.1 | B 7.4 |
| B 11.8 | E 8.8 | D 14.3 | C 18.4 | A 10.1 |
You can find the averages and the variances yourself, using the information given above, and compare your results with those in the following table.
The 1% value of F is 5.41, so the treatments are significant, which is noted by the asterisk beside the value of F. Looking at the treatment averages, we note that treatment C seems the most effective.
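For readers who would rather let a computer do the arithmetic, the following sketch carries out the analysis on the 25 yields as transcribed above, following the recipe given earlier: n times the sum of squared deviations of the row, column and treatment averages from the grand average, with the error found by subtraction. The variable names are illustrative only. With these yields it should give a grand average of 10.4 and a treatment F of roughly 18.9, well above the 1% value of 5.41.

```python
# Treatment letters and yields, row by row, as listed in the example above.
layout = [
    "A D E B C".split(),
    "C B A E D".split(),
    "D C B A E".split(),
    "E A C D B".split(),
    "B E D C A".split(),
]
yields = [
    [7.4, 8.9, 5.8, 12.0, 14.3],
    [11.8, 6.5, 8.7, 7.6, 7.9],
    [10.1, 17.9, 9.0, 8.5, 7.1],
    [8.8, 10.1, 15.7, 11.1, 7.4],
    [11.8, 8.8, 14.3, 18.4, 10.1],
]
n = len(yields)
grand = sum(map(sum, yields)) / n ** 2


def variance_of_averages(groups):
    """n times the sum of squared deviations of group averages from grand."""
    return n * sum((sum(g) / n - grand) ** 2 for g in groups)


# Collect the yields belonging to each treatment letter.
by_treatment = {}
for r in range(n):
    for c in range(n):
        by_treatment.setdefault(layout[r][c], []).append(yields[r][c])

v_total = sum((x - grand) ** 2 for row in yields for x in row)
v_rows = variance_of_averages(yields)
v_cols = variance_of_averages(list(zip(*yields)))
v_treat = variance_of_averages(by_treatment.values())
v_error = v_total - v_rows - v_cols - v_treat

df_main = n - 1                 # rows, columns and treatments alike
df_error = (n - 1) * (n - 2)
error_estimate = v_error / df_error

print(f"{'source':<11}{'variance':>10}{'d.f.':>6}{'estimate':>10}{'F':>8}")
for name, v in (("rows", v_rows), ("columns", v_cols), ("treatments", v_treat)):
    estimate = v / df_main
    print(f"{name:<11}{v:>10.3f}{df_main:>6}{estimate:>10.3f}"
          f"{estimate / error_estimate:>8.2f}")
print(f"{'error':<11}{v_error:>10.3f}{df_error:>6}{error_estimate:>10.3f}")
print(f"{'total':<11}{v_total:>10.3f}{n * n - 1:>6}")

f_treat = (v_treat / df_main) / error_estimate
treatment_averages = {t: sum(v) / n for t, v in sorted(by_treatment.items())}
print("treatment averages:", treatment_averages)
```

The treatment averages printed at the end are the raw material for any further enquiry into which treatments differ.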
Statistical analysis is often deeply disappointing to someone wishing to prove a point. It can never prove that a treatment has the effect claimed, only that any effect is probably not due to chance variations. Moreover, a large amount of data is usually required to show even this. It is, therefore, a popular practice to misuse statistics. Poor sampling and experiment design are relied upon most often to produce desired results. Elaborate methods are another good shift. There is little advantage to complicated statistics and experiment design where they are not warranted. The simple Latin Square Analysis of Variance explained above is quite powerful all by itself. The best ways to get results are either to pick your sample cunningly, or to do experiment after experiment, consigning to silence all contrary results, until finally randomness comes up with the desired result; then you can publish and contact the media. A great deal of "research" is done this way in the health, medical, economic and psychological fields. Agricultural research has, on the other hand, been competently and honestly accomplished, since it is of practical application.
As a general rule, the comparison of averages is meaningless without an estimate of the variance. The Latin Square illustrates this very well. Also, statistics cannot be wisely used unless the theory of it is understood. Most users haven't the faintest idea what a sampling distribution is, and simply punch numbers into their computers.
Composed by J. B. Calvert
Created 17 November 2000