THE ACCOUNTING REVIEW Vol. LXI, No. 2 April 1986
EDUCATION RESEARCH
Frank H. Selto, Editor
Empirical Accounting Research Design for Ph.D. Students

William R. Kinney, Jr.

ABSTRACT: This paper discusses an approach to introducing empirical accounting research design to Ph.D. students. The approach includes a framework for evaluating accounting experiments as well as studies based on passive observation of subjects or data. Alternative methods of isolating the effect of the "independent" variable of interest from effects of prior-to-the-study-period variables and contemporaneous variables are discussed, along with the advantages and limitations of each method. Also discussed is the relationship between type I and type II error risks, sample size, and research design. The importance of research design, including theory development and means for mitigating the effects of extraneous variables, is emphasized as perhaps the only practical way to achieve research objectives in empirical research in accounting.
A FREQUENTLY encountered problem in accounting Ph.D. programs is that first-year students do not have a background in empirical research in accounting. Few B.B.A., M.B.A., or M.Acc. programs include courses in empirical research, and many students have not seriously considered its nature. Yet, such an introduction is necessary if Ph.D. students are to efficiently relate other courses to substantive problems in accounting and be able to take full advantage of accounting workshops.

The purpose of this paper is to show how a basic framework for evaluating empirical research in accounting can be obtained in a short introduction. This can be done at the start of the first term course and provides a context for further work in philosophy of science and statistical design as well as substantive areas of accounting.
The approach is generic: it is not tied to an area of accounting and doesn't depend on prior knowledge of a particular paradigm.1

1 Illustrations and extensions from applied areas of accounting are also helpful. Good sources for financial accounting are Ball and Foster [1982], Lev and Ohlson [1982], and Abdel-khalik and Ajinkya [1979]. Good sources for behavioral work are Ashton [1982] and Libby [1981].

I would like to acknowledge the helpful comments and suggestions of Vic Bernard, Dan Collins, Grant Clowery, Bob Libby, Jerry Salamon, and two anonymous reviewers. An earlier version of this paper was presented at the American Accounting Association's Doctoral Consortium in Toronto, Ontario, in August 1984.
William R. Kinney, Jr. is Price Waterhouse Auditing Professor at the University of Michigan. Manuscript received September 1984. Revisions received April 1985 and August 1985. Accepted September 1985.
The generic approach focuses attention on the essence of scientific inquiry in accounting. Many of the problems faced by accounting experimenters who can manipulate some (but not all) of the levels of variables to be studied are similar to those faced by "passive observers" of the levels of all variables as set by Nature.2 Thus, the generic approach may help to avoid premature specialization [Boulding, 1956, p. 199].

Section I presents a definition of empirical accounting research and theory, hypothesis, and fact. It also defines "dependent," "independent," and prior and contemporaneous influence variables. Section II discusses alternative means for separating the effects of prior and contemporaneous influence variables from the independent variable(s), and in Section III the interrelationships among significance, power, and research design are explored. Section IV gives a summary and conclusions.

I. A FRAMEWORK FOR EMPIRICAL RESEARCH IN ACCOUNTING
Research is a purposive activity, and its purpose is to allow us to understand, predict, or control some aspect of the environment. Research will be defined here as the development and testing of new theories of 'how the world works' or refutation of widely held existing theories. For accounting research, the theories concern how the world works with respect to accounting practices. Watts and Zimmerman [1984, p. 1] state: "The objective of accounting theory is to explain and predict accounting practice." This positive, how-the-world-is approach is in contrast with the more traditional normative view that accounting "theory" is concerned with what accounting practices ought to be.

Empirical accounting research (broadly considered) addresses the question: "Does how we as a firm or as a society account
for things make a difference?"3 Clearly, the accounting for items affecting tax payments makes a difference in our individual and collective lives. But does the accounting for, say, depreciation in internal or external reports affect decisions within a business firm or affect stock prices? If it does, then the size of the effect and why it occurs are important follow-up questions. Additionally, the accounting researcher must separate the underlying economic event (or state) from the accounting report of the event. Thus, while a finance researcher may be concerned only with firm characteristics, the accounting researcher must also be concerned with the costs and benefits of alternative accounting reports of those characteristics.4

In essence, empirical research involves theory, hypothesis, and fact. "Facts" are states or events that are observable in the real world. A "theory" offers a tentative explanation of the relationship between or among groups of facts in general. "Hypotheses" are predictions (or assertions) about the "facts" that will occur in a particular instance assuming that the theory is valid. Finally, observing "facts" consistent with the prediction or assertion made in the hypothesis lends credibility to the theory.

2 The problems are not identical. For example, while experimenters have the advantage of being able to specify the values of some variables, they also face the risk of choosing values that are too close together (or too far apart) to allow precise estimates of treatment effects or to allow generalization of conclusions to the real world.
3 Within this definition, relevant questions for auditing include, "Does how precisely we audit and report the state of a firm make a difference?" and, "How can audits of a given precision be conducted efficiently?" The first auditing question is related to the accounting question through the concept of materiality. Parallel questions involving the design of accounting systems also could be developed.
4 Accounting professors may conduct research in finance, economics, behavioral science, or statistics. If the accounting question is not addressed, however, the accounting professor may face the disadvantage of being undertrained relative to other researchers. Also, he or she ignores a comparative advantage in the knowledge of accounting institutions and the sometimes subtle role of information.
Ordinarily, research begins with a real-world problem or question. One thinks about or studies the problem, reads about seemingly similar problems in other areas or disciplines such as economics, psychology, organizational behavior, or political science. By immersing himself or herself in the problem, the researcher may, either by genius or by adapting a solution from another area, develop a general theory to explain relationships among facts [Simon, 1976, Chapter 7 and Boulding, 1956]. From this statement of the general relationship among facts, hypotheses about what should be observed in a particular situation can be derived. An experiment or passive observation study then can be designed to support or deny the hypotheses.

For example, suppose it is observed that a stock price increase usually follows the announcement that a firm has changed from straight-line to accelerated depreciation. Why should a mere bookkeeping change seem to lead to an increase in firms' values? An explanation might be that market participants believe that events leading to such an accounting change also typically lead to better prospects for the firm in the future. With development and elaboration of such a theory, the researcher might develop a passive observation study of past changes or an experiment to test hypotheses based on the proposed explanation.

Theories are usually stated in terms of theoretical variables or "principals" while empirical measurement requires observable variables. The difference between the principal and real-world observable variables presents difficulties for accounting researchers since accounting measurements may be either surrogates for some underlying principal of interest or may be the principal itself. For example,
if firm "performance" is the theoretical principal and earnings is chosen as the surrogate measure of firm performance, then straight-line depreciation is a component of the surrogate. As a surrogate, straight-line depreciation contains two sources of potential error that may require consideration by the researcher. One is the surrogation error due to the fact that straight-line depreciation does not "correctly" reflect the relevant performance of the firm for the purpose at hand. The other is application error due to mistakes or imprecision in applying straight-line depreciation. On the other hand, in evaluating possible determinants of managerial behavior, audited earnings using straight-linedepreciation may be specified in a contract and may serve as a principal. For example, if a manager is to receive a bonus or profit share of one percent of audited earnings, then audited earnings is the principal. Surrogation error, and any application error not detected and corrected by the auditor, is ignored for contract purposes. The same number used as a measure of firm performance will likely contain both surrogation and application error.5 To add credibility to a theory, one must not only be able to show hypothesis test resultsthat are consistent with the theory's predictions, but also have a basis to rule out alternativeexplanationsof the observed facts. This requiresconsideration of a reasonably comprehensive list of alternative explanations. Again, knowledge of related disciplines is useful in generating alternative explanations for accounting-related "facts." Some possible explanations can, of course, easily be ruled out as being of 5 Accounting systems designers and financial accounting standards setters can control the first type of measurement error, while auditors and auditing standards setters can control the second. Problems relating to the interaction of accounting and auditing standards setters, users, and auditors is, of course, a matter for accounting research.
To be more specific, let Y denote the "dependent" variable to be understood, explained, or predicted. Variables causing Y (or at least related to Y) can be classified into three broad groups as follows:

X = the "independent" variable that the proposed theory states should affect Y,
Vs = prior-influence (prior-to-study-period) factors that may affect Y, and
Zs = contemporaneous factors (other than X) that may affect Y.6

That is, Y = f(X, V1, V2, ..., Z1, Z2, ...).
A common Y for addressing an accounting question is the change in a firm's stock price. Others are a manager's act or decision. A common X is a change or difference in accounting method, whether by management's choice or by a regulatory directive. Prevalent Vs are the firm's prior period state variables such as profitability, leverage, liquidity, and size. For tests of theories about decision-making behavior of particular human subjects, relevant Vs often include the subject's personality traits, mathematical ability, education, training, age, firm association, and experience. The most common Z factor in accounting research studies involving stock prices is the market return (Rm). Another common group of Zs for external reporting and managerial performance studies is the unexpected portion of contemporaneous accounting measures for other firms or other divisions. Finally, since the accounting researcher is concerned with the effect of accounting reports, Zs may be underlying characteristics of the "true state" of the firm at the time of the study as measured by contemporaneous nonaccounting reports about the firm.
II. DISENTANGLING THE EFFECTS OF Vs AND Zs FROM THE EFFECT OF X
For simplicity, assume that X is measured at only two levels. Either the observed Y is from the "control" group that receives no "treatment" or from the treatment group that receives the treatment. Alternatively, the two groups could simply be different on some relevant dimension (e.g., to test theories about accelerated depreciation, the control group might be defined as those firms that use straight-line and the treatment group as those that use accelerated). Also for simplicity, assume that there is a single prior-influence factor V that affects Y, and that V has the same effect on Y whether the subject is from the control or treatment group. Furthermore, there are no contemporaneous Zs that affect Y, and the model determining Y is:

Yij = B0 + B1Xi + B2Vij + eij,   (1)

where Yij is the value of the dependent variable for the "j"th subject in the "i"th treatment, B0 is the intercept for the control group, Xi is an indicator variable (zero for the control group and one for treatment), B0 + B1 is the intercept for the treatment group (that is, B1 is the effect of treatment), B2 is the regression coefficient relating V to Y, and eij is a random error term. The eij term will include the effects of other Vs and Zs that are here assumed to be negligible and randomly distributed between the two groups, and eij is assumed to have expectation zero and be uncorrelated with either X or V.

For the simple model of equation 1, a plot of the expected values of Y given V for both groups will be parallel lines with possibly different intercepts. The difference in intercepts is the effect of the treatment (B1). Figure 1 shows the components of equation 1 along with ellipses that approximate the locus of members of the two groups.

6 Some Zs may be expectations, at the time of the study, of still future values of X, Y, V, and other Zs.
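To make the role of V concrete, the following is a minimal numerical sketch of equation 1 (not part of the original paper): it simulates data in which the treatment group happens to have higher values of V and compares the estimate of B1 obtained by ignoring V with the estimate obtained when V is included as a covariate. All parameter values, sample sizes, and variable names are illustrative assumptions.

```python
import numpy as np

# Illustrative simulation of equation (1): Yij = B0 + B1*Xi + B2*Vij + eij,
# with nonequivalent groups (the treatment group has higher V on average).
rng = np.random.default_rng(0)
n = 200
B0, B1, B2 = 1.0, 0.5, 2.0                  # assumed "true" values, illustration only

X = np.repeat([0.0, 1.0], n)                # 0 = control, 1 = treatment
V = rng.normal(loc=np.where(X == 1, 1.0, 0.0), scale=1.0)   # V differs by group
Y = B0 + B1 * X + B2 * V + rng.normal(scale=1.0, size=2 * n)

# Ignoring V (ANOVA-style): the difference in group means mixes B1 with B2*(V1 - V0).
naive_B1 = Y[X == 1].mean() - Y[X == 0].mean()

# Including V as a covariate (ANCOVA-style): least squares on [1, X, V].
design = np.column_stack([np.ones(2 * n), X, V])
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)

print(f"estimate of B1 ignoring V:         {naive_B1:.2f}")   # near B1 + B2 = 2.5
print(f"covariate-adjusted estimate of B1: {coef[1]:.2f}")    # near the true B1 = 0.5
```

The bias in the first estimate is exactly the mixing of B2Vij with the treatment effect described above.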
FIGURE 1
Y VALUES FOR NONEQUIVALENT GROUPS UNDER ANOVA, ANCOVA, AND MATCHING

[Figure omitted in this text version: a plot of Y against V for the control (X = 0) and treatment (X = 1) groups, showing the parallel regression lines of equation 1, the ellipses enclosing each group's members, the marginal distributions of Y used by ANOVA, the conditional distributions used by ANCOVA and matching, and the limited range of V over which matches between the two groups exist.]
An experimenter may ignore V and may randomly assign subjects to groups. On average, the groups will be equivalent on V. For small samples, however, there is nontrivial risk that such a procedure may assign to the treatment group, say, those with high values of V and to the control group those with low V. In analyzing results, the effect of the high V values (i.e., B2Vij) will be mixed with the treatment effect. An experimenter may rule out the possible effect of V by random assignment of subjects measured on V between groups. The sample subjects are measured on V, matched into pairs according to their V values, and one from each pair is randomly assigned to treatment. Thus, even for small samples the two groups will be approximately equivalent on V.

The passive observer has no opportunity to randomly assign sample subjects to treatment. Even experimenters may have difficulty developing a satisfactory randomized design due to having too many potentially important Vs and Zs that must be simultaneously matched. Thus, in general, researchers face the problem of treatment groups that are not equivalent with respect to V.

For nonequivalent groups, there are three basic ways in which the researcher can mitigate the possible effects of the V factor in the model of equation 1. These are:

1. ignoring V (i.e., assuming or hoping that V is randomized with respect to X),
2. matching on V ex post (i.e., matching after X has been chosen by the subject or assigned by Nature), and
3. using covariance analysis to statistically estimate and remove the effect of V.

The first approach ignores V, and results can be analyzed with a single-factor analysis of variance (ANOVA). The second approach physically equates the treatment groups with respect to V, and results can be analyzed using a randomized block design. The third approach "statistically" equates the groups, and results can be analyzed using analysis of covariance (ANCOVA). Each of these approaches is discussed in turn, along with some of the advantages and limitations of each for experimenters and passive observers.
Ignoring V As discussed above, ignoring a potential V is generally inadvisable due to the unknown effect of V. A negligible effect is the hopedfor result for any unmatched, unmeasured, or unknown Vsor Zs. However, most real-worldevents have multiple causes and a negligible overall effect is unlikely. Furthermore, larger samples will not help in researchdesignsthat ignore systematic effects of V. While expost matching and covariance analysis can't account for all possible Vs and Zs, they can reduce the risk that some potentially important Vs and Zs disguise the true effect of X. Figure 1 shows the relevant sampling distributions for the three approaches applied to the example. As shown in the relativelyflat distributions on the left margin, ANOVA is based on the marginal distribution of Y with no consideration of V. Ex Post Matching In many situations, the researcher selects a sample after the phenomenon of interest has taken place. Often, the researcher selects a sample of subjects from the treatment group and then selects a subject from the control group with V equal or similar to V for each treatment subject. This ex post matching on V is probably the most commonly used design for passive observational studies in accounting. For ex post matching, the model assumed to determine Y is: m = Bo +BIX,+ (2) Y, E BjMj +eij, j2
where Bo is the overall mean of Yplus the effect of (arbitrarilydesignated) match 1, Bj (for j> 1) is the differential effect of match compared to match 1, Mj is an indicator variable (equal to one if the sample subject is a member of match j and
For passive observational studies, it is impossible to randomly assign subjects to treatments since by definition the subjects have already either "self-selected" into treatment groups, or have been so selected by Nature. It is possible that there will be few or even no matches. For example, all the firms using straight-line depreciation may be small firms and all firms using accelerated depreciation may be large. A firm's choice of accounting method (or decision to change methods) may merely reflect its V value. Figure 1 illustrates such a possibility in that only the bottom half of group 1 can be matched with a member of group 0 due to the difference in V for the groups.

Even experimenters using ex ante matching with random assignment of subjects often face the lack-of-matches problem for at least some Vs and Zs. Suppose, for example, that a researcher believes that an auditor's response in a professional task experiment may be related to the auditor's professional training (X) after accounting for his or her mathematical abilities (V). It may be difficult to match subjects from different firms based on mathematical abilities. This is because firms may hire and thus train (or students may choose firms and be trained) on the basis of mathematical ability.8

As will be discussed below, the efficiency of matching may be less than that for covariance analysis. However, ex post matching is likely to be superior to covariance analysis when the functional form of the Y|V relationship is nonlinear or unknown. Given that the treatment effect is not correlated with V, matching can be used for any functional form of Y and V (known or not) and analyzed using a blocked design.

Covariance Analysis

A researcher using analysis of covariance (ANCOVA) statistically estimates the effect of V on Y and removes it. ANCOVA can be viewed as the result of projecting observed Y values along the regression line to a common point on the V axis, such as V̄, to yield the conditional distributions as shown in Figure 1.9

Figure 1 shows that part of the difference between the marginal distributions of Y0 and Y1 (as estimated using ANOVA) is due to a larger V for the treatment group. Matching (equation 2) accounts for the difference by subtracting B0 + Σ BjMj from Y for each subject, and ANCOVA accounts for the difference by subtracting B0 + B2Vij from Y for each subject. Thus, both matching and ANCOVA are seen to mitigate the differential effect of V. Figure 1 also shows that control group subjects with relatively high V for the control group are matched to subjects with relatively low V for the treatment group. For matched designs, all other potential sample subjects must be omitted due to lack of matches.

Matching and ANCOVA yield more efficient (more precise) estimates of the treatment effect than does ANOVA. In general, however, it is unclear which of the two will be more precise. This is due to the fact that while the difference in V0 and V1 reduces the precision of ANCOVA, the reduction in sample size due to lack of matches reduces precision for matching. It is often difficult to predict which will be the greater problem.10

7 The matches may be by individual subject ("precision" or "caliper" matching) as discussed above, or by frequency distribution (e.g., equal mean and variance with respect to V for both groups). A test of equality on V is often used as a justification for ignoring V in the statistical analysis.
8 An alternative design is to limit all subjects in an experiment to a fixed level of V. This equalizes the effect of V but greatly reduces the generalizability of results over the range of reasonable V values that might occur.
9 The sampling distributions for matching and ANCOVA are shown as the same in Figure 1 since the expectations of estimates of the treatment effects are the same for both. As discussed, however, their standard errors will differ.
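The following sketch (again, not from the paper) illustrates ex post "caliper" matching on V and an ANCOVA-style regression adjustment on the same simulated data. The selection mechanism, caliper width, and parameter values are all illustrative assumptions; note how the number of usable pairs shrinks when the groups differ on V.

```python
import numpy as np

# Illustrative comparison of ex post matching and ANCOVA for nonequivalent groups.
rng = np.random.default_rng(1)
n = 150
B0, B1, B2 = 1.0, 0.5, 2.0                      # assumed values, illustration only

# Nature "assigns" treatment partly on the basis of V (self-selection).
V = rng.normal(size=2 * n)
X = (V + rng.normal(size=2 * n) > 0).astype(float)
Y = B0 + B1 * X + B2 * V + rng.normal(size=2 * n)

# Ex post matching: pair each treatment subject with the closest unused control
# subject on V, within a caliper; subjects without a match are dropped.
caliper = 0.1
controls = list(np.flatnonzero(X == 0))
pairs = []
for t in np.flatnonzero(X == 1):
    if not controls:
        break
    dist = np.abs(V[controls] - V[t])
    j = int(np.argmin(dist))
    if dist[j] <= caliper:
        pairs.append((t, controls.pop(j)))

diffs = np.array([Y[t] - Y[c] for t, c in pairs])
print(f"matched pairs: {len(pairs)} of {int(X.sum())} treatment subjects")
print(f"matching estimate of B1 (mean paired difference): {diffs.mean():.2f}")

# ANCOVA-style adjustment uses all subjects but assumes the linear Y|V form.
design = np.column_stack([np.ones(2 * n), X, V])
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
print(f"ANCOVA estimate of B1: {coef[1]:.2f}")
```

As the text notes, which approach is more precise depends on how many matches are lost versus how far apart the groups are on V.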
Equation 1 and Figure 1 present a very simple situation even for a single V. For example, the Y|V relationship may differ depending on whether X is at level zero or one. Furthermore, the occurrence of a given level of V at time t-1 may have a direct effect on Y at time t but may also affect the level of X at time t which, in turn, affects Y at t. Thus, there may be two paths by which V affects Y. A complete approach would include a model of the "selection" process by which V affects X as well as the direct effects of X and V on Y.11

10 ANCOVA will usually be less biased, however (see Cook and Campbell [1979, pp. 177-182], and Cochran [1983, pp. 127-128]).
11 See Cochran's comments on R. A. Fisher's advice to "Make your theories elaborate." According to Cochran, Fisher meant that when "constructing a causal hypothesis one should envisage as many different consequences of its truth as possible, and plan observational studies to discover whether each of these consequences is found to hold" [Rosenbaum, 1984, p. 43]. This advice is consistent with Boulding's exhortation to develop and test theories that are at the level of the real world.
III. ALPHA, BETA, SAMPLE SIZE AND RESEARCH DESIGN
Planning research to disentangle X, Vs, and Zs involves four related factors. These are alpha (α), beta (β), sample size, and what will be called the "research design" factor (denoted D). In a given situation, setting any three of them sets the fourth. The statistical factors of α (the probability of a type I error, or incorrectly rejecting a true null hypothesis of no treatment effect), β (the probability of a type II error, or not rejecting the null hypothesis when, in fact, there is a treatment effect), and sample size are well known. The research design factor is the ratio of two subfactors. Its numerator is the hypothesized magnitude of the (X) treatment effect (denoted δ), and its denominator is the standard deviation of the residuals in the equation used to estimate B1 (denoted σ). Thus, D = δ/σ. The numerator depends on the researcher's theory, and the denominator depends on how the researcher disentangles the Vs and Zs and the inherent variability in the phenomenon under examination.

The required sample size is a decreasing function of α, β, and δ, and an increasing function of σ. Therefore, for a given α and β, the required sample size will be small if the proposed theory implies a large effect on Y and/or the researcher is clever in designing a plan to disentangle the effects of the Vs and Zs. For example, using ANOVA (no matching) in a single test with target α = .05, β = .1, and D = δ/σ = .5, the required sample size is 70 for each of the two groups.12 If the researcher has a theory yielding a larger δ that would increase D to .75, then the sample would be 32 each, and if D is 1.0 then the sample size is 18 per group. Alternatively, for D = .5 and holding δ constant, if matching is used and the Y0, Y1 correlation is .25, then the required sample is approximately 57 pairs.13 If the Y0, Y1 correlation is .5 (implying a reduction in σ of about 18 percent), then the required sample is 36 pairs.

The four factors and their implications for accounting research will be discussed through two subtopics. These are: 1) power, and 2) prejudice against the null hypothesis.

12 The required sample size is n = 2[(t(α, 2n-2) + t(β, 2n-2))(1/D)]². See Ostle [1963, p. 553] for a table.
13 The required sample size is n = [(t(α, n-1) + t(β, n-1))(1/D)]², where D is computed using the standard deviation of the paired differences rather than σ. See Ostle [1963, p. 551] for a table.
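A sketch of the kind of sample size calculation in footnote 12, for the two-independent-groups case, appears below (not from the paper; it simply iterates the formula because the t quantiles depend on the degrees of freedom). With α = .05, β = .10, and D = .5, .75, and 1.0 it should produce per-group sizes close to the 70, 32, and 18 cited above; exact values may differ slightly from table-based figures such as Ostle's.

```python
import math
from scipy import stats

def per_group_n(alpha: float, beta: float, D: float) -> int:
    """Per-group n for a one-sided two-sample t test, iterating
    n = 2 * [(t(alpha, 2n-2) + t(beta, 2n-2)) / D]**2."""
    n = 2
    for _ in range(100):                      # iterate: t quantiles depend on n
        df = 2 * n - 2
        t_a = stats.t.ppf(1 - alpha, df)
        t_b = stats.t.ppf(1 - beta, df)
        new_n = math.ceil(2 * ((t_a + t_b) / D) ** 2)
        if new_n == n:
            break
        n = new_n
    return n

for D in (0.5, 0.75, 1.0):
    print(f"D = {D:4.2f}: n per group = {per_group_n(0.05, 0.10, D)}")
```

Analogous calculations for matched pairs use the standard deviation of the paired differences in place of σ, as in footnote 13.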
Power

Consider a researcher who has a theory that the treatment effect (B1) is positive, and who therefore is interested in testing the (null) hypothesis that the true effect of treatment is less than or equal to zero against the alternative that the true effect is greater than zero. Assume that a sample of treatment subjects has been matched or "paired" on V with control subjects. Also, based on an assessment of the appropriate significance level for the issue at hand, the researcher has set α at .05, and the researcher has a research design in mind. What the researcher may not consider at the planning stage is the magnitude of the hypothesized effect (i.e., a particular δ for the alternative hypothesis) and the allowable β for that δ.14 A δ may not be considered since most theories suggest only the direction of an effect and not its magnitude, and β is not considered because no particular δ is specified. The researcher may proceed to testing with little consideration of whether the planned test has an adequate chance to reject the null hypothesis even if it is false.

To illustrate, consider the sampling distributions in Figure 2. For both panels of Figure 2, the left-hand distribution is for the mean of the paired differences if the null hypothesis (B1 = 0) is true, and the right-hand distribution applies if the particular alternative (B1 = δ) is true. Also for both panels, k is the point that yields α = .05, or five percent of the area to the right of the point under the left-hand distribution (i.e., H0). In panel a, the research design and sample size yield a sampling distribution with a large area (1 - β) to the right of k under the alternative hypothesis. Thus, there is high probability of rejecting H0 when the alternative hypothesis is true. In other words, the power of the test (1 - β) is high.

In Figure 2, panel b, δ is the same as in panel a, but the sampling distributions are much flatter due either to small sample sizes or a large standard deviation due to remaining effects of Vs and Zs. Rather than the relatively high power test of panel a, the researcher faces a low power test. At α = .05, β for the simple alternative hypothesis is greater than .5, and power is less than .5. Even if H0 is false (i.e., B1 = δ), and thus the researcher's theory of a positive treatment effect is correct, the researcher has a less than even chance of rejecting it!

Suppose that the researcher in panel b observes a test statistic that is almost significant, and the sample estimate of the treatment effect is equal to δ. He or she then decides to take a follow-up sample. The follow-up sample is also likely to indicate nonrejection due to its small size [Tversky and Kahneman, 1971, p. 107]. The real culprit, of course, is the low power of the test. If the low power is anticipated at the planning stage, an attempt can be made to mitigate its negative effects or else abort the project.

In general, power can be increased by 1) increasing the sample size, or 2) increasing the design factor D by developing better theories (yielding larger δ) or by making better use of a given sample size and theory by careful attention to the Vs and Zs (yielding smaller σ). As a practical matter, improved design is often the only alternative in accounting research since the size of samples in accounting frequently is effectively fixed. For experiments, the pool of available auditors, accountants, financial statement users, and even students is effectively limited to fairly small numbers. Subjects' time is not free, and the supply is not inexhaustible. For passive observation, the number of firms for which particular accounting and other required economic data are available may be relatively small. Thus, accounting researchers need to be aware of a variety of analytical techniques applicable to a variety of research problems.15

14 See Tversky and Kahneman [1971]. This is in contrast to classical or normal distribution theory-based audit sampling where, in addition to setting α to control the risk of incorrect rejection, the auditor sets β to control the risk of incorrect acceptance and sets δ based on "intolerable" error (materiality). The auditor then selects an estimator and calculates the minimum sample size subject to the target α, β, and δ.
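A rough power calculation of the kind just described can be sketched as follows (not from the paper). The function approximates the power of a one-sided paired t test when the true effect, scaled by the standard deviation of the paired differences, equals D; the sample sizes shown are illustrative and chosen to mimic the high- and low-power panels of Figure 2.

```python
import math
from scipy import stats

def paired_power(n_pairs: int, D: float, alpha: float = 0.05) -> float:
    """Approximate power of a one-sided paired t test (normal approximation)."""
    k = stats.t.ppf(1 - alpha, df=n_pairs - 1)     # critical point k
    shift = D * math.sqrt(n_pairs)                  # location of the alternative
    return 1 - stats.norm.cdf(k - shift)

print(f"panel a-like case (36 pairs, D = 0.5): power = {paired_power(36, 0.5):.2f}")
print(f"panel b-like case (10 pairs, D = 0.5): power = {paired_power(10, 0.5):.2f}")
```

With few pairs, power falls below .5 even though the researcher's theory is correct, which is exactly the situation depicted in panel b.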
FIGURE 2
SAMPLING DISTRIBUTIONS OF d̄ FOR HIGH AND LOW POWER TESTS

[Figure omitted in this text version: panel a (high power) and panel b (low power) each show the sampling distribution of the mean paired difference under the null hypothesis (B1 = 0) and under the alternative (B1 = δ), with the critical point k marking the rejection region; the flatter distributions in panel b leave much less area (1 - β) to the right of k under the alternative.]
Furthermore, for a given research paradigm, the problem of low power is likely to become more difficult over time. Other things equal, as knowledge of the effects of accounting expands, the likely size of the effect of each new or more refined theory (B1) will tend to have less additional explanatory power. As knowledge expands, the best potential Xs are investigated and become Vs or Zs. For example, early studies tested hypotheses about the degree of owner versus manager control as an X that affected accounting choices. Later studies have used the same variable as a V or Z. Absent developments that restructure the way a particular problem is addressed, future researchers will be faced with discovering new Xs that have less potential explanatory power.

15 In debate on the preferability of parametric vs. nonparametric statistics in research, the ability of parametric methods to accommodate more Vs and Zs through covariance analysis is an often overlooked advantage.
In fact, it may be unreasonable to expect that a particular theory based on accounting methods will yield a true differential effect that is very large relative to the variance of Y. How things are accounted for simply can't be expected to explain a large portion of stock price variability or managerial or investor behavior. Under some conditions, the sample size required to yield reasonable power exceeds the size of the known population! A researcher can get some protection by making a tentative calculation of power before investing in expensive data collection or in experimentation. For example, passive observers of accounting changes and stock returns may be able to make reasonable estimates of the standard deviation of return residuals and might make power estimates for various levels of δ.16 If the estimated power is inadequate even for the maximum δ that might reasonably exist, then the research can be redesigned or aborted. Experimenters are perhaps more familiar with prospective power calculations and frequently use a pilot sample to assist with the sample size and research design development.

Prejudice Against the Null Hypothesis

A theory usually specifies the direction of the treatment effect, and a researcher generally sets out to reject hypotheses based on the assumption that the treatment effect is zero or in the opposite direction from what the theory predicts. The focus on rejecting the null hypothesis is the source of a number of "biases" against the null that may lead to dysfunctional consequences. Greenwald [1975] lists eight such consequences; four that seem most important for accounting researchers are discussed in order to be better able to avoid them.

1. A paper will not be submitted for publication consideration unless the results against the null are "significant." Especially interesting or innovative results may be submitted at higher than .05 significance levels (or, alternatively, the probability at which the results are significant is reported), but rarely does an editor see results with significance levels above .15. This prejudice need not exist if not rejecting the null gives reasonable credibility to the null.17

2. Ancillary hypotheses will be elevated in the exposition of results. Secondary hypotheses that are significant ex post will receive more attention than other secondary hypotheses and perhaps the primary hypotheses. Suggestions will be made that these results warrant further study, when in fact one would expect about one in ten nonsense relationships to be significant at the .10 level (see the illustrative sketch below).

3. Alternative operationalizations of variables or their functional form will be explored only if "preliminary" results are insignificant. The extent to which this search activity is justified is open to debate since most theories don't imply a single measurement or functional form.

4. The search for errors will be asymmetric. Outliers that impede rejection of the null hypothesis will tend to receive more diligent attention than those that favor rejecting the null. If significant results are obtained on the first analysis of a problem, the neophyte researcher may not consider a search for outliers or for other violations of statistical assumptions underlying the analysis. Nonrejection may lead one to consider such explanations and to search for programming errors and data coding errors.

16 The choice of δ is somewhat arbitrary, but in planning it is useful to consider reasonable or plausible values for the true treatment effect of X. Alternatively, one might choose the smallest effect that informed persons would agree is empirically "important" and therefore worth knowing about if it exists, or the largest amount that one could reasonably expect.
17 In classical statistics, not rejecting the null is not equivalent to accepting the null. However, non-rejection by a reasonably powerful test or series of tests does increase one's subjective degree of belief in the null.
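The remark in item 2 about nonsense relationships can be illustrated with a small simulation (not from the paper; all numbers are arbitrary): testing many ancillary hypotheses on pure noise yields rejections at roughly the nominal rate.

```python
import numpy as np
from scipy import stats

# Test many "nonsense" relationships (pure noise) and count rejections at the .10 level.
rng = np.random.default_rng(2)
n_subjects, n_hypotheses = 50, 1000
rejections = 0
for _ in range(n_hypotheses):
    x = rng.normal(size=n_subjects)
    y = rng.normal(size=n_subjects)              # no true relationship with x
    _, p = stats.pearsonr(x, y)
    rejections += p < 0.10
print(f"fraction 'significant' at .10: {rejections / n_hypotheses:.2f}")   # about 0.10
```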
IV. SUMMARY AND CONCLUSIONS
In this paper we have stressed consideration of planning for Vs and Zs to be able to isolate the effect of X as a potential explanation of differences in Y. This consideration may allow increased power in tests and may allow more to be learned from a given sample. Planning for Vs and Zs can reduce the risk of not rejecting the null hypothesis when it is false. Such planning may also allow a basis to argue that nonrejection of the null hypothesis may support acceptance of the null. That is, if the treatment has an important effect, then it should be revealed by the test. Thus, something may be learned whether results are statistically significant or not. This should increase the objectivity of the researcher, since the work is valuable whatever the empirical results.

There are at least two ways in which the approach discussed in this paper can be useful to Ph.D. students. One is in evaluating the research design of others, and the other is in planning the student's dissertation.18 Students must evaluate the work of others whether in published articles, working papers, or accounting research workshops. A student applying the approach to the work of others might try to answer the following questions: What is the Y and what is the X? What Vs and Zs are considered? Are there better ways to account for the effects of Vs and Zs? What are other Vs and Zs that might have important effects?

The same approach can be applied by the student to his or her own dissertation proposal. While the basic development of a research proposal is the responsibility of the student, there is much to warrant early faculty discussion of planned dissertation research. That is, the faculty can evaluate a proposal by considering the reasonableness of the theory and the adequacy of control of potential Vs and Zs. The faculty should be asked: Is the magnitude of the hypothesized effect plausible? Are all important competing explanations listed and adequately dealt with in the plan? Will the proposed tests likely uncover evidence of a difference equal to δ if it exists? Will nonrejection lend credibility to the null?

Faculty approval of planned dissertation research reduces the student's risk by 1) ruling out potential topics that have little chance of successful completion, 2) gathering the right data on the first attempt, 3) eliminating outcome dependence (thus reducing moral hazard for the student), and 4) reducing the temptation of the student (and faculty) to pursue numerous tangents that may come to light as the research progresses.

18 In planning research or evaluating the research of others, a useful practice is to give early attention to the purpose of the research through preparation of a three-short-paragraph abstract, synopsis, or working model of the research. The first paragraph answers the question "What is the problem?" The second asks, "Why is it an important problem?" and the third, "How will it be solved?" Alternatively, the questions might be: "What are you (or the researcher) trying to find out?", "Why?", and "How will it be done?"
REFERENCES

Abdel-khalik, A. R. and B. B. Ajinkya, Empirical Research in Accounting: A Methodological Viewpoint, Accounting Education Series No. 4 (American Accounting Association, 1979).
Ashton, R. H., Human Information Processing in Accounting, Accounting Research Study No. 17 (American Accounting Association, 1982).
Ball, R. and G. Foster, "Corporate Financial Reporting: A Methodological Review of Empirical Research," Studies in Current Research Methodologies in Accounting: A Critical Evaluation, Journal of Accounting Research (Supplement 1982), pp. 161-234.
Boulding, K. E., "General Systems Theory-The Skeleton of Science," Management Science (April 1956), pp. 197-208.
Cochran, W. G., Planning and Analysis of Observational Studies, edited by L. E. Moses and F. Mosteller (John Wiley & Sons, Inc., 1983).
Cook, T. D. and D. T. Campbell, Quasi-Experimentation: Design & Analysis Issues for Field Settings (Houghton Mifflin Company, 1979), especially chapters 3 and 4.
Greenwald, A. G., "Consequences of Prejudice Against the Null Hypothesis," Psychological Bulletin (January 1975), pp. 1-20.
Lev, B. and J. A. Ohlson, "Market-Based Empirical Research in Accounting: A Review, Interpretation, and Extension," Studies in Current Research Methodologies in Accounting: A Critical Evaluation, Journal of Accounting Research (Supplement 1982), pp. 249-322.
Libby, R., Accounting and Human Information Processing: Theory and Applications (Prentice-Hall, Inc., 1981).
Ostle, B., Statistics in Research, 2nd Edition (The Iowa State University Press, 1963).
Rosenbaum, P. R., "From Association to Causation in Observational Studies: The Role of Tests of Strongly Ignorable Treatment Assignment," Journal of the American Statistical Association (March 1984), pp. 41-48.
Simon, J. L., Basic Research Methods in Social Science: The Art of Empirical Investigation, 2nd Edition (Random House, Inc., 1978), especially chapters 3, 7, and 11.
Tversky, A. and D. Kahneman, "Belief in the Law of Small Numbers," Psychological Bulletin (August 1971), pp. 105-110.
Watts, R. L. and J. L. Zimmerman, Positive Accounting Theory (Prentice-Hall, 1986).