Scale Development Research


Scale Development Research: A Content Analysis and Recommendations for Best Practices
Roger L. Worthington and Tiffany A. Whittaker
The Counseling Psychologist, 2006, Vol. 34, No. 6, 806-838. DOI: 10.1177/0011000006288127
Published by SAGE Publications (http://www.sagepublications.com) on behalf of the Division of Counseling Psychology of the American Psychological Association.
The online version of this article can be found at: http://tcp.sagepub.com/cgi/content/abstract/34/6/806


Scale Development Research: A Content Analysis and Recommendations for Best Practices

Roger L. Worthington
University of Missouri–Columbia

Tiffany A. Whittaker
University of Texas at Austin

The authors conducted a content analysis of new scale development articles appearing in the Journal of Counseling Psychology over a 10-year period (1995 to 2004). The authors analyze and discuss characteristics of the exploratory and confirmatory factor analysis procedures in these scale development studies with respect to sample characteristics, factorability, extraction methods, rotation methods, item deletion or retention, factor retention, and model fit indexes. The authors uncovered a variety of specific practices that were at variance with the current literature on factor analysis or structural equation modeling. They make recommendations for best practices in scale development research in counseling psychology using exploratory and confirmatory factor analysis.

Authors' note: The authors contributed equally to the writing of this article. We would like to thank Jeffrey Andreas Tan for his assistance with the content analysis. Address correspondence to Roger L. Worthington, Department of Educational, School, and Counseling Psychology, University of Missouri, Columbia, MO 65211; e-mail: [email protected]

Counseling psychology has a rich tradition of producing psychometrically sound instruments for applications in research, training, and practice. Many areas of scholarly inquiry in counseling psychology continue to be ripe for scale development research. In a special issue of the Journal of Counseling Psychology (JCP) on quantitative research methods, Dawis (1987) provided an overview of scale development techniques, Tinsley and Tinsley (1987) discussed the use of factor analysis, and Fassinger (1987) presented an overview of structural equation modeling (SEM). Although these articles continue to be cited in counseling psychology research, recent advances require updated information and a comprehensive overview of all three topics. More recently, Quintana and Maxwell (1999) and Martens (2005) provided comprehensive updates of SEM, but their focus was not specifically on its use in scale development research (see also Martens & Haase, 2006 [this issue]; Weston & Gore, 2006 [TCP, special issue, part 1]). The purpose of this article is threefold: (a) to provide an overview of the steps taken in the scale development process using exploratory factor analysis (EFA) and confirmatory factor analysis (CFA), (b) to assess current practices by reporting the results of a 10-year content analysis of scale development research in counseling psychology, and (c) to provide a set of recommendations for best practices in using EFA and CFA in scale development (for more on factor analysis, see Kahn, 2006 [TCP, special issue, part 1]). We assume the reader has basic knowledge of psychometrics, including principles of reliability (Helms, Henze, Sass, & Mifsud, 2006 [TCP, special issue, part 1]), validity (Hoyt, Warbasse, & Chu, 2006 [this issue]), and multivariate statistics (Sherry, 2006 [TCP, special issue, part 1]). We begin with an overview of EFA and CFA, followed by a discussion of the procedure we used in conducting our content analysis. We then embed the findings of our content analysis within more detailed discussions of EFA and CFA, identifying potential problems and highlighting best practices. We conclude with an integrative discussion of best practices and findings from the content analysis.

OVERVIEW OF EFA AND CFA

Factor analysis is a technique used to identify or confirm a smaller number of factors or latent constructs from a large number of observed variables (or items). There are two main categories of factor analysis: (a) exploratory and (b) confirmatory (Kahn, 2006 [TCP, special issue, part 1]). Although researchers may use factor analysis for a range of purposes, one of the most prevalent uses of factor-analytic techniques is to support the validity of newly developed tests or scales—that is, does the newly developed test or scale measure the intended construct(s)? More specifically, the application of factor analysis to a set of items may help researchers answer the following questions: How many factors or constructs underlie the set of items? What are the defining features or dimensions of the factors or constructs that underlie the set of items (Tabachnick & Fidell, 2001)?

EFA assesses construct validity during the initial development of an instrument. After developing an initial set of items, researchers apply EFA to examine the underlying dimensionality of the item set. Thus, they can group a large item set into meaningful subsets that measure different factors. The primary reason for using EFA is that it allows items to be related to any of the factors underlying examinee responses. As a result, the developer can easily identify items that do not measure an intended factor or that simultaneously measure multiple factors; in either case, such items could be poor indicators of the desired construct and can be eliminated from further consideration.

When used for scale development, EFA becomes a combination of qualitative and quantitative methods, which can be either confusing or enlivening for researchers.


We have found that novices (and some who are not novices) hope to have the statistical program produce the ultimate solution that will provide them with a set of empirically determined, indisputable dimensions or factors. However, effectively using EFA procedures requires researchers to use inductive reasoning, while patiently and subtly adjusting and readjusting their approach to produce the most meaningful results. Therefore, the process of scale development using EFA can become a relatively dynamic process of examination and revision, followed by more examination and revision, ultimately leading to a tentative rather than a definitive outcome.

The most current approach to conducting CFA is to use SEM. Prior to analyzing the data, a researcher must indicate (a) how many factors are present in an instrument, (b) which items are related to each factor, and (c) whether the factors are correlated or uncorrelated (issues that are revealed during the process of EFA). Because the items are generally constrained to load on only one factor in CFA, the intent is not to explore whether a given item measures no factors, one factor, or multiple factors but instead to evaluate or confirm the extent to which the researcher's measurement model is replicated in the sample data. Thus, it is critical to have prior knowledge of the expected relationships between items and factors before conducting CFA—hence the term confirmatory. SEM is a powerful confirmatory technique because it allows the researcher greater control over the form of constraints placed on items and factors when analyzing a hypothesized model. Furthermore, as we discuss later, researchers can also use SEM to examine competing models to assess the extent to which one hypothesized model fits the data better than an alternative model. In our discussion, we provide information about the basic concepts and procedures necessary to use SEM in scale development research. For more advanced discussions of SEM, we refer readers to several existing books and articles (e.g., Bollen, 1989; Kline, 2005; Martens, 2005; Martens & Haase, 2006; Quintana & Maxwell, 1999; Thompson, 2004).

CONTENT-ANALYSIS PROCEDURE

To provide context for our discussion of scale development best practices, we conducted a content analysis of scale development articles in counseling psychology that reflect common practices. In this section, we provide an overview of the article-selection process used in our content analysis. We then integrate the findings of our content analysis into the remainder of the article as we review the literature and recommend best practices for scale development.

We reviewed scale development articles published in JCP in the 10 years between 1995 and 2004, inclusive (see appendix for a list of articles).


We based our selection of articles on two central criteria: We included (a) only new scale development research articles (i.e., we excluded articles investigating only the reliability, validity, or revisions of existing scales) and (b) only articles that reported results from EFA or CFA. A paid graduate student assistant reviewed the tables of contents for each issue of JCP published during the specified time frame. We instructed the graduate student to err on the side of being overly inclusive, which resulted in the identification of 38 articles that used EFA or CFA to examine the psychometric properties of measurement instruments. The first author reviewed these articles and eliminated 15 that did not meet the selection criteria, resulting in 23 articles for our sample. Next, the first and second authors independently evaluated the 23 articles to identify and quantify the EFA and CFA characteristics. The only discrepancies in the independent evaluations of the articles were due to clerical errors in recording descriptive information (as opposed to disagreement in classification), which we jointly checked and verified.

We were interested in a number of characteristics of the studies. For studies reporting EFA procedures, we were interested in the following: (a) sample characteristics, (b) criteria for assessing the factorability of the correlation matrix, (c) extraction methods, (d) criteria for determining rotation method, (e) rotation methods, (f) criteria for factor retention, (g) criteria for item deletion, and (h) purposes and criteria for optimizing scale length (see Table 1). For studies reporting CFA procedures, we were interested in the following: (a) use of SEM versus alternative methods as a confirmatory approach, (b) sample-size criteria, (c) fit indexes, (d) fit-index criteria, (e) cross-validation indexes, and (f) model-modification issues (see Table 2).

THE PROCESS OF SCALE DEVELOPMENT RESEARCH

There are various strategies used in scale construction, often described using somewhat differing labels for similar approaches. Brown (1983) summarized three primary strategies: logical, empirical, and homogeneous. Friedenberg (1995) identified a slightly different set of categories: logical-content or rational, theoretical, and empirical, in which the latter contains criterion-group and factor analysis methods. The rational or logical approach simply uses the scale developer's judgments to identify or construct items that are obviously related to the characteristic being measured. The theoretical approach uses psychological theory to determine the content of the scale items. Both the theoretical approach and the rational or logical approach are no longer popular methods in scale development.


TABLE 1: Characteristics of Exploratory Factor Analyses Used in Scale Development Studies Published in the Journal of Counseling Psychology (1995 to 2004)

Characteristic (frequency of studies)

Sample characteristics
  Convenience sample: 5
  Purposeful sample of target group: 10
  Convenience and purposeful sampling: 6

Criteria used to assess factorability of correlation matrix
  Absolute sample size: 1
  Item intercorrelations: 1
  Participants-per-item ratio: 3
  Bartlett's test of sphericity: 5
  Kaiser-Meyer-Olkin test of sampling adequacy: 7
  Unspecified: 11

Extraction method
  Principal-components analysis: 9
  Common-factors analysis
    Principal-axis factoring: 6
    Maximum likelihood: 3
    Unspecified: 1
  Combination of principal-components analysis and common-factors analysis: 1
  Unspecified: 1

Criteria for determining rotation method
  Subscale intercorrelations: 2
  Theory: 3
  Both: 1
  Other: 3
  Unspecified: 12

Rotation method
  Orthogonal
    Varimax: 8
    Unspecified: 1
  Oblique
    Promax: 1
    Oblimin: 3
    Unspecified: 4
  Both orthogonal and oblique: 3
  Unspecified: 1

Criteria for item deletion or retention
  Loadings: 16
  Cross-loadings: 13
  Communalities: 0
  Item analysis: 1
  Other: 3
  Unspecified: 2
  No items were deleted: 2

Criteria for factor retention
  Eigenvalues: 18
  Scree plot: 17
  Minimum proportion of variance accounted for by factor: 2
  Number of items per factor: 4
  Simple structure: 5
  Conceptual interpretability: 15
  Other: 3
  Unspecified: 2

Optimizing scale length
  None attempted: 15
  Purpose
    Reduce total scale length: 2
    Limit total items per factor: 3
    Balance items per factor: 2
  Criteria
    Redundant items: 1
    Conceptually unrelated items: 1
    Statistical invariance: 1
    Cross-loadings: 1
    Dropped items with lowest loadings: 4
    Item content: 2

NOTE: Values in each category may not sum to the total number of studies because some studies may have reported more than one criterion or approach.

The more rigorous empirical approach uses statistical analyses of item responses as the basis for item selection, based on (a) predictive utility for a criterion group (e.g., depressives) or (b) homogeneous item groupings. The method described in this article is an empirical approach that employs factor analysis to form homogeneous item groupings.

A number of authors have recommended similar sequences of steps to be taken prior to using factor-analytic techniques (e.g., Anastasi, 1988; Dawis, 1987; DeVellis, 2003). We review these preliminary steps in the following section because, as is the case in most scientific endeavors, early mistakes in scale development often lead to problems later in the process. Once we have described all the steps in some detail, we address the extent to which the studies in our content analysis incorporated the steps in their designs. Although there is little variation among the models proposed by different authors, we rely primarily on DeVellis (2003) as the most current resource. Thus, the following description is only one of several similar models available and does not reflect a unitary best practice.


TABLE 2: Characteristics of Confirmatory Factor Analyses Used in Scale Development Studies Published in the Journal of Counseling Psychology (1995 to 2004)

Characteristic (frequency of studies)

SEM versus FA as a confirmatory approach
  SEM used: 14
  FA used: 2

Typical SEM approaches
  Single-model approach: 2
  Competing-models approach: 8
    Nested models compared: 4
    Nonnested or equivalent models compared: 4

Sample-size criteria (SEM only)
  Participants per parameter: 1
  Unspecified: 13

Overall model fit
  Chi-square: 12
  Chi-square and df ratio: 6

Incremental fit indexes reported
  CFI: 8
  PCFI: 1
  IFI: 2
  NFI: 4
  NNFI/TLI: 7
  RNI: 1

Absolute fit indexes reported
  GFI: 10
  AGFI: 6
  RMSEA: 6
  RMSEA with confidence intervals: 1
  RMR: 4
  SRMR: 1
  Hoelter N: 1

Predictive fit indexes reported
  AIC: 2
  CAIC: 1
  ECVI: 2
  BIC: 1

Fit index criteria
  Recommended cutoff: 11
  Unspecified: 3

Model modification
  Lagrange multiplier: 3
  Wald statistic: 0
  Item parceling: 2

NOTE: Values in each category may not sum to the total number of studies because some studies may have reported more than one criterion or approach. AGFI = Adjusted Goodness-of-Fit Index; AIC = Akaike's Information Criterion; BIC = Bayesian Information Criterion; CAIC = Consistent Akaike's Information Criterion; CFI = Comparative Fit Index; ECVI = Expected Cross-Validation Index; FA = Common-Factors Analysis; GFI = Goodness-of-Fit Index; IFI = Incremental Fit Index; NFI = Normed Fit Index; NNFI/TLI = Nonnormed Fit Index or Tucker-Lewis Index; PCFI = Parsimony Comparative Fit Index; RMR = Root Mean-Square Residual; RMSEA = Root Mean-Square Error of Approximation; RNI = Relative Noncentrality Index; SEM = Structural Equation Modeling; SRMR = Standardized Root Mean-Square Residual.

DeVellis (2003) recommends the following steps in constructing new instruments: (a) Determine clearly what you want to measure, (b) generate an item pool, (c) determine the format of the measure, (d) have experts review the initial item pool, (e) consider inclusion of validation items, (f) administer items to a development sample, (g) evaluate the items, and (h) optimize scale length.

In scale development, the first step is to define your construct clearly and concretely, using both existing theory and research to provide a sound conceptual foundation. This is sometimes more difficult than it may initially appear because it requires researchers to distinctly define the attributes of abstract constructs. Nothing is more difficult to measure than an ill-defined construct because it leads to the inclusion of items that may be only peripherally related to the construct of interest or to the exclusion of items that are important components of the content domain.

The next step is to generate a pool of items designed to tap the construct. Ultimately, the objective is to arrive at a set of items that clearly represent the construct of interest so that factor-analytic, data-reduction techniques yield a stable set of underlying factors that accurately reflect the construct. Items that are poorly worded or not central to a clearly articulated construct will introduce potential sources of error variance, reducing the strength of correlations among items, and will undermine the overall objectives of scale development (see Quintana & Minami, 2006 [this issue], on dealing with measurement error in meta-analyses). In general, researchers should write items so that they are clear, concise, readable, and distinct and reflect the scale's purpose (e.g., produce responses that can be scored in a meaningful way in relation to the construct definition). DeVellis (2003) and Anastasi (1988) offer a host of recommendations for generating quality items and choosing a response format that are beyond the scope of this article.


It suffices to say that the researcher should not take the quality of the item pool lightly, and a carefully planned approach to item generation is a critical beginning to scale development research.

Having the items reviewed by one or more groups of knowledgeable people (experts) to assess item quality on a number of different dimensions is another critical step in the process. At a minimum, expert review should involve an analysis of content validity (e.g., the extent to which the set of items reflects the content domain). Experts can also evaluate items for clarity, conciseness, grammar, reading level, face validity, and redundancy. Finally, it is also helpful at this stage for experts to offer suggestions for adding new items and about the length of administration.

Although it is possible to include additional scales for participants to complete that may provide information about convergent and discriminant validity, we recommend that researchers limit such efforts at this stage of development. We recommend this for two reasons. First, it is wise to keep the total questionnaire length as short as possible and directly related to the study's central purpose. The longer the questionnaire, the less likely potential participants will be to volunteer for the study or to complete all the items (Converse & Presser, 1986). Scale development studies sometimes include as many as 3 to 4 times the number of items that will eventually end up on the instrument, making inclusion of additional scales prohibitive. Second, there are several ways that items from other measures may interact with items designed for the new instrument to affect participant responses and, thus, to interfere in the scale development process. In particular, it would be very difficult, if not impossible, to control for order effects of different measures while testing the initial factor structure for the new scale. Administering existing measures in random order along with the new items might contaminate participants' responses to the new scale items, but administering the new items first to avoid contamination eliminates an important procedure commonly used when researchers use multiple self-report scales concurrently within a single study. Thus, we believe that it is important to avoid influencing item responses during the initial phase of scale development by limiting the use of additional measures. Although ultimately a matter of researcher judgment, assessing convergent and discriminant validity (e.g., correlation with other measures) is an important step that we believe should occur later in the process of scale development.

Of the 23 studies in our content analysis, 14 reported a construct or scale definition that guided item generation, and all but 2 studies indicated that item generation was based on prior theoretical and empirical literature in the field. Occasionally, however, we found that articles provided only sparse details in the introductory material articulating the theoretical foundations for the research. The studies in our review used various item-generation approaches.


All the approaches involved some form of rational item generation, with the primary variations involving the combination of rational and empirical approaches. Although the extensiveness and specific approaches of the procedures varied widely, only a few studies (n = 2) did not include (or failed to report) expert review of item sets prior to conducting EFA or CFA. Finally, our content analysis showed three typical patterns with respect to the inclusion of validity items during administration to the initial development sample: (a) administering only the scale items (no validity items being included), (b) assessing only social desirability along with the scale items, or (c) administering numerous other scales along with the scale items to provide additional evidence of convergent and discriminant validity.

THE ORDERING OF EFA AND CFA IN NEW SCALE DEVELOPMENT RESEARCH

Researchers typically use CFA after an instrument has already been assessed using EFA, when they want to know whether the factor structure produced by EFA fits the data from a new sample. An alternative, less typical approach is to perform CFA to confirm a theoretically driven item set without the prior use of EFA. However, Byrne (2001) stated that "the application of CFA procedures to assessment instruments that are still in the initial stages of development represents a serious misuse of this analytic strategy" (p. 99). Furthermore, reporting the findings of a single CFA is of little advantage over conducting a single EFA. Specifically, research has shown that exploratory methods (i.e., principal-axis and maximum-likelihood factor analysis) are able to recover the correct factor model satisfactorily a majority of the time (Gerbing & Hamilton, 1996). In addition, a key validity issue is the replication of the hypothesized factor structure using a new sample. Thus, rather than conduct a CFA that would ultimately need to be followed by a second CFA, the most logical approach when developing new scales is to conduct an EFA first, followed by a CFA. Regardless of how effectively the researcher believes item generation has reproduced the theorized latent variables, we believe that the initial validation of an instrument should involve empirically appraising the underlying factor structure (i.e., EFA).

Of the 23 new scale development articles we reviewed, a significant majority conducted EFA followed by CFA (n = 10) or only EFA without CFA (n = 8). One article reported using SEM following EFA, but the procedure was inconsistent with CFA. Two smaller subsets of articles reported only CFA (n = 2) or conducted CFA followed by EFA (n = 2).


In the two studies in which EFA followed CFA, researchers had produced theoretically derived instruments that they believed required only a confirmation of the hypothesized factor structure (which proved wrong in both cases). As a result, when the hypothesized factor structure did not fit the data using SEM, the researchers reverted to EFA (using the same sample) as a means of uncovering the underlying factor structure—a somewhat questionable procedure that could have been avoided if they had relied on EFA in the first place. The studies that successfully used only CFA included one that reported only a single CFA and another that reported two consecutive CFAs (in which the second replicated the findings of the first).

EFA

Development sample characteristics. Representativeness in scale development research does not follow conventional wisdom—that is, it is not necessary to closely represent any clearly identified population as long as those who would score high and those who would score low are well represented (Gorsuch, 1997). Furthermore, one reason many scholars have consistently advocated for large samples in scale development research (see further on) is that scale variance attributable to specific participants tends to be cancelled out by random effects as sample size increases (Tabachnick & Fidell, 2001). Nevertheless, samples that do not adequately represent the population of interest affect factor-structure stability and generalizability. When all participants are drawn from a particular source sharing certain characteristics (e.g., age, education, socioeconomic status, and racial and ethnic group), even large samples will not sufficiently control for the systematic variance produced by these characteristics. Thus, it is advisable to ensure the appropriateness of the development sample to the degree possible before conducting an EFA.

An important caveat with respect to sample characteristics is that in counseling psychology research, there are many potential populations whose members may be difficult to identify or from whom it is particularly difficult to solicit participation (e.g., lesbian, gay, bisexual, and transgender individuals and persons with disabilities). Under circumstances in which a researcher believes that the sample characteristics might be at variance with unknown population characteristics, she or he may be forced to adjust to these unknowns and simply move forward with a sample that is adequate but not ideal (Worthington & Navarro, 2003).

In the studies we reviewed for the content analysis, some form of purposeful sampling from a specific target population was the most common approach, followed by a combination of convenience and purposeful sampling. Only about 25% of the studies used convenience sampling, most often with undergraduate student participants. Three of the studies we reviewed used split samples (i.e., a large sample split into two groups for separate analyses).


Sample size. Sample size is an issue that has received considerable discussion in the literature. There are two central risks of using too few participants: (a) Patterns of covariation may not be stable, because chance can substantially influence correlations among items when the ratio of participants to items is relatively low; and (b) the development sample may not adequately represent the intended population (DeVellis, 2003). Comrey (1973) has been cited often as classifying a variety of sample sizes from very poor (N = 50) to excellent (N = 1,000) based solely on the number of participants in a sample and as recommending at least 300 cases for factor analysis. Gorsuch (1983) has also proposed guidelines for minimum ratios of participants to items (5:1 or 10:1), which have been widely cited in counseling psychology research. However, other authors have pointed out that these general guidelines may be misleading (MacCallum, Widaman, Zhang, & Hong, 1999; Tabachnick & Fidell, 2001; Velicer & Fava, 1998). In general, there is some agreement that larger sample sizes are likely to result in more stable correlations among variables and in greater replicability of EFA outcomes. Velicer and Fava (1998) produced evidence indicating that any ratio less than a minimum of three participants per item is inadequate, and there is additional evidence that factor saturation (the number of items per factor) and item communalities are the most important determinants of adequate sample size (Guadagnoli & Velicer, 1988; MacCallum et al., 1999). Thus, we offer four overarching guidelines: (a) Sample sizes of at least 300 are generally sufficient in most cases, (b) sample sizes of 150 to 200 are likely to be adequate with data sets containing communalities higher than .50 or with 10:1 items per factor with factor loadings at approximately |.4|, (c) smaller sample sizes may be adequate if all communalities are .60 or greater or with at least 4:1 items per factor and factor loadings greater than |.6|, and (d) sample sizes less than 100 or with fewer than 3:1 participant-to-item ratios are generally inadequate (Reise, Waller, & Comrey, 2000; Thompson, 2004). Note that this requires researchers to set a minimum sample size at the outset and to evaluate the need for additional data collection based on the outcomes of an initial EFA.

In our content analysis, the absolute magnitude of sample sizes and participant-per-item ratios were virtually the only references made with respect to sample size, and both varied widely. Absolute sample sizes varied from 84 to 411 (M = 258.95, SD = 100.80). Participant-per-item ratios varied from 2:1 to 35:1 (the modal ratio was 3:1). The authors addressed no other sample-size criteria when discussing the adequacy of their sample sizes.
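The four sample-size guidelines above lend themselves to a simple screening rule. The sketch below is a minimal, hypothetical Python helper (not part of the original article); the function name and thresholds merely restate the heuristics just described and are not a substitute for judgment about communalities and factor saturation.

```python
def efa_sample_size_adequate(n, n_items, n_factors, min_communality, min_loading):
    """Rough screen of EFA sample-size adequacy.

    Encodes the four overarching guidelines discussed above:
    (a) N >= 300 is generally sufficient;
    (b) N of 150-200 may suffice with communalities > .50 or ~10 items per
        factor loading around |.4|;
    (c) smaller N may suffice if all communalities are >= .60 or there are
        at least 4 items per factor with loadings > |.6|;
    (d) N < 100 or fewer than 3 participants per item is generally inadequate.
    """
    items_per_factor = n_items / n_factors
    participants_per_item = n / n_items

    if n < 100 or participants_per_item < 3:                      # guideline (d)
        return False, "inadequate: N < 100 or fewer than 3:1 participants per item"
    if n >= 300:                                                  # guideline (a)
        return True, "adequate: N >= 300"
    if n >= 150 and (min_communality > .50 or
                     (items_per_factor >= 10 and min_loading >= .4)):  # guideline (b)
        return True, "likely adequate: N >= 150 with strong communalities or saturation"
    if min_communality >= .60 or (items_per_factor >= 4 and min_loading > .6):  # (c)
        return True, "possibly adequate: very strong communalities or loadings"
    return False, "judgment call: collect more data or re-examine the item pool"


# Example: 210 respondents, 30 items expected to form 3 factors
print(efa_sample_size_adequate(n=210, n_items=30, n_factors=3,
                               min_communality=.55, min_loading=.45))
```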


Factorability of the correlation matrix. Although many people are familiar with the previously described standards regarding sample size, the factorability of a data set also has been related to the sizes of the correlations in the matrix. Researchers can use Bartlett's (1950) test of sphericity to estimate the probability that the correlations in a matrix are 0. However, it is highly susceptible to the influence of sample size and is likely to be significant for large samples with relatively small correlations (Tabachnick & Fidell, 2001). Thus, we recommend using this test only if there are fewer than about 5 cases per variable, although the point becomes moot for samples containing fewer than three cases per variable, which are inadequate in any case (see earlier). In studies with cases-per-item ratios higher than 5:1, we recommend that researchers provide additional evidence for scale factorability. The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy is also useful for evaluating factorability. This measure is based on the relationship of the partial correlations to the sum of squared correlations and thus indicates the extent to which a correlation matrix actually contains factors rather than simply chance correlations among a small subset of variables. Tabachnick and Fidell (2001) suggested that values of .60 and higher are required for good factor analysis.

In our content analysis of scale development articles in JCP, the largest number of studies (n = 11) did not report using any criteria to assess the factorability of the correlation matrix. Although some studies (n = 5) reported using Bartlett's test of sphericity, only one of those studies contained a cases-to-items ratio small enough to provide useful information on the basis of Bartlett's test. Although other studies had cases-to-items ratios less than 5:1, they did not report using Bartlett's test to assess scale factorability. Only 7 of the articles reported the value of the KMO statistic as a precursor to completing factor analysis, and a few articles (n = 3) used the participants-per-item ratio as the sole criterion.
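As an illustration of these two factorability checks, the following sketch computes Bartlett's test of sphericity and the overall KMO statistic directly from the data using standard formulas. It is a minimal illustration (not from the article), assuming a complete-case respondents-by-items NumPy array; in practice most statistical packages report both values.

```python
import numpy as np
from scipy.stats import chi2


def bartlett_sphericity(data):
    """Bartlett's (1950) test that the correlation matrix is an identity matrix."""
    n, p = data.shape
    r = np.corrcoef(data, rowvar=False)
    # Test statistic based on the log-determinant of the correlation matrix.
    statistic = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(r))
    df = p * (p - 1) / 2
    return statistic, df, chi2.sf(statistic, df)


def kmo(data):
    """Kaiser-Meyer-Olkin measure of sampling adequacy (overall value)."""
    r = np.corrcoef(data, rowvar=False)
    inv_r = np.linalg.inv(r)
    # Anti-image (partial) correlations derived from the inverse correlation matrix.
    d = np.sqrt(np.outer(np.diag(inv_r), np.diag(inv_r)))
    partial = -inv_r / d
    mask = ~np.eye(r.shape[0], dtype=bool)      # off-diagonal elements only
    r2 = np.sum(r[mask] ** 2)
    p2 = np.sum(partial[mask] ** 2)
    return r2 / (r2 + p2)


# Example with simulated item responses (500 cases x 12 items)
rng = np.random.default_rng(0)
items = rng.normal(size=(500, 12))
print(bartlett_sphericity(items))
print(kmo(items))   # values of .60 or higher suggest acceptable factorability
```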


Extraction methods. There are a variety of factor-extraction methods based on a number of statistical theories, but the two most commonly known and studied are principal-components analysis (PCA) and common-factors analysis (FA). There has been a protracted debate over the preferred use of PCA versus FA (e.g., principal-axis factoring, maximum-likelihood factoring) as exploratory procedures, which has yet to be resolved (Gorsuch, 2003). We do not intend to examine this debate in detail (see Multivariate Behavioral Research, 1990, Volume 25, Issue 1, for an extensive discussion of the pros and cons of both). However, it is important for researchers to understand the distinct purposes of each technique. The purpose of PCA is to reduce the number of items while retaining as much of the original item variance as possible. The purpose of FA is to understand the latent factors or constructs that account for the shared variance among items. Thus, the purpose of FA is more closely aligned with the development of new scales. In addition, although it has been shown that PCA and FA often produce similar results (Velicer & Jackson, 1990; Velicer, Peacock, & Jackson, 1982), there are several conditions under which FA has been shown to be superior to PCA (Gorsuch, 1990; Tucker, Koopman, & Linn, 1969; Widaman, 1993). Finally, compared with PCA, the outcomes of FA should generalize more effectively to CFA (Floyd & Widaman, 1995). Thus, although there may be other appropriate uses for PCA, we recommend FA for the development of new scales.

An example of the use of FA versus PCA in a simulated data set might illustrate the differences between these two approaches. Imagine that a researcher at a public university is interested in measuring campus climate for diversity. The researcher created 12 items to measure three different aspects of campus climate (each using 4 items): (a) general comfort or safety, (b) openness to diversity, and (c) perceptions of the learning environment. In a sample of 500 respondents, correlations among the 12 variables indicated that one item from each subset did not correlate with any other items on the scale (e.g., no higher than r = .12 for any bivariate pair containing these items). In FA, the three uncorrelated items appropriately drop out of the solution because of low factor loadings (loadings < .23), resulting in a three-factor solution (each factor retaining 3 items). In PCA, the three uncorrelated items load together on a fourth factor (loadings > .45). This example demonstrates that under certain conditions, PCA may overestimate factor loadings and result in erroneous decisions about the number of factors or items to retain.

We should also make clear that there are several techniques of FA, including principal-axis factoring, maximum likelihood, image factoring, alpha factoring, and unweighted and generalized least squares. Gerbing and Hamilton (1996) have shown that principal-axis factoring and maximum-likelihood approaches are relatively equal in their capacities to extract the correct model when the model is known in the population. However, Gorsuch (1997) points out that maximum-likelihood extractions result in occasional problems that do not occur with principal-axis factoring. Prior to the current use of SEM as a CFA technique, maximum-likelihood extraction had some advantages over other FA procedures as a confirmatory technique (Tabachnick & Fidell, 2001). For further discussion of less commonly used approaches, see Tabachnick and Fidell (2001).

Among the studies in our content analysis, most used some form of FA (n = 10), but a similar number used PCA (n = 9). One study used a combination of PCA and FA, and another did not report an extraction method. (Note: 2 of the 23 studies used only CFA and are not included in the figures reported earlier.) A cursory examination of the publication dates indicates that the majority of studies using PCA were published prior to the majority of those using FA, suggesting a trend away from PCA in favor of FA.
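A simulation along the lines of the hypothetical campus-climate example above can be sketched in a few lines of Python. This is an illustrative reconstruction rather than the authors' simulation: it generates three correlated 4-item clusters in which one item per cluster is pure noise and then compares loadings from maximum-likelihood factor analysis and from PCA using scikit-learn. Exact loadings will differ from the values quoted above and will vary with the random seed.

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(42)
n = 500

# Three latent climate factors; items 0-2, 4-6, and 8-10 load on them,
# while items 3, 7, and 11 (one per intended subscale) are pure noise.
factors = rng.normal(size=(n, 3))
data = rng.normal(scale=0.6, size=(n, 12))
for f, cluster in enumerate([(0, 1, 2), (4, 5, 6), (8, 9, 10)]):
    for i in cluster:
        data[:, i] += factors[:, f]

noise_items = [3, 7, 11]

# Common-factor analysis (maximum likelihood) with four factors requested.
fa = FactorAnalysis(n_components=4, random_state=0).fit(data)
fa_loadings = fa.components_.T                       # items x factors

# Principal-components analysis with four components.
pca = PCA(n_components=4).fit(data)
pca_loadings = pca.components_.T * np.sqrt(pca.explained_variance_)  # loading metric

for name, loadings in [("FA", fa_loadings), ("PCA", pca_loadings)]:
    max_noise_loading = np.abs(loadings[noise_items]).max()
    print(f"{name}: largest |loading| among the noise items = {max_noise_loading:.2f}")
```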


Criteria for determining rotation method. FA rotation methods include two basic types: orthogonal and oblique. Researchers use orthogonal rotations when the set of factors underlying a given item set is assumed or known to be uncorrelated. Researchers use oblique rotations when the factors are assumed or known to be correlated. A discussion of the statistical properties of the various types of orthogonal and oblique rotation methods is beyond the scope of this article (we refer readers to Gorsuch [1983] and Thompson [2004] for such discussions). In practice, researchers can determine whether to use an orthogonal versus an oblique rotation during the initial FA on the basis of either theory or data. However, if they discover that the factors appear to be correlated in the data when theory has suggested them to be uncorrelated, it is still most appropriate to rely on the data-based approach and to use an oblique rotation. Although, in some cases, both procedures might produce the same factor structure with the same data, using an orthogonal rotation with correlated factors tends to overestimate loadings (e.g., they will have higher values than with an oblique rotation; Loehlin, 1998). Thus, researchers may retain or reject some items inappropriately, and the factor structure may be more difficult to replicate during CFA.

Our content analysis showed that relatively few of the studies in our review reported an adequate rationale for selecting an orthogonal or oblique rotation method, with only 2 using subscale intercorrelations, 3 using theory, and 1 using both. Twelve studies did not specify the criteria used to select a rotation method, and 3 studies reported criteria irrelevant to the task (e.g., although the factors were correlated, the orthogonal solution matched the prior expectations for the factor solution). Also, 8 studies used orthogonal rotations despite reporting moderate to high correlations among factors, and 4 studies did not provide factor intercorrelations.
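One data-based way to choose between rotation families, consistent with the advice above, is to fit an oblique solution first and inspect how strongly the factors correlate. The sketch below assumes the third-party Python package factor_analyzer (not cited in the article) and approximates the factor intercorrelations by correlating estimated factor scores; if those correlations are near zero, an orthogonal rotation such as varimax is defensible, and otherwise the oblique (e.g., promax) solution should be reported.

```python
import numpy as np
from factor_analyzer import FactorAnalyzer


def factor_correlations_after_promax(data, n_factors):
    """Fit an oblique (promax) solution and return correlations among the
    estimated factor scores as a rough guide to how much the factors overlap."""
    fa = FactorAnalyzer(n_factors=n_factors, rotation="promax")
    fa.fit(data)
    scores = fa.transform(data)            # factor score estimates
    return np.corrcoef(scores, rowvar=False)


# Example usage with `responses` as a respondents-by-items array:
# phi = factor_correlations_after_promax(responses, n_factors=3)
# If all off-diagonal values are small (e.g., < .20), an orthogonal rotation
# may be acceptable; otherwise retain and report the oblique solution.
```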


Criteria for factor retention. Researchers can use numerous criteria to estimate the number of factors for a given item set. The most widely known approaches were recommended by Kaiser (1958) and Cattell (1966) on the basis of eigenvalues, which may help determine the importance of a factor and indicate the amount of variance in the entire set of items accounted for by a given factor (for a more detailed explanation of eigenvalues, see Gorsuch, 1983). The iterative process of factor analysis produces successively less useful information with each new factor extracted in a set because each factor extracted after the first is based on the residual of the previous factor's extraction. The eigenvalues produced will be successively smaller with each new factor extracted (accounting for smaller and smaller proportions of variance) until virtually meaningless values result. Thus, Kaiser (1958) believed that eigenvalues less than 1.0 reflect potentially unstable factors. Cattell (1966) used the relative values of eigenvalues to estimate the correct number of factors to examine during factor analysis—a procedure known as the scree test. Using the scree plot, a researcher examines the descending values of the eigenvalues to locate a break in their size, after which the remaining values tend to level off horizontally.

Parallel analysis (Horn, 1965) is another procedure for deciding how many factors to retain. Generally, when using parallel analysis, researchers randomly reorder the participants' item scores and conduct a factor analysis on both the original data set and the randomly reordered scores. Researchers determine the number of factors to retain by comparing the eigenvalues obtained from the original data set and from the randomly reordered data set. They retain a factor if the original eigenvalue is larger than the eigenvalue from the random data. This approach has been shown to work reasonably well when using FA (Humphreys & Montanelli, 1975) as well as PCA (Zwick & Velicer, 1986). Parallel analysis is not readily available in commonly used statistical software, but programs are available that conduct parallel analysis when using principal-axis factor analysis and PCA (see O'Connor, 2000).

Approximating simple structure is another way to evaluate factor retention during EFA. According to McDonald (1985), the term simple structure has two radically different meanings that are often confused. A factor pattern has simple structure (a) if several items load strongly on only one factor and (b) if items have a zero correlation with other factors in the solution. SEM constrains the relationships between items and factors to produce simple structure as defined earlier (which will become important later). McDonald (1985) differentiates this from what he prefers to call approximate simple structure, often reported in counseling psychology research as if it were simple structure, which substitutes the word small (undefined) for the word zero (definitive) in the primary definition. Researchers can estimate approximate simple structure by using rotation methods during FA. In EFA, efforts to produce factor solutions with approximate simple structure are central to decisions about the final number of factors and about the retention and deletion of items in a given solution. If factors share items that cross-load too highly on more than one factor (e.g., > .32), the items are considered complex because they reflect the influence of more than one factor. Approximating simple structure can be achieved through item or factor deletion or both. SEM approaches to CFA assume simple structure, and very closely approximating simple structure during EFA will likely improve the subsequent results of CFA using SEM.
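Because parallel analysis, described above, is absent from some standard packages, a minimal NumPy sketch is given below. It is an illustration rather than O'Connor's (2000) program: it compares the eigenvalues of the observed correlation matrix with the average eigenvalues obtained after independently permuting each item's scores, retaining factors whose observed eigenvalues exceed the corresponding random-data values.

```python
import numpy as np


def parallel_analysis(data, n_reps=100, seed=0):
    """Horn's (1965) parallel analysis on a respondents-by-items array."""
    rng = np.random.default_rng(seed)
    n, p = data.shape

    # Eigenvalues of the observed correlation matrix, largest first.
    observed = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]

    random_eigs = np.zeros((n_reps, p))
    for rep in range(n_reps):
        # Permute each item's scores independently to break the correlations
        # while preserving each item's marginal distribution.
        permuted = np.column_stack([rng.permutation(data[:, j]) for j in range(p)])
        random_eigs[rep] = np.linalg.eigvalsh(np.corrcoef(permuted, rowvar=False))[::-1]

    mean_random = random_eigs.mean(axis=0)

    # Retain factors as long as the observed eigenvalue exceeds the random one.
    n_retain = 0
    for obs, rand in zip(observed, mean_random):
        if obs > rand:
            n_retain += 1
        else:
            break
    return n_retain, observed, mean_random


# Example usage (with `responses` as a NumPy array of item scores):
# k, obs, rand = parallel_analysis(responses)
# print(f"Parallel analysis suggests retaining {k} factors")
```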


The larger the number of items on a factor, the more confidence one has that it will be a reliable factor in future studies. Thus, with a few minor caveats, some authors have recommended against retaining factors with fewer than three items (Tabachnick & Fidell, 2001). It is possible to retain a factor with only two items if the items are highly correlated (i.e., r > .70) and relatively uncorrelated with other variables. Under these conditions, it may be appropriate to consider other criteria (e.g., interpretability) in deciding whether to retain the factor or to discard it. Nevertheless, it may be best to revisit item-generation procedures to produce additional items intended to load on the factor (which would require a new EFA before moving on to the CFA).

Conceptual interpretability is the definitive factor-retention criterion. In the end, researchers should retain a factor only if they can interpret it in a meaningful way, no matter how solid the evidence for its retention based on the empirical criteria described earlier. EFA is ultimately a combination of empirical and subjective approaches to data analysis because the job is not complete until the solution makes sense. (Note that this is not necessarily true for the criterion-group method of scale development.) At this stage, the researcher should conduct an analysis of the items within each factor to assess the extent to which the items make sense as a group. Although uncommon, it may be useful to submit the item-factor combinations to a small group of experts for external interpretation to avoid a situation in which a factor makes sense to the researcher eager for a viable scale but not to anybody else.

In our content analysis of JCP articles, it appeared that numerous researchers encountered problems reconciling their EFA findings with their conceptual interpretation of the factor solution and occasionally engaged in rationalizations that led to questionable practices. For example, researchers in one study selected a factor solution that fit their preconceived conceptualization of the scale although some of the factors were very highly intercorrelated (e.g., the data indicated fewer factors than the authors adopted). When a researcher desires a specific factor structure that is not adequately reproduced during EFA, the recommended practice would be (a) to adopt the factor solution supported by the data and engage in meaningful interpretation based on those findings or (b) to return to item generation and go back through the earlier steps in the scale development process (including EFA). There were a few articles in our content analysis that inappropriately moved forward with CFA after making revisions that were not assessed by EFA.

Criteria for item deletion or retention. Although, on rare occasions, a researcher may retain all the initial items submitted to EFA, item deletion is a very common and expected part of the process. Researchers most often use the values of the item loadings and cross-loadings on the factors to determine whether items should be deleted or retained. Inevitably, this process is intertwined with the process of determining the number of factors that will be retained (described earlier). For example, in some instances, a researcher might be evaluating the relative value of several different factor solutions (e.g., 2, 3, or 4 factors). As such, deleting items before establishing the final number of factors could actually reduce the number of factors retained.


On the other hand, unnecessarily retaining items that fail to contribute meaningfully to any of the potential factor solutions will make it more difficult to make a final decision about the number of factors to retain. Thus, the process we recommend is designed to retain potentially meaningful items early in the process and to optimize scale length only after the factor solution is clear. Most researchers begin EFA with a substantially larger number of items than they ultimately plan to retain. However, there is considerable variation among studies in the proportion of items in the initial pool that are planned for deletion. We recommend that researchers wait until the last step in EFA to trim unnecessary items and focus primarily on empirical scale development procedures at this stage in the process so as not to confuse the purposes of these two similar activities (e.g., item deletion). Thus, researchers should base decisions about whether to retain or delete items at this stage on their contribution to the factor solution rather than on the final length of the scale.

Most researchers use some guideline for a lower limit on item factor loadings and cross-loadings to determine whether to retain or delete items, but the criteria for determining the magnitude of loadings and cross-loadings have been described as a matter of researcher preference (Tabachnick & Fidell, 2001). Larger, more frequent cross-loadings will contribute to factor intercorrelations (requiring oblique rotation) and poorer approximations of simple structure (described earlier). Thus, to the degree possible, researchers should attempt to set their minimum values for factor loadings as high as possible and the maximum values for cross-loadings as low as possible (without compromising scale length or factor structure), which will result in fewer cross-loadings of lower magnitudes and better approximations of simple structure. For example, researchers should delete items with factor loadings less than .32 or with a cross-loading within .15 of the item's highest factor loading. In addition, they should also delete items that have absolute loadings higher than a certain value (e.g., .32) on two or more factors. However, we urge researchers to use caution when using cross-loadings as a criterion for item deletion until the final factor solution has been established, because an item with a relatively high cross-loading could be retained if the factor on which it cross-loads is deleted or collapsed into another existing factor.

Item communalities after rotation can be a useful guide for item deletion as well. Recall that high item communalities are important for determining the factorability of a data set, but they can also be useful in evaluating specific items for deletion or retention because a communality reflects the proportion of item variance accounted for by the factors; it is the squared multiple correlation of the item as predicted from the set of factors in the solution (Tabachnick & Fidell, 2001). Thus, items with low communalities (e.g., less than .40) are not highly correlated with one or more of the factors in the solution.
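The loading, cross-loading, and communality rules of thumb just described can be applied mechanically as a first pass before the conceptual review. The following sketch is a hypothetical helper (not from the article) that flags candidate items from a rotated pattern matrix; the .32, .15, and .40 defaults restate the heuristics above and can be tightened as the text recommends.

```python
import numpy as np


def flag_items_for_review(loadings, communalities,
                          min_loading=.32, min_gap=.15, min_communality=.40):
    """Flag items whose rotated loadings or communalities look problematic.

    loadings:       items-by-factors array of rotated pattern coefficients
    communalities:  array of item communalities after rotation
    """
    loadings = np.abs(np.asarray(loadings))
    flags = {}
    for i, row in enumerate(loadings):
        sorted_row = np.sort(row)[::-1]        # largest loading first
        reasons = []
        if sorted_row[0] < min_loading:
            reasons.append("no loading of at least .32")
        if len(sorted_row) > 1 and (sorted_row[0] - sorted_row[1]) < min_gap:
            reasons.append("cross-loading within .15 of the primary loading")
        if communalities[i] < min_communality:
            reasons.append("communality below .40")
        if reasons:
            flags[i] = reasons
    return flags


# Example: 4 items, 2 factors
pattern = [[.68, .10], [.30, .25], [.45, .41], [.75, .05]]
h2 = [.48, .15, .39, .57]
print(flag_items_for_review(pattern, h2))
# Items 1 and 2 would be flagged for review rather than deleted automatically.
```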


In our content analysis, the most common criteria for item-deletion decisions were absolute values of item loadings and cross-loadings, which were often used in combination. None of the studies we reviewed reported using item communalities as a criterion for deletion, and one study used item-analysis procedures (e.g., contribution to internal consistency reliability). No items were deleted in two studies, and two other studies did not specify the criteria for item deletion.

Optimizing scale length. Once the items have been evaluated, it is useful to assess the trade-off between length and reliability to optimize scale length. Longer scales of relatively highly correlated items are generally more reliable, but Converse and Presser (1986) recommended that questionnaires take no longer than 50 minutes to complete. In our experience, scales that take longer than about 15 to 30 minutes might become problematic, depending on the respondents, the intended use of the scale, and the respondents' motivation regarding the purpose of the administration. Thus, scale developers may find it useful to examine the length of each subscale to determine whether it is a reasonable trade-off to sacrifice a small degree of internal consistency to shorten its length. Some statistical packages (e.g., SPSS) allow researchers to compare all the items on a given subscale to identify those that contribute the least to internal consistency, making item deletion with the goal of optimizing scale length relatively easy. Generally, when a factor contains more than the desired number of items, the researcher will have the option of deleting items that (a) have the lowest factor loadings, (b) have the highest cross-loadings, (c) contribute the least to the internal consistency of the scale scores, or (d) have low conceptual consistency with other items on the factor. The researcher should avoid scale-length optimization that degrades the quality of the factor structure, factor intercorrelations, item communalities, factor loadings, or cross-loadings. Ultimately, researchers must conduct a final EFA to ensure that the factor solution does not change after deleting items.
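The "contributes the least to internal consistency" criterion above is typically operationalized as coefficient alpha recomputed with each item removed (the alpha-if-item-deleted statistic reported by packages such as SPSS). The sketch below is a minimal NumPy version of that computation, offered as an illustration rather than as the authors' procedure.

```python
import numpy as np


def cronbach_alpha(items):
    """Coefficient alpha for a respondents-by-items array of scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)


def alpha_if_item_deleted(items):
    """Alpha recomputed with each item removed, to spot expendable items."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    return {j: cronbach_alpha(np.delete(items, j, axis=1)) for j in range(k)}


# Example usage (with `subscale` as a respondents-by-items array):
# print(cronbach_alpha(subscale))
# print(alpha_if_item_deleted(subscale))
# Items whose removal raises alpha contribute least to internal consistency.
```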

CFA

SEM versus FA. SEM has become a widely used tool for examining theoretical models within the social and behavioral sciences (see Martens, 2005; Martens & Haase, 2006; Quintana & Maxwell, 1999; Weston & Gore, 2006). CFA is one of the most popular uses of SEM and is most commonly applied during the scale development process to help support the validity of a scale following an EFA. In the past, a number of published studies used FA or PCA procedures as confirmatory approaches (Gerbing & Hamilton, 1996). With the increasing availability of computer software, however, most researchers use SEM as the preferred approach for CFA.


In our content analysis, 14 of the studies used SEM as the confirmatory approach. In comparison, 2 studies used PCA as a confirmatory approach (these appeared before SEM was widely applied in counseling psychology research).

Typical SEM approaches. Once a researcher obtains a theoretically meaningful factor structure via EFA, the logical next step is to specify the resulting factor solution in the SEM confirmatory procedure. That is, if the researcher obtains a three-factor oblique factor structure in the EFA, specifying the same correlated three-factor model using SEM and finding good fit of the model to the data in a new sample will help support the reliability of the factor structure and the validity of the scale. Another approach is to compare competing theoretically plausible models (e.g., different numbers of factors, inclusion or exclusion of specific paths). Thus, the researcher can compare the factor structure uncovered in the EFA with alternative models to evaluate which model best fits the data. The hypothesized model's fitting the data better than alternative models is further evidence of construct validity. If an alternative model fits the data better than the hypothesized model, the investigator is obligated to explain how discrepancies between models affect construct validity and then to conduct another study to further validate the newly adopted model (or start over).

Testing nested or hierarchically related models is another typical SEM approach. A model is nested if it is a subset of another model to which it is compared. For example, suppose a researcher conducted a study on an eight-item course-evaluation survey in which four items assess satisfaction with the readings and homework assigned in the course and the remaining four items assess satisfaction with the professor's sensitivity to diversity, resulting in a two-factor correlated model. However, one could assume that the eight items on the survey assess overall satisfaction with the course, resulting in a one-factor model. If this one-factor model were compared with the correlated two-factor model, the one-factor (restricted) model would be nested within the two-factor (unrestricted) model because the correlation between the two factors in the two-factor model would be set to a value of 1.0 to form the one-factor model. When comparing nested models, researchers use a chi-square difference test to examine whether a significant loss in fit occurs when going from the unrestricted model to the nested (restricted) model (for the statistical formula, see Kline, 2005).
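
Because the chi-square difference test recurs throughout these comparisons, a minimal sketch is given below. It simply takes the chi-square statistics and degrees of freedom obtained for the restricted (nested) and unrestricted models; the numbers in the usage comment are made up purely for illustration.

```python
# Minimal sketch of the chi-square difference test for nested models.
from scipy.stats import chi2

def chi_square_difference(chi2_restricted, df_restricted,
                          chi2_unrestricted, df_unrestricted, alpha=0.05):
    diff = chi2_restricted - chi2_unrestricted
    df_diff = df_restricted - df_unrestricted
    p_value = chi2.sf(diff, df_diff)
    # A significant difference indicates a significant loss of fit when moving
    # from the unrestricted model to the restricted (nested) model.
    return diff, df_diff, p_value, p_value < alpha

# Hypothetical example: one-factor model (chi-square = 98.4, df = 20) nested in a
# correlated two-factor model (chi-square = 61.7, df = 19).
# chi_square_difference(98.4, 20, 61.7, 19)
```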

When structural equation models are not nested (i.e., one model is not a subset of another model), the chi-square difference test is an inappropriate method to assess model fit differences because neither of the two models can serve as a baseline comparison model. Still, there are instances when researchers compare nonhierarchically related models in terms of model fit, such as when testing different theoretical models posited to support the data. In this case, researchers may use fit indices to select among competing models. It is becoming increasingly common to compare nonnested models using predictive fit indices (discussed further on), which indicate how well a model will cross-validate in future samples. Some competing models may be equivalent models, that is, models whose parameter configurations appear different but that are mathematically equivalent and therefore yield the same chi-square test statistics and goodness-of-fit indices (MacCallum, Wegener, Uchino, & Fabrigar, 1993). Thus, theory should play the strongest role in selecting the appropriate model when comparing equivalent models.

Another SEM approach that may support the construct validity of a scale is multiple-group analysis. In multiple-group analysis, the same structural equation model may be applied to the data for two or more distinct groups (e.g., male and female) to simultaneously test for invariance (model equivalency) across the groups by constraining different sets of model parameters to be equal in both groups (for more on conducting multiple-group analysis, see Bentler, 1995; Bollen, 1989; Byrne, 2001).

Of the 10 studies in the content analysis using a confirmatory SEM approach, 2 used the single-model approach wherein the model produced by the EFA was specified in a CFA, and 8 performed model comparisons. Of these 8 studies, 4 evaluated nested models, but only 3 of the 4 used the chi-square difference test when selecting among the nested models. All 4 of the studies comparing nonnested models used fit indices to select among them, and 2 of these 4 also used predictive fit indices when selecting among the set of competing models. Researchers compared equivalent and nonequivalent models in 2 of the studies in the content analysis. One of these studies selected a nonequivalent model over 2 equivalent models based on higher values of the fit indices. In the second study, the authors relied on theory when selecting between 2 equivalent models.

Sample-size considerations. The statistical theory underlying SEM is asymptotic, which assumes that large sample sizes are necessary to provide stable parameter estimates (Bentler, 1995). Thus, some researchers have suggested that SEM analyses should not be performed on sample sizes smaller than 200, whereas others recommend minimum sample sizes between 100 and 200 participants (Kline, 2005). Another recommendation is that there should be between 5 and 10 participants per observed variable (Grimm & Yarnold, 1995); yet another guideline is that there should be between 5 and 10 participants per parameter to be estimated (Bentler & Chou, 1987). The findings are mixed in terms of which criterion is best because the answer depends on various model characteristics, including the number of indicator variables per factor (Marsh, Hau, Balla, & Grayson, 1998), estimation method (Fan, Thompson, & Wang, 1999), nonnormality of the data (West, Finch, & Curran, 1995), and the strength of the relationships among indicator variables and latent factors (Velicer & Fava, 1998).

TABLE 3: Incremental, Absolute, and Predictive Fit Indices Used in Structural Equation Modeling

Fit Index                                                           Citation

Incremental fit indices
  Normed Fit Index (NFI)                                            Bentler & Bonett (1980)
  Incremental Fit Index (IFI)                                       Bollen (1989)
  Nonnormed Fit Index (NNFI) or Tucker-Lewis Index (TLI)            Tucker & Lewis (1973)
  Comparative Fit Index (CFI)                                       Bentler (1990)
  Parsimony Comparative Fit Index (PCFI)                            Mulaik et al. (1989)
  Relative Noncentrality Index (RNI)                                McDonald & Marsh (1990)
Absolute fit indices
  Chi-square/df ratio                                               Marsh, Balla, & McDonald (1988)
  Goodness-of-Fit Index (GFI)                                       Jöreskog & Sörbom (1984)
  Adjusted Goodness-of-Fit Index (AGFI)                             Jöreskog & Sörbom (1984)
  McDonald's Fit Index (MFI) or McDonald's Centrality Index (MCI)   McDonald (1989)
  Gamma hat                                                         Steiger (1989)
  Hoelter N                                                         Hoelter (1983)
  Root Mean Square Residual (RMR)                                   Jöreskog & Sörbom (1981)
  Standardized Root Mean Square Residual (SRMR)                     Bentler (1995)
  Root Mean Square Error of Approximation (RMSEA)                   Steiger & Lind (1980)
Predictive fit indices
  Akaike's Information Criterion (AIC)                              Akaike (1987)
  Consistent AIC (CAIC)                                             Bozdogan (1987)
  Bayesian Information Criterion (BIC)                              Schwarz (1978)
  Expected Cross-Validation Index (ECVI)                            Browne & Cudeck (1992)

However, because there is a clear relationship between sample size and model complexity, we recommend that researchers account for the number of parameters to be estimated when considering sample size. Given ideal conditions (e.g., enough indicators per factor, high factor loadings, and normally distributed data), we recommend Bentler and Chou's (1987) guideline of at least a 5:1 ratio of participants to parameters, with a 10:1 ratio being optimal. In addition, we do not recommend using SEM on sample sizes smaller than 100 participants.
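
Because the participants-per-parameter guideline requires knowing how many parameters a given CFA will estimate, the sketch below counts them for a simple-structure CFA under one common identification choice (each item loads on one factor, factors are allowed to correlate, and each factor's scale is set by fixing one loading to 1). The function name and the example values are ours, for illustration only; the exact count depends on the identification choices made.

```python
# Minimal sketch: free-parameter count and rough sample-size guideline for a
# simple-structure CFA. Treat the result as a planning aid, not a rule.
def cfa_sample_size_guideline(items_per_factor):
    # items_per_factor: e.g., [5, 5, 5] for a 15-item, three-factor scale.
    p = sum(items_per_factor)              # observed variables
    m = len(items_per_factor)              # latent factors
    free_loadings = p - m                  # one loading per factor fixed to 1
    error_variances = p
    factor_variances = m
    factor_covariances = m * (m - 1) // 2
    q = free_loadings + error_variances + factor_variances + factor_covariances
    df = p * (p + 1) // 2 - q              # observed moments minus free parameters
    return {"parameters": q, "df": df,
            "min_n_5_to_1": 5 * q, "optimal_n_10_to_1": 10 * q}

# Hypothetical example: cfa_sample_size_guideline([5, 5, 5]) gives 33 free
# parameters, implying roughly 165 (5:1) to 330 (10:1) participants.
```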

Only one study in our content analysis reported using one of the earlier described criteria (5 to 10 participants per indicator) to establish an adequate sample size. The remainder of the studies did not specify whether they used particular criteria to evaluate the adequacy of the sample size to conduct SEM. However, we assessed the sample sizes for all the studies included in the content analysis and determined that the remaining studies met the 5:1 ratio of participants to parameters.

Overall model fit. Researchers typically use a chi-square test statistic as a test of overall model fit in SEM. The chi-square test, however, is often criticized for its sensitivity to sample size (Bentler & Bonett, 1980; Hu & Bentler, 1999). This sample-size dependency has led to the proposal of numerous alternative fit indices that supplement the chi-square test statistic. These fit indices may be classified as incremental, absolute, or predictive fit indices (Kline, 2005). Incremental fit indices measure the improvement in a model's fit to the data by comparing a specific structural equation model to a baseline structural equation model. The typical baseline comparison model is the null (or independence) model, in which all the variables are independent of each other or uncorrelated (Bentler & Bonett, 1980). Absolute fit indices measure how well a structural equation model explains the relationships found in the sample data. Predictive fit indices (or information criteria) measure how well the structural equation model would fit in other samples from the same population (see Table 3 for examples of incremental, absolute, and predictive fit indices).

We should note that there are various recommendations about reporting these indices as well as suggested cutoff values for each of them (e.g., see Hu & Bentler, 1999; Kline, 2005). Researchers have commonly interpreted Incremental Fit Index, Goodness-of-Fit Index, Adjusted Goodness-of-Fit Index, and McDonald's Fit Index (MFI) values greater than .90 as an acceptable cutoff (Bentler & Bonett, 1980). More recently, however, SEM researchers have advocated .95 as a more desirable level (e.g., Hu & Bentler, 1999). Values for the standardized root mean square residual (SRMR) less than .10 are generally indicative of acceptable model fit. Values for the root mean square error of approximation (RMSEA) at or below .05 indicate close model fit, which is customarily considered acceptable. However, debate continues concerning the use of these indices and the cutoff values when fitting structural equation models (e.g., see Marsh, Hau, & Wen, 2004). One reason for this debate is that the findings are mixed in terms of which index is best, and their performance depends on various study characteristics, including the number of variables (Kenny & McCoach, 2003), estimation method (Fan et al., 1999; Hu & Bentler, 1998), model misspecification (Hu & Bentler, 1999), and sample size (Marsh, Balla, & Hau, 1996). Researchers should bear in mind that suggested cutoff criteria are general guidelines and are not necessarily definitive rules.
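
As a small illustration of how these rules of thumb translate into practice, the sketch below screens a set of reported fit values against the commonly cited cutoffs mentioned above (CFI of .95 or higher, SRMR below .10, RMSEA at or below .05). The dictionary keys and example values are assumptions for illustration, and, as noted, the cutoffs themselves remain contested.

```python
# Minimal sketch: compare reported fit indices with commonly cited cutoffs.
# The cutoffs are guidelines, not definitive rules.
def screen_fit(indices: dict) -> dict:
    checks = {}
    if "CFI" in indices:
        checks["CFI >= .95"] = indices["CFI"] >= 0.95
    if "SRMR" in indices:
        checks["SRMR < .10"] = indices["SRMR"] < 0.10
    if "RMSEA" in indices:
        checks["RMSEA <= .05"] = indices["RMSEA"] <= 0.05
    return checks

# Hypothetical values: screen_fit({"CFI": 0.96, "SRMR": 0.05, "RMSEA": 0.048})
```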

According to Kline (2005), a minimum collection of these types of fit indices to report would consist of (a) the chi-square test statistic with corresponding degrees of freedom and level of significance, (b) the RMSEA (Steiger & Lind, 1980) with its corresponding 90% confidence interval, (c) the Comparative Fit Index (CFI; Bentler, 1990), and (d) the SRMR (Bentler, 1995). Hu and Bentler (1999) recommend using a two-index combination approach when reporting findings in SEM. More specifically, they recommend using the SRMR accompanied by one of the following indices: Nonnormed Fit Index, Incremental Fit Index, Relative Noncentrality Index, CFI, Gamma hat, MFI, or RMSEA. Although there is evidence that Hu and Bentler's (1999) joint criteria help minimize the possibility of rejecting the right model, there is also evidence that misspecified (incorrect) models could be considered acceptable when using the proposed cutoff criteria (Marsh et al., 2004). Thus, we adopt Kline's (2005) recommendation with respect to the minimum fit indices to report. In addition, because structural equation models only approximate the truth, we further recommend that researchers compare competing theoretically plausible models whenever possible and report predictive fit indices (see Table 3) to evaluate how well the model is likely to cross-validate in subsequent samples. Finally, and most important, researchers should always base their selections of the appropriate model on relevant theory.

In our content analysis, 12 of the 14 studies using SEM reported the chi-square statistic. All 14 studies reported at least two fit indices. We list the most commonly reported fit indices in these studies in Table 2. Although 7 articles reported the RMSEA, only 1 of these reported its corresponding 90% confidence interval (regarding confidence intervals around the RMSEA, see Quintana & Maxwell, 1999; for more on confidence intervals, see Henson, 2006 [TCP, special issue, part 1]). All but 3 studies assessed model fit using various suggested cutoff criteria (e.g., Bentler, 1990, 1992; Byrne, 2001; Comrey & Lee, 1992; Hu & Bentler, 1999; Kline, 2005; Quintana & Maxwell, 1999). Several of the studies were published after the seminal Hu and Bentler (1999) cutoff-criteria article yet referred to the less stringent cutoff criteria suggested by previous researchers (e.g., .90 for incremental fit indices). Only 3 of the 8 studies in the content analysis comparing competing models (nested or nonnested) reported predictive fit indices.

Model modification. When structural equation models do not demonstrate good fit, researchers often modify (respecify) and subsequently retest models (MacCallum, Roznowski, & Necowitz, 1992). This reverts the confirmatory approach to an exploratory one, but that is of less consequence than not knowing the reasons behind poor model fit. Modification indices are sometimes used to either add or drop parameters in the process of model respecification.

For example, the Lagrange Multiplier Modification index estimates the decrease in the chi-square test statistic that would occur if a parameter were to be freely estimated. More specifically, it indicates which parameters could be added to improve model fit by significantly decreasing the chi-square test statistic of overall fit. In contrast, the Wald statistic estimates the increase in the chi-square test statistic that would occur if a parameter were fixed to 0, which is essentially the same as dropping a nonsignificant parameter from the model (Kline, 2005). Researchers have examined the performance of these indices in terms of helping the researcher arrive at the correct structural equation model and have shown them to be inaccurate under certain conditions (e.g., Chou & Bentler, 2002; MacCallum, 1986). Thus, applied researchers should be cautious about the accuracy of models respecified on the basis of the Lagrange Multiplier and Wald statistics. In the end, theory should guide model respecification, and respecified models should be tested using new samples.

Researchers may also modify models in terms of the unit of analysis used, such as item parcels. Parceling means either summing or averaging two or more items together to create parcels (sometimes referred to as bundles). These parcels are then used as the unit of analysis in SEM instead of the individual items. It is crucial, however, that researchers in the scale development process not use item parceling, because item parcels can hide the true relationships among items in the scale (Cattell, 1974). In addition, model misspecification may be hidden when using item parceling (Bandalos & Finney, 2001).

The data-driven methods for model respecification in SEM are more appropriate for fine-tuning a model than for large-scale respecification of severely misspecified initial models, because multiple misspecification errors interact with each other, making respecification more difficult (Gerbing & Hamilton, 1996). For similar reasons, Gorsuch (1997) suggested that FA procedures can be an appropriate alternative to adjusting the confirmatory model when misspecification is found, but this does not imply reversing the typical order of FA prior to SEM in scale development research. Finally, we highly recommend cross-validation of respecified structural equation models to establish predictive validity (MacCallum et al., 1992). Thus, another sample of data should be collected and the respecified model tested in a confirmatory approach.

Of the 14 studies conducting SEM, three examined modification indices (e.g., the Lagrange Multiplier) to assess whether adding parameters to the model would significantly improve fit. In two of these three studies, the authors implemented modifications and retested the models.

These two studies allowed the errors to covary, and one study also allowed the factors to covary. Neither of the two studies that modified the original structural equation model cross-validated the respecified model in a separate sample. Researchers in two of the studies in the content analysis used item parceling to avoid estimating a large number of parameters and to reduce random error, an approach we do not recommend.
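
Where collecting a genuinely new sample is not feasible, one common (if less ideal) compromise is to reserve a random portion of the available respondents in advance so that any respecified model can be retested on data that did not inform the modifications. The sketch below is a minimal illustration of such a split, not a procedure taken from the studies reviewed or a substitute for the new-sample cross-validation recommended above; `data` is assumed to be a pandas DataFrame of item responses with a unique index.

```python
# Minimal sketch: random calibration/holdout split of an item-response data set.
import numpy as np
import pandas as pd

def calibration_holdout_split(data: pd.DataFrame, seed: int = 2006):
    rng = np.random.default_rng(seed)
    # Draw half of the (unique) row labels for the holdout sample.
    holdout_idx = rng.choice(data.index, size=len(data) // 2, replace=False)
    holdout = data.loc[holdout_idx]          # reserved for retesting a respecified model
    calibration = data.drop(index=holdout_idx)  # used for initial fitting and respecification
    return calibration, holdout
```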

CONCLUSIONS

In this article, we have examined common practices in counseling psychology scale development research using EFA and CFA techniques. We conducted a content analysis of new scale development articles in JCP during 10 years (1995 to 2004) to assess current practices in scale development. We used data from our content analysis to provide information about the typical procedures used in counseling psychology scale development research, and we compared these practices to the current literature on EFA and CFA to make recommendations about best practices (which we summarize further on).

We found that counseling psychology scale development research employed a wide range of procedures. Although we did not conduct a formal trend analysis, our impression was that the content-analysis data indicated that counseling psychology scale development research became increasingly rigorous and sophisticated during the evaluation period, especially through the declining use of PCA procedures and the increased use of SEM as a confirmatory procedure. However, we also found a variety of practices that seemed at odds with the current literature on EFA and SEM, which indicated a need for even more rigor and standardization. Specifically, we found the use of the following new scale development practices to be problematic: (a) employing SEM prior to using EFA, (b) using criteria that varied widely (or were not reported) with respect to determining the adequacy of the sample for both EFA and SEM, (c) failing to report an adequate rationale for selecting orthogonal versus oblique rotation methods, (d) using orthogonal rotation methods during EFA despite clear evidence that the factors were moderately to highly correlated, (e) using inappropriate rationales or ignoring contrary data when identifying and reporting the final factor solution during EFA (e.g., ignoring high factor intercorrelations to retain a preferred factor structure), (f) using questionable criteria as the basis for decisions about item deletion or retention, (g) failing to consider the extent to which the final factor solution achieved adequate approximation of simple structure, (h) making revisions to item content or adding or deleting items between the conclusion of EFA and the initiation of SEM, (i) using criteria and fit indices that varied widely to determine overall model fit during SEM, (j) failing to report confidence intervals when using the RMSEA, (k) using item parcels (bundles) in scale development, and (l) failing to engage in additional cross-validation following model misspecification and modification during SEM.

We offer a number of caveats for the critique of scale development practices described earlier. First, some of these recommendations do not transfer directly to other approaches to empirical scale development (e.g., criterion group) and should be understood as referring primarily to the homogeneous item-grouping approach. Second, it is important to note that EFA is intended to be a flexible statistical procedure used to produce the most interpretable solution, which can lead to acceptable variations in practice. Thus, some researchers may disagree on how stringently to use criteria to constrain the process of EFA, and we acknowledge that the subjective and interpretive aspects of scale development may justify variations that arise in specific contexts. Finally, the current literature on both EFA and SEM continues to contain debates and conflicting recommendations that may be at variance with our conclusions. We offer the recommendations here to increase standardization and rigor, not to resolve those ongoing debates or to preempt data-driven improvements in best practices.

RECOMMENDED BEST PRACTICES

1. Always provide a clear definition of the construct the scale is intended to measure.
2. Use expert review of items prior to submitting them to EFA.
3. In general, EFA should precede CFA.
4. When using EFA, set a preestablished minimum sample size (≥ 100) and then evaluate the need for additional data collection on the basis of an initial EFA using communalities, factor saturation, and factor-loading criteria: (a) sample sizes of 150 to 200 are likely to be adequate with data sets containing communalities higher than .50 or with 10:1 items per factor with factor loadings at approximately |.4|, and (b) smaller sample sizes may be adequate if communalities are all .60 or greater or with at least 4:1 items per factor and factor loadings greater than |.6|.
5. Verify the factorability of the data via a significant Bartlett's test of sphericity (when the participant-to-item ratio is between 3:1 and 5:1), the KMO measure of sampling adequacy (values greater than .60), or both (see the sketch following this list).
6. Recognize and understand the basic differences between PCA and FA extraction methods. For the purpose of scale development, FA is generally preferred over PCA.

7. Even when theory suggests that factors will be uncorrelated, it is good practice to use an oblique rotation when factors are correlated in the data. Consider using an oblique rotation in the first run of an EFA with each factor solution to establish empirically whether the factors might be correlated.
8. Establish in advance which criteria to use for factor retention and item deletion or retention (e.g., delete items with factor loadings less than .32 or cross-loadings within .15 of an item's highest factor loading; approximate simple structure; parallel analysis; delete factors with fewer than three items unless the items are highly correlated, for example, r > .70).
9. Avoid allowing preconceived biases (e.g., how the researcher wants the final solution to look) to override important statistical findings when making judgments. Consider using independent judges to assist in decision making if it seems difficult to disentangle researcher bias from conceptual interpretation of EFA results.
10. If conducting scale-length optimization, it is essential to rerun the EFA to ensure that item elimination did not result in changes to the factor structure, factor intercorrelations, item communalities, factor loadings, or cross-loadings, so that all of the originally established criteria for these outcomes are still met.
11. Avoid making changes to the scale produced by the final EFA prior to conducting a CFA (e.g., adding new items, deleting items, changing item content, altering the rating scale). If you feel that the outcomes of the EFA are unsatisfactory or that changes to the scale are necessary, it is most appropriate to conduct a new EFA on the revised scale before moving to CFA.
12. Competing-models approaches in SEM seem to be gaining favor in the literature over single-model approaches, indicating that researchers should consider comparing theoretically plausible models, whether nested, nonnested, or equivalent.
13. When using SEM, use model complexity as the central indicator to establish the minimum sample size required before conducting CFA; we recommend a minimum of 5 cases per parameter to be estimated.
14. At a minimum, report the following SEM fit indices: (a) the chi-square with corresponding degrees of freedom and level of significance, (b) the RMSEA with corresponding 90% confidence intervals, (c) the CFI, and (d) the SRMR.
15. When comparing competing models with SEM, add an appropriate predictive fit index to the standard set described earlier (see Table 3).
16. Data-driven methods for model respecification in SEM are more appropriate for fine-tuning than for large-scale respecification of severely misspecified models.

17. The Lagrange Multiplier Modification index may be used for respecifications in which parameters are being added to the model; the Wald statistic may be used for decisions about eliminating parameters from the model. In the end, however, theory should accompany modification procedures using these modification indices.
18. We recommend against item parceling (bundling) in SEM for scale development research because item parcels can hide (a) the true relationships among items in the scale and (b) model misspecification (which runs contrary to the underlying purposes of CFA).
19. Clearly report all of the decisions, rationales, and procedures used when conducting EFA and SEM in scale development research.
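
As the sketch referenced in recommendation 5, the Python code below computes Bartlett's test of sphericity and the KMO measure of sampling adequacy from a raw data matrix using their standard formulas. It assumes complete data in a NumPy array `X` (participants in rows, items in columns) and is meant only as an illustration of the two factorability checks; statistical packages such as SPSS report the same quantities.

```python
# Minimal sketch of the two factorability checks in recommendation 5.
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(X: np.ndarray):
    """Bartlett's test that the correlation matrix is an identity matrix."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    statistic = -(n - 1 - (2 * p + 5) / 6.0) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return statistic, df, chi2.sf(statistic, df)   # significant p supports factorability

def kmo(X: np.ndarray) -> float:
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy (> .60 desired)."""
    R = np.corrcoef(X, rowvar=False)
    S = np.linalg.inv(R)
    # Anti-image (partial) correlations from the inverse correlation matrix.
    d = np.sqrt(np.outer(np.diag(S), np.diag(S)))
    partial = -S / d
    np.fill_diagonal(partial, 0.0)   # sums below use off-diagonal elements only
    np.fill_diagonal(R, 0.0)
    return (R ** 2).sum() / ((R ** 2).sum() + (partial ** 2).sum())
```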

APPENDIX

Journal of Counseling Psychology Scale Development Articles Reference List (1995 to 2004)

Barber, J. P., Foltz, C., & Weinryb, R. M. (1998). The central relationship questionnaire: Initial report. Journal of Counseling Psychology, 45, 131-142. Dillon, F. R., & Worthington, R. L. (2003). The Lesbian, Gay, and Bisexual Affirmative Counseling Self-Efficacy Inventory (LGB-CSI): Development, validation, and training implications. Journal of Counseling Psychology, 50, 235-251. Heppner, P. P., Cooper, C., Mulholland, A., & Wei, M. (2001). A brief, multidimensional, problem-solving psychotherapy outcome measure. Journal of Counseling Psychology, 48, 330-343. Hill, C. E., & Kellems, I. S. (2002). Development and use of the helping skills measure to assess client perceptions of the effects of training and of helping skills in sessions. Journal of Counseling Psychology, 49, 264-272. Inman, A. G., Ladany, N., Constantine, M. G., & Morano, C. K. (2001). Development and preliminary validation of the Cultural Values Conflict Scale for South Asian women. Journal of Counseling Psychology, 48, 17-27. Kim, B. K., Atkinson, D. R., & Yang, P. H. (1999). The Asian Values Scale: Development, factor analysis, validation, and reliability. Journal of Counseling Psychology, 46, 342-352. Kivlighan, D. M., Multon, K. D., & Brossart, D. F. (1996). Helpful impacts in group counseling: Development of a multidimensional rating system. Journal of Counseling Psychology, 43, 347-355. Lee, R. M., Choe, J., Kim, G., & Ngo, V. (2000). Construction of the Asian American Family Conflicts Scale. Journal of Counseling Psychology, 47, 211-222. Lehrman-Waterman, D., & Ladany, N. (2001). Development and validation of the evaluation process within supervision inventory. Journal of Counseling Psychology, 48, 168-177. Lent, R. W., Hill, C. E., & Hoffman, M. A. (2003). Development and validation of the Counselor Activity Self-Efficacy scales. Journal of Counseling Psychology, 50, 97-108. Liang, C. T. H., Li, L. C., & Kim, B. S. K. (2004). The Asian American Racism-Related Stress Inventory: Development, factor analysis, reliability, and validity. Journal of Counseling Psychology, 51, 103-114. Mallinckrodt, B., Gantt, D. L., & Coble, H. M. (1995). Attachment patterns in the psychotherapy relationship: Development of the client attachment to therapist scale. Journal of Counseling Psychology, 42, 307-317. Miville, M. L., Gelso, C. J., Pannu, R., Liu, W., Touradji, P., Holloway, P., & Fuertes, J. (1999). Appreciating similarities and valuing differences: The Miville-Guzman Universality-Diversity Scale. Journal of Counseling Psychology, 46, 291-307.

Mohr, J. J., & Rochlen, A. B. (1999). Measuring attitudes regarding bisexuality in lesbian, gay male, and heterosexual populations. Journal of Counseling Psychology, 46, 353-369. Neville, H. A., Lilly, R. L., Duran, G., Lee, R. M., & Browne, L. (2000). Construction and initial validation of the Color-Blind Racial Attitudes Scale (CoBRAS). Journal of Counseling Psychology, 47, 59-70. O'Brien, K. M., Heppner, M. J., Flores, L. Y., & Bikos, L. H. (1997). The Career Counseling Self-Efficacy Scale: Instrument development and training applications. Journal of Counseling Psychology, 44, 20-31. Phillips, J. C., Szymanski, D. M., Ozegovic, J. J., & Briggs-Phillips, M. (2004). Preliminary examination and measurement of the internship research training environment. Journal of Counseling Psychology, 51, 240-248. Rochlen, A. B., Mohr, J. J., & Hargrove, B. K. (1999). Development of the attitudes toward career counseling scale. Journal of Counseling Psychology, 46, 196-206. Schlosser, L. Z., & Gelso, C. J. (2001). Measuring the working alliance in advisor-advisee relationships in graduate school. Journal of Counseling Psychology, 48, 157-167. Skowron, E. A., & Friedlander, M. L. (1998). The differentiation of self inventory: Development and initial validation. Journal of Counseling Psychology, 45, 235-246. Spanierman, L. B., & Heppner, M. J. (2004). Psychosocial Costs of Racism to Whites Scale (PCRW): Construction and initial validation. Journal of Counseling Psychology, 51, 249-262. Utsey, S. O., & Ponterotto, J. G. (1996). Development and validation of the index of race-related stress. Journal of Counseling Psychology, 43, 490-501. Wang, Y., Davidson, M. M., Yakushko, O. F., Savoy, H. B., Tan, J. A., & Bleier, J. K. (2003). The Scale of Ethnocultural Empathy: Development, validation, and reliability. Journal of Counseling Psychology, 50, 221-234.

REFERENCES

Akaike, H. (1987). Factor analysis and AIC. Psychometrika, 52, 317-332. Anastasi, A. (1988). Psychological testing (6th ed.). New York: Macmillan. Bandalos, D. J., & Finney, S. J. (2001). Item parceling issues in structural equation modeling. In G. A. Marcoulides & R. E. Schumacker (Eds.), New developments and techniques in structural equation modeling (pp. 269-296). Mahwah, NJ: Lawrence Erlbaum. Bartlett, M. S. (1950). Tests of significance in factor analysis. British Journal of Psychology, 3, 77-85. Bentler, P. M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107, 238-246. Bentler, P. M. (1992). On the fit of models to covariances and methodology to the Bulletin. Psychological Bulletin, 112, 400-404. Bentler, P. M. (1995). EQS: Structural equations program manual. Encino, CA: Multivariate Software. Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin, 88, 588-606. Bentler, P. M., & Chou, C.-P. (1987). Practical issues in structural modeling. Sociological Methods & Research, 16, 78-117. Bollen, K. A. (1989). A new incremental fit index for general structural equation models. Sociological Methods & Research, 17, 303-316. Bozdogan, H. (1987). Model selection and Akaike's information criteria (AIC): The general theory and its analytical extensions. Psychometrika, 52, 345-370.

Brown, F. G. (1983). Principles of educational and psychological testing (3rd ed.). New York: Holt, Rinehart, & Winston. Browne, M. W., & Cudeck, R. (1992). Alternative ways of assessing model fit. Sociological Methods and Research, 21, 230-258. Byrne, B. M. (2001). Structural equation modeling with AMOS: Basic concepts, applications and programming. Mahwah, NJ: Lawrence Erlbaum. Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1, 245-276. Cattell, R. B. (1974). Radial item parcel factoring vs. item factoring in defining personality structure in questionnaires: Theory and experimental checks. Australian Journal of Psychology, 26, 103-119. Chou, C., & Bentler, P. M. (2002). Model modification in structural equation modeling by imposing constraints. Computational Statistics and Data Analysis, 41, 271-287. Comrey, A. L. (1973). A first course in factor analysis. New York: Academic Press. Comrey, A. L., & Lee, H. B. (1992). A first course in factor analysis (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum. Converse, J. M., & Presser, S. (1986). Survey questions: Handcrafting the standardized questionnaire. Newbury Park, CA: Sage. Dawis, R. V. (1987). Scale construction. Journal of Counseling Psychology, 34, 481-489. DeVellis, R. F. (2003). Scale development: Theory and applications (2nd ed.). Thousand Oaks, CA: Sage. Fan, X., Thompson, B., & Wang, L. (1999). Effects of sample size, estimation methods, and model specification on structural equation modeling fit indexes. Structural Equation Modeling, 6, 56-83. Fassinger, R. E. (1987). Use of structural equation modeling in counseling psychology research. Journal of Counseling Psychology, 34, 425-436. Floyd, F. J., & Widaman, K. F. (1995). Factor analysis in the development and refinement of clinical assessment instruments. Psychological Assessment, 7, 286-299. Friedenberg, L. (1995). Psychological testing: Design, analysis, and use. Boston, MA: Allyn and Bacon. Gerbing, D. W., & Hamilton, J. G. (1996). Viability of exploratory factor analysis as a precursor to confirmatory factor analysis. Structural Equation Modeling, 3, 62-72. Gorsuch, R. L. (1983). Factor analysis (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum. Gorsuch, R. L. (1990). Common factor analysis versus principal components analysis: Some well and little known facts. Multivariate Behavioral Research, 25, 33-39. Gorsuch, R. L. (1997). Exploratory factor analysis: Its role in item analysis. Journal of Personality Assessment, 68, 532-560. Gorsuch, R. L. (2003). Factor analysis. In J. A Schinka & W. F. Velicer (Eds.), Handbook of psychology: Research methods in psychology (Vol. 2, pp. 143-164). Hoboken, NJ: John Wiley. Grimm, L. G., & Yarnold, P. R. (1995). Reading and understanding multivariate statistics. Washington, DC: American Psychological Association. Guadagnoli, E., & Velicer, W. F. (1988). The relationship of sample size to the stability of component patterns. Psychological Bulletin, 103, 265-275. Helms, J. E., Henze, K. T., Sass T. L., & Mifsud, V. A. (2006). Treating Cronbach’s alpha reliability as data in nonpsychometric substantive applied research. The Counseling Psychologist, 34, 630-660. Henson, R. K. (2006). Effect-size measures and meta-analytic thinking in counseling psychology research. The Counseling Psychologist, 34, 601-629. Hoelter, J. W. (1983). The analysis of covariance structures: Goodness-of-fit indices. Sociological Methods & Research, 11, 325-344. Horn, J. L. (1965). 
A rationale and test for the number of factors in factor analysis. Psychometrika, 30, 179-185.

Hoyt, W. T., Warbasse, R. E., & Chu, E. Y. (2006). Construct validation in counseling psychology research. The Counseling Psychologist, 34, 769-805. Hu, L., & Bentler, P. M. (1998). Fit indices in covariance structure modeling: Sensitivity to underparameterized model misspecification. Psychological Methods, 3, 424-453. Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1-55. Humphreys, L. G, & Montanelli, R. G. (1975). An investigation of the parallel analysis criterion for determining the number of common factors. Multivariate Behavioral Research, 10, 193-205. Jöreskog, K. G., & Sörbom, D. (1981). LISREL V: Analysis of linear structural relations by the method of maximum likelihood. Chicago: International Educational Services. Jöreskog, K. G., & Sörbom, D. (1984). LISREL 6: A guide to the program and applications. Chicago: SPSS. Kahn, J. H. (2006). Factor analysis in counseling psychology research, training, and practice: Principles, advances, and applications. The Counseling Psychologist, 34, 684-718. Kaiser, H. F. (1958). The varimax criterion for analytic rotation in factor analysis. Psychometrika, 23, 187-200. Kenny, D. A., & McCoach, D. B. (2003). Effect of the number of variables on measures of fit in structural equation modeling. Structural Equation Modeling, 10, 333-351. Kline, R. B. (2005). Principles and practice of structural equation modeling (2nd ed.). New York: Guilford. Loehlin, J. C. (1998). Latent variable models: An introduction to factor, path, and structural analysis (3rd ed.). Mahwah, NJ: Lawrence Erlbaum. MacCallum, R. C. (1986). Specification searches in covariance structure modeling. Psychological Bulletin, 107, 247-255. MacCallum, R. C., Roznowski, M., & Necowitz, L. B. (1992). Model modifications in covariance structure analysis: The problem of capitalization on chance. Psychological Bulletin, 111, 490-504. MacCallum, R. C., Wegener, D. T., Uchino, B. N., & Fabrigar, L. R. (1993). The problem of equivalent models in applications of covariance structure analysis. Psychological Bulletin, 114, 185-199. MacCallum, R. C., Widaman, K. F., Zhang, S., & Hong, S. (1999). Sample size in factor analysis. Psychological Methods, 4, 84-99. Marsh, H. W., Balla, J. R., & Hau, K. T. (1996). An evaluation of incremental fit indices: A clarification of mathematical and empirical properties. In G. A. Marcoulides & R. E. Schumacker (Eds.), Advanced structural equation modeling: Issues and techniques (pp. 315-353). Mahwah, NJ: Lawrence Erlbaum. Marsh, H. W., Balla, J. R., & McDonald, R. P. (1988). Goodness-of-fit indexes in confirmatory factor analysis: The effect of sample size. Psychological Bulletin, 103, 391-410. Marsh, H. W., Hau, K.-T., Balla, J. R., & Grayson, D. (1998). Is more ever too much? The number of indicators per factor in confirmatory factor analysis. Multivariate Behavioral Research, 33, 181-220. Marsh, H. W., Hau, K. T., & Wen, Z. (2004). In search of golden rules: Comment on hypothesistesting approaches to setting cutoff values for fit indexes and dangers in overgeneralizing Hu and Bentler’s (1999) findings. Structural Equation Modeling, 11, 320-341. Martens, M. P. (2005). The use of structural equation modeling in counseling psychology research. The Counseling Psychologist, 33, 269-298. Martens, M. P., & Hasse, R. F. (2006). Advanced applications of structural equation modeling in counseling psychology research. 
The Counseling Psychologist, 34, 878-911. McDonald, R. P. (1985). Factor analysis and related methods. Hillsdale, NJ: Lawrence Erlbaum. McDonald, R. P. (1989). An index of goodness-of-fit based on noncentrality. Journal of Classification, 6, 97-103.

McDonald, R. P., & Marsh, H. W. (1990). Choosing a multivariate model: Noncentrality and goodness of fit. Psychological Bulletin, 107, 247-255. Mulaik, S. A., James, L. R., Van Alstine, J., Bennett, N., Lind, S., & Stilwell, C. D. (1989). Evaluation of goodness-of-fit indices for structural equation models. Psychological Bulletin, 105, 430-445. O’Connor, B. P. (2000). SPSS and SAS programs for determining the number of components using parallel analysis and Velicer’s MAP test. Behavior Research Methods, Instruments, and Computers, 32, 396-402. Quintana, S. M., & Maxwell, S. E. (1999). Implications of recent developments in structural equation modeling for counseling psychology. The Counseling Psychologist, 27, 485-527. Quintana, S. M., & Minami, T. (2006). Guidelines for meta-analyses of counseling psychology research. The Counseling Psychologist, 34, 839-876. Reise, S. P., Waller, N. G., & Comrey, A. L. (2000). Factor analysis and scale revision. Psychological Assessment, 12, 287-297. Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6, 461-464. Sherry, A. (2006). Discriminant analysis in counseling psychology research. The Counseling Psychologist, 34, 661-683. Steiger, J. H. (1989). EzPATH: A supplementary module for SYSTAT and SYGRAPH. Evanston, IL: SYSTAT. Steiger, J. H., & Lind, J. C. (1980, May). Statistically based tests for the number of common factors. Paper presented at the annual meeting of the Psychometric Society, Iowa City, IA. Tabachnick, B. G., & Fidell, L. S. (2001). Using multivariate statistics (4th ed.). New York: Harper & Row. Thompson, B. (2004). Exploratory and confirmatory factor analysis: Understanding concepts and applications. Washington, DC: American Psychological Association. Tinsley, H. E. A., & Tinsley, D. J. (1987). Uses of factor analysis in counseling psychology research. Journal of Counseling Psychology, 34, 414-424. Tucker, L. R., Koopman, R. F., & Linn, R. L. (1969). Evaluation of factor analytic research procedures by means of simulated correlation matrices. Psychometrika, 34, 421-459. Tucker, L. R., & Lewis, C. (1973). A reliability coefficient for maximum likelihood factor analysis. Psychometrika, 38, 1-10. Velicer, W. F., & Fava, J. L. (1998). Effects of variable and subject sampling on factor pattern recovery. Psychological Methods, 3, 231-251. Velicer, W. F., & Jackson, D. N. (1990). Component analysis versus common factor analysis: Some issues in selecting an appropriate procedure. Multivariate Behavioral Research, 25, 1-28. Velicer, W. F., Peacock, A. C., & Jackson, D. N. (1982). A comparison of component and factor patterns: A Monte Carlo approach. Multivariate Behavioral Research, 17, 371-388. West, S. G., Finch, J. F., & Curran, P. J. (1995). Structural equation models with nonnormal variables: Problems and remedies. In R. H. Hoyle (Ed.), Structural equation modeling: Concepts, issues, and applications (pp. 56-75). Thousand Oaks, CA: Sage. Weston, R., & Gore, P. A., Jr. (2006). SEM 101: A brief guide to structural equation modeling. The Counseling Psychologist, 34, 719-751. Widaman, K. F. (1993). Common factor analysis versus principal components analysis: Differential bias in representing model parameters? Multivariate Behavioral Research, 28, 263-311. Worthington, R. L., & Navarro, R. L. (2003). Pathways to the future: Analyzing the contents of a content analysis. The Counseling Psychologist, 31, 85-92. Zwick, W. R., & Velicer, W. F. (1986). 
Factors influencing five rules for determining the number of components to retain. Psychological Bulletin, 99, 432-442.
