Evaluating training effectiveness
EVALUATING TRAINING EFFECTIVENESS: AN INTEGRATED PERSPECTIVE IN MALAYSIA
Lim Guan Chong Master of Business Administration (Finance)
International Graduate School of Management Division of Business and Enterprise University of South Australia
Submitted on this 5th of August in the year 2005 for the partial requirements of the degree of Doctor of Business Administration UNIVERSITY OF SOUTH AUSTRALIA
12 JUL 2006 LIBRARY
University of South Australia DOCTOR OF BUSINESS ADMINISTRATION
PORTFOLIO SUBMISSION FORM
Name: Lim Guan Chong
Student Id No: 0111487H
Dear Sir/Madam
To the best of my knowledge, the portfolio contains all of the candidate's own work completed under my supervision, and is worthy of examination.
I have approved for submission the portfolio that is being submitted for examination.
Signed:
Dr Travis Kemp/Professor Dr Leo Ann Mean
Date
Supported by:
Dr Ian Whyte Chair: Doctoral Academic Review Committee International Graduate School of Business
Date
DBA Portfolio Declaration
I hereby declare that this paper submitted in partial fulfillment of the DBA degree is my own work and that all contributions from any other persons or sources are properly and duly cited. I further declare that it does not constitute any previous work whether published or otherwise. In making this declaration I understand and acknowledge any breaches of the declaration constitute academic misconduct which may result in my expulsion from the program and/or exclusion from the award of the degree.
Signature of candidate:
Lim Guan Chong
Date:5th August 2005
TABLE OF CONTENTS
Portfolio Submission Form
Portfolio Declaration
Acknowledgements
Overview
1 Research Paper 1: Methodological Issues In Measuring Training Effectiveness
1.1 Abstract
1.2 Introduction
1.3 Approaches to Training Evaluation
1.3.1 Discrepancy Evaluation Model
1.3.2 Transaction Model
1.3.3 Goal-Free Model
1.3.4 Systemic Evaluation
1.3.5 Quasi-Legal Approach
1.3.6 Art Criticism Model
1.3.7 Adversary Model
1.3.8 Contemporary Approaches: Stufflebeam's Improvement-Oriented Evaluation (CIPP) Model, 1971
1.3.9 Cervero's Continuing Education Evaluation, 1984
1.3.10 Kirkpatrick Model, 1959a, 1959b, 1960a, 1960b, 1976, 1979, 1994, 1996a, 1996b, 1998
1.4 Critical Review
1.5 Future Research
1.5.1 The Transfer Component
1.5.2 Evaluating Beyond the 4 Levels
1.5.3 Incorporating Competency-based Approach into Training Evaluation
1.5.4 Multi-Rater System in Training Evaluation
1.6 Conclusion
1.7 References for Paper One
2 Research Paper 2: Evaluating Training Effectiveness: An Empirical Study of Kirkpatrick Model of Evaluation in the Malaysian Training Environment For The Manufacturing Sector
2.1 Abstract
2.2 Introduction
2.3 Training Practices in Malaysia
2.4 Practice of Evaluation in Training
2.5 Training Evaluation Practices In Malaysia
2.6 Methodology of Study
2.6.1 Questionnaire Construction
2.6.2 The Sample and Sampling
2.6.3 Questionnaire Responses
2.7 Findings and Discussion
2.8 Limitations of Study
2.9 Conclusion
2.10 References for Paper Two
2.11 Appendix A: The Questionnaire for Research Paper Two
3 Research Paper 3: Multi-rater Feedback For Training And Development: An Integrated Perspective
3.1 Abstract
3.2 Introduction
3.3 The Use of Multi-rater Feedback
3.4 The Effectiveness Of Multi-rater Feedback For Development
3.5 The Effectiveness Of Multi-rater Feedback For Appraisal
3.6 The Variation Of Multi-rater Feedback Information
3.7 Multi-rater Feedback Practices In Malaysia
3.8 Integrating Multi-rater Feedback With Developmental Tool
3.9 Multi-rater Feedback: Process Consultation As A Developmental Tool
3.10 Micro Perspective Of Conversation Theory In Process Consultation
3.11 An Integrated Approach for Post Multi-rater Feedback Development
3.12 Conclusion
3.13 References for Paper Three
Acknowledgements
I am sincerely grateful to my supervisors, Dr Travis Kemp and Professor Leo Ann Mean, who have been so supportive, taking the time to look through my papers and giving me tremendously useful feedback and suggestions. First and foremost, thanks to my spouse, Linda Liew Mei Ling, who acted as my research assistant and gave her late nights and thoughtful moral support throughout this endeavor. Special thanks are reserved for my friends who acted as my proofreaders and never let me produce less than the best I had to offer. In particular, my sincerest thanks go to my respondents, relatives, family and other parties who have supported me along the way and helped me find the time to complete my thesis. Finally, my utmost appreciation goes to the University of South Australia, International Graduate School of Management, for its support and enthusiasm for achieving excellence in education.
Lim Guan Chong
Overview
The majority of organizations realize that training must be a worthwhile effort; there must be returns in labour productivity after training. Evaluation is possibly the least developed aspect of the training cycle. This research portfolio looks at the effectiveness of Kirkpatrick's Four Levels of Evaluation, with emphasis on assessing the methodology from a training perspective.
Evaluating training is typically linked with measuring change and quantifying the degree of change which leads to performance. Measuring gains in organizational effectiveness that result from training interventions is probably the most difficult task in training evaluation.
This research portfolio, submitted in partial fulfillment of the requirements for the degree of Doctor of Business Administration, develops a series of ideas that expand on traditional approaches to training evaluation. The research portfolio is divided into three papers.
Paper 1 critically reviews the methodological problems faced when adopting the evaluation model developed by Donald L. Kirkpatrick in 1959. A series of industry studies shows little application of this approach in practice. The literature provides little understanding of the transfer-of-learning component when the Kirkpatrick model is used to determine training effectiveness. Most current researchers find that the future of research on training evaluation lies in the effectiveness of the transfer of the skills learned. By dissecting this classical theory, the portfolio aims to address its weaknesses by re-focusing on transfer of learning as a major key to unlocking the model's practicality and validity.
Paper 2 adopts a survey method to track the history, rationale, objectives, implementation and evaluation of training initiatives in the Malaysian manufacturing sector. It uses survey research to triangulate reliable and convincing findings.
The research looks at how extensively the Kirkpatrick model is practised in the Malaysian manufacturing sector. This paper reports on the practice of Kirkpatrick's four levels of evaluation and the effectiveness of this evaluation model within that sector.
Paper 3 is on the effective use of the multi-rater feedback system in providing multi-source information and creating self-awareness based on individual strengths and weaknesses. One underlying rationale for such a system is its potential impact on the individual's self-awareness, which is thought to enhance performance at the development stage.
This paper serves as a conceptual paper, which studies how multi-rater feedback could effectively lead to a successful developmental process through process consultation in the context of the Malaysian training environment. Through the years, a training evaluation culture has not been properly developed in Malaysia. A comprehensive approach is necessary for organizations to see the benefits of conducting pre-training analysis. This should be followed by an effective development plan so that a comprehensive training approach can be instilled in the Malaysian environment.
The process consultant holds the key to an effective development process by using a multi-rater assessment as a pre-training gap analysis. Process consultation provides the opportunity to 'check and balance' the degree of learning and development activities through reflection, problem-solving capabilities and application of theory throughout the developmental process. Good conversation is introduced as an intervention tool to complement double-loop learning during process consultation.
This portfolio systematically discusses the issue of training evaluation faced by the Malaysian manufacturing sector. It recommends an integrated model comprising preliminary and post assessment using multi-rater feedback, followed by a developmental process using process consultation and complemented by good conversation as an intervention tool. Such a model may serve as a rational balance between financial outlays on training and development outcomes.
Research Paper I
METHODOLOGICAL ISSUES IN MEASURING THE TRAINING EFFECTIVENESS
Lim Guan Chong Master of Business Administration (Finance) University of Hull
International Graduate School of Management University of South Australia
Methodological Issues in Measuring Training Effectiveness Lim Guan Chong International Graduate School of Management University of South Australia
1.1 Abstract
This literature review examines the effectiveness of, and the methodological issues related to, Kirkpatrick's four-level model of evaluation and its application to training. The paper first assesses the extent to which Kirkpatrick's evaluation model has been used by organizations to measure learning outcomes, reactions towards development, transfer of learning, change of behavior and return on investment after training. Research was conducted to determine the weaknesses of this model faced by most practitioners. An examination of this classical theory was carried out to address these weaknesses by re-focusing on the issue of transfer of learning as a key to unlocking the model's practicality and validity.
1.2 Introduction
Training evaluation is regarded as an important human resource development strategy. However, there seems to be widespread agreement that systematic evaluation is the least well carried out training activity. Chen and Rossi (1992) commented that evaluation knowledge found in the literature has not been fully utilized in program evaluation. This suggests that training evaluation has not been culturally embedded in most organizations. The first reason could be that companies lack knowledge of how to conduct training evaluation. Secondly, the available training evaluation models are not sufficient to provide a total approach to effective training evaluation. This is further evidenced by a study on the benefits of training in Britain, which revealed that 85 percent of British companies make no attempt to assess the benefits gained from undertaking training (HMSO, 1989).
Since evaluation started in the area of education, most of the early definitions were in that area. Tyler (1949) was the first researcher to define evaluation as a process of determining to what extent the educational objectives are actually being realized by the curriculum and instruction. The early researchers emphasized the attainment of objectives as an important factor in determining the effectiveness of any program. This was found in the study by Steel (1970), who compared the effectiveness of the program with its cost. Boyle and Jahns (1970) defined evaluation as the determination of the extent to which the desired objectives have been attained or the amount of movement that has been made in the desired direction. Further study by Provus (1971) conceptualized the need to have a certain standard of performance as an objective-based criterion to judge the success of the program. His model made comparisons between this preset standard and what actually exists. Noe (2000) defined training evaluation as the process of analyzing the outcomes needed to determine if training was effective. However, Goldstein and Ford (2002) were of the opinion that evaluation is a systematic collection of descriptive and judgemental information necessary to make effective training decisions related to the selection, adoption, value, and modification of various activities.
After many in-depth studies were conducted on training evaluation, and given the high cost-effectiveness expected of training, the term evaluation has been given a broader perspective in which it no longer focuses on achieving program objectives but mainly covers the methodological element of evaluation (Brinkerhoff, 1988; Goldstein, 1986; Junaidah, 2001; Shadish & Reichardt, 1987; Stufflebeam & Shinkfield, 1985). The goal-based process forms only part of the overall evaluation process, unlike in the past when researchers used one preferred methodological principle to assess the degree to which training had attained its goals. With the availability of a wider range of philosophical principles and scientific methodologies, many social scientists emphasized scientific rigor in their evaluation models, and this is reflected in their definition of the field (Junaidah, 2001). The evaluation models of these social scientists primarily involve the application of scientific methodologies to study the effectiveness of programs. These evaluators emphasized the importance of experimental designs (Goldstein & Ford, 2002), quantitative measures (Rossi & Freeman, 1993) and qualitative assessment (Wholey, Hatry & Newcomer, 1994). Contemporary social scientists such as Cascio (1989), Mathieu and Leonard (1987), Morrow, Jarrett and Rupinski (1997) and Tesoro (1998) even adopted utility analysis in evaluating the worthiness and effectiveness of programs.
In brief, the concept of evaluation has two distinct definitions: congruent and contemporary (Junaidah, 2001). The congruent definition is more concerned with meeting the desired objectives. It is a process of collecting information, judging the worth or value of the program and ensuring that training objectives are met. The contemporary definition of evaluation places emphasis on scientific investigation to facilitate decision-making. Stufflebeam (1971) mentioned that evaluation is the process of delineating, obtaining and providing useful information for judging decision alternatives. This can be seen in the evolution from the models of the early 1970s to the current contemporary evaluation models.
1.3 Approaches to Training Evaluation
Evaluation in its modern form has developed from attempts to improve the educational process (Bramley, 1996). Evaluating the effectiveness of people became popular at about the same time as scientific management, and school officials began to see the possibility of applying these concepts to school improvement (Bramley, 1996). Tyler's (1949) model is generally considered an early prominent evaluation model; it was designed to compare the value of progressive high-school curricula with that of more conventional ones (Stufflebeam & Shinkfield, 1985).
Tyler (1949) introduced the Basic Principles of Curriculum and Instruction, which is organized around four main concerns:
What educational purposes should the organization seek to attain?
How to select learning experiences that are likely to be useful in achieving these purposes?
How can the selected learning experiences be organized for effective instruction?
How can the effectiveness of these learning experiences be evaluated?
Tyler laid the foundation for an objective-based style of evaluation. Objectives were seen as critical because they were the source for planning, guiding the instruction and preparing the test and measurement procedures. Tyler's objective-based evaluation model concentrates on clearly stated objectives, shifting evaluation from the appraisal of students to the appraisal of programs. He defined evaluation as assessing the degree of attainment of the program objectives. Decisions made on any program had to be based on the congruence between the objectives and the actual outcomes of the program (Stufflebeam & Shinkfield, 1985).
1.3.1 Discrepancy Evaluation Model
The Discrepancy Evaluation Model, developed by Provus (1971), is used in situations where a program is examined through its development stages with the understanding that each stage (which Provus defines as design, installation, process, product and cost-benefit analysis) is measured against a set of performance standards (objectives). The cost-benefit analysis identifies the potential benefits of the training before it is carried out. The expected behaviours which result from the training are agreed upon between the trainer and the trainees. The analysis also establishes training objectives, which are defined as changes in work behaviour and increased levels of organizational effectiveness (Bramley & Kitson, 1994). The program developers have certain performance standards in mind regarding how the program should work and how to identify whether it is working. The discrepancies observed between the standards and the developed design are communicated back to the relevant parties for review or further corrective action. A discrepancy evaluator's role is to determine the gap between what is and what should be. This model helps evaluators make decisions based on the difference between preset standards and what actually exists (Boulmetis & Dutwin, 2000).
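The comparison of preset standards with what actually exists can be illustrated with a minimal sketch. It is only an illustration under stated assumptions: the numeric scores, the `StageResult` class and the `report_discrepancies` helper are hypothetical and do not come from Provus (1971); only the stage names are taken from the description above.

```python
from dataclasses import dataclass


@dataclass
class StageResult:
    """One program stage with a hypothetical numeric standard and observed score."""
    stage: str       # e.g. "design", "installation", "process", "product"
    standard: float  # preset performance standard (what should be)
    actual: float    # observed performance (what is)

    @property
    def discrepancy(self) -> float:
        # Positive value: the program falls short of the standard.
        return self.standard - self.actual


def report_discrepancies(results, tolerance=0.0):
    """Return (stage, gap) pairs for stages that fall short by more than `tolerance`."""
    return [(r.stage, r.discrepancy) for r in results if r.discrepancy > tolerance]


# Hypothetical scores, invented purely for illustration.
results = [
    StageResult("design", 80, 85),
    StageResult("installation", 90, 70),
    StageResult("process", 75, 75),
]
flagged = report_discrepancies(results)
print(flagged)
```

In this sketch only the installation stage is flagged, since it is the only stage whose observed score falls below its standard; such gaps are what the evaluator would communicate back for review or corrective action.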
Provus's Discrepancy Evaluation Model can be considered an extension of Tyler's earlier objective-based model, in which a set of performance standards must be derived to serve as the objectives on which the evaluation of the program is based. Furthermore, the model may also be viewed as having properties of both formative and summative evaluation (Boulmetis & Dutwin, 2000). The design stage comprises the needs analysis and program planning stages; installation and process are parts of the implementation stage, where formative evaluation is done; and the product and cost-benefit analysis stages comprise a summative evaluation stage.
Formative evaluation focuses on the process criteria to provide further information for understanding the training system so that the intended objectives are achieved (Goldstein & Ford, 2002). Brown, Werner, Johnson and Dunne (1999) note several potential benefits of formative evaluation. The program can be assessed halfway through to see whether it is on track and effectively performed, and whether the activities are meeting the needs of the training. The evaluator determines the extent to which the program is running as planned, measures the program's progress in attaining the stated goals, and provides recommendations for improvement. The evaluation findings in these reports and the monitoring data could be used to end a program in midstream (Goldstein & Ford, 2002). Unlike formative evaluation, summative evaluation is fairly stable and does not allow adjustments during the program cycle. Summative evaluation involves evaluating and determining whether the program has experienced any unplanned effects. It helps organizational decision makers decide whether to use the program again or improve it in some way. Campbell (1988) discriminates between two types of summative evaluation. The first simply questions whether a particular training program produces the expected outcome. The second compares and investigates the benefits and viability of programmed instruction procedures. By comparing the two evaluations, it was found that programmed instruction produces quicker mastery of the subject, but the eventual level of learning retention is the same with either technique (Campbell, 1988).
Provus's Discrepancy Evaluation Model provides information for establishing measures of training success by determining whether the actual content of the training material would develop knowledge, skill and ability (KSA) and eventually lead to successful job performance. However, too many subjective issues exist, especially in setting up the performance criterion. The chosen criterion is based on the relevance of three components: knowledge, skill and ability, which are necessary to succeed in the training and eventually on the job. Considering that modern approaches to assessing training programs must be examined with a multitude of measures, including participant reactions, learning, performance, and organizational objectives, it is necessary for training evaluators to view the performance criteria as multidimensional (Goldstein & Ford, 2002). Training can best be evaluated by examining many independent performance dimensions. However, the relationship between measures of success should be closely scrutinized because the inconsistencies that occur often provide important insights into training procedures (Goldstein & Ford, 2002). Decisions and feedback processes depend on the availability of all sources of information. There are many different dimensions along which the performance criteria can vary. Issues like the relevance and reliability of the criterion are important to consider should one wish to adopt the discrepancy evaluation model. There are several considerations in the evaluation of the performance criteria. These include acceptability to the organization, networks and coalitions that can be built between trainees, and realistic measures (Goldstein & Ford, 2002).
Responsive approaches used in the goal-free model are better evaluative approaches, as there is considerable variation in what the objectives of a program are thought to be. Responsive approaches are a form of action research which involves the stakeholders in the data collection process (Bramley, 1996). The intention is not to attribute causality, but to gain a sense of the value of the program from different perspectives. The term "responsive evaluation" was first used by Stake (1977) to describe a strategy in which the evaluator is less concerned with the objectives of the program than with its effects in relation to the concerns of interested parties, namely the stakeholders.
The responsive approach involves protracted negotiations with a wide range of stakeholders in constructing the report. It is thus more likely to reflect their reality and be useful for them. However, the underlying philosophy of responsive evaluation is different from that of the goal-based approach. Evaluators are seen as subjective partners, and the evaluation is based upon a joint collaborative effort which results in findings being constructed rather than revealed by the investigation. Truth is a matter of consensus among informed parties. Facts have no meaning except within some value framework. Phenomena can only be understood in the context in which they are studied; generalization is not possible.
The suggested method intends to achieve progressive focus by giving more attention to emerging issues rather than seeking the truth. Legge (1984) introduced a model similar to goal-free evaluation which evaluates planned organizational change. The evaluation is a joint, collaborative process, which results in something constructed rather than revealed by the investigation. Legge (1984) suggests that instead of attempting evaluation as thoroughly monitored research, a contingency approach should be adopted. The contingency approach is used to decide which approach is more appropriate or best matches the functional requirements of the evaluation exercise. Campbell (1988) revealed that the internal validity of the scientific approach may not be so crucial. To increase internal validity, the legitimate stakeholders should agree on the evaluation approach. The emphasis on internal validity in the scientific approach will frequently imply controlling key aspects of the context and many organizational variables. This may lead to rather simplified information which clients find difficult to use because it does not reflect their perception of organizational reality. Due to this strong bipolarity between practitioners and academics, not many responsive evaluations have been described in the training literature (Bramley, 1996).
1.3.2 Transaction Model
The Transaction Model developed by Stake (1977) affords a concentration of activity among the evaluator, participants and the project staff (Madaus, Scriven & Stufflebeam, 1986). This model combines monitoring with process evaluation through regular feedback sessions between evaluator and staff. The evaluator uses a variety of observational and interview techniques to obtain information, and the findings are shared with all the relevant parties to improve the overall program. The evaluator participates in and contributes to project activities. Besides striving for objectivity, evaluators also draw on subjective judgement in the transaction model. This model may have a goal-free or a goal-based orientation. Findings are shared with the staff of all the projects in order to improve both individual and overall projects (Boulmetis & Dutwin, 2000).
1.3.3 Goal-Free Model
Unlike early models, the goal-free model developed by Michael Scriven involves methodological studies and processes (Popham, 1974). The evaluation model examines how the program is performing and how the program could address the needs of the client population. Program goals are not the criteria on which evaluation is based. Rather, it is a data-gathering process which studies actual happenings and evaluates the effectiveness of the program in meeting the client's needs. The evaluator has no preconceived notions regarding the outcome of the program (as opposed to the goal-based model). Categories of evaluation naturally emerge from the evaluator's actual observation. Once the data have been collected, the evaluator attempts to draw conclusions about the impact of the program in addressing the needs of the stakeholders.
However, this model has its weakness in terms of its subjective measures. Some hold that the evaluator must be an expert in the relevant field, while others say that no expertise is better (Rossi & Freeman, 1993). Some researchers said that an evaluator who is not familiar with the nuances, ideologies and standards of a particular professional area will presumably not be biased when observing and collecting data on the activities of a program. They maintain, for example, that a person who is evaluating a program to train dental assistants should not be a person trained in the dental profession. But other researchers allege that a person who is not aware of the nuances, ideologies and standards of the dental profession may miss a good deal of what is important to the evaluation. Both sides agree that an evaluator must attempt to be an unbiased observer, be adept at observation and be capable of using multiple data collection methods (Wholey, Hatry & Newcomer, 1994). This is a topic of debate among many experts. Scriven suggested using two goal-free evaluators, each working independently, to address the preconceived issues and reduce possible bias in evaluation (Scriven, 1991).
A study by O'Leary (1972) illustrates the importance of considering other dimensions of the criteria. She used a program of role-playing and group problem-solving sessions with hard-core unemployed women. At the conclusion of the program, the trainees had developed positive changes in attitude toward themselves. However, these changes did not extend to their attitudes toward their tedious and structured jobs, which remained negative. These trainees apparently raised their levels of aspiration and subsequently sought employment in a working setting consistent with their newly found expectations. It was obvious that the trainees were leaving the job as well as experiencing positive changes in attitude. However, there are many other cases in which the collection of a variety of criteria related to the objectives is the only way to effectively evaluate the training program (Goldstein & Ford, 2002). This has caused goal-based evaluation to lose ground during the last 20 years because of the growing conviction that evaluation is actually a political process and that the various values held in society are not represented by an evaluative process which implies that a high degree of consensus is possible (Bramley, 1996).
Further studies by Parlette and Hamilton (1977) rejected the classical evaluation system, which focuses on an objective reality assumed to be equally relevant to all stakeholders, and acknowledged the diversity posed by different interest groups. They suggested "illuminative evaluation", which is concerned with description and interpretation rather than with measurement and prediction.
1.3.4 Systemic Evaluation
Systemic evaluation analyses the effectiveness of the whole system and enhances the interfaces between the sub-systems in such a way as to increase the effectiveness of the system. That is what the "system approach" sets out to do (Rossi & Freeman, 1993). The most comprehensive purpose of systemic evaluation is to find out to what extent training has contributed to the business plans of various parts of the organization and to consider whether the projected benefits outweigh the likely cost of training.
The main questions, which this strategy sets out to answer, are (Bramley, 1996): Is the program reaching the target population?
Is it effective? How much does it cost?
Is it cost effective?
These questions are meant to elicit facts about the program (for example, by defining the size of the target population and working out the proportion that has attended the training) rather than opinions on whether useful learning has taken place. Effectiveness is difficult to measure, as the word may mean different things to different people. However, the model seems to measure the quantity rather than the quality of what is being done.
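The factual side of these questions reduces to simple arithmetic, which the following minimal sketch illustrates. The function names and all figures are hypothetical and chosen only for illustration; they are not taken from Bramley (1996) or from any data in this portfolio.

```python
def coverage(attended: int, target_population: int) -> float:
    """Proportion of the target population that has attended the training."""
    if target_population <= 0:
        raise ValueError("target_population must be positive")
    return attended / target_population


def cost_per_trainee(total_cost: float, attended: int) -> float:
    """Average training cost for each person who attended."""
    if attended <= 0:
        raise ValueError("attended must be positive")
    return total_cost / attended


# Hypothetical figures: 200 eligible staff, 150 attended, total cost 30,000.
reach = coverage(attended=150, target_population=200)
unit_cost = cost_per_trainee(total_cost=30_000, attended=150)
print(f"Coverage: {reach:.0%}, cost per trainee: {unit_cost:.2f}")
```

Both figures describe how much training was delivered and at what cost; neither says anything about whether useful learning took place, which is the limitation noted above.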
In the system analysis model, the evaluator looks at the program in a systematic manner, studying the input, throughput and output (Rivlin, 1971).
Inputs are elements that come into the system (i.e. clients, staff, facilities and resources). Throughput consists of things that occur as the program operates, for example, activities, client performance, staff performance, and adequacy of resources such as money, people and space. Output is the result of program-staff effectiveness, adequacy of activities, etc. The evaluator mainly examines program efficiency in light of these categories.
1.3.5 Quasi-Legal Approach
Quasi-legal evaluation operates in the manner of a court of inquiry. Witnesses are called to testify and tender evidence. Great care and attention are taken to hear a wide range of evidence (opinions, values and beliefs) collected from the program. This approach is basically used to evaluate social programs rather than to formally evaluate training or development activities. Quasi-legal evaluation was reported to be flawed by Porter and McKibbin (1988) in the area of management education in the USA. The substantial information received from stakeholders was analysed by a small group of professors from a business school. The students were basically satisfied with the qualification they had obtained and found the course worthwhile and useful. However, the researchers criticized the fact that young graduates who attend MBA courses have never worked in an organization and thus do not understand the sort of issues which should be the basic discussion material of MBA courses. A similar problem arose with Constable and McCormick's (1987) report on the demand for and supply of management education and training in the UK. The researchers found that judgement by insufficiently impartial judges in the quasi-legal approach may be irrelevant, biased or inconclusive (Bramley, 1996).
1.3.6 Art Criticism Model
In the Art Criticism Model developed by Eisner (1997), the evaluator is a qualified expert in
the nuances of the program and becomes the expert judge of the program's operation. The success of this model depends heavily upon the evaluator's judgment. The intended outcome
may come in the form of critical reflection and/or improved standards. This model could be
used when a program wishes to conduct a critical review of its operation prior to applying for funding or accreditation.
1.3.7 Adversary Model
In Owen's Adversary Model, the evaluator facilitates a jury that hears evidence from
individuals on particular program aspects (Madaus, Scriven & Stufflebeam, 1986). The jury uses multiple criteria to "judge" evidence and make decisions on what has happened. This model can be used when there are different views of what is actually happening in a program, such as arguments for and against program components.
1.3.8 Contemporary Approaches - Stufflebeam's Improvement-Oriented Evaluation (CIPP) Model, 1971
Stufflebeam considers that the most important purpose of evaluation is not to prove but to improve (Stufflebeam & Shinkfield, 1985). The four basic types of evaluation in this model are context (C), input (I), process (P) and product (P). Context evaluation defines the relevant environment and identifies training needs and opportunities of specific problems. Input evaluation provides information to determine how resources can be used most efficiently to meet program objectives. The results of input
evaluation are often seen as policies, budgets, schedules, proposals and procedures. Process evaluation provides feedback to individuals responsible for implementation. It is accomplished through providing information for preplanned decisions during implementation and describing what actually occurs. This includes reaction sheets, rating scales and content analysis. Ultimately, product evaluation measures and interprets the attainment of program
goals. Contemporary approaches could take place both during and after the program with the aim to improve program evaluation by expanding the scope of evaluation through its four basic types of evaluation (Madaus, Scriven & Stufflebeam, 1986). The CIPP model was conceptualized as a result of attempts to evaluate projects that had been
funded through the Elementary and Secondary Education Act of 1965 (Stufflebeam, 1983). To conduct CIPP model evaluation, the evaluator needs to design preliminary plans and deal with a wide
range of choices pertaining to evaluation. This requires collaboration between clients and evaluators as a primary source for identifying the interest of the various stakeholders.
1.3.9 Cervero's Continuing Education Evaluation, 1984
In his book "Effective continuing education for professionals", Cervero suggested seven categories of evaluation questions organized around seven criteria to determine whether the programs were worthwhile (Cervero, 1988). The seven criteria are (a) program design and implementation, (b) learner participation, (c) learner satisfaction, (d) learner knowledge, skills and attitudes, (e) application of learning after the program, (f) impact of application of learning and (g) program characteristics associated with outcomes. Program design and implementation is concerned with what was planned, what was actually
implemented and the congruence between the two. Factors such as the activities of learners and instructors and the adequacy of the physical environment for facilitating learning are common questions which are asked in this category.
Learner participation has both quantitative and qualitative dimensions. The quantitative dimension deals with evaluative questions that are most commonly asked in any formal
program. The data is not used to infer answers in the other categories. Qualitative data is collected in an anecdotal fashion by unobtrusively observing the proceedings of the educational activities.
Learner satisfaction is concerned with the participants' reaction and is collected according to various dimensions, such as content, educational process, instructor's performance, physical environment and cost.
Learner knowledge, skills and attitudes focus on changes in the learner's cognitive, psychomotor and affective goals. Normally, the evaluator will adopt a pen and paper test to
judge the effectiveness of these categories. Application of learning addresses the degree of skill transfer to the actual work place. The impact of application of learning focuses on the second-order effects, which means the transfer and impact on the public (Cervero, 1988).
Program characteristics are associated with the outcome of the program. There are two kinds of evaluative questions: the implementation questions and the outcome questions. Implementation questions are useful for determining what happened before and during the program. Outcome questions are useful for determining what occurred as a result of the program.
The seven categories in this model are not viewed as a hierarchy (Junaidah, 2001). Cervero's ideas have several antecedents in the evaluation literature. His framework was influenced by
Kirkpatrick's (1959) and Tyler's (1949) models. It is considered to be a comprehensive model as it covers all the stages from program design to outcome. However, this evaluation model may be viewed as too tedious to implement due to its complexity. The author is too immersed in getting facts about the entire process and ignores the efficiency of the whole evaluation process. This makes the model more summative than formative in nature.
1.3.10 The Kirkpatrick Model, 1959a, 1959b, 1960a, 1960b, 1976, 1979, 1994, 1996a, 1996b, 1998
One of the most widely used models for classifying the levels of evaluation, used by Barclays Bank PLC, Reeves in 1996 and others, was developed by Kirkpatrick. His model looks at four levels of evaluation, from the basic reaction of the participants to the training to its impact on the organization. The intermediary levels examine what people learned from the
training and whether learning has affected their behaviour on the job. Level one (Level 1) concerns itself with the most immediate reaction of participants and is easily measured by
simple questionnaires after the training. Level two (Level 2) is harder to measure and is concerned with measuring what people understood and how they were able to demonstrate
their learning in the work environment. Level two (Level 2) can be measured by pen and paper tests or through job simulations. Level three (Level 3) looks at the changes in people's behaviour towards the job. For example, after a writing skills course, did the individual make fewer grammatical and spelling errors and were their memos easier to understand? Level
four (Level 4) measures the "result" gained from the training. It focuses on the impact of the training on the organization rather than the individual.
Kirkpatrick (1959) developed this coherent evaluation model by producing what was thought to be a hierarchical system of evaluations, which indicates effectiveness through:
Level 1 (Reaction)
Level 2 (Learning)
Level 3 (Behaviour)
Level 4 (Results)
Kirkpatrick's (1994) Training Evaluation Model
Reaction: How did the participants react to the training?
Learning: What information and skills were gained?
Behaviour: How have participants transferred knowledge and skills to their jobs?
Results: What effect has training had on the organization and the achievement of its objectives? (Timely and quality performance appraisals are a corporate goal.)
Kirkpatrick was the first researcher to develop a coherent evaluation strategy by producing what was thought to be a hierarchy of evaluations, which would indicate benefit (Plant & Ryan, 1994).
Level 1: Reaction Evaluation
Kirkpatrick proposed the use of a post-course evaluation form to quantify the reactions of
trainees. Evaluation at this level is associated with the terms "happiness sheet" or "smile sheet" because reaction information is usually obtained through a participatory questionnaire administered near or at the end of a training program (Smith, 1990). Studies on evaluation mechanisms have shown that such evaluation sheets are not held in high esteem, despite their general use by trainers of many organizations and in institutions
of higher learning (Bramley, 1996; Clegg, 1987; Love, 1991; Rae, 1986). Clegg (1987) found that training evaluation was conducted for 75 percent of training programs done in
organizations. A study by Dawson (1993) found that Level 1 evaluation sheets were ubiquitous.
Level 2: Learning Evaluation
The learning level is concerned with measuring the learning of the principles, facts, techniques and skills presented in a program (Kirkpatrick, 1994). Tyler (2002) found that 32 percent of companies in America have carried out post-training evaluation at Level 2.
Another study conducted by Mathews, Ueno, Kekale, Repka, Pereira and Silva (2001) on 450 companies in the UK, Portugal and Finland, which focused on training quality and training evaluation, showed that 40 percent of UK companies, 31 percent of Finnish companies and 51 percent of Portuguese companies conduct formal assessment of learning of the principles, facts, skills and attitudes which were specified as training objectives. This level evaluates the knowledge, skills development and attitudinal changes that have
taken place. Examination of both knowledge and attitudinal outcomes is important to increase coverage of training impacts because the pattern of change can vary between the pre-test and post-test (Basadur, Graen & Scandura, 1986; Kraiger, Ford & Salas, 1993). Researchers either assessed change before and after a program (Basadur et al., 1986; Bretz & Thompsett, 1992), or looked merely at the post-training attainment score
(Davis & Mount, 1984; Warr & Bunce, 1995). Measures of learning should be objective, with quantifiable indicators of how new requirements are understood and absorbed. This data is used to confirm that participant learning has occurred as a result of the training initiative (Phillips & Stone, 2002).
Level 3: Behavioural Evaluation
Job performance after training is referred to as behaviour by Kirkpatrick (1959, 1976)
and transfer by Alliger, Tannenbaum, Bennett, Traver and Shotland (1997). Level 3 evaluates the extent to which the "transfer" of knowledge, skills and attitudes has
occurred. Tyler (2002) reported that only 9 percent of American industries have carried out post-training evaluation at this level. The focal point is on performance at work after a program. It is essential to record before and after performance, but sometimes self-reports are obtained if information is unavailable to an evaluator (Wexley & Baldwin, 1986). Level 3 determines the extent of change in behaviour that has taken place and how this behaviour would be transferred to the workplace. It further encourages one to take into account the possible factors in the job environment that could prevent the application of the newly learned knowledge and skills, since a positive climate is important for transfer.
Level 4: Results Evaluation
The evaluation of a particular training program becomes more complex as one progresses through each level of the Kirkpatrick model. Results can be defined as the final results that occurred because the participants attended the training program. This includes increased production, improved quality, increased sales and productivity, higher profits and return on investment. Level 4 evaluation observes changes in the performance criteria (i.e. key results areas) of organizational effectiveness. This level anticipates the gains the organization can expect from a training event. This level of evaluation is made more difficult as organizations often demand that the explanation be given in financial terms with measurable quantifiers (Redshaw, 2001).
Since Kirkpatrick's first ideas were published in 1959, much debate has been recorded on this model. Despite criticism, the Kirkpatrick model is still the one most generally accepted by academics (Blanchard & Thacker, 1999; Dionne, 1996; Kirkpatrick, 1996a; 1996b; 1998; Phillips, 1991). However, research conducted in the United States has
suggested that US organizations generally have not adopted all of Kirkpatrick's 4-level evaluation (Geber, 1995; Holton, 1996). This is especially true for the last two, more
difficult, levels of Kirkpatrick's hierarchy (Geber, 1995). In a survey of training in the USA, Geber (1995) reported that for companies with 100 or more employees, only 62 percent
assessed behavioural change. Geber's (1995) results also indicated that only 47 percent of US companies assess the impact of training on organizational outcomes. This poses a good
research question about the model's methodology and it forms the basis for epistemological studies around the methodology.
Kirkpatrick's work has received a great deal of attention within the field of training evaluation (Alliger & Janek, 1989; Blanchard & Thacker, 1999; Campion & Campion, 1987; Connolly, 1988; Dionne, 1996; Geber, 1995; Hamblin, 1974; Holton, 1996; Kirkpatrick, 1959; 1960; 1976; 1979; 1994; 1996a; Newstrom, 1978; Phillips, 1991). His concept calls
for four levels of evaluation namely reaction, learning, behaviour and results. His four levels of training effectiveness stimulated a number of supportive and conflicting models of varying
levels of sophistication (Alliger & Janek, 1989; Campion & Campion, 1987). There are models and methods that incorporate financial analyses of training impact (Swanson &
Holton, 1999). However, Warr, Allan and Birdi (1999) conducted a longitudinal study of the first three levels of training evaluation. The study correlated the following: relationships between evaluation levels, individual and organizational predictors of each level and the
differential predictions of attainment vs change score. The study showed that immediate and delayed learning were predicted by the trainee's motivation, confidence and use of learning
strategies. The researchers highlighted that it is preferable to measure training outcomes in terms of change from pre-test to post-test, rather than merely through attainment (post-test) scores (Warr, Allan & Birdi, 1999).
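The distinction drawn above between an attainment (post-test) score and a change score can be made concrete with a short sketch. The test scores below are hypothetical and are not taken from Warr, Allan and Birdi (1999); the function name `change_scores` is likewise an illustrative assumption.

```python
from statistics import mean


def change_scores(pre, post):
    """Return each trainee's change score (post-test minus pre-test)."""
    if len(pre) != len(post):
        raise ValueError("pre and post must cover the same trainees")
    return [after - before for before, after in zip(pre, post)]


# Hypothetical knowledge-test scores for four trainees (illustration only).
pre_test = [40, 55, 70, 65]
post_test = [75, 80, 72, 90]

# Attainment looks only at where trainees ended up.
mean_attainment = mean(post_test)

# Change looks at how far each trainee moved from the pre-test baseline.
mean_change = mean(change_scores(pre_test, post_test))
```

In this illustration the third trainee has a high attainment score (72) but a change of only 2 points, which is the kind of difference an attainment-only measure would hide.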
A review of the most popular procedures used by US companies to evaluate their training programs showed that over half (52 percent) use assessments about participants' satisfaction
with the training. Seventeen percent assessed application of the trained skills to the job and 13 percent evaluated changes in organizational performance following the training. Five percent tested for skill acquisition immediately after training, while 13 percent of American companies carried out no systematic evaluation of their training programs (Mann & Robertson, 1996). Many of these procedures reflect Kirkpatrick's four levels of reaction, learning, behaviour and results, which will be discussed further.
More than 50 of the evaluation models available use the framework of the Kirkpatrick model (Phillips, 1991). Currently, the majority of employee training is evaluated at Level 1. Evaluation at Level 1 is associated with the terms smile sheet or happiness sheet, because reaction information is usually obtained through a participatory questionnaire administered near the end or at the end of a training program (Smith, 1990). The smile sheet or happiness sheet specifically captures enjoyment of the training, perceptions of its usefulness and its perceived difficulty (Warr & Bunce, 1995).
Phillips and Stone (2002) enhanced the popularity of the Kirkpatrick model by inserting a fifth level into the existing 4-level model, though they further argued the inadequacy of this model in capturing the return on investment aspect of the training outcome. Phillips and Stone's (2002) 5-level evaluation model was seen as an extension of Kirkpatrick's 4-level evaluation model, as different companies have their own definition of payoffs to measure the
training results. Return on investment compares the training's monetary benefits with the cost of the training, so that the true value of the training to the organization can be assessed. Converting data to monetary values is the first phase in putting training initiatives on the
same level as other investments that organizations make (Phillips, 2002). It cannot be used to cover other variables that may affect the results (e.g. culture, productivity). Kirkpatrick (1994) refuted this idea by claiming that there are many ways to measure training results. This raises the question of whether training evaluation should be treated only as a measure of financial benefits. Lewis and Thornhill (1994) are of the opinion that there should be 5 levels of evaluation, measuring the training effects on the department (i.e. Level 4) and its effects on
the whole organization (i.e. Level 5). Lewis and Thornhill (1994) emphasized the need to look at the values and culture of the organization as variables for measuring training effectiveness.
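The comparison of monetary benefits with training costs described above can be sketched as a small calculation. The formula used here, net benefits divided by costs and expressed as a percentage, is the formulation commonly associated with return-on-investment evaluation; the source text does not state it explicitly, so it should be read as an assumption. The function names and figures are hypothetical.

```python
def benefit_cost_ratio(monetary_benefits, training_costs):
    """Ratio of monetary benefits to training costs."""
    if training_costs <= 0:
        raise ValueError("training_costs must be positive")
    return monetary_benefits / training_costs


def roi_percent(monetary_benefits, training_costs):
    """Net benefits as a percentage of training costs (assumed formula)."""
    if training_costs <= 0:
        raise ValueError("training_costs must be positive")
    net_benefits = monetary_benefits - training_costs
    return net_benefits / training_costs * 100


# Hypothetical figures: a program costing 100,000 that yields 150,000
# in benefits already converted to monetary values.
example_bcr = benefit_cost_ratio(150_000, 100_000)
example_roi = roi_percent(150_000, 100_000)
```

Such a calculation depends entirely on the earlier step of converting training outcomes to monetary values, which is where variables such as culture are difficult to capture.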
In recent times others have tried to make the system easier to deal with. Warr et al. (1999) came up with the context, input, reaction and outcome (CIRO) evaluation system, with the context part going some way towards front-loading the evaluation and partly towards mirroring the Kirkpatrick model. Dyer (1994) proposed an evaluation system that suits all organizations, irrespective of size or diversity of operation. It is a system that is relatively easy to come to terms with and can be implemented at all the hierarchical stages of an
organization. It fits the individual and it fits the whole organization. The system puts
Kirkpatrick's evaluation system against a mirror. The benefits of using Kirkpatrick's Mirror should be self-evident to anyone involved in management. Application of the paradigm allows the individual to become more business focused, and if adopted universally should provide efficient and effective training throughout any organization (Dyer, 1994).
A different model was used in a study by Shireman (1991) on the evaluation of a hospital-based health education program. The study adopted the CIPP model in examining the type of evaluation which was being conducted in the hospital. A structured questionnaire was sent to a stratified random sample of 160 hospitals of four different sizes in four mid-western states. The results showed that 48 percent of the respondents reported that product evaluations were usually done and less than 25 percent reported that other types of evaluation (i.e. context, input, process) were done. The product evaluation is outcome-based and quite similar to Kirkpatrick's end process evaluation. Both types of evaluation require appropriate data collection activities.
The Kirkpatrick model was used by most researchers as an initial framework for generating evaluation models. This paper addresses the methodological issues surrounding the taxonomy of the Kirkpatrick model as an area for epistemological study. The theoretical and empirical literature on the Kirkpatrick model will be critically evaluated and further research opportunities will be outlined.
1.4 Critical Review
Phillips (1991) concluded that out of more than 50 evaluation models available, the evaluation framework that most training practitioners used is the Kirkpatrick model. Though the model seems to have weathered well, it has also limited our thinking on training evaluation and possibly hindered our ability to conduct meaningful training evaluation (Bernthal, 1995). More than ever, training evaluation must demonstrate improved performance and financial results. But in reality, according to Garavaglia (1993), training evaluation often assessed only whether the immediate objectives had been met; specifically, how many items were answered correctly on the post-test. Some based their evaluation only on trainee reaction, the first level of the Kirkpatrick model developed in 1959 (Brinkerhoff, 1988). Such information gives organizations no basis for making strategic business decisions (Davidove & Schroeder, 1992). Most practitioners are familiar with Kirkpatrick's 4-level evaluation model but many never seem to get beyond Levels 1 and 2 (Regalbutto, 1992). Numerous organizations have adapted the model presented by Kirkpatrick to suit their own situations; this solution seems to have caused the growth of generic models (Dyer, 1994).
Kirkpatrick called for a definite approach to the evaluation model. All 4 levels must be measured to ensure effectiveness of the whole evaluation system since each level provides different kinds of evidence.
This view was supported by Hamblin (1974), who suggested that reaction leads to learning and learning leads to change in behaviour, which subsequently leads to changes in the organization. He further stated that the chain can be broken at any link and that a positive reaction is necessary to create positive learning. According to Bramley and Kitson (1994), there is not much evidence to support this linkage. Further research carried out by Alliger and Janek (1989) found only 12 articles which attempted to correlate the various levels advocated by Kirkpatrick. Although there are problems of external validity with such a small data set, the tentative conclusion was that there was no relationship between reaction and the other three levels of evaluation criteria. A correlation study run on these four levels of evaluation showed insignificant results. A literature search based on Kirkpatrick's name yielded 55 articles, but only 8 described evaluation results and none described correlations between levels (Toplis, 1993). It was concluded that good reactions did not predict learning, behaviour or results.
A series of industrial surveys conducted in the last 30 years shows little application of all 4 levels of the Kirkpatrick model. Surveys conducted since 1970 showed that most industrial trainers rely on student reaction, fewer on tests of learning and almost none on tests of application and benefit (Brandenburg, 1982; Plant & Ryan, 1994; Raphael & Wagner, 1972). In the last 20 years, a number of writers claimed to have performed a full Kirkpatrick evaluation; however, the linkages described in connecting the training event with the outcome are subjective and tenuous (Salinger & Deming, 1982; Sauter, 1980).
A survey conducted by the Bureau of National Affairs and American Society of Training and Development (ASTD) in 1969 using questionnaires indicated that most of the companies conducted Level 1 evaluation and unsystematic approaches to Level 2 evaluation (Raphael &
Wagner, 1972). The survey indicated that problems of evaluation at higher levels were mainly due to a lack of understanding of the approach used. The Kirkpatrick model seems to offer a one-size-fits-all solution to measure training effectiveness. However, there has been little evidence of the contribution or reliability of this model despite great industrial emphasis in this area.
The Kirkpatrick model focuses mainly on immediate outcomes rather than the process leading to the results. The following questions were never successfully addressed, even though the improvement of these processes is the main force of effectiveness (Murk, Barrett & Atchade, 2000):
How well a person's motivation level affects the learning behaviour
The degree of superiors' support after the training
The extent to which training interventions were appropriate for meeting needs
Longer-term effects of the training, the pay-off in determining a course's overall impact and cost-effectiveness
The conduciveness of the training environment
An empirical study by Warr, Allan and Birdi (1999) showed that external processes like increasing confidence and motivation levels of trainees as well as use of certain learning
strategies are important contributing factors towards training effectiveness. A 2-day training course was studied on 23 occasions over a 7-month period in the Institute of Work Psychology, UK. Technicians who attended the training courses which involved operating electronic tools were asked to complete a knowledge test questionnaire on arrival and at the end of the course. A follow up questionnaire was mailed to the trainees one month later.
More than 70 percent of the respondents returned the questionnaire. The questionnaire was designed to capture what the researchers defined as third factors (i.e. confidence, perception, motivation, learning strategies, age, etc). The results showed a non-significant correlation between reactions towards the course and job behaviour. Perceptions of course difficulty were significantly negatively associated with frequency of use of equipment. Correlations between Level 2 and Level 3 evaluations were small. Learning scores and changes in those scores (Level 2) were strongly predicted by trainees' specific reactions to the course, but those reactions were not significantly associated with later job behaviour (Level 3) (Warr, Allan & Birdi, 1999).
Alliger and Janek (1989) carried out a meta-analysis of studies where reaction measures had been related to measures of learning (11 studies) and changes in behaviour (9 studies). They found that positive reactions did not predict learning gains better than negative ones (the average correlation between reactions and amount of learning was .02), nor were they any better at predicting changes in behaviour after the program (the average correlation was .07).
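The correlations reported above are coefficients of the kind computed in the sketch below, which relates reaction ratings to learning gains using Pearson's r. The data are hypothetical and were chosen so that the coefficient is zero; they are not the data analysed in the meta-analysis, and the function name `pearson_r` is an illustrative assumption.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cross / (spread_x * spread_y)


# Hypothetical Level 1 reaction ratings and Level 2 learning gains.
reaction_ratings = [1, 2, 3, 4]
learning_gains = [2, 1, 1, 2]

r_reaction_learning = pearson_r(reaction_ratings, learning_gains)
```

A coefficient near zero, as in this illustration, means that knowing a trainee's reaction rating says almost nothing about how much that trainee learned.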
Bramley and Kitson (1994) asserted that measuring learning is problematic because designing a reliable measuring instrument is difficult and the necessary skills are often not available. Grove and Ostroff (1990) pointed out that training directors often do not possess the essential
skills to conduct training evaluation. This could be part of the reason why companies are reluctant to evaluate their training effectiveness.
Though Kirkpatrick's traditional assessment methods were widely used for Level 1 and 2 evaluations, the benefits of collecting data at each level are unclear. This uncertainty may result in organizations failing to evaluate training completely or selecting forms of evaluation that may not be reliable. Inadequacies of the Kirkpatrick model at each level force one to look for other possible measures. Therefore, one may argue that to make the Kirkpatrick model definite, a more detailed assessment method must be conducted at each level to ensure practicality, validity and applicability (Mann & Robertson, 1996).
Mann and Robertson (1996) undertook to investigate the utility of various methods used in
evaluating training programs. Twenty-nine subjects were selected from a three-day training seminar for the European National Run in Geneva, Switzerland. The seminar was a computer training event (on e-mail and the Internet) for youth workers, and trainees were asked to complete training evaluation forms before and after the training program and by post one
month later. Sixteen people returned this final questionnaire. Each questionnaire contained three sets of questions designed to measure knowledge, attitudes and self-efficacy. The results cast doubt on the value of the data received from the reaction and learning levels.
Recommendations were made based on the following findings:
Measuring learning (Level 2) as a method of evaluating training effectiveness is
important. The study showed that not all of what is learned immediately after training is retained one month later. This denotes that the practitioner should be aware of the short-term training effectiveness.
To ensure a more realistic evaluation at Level 2, one must be cautious about the pre- and post-course evaluation method proposed by Kirkpatrick. The time frame for learning to take place was never specified. An appropriate measuring model is necessary to determine the extent to which learning has taken place. In other words, the Kirkpatrick model lacks longitudinal considerations.
Measuring changes in learning through data collection as prescribed by Kirkpatrick (in absolute terms) had no value in predicting how well a person can perform the skills attained from the training after a one-month period.
A positive attitude has no bearing on how well a person can perform a trained task after a month. A reaction evaluation that shows a positive attitude has no direct link to performance.
However, individual self-efficacy did not decrease over time. Empirical studies have shown that self-efficacy correlates with actual performance (Kraiger, Ford & Salas, 1993). One might look at the possibility of measuring self-efficacy instead of reaction evaluation. In other words, self-efficacy offers more tangible results as compared to reaction evaluation.
The failure of the Kirkpatrick model at Level 3 and Level 4 evaluation is due to the lack, since its introduction 40 years ago, of a defined framework and specific tools that are appropriate for measuring transfer of learning. It is necessary, at the most basic level, to have a body of case studies from which generalizations can be drawn and hypotheses formed. However, this body of information has not been published (Bramley & Kitson, 1994).
The issue here is whether or not the knowledge taught during training is being transferred or
demonstrated by the trainees on the job. The transfer component of training evaluation was examined by Olsen (1998) in a study conducted in 1996. Transfer is evidence of whether what has been learned is actually being used on the job for which it was intended.
The survey asked questions regarding how Kirkpatrick's 4-level evaluations were performed, what percentage of payroll was spent on training, how much training was actually transferred
to the job and what specific items would enhance the level of transfer. A content analysis was carried out on the 138 survey comments received on how the respondents made estimates of the percentage of transfer value they reported. Follow up interviews were also undertaken to provide additional clarification on responses and record impressions and opinions about
the data collection. The results showed that the percentage of transfer depended on the type of training. Technical training showed the best rate of transfer; soft (interpersonal) skills do not transfer as readily and are not easily observed. Transfer is not so readily apparent in the effective work areas (Olsen, 1998).
Bramley (1996) offered an explanation of why evaluation is not being carried out at the behaviour and result levels. Traditionally, most trainers use individual and educational models of the training process. This process has its limitations, as the emphasis is on encouraging individuals to learn something rather than to find uses (if any) for the learning.
1.5 Future Research
Bramley and Kitson (1994) argued that the problems of evaluation at Levels 3 and 4 were not
well understood because not enough evaluation of this kind has been carried out. This is due to the fact that effective measurement methods for Levels 3 and 4 are not available and the amount of work in setting up the criteria for measuring these two levels is time consuming. It is apparent that the incompleteness of Kirkpatrick model lies in its Levels 3 and 4 of evaluation.
1.5.1 The Transfer Component
The transfer component is a potential area for future research. Transfer of training can be defined as 'the application of knowledge, skills and attitudes learned from training on the job and subsequent maintenance of them over a certain period of time' (Baldwin & Ford, 1988; Xiao, 1996). This process does not appear to have received much attention since most organizations were apparently looking primarily at Level 1 and 2 evaluations. Early studies lacked a theoretical framework to guide these investigations (Baldwin & Ford, 1988).
A survey conducted by Cheng and Ho (1998) revealed that there were inconsistent findings
on the variables that promised positive training transfer. The main intention of further research is to develop common variables that are critical to different training and transfer situations, including the establishment of common scales or instruments that can be used in different research settings.
The current approach, which uses variables such as individual ability, motivation and environmental favourability, has shown a profound effect on training transfer research (Noe & Schmitt, 1986). However, this approach raises the question of application, because individual differences (e.g. self-efficacy and locus of control) are expected to exert considerable influence on transfer outcome (Cheng & Ho, 1998).
A longitudinal study would be a better way of measuring the effectiveness of transfer learning. It is argued that trainees who show similar levels of transfer performance after a short period of training may differ substantially in the long run (Kraiger, Ford & Salas, 1993). Therefore, another major aspect of transfer research is to examine the level of newly acquired knowledge, skills or behaviour retained in the transfer settings after a longer period of time. For example, research should record the changes in levels of skill proficiency as a function of time after training.
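The idea of recording proficiency as a function of time after training can be illustrated with a minimal sketch. The trainee labels, follow-up intervals and scores below are hypothetical values chosen only for illustration; they are not data from any study cited in this paper. The sketch shows how two trainees with the same score shortly after training may retain very different proportions of it a year later.

```python
# Illustrative only: hypothetical proficiency scores (0-100) recorded at
# several follow-up points after training. None of these figures come
# from the studies cited in this paper.

def retention_ratio(scores, baseline_month, followup_month):
    """Return follow-up proficiency as a fraction of baseline proficiency."""
    baseline = scores[baseline_month]
    if baseline == 0:
        raise ValueError("baseline proficiency must be non-zero")
    return scores[followup_month] / baseline

# months after training -> assessed proficiency (hypothetical values)
trainee_a = {1: 80, 6: 76, 12: 72}
trainee_b = {1: 80, 6: 60, 12: 40}

for name, scores in (("A", trainee_a), ("B", trainee_b)):
    ratio = retention_ratio(scores, baseline_month=1, followup_month=12)
    print(f"Trainee {name}: {ratio:.0%} of month-1 proficiency retained at month 12")
```

A short-term measure taken only at month 1 would treat both trainees as equivalent; the difference appears only when the later measurements are included.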
1.5.2 Evaluating Beyond the 4 Levels
In considering the above studies, an effective evaluation should measure beyond the aspects of reaction, learning, behaviour and results. Lewis and Thornhill (1994) suggested that an effective training evaluation needs to be integrated and matched to the culture of the organization. This integrated, culturally related approach is advocated because it would minimize the risk of not meeting the objectives of training at the input stages as well as evaluating reactions and impact at the outcome stage. This brings a more strategic approach to identifying and prioritizing training needs in relation to organizational objectives.
To justify the training evaluation results, we may consider Brinkerhoff's (1987) criticism of the Kirkpatrick model, which only concentrates on the outcome of training. This is further supported by Bernthal (1995), who found it necessary to look for a broader linkage between training and the organizational context. Bernthal (1995) introduced the training-impact tree method for measuring organizational context. This is done by listing the barriers to training and the factors that facilitate training next to their associated values and practices, which are aligned with the organization's objectives.
Although the Kirkpatrick model focuses on the attainment of tangible outcomes, it is important to note that the question of measuring intangible outcomes related to training effectiveness must not be ignored. Kirkpatrick (1994) revisited his 4-level evaluation model and stated that as long as the evidence collected is beyond a reasonable doubt, one should be satisfied with it. Perhaps an experienced training practitioner may want to explore the possibility of integrating the absolute 4-level evaluation model with other process models. As a result, the gap that exists between short-term and long-term measures of training evaluation may be minimized. Future research may build upon deriving an integrated model that would complement both absolute and process evaluation of training effectiveness.
1.5.3 Incorporating Competence-based Approach into Training Evaluation
The aim of future research is to develop a comprehensive training evaluation by incorporating the absolute Kirkpatrick model with the competence-based process. The competence-based assessment system could be used to collect sufficient evidence to determine whether individuals are performing competently in their jobs.
Strebler, Robinson and Heron (1997) distinguished two meanings of the term competency: one expressed as the behaviours that an individual needs to perform a job, and the other as minimum standards of performance. The term competency has been used to refer to both meanings, behaviours and performance standards. Competence-based assessment helps to provide a behaviourist framework for learning in training evaluation. A behaviourist approach to learning provides simpler tasks for the trainer and clarity of outcome for the learner (Hoffmann, 1999). Another definition of competencies is the quality
of outcome which may be used to evaluate gains in productivity or efficiency in the workplace as a result of training (Strebler et al., 1997).
Further research by Sternberg and Kolligian (1990) defined competency as the underlying attributes of a person, such as their knowledge, skills or abilities. This definition created a focus on the inputs required of individuals in order for them to produce competent performances. This is aligned with the traditional training evaluation approach of measuring the knowledge, skills and abilities of a person after training. Rowe (1995) suggested that competence-based assessment, which looks at evaluating the whole process of learning, should consist of:
Objective: The trainer should exhibit clear learning objectives and methods for obtaining those objectives.
Evidence: Evidence must be provided to indicate competent performance.
Observation: An assessor looks out for competent performance.
Peers' Comments: Comments are obtained from work colleagues, peers and customers.
The key point is that a competence-based model supplements knowledge-based achievements. Programs will be designed by permitting competence-based models to build
on knowledge-based achievement. In this way knowledge supports work, learning supports skill and theory supports practice (Rowe, 1995).
The competence-based method would be able to assess whether knowledge and skills learned are being effectively applied in the workplace and whether the trainee can now be described as competent after completion of a training program.
This integrated model could also be used prior to designing a training program in order to establish development needs and to determine training program content.
1.5.4 Multi-Rater Feedback System in Training Evaluation
There does not appear to be a distinct individual who founded or invented this process. According to Moses, Hollenbeck and Sorcher (1993), the term multi-rater feedback is misleading as it suggests a newly discovered concept, whereas they argue that perceptions of people have been available as long as there have been people to observe them.
Nowack (1993) presents a useful summary of some of the reasons for the increased use of multi-rater feedback in organizations:
The need for a cost-effective alternative to assessment centers;
The increasing availability of assessment software capable of summarizing data from multiple sources into customized feedback reports;
The need for continuous measurement of improvement efforts;
The need for job-related feedback for employees affected by career plateauing; and
The need to maximize employee potential in the face of technological change, competitive challenges and increased workforce diversity.
From the organizational perspective, multi-rater feedback can be used solely for
developmental purposes. Romano (1994) and Atwater et al. (1993) found that the most common use is in the area of training and development. The overall net effect of training and development should enhance organizational performance.
From the individual perspective, the feedback is invaluable because it comes from numerous sources, providing multiple perspectives and opinions. Each opinion and perspective may provide relevant yet different feedback (Atwater et al., 1993; Hazucha et al., 1993; Tornow, 1993). This form of feedback can increase the reliability, fairness and acceptance of the data by the person being rated (London, Wohlers & Gallagher, 1990). This occurs because the feedback is received from multiple sources and not just from one rater.
One of the advantages of using multi-rater feedback is that it provides the opportunity for individuals who are being assessed to compare their self perceptions against the perceptions of others regarding their behaviour (Rosti & Shipper, 1998).
The difference in perspective between the rater and the ratee is not treated as an error but is a
source of information which can enhance personal learning. Ratees can learn from the discrepancy between self rating and the rating of others.
The use of multi-rater feedback provides a natural method for both enhancing learning of the
participants and improving the evaluation process. Feedback is seen as a critical element in affecting change (Bennis, Benne & Chin, 1969). Multi-rater feedback could be used to serve
as an unfreezing process in Lewin's (1948) model of change. This would enhance the ratee's learning by creating doubts about the ratee's current performance standard and providing an opportunity for prospective development. Most training evaluation models emphasize the absolute outcome of training. However, multi-rater feedback involves a change process in which the resultant behaviour reinforces past performance and also provides an opening for future learning. Thus, collecting multi-rater feedback before and after training will enhance learning and provide at least part of the data needed to evaluate training.
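The use of multi-rater feedback collected before and after training can be sketched as follows. This is only an illustration under stated assumptions: the 1-5 rating scale, the rater mix and all ratings are hypothetical, and the two summary figures (the mean rating by others and the self-other gap) are one simple way of expressing the comparison described above, not a method prescribed by the cited authors.

```python
# Illustrative only: hypothetical ratings on a 1-5 scale for one behaviour.
from statistics import mean

def summarize(self_rating, other_ratings):
    """Return the mean rating given by others and the self-other gap."""
    others_mean = mean(other_ratings)
    return others_mean, self_rating - others_mean

# Ratings collected before and after training (hypothetical values);
# "others" might hold ratings from a supervisor, peers and a subordinate.
before = {"self": 4, "others": [2, 3, 3, 2]}
after = {"self": 4, "others": [3, 4, 4, 3]}

pre_mean, pre_gap = summarize(before["self"], before["others"])
post_mean, post_gap = summarize(after["self"], after["others"])

print(f"Others' mean rating: {pre_mean} -> {post_mean}")
print(f"Self-other gap: {pre_gap} -> {post_gap}")
print(f"Change in others' mean rating: {post_mean - pre_mean}")
```

The self-other gap is treated here as information for the ratee rather than as measurement error, while the change in the others' mean rating supplies part of the evidence for behaviour change after training.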
Moses et al. (1993) provide the following criticisms of multi-rater feedback:
It relies on generalized traits as there is a limited or non-existent frame of reference for making rater/observer judgments.
It is based on an individual's memory, which can often yield incomplete descriptions of past performance.
The observer may be unable to interpret behaviours.
It relies on the instrument designers' scoring system, factor analysis or data collection methods to interpret the information for the participant.
The main argument of Moses et al. (1993) is that multi-rater feedback is based on other people's observations and that such observations are often incomplete descriptions of past performance because the observer does not know what to look for. The unresolved issue is what behaviours to study. Multi-rater feedback has been used to identify the behaviour of effective management. There is a lack of sufficient definitional detail to study managerial proficiency or the effectiveness of training (Morrison & McCall, 1978; Schriesheim & Kerr, 1977). Yukl (1994) argued that further refinement of these constructs is needed by identifying
specific skills which make up each construct. Hence, developing each construct and establishing its validity is important prior to training.
Multi-rater feedback has been found to be widely used in managerial and leadership development programs (Cacioppe, 1998; Cacioppe & Albrecht, 2000; Garavan, Morley & Flynn, 1997; McCauley & Moxley, 1996; Thach, 2002). However, its usage in other fields needs further research and exploration. This is further supported by Rosti and Shipper (1998) in their study on the impact of training in a management development program based on multi-rater feedback.
1.6 Conclusion
It is widely acknowledged that the Kirkpatrick evaluation model has provided the most basic thinking on training evaluation throughout this decade. However, the application of Kirkpatrick's 4-level evaluation model by industry appears to be incomplete. No significant success has been identified from the use of the 4-level evaluation model by the majority of organizations that have conducted training evaluations.
Based on this literature review, it may be concluded that the Kirkpatrick model has not reached a stage of clarity that allows in-depth training evaluation to be carried out. The model gives training managers an idea of what systematic training evaluation is; however, methods of training measurement were not well explored or detailed.
While training has been conceptualized as a continually evolving process, the existing literature appears to have failed to provide adequate strategies for organizations wanting to evaluate the immediate, as well as the long-term, effectiveness and value of their training efforts.
At face value, the literature shows that the full Kirkpatrick evaluation strategy is being widely applied; however, more detailed analysis found that none were able to demonstrate Level 4 evaluation and of those who claimed evaluation at Levels 2 or 3, none were able to demonstrate a systematic approach to the problem.
Arguably, the dilemma in adopting Kirkpatrick's taxonomy as a comprehensive and integrated approach to evaluation lies in both the qualitative and quantitative attempts that may or may not provide good phenomenological studies. Further analysis of the method shows considerable confusion as to what is, or is not, a valid indicator for evaluation. Clearly, there has been little change in the level of confidence in the reliability of training evaluation, notwithstanding greater emphasis on this key organizational development process.
The weaknesses of the Kirkpatrick model have created opportunities for future research on incorporating competencies and the multi-rater feedback approach into the long-term evaluation of training.
These weaknesses have also opened up opportunities for further research in transfer learning, especially in studies of its longitudinal and application effects.
1.7 References for Paper One
Alliger, G.M. & Janek, E.A. 1989, 'Kirkpatrick's levels of training criteria: thirty years later', Personnel Psychology, vol. 42, pp. 331-342.
Alliger, G.M., Tannenbaum, S.I., Bennett, W., Traver, H. & Shotland, A. 1997, 'A meta-analysis of the relations among training criteria', Personnel Psychology, vol. 50, pp. 341-358.
Atwater, L., Roush, P. & Fishthal, A. 1993, The Impact of Upward Feedback on Self and Follower Ratings of Leaders, Centre for Creative Leadership, New York.
Baldwin, T.T. & Ford, J.K. 1988, 'Transfer of training: a review and directions for future research', Personnel Psychology, vol. 41, pp. 63-105.
Basadur, M., Graen, G.B. & Scandura, T.A. 1986, 'Training effects on attitudes toward divergent thinking among manufacturing engineers', Journal of Applied Psychology, vol. 71, pp. 612-617.
Bennis, W.G., Benne, K.D. & Chin, R. 1969, The Planning of Change, 2nd edn, Holt, Rinehart & Winston, New York.
Bernthal, P.R. 1995, 'Education that goes the distance', Training and Development, vol. 49, no. 9, pp. 41.
Blanchard, P.N. & Thacker, J.W. 1999, Effective Training, Systems, Strategies and Practices, Prentice Hall Publisher, New Jersey.
Blanchard, P.N., Thacker, J.W. & Way, S.A. 2000, 'Training evaluation: perspectives and evidence from Canada', International Journal of Training and Development, vol. 4, no. 4, pp. 295-303.
Boulmetis, J. & Dutwin, P. 2000, The ABCs of Evaluation: Timeless Techniques for Program and Project Managers, Jossey-Bass Publisher, San Francisco.
Boyle, P.G. & Jahns, I. 1970, 'Program development and evaluation' in Handbook of adult education, eds Smith, R.M., Aker, G.F. & Kidd, J.E., Macmillan Company, New York, pp. 70.
Bramley, P. & Kitson, B. 1994, 'Evaluating training against business criteria', Journal of European Industrial Training, vol. 18, no. 1, pp. 10-14.
Bramley, P. 1996, Evaluating Training Effectiveness, McGraw-Hill, Maidenhead and New York.
Brandenburg, D. 1982, 'Training evaluation: what is the current status?', Training and Development Journal, pp. 14-19.
Bretz, R.D. & Thompsett, R.E. 1992, 'Comparing traditional and integrative learning methods in organizational training programs', Journal of Applied Psychology, vol. 77, pp. 941-951.
Brinkerhoff, R.O. 1987, Achieving results from training, Jossey-Bass Publisher, San Francisco.
Brinkerhoff, R.O. 1988, 'An integral evaluation model for human resource development', Training and Development Journal, vol. 42, no. 2, pp. 66-68.
Brown, K.G., Werner, M.N., Johnson, L.A. & Dunne, J.T. 1999, Formative evaluation in Industrial/Organization Psychology: further attempts to broaden training evaluation, presented at a symposium on training evaluation: advances and new directions for research and practice, Society of Industrial and Organizational Psychology, Atlanta.
Cacioppe, R. 1998, 'An integrated model and approach for the design of effective leadership development programs', Leadership and Organization Development Journal, vol. 19, no. 1, pp. 44-53.
Cacioppe, R. & Albrecht, S. 2000, 'Using 360-degree feedback and the integral model to develop leadership and management skills', Leadership and Organization Development Journal, vol. 21, no. 8, pp. 390-404.
Campbell, J.P. 1988, Training Design for Performance Improvement, in Productivity in Organizations, eds Campbell, J.P. & Campbell, R.J., Jossey-Bass Publisher, San Francisco.
Cascio, W.F. 1989, Using utility analysis to assess training outcomes, in Training and Development in Organizations, ed. I.L. Goldstein, Jossey-Bass, San Francisco.
Cervero, R.M. 1988, Effective Continuing Education for Professionals, Jossey-Bass Publisher, San Francisco.
Campion, M.A. & Campion, J.E. 1987, 'Evaluation of an interview skills training program in a natural field setting', Personnel Psychology, vol. 40, no. 4, pp. 675-91.
Chen, H.T. & Rossi, P.H. 1992, Using Theory to Improve Program and Policy Evaluations, Greenwood Press, Westport, CT.
Cheng, E. & Ho, D. 1998, 'The effects of some attitudinal and organizational factors on transfer outcome', Journal of Managerial Psychology, vol. 13, no. 5/6, pp. 309-317.
Clegg, W.H. 1987, 'Management training evaluation: an update', Training and Development Journal, vol. 41, no. 2, pp. 65-71.
Connolly, M.S. 1988, 'Integrating evaluation, design and implementation', Training and Development Journal, vol. 42, no. 2, pp. 20-23.
Constable, J. & McCormick, R. 1987, The Making of British Managers, BIM, CBI, London.
Davidove, A.E. & Schroeder, P.A. 1992, 'Demonstrating ROI of training', Training and Development Journal, vol. 46, no. 8, pp. 70-71.
Davis, B.L. & Mount, M.K. 1984, 'Effectiveness of performance appraisal training using computer assisted instruction and behaviour modeling', Personnel Psychology, vol. 37, pp. 439-452.
Dawson, R.P. 1993, Model of evaluations of equal opportunities training in local government with special reference to women, unpublished PhD thesis, South Bank University, London.
Dionne, P. 1996, 'The evaluation of training activities: a complex issue involving different stakes', Human Resource Development Quarterly, vol. 7, pp. 279-86.
Dyer, S. 1994, 'Kirkpatrick's mirror', Journal of European Industrial Training, vol. 18, no. 5, pp. 31-32.
Eisner, E.W. 1997, The Enlightened Eye: Qualitative Inquiry and the Enhancement of Educational Practice, 2nd edn, Merrill, New York.
Garavaglia, L.P. 1993, 'How to ensure transfer of training', Training & Development Journal, vol. 47, no. 10, pp. 63-68.
Garavan, T.N., Morley, M. & Flynn, M. 1997, '360-degree feedback: its role in employee development', Journal of Management Development, vol. 16, no. 2, pp. 134-147.
Geber, B. 1995, 'Does your training make a difference? Prove it!', Training and Development Journal, vol. 3, pp. 27-34.
Goldstein, L.I. 1986, Training in Organizations: Needs Assessment, Development and Education, Cole Publishing Company, California.
Goldstein, L.I. & Ford, J.K. 2002, Training in Organizations: Needs Assessment, Development and Evaluation, Thomson Learning, Wadsworth, Canada.
Grove, E.A. & Ostroff, C. 1990, Program evaluation, in Developing Human Resources, eds Wexley, K. & Hinnicks, J., BNA Books, Washington D.C.
Hamblin, A.C. 1974, Evaluation and Control of Training, McGraw-Hill Publisher, New York.
Hazucha, J.F., Hezlett, S.A. & Schneider, R.J. 1993, 'The impact of 360-degree feedback on management skills development', Human Resource Management, vol. 32, pp. 325-351.
HMSO 1989, Training in Britain: A Study of Funding, Activity and Attitudes, Her Majesty's Stationery Office, London.
Hoffmann, T. 1999, 'The meanings of competency', Journal of European Industrial Training, vol. 23, no. 6, pp. 275-285.
Holton, E.F. III 1996, 'The flawed four-level evaluation model', Human Resource Development Quarterly, vol. 7, pp. 5-21.
Junaidah, H. 2001, 'Training evaluation: clients' roles', Journal of European Industrial Training, vol. 25, no. 7, pp. 374-379.
Kirkpatrick, D.L. 1959a, 'Techniques for evaluating training programs: part 1 - reaction', Journal of American Society for Training and Developing, vol. 13, pp. 3-9.
Kirkpatrick, D.L. 1959b, 'Techniques for evaluating training programs: part 2 - learning', Journal of American Society for Training and Developing, vol. 13, no. 12, pp. 21-26.
Kirkpatrick, D.L. 1960a, 'Techniques for evaluating training programs: part 3 - behaviour', Journal of American Society for Training and Developing, vol. 14, no. 1, pp. 13-18.
Kirkpatrick, D.L. 1960b, 'Techniques for evaluating training programs: part 4 - results', Journal of American Society for Training and Developing, vol. 14, no. 2, pp. 28-32.
Kirkpatrick, D.L. 1976, Evaluation of Training, Training and Development Handbook: A guide to human resource development, 2nd edn, Craig, R.L.O., McGraw-Hill Publisher, New York.
Kirkpatrick, D.L. 1979, 'Techniques for evaluating training programs', Training and Development Journal, vol. 33, pp. 78-92.
Kirkpatrick, D.L. 1994, Evaluating Training Programs: The Four Levels, Berrett-Koehler Publishers, San Francisco.
Kirkpatrick, D.L. 1996a, 'Great ideas revisited', Training and Development Journal, vol. January, pp. 54-59.
Kirkpatrick, D.L. 1996b, 'Invited reaction: reaction to Holton article', Human Resource Development Quarterly, vol. 7, pp. 23-24.
Kirkpatrick, D.L. 1998, Evaluating Training Programs: The Four Levels, Berrett-Koehler Publishers, San Francisco.
Kraiger, K., Ford, J.K. & Salas, E. 1993, 'Application of cognitive, skill-based and affective theories of learning outcomes to new methods of training evaluations', Journal of Applied Psychology, vol. 78, no. 2, pp. 311-328.
Legge, K. 1984, Evaluating Planned Organizational Change, Academic Press, London.
Lewin, K. 1948, Resolving social conflicts, Harper & Bros Publishers, New York, NY.
Lewis, P. & Thornhill, A. 1994, 'The evaluation of training: an organizational culture approach', Journal of European Industrial Training, vol. 18, no. 8, pp. 25-32.
London, M., Wohlers, A.J. & Gallagher, P. 1990, '360-degree feedback surveys: a source of feedback to guide management development', Journal of Management Development, vol. 9, pp. 17-31.
Love, A.J. 1991, Internal Evaluation: Building Organizations From Within, Sage Publication, California, CA.
Madaus, G.F., Scriven, M.S. & Stufflebeam, D.L. 1986, Evaluation Models: Viewpoints on Educational and Human Services Evaluation, Kluwer-Nijhoff Publishing, Boston.
Mann, S. & Robertson, I.T. 1996, 'What should training evaluation evaluate?', Journal of European Industrial Training, vol. 20, no. 9, pp. 14-20.
Mathieu, J.E. & Leonard, R.L. Jr. 1987, 'Applying utility concepts to a training program in supervisory skills: a time-based approach', Academy of Management Journal, vol. 30, pp. 316-335.
Mathews, B.P., Ueno, A., Kekale, T., Repka, M., Pereira, Z.L. & Silva, G. 2001, 'Quality training: needs and evaluation-findings from a European survey', Total Quality Management, vol. 12, no. 4, pp. 483-490.
McCauley, C.D. & Moxley, R.S. Jr. 1996, Developmental 360: How Feedback Can Make Managers More Effective, Jossey-Bass Publisher, San Francisco.
Morrison, A.M. & McCall, J.D. 1978, Feedback to Managers: A Comprehensive Review of Twenty-four Instruments, Centre for Creative Leadership, Greensboro, NC.
Morrow, C.C., Jarrett, M.Q. & Rupinski, M.T. 1997, 'An investigation of the effect and economic utility of corporate-wide training', Personnel Psychology, vol. 50, pp. 91-119.
Moses, J., Hollenbeck, G.P. & Sorcher, M. 1993, 'Other people's expectations', Human Resource Management, vol. 32, Summer Fall.
Murk, P., Barrett, A. & Atchade, P. 2000, 'Diagnostic techniques for training and education: strategies for marketing and economic development', Journal of Workplace Learning, vol. 12, no. 7, pp. 296-306.
Noe, R.A. & Schmitt, N. 1986, 'The influence of trainee attitudes on training effectiveness: test of a model', Personnel Psychology, vol. 39, pp. 497-523.
Noe, R.A. 2000, Employee Training and Development, McGraw-Hill Publisher, New York.
Nowack, K. 1993, '360-degree feedback: the whole story', Training and Development Journal, vol. 47, no. 1, pp. 69-73.
Newstrom, J.W. 1978, 'The problem of incomplete evaluation of training', Training and Development Journal, vol. 32, no. 11, pp. 22-24.
O'Leary, V.E. 1972, 'The Hawthorne effect in reverse: effects of training and practice on individual and group performance', Journal of Applied Psychology, vol. 56, pp. 491-494.
Olsen, J.H. Jr. 1998, 'The evaluation and enhancement of training transfer', International Journal of Training and Development, vol. 2, no. 1, pp. 61-75.
Parlette, M. & Hamilton, D. 1977, 'Evaluation as a new approach to the study of innovative programmes', in Beyond the Numbers Game, eds Hamilton, D. et al., Macmillan, London.
Phillips, J.J. 1991, Handbook of Training Evaluation and Measurement Methods, Gulf Publishing Company, Houston, TX.
Phillips, J.J. 2002, Return on Investment in Training and Performance Improvement Programs, 2nd edn, Butterworth-Heinemann, Woburn, MA.
Phillips, J.J. & Stone, R.D. 2002, How to Measure Training Results, A Practical Guide to Tracking the Six Key Indicators, McGraw-Hill Publisher, New York.
Plant, R.A. & Ryan, R.J. 1994, 'Who is evaluating training?', Journal of European Industrial Training, vol. 18, no. 5, pp. 27-30.
Popham, W.J. 1974, Evaluation in Education: Current Applications, Berkeley, McCutchan, California.
Porter, L. & McKibbin, L. 1988, Future of Management Education and Development Drift Or Thrust Into the 21st Century?, McGraw-Hill Publisher, New York.
Provus, M. 1971, Discrepancy Evaluation, Berkeley, McCutchan, California.
Rae, L. 1986, How to Measure Training Effectiveness, Gower Publications, Aldershot, London.
Raphael, M. & Wagner, E. 1972, 'Training surveys surveyed', Training and Development Journal, vol. 26, pp. 10-14.
Redshaw, B. 2001, 'Evaluating organizational effectiveness', Measuring Business Excellence, vol. 5, no. 1, pp. 16-18.
Regalbutto, G.A. 1992, 'Targeting the bottom line', Training and Development Journal, vol. 46, no. 4, pp. 29-32.
Rivlin, A.M. 1971, Systematic Thinking for Social Action, Brookings Institution, Washington.
Romano, C. 1994, 'Conquering the fear of feedback', Human Resource Focus, vol. 71, no. 3.
Rossi, P.H. & Freeman, H.E. 1993, Evaluation: A Systematic Approach, 5th edn, Sage Publication, California.
Rosti, R.T. Jr. & Shipper, F. 1998, 'A study of the impact of training in a management development program based on 360 feedback', Journal of Managerial Psychology, vol. 13, no. 1/2, pp. 77-89.
Rowe, C. 1995, 'Incorporating competence into the long term evaluation of training and development', Industrial Commercial Training, vol. 27, no. 2, pp. 3-9.
Salinger, R. & Deming, R. 1982, 'Practical strategies for evaluating education', Training and Development Journal, vol. 4, pp. 20-29.
Sauter, J. 1980, 'Purchasing public sector executive development', Training and Development Journal, vol. 34, no. 4, pp. 92-98.
Schriesheim, C.A. & Kerr, S. 1977, 'Theories and measurement of leadership: a critical appraisal of present and future directions', in Leadership: The Cutting Edge, eds Hunt, J.G. & Larson L.L., Southern Illinois University Press, Carbondale, IL.
Scriven, M. 1991, Evaluation Thesaurus, Sage Publication, Newbury Park, California.
Shadish, W.R. & Epstein, R. 1987, 'Patterns of program evaluation practice among members of the evaluation research society and evaluation network', Evaluation Review, vol. 11, no. 5, pp. 555-590.
Shadish, W.R. & Reichardt, C.S. 1987, 'Evaluation studies', Evaluation Review, vol. 12, pp. 13-30.
Shireman, J.A.R. 1991, Utilization of program evaluation for decision making regarding hospital based patient/client focused health education programs, doctoral dissertation, University of Iowa, dissertation abstracts international, 52/12A, AA C9212928.
Smith, A.J. 1990, 'Evaluation of management training subjectivity and the individual', Journal of European Industrial Training, vol. 14, no. 1, pp. 12-15.
Stake, R. 1977, 'Responsive evaluation', in Beyond the Number Game, eds Hamilton, D., Jenkins, D., King, C., MacDonald, B. & Parlett, H.M., Macmillan, London.
Steel, S. 1970, 'Program evaluation: a broader definition', Journal of Extension, vol. 13, pp. 13-20.
Sternberg, R. & Kolligian, J. Jr. 1990, Competence Considered, Yale University Press, New Haven, CT.
Strebler, M., Robinson, D. & Heron, P. 1997, 'Getting the best out of your competencies', Institute of Employment Studies, University of Sussex, Brighton.
Stufflebeam, D.L. 1971, Education Evaluation: Decision Making, by the PDK national study committee on education, Itasca, Ill.: F.E. Peacock Publisher Inc, Boston.
Stufflebeam, D.L. 1983, 'The CIPP model for program evaluation', in Evaluation Models, eds Madaus, G.F., Scriven, M.S. & Stufflebeam, D.L., Kluwer-Nijhoff Publishing, Boston, pp. 117-141.
Stufflebeam, D.L. & Shrinkfield, J.A. 1985, Systematic evaluation, Kluwer-Nijhoff Publishing, Boston.
Swanson, R.A. & Holton, E.F. 1999, Results: How to Assess Performance, Learning And Perceptions in Organizations, Berrett-Koehler Publishers, San Francisco.
Tesoro, F. 1998, 'Implementing an ROI measurement process at Dell Computer', Performance Improvement Quarterly, vol. 11, pp. 103-114.
Thach, E.C. 2002, 'The impact of executive coaching and 360-feedback on leadership effectiveness', Leadership and Organization Development Journal, vol. 23, no. 4, pp. 205-214.
Toplis, J. 1993, 'Training evaluation reflections on the first steps', European Work Organization Psychology, vol. 2, no. 2, pp. 146-152.
Tornow, W.W. 1993, 'Perceptions or reality, is multiple-perspective measurement a means or an end?', Human Resource Management, vol. 32, no. 2 & 3, pp. 209-408.
Tyler, R.W. 1949, Basic Principle of Curriculum and Instruction, University of Chicago Press, Chicago.
Tyler, R.W. 2002, 'Evaluating evaluations', Human Resource Magazine, vol. June, pp. 85-93.
Warr, P. & Bunce, K. 1995, 'Employee age and voluntary development activity', International Journal of Training and Development, vol. 2, pp. 190-204.
Warr, P., Allan, C. & Birdi, K. 1999, 'Predicting three levels of training outcome', Journal of Occupational and Organizational Psychology, vol. 72, pp. 351-375.
Wexley, K.N. & Baldwin, T.T. 1986, 'Post-training strategies for facilitating positive transfer: an empirical exploration', Personnel Psychology, vol. 29, pp. 503-520.
Wholey, J.S., Hatry, H.P. & Newcomer, K.E. 1994, Handbook of Practical Program Evaluation, Jossey-Bass Publisher, San Francisco.
Xiao, J. 1996, 'The relationship between organizational factors and the transfer of training in the electronics industry in Shenzhen, China', Human Resource Development Quarterly, vol. 7, no. 1, pp. 55-73.
Yukl, G.A. 1994, Leadership in Organizations, 2nd edn, Englewood Cliffs, Prentice Hall Publisher, New Jersey.
Research Paper 2
EVALUATING TRAINING EFFECTIVENESS: AN EMPIRICAL STUDY OF KIRKPATRICK MODEL OF EVALUATION IN THE MALAYSIAN TRAINING ENVIRONMENT FOR THE MANUFACTURING SECTOR
Lim Guan Chong Master of Business Administration (Finance) University of Hull
International Graduate School of Management University of South Australia
Evaluating Training Effectiveness: An Empirical Study of Kirkpatrick Model of Evaluation in the Malaysian Training Environment for the Manufacturing Sector
Lim Guan Chong
International Graduate School of Management University of South Australia
2.1 Abstract
This research adopted an empirical approach to track the history, rationale, objectives and implementation of training evaluation initiatives in Malaysia's manufacturing sector. Since the establishment of the Human Resource Development Fund, training activities in Malaysia have increased. The majority of Malaysian organizations that conduct training are doubtful about how training activities could add value to organizational performance and justify their training investment. This research provides an understanding of the training evaluation culture within the Malaysian manufacturing sector and of the effectiveness of Kirkpatrick's 4-level evaluation model as applied to that sector.
2.2 Introduction
The Malaysian government is committed to education, training and human resource development. The government recognizes the importance of human resource development in its quest to achieve fully developed nation status. This commitment has translated into the establishment and growth of training practice in the country.
Previously the sole provider of training, the government has adopted a policy of involving private enterprises in all aspects of training. Training needs have become crucial to the development of capital-intensive and value-added industries. Apart from involving enterprises to make training more market-driven, there is a need for enterprises to share the burden of training. In the Seventh Malaysia Plan, the private sector was expected to play a more active role in upgrading the qualifications and skills of its workers (Junaidah, 2001).
2.3 Training Practices in Malaysia
Training activities within Malaysian companies lag behind those in countries such as Singapore, Japan and Korea. Training activities in Malaysia are mainly conducted by large multinational companies. The International Labour Organization's study in 1997 showed that Malaysia ranked 12th in terms of providing in-company training (Junaidah, 2001).
The Malaysian government passed a new Act of Parliament, the Human Resources Development Act, in 1992 to encourage and stimulate the private sector to introduce training and development for its employees (HRDC, 1992). The objective of this Act is to set aside accumulated funds to promote training activities within organizations. Under this Act, companies with more than 50 employees have to contribute 1 percent of their total staff's monthly salary to the Ministry of Human Resources through the Human Resources Development Council (HRDC). The fund, known as the Human Resources Development Fund (HRDF), was launched in January 1993. The government set up the HRDC to manage this fund by identifying systematic training needs and approving relevant training programs required by organizations. The levy is partially refunded to the respective organizations under special schemes known as the Training Aid Scheme and the Approved Training Program (ATP) Scheme once the training program is completed. The policy lays down the parameters for a human resource oriented development strategy designed to mobilize national effort to increase technological capabilities and competitiveness as well as to create a highly skilled, productive, disciplined and efficient workforce. This strategy would aid Malaysia's transition into an industrialized economy. Private sector companies are also expected to enhance their training activities by utilizing the HRDF and participating in skill development programs run by the state governments (MEPU, 1996). Since the establishment of the HRDC, how has the Malaysian manufacturing sector gained from the training conducted? Information on how training benefits organizations would help the Malaysian government to chart the progress and expected time frame needed for Malaysia to transform into an industrialized economy.
The need to develop a highly trained workforce is evident from the increase of more than 200 management consulting and training institutions, professional associations and management
schools operating in Malaysia (Arthur Anderson & Co, 1991). The number of employees who return to formal education and training has increased consistently since 1972 (Ahmad, 1998). The government set up the National Institute of Public Administration Malaysia (INTAN) which is responsible for training government employees in administration and management (Junaidah, 2001).
There are some real difficulties in assessing the full extent of skill development for government training in Malaysia, even after conducting evaluation (Mirza & Juhary, 1995). Firstly, much skill development takes place in the private sector. Most skills, even those involving advanced manual skills, are acquired on the job. Secondly, skill development during employment tends to be demand-driven (Pillai, 1994). Workers gain experience on the job and upgrade their skills when they are exposed to a higher skill level. A study by Pillai and Othman (1994) showed that the budget for training and education in Malaysia has increased by 40 percent. Company emphasis has been on improving the quality of training to help develop a competent labour force that improves the competitiveness of the industrial sector in Malaysia. This new demand will force employers to further develop employee competencies. Saiyadain (1995) found that as many as 82.6 percent of organizations sponsored their managers for training, and on average these organizations spent 4.65 percent of the managerial payroll on training managers. This suggests that the number of knowledge workers and new knowledge-based opportunities is expected to increase dramatically in the next few years.
2.4 The Practice of Evaluation in Training
Although the methodology of evaluating training effectiveness may look fair, it could make it difficult to express rational criticism. A survey by Wagel (1977) found that 75 percent of companies have no formal method for evaluating training effectiveness. In a subsequent survey by Easterby-Smith (1985), the results showed that out of 15 organizations with 320 300,000 employees, only one conducted some form of evaluation on a regular basis, which was a post-course questionnaire. According to Rowe (1992), although every training manual gives lip service to evaluation, it is notoriously difficult to carry out effectively. The extensive survey by Plant and Ryan (1994) served to further underline the lack of widespread sophistication in evaluation. They point to budget cutting and economic pressures as possible explanations. A recent study by Blanchard, Thacker and Way (2000) on 202 organizations in Canada reported that more than half of the organizations are not comprehensively evaluating their training.
According to Carnevale and Schulz (1990), American Society for Training and Development (ASTD) research indicated that the most popular reasons for evaluation are to gather information to help decision makers improve the training process and to facilitate participants' job performance. This explains why the outcome-based Kirkpatrick model is so widely used. Evaluation also helps measure the degree of improvement in application and assesses how well the learner achieves the established goals (Attkinsson, Sorenson, Hargreaves & Hororwitz, 1978).
For the past 30 years the Kirkpatrick model has been considered the most prominent training evaluation model (Bernthal, 1995). Phillips (1991) concluded that, out of more than 50 evaluation models available, the evaluation framework that most training practitioners use is the Kirkpatrick model. It is easy to find firms that practice training evaluation. However, most firms only conduct post-course evaluation using Kirkpatrick's Level 1 evaluation.
Another important purpose for training evaluation is to meet the accountability requirements of funding groups or clients (Rossi & Freeman, 1993). The demand for accountability has been the major impetus for program evaluation since the 1980s. Fiscal constraints have increased the competition among companies' activities for available dollars and raised the question of value for money from these activities (Ruthman & Mowbray, 1983).
Training evaluation is more than a set of empirical methods governed solely by the standards of social science. Judgments on the quality of program evaluation must also be based on criteria that are meaningful both to immediate users and the larger system in which the program is embedded (Corday & Lipsey, 1986).
Phillips (1991) stated that when it comes to training evaluation, there still appears to be more talk than action. In many organizations, training evaluation is either ignored or approached in an unsystematic manner. Previous literature (Davidove & Schroeder, 1992; Shelton & Alliger, 1993; Smith, 1990) demonstrated that training evaluation is unsystematic and based on simple means. Gutek (1988) stated that there was little or no demand on the part of the organization to seriously evaluate a training program. Most organizations evaluate their training programs by emphasizing one or more levels of the Kirkpatrick model (Chen & Rossi, 1992). The researchers, however, commented that evaluation knowledge found in the literature is not being fully utilized in evaluation practices.
Admittedly it is difficult to completely ascertain a training program's effectiveness. What works at a particular time at a particular training location with a group of participants may not necessarily work as well when transferred to another time, setting and group (Junaidah, 2001).
Bramley and Kitson (1994) asserted that measuring learning is problematic because it is difficult to design a reliable measuring instrument. In addition, few people possess the skills needed to evaluate training. Grove and Ostroff (1990) mentioned that training directors often do not possess the necessary skills to conduct training evaluation. However, Bramley (1996) mentioned that the lack of training evaluation skills could be due to the methodological weakness embedded within the Kirkpatrick model of evaluation.
In addition to the unavailability of a reliable measuring instrument, Barron (1996) commented that management does not demand evaluation because it believes that training will be reflected in an employee's work performance. The research by Smith and Piper (1990) supported this view and showed that trainers openly said, "We do just what we are asked to do: deliver training. We do not do what we are not asked to do: improve human performance in the workplace". Smith and Piper (1990) also mentioned this as one of the reasons for providing training but not evaluation. The research found that their clients did not request an evaluation. This could be the reason why training providers do not evaluate their products.
Research by the ASTD in 1990 showed that most companies now conduct some form of evaluation of their training programs. Practitioners tend to use different methodologies and approaches. In examining evaluation methods in business-education partnerships, Erickson (1991) found that there is little standardization in the methodology. Shadish and Epstein (1987) conducted a study to look at program evaluations among members of the Evaluation Research Society and Evaluation Network. They found that practitioners had different methodologies as well as different assumptions about evaluation. In their study, three patterns emerged from the evaluation practices, which they labeled the academic pattern, the decision-driven pattern and the outcome pattern.
Heneman and Schurab (1986) stated that the evaluation of training programs is considered to differ from the theory and models in the literature. Many authors commented that once participants leave the training setting, program providers seldom attempt to determine the effect of their program. Indeed, the word evaluation raises all sorts of emotional defense reactions. Such a response indicates a low level of commitment among training professionals toward evaluation. Most of the time, the practices are informal, unsystematic and based on one popular model. However, in the study by Junaidah (2001) on Malaysian training evaluation practices, it was found that evaluation was moderately formal, comprehensive and systematic but could be further improved. Nevertheless, it is uncertain whether this so-called comprehensive approach to training evaluation is within the taxonomy of the Kirkpatrick framework. Currently, there is little literature on the evaluation system within the Malaysian context.
2.5 Training Evaluation Practices in Malaysia
Validation of the effectiveness and benefits of training and development programs has gained importance in the public and private sectors in Malaysia. The Malaysian government places great emphasis on program evaluation and has appointed two federal agencies to be responsible for evaluation. They are the National Institute of Evaluation and the Evaluation Unit at the Prime Minister's Department. This unit is responsible for evaluating special governmental projects and programs (Maimunah, 1990). Another evaluating body is the Publication and Consultancy Bureau, which carries out evaluation for government training.
Three types of evaluation process are currently practiced in the agency: formal training evaluation using standard evaluation questionnaires, oral evaluation in the form of informal discussions, and informal evaluation conducted during training (Junaidah, 2001).
The reasons why Malaysian organizations do not evaluate training may lie in the inability to develop relevant measuring tools or the difficulty in determining which performance outcomes are attributed to training.
The rise in awareness of training evaluation during the Malaysian economic downturn in 1997 increased the pressure on organizations to justify the investment placed in training (Junaidah, 2001). Organizations realized that training must be a worthwhile effort, and this raises the need for measuring training effectiveness. Even so, evaluating training effectiveness does not seem to be part of the culture of most organizations in Malaysia. Thousands of training programs have been conducted in Malaysia since the rise of the HRDF (Mirza & Juhary, 1995). However, their effectiveness in terms of productivity, skills improvement, increase in performance standards and return on investment is still unknown. Training should be evaluated to learn the weaknesses of the training program. The selection criteria for evaluation should be able to reveal the improvement in the participants' work performance.
The need for greater quality management during the economic downturn forced Malaysian companies to upgrade their current version of International Standard Organization (ISO) to ISO 9001:2000, which emphasizes documenting the training evaluation process. Companies that pursue this latest version of ISO are required to justify their training efforts and money spent by linking skill development with the quality philosophy of the company. As organizations pursue the latest version of ISO, evaluating training ranks high among top management as a means of justifying training investment (Junaidah, 2001). The opportunity cost of foregoing training commitment has become extremely high. More than ever, training evaluation must demonstrate improved performance and financial results. As training is a costly investment, it is understandable that top managers wish to see value for money and demand justification for training cost. Training providers need to show clients that they are getting good returns on their investment in training. This demand for accountability has been the major impetus for training in the past few years (Junaidah, 2001).
Most organizations in Malaysia have sufficient training facilities. Most managers are sponsored to attend training programs on production, general management and human resources management for an average of 2 days (Mirza & Juhary, 1995). On average, organizations spend 4.65 percent of the managerial payroll on training (Saiyadain, 1995). The measurement of training effectiveness varies from organization to organization. A few organizations have developed systematic plans to follow up on training. Top management's attitude towards training has been identified as a critical factor in the effective operationalization of training (Mirza & Juhary, 1995). In organizations where the top and middle management are perceived to be supportive, training seems to have contributed to overall growth. But how far the evaluation process has been conducted to prove this growth is still questionable. In order to improve the overall effectiveness of training, all organizations should undertake training evaluation effectively. As mentioned by Brinkerhoff (1988), training needs to adopt evaluation and measuring systems that can improve the feedback mechanism in order to build response capacity. A system of pre-course evaluation followed by post-course evaluation may help in setting relevant expectations for improvement.
A serious gap in the Malaysian training context is the insufficient information on the number, nature and content of training facilities in the country. The skill level at which the output would fit into the labour market is not known, while the syllabus, duration and quality of training vary from one agency to another. This is due to the lack of collaboration and consultation between industry and training institutions. The quality of training is not up to the mark. Trainees have theoretical knowledge but little practical experience (Pillai, 1994).
There has been limited study of training evaluation practices in Malaysia. A training evaluation study by Shamsuddin (1995) examined the contextual factors associated with evaluation practices of selected adult and continuing education providers in Malaysia. According to him, even though the management directed an evaluation to be conducted, it was only for a narrow purpose. It was used to demonstrate program success by showing how good the training was and how many people received it, which is merely Level 1 evaluation. The wider purpose of program evaluation, such as measuring the acquired learning (Level 2), program impact (Level 3) and cost effectiveness (Level 4), was not the management priority. According to Shamsuddin (1995), the clients were not aggressive stakeholders who cared and demanded accountability from the training providers. Their behaviour and characteristics did not push the training provider to examine the real effect of the programs in terms of learning gain and program effectiveness.
Besides Shamsuddin's (1995) study, four other studies conducted locally included the
element of evaluation practice. The first study by Hamid, Mohd, Muhamad and Ismail (1987) asked 235 organizations if management education in Malaysia significantly provides
candidates with a set of skills. Organizations found that 67.6 percent of management programs offered by local universities and colleges are too theoretical. Out of 121 respondents, 60.3 percent indicated that training is important while the rest felt the contrary. This study focused on reaction evaluation (Level 1) to study the participants' satisfaction level towards the overall programs.
Another study, conducted by Asma (1994), examined the design of training practices of four training providers in Malaysia and found that the evaluation practiced by the trainers did not conform to any theory and that most of the evaluations used were ad hoc and informal.
Mirza and Juhary (1995) conducted a study on local and multinational organizations and found that in the majority of these organizations, even though managers who return from training may write a report, no formal systematic mechanism exists to assess how well they are utilizing their training in the organization. The research further found that participants were only encouraged to apply learning at work, with no effort made to find out what caused any change. The result indicates that measuring training effectiveness is not widely practiced. Organizations feel that if learning does not take place, it would show in the next appraisal report. They also feel that participants who have learned something should have applied it, and that it is therefore not necessary to track changes in performance.
Mirza and Juhary's (1995) study also revealed that most organizations in Malaysia evaluate training effectiveness on a superficial level. Some encourage their managers to try out new ideas while others do not show the same kind of support. Unfortunately for most companies, measuring training effectiveness may not be practiced organization wide. This is because
measuring training effectiveness has never been a policy in most organizations. Lack of support by most department heads is deterring most organizations from carrying out post-
training evaluation. Most organizations felt that if they had a more supportive top management they could have established systems for measuring training effectiveness.
The most recent study was by Junaidah (2001) on training evaluation practices by training institutions in Malaysia. The study showed moderately formal training evaluation practices by Malaysian training practitioners. However, the researcher was uncertain whether these training practitioners applied the taxonomy of the Kirkpatrick model in their training evaluation practices.
Generally, training evaluation practices in Malaysia are either not done or, if done, do not follow any theory suggested in the literature. There is a paucity of detailed evidence of direct causal links between investment in training and the resultant return in the form of increased performance. Brandenburg (1982) observed that training practitioners tended not to conduct evaluation or, if they did, relied heavily on soft-information evaluation methods and did not disseminate the results widely. Pauzi (1985) felt that part of the problem lies in the attitude of top management, who do not show full commitment to the evaluation process.
A further study is needed to examine current training evaluation practices in Malaysia and to understand recent developments in this practice. It is important to understand training effectiveness in Malaysia, and it is worthwhile to analyze the training evaluation process that has taken place in the country. This study would contribute to the existing body of knowledge as current information on training evaluation is inadequate. Since a large number of professional associations, private consultants and management schools in universities are organizing training programs in Malaysia, the results of the study would indicate areas where training evaluation could be practiced for different training programs.
2.6 Methodology of Study
The most recent surveys of training and evaluation practices in Malaysia were conducted by Hamid et al. (1987), Asma (1994), Mirza and Juhary (1995), Shamsuddin (1995) and Junaidah (2001). The dearth of published materials on training and development activities of managers in Malaysia has prompted this study.
This exploratory study was conducted to understand the evaluation culture and the extensiveness of training evaluation practices in Malaysia. The lack of baseline information prevented the evaluation of transfer learning. This prompted the use of an empirical approach in this study. The study evaluates the perceptual effects at both management and non-management levels of training programs in the manufacturing sector. The survey asked about the level of training evaluation performed, the percentage of payroll spent on training, the impediments to training and the percentage of training transferred to the job. Follow-up interviews were also undertaken to provide additional clarification and interpretation of responses and enabled impressions and opinions about the data to be recorded accurately.
2.6.1 Questionnaire Construction
A comprehensive survey of the literature was done to find out the degree of training evaluation being conducted by training practitioners in Malaysia. The survey questions asked about the degree to which training evaluation practices were conducted in Malaysia based on Kirkpatrick's 4 levels of evaluation (Kirkpatrick, 1959a, 1959b, 1960a, 1960b, 1976, 1979). Examples of questions are:
Reaction: How did the participants react to the training?
Learning: What information and skills were gained?
Behavior: How have participants transferred knowledge and skills to their jobs?
Results: What effect has training had on the organization and achievement of its objectives?
The instrument was designed primarily based on the published work of Blanchard, Thacker and Way (2000), with modifications based on the Malaysian training environment. The modifications to the Blanchard et al. questionnaire include rephrasing and simplifying the question structure to suit local linguistic understanding. Words which were ambiguous or could be misunderstood were replaced. These modifications were applied in order to encourage a more accurate response. Care was taken to ensure that simple and clear questions were used to seek information on significant areas of training evaluation activity in Malaysia. The questionnaire can be found in Table 4.
The questionnaire consists of 34 questions. There are 8 questions in Level 1, 5 in Level 2, 13 in Level 3 and 8 in Level 4. Level 3 was constructed with the most questions as it asked about practices for measuring transfer learning. Practitioners could use a variety of assessments to measure transfer learning; hence the survey questions ask about the detailed practices undertaken by practitioners.
The questions in the questionnaire were randomly sorted to avoid bias caused by the order of the questions. The survey questions used a 5-point Likert scale to permit good scale discrimination.
A panel of experts which consisted of training professionals from the Malaysia Institute of
Management was used to evaluate the items in the questionnaire. Extensive pilot testing was undertaken by the training professionals to ensure that the questions were easily understood.
The internal consistency was determined using the Cronbach alpha method. The Cronbach alpha coefficient is 0.8458.
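For readers unfamiliar with the statistic, the following is a minimal sketch of how a Cronbach alpha coefficient can be computed from item-level responses, using the standard formula alpha = k/(k-1) x (1 - sum of item variances / variance of total scores). The function name cronbach_alpha and the sample ratings are hypothetical illustrations, not the study's data; the coefficient of 0.8458 reported above is not reproduced here.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows (one row per respondent,
    one column per questionnaire item), using sample variances."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("at least two items are required")
    # Variance of each item (column) across respondents.
    item_var_sum = sum(variance(col) for col in zip(*responses))
    # Variance of each respondent's total score.
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical 5-point Likert ratings: 5 respondents x 4 items.
sample = [
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 4, 5],
    [2, 3, 2, 2],
    [4, 4, 3, 4],
]
alpha = cronbach_alpha(sample)
print(f"Cronbach alpha (hypothetical data): {alpha:.4f}")
```

A coefficient closer to 1 indicates that the items move together more consistently across respondents.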
2.6.2 The Sample and Sampling
To improve effectiveness and efficiency in terms of time and resources, a purposeful sampling technique was employed. The sample consisted of manufacturing based companies found in the HRDC Directory. The HRDC Directory listed approximately 5000 organizations, but only 40 percent of the listing are manufacturing based companies. The questionnaires were sent to 2000 manufacturing based companies with more than 50 employees. The questionnaires were posted between December 2003 and January 2004 and were addressed to the Personnel and Human Resources Managers of the organizations. A self-addressed stamped envelope was enclosed to maintain anonymity on the return of the completed questionnaires through the postal service.
2.6.3 Questionnaire Response
The questionnaires were posted to 2000 manufacturing organizations in Malaysia found in the HRDC Directory. The appeal highlighted the focus of the study, i.e. training evaluation activities that relate to the benefits of training.
Of the 2000 questionnaires posted, 94 were returned with a note that the organizations had closed down or had moved to a new address. This reduced the original sample of 2000 to 1906. Reminder notes were sent out three weeks after the first posting in order to encourage a greater response rate. However, only 109 completed questionnaires were returned. The overall lack of organizational response can be attributed to a variety of causes: low interest, lack of time to respond, current restructuring of the organization, unavailable contact person, and outdated addresses.
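The response figures above can be checked with a short calculation. The sketch below is illustrative only: the function name response_rate is hypothetical, and it simply computes the rate against both the 2000 questionnaires originally posted and the 1906 that remained after undeliverable returns, since either base may be quoted as "the" response rate.

```python
def response_rate(completed, posted, undeliverable=0):
    """Percentage of completed questionnaires relative to those
    posted, optionally excluding undeliverable ones from the base."""
    base = posted - undeliverable
    if base <= 0:
        raise ValueError("base must be positive")
    return 100.0 * completed / base


posted = 2000        # questionnaires posted
undeliverable = 94   # returned: organization closed or moved
completed = 109      # completed questionnaires returned

gross = response_rate(completed, posted)
net = response_rate(completed, posted, undeliverable)
print(f"Against questionnaires posted: {gross:.2f} percent")
print(f"Against the reduced sample:    {net:.2f} percent")
```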
2.7 Findings and Discussion
Data were analysed using SPSS for Windows XP (Version 13). Statistical significance was accepted at the 0.05 level. A total of 5.5 percent of the questionnaires were returned. Part 1 of the questionnaire gathered information on the background of the companies. It was found that out of the 109 companies, 46 percent are multinational companies while 54 percent are Malaysian companies. Part 2 of the questionnaire gathered information on the organization's commitment to training. The results are shown in Table 1.
Table 1. Commitment to Training

Commitment to Training | Statistics (n = 109)
Does your organization conduct training programs for employee development | Yes = 100 percent; No = 0 percent
Does your organization conduct training needs analysis before conducting any training programs | Yes = 41.3 percent; No = 58.7 percent; Multinational = 39; Malaysian companies = 6
What type of training is conducted by your organization |
Management e.g. Leadership, supervisory, managing change, communication, human relations and interpersonal skills | 45.9 percent
Organization Specific e.g. training programs related to policies, values, cultures, goals and objectives of the whole organization | 18.9 percent
Technical e.g. quality, productivity, product training, IT training, accounting system and job related training | 64.3 percent
Personal Improvement e.g. motivation, time management, self development, managing self, presentation skills and business communication skills | 24.2 percent
Others | 0 percent
A total of 41.3 percent of organizations agreed that a training needs analysis was conducted prior to conducting any training program. The rest of the organizations conduct training to meet organizational needs such as low productivity or a morale problem, or in reaction to a crisis, and such training is frequently not coordinated with other functions of the organization. The lack of baseline information prevents evaluation, as no meaningful comparison of the participant's performance before and after training can occur.
The results indicate that 64.3 percent of organizations organized technical training. A large number of organizations felt the need to upgrade the technical competence of their employees in the areas of quality, productivity, product training, IT training, accounting system and job related training. Of all the organizations interviewed, 65 percent reported that they have extended their range of products during the last two years and 88 percent had made changes to machinery and equipment.
Management training was ranked second at 45.9 percent, followed by personal development at 24.2 percent. One fifth of the organizations are also concerned with management training. Many feel that skills such as leadership, supervision, managing change, communication, human relations and interpersonal skills are needed for management development. Although organization specific training is an emerging area, only about 18.9 percent of the organizations feel the need to impart training in this field.
Table 2 shows the level of evaluation conducted on management and non-management training by the organization.
Table 2. Training Evaluation Practices in Organization

Training Evaluation Practices in Organization | Statistics (n = 109)
Level 1 reaction evaluation | 35 percent
Level 2 learning evaluation | 25 percent
Level 3 behavioural evaluation | 16.5 percent
Level 4 results evaluation | 11 percent
No training evaluation practices | 12.5 percent
The results indicate that out of 109 companies, 35 percent of organizations conducted Level 1 evaluation by measuring the participant's reactions towards the training program, while 25 percent of the organizations conducted Level 2 evaluation by measuring the participant's degree of learning as a result of the training initiatives. Only 16.5 percent of organizations conducted Level 3 evaluation by measuring the changes in the participant's behaviour towards the job after each training program. Only 11 percent of organizations quantified the results of training and calculated the return on investment in training, which is classified as Level 4 evaluation. The remaining 12.5 percent of organizations have never conducted training evaluation after a training program. The results indicate that more than half of the organizations do not evaluate their training at the behavioural or the results levels. The reason for this is that the training function is sometimes seen as an isolated and peripheral function which is not truly integrated into the job setting (Olsen, 1998).
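The shares reported in Table 2 can be tallied with a few lines of code. The sketch below is illustrative and rests on one assumption that the source does not state explicitly: that the five categories are mutually exclusive, which is suggested by the fact that they sum to 100 percent. The variable names are hypothetical.

```python
# Percentage of organizations in each Table 2 category (n = 109).
# Assumption: categories are mutually exclusive (they sum to 100).
shares = {
    "Level 1 reaction": 35.0,
    "Level 2 learning": 25.0,
    "Level 3 behavioural": 16.5,
    "Level 4 results": 11.0,
    "No evaluation": 12.5,
}

total = sum(shares.values())
behavioural_or_results = shares["Level 3 behavioural"] + shares["Level 4 results"]
below_level_3 = total - behavioural_or_results

print(f"Total of all categories: {total} percent")
print(f"Evaluating at Level 3 or Level 4: {behavioural_or_results} percent")
print(f"Not evaluating at Level 3 or Level 4: {below_level_3} percent")
```

Under that assumption, the share not evaluating at the behavioural or results levels is well above one half, consistent with the statement above.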
The means and standard deviations of the four levels of training evaluation for all 109 companies are shown in Table 3.
Table 3: Means and Standard Deviations of the Four Levels of Training Evaluation

Level      Mean ± SD
Level 1    3.63 ± 0.62
Level 2    3.41 ± 0.62
Level 3    3.26 ± 0.63
Level 4    2.99 ± 0.68

Note: Likert scale: where 5 = strongly agree; 4 = agree; 3 = neutral; 2 = disagree; 1 = strongly disagree
The majority of the organizations agree that they conduct Level 1 evaluation after each training program, as reflected in the mean score of 3.63 for Level 1 evaluation. The mean score for Level 2 evaluation is 3.41, which indicates that some companies conduct Level 2 evaluation selectively, mostly for technical training. The mean score for Level 3 evaluation is 3.26, which indicates that measuring behavioural changes on the job after training is not widespread among these manufacturing organizations. This could be due to the unavailability of specific tools to measure subjective changes in behaviour. The mean score for Level 4 evaluation is 2.99, indicating that the majority of these manufacturing organizations do not conduct results evaluation. This finding was further supported by an interview, in which it was noted that the benefits of training are not easily measured in quantitative terms and that most benefits cannot be measured immediately.
The means and standard deviations of the 34 questions in the instrument for all 109 companies are shown in Table 4.
Table 4: Means and Standard Deviations of 34 Questions in the Instrument

Item                                                                    Mean    SD

Level 1 - Reaction Evaluation
 1. Departmental heads collected opinions from participants with        4.12    0.689
    regard to the training program conducted
 2. Evaluate perceptions of participants on key benefits and value      3.03    0.934
    arising from training
 3. Conduct training environmental audit to track participants'         4.03    0.724
    satisfaction after training
 4. Focus on perception of trainees towards the training program        4.38    0.862
 5. Measure trainers' competency and credibility after each             2.74    0.908
    training program
 6. Most training programs conduct post-course reaction evaluation      3.89    0.715
    after training
 7. Always make an effort to ask participants whether they enjoy        4.20    0.815
    attending the training programs
 8. Measure the accuracy of the training program in addressing          2.67    1.021
    the exact requirement of the job

Level 2 - Learning Evaluation
 1. Allow participants to write down what they have learned which       3.69    0.641
    might be useful for their work
 2. Conduct pen and paper test for measuring the amount of              4.28    0.703
    knowledge gained from a training program
 3. Administer a test before and after training with regard to          3.41    0.912
    the knowledge gained from a training program
 4. Identify the principles, facts and techniques learned by            2.98    1.090
    participants
 5. Participants were asked if there were any barriers preventing       2.69    0.932
    them from using what they have learned

Level 3 - Behavioural Evaluation
 1. Develop performance-based tests as part of the training             2.89    0.909
    evaluation
 2. Assess the level of transfer of learning to the job                 3.04    0.089
 3. Measure the success rate of participants performing each item       3.23    0.994
    learned
 4. Define an action plan for participants and evaluate the             3.43    0.745
    implementation success rate
 5. Identify specific skill improvement as a result of a training       3.93    1.079
    program
 6. Measure positive changes in personnel efficiency and                3.77    0.931
    effectiveness after training
 7. Measure the behaviour changes resulting from the training           3.51    1.099
    program
 8. Organize the trainer's follow-up session to track the               3.28    1.141
    participant's behavioural change after training
 9. Use observation techniques to monitor changes of behaviour and      2.62    1.062
    attitudes resulting from the training program
10. Conduct work performance evaluation in the workplace after          2.71    0.703
    training
11. Observing and documenting the practice of knowledge and skills      3.32    0.773
    learned by the trainee in the workplace
12. Assess the increase in knowledge and skills as well as              2.84    0.842
    attitude change of trainees
13. Conduct a preview session with your trainee to specify the          3.79    0.952
    expected objectives to achieve from the training

Level 4 - Results Evaluation
 1. Measure the level of productivity before and after a training       2.56    0.721
    program
 2. Link effectiveness of training to financial benefit                 2.91    0.668
 3. Conduct cost-benefit analysis on training programs conducted        3.10    0.711
 4. Measuring the worthiness of attending training in terms of          3.35    0.823
    cost and time away from work
 5. Measure the tangible cost in terms of reduced cost and              2.82    0.913
    improved quality after training
 6. Calculate the cost of training and its impact towards               2.71    0.793
    organization improvements
 7. Compare the cost of training program with benefits obtained         3.24    0.894
    from it
 8. Finding evidence of direct links between training investment        3.18    0.615
    and returns from training

Note: Likert scale: where 5 = strongly agree; 4 = agree; 3 = neutral; 2 = disagree; 1 = strongly disagree
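As a consistency check between Table 3 and Table 4, the level means can be approximated by averaging the item means within each level. The sketch below rests on two assumptions: that every respondent answered every item (so the mean of item means equals the level mean), and that the "barriers" item (2.69) belongs to Level 2, as the discussion of the Level 2 results indicates. The variable names are illustrative.

```python
from statistics import fmean

# Item means transcribed from Table 4, grouped by evaluation level.
# Assumption: the "barriers" item (2.69) is a Level 2 item.
item_means = {
    "Level 1": [4.12, 3.03, 4.03, 4.38, 2.74, 3.89, 4.20, 2.67],
    "Level 2": [3.69, 4.28, 3.41, 2.98, 2.69],
    "Level 3": [2.89, 3.04, 3.23, 3.43, 3.93, 3.77, 3.51,
                3.28, 2.62, 2.71, 3.32, 2.84, 3.79],
    "Level 4": [2.56, 2.91, 3.10, 3.35, 2.82, 2.71, 3.24, 3.18],
}

# Level means as reported in Table 3.
reported = {"Level 1": 3.63, "Level 2": 3.41, "Level 3": 3.26, "Level 4": 2.99}

# Mean of the item means for each level.
level_means = {level: fmean(scores) for level, scores in item_means.items()}

for level, value in level_means.items():
    print(f"{level}: {len(item_means[level])} items, "
          f"mean of item means = {value:.2f}, "
          f"Table 3 reports {reported[level]:.2f}")
```

With this grouping the four levels contain 8, 5, 13 and 8 items, which accounts for all 34 questions, and each recomputed mean falls within about 0.01 of the value in Table 3. Small differences are expected because the item means are themselves rounded.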
The results indicate that Level 1 (reaction) evaluation is the most widely practised form of training evaluation. The highest mean score, 4.38, indicates that the majority of Malaysian manufacturing companies focus on the perception of trainees towards the training program. Managers play an active role in conducting Level 1 evaluation by collecting opinions from participants with regard to the training program conducted. Measuring the accuracy of a training program in addressing the exact requirement of the job is the least practised item, with a low mean score of 2.67.

At Level 2, pen-and-paper testing of the knowledge gained from a training program is the most popular practice among these manufacturing companies, with a mean score of 4.28. The lowest mean score for Level 2 evaluation, 2.69, indicates that organizations seldom ask participants whether any barriers prevented them from using what they have learned.

Level 3 evaluation is modestly practised by manufacturing companies in Malaysia. The highest mean score, 3.93, indicates that the majority of these manufacturing companies identify specific skill improvement as a result of a training program. The use of observation techniques to monitor changes of attitude and behaviour as a result of the training program shows the lowest mean score, 2.62.
The apparent lack of practice in Level 4 (results) evaluation is probably due to the effort and potential complexities involved, which entail much more work. This is reflected in the survey results, which indicate low interest among these organizations in conducting cost-benefit analysis of training. Measuring the worthiness of attending training in terms of cost and time away from work showed a mean score of 3.35, which makes it one of the most popular Level 4 practices among these organizations. Calculating the cost of training and its impact towards organization improvements showed one of the lowest mean scores, 2.71.

Independent t-tests were used to test for significant differences in the four levels of training evaluation conducted by multinational and Malaysian companies. Significant differences were found between multinational companies (N=50) and Malaysian companies (N=59) at Level 1, Level 2, Level 3 and Level 4 at p < 0.05. See Table 5.
Table 5: Summary of t-tests of the four levels of training for multinational companies and Malaysian companies

Company          Level 1        Level 2        Level 3        Level 4
                 (Mean ± SD)    (Mean ± SD)    (Mean ± SD)    (Mean ± SD)
Multinational    3.78 ± 0.52    3.69 ± 0.56    3.63 ± 0.48    3.50 ± 0.53
Malaysian        3.49 ± 0.67    3.17 ± 0.57    2.94 ± 0.56    2.56 ± 0.46
t-value          2.635 *        4.758 *        6.794 *        9.838 *
*p
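The t-values in Table 5 can be approximately reproduced from the reported means, standard deviations and group sizes. The sketch below is illustrative rather than a record of the original analysis: the source does not say whether a pooled-variance or unequal-variance test was used, so the pooled-variance (Student) form is assumed here, and the function name `pooled_t` is hypothetical.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic using a pooled variance estimate."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error

# Group sizes stated in the text.
N_MULTINATIONAL, N_MALAYSIAN = 50, 59

# (mean, SD) pairs transcribed from Table 5.
multinational = {"Level 1": (3.78, 0.52), "Level 2": (3.69, 0.56),
                 "Level 3": (3.63, 0.48), "Level 4": (3.50, 0.53)}
malaysian = {"Level 1": (3.49, 0.67), "Level 2": (3.17, 0.57),
             "Level 3": (2.94, 0.56), "Level 4": (2.56, 0.46)}

# t-values as reported in Table 5.
reported_t = {"Level 1": 2.635, "Level 2": 4.758,
              "Level 3": 6.794, "Level 4": 9.838}

recomputed = {}
for level in reported_t:
    m1, s1 = multinational[level]
    m2, s2 = malaysian[level]
    recomputed[level] = pooled_t(m1, s1, N_MULTINATIONAL, m2, s2, N_MALAYSIAN)
    print(f"{level}: recomputed t = {recomputed[level]:.3f}, "
          f"reported t = {reported_t[level]:.3f}")
```

Because the means and standard deviations in Table 5 are rounded to two decimals, the recomputed statistics differ slightly from the reported ones, but they follow the same pattern: the gap between multinational and Malaysian companies widens from Level 1 to Level 4.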