Big Think+

How to get the most out of a training evaluation

Here are five things to know before conducting a training evaluation.
Training Evaluation
Credit: Elena Poritskaya; shushan1974, vadarshop / Adobe Stock

Especially in times of economic uncertainty and constrained resources, executive leadership expects learning programs to support and advance key business objectives. 

A training evaluation can not only attest to the value of investing in the L&D function but also provide a basis for continuously improving programs. Evaluation also helps establish accountability for training outcomes and identifies skill gaps.

5 things to know before conducting a training evaluation

It’s critical to be able to demonstrate the effectiveness of learning programs, but there are several challenges that learning and development professionals face when measuring their cultural impact and monetary benefits. Here are five key considerations to get the most out of a training evaluation.

1. Understand the difference between formative and summative evaluations 

Formative and summative evaluations are aimed primarily at determining the extent to which training outcomes reflect mastery of learning objectives. Formative evaluation occurs during reviews and pilot testing of program materials under development, while summative evaluation occurs post-implementation and typically is repeated after every program delivery.

One key difference between the two is that formative evaluation tends to be based largely on learner performance on exercises, quizzes, and other “checkpoints” built into the training. Summative evaluation typically relies on discrete data collection instruments designed specifically for post-training evaluation. In either case, the overarching goal is to provide the most effective and efficient learning experiences possible, and to do so in alignment with organizational priorities.

2. Assess the various options for data collection

A training evaluation is only as good as the data it yields. Ideally, multiple data collection methods are employed in both formative and summative evaluations to provide the most nuanced view of training outcomes. 

For example, suppose that learner performance on an end-of-course exam administered during a pilot test does not meet expectations. The formative evaluation plan might include one-on-one interviews or focus groups to find out why. And those additional data collection methods might target not only learners but other stakeholders as well, such as team leaders. 

Questionnaires, interviews, focus groups, and direct observation have long been used to collect data for training evaluation, but none of these methods is foolproof. Each has its own advantages and drawbacks. Understanding the challenges and employing techniques to counter them is critical to obtaining reliable, actionable data.


Questionnaires

Questionnaires or surveys may be used immediately upon completion of training and later on, once learners have a better sense of what they gained from the experience. This method is helpful for obtaining learners’ opinions about their training experience, but opinions are highly subjective. 

Questionnaires can also reflect the implicit bias of their developers, usually in the form of leading questions. Making sure that survey questions don’t nudge respondents in a particular direction is the best way to increase the objectivity of survey data. Avoid absolutes (words like “every” or “never”) and provide a balanced range of positive and negative response choices.


Interviews

Individual interviews conducted in person or virtually are particularly helpful for following up with learners to explore their responses on a questionnaire. However, individual interviews can put learners on the spot because they don’t offer the anonymity of questionnaire-based surveys. 

Learners may not be completely candid, but a skilled interviewer knows how to put people at ease and encourage them to speak freely. That means not breaking stretches of silence with filler questions and not interrupting with follow-up questions while an interviewee is still speaking. Done well, individual interviews can be very informative.

Focus groups

Focus groups bring learners who have participated in the same intervention together with a facilitator who solicits in-depth feedback. They’re commonly used in formative evaluations. 

The data obtained in a focus group may be inaccurate if individuals conform their opinions to those of others in the group. Focus groups are also highly dependent on the skill of the facilitator, but a good facilitator is able to manage the group dynamics and keep the focus on obtaining the information needed.

Direct observation

Direct observation most often occurs in the weeks or months after training to determine whether learners are applying newly acquired skills and displaying new behaviors on the job. These observations may be inaccurate if performance changes because people know they’re being watched, so observers should be sure to explain that it’s the training that’s being evaluated, not the learner. 

3. Ask the right questions

The ideal questions for an evaluation will be learner-centric and depend on the purpose of the evaluation. Typically, training evaluation questions are aimed at assessing:

  • The extent to which employees have achieved learning objectives or developed certain skills 
  • The quality of the learning experience from the employee’s perspective
  • The impact of the training on employee satisfaction and productivity
  • The cultural or financial impact of the training at an organizational level

In many cases, evaluations will require a mix of quantitative and qualitative data, with closed questions to collect the former and open-ended questions for the latter. Both types of data are important for the continuous improvement of training programs. 

Closed questions offer a fixed set of answers, often just two, such as:

  • Did the training meet your expectations?
  • Which do you think was more helpful, the role play or the case study?
  • Would you recommend this program to your colleagues?

Even when a range of choices is presented, as with a Likert scale (e.g. usually, sometimes, seldom, hardly ever), responses to closed questions are easily tallied and quantified. 
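As a rough illustration of how such responses might be tallied, the sketch below counts answers to a single closed question and computes an average score. The response data, the four-point scale order, and the function names `tally` and `mean_score` are all hypothetical, and mapping choices to the numbers 1 through 4 is an assumption that treats the scale as evenly spaced.

```python
from collections import Counter

# Hypothetical four-point frequency scale, ordered from lowest to highest.
SCALE = ["hardly ever", "seldom", "sometimes", "usually"]

# Hypothetical answers from six learners to one closed question.
responses = ["usually", "sometimes", "usually", "seldom", "usually", "hardly ever"]


def tally(responses, scale):
    """Count responses per choice, keeping the scale's order and zero counts."""
    counts = Counter(responses)
    return {choice: counts.get(choice, 0) for choice in scale}


def mean_score(responses, scale):
    """Average after mapping each choice to its 1-based position on the scale."""
    scores = [scale.index(r) + 1 for r in responses]
    return sum(scores) / len(scores)


counts = tally(responses, SCALE)
average = mean_score(responses, SCALE)

for choice, n in counts.items():
    print(f"{choice:12s} {n}")
print(f"mean score: {average:.2f}")
```

Reporting the full distribution alongside the mean is usually the safer choice, since an average can hide a split between very positive and very negative answers.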

Open-ended questions encourage the learner to speak freely. For example:

  • In what way(s) did the program not meet your expectations?
  • Why do you think the role play was more helpful than the case study?
  • Why wouldn’t you recommend the program to others?

These responses can be categorized and subjected to qualitative analysis to gain deeper insight.

4. Use a training evaluation model

Below is a list of several popular training evaluation models, starting with the most common. Whether or not they’re followed to the letter, it’s important to choose one or develop a unique evaluation process during the initial design of a training program. Concurrent planning of training and an evaluation strategy is widely regarded as the best approach.

The Kirkpatrick Model of Training Evaluation

Organizations around the world utilize Kirkpatrick’s model, created in the 1950s by Donald Kirkpatrick – author of “Evaluating Training Programs,” “Transferring Learning to Behavior,” and “Implementing the Four Levels.”  The model outlines four levels of training evaluation. 

Level one, “Reaction,” is an assessment of the learner’s immediate response upon completion of a training program, often through a simple questionnaire. While such data is sometimes discounted as a “smile test,” it’s helpful to know how learners view the training experience, in part because they’re likely to share their opinions with others.

Level two, “Learning,” involves using quizzes, tests, and other assessments to determine what has been learned from participating in the training. Unless pre-testing was done, it won’t be possible to assess the extent of change resulting from the training, only the extent to which the program’s objectives have been mastered.
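To make the role of pre-testing concrete, here is a minimal sketch of how change could be computed when both pre-test and post-test scores exist. The scores, the 100-point maximum, and the function name `learning_gain` are hypothetical. The "normalized gain" (the share of possible improvement actually achieved) is one common way to compare learners who start at different levels; it is offered here as an assumption, not as part of Kirkpatrick's model.

```python
def learning_gain(pre, post, max_score=100):
    """Return (raw gain, normalized gain) for one learner.

    Normalized gain = raw gain divided by the room left to improve.
    """
    raw = post - pre
    room = max_score - pre
    normalized = raw / room if room > 0 else 0.0
    return raw, normalized


# Hypothetical scores out of 100: learner -> (pre-test, post-test).
scores = {"A": (50, 80), "B": (70, 85), "C": (90, 95)}

gains = {name: learning_gain(pre, post) for name, (pre, post) in scores.items()}

for name, (raw, normalized) in gains.items():
    print(f"Learner {name}: raw gain {raw}, normalized gain {normalized:.2f}")
```

Without the pre-test column, only the post-test scores would be available, which show mastery of the objectives but not how much of it the training produced.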

Level three, “Behavior,” assesses behavioral change, ideally through observation of on-the-job performance, though many level three assessments rely on self-reporting by the learner. 

Level four, “Results,” looks beyond the experiences of individual learners to assess aggregate improvements in productivity, efficiency, and other measures of training success. The intent is to evaluate the extent to which stakeholder and organizational expectations have been met. Many training evaluations falter at this level because devising appropriate, quantifiable measures of improvement can be difficult. 

The impact of training on business performance can only be measured after some time has passed. Measuring it involves metrics such as:

  • Productivity/output 
  • Employee turnover 
  • Customer satisfaction and retention 
  • Sales volume

One effective way to support level four evaluation is to make it the starting point for training design. In consultation with stakeholders, identify key metrics related to the desired impact on business performance (e.g. a 10% increase in quarterly sales volume), and design the training to meet those targets.
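A target like this can later be checked with simple arithmetic. The sketch below compares a baseline quarter with a post-training quarter against a 10% target. The sales figures and the function names `percent_change` and `met_target` are hypothetical, and the comparison does not isolate the effect of training from other factors.

```python
def percent_change(baseline, actual):
    """Percentage change from baseline to actual."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (actual - baseline) * 100 / baseline


def met_target(baseline, actual, target_pct):
    """True if the observed change reaches the agreed target."""
    return percent_change(baseline, actual) >= target_pct


# Hypothetical quarterly sales volume before and after the training.
baseline_sales = 200_000
post_training_sales = 216_000

change = percent_change(baseline_sales, post_training_sales)
on_target = met_target(baseline_sales, post_training_sales, target_pct=10)

print(f"change: {change:.1f}% (target met: {on_target})")
```

In this made-up case sales rose, yet the result falls short of the agreed target, which is the kind of gap a level four evaluation is meant to surface.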

The Phillips ROI Model

The Phillips model, developed by Dr. Jack J. Phillips – author of “Return on Investment in Training and Performance Improvement Programs” and “Measuring ROI In Action” – begins with collecting baseline data prior to training. Data is also collected post-training to allow for a detailed comparison and the isolation of performance improvements attributable to the training. 

After eliminating external factors that may have contributed to improvements, the improvements resulting from the training program specifically are quantified and expressed in monetary terms. Deducting the overall program costs yields the net monetary benefit, and dividing that net benefit by the program costs gives the return on investment for the training, usually expressed as a percentage. 
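The ROI calculation is commonly written as net benefits divided by program costs, multiplied by 100. The sketch below works through it with hypothetical figures; the function name `phillips_roi` and the dollar amounts are illustrative only, and the benefits figure is assumed to have already been isolated from external factors.

```python
def phillips_roi(benefits, costs):
    """Return (net benefits, benefit-cost ratio, ROI as a percentage)."""
    if costs <= 0:
        raise ValueError("costs must be positive")
    net = benefits - costs
    bcr = benefits / costs
    roi_pct = net / costs * 100
    return net, bcr, roi_pct


# Hypothetical figures: monetary benefits attributed to the training,
# and the fully loaded cost of designing and delivering it.
program_benefits = 150_000
program_costs = 100_000

net, bcr, roi_pct = phillips_roi(program_benefits, program_costs)

print(f"net benefits: {net}")
print(f"benefit-cost ratio: {bcr:.2f}")
print(f"ROI: {roi_pct:.0f}%")
```

An ROI of 0% means the program exactly paid for itself, so the benefit-cost ratio and the ROI percentage are worth reporting together to avoid confusing the two.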

The CIRO Model

Peter Warr, Michael Bird, and Neil Rackham – the authors of “Evaluation of Management Training” – developed the CIRO model specifically for evaluating management training programs. 

Stage one of this model, “Context,” is undertaken in conjunction with a training needs analysis. The CIRO model classifies training needs as ultimate, intermediate, or immediate objectives of the program:

  • The ultimate training need is the elimination of specific performance gaps and deficiencies.
  • Intermediate training needs are the behavioral changes needed to reach that ultimate objective.
  • Immediate training needs are the skill and knowledge gaps that must be closed to bring about those behavioral changes.

Stage two, “Input,” is analogous to the design stage in the creation of a new training program. It involves analyzing the ultimate, intermediate, and immediate training needs and organizational resources with the goal of identifying the best instructional strategies to employ.

Stage three, “Reaction,” is the post-training data collection stage. The final stage, “Outcome,” evaluates post-training data to determine learning outcomes on any (or all) of four levels – learner, workplace, team/department, and organization – depending on the purpose of the evaluation.

5. Take advantage of digital tools 

There are many digital services that make data collection and evaluation easier than they’ve ever been. For example, learning management systems offer useful features such as:

  • Easily accessible analytics dashboards
  • Built-in tools for real-time data tracking, such as training program completions
  • The ability to produce and export custom reports 
  • Customizable survey templates to collect learner feedback

Qualtrics is one commonly used tool for the digital analysis of evaluation data. It’s favored for its use of artificial intelligence to detect patterns in data, including the use of natural language processing to analyze text. The tool also offers automatically generated reports with data visualizations so that key takeaways can be more easily communicated back to stakeholders.  

Final note

There are typically multiple opportunities for self-checks and quizzes throughout a learning program that collect data on learner engagement and program quality. But bear in mind that while measurement is necessary for evaluation, it is not, in and of itself, evaluation. 

Training evaluation requires comparing measurements of outcomes to established standards. Ultimately, the question is not, “Has performance improved as a result of instruction?” It is, “Has performance improved as much as expected?” and moving forward, “How can we bring about even greater improvement?”
