Practical application of evaluation theory and design
HM Treasury’s Magenta Book sets out excellent advice on evaluation practice and the sorts of approaches you can use. This short paper focuses on the lessons from our own evaluation experience, which complement and add to those in the guidance.
Types of evaluation
Evaluation is all about assessing the efficiency, effectiveness, and economy of programmes, policies or organisations – i.e. do they run smoothly? Are they doing what they were intended to do? And were they worthwhile? HMT guidance distinguishes between three different types of evaluation which broadly speaking address each of these three questions:
- Process – Has the policy or intervention been implemented well, and can improvements be made? For example, do stakeholders find it easy to use, access or take advantage of the facilities being offered?
- Impact – What has actually happened as a result and has this been caused by the policy or intervention or something else going on in the environment?
- Economic – Was the pain worth the gain and did the benefits outweigh the costs?
In practice most evaluations cover all three, but to varying degrees, as the emphasis might differ depending on what you want to know. Evaluations can be very useful in providing early evidence on whether the policy is working well, whether the outcomes are achievable, and whether the approach needs fine-tuning. The focus here would be on process, but you might also wish to obtain early indications of impact and economic factors as well.
To keep this paper short, we have focused on our experiences of impact evaluation. We plan to provide further short papers on the other types, and on how to get an organisation to embed evaluation in its continuous improvement approach, in due course.
Assessing impact requires attributing outcomes to an intervention rather than to other factors in the environment or occurring alongside the intervention. There are three main approaches – experimental designs, theory-based evaluation and outcome-based evaluation:
Using experimental/quasi experimental design
Experimental designs using randomised allocation provide the most robust means to determine if the policy works and makes a difference. Using a random sample is a key requirement in statistical analysis as it tends to remove any bias. This is used commonly in medical tests for example. The beneficiary of the new drug can be compared with people taking a placebo and the differences measured with some degree of confidence.
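To make the logic of a randomised comparison concrete, here is a minimal sketch using entirely hypothetical outcome scores. It uses a permutation test, which asks how often a difference as large as the one observed would arise if allocation to the two groups had made no difference:

```python
import random
import statistics

def permutation_test(treated, control, n_perm=10_000, seed=0):
    """Estimate a p-value for the observed difference in means by
    repeatedly re-allocating the pooled outcomes at random."""
    rng = random.Random(seed)
    observed = statistics.mean(treated) - statistics.mean(control)
    pooled = list(treated) + list(control)
    n_t = len(treated)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:n_t]) - statistics.mean(pooled[n_t:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_perm

# Hypothetical outcome scores for two randomly allocated groups
treated = [72, 68, 75, 80, 71, 77, 74, 69]
control = [65, 70, 66, 72, 63, 68, 71, 64]
diff, p = permutation_test(treated, control)
print(f"difference in means = {diff:.3f}, p = {p:.3f}")
```

The smaller the p-value, the harder it is to explain the observed difference as chance, which is the "degree of confidence" random allocation buys.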
However, experimental design has a limited application in large interventions with long term outcomes, because of the impracticality of controlling a range of factors over an extended period of time and the ethical considerations of withholding a beneficial intervention from a section of the population.
We have used it during pilots or phased interventions. As it is often difficult to achieve fully random allocation, stakeholders have accepted quasi-experimental designs using matched groups and time-series analysis of administrative data (Frost 2001). These designs are also cost-effective to administer.
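The matched-group logic can be sketched with hypothetical figures. The matched comparison group's before–after change proxies for what would have happened anyway, so the impact estimate is the difference in the two changes (a difference-in-differences estimate):

```python
# Hypothetical mean outcomes drawn from administrative data
intervention_before, intervention_after = 54.0, 63.0
matched_before, matched_after = 55.0, 58.0

intervention_change = intervention_after - intervention_before  # 9.0
matched_change = matched_after - matched_before                 # 3.0

# The matched group's change stands in for the counterfactual,
# so the estimated impact is the excess change in the
# intervention group
did_estimate = intervention_change - matched_change
print(f"estimated impact = {did_estimate:.1f}")  # 6.0
```

This only holds if the matched group really would have followed the same trend as the intervention group – which is why the quality of matching matters.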
Two developments suggest experimental designs could have wider applicability in the evaluation of interventions:
- When experimentation is integral to the improvement process, evidence can be produced directly by the intervention rather than a separate evaluation. Many private corporations now run large numbers of small experiments to improve their own and their suppliers’ development, delivery and business processes (Davenport 2009).
- Small behavioural stimuli (‘nudges’) can guide individual and group behaviour. These can be delivered in the short term and are suited to experimental designs to produce the evidence necessary for robust evaluation (John 2011).
Using Theory-based evaluation
Despite the robustness of experimental design, its use remains limited. We have used a ‘Theory of Change’ approach to evaluate major programmes (Mason 2012, Mason 2004, Frost 2008), supported by triangulation of quantitative and qualitative data.
While HMT guidance provides criteria for the level of investment stakeholders should make in this form of evaluation, the level of investment is just as likely to reflect stakeholders’ beliefs about the outcomes and the strength of countervailing voices; learning objectives alone are rarely sufficient to justify the investment to stakeholders.
As the guidance describes, this approach relies on developing a ‘logic model’ into a ‘change model’ that links the required outcomes to the activities and outputs of the policy/programme/intervention.
We usually turn this into a strategy map, as each box often supports more than one of the subsequent ones. For example, the number of supervisors trained in performance management (an output – box 3) could have several outcomes (box 4), such as improved morale and better staff engagement, both of which would contribute to higher organisational performance – our required impact (box 5).
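Because each box can feed several later ones, the strategy map is a directed graph rather than a simple chain. A minimal sketch, using the hypothetical boxes from the example above:

```python
# Hypothetical strategy map: each output/outcome may support
# several subsequent boxes, so the model is a directed graph
strategy_map = {
    "supervisors trained (output)": ["improved morale", "better staff engagement"],
    "improved morale": ["higher organisational performance (impact)"],
    "better staff engagement": ["higher organisational performance (impact)"],
}

def downstream(node, graph):
    """Collect everything a given box ultimately contributes to."""
    seen = []
    for child in graph.get(node, []):
        if child not in seen:
            seen.append(child)
            seen.extend(x for x in downstream(child, graph) if x not in seen)
    return seen

result = downstream("supervisors trained (output)", strategy_map)
print(result)
```

Tracing a box's downstream contributions in this way is also what makes it possible to target evaluation effort at the weakest links in the model.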
We have also found the utility of the model increases dramatically if it is developed iteratively throughout the intervention. Not only is the quality of the model enhanced by the learning from each phase of development to deployment; making it explicit that the model will be developed iteratively also reinforces perceptions that evaluation design is part of the overall process – i.e. it keeps the focus on the outcomes and impact and what we are here to do.
There are other less obvious benefits of iterative development of the model: it becomes easier to get stakeholders to consider total cost of ownership; it is easier to track decisions and assumptions and how they may have changed over the development–deployment cycle; and evaluations can be targeted at the weakest part of the model, thus significantly reducing evaluation costs.
Theory-based evaluations do present challenges. While most of these are described in the guidance, we have found three to be particularly pervasive:
- Failure to identify unanticipated and/or undesired consequences. Our researchers have become particularly skilled in challenging stakeholders to uncover these effects.
- Psychological tendency for stakeholders to take fixed positions based on their own agendas. This can lead to discounting ‘disconfirming’ evidence or being overly positive or negative. The evaluator has to maintain objectivity.
- Propensity to attribute causation to chance events. This tendency is deeply embedded in human psychology (Kahneman 2003), and is one reason why we have found qualitative techniques to be insufficient and always seek to establish a counterfactual. This is relatively straightforward if evaluation design is part of the intervention design. Even if the evaluation is designed ex-post, we have found it possible to construct counterfactuals using phasing, time-series analysis on population benchmarks, and multivariate analysis. The advent of reliable and usable ‘near-neighbour’ analysis helps identify comparable groups and patterns of response within populations.
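A minimal sketch of the near-neighbour idea, with hypothetical records: for each treated unit, the most similar untreated unit on observed covariates (here, age and baseline score) serves as its counterfactual, and impact is estimated from the matched pairs:

```python
import math

def nearest_neighbour_match(treated, pool):
    """For each treated unit, pick the untreated unit with the most
    similar covariates (Euclidean distance) as its counterfactual."""
    matches = []
    for covariates, outcome in treated:
        best = min(pool, key=lambda u: math.dist(covariates, u[0]))
        matches.append((outcome, best[1]))
    return matches

# Hypothetical (covariates, outcome) records; covariates might be
# (age, baseline score) drawn from administrative data
treated = [((25, 60), 72), ((40, 55), 68)]
untreated = [((24, 61), 65), ((41, 54), 66), ((30, 70), 75)]

matches = nearest_neighbour_match(treated, untreated)
impact = sum(t - c for t, c in matches) / len(matches)
print(f"matched-pair impact estimate = {impact:.1f}")
```

In practice the covariates would be standardised and matching done on many more dimensions (or on a propensity score), but the logic of constructing a comparable group is the same.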
Challenges specific to economic evaluation include:
- Methodological shortcomings of standard output measures (eg value of time saved in transport models) (Linton 2013)
- Representation of equity and related distribution effects and social exclusion in the evaluation of accessibility (van Wee and Geurs 2011)
- Impact of agglomeration benefits, which means projects in and between cities tend to be valued much more favourably
- Determining whether cost-benefit ratios or economic rates of return are more appropriate output measures, especially for public private partnerships (Rosewell 2012). If the internal rate of return (IRR) or other measures of payback are used, then smaller projects tend to receive a more positive evaluation.
- Incorporating the costs and benefits of digital technologies
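The benefit-cost ratio versus rate-of-return point can be illustrated with hypothetical cashflows. A small project can show a higher IRR than a large one even when the large one delivers far more absolute net benefit, which is why the choice of output measure matters:

```python
def bcr(benefits, costs, rate=0.035):
    """Benefit-cost ratio: present value of benefits over costs,
    discounted at an assumed rate (3.5% here)."""
    pv = lambda flows: sum(f / (1 + rate) ** t for t, f in enumerate(flows))
    return pv(benefits) / pv(costs)

def irr(cashflows, lo=-0.99, hi=10.0, tol=1e-6):
    """Internal rate of return by bisection: the discount rate at
    which net present value crosses zero (assumes a conventional
    outlay-then-returns cashflow profile)."""
    npv = lambda r: sum(f / (1 + r) ** t for t, f in enumerate(cashflows))
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if npv(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical projects: year-0 outlay followed by annual returns
small = [-100, 40, 40, 40, 40]
large = [-1000, 300, 300, 300, 300, 300]
# The smaller project yields the higher IRR despite the larger
# project's bigger absolute payback
print(f"IRR: small = {irr(small):.1%}, large = {irr(large):.1%}")
print(f"BCR example: {bcr([0, 50, 50, 50], [120, 0, 0, 0]):.2f}")
```

The function names, discount rate and cashflows are all illustrative assumptions, not figures from any actual appraisal.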
Using Outcome evaluations
There will be occasions where this is the only approach available. Explaining to stakeholders that, while it will be possible to assess whether there has been a change, there will be no direct evidence that the intervention was the cause has allowed us to increase the rigour of such evaluations by:
- Constructing ex-ante benefits maps and logic models with stakeholders to assess potential impact.
- Using qualitative techniques to help stakeholders identify unexpected or undesired outcomes.
- Seeking intervention generated data and population trend data to model impact.
In addition, where possible, we also gather the informed opinions of the stakeholders and observers involved. This moves a purely outcome evaluation towards a theory-based approach and can provide some assurance of impact.
Davenport, T. (2009) “How to Design Smart Business Experiments”, Harvard Business Review, February 2009
Frost, A. (2001) “Presenting the business: assessment of technologies for management information”, London, BP plc (internal report)
Frost, A. (2008) “Evaluation of the value for money measures for Framework for Excellence in Further Education in England”, Coventry, Learning Skills Council (internal report)
John, P. et al (2011) Nudge, Nudge, Think, Think: Using Experiments to Change Civic Behaviour, London, Bloomsbury Academic
Kahneman, D. (2003) “A perspective on judgment and choice: Mapping bounded rationality”, American Psychologist, 58 (9): 697–720
Linton, T. (2013) “Critical Analysis of Conventional Transport Economic Evaluation”, Victoria Transport Policy Institute
Mason, D. (2004) “Evaluation of Teaching and Learning capital programme”, HEFCE 2004
Mason, D. (2012) “Evaluation of Teaching and Learning capital programme”, HEFCE 2012
Rosewell, B. (2012) “Submission to the LSE Growth Commission: Infrastructure and Energy”, Institute for Government
van Wee, B. and Geurs, K. (2011) “Discussing Equity and Social Exclusion in Accessibility Evaluations”, EJTIR, 11(4): 350-367.
Article by David Mason – Blue Alumni
Co-author – Dr Andy Frost – Blue Alumni Associate