Program evaluation

Program evaluation is a systematic method for collecting, analyzing, and using information to answer questions about projects, policies and programs, particularly about their effectiveness and efficiency.

In both the public sector and private sector, as well as the voluntary sector, stakeholders might be required to assess—under law or charter—or want to know whether the programs they are funding, implementing, voting for, receiving or opposing are producing the promised effect. To some degree, program evaluation falls under traditional cost–benefit analysis, concerning fair returns on the outlay of economic and other assets; however, social outcomes can be more complex to assess than market outcomes, and a different skillset is required. Considerations include how much the program costs per participant, program impact, how the program could be improved, whether there are better alternatives, if there are unforeseen consequences, and whether the program goals are appropriate and useful. Evaluators help to answer these questions. Best practice is for the evaluation to be a joint project between evaluators and stakeholders.

A wide range of different titles are applied to program evaluators, perhaps haphazardly at times, but there are some established usages: those who regularly use program evaluation skills and techniques on the job are known as Program Analysts; those whose positions combine administrative assistant or secretary duties with program evaluation are known as Program Assistants, Program Clerks (United Kingdom), Program Support Specialists, or Program Associates; those whose positions add lower-level project management duties are known as Program Coordinators.

The process of evaluation is considered to be a relatively recent phenomenon. However, planned social evaluation has been documented as dating as far back as 2200 BC. Evaluation became particularly relevant in the U.S. in the 1960s during the period of the Great Society social programs associated with the Kennedy and Johnson administrations. Extraordinary sums were invested in social programs, but the impacts of these investments were largely unknown.

Program evaluations can involve both quantitative and qualitative methods of social research. People who do program evaluation come from many different backgrounds, such as sociology, psychology, economics, and social work, as well as political science subfields such as public policy and public administration, whose practitioners have studied a similar methodology known as policy analysis. Some universities also have specific training programs in program evaluation, especially at the postgraduate level, for those whose undergraduate subject area did not cover program evaluation skills.

Conducting an evaluation
Program evaluation may be conducted at several stages during a program's lifetime. Each of these stages raises different questions to be answered by the evaluator, and correspondingly different evaluation approaches are needed. Rossi, Lipsey and Freeman (2004) suggest the following kinds of assessment, which may be appropriate at these different stages:


 * Assessment of the need for the program
 * Assessment of program design and logic/theory
 * Assessment of how the program is being implemented (i.e., is it being implemented according to plan? Are the program's processes maximizing possible outcomes?)
 * Assessment of the program's outcome or impact (i.e., what it has actually achieved)
 * Assessment of the program's cost and efficiency

Assessing needs
A needs assessment examines the population that the program intends to target, to see whether the need as conceptualized in the program actually exists in the population; whether it is, in fact, a problem; and if so, how it might best be dealt with. This includes identifying and diagnosing the actual problem the program is trying to address, who or what is affected by the problem, how widespread the problem is, and what are the measurable effects that are caused by the problem. For example, for a housing program aimed at mitigating homelessness, a program evaluator may want to find out how many people are homeless in a given geographic area and what their demographics are. Rossi, Lipsey and Freeman (2004) caution against undertaking an intervention without properly assessing the need for one, because this might result in a great deal of wasted funds if the need did not exist or was misconceived.

Needs assessment involves the processes or methods used by evaluators to describe and diagnose social needs. This is essential because evaluators need to determine whether programs are effective, and they cannot do this unless they have identified what the problem or need is. Programs that do not carry out a needs assessment can have the illusion that they have eradicated the problem when in fact there was no need in the first place. Needs assessment involves research and regular consultation with community stakeholders and with the people who will benefit from the project before the program can be developed and implemented. Hence it should be a bottom-up approach. In this way potential problems can be recognized early, because the process involves the community in identifying the need and thereby provides the opportunity to identify potential barriers.

The important tasks of a program evaluator are thus as follows. First, construct a precise definition of what the problem is. This is most effectively done by collaboratively including all possible stakeholders, i.e., the community impacted by the potential problem, the agents/actors working to address and resolve the problem, funders, etc. Securing buy-in early in the process reduces the potential for push-back, miscommunication, and incomplete information later on.

Second, assess the extent of the problem. Having clearly identified what the problem is, evaluators then need to answer the ‘where’ and ‘how big’ questions: where the problem is located and how big it is. Pointing out that a problem exists is much easier than specifying where it is located and how rife it is. Rossi, Lipsey & Freeman (2004) give the example that a person identifying some battered children may be enough evidence to persuade one that child abuse exists, but indicating how many children it affects and where it is located geographically and socially would require knowledge about abused children, the characteristics of perpetrators and the impact of the problem throughout the political authority in question.

This can be difficult considering that child abuse is not a public behavior, and that direct estimates of the rates of private behavior are usually not possible because of factors like unreported cases. In this case evaluators would have to use data from several sources and apply different approaches in order to estimate incidence rates. Evaluators also need to answer the ‘how’ and ‘what’ questions. The ‘how’ question requires that evaluators determine how the need will be addressed. Having identified the need and become familiar with the community, evaluators should conduct a performance analysis to identify whether the plan proposed in the program will actually be able to eliminate the need. The ‘what’ question requires that evaluators conduct a task analysis to find out what the best way to perform would be, for example, whether the job performance standards are set by an organization or whether some governmental rules need to be considered when undertaking the task.

Third, define and identify the target of interventions and accurately describe the nature of the service needs of that population. It is important to know what or who the target population is; it might be individuals, groups, communities, etc. Being able to specify the target will assist in establishing appropriate boundaries, so that interventions can correctly address the target population and be feasible to apply. There are three units of the population:
 * Population at risk: people with a significant probability of developing the condition, e.g. the population at risk for birth control programs is women of child-bearing age.
 * Population in need: people with the condition that the program seeks to address, e.g. the population in need for a program that aims to provide ARVs to HIV-positive people is people who are HIV positive.
 * Population in demand: that part of the population in need that acknowledges having the need and is willing to take part in what the program has to offer, e.g. not all HIV-positive people will be willing to take ARVs.

Needs analysis is hence a crucial step in evaluating programs, because the effectiveness of a program cannot be assessed unless we know what the problem was in the first place. There are four steps in conducting a needs assessment:
 * 1) Perform a ‘gap’ analysis: Evaluators need to compare the current situation to the desired or necessary situation. The difference or gap between the two situations will help identify the need, purpose and aims of the program.
 * 2) Identify priorities and importance: In the first step above, evaluators would have identified a number of interventions that could potentially address the need, e.g. training and development, organization development, etc. These must now be examined in view of their significance to the program's goals and constraints. This must be done by considering the following factors: cost effectiveness (consider the budget of the program and assess the cost/benefit ratio), executive pressure (whether top management expects a solution) and population (whether many key people are involved).
 * 3) Identify causes of performance problems and/or opportunities: When the needs have been prioritized, the next step is to identify specific problem areas within the need to be addressed, and also to assess the skills of the people who will be carrying out the interventions.
 * 4) Identify possible solutions and growth opportunities: Compare the consequences of implementing each intervention with the consequences of not implementing it.

Assessing program theory
The program theory, also called a logic model, knowledge map, or impact pathway, is an assumption, implicit in the way the program is designed, about how the program's actions are supposed to achieve the outcomes it intends. This 'logic model' is often not stated explicitly by the people who run programs; it is simply assumed, and so an evaluator will need to draw out from the program staff how exactly the program is supposed to achieve its aims and assess whether this logic is plausible. For example, in an HIV prevention program, it may be assumed that educating people about HIV/AIDS transmission, risk and safe sex practices will result in safer sex being practiced. However, research in South Africa increasingly shows that in spite of increased education and knowledge, people still often do not practice safe sex. Therefore, the logic of a program which relies on education as a means to get people to use condoms may be faulty. This is why it is important to read research that has been done in the area. Explicating this logic can also reveal unintended or unforeseen consequences of a program, both positive and negative. The program theory drives the hypotheses to test for impact evaluation. Developing a logic model can also build common understanding amongst program staff and stakeholders about what the program is actually supposed to do and how it is supposed to do it, which is often lacking (see Participatory impact pathways analysis). It is also possible that, during the process of trying to elicit the logic model behind a program, the evaluators may discover that such a model is either incompletely developed, internally contradictory, or (in the worst cases) essentially nonexistent. This decidedly limits the effectiveness of the evaluation, although it does not necessarily reduce or eliminate the program.

Creating a logic model is a useful way to help visualize important aspects of programs, especially when preparing for an evaluation. An evaluator should create a logic model with input from many different stakeholders. Logic models have five major components: resources or inputs, activities, outputs, short-term outcomes, and long-term outcomes. Creating a logic model helps articulate the problem, the resources and capacity that are currently being used to address the problem, and the measurable outcomes from the program. Looking at the different components of a program in relation to the overall short-term and long-term goals can reveal potential misalignments. Creating an actual logic model is particularly important because it helps clarify for all stakeholders the definition of the problem, the overarching goals, and the capacity and outputs of the program.
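The five components listed above can be recorded in a simple data structure so that gaps become visible early. The following is only a minimal sketch: the class name `LogicModel`, the method `missing_components` and the example entries are hypothetical and are not taken from any evaluation standard.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LogicModel:
    """Holds the five logic model components named in the text."""
    inputs: List[str] = field(default_factory=list)
    activities: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)
    short_term_outcomes: List[str] = field(default_factory=list)
    long_term_outcomes: List[str] = field(default_factory=list)

    def missing_components(self) -> List[str]:
        """Return the names of components that have no entries yet."""
        return [name for name, items in vars(self).items() if not items]


# Hypothetical entries for an HIV prevention program (illustrative only).
model = LogicModel(
    inputs=["trained educators", "funding"],
    activities=["community workshops on HIV transmission"],
    outputs=["number of workshops held"],
    short_term_outcomes=["increased knowledge of safe sex practices"],
)

# The long-term outcomes have not been articulated yet.
print(model.missing_components())
```

An empty component flagged in this way is a prompt for discussion with stakeholders, not a verdict on the program.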

Rossi, Lipsey & Freeman (2004) suggest four approaches and procedures that can be used to assess the program theory:
 * Assessment in relation to social needs: This entails assessing the program theory by relating it to the needs of the target population the program is intended to serve. If the program theory fails to address the needs of the target population, it will be rendered ineffective even if it is well implemented.
 * Assessment of logic and plausibility: This form of assessment involves asking a panel of expert reviewers to critically review the logic and plausibility of the assumptions and expectations inherent in the program's design. The review process is unstructured and open ended so as to address certain issues in the program design.
 * Assessment through comparison with research and practice: This form of assessment requires gaining information from research literature and existing practices to assess various components of the program theory. The evaluator can assess whether the program theory is congruent with research evidence and practical experiences of programs with similar concepts.
 * Assessment via preliminary observation: This approach involves incorporating firsthand observations into the assessment process, as it provides a reality check on the concordance between the program theory and the program itself. The observations can focus on the attainability of the outcomes, circumstances of the target population, and the plausibility of the program activities and the supporting resources.

Rutman (1980), Smith (1989), and Wholey (1994) suggested the questions listed below to assist with the review of logic and plausibility:
 * Are the program goals and objectives well defined?
 * Are the program goals and objectives feasible?
 * Is the change process presumed in the program theory feasible?
 * Are the procedures for identifying members of the target population, delivering service to them, and sustaining that service through completion well defined and sufficient?
 * Are the constituent components, activities, and functions of the program well defined and sufficient?
 * Are the resources allocated to the program and its various activities adequate?

These different forms of assessment of program theory can be conducted to ensure that the program theory is sound.

Wright and Wallis (2019) described an additional technique for assessing a program theory based on the theory's structure. This approach, known as integrative propositional analysis (IPA), is based on research streams finding that theories were more likely to work as expected when they had better structure (in addition to meaning and data). IPA involves, first, identifying the propositions (statements of cause and effect) and creating a visual diagram of those propositions. Then, the researcher examines the number of concepts and causal relationships between them (circles and arrows on the diagram) to measure the breadth and depth of understanding reflected in the theory's structure. The measure for breadth is the number of concepts. This is based on the idea that real-world programs involve many interconnected parts; therefore, a theory that shows a larger number of concepts shows greater breadth of understanding of the program. The depth is the percentage of concepts that are the result of more than one other concept. This is based on the idea that, in real-world programs, things have more than one cause. Hence, a concept that is the result of more than one other concept in the theory shows better understanding of that concept; a theory with a higher percentage of better-understood concepts shows a greater depth of understanding of the program.
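The breadth and depth measures described above can be computed directly from a list of cause-and-effect propositions. The sketch below assumes each proposition is reduced to a single (cause, effect) pair; the function name `ipa_scores` and the example propositions are hypothetical, and depth is returned as a fraction rather than a percentage.

```python
def ipa_scores(propositions):
    """Compute (breadth, depth) for a list of (cause, effect) pairs.

    breadth: number of distinct concepts in the theory.
    depth:   fraction of concepts that result from more than one
             other concept.
    """
    concepts = set()
    causes_of = {}  # effect -> set of distinct causes
    for cause, effect in propositions:
        concepts.update((cause, effect))
        causes_of.setdefault(effect, set()).add(cause)

    breadth = len(concepts)
    if breadth == 0:
        return 0, 0.0
    multi_cause = sum(1 for c in concepts if len(causes_of.get(c, ())) > 1)
    return breadth, multi_cause / breadth


# Hypothetical program theory for an HIV prevention program.
propositions = [
    ("education", "knowledge"),
    ("knowledge", "safer sex"),
    ("condom access", "safer sex"),
    ("safer sex", "lower HIV incidence"),
]

breadth, depth = ipa_scores(propositions)
print(f"breadth = {breadth}, depth = {depth:.0%}")
```

In this example only "safer sex" has more than one cause, so one of the five concepts counts toward depth.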

Assessing implementation
Process analysis looks beyond the theory of what the program is supposed to do and instead evaluates how the program is being implemented. This evaluation determines whether the components identified as critical to the success of the program are being implemented: whether target populations are being reached, people are receiving the intended services, and staff are adequately qualified. Process evaluation is an ongoing process in which repeated measures may be used to evaluate whether the program is being implemented effectively; it can be used, for example, in public health research. Implementation is particularly critical because many innovations, particularly in areas like education and public policy, consist of fairly complex chains of action. Many of these elements rely on the prior correct implementation of other elements, and will fail if the prior implementation was not done correctly. This was conclusively demonstrated by Gene V. Glass and many others during the 1980s. Since incorrect or ineffective implementation will produce the same kind of neutral or negative results that would be produced by correct implementation of a poor innovation, it is essential that evaluation research assess the implementation process itself. Otherwise, a good innovative idea may be mistakenly characterized as ineffective, when in fact it had simply never been implemented as designed.

Assessing the impact (effectiveness)
The impact evaluation determines the causal effects of the program. This involves trying to measure if the program has achieved its intended outcomes, i.e. program outcomes.

Program outcomes
An outcome is the state of the target population or the social conditions that a program is expected to have changed. Program outcomes are the observed characteristics of the target population or social conditions, not of the program. Thus the concept of an outcome does not necessarily mean that the program targets have actually changed or that the program has caused them to change in any way.

There are two kinds of outcomes, namely outcome level and outcome change, also associated with program effect.
 * Outcome level refers to the status of an outcome at some point in time.
 * Outcome change refers to the difference between outcome levels at different points in time.
 * Program effect refers to that portion of an outcome change that can be attributed uniquely to a program as opposed to the influence of some other factor.
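The three terms above can be illustrated with simple arithmetic. The sketch below assumes, purely for illustration, that the portion of change attributable to the program is estimated by subtracting the change observed in a comparison group over the same period; this is one common way to approximate a program effect, not the only one, and all figures and function names are hypothetical.

```python
def outcome_change(level_before, level_after):
    """Difference between outcome levels at two points in time."""
    return level_after - level_before


def program_effect(part_before, part_after, comp_before, comp_after):
    """Change among participants minus change in a comparison group.

    Assumption: the comparison group captures the influence of
    factors other than the program.
    """
    return (outcome_change(part_before, part_after)
            - outcome_change(comp_before, comp_after))


# Hypothetical outcome levels: employment rate in percent.
participants = (40, 55)   # before, after
comparison = (42, 50)     # before, after

change = outcome_change(*participants)
effect = program_effect(*participants, *comparison)
print(f"outcome change: {change} points, estimated program effect: {effect} points")
```

Here the outcome level among participants rose by 15 percentage points, but 8 of those points also appeared in the comparison group, leaving an estimated program effect of 7 points.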

Measuring program outcomes
Outcome measurement is a matter of representing the circumstances defined as the outcome by means of observable indicators that vary systematically with changes or differences in those circumstances. Outcome measurement is a systematic way to assess the extent to which a program has achieved its intended outcomes. According to Mouton (2009) measuring the impact of a program means demonstrating or estimating the accumulated differentiated proximate and emergent effect, some of which might be unintended and therefore unforeseen.

Outcome measurement serves to help understand whether the program is effective or not. It further helps to clarify understanding of a program. But the most important reason for undertaking the effort is to understand the impacts of the work on the people being served. With the information collected, it can be determined which activities to continue and build upon, and which need to be changed in order to improve the effectiveness of the program.

This can involve using sophisticated statistical techniques in order to measure the effect of the program and to find causal relationship between the program and the various outcomes.

Assessing efficiency
Finally, cost-benefit or cost-efficiency analysis assesses the efficiency of a program. Evaluators outline the benefits and cost of the program for comparison. An efficient program has a lower cost-benefit ratio. There are two types of efficiency, namely, static and dynamic. While static efficiency concerns achieving the objectives with least costs, dynamic efficiency concerns continuous improvement.
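As a rough illustration of comparing programs on efficiency, the sketch below computes a cost-benefit ratio (cost divided by monetized benefit) and the cost per participant. The program names and all figures are hypothetical, and the sketch assumes that benefits have already been expressed in monetary terms, which is often the hardest part in practice.

```python
def cost_benefit_ratio(total_cost, total_benefit):
    """Cost divided by benefit; lower values indicate higher efficiency."""
    if total_benefit <= 0:
        raise ValueError("total_benefit must be positive")
    return total_cost / total_benefit


def cost_per_participant(total_cost, participants):
    """Average cost of serving one participant."""
    if participants <= 0:
        raise ValueError("participants must be positive")
    return total_cost / participants


# Hypothetical programs: (total cost, monetized benefit, participants).
programs = {
    "Program A": (200_000, 500_000, 400),
    "Program B": (150_000, 250_000, 250),
}

ratios = {name: cost_benefit_ratio(cost, benefit)
          for name, (cost, benefit, _) in programs.items()}
most_efficient = min(ratios, key=ratios.get)

for name, (cost, _, n) in programs.items():
    print(name, round(ratios[name], 2), cost_per_participant(cost, n))
print("lowest cost-benefit ratio:", most_efficient)
```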

Determining causation
Perhaps the most difficult part of evaluation is determining whether the program itself is causing the changes that are observed in the population it was aimed at. Events or processes outside of the program may be the real cause of the observed outcome (or the real prevention of the anticipated outcome).

Causation is difficult to determine. One main reason for this is self-selection bias: people select themselves to participate in a program. For example, in a job training program, some people decide to participate and others do not. Those who do participate may differ from those who do not in important ways. They may be more determined to find a job or have better support resources. These characteristics, rather than the job training program, may actually be causing the observed outcome of increased employment.

Evaluations conducted with random assignment are able to make stronger inferences about causation. Randomly assigning people to participate or not to participate in the program reduces or eliminates self-selection bias. Thus, the group of people who participate would likely be more comparable to the group who did not participate.

However, most programs cannot use random assignment, and for them causation cannot be conclusively determined. Impact analysis can still provide useful information. For example, the outcomes of the program can be described. Thus the evaluation can report that people who participated in the program were more likely to experience a given outcome than people who did not participate.

If the program is fairly large, and there are enough data, statistical analysis can be used to make a reasonable case for the program by showing, for example, that other causes are unlikely.
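The contrast between self-selection and random assignment described above can be made concrete with a small simulation. This is a sketch under stated assumptions: outcomes depend on an unobserved trait (called `motivation` here), the true program effect is fixed at 5 points, and under self-selection only people with above-average motivation enroll. The model and all numbers are invented for illustration.

```python
import random


def simulate(n=20000, true_effect=5.0, seed=0):
    """Return (self_selected_estimate, randomized_estimate)."""
    rng = random.Random(seed)
    # Unobserved trait that raises the outcome regardless of the program.
    motivation = [rng.gauss(0, 1) for _ in range(n)]

    def outcome(m, treated):
        noise = rng.gauss(0, 5)
        return 50 + 10 * m + (true_effect if treated else 0.0) + noise

    def mean_difference(assign):
        treated, control = [], []
        for m in motivation:
            t = assign(m)
            (treated if t else control).append(outcome(m, t))
        return sum(treated) / len(treated) - sum(control) / len(control)

    # Self-selection: only the more motivated people enroll.
    self_selected = mean_difference(lambda m: m > 0)
    # Random assignment: enrollment is unrelated to motivation.
    randomized = mean_difference(lambda m: rng.random() < 0.5)
    return self_selected, randomized


self_selected, randomized = simulate()
print(f"self-selected estimate: {self_selected:.1f}")
print(f"randomized estimate:    {randomized:.1f}")
```

Under these assumptions the self-selected comparison overstates the effect several times over, because it mixes the program's effect with the effect of motivation, while the randomized comparison lands close to the true value of 5.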

Reliability, validity and sensitivity
It is important to ensure that the instruments (for example, tests, questionnaires, etc.) used in program evaluation are as reliable, valid and sensitive as possible. According to Rossi et al. (2004, p. 222), 'a measure that is poorly chosen or poorly conceived can completely undermine the worth of an impact assessment by producing misleading estimates. Only if outcome measures are valid, reliable and appropriately sensitive can impact assessments be regarded as credible'.

Reliability
The reliability of a measurement instrument is the 'extent to which the measure produces the same results when used repeatedly to measure the same thing' (Rossi et al., 2004, p. 218). The more reliable a measure is, the greater its statistical power and the more credible its findings. If a measuring instrument is unreliable, it may dilute and obscure the real effects of a program, and the program will 'appear to be less effective than it actually is' (Rossi et al., 2004, p. 219). Hence, it is important to ensure the evaluation is as reliable as possible.
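One common way to quantify the "same results when used repeatedly" idea is test-retest reliability: the same instrument is administered twice to the same people and the two sets of scores are correlated. Treating the Pearson correlation as the reliability estimate is an assumption made for this sketch rather than something prescribed by the source, and the scores below are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary in both administrations")
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for five respondents on two administrations.
first_administration = [12, 15, 9, 20, 17]
second_administration = [13, 14, 10, 19, 18]

r = pearson_r(first_administration, second_administration)
print(f"test-retest correlation: {r:.2f}")
```

Values close to 1 suggest the instrument gives consistent results; lower values suggest measurement noise that could obscure a real program effect.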

Validity
The validity of a measurement instrument is 'the extent to which it measures what it is intended to measure' (Rossi et al., 2004, p. 219). This concept can be difficult to accurately measure: in general use in evaluations, an instrument may be deemed valid if accepted as valid by the stakeholders (stakeholders may include, for example, funders, program administrators, et cetera).

Sensitivity
The principal purpose of the evaluation process is to measure whether the program has an effect on the social problem it seeks to redress; hence, the measurement instrument must be sensitive enough to discern these potential changes (Rossi et al., 2004). A measurement instrument may be insensitive if it contains items measuring outcomes which the program could not possibly affect, or if the instrument was originally developed for application to individuals (for example, standardized psychological measures) rather than to a group setting (Rossi et al., 2004). These factors may result in 'noise' which may obscure any effect the program may have had.

Only measures which adequately achieve the benchmarks of reliability, validity and sensitivity can be said to be credible evaluations. It is the duty of evaluators to produce credible evaluations, as their findings may have far reaching effects. A discreditable evaluation which is unable to show that a program is achieving its purpose when it is in fact creating positive change may cause the program to lose its funding undeservedly.

Steps to program evaluation framework
The Centers for Disease Control and Prevention (CDC) delineates six steps to a complete program evaluation. The steps described are: engage stakeholders, describe the program, focus the evaluation design, gather credible evidence, justify conclusions, and ensure use and share lessons learned. These steps can be arranged as a cycle to represent the continuing process of evaluation.

Evaluating collective impact
Though program evaluation processes mentioned here are appropriate for most programs, highly complex non-linear initiatives, such as those using the collective impact (CI) model, require a dynamic approach to evaluation. Collective impact is "the commitment of a group of important actors from different sectors to a common agenda for solving a specific social problem" and typically involves three stages, each with a different recommended evaluation approach:

 * Early phase: CI participants are exploring possible strategies and developing plans for action. This phase is characterized by uncertainty. Recommended evaluation approach: developmental evaluation to help CI partners understand the context of the initiative and its development. "Developmental evaluation involves real time feedback about what is emerging in complex dynamic systems as innovators seek to bring about systems change."
 * Middle phase: CI partners implement agreed upon strategies. Some outcomes become easier to anticipate. Recommended evaluation approach: formative evaluation to refine and improve upon the progress, as well as continued developmental evaluation to explore new elements as they emerge. Formative evaluation involves "careful monitoring of processes in order to respond to emergent properties and any unexpected outcomes."
 * Later phase: Activities achieve stability and are no longer in formation. Experience informs knowledge about which activities may be effective. Recommended evaluation approach: summative evaluation, which "uses both quantitative and qualitative methods in order to get a better understanding of what [the] project has achieved, and how or why this has occurred."

Planning a program evaluation
Planning a program evaluation can be broken up into four parts: focusing the evaluation, collecting the information, using the information, and managing the evaluation. Program evaluation involves reflecting on questions about evaluation purpose, what questions are necessary to ask, and what will be done with information gathered. Critical questions for consideration include:
 * What am I going to evaluate?
 * What is the purpose of this evaluation?
 * Who will use this evaluation? How will they use it?
 * What questions is this evaluation seeking to answer?
 * What information do I need to answer the questions?
 * When is the evaluation needed? What resources do I need?
 * How will I collect the data I need?
 * How will data be analyzed?
 * What is my implementation timeline?

The shoestring approach
The "shoestring evaluation approach" is designed to assist evaluators operating under a limited budget, limited access to or availability of data, and limited turnaround time to conduct effective evaluations that are methodologically rigorous (Bamberger, Rugh, Church & Fort, 2004). This approach has responded to the continued and growing need for evaluation processes that are more rapid and economical under difficult circumstances of budget, time constraints and limited availability of data. It is not always possible to design an evaluation to achieve the highest standards available. Many programs do not build an evaluation procedure into their design or budget. Hence, many evaluation processes do not begin until the program is already underway, which can result in time, budget or data constraints for the evaluators, which in turn can affect the reliability, validity or sensitivity of the evaluation. The shoestring approach helps to ensure that the maximum possible methodological rigor is achieved under these constraints.

Budget constraints
Frequently, programs are faced with budget constraints because most original projects do not include a budget to conduct an evaluation (Bamberger et al., 2004). As a result, evaluations are allocated smaller budgets that are inadequate for a rigorous evaluation. Due to the budget constraints it might be difficult to effectively apply the most appropriate methodological instruments. These constraints may consequently affect the time available in which to do the evaluation (Bamberger et al., 2004). Budget constraints may be addressed by simplifying the evaluation design, revising the sample size, exploring economical data collection methods (such as using volunteers to collect data, shortening surveys, or using focus groups and key informants) or looking for reliable secondary data (Bamberger et al., 2004).

Time constraints
The greatest time constraints an evaluator can face arise when the evaluator is summoned to conduct an evaluation when a project is already underway, when they are given limited time to do the evaluation compared to the life of the study, or when they are not given enough time for adequate planning. Time constraints are particularly problematic when the evaluator is not familiar with the area or country in which the program is situated (Bamberger et al., 2004). Time constraints can be addressed by the methods listed under budget constraints above, and also by careful planning to ensure effective data collection and analysis within the limited time available.

Data constraints
If the evaluation is initiated late in the program, there may be no baseline data on the conditions of the target group before the intervention began (Bamberger et al., 2004). Another possible cause of data constraints is that the data have been collected by program staff and contain systematic reporting biases or reflect poor record-keeping standards, and are consequently of little use (Bamberger et al., 2004). Data constraints may also arise if the target group is difficult to reach for data collection, for example homeless people, drug addicts, migrant workers, et cetera (Bamberger et al., 2004). Data constraints can be addressed by reconstructing baseline data from secondary data or through the use of multiple methods. Multiple methods, such as the combination of qualitative and quantitative data, can increase validity through triangulation and save time and money. Additionally, these constraints may be dealt with through careful planning and consultation with program stakeholders. By clearly identifying and understanding client needs ahead of the evaluation, the costs and time of the evaluative process can be streamlined and reduced, while still maintaining credibility.

All in all, time, monetary and data constraints can have negative implications for the validity, reliability and transferability of the evaluation. The shoestring approach was created to help evaluators correct the limitations identified above by identifying ways to reduce costs and time, to reconstruct baseline data, and to ensure maximum quality under existing constraints (Bamberger et al., 2004).

Five-tiered approach
The five-tiered approach to evaluation further develops the strategies that the shoestring approach to evaluation is based upon. It was originally developed by Jacobs (1988) as an alternative way to evaluate community-based programs and as such was applied to a statewide child and family program in Massachusetts, U.S.A. The five-tiered approach is offered as a conceptual framework for matching evaluations more precisely to the characteristics of the programs themselves, and to the particular resources and constraints inherent in each evaluation context. In other words, the five-tiered approach seeks to tailor the evaluation to the specific needs of each evaluation context.

The earlier tiers (1-3) generate descriptive and process-oriented information while the later tiers (4-5) determine both the short-term and the long-term effects of the program. The five levels are organized as follows:
 * Tier 1: needs assessment (sometimes referred to as pre-implementation)
 * Tier 2: monitoring and accountability
 * Tier 3: quality review and program clarification (sometimes referred to as understanding and refining)
 * Tier 4: achieving outcomes
 * Tier 5: establishing impact

For each tier, purpose(s) are identified, along with corresponding tasks that enable the identified purpose of the tier to be achieved. For example, the purpose of the first tier, Needs assessment, would be to document a need for a program in a community. The task for that tier would be to assess the community's needs and assets by working with all relevant stakeholders.

While the tiers are structured for consecutive use, meaning that information gathered in the earlier tiers is required for tasks on higher tiers, the approach acknowledges the fluid nature of evaluation. Therefore, it is possible to move from later tiers back to preceding ones, or even to work in two tiers at the same time. It is important for program evaluators to note, however, that a program must be evaluated at the appropriate level.

The five-tiered approach is said to be useful for family support programs which emphasise community and participant empowerment. This is because it encourages a participatory approach involving all stakeholders and it is through this process of reflection that empowerment is achieved.

Methodological challenges presented by language and culture
The purpose of this section is to draw attention to some of the methodological challenges and dilemmas evaluators are potentially faced with when conducting a program evaluation in a developing country. In many developing countries the major sponsors of evaluation are donor agencies from the developed world, and these agencies require regular evaluation reports in order to maintain accountability and control of resources, as well as to generate evidence for the program's success or failure. However, evaluators face many hurdles and challenges when attempting to implement an evaluation program that makes use of techniques and systems which were not developed within the context to which they are applied. Some of the issues include differences in culture, attitudes, language and political process.

Culture is defined by Ebbutt (1998, p. 416) as a "constellation of both written and unwritten expectations, values, norms, rules, laws, artifacts, rituals and behaviors that permeate a society and influence how people behave socially". Culture can influence many facets of the evaluation process, including data collection, evaluation program implementation and the analysis and understanding of the results of the evaluation. In particular, instruments which are traditionally used to collect data such as questionnaires and semi-structured interviews need to be sensitive to differences in culture, if they were originally developed in a different cultural context. The understanding and meaning of constructs which the evaluator is attempting to measure may not be shared between the evaluator and the sample population and thus the transference of concepts is an important notion, as this will influence the quality of the data collection carried out by evaluators as well as the analysis and results generated by the data.

Language also plays an important part in the evaluation process, as language is tied closely to culture. Language can be a major barrier to communicating concepts which the evaluator is trying to access, and translation is often required. There are a multitude of problems with translation, including the loss of meaning as well as the exaggeration or enhancement of meaning by translators. For example, terms which are contextually specific may not translate into another language with the same weight or meaning. In particular, data collection instruments need to take meaning into account, as subject matter that is not considered sensitive in one context might prove to be sensitive in the context in which the evaluation is taking place. Thus, evaluators need to take into account two important concepts when administering data collection tools: lexical equivalence and conceptual equivalence. Lexical equivalence asks the question: how does one phrase a question in two languages using the same words? This is a difficult task to accomplish, and techniques such as back-translation may aid the evaluator but may not result in perfect transference of meaning. This leads to the next point, conceptual equivalence. It is not a common occurrence for concepts to transfer unambiguously from one culture to another. Data collection instruments which have not undergone adequate testing and piloting may therefore yield results which are not useful, as the concepts measured by the instrument may have taken on a different meaning, rendering the instrument unreliable and invalid.

Thus, it can be seen that evaluators need to take into account the methodological challenges created by differences in culture and language when attempting to conduct a program evaluation in a developing country.

Utilization results
There are three conventional uses of evaluation results: persuasive utilization, direct (instrumental) utilization, and conceptual utilization.

Persuasive utilization
Persuasive utilization is the enlistment of evaluation results in an effort to persuade an audience to either support an agenda or to oppose it. Unless the 'persuader' is the same person that ran the evaluation, this form of utilization is not of much interest to evaluators as they often cannot foresee possible future efforts of persuasion.

Direct (instrumental) utilization
Evaluators often tailor their evaluations to produce results that can have a direct influence on the improvement of the structure, or the process, of a program. For example, the evaluation of a novel educational intervention may produce results that indicate no improvement in students' marks. This may be due to the intervention not having a sound theoretical background, or it may be that the intervention is not conducted as originally intended. The results of the evaluation would hopefully cause the creators of the intervention to go back to the drawing board to re-create the core structure of the intervention, or even change the implementation processes.

Conceptual utilization
But even if evaluation results do not have a direct influence in the re-shaping of a program, they may still be used to make people aware of the issues the program is trying to address. Going back to the example of an evaluation of a novel educational intervention, the results can also be used to inform educators and students about the different barriers that may influence students' learning difficulties. A number of studies on these barriers may then be initiated by this new information.

Variables affecting utilization
There are five conditions that seem to affect the utility of evaluation results, namely relevance, communication between the evaluators and the users of the results, information processing by the users, the plausibility of the results, as well as the level of involvement or advocacy of the users.

Guidelines for maximizing utilization
Quoted directly from Rossi et al. (2004, p. 416):
 * Evaluators must understand the cognitive styles of decisionmakers
 * Evaluation results must be timely and available when needed
 * Evaluations must respect stakeholders' program commitments
 * Utilization and dissemination plans should be part of the evaluation design
 * Evaluations should include an assessment of utilization

Internal versus external program evaluators
The choice of evaluator may be regarded as equally important as the process of the evaluation. Evaluators may be internal (persons associated with the program to be executed) or external (persons not associated with any part of the execution or implementation of the program) (Division for Oversight Services, 2004). The following provides a brief summary of the advantages and disadvantages of internal and external evaluators, adapted from the Division for Oversight Services (2004). For a more comprehensive list of advantages and disadvantages of internal and external evaluators, see Division for Oversight Services (2004).

Internal evaluators
Advantages
 * May have better overall knowledge of the program and possess informal knowledge of the program
 * Less threatening as already familiar with staff
 * Less costly

Disadvantages
 * May be less objective
 * May be more preoccupied with other activities of the program and not give the evaluation complete attention
 * May not be adequately trained as an evaluator.

External evaluators
Advantages
 * More objective about the process, offering new perspectives and different angles from which to observe and critique the process
 * May be able to dedicate greater amount of time and attention to the evaluation
 * May have greater evaluation expertise

Disadvantages
 * May be more costly and require more time for the contract, monitoring, negotiations etc.
 * May be unfamiliar with program staff and create anxiety about being evaluated
 * May be unfamiliar with organization policies and certain constraints affecting the program.

Positivist
Potter (2006) identifies and describes three broad paradigms within program evaluation. The first, and probably most common, is the positivist approach, in which evaluation can only occur where there are "objective", observable and measurable aspects of a program, requiring predominantly quantitative evidence. The positivist approach includes evaluation dimensions such as needs assessment, assessment of program theory, assessment of program process, impact assessment and efficiency assessment (Rossi, Lipsey and Freeman, 2004). A detailed example of the positivist approach is a Public Policy Institute of California report titled "Evaluating Academic Programs in California's Community Colleges", in which the evaluators examine measurable activities (such as enrollment data) and conduct quantitative assessments like factor analysis.

Interpretive
The second paradigm identified by Potter (2006) is that of interpretive approaches, where it is argued that it is essential that the evaluator develops an understanding of the perspective, experiences and expectations of all stakeholders. This would lead to a better understanding of the various meanings and needs held by stakeholders, which is crucial before one is able to make judgments about the merit or value of a program. The evaluator's contact with the program is often over an extended period of time and, although there is no standardized method, observation, interviews and focus groups are commonly used. A report commissioned by the World Bank details 8 approaches in which qualitative and quantitative methods can be integrated and perhaps yield insights not achievable through only one method.

Critical-emancipatory
Potter (2006) also identifies critical-emancipatory approaches to program evaluation, which are largely based on action research for the purposes of social transformation. This type of approach is much more ideological and often includes a greater degree of social activism on the part of the evaluator. This approach would be appropriate for qualitative and participative evaluations. Because of its critical focus on societal power structures and its emphasis on participation and empowerment, Potter argues this type of evaluation can be particularly useful in developing countries.

Regardless of the paradigm used in any program evaluation, whether it be positivist, interpretive or critical-emancipatory, it is essential to acknowledge that evaluation takes place in specific socio-political contexts. Evaluation does not exist in a vacuum, and all evaluations, whether evaluators are aware of it or not, are influenced by socio-political factors. It is important to recognize that evaluations, and the findings which result from them, can be used in favour of or against particular ideological, social and political agendas (Weiss, 1999). This is especially true in an age when resources are limited and there is competition between organizations for certain projects to be prioritised over others (Louw, 1999).

Empowerment evaluation
Empowerment evaluation makes use of evaluation concepts, techniques, and findings to foster improvement and self-determination of a particular program aimed at a specific target population or group of program participants. Empowerment evaluation is value-oriented towards getting program participants involved in bringing about change in the programs they are targeted for. One of the main focuses in empowerment evaluation is to involve the program participants in conducting the evaluation process. This process is then often followed by some sort of critical reflection on the program. In such cases, an external evaluator serves as a consultant, coach or facilitator to the program participants and seeks to understand the program from the perspective of the participants. Once a clear understanding of the participants' perspective has been gained, appropriate steps and strategies can be devised (with the valuable input of the participants) and implemented in order to reach desired outcomes.

According to Fetterman (2002), empowerment evaluation has three steps:
 * Establishing a mission
 * Taking stock
 * Planning for the future

Establishing a mission
The first step involves evaluators asking the program participants and staff members (of the program) to define the mission of the program. Evaluators may opt to carry this step out by bringing such parties together and asking them to generate and discuss the mission of the program. The logic behind this approach is to show each party that there may be divergent views of what the program mission actually is.

Taking stock
Taking stock as the second step consists of two important tasks. The first task is concerned with program participants and program staff generating a list of current key activities that are crucial to the functioning of the program. The second task is concerned with rating the identified key activities, also known as prioritization. For example, each party member may be asked to rate each key activity on a scale from 1 to 10, where 10 is the most important and 1 the least important. The role of the evaluator during this task is to facilitate interactive discussion amongst members in an attempt to establish some baseline of shared meaning and understanding pertaining to the key activities. In addition, relevant documentation (such as financial reports and curriculum information) may be brought into the discussion when considering some of the key activities.
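The rating task described above can be sketched in a few lines of Python. This is only an illustrative sketch: the activity names and scores are hypothetical, and summarizing the ratings by their mean and spread is an assumption made here, not a procedure prescribed by the source. The spread (highest minus lowest rating) is included because a large gap between raters may indicate where facilitated discussion is most needed.

```python
from statistics import mean

# Hypothetical ratings: each list holds one 1-10 score per participant
# (10 = most important, 1 = least important).
ratings = {
    "Tutoring sessions": [9, 8, 10, 7],
    "Parent outreach": [6, 7, 5, 8],
    "Staff training": [8, 9, 7, 9],
}


def prioritize(ratings):
    """Return (activity, mean rating, spread) tuples, highest mean first."""
    for activity, scores in ratings.items():
        if not all(1 <= s <= 10 for s in scores):
            raise ValueError(f"Ratings for {activity!r} must be between 1 and 10")
    summary = [
        (activity, mean(scores), max(scores) - min(scores))
        for activity, scores in ratings.items()
    ]
    # Sort by mean rating so the activities rated most important come first.
    return sorted(summary, key=lambda item: item[1], reverse=True)


ranked = prioritize(ratings)
for activity, average, spread in ranked:
    print(f"{activity}: mean {average:.2f}, spread {spread}")
```

In practice the numbers would serve as a starting point for the group discussion rather than as a final ranking.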

Planning for the future
After prioritizing the key activities, the next step is to plan for the future. Here the evaluator asks program participants and program staff how they would like to improve the program in relation to the key activities listed. The objective is to create a thread of coherence whereby the mission generated (step 1) guides the stock take (step 2), which forms the basis for the plans for the future (step 3). Thus, in planning for the future, specific goals are aligned with relevant key activities. In addition, it is important for program participants and program staff to identify possible forms of evidence (measurable indicators) which can be used to monitor progress towards specific goals. Goals must be related to the program's activities, talents, resources and scope of capability; in short, the goals formulated must be realistic.

These three steps of empowerment evaluation produce the potential for a program to run more effectively and more in touch with the needs of the target population. Empowerment evaluation as a process which is facilitated by a skilled evaluator equips as well as empowers participants by providing them with a 'new' way of critically thinking and reflecting on programs. Furthermore, it empowers program participants and staff to recognize their own capacity to bring about program change through collective action.

Transformative paradigm
The transformative paradigm is integral in incorporating social justice in evaluation. Donna Mertens, a primary researcher in this field, states that the transformative paradigm "focuses primarily on viewpoints of marginalized groups and interrogating systemic power structures through mixed methods to further social justice and human rights". The transformative paradigm arose after marginalized groups, who have historically been pushed to the side in evaluation, began to collaborate with scholars to advocate for social justice and human rights in evaluation. The transformative paradigm introduces many different paradigms and lenses to the evaluation process, leading it to continually call the evaluation process into question.

Both the American Evaluation Association and the National Association of Social Workers call attention to the ethical duty to possess cultural competence when conducting evaluations. Cultural competence in evaluation can be broadly defined as a systematic, responsive inquiry that is actively cognizant, understanding, and appreciative of the cultural context in which the evaluation takes place; that frames and articulates the epistemology of the evaluation endeavor; that employs culturally and contextually appropriate methodology; and that uses stakeholder-generated, interpretive means to arrive at the results and further use of the findings. Many health and evaluation leaders are careful to point out that cultural competence cannot be determined by a simple checklist, but rather is an attribute that develops over time. The root of cultural competency in evaluation is a genuine respect for the communities being studied and an openness to seek depth in understanding different cultural contexts, practices and paradigms of thinking. This includes being creative and flexible in capturing different cultural contexts, and a heightened awareness of the power differentials that exist in an evaluation context. Important skills include the ability to build rapport across difference, to gain the trust of community members, and to self-reflect and recognize one's own biases.

Paradigms
The paradigms axiology, ontology, epistemology, and methodology are reflective of social justice practice in evaluation. These examples focus on addressing inequalities and injustices in society by promoting inclusion and equality in human rights.

Axiology (Values and Value Judgements)
The transformative paradigm's axiological assumption rests on four primary principles:
 * The importance of being culturally respectful
 * The promotion of social justice
 * The furtherance of human rights
 * Addressing inequities

Ontology (Reality)
Differences in perspectives on what is real are determined by diverse values and life experiences. In turn these values and life experiences are often associated with differences in access to privilege, based on such characteristics as disability, gender, sexual identity, religion, race/ethnicity, national origins, political party, income level, age, language, and immigration or refugee status.

Epistemology (Knowledge)
Knowledge is constructed within the context of power and privilege with consequences attached to which version of knowledge is given privilege. "Knowledge is socially and historically located within a complex cultural context".

Methodology (Systematic Inquiry)
Methodological decisions are aimed at determining the approach that will best facilitate use of the process and findings to enhance social justice; identify the systemic forces that support the status quo and those that will allow change to happen; and acknowledge the need for a critical and reflexive relationship between the evaluator and the stakeholders.

Lenses
When working towards social justice, it is imperative to be able to view the world through the lens of those who experience injustices. Critical Race Theory, Feminist Theory, and Queer/LGBTQ Theory are frameworks for thinking about how to provide justice for marginalized groups. These lenses create an opportunity to make each theory a priority in addressing inequality.

Critical Race Theory
Critical Race Theory (CRT) is an extension of critical theory that is focused on inequities based on race and ethnicity. Daniel Solorzano describes the role of CRT as providing a framework to investigate and make visible those systemic aspects of society that allow the discriminatory and oppressive status quo of racism to continue.

Feminist theory
The essence of feminist theories is to "expose the individual and institutional practices that have denied access to women and other oppressed groups and have ignored or devalued women".

Queer/LGBTQ theory
Queer/LGBTQ theorists question the heterosexist bias that pervades society in terms of power over and discrimination toward sexual orientation minorities. Because of the sensitivity of issues surrounding LGBTQ status, evaluators need to be aware of safe ways to protect such individuals’ identities and ensure that discriminatory practices are brought to light in order to bring about a more just society.

Government requirements
Given the Federal budget deficit, the Obama Administration moved to apply an "evidence-based approach" to government spending, including rigorous methods of program evaluation. The President's 2011 Budget earmarked funding for 19 government program evaluations for agencies such as the Department of Education and the United States Agency for International Development (USAID). An inter-agency group pursues the goal of increasing transparency and accountability by creating effective evaluation networks and drawing on best practices. A six-step framework for conducting evaluation of public health programs, published by the Centers for Disease Control and Prevention (CDC), initially increased the emphasis on program evaluation of government programs in the US. The framework is as follows:
 * 1) Engage stakeholders.
 * 2) Describe the program.
 * 3) Focus the evaluation.
 * 4) Gather credible evidence.
 * 5) Justify conclusions.
 * 6) Ensure use and share lessons learned.

In January 2019, the Foundations for Evidence-Based Policymaking Act introduced new requirements for federal agencies, such as naming a Chief Evaluation Officer. Guidance published by the Office of Management and Budget on implementing this law requires agencies to develop a multi-year learning agenda, which has specific questions the agency wants to answer to improve strategic and operational outcomes. Agencies must also complete an annual evaluation plan summarizing the specific evaluations the agency plans to undertake to address the questions in the learning agenda.

Types of evaluation
There are many different approaches to program evaluation. Each serves a different purpose.
 * Utilization-Focused Evaluation
 * CIPP Model of evaluation
 * Formative Evaluation
 * Summative Evaluation
 * Developmental Evaluation
 * Principles-Focused Evaluation
 * Theory-Driven Evaluation
 * Realist-Driven Evaluation

History of the CIPP model
The CIPP model of evaluation was developed by Daniel Stufflebeam and colleagues in the 1960s. CIPP is an acronym for Context, Input, Process and Product. CIPP is an evaluation model that requires the evaluation of context, input, process and product in judging a programme's value. CIPP is a decision-focused approach to evaluation and emphasises the systematic provision of information for programme management and operation.

CIPP model
The CIPP framework was developed as a means of linking evaluation with programme decision-making. It aims to provide an analytic and rational basis for programme decision-making, based on a cycle of planning, structuring, implementing and reviewing and revising decisions, each examined through a different aspect of evaluation: context, input, process and product evaluation.

The CIPP model is an attempt to make evaluation directly relevant to the needs of decision-makers during the phases and activities of a programme. Stufflebeam's context, input, process, and product (CIPP) evaluation model is recommended as a framework to systematically guide the conception, design, implementation, and assessment of service-learning projects, and provide feedback and judgment of the project's effectiveness for continuous improvement.

Four aspects of CIPP evaluation
These aspects are context, inputs, process, and product. These four aspects of CIPP evaluation assist a decision-maker to answer four basic questions:
 * What should we do? This involves collecting and analysing needs assessment data to determine goals, priorities and objectives. For example, a context evaluation of a literacy program might involve an analysis of the existing objectives of the literacy programme, literacy achievement test scores, staff concerns (general and particular), literacy policies and plans and community concerns, perceptions or attitudes and needs.
 * How should we do it? This involves the steps and resources needed to meet the new goals and objectives and might include identifying successful external programs and materials as well as gathering information.
 * Are we doing it as planned? This provides decision-makers with information about how well the programme is being implemented. By continuously monitoring the program, decision-makers learn such things as how well it is following the plans and guidelines, conflicts arising, staff support and morale, strengths and weaknesses of materials, delivery and budgeting problems.
 * Did the programme work? By measuring the actual outcomes and comparing them to the anticipated outcomes, decision-makers are better able to decide if the program should be continued, modified, or dropped altogether. This is the essence of product evaluation.

Using CIPP in the different stages of the evaluation
As an evaluation guide, the CIPP model allows evaluators to evaluate the program at different stages: before the program commences, to assess the need for it, and at the end of the program, to assess whether or not it had an effect.

The CIPP model allows evaluators to ask formative questions at the beginning of the program, and later supports evaluating the program's impact by asking summative questions on all aspects of the program.
 * Context: What needs to be done? Vs. Were important needs addressed?
 * Input: How should it be done? Vs. Was a defensible design employed?
 * Process: Is it being done? Vs. Was the design well executed?
 * Product: Is it succeeding? Vs. Did the effort succeed?
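
The pairing of formative and summative questions listed above can be represented as a simple lookup structure, which an evaluator might use as a checklist template. The following Python sketch is illustrative only; the names `Stage`, `CIPP_QUESTIONS` and `questions_for` are hypothetical and do not come from the CIPP literature.

```python
from enum import Enum


class Stage(Enum):
    FORMATIVE = "formative"  # asked at the beginning of or during the program
    SUMMATIVE = "summative"  # asked when assessing the program's impact


# The four CIPP aspects, each with its formative and summative question.
CIPP_QUESTIONS = {
    "Context": {
        Stage.FORMATIVE: "What needs to be done?",
        Stage.SUMMATIVE: "Were important needs addressed?",
    },
    "Input": {
        Stage.FORMATIVE: "How should it be done?",
        Stage.SUMMATIVE: "Was a defensible design employed?",
    },
    "Process": {
        Stage.FORMATIVE: "Is it being done?",
        Stage.SUMMATIVE: "Was the design well executed?",
    },
    "Product": {
        Stage.FORMATIVE: "Is it succeeding?",
        Stage.SUMMATIVE: "Did the effort succeed?",
    },
}


def questions_for(stage):
    """Return (aspect, question) pairs for the given stage, in CIPP order."""
    return [(aspect, by_stage[stage]) for aspect, by_stage in CIPP_QUESTIONS.items()]


for aspect, question in questions_for(Stage.FORMATIVE):
    print(f"{aspect}: {question}")
```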