Guide to Statistics and Methods
Surgical Education Research
January 3, 2024
Practical Guide to Education Program Evaluation Research
Marc de Moya, Jason S. Haukoos, Kamal M. F. Itani
JAMA Surg. Published online January 3, 2024. doi:10.1001/jamasurg.2023.6702
Introduction
Program evaluation is the systematic assessment of a program’s implementation. In medical education, evaluation includes the synthesis and analysis of educational programs, which in turn provides evidence for educational best practices. In medical education, as in other fields, the quality of the synthesis depends on the rigor with which evaluations are performed. Individual program evaluation is best achieved when similar programs apply the same scientific rigor and methodology to assess outcomes, thus allowing for direct comparisons. The pedagogy of a given program, particularly in medical education, can be driven by nonscientific forces (ie, politics, faddism, or ideology) rather than evidence.1 This often slows the progress of educational methods toward an educational goal compared with a more evidence-based practice.2
This article will review the most common methods for building evidence of validity for an educational program within health professional education and will include an example of a framework to perform program evaluation within departments or institutions or across educational systems. Applicable lessons from non–health professional education will also be mentioned. Together, these provide a practical approach to understanding how to inform individual curricula, educational systems, and pedagogy. This more scientific approach to educational programs will inform policy and result in best practices across medical education. A number of initiatives in other educational realms that synthesize research on educational programs are also summarized herein.
Using the Methodology
When and Why This Method Might Be Used
Program evaluation in education assesses educational processes, methods, and policies and evaluates their efficacy and effectiveness in achieving an educational goal. It allows for the design, implementation, evaluation, and revision of programs on an ongoing basis. These evaluations may be driven by an external accrediting agency such as the Accreditation Council for Graduate Medical Education (ACGME), which requires that ACGME-approved programs perform an annual program evaluation, or internally, such as the required institutionally driven reviews. When done on a regular basis, evaluations can ensure continued progress and improvement toward set educational goals based on collected data and temporal trends. When standardized across programs or systems, they also allow for comparison and improvement. This evaluation process uses both quantitative and qualitative methods; for example, the evaluation of resident work hours and well-being might require a mixed methods approach.
Data on modifications of residents’ work hours can be used to assess the effectiveness of these modifications for resident learning and patients’ outcomes. Data on hiring of additional personnel to make up for a reduction in resident work hours and on budgetary requirements will help in the assessment of efficiency and costs. The evaluation of these various components will allow for an overall programmatic assessment that can inform change, allow for comparison, and improve policy.
Resources Required for Using Program Evaluation
The first task is to identify the various components of a program and perform the evaluation process for each component. This will help to define the purpose of the evaluation and how the results will be used. By understanding the questions to be asked for each component of the program and the purpose of the evaluation, one can then identify the stakeholders that will benefit from the evaluation. Once the evaluation questions are better defined, one can then determine which performance data will be used to inform the process; these data can come from several sources.
Program evaluation requires use of both quantitative and qualitative methods as well as buy-in and participation of stakeholders. In the aforementioned example on work hours, stakeholders can enter their own data on work hours in a computerized database or application. One potential flaw in this method of data collection is inaccurate reporting or intentional misrepresentation by residents. A potential solution is hiring a person to make observations or implementing technology to measure work hours, either of which will require additional funds. Qualitative assessment is more difficult, as it will require the development of validated surveys or semistructured interviews or the use of focus groups to assess the association of work hours with residents’ well-being. Programmers and biostatisticians will need to be available to link work hour databases to patient outcome databases to analyze the effect of work hour modifications and resident well-being on patient-centered outcomes. Biostatisticians can also develop the proper analyses to answer specific questions related to efficacy and cost-effectiveness. In every instance, it is important to ensure that the data to be used are aligned with the questions posed for the evaluation.
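As an illustration of the data-quality concern above, the following minimal Python sketch links self-reported work hours to independently measured hours and flags large gaps. The record layout (keys of resident identifier and week), the function name `reporting_discrepancies`, the 5-hour tolerance, and all values are hypothetical assumptions for illustration, not part of any cited framework.

```python
def reporting_discrepancies(self_reported, measured, tolerance_hours=5.0):
    """Flag linked records where measured and self-reported hours disagree.

    Both arguments map a (resident_id, week) key to weekly hours.
    Returns {key: measured - self_reported} for gaps above the tolerance.
    """
    flagged = {}
    for key, reported in self_reported.items():
        if key not in measured:
            continue  # record cannot be linked; handle separately in practice
        gap = measured[key] - reported
        if abs(gap) > tolerance_hours:
            flagged[key] = gap
    return flagged


# Hypothetical example data (not real observations).
self_reported = {("r1", 1): 70.0, ("r2", 1): 78.0, ("r3", 1): 60.0}
measured = {("r1", 1): 72.0, ("r2", 1): 90.0}

print(reporting_discrepancies(self_reported, measured))
```

In this made-up example only the second resident's record is flagged; the third resident has no measured record and would need a separate check for unlinked data.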
It is also important to define what will be considered success or failure of the program. The timeline and budget for performing the evaluation, and the way the information will be disseminated, must also be specified. Once the results are obtained, there needs to be a method to introduce the information to the stakeholders. This process is iterative and, when used properly, will introduce a cycle of continuous improvement. An example of a commonly used standardized approach to evaluation, adapted by the Surgeons as Educators Course of the American College of Surgeons, is given in the Box.
Box.
Summary
- Individual program evaluations should follow a standardized systematic approach.
- Identifying the focus of an evaluation enables building of comparisons.
- Planning interventions based on the evaluation process should also follow a systematic approach.
Advantages of the Method
A systematic approach to program evaluation will ensure that the information obtained will be helpful to the program leadership to institute necessary changes to improve the program. In addition, if surgical educational programs have a more standardized approach to performing evaluations, it will be easier to compare 2 or more components of programs.
Pitfalls of the Method
There is often misalignment of program evaluation questions, stakeholders, and/or the performance data needed. Programs may not have adequate performance data and may only use what they have available. When engaging in educational program evaluation, one must apply scientific rigor and standardization, ask the appropriate questions, and define the stakeholders, timeline, and budget. Deficiency in any of those areas will result in dubious outcomes and an inability to perform comparisons and subsequently improve the program over time.
Statistical Considerations
Study Designs Unique to Program Evaluation and the Research Question
Specific research questions or components of programs may require separate evaluation and methods. As evaluation of a program involves use of an intervention (ie, the program), use of both experimental (ie, randomized clinical trials) and quasi-experimental study designs will be important. However, given the context of program implementation, individual-level randomization is often not feasible, which then requires use of quasi-experimental designs, including use of cluster randomization, stepped-wedge, or interrupted time series designs to perform optimal evaluation. Furthermore, use of mixed or qualitative methods is integral to program evaluation.
Flaws to Avoid in the Analyses
High-quality data collection is critical to all research, including program evaluation. Poor data acquisition may introduce bias, resulting in misrepresentation of the study findings. Conventional statistical analyses are often not sufficient for program evaluation, as study designs often require less conventional approaches (eg, quasi-experimental) and include multilevel correlation (eg, repeated measurements or individuals clustered within larger entities like departments or institutions). Thus, use of multilevel multivariable modeling approaches is often necessary for robust and valid analyses of programmatic interventions. Moreover, implementation of qualitative methods requires sound contextual evaluation to understand themes identified during interviews or focus groups. Finally, failure to standardize and pilot surveys and ideally validate survey items may similarly result in flawed interpretations.
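One way to see why clustering matters is to compute the intraclass correlation coefficient (ICC) and the resulting design effect. The minimal Python sketch below uses the one-way analysis of variance estimator of the ICC and assumes equally sized clusters; the function names and the scores (residents grouped within three programs) are hypothetical. It is not a substitute for the multilevel models described above, only an illustration of how much within-cluster correlation can shrink the effective sample size.

```python
from statistics import mean


def icc_oneway(clusters):
    """One-way ANOVA estimate of the ICC for equally sized clusters."""
    k = len(clusters)        # number of clusters (eg, programs)
    m = len(clusters[0])     # observations per cluster (eg, residents)
    if any(len(c) != m for c in clusters):
        raise ValueError("this sketch assumes equal cluster sizes")
    grand = mean([v for c in clusters for v in c])
    # Mean squares between and within clusters.
    msb = m * sum((mean(c) - grand) ** 2 for c in clusters) / (k - 1)
    msw = sum((v - mean(c)) ** 2 for c in clusters for v in c) / (k * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)


def design_effect(icc, cluster_size):
    """Variance inflation from clustering: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


# Hypothetical assessment scores for 3 programs with 3 residents each.
programs = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
icc = icc_oneway(programs)
deff = design_effect(icc, cluster_size=3)
print(icc, deff, 9 / deff)  # ICC, design effect, effective sample size
```

In this made-up example most of the variation lies between programs, so the 9 residents carry roughly the information of 3 independent observations; an analysis that ignored clustering would overstate its precision.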
Where to Find More Information
There are educational forums outside medical education that provide means for analysis of programs. The What Works Clearinghouse, Comprehensive School Reform Quality Center, and the Best Evidence Encyclopedia are some of the larger initiatives to produce a forum for evidence-based educational research to better inform policy. The Campbell Collaboration is an international society that performs systematic reviews of research and publishes these results in academic journals.
In medical education, more limited forums exist to synthesize information (ie, best evidence in medical education) and provide educators with best practices for education. However, consistency and rigor in medical education research are still early in their evolution. Societies such as the Association for Surgical Education and the Association of Program Directors in Surgery provide venues for discussion and presentation of educational research. Additional resources include Advancing Surgical Education: Innovation and Change in Professional Education,3 The Systematic Design of Instruction,4 and The Practice of Health Program Evaluation.5