REVIEW ARTICLE

Year: 2017 | Volume: 5 | Issue: 1 | Page: 8-11
Standard setting in objective structured clinical examination: Assigning a pass/fail cut score
Fadi Munshi1, Abdullah Alnemari2, Hatim Al-Jifree3, Abdulaziz Alshehri2
1 Department of Training and Supervision, Saudi Commission for Health Specialties, Riyadh, Saudi Arabia
2 Medical Intern, King Saud bin Abdulaziz University for Health Sciences, Jeddah, Saudi Arabia
3 Department of Oncology, National Guard Health Affairs, Riyadh, Saudi Arabia
Date of Web Publication: 20-Jan-2017
Correspondence Address: Abdullah Alnemari, King Saud bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia
DOI: 10.4103/2468-6360.198803
Objective structured clinical examination (OSCE) has been considered a standard assessment method since its introduction, and as with every assessment tool, OSCE has its advantages and disadvantages. For an OSCE to be a reliable method of evaluating examinees, its standards must be set according to defined criteria. Standard setting comprises the methods developed to set the expected pass/fail cut score for an OSCE station. This review article draws on multiple studies that have evaluated the reliability and validity of OSCE standard setting. No single method is best for all testing situations; the choice of method depends on the kinds of judgements that can be obtained and on how much experience the judges have with the method. All of the standard setting methods described here require judgement and are therefore subjective. Yet once a standard has been set, the decisions based on it can be made objectively: instead of a separate set of judgements for each applicant, the same set of judgements is applied to all applicants. Standards cannot be objectively determined, but they can be objectively applied.

Keywords: Absolute, objective structured clinical examination, relative, standard setting
How to cite this article: Munshi F, Alnemari A, Al-Jifree H, Alshehri A. Standard setting in objective structured clinical examination: Assigning a pass/fail cut score. J Health Spec 2017;5:8-11.
Introduction
Objective structured clinical examination (OSCE) is an educational tool designed to assess the clinical skills of health professionals.[1] OSCE has become a standard assessment method since its introduction. It consists of multiple stations, each set to test a different clinical skill within a given period of time.[1],[2] The evaluated clinical skills range from history taking to more complex integrated clinical encounters. The assessment is done by either an examiner or a patient.[3] As with every assessment tool, OSCE has advantages and disadvantages: its advantages are objectivity, versatility and reproducibility, whereas its cost and demands on human resources can be problematic.[4],[5] An OSCE's framework consists of examiners, an objective scoring checklist and patients, whether actual or standardised.[6] OSCE is designed to evaluate the examinees' ability to 'show how' they carry out different clinical skills, an important component of Miller's Pyramid of Competence.[7]
The test score obtained at an OSCE station is a piece of information about the examinee. In some instances, this score is used to make high-stakes decisions, e.g. who is promoted and who is not. Standard setting refers to the methods used to set the expected pass/fail cut score for an OSCE station. It is a critical part of summative high-stakes examinations such as licensing and certification testing.[8],[9],[10]
One of the two main types of standard setting is relative standards, also called norm-referenced standards. These depend on the performance of the group of examinees or test-takers:[11],[12] candidates pass or fail based on the performance of a reference group. In comparison, absolute standards rely on judgements about test questions; standards are set based on the content that examinees are required to know. Relative standards have two main drawbacks. First, pass scores selected using relative standards vary, since they are derived from the performance of a particular set of examinees; this implies that pass scores will change with differences among the examined groups of candidates. Second, norm-referenced standards do not give examiners a clear picture of what examinees are required to know.[13],[14],[15]
Methods for Standard Setting
Assigning a pass score would be an easy task if all test-takers fell clearly into two groups: those with near-perfect scores and those with scores near the bottom. Unfortunately, this is not the situation in real life, and a clear cut score that defines how much is enough needs to be assigned for each test.
Standards can be absolute or relative. A relative standard relies on comparisons between individuals; an example would be deciding that the top 10% of test-takers are good enough. Absolute standards, on the other hand, do not rely on comparisons between individuals.
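As a minimal illustration (not from the original article), the following Python sketch computes a norm-referenced cut score under the 'top 10% pass' rule mentioned above; the scores are hypothetical.

```python
import numpy as np

# Hypothetical checklist scores (%) for one OSCE station.
scores = np.array([48, 55, 61, 63, 67, 70, 72, 75, 81, 90])

# Under a "top 10% pass" relative standard, the cut score is the
# 90th percentile of this cohort's scores.
cut_score = np.percentile(scores, 90)
print(f"Norm-referenced cut score: {cut_score:.1f}")

# Rerunning this with a different cohort yields a different cut score,
# which is exactly the drawback of relative standards discussed above.
```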
Absolute standards are set by judgemental (examinee-centred) methods, empirical (test-centred) methods or a combination of the two. Empirical (test-centred) methods require judgements about the test items as part of the standard setting process; the widely used Angoff's method is an example.[16],[17]
Judgemental (examinee-centred) methods, on the other hand, rely on scrutiny of individual performances: judges evaluate how a borderline candidate would perform on each item and whether the candidate is minimally competent. An example is the borderline regression method (BRM).[18],[19]
Empirical Absolute Method (Based on Test Questions)
This category of methods was proposed and developed by Angoff and others. In these methods, experts judge every checklist item, which makes the resulting passing score easier to defend. Angoff's method is considered the first of the absolute methods and has proven to be the most successful.[20]
Angoff's method is considered easy to use compared with others, and multiple studies support its use in clinical competency assessment.[21] In addition, the method provides a clear cut score that differentiates between passing and failing; however, some limitations have been reported, one of which is the need for examiners experienced in Angoff's method.[22] In one study that compared experienced judges with inexperienced ones, the results yielded by the inexperienced judges differed significantly from those of the judges with previous experience.[23] This calls for increasing judges' familiarity with the different methods.
One major disadvantage is that the number of judges must be increased to increase the reliability of Angoff's method. As a result, Angoff's method is time-consuming and costly in comparison with other methods, such as the BRM.[24]
For example, the developer of an OSCE checklist will ask experts in Angoff's method to evaluate the items of that checklist. The experts rate each item according to whether a minimally qualified test-taker would answer it correctly. All the experts then review each other's ratings and rate the items in a second round. Finally, the second-round ratings are averaged to determine the final cut score for the test.[25],[26]
Judges must understand what the test is supposed to measure. Angoff's method and other standard setting methods, such as Ebel's and Nedelsky's, require the judges to review the test in detail.[25],[27]
Implementing Angoff's standard setting consists of five steps (a computational sketch follows the list):
- Examiners review the notion of borderline performance and form a mental model of a borderline applicant
- Examiners agree on the characteristics of a marginally competent applicant
- Each examiner estimates, for each item, the probability (0 - 100%) that a borderline applicant would complete it correctly
- These estimates are recorded
- The estimates are averaged to define the passing score for the test.
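The arithmetic behind the final step can be shown in a short Python sketch; the judges, items and ratings below are hypothetical, not data from any study cited here.

```python
import numpy as np

# Hypothetical second-round ratings: each row is one judge, each column
# one checklist item; values are the estimated probability (0-100%) that
# a borderline applicant would answer that item correctly.
ratings = np.array([
    [60, 75, 40, 85, 55],   # judge 1
    [65, 70, 50, 80, 60],   # judge 2
    [55, 80, 45, 90, 50],   # judge 3
])

# Each judge's expected score for a borderline applicant is the mean of
# that judge's item estimates; the Angoff cut score is the mean across judges.
judge_means = ratings.mean(axis=1)
cut_score = judge_means.mean()
print(f"Angoff cut score: {cut_score:.1f}%")
```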
Judgemental Absolute Methods (Based on Examinees)
The best-known method in this category is the BRM. This category of standard setting methods was developed to reduce resource consumption, which is quite troublesome in the other methods. Furthermore, the BRM has the benefit of producing a number of indices that are helpful in assessing the quality of OSCE stations, since it is based on both the global grade and the checklist score. Assessing its reliability is therefore of crucial significance.[28]
Judgements are based directly on individual test-takers, so the method can only be used when experts observe the test performance directly. Appropriately trained simulated patients may serve as content experts for communication and interpersonal skills. The observing examiners' global ratings are used to set the checklist score that serves as the pass standard.
Several studies have compared Angoff's method and the BRM for reliability. The BRM produces standards that are more reliable for an OSCE examination than those produced by Angoff's method. A reality check can enhance the credibility of Angoff's procedure but carries no benefit towards improving its reliability.[29]
The benefits of the BRM include conceptual simplicity, and the examiners judge real examinees whose performance they have observed. Drawbacks of the BRM include the need for a large panel of examiners, a large number of candidates and a longer processing time.[30]
In this method, the evaluator gives the examinee a mark for each objective and a grade for overall performance. For instance, a 'breaking bad news' station could have multiple objectives: ensuring the setting is appropriate, assessing the patient's knowledge, breaking the bad news empathically and confirming the patient's understanding of the condition. Once marks on the objectives have been assigned to all the examinees, the checklist scores are plotted against the overall grades and a regression line is fitted. The pass mark for the station is the checklist score where the regression line meets the borderline overall grade.[31],[32]
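To make the regression step concrete, here is a minimal Python sketch of the BRM calculation, assuming hypothetical checklist scores and a 3-point global grade scale with 2 as the borderline grade.

```python
import numpy as np

# Hypothetical station data: each examinee has a checklist score (%)
# and an examiner's global grade (1 = fail, 2 = borderline, 3 = pass).
checklist = np.array([42, 50, 55, 58, 63, 68, 72, 80, 85, 91])
global_grade = np.array([1, 1, 2, 2, 2, 3, 2, 3, 3, 3])

# Regress checklist score on global grade (least-squares straight line).
slope, intercept = np.polyfit(global_grade, checklist, 1)

# The pass mark is the checklist score the regression line predicts
# at the borderline global grade (2 on this scale).
BORDERLINE = 2
pass_mark = slope * BORDERLINE + intercept
print(f"BRM pass mark: {pass_mark:.1f}%")
```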
Procedures
- Examiners are oriented by familiarising them with the examination station or patient case and the checklist
- Examiners note the test performance of each applicant as it is observed. Each examiner should observe several applicants at the same examination station rather than following one applicant across several stations. With proper training, the observed performance may be recorded as specific checklist items
- The observing examiners give each applicant a global rating of overall performance on three grades: 3 for pass, 2 for borderline and 1 for fail
- The performance is also scored (by the examiners or any rater) using the checklist items or a rating scale
- The mean checklist score of the borderline (marginal) applicants is the passing score for the test, as shown in the sketch below.[20]
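A minimal sketch of the final step, under the same hypothetical data as above: the passing score is simply the mean checklist score of the applicants who received the borderline global rating.

```python
import numpy as np

# Hypothetical checklist scores (%) and 3-point global ratings
# (3 = pass, 2 = borderline, 1 = fail).
checklist = np.array([42, 50, 55, 58, 63, 68, 72, 80, 85, 91])
global_grade = np.array([1, 1, 2, 2, 2, 3, 2, 3, 3, 3])

# The passing score is the mean checklist score of the marginal
# (borderline) applicants.
pass_score = checklist[global_grade == 2].mean()
print(f"Borderline-group passing score: {pass_score:.1f}%")
```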
Combined Method
The Hofstee method was developed to facilitate standard setting in terms of both absolute and relative standards. It requires trained judges to answer four questions: the lower and upper limits of the acceptable failure rate, and the lower and upper limits of the acceptable passing score. Its main advantage is the capacity to derive a holistic judgement on the pass score and failure rate with almost no constraints: the judges can set the performance parameter limits based on their knowledge of the examination's objectives. In addition, the Hofstee method incorporates performance data. In one study that compared different standard setting methods, the Hofstee method yielded similar and more realistic results than the other methods.[33]
On the other hand, the Hofstee method is not considered a primary standard setting method but rather a safety check, because of its flexibility and relative ease of use; it is mainly of value for checking whether the results of another standard setting method have met the examiners' general expectations. Wayne et al. reported that applying the Hofstee method alone to set standards is quite problematic, especially for an OSCE.[27] The major issue was that the performance standards yielded by the Hofstee method were too strict, which may lead to a high number of failures.
For example, a group of judges experienced in the Hofstee method would review the OSCE checklist and determine the lower- and upper-limit failure rates and the lower- and upper-limit passing scores. The mean for each of the four questions is then computed. These means are plotted on a graph as two points, the first being the minimum passing score paired with the maximum fail rate and the second the maximum passing score paired with the minimum fail rate. A straight line is drawn between the two points and the cumulative frequency distribution of scores is added to the figure. Finally, the point where the cumulative distribution of scores crosses the line is marked, and the horizontal coordinate of this point is the cut score.[27],[34]
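The intersection described above can also be computed directly. The following Python sketch assumes hypothetical judge means and examinee scores; it scans candidate cut scores and finds where the observed cumulative failure rate crosses the Hofstee line.

```python
import numpy as np

# Hypothetical judge means for the four Hofstee questions.
min_pass, max_pass = 55.0, 75.0   # acceptable passing-score range (%)
min_fail, max_fail = 5.0, 30.0    # acceptable failure-rate range (%)

# Hypothetical examinee scores (%).
scores = np.array([48, 52, 57, 60, 62, 66, 69, 73, 78, 84])

# The Hofstee line runs from (min_pass, max_fail) to (max_pass, min_fail).
# Scan candidate cut scores and find where the observed cumulative
# failure rate crosses that line.
cuts = np.linspace(min_pass, max_pass, 201)
line = max_fail + (min_fail - max_fail) * (cuts - min_pass) / (max_pass - min_pass)
observed_fail = np.array([(scores < c).mean() * 100 for c in cuts])

cut_score = cuts[np.argmin(np.abs(observed_fail - line))]
print(f"Hofstee cut score: {cut_score:.1f}%")
```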
Conclusion
This review has presented an overview of common absolute and relative standard setting methods used in OSCE. The question that follows is: which method is best, a relative or an absolute one? The answer is contextual and depends on the nature and purpose of the test. It is crucial to check whether the results of the chosen standard setting method meet the examiners' general expectations.
Financial support and sponsorship
Nil.
Conflicts of interest
There are no conflicts of interest.
References
1. Zayyan M. Objective structured clinical examination: The assessment of choice. Oman Med J 2011;26:219-22.
2. Wass V, Van der Vleuten C, Shatzer J, Jones R. Assessment of clinical competence. Lancet 2001;357:945-9.
3. Miller GE. The assessment of clinical skills/competence/performance. Acad Med 1990;65(9 Suppl):S63-7.
4. Kaufman DM, Mann KV, Muijtjens AM, van der Vleuten CP. A comparison of standard-setting procedures for an OSCE in undergraduate medical education. Acad Med 2000;75:267-71.
5. Dent JA, Harden RM. A Practical Guide for Medical Teachers. 4th ed. London: Churchill Livingstone; 2013. p. 292-8.
6. Ben-David MF. AMEE Guide No. 18: Standard setting in student assessment. Med Teach 2000;22:120-30.
7. Swanwick T. Understanding Medical Education. 2nd ed. New York: John Wiley & Sons Inc.; 2014. p. 305-12.
8. Colliver JA, Swartz MH, Robbs RS. The effect of examinee and patient ethnicity in clinical-skills assessment with standardized patients. Adv Health Sci Educ Theory Pract 2001;6:5-13.
9. Cohen DS, Colliver JA, Marcy MS, Fried ED, Swartz MH. Psychometric properties of a standardized-patient checklist and rating-scale form used to assess interpersonal and communication skills. Acad Med 1996;71(1 Suppl):S87-9.
10. Hudson JN, Rienits H, Corrin L, Olmos M. An innovative OSCE clinical log station: A quantitative study of its influence on log use by medical students. BMC Med Educ 2012;12:111.
11. Reece A, Chung EM, Gardiner RM, Williams SE. Competency domains in an undergraduate objective structured clinical examination: Their impact on compensatory standard setting. Med Educ 2008;42:600-6.
12. Searle J. Defining competency – The role of standard setting. Med Educ 2000;34:363-6.
13. Schindler N, Corcoran J, DaRosa D. Description and impact of using a standard-setting method for determining pass/fail scores in a surgery clerkship. Am J Surg 2007;193:252-7.
14. Stern DT, Ben-David MF, De Champlain A, Hodges B, Wojtczak A, Schwarz MR. Ensuring global standards for medical graduates: A pilot study of international standard-setting. Med Teach 2005;27:207-13.
15. Chesser AM, Laing MR, Miedzybrodzka ZH, Brittenden J, Heys SD. Factor analysis can be a useful standard setting tool in a high stakes OSCE assessment. Med Educ 2004;38:825-31.
16. Christensen L, Karle H, Nystrup J. Process-outcome interrelationship and standard setting in medical education: The need for a comprehensive approach. Med Teach 2007;29:672-7.
17. Carney PA, Ogrinc G, Harwood BG, Schiffman JS, Cochran N. The influence of teaching setting on medical students' clinical skills development: Is the academic medical center the gold standard? Acad Med 2005;80:1153-8.
18. Wilkinson TJ, Newble DI, Frampton CM. Standard setting in an objective structured clinical examination: Use of global ratings of borderline performance to determine the passing score. Med Educ 2001;35:1043-9.
19. Richter Lagha RA, Boscardin CK, May W, Fung CC. A comparison of two standard-setting approaches in high-stakes clinical performance assessment using generalizability theory. Acad Med 2012;87:1077-82.
20. Downing SM, Tekian A, Yudkowsky R. Procedures for establishing defensible absolute passing scores on performance examinations in health professions education. Teach Learn Med 2006;18:50-7.
21. Norcini JJ. Setting standards on educational tests. Med Educ 2003;37:464-9.
22. Cusimano MD, Rothman AI. The effect of incorporating normative data into a criterion-referenced standard setting in medical education. Acad Med 2003;78(10 Suppl):S88-90.
23. Boursicot KA, Roberts TE, Pell G. Using borderline methods to compare passing standards for OSCEs at graduation across three medical schools. Med Educ 2007;41:1024-31.
24. Schoonheim-Klein M, Muijtjens A, Habets L, Manogue M, van der Vleuten C, van der Velden U. Who will pass the dental OSCE? Comparison of the Angoff and the borderline regression standard setting methods. Eur J Dent Educ 2009;13:162-71.
25. Livingston SA, Zieky MJ. Passing Scores: A Manual for Setting Standards of Performance on Educational and Occupational Tests. Princeton: Educational Testing Service; 1982. p. 24-42.
26. Senthong V, Chindaprasirt J, Sawanyawisuth K, Aekphachaisawat N, Chaowattanapanit S, Limpawattana P, et al. Group versus modified individual standard-setting on multiple-choice questions with the Angoff method for fourth-year medical students in the internal medicine clerkship. Adv Med Educ Pract 2013;4:195-200.
27. Wayne DB, Barsuk JH, Cohen E, McGaghie WC. Do baseline data influence standard setting for a clinical skills examination? Acad Med 2007;82(10 Suppl):S105-8.
28. Hejri SM, Jalili M, Muijtjens AM. Assessing the reliability of the borderline regression method as a standard setting procedure for objective structured clinical examination. J Res Med Sci 2013;18:887-91.
29. Kramer A, Muijtjens A, Jansen K, Düsman H, Tan L, van der Vleuten C. Comparison of a rational and an empirical standard setting procedure for an OSCE. Objective structured clinical examinations. Med Educ 2003;37:132-9.
30. Näsström G, Nyström P. A comparison of two different methods for setting performance standards for a test with constructed-response items. Pract Assess Res Eval 2008;13:1-12.
31. Cusimano MD, Rothman AI. Consistency of standards and stability of pass/fail decisions with examinee-based standard-setting methods in a small-scale objective structured clinical examination. Acad Med 2004;79(10 Suppl):S25-7.
32. Humphrey-Murto S, MacFadyen JC. Standard setting: A comparison of case-author and modified borderline-group methods in a small-scale OSCE. Acad Med 2002;77:729-32.
33. Eckes T. Examinee-centered standard setting for large-scale assessments: The prototype group method. Psychol Test Assess Model 2012;54:257-83.
34. Frischknecht AC, Boehler ML, Schwind CJ, Brunsvold ME, Gruppen LD, Brenner MJ, et al. How prepared are your interns to take calls? Results of a multi-institutional study of simulated pages to prepare medical students for surgery internship. Am J Surg 2014;208:307-15.