REVIEW ARTICLE
Year: 2017 | Volume: 5 | Issue: 1 | Page: 8-11

Standard setting in objective structured clinical examination: Assigning a pass/fail cut score


1 Department of Training and Supervision, Saudi Commission for Health Specialties, Riyadh, Saudi Arabia
2 Medical Intern, King Saud bin Abdulaziz University for Health Sciences, Jeddah, Saudi Arabia
3 Department of Oncology, National Guard Health Affairs, Riyadh, Saudi Arabia

Date of Web Publication: 20-Jan-2017

Correspondence Address:
Abdullah Alnemari
King Saud bin Abdulaziz University for Health Sciences, Riyadh
Saudi Arabia


DOI: 10.4103/2468-6360.198803

  Abstract 

Objective structured clinical examination (OSCE) has been considered a standard assessment method since its introduction, and as with every assessment tool, OSCE has its advantages and disadvantages. For OSCE to be a reliable method of evaluating examinees, its standards must be set according to defined criteria. Standard setting comprises the methods developed to set the expected pass or fail cut score for an OSCE station. This review article draws on multiple studies that have evaluated the reliability and validity of OSCE. No single method is considered best for all testing situations; the choice of method depends on the kind of judgements that can be obtained and on how much experience the judges have with the method. All the standard-setting methods discussed here require judgement and are therefore subjective. Yet once a standard has been set, the decisions based on it can be made objectively: instead of a separate set of judgements for each applicant, the same set of judgements is applied to all applicants. Standards cannot be objectively determined, but they can be objectively applied.

Keywords: Absolute, objective structured clinical examination, relative, standard setting


How to cite this article:
Munshi F, Alnemari A, Al-Jifree H, Alshehri A. Standard setting in objective structured clinical examination: Assigning a pass/fail cut score. J Health Spec 2017;5:8-11

How to cite this URL:
Munshi F, Alnemari A, Al-Jifree H, Alshehri A. Standard setting in objective structured clinical examination: Assigning a pass/fail cut score. J Health Spec [serial online] 2017 [cited 2017 May 28];5:8-11. Available from: http://www.thejhs.org/text.asp?2017/5/1/8/198803


  Introduction


Objective structured clinical examination (OSCE) is defined as an educational tool designed to assess the clinical skills of health professionals.[1] OSCE has become a standard assessment method since its introduction. It consists of multiple stations that are set up to test examinees' different clinical skills within a given period of time.[1],[2] The clinical skills evaluated can range from history taking to more complex integrated clinical encounters. The assessment is done by either an examiner or a patient.[3] As with every assessment tool, OSCE has advantages and disadvantages. Its advantages are objectivity, versatility and reproducibility; on the other hand, its cost and demands on human resources can be problematic.[4],[5] The skeleton of an OSCE consists of examiners, an objective scoring checklist and patients, whether actual or standardised.[6] OSCE is designed to evaluate the examinees' ability to 'show how' they carry out different clinical skills, an important component of Miller's Pyramid of Competence.[7]

The test score obtained in an OSCE station is a piece of information about the examinee. In some instances, this score is used to make high-stakes decisions, e.g. who gets promoted and who does not. Standard setting is one of the methods used to set the expected pass or fail cut score in an OSCE station. It is a critical part of summative high-stakes examinations such as licensing and certification testing.[8],[9],[10]

One of the two main types of standard setting is relative standards, also called norm-referenced standards. These depend on the performance of the group of examinees or test-takers:[11],[12] basically, candidates pass or fail based on the performance of a reference group. In comparison, absolute standards rely on judgements about test questions; standards are set based on the content that the examinees are required to know. Relative standards have two main drawbacks. First, pass scores selected with relative standards vary, since they are derived from the performance of a particular set of examinees; pass scores will therefore change with differences among the examined groups of candidates. Second, norm-referenced standards do not provide the examiners with a clear picture of what the examinees are required to know.[13],[14],[15]


  Methods for Standard Setting


Assigning a pass score would be an easy task if all test-takers fell clearly into two groups, one with near-perfect scores and one with near-the-bottom scores. Unfortunately, this is not the situation in real life, and a clear cut score that defines how much is enough needs to be assigned for each test.

Standards can be absolute or relative. A relative standard relies on comparison between individuals; an example would be deciding that the top 10% of test-takers are good enough to pass. Absolute standards, on the other hand, do not rely on comparisons between individuals.
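
To make the norm-referenced idea concrete, the short sketch below computes a relative cut score as the score of the lowest-ranked examinee still inside a chosen top fraction of the group. The cohort scores and the 10% figure are illustrative assumptions, not data from any study cited here.

```python
# Hypothetical sketch of a norm-referenced (relative) cut score:
# only the top fraction of examinees passes, whatever their raw scores.

def relative_cut_score(scores, top_fraction=0.10):
    """Return the lowest score that still falls within the passing fraction."""
    ranked = sorted(scores, reverse=True)
    n_pass = max(1, round(len(ranked) * top_fraction))
    return ranked[n_pass - 1]

# Illustrative cohort of 10 station scores (invented data).
cohort = [62, 71, 55, 80, 68, 74, 59, 66, 77, 70]
print(relative_cut_score(cohort))  # -> 80: only the single top scorer passes
```

Note how the cut score is entirely a function of the cohort, which is exactly the instability of relative standards described above.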

Absolute standards are set using judgemental methods (examinee-centred), empirical methods (test-centred) or combination methods. Empirical methods use examinee test data as part of the standard-setting process; the widely used Angoff's method is an example.[16],[17]

Judgemental methods, on the other hand, scrutinise individual examinees' performance on the test items to evaluate whether a candidate is borderline and whether the candidate is minimally competent. An example of this is the borderline regression method (BRM).[18],[19]


  Empirical Absolute Method (Based on Test Questions)


This category was proposed and developed by Angoff and others. In this method, experts judge every checklist item, which makes the resulting passing score easier to defend. Angoff's method is considered the first of the absolute methods and has proven to be the most successful.[20]

Angoff's method is considered easy to use compared with others, and multiple studies support its use in clinical competency assessment.[21] In addition, this method provides a clear cut score that differentiates between passing and failing; however, some limitations have been reported, one of which is the need for examiners experienced in Angoff's method.[22] In one study that compared experienced judges with inexperienced ones, the results yielded by the inexperienced judges differed significantly from those of judges with previous experience.[23] This calls for increasing judges' familiarity with the different methods.

One major disadvantage is that the number of judges must be increased to raise the reliability of Angoff's method. As a result, Angoff's method is time-consuming and costly in comparison with other methods, such as BRM.[24]

For example, the developer of an OSCE checklist will ask experts in Angoff's method to evaluate the items of this checklist. They rate each item based on whether or not a least-qualified test-taker would answer that specific item correctly. All the experts then gain access to each other's ratings in order to review them, after which they rate the items in a second round. Finally, the second-round ratings are averaged to determine the final cut score for the test.[25],[26]
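
A minimal sketch of this final averaging step, assuming hypothetical second-round ratings from three judges on a four-item checklist (all numbers are invented for illustration):

```python
# Hypothetical second-round Angoff ratings: each judge estimates the
# probability that a minimally qualified (borderline) test-taker would
# answer each checklist item correctly.
ratings = {
    "judge_A": [0.7, 0.5, 0.9, 0.6],
    "judge_B": [0.6, 0.4, 0.8, 0.7],
    "judge_C": [0.8, 0.5, 0.9, 0.5],
}

n_items = len(next(iter(ratings.values())))

# Average each item's rating across judges, then sum over items to get
# the expected checklist score of a borderline test-taker: the cut score.
item_means = [sum(judge[i] for judge in ratings.values()) / len(ratings)
              for i in range(n_items)]
cut_score = sum(item_means)
print(f"Cut score: {cut_score:.2f} out of {n_items} items")  # 2.63 out of 4
```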

It is a necessity that the judges understand what the test is supposed to measure. Angoff's method, like other standard-setting methods such as Ebel's and Nedelsky's, requires the judges to review the test in detail.[25],[27]

Implementing Angoff's standard setting consists of five steps:

  • The examiners review the definition of the borderline applicant and develop a model of a borderline applicant
  • The examiners agree on the characteristics of an applicant at the borderline level
  • Each examiner estimates whether a borderline applicant would fulfil each item, or rates each item as a percentage (0 - 100%)
  • These estimates are noted down
  • The examiners' estimates are averaged, and the mean defines the passing score for the test.



  Judgemental Absolute Methods (Based on Examinees)


This category of standard-setting methods is also known as BRM. It has been described as reducing the consumption of resources, which is quite troublesome in the other methods. Furthermore, BRM has the benefit of producing a number of indices that are helpful in assessing the quality of OSCE stations, since it is based on both the global grade and the checklist score. Therefore, assessing its reliability is of crucial significance.[28]

Judgements are based directly on individual test-takers, so the method can only be used when experts observe the test performance at first hand. Appropriately trained simulated patients may also be considered content experts in communication and interpersonal skills. The observing examiners' global ratings are used to set the checklist-item score that serves as the standard for passing.

In fact, much of the literature has compared Angoff's method and BRM for reliability. BRM produces standards that are more reliable for an OSCE examination than Angoff's method. Unfortunately, a reality check can only enhance the credibility of Angoff's procedure; it carries no benefit towards improving its reliability.[29]

The benefits of BRM include its conceptual simplicity: the examiners assess examinees they have actually observed. Its drawbacks include the need for a large panel of examiners, a large number of candidates and a longer processing time.[30]

In this method, the evaluator gives the examinee a mark for each objective and a grade for their overall performance. For instance, a 'breaking bad news' station could have multiple objectives: ensuring the setting is appropriate, assessing the patient's knowledge, breaking the bad news empathically and confirming the patient's understanding of the condition. Once the marks on the objectives have been assigned for all examinees, the checklist scores are plotted against the overall grades and a regression line is drawn through them. The pass mark for the station is the point where this line meets the midpoint (borderline) overall grade.[31],[32]
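
The sketch below shows one common way this regression step can be computed, assuming the 3-point global grade scale used in the procedure that follows (1 fail, 2 average/borderline, 3 pass) and invented station data:

```python
# Hypothetical borderline regression sketch: checklist scores are
# regressed on the examiners' global grades, and the station pass mark
# is the fitted checklist score at the borderline grade.
import numpy as np

global_grades    = np.array([1, 1, 2, 2, 2, 3, 3, 3, 3])  # 1=fail, 2=borderline, 3=pass
checklist_scores = np.array([9, 11, 14, 15, 13, 18, 20, 19, 17])

slope, intercept = np.polyfit(global_grades, checklist_scores, 1)

BORDERLINE_GRADE = 2
pass_mark = slope * BORDERLINE_GRADE + intercept
print(f"Station pass mark: {pass_mark:.1f} checklist points")  # ~14.2 here
```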

Procedures

  • The examiners must be orientated by familiarising them with the examination station or patient case and the checklist
  • Examiners should score each applicant's test performance immediately. Each examiner should observe several applicants at the same examination station rather than following one applicant across several stations. With proper training, the observed performance may be recorded as particular checklist points
  • The observing examiners provide a global rating of each applicant's overall performance on 3 grades: 3 for pass, 2 for average (borderline) and 1 for fail
  • The performance is also scored (by the examiners or any other rater) using checklist points or a rating scale
  • The average checklist score of the borderline applicants is the passing score for this test.[20]



  Combined Method


The Hofstee method was developed to facilitate standard setting in terms of both absolute and relative standards. It requires trained judges to answer four questions: the lower and upper limits of the failure rate, and the lower and upper limits of the passing score. Its main advantage is its capacity to drive a holistic judgement on pass scores and failure rates with almost no constraints: the judges can set the performance parameter limits based on their knowledge of the examination's objectives. In addition, the Hofstee method incorporates performance data. In one study that compared different standard-setting methods, the Hofstee method yielded similar and more realistic results than the other methods.[33]

On the other hand, the Hofstee method is considered not a primary standard-setting method but rather a safety check, because of its flexibility and relative ease of use: it is chiefly of value when we need to check whether the results of another standard-setting method have met the general expectations of the examiners. Wayne et al. described the application of the Hofstee method alone in setting standards as quite problematic, especially when it comes to OSCE.[27] The major issue was that the performance standards yielded by the Hofstee method were too strict, which may lead to a high number of failures.

For example, a group of judges experienced in the Hofstee method would review the OSCE checklist and determine the lower- and upper-limit failure rates and the lower- and upper-limit passing scores. The mean of the judges' answers to each of the four questions is then computed. These means are plotted on a graph as two points, the first being the minimum passing score paired with the maximum fail rate and the second being the maximum passing score paired with the minimum fail rate. A straight line is drawn between these points, and the cumulative frequency distribution of scores is added to the figure. Finally, the point where the cumulative distribution of scores crosses the line is marked; the horizontal coordinate of this point is the cut score.[27],[34]
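
A minimal sketch of this graphical procedure, assuming invented judge limits and score data; the crossing is found by a simple numeric scan rather than a drawn figure:

```python
# Hypothetical Hofstee sketch: find where the cumulative fail-rate curve
# crosses the line joining (min passing score, max fail rate) and
# (max passing score, min fail rate). All numbers are invented.
import numpy as np

scores = np.array([48, 52, 55, 58, 60, 61, 63, 65, 66, 68,
                   70, 71, 73, 75, 76, 78, 80, 83, 85, 90])

k_min, k_max = 55.0, 75.0   # judges' mean lower/upper acceptable passing score
f_min, f_max = 0.05, 0.30   # judges' mean lower/upper acceptable fail rate

def fail_rate(cut):
    """Fraction of examinees scoring below a candidate cut score."""
    return float(np.mean(scores < cut))

def hofstee_line(cut):
    """Fail rate on the straight line from (k_min, f_max) to (k_max, f_min)."""
    return f_max + (cut - k_min) * (f_min - f_max) / (k_max - k_min)

# Scan candidate cut scores; the cut score is the first point where the
# empirical cumulative fail rate rises to meet the Hofstee line.
grid = np.linspace(k_min, k_max, 201)
cut = next(c for c in grid if fail_rate(c) >= hofstee_line(c))
print(f"Hofstee cut score ~= {cut:.1f}")  # ~60.1 for this sample
```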


  Conclusion


This review has presented an overview of common absolute and relative standard-setting methods used in OSCE. The question that follows is: which is the best method to use, a relative or an absolute one? The answer is contextual and depends on the nature and purpose of the test. It is crucial to check whether the results of the chosen standard-setting method have met the general expectations of the examiners.

Financial support and sponsorship

Nil.

Conflicts of interest

There are no conflicts of interest.

 
  References

1. Zayyan M. Objective structured clinical examination: The assessment of choice. Oman Med J 2011;26:219-22.
2. Wass V, Van der Vleuten C, Shatzer J, Jones R. Assessment of clinical competence. Lancet 2001;357:945-9.
3. Miller GE. The assessment of clinical skills/competence/performance. Acad Med 1990;65(9 Suppl):S63-7.
4. Kaufman DM, Mann KV, Muijtjens AM, van der Vleuten CP. A comparison of standard-setting procedures for an OSCE in undergraduate medical education. Acad Med 2000;75:267-71.
5. Dent JA, Harden RM. A Practical Guide for Medical Teachers. 4th ed. London: Churchill Livingstone; 2013. p. 292-8.
6. Ben-David MF. AMEE Guide No. 18: Standard setting in student assessment. Med Teach 2000;22:120-30.
7. Swanwick T. Understanding Medical Education. 2nd ed. New York: John Wiley & Sons Inc.; 2014. p. 305-12.
8. Colliver JA, Swartz MH, Robbs RS. The effect of examinee and patient ethnicity in clinical-skills assessment with standardized patients. Adv Health Sci Educ Theory Pract 2001;6:5-13.
9. Cohen DS, Colliver JA, Marcy MS, Fried ED, Swartz MH. Psychometric properties of a standardized-patient checklist and rating-scale form used to assess interpersonal and communication skills. Acad Med 1996;71(1 Suppl):S87-9.
10. Hudson JN, Rienits H, Corrin L, Olmos M. An innovative OSCE clinical log station: A quantitative study of its influence on log use by medical students. BMC Med Educ 2012;12:111.
11. Reece A, Chung EM, Gardiner RM, Williams SE. Competency domains in an undergraduate objective structured clinical examination: Their impact on compensatory standard setting. Med Educ 2008;42:600-6.
12. Searle J. Defining competency - The role of standard setting. Med Educ 2000;34:363-6.
13. Schindler N, Corcoran J, DaRosa D. Description and impact of using a standard-setting method for determining pass/fail scores in a surgery clerkship. Am J Surg 2007;193:252-7.
14. Stern DT, Ben-David MF, De Champlain A, Hodges B, Wojtczak A, Schwarz MR. Ensuring global standards for medical graduates: A pilot study of international standard-setting. Med Teach 2005;27:207-13.
15. Chesser AM, Laing MR, Miedzybrodzka ZH, Brittenden J, Heys SD. Factor analysis can be a useful standard setting tool in a high stakes OSCE assessment. Med Educ 2004;38:825-31.
16. Christensen L, Karle H, Nystrup J. Process-outcome interrelationship and standard setting in medical education: The need for a comprehensive approach. Med Teach 2007;29:672-7.
17. Carney PA, Ogrinc G, Harwood BG, Schiffman JS, Cochran N. The influence of teaching setting on medical students' clinical skills development: Is the academic medical center the gold standard? Acad Med 2005;80:1153-8.
18. Wilkinson TJ, Newble DI, Frampton CM. Standard setting in an objective structured clinical examination: Use of global ratings of borderline performance to determine the passing score. Med Educ 2001;35:1043-9.
19. Richter Lagha RA, Boscardin CK, May W, Fung CC. A comparison of two standard-setting approaches in high-stakes clinical performance assessment using generalizability theory. Acad Med 2012;87:1077-82.
20. Downing SM, Tekian A, Yudkowsky R. Procedures for establishing defensible absolute passing scores on performance examinations in health professions education. Teach Learn Med 2006;18:50-7.
21. Norcini JJ. Setting standards on educational tests. Med Educ 2003;37:464-9.
22. Cusimano MD, Rothman AI. The effect of incorporating normative data into a criterion-referenced standard setting in medical education. Acad Med 2003;78(10 Suppl):S88-90.
23. Boursicot KA, Roberts TE, Pell G. Using borderline methods to compare passing standards for OSCEs at graduation across three medical schools. Med Educ 2007;41:1024-31.
24. Schoonheim-Klein M, Muijtjens A, Habets L, Manogue M, van der Vleuten C, van der Velden U. Who will pass the dental OSCE? Comparison of the Angoff and the borderline regression standard setting methods. Eur J Dent Educ 2009;13:162-71.
25. Livingston SA, Zieky MJ. Passing Scores: A Manual for Setting Standards of Performance on Educational and Occupational Tests. Princeton: Educational Testing Service; 1982. p. 24-42.
26. Senthong V, Chindaprasirt J, Sawanyawisuth K, Aekphachaisawat N, Chaowattanapanit S, Limpawattana P, et al. Group versus modified individual standard-setting on multiple-choice questions with the Angoff method for fourth-year medical students in the internal medicine clerkship. Adv Med Educ Pract 2013;4:195-200.
27. Wayne DB, Barsuk JH, Cohen E, McGaghie WC. Do baseline data influence standard setting for a clinical skills examination? Acad Med 2007;82(10 Suppl):S105-8.
28. Hejri SM, Jalili M, Muijtjens AM. Assessing the reliability of the borderline regression method as a standard setting procedure for objective structured clinical examination. J Res Med Sci 2013;18:887-91.
29. Kramer A, Muijtjens A, Jansen K, Düsman H, Tan L, van der Vleuten C. Comparison of a rational and an empirical standard setting procedure for an OSCE. Objective structured clinical examinations. Med Educ 2003;37:132-9.
30. Näsström G, Nyström P. A comparison of two different methods for setting performance standards for a test with constructed-response items. Pract Assess Res Eval 2008;13:1-12.
31. Cusimano MD, Rothman AI. Consistency of standards and stability of pass/fail decisions with examinee-based standard-setting methods in a small-scale objective structured clinical examination. Acad Med 2004;79(10 Suppl):S25-7.
32. Humphrey-Murto S, MacFadyen JC. Standard setting: A comparison of case-author and modified borderline-group methods in a small-scale OSCE. Acad Med 2002;77:729-32.
33. Eckes T. Examinee-centered standard setting for large-scale assessments: The prototype group method. Psychol Test Assess Model 2012;54:257-83.
34. Frischknecht AC, Boehler ML, Schwind CJ, Brunsvold ME, Gruppen LD, Brenner MJ, et al. How prepared are your interns to take calls? Results of a multi-institutional study of simulated pages to prepare medical students for surgery internship. Am J Surg 2014;208:307-15.