WARNING: This post has been written from my own research into standard setting, and how I think it will be applied to the new fellowship exam. ACEM has supplied no information on the specifics of how these methods are being used so I cannot guarantee that my opinion accurately reflects the process being used in the new exam. This post will be updated as new information comes to hand.
Whilst it may seem confusing, or unnecessarily complex, the new marking method is considered to be an “industry standard”. ACEM has used it for the MCQ for several years and it fits the new exam format. There are over 50 potential marking methods to choose from and they had to choose one. This was the best choice based on current modern educational standards and experience.
The old “the pass mark is 50%” methods that we all grew up with, and that are indelibly ingrained in our brains as the “norm” for marking exams, are nowadays discredited and indefensible. They cannot take into account the inevitable variation in difficulty of the exam as a whole from one exam to the next, and they cannot select out so-called “rogue questions” that the whole group performs poorly on, where poor performance more likely indicates a problem with the question rather than the candidates.
So instead of pre-set pass marks, each section of the new exam is now “standard set”, the MCQ and SAQ using the Angoff method, and the OSCE using a “post-hoc” method.
So, there is no pre-determined pass mark, and there is no fixed pass rate.
What is “standard setting”?
In a nutshell, the examiners determine what the standard should be (for example, what minimum standard could be expected of a new/junior consultant), then a pass mark (or “cut score”) is set after a pre-defined objective process. The actual process being used by ACEM has not been revealed, but usually it would involve specific training of the examiners on how to apply this method, and multiple rounds of ratings with feedback and discussion in between rounds to improve accuracy.
It’s worth noting that “standard setting is a philosophical and policy making activity, whilst setting a cut score is the operationalisation of that policy. While the Angoff and other methods are often referred to as methods for standard setting, they are actually used once standards, or a useful form of the standards, have been established”. (Reference)
What is the “Angoff method” and why is it being used?
William H. Angoff was an American research scientist whose area of expertise was measurement in education and psychometrics. The “Angoff Method” refers to techniques for standard setting he described in his famous publication Scales, Norms, and Equivalent Scores, a chapter in the book Educational Measurement, edited by Robert Thorndike.
Despite being written in the 1970s, it seems these references are the cornerstones of modern standard setting.
The following is taken from: Validating standards-based test score interpretations and is one of the better summaries I could find of the Angoff, and Modified Angoff methods:
Angoff’s original proposal for a standard setting method was as follows:
A systematic procedure for deciding on the minimum raw scores for passing and honors might be developed as follows: keeping the hypothetical “minimally acceptable person” in mind, one could go through the test item by item and decide whether such a person could answer correctly each item under consideration. If a score of one is given for each item answered correctly by the hypothetical person and a score of zero is given for each item answered incorrectly by that person, the sum of the item scores will equal the raw score earned by the “minimally acceptable person”. With a number of judges independently making these judgments it would be possible to decide by consensus on the [cut scores] without actually administering the test. If desired, the results of this consensus could later be compared with the number and percentage of examinees who actually earned passing and honors grades.(Thorndike & Angoff, 1971, pp. 514-515)
In a footnote to the first of the two paragraphs just quoted, Angoff suggested that rather than saying which items the “minimally acceptable person” would answer correctly, judges could state the probability that each item would be answered correctly. This suggestion is incorporated into most procedures now referred to as “modified Angoff” methods.
The Angoff method has evolved since it was first proposed. Some of the modifications featured in standard-setting exercises include elaboration of panelists’ (judges’) training, changes in panelists’ instructions, incorporation of multiple rounds of ratings with feedback and discussion between rounds, segmentation of panelists into subpanels to provide replicated cut score estimates so that standard errors can be calculated, and the addition of post-task corrections, such as corrections for guessing when tests are multiple choice. Nonetheless, the judgment locus – perceived skill requirement of single test items – remains an essential aspect of the Angoff method.
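To make the arithmetic of a “modified Angoff” exercise concrete, here is a minimal sketch. The judges, their probability ratings, and the five-item test are entirely invented for illustration; ACEM has not published its actual panel sizes, ratings, or number of rounds, and real exercises also include the training, discussion rounds, and corrections described above.

```python
# Illustrative sketch only: a minimal "modified Angoff" cut-score
# calculation. The judge ratings below are invented for the example;
# ACEM has not published its actual ratings or procedure.

def angoff_cut_score(ratings):
    """ratings: one list per judge, one probability per item.

    Each probability is a judge's estimate that a minimally
    acceptable candidate answers that item correctly. The cut score
    is the sum, over items, of the mean rating across judges.
    """
    n_judges = len(ratings)
    n_items = len(ratings[0])
    item_means = [
        sum(judge[i] for judge in ratings) / n_judges
        for i in range(n_items)
    ]
    return sum(item_means)

# Three hypothetical judges rating a five-item test:
judges = [
    [0.8, 0.6, 0.9, 0.5, 0.7],
    [0.7, 0.5, 0.8, 0.6, 0.6],
    [0.9, 0.7, 0.9, 0.4, 0.8],
]

cut = angoff_cut_score(judges)
print(round(cut, 2))  # → 3.47, out of a maximum raw score of 5
```

So on this hypothetical five-item test, a candidate would need roughly 3.5 out of 5 to pass, and that cut score would rise or fall with the judged difficulty of the particular paper, which is exactly the property the old fixed 50% pass mark lacked.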
It has been recognised that the effectiveness of this sort of standard setting method relies heavily on the judgment of the people setting the standard. Also, the “probability prediction” method (i.e. assigning a probability that a minimum-standard candidate will answer a given question correctly) to me seems wildly open to individual opinion and variability, especially given the subject matter that our exam entails.
There’s another description of the Angoff method in: Passing Scores A Manual for Setting Standards of Performance in Educational and Occupational Tests (page 24)
Hopefully the college will eventually reveal more about the specifics of how the Angoff method is being used for standard setting in the fellowship exam.
What is post-hoc standard setting?
It sounds insane that for the OSCE, the examiners will decide the pass mark after the exam. However, there is apparently some theory behind this method of setting a “cut score”.
This method recognises several of the errors that can occur in assessment, and by “detecting and adjusting unreliable ratings, student marks become a more robust representation of actual performance, as the reliability of ratings is significantly increased”.
For example post hoc analysis encourages discussion between standard setters and helps to minimise subjective errors of standard setters when judging student ability, and if there are defective items in a test, the theory is that students should not be held accountable for them.
This method seems far more complex than the Angoff method, and is explained in a recent medical education article that I’ve read a couple of times and still don’t quite understand. If you’re the type who wants information about the nitty-gritty details of the exam process, read this:
It is a very recent (2015) medical education based description of post hoc item analysis from the UK, and is about as relevant and succinct an article as you’ll find on the subject.
What has ACEM revealed about this process?
ACEM has revealed that OSCE examiners are undergoing specific training, including workshops, education on OSCE theory and the rationale for change, calibration exercises to determine their accuracy and what information they base their judgements on, and mock OSCEs. They also get hands-on experience with rating forms, equipment and the use of simulated patients. One would hope that this will improve their judgement as standard setters, and make the process as reliable and valid as possible.
I assume the examiners who designed and standard set the written exam underwent a similarly rigorous process; however, no-one seems to know, and those on the inside seem unfathomably reluctant to speak publicly about it. Given the fallout from the written exam, I think it would be in everyone’s best interest if the processes were made more transparent.
Hopefully more information will come to light about the specifics of the Angoff and post-hoc methods being used by ACEM, so trainees can more fully understand the mechanics of the new exam, as well as gaining a better understanding of the standard required to pass and the methods being used to judge their efforts.
It may also help reassure those distraught candidates from the recent written exam that despite its apparent deficiencies, the actual marking process is as modern, valid, and reliable as possible, and that should they fail, a lot of thought will have gone into the setting of the cut score, compared with an arbitrarily set pass mark.
As soon as I learn more I’ll update this post.
It appears ACEM is using a “Modified Angoff” standard setting process, with a small amount of information here.
Scroll down to: “Fellowship Exam Standard Setting”
Apparently “There are different methods for standard setting, each described in educational literature”, but there are no references provided.