DESIGNING TESTS FROM QUESTION POOLS WITH EFFICIENCY, RELIABILITY, AND INTEGRITY


Mark Murdock
Matthew Brenneman

Keywords

Classroom Assessment, Cheating, Testing Procedures

Abstract

Objective: This article provides resources for educators who administer online assessments and want to curb cheating. We focus on a form of cheating known as “peer-to-peer sharing” (P2PS), in which students take the test together without supervision. Using probability theory, we develop a framework for rigorously analyzing P2PS for a given question pool size, assessment size, and class size.
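To make the P2PS setting concrete, here is a minimal Python sketch built on an assumption of our own, not taken from the article: each student's test is an independent, uniformly random set of n distinct questions drawn from a pool of N. Under that assumption, the number of questions two students share follows a hypergeometric distribution. The function names are hypothetical.

```python
from math import comb


def overlap_pmf(pool_size: int, test_length: int, k: int) -> float:
    """Probability that two tests share exactly k questions.

    Illustrative assumption: each test is an independent, uniformly
    random sample of `test_length` distinct questions from `pool_size`.
    """
    N, n = pool_size, test_length
    if not (0 <= k <= n <= N):
        return 0.0
    # Fix student A's test. Student B's test must pick k of A's n questions
    # and the remaining n - k from the N - n questions A did not receive.
    return comb(n, k) * comb(N - n, n - k) / comb(N, n)


def expected_overlap(pool_size: int, test_length: int) -> float:
    """Mean of the hypergeometric overlap: n * (n / N)."""
    return test_length * test_length / pool_size


if __name__ == "__main__":
    N, n = 100, 20
    dist = [overlap_pmf(N, n, k) for k in range(n + 1)]
    print(f"total probability: {sum(dist):.6f}")
    print(f"expected shared questions: {expected_overlap(N, n)}")
```

With N = 100 and n = 20, two students would share n²/N = 4 questions on average under this model.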


Methods: The development proceeded as follows: (1) we define the “integrity” and “reliability” of online assessments in the context of P2PS; (2) we derive formulas for both reliability and integrity; (3) we address how large a question bank should be to attain a specified level of reliability and integrity, paying special attention to efficiency; (4) we provide a table of sample results for common classroom scenarios to help educators devise efficient question banks; and (5) we include summary charts of cheating methods and strategies. Theoretical models are used to characterize the probabilistic scenario we explore; in particular, we use the cumulative distribution function of the hypergeometric distribution. The model was verified using computer simulations.
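The methods pair a hypergeometric CDF with a simulation check. A minimal Python sketch of that pairing follows, assuming (for illustration only) that each student independently receives a uniformly random set of n distinct questions from a pool of N. The function names are hypothetical, and the article's own simulation design is not described here.

```python
import random
from math import comb


def overlap_cdf(pool_size: int, test_length: int, k: int) -> float:
    """Exact P(two tests share at most k questions): hypergeometric CDF."""
    N, n = pool_size, test_length
    if not 0 <= n <= N:
        raise ValueError("need 0 <= test_length <= pool_size")
    # Sum the hypergeometric pmf for overlaps j = 0, 1, ..., min(k, n).
    favorable = sum(comb(n, j) * comb(N - n, n - j) for j in range(min(k, n) + 1))
    return favorable / comb(N, n)


def simulated_overlap_cdf(pool_size: int, test_length: int, k: int,
                          trials: int = 50_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    questions = range(pool_size)
    at_most_k = 0
    for _ in range(trials):
        test_a = set(rng.sample(questions, test_length))
        test_b = rng.sample(questions, test_length)
        shared = sum(q in test_a for q in test_b)
        if shared <= k:
            at_most_k += 1
    return at_most_k / trials


if __name__ == "__main__":
    N, n, k = 100, 20, 4
    print(f"exact     P(X <= {k}) = {overlap_cdf(N, n, k):.4f}")
    print(f"simulated P(X <= {k}) = {simulated_overlap_cdf(N, n, k):.4f}")
```

Close agreement between the exact and simulated values, across several choices of N, n, and k, is the kind of evidence a simulation check provides for the model.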


Results: Probability theory was used both to define and to derive formulas for the “reliability” and “integrity” of an online assessment. Charts were created that relate question pool size, number of questions, integrity, and reliability for a given class size.
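The results depend on class size. One simple way to see why, offered as our own illustration rather than the article's formula: suppose a student works with m peers and every test is an independent, uniformly random set of n questions from a pool of N. Each of the student's questions then appears on a given peer's test with probability n/N, independently across peers, so by linearity of expectation the expected number of the student's questions that at least one peer also received is n[1 - (1 - n/N)^m]. The function name below is hypothetical.

```python
def expected_covered(pool_size: int, test_length: int, peers: int) -> float:
    """Expected number of a student's questions also seen by >= 1 of `peers`.

    Illustrative assumption: every test is an independent, uniformly random
    set of `test_length` distinct questions from a pool of `pool_size`.
    """
    N, n = pool_size, test_length
    # A given question of the student's is on one peer's test with prob n/N;
    # it is missed by all peers with prob (1 - n/N) ** peers.
    p_covered = 1.0 - (1.0 - n / N) ** peers
    return n * p_covered


if __name__ == "__main__":
    n = 20
    pool_sizes = [50, 100, 200, 500]
    print("peers " + " ".join(f"N={N:>4}" for N in pool_sizes))
    for m in [1, 5, 10, 30]:
        row = " ".join(f"{expected_covered(N, n, m):6.1f}" for N in pool_sizes)
        print(f"{m:>5} {row}")
```

With n = 20 and N = 100, ten collaborating peers would already cover about 20(1 - 0.8^10) ≈ 17.9 of a student's 20 questions on average under this model.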


Conclusion: Educators can use the tables in this article to choose a question pool size and a number of assessment questions that achieve the integrity and reliability they desire for a given class size. Summary charts of cheating methods and strategies, drawn from the literature, are included as resources for further exploring how to manage cheating and promote test integrity and reliability. Achieving both reliability and integrity is difficult when a large portion of the class cheats, unless the question pool is very large.
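To pick a pool size for a target level of protection, one could search for the smallest pool that meets it. The sketch below assumes, purely for illustration, that integrity is measured as the probability that two students' tests share at most k questions, with each test an independent, uniformly random draw of n distinct questions from a pool of N; the article's own definitions and tables take precedence. Function names are hypothetical.

```python
from math import comb
from typing import Optional


def overlap_cdf(pool_size: int, test_length: int, k: int) -> float:
    """P(two tests share at most k questions) under the hypergeometric model."""
    N, n = pool_size, test_length
    favorable = sum(comb(n, j) * comb(N - n, n - j) for j in range(min(k, n) + 1))
    return favorable / comb(N, n)


def min_pool_size(test_length: int, max_shared: int, target: float,
                  max_pool: int = 10_000) -> Optional[int]:
    """Smallest pool N with P(shared <= max_shared) >= target, or None."""
    # Linear scan from the smallest feasible pool (N = test_length) upward.
    for N in range(test_length, max_pool + 1):
        if overlap_cdf(N, test_length, max_shared) >= target:
            return N
    return None  # target not reachable within max_pool


if __name__ == "__main__":
    n = 20
    for k in (2, 4, 6):
        for target in (0.90, 0.95, 0.99):
            N = min_pool_size(n, k, target)
            print(f"n={n}, at most {k} shared, target {target:.2f}: N = {N}")
```

Because the scan is linear over N, it returns the smallest qualifying pool within `max_pool` without needing to assume the probability is monotone in N.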

