| Australasian Journal of Educational Technology 2011, 27(6), 896-906. |
AJET 27 |
A flexible, extensible online testing system for mathematics
Tim Passmore, Leigh Brookshaw and Harry Butler
University of Southern Queensland
An online testing system developed for entry-skills testing of first-year university students in algebra and calculus is described. The system combines the open-source computer algebra system Maxima with computer scripts to parse student answers, which are entered using standard mathematical notation and conventions. The answers can involve structures such as: lists, variable-precision-floating-point numbers, integers and algebra. This flexibility allows more sophisticated testing designs than the multiple choice, or exact match, paradigms common in other systems, and the implementation is entirely vendor neutral. Experience using the system and ideas for further development are discussed.
The first system to make use of CAS in this way was AiM (Klai, Kolokolnikov & Van den Bergh, 2000) and its descendant AiM-TtH (Strickland, 2002; Sangwin, 2003) which used the commercial computer algebra system Maple, but it required students to know Maple syntax and its question authoring system required programming expertise (Sangwin & Grove, 2006). Maple has its own proprietary system, MapleTA, while other systems have been developed for various CAS: CalMath uses Mathematica, CABLE (Naismith & Sangwin, 2004) uses Axiom and STACK (Sangwin & Grove, 2006) which uses Maxima. Each of these systems provided the student with an interactive mathematical experience within a virtual learning environment, that is they were tutorial systems as well as testing systems. Tutorial assessment systems have been used to prepare students for standardised testing (Feng, Heffernan & Koedinger, 2009) and in remedial teaching (Wang, 2010).
At the University of Southern Queensland we wanted to use CAA for entry-level testing (Engelbrecht & Harding, 2004) - assessing the core prerequisite, entry-level skills of students undertaking first-year algebra and calculus. First-year cohorts were large in number, up to 900 students, making manual testing and marking impractical. A wide variation in the skill levels of commencing students had been reported (Jourdan, Cretchley & Passmore, 2007) and early detection and intervention was desirable for the weaker students. It was essential that the entry-level testing be done early in the semester, with minimal expense to the student; that immediate feedback be available to identify those who may be at risk without early remediation; and that the test be delivered online, to make it available to off campus students (i.e. distance students). An existing CAA system was deemed unsuitable because it only allowed questions which were either multiple choice, or exact character-match to a given answer. It was also slow to use as LATEX-coded mathematical objects were first rendered as bitmap images that were subsequently re-embedded into HTML to generate the on screen questions. The quality of these images was thus hardware dependent, and often quite poor. A better system was needed, and we developed the Online Testing System (OTS) described below.
Past experience indicated that the most likely areas of difficulty were algebra, functions, trigonometry and inequalities, which suggested immediately the use of a CAS. However, we did not want to burden students with learning any specialised input syntax so early in their university experience, opting instead for an "informal syntax", along the lines proposed by Sangwin and Ramsden (2007), which was as close as possible on a keyboard to standard mathematical notation and convention. Lastly, we had no budget for proprietary software or licence fees, so we needed a vendor neutral system which used existing open source software.
Another advantage of PDF question delivery is that it makes the OTS independent of the word processing or authoring system used by question authors. No programming expertise is required and the questions are displayed exactly as the author intended.
Disadvantages are that students cannot work interactively and check their syntax as with AiM and STACK. Also the question paper is printable and therefore not secure, though it would be possible to encrypt the PDF so that it was not printable. For our particular application to entry-level testing, this was not considered a serious problem as the intention is to eventually build up question banks and randomly generate question papers. More attention would need to be given to these issues if the OTS were applied for more security-sensitive purposes such as examinations.
When checking is complete, the student uploads his or her answers by selecting the Submit button on the form. Their answers are evaluated, marked, the marks totalled and returned to the student's screen with specific feedback. The submitted answers and resulting marks are stored for future processing (e.g. uploading to central grading database) by the OTS. Question authors can optionally specify two types of feedback, one if the question is correct and the other if the question is incorrect. This functionality may be extended in the future.
Sometimes, it is not necessary to call Maxima at all, resulting in faster processing performance. Consider the question:
Student input was parsed using XML code as shown in the Appendix Example 1 (AE1).
Express as a simple fraction involving no negative powers.
The student input is matched to a regular expression test (regexp) which is handled directly by PHP. But first the string is prefiltered to remove leading, trailing and embedded white space, convert alpha characters to lower case, remove any explicit multiplication symbols *, substitute any grouping done with [ ] or { } pairs by ( ) and finally, remove all ( ) pairs. The filtered string is then compared to the regexp between the @ characters in the <value> element (see AE1). In this case, the @ character functions as a regexp delimiter, but more generally delimits any string to be passed to another script (@ was chosen as it is unlikely to appear in student input). The correct answer 16b/a is matched if the filtered string is either 16b/a or b16/a, the second form being necessary if the student entered the correct string b*16/a.
A second example is the following question.
Expand (x+1)(-2x+1)(x-3)The XML parsing instructions are given in Appendix Example 2 (AE2).(Exponents or powers must be typed using the caret character ^. For example, type x2 as x^2)
Two tests are applied to the student input and they must both return a Boolean true value for the question to be marked correct. The two tests are enclosed in the logical <and> container (AE2 lines 2 and 21). The first test is a regular expression match <regexp> (AE2 lines 4-9), which has been negated by the logical <not> tag (AE2 lines 3 and 10). This checks that the student input does not contain three pairs of parentheses i.e. ( )( )( ) or ( )*( )*( ), which would mean that the student had not done any expansion.
The second test (AE2 lines 11-20) specifies a script to be run by the CAS Maxima. The script is specified in the <script> container (AE2 lines 16-18) and expands the difference between the given expression and the prefiltered student input (the @ symbol). The result is compared to the value specified in the <value> element (AE2 line 19) and if it matches, the result of the test is Boolean true. Since the two tests are contained in an <and> element, the final result is the logical AND of the two tests. Notice that any necessary prefiltering must be done in each test separately, as each test is passed a copy of the student's raw input. Also notice, that this code would allow a partial expansion to be marked correct; if the student has correctly removed only one pair of parentheses, then the answer would be marked correct. Of course, this can be changed, if desired, to award only part marks, or no marks, for a partial expansion.
The capacity to prefilter input allows OTS to handle informal input syntax, which can be defined separately for each answer, if necessary, and the ability to combine different tests together allows sophisticated control over how an input is evaluated. Thus partial credit can be awarded for an answer that is not correct, but contains some of the correct working (Ashton, Beevers, Korabinski & Youngson, 2006; Darrah, Fuller & Miller, 2010).
Each question must have a <title> container, a <solution> container, which specifies how to determine the correct answer (see AE1 and AE2), and optionally, a <reminder> container, the contents of which are returned when the student selects Check, and lastly <feedback> containers which are returned when Submit is selected and the question is marked. Feedback may differ depending upon whether the question is correct or not. Since title, reminders and feedback are provided via a web page, they must be formatted as valid XHTML.
As an authoring language though, XML is rather daunting for some authors. A future plan is to develop an authoring front end to the OTS which would make it accessible to more users.
The list data type is very useful for questions such as:
Solve the following cubic equation for x: x3 - 4x2 + x + 6 = 0where one does not want to disclose to students that three answers are expected. Lists also give partial credit for some correct answers.(Your answers should be a list separated by commas.)
These data types and structures, combined with the suite of prefilter actions discussed earlier, provide fine-grained control over how student input is processed and evaluated. Thus, allowing the informal input syntax to be implemented. For example, equivalences such as 2.5 = 2 + 1/2 = 5/2 can be handled seamlessly without troubling the student with format directions.
When starting processes remotely via web interfaces, care must be taken to ensure that the server does not become overloaded, either with running processes, or processes that fail to terminate correctly. To ensure that the server performance is not compromised the following control measures were adopted.
Using CAA has resulted in a huge time saving for staff and provides very early feedback to staff and students alike. During initial testing, student feedback was sought via anonymous exit questionnaires, from which a number of modifications were made to the student interface. The main issues identified were to do with question design and errors of logic in the marking scheme. There were no problems with the underlying engine. Subsequently, the OTS was deployed to deliver the first assignment in seven entry-level mathematics courses, for cohorts in Business, Engineering, Education and Sciences. Student feedback is routinely collected in all courses at USQ via anonymous online questionnaires at the end of each semester. This was one source of information about student reaction to the OTS; the other was comments and questions posted directly by students, either by email or on course online discussion groups, while they were using the system to complete their assignments. We emphasised that fair comment would always be listened to and that electronic marks would be reviewed if requested. With experience, we were better able to anticipate the variations that would arise in student solutions to each question, and to tailor the marking scheme and question feedback appropriately.
The majority of students accept the system well and those who do badly on the test later report that it served as a wake-up call to get them to revise their skills. To date there have been no complaints about accessing and using the system, but some students mistrust the reliability of the marking. Sangwin and Ramsden (2007) report similar fears among students, that they may be marked wrong because of a syntax error. Indeed, some still feel that this was the case even when investigation shows that their error was mathematical. Nonetheless, where a student is still not happy, we either let the student take the test again or mark the test manually, which currently occurs in about 6% of papers.
The main difficulty is developing and testing good questions and specifying the informal syntax for their solution. We learned from student responses and no system can handle perfectly all the possible answers that students may give to any question. Student input was analysed to detect common error patterns and this information was fed back both to question design and solution parsing. This process involved some trial and error, though as we gained more experience using the system, problems became less frequent.
As can be seen from the example questions in this paper, we have tried to help students by giving specific syntax hints where necessary. However, Sangwin and Ramsden (2007) report that around 18% of students still make errors of this kind despite explicit syntax instructions. Their students were expected to enter answers directly in CAS syntax, but in our experience with OTS the incidence of such errors is much lower, around 5%. We believe this is due to the effort we have made to implement an informal input syntax.
Currently, the back-end administrative reports and functions use a purpose-built web interface, but capturing the data in an SQL database would integrate better with other data systems at the University. We also plan to investigate more elaborate user verification and access control to enable summative testing online.
We want to develop better-targeted feedback to the student about particular areas of weakness: algebra, functions, trigonometry, inequalities etc and possibly do some syntax checking of student input during the Check phase before submission. This would alert the student to possible syntax errors before they are assessed. One possibility is to use Maxima to render student input as TEX and pass this to a display engine like MathJax (see http://www.mathjax.org/), which could render and display the input as properly formatted mathematics. The student could then be asked to verify, during the Check phase, that the input is what they intended.
The modular design of OTS was intended to allow for automated testing, randomly generating tests from question banks, or targeting tests to areas of weakness. These functions will develop over time and provide web-based assessment, remedial and support materials to students. Another intended improvement is to paginate the Answer Form and allow students to save partially completed answers, which will allow for longer tests to be delivered.
Ashton, H. S., Beevers, C. E., Korabinski, A. A. & Youngson, M. A. (2006). Incorporating partial credit in computer-aided assessment of mathematics in secondary education. British Journal of Educational Technology, 37(1), 93-119. http:/dx.doi.org/10.1111/j.1467-8535.2005.00512.x
Berger, M. (2010). Using CAS to solve a mathematics task: A deconstruction. Computers & Education, 55(1), 320-332. http://dx.doi.org/10.1016/j.compedu.2010.01.018
Darrah, M., Fuller, E. & Miller, D. (2010). A comparative study of partial credit assessment and computer-based testing. In Proceedings of World Conference on Educational Multimedia, Hypermedia and Telecommunications 2010 (pp. 2335-2340). Toronto, Canada: AACE. http://www.editlib.org/p/34965
Dopper, S. M. & Sjoer, E. (2004). Implementing formative assessment in engineering education: the use of the online assessment system Etude. European Journal of Engineering Education, 29(2), 259-266. http://dx.doi.org/10.1080/0304379032000157187
Engelbrecht, J. & Harding, A. (2004). Combining online and paper assessment in a web-based course in undergraduate mathematics. Journal of Computers in Mathematics and Science Teaching, 23(3), 217-231. http://www.editlib.org/p/4981
Feng, M., Heffernan, N. & Koedinger, K. (2009). Addressing the assessment challenge with an online system that tutors as it assesses. User Modeling and User-Adapted Interaction, 19(3), 243-266. http:/dx.doi.org/10.1007/s11257-009-9063-7
Jourdan, N., Cretchley, P. & Passmore, T. (2007). Secondary-tertiary transition: What mathematics skills can and should we expect this decade? In J. Watson & K. Beswick (Eds), Mathematics: Essential research, essential practice. Vol. 2 of Proceedings of 30th Annual Conference (pp. 463-442). Hobart, Australia: MERGA. http://www.merga.net.au/documents/RP412007.pdf
Klai, S., Kolokolnikov, T. & Van den Bergh, N. (2000). Using Maple and the web to grade mathematics tests. In Proceedings of the international workshop on advanced learning technologies. 4-6 December, Palmerston North, New Zealand. [verified 29 Aug 2011] http://users.ugent.be/~nvdbergh/aim_rug/docs/iwalt.pdf
Klasa, J. (2010). A few pedagogical designs in linear algebra with Cabri and Maple. Linear Algebra and its Applications, 432(8), 2100-2111. http://dx.doi.org/10.1016/j.laa.2009.08.039
Naismith, L. & Sangwin, C. J. (2004). Computer algebra based assessment of mathematics online. In 8th Annual CAA Conference (pp. 235-242). Loughborough, UK: Loughborough University. http://hdl.handle.net/2134/1956
Nicaud, J.-F., Bouhineau, D. & Chaachoua, H. (2004). Mixing microworld and CAS features in building computer systems that help students learn algebra. International Journal of Computers for Mathematical Learning, 9(2), 169-211. http://dx.doi.org/10.1023/B:IJCO.0000040890.20374.37
Sangwin, C. J. (2003). Assessing higher mathematical skills using computer algebra marking through AIM. In R. L. May & W. F. Blyth (Eds), EMAC2003: Proceedings of the engineering mathematics and applications conference. University of Technology, Sydney: Engineering Mathematics Group, ANZIAM. http://web.mat.bham.ac.uk/C.J.Sangwin/Publications/Emac03.pdf
Sangwin, C. J. (2006). Assessment within computer algebra rich learning environments. In Le Hung Son, N. Sinclair, J. B. Lagrange & C. Hoyles (Eds.), Proceedings of the ICMI 17 study conference: Background papers for the ICMI 17 study (p. C79). Hanoi University of Technology: Springer. http://web.mat.bham.ac.uk/C.J.Sangwin/Publications/2006-ICMI-Sangwin-CJ.pdf
Sangwin, C. J. & Grove, M. J. (2006). STACK: Addressing the needs of the "neglected learners". In M. SeppŠlŠ, S. Xambo & O. Caprotti (Eds.), Proceedings of the WebAlt conference. Technical University of Eindhoven, Netherlands: European eContent EDC-22253-WEBALT. [verified 29 Aug 2011; 4.53 MB] http://webalt.math.helsinki.fi/webalt2006/content/e31/e176/webalt2006.pdf
Sangwin, C. J. & Ramsden, P. (2007). Linear syntax for communicating elementary mathematics. Journal of Symbolic Computation, 42(9), 920-934. http://web.mat.bham.ac.uk/C.J.Sangwin/Publications/2007-Sangwin_Ramsden_Syntax.pdf; http://dx.doi.org/10.1016/j.jsc.2007.07.002
Strickland, N. (2002). Alice Interactive Mathematics. MSOR Connections, 2(1), 27-30. http://mathstore.ac.uk/newsletter/feb2002/pdf/aim.pdf
Wang, H.-C., Chang, C.-Y. & Li, T.-Y. (2008). Assessing creative problem-solving with automated text grading. Computers & Education, 51(4), 1450-1466. http://dx.doi.org/10.1016/j.compedu.2008.01.006
Wang, T., Su, X., Wang, Y. & Ma, P. (2007). Semantic similarity-based grading of student programs. Information and Software Technology, 49(2), 99-107. http://dx.doi.org/10.1016/j.infsof.2006.03.001
Wang, T.-H. (2010). Implementation of Web-based dynamic assessment in facilitating junior high school students to learn mathematics. Computers & Education, 56(4), 1062-1071. http://dx.doi.org/10.1016/j.compedu.2010.09.014
![]()
Example 1: prefiltering of input allows informal input syntax
![]()
Example 2: multiple tests applied to input allow fine-grained response and partial credit
![]()
Example 3: question specifications are validated using an XML schema