
Evaluation

The last two sections have sketched the high-level details of my theory of creative reading: the theory contains elements describing the knowledge representation and ontology, as well as the tasks and supertasks which produce the behavior. My theory is also implemented as a computer model capable of reading published short stories containing novel concepts. This model is the ISAAC system, mentioned several times in the course of this chapter. ISAAC is a computer program written in Common LISP, designed to act as a testbed for the ideas I developed in my theory of creative reading (more details of the ISAAC system can be found in Chapter 7). ISAAC is designed to access files containing the English text of various short science fiction stories. Specifically, the system is capable of reading five short science fiction stories, ranging in length from one to three pages, as well as nine synopses, ranging from one to three paragraphs, of Star Trek: The Original Series episodes. As it reads the stories, it attempts to arrive at a coherent comprehension of what occurred within them. This comprehension is stored as a set of three interlocking representations: one captures the events of the tale, one focuses on the structure of the story, and one records the course of processing which ISAAC undertook in order to arrive at a comprehension. After any story is read, then, the contents of ISAAC's memory have been altered to reflect its interpretation of the text. When reading is complete, ISAAC's memory can be cleared for another story, another story can be read without erasing the first story from memory, or ISAAC can be asked questions about the material.

The purpose of the model is two-fold. First, the need to create a computational model forces me to be specific about aspects of my theory. A theory can have underspecified components, while a running computer program cannot. This does mean, however, that I must be careful in this work to differentiate between the aspects of the model which are supported by the theory and those which exist solely to allow the system to function, because any computer model is likely to contain atheoretical pieces.

To create the ISAAC model, each supertask described by the theory was implemented as a separate piece of code. This was a design decision which allowed me to map the functionality of the theory onto the working computer program; other choices of implementation could have been made. For example, one monolithic function could have captured the functionality of all six supertasks. Or, at the other extreme, completely separate computer systems could have been used to implement each of the supertasks. The ISAAC model represents only one possible instantiation of the theory. In addition to the tasks, the knowledge commitments which the theory makes had to be implemented. This was accomplished through the use of frame notation in a semantic network to create the structure of the knowledge, and the use of a functional representation to capture the content of the necessary knowledge. The system also makes use of the ontology which the theory proposes, incorporating the ontological restrictions into the code for the model. Again, the need to produce an executable model forced me to be specific about various aspects of the theory.
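The two implementation choices described above can be illustrated with a minimal sketch. It is written in Python rather than Common LISP and is not ISAAC's code: the `Frame` class, the concept names, the two placeholder supertasks, and the fixed order in which they run are all assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Frame:
    """A frame: a named node with slots and an optional is-a link to a parent."""
    name: str
    isa: Optional["Frame"] = None
    slots: Dict[str, Any] = field(default_factory=dict)

    def get(self, slot: str) -> Any:
        """Look up a slot locally, then inherit along the is-a chain."""
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.isa
        raise KeyError(slot)


# Hypothetical concepts; they are not taken from ISAAC's knowledge base.
machine = Frame("machine", slots={"made_of": "metal"})
robot = Frame("robot", isa=machine, slots={"can_speak": True})

Memory = Dict[str, Any]
Supertask = Callable[[str, Memory], None]


def placeholder_supertask_a(text: str, memory: Memory) -> None:
    """Stand-in for one supertask: records the words of the text."""
    memory["words"] = text.split()


def placeholder_supertask_b(text: str, memory: Memory) -> None:
    """Stand-in for another supertask: uses what the first one stored."""
    memory["word_count"] = len(memory.get("words", []))


def run_supertasks(supertasks: List[Supertask], text: str, memory: Memory) -> Memory:
    """Call each separately coded supertask in turn (ordering is an assumption)."""
    for supertask in supertasks:
        supertask(text, memory)
    return memory
```

Keeping each supertask as its own function means a piece of the theory can be located, replaced, or tested in isolation, which is the benefit the modular design decision is meant to buy; a monolithic function would compute the same result but hide that mapping.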

Beyond this need for specification, however, there is a second purpose to the instantiation of the theory as a computer model. This implementation allows me to evaluate the theory by assessing how well the model accomplishes its task of reading novel texts. If the model is an accurate reflection of the theory, then the model can be tested in order to gain insights into the validity of the theory. Unfortunately, evaluating this form of theory and model is historically difficult; the domain is a so-called "scruffy" one where it is difficult to judge the correctness of a model on a performance task. So, to assist in evaluation, I have made use of a set of experts in the field, mainly high-school English teachers with a long history of teaching reading. After the ISAAC system reached a certain level of competence, I temporarily halted its development, thereby "freezing" it at a particular level of skill. I then had the evaluators create question sets for the stories being read by the ISAAC system. A group of human readers and ISAAC were both given the stories to read and the questions to answer. Since the ISAAC system was developmentally frozen at this time, there was no possibility of my tailoring the model to handle the specific questions which the evaluators produced. The answers were then given back to the teachers to grade. The evaluators were not told which participants were human and which was the computer model.

After the evaluations were graded, the data were analyzed to identify the sources of variance in the scores. Three possible factors could have contributed to that variance. First, the evaluators could have differed in their question-making ability; perhaps one evaluator asked extremely difficult questions while another asked extremely easy ones. Second, the stories could have been at different levels of complexity. Third, the participants could have been at different levels of reading competence. An ANOVA statistical test revealed that the primary source of variance was the participants. A follow-up analysis revealed that the eleven participants fell into two equivalence classes of readers: a "lower" class containing five students and a "higher" class containing the remaining five humans and the ISAAC system. This indicates that ISAAC is a reader performing at a level indistinguishable from five of the human readers and better than the other five, with respect to the three stories used in the study. Chapter 8, the evaluation chapter, describes further analyses which were performed on the data to provide more details of the overall performance of the ISAAC model and how it relates to the theory.
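To make the variance analysis concrete, the sketch below computes a one-way ANOVA F statistic separately for each of the three candidate factors. It is only an illustration under stated assumptions: the text does not say which ANOVA design was actually used, separate one-way tests are a simplification, and the scores, reader names, and helper functions are invented for the example rather than taken from the study.

```python
from collections import defaultdict
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of score lists."""
    all_scores = [s for g in groups for s in g]
    grand_mean = mean(all_scores)
    k, n = len(groups), len(all_scores)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((s - mean(g)) ** 2 for g in groups for s in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def f_by_factor(records, factor):
    """Group the scores by one factor and return that factor's F statistic."""
    groups = defaultdict(list)
    for record in records:
        groups[record[factor]].append(record["score"])
    return one_way_anova_f(list(groups.values()))


# Invented data: readers differ a lot, evaluators and stories only slightly.
base = {"reader_1": 9.0, "reader_2": 6.0, "reader_3": 3.0}
records = [
    {"evaluator": e, "story": s, "participant": p, "score": b + e_adj + s_adj}
    for e, e_adj in (("eval_1", 0.0), ("eval_2", 0.5))
    for s, s_adj in (("story_1", 0.0), ("story_2", -0.5))
    for p, b in base.items()
]

for factor in ("evaluator", "story", "participant"):
    print(factor, round(f_by_factor(records, factor), 3))
```

With data shaped like this, the participant factor yields by far the largest F value, which is the pattern the analysis above reports: most of the variation in scores is explained by who was reading, not by who wrote the questions or which story was read.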


Kenneth Moorman
11/4/1997