
Preprocessing

  ISAAC reads published English stories ranging from one paragraph to four pages. Such text contains a great deal of information that a standard parser has difficulty handling. For example, most sentence processing systems do not deal with punctuation. Such systems are therefore often given only English words, usually arranged in grammatically correct sentences (unless the research is concerned with understanding ungrammatical phrases). In this research, the processing of the English texts needs to happen as automatically as possible. Additionally, the information contained in the non-word portions of the text should benefit the processing of the stories. The SARTrE preprocessing system was therefore built to serve as a front end to the reading system.

  Before a story is read, its text is given to SARTrE. The preprocessor produces an augmented English text; the augmentations allow the ISAAC system to handle the text in a more natural fashion. The preprocessor serves two roles. First, the text is changed to include certain explicit markers. These markers convert certain elements of the original text into a form that is easier for a computer model to handle. The specific aspects which are marked are:

This information is contained in the original text; however, a human reader decodes it visually. Since ISAAC cannot visually read the material, SARTrE makes this information explicit so that ISAAC can take advantage of it.

So, the first stage of ISAAC's reading of a story is the following:

English Text $\Rightarrow$ SARTrE $\Rightarrow$ Augmented English Text
Consider the example synopsis The Squire of Gothos again. The original text is:

 

The Squire of Gothos

In Space Quadrant 964, eight days away from Colony Beta Six, the Enterprise is trapped in orbit around an uncharted planet. There, Kirk and company are confronted by Trelane, an illogical but extremely powerful alien. Although he appears as an adult humanoid, Trelane is eventually revealed to be a ``child'' belonging to an unknown, alien race. Trelane's parents rescue Kirk and the Enterprise from their playful son.

After running the text through the SARTrE system, the text appears as follows:

*PHRASE* squire of gothos *PARAGRAPH* *SENTENCE* in space quadrant 964 *COMMA* eight days away from colony beta six *COMMA* the enterprise is trapped in orbit around an uncharted planet *PERIOD* *SENTENCE* there *COMMA* kirk *AND* company are confronted by trelane *COMMA* an illogical *BUT* extremely powerful alien *PERIOD* *SENTENCE* *ALTHOUGH* he appears as an adult humanoid *COMMA* trelane is eventually revealed to be a *START-DOUBLE-QUOTE* child *END-DOUBLE-QUOTE* belonging to an unknown *COMMA* alien race *PERIOD* *SENTENCE* *POSSESSIVE*-trelane parents rescue kirk *AND* the enterprise from their playful son *PERIOD*
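
The transformation shown above can be approximated with a short Python sketch. This is not the SARTrE implementation; it is a minimal illustration whose rules are inferred only from the sample output. The names `augment_paragraph`, `PUNCTUATION`, `CONNECTIVES`, and `TOKEN` are hypothetical. The sketch assumes that double quotes are written LaTeX-style (as in the original text above), that every period ends a sentence, and that only the markers visible in the sample exist. It handles a single body paragraph and does not attempt the `*PHRASE*` title marker.

```python
import re

# Hypothetical marker tables, inferred only from the sample output above.
PUNCTUATION = {",": "*COMMA*", ".": "*PERIOD*"}
CONNECTIVES = {"and": "*AND*", "but": "*BUT*", "although": "*ALTHOUGH*"}

# One token is: an opening quote (``), a closing quote (''), a word with an
# optional possessive 's, or a single comma or period.
TOKEN = re.compile(r"``|''|[A-Za-z0-9]+(?:'s)?|[,.]")


def augment_paragraph(paragraph: str) -> str:
    """Return a marker-augmented, lowercased version of one paragraph."""
    out = ["*PARAGRAPH*"]
    start_of_sentence = True
    for token in TOKEN.findall(paragraph):
        # Emit a sentence marker before the first token of each sentence.
        if start_of_sentence:
            out.append("*SENTENCE*")
            start_of_sentence = False
        if token == "``":
            out.append("*START-DOUBLE-QUOTE*")
        elif token == "''":
            out.append("*END-DOUBLE-QUOTE*")
        elif token in PUNCTUATION:
            out.append(PUNCTUATION[token])
            # Assumption: a period always closes the current sentence.
            start_of_sentence = token == "."
        else:
            word = token.lower()
            if word.endswith("'s"):
                # Possessives become a prefix marker on the bare noun.
                out.append("*POSSESSIVE*-" + word[:-2])
            else:
                out.append(CONNECTIVES.get(word, word))
    return " ".join(out)


if __name__ == "__main__":
    print(augment_paragraph("Trelane's parents rescue Kirk and the Enterprise."))
```

Under these assumptions, the sketch yields the same token sequence as the sample from `*PARAGRAPH*` onward when given the body paragraph above. A period-based sentence rule is deliberately naive: abbreviations and decimal numbers would be split incorrectly, and punctuation outside the two marks listed is silently dropped.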
 


Kenneth Moorman
11/4/1997