Converting linear text documents into documents publishable in a hypertext environment is a complex task requiring conversion software on the technical side as well as conversion strategies and methods on the conceptual side. While most research on text-to-hypertext conversion has concentrated on technical aspects or has been tied to specific projects and systems, there is now a growing need for general principles and strategies for handling the conceptual problems of text-to-hypertext conversion, such as:
The project focuses on these conceptual problems, using XML as the technical basis for hypertext modelling and viewing.
The central idea of the project is to base conversion strategies on annotations that explicitly mark up text-grammatical structures and relations between text segments, e.g. co-reference relations, the semantics of connectives, text-deictic expressions, and expressions indicating topic handling. The project developed a methodology that (semi-)automatically constructs hypertext layers and views from these text-grammatical annotations.
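As a rough illustration of this idea, the following Python sketch derives link records from co-reference annotations. The element and attribute names (`seg`, `term`, `coref`, `target`) and the sample sentences are hypothetical and are not the project's actual annotation schema. The links are collected in a separate list, so the annotated text itself is not modified.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation vocabulary (seg, term, coref, target);
# this is an assumption for illustration, not the project's schema.
ANNOTATED = """
<text>
  <seg id="s1">A <term id="t1">hypertext node</term> is a self-contained unit of content.</seg>
  <seg id="s2"><coref target="t1">Such a unit</coref> can be read in isolation.</seg>
</text>
"""


def derive_links(xml_source):
    """Return one link record per co-reference annotation."""
    root = ET.fromstring(xml_source)

    # Map every element id to the id of the segment that contains it.
    owner = {}
    for seg in root.iter("seg"):
        for elem in seg.iter():
            if "id" in elem.attrib:
                owner[elem.get("id")] = seg.get("id")

    # Each co-reference becomes a link from its own segment
    # to the segment holding the expression it refers to.
    links = []
    for seg in root.iter("seg"):
        for coref in seg.iter("coref"):
            links.append({
                "from_segment": seg.get("id"),
                "anchor_text": "".join(coref.itertext()),
                "to_segment": owner[coref.get("target")],
                "relation": "co-reference",
            })
    return links


links = derive_links(ANNOTATED)
```

In this sketch, the phrase "Such a unit" in segment `s2` yields a link back to segment `s1`, where the referent is introduced. A real conversion would need further relation types (connectives, text-deictic expressions, topic handling) and a decision on how each one is rendered.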
Our conversion approach operates on two levels:
In this approach, we store the hypertext views as an additional document layer. Since we preserve the structure and content of the original text documents, the reader still has the choice between sequential and selective (hypertext-driven) reading modes.
The users we have in mind when generating our hypertext views are searching for information in a scientific domain in which they have some prior knowledge but no expert knowledge. Their time is constrained, and they have to solve a very specific type of problem. Such situations are typical of many contexts, e.g. interdisciplinary research, scientific journalism, or specialised lexicography. In scenarios like these, users often read selectively and take in only parts of longer documents. When these documents are sequentially organised, i.e. designed to be read from beginning to end, this selective reading may result in coherence problems. For example, a reader who jumps into the middle of a sequential document may not understand (or may misunderstand) a paragraph because they lack the prerequisite knowledge given in the preceding text. The goal of our conversion approach is to generate hypertext views on sequential documents which avoid these coherence problems and make selective reading and browsing more efficient and more convenient than would be possible with print media.
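The coherence problem can be made concrete with a small Python sketch. It assumes, purely for illustration, that the annotations have already been reduced to a table recording which earlier segments each segment presupposes. The segment ids, the table `DEPENDS_ON`, and the function `prerequisites` are all hypothetical and not part of the project's software. Given an entry point, the function collects the preceding segments that a hypertext view could offer to the reader.

```python
# Hypothetical dependency table: each segment id maps to the earlier
# segments it presupposes (e.g. through co-reference or a connective).
# Keys are assumed to be listed in document order.
DEPENDS_ON = {
    "s1": [],
    "s2": ["s1"],
    "s3": [],
    "s4": ["s2", "s3"],
}


def prerequisites(entry, depends_on):
    """Return all segments that `entry` presupposes, directly or
    indirectly, sorted in document order."""
    seen = set()
    stack = list(depends_on.get(entry, []))
    while stack:
        seg = stack.pop()
        if seg not in seen:
            seen.add(seg)
            # Follow indirect dependencies as well.
            stack.extend(depends_on.get(seg, []))
    # Dict keys preserve insertion order, taken here as document order.
    return [seg for seg in depends_on if seg in seen]


needed = prerequisites("s4", DEPENDS_ON)
```

A reader who enters at `s4` would be pointed to `s1`, `s2` and `s3`, while a reader who enters at `s3` needs nothing further. How such prerequisite segments are best presented (as links, summaries, or inline excerpts) is a design question this sketch leaves open.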
The feasibility and performance of the conversion methodology are tested and evaluated on a sample text corpus containing documents from the domains "hypertext research" and "text-technology".