Topics
Such questions and the related corpus-architectural
considerations interact with at least two further problem areas: on the
one hand, with the kinds of research questions and phenomena to be
analysed in linguistics and natural interactivity research (which may
call for certain architectural solutions), and on the other hand, with
tools for the creation, annotation, manipulation and exploration of
XML-based corpora.
The workshop will attempt to address the interplay between the
following research areas:
(1) XML techniques for corpus representation, i.e.:
    - Standoff annotation vs. embedded annotation;
    - Use of XML linking standards for language data (XLink,
      XPointer, XPath); other ways of ensuring relationships between
      levels, e.g. through naming conventions;
    - Concepts of layering in corpora annotated at several levels
      of linguistic description; types of information grouped together
      vs. distributed over different "packages";
    - Hierarchical vs. flat annotation;
    - The grounding of annotations (e.g. in XML elements vs. in
      characters) and its implications;
    - Techniques for the manipulation of XML-based representations
      for massively annotated corpora; usefulness and relevance of
      XQuery.
(2) Levels of linguistic description and their interaction, i.e.:
    - Examples of richly annotated corpora: reasons for the choice
      of the annotated levels; linguistic and natural interactivity
      research questions which can (only) be solved with richly
      annotated data;
    - Interaction between levels: new research questions in
      linguistics and natural interactivity research which can only be
      addressed because of observation across levels, across
      modalities, etc. An example is the use of clustering techniques
      across levels, e.g. to identify relevant co-occurrences of
      phenomena from different levels;
    - Use and usefulness of concurrent annotations in XML-based
      corpora; an example is concurrent flat and deep syntactic
      analysis.
(3) Tools for handling richly annotated corpora: software solutions
    for, e.g.:
    - corpus creation, transformation, exchange, and validation;
    - interactive annotation;
    - exploration: query and retrieval, statistical analysis;
    - corpus management (e.g. with respect to metadata).
Tools presented should be positioned with respect to the
questions of corpus architecture and with respect to the research
directions discussed above under (1) and (2).
The workshop aims to bring together XML experts, both theorists
and practitioners, as well as linguists and natural interactivity
researchers working on the definition of corpus architectures, on
annotation and resource exchange schemes, and on tools for the use of
multi-level and/or multi-layer annotated corpora. It will provide a
forum for defining requirements for corpus representations and the
associated tools, while also discussing case studies from
linguistics and natural interactivity research.