Interactive Annotation Learner (IAL)

Welcome to the Interactive Annotation Learner Project Wiki.

The goal of this project is to develop an Interactive Active Learning framework for various NLP applications. The main research focus is automatic feature selection and multi-strategy selective sampling. Features are automatically formulated from the tokens in the text and from any annotations already present in the training data. Feature selection is performed automatically, interactively with the user's input, or both. Selective sampling is the task of choosing which examples to present to the user for their input. Different criteria for selective sampling have been proposed in the literature. We aim to combine selective sampling strategies based on 1) estimated user effort (e.g. ENUA), 2) estimated F-measure gain (conversion-based measures), 3) data distribution (statistical relevance measures), and 4) soft match with generalized example patterns (hypothesis-driven measures).
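As a rough illustration of what combining several selective sampling strategies could look like, the Java sketch below ranks candidate examples by a weighted sum of per-strategy scores. It is not taken from the IAL code base: the class name MultiStrategySampler, the Candidate fields, the weights, and the choice of a linear combination are all assumptions made for this example, and the per-strategy scores are assumed to be precomputed and normalized to [0, 1].

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch: none of these names come from the IAL framework.
class MultiStrategySampler {

    // A candidate example with illustrative, precomputed scores in [0, 1].
    static final class Candidate {
        final String text;
        final double effort;       // estimated user effort (higher = more costly)
        final double gain;         // estimated F-measure gain
        final double relevance;    // relevance under the data distribution
        final double patternMatch; // soft match with generalized example patterns

        Candidate(String text, double effort, double gain,
                  double relevance, double patternMatch) {
            this.text = text;
            this.effort = effort;
            this.gain = gain;
            this.relevance = relevance;
            this.patternMatch = patternMatch;
        }
    }

    private final List<ToDoubleFunction<Candidate>> strategies = new ArrayList<>();
    private final List<Double> weights = new ArrayList<>();

    // Registers a scoring strategy with its weight (assumed linear combination).
    MultiStrategySampler add(ToDoubleFunction<Candidate> strategy, double weight) {
        strategies.add(strategy);
        weights.add(weight);
        return this;
    }

    // Combined score: weighted sum of all registered strategy scores.
    double score(Candidate c) {
        double total = 0.0;
        for (int i = 0; i < strategies.size(); i++) {
            total += weights.get(i) * strategies.get(i).applyAsDouble(c);
        }
        return total;
    }

    // Returns the k highest-scoring candidates, best first.
    List<Candidate> select(List<Candidate> pool, int k) {
        List<Candidate> ranked = new ArrayList<>(pool);
        ranked.sort((a, b) -> Double.compare(score(b), score(a)));
        return new ArrayList<>(ranked.subList(0, Math.min(k, ranked.size())));
    }

    public static void main(String[] args) {
        MultiStrategySampler sampler = new MultiStrategySampler()
            .add(c -> 1.0 - c.effort, 0.25)   // prefer examples that are cheap to annotate
            .add(c -> c.gain, 0.25)
            .add(c -> c.relevance, 0.25)
            .add(c -> c.patternMatch, 0.25);

        List<Candidate> pool = new ArrayList<>();
        pool.add(new Candidate("A", 0.9, 0.8, 0.5, 0.5));
        pool.add(new Candidate("B", 0.2, 0.6, 0.7, 0.5));
        pool.add(new Candidate("C", 0.5, 0.1, 0.2, 0.1));

        for (Candidate c : sampler.select(pool, 2)) {
            System.out.println(c.text + " " + sampler.score(c));
        }
    }
}
```

Effort is inverted (1 - effort) so that every strategy rewards higher scores. A weighted sum is only one possible way to combine strategies; rank aggregation or a learned combination would fit behind the same select interface.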

History

The LTI began investigating active learning techniques for interactive annotation learning in Spring 2007, when a team of two graduate students at Carnegie Mellon built a first prototype of a Java framework called IAL (Interactive Annotation Learning). This prototype was refined during Summer 2007 to perform selective sampling based on a combination of measures: confidence-based measures (which drive convergence in the model) and user effort measures (which drive minimization of user effort). Initial experimental results were reported in (Arora and Pathak, 2007). During Fall 2007, several student groups in a software engineering class used the prototype framework to develop selective sampling measures, and a team of four students is currently using the system to interactively develop an opinion annotator for the BLOG06 corpus.

Papers

  • Shilpa Arora and Manas Pathak (2007). Learning More with Less: Reducing Annotation Effort with Active and Interactive Learning. LTI Student Research Symposium - Research Abstract. [pdf] [slides]
  • Shilpa Arora and Sachin Agarwal (2007). Literature Review: Active Learning for Natural Language Processing. [pdf] [slides]

Spring 2008: 11-792 Software Engineering II: Opinion Analysis

IAL for Opinion Analysis (Spring 08)

Previous Work