
NTCIR-8: Advanced Cross-lingual Information Access (ACLIA)

Overview

Roadmap

The ultimate goal of cross-lingual information access is to answer any type of question, and to satisfy any type of information need, in any language, with responses drawn from multilingual corpora. If the information need is very simple (e.g. a factoid question), the answer can be a single word or phrase. If the information need is more complex, the answers may come from multiple documents. Candidate answers from different corpora can be merged, and possibly summarized, before presentation to the user. Alternatively, candidate answers from different languages can be translated into the user's native language. For example, if we retrieve answers from Chinese, Japanese, and Korean corpora and translate them into English, we can compare the nature of answers drawn from different geographic and cultural contexts.

In the ACLIA evaluation task, our research direction moves toward these goals, and it is our hope that QA and IR researchers will collaborate to achieve them in ACLIA at NTCIR-8.

Goal for Cross-lingual Question Answering: Since NTCIR has already evaluated factoid questions and four types of complex questions, the ACLIA task at NTCIR-8 will promote progress toward longer-term research goals by combining factoid and complex question answering into a single task.

Goal for Cross-lingual Information Retrieval: Following the success of NTCIR-7, we plan to evaluate CLIR as a component of cross-lingual question answering (CLQA) to determine which IR techniques are most useful for CLQA. We will adopt a standard XML input/output specification for each module, so that CLIR components can be integrated into multiple CLQA systems for evaluation. At the same time, the CLIR component can be evaluated on its own.
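
As one concrete illustration of scoring a CLIR component in isolation, the sketch below computes average precision (AP) for a single topic; the mean of AP over all topics gives MAP, a standard ranked-retrieval measure. The function and identifiers are invented for this sketch, and the official IR4QA evaluation may use additional or different measures.

    # Minimal sketch: average precision for one topic's ranked list.
    # Illustrative only; not an official ACLIA evaluation script.

    def average_precision(ranked_doc_ids, relevant_doc_ids):
        """Mean of precision@k taken at each rank k holding a relevant doc."""
        hits = 0
        precision_sum = 0.0
        for rank, doc_id in enumerate(ranked_doc_ids, start=1):
            if doc_id in relevant_doc_ids:
                hits += 1
                precision_sum += hits / rank
        return precision_sum / len(relevant_doc_ids) if relevant_doc_ids else 0.0

    # Relevant docs found at ranks 1 and 3: AP = (1/1 + 2/3) / 2 = 0.833...
    print(average_precision(["d1", "d9", "d4"], {"d1", "d4"}))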

NTCIR-8 Task Overview

Current research in QA is moving beyond factoid CLQA, so there is significant motivation to evaluate more complex questions in order to move the research forward. At NTCIR-7 we evaluated cross-lingual and monolingual QA on complex questions (i.e. events, biographies/definitions, and relationships). Our goal for ACLIA at NTCIR-8 is to develop effective CLQA evaluations for both complex and factoid questions. We will evaluate end-to-end systems and conduct module-based evaluations of question type analysis, document retrieval, and answer extraction.
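
To make the module decomposition concrete, the sketch below wires the three evaluated components into one end-to-end flow. Every name in it is hypothetical; real systems differ widely, and the point is only that each stage has a well-defined input and output, so it can be swapped out and scored independently.

    # Hypothetical decomposition of a CLQA system into the three modules
    # evaluated in ACLIA. All names are invented for this sketch.

    def analyze_question(question_en):
        """Question type analysis: guess the expected answer type."""
        if question_en.lower().startswith("who"):
            return "PERSON"
        return "DEFINITION"  # crude placeholder heuristic

    def retrieve_documents(question_en, answer_type, target_lang):
        """Document retrieval (CLIR): translate the query, rank documents."""
        return ["doc-001", "doc-042"]  # stand-in for a real ranked list

    def extract_answers(documents, answer_type):
        """Answer extraction: pull candidate answers from retrieved docs."""
        return ["<candidate answer in the target language>"]

    def answer(question_en, target_lang="JA"):
        answer_type = analyze_question(question_en)
        documents = retrieve_documents(question_en, answer_type, target_lang)
        return extract_answers(documents, answer_type)

    print(answer("Who is Haruki Murakami?"))

Because each stage communicates only through its declared inputs and outputs, one team's retrieval module can be evaluated inside another team's end-to-end system.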

Why (CL)IR researchers should consider participating in ACLIA: Since document retrieval is an essential part of CLQA, we welcome the CLIR community to participate in ACLIA and evaluate their CLIR systems, both as standalone modules and as components within end-to-end QA systems.

NTCIR-8 Task Definition

The ACLIA task is to accept complex questions and factoid questions in English and to provide answers in Chinese (Simplified or Traditional) and/or Japanese. The target corpora consist of newspaper articles.

Each participant is expected to register for one or more of the following tasks.

In order to combine a CLIR module with a CLQA system for module-based evaluation, we will use a common XML input/output format.
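
Purely for illustration, a module's output might look like the fragment below: a ranked document list for one topic. The element and attribute names here are invented for this sketch and are not the official ACLIA schema; see the task specification for the actual format.

    <!-- Hypothetical example only; not the official ACLIA schema. -->
    <RESULT_SET TOPIC_ID="ACLIA2-CS-0001" SOURCE_LANG="EN" TARGET_LANG="CS">
      <RESULT RANK="1" SCORE="12.7" DOCNO="XIN_CMN_20020315.0001"/>
      <RESULT RANK="2" SCORE="11.3" DOCNO="XIN_CMN_20040702.0145"/>
    </RESULT_SET>

Passing ranked lists in a shared format like this is what allows any IR4QA run to be plugged into any downstream answer extraction module.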

Formal Run

[Figure: formal run phase]

Evaluation

[Figure: evaluation phase]

Dataset

NTCIR distributes the QA data (corpora, questions, answers, etc.) free of charge, or at a small cost for some corpora, under research-use agreements.

Training (Dry Run)

Past NTCIR QA test collections from the QAC, CLQA, and ACLIA tasks are available for training your system. For more details, see the official NTCIR website at http://research.nii.ac.jp/ntcir/permission/

In the previous ACLIA at NTCIR-7, the most closely related task, we used the following newswire corpora.

Lang                  | Name     | Year      | #doc      | Distributor
Chinese (Simplified)  | Zaobao   | 1998-2001 |   249,287 | NII
Chinese (Simplified)  | Xinhua   | 1998-2001 |   295,875 | LDC
Chinese (Traditional) | CIRB     | 1998-2001 | 1,150,954 | NII
Japanese              | Mainichi | 1998-2001 |   419,759 | NII

Formal Run

In the NTCIR-8 ACLIA formal evaluation, we will use the following newswire corpora.

Lang                  | Name     | Year      | #doc      | Distributor
Chinese (Simplified)  | Xinhua   | 2002-2005 |   308,845 | LDC
Chinese (Traditional) | UDN      | 2002-2005 | 1,663,517 | NII
Japanese              | Mainichi | 2002-2005 |   377,941 | NII

Submission

Go to the Submission page to see how to submit run results.

Evaluation

Go to the Evaluation page to see sample questions and evaluation methods.
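
As background on the method, complex questions in ACLIA have been scored with nugget-based F-measures in the style of the TREC definition QA track, where beta=3 weights recall over precision. The sketch below follows the commonly published formulation (weighted nugget recall, plus a length allowance per matched nugget for precision); it is an assumption about the general approach, not the official scoring tool, which may differ in detail.

    # Nugget-style F-score sketch (TREC/ACLIA style); beta=3 favors recall.
    # Illustrative only; the official ACLIA scorer may differ in detail.

    def nugget_f_score(matched_weights, all_weights, answer_length, beta=3.0):
        recall = sum(matched_weights) / sum(all_weights)
        allowance = 100 * len(matched_weights)  # chars allowed per matched nugget
        if answer_length <= allowance:
            precision = 1.0
        else:
            precision = 1.0 - (answer_length - allowance) / answer_length
        if precision + recall == 0.0:
            return 0.0
        return ((beta ** 2 + 1) * precision * recall) / (beta ** 2 * precision + recall)

    # Example: 2 of 3 equally weighted nuggets matched by a 150-char answer.
    print(nugget_f_score([1.0, 1.0], [1.0, 1.0, 1.0], answer_length=150))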

Task Organizers

ACLIA task organizers:

IR for QA:

Japanese CLQA:

Simplified Chinese CLQA:

Traditional Chinese CLQA:

Advisors:

Organizers can visit the OrganizerOnly page to draft the task design, after logging in to the wiki.

Schedule

What | When
First call for participation | 2009-05
Registration due | 2009-07-31
Document set release | 2009-07
Dry run | 2009-08~
Formal run | 2010-01~2010-02
Release topics to CCLQA and IR4QA | 2010-01-06 (EST)
Submit answer type analysis | 2010-01-13 (EST)
Release answer type analysis from CCLQA group to IR4QA group | 2010-01-14 (EST)
Submit IR4QA results (both with and without answer type analysis from CCLQA) | 2010-01-20 (EST)
Submit CCLQA and monolingual QA results (J-J, CS-CS, CT-CT, E-J, E-CS, E-CT) | 2010-01-27 (EST)
Release IR4QA results to CCLQA groups (both with and without answer type analysis) | 2010-01-29 (EST)
Submit CCLQA results using IR4QA results | 2010-02-05 (EST)
Return IR4QA evaluation results | 2010-03-03 (EST)
Return CCLQA/monolingual QA evaluation results | 2010-03-10 (EST)
Task overview partial release | 2010-03-24 (EST)
IR4QA working papers for the proceedings due | 2010-04-15 (EST)
CCLQA/monolingual QA working papers for the proceedings due | 2010-04-15 (EST)
Comments from organizers on IR4QA working papers returned | 2010-04-26 (EST)
Comments from organizers on CCLQA/monolingual QA working papers returned | 2010-04-26 (EST)
Camera-ready papers for the proceedings due | 2010-05-15 (Japan time)
Final meeting | 2010-06-15~2010-06-18 (Japan time)
