Change Log
2010-01-06: The formal run topics have been released. Please read Submission and SEPIA carefully.
2010-01-02: Released the Submission page with details on how to submit runs. Follow the link to SEPIA to learn how to download the topics for the run.
2009-10-29: Released submission format spec and evaluation method details
NTCIR-8: Advanced Cross-lingual Information Access (ACLIA)
Roadmap
The ultimate goal in cross-lingual information access is to answer any type of question, or satisfy any type of information need, in any language, with responses drawn from multilingual corpora. If the information need is simple (e.g., a factoid question), the answer can be a single word or phrase. If the information need is more complex, the answer may draw on multiple documents. Candidate answers from different corpora can be merged, or possibly summarized, before presentation to the user. Candidate answers in other languages can also be translated into the user's native language. For example, if we retrieve answers from Chinese, Japanese and Korean corpora and translate them into English, we can compare the nature of answers drawn from different geographic and cultural contexts.
The ACLIA evaluation task moves our research toward these goals, and we hope that QA and IR researchers will collaborate to achieve them in ACLIA at NTCIR-8.
Goal for Cross-lingual Question Answering: Since NTCIR has already evaluated factoid questions and four types of complex questions in the past, the ACLIA task at NTCIR-8 will promote progress toward longer-term research goals by combining factoid and complex question answering into a single task.
Goal for Cross-lingual Information Retrieval: Following the success of NTCIR-7, we plan to evaluate CLIR as a component of cross-lingual question answering (CLQA) to determine which IR technique(s) are most useful for CLQA. We will adopt a standard XML input/output specification for each module, so that CLIR components can be integrated into multiple CLQA systems for evaluation. At the same time, we can evaluate the CLIR component on its own.
NTCIR-8 Task Overview
Current QA research is moving beyond factoid CLQA, so there is strong motivation to evaluate more complex questions. In NTCIR-7 we evaluated cross-lingual and monolingual QA on complex questions (i.e., events, biographies/definitions, and relationships). Our goal in ACLIA for NTCIR-8 is to develop effective CLQA evaluations for both complex and factoid questions. We will evaluate end-to-end systems and conduct module-based evaluations for question type analysis, document retrieval and answer extraction.
Why (CL)IR researchers should consider participating in ACLIA: Since document retrieval is an essential part of CLQA, we welcome the CLIR community to participate in ACLIA to evaluate their CLIR systems, both as modules and as a component within end-to-end QA systems.
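The module-based evaluation described above treats an end-to-end system as a chain of question type analysis, document retrieval, and answer extraction. The minimal Python sketch below illustrates that decomposition under stated assumptions: the `Question`/`Answer` types, the function names, and the toy modules are all hypothetical placeholders, not the ACLIA interfaces, which are defined by the task's own format specification.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


# Hypothetical types for illustration; not the official ACLIA data model.
@dataclass(frozen=True)
class Question:
    qid: str
    text: str
    answer_type: str = ""  # filled in by the question type analysis module


@dataclass(frozen=True)
class Answer:
    text: str
    doc_id: str  # ID of the supporting document


# Each module is a plain function, so any stage can be swapped independently.
Analyzer = Callable[[Question], Question]
Retriever = Callable[[Question], List[str]]  # returns ranked document IDs
Extractor = Callable[[Question, List[str]], List[Answer]]


def run_pipeline(q: Question, analyze: Analyzer, retrieve: Retriever,
                 extract: Extractor) -> List[Answer]:
    """Chain question type analysis -> document retrieval -> answer extraction."""
    analyzed = analyze(q)
    ranked_docs = retrieve(analyzed)
    return extract(analyzed, ranked_docs)


# Toy stand-in modules (hypothetical) that only show how the stages connect.
def toy_analyze(q: Question) -> Question:
    # Naive rule: questions starting with "who" expect a PERSON answer.
    kind = "PERSON" if q.text.lower().startswith("who") else "OTHER"
    return replace(q, answer_type=kind)


def toy_retrieve(q: Question) -> List[str]:
    # A real CLIR module would translate the question and search the
    # target-language corpus; this stub returns a fixed ranking.
    return ["DOC-001", "DOC-002"]


def toy_extract(q: Question, docs: List[str]) -> List[Answer]:
    # Emit one placeholder answer from the top-ranked document.
    return [Answer(text=f"<{q.answer_type} from {d}>", doc_id=d) for d in docs[:1]]


if __name__ == "__main__":
    question = Question(qid="T-0001", text="Who wrote Norwegian Wood?")
    print(run_pipeline(question, toy_analyze, toy_retrieve, toy_extract))
```

Because each stage is an ordinary function, a participant's CLIR module could in principle replace `toy_retrieve` without changes to the other stages, which mirrors the plug-in style of evaluation the task aims for.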
NTCIR-8 Task Definition
The task in ACLIA is to accept complex questions and factoid questions in English and provide answers in Chinese (Simplified, Traditional) and/or Japanese. The target corpus will consist of newspaper articles.
Each participant is expected to register for one or more of the following tasks.
- English to Japanese CLQA (with Japanese to Japanese as a subtask);
- English to Chinese CLQA (Simplified or Traditional, with Chinese to Chinese as a subtask);
- English to Japanese CLIR (embedded in E-J CLQA);
- English to Chinese CLIR (embedded in E-C CLQA).
In order to combine a CLIR module with a CLQA system for module-based evaluation, we will use an XML input/output format.
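As a rough illustration of exchanging module output through XML, the sketch below serializes one topic's ranked document list with Python's standard `xml.etree.ElementTree` and parses it back. The element and attribute names (`TOPIC`, `DOC`, `RANK`, `ID`, `SCORE`) are invented for this example; the actual ACLIA schema is the one given in the released submission format spec.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical markup: element/attribute names are illustrative only,
# not the official ACLIA XML schema.


def results_to_xml(topic_id: str, ranked: List[Tuple[str, float]]) -> str:
    """Serialize a ranked list of (doc_id, score) pairs for one topic."""
    root = ET.Element("TOPIC", {"ID": topic_id})
    for rank, (doc_id, score) in enumerate(ranked, start=1):
        ET.SubElement(root, "DOC",
                      {"RANK": str(rank), "ID": doc_id, "SCORE": f"{score:.4f}"})
    return ET.tostring(root, encoding="unicode")


def xml_to_results(xml_text: str) -> Tuple[str, List[Tuple[str, float]]]:
    """Parse the XML back into (topic_id, ranked list), ordered by RANK."""
    root = ET.fromstring(xml_text)
    docs = sorted(root.findall("DOC"), key=lambda d: int(d.get("RANK")))
    return root.get("ID"), [(d.get("ID"), float(d.get("SCORE"))) for d in docs]


if __name__ == "__main__":
    xml_text = results_to_xml("T-0001", [("DOC-001", 0.92), ("DOC-002", 0.57)])
    print(xml_text)
    print(xml_to_results(xml_text))
```

Scores are rounded to four decimal places when written, so the round trip is exact only up to that precision.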
Dataset
NTCIR distributes QA data (e.g., corpora, questions, and answers) free of charge, or at a modest price for some corpora, subject to an agreement restricting use to research purposes.
Training (Dry Run)
Past NTCIR QA test collections from the QAC/CLQA/ACLIA tasks are available for training your system. For more details, see the official NTCIR website at http://research.nii.ac.jp/ntcir/permission/
At NTCIR-7, the previous and most closely related ACLIA task, we used the following newswire corpora.
Lang | Name | Year | #Docs | Distributor
Chinese (Simplified) | Zaobao | 1998-2001 | 249,287 | NII
Chinese (Simplified) | Xinhua | 1998-2001 | 295,875 | LDC
Chinese (Traditional) | CIRB | 1998-2001 | 1,150,954 | NII
Japanese | Mainichi | 1998-2001 | 419,759 | NII
Formal Run
In the NTCIR-8 ACLIA formal evaluation, we will use the following newswire corpora.
Lang | Name | Year | #Docs | Distributor
Chinese (Simplified) | Xinhua | 2002-2005 | 308,845 | LDC
Chinese (Traditional) | UDN | 2002-2005 | 1,663,517 | NII
Japanese | Mainichi | 2002-2005 | 377,941 | NII
Submission
Go to Submission to see how to submit the run results.
Evaluation
Go to Evaluation to see sample questions and evaluation methods.
Task Organizers
ACLIA task organizers:
- Teruko Mitamura <teruko AT cs.cmu.edu> (Carnegie Mellon University)
- Hideki Shima (Carnegie Mellon University)
IR for QA:
- Tetsuya Sakai (Microsoft Research Asia)
- Noriko Kando (National Institute of Informatics)
Japanese CLQA:
- Tatsunori Mori (Yokohama National University)
- Koichi Takeda (IBM Research - Tokyo)
Simplified Chinese CLQA:
- Chin-Yew Lin (Microsoft Research Asia)
- Ruihua Song (Microsoft Research Asia)
Traditional Chinese CLQA:
- Chuan-Jie Lin (National Taiwan Ocean University)
- Cheng-Wei Lee (Academia Sinica)
Advisors:
- Eric Nyberg (Carnegie Mellon University)
- Kuang-hua Chen
After logging in to the wiki, organizers can use the OrganizerOnly page to draft the task design.
Schedule
What | When
First call for participation | 2009-05
Registration due | 2009-07-31
Document set release | 2009-07
Dry run | 2009-08~
Formal run | 2010-01~2010-01
Release topics to CCLQA and IR4QA | 2010-01-06 (EST)
Submit answer type analysis | 2010-01-13 (EST)
Release answer type analysis from the CCLQA groups to the IR4QA groups | 2010-01-14 (EST)
Submit IR4QA results (both with and without the use of answer type analysis from CCLQA) | 2010-01-20 (EST)
Submit CCLQA and monolingual QA results (J-J, CS-CS, CT-CT, E-J, E-CS, E-CT) | 2010-01-27 (EST)
Release IR4QA results to the CCLQA groups (both with and without the use of answer type analysis) | 2010-01-29 (EST)
Submit CCLQA results using IR4QA results | 2010-02-05 (EST)
IR4QA evaluation results returned | 2010-03-03 (EST)
CCLQA/monolingual QA evaluation results returned | 2010-03-10 (EST)
Task overview partial release | 2010-03-24 (EST)
IR4QA working paper for the proceedings due | 2010-04-15 (EST)
CCLQA/monolingual QA working paper for the proceedings due | 2010-04-15 (EST)
Organizers' comments on IR4QA working papers returned | 2010-04-26 (EST)
Organizers' comments on CCLQA/monolingual QA working papers returned | 2010-04-26 (EST)
Camera-ready paper for the proceedings due | 2010-05-15 (Japan Time)
Final meeting | 2010-06-15~2010-06-18 (Japan Time)
