Question Analysis Format
This document describes the XML format of question analysis output. The format is intended for Question Analysis submissions and for IR4QA participants who want to use question analysis results from QA. For the definition of Run ID, see SubmissionFormat#RunIDDefinition.
Overview
Tag | Description
TOPIC_SET | Contains metadata and a list of topics.
METADATA | Contains meta information about the system that produced this result.
TOPIC | Each TOPIC is associated with a QUESTION_ANALYSIS.
QUESTION_ANALYSIS | Contains an ANSWERTYPE and the KEYTERMS extracted from the question.
ANSWERTYPE | By default, one of DEFINITION, BIOGRAPHY, RELATIONSHIP, or EVENT is expected. You may extend this set with your own answer types if you wish. The SCORE attribute is optional, but you are encouraged to provide a value between 0 and 1.
KEYTERM | Stores a (translated) key term from the question. Synonyms and aliases can also be added as KEYTERM elements. The SCORE attribute is optional, but you are encouraged to provide a value between 0 and 1.
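As an illustration of how these elements nest, the following is a minimal sketch that assembles a TOPIC_SET with Python's standard `xml.etree.ElementTree` module. The function name `build_topic_set` and the dictionary keys (`id`, `answer_type`, `answer_score`, `keyterms`) are assumptions made for this example and are not part of the format.

```python
import xml.etree.ElementTree as ET


def build_topic_set(run_id, description, topics, language="JA"):
    """Build a TOPIC_SET element tree.

    `topics` is a list of dicts with hypothetical keys:
    id, answer_type, answer_score, keyterms (list of (term, score) pairs).
    A score of None means the optional SCORE attribute is omitted.
    """
    root = ET.Element("TOPIC_SET")

    # METADATA holds a required RUNID and an optional DESCRIPTION.
    metadata = ET.SubElement(root, "METADATA")
    ET.SubElement(metadata, "RUNID").text = run_id
    if description is not None:
        ET.SubElement(metadata, "DESCRIPTION").text = description

    for topic in topics:
        topic_el = ET.SubElement(root, "TOPIC", ID=topic["id"])
        qa = ET.SubElement(topic_el, "QUESTION_ANALYSIS")

        answer = ET.SubElement(qa, "ANSWERTYPE")
        answer.text = topic["answer_type"]
        if topic.get("answer_score") is not None:
            answer.set("SCORE", str(topic["answer_score"]))

        keyterms = ET.SubElement(qa, "KEYTERMS", LANGUAGE=language)
        for term, score in topic["keyterms"]:
            keyterm = ET.SubElement(keyterms, "KEYTERM")
            keyterm.text = term
            if score is not None:
                keyterm.set("SCORE", str(score))

    return root
```

The result can be serialized with `ET.tostring(root, encoding="unicode")`. Whether scores should be written with a fixed number of decimals is not stated here, so the sketch simply uses `str()`.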
DTD
<!DOCTYPE TOPIC_SET [
  <!ELEMENT TOPIC_SET (METADATA,TOPIC*)>
  <!ELEMENT METADATA (RUNID,DESCRIPTION?)>
  <!ELEMENT RUNID (#PCDATA)>
  <!ELEMENT DESCRIPTION (#PCDATA)>
  <!ELEMENT TOPIC (QUESTION_ANALYSIS)>
  <!ATTLIST TOPIC ID CDATA #REQUIRED>
  <!ELEMENT QUESTION_ANALYSIS (ANSWERTYPE,KEYTERMS)>
  <!ELEMENT ANSWERTYPE (#PCDATA)>
  <!ATTLIST ANSWERTYPE SCORE CDATA #IMPLIED>
  <!ELEMENT KEYTERMS (KEYTERM*)>
  <!ATTLIST KEYTERMS LANGUAGE (CS|CT|EN|JA) #REQUIRED>
  <!ELEMENT KEYTERM (#PCDATA)>
  <!ATTLIST KEYTERM SCORE CDATA #IMPLIED>
]>
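Python's standard library does not validate against a DTD, so the sketch below re-states the main constraints of the DTD by hand as a rough pre-submission check. It is an approximation only: the function name `check_topic_set` and the error messages are made up for this example, and a validating XML parser remains the authoritative check.

```python
import xml.etree.ElementTree as ET

# Values allowed by the LANGUAGE attribute list in the DTD.
ALLOWED_LANGUAGES = {"CS", "CT", "EN", "JA"}


def child_tags(element):
    """Return the tag names of the direct children, in document order."""
    return [child.tag for child in element]


def check_topic_set(root):
    """Return a list of human-readable problems; an empty list means none found."""
    if root.tag != "TOPIC_SET":
        return ["root element must be TOPIC_SET"]

    errors = []
    tags = child_tags(root)

    # TOPIC_SET (METADATA,TOPIC*)
    if not tags or tags[0] != "METADATA" or any(t != "TOPIC" for t in tags[1:]):
        errors.append("TOPIC_SET must contain METADATA followed by TOPIC elements")
    # METADATA (RUNID,DESCRIPTION?)
    elif child_tags(root[0]) not in (["RUNID"], ["RUNID", "DESCRIPTION"]):
        errors.append("METADATA must contain RUNID and an optional DESCRIPTION")

    for topic in root.findall("TOPIC"):
        topic_id = topic.get("ID")
        label = topic_id if topic_id is not None else "?"
        if topic_id is None:
            errors.append("TOPIC is missing the required ID attribute")

        # TOPIC (QUESTION_ANALYSIS)
        if child_tags(topic) != ["QUESTION_ANALYSIS"]:
            errors.append(f"TOPIC {label}: expected exactly one QUESTION_ANALYSIS")
            continue

        # QUESTION_ANALYSIS (ANSWERTYPE,KEYTERMS)
        analysis = topic[0]
        if child_tags(analysis) != ["ANSWERTYPE", "KEYTERMS"]:
            errors.append(f"TOPIC {label}: expected ANSWERTYPE followed by KEYTERMS")
            continue

        # KEYTERMS (KEYTERM*) with a required LANGUAGE attribute
        keyterms = analysis[1]
        if keyterms.get("LANGUAGE") not in ALLOWED_LANGUAGES:
            errors.append(f"TOPIC {label}: LANGUAGE must be one of CS, CT, EN, JA")
        if any(t != "KEYTERM" for t in child_tags(keyterms)):
            errors.append(f"TOPIC {label}: KEYTERMS may only contain KEYTERM")

    return errors
```

The sketch does not check SCORE values, because the DTD declares SCORE as plain CDATA and places no constraint on it.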
Sample XML Format
<TOPIC_SET>
  <METADATA>
    <RUNID>CMUJAV-EN-JA-01-T</RUNID>
    <DESCRIPTION>We used Support Vector Machine for answer type classification and NP chunking.</DESCRIPTION>
  </METADATA>
  <TOPIC ID="ACLIA1-JA-T1">
    <QUESTION_ANALYSIS>
      <ANSWERTYPE SCORE="1.0">DEFINITION</ANSWERTYPE>
      <KEYTERMS LANGUAGE="JA">
        <KEYTERM SCORE="1.0">ファタハ</KEYTERM>
        <KEYTERM SCORE="0.1">組織</KEYTERM>
      </KEYTERMS>
    </QUESTION_ANALYSIS>
  </TOPIC>
  <TOPIC ID="ACLIA1-JA-T2">
    <QUESTION_ANALYSIS>
      <ANSWERTYPE SCORE="1.0">DEFINITION</ANSWERTYPE>
      <KEYTERMS LANGUAGE="JA">
        <KEYTERM SCORE="1.0">もやもや病</KEYTERM>
        <KEYTERM SCORE="0.3">病気</KEYTERM>
      </KEYTERMS>
    </QUESTION_ANALYSIS>
  </TOPIC>
</TOPIC_SET>
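For IR4QA participants who want to consume a file like the sample above, the following sketch reads the answer type and key terms per topic with `xml.etree.ElementTree`. The function name `read_question_analysis` and the shape of the returned dictionary are assumptions chosen for this example, and the code assumes the input already follows the DTD.

```python
import xml.etree.ElementTree as ET


def _score(element):
    """Return the optional SCORE attribute as a float, or None if absent."""
    value = element.get("SCORE")
    return float(value) if value is not None else None


def read_question_analysis(xml_text):
    """Map each TOPIC ID to its answer type, language and scored key terms."""
    root = ET.fromstring(xml_text)
    results = {}
    for topic in root.iter("TOPIC"):
        analysis = topic.find("QUESTION_ANALYSIS")
        answer = analysis.find("ANSWERTYPE")
        keyterms = analysis.find("KEYTERMS")
        results[topic.get("ID")] = {
            "answer_type": answer.text,
            "answer_score": _score(answer),
            "language": keyterms.get("LANGUAGE"),
            # Keep document order; each entry is a (term, score) pair.
            "keyterms": [(kt.text, _score(kt)) for kt in keyterms.findall("KEYTERM")],
        }
    return results
```

A retrieval system could then, for instance, weight each query term by its score, treating a missing SCORE as a default weight of its own choosing.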
Changelog
- 2008-05-16: Generalized format by using TOPIC_SET:TOPIC structure. Simplified the DTD by removing optional fields for process time etc.
