Question Analysis Format

This document describes the XML format of question analysis output. The format is used for Question Analysis submissions and by IR4QA participants who want to use question analysis results from QA. For the definition of Run ID, refer to SubmissionFormat#RunIDDefinition.

Overview

Tag: Description

TOPIC_SET: Contains metadata and a list of topics.

METADATA: Contains meta-information about the system that produced this result.

TOPIC: Each TOPIC contains exactly one QUESTION_ANALYSIS and is identified by its ID attribute.

QUESTION_ANALYSIS: Contains an ANSWERTYPE and KEYTERMS extracted from the question.

ANSWERTYPE: By default, one of DEFINITION, BIOGRAPHY, RELATIONSHIP, or EVENT is expected. You may extend the set of answer types with your own if you wish. The SCORE attribute is optional, but producing a value between 0 and 1 is recommended.

KEYTERM: Stores a (translated) key term from the question. Synonyms and aliases can also be added as KEYTERM elements. The SCORE attribute is optional, but producing a value between 0 and 1 is recommended.
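The element nesting described above can be sketched with Python's standard xml.etree.ElementTree module. This is only an illustration: the function name build_topic_set and the dictionary keys (id, answer_type, answer_score, language, keyterms) are hypothetical and not part of the format. SCORE attributes are written only when a score is supplied, since they are optional.

```python
import xml.etree.ElementTree as ET


def build_topic_set(run_id, topics, description=None):
    """Build a TOPIC_SET element tree (hypothetical helper, not an official tool).

    topics is a list of dicts with the assumed keys:
      id, answer_type, answer_score (optional), language,
      keyterms (list of (text, score) pairs; score may be None).
    """
    root = ET.Element("TOPIC_SET")

    # METADATA holds RUNID and an optional DESCRIPTION.
    meta = ET.SubElement(root, "METADATA")
    ET.SubElement(meta, "RUNID").text = run_id
    if description is not None:
        ET.SubElement(meta, "DESCRIPTION").text = description

    for t in topics:
        topic = ET.SubElement(root, "TOPIC", ID=t["id"])
        qa = ET.SubElement(topic, "QUESTION_ANALYSIS")

        answertype = ET.SubElement(qa, "ANSWERTYPE")
        answertype.text = t["answer_type"]
        if t.get("answer_score") is not None:
            answertype.set("SCORE", str(t["answer_score"]))

        keyterms = ET.SubElement(qa, "KEYTERMS", LANGUAGE=t["language"])
        for text, score in t["keyterms"]:
            keyterm = ET.SubElement(keyterms, "KEYTERM")
            keyterm.text = text
            if score is not None:
                keyterm.set("SCORE", str(score))
    return root


if __name__ == "__main__":
    example = build_topic_set(
        "CMUJAV-EN-JA-01-T",
        [{
            "id": "ACLIA1-JA-T1",
            "answer_type": "DEFINITION",
            "answer_score": 1.0,
            "language": "JA",
            "keyterms": [("ファタハ", 1.0), ("組織", 0.1)],
        }],
    )
    print(ET.tostring(example, encoding="unicode"))
```

The sketch does not check that scores fall between 0 and 1, because the format only recommends that range.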

DTD

<!DOCTYPE TOPIC_SET [
<!ELEMENT TOPIC_SET (METADATA,TOPIC*)>
<!ELEMENT METADATA (RUNID,DESCRIPTION?)>
<!ELEMENT RUNID (#PCDATA)>
<!ELEMENT DESCRIPTION (#PCDATA)>
<!ELEMENT TOPIC (QUESTION_ANALYSIS)>
<!ATTLIST TOPIC ID CDATA #REQUIRED>
<!ELEMENT QUESTION_ANALYSIS (ANSWERTYPE,KEYTERMS)>
<!ELEMENT ANSWERTYPE (#PCDATA)>
<!ATTLIST ANSWERTYPE SCORE CDATA #IMPLIED>
<!ELEMENT KEYTERMS (KEYTERM*)>
<!ATTLIST KEYTERMS LANGUAGE (CS|CT|EN|JA) #REQUIRED>
<!ELEMENT KEYTERM (#PCDATA)>
<!ATTLIST KEYTERM SCORE CDATA #IMPLIED>
]>
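
Python's standard xml.etree.ElementTree parser does not validate against a DTD, so the following is a minimal sketch of a hand-written check that mirrors the main constraints of the DTD above: element order, the required ID attribute on TOPIC, and the allowed LANGUAGE values. The function names check_topic_set and child_tags are hypothetical, and the check is deliberately partial. It is not a substitute for a validating parser.

```python
import xml.etree.ElementTree as ET

# Allowed values of the LANGUAGE attribute, taken from the DTD.
LANGUAGES = {"CS", "CT", "EN", "JA"}


def child_tags(elem):
    """Return the tags of the direct children, in document order."""
    return [child.tag for child in elem]


def check_topic_set(root):
    """Return a list of error messages; an empty list means no problem was found."""
    if root.tag != "TOPIC_SET":
        return ["root element must be TOPIC_SET"]
    errors = []

    # TOPIC_SET (METADATA, TOPIC*)
    tags = child_tags(root)
    if not tags or tags[0] != "METADATA" or any(t != "TOPIC" for t in tags[1:]):
        errors.append("TOPIC_SET must contain METADATA followed by TOPIC elements")

    # METADATA (RUNID, DESCRIPTION?)
    meta = root.find("METADATA")
    if meta is not None and child_tags(meta) not in (["RUNID"], ["RUNID", "DESCRIPTION"]):
        errors.append("METADATA must contain RUNID and an optional DESCRIPTION")

    for topic in root.findall("TOPIC"):
        topic_id = topic.get("ID")
        if topic_id is None:
            errors.append("TOPIC is missing the required ID attribute")
        label = topic_id or "?"

        # TOPIC (QUESTION_ANALYSIS)
        if child_tags(topic) != ["QUESTION_ANALYSIS"]:
            errors.append(f"TOPIC {label}: must contain exactly one QUESTION_ANALYSIS")
            continue

        # QUESTION_ANALYSIS (ANSWERTYPE, KEYTERMS)
        qa = topic[0]
        if child_tags(qa) != ["ANSWERTYPE", "KEYTERMS"]:
            errors.append(f"TOPIC {label}: QUESTION_ANALYSIS must contain ANSWERTYPE then KEYTERMS")
            continue

        # KEYTERMS (KEYTERM*) with a required LANGUAGE attribute
        keyterms = qa[1]
        if keyterms.get("LANGUAGE") not in LANGUAGES:
            errors.append(f"TOPIC {label}: KEYTERMS LANGUAGE must be one of CS, CT, EN, JA")
        if any(t != "KEYTERM" for t in child_tags(keyterms)):
            errors.append(f"TOPIC {label}: KEYTERMS may contain only KEYTERM elements")
    return errors
```

A caller would typically parse a file with ET.parse(path).getroot() and report any returned messages before submitting.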

Sample XML Format

<TOPIC_SET>
  <METADATA>
    <RUNID>CMUJAV-EN-JA-01-T</RUNID>
    <DESCRIPTION>We used Support Vector Machine for answer type classification and NP chunking.</DESCRIPTION>
  </METADATA>
  
  <TOPIC ID="ACLIA1-JA-T1">
    <QUESTION_ANALYSIS>
      <ANSWERTYPE SCORE="1.0">DEFINITION</ANSWERTYPE>
      <KEYTERMS LANGUAGE="JA">
        <KEYTERM SCORE="1.0">ファタハ</KEYTERM>
        <KEYTERM SCORE="0.1">組織</KEYTERM>
      </KEYTERMS>
    </QUESTION_ANALYSIS>
  </TOPIC>
  
  <TOPIC ID="ACLIA1-JA-T2">
    <QUESTION_ANALYSIS>
      <ANSWERTYPE SCORE="1.0">DEFINITION</ANSWERTYPE>
      <KEYTERMS LANGUAGE="JA">
        <KEYTERM SCORE="1.0">もやもや病</KEYTERM>
        <KEYTERM SCORE="0.3">病気</KEYTERM>
      </KEYTERMS>
    </QUESTION_ANALYSIS>
  </TOPIC>
</TOPIC_SET>
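
For IR4QA participants consuming such a file, reading it back might look like the sketch below. It assumes the input already follows the structure shown in the sample. The names read_topics and parse_score and the shape of the returned dictionary are hypothetical choices made for illustration. A missing SCORE attribute is returned as None rather than a default value, since the format leaves it optional.

```python
import xml.etree.ElementTree as ET


def parse_score(elem):
    """Return the SCORE attribute as a float, or None when it is absent."""
    raw = elem.get("SCORE")
    return float(raw) if raw is not None else None


def read_topics(xml_text):
    """Map each TOPIC ID to its answer type, language, and key terms.

    Assumes xml_text is a well-formed TOPIC_SET document in which every
    TOPIC has a QUESTION_ANALYSIS with ANSWERTYPE and KEYTERMS children.
    """
    root = ET.fromstring(xml_text)
    topics = {}
    for topic in root.findall("TOPIC"):
        qa = topic.find("QUESTION_ANALYSIS")
        answertype = qa.find("ANSWERTYPE")
        keyterms = qa.find("KEYTERMS")
        topics[topic.get("ID")] = {
            "answer_type": (answertype.text or "").strip(),
            "answer_score": parse_score(answertype),
            "language": keyterms.get("LANGUAGE"),
            # Key terms are kept in document order as (text, score) pairs.
            "keyterms": [
                ((k.text or "").strip(), parse_score(k))
                for k in keyterms.findall("KEYTERM")
            ],
        }
    return topics
```

Applied to the sample above, the result would contain one entry per TOPIC ID, each with the answer type DEFINITION and two key terms.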

Changelog

QuestionAnalysisFormat (last edited 2008-06-25 18:28:06 by TerukoMitamura)