The Document Model

Sub-module for handling document-level data: documents, sentences, and tokens

class corenlp_xml.document.Document(xml_string)[source]

This class abstracts a Stanford CoreNLP Document

coreferences[source]

Returns a list of Coreference instances

Getter:Returns a list of coreferences
Type:list of corenlp_xml.coreference.Coreference
get_sentence_by_id(id)[source]

Gets sentence by ID

Parameters:id (int) – the ID of the sentence, as defined in the XML
Returns:a sentence
Return type:corenlp_xml.document.Sentence
sentences[source]

Returns the ordered dict of sentences as a list.

Getter:returns list of sentences, in order
Type:list of corenlp_xml.document.Sentence
sentiment[source]

Returns average sentiment of document. Must have sentiment enabled in XML output.

Getter:returns average sentiment of the document
Type:float
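
As a rough illustration of the ID lookup and sentiment averaging described above, the sketch below uses only the standard library rather than corenlp_xml itself. The sample XML is hypothetical and heavily trimmed, the sentimentValue attribute is assumed to be how sentiment appears in the XML output, and index_sentences and average_sentiment are illustrative names, not part of the library.

```python
import xml.etree.ElementTree as ET
from collections import OrderedDict

# Hypothetical, trimmed-down CoreNLP XML; real output carries many more elements.
SAMPLE_XML = """
<root>
  <document>
    <sentences>
      <sentence id="1" sentimentValue="3" sentiment="Positive"/>
      <sentence id="2" sentimentValue="1" sentiment="Negative"/>
      <sentence id="3" sentimentValue="2" sentiment="Neutral"/>
    </sentences>
  </document>
</root>
"""


def index_sentences(xml_string):
    """Map each sentence ID (int) to its XML element, in document order."""
    root = ET.fromstring(xml_string)
    return OrderedDict(
        (int(el.get("id")), el)
        for el in root.findall("./document/sentences/sentence")
    )


def average_sentiment(sentences):
    """Mean of the per-sentence sentimentValue attributes (assumed attribute name)."""
    values = [int(el.get("sentimentValue")) for el in sentences.values()]
    return sum(values) / len(values)


sentences = index_sentences(SAMPLE_XML)
second = sentences[2]                      # lookup by XML ID, not list position
doc_sentiment = average_sentiment(sentences)
```

Keying the mapping by the XML ID keeps lookups independent of list position, while the insertion order of the mapping still gives the sentences in document order.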
class corenlp_xml.document.Sentence(element)[source]

This abstracts a sentence

basic_dependencies[source]

Accesses basic dependencies from the XML output

Getter:Returns the dependency graph for basic dependencies
Type:corenlp_xml.dependencies.DependencyGraph
collapsed_ccprocessed_dependencies[source]

Accesses collapsed, CC-processed dependencies

Getter:Returns the dependency graph for collapsed and cc processed dependencies
Type:corenlp_xml.dependencies.DependencyGraph
collapsed_dependencies[source]

Accesses collapsed dependencies for this sentence

Getter:Returns the dependency graph for collapsed dependencies
Type:corenlp_xml.dependencies.DependencyGraph
get_token_by_id(id)[source]

Accesses token by the XML ID

Parameters:id (int) – The XML ID of the token
Returns:The token
Return type:corenlp_xml.document.Token
id[source]
Returns:the ID attribute of the sentence
Return type:int
parse[source]

Accesses the parse tree based on the S-expression parse string in the XML

Getter:Returns the NLTK parse tree
Type:nltk.Tree
parse_string[source]

Accesses the S-Expression parse string stored on the XML document

Getter:Returns the parse string
Type:str
phrase_strings(phrase_type)[source]

Returns the strings corresponding to all phrases matching a given phrase type

Parameters:phrase_type (str) – the node label to match in the parse tree, such as “NP”, “VP”, “det”, etc.
Returns:a list of strings representing those phrases
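
To make the idea of collecting phrase strings from an S-expression parse concrete, here is a minimal sketch. It swaps in a small hand-written parser in place of NLTK so that it runs with the standard library alone; the function names and the sample parse string are assumptions for illustration, and this is not the library's implementation.

```python
def parse_sexpr(text):
    """Parse a bracketed parse string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        pos += 1                      # skip the opening "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())       # nested constituent
            else:
                children.append(tokens[pos])  # leaf word
                pos += 1
        pos += 1                      # skip the closing ")"
        return (label, children)

    return read()


def leaves(node):
    """Collect the leaf words under a node, left to right."""
    words = []
    for child in node[1]:
        words.extend(leaves(child) if isinstance(child, tuple) else [child])
    return words


def subtrees(node):
    """Yield the node and every constituent below it, in pre-order."""
    yield node
    for child in node[1]:
        if isinstance(child, tuple):
            yield from subtrees(child)


def phrase_strings(parse_string, phrase_type):
    """Standalone stand-in: strings for every subtree labelled phrase_type."""
    tree = parse_sexpr(parse_string)
    return [" ".join(leaves(t)) for t in subtrees(tree) if t[0] == phrase_type]


# Hypothetical parse string in the bracketed style CoreNLP emits.
PARSE = "(ROOT (S (NP (DT The) (NN cat)) (VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat))))))"
noun_phrases = phrase_strings(PARSE, "NP")
verb_phrases = phrase_strings(PARSE, "VP")
```

Nested phrases are reported separately, so a noun phrase inside a verb phrase appears both on its own and as part of the enclosing phrase's string.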
semantic_head[source]

Returns the semantic head of the sentence – AKA the dependent of the root node of the dependency parse

Returns:the mention related to the semantic head
Return type:corenlp_xml.coreference.Mention
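
The notion of "the dependent of the root node" can be sketched directly against the dependency XML. The example below is a standard-library illustration under stated assumptions: the sample sentence is hypothetical, the element layout (dependencies, dep, governor, dependent) is assumed to follow CoreNLP's XML output, and the choice of basic dependencies is a guess. Unlike the property documented above, it returns a plain tuple rather than a Mention.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down sentence element.
SAMPLE_SENTENCE = """
<sentence id="1">
  <tokens>
    <token id="1"><word>Cats</word></token>
    <token id="2"><word>sleep</word></token>
  </tokens>
  <dependencies type="basic-dependencies">
    <dep type="root">
      <governor idx="0">ROOT</governor>
      <dependent idx="2">sleep</dependent>
    </dep>
    <dep type="nsubj">
      <governor idx="2">sleep</governor>
      <dependent idx="1">Cats</dependent>
    </dep>
  </dependencies>
</sentence>
"""


def root_dependent(sentence_el, dep_type="basic-dependencies"):
    """Return (token index, word) for the dependent of the root node, or None."""
    xpath = "./dependencies[@type='%s']/dep[@type='root']/dependent" % dep_type
    dependent = sentence_el.find(xpath)
    if dependent is None:
        return None
    return int(dependent.get("idx")), dependent.text


head = root_dependent(ET.fromstring(SAMPLE_SENTENCE))
```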
sentiment[source]

The sentiment of this sentence

Getter:Returns the sentiment value of this sentence
Type:int
subtrees_for_phrase(phrase_type)[source]

Returns the subtrees corresponding to all phrases matching a given phrase type

Parameters:phrase_type (str) – the node label to match in the parse tree, such as “NP”, “VP”, “det”, etc.
Returns:a list of NLTK.Tree.Subtree instances
Return type:list of NLTK.Tree.Subtree
tokens[source]

The tokens related to this sentence

Getter:Returns a list of Token instances
Type:corenlp_xml.document.TokenList
class corenlp_xml.document.Token(element)[source]

Wraps the token XML element

character_offset_begin[source]

Lazy-loads character offset begin node

Getter:Returns the integer value of the beginning offset
Type:int
character_offset_end[source]

Lazy-loads character offset end node

Getter:Returns the integer value of the ending offset
Type:int
id[source]

Lazy-loads ID

Getter:Returns the ID of the token element
Type:int
lemma[source]

Lazy-loads the lemma for this word

Getter:Returns the plain string value of the word lemma
Type:str
ner[source]

Lazy-loads the NER for this word

Getter:Returns the plain string value of the NER tag for the word
Type:str
pos[source]

Lazy-loads the part of speech tag for this word

Getter:Returns the plain string value of the POS tag for the word
Type:str
speaker[source]

Lazy-loads the speaker for this word

Getter:Returns the plain string value of the speaker tag for the word
Type:str
word[source]

Lazy-loads word value

Getter:Returns the plain string value of the word
Type:str
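
The "lazy-loads" wording used throughout the Token entries can be pictured as a property that reads the XML child node on first access and caches the result. The sketch below shows that pattern with the standard library only; LazyToken, its lookups counter, and the sample token XML are hypothetical, and the child element names are assumed to match CoreNLP's XML output. It is one plausible way to implement the pattern, not the library's code.

```python
import xml.etree.ElementTree as ET


class LazyToken:
    """Hypothetical stand-in for a token wrapper; not the library's class."""

    def __init__(self, element):
        self._element = element
        self._word = None
        self._begin = None
        self.lookups = 0          # counts XML reads, for demonstration only

    def _text(self, tag):
        """Read the text of a child element and record that a lookup happened."""
        self.lookups += 1
        return self._element.find(tag).text

    @property
    def word(self):
        if self._word is None:    # first access: read from the XML and cache
            self._word = self._text("word")
        return self._word

    @property
    def character_offset_begin(self):
        if self._begin is None:   # compare with None so an offset of 0 is cached too
            self._begin = int(self._text("CharacterOffsetBegin"))
        return self._begin


# Hypothetical token element.
TOKEN_XML = """
<token id="1">
  <word>Cats</word>
  <lemma>cat</lemma>
  <CharacterOffsetBegin>0</CharacterOffsetBegin>
  <CharacterOffsetEnd>4</CharacterOffsetEnd>
  <POS>NNS</POS>
  <NER>O</NER>
</token>
"""

token = LazyToken(ET.fromstring(TOKEN_XML))
first = token.word                       # triggers one XML lookup
second = token.word                      # served from the cache
offset = token.character_offset_begin    # triggers a second lookup
```

Deferring the reads means that constructing many token wrappers stays cheap, and only the attributes that are actually used incur an XML lookup.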