Description
Is it possible to have Alpino output the parse tree in the following format:
In: "Several theories about the higher prevalence in males have been investigated, but the cause of the difference is unconfirmed; one theory is that females are underdiagnosed."
Out: (S (S (S (NP (NP (JJ Several) (NNS theories)) (PP (IN about) (NP (NP (DT the) (JJR higher) (NN prevalence)) (PP (IN in) (NP (NNS males)))))) (VP (VBP have) (VP (VBN been) (VP (VBN investigated))))) (, ,) (CC but) (S (NP (NP (DT the) (NN cause)) (PP (IN of) (NP (DT the) (NN difference)))) (VP (VBZ is) (ADJP (JJ unconfirmed))))) (: ;) (S (NP (CD one) (NN theory)) (VP (VBZ is) (SBAR (IN that) (S (NP (NNS females)) (VP (VBP are) (ADJP (JJ underdiagnosed))))))) (. .))
This output is currently produced with AllenNLP and a minimal span-based neural constituency parser. However, since I'm also working with Dutch data, I intend to use the Alpino parser. If Alpino can't produce the output above directly, I suspect I'll have to walk the XML output and work something out myself.
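For reference, here is a rough sketch of the fallback I have in mind: walking the XML and printing brackets. It assumes the Alpino XML nests `<node>` elements under an `<alpino_ds>` root, with `cat` on phrasal nodes, `word` plus `pt` (or `pos`) on leaves, and `begin` giving the word position. The `SAMPLE` document and the function names are made up for illustration and are not real Alpino output. Note that the labels would be Alpino's own categories and tags rather than Penn Treebank ones, and since Alpino produces dependency structures, sorting children by `begin` only approximates surface order when constituents are discontinuous.

```python
import xml.etree.ElementTree as ET

# Hand-written, hypothetical sample in the assumed Alpino XML shape
# (not actual parser output).
SAMPLE = """
<alpino_ds version="1.3">
  <node begin="0" end="3" cat="top" rel="top">
    <node begin="0" end="3" cat="smain" rel="--">
      <node begin="0" end="2" cat="np" rel="su">
        <node begin="0" end="1" pt="lid" rel="det" word="de"/>
        <node begin="1" end="2" pt="n" rel="hd" word="kat"/>
      </node>
      <node begin="2" end="3" pt="ww" rel="hd" word="slaapt"/>
    </node>
  </node>
</alpino_ds>
"""


def to_brackets(node):
    """Return a bracketed string for one <node>, or None if it yields no words."""
    word = node.get("word")
    if word is not None:
        # Leaf: label it with the part-of-speech attribute, if any.
        label = node.get("pt") or node.get("pos") or "X"
        return f"({label} {word})"

    # Phrasal node: order the children by their starting word position.
    children = sorted(node.findall("node"), key=lambda n: int(n.get("begin", 0)))
    parts = [s for s in map(to_brackets, children) if s is not None]
    if not parts:
        # No word and no children with words (e.g. a bare co-indexed node): skip.
        return None
    label = node.get("cat") or "X"
    return f"({label} {' '.join(parts)})"


def alpino_xml_to_brackets(xml_text):
    """Convert one Alpino XML document (as a string) to a bracketed tree."""
    root = ET.fromstring(xml_text)
    return to_brackets(root.find("node"))


print(alpino_xml_to_brackets(SAMPLE))
# (top (smain (np (lid de) (n kat)) (ww slaapt)))
```

This ignores the `rel` (dependency relation) attributes entirely and does not escape words that themselves contain parentheses, so it is only a starting point.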