Consider the following POS-tagged string:
It/PRP was/VBD not/RB okay/JJ or/CC funny/JJ and/CC I/NN will/MD never/RB buy/VB
from/IN them/PRP ever/RB again/RB
(It was not okay or funny and I will never buy from them ever again)
I want to accomplish the following:
1. Check for negating adverbs (RB) against a defined array('not', 'never')
2. When there's a match, remove the adverb
3. Prepend "not-" to every subsequent adjective (JJ), adverb (RB), or verb (VB, or VBN for past participles)
4. Remove all POS tags (/XX)
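For reference, the final step on its own (removing the POS tags) looks like a one-line regex substitution. This is only a rough sketch in Python, and `strip_tags` is a made-up name. It assumes every tag is a "/" followed by non-space characters and that words never contain a "/" themselves:

```python
import re

def strip_tags(tagged: str) -> str:
    # Assumption: a tag is "/" plus everything up to the next whitespace.
    return re.sub(r"/\S+", "", tagged)

print(strip_tags("from/IN them/PRP ever/RB again/RB"))
```

This does not handle the negation itself, since that needs to remember whether a negating adverb has already been seen.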
Thus, the desired output would be:
It was not-okay or not-funny and I will not-buy from them not-ever not-again
My first thought was to do this the way I already know: explode the string on spaces, explode every token on "/" to get pairs like [JJ => okay], and then use a switch statement to handle each word (case JJ: concatenate, etc.). This seems very sloppy, though. Does anybody know a cleaner and/or more efficient way of doing this, for instance with a regex? The strings have been pre-cleaned, so they will only ever contain words (no punctuation or characters other than a-z).
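To make the split-based idea concrete, here is a minimal sketch of it, written in Python for brevity rather than in the language I am actually using. The names `NEGATORS`, `NEGATABLE_TAGS` and `mark_negations` are hypothetical, and it assumes that every token has the form word/TAG and that a negation, once seen, applies to the rest of the string:

```python
NEGATORS = {"not", "never"}                  # the defined negating adverbs
NEGATABLE_TAGS = {"JJ", "RB", "VB", "VBN"}   # tags that get the "not-" prefix

def mark_negations(tagged: str) -> str:
    out = []
    negated = False  # assumption: stays True once any negator has been seen
    for token in tagged.split():
        # "okay/JJ" -> ("okay", "/", "JJ"); a token without "/" gets an empty tag
        word, _, tag = token.partition("/")
        if tag == "RB" and word.lower() in NEGATORS:
            negated = True
            continue  # drop the negating adverb itself
        if negated and tag in NEGATABLE_TAGS:
            word = "not-" + word
        out.append(word)  # the tag is discarded here, which covers the last step
    return " ".join(out)

sentence = ("It/PRP was/VBD not/RB okay/JJ or/CC funny/JJ and/CC I/NN will/MD "
            "never/RB buy/VB from/IN them/PRP ever/RB again/RB")
print(mark_negations(sentence))
# It was not-okay or not-funny and I will not-buy from them not-ever not-again
```

A set lookup on the tag replaces the switch statement here, but it is still one pass over the exploded tokens.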
Any tips, example code fragments, etc. would be greatly appreciated!
*Edit: I am aware that this is a very basic way of handling negations, but it is good enough for what I need. There will be a margin of error, but that's OK.*