pr0no, posted February 12, 2012:

Consider the following POS-tagged string:

    It/PRP was/VBD not/RB okay/JJ or/CC funny/JJ and/CC I/NN will/MD never/RB buy/VB from/IN them/PRP ever/RB again/RB

    (It was not okay or funny and I will never buy from them ever again)

I want to accomplish the following:

1. Check for negating adverbs (RB) against a defined array('not', 'never')
2. When there's a match, remove the adverb
3. Prepend "not-" to every subsequent adjective (JJ), adverb (RB), or verb (VB, or VBN for past participles)
4. Remove all POS tags (/XX)

Thus, the desired output would be:

    It was not-okay or not-funny and I will not-buy from them not-ever not-again

My first thought was to do this the way I know how: explode the string on spaces, then explode every word on "/" to get [JJ => okay], then use a switch statement to handle every word (case JJ: concatenate, etc.), but this seems very sloppy. Does anybody have a cleaner and/or more efficient way of doing this, for instance with a regex? The strings have been pre-cleaned, so they will only ever contain words (no punctuation, no characters other than a-z, etc.).

Any tips, example code fragments, etc. would be greatly appreciated!

*Edit: I am aware, btw, that this is a very basic way of treating negations, but it is good enough for what I need. There will be an error margin, but that's ok.*
Psycho, posted February 12, 2012:

This is rough, but it works. It only requires the first part of the input string, with each word and its type identifier.

    $neg_adv = array('not', 'never');
    $input = "It/PRP was/VBD not/RB okay/JJ or/CC funny/JJ and/CC I/NN will/MD never/RB buy/VB from/IN them/PRP ever/RB again/RB (It was not okay or funny and I will never buy from them ever again)";

    $output = array();
    foreach(explode(' ', $input) as $part)
    {
        // Skip anything that is not a word/TAG pair (e.g. the plain sentence in parentheses)
        if(strpos($part, '/') !== false)
        {
            list($word, $type) = explode('/', $part);
            // Drop the negating adverbs themselves
            if($type!='RB' || !in_array($word, $neg_adv))
            {
                // Prefix adjectives, adverbs and verbs with "not-"
                if($type=='JJ' || $type=='RB' || $type=='VB' || $type=='VBN')
                {
                    $output[] = 'not-'.$word;
                }
                else
                {
                    $output[] = $word;
                }
            }
        }
    }

    echo implode(' ', $output);
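One thing worth noting about the loop above: it prefixes every JJ/RB/VB/VBN word, whether or not a negating adverb came before it, which happens to give the right result for the sample string. If "subsequent" in the original request is meant literally, a flag can track whether a negator has been seen yet. The following is only a sketch under that assumption; the function name `negate_tagged` and its parameters are made up for illustration and are not from the thread.

```php
<?php
// Hypothetical helper (not from the thread): prefix "not-" only on words
// that come AFTER a negating adverb has been seen.
function negate_tagged(string $input, array $negators = ['not', 'never']): string
{
    $negatable = ['JJ', 'RB', 'VB', 'VBN']; // tags that receive the "not-" prefix
    $negating  = false;                      // becomes true once a negator is found
    $output    = [];

    foreach (explode(' ', $input) as $part) {
        // Ignore tokens that are not word/TAG pairs
        if (strpos($part, '/') === false) {
            continue;
        }
        list($word, $type) = explode('/', $part, 2);

        // A negating adverb switches negation on and is itself dropped
        if ($type === 'RB' && in_array(strtolower($word), $negators, true)) {
            $negating = true;
            continue;
        }

        $prefix   = ($negating && in_array($type, $negatable, true)) ? 'not-' : '';
        $output[] = $prefix . $word;
    }

    return implode(' ', $output);
}

echo negate_tagged('It/PRP was/VBD not/RB okay/JJ or/CC funny/JJ and/CC I/NN will/MD never/RB buy/VB from/IN them/PRP ever/RB again/RB'), PHP_EOL;
// prints: It was not-okay or not-funny and I will not-buy from them not-ever not-again
```

With this version, a string such as `very/RB good/JJ` that contains no negator comes back as `very good`, whereas the loop above would turn it into `not-very not-good`. Note that the flag is never reset, so everything after the first negator is negated; whether that is acceptable depends on the error margin you can live with.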
pr0no (Author), posted February 12, 2012:

Hey, thanks. It doesn't fully work as expected, however. Consider the input:

    It/PRP was/VBD not/RB okay/JJ or/CC funny/JJ and/CC I/NN will/MD never/RB buy/VB from/IN them/PRP ever/RB again/RB

The output now is:

    It was not not-okay or funny and I will not-buy from them not-ever again

However, the expected output is:

    It was not not-okay or not-funny and I will not-buy from them not-ever not-again

The difference is in "not-funny" and "not-again". They are a JJ and an RB word respectively, but they do not get tagged like the others. I think this is due to the second if-statement:

    if($type!='RB' || !in_array($word, $neg_adv))
    {
        if($type=='JJ' || $type=='RB' ...

Why do you first check if $type is not 'RB', and then check if $type *is* 'RB'? Is the first one meant to remove the negation word (not, never)? I think this is stopping "funny" and "again" from being tagged. Could you explain?
pr0no (Author), posted February 12, 2012:

Oh, never mind! It works great; for some reason, when I take live output from the database here, it produces the error described above. But it works perfectly with the string as I gave it in this post. Thanks!
Psycho, posted February 13, 2012:

Yeah, there were spaces that were replaced with line-breaks. I assumed that was a copy/paste error.
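If the database text really does contain line breaks (or tabs) between some of the word/TAG pairs, splitting on any run of whitespace instead of a single space should make the tokenizing step tolerant of that. This is a minimal sketch assuming that is the cause; the helper name `split_tokens` is hypothetical, and `preg_split` with `PREG_SPLIT_NO_EMPTY` is standard PHP.

```php
<?php
// Hypothetical helper (not from the thread): split on any whitespace run,
// so "\n", "\r\n" and tabs separate tokens just like a plain space does.
function split_tokens(string $input): array
{
    // PREG_SPLIT_NO_EMPTY drops empty strings caused by leading/trailing whitespace
    return preg_split('/\s+/', $input, -1, PREG_SPLIT_NO_EMPTY);
}

// Sample input with a newline and a Windows line ending in place of spaces
$input = "It/PRP was/VBD\nnot/RB okay/JJ\r\nor/CC funny/JJ";
print_r(split_tokens($input));
```

In the earlier loop, this would replace `explode(' ', $input)` with `split_tokens($input)`; the rest of the logic can stay as it is.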