
Write negated string according to POS tags


pr0no


Consider the following POS-tagged string:

 

It/PRP was/VBD not/RB okay/JJ or/CC funny/JJ and/CC I/NN will/MD never/RB buy/VB 
from/IN them/PRP ever/RB again/RB
(It was not okay or funny and I will never buy from them ever again)

 

I want to accomplish the following:

- Check for negating adverbs (RB) against a defined array('not', 'never')
- When there's a match, remove the adverb
- Concatenate "not-" to the beginning of every subsequent adjective (JJ), adverb (RB), or verb (VB or VBN for past tense)
- Remove all POS-tags (/XX)

Thus, the desired output would be:

 

It was not-okay or not-funny and I will not-buy from them not-ever not-again

 

My first thought was to do this the way I know how to: explode the string on spaces, then explode every word on "/" to get pairs like [JJ => okay], then use a switch statement to handle each word (case JJ: concatenate, etc.), but this seems very sloppy. Does anybody have a cleaner and/or more efficient way of doing this, for instance with regex? The strings have been pre-cleaned, so they will only ever contain words (no punctuation or characters other than a-z).
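
One possible regex-based take on the transformation described above, as a rough sketch rather than a definitive answer: `preg_replace_callback` visits each word/TAG pair and a flag remembers whether a negating adverb has been seen yet. The function name `negate_tagged` and its `$negators` parameter are made up for illustration, and the pattern assumes tags consist only of uppercase letters (plus `$`, as in PRP$).

```php
<?php
// Hypothetical helper; the name and signature are not from this thread.
function negate_tagged($tagged, array $negators = array('not', 'never'))
{
    $negated = false; // becomes true once a negating adverb has been seen

    // Visit every word/TAG token; the callback returns its replacement.
    $result = preg_replace_callback(
        '~([A-Za-z]+)/([A-Z$]+)~',
        function ($m) use (&$negated, $negators) {
            list(, $word, $tag) = $m;

            // A negating adverb sets the flag and is dropped from the output.
            if ($tag === 'RB' && in_array(strtolower($word), $negators, true)) {
                $negated = true;
                return '';
            }

            // After a negation, prefix adjectives, adverbs and verbs.
            if ($negated && in_array($tag, array('JJ', 'RB', 'VB', 'VBN'), true)) {
                return 'not-' . $word;
            }

            // Everything else keeps the bare word, without its tag.
            return $word;
        },
        $tagged
    );

    // Dropped adverbs leave double spaces behind; collapse them.
    return trim(preg_replace('~\s+~', ' ', $result));
}

echo negate_tagged('It/PRP was/VBD not/RB okay/JJ or/CC funny/JJ and/CC I/NN will/MD never/RB buy/VB from/IN them/PRP ever/RB again/RB');
// prints: It was not-okay or not-funny and I will not-buy from them not-ever not-again
```

Because the flag is never reset, every JJ, RB, VB or VBN after the first negation gets the prefix, which matches the "every subsequent" wording of the requirements.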

 

Any tips, example code fragments, etc. would be greatly appreciated!

 

*Edit: I am aware, btw, of the very basic character of this way of treating negations, but it is good enough for what I need. There will be an error margin, but that's ok :)*


This is rough, but it works. It only requires the first part of the input string, where each word has its type identifier.

 

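// Negating adverbs that trigger the "not-" prefix
$neg_adv = array('not', 'never');

$input = "It/PRP was/VBD not/RB okay/JJ or/CC funny/JJ and/CC I/NN will/MD never/RB buy/VB from/IN them/PRP ever/RB again/RB (It was not okay or funny and I will never buy from them ever again)";

$output = array();
$negated = false; // set once a negating adverb has been seen
foreach(explode(' ', $input) as $part)
{
    // Skip parts without a tag, such as the plain sentence in parentheses
    if(strpos($part, '/') !== false)
    {
        list($word, $type) = explode('/', $part);
        // True for everything except a negating adverb, which is dropped
        if($type!='RB' || !in_array($word, $neg_adv))
        {
            if($type=='JJ' || $type=='RB' || $type=='VB' || $type=='VBN')
            {
                // Prefix only words that come after a negation
                $output[] = ($negated ? 'not-' : '').$word;
            }
            else
            {
                $output[] = $word;
            }
        }
        else
        {
            $negated = true;
        }
    }
}

echo implode(' ', $output);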


Hey, thanks. It doesn't fully work as expected however. Consider the input:

It/PRP was/VBD not/RB okay/JJ or/CC funny/JJ and/CC I/NN will/MD never/RB buy/VB from/IN them/PRP ever/RB again/RB

The output now is:

It was not not-okay or funny and I will not-buy from them not-ever again

However, the expected output is:

It was not not-okay or not-funny and I will not-buy from them not-ever not-again

The difference is in "not-funny" and "not-again". They are respectively a JJ and RB word, but they do not get tagged like the others. I think this is due to the second if-statement:

if($type!='RB' || !in_array($word, $neg_adv)) {
  if($type=='JJ' || $type=='RB' ...

Why do you first check if $type is not 'RB', and then check if $type *is* 'RB'? Is the first check meant to remove the negation word (not, never)? I think this is stopping "funny" and "again" from being tagged. Could you explain?
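
On the question about the two checks: the outer condition `$type!='RB' || !in_array($word, $neg_adv)` is the De Morgan form of "not (an RB that is also in the negation list)", so it should only filter out `not` and `never`. Other RB words such as `ever` still reach the inner check. A small sketch to verify that, using a hypothetical helper `is_negator` and a few hand-picked sample pairs:

```php
<?php
$neg_adv = array('not', 'never');

// Hypothetical helper: true only for a negating adverb such as not/RB.
function is_negator($word, $type, array $neg_adv)
{
    return $type == 'RB' && in_array($word, $neg_adv);
}

// Hand-picked word/tag pairs, chosen to cover each branch.
$samples = array(
    array('not', 'RB'),
    array('ever', 'RB'),
    array('funny', 'JJ'),
    array('not', 'JJ'),
);

foreach ($samples as $sample) {
    list($word, $type) = $sample;
    // The outer condition exactly as written in the loop.
    $kept = ($type != 'RB' || !in_array($word, $neg_adv));
    // Compare it with the negated helper; the two should always agree.
    $same = ($kept === !is_negator($word, $type, $neg_adv));
    printf("%-6s %-3s kept=%s same=%s\n", $word, $type, var_export($kept, true), var_export($same, true));
}
```

Only the `not`/`RB` pair is reported as not kept; the other three pass the outer check, so that condition on its own would not explain "funny" or "again" missing the prefix.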


Oh, never mind! It works great; for some reason, when I take live output from the database here, it produces the error described above. But it works perfectly with the string as I gave it in this post :) Thanks!

 

Yeah, there were spaces that were replaced with line-breaks. I assumed that was a copy/paste error.
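
If line breaks can sneak into the stored strings, splitting on any run of whitespace instead of a single space is one way to guard against it. A minimal sketch, assuming the breaks are plain `\n` or `\r\n`; the sample string below is made up to mimic the problem:

```php
<?php
// Same tagged sentence, but with line breaks where two spaces should be.
$input = "It/PRP was/VBD not/RB okay/JJ\nor/CC funny/JJ and/CC I/NN will/MD never/RB buy/VB from/IN them/PRP ever/RB\r\nagain/RB";

// explode(' ', ...) leaves "okay/JJ\nor/CC" glued together as one part.
$by_space = explode(' ', $input);

// preg_split on \s+ treats spaces, tabs and line breaks alike.
$by_whitespace = preg_split('/\s+/', $input, -1, PREG_SPLIT_NO_EMPTY);

echo count($by_space), " parts with explode\n";         // 13
echo count($by_whitespace), " parts with preg_split\n"; // 15
```

With the glued parts, `explode('/', $part)` yields more than two pieces and `list()` only picks up the first word, which would explain tokens such as "funny" or "again" losing their prefix or going missing.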
