Cleaning a String and Extracting only First and last two words in New String


Folks,

 

I need some help with the PHP code to achieve the below requirement:

 

I want to manipulate a string in the below way:

 

1- First remove all special characters and punctuation from the string (clean the string).

(output: $semiclean)

 

2- Remove all alphanumeric and numeric words from the string.

(output: $cleanedstr)

 

3- Now, let's make a new string that has only the first word and the last two words of the cleaned string.

 

Example:

 

Input String:

 

Cuckoo Alex# Rub a 15 Dub Transportation Squirters for the Tub bath toy! age10

 

We need to remove all punctuation (#, ! and so on), all alphanumeric words (age10) and all numeric words (15).

 

Then let's make a new string with only the first word and the last two words of the cleaned string:

 

Desired output:

  Cuckoo bath toy

 

This is what I want to achieve. I tried to code my own, but it's too crude and buggy, and I have not even been able to remove the alphanumeric words.  :-(

 

Can someone please help me out with this?

 

Cheers

Natasha Thomas

 

 

 

<?php

$string="Cuckoo Alex# Rub a 15 Dub Transportation Squirters for the Tub bath toy! age10";

 

$string = preg_replace("/[^a-zA-Z0-9\s]/", " ", $string);

echo $string;

?>

 

I am able to strip off all the special characters, but not able to remove the alphanumeric words. (I can achieve requirement 1, but not requirements 2 and 3 from my first post.)

 

 

Here is where I have the bug:

 

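<?php
$string="Cuckoo Alex# Rub   a 15 Dub Transportation Squirters for the Tub bath toy! age10";

// Replace everything except letters, digits and whitespace with a space.
$string = preg_replace("/[^a-zA-Z0-9\s]/", " ", $string);

$arraystr = array();
$newstr =  array();

$arraystr = explode(' ',$string);

foreach ($arraystr as $key=>$value)
{
    // Problem 1: this yields a non-empty string for nearly every non-empty word,
    // so the test below is true even for words without digits.
    $str1 = preg_replace("/[^0-9\s]/", " ", $value);
    
    if ($str1)
    {
        // Problem 2: $value is a string, not the array, so this unset() is a fatal error.
        unset($value[$key]);
    }
    else
    {
        // Problem 3: this overwrites $newstr instead of appending to it.
        $newstr = $value;
    }
}

print_r( $newstr);
?> 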

 

It outputs nothing.
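
For reference, a minimal sketch of how the loop approach above could be made to work. The function name remove_numeric_words is made up for illustration, and it assumes that a word should be dropped as soon as it contains a single digit:

```php
<?php
// Hypothetical helper: returns the words of $string that contain no digits.
function remove_numeric_words($string)
{
    // Requirement 1: turn anything that is not a letter, digit or whitespace into a space.
    $semiclean = preg_replace('/[^a-zA-Z0-9\s]/', ' ', $string);

    // Split on runs of whitespace so repeated spaces do not yield empty words.
    $words = preg_split('/\s+/', $semiclean, -1, PREG_SPLIT_NO_EMPTY);

    $kept = array();
    foreach ($words as $word) {
        // Requirement 2: skip any word that contains at least one digit.
        if (!preg_match('/[0-9]/', $word)) {
            $kept[] = $word;   // append; a plain assignment would overwrite the array
        }
    }
    return $kept;
}

$string = "Cuckoo Alex# Rub   a 15 Dub Transportation Squirters for the Tub bath toy! age10";
echo implode(' ', remove_numeric_words($string)), "\n";
// prints: Cuckoo Alex Rub a Dub Transportation Squirters for the Tub bath toy
```

Collecting the kept words in a new array avoids calling unset() while iterating, which is where the original loop breaks.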

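<?php
$string = 'Cuckoo Alex# Rub a 15 Dub Transportation Squirters for the Tub bath toy! age10';
// Step 1: drop everything except letters, digits and spaces.
$semiclean = preg_replace('/[^a-z0-9 ]/i', '', $string);
// Step 2: drop every word that contains at least one digit.
$clean = trim(preg_replace('/\b[a-z0-9]*[0-9][a-z0-9]* ?\b/i', '', $semiclean));
// Step 3: keep the first word and the last two words. The \b before the second
// group stops the greedy .* from leaving only the last letter of that word.
echo $final = preg_replace('/^([a-z]+)\b.*\b([a-z]+) ([a-z]+)$/i', '\1 \2 \3', $clean);
?> 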

 

Thank you Sasa,

 

At last I was able to make it work with the below code:

 

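<?php
$string="Cuckoo Alex# Rub   a 15 Dub T%ransportation Squ10irters for the Tub bath toy! age10";

// Replace everything except letters, digits and whitespace with a space.
$string = preg_replace("/[^a-zA-Z0-9\s]/", " ", $string);

// Remove every word that still contains a non-letter (here, a digit).
$string = preg_replace('/\S*[^a-zA-Z\s,\.]+\S*/', '', $string);

print "$string\n";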
?> 
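
The code above covers requirements 1 and 2 only. One possible way to finish requirement 3 (first word plus last two words) is to split the cleaned string into words and pick them by position. This is only a sketch: the function name first_and_last_two is made up for illustration, and returning short inputs unchanged is an assumption, since the thread does not say what should happen with three words or fewer.

```php
<?php
// Hypothetical helper: $cleaned is assumed to be the output of the two cleaning steps.
function first_and_last_two($cleaned)
{
    // Split on runs of whitespace so leftover double spaces do not yield empty words.
    $words = preg_split('/\s+/', trim($cleaned), -1, PREG_SPLIT_NO_EMPTY);

    // Assumption: with three words or fewer there is nothing to drop.
    if (count($words) <= 3) {
        return implode(' ', $words);
    }

    // First word, then the last two words.
    $lastTwo = array_slice($words, -2);
    return $words[0] . ' ' . implode(' ', $lastTwo);
}

echo first_and_last_two('Cuckoo Alex Rub a Dub Transportation Squirters for the Tub bath toy'), "\n";
// prints: Cuckoo bath toy
```

Working on an array of words sidesteps the greedy-matching pitfalls of doing this step with a single regular expression.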


 

 

Now I am stuck on another thing. I have a text file with all the stop words, and I want to remove from the final string any word that matches a word in that file.

 

stopwords.txt has all the bad words and it's in the root folder.

 

I used file('stopwords.txt'); to make an array of words.

 

I know I can do this filtering with foreach(), but is there a more optimal way to filter the bad words out of the array?

 

Cheers

NT
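
One way the stop word filtering could be approached is to flip the stop word list into a lookup table, so each word of the string is checked with isset() instead of being compared against the whole list. This is a sketch under assumptions: the function name remove_stop_words is made up, the match is assumed to be case-insensitive, and stopwords.txt is assumed to hold one word per line.

```php
<?php
// Hypothetical helper: $stopWords is a plain list of words, e.g. as returned by file().
function remove_stop_words($text, array $stopWords)
{
    // Flip the list into a lookup table keyed by lowercase word for fast isset() checks.
    $lookup = array_flip(array_map('strtolower', $stopWords));

    $kept = array();
    foreach (preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY) as $word) {
        if (!isset($lookup[strtolower($word)])) {
            $kept[] = $word;
        }
    }
    return implode(' ', $kept);
}

// Assumption: stopwords.txt holds one stop word per line.
// $stopWords = file('stopwords.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$stopWords = array('a', 'for', 'the');   // inline stand-in so the sketch runs without the file

echo remove_stop_words('Cuckoo Rub a Dub for the Tub', $stopWords), "\n";
// prints: Cuckoo Rub Dub Tub
```

If exact, case-sensitive matching is enough, array_diff($words, $stopWords) on the exploded word array does the same job without an explicit loop.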
