natasha_thomas Posted March 2, 2011

Folks, I need some help with PHP code for the following requirement. I want to manipulate a string as follows:

1- First, remove all special characters and punctuation from the string (clean the string). (output: $semiclean)
2- Remove all alphanumeric and numeric words from the string. (output: $cleanedstr)
3- Build a new string that contains only the first word and the last two words of the cleaned string.

Example:

Input string: Cuckoo Alex# Rub a 15 Dub Transportation Squirters for the Tub bath toy! age10

We need to remove all punctuation (#, ! and so on), all alphanumeric words (age10) and all numeric words (15). Then we build a new string from only the first word and the last two words of the cleaned string.

Desired output: Cuckoo bath toy

This is what I want to achieve. I tried to code my own, but it is too crude and buggy; I have not even been able to remove the alphanumeric words. :-(

Can someone please help me out with this?

Cheers
Natasha Thomas
trq Posted March 2, 2011

"Can someone please help me out with this?"

Post your problematic code and a description of what is problematic.
natasha_thomas Posted March 2, 2011

<?php
$string = "Cuckoo Alex# Rub a 15 Dub Transportation Squirters for the Tub bath toy! age10";
$string = preg_replace("/[^a-zA-Z0-9s]/", " ", $string);
echo $string;
?>

I am able to strip all the special characters, but I am not able to remove the alphanumeric words. (I can achieve requirement 1, but not requirements 2 and 3 from my first post.)
natasha_thomas Posted March 2, 2011

Here is where I had the bug:

<?php
$string = "Cuckoo Alex# Rub a 15 Dub Transportation Squirters for the Tub bath toy! age10";
$string = preg_replace("/[^a-zA-Z0-9s]/", " ", $string);
$arraystr = array();
$newstr = array();
$arraystr = explode(' ', $string);
foreach ($arraystr as $key => $value) {
    $str1 = preg_replace("/[^0-9s]/", " ", $value);
    if ($str1) {
        unset($value[$key]);
    } else {
        $newstr = $value;
    }
}
print_r($newstr);
?>

It outputs nothing.
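A likely reason the loop above prints nothing: `$value` is a string, so `unset($value[$key])` tries to unset a string offset, which PHP rejects with a fatal error, and `$newstr = $value` would overwrite the result instead of appending to it. The sketch below is one way to do requirements 1 and 2 with a loop. The function name `keepAlphabeticWords` is hypothetical, and the code assumes that "alphanumeric word" means any word containing a digit.

```php
<?php
// Hypothetical helper (not from the thread): keep only purely alphabetic words.
function keepAlphabeticWords(string $text): string
{
    // Requirement 1: turn everything that is not a letter, digit or
    // whitespace into a space.
    $semiclean = preg_replace('/[^a-zA-Z0-9\s]/', ' ', $text);

    // Split on runs of whitespace; PREG_SPLIT_NO_EMPTY drops empty pieces.
    $words = preg_split('/\s+/', $semiclean, -1, PREG_SPLIT_NO_EMPTY);

    // Requirement 2: collect the words made of letters only,
    // which skips "15" and "age10".
    $kept = array();
    foreach ($words as $word) {
        if (preg_match('/^[a-z]+$/i', $word) === 1) {
            $kept[] = $word; // append to the array, do not overwrite it
        }
    }

    return implode(' ', $kept);
}

echo keepAlphabeticWords('Cuckoo Alex# Rub a 15 Dub Transportation Squirters for the Tub bath toy! age10'), "\n";
```

Building a new array of kept words avoids modifying the array that is being iterated.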
sasa Posted March 2, 2011

<?php
$string = 'Cuckoo Alex# Rub a 15 Dub Transportation Squirters for the Tub bath toy! age10';
// Step 1: drop everything except letters, digits and spaces
$semiclean = preg_replace('/[^a-z0-9 ]/i', '', $string);
// Step 2: drop every word that contains at least one digit
$clean = trim(preg_replace('/\b[a-z0-9]*[0-9][a-z0-9]* ?\b/i', '', $semiclean));
// Step 3: keep the first word and the last two words. The \b before the second
// group stops the greedy .* from swallowing all but the last letter of that word.
echo $final = preg_replace('/^([a-z]+)\b.*\b([a-z]+) ([a-z]+)$/i', '\1 \2 \3', $clean);
?>
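Requirement 3 can also be handled without a regular expression, by splitting the cleaned string into words and picking them by position. This is only a sketch: `firstAndLastTwoWords` is a hypothetical name, and returning the input unchanged when it has three words or fewer is an assumption, since the thread does not say what should happen in that case.

```php
<?php
// Hypothetical helper (not from the thread): first word plus last two words.
function firstAndLastTwoWords(string $clean): string
{
    // Split on whitespace and ignore empty pieces left by double spaces.
    $words = preg_split('/\s+/', trim($clean), -1, PREG_SPLIT_NO_EMPTY);

    // Assumption: with three words or fewer there is nothing to cut out.
    if (count($words) <= 3) {
        return implode(' ', $words);
    }

    // array_slice() with a negative offset returns the last two elements.
    $picked = array_merge(array($words[0]), array_slice($words, -2));

    return implode(' ', $picked);
}

echo firstAndLastTwoWords('Cuckoo Alex Rub a Dub Transportation Squirters for the Tub bath toy'), "\n";
```

Working on an array of words also makes it easy to change which positions are kept later on.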
natasha_thomas Posted March 2, 2011

Thank you Sasa. At last I was able to make it work with the code below:

<?php
$string = "Cuckoo Alex# Rub a 15 Dub T%ransportation Squ10irters for the Tub bath toy! age10";
$string = preg_replace("/[^a-zA-Z0-9s]/", " ", $string);
$string = preg_replace('/\S*[^a-zA-Z\s,\.]+\S*/', '', $string);
print "$string\n";
?>

Now I am stuck on another thing. I have a text file with all the stop words, and I want to remove from the final string any word that matches a word in that file. stopwords.txt holds all the bad words and sits in the root folder. I used file() on it to make an array of words. I know I can do this filtering with foreach(), but is there a more optimal way to filter the bad words out of an array?

Cheers
NT
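One possible approach to the stop-word question, offered as a sketch and not as the only way: turn the stop words into array keys with `array_flip()` so that each word is checked with a single `isset()` lookup instead of a nested loop. The function name `removeStopWords`, the case-insensitive matching and the sample stop-word list are assumptions made for illustration. For small lists, `array_diff()` on the two word arrays would also work.

```php
<?php
// Hypothetical helper (not from the thread): drop every word found in $stopWords.
function removeStopWords(string $text, array $stopWords): string
{
    // Trim and lower-case the stop words, then use them as array keys so
    // that isset() can test membership without scanning the whole list.
    $lookup = array_flip(array_map('strtolower', array_map('trim', $stopWords)));

    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    // Keep only the words that are not keys of the lookup table
    // (assumption: matching should ignore case).
    $kept = array_filter($words, function ($word) use ($lookup) {
        return !isset($lookup[strtolower($word)]);
    });

    return implode(' ', $kept);
}

// With the file from the post, the list could be loaded like this:
// $stopWords = file('stopwords.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$stopWords = array('a', 'for', 'the'); // hypothetical sample list

echo removeStopWords('Cuckoo Alex Rub a Dub Transportation Squirters for the Tub bath toy', $stopWords), "\n";
```

The `FILE_IGNORE_NEW_LINES` flag matters when reading the file, because without it every entry returned by `file()` keeps its trailing newline and never matches a word.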