
split compound words in a random string


newb


How can I split compound words in a random PHP string?

For example, I have strings such as:

$word = 'ohmygod';

Is it possible for PHP to detect the individual words that have been joined together and separate them, to form:

$word = 'oh my god';

 

thanks.


OK, but the string is always some random combination of joined words, like 'ohmygod' or 'laughoutloud'; it is never the same. However, I need the words split up, not joined.

The function you pointed me to says I need a set array of words to check against for it to work. However, the string could be any combination of joined words, so I don't think it will work for me.

 


I am by no means an expert here, but I *think* this would require some kind of dictionary of words that a script could look up, as PHP itself does not know what constitutes each word in the string.

For example, 'laughoutloud' would require a dictionary containing 'laugh', 'out' and 'loud', so your script could pull each word out of the string. You could do something similar to this (tho it is a very long-winded method and not very efficient):

 

$word = "laughoutloud";

$dict1 = "laugh";
$dict2 = "out";
$dict3 = "loud";

// Get length of string
$len = strlen($word);

$pos = 0;
$string = "";

// Add one character at a time until the collected text matches a dictionary word
while ($pos < $len)
{
   $string = $string . substr($word, $pos, 1);
   $pos = $pos + 1;
   if ($string == $dict1 || $string == $dict2 || $string == $dict3)
   {
      break;
   }
}

echo $string;

 

This will output $string as "laugh". Of course, you'd have to modify this code to handle the rest of the string: it finds the first word and then quits the while loop, so you'd have to make it carry on and process the "outloud" part. You'd also have to build the dictionary. How does the string get constructed in the first place? If it is put together from a list of words, that list could serve as your dictionary.
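
To show what "continue to process the rest" might look like, here is a minimal sketch that keeps walking the string and takes the longest dictionary word at each position. The function name splitJoinedWords and the small $dictionary array are made up for illustration; a real word list would be much larger.

```php
<?php
// Hypothetical helper: split $word into dictionary words, longest match first.
// $dictionary is an assumed, hand-made word list; it is not built into PHP.
function splitJoinedWords(string $word, array $dictionary): ?string
{
    $known = array_flip($dictionary); // word => index, for fast isset() lookups
    $parts = [];
    $pos = 0;
    $len = strlen($word);

    while ($pos < $len) {
        $match = null;
        // Try the longest remaining prefix first, then shorten it.
        for ($size = $len - $pos; $size >= 1; $size--) {
            $candidate = substr($word, $pos, $size);
            if (isset($known[$candidate])) {
                $match = $candidate;
                break;
            }
        }
        if ($match === null) {
            return null; // no dictionary word starts at this position
        }
        $parts[] = $match;
        $pos += strlen($match);
    }

    return implode(' ', $parts);
}

echo splitJoinedWords('laughoutloud', ['laugh', 'out', 'loud']); // laugh out loud
```

Because this is greedy and never backtracks, it can return null for a string that does have a valid split, if an early long match leaves a remainder that is not in the dictionary.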


The compound word strings come from URL names where the words are joined together, for example phpfreaks.com, and I'd like to put a space in between 'php' and 'freaks' somehow.

Also, how could I add a dictionary list to the PHP script?
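
One possible way to get a dictionary list into the script is to keep the words in a plain text file, one per line, and read it into an array. This is only a sketch: loadDictionary is a made-up name, and the temporary file in the demo stands in for whatever word list you actually have (many Linux systems ship one at /usr/share/dict/words).

```php
<?php
// Hypothetical helper: read one word per line from a plain-text file.
function loadDictionary(string $path): array
{
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return []; // file missing or unreadable
    }
    // Normalise to lowercase and drop stray whitespace and duplicates.
    $words = array_map(fn($line) => strtolower(trim($line)), $lines);
    return array_values(array_unique(array_filter($words, 'strlen')));
}

// Demo with a throwaway file; in practice $path would point at your own list.
$path = tempnam(sys_get_temp_dir(), 'dict');
file_put_contents($path, "php\nfreaks\nlaugh\nout\nloud\n");
$dictionary = loadDictionary($path);
unlink($path);

print_r($dictionary);
```

The resulting array can then be checked with in_array(), or flipped with array_flip() if you need faster lookups on a big list.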


Well, I've written something using pspell. It's pretty sloppy, but it works. It can split at most 3 joined words, as that's all I require.

If anyone else thinks they can write something better, let me know.

 

<?php
$sentence = "laughoutloud";

// Splits a string of up to three joined words using the pspell dictionary.
function extractwords($sentence) {

    $pspell_link = pspell_new("en_us");

    // Defaults, so nothing is undefined if no dictionary word is found
    $firstword = $sentence;
    $remaining = "";
    $thirdword = "";

    // First word: try the longest prefix first, then drop one character at a time.
    // Single-letter candidates are never tried (the loop stops at two characters).
    $size = strlen($sentence);
    for ($i = 0; $i < $size - 1; $i++) {
        $currentword = substr($sentence, 0, $size - $i);
        if (pspell_check($pspell_link, $currentword)) {
            $firstword = $currentword;
            $remaining = substr($sentence, strlen($firstword));
            break;
        }
    }

    // Second word: same search on what is left; the rest becomes the third word
    $secondword = $remaining;
    $size = strlen($remaining);
    for ($i = 0; $i < $size - 1; $i++) {
        $secword = substr($remaining, 0, $size - $i);
        if (pspell_check($pspell_link, $secword)) {
            $secondword = $secword;
            $thirdword = substr($remaining, strlen($secondword));
            break;
        }
    }

    echo "$firstword<br />";
    echo "$secondword<br />";
    echo "$thirdword<br />";
}

extractwords($sentence);

?>


I am no expert by any means, so the code example I gave is just basic. It looks like you are on to the right kind of idea. The only problem I have just thought of, and which I don't know how you would solve, is words that are made up of other words, e.g.:

yourself = your and self. Using my dictionary idea, if it detected the word "YOUR" it would separate that out, and you would end up with YOUR and SELF as two different words with a space between them instead of one whole word.

The only other suggestion I can offer: if there is a limited number of possibilities that your URLs will give, you could "hard code" the wording or set up a database for it, e.g.
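
One way this might be handled, assuming the dictionary also contains the whole word "yourself", is to look for the split that uses the fewest words instead of stopping at the first match. The sketch below is only an illustration; fewestWordsSplit and the tiny $dictionary are made-up names and data.

```php
<?php
// Hypothetical helper: split $word into the fewest dictionary words possible.
// Returns null when no complete split exists. $dictionary is an assumed list.
function fewestWordsSplit(string $word, array $dictionary): ?array
{
    $known = array_flip($dictionary);
    $len = strlen($word);
    // $best[$i] holds the shortest list of words covering the first $i characters.
    $best = [0 => []];

    for ($end = 1; $end <= $len; $end++) {
        for ($start = 0; $start < $end; $start++) {
            if (!isset($best[$start])) {
                continue; // the first $start characters cannot be split
            }
            $piece = substr($word, $start, $end - $start);
            if (!isset($known[$piece])) {
                continue; // this piece is not a dictionary word
            }
            $candidate = array_merge($best[$start], [$piece]);
            if (!isset($best[$end]) || count($candidate) < count($best[$end])) {
                $best[$end] = $candidate;
            }
        }
    }

    return $best[$len] ?? null;
}

$dictionary = ['your', 'self', 'yourself', 'help'];
echo implode(' ', fewestWordsSplit('helpyourself', $dictionary)); // help yourself
```

If "yourself" is missing from the dictionary, the same call falls back to "your self", so the result is only as good as the word list behind it.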

 

$sentence = "laughoutloud";

if ($sentence == "laughoutloud")
  $word = "Laugh Out Loud";

if ($sentence == "ohyummyihavepizza")
  $word = "Oh yummy I have pizza";

 

and so on...

 

I am sorry I cannot give any further assistance on this one because I have no other clue how else to tackle this... good luck tho!

