
split compound words in a random string


newb


ok,

but the string is always some random combination of joined words like 'ohmygod' and 'laughoutloud', never the same twice. However, I need the words split up, not joined.

 

The function you've pointed me to says I need a set array of words to check against in order for it to work. However, the string could be any combination of joined words, so I think this is no good for me.

 

I am by no means an expert here, but I *think* that this would require some kind of dictionary of words that a script could look up, as PHP itself does not know what would constitute each word in the list.

 

For example, 'laughoutloud' would require a dictionary that had 'laugh', 'out' and 'loud' in it, so your script could split the string on each word it finds.  You could possibly do something similar to this (though it is a very long method and not very efficient):

 

$word = "laughoutloud";

$dict1 = "laugh";
$dict2 = "out";
$dict3 = "loud";

// Get length of string in bytes, to match the substr() call below
$len = strlen($word);

$pos = 0;
$string = "";

while($pos < $len)
{
   $string = $string . substr($word, $pos, 1);
   $pos = $pos + 1;
   if($string == $dict1 || $string == $dict2 || $string == $dict3)
   {
      break;
   } 
}

echo $string;

 

This will output $string as "laugh".  Of course you'd have to modify this code so that it handles the rest of the string, as it figures out the first word and then quits the while loop, meaning you'd have to make it continue to process the "outloud" part of the initial string.  You'd also have to build the dictionary.  How does the string get constructed in the first place? If it comes from a list of words that get strung together, then that list could serve as your dictionary.
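
To make the loop carry on past the first word, one option is to store each match and reset the candidate string instead of breaking. This is only a rough sketch: the function name splitShortestFirst and the array dictionary are made up for illustration, and because it cuts at the first (shortest) match it can go wrong when one dictionary word is the start of another.

```php
<?php
// Hypothetical helper: grow a candidate one character at a time and cut
// as soon as the candidate is found in $dict (shortest match first).
function splitShortestFirst(string $word, array $dict): array
{
    $found = [];
    $candidate = "";
    $len = strlen($word);

    for ($pos = 0; $pos < $len; $pos++) {
        $candidate .= $word[$pos];
        if (in_array($candidate, $dict, true)) {
            $found[] = $candidate;   // a dictionary word is complete
            $candidate = "";         // start collecting the next word
        }
    }

    // Anything still in $candidate matched no dictionary word.
    if ($candidate !== "") {
        $found[] = $candidate;
    }
    return $found;
}

echo implode(" ", splitShortestFirst("laughoutloud", ["laugh", "out", "loud"]));
// prints "laugh out loud"
```

Any leftover characters that never match are returned as the last element, so the caller can tell when the dictionary was not enough.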

Well, I've written up something using pspell. It's pretty sloppy, but it works. It can split 3 joined words max, as that's all I require.

 

If anyone else thinks they can write something better, let me know.

 

<?php
$sentence = "laughoutloud";

// Splits up to three joined words, taking the longest prefix pspell accepts each time.
function extractwords($sentence) {

    $pspell_link = pspell_new("en_us");

    // Defaults, so nothing is undefined if no word is found
    $firstword = $secondword = $thirdword = "";
    $remaining = "";

    // First word: longest prefix of $sentence that passes the spell check
    $size = strlen($sentence);
    for ($i = 0; $i < $size - 1; $i++) {
        $currentword = substr($sentence, 0, $size - $i);
        if (pspell_check($pspell_link, $currentword)) {
            $firstword = $currentword;
            $remaining = substr($sentence, strlen($firstword));
            break;
        }
    }

    // Second word: longest prefix of what is left; the rest is the third word
    $size = strlen($remaining);
    for ($i = 0; $i < $size - 1; $i++) {
        $secword = substr($remaining, 0, $size - $i);
        if (pspell_check($pspell_link, $secword)) {
            $secondword = $secword;
            $thirdword = substr($remaining, strlen($secondword));
            break;
        }
    }

    echo "$firstword<br />";
    echo "$secondword<br />";
    echo "$thirdword<br />";
}

extractwords($sentence);

?>
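
If the three-word limit ever becomes a problem, the same longest-prefix idea can be written recursively so it handles any number of words and backs up when a choice leads to a dead end. This is just a sketch under some assumptions: splitWords and $isWord are hypothetical names, and the word check is a plain array lookup here so it runs without pspell. With pspell, the callable could presumably wrap pspell_check() instead.

```php
<?php
// Hypothetical helper: returns the list of words, or null if no full split exists.
// $isWord is any callable that takes a string and returns true for a known word.
function splitWords(string $text, callable $isWord): ?array
{
    if ($text === "") {
        return [];                       // nothing left: the split succeeded
    }
    // Try the longest prefix first, then shorter ones.
    for ($len = strlen($text); $len >= 1; $len--) {
        $prefix = substr($text, 0, $len);
        if (!$isWord($prefix)) {
            continue;
        }
        $rest = splitWords((string) substr($text, $len), $isWord);
        if ($rest !== null) {
            return array_merge([$prefix], $rest);
        }
        // The remainder could not be split: fall through to a shorter prefix.
    }
    return null;
}

// Stand-in word list for illustration only.
$words = ["oh", "my", "god", "laugh", "out", "loud"];
$isWord = function (string $w) use ($words): bool {
    return in_array($w, $words, true);
};

print_r(splitWords("ohmygod", $isWord));
```

Unlike the two fixed loops, this returns null when the string cannot be fully split, which makes failures easier to spot.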

I am no expert by any means, so the code example I have given is just basic.  It looks like you are onto the same kind of idea. The only problem I have just thought of, which I don't know how you would solve, is with words that are made up of other words, e.g.:

 

yourself = your and self.  Using my dictionary idea, if it detected the word "YOUR" it would separate that out and you would end up with YOUR and SELF as two different words with a space between instead of one whole word. 
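
To show how that ambiguity looks in practice, here is a small sketch that lists every possible split and then picks one. The names allSplits and $dict are made up, and "fewest words wins" is only one possible tie-break, not a rule that is always right. It also tries every combination, so it is only sensible for short strings.

```php
<?php
// Hypothetical helper: returns every way $text can be cut into words from $dict.
function allSplits(string $text, array $dict): array
{
    if ($text === "") {
        return [[]];                     // one way to split nothing: no words
    }
    $results = [];
    for ($len = 1; $len <= strlen($text); $len++) {
        $prefix = substr($text, 0, $len);
        if (!in_array($prefix, $dict, true)) {
            continue;
        }
        foreach (allSplits((string) substr($text, $len), $dict) as $rest) {
            $results[] = array_merge([$prefix], $rest);
        }
    }
    return $results;
}

$dict = ["your", "self", "yourself"];
$splits = allSplits("yourself", $dict);  // two candidates: "your self" and "yourself"

// Assumed tie-break: prefer the split with the fewest words.
usort($splits, function (array $a, array $b): int {
    return count($a) <=> count($b);
});

echo implode(" ", $splits[0]);
// prints "yourself"
```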

 

The only other suggestion I can offer is if there is a limited number of possibilities that your URLs will give, then you could "hard code" or set up a database for the wordage... e.g.

 

$sentence = "laughoutloud";

if ($sentence == "laughoutloud")
  $word = "Laugh Out Loud";

if ($sentence == "ohyummyihavepizza")
  $word = "Oh yummy I have pizza";

 

and so on...
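
If the list of possibilities really is limited, an array lookup may be easier to maintain than a long chain of if statements. A minimal sketch, where $phrases and lookupPhrase are made-up names and the two entries are just the examples from above:

```php
<?php
// Hypothetical lookup table: joined string => readable phrase.
$phrases = [
    "laughoutloud"      => "Laugh Out Loud",
    "ohyummyihavepizza" => "Oh yummy I have pizza",
];

function lookupPhrase(string $sentence, array $phrases): string
{
    // Fall back to the original string when there is no entry for it.
    return $phrases[$sentence] ?? $sentence;
}

echo lookupPhrase("laughoutloud", $phrases);
// prints "Laugh Out Loud"
```

The same table could live in a database instead, with the joined string as the key.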

 

I am sorry I cannot give any further assistance on this one because I have no other clue how else to tackle this... good luck tho!

