
[SOLVED] Matching 2 strings of text


sug15


Let's say I have two strings of text:

"Does the dog jump over the lazy sheep or the spotted cow?"

and

"The dog jumps over the lazy sheep."

 

I want to find what the two sentences have in common. In this case, that would be "over the lazy sheep".

 

How can I do this?


One way would be to split both strings on the space character and then run array_intersect() against the results:

 

$first_sentence = 'Does the dog jump over the lazy sheep or the spotted cow?';
$second_sentence = 'The dog jumps over the lazy sheep.';

$first_components = explode(' ', $first_sentence);
$second_components = explode(' ', $second_sentence);

$duplicate_words = array_intersect($first_components, $second_components);
print_r($duplicate_words);

 

Note that this doesn't handle punctuation (the second sentence ends in "sheep.", which won't match "sheep") and is case-sensitive. To avoid case sensitivity, run strtolower() on the original strings, and use str_replace() to remove any characters you don't want interfering, such as punctuation.
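A minimal sketch of that suggestion, assuming the punctuation list below is all you need to strip (it is an illustrative choice, not an exhaustive one):

```php
<?php
$first_sentence  = 'Does the dog jump over the lazy sheep or the spotted cow?';
$second_sentence = 'The dog jumps over the lazy sheep.';

// Characters to discard before splitting; this list is an assumption, extend as needed.
$punctuation = array('.', ',', '?', '!', ';', ':');

// Lowercase first so "The" and "the" compare equal, then strip punctuation.
$first_clean  = str_replace($punctuation, '', strtolower($first_sentence));
$second_clean = str_replace($punctuation, '', strtolower($second_sentence));

// Words present in both sentences (keys come from the first array).
$duplicate_words = array_intersect(
    explode(' ', $first_clean),
    explode(' ', $second_clean)
);
print_r($duplicate_words);
```

"jump" still doesn't match "jumps"; this only fixes case and punctuation.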

 

Note that array_intersect() preserves the keys of the first array. Have a look in the manual at the other intersect functions if you want to play around with how keys are kept or compared.
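For example, with two made-up word lists: array_intersect() keeps the first array's keys, array_values() renumbers them, and array_intersect_assoc() (one of the other intersect functions) only counts a value if its key matches too:

```php
<?php
$a = array('does', 'the', 'dog', 'jump');
$b = array('the', 'dog', 'jumps');

// Keys of $a survive, so the result starts at index 1, not 0.
$common = array_intersect($a, $b);      // array(1 => 'the', 2 => 'dog')

// Renumber from 0 if the gaps get in the way of later loops.
$reindexed = array_values($common);     // array(0 => 'the', 1 => 'dog')

// Value AND key must match, i.e. the word must sit at the same position.
$same_position = array_intersect_assoc($a, $b); // empty: no word shares a position

print_r($common);
print_r($reindexed);
print_r($same_position);
```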


$first_components = explode(' ', $first_sentence);
$second_components = explode(' ', $second_sentence);

 

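This can also be written as follows, which may even be better: str_word_count() with format 1 returns just the words, so punctuation such as the trailing "?" or "." isn't left attached to them: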

 

$first_components = str_word_count($first_sentence, 1);
$second_components = str_word_count($second_sentence, 1);
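A quick sketch of the str_word_count() version. Per the PHP manual, a "word" here is a run of letters that may also contain ' and -. The comparison is still case-sensitive, so this assumes you lowercase the sentences first:

```php
<?php
// Format 1 returns the words as an array; "sheep." comes back as "sheep".
$first_components  = str_word_count(strtolower('Does the dog jump over the lazy sheep or the spotted cow?'), 1);
$second_components = str_word_count(strtolower('The dog jumps over the lazy sheep.'), 1);

// Same intersection as before, now without punctuation getting in the way.
$duplicate_words = array_intersect($first_components, $second_components);
print_r(array_values($duplicate_words));
```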


That's a pretty complex task. "The" also appears in both of them, and so does "the dog".

 

Anyway, try experimenting with something like this:

 

<?php
$string1 = 'Does the dog jump over the lazy sheep or the spotted cow?';
$string2 = 'The dog jumps over the lazy sheep.';

$similarities = array_intersect(
    explode(' ', preg_replace('#[^a-z ]#', '', strtolower($string1))),
    explode(' ', preg_replace('#[^a-z ]#', '', strtolower($string2)))
);

print_r($similarities);

 

Edit: Someone beat me to it.

 

Edit 2: Why would this not be a "similarity"?

"Does the dog jump over the lazy sheep or the spotted cow?"

"The dog jumps over the lazy sheep."



Well, using array_intersect and explode gives you the words both strings have in common. But maybe the OP wants the substrings both strings have in common, and wants a different result when the shared words aren't in the same order:

 

"Does the dog jump over the lazy sheep or the spotted cow?"

"The lazy dog jumps over the sheep."

 

"over the lazy sheep" is only present in the first sentence and not in the second.

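To make the difference concrete, here is a small check on that example, assuming both sentences have already been lowercased and stripped of punctuation by hand. strpos() matches raw substrings, so "the" would also match inside "there"; padding the strings with spaces would avoid that.

```php
<?php
// Both sentences lowercased and stripped of punctuation by hand (assumption).
$first  = 'does the dog jump over the lazy sheep or the spotted cow';
$second = 'the lazy dog jumps over the sheep';
$phrase = 'over the lazy sheep';

$phrase_words = explode(' ', $phrase);

// Word-level check: is every word of the phrase present somewhere in each sentence?
$words_in_both = !array_diff($phrase_words, explode(' ', $first))
    && !array_diff($phrase_words, explode(' ', $second));

// Substring check: does the phrase appear as one consecutive run?
$in_first  = strpos($first, $phrase) !== false;
$in_second = strpos($second, $phrase) !== false;

var_dump($words_in_both, $in_first, $in_second); // bool(true) bool(true) bool(false)
```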

Now I'm working on breaking a string up into every possible run of consecutive words. For example, "the dog jumps over the lazy sheep" would break into:

 

 

  • the
  • the dog
  • the dog jumps
  • the dog jumps over
  • the dog jumps over the
  • the dog jumps over the lazy
  • the dog jumps over the lazy sheep
  • dog
  • dog jumps
  • dog jumps over
  • dog jumps over the
  • dog jumps over the lazy
  • dog jumps over the lazy sheep
  • jumps
  • jumps over
  • jumps over the
  • jumps over the lazy
  • jumps over the lazy sheep
  • over
  • over the
  • over the lazy
  • over the lazy sheep
  • the
  • the lazy
  • the lazy sheep
  • lazy
  • lazy sheep
  • sheep
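As a sanity check on the length of that list: a run is fixed by the position i of its first word and j of its last word, with i ≤ j, so a sentence of n words has n(n+1)/2 runs (the two occurrences of "the" count separately):

```latex
\#\{(i, j) : 1 \le i \le j \le n\} = \sum_{i=1}^{n} (n - i + 1) = \frac{n(n+1)}{2},
\qquad n = 7 \;\Rightarrow\; \frac{7 \cdot 8}{2} = 28
```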

 

 

Long list :P And yes, I want the common substrings, but I think I can get there by modifying your methods. Once I have the similarities, I can find the one I want by checking which one is the longest. Does anyone have a way to generate all of these runs of consecutive words?


$words = str_word_count('the dog jumps over the lazy sheep', 1);
$sizeof = sizeof($words);
for ($k = 0; $k < $sizeof; ++$k) { // $k: offset of the first word in the run
    for ($i = $k; $i < $sizeof; ++$i) { // $i: offset of the last word in the run
        $run = array();
        for ($j = $k; $j <= $i; ++$j) {
            $run[] = $words[$j];
        }
        // separate the words with spaces instead of gluing them together
        echo implode(' ', $run), '<br>';
    }
}

Outputs:

the
the dog
the dog jumps
the dog jumps over
the dog jumps over the
the dog jumps over the lazy
the dog jumps over the lazy sheep
dog
dog jumps
dog jumps over
dog jumps over the
dog jumps over the lazy
dog jumps over the lazy sheep
jumps
jumps over
jumps over the
jumps over the lazy
jumps over the lazy sheep
over
over the
over the lazy
over the lazy sheep
the
the lazy
the lazy sheep
lazy
lazy sheep
sheep


Like this?

 

<?php
$string1 = 'Does the dog jump over the lazy sheep or the spotted cow?';
$string2 = 'The dog jumps over the lazy sheep.';

$similarities = array_intersect(
   explode(' ', preg_replace('#[^a-z ]#', '', strtolower($string1))),
   explode(' ', preg_replace('#[^a-z ]#', '', strtolower($string2)))
);

$similarities = array_values($similarities);

$count = count($similarities);
$matches = array();

for ($i = 0; $i < $count; ++$i) {
    for ($x = 0, $xMax = $count - $i; $x < $xMax; ++$x) {
        $m = array();
        for ($j = $i, $jMax = $i + $x; $j <= $jMax; ++$j) {
            $m[] = $similarities[$j];
        }

        $m = join(' ', $m);
        if (!in_array($m, $matches)) {
            $matches[] = $m;
        }
    }
}

print_r($matches);


Alright, here's what I've come up with:

 

<?php
$words1 = str_word_count('does the dog jump over the lazy sheep or the spotted cow', 1);
$sizeof = sizeof($words1);
for ($k = 0; $k < $sizeof; ++$k) {//specifies offset
    for ($i = $k; $i < $sizeof; ++$i) {
        for ($j = $k; $j <= $i; ++$j) {
            $combination[] = $words1[$j];
        }
        $combinations1[] = implode(' ', $combination);
        unset($combination);
    }
}

$words2 = str_word_count('the dog jumps over the lazy sheep', 1);
$sizeof = sizeof($words2);
for ($k = 0; $k < $sizeof; ++$k) {//specifies offset
    for ($i = $k; $i < $sizeof; ++$i) {
        for ($j = $k; $j <= $i; ++$j) {
            $combination[] = $words2[$j];
        }
        $combinations2[] = implode(' ', $combination);
        unset($combination);
    }
}


$similarities = array_intersect(
   $combinations1,
   $combinations2
);

print_r($similarities);
?>

I still need to move this into some functions, tidy it up a bit and pick out the longest match, but it gets the job done.
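One way to tidy it up, sketched with two hypothetical helpers (word_runs() and longest_common_run() are not from the thread). It generates the runs with array_slice() instead of a third loop, lowercases before splitting so "The" matches "the", and picks the common run with the most words, keeping the first one found on a tie:

```php
<?php
// Hypothetical helper: every run of consecutive words in a sentence,
// lowercased, with punctuation dropped by str_word_count().
function word_runs($sentence)
{
    $words = str_word_count(strtolower($sentence), 1);
    $count = count($words);
    $runs = array();
    for ($start = 0; $start < $count; ++$start) {
        for ($length = 1; $start + $length <= $count; ++$length) {
            $runs[] = implode(' ', array_slice($words, $start, $length));
        }
    }
    return $runs;
}

// Hypothetical helper: the shared run with the most words ('' if none).
function longest_common_run($first, $second)
{
    $common = array_unique(array_intersect(word_runs($first), word_runs($second)));
    $longest = '';
    foreach ($common as $run) {
        // str_word_count() without a format argument returns the word count.
        if (str_word_count($run) > str_word_count($longest)) {
            $longest = $run;
        }
    }
    return $longest;
}

echo longest_common_run(
    'Does the dog jump over the lazy sheep or the spotted cow?',
    'The dog jumps over the lazy sheep.'
); // prints: over the lazy sheep
```

Counting words rather than characters is a design choice here; swap in strlen() if you'd rather prefer the longest string.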

