sug15 Posted September 1, 2009

Let's say I have two strings of text: "Does the dog jump over the lazy sheep or the spotted cow?" and "The dog jumps over the lazy sheep." I want to find what the two sentences have in common; in this case it would be "over the lazy sheep". How can I do this?

Quote Link to comment https://forums.phpfreaks.com/topic/172731-solved-matching-2-strings-of-text/ Share on other sites More sharing options...
akitchin Posted September 1, 2009

One way would be to split by the space characters, and then run array_intersect() against the results:

$first_sentence = 'Does the dog jump over the lazy sheep or the spotted cow?';
$second_sentence = 'The dog jumps over the lazy sheep.';
// split each sentence into words on single spaces
$first_components = explode(' ', $first_sentence);
$second_components = explode(' ', $second_sentence);
// keep the words of the first sentence that also occur in the second
$duplicate_words = array_intersect($first_components, $second_components);
print_r($duplicate_words);

Note that this won't take punctuation into account (so "sheep." and "sheep" count as different words) and will be case-sensitive ("The" and "the" are different). To avoid case sensitivity, you can use strtolower() on the original strings, as well as str_replace() to remove any characters you don't want interfering, such as punctuation. Have a look in the manual at the other intersect functions if you want to play around with key preservation.
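A minimal sketch of the normalisation described above, assuming that stripping a fixed list of punctuation characters is good enough for these sentences. The helper name normalise_words() is hypothetical, not a PHP built-in:

```php
<?php
// Hypothetical helper: lower-case the sentence, strip common punctuation
// with str_replace(), then split on single spaces.
function normalise_words($sentence)
{
    $punctuation = array('.', ',', '?', '!', ';', ':'); // assumed list; extend as needed
    $clean = str_replace($punctuation, '', strtolower($sentence));
    return explode(' ', $clean);
}

$first_sentence  = 'Does the dog jump over the lazy sheep or the spotted cow?';
$second_sentence = 'The dog jumps over the lazy sheep.';

// array_intersect() keeps the keys of the first array, i.e. the word positions
$duplicate_words = array_intersect(
    normalise_words($first_sentence),
    normalise_words($second_sentence)
);
print_r($duplicate_words);
```

With both sentences normalised, "sheep" now matches as well; "the" still shows up three times because it occurs three times in the first sentence.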
ignace Posted September 1, 2009

What exactly do you mean by similarities? Just whether one string contains a word that the other string also has?
ignace Posted September 1, 2009

$first_components = explode(' ', $first_sentence);
$second_components = explode(' ', $second_sentence);

Can also be written as follows, which may even be better, because str_word_count() returns only the words themselves, without trailing punctuation such as "?" or ".":

$first_components = str_word_count($first_sentence, 1);
$second_components = str_word_count($second_sentence, 1);
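For comparison, a small sketch of the str_word_count() variant. Lower-casing the input first is an addition here, since str_word_count() itself does not change case:

```php
<?php
// str_word_count($string, 1) returns an array of the words it finds.
// A "word" is a run of letters that may contain, but not start with,
// an apostrophe or hyphen, so trailing "?" and "." are dropped.
// Case is kept, hence strtolower() first.
$first_sentence  = 'Does the dog jump over the lazy sheep or the spotted cow?';
$second_sentence = 'The dog jumps over the lazy sheep.';

$first_components  = str_word_count(strtolower($first_sentence), 1);
$second_components = str_word_count(strtolower($second_sentence), 1);

print_r(array_intersect($first_components, $second_components));
```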
Daniel0 Posted September 1, 2009

That's a pretty complex task. "The" also appears in both of them, and so does "the dog". Anyway, try to experiment with something like this:

<?php
$string1 = 'Does the dog jump over the lazy sheep or the spotted cow?';
$string2 = 'The dog jumps over the lazy sheep.';

// lower-case each string, strip everything except letters and spaces, split on spaces
$similarities = array_intersect(
    explode(' ', preg_replace('#[^a-z ]#', '', strtolower($string1))),
    explode(' ', preg_replace('#[^a-z ]#', '', strtolower($string2)))
);
print_r($similarities);

Edit: Someone beat me to it.

Edit 2: Why would this not be a "similarity"?
"Does the dog jump over the lazy sheep or the spotted cow?"
"The dog jumps over the lazy sheep."
sug15 (Author) Posted September 1, 2009

Thanks a lot everyone. I'll play around with array_intersect and the code you guys gave me and see what I come up with.
ignace Posted September 1, 2009

Edit 2: Why would this not be a "similarity"?
"Does the dog jump over the lazy sheep or the spotted cow?"
"The dog jumps over the lazy sheep."

Well, using array_intersect() and explode() will give you the words both strings have in common. But maybe the OP wants the substrings both strings have in common, and wants to handle it differently when a phrase is not shared as a whole:
"Does the dog jump over the lazy sheep or the spotted cow?"
"The lazy dog jumps over the sheep."
"over the lazy sheep" is present only in the first sentence, not in the second, even though every one of its words appears in both.
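To make the distinction concrete, a sketch using the second pair of sentences above. The phrase is hard-coded purely for illustration, and strpos() is a plain substring test, so in general it could also match across partial words (e.g. "the" inside "other"):

```php
<?php
// Contrast two checks for the phrase "over the lazy sheep":
// (a) are all of its words present in the other sentence?
// (b) does the phrase itself appear, in order, in the other sentence?
$phrase = 'over the lazy sheep';
$first  = strtolower('Does the dog jump over the lazy sheep or the spotted cow?');
$second = strtolower('The lazy dog jumps over the sheep.');

// (a) word level: array_diff() is empty when every phrase word occurs in $second
$words_shared = count(array_diff(str_word_count($phrase, 1), str_word_count($second, 1))) === 0;

// (b) substring level: plain strpos() on the lower-cased text
$in_first  = strpos($first, $phrase) !== false;
$in_second = strpos($second, $phrase) !== false;

var_dump($words_shared, $in_first, $in_second);
```

The word-level check says the phrase is shared, while the substring check finds it only in the first sentence.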
Daniel0 Posted September 1, 2009

Right, but there are several substrings that the two strings have in common. When is it enough to regard it as a match?
ignace Posted September 1, 2009

Right, but there are several substrings that the two strings have in common. When is it enough to regard it as a match?

I don't know; I was just asking what he meant by similarities.
sug15 (Author) Posted September 1, 2009

Now I'm working on breaking up a string into all possible parts. For example, "the dog jumps over the lazy sheep" would break into:
the
the dog
the dog jumps
the dog jumps over
the dog jumps over the
the dog jumps over the lazy
the dog jumps over the lazy sheep
dog
dog jumps
dog jumps over
dog jumps over the
dog jumps over the lazy
dog jumps over the lazy sheep
jumps
jumps over
jumps over the
jumps over the lazy
jumps over the lazy sheep
over
over the
over the lazy
over the lazy sheep
the
the lazy
the lazy sheep
lazy
lazy sheep
sheep

That's a long list, and yes, I want similar substrings, but I think I can achieve that by modifying your methods. Then, once I have the similarities, I can find the one I want simply by checking which one is the longest. Does anyone have a way to generate all possible word combinations?
ignace Posted September 1, 2009

$words = str_word_count('the dog jumps over the lazy sheep', 1);
$sizeof = sizeof($words);
for ($k = 0; $k < $sizeof; ++$k) { // specifies offset
    for ($i = $k; $i < $sizeof; ++$i) { // $i is the last word of the run
        for ($j = $k; $j <= $i; ++$j) {
            echo $words[$j]; // words $k..$i, printed with no separator
        }
        echo '<br>';
    }
}

Outputs:
the
thedog
thedogjumps
thedogjumpsover
thedogjumpsoverthe
thedogjumpsoverthelazy
thedogjumpsoverthelazysheep
dog
dogjumps
dogjumpsover
dogjumpsoverthe
dogjumpsoverthelazy
dogjumpsoverthelazysheep
jumps
jumpsover
jumpsoverthe
jumpsoverthelazy
jumpsoverthelazysheep
over
overthe
overthelazy
overthelazysheep
the
thelazy
thelazysheep
lazy
lazysheep
sheep
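The loop above prints the words glued together ("thedog"). A variant that keeps the spaces and collects the runs in an array, using array_slice() and implode() in place of the innermost loop (a sketch, not part of the original reply). For n words there are n(n + 1)/2 runs, so 28 for this sentence:

```php
<?php
// Build every contiguous run of words, joined with single spaces.
$words = str_word_count('the dog jumps over the lazy sheep', 1);
$n = count($words);
$runs = array();
for ($start = 0; $start < $n; ++$start) {
    for ($length = 1; $start + $length <= $n; ++$length) {
        // array_slice($array, $offset, $length) returns the run; implode() adds the spaces
        $runs[] = implode(' ', array_slice($words, $start, $length));
    }
}
echo implode("\n", $runs), "\n";
```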
Daniel0 Posted September 1, 2009

Like this?

<?php
$string1 = 'Does the dog jump over the lazy sheep or the spotted cow?';
$string2 = 'The dog jumps over the lazy sheep.';

// words that occur in both strings (lower-cased, letters and spaces only)
$similarities = array_intersect(
    explode(' ', preg_replace('#[^a-z ]#', '', strtolower($string1))),
    explode(' ', preg_replace('#[^a-z ]#', '', strtolower($string2)))
);
$similarities = array_values($similarities); // reindex 0..n-1
$count = count($similarities);
$matches = array();
for ($i = 0; $i < $count; ++$i) { // start of the run
    for ($x = 0, $xMax = $count - $i; $x < $xMax; ++$x) { // run length minus one
        $m = array();
        for ($j = $i, $jMax = $i + $x; $j <= $jMax; ++$j) {
            $m[] = $similarities[$j];
        }
        $m = join(' ', $m);
        if (!in_array($m, $matches)) { // skip duplicate runs
            $matches[] = $m;
        }
    }
}
print_r($matches);
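One thing to watch with this approach: the runs are built from the list of shared words, so some of them, such as "dog over" or "sheep the", appear in neither sentence. A sketch of a filter that keeps only candidates occurring as whole words in both strings, assuming the strings have already been lower-cased and stripped of punctuation. contains_phrase() is a hypothetical helper, and the candidate list is hard-coded for illustration:

```php
<?php
// Hypothetical helper: true when $phrase occurs as whole words in $text.
// Padding both with spaces stops "the" from matching inside "other".
function contains_phrase($text, $phrase)
{
    return strpos(' ' . $text . ' ', ' ' . $phrase . ' ') !== false;
}

$a = 'does the dog jump over the lazy sheep or the spotted cow';
$b = 'the dog jumps over the lazy sheep';

// A few candidates of the kind the loop above produces
$candidates = array('the dog', 'dog over', 'over the lazy sheep', 'sheep the');

$common = array();
foreach ($candidates as $m) {
    if (contains_phrase($a, $m) && contains_phrase($b, $m)) {
        $common[] = $m;
    }
}
print_r($common);
```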
sug15 (Author) Posted September 1, 2009

Awesome, thanks guys.
sug15 (Author) Posted September 1, 2009

Alright, here's what I've come up with:

<?php
$words1 = str_word_count('does the dog jump over the lazy sheep or the spotted cow', 1);
$sizeof = sizeof($words1);
$combinations1 = array();
for ($k = 0; $k < $sizeof; ++$k) { // specifies offset
    for ($i = $k; $i < $sizeof; ++$i) {
        $combination = array();
        for ($j = $k; $j <= $i; ++$j) {
            $combination[] = $words1[$j];
        }
        $combinations1[] = implode(' ', $combination); // words $k..$i joined with spaces
    }
}

$words2 = str_word_count('the dog jumps over the lazy sheep', 1);
$sizeof = sizeof($words2);
$combinations2 = array();
for ($k = 0; $k < $sizeof; ++$k) { // specifies offset
    for ($i = $k; $i < $sizeof; ++$i) {
        $combination = array();
        for ($j = $k; $j <= $i; ++$j) {
            $combination[] = $words2[$j];
        }
        $combinations2[] = implode(' ', $combination);
    }
}

// runs of consecutive words that appear in both sentences
$similarities = array_intersect($combinations1, $combinations2);
print_r($similarities);
?>

I still need to add some functions, tidy it up a bit, and pick out the longest match, but it gets the job done.
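The remaining step, picking the longest match, could be done by comparing the entries of $similarities by length. As an alternative that avoids building every run of both sentences, here is a sketch of the standard longest-common-substring dynamic programme applied to words instead of characters. This is a swapped-in technique, not the approach used in the thread, and the function name longest_common_words() is hypothetical:

```php
<?php
// Hypothetical helper: longest run of consecutive words shared by two sentences.
// $len[$i][$j] = length of the common run ending at $a[$i - 1] and $b[$j - 1].
function longest_common_words($first, $second)
{
    $a = str_word_count(strtolower($first), 1);
    $b = str_word_count(strtolower($second), 1);
    $best_len = 0;
    $best_end = 0; // index in $a just past the best run
    $len = array_fill(0, count($a) + 1, array_fill(0, count($b) + 1, 0));
    for ($i = 1; $i <= count($a); ++$i) {
        for ($j = 1; $j <= count($b); ++$j) {
            if ($a[$i - 1] === $b[$j - 1]) {
                // extend the run that ended one word earlier in both sentences
                $len[$i][$j] = $len[$i - 1][$j - 1] + 1;
                if ($len[$i][$j] > $best_len) {
                    $best_len = $len[$i][$j];
                    $best_end = $i;
                }
            }
        }
    }
    return implode(' ', array_slice($a, $best_end - $best_len, $best_len));
}

echo longest_common_words(
    'Does the dog jump over the lazy sheep or the spotted cow?',
    'The dog jumps over the lazy sheep.'
), "\n";
```

The table has one cell per pair of word positions, so the work grows with the product of the two word counts. Ties go to the run that ends first in the first sentence, and an empty string is returned when no word is shared.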