ajay600 Posted March 15, 2010
I need to identify the text that is the same in two web pages and remove it. I need to store the text between <b> and </b> in an array. For example, for <b> hello everyone</b> I need to store " hello everyone" in an array. I will do this for all <b> tags in page 1 and then page 2, and then compare the text from both pages to find matches and remove them. Please help me code this in PHP.
Link to comment https://forums.phpfreaks.com/topic/195273-identify-same-texts-in-2-web-pages/
trq Posted March 15, 2010
Where are you stuck? Post your code.
slurpee Posted March 15, 2010
Hmm, let's hope I understand you correctly. My code might be a little lazy, but it works for my simple example and might start you off on the right foot. First it matches all <b>.*</b> strings in $text1. Then it replaces any of those exact <b>...</b> strings found in $text2 with an empty string.

$text1 = "Hello <b>test 1</b> this is a <b>test 2</b> and <b>test 3</b>!";
$text2 = "Hello <b>test 1</b> this is a <b>test 4</b> and <b>test 2</b>!";
preg_match_all(":<b>.*</b>:msU", $text1, $m);
print str_replace($m[0], "", $text2);

Output: "Hello  this is a <b>test 4</b> and !"
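If the goal is to get the text between the tags into an array, as the original question describes, the same regex idea can be sketched with a capture group. This is only a minimal sketch: boldTexts() is a made-up helper name, not something from the thread, and it assumes plain <b> tags with no attributes and no nesting.

```php
<?php
// Hypothetical helper (not from the thread): returns the trimmed text
// found inside each <b>...</b> pair of the given HTML string.
function boldTexts(string $html): array
{
    // Group 1 holds the text between the tags.
    // U makes .* lazy; s lets . match newlines inside a tag.
    preg_match_all(':<b>(.*)</b>:sU', $html, $m);
    return array_map('trim', $m[1]);
}

$text1 = "Hello <b>test 1</b> this is a <b>test 2</b> and <b>test 3</b>!";
$text2 = "Hello <b>test 1</b> this is a <b>test 4</b> and <b>test 2</b>!";

// Texts that appear in <b> tags on both pages.
$common = array_values(array_intersect(boldTexts($text1), boldTexts($text2)));
print_r($common);
```

Here $common ends up holding "test 1" and "test 2", which are the strings you would then remove from either page. A regex like this is fragile on real-world HTML, so it is probably best treated as a quick test rather than a robust parser.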
ajay600 Posted March 16, 2010 Author
I found all the b tags, but how do I take the text content from the b tags so that I can compare the texts and remove the repeated ones? Please help.

<?php
$doc = new DOMDocument(); // An instance of DOMDocument
@$doc->loadHTMLFile('http://www.web-source.net/web_design_tips/');
$doc2 = new DOMDocument(); // An instance of DOMDocument
@$doc2->loadHTMLFile('http://www.web-source.net/html_codes_chart.htm');
$xpath = new DOMXPath($doc);
$xpath2 = new DOMXPath($doc2);
$List=array();
$List2=array();
$List[] = $doc->getElementsByTagName("b");
$List2[] = $doc2->getElementsByTagName("b");
$textBoth = array_intersect($List, $List2);
foreach ($textBoth as $text) { // Loops through the src strings that are common to both documents
    $text->parentNode->removeChild($text);
}
echo $doc->saveHTML();
?>
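One likely reason the code above does not work is that $List[] = $doc->getElementsByTagName("b") stores a single DOMNodeList object in the array, and array_intersect() compares values as strings, which a DOMNodeList cannot be converted to. A possible way around it, sketched below, is to read each node's textContent and compare those strings instead. The helper name boldNodesByText() is made up for illustration, and the inline sample markup is only a stand-in for the two pages loaded with loadHTMLFile().

```php
<?php
// Hypothetical helper (not from the thread): maps the trimmed text of each
// <b> element to the list of DOM nodes that contain that text.
function boldNodesByText(DOMDocument $doc): array
{
    $nodes = [];
    foreach ($doc->getElementsByTagName('b') as $b) {
        $nodes[trim($b->textContent)][] = $b;
    }
    return $nodes;
}

// Assumption: inline sample markup instead of the real pages.
$doc = new DOMDocument();
@$doc->loadHTML('<p>Hello <b>test 1</b> this is a <b>test 2</b> and <b>test 3</b>!</p>');
$doc2 = new DOMDocument();
@$doc2->loadHTML('<p>Hello <b>test 1</b> this is a <b>test 4</b> and <b>test 2</b>!</p>');

$first  = boldNodesByText($doc);
$second = boldNodesByText($doc2);

// Compare the text strings (the array keys), not the DOM objects.
$common = array_keys(array_intersect_key($first, $second));

// Remove from the first document every <b> whose text also occurs in the second.
foreach ($common as $text) {
    foreach ($first[$text] as $node) {
        $node->parentNode->removeChild($node);
    }
}
echo $doc->saveHTML();
```

The nodes are collected first and removed afterwards because the list returned by getElementsByTagName() is live, so removing nodes while looping over it can skip elements. With the sample markup, the <b> elements containing "test 1" and "test 2" are removed from $doc and "test 3" is kept.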
This topic is now archived and is closed to further replies.