
identify same texts in 2 web pages


ajay600


I need to identify the matching texts in two web pages and remove them.

I need to store the text that sits between <b> and </b> in an array. For example, given <b> hello everyone</b>, I need to store " hello everyone" in the array.

I will do this for all <b> tags in page 1 and then in page 2, and then compare the text from both pages to find the matches and remove them.

Please help me code it in PHP.

https://forums.phpfreaks.com/topic/195273-identify-same-texts-in-2-web-pages/

Hmm, let's hope I understand you correctly. My code might be a little lazy, but it works for my simple example and might start you off on the right foot. First it matches all <b>.*</b> strings in $text1. Then it replaces any of those exact <b>.*</b> strings found in $text2 with an empty string.

 

$text1 = "Hello <b>test 1</b> this is a <b>test 2</b> and <b>test 3</b>!";
$text2 = "Hello <b>test 1</b> this is a <b>test 4</b> and <b>test 2</b>!";
preg_match_all(":<b>.*</b>:msU",$text1,$m);
print str_replace($m[0],"",$text2);

 

Output: "Hello  this is a <b>test 4</b> and !"
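If the goal is to have the inner text in an array rather than the full tags, a capture group should get you there. This is only a minimal sketch that reuses the example strings from this post; the helper name boldTexts is made up for illustration, the text is trimmed before comparing (an assumption about what counts as "the same"), and the regex assumes plain <b> tags with no attributes or nesting.

```php
<?php
// Hypothetical helper: returns the trimmed text found between <b> and </b>.
// Assumes simple, non-nested <b> tags with no attributes.
function boldTexts(string $html): array
{
    // Group 1 captures only the text inside the tags (lazy match).
    preg_match_all('~<b>(.*?)</b>~is', $html, $m);
    return array_map('trim', $m[1]);
}

$text1 = "Hello <b>test 1</b> this is a <b>test 2</b> and <b>test 3</b>!";
$text2 = "Hello <b>test 1</b> this is a <b>test 4</b> and <b>test 2</b>!";

$list1 = boldTexts($text1);
$list2 = boldTexts($text2);

// Texts present in both pages, and texts found only in page 1.
$common  = array_values(array_intersect($list1, $list2));
$onlyIn1 = array_values(array_diff($list1, $list2));

print_r($common);
print_r($onlyIn1);
```

Here $common ends up holding "test 1" and "test 2", and $onlyIn1 holds "test 3". For real pages with attributes or nested markup, a DOM parser is usually a safer choice than a regex.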

I found all the <b> tags, but how do I take the text content from them so that I can compare the texts and remove the repeated ones? Please help.

 

<?php	

$doc = new DOMDocument(); // An instance of DOMDocument
@$doc->loadHTMLFile('http://www.web-source.net/web_design_tips/');

$doc2 = new DOMDocument(); // An instance of DOMDocument
@$doc2->loadHTMLFile('http://www.web-source.net/html_codes_chart.htm');

$xpath = new DOMXPath($doc);
$xpath2 = new DOMXPath($doc2);

$List=array();

$List2=array();

$List[] = $doc->getElementsByTagName("b"); 
$List2[] = $doc2->getElementsByTagName("b"); 

$textBoth = array_intersect($List, $List2);

foreach ($textBoth as $text) 
{ // Loops through the <b> entries that are common to both documents
    $text->parentNode->removeChild($text);
}
echo $doc->saveHTML();

?>
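One likely problem in the code above: $List and $List2 each hold a single DOMNodeList object rather than strings, and array_intersect compares elements by their string form, so it has nothing usable to compare. Reading textContent from each <b> node gives you the strings. The following is a rough sketch only, using two small inline HTML strings (invented here as stand-ins for the two pages) so it runs without a network; the variable names $html1, $html2, $texts2 and $removed are illustrative, and trimming the text before comparing is an assumption.

```php
<?php
// Inline stand-ins for the two pages (hypothetical sample markup).
$html1 = '<html><body><p><b>Home</b> and <b>Tips</b> and <b>Contact</b></p></body></html>';
$html2 = '<html><body><p><b>Home</b> and <b>Codes</b> and <b> Contact </b></p></body></html>';

$doc = new DOMDocument();
@$doc->loadHTML($html1);
$doc2 = new DOMDocument();
@$doc2->loadHTML($html2);

// Collect the trimmed text of every <b> element in page 2.
$texts2 = array();
foreach ($doc2->getElementsByTagName('b') as $b) {
    $texts2[] = trim($b->textContent);
}

// The node list is live, so copy it to an array before removing nodes.
$removed = array();
foreach (iterator_to_array($doc->getElementsByTagName('b')) as $b) {
    $text = trim($b->textContent);
    if (in_array($text, $texts2, true)) {
        // This <b> text also appears in page 2: drop it from page 1.
        $removed[] = $text;
        $b->parentNode->removeChild($b);
    }
}

$result = $doc->saveHTML();
echo $result;
```

To try it on the real pages, swap loadHTML($html1) and loadHTML($html2) for the loadHTMLFile calls from the post above. The DOMXPath objects are not needed for this approach.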

Archived

This topic is now archived and is closed to further replies.
