I have a form where users enter a URL, and the script needs to pull data from that page (all of the information is inside a div with class="page-main-content"). I need to select the first occurrence of an H1 element, along with a handful of other HTML elements. Can anyone help? The code below is my test.php page. In the form's URL field, I enter test2.php.
test.php
<?php
if (true)
{
    if (!isset($_POST['submit']))
    {
?>
<form action="<?php echo htmlspecialchars($_SERVER["PHP_SELF"]); ?>" method="post">
<label for="url">Enter the URL of the article:</label> <input id="url" name="URL" type="text" />
<input id="submit" class="button" name="submit" type="submit" /></form>
<?php
    }
    else if (filter_var($_POST['URL'], FILTER_VALIDATE_URL) === false)
    {
?>
<div class="error"><p>Error: The URL you entered was invalid. Please try again.</p></div>
<form action="<?php echo htmlspecialchars($_SERVER["PHP_SELF"]); ?>" method="post">
<label for="url">Enter the URL of the article:</label> <input id="url" name="URL" type="text" />
<input id="submit" class="button" name="submit" type="submit" /></form>
<?php
    }
    else
    {
        $url = $_POST['URL'];
        $doc = new DOMDocument;
        $doc->preserveWhiteSpace = false;
        $doc->loadHTMLFile($url);
        $emailContents = array();
        $xpath = new DOMXPath($doc);
        $h1Found = false;
        // Find element with class="page-main-content"
        $results = $xpath->query("//*[contains(@class, 'page-main-content')]");
        // DOMXPath::query() returns false on failure, not null
        if ($results !== false)
        {
            foreach ($results as $element)
            {
                // Only the direct children of the matched element are inspected
                $nodes = $element->childNodes;
                foreach ($nodes as $node)
                {
                    // First non-empty h1 child gets the marker text
                    if (trim($node->textContent, " \n\r\t\0\xC2\xA0") !== '' && $node->nodeName === 'h1' && !$h1Found)
                    {
                        echo "THIS IS FINDING THE H1-END" . $node->textContent . "<br>";
                        $h1Found = true;
                    }
                    // Every other non-empty child is printed as plain text
                    elseif (trim($node->textContent, " \n\r\t\0\xC2\xA0") !== '')
                    {
                        echo $node->textContent . "<br>";
                    }
                }
            }
        }
    }
}
?>
Please ignore oddities like if(true); I removed the real condition for security reasons.
test2.php
<html>
<head>
<title>My Page</title>
</head>
<body>
<div class="page-main-content">
<h1>h1 test</h1>
<h1>h1 test</h1>
<p><a href="mypage1.html">Hello World!</a></p>
<p><a href="mypage2.html">Another Hello World!</a></p>
</div>
<p>THIS SHOULD NOT BE OUTPUTTED</p>
</body>
</html>
Again, please ignore the poor HTML; it is purely for testing purposes. Please help.
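To make the goal concrete, here is a minimal sketch of the kind of result I am after, assuming a single XPath query can pick out the first h1 inside the div directly. The function name firstH1Text and the inline sample markup are hypothetical and only stand in for test2.php; this is not the code from test.php.

```php
<?php
// Hypothetical helper (not part of test.php): returns the text of the first
// <h1> found inside an element whose class list contains "page-main-content",
// or null when no such heading exists.
function firstH1Text(string $html): ?string
{
    $doc = new DOMDocument;

    // Collect parser warnings internally instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Match the class as a whole word, then keep only the first
    // descendant h1 in document order
    $query = "(//*[contains(concat(' ', normalize-space(@class), ' '), "
           . "' page-main-content ')]//h1)[1]";
    $nodes = $xpath->query($query);

    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Inline sample standing in for test2.php (an assumption for illustration)
$sample = '<html><body><div class="page-main-content">'
        . '<h1>first heading</h1><h1>second heading</h1>'
        . '<p>Hello World!</p></div><p>outside</p></body></html>';

echo firstH1Text($sample), "\n";
```

The parentheses around the path followed by [1] are what limit the match to one node overall; without them, [1] would apply per parent element instead.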