[SOLVED] Parsing HTML


soycharliente

EDIT: I marked line 25 with a comment in the code below.

 

I have been trying to create a page that strips everything between the <script> tags from a fetched page and then renders the HTML that is left over. The page won't load for me as written, but when I copy the source, paste it into a text editor, manually delete all the JavaScript, and render what remains, it works fine.

I'm getting the following error and I don't know what is wrong with the code:

Parse error: parse error, unexpected ':', expecting ']' in /fantasypoints.php on line 25

 

<?php
if (isset($_POST["submiturl"]))
{
	$url = $_POST["url"];

	// create a new cURL resource
	$ch = curl_init();

	// set URL and other appropriate options
	curl_setopt($ch, CURLOPT_URL, $url);
	curl_setopt($ch, CURLOPT_HEADER, 0);
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);

	// grab URL and pass it to the browser
	$source = curl_exec($ch);

	// close cURL resource, and free up system resources
	curl_close($ch);

	$script_start = strpos($source, '<script');
	while ($script_start > 0)
	{
		$script_start = strpos($source, '<script');
		$script_end = strpos($source, '</script>', $script_start) + 9;
		$source = $source[:$script_start] + $source[$script_end:]; // THIS IS LINE 25
	}
}
?> 
<html>
<body>

<form action="fantasypoints.php" method="post">
<p>Match URL: <input name="url" type="text" value="<?php echo $url; ?>" maxlength="500" size="50" /></p>
<p><input name="submiturl" type="submit" value="Submit" /></p>
</form>

<?php echo $source; ?>

</body>
</html>
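
The parse error on the marked line comes from `$source[:$script_start]`, which is Python-style slice syntax that PHP does not support. In PHP, `+` is also arithmetic addition rather than string concatenation. Below is a minimal sketch of the equivalent using `substr()` and the `.` operator. The sample string is made up for illustration and stands in for the fetched page.

```php
<?php
// Hypothetical sample input, not taken from the original page.
$source = 'abc<script>x()</script>def';

$script_start = strpos($source, '<script');
$script_end   = strpos($source, '</script>', $script_start) + strlen('</script>');

// Python:  source[:script_start] + source[script_end:]
// PHP:     substr() for each slice, "." to join the pieces
$source = substr($source, 0, $script_start) . substr($source, $script_end);

echo $source; // prints "abcdef"
```

Omitting the third argument to `substr()` returns everything from the offset to the end of the string, which is what an open-ended slice does in Python.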

Link to comment
https://forums.phpfreaks.com/topic/73222-solved-parsing-html/
Share on other sites

I've made some progress. I have the substring part working, but now the while loop never exits and $source isn't being updated, and I can't figure out why.

Here's the code so far:

<?php
if (isset($_POST["submiturl"]))
{
	$url = $_POST["url"];

	// create a new cURL resource
	$ch = curl_init();

	// set URL and other appropriate options
	curl_setopt($ch, CURLOPT_URL, $url);
	curl_setopt($ch, CURLOPT_HEADER, 0);
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);

	// grab URL and pass it to the browser
	$source = curl_exec($ch);

	// close cURL resource, and free up system resources
	curl_close($ch);

	while (strpos($source, '<script') !== FALSE)
	{
		$script_start = strpos($source, '<script');
		$script_end = strpos($source, '</script>', $script_start) + 9;
		$source = substr($source, 0, $script_start) .
			substr($source, $script_start, $script_end - $script_start);
	}

	echo $source;
}
?>

<form action="fantasypoints.php" method="post">
<p>Match URL: <input name="url" type="text" value="<?php echo $url; ?>" maxlength="500" size="50" /></p>
<p><input name="submiturl" type="submit" value="Submit" /></p>
</form>
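
The loop above never exits because the second `substr()` call starts at `$script_start` and takes `$script_end - $script_start` characters, so it re-appends the script block that was meant to be removed and discards everything after it. After the first pass `$source` still contains `<script`, and every later pass produces the same string. Below is a minimal sketch of a corrected loop. The function name `strip_script_blocks` is hypothetical, and the handling of an unclosed tag (dropping the rest of the string) is an assumption, not something the original code specifies.

```php
<?php
// Hypothetical helper; the name and the unclosed-tag behaviour are
// illustrative assumptions.
function strip_script_blocks($source)
{
    $close = '</script>';

    // stripos() is case-insensitive, so <SCRIPT> is matched as well.
    while (($start = stripos($source, '<script')) !== false) {
        $end = stripos($source, $close, $start);
        if ($end === false) {
            // No closing tag: drop everything from the opening tag onward
            // rather than looping forever.
            return substr($source, 0, $start);
        }
        // Keep the text before the block and the text after it.
        $source = substr($source, 0, $start)
                . substr($source, $end + strlen($close));
    }

    return $source;
}

echo strip_script_blocks('<p>a</p><script>x()</script><p>b</p><script src="y.js"></script>');
// prints "<p>a</p><p>b</p>"
```

Checking the result of the closing-tag search matters because `strpos()` returns `false` when nothing is found, and `false + 9` quietly evaluates to `9`.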

Link to comment
https://forums.phpfreaks.com/topic/73222-solved-parsing-html/#findComment-369682
Share on other sites

This might work for you:

 

<?php
$url = "removejs.htm";
$input = @file_get_contents($url) or die("Could not access file: $url"); // double quotes so $url is interpolated
$regexp = "(.*)<script(.*)<\/script>(.*)";
if(preg_match_all("/$regexp/si", $input, $matches)) {
	echo $matches[1][0];
	echo $matches[3][0];
	unset($matches[1][0]);
	unset($matches[2][0]);
	unset($matches[3][0]);
}
?>

 

I left out the while loop needed to handle multiple script blocks, but it wouldn't be hard to add.
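
One way to handle multiple script blocks without an explicit loop is `preg_replace()` with a non-greedy pattern, since it replaces every match in one call. This is a minimal sketch of that alternative, not the code posted above. The sample input is made up, and a regular expression like this is only a rough tool for real-world HTML.

```php
<?php
// Hypothetical sample input standing in for the fetched page.
$input = '<p>a</p><script>x()</script><p>b</p>'
       . '<SCRIPT type="text/javascript">y()</SCRIPT><p>c</p>';

// s modifier: "." also matches newlines, so multi-line scripts are covered.
// i modifier: case-insensitive, so <SCRIPT> matches too.
// ".*?" is non-greedy, so each block is matched on its own instead of
// swallowing everything between the first <script and the last </script>.
$output = preg_replace('#<script\b.*?</script\s*>#si', '', $input);

echo $output; // prints "<p>a</p><p>b</p><p>c</p>"
```

With the greedy `(.*)` groups in the pattern above, the first group runs up to the last `<script` in the input, so earlier script blocks end up inside `$matches[1][0]` and are echoed back out.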

Link to comment
https://forums.phpfreaks.com/topic/73222-solved-parsing-html/#findComment-369696
Share on other sites
