[SOLVED] Parsing HTML


soycharliente

EDIT: I MARKED LINE 25

 

I'm trying to build a page that strips everything between the <script> tags of a fetched page and then renders the HTML that is left. For some reason the page won't load for me, yet when I copy the source into a text editor, manually delete all the JavaScript, and render what remains, it works fine.

 

I'm getting the following error and can't see what is wrong with the code:

Parse error: parse error, unexpected ':', expecting ']' in /fantasypoints.php on line 25

 

<?php
if (isset($_POST["submiturl"]))
{
	$url = $_POST["url"];

	// create a new cURL resource
	$ch = curl_init();

	// set URL and other appropriate options
	curl_setopt($ch, CURLOPT_URL, $url);
	curl_setopt($ch, CURLOPT_HEADER, 0);
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);

	// fetch the URL; with CURLOPT_RETURNTRANSFER set, the response is returned as a string
	$source = curl_exec($ch);

	// close cURL resource, and free up system resources
	curl_close($ch);

	$script_start = strpos($source, '<script');
	while ($script_start > 0)
	{
		$script_start = strpos($source, '<script');
		$script_end = strpos($source, '</script>', $script_start) + 9;
		$source = $source[:$script_start] + $source[$script_end:]; // THIS IS LINE 25
	}
}
?> 
<html>
<body>

<form action="fantasypoints.php" method="post">
<p>Match URL: <input name="url" type="text" value="<?php echo $url; ?>" maxlength="500" size="50" /></p>
<p><input name="submiturl" type="submit" value="Submit" /></p>
</form>

<?php echo $source; ?>

</body>
</html>
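
The parse error on the marked line comes from the $source[:$script_start] syntax: PHP has no Python-style slice notation, and + does arithmetic rather than joining strings. A minimal sketch of the same step using substr() and the . concatenation operator, with a made-up sample string standing in for the fetched page:

```php
<?php
// Hypothetical sample input standing in for the fetched page source.
$source = '<p>before</p><script>alert(1);</script><p>after</p>';

$script_start = strpos($source, '<script');
$script_end   = strpos($source, '</script>', $script_start) + strlen('</script>');

// Python's source[:start] + source[end:] becomes two substr() calls joined with "."
$source = substr($source, 0, $script_start) . substr($source, $script_end);

echo $source, "\n"; // prints "<p>before</p><p>after</p>"
```

This only handles a single script block and assumes both tags are present; strpos() returns false when a tag is missing, so real code would need to check for that.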


I've made some progress. The substring part now parses, but the while loop never exits and $source isn't being updated, and I can't figure out why.

 

Here's the code thus far:

<?php
if (isset($_POST["submiturl"]))
{
	$url = $_POST["url"];

	// create a new cURL resource
	$ch = curl_init();

	// set URL and other appropriate options
	curl_setopt($ch, CURLOPT_URL, $url);
	curl_setopt($ch, CURLOPT_HEADER, 0);
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);

	// fetch the URL; with CURLOPT_RETURNTRANSFER set, the response is returned as a string
	$source = curl_exec($ch);

	// close cURL resource, and free up system resources
	curl_close($ch);

	while (strpos($source, '<script') !== FALSE)
	{
		$script_start = strpos($source, '<script');
		$script_end = strpos($source, '</script>', $script_start) + 9;
		$source = substr($source, 0, $script_start) .
			substr($source, $script_start, $script_end - $script_start);
	}

	echo $source;
}
?>

<form action="fantasypoints.php" method="post">
<p>Match URL: <input name="url" type="text" value="<?php echo $url; ?>" maxlength="500" size="50" /></p>
<p><input name="submiturl" type="submit" value="Submit" /></p>
</form>
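
The loop above never ends because the second substr() starts at $script_start and re-appends the script block itself, so <script is still in $source on every pass (and everything after </script> is thrown away). A sketch of a corrected loop, wrapped in a hypothetical helper named strip_script_blocks(); the bail-out for a missing </script> and the case-insensitive stripos() are my own additions, not part of the code above:

```php
<?php
// Hypothetical helper: removes every <script ...>...</script> block from $source.
function strip_script_blocks(string $source): string
{
    $close = '</script>';
    while (($start = stripos($source, '<script')) !== false) {
        $end = stripos($source, $close, $start);
        if ($end === false) {
            // No closing tag: drop the rest of the page rather than loop forever.
            return substr($source, 0, $start);
        }
        // Keep the text before the block and the text after it.
        $source = substr($source, 0, $start) . substr($source, $end + strlen($close));
    }
    return $source;
}

echo strip_script_blocks('a<script>x</script>b<SCRIPT src="y.js"></SCRIPT>c'), "\n"; // prints "abc"
```

In the page above this would be called as $source = strip_script_blocks($source); right after curl_close($ch).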


This might work for you:

 

<?php
$url = "removejs.htm";
$input = @file_get_contents($url) or die("Could not access file: $url"); // double quotes so $url is interpolated
$regexp = "(.*)<script(.*)<\/script>(.*)";
if(preg_match_all("/$regexp/si", $input, $matches)) {
	echo $matches[1][0];
	echo $matches[3][0];
	unset($matches[1][0]);
	unset($matches[2][0]);
	unset($matches[3][0]);
}
?>

 

I forgot the while loop for multiple instances, but that wouldn't be hard to add.
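
Because the first (.*) in that pattern is greedy, the match splits the page around the last script block only. As a sketch of the multiple-instance case without an explicit loop, preg_replace() with a non-greedy .*? removes each block in one pass. The function name remove_scripts_regex() and the sample input are made up for illustration, and a regex like this is only an approximation of real HTML parsing:

```php
<?php
// Hypothetical helper: strips all <script>...</script> blocks with one regex pass.
function remove_scripts_regex(string $html): string
{
    // \b stops the pattern from matching longer tag names that begin with "script";
    // .*? is non-greedy, so each block ends at its own closing tag;
    // the s modifier lets "." span newlines, i ignores case.
    $result = preg_replace('/<script\b.*?<\/script\s*>/si', '', $html);
    // preg_replace() returns null on failure; fall back to the input in that case.
    return $result ?? $html;
}

$input = "<p>one</p><script>var a = 1;</script><p>two</p>\n<script src=\"x.js\">\n</script><p>three</p>";
echo remove_scripts_regex($input), "\n";
```

For the sample input this should leave only the three paragraphs, with the newline between the second and third preserved.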
