
Hello, I'm trying to use cURL to connect to a website, but when I return the page, I get redirected, and it ends up sending me back to "myowndomain.com"/back-soon.

Is there a way to see why the site redirects when I use cURL? It doesn't redirect when I connect to the website through a normal web browser.

code:

$url = "https://groceries.asda.com/";

$main_page = curlFunction($url);

echo $main_page;

function curlFunction($url)
{
	$cookie_file = "cookie.txt";
	
    // Assigning cURL options to an array
    $options = Array(
        CURLOPT_RETURNTRANSFER => TRUE,  // Setting cURL's option to return the webpage data
        CURLOPT_FOLLOWLOCATION => TRUE,  // Setting cURL to follow 'location' HTTP headers
        CURLOPT_AUTOREFERER => TRUE, // Automatically set the referer when following 'location' HTTP headers
        CURLOPT_CONNECTTIMEOUT => 120,   // Maximum time (in seconds) to wait while establishing the connection
        CURLOPT_TIMEOUT => 120,  // Setting the maximum amount of time for cURL to execute queries
        CURLOPT_MAXREDIRS => 10, // Setting the maximum number of redirections to follow
        CURLOPT_USERAGENT => "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1a2pre) Gecko/2008073000 Shredder/3.0a2pre ThunderBrowse/3.2.1.8",  // Setting the useragent
        CURLOPT_URL => $url, // Setting cURL's URL option with the $url variable passed into the function
		//this is for the cookie.
		CURLOPT_COOKIESESSION => TRUE,
		CURLOPT_COOKIEFILE => $cookie_file,
		CURLOPT_COOKIEJAR => $cookie_file,
    );
	
    $ch = curl_init();  // Initialising cURL
    curl_setopt_array($ch, $options);   // Setting cURL's options using the previously assigned array data in $options
    $data = curl_exec($ch); // Executing the cURL request and assigning the returned data to the $data variable
    curl_close($ch);    // Closing cURL
    return $data;   // Returning the data from the function
}

Any help would be great.

sean

Your code is just dumping the HTML out to your browser. If there is any redirect then your browser is doing it, so use your browser's tools to find out why. For example, have it track HTTP requests, find the one that performs the redirect, and see what originated it.
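Following that advice, one way to narrow down what originated a browser-side redirect is to search the HTML that cURL returned for the usual client-side mechanisms before echoing it. The sketch below assumes the redirect comes from a `<meta http-equiv="refresh">` tag or an inline JavaScript `location` assignment; `findClientSideRedirects` is a hypothetical helper, and regex matching is only a heuristic that will miss redirects in external script files or obfuscated code.

```php
<?php
// Hypothetical helper: lists likely client-side redirect snippets found in a page's HTML.
// Regex matching is a heuristic only; redirects inside external .js files are not seen.
function findClientSideRedirects(string $html): array
{
    $patterns = [
        // <meta http-equiv="refresh" content="0; url=/somewhere">
        '/<meta[^>]+http-equiv=["\']?refresh["\']?[^>]*>/i',
        // window.location = ..., location.href = ..., document.location = ...
        '/\b(?:window\.|document\.)?location(?:\.href)?\s*=\s*[^;\n]+/i',
        // location.replace("...") or location.assign("...")
        '/location\.(?:replace|assign)\s*\([^)]*\)/i',
    ];
    $hits = [];
    foreach ($patterns as $pattern) {
        if (preg_match_all($pattern, $html, $matches)) {
            foreach ($matches[0] as $match) {
                $hits[] = trim($match);
            }
        }
    }
    return $hits;
}
```

For example, `print_r(findClientSideRedirects(curlFunction($url)));` in place of `echo $main_page;`. A relative target such as `/back-soon` would be resolved by the browser against the address the page was served from, which is your own domain when you echo the copied HTML.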

Thanks for your response.

I have just installed HttpRequester for Firefox and tried it with that.

It appears to display the website without any further redirects.


Well, when I tried your code I got a page that was empty except for the logo and black background for the navigation bar, so...

Thanks for the response.

That's strange; I'm running the exact code and it's redirecting.

Maybe something in my server's settings is causing the redirect.

I know that you didn't show the literal contents of some file because your post is missing the opening <?php. Is there anything else you didn't include?

 

This is exactly the code I'm running:

<?php
//error_reporting(E_ALL);
//ini_set('max_execution_time', 0);

$url = "https://groceries.asda.com/";

$main_page = Acurl($url);

echo $main_page;

function Acurl($url)
{
	$cookie_file = "cookie.txt"; // must be defined: CURLOPT_COOKIEFILE and CURLOPT_COOKIEJAR below use it
	
    // Assigning cURL options to an array
    $options = Array(
        CURLOPT_RETURNTRANSFER => TRUE,  // Setting cURL's option to return the webpage data
        CURLOPT_FOLLOWLOCATION => TRUE,  // Setting cURL to follow 'location' HTTP headers
        CURLOPT_AUTOREFERER => TRUE, // Automatically set the referer when following 'location' HTTP headers
        CURLOPT_CONNECTTIMEOUT => 120,   // Maximum time (in seconds) to wait while establishing the connection
        CURLOPT_TIMEOUT => 120,  // Setting the maximum amount of time for cURL to execute queries
        CURLOPT_MAXREDIRS => 10, // Setting the maximum number of redirections to follow
        CURLOPT_USERAGENT => "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1a2pre) Gecko/2008073000 Shredder/3.0a2pre ThunderBrowse/3.2.1.8",  // Setting the useragent
        CURLOPT_URL => $url, // Setting cURL's URL option with the $url variable passed into the function
		//this is for the cookie.
		CURLOPT_COOKIESESSION => TRUE,
		CURLOPT_COOKIEFILE => $cookie_file,
		CURLOPT_COOKIEJAR => $cookie_file,
    );
	
    $ch = curl_init();  // Initialising cURL
    curl_setopt_array($ch, $options);   // Setting cURL's options using the previously assigned array data in $options
    $data = curl_exec($ch); // Executing the cURL request and assigning the returned data to the $data variable
    curl_close($ch);    // Closing cURL
    return $data;   // Returning the data from the function
}

Strange that even in your test it wasn't showing the entire webpage, just parts of the nav bar?

Could the website be doing something to block cURL connections?
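To check whether the server itself treats cURL differently (for example with a 403 or a `Location` header pointing at `/back-soon`), it can help to log what each response in the redirect chain actually contains. A minimal sketch, assuming PHP 7.1 or later with the cURL extension; `classifyHeaderLine` and `fetchWithHeaderLog` are hypothetical names. It relies on `CURLOPT_HEADERFUNCTION`, which cURL calls once for every header line it receives, including those of intermediate redirect responses:

```php
<?php
// Hypothetical helper: classifies one raw header line.
// Returns ['status', 'HTTP/1.1 302 Found'], ['location', '/back-soon'], or null.
function classifyHeaderLine(string $line): ?array
{
    $line = trim($line);
    if (strncmp($line, 'HTTP/', 5) === 0) {
        return ['status', $line];
    }
    if (stripos($line, 'location:') === 0) {
        return ['location', trim(substr($line, 9))]; // 9 = strlen('location:')
    }
    return null;
}

// Hypothetical helper: fetches $url, following redirects, and records every hop in $log.
function fetchWithHeaderLog(string $url, array &$log)
{
    $log = [];
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS => 10,
        CURLOPT_HEADERFUNCTION => function ($handle, $line) use (&$log) {
            $entry = classifyHeaderLine($line);
            if ($entry !== null) {
                $log[] = $entry;
            }
            return strlen($line); // cURL aborts the transfer if this differs from the line length
        },
    ]);
    $body = curl_exec($ch);
    curl_close($ch);
    return $body; // string on success, false on failure
}
```

For example, `$html = fetchWithHeaderLog('https://groceries.asda.com/', $log); print_r($log);`. If the log shows a single `200` status and no `location` entries, the server did not redirect cURL, which points back at something in the page itself.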

 

<?php

// all your curl stuff here, including curl_exec($ch)

print_r(curl_getinfo($ch)); // before curl_close($ch), while the handle is still open

Might give some kind of clue.
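`curl_getinfo()` needs an open handle, so in the function above it would go after `curl_exec()` and before `curl_close()`. A sketch of a hypothetical `summarizeTransfer()` helper that reduces the returned array to the redirect-related keys (`http_code`, `redirect_count`, `url` and `redirect_url` are standard keys of that array; the helper names are illustrative):

```php
<?php
// Hypothetical helper: one-line summary of the redirect-related fields from curl_getinfo().
function summarizeTransfer(array $info): string
{
    return sprintf(
        'HTTP %d after %d redirect(s); final URL: %s; pending redirect: %s',
        $info['http_code'] ?? 0,
        $info['redirect_count'] ?? 0,
        $info['url'] ?? '(unknown)',
        ($info['redirect_url'] ?? '') !== '' ? $info['redirect_url'] : '(none)'
    );
}

// Hypothetical fetch function showing where curl_getinfo() has to be called.
function fetchAndSummarize(string $url): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS => 10,
    ]);
    $body = curl_exec($ch);
    $summary = summarizeTransfer(curl_getinfo($ch)); // must come before curl_close()
    curl_close($ch);
    return [$body, $summary];
}
```

A `redirect_count` of 0 together with a final URL equal to the requested one would mean cURL itself was never redirected.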

 

OK, I will give that a try and see what it returns.

Have you looked at the HTML source of the page you're trying to copy? It references stylesheets and images and code. When I run your code from my "website" most of those references will break because the files don't exist, and it happens to break in such a way that most of the page is missing/not visible.

 

Unless you cloned the assorted other files, I would expect the same when you try it. The redirect is completely random.
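If the goal is just to view the copied page from your own server, one common workaround (not one suggested in this thread) is to add a `<base href>` element so the browser resolves relative stylesheet, image and script references against the original site instead of yours. A minimal sketch; `addBaseHref` is a hypothetical name, and it only inserts the tag after the first `<head>` element it finds:

```php
<?php
// Hypothetical helper: makes relative URLs in copied HTML resolve against $baseUrl.
// It inserts <base href="..."> right after the first <head ...> tag, if there is one.
function addBaseHref(string $html, string $baseUrl): string
{
    $tag = '<base href="' . htmlspecialchars($baseUrl, ENT_QUOTES) . '">';
    $result = preg_replace_callback(
        '/<head\b[^>]*>/i', // \b keeps <header> from matching
        function (array $m) use ($tag): string {
            return $m[0] . $tag;
        },
        $html,
        1,
        $count
    );
    // No <head> found (or a regex error): return the page unchanged.
    return ($result !== null && $count > 0) ? $result : $html;
}
```

For example, `echo addBaseHref(Acurl($url), $url);`. Scripts may still behave differently when the page is served from another domain (cookies and cross-origin rules differ), so treat this as a viewing aid rather than a faithful copy.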

 

 

Here's another thing to try: running it locally. Start the built-in server:

php -S localhost:8000 -t /path/to/where/your/test/file/is

then go to http://localhost:8000/file.php and see what happens.

To be completely honest, I haven't run into this problem before. Usually I'm able to use the function to display the website, and then I use a simple scraping function that extracts parts of the source code.

I thought cURL downloaded the source code after the includes had been resolved, so I didn't realize that include paths made a difference.
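For reference, cURL returns only the HTML the server sends for that one URL; it does not download the stylesheets, images or scripts the page references, and it does not run any JavaScript. For scraping, the page therefore does not need to be echoed at all. A minimal sketch using PHP's DOM extension (assumed to be enabled, as it is in most builds); `scrapeText` is a hypothetical helper and the XPath queries are illustrative:

```php
<?php
// Hypothetical helper: returns the trimmed text of every node matching an XPath query.
function scrapeText(string $html, string $xpathQuery): array
{
    if ($html === '') {
        return []; // DOMDocument::loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; keep libxml's parser warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = (new DOMXPath($doc))->query($xpathQuery);
    if ($nodes === false) {
        return []; // invalid XPath expression
    }
    $texts = [];
    foreach ($nodes as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}
```

For example, `scrapeText(Acurl($url), '//title')` would return the page title text in a one-element array without rendering anything in a browser, so missing stylesheets or client-side redirects no longer matter.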

I will try running the script locally as you suggested.

Thanks,

sean
