How to grab information from a webpage?

gigabyt3r · December 4, 2009

How can I grab information from a webpage? For example, if I wanted to get the time a search took on Google, where it says "Results 1 - 10 of about 610,000,000 for php. (0.05 seconds)", how could I get the '0.05' bit from the source to use it as a string?

Thanks in advance

Deoctor · December 4, 2009

Try this code:

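<?php
// Fetch the URL and return the response body as a string instead of printing it.
$ch = curl_init("http://www.example.com/reallybigfile.tar.gz");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// CURLOPT_BINARYTRANSFER has had no effect since PHP 5.1.3, so it is left out.
$output = curl_exec($ch);
if ($output === false) {
    die("cURL error: " . curl_error($ch));
}
curl_close($ch);

// Write the response body to a file.
$fh = fopen("out.txt", 'w');
fwrite($fh, $output);
fclose($fh);
?>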

This will write the cURL result for a webpage into a file.

oni-kun · December 4, 2009

You can use a regex to find the string "(x.xx seconds)". I'm not good enough with regex to tell you the pattern, though. To load the web page into a string, you'd use this:

$query = "php";
$result = file_get_contents("http://www.google.com/search?q=" . urlencode($query));
// preg_match(...) on $result goes here

MadTechie · December 4, 2009

Example (expanded from oni-kun)

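<?php
$query = "php";
// Download the results page as a string.
$html = file_get_contents("http://www.google.com/search?q=" . urlencode($query));
// Capture the text between <b> and </b> that comes just before " seconds)".
if (preg_match('%\(<b>([^<]*)</b> seconds\)%i', $html, $found)) {
    echo $found[1];
}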
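
For anyone who wants to try the pattern without hitting Google, here is a minimal sketch that runs it against a hard-coded sample string. The function name extract_search_time and the sample markup are assumptions made for illustration; the real page markup may differ and can change at any time.

```php
<?php
// Hypothetical helper: returns the captured time as a string, or null if the pattern is not found.
function extract_search_time(string $html): ?string
{
    // Same pattern as above: "(" then <b>VALUE</b> then " seconds)".
    if (preg_match('%\(<b>([^<]*)</b> seconds\)%i', $html, $found) === 1) {
        return $found[1];
    }
    return null;
}

// Assumed sample markup, loosely modelled on the result line quoted in the question.
$sample = 'Results 1 - 10 of about 610,000,000 for <b>php</b>. (<b>0.05</b> seconds)';
$time = extract_search_time($sample);
echo $time === null ? "not found\n" : $time . "\n";
```

Returning null when nothing matches makes it easy to tell "the page layout changed" apart from an empty value.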

Deoctor · December 4, 2009

Check this code, which I have written:

<?php
$url_feed = 'http://chaitu09986025424.blog.co.in/feed/rss/';
$ch = curl_init();
// URL-encode the feed address so it is passed as a single query parameter.
curl_setopt($ch, CURLOPT_URL, "http://validator.w3.org/feed/check.cgi?url=" . urlencode($url_feed));
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($ch);
curl_close($ch);

// Save the full response, closing the file before reading it back.
$fh = fopen("out.txt", 'w');
fwrite($fh, $output);
fclose($fh);

// Read 95 bytes starting at byte offset 1256; adjust both numbers to suit.
$section = file_get_contents('./out.txt', false, null, 1256, 95);
$fh1 = fopen("out_1.txt", 'w');
fwrite($fh1, $section);
fclose($fh1);
var_dump($section);
?>

This writes the required section to the out_1.txt file and also stores it in the string $section, so you can check it and display the result accordingly.

You need to change the two numbers (1256 and 95) in the file_get_contents line to match your criteria.

MadTechie · December 4, 2009

And how exactly does that get the seconds?

My extension of oni-kun's code seems more practical, and while it may be necessary to use cURL, that's a simple change.
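
To illustrate the "simple change", a rough sketch of swapping file_get_contents for cURL might look like the following. fetch_html and search_time are made-up helper names, and the timeout and redirect settings are assumptions, not anything Google requires.

```php
<?php
// Hypothetical helper: download a page with cURL and return the body, or null on failure.
function fetch_html(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // assumption: follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // assumption: give up after 10 seconds
    $body = curl_exec($ch);
    curl_close($ch);
    return $body === false ? null : $body;
}

// Hypothetical helper: fetch a search page and pull out the "(x.xx seconds)" value.
function search_time(string $query): ?string
{
    $html = fetch_html("http://www.google.com/search?q=" . urlencode($query));
    if ($html !== null && preg_match('%\(<b>([^<]*)</b> seconds\)%i', $html, $found) === 1) {
        return $found[1];
    }
    return null;
}
```

Calling search_time("php") would then return the captured value, or null if the download failed or the page no longer matches the pattern.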

Deoctor · December 4, 2009

What I am doing here is checking for a particular piece of text that the cURL function writes to the txt file:

<img alt="[Valid RSS]" title="Valid RSS" src="images/valid-rss.png" /> This is a valid RSS feed.

In the same way, this can be used to check for particular words like these on Google:

(0.21 seconds)

You check the value that is displayed before the word "seconds", and you can print that value or store it somewhere for later use.

I don't think this is such an impossible task >:(
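
The idea of checking for the word "seconds" and taking the value in front of it can be sketched without a regular expression or fixed byte offsets. This is only an illustration: value_before_seconds is an invented name, and it assumes that, once tags are stripped, the value sits between an opening bracket and the text " seconds)".

```php
<?php
// Hypothetical helper: find " seconds)" and return whatever sits between it and the preceding "(".
function value_before_seconds(string $html): ?string
{
    $text = strip_tags($html);         // drop markup such as <b>...</b>
    $end = strpos($text, ' seconds)'); // position of the marker text
    if ($end === false) {
        return null;
    }
    // Search backwards from the marker for the nearest opening bracket.
    $start = strrpos(substr($text, 0, $end), '(');
    if ($start === false) {
        return null;
    }
    return substr($text, $start + 1, $end - $start - 1);
}

// Assumed sample text for illustration.
echo value_before_seconds('Results for google. (<b>0.21</b> seconds)'), "\n";
```

Because it searches for the marker text, this keeps working when the surrounding page grows or shrinks, which a fixed offset and length would not.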

MadTechie · December 4, 2009

Well, the logic you're using seems long-winded and has pointless code!

Why you're passing it to w3.org seems a little weird!

Simply downloading the HTML and extracting the value is all that's needed.

Deoctor · December 4, 2009

Actually, the code I used here is for validating an RSS feed, which is why I used this site instead of a regular expression, because regular expressions are not that trustworthy, in my case at least.

I don't think there is anything weird in the code; you would see that if you had checked it out and looked at the result it gives.

MadTechie · December 4, 2009

In that case, why not just use an XML parser, instead of relying on two sites to pull out a small section of text?

Deoctor · December 4, 2009

I have used that one too, but there was an issue with some of the RSS feed URLs: they were not being recognised.

if (!@$xml=simplexml_load_file("$subscr"))

That is the code I used; $subscr is the URL of the RSS feed I am using.

As this approach failed, I am trying to fetch the data by passing it to another site.
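
When simplexml_load_file fails like this, the @ operator hides the reason. A minimal sketch of collecting libxml's parse errors instead is shown here; load_feed is a hypothetical helper, and it takes the XML as a string so it can be tried on the made-up sample feed without a network call.

```php
<?php
// Hypothetical helper: parse feed XML and report libxml errors instead of suppressing them with @.
function load_feed(string $xmlText, array &$errors = []): ?SimpleXMLElement
{
    $previous = libxml_use_internal_errors(true); // collect errors rather than emit warnings
    $xml = simplexml_load_string($xmlText);
    $errors = [];
    foreach (libxml_get_errors() as $error) {
        $errors[] = trim($error->message) . " (line {$error->line})";
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $xml === false ? null : $xml;
}

// Made-up sample feed for illustration.
$sample = '<rss version="2.0"><channel><title>Demo feed</title></channel></rss>';
$feed = load_feed($sample, $errors);
echo $feed !== null ? (string) $feed->channel->title : implode("; ", $errors), "\n";
```

To use it with a real feed, the body would be downloaded first (for example with file_get_contents($subscr)) and passed in; the collected messages should then show why a particular feed is not being recognised.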
