curl and fread

Eugene · July 8, 2006

Ok, I need a lot of help. Recently I made a little script that downloads a web page and saves it to a text file.
[code=php:0]
$ch = curl_init("http://url.com");
$fp = fopen("news.txt", "w");

curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_HEADER, 0);

curl_exec($ch);
curl_close($ch);
fclose($fp);
[/code]

How can I select only the news from the txt file, since I can't take the news straight from the website using curl?

Koobi · July 8, 2006

i'm not sure what you meant by "how can i select only the news from the txt file".

question: why don't you want to use [url=http://www.php.net/file_get_contents]file_get_contents()[/url] to retrieve the page and then use fopen/flock/fwrite/fclose to do the writing? Any particular reason you want to use cURL?
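As a rough sketch of that fopen/flock/fwrite/fclose sequence (the function names `saveWithLock` and `fetchAndSave` are made up for illustration, and the `c` mode assumes a reasonably recent PHP):

```php
<?php
// Hypothetical helper: write $data to $path while holding an exclusive lock.
function saveWithLock($path, $data)
{
    // 'c' creates the file if needed but does not truncate it,
    // so the old contents survive until the lock is actually held
    $handle = fopen($path, 'cb');
    if ($handle === false) {
        return false;
    }

    if (!flock($handle, LOCK_EX)) {
        fclose($handle);
        return false;
    }

    ftruncate($handle, 0);              // safe to empty the file now
    $written = fwrite($handle, $data);
    fflush($handle);                    // flush before releasing the lock
    flock($handle, LOCK_UN);
    fclose($handle);

    return $written !== false;
}

// Hypothetical helper: download $url and save it to $path.
function fetchAndSave($url, $path)
{
    $data = file_get_contents($url);
    if ($data === false) {
        return false;                   // download failed
    }
    return saveWithLock($path, $data);
}
```

Calling `fetchAndSave('http://www.example.com/', './news.txt')` would then return true only if both the download and the write succeeded.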

Eugene · July 8, 2006

The news I want to extract from the webpage isn't readable if you use the function fread().
So I'm using curl to write the contents of the page to a txt file and then trying to extract the news, but I don't know how.

effigy · July 8, 2006

If the website does not have a URL where you can access the news by itself, you'll need to determine how the news is represented in the HTML and use regular expressions. I assume this web site does not have any feeds (RSS), which would make this easier?

Koobi · July 8, 2006

i wouldn't use CURL for something like this.

here's a modified snippet of something i already had:
[code=php:0]
<?php

$source = 'http://www.example.com/';
$destination = './news.txt';

// function to return file data from URL (false on failure)
function getData($url)
{
    return file_get_contents($url);
}

// function to write file to disk; returns true on success, false on failure
// you'll have to implement error handling. i simply return false on errors, which is not good practice
function writeData($data, $location)
{
    // is_writable() is false for a file that doesn't exist yet,
    // so in that case check the directory instead
    $target = file_exists($location) ? $location : dirname($location);
    if (!is_writable($target))
    {
        // file is not writable
        return false;
    }

    // i use the b in the mode for portability of code
    $handle = fopen($location, 'w+b');
    if ($handle === false)
    {
        // couldn't open file
        return false;
    }

    $written = fwrite($handle, $data);
    fclose($handle);

    // fwrite() returns false if the file could not be written
    return $written !== false;
}

$data = getData($source);
if ($data === false)
{
    echo $source . ' could not be downloaded';
}
elseif (writeData($data, $destination))
{
    echo $destination . ' was written to';
}
else
{
    echo $destination . ' could not be written';
}
?>
[/code]

let me know if you don't understand anything there.

:edit:
after reading effigy's post i think i know what you meant to say...but yeah, like effigy said, you might have to use regular expressions, depending on what the HTML is like.

Eugene · July 8, 2006

It wrote the entire source code to the news.txt file. I could have done that myself. How can I get the contents of the txt file between two certain tags?

Koobi · July 8, 2006

your initial question wasn't clear which is why i posted that function because i thought that was what you were asking for.

for the answer to your question, refer to effigy's post in this thread.

Eugene · July 8, 2006

No idea what he means by expressions. The website doesn't have RSS feeds, and there's no direct URL to the news.

Koobi · July 8, 2006

a regular expression is a pattern that describes a set of strings, so you can match text even when you don't know exactly what it will contain.

so if your tag is "<myTag>" and "</myTag>":
[code=php:0]
<?php
$tagName = 'myTag';
// (.*?) is non-greedy so the match stops at the first closing tag;
// the s modifier lets . match newlines, the i modifier ignores case
$pattern = "%<($tagName)>(.*?)</\\1>%is";
// getData() is the function from my earlier post
$subject = getData('http://example.org/');
preg_match($pattern, $subject, $matches);
print_r($matches);
?>
[/code]
now depending on what exactly you want, you can write the contents of $matches. $matches[0] holds the whole match including the tags, $matches[1] the tag name and $matches[2] the text between the tags, so $matches[2] is probably the part you want.
the above regular expression should work...let me know if it doesn't, i'm a bit rusty with my regular expressions.

i'm assuming there will only be one instance of <myTag>.
if there are more, consider using [url=http://www.php.net/manual/en/function.preg-match-all.php]preg_match_all() in the manual[/url]
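for the case where the tag appears more than once, something along these lines should collect every match. this is only a sketch: the sample string in $html is made up, and in practice it would be the downloaded page.

```php
<?php
// Made-up sample input standing in for the downloaded page.
$html = "<myTag>first item</myTag>\n<p>other markup</p>\n<myTag>second\nitem</myTag>";

$tagName = 'myTag';
$quoted  = preg_quote($tagName, '%');

// (.*?) is non-greedy so each match stops at the nearest closing tag;
// the s modifier lets . match newlines, the i modifier ignores case.
$pattern = '%<' . $quoted . '>(.*?)</' . $quoted . '>%is';

// preg_match_all() returns the number of full matches found.
$count = preg_match_all($pattern, $html, $matches);

// $matches[0] lists the full matches (tags included),
// $matches[1] lists only the text between each pair of tags.
print_r($matches[1]);
```

$matches[1] is then a plain array with one entry per tag pair, which can be looped over or written to the file.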

Eugene · July 8, 2006

Array ( [0] => Array ( ) [1] => Array ( ) [2] => Array ( ) )

is the output, seems to work... sort of.

Koobi · July 9, 2006

show me the entire HTML and indicate to me what part of the HTML you want to grab.

Eugene · July 9, 2006

[quote]
<div class="narrowscroll-bg">
<div class="narrowscroll-bgimg">
<div class="narrowscroll-content">
<dl class="news scroll">
<dt><span class="newsdate">04-Jul-2006</span>[color=red]***[/color]</dt>
<dd><table width="100%"><tr><td style="text-align: justify; vertical-align: top;">[color=red]***[/color]</td> <td style="padding-left: 1em; text-align: right; vertical-align: top;">
</div>
</dd>
<dt>
<span class="newsdate">04-Jul-2006</span>[color=red]***[/color]</dt> <dd><table width="100%"><tr><td style="text-align: justify; vertical-align: top;">[color=red]***[/color]</td> <td style="padding-left: 1em; text-align: right; vertical-align: top;">
</td></tr></table> </dd><dt><span class="newsdate">27-Jun-2006</span>[color=red]***[/color]</dt> <dd><table width="100%"><tr><td style="text-align: justify; vertical-align: top;">[color=red]***[/color]</td> <td style="padding-left: 1em; text-align: right; vertical-align: top;"></td></tr></table></dd></dl></div></div></div></div>
[/quote]

[color=red]***[/color] = What I need to get.

Koobi · July 10, 2006

hmmm i don't think you posted the correct HTML...there seems to be a closing div in there that doesn't belong.
let me know if this is the exact HTML and i will write a regular expression for you.

and please enclose your code in proper code tags the next time.
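
going only by the HTML as posted, a pattern along these lines might pull out the date, the headline and the body of each item. it is a sketch under assumptions: the sample markup and its placeholder texts are made up to mirror the posted structure, and the real page may differ.

```php
<?php
// Made-up sample mirroring the posted structure; the headline and body
// texts are placeholders for the parts marked *** in the post.
$html = <<<'HTML'
<dl class="news scroll">
<dt><span class="newsdate">04-Jul-2006</span>First headline</dt>
<dd><table width="100%"><tr><td style="text-align: justify; vertical-align: top;">First body text</td> <td style="padding-left: 1em; text-align: right; vertical-align: top;"></td></tr></table></dd>
<dt>
<span class="newsdate">27-Jun-2006</span>Second headline</dt> <dd><table width="100%"><tr><td style="text-align: justify; vertical-align: top;">Second body text</td> <td style="padding-left: 1em;"></td></tr></table></dd></dl>
HTML;

// Group 1: the date, group 2: the headline after the span,
// group 3: the contents of the first table cell in the following <dd>.
$pattern = '%<span class="newsdate">(.*?)</span>(.*?)</dt>\s*<dd>\s*<table[^>]*>\s*<tr>\s*<td[^>]*>(.*?)</td>%is';

// PREG_SET_ORDER gives one array per news item instead of one per group.
$count = preg_match_all($pattern, $html, $items, PREG_SET_ORDER);

foreach ($items as $item) {
    // strip_tags() drops any markup left inside the captured text
    echo trim($item[1]) . ' | ' . trim(strip_tags($item[2])) . ' | ' . trim(strip_tags($item[3])) . "\n";
}
```

a regular expression like this is tied closely to the exact markup, so it will need adjusting whenever the site changes its HTML.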

Eugene · July 11, 2006

I got it, thanks.
