curl and fread


Eugene

OK, I need a lot of help. I recently wrote a little script that downloads a web page and saves it to a text file.
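[code=php:0]
<?php
// Download the page and stream the response body straight into news.txt
$ch = curl_init("http://url.com");
$fp = fopen("news.txt", "w");

curl_setopt($ch, CURLOPT_FILE, $fp);  // write the body to this file handle
curl_setopt($ch, CURLOPT_HEADER, 0);  // keep the HTTP headers out of the file

curl_exec($ch);
curl_close($ch);
fclose($fp);
?>
[/code]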

How can I select only the news from the text file, since I can't take the news directly from the website using cURL?

I'm not sure what you meant by "how can I select only the news from the txt file".

Question: why don't you want to use [url=http://www.php.net/file_get_contents]file_get_contents()[/url] to retrieve the page and then use fopen/flock/fwrite/fclose to do the writing? Any particular reason you want to use cURL?
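
For anyone wondering what the fopen/flock/fwrite/fclose sequence might look like, here is a minimal sketch. The helper name saveWithLock() and the temp-file path are made up for illustration, and a literal string stands in for the result of file_get_contents() so the snippet runs without a network connection.

```php
<?php
// Hypothetical helper (not from this thread): write $data to $path
// while holding an exclusive lock, so concurrent writers don't interleave.
function saveWithLock(string $data, string $path): bool
{
    // 'c' opens for writing without truncating, so the file is only
    // emptied once we actually hold the lock; 'b' is for portability
    $handle = fopen($path, 'cb');
    if ($handle === false) {
        return false;
    }

    if (!flock($handle, LOCK_EX)) {
        fclose($handle);
        return false;
    }

    ftruncate($handle, 0);
    $ok = fwrite($handle, $data) !== false;
    fflush($handle);            // push the data out before releasing the lock
    flock($handle, LOCK_UN);
    fclose($handle);

    return $ok;
}

// In real use $html would come from file_get_contents('http://www.example.com/');
// a literal string keeps this sketch runnable offline.
$html = "<html><body>news goes here</body></html>";
$path = sys_get_temp_dir() . '/news_lock_demo.txt';
var_dump(saveWithLock($html, $path));
```

The lock only matters if several scripts may write the same file at once; for a one-off download a plain fwrite() would do.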

[quote author=Koobi link=topic=99889.msg393670#msg393670 date=1152388506]
I'm not sure what you meant by "how can I select only the news from the txt file".

Question: why don't you want to use [url=http://www.php.net/file_get_contents]file_get_contents()[/url] to retrieve the page and then use fopen/flock/fwrite/fclose to do the writing? Any particular reason you want to use cURL?
[/quote]

The news I want to extract from the web page isn't readable if you use the function fread().
So I'm using cURL to write the contents of the page to a text file and then trying to extract the news, but I don't know how.

If the website does not have a URL where you can access the news by itself, you'll need to determine how the news is represented in the HTML and use regular expressions. I assume this website does not have any feeds (RSS), which would make this easier?
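
As a rough illustration of that idea, the sketch below grabs whatever sits between two known markers in a page. The helper name textBetween() and the sample markup are assumptions made up for the example; the real markers depend on how the site wraps its news.

```php
<?php
// Hypothetical helper: return the text between the first $start marker
// and the next $end marker, or null if the pair is not found.
function textBetween(string $haystack, string $start, string $end): ?string
{
    // preg_quote() escapes regex metacharacters in the markers;
    // '%' is the pattern delimiter, 's' lets . match newlines, .*? is lazy
    $pattern = '%' . preg_quote($start, '%') . '(.*?)' . preg_quote($end, '%') . '%s';
    if (preg_match($pattern, $haystack, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

// Stand-in for file_get_contents('news.txt'); the markup is invented for illustration
$page = "<html><body><div id=\"news\">\nFirst headline\n</div><div id=\"footer\">x</div></body></html>";
echo trim(textBetween($page, '<div id="news">', '</div>'));
```

This only works when the markers are unique and the wrapped block contains no nested copy of the end marker.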

I wouldn't use cURL for something like this.

Here's a modified snippet of something I already had:
[code=php:0]
<?php

$source = 'http://www.example.com/';
$destination = './news.txt';

// Return the body of the page at $url as a string, or false on failure
function getData($url)
{
    return file_get_contents($url);
}


// Write $data to the file at $location; returns true on success, false on failure.
// You'll have to implement proper error handling. Simply returning false is not good practice.
function writeData($data, $location)
{
    // is_writable() is false for a file that does not exist yet,
    // so in that case check the containing directory instead
    $target = file_exists($location) ? $location : dirname($location);
    if (!is_writable($target))
    {
        // file (or its directory) is not writable
        return false;
    }

    // the b in the mode is for portability of code
    if (!$handle = fopen($location, 'w+b'))
    {
        // couldn't open file
        return false;
    }

    // remember whether the write worked so the handle is closed either way
    $written = fwrite($handle, $data) !== false;
    fclose($handle);

    return $written;
}

$data = getData($source);
$result = ($data !== false) && writeData($data, $destination);
if ($result !== false)
{
    echo $destination . ' was written to';
}
else
{
    echo $destination . ' could not be written';
}
?>
[/code]


Let me know if you don't understand anything there.

:edit:
After reading effigy's post I think I know what you meant to say... but yeah, like effigy said, you might have to use regular expressions, depending on what the HTML is like.

It wrote the entire source code to the news.txt file. I could have done that myself. How can I get the contents of the text file between two certain tags?

[quote author=G__F__D link=topic=99889.msg393708#msg393708 date=1152391129]
It wrote the entire source code to the news.txt file. I could have done that myself. How can I get the contents of the text file between two certain tags?
[/quote]

Your initial question wasn't clear, which is why I posted that function: I thought that was what you were asking for.

For the answer to your question, refer to effigy's post in this thread.

[quote author=effigy link=topic=99889.msg393688#msg393688 date=1152389790]
If the website does not have a URL where you can access the news by itself, you'll need to determine how the news is represented in the HTML and use regular expressions. I assume this website does not have any feeds (RSS), which would make this easier?
[/quote]

[quote author=Koobi link=topic=99889.msg393714#msg393714 date=1152392004]
[quote author=G__F__D link=topic=99889.msg393708#msg393708 date=1152391129]
It wrote the entire source code to the news.txt file. I could have done that myself. How can I get the contents of the text file between two certain tags?
[/quote]

Your initial question wasn't clear, which is why I posted that function: I thought that was what you were asking for.

For the answer to your question, refer to effigy's post in this thread.
[/quote]
No idea what he means by expressions. The website doesn't have RSS feeds, and there's no direct URL to the news.

A regular expression is a string that describes a pattern of text, so you can match content even when you don't know exactly what you're looking for.

So if your tags are "<myTag>" and "</myTag>":
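[code=php:0]
<?php
// getData() is the function from my earlier post
$tagName = 'myTag';
// i = case-insensitive, s = let . match newlines so the content can span lines;
// .*? is lazy, so the match stops at the first closing tag
$pattern = "%<($tagName)>(.*?)</\\1>%is";
$subject = getData('http://example.org/');
preg_match($pattern, $subject, $matches);
// $matches[0] = whole match, $matches[1] = tag name, $matches[2] = text between the tags
print_r($matches);
?>
[/code]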
Now, depending on what exactly you want, you can write out the contents of $matches. You might want to drop the first two entries of $matches: according to my expression, $matches[0] is the whole match including the tags and $matches[1] is the name of the tag.
The above regular expression should work. Let me know if it doesn't; I'm a bit rusty with my regular expressions.

I'm assuming there will only be one instance of <myTag>.
If there are two or more, consider using [url=http://www.php.net/manual/en/function.preg-match-all.php]preg_match_all() in the manual[/url].
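
A minimal sketch of the preg_match_all() variant, for the case where the tag appears several times. The helper name allTagContents() and the sample string are invented for illustration, and the pattern assumes the tags are not nested inside each other.

```php
<?php
// Hypothetical helper: return the contents of every <$tagName>...</$tagName>
// pair found in $html, in document order.
function allTagContents(string $html, string $tagName): array
{
    $tag = preg_quote($tagName, '%');
    // (?:\s[^>]*)? tolerates attributes on the opening tag;
    // .*? is lazy, i ignores case, s lets . match newlines
    $pattern = '%<' . $tag . '(?:\s[^>]*)?>(.*?)</' . $tag . '>%is';
    preg_match_all($pattern, $html, $matches);

    // $matches[1] holds the first capture group of every match
    return $matches[1];
}

$html = "<myTag>first</myTag> <p>skip</p> <myTag class=\"x\">second\nitem</myTag>";
print_r(allTagContents($html, 'myTag'));
```

With the default flags, preg_match_all() groups results by capture group, which is why an empty result still prints as an array of empty arrays.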


Array ( [0] => Array ( ) [1] => Array ( ) [2] => Array ( ) )

is the output. It seems to work... sort of.

[quote author=Koobi link=topic=99889.msg393882#msg393882 date=1152440427]
show me the entire HTML and indicate to me what part of the HTML you want to grab.
[/quote]
[quote]
<div class="narrowscroll-bg">
<div class="narrowscroll-bgimg">
<div class="narrowscroll-content">
<dl class="news scroll">
<dt><span class="newsdate">04-Jul-2006</span>[color=red]***[/color]</dt>
<dd><table width="100%"><tr><td style="text-align: justify; vertical-align: top;">[color=red]***[/color]</td> <td style="padding-left: 1em; text-align: right; vertical-align: top;">
</div>
</dd>
<dt>
<span class="newsdate">04-Jul-2006</span>[color=red]***[/color]</dt> <dd><table width="100%"><tr><td style="text-align: justify; vertical-align: top;">[color=red]***[/color]</td> <td style="padding-left: 1em; text-align: right; vertical-align: top;">
</td></tr></table> </dd><dt><span class="newsdate">27-Jun-2006</span>[color=red]***[/color]</dt> <dd><table width="100%"><tr><td style="text-align: justify; vertical-align: top;">[color=red]***[/color]</td> <td style="padding-left: 1em; text-align: right; vertical-align: top;"></td></tr></table></dd></dl></div></div></div></div>
[/quote]

[color=red]***[/color] = What I need to get.

Hmmm, I don't think you posted the correct HTML... there seems to be a closing div that doesn't belong there.
Let me know if this is the exact HTML and I will write a regular expression for you.

And please enclose your code in proper code tags next time.
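
In the meantime, here is a rough sketch of what such an expression could look like, assuming the markup really is shaped like the snippet above: a date in span.newsdate followed by the headline inside each <dt>, and the body in the first justified <td> of the following <dd>. The helper name extractNews() and the sample text are made up, and the pattern would need adjusting if the real HTML differs.

```php
<?php
// Hypothetical helper: pull date, headline and body out of markup
// shaped like the posted snippet. Returns a list of associative arrays.
function extractNews(string $html): array
{
    // Group 1 = date, group 2 = headline, group 3 = body cell.
    // i ignores case, s lets . match newlines, .*? keeps every piece lazy.
    $pattern = '%<dt>\s*<span class="newsdate">(.*?)</span>(.*?)</dt>'
             . '\s*<dd>.*?<td[^>]*text-align:\s*justify[^>]*>(.*?)</td>%is';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $items = [];
    foreach ($matches as $m) {
        $items[] = [
            'date'  => trim($m[1]),
            'title' => trim(strip_tags($m[2])),
            'body'  => trim(strip_tags($m[3])),
        ];
    }
    return $items;
}

// Invented sample in the same shape as the posted HTML
$sample = '<dl class="news scroll">'
    . '<dt><span class="newsdate">04-Jul-2006</span>Headline one</dt>'
    . '<dd><table width="100%"><tr><td style="text-align: justify; vertical-align: top;">Body one</td>'
    . '<td style="padding-left: 1em; text-align: right; vertical-align: top;"></td></tr></table></dd>'
    . "<dt>\n" . '<span class="newsdate">27-Jun-2006</span>Headline two</dt> '
    . '<dd><table width="100%"><tr><td style="text-align: justify; vertical-align: top;">Body two</td>'
    . '<td style="padding-left: 1em; text-align: right; vertical-align: top;"></td></tr></table></dd></dl>';
print_r(extractNews($sample));
```

PREG_SET_ORDER groups the results per news item rather than per capture group, which makes the loop simpler. If an item has no justified cell, the lazy .*? can run on into the next item, so check the output against the real page.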