Eugene (author), posted July 8, 2006:
OK, I need a lot of help. I recently made a little script that downloads a webpage and saves it to a text file:
[code=php:0]<?php
// download the page and write the response body straight into news.txt
$ch = curl_init("http://url.com");
$fp = fopen("news.txt", "w");
curl_setopt($ch, CURLOPT_FILE, $fp);  // write the output to $fp instead of printing it
curl_setopt($ch, CURLOPT_HEADER, 0);  // leave the HTTP headers out of the file
curl_exec($ch);
curl_close($ch);
fclose($fp);
?>[/code]
How can I select only the news from the txt file, since I can't take just the news from the website using cURL?
Koobi, posted July 8, 2006:
I'm not sure what you meant by "how can I select only the news from the txt file".
Question: why don't you want to use [url=http://www.php.net/file_get_contents]file_get_contents()[/url] to retrieve the page and then use fopen/flock/fwrite/fclose to do the writing? Any particular reason you want to use cURL?
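Koobi's suggestion (read the page with file_get_contents(), then write it with fopen/flock/fwrite/fclose) can be sketched as below. This is a minimal sketch, not code from the thread: `fetchPage()` and `savePage()` are hypothetical helper names, and reading an http:// URL assumes the `allow_url_fopen` setting is enabled.

```php
<?php
// Hypothetical helper: read a page (or local file) into a string.
// For http:// URLs this assumes allow_url_fopen is enabled; returns false on failure.
function fetchPage(string $url)
{
    return file_get_contents($url);
}

// Hypothetical helper: write $data to $path using fopen/flock/fwrite/fclose,
// holding an exclusive lock so two writers can't interleave their output.
function savePage(string $path, string $data): bool
{
    // Mode 'c' creates the file if needed but does NOT truncate it yet,
    // so existing contents are only cleared once we own the lock.
    $handle = fopen($path, 'cb');
    if ($handle === false) {
        return false;
    }
    $ok = false;
    if (flock($handle, LOCK_EX)) {
        // Truncate under the lock, then write the new contents in full.
        $ok = ftruncate($handle, 0) && fwrite($handle, $data) === strlen($data);
        fflush($handle);
        flock($handle, LOCK_UN);
    }
    fclose($handle);
    return $ok;
}
```

Usage would be `$html = fetchPage('http://www.example.com/');` and, once `$html !== false` has been checked, `savePage('./news.txt', $html);`.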
Eugene (author), posted July 8, 2006:
[quote author=Koobi link=topic=99889.msg393670#msg393670 date=1152388506]...why don't you want to use [url=http://www.php.net/file_get_contents]file_get_contents()[/url] to retrieve the page and then use fopen/flock/fwrite/fclose to do the writing? Any particular reason you want to use CURL?[/quote]
The news I want to extract from the webpage isn't readable if I use fread(). So I'm using cURL to write the contents of the page to a txt file and then try to extract the news, but I don't know how.
effigy, posted July 8, 2006:
If the website does not have a URL where you can access the news by itself, you'll need to work out how the news is represented in the HTML and use regular expressions to pull it out. I assume this website doesn't have any feeds (RSS), which would make this easier?
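To show why a feed would make this easier: in a standard RSS 2.0 feed every news item is already its own `<item>` element, so no HTML scraping is needed. This is a minimal sketch assuming an RSS 2.0 feed (the thread later confirms this site has none); `parseRssItems()` and the sample feed are hypothetical.

```php
<?php
// Sketch only: assumes a standard RSS 2.0 feed, where each news item is an
// <item> with <title>, <pubDate> and <description> children.
// parseRssItems() is a hypothetical helper name.
function parseRssItems(string $xml): array
{
    $feed = simplexml_load_string($xml);
    if ($feed === false || !isset($feed->channel->item)) {
        return [];
    }
    $items = [];
    foreach ($feed->channel->item as $item) {
        $items[] = [
            'date'  => (string) $item->pubDate,
            'title' => (string) $item->title,
            'body'  => (string) $item->description,
        ];
    }
    return $items;
}

// Made-up sample feed; a real one would be fetched with file_get_contents($feedUrl).
$sample = '<rss version="2.0"><channel><title>Example</title>'
    . '<item><title>First</title><pubDate>Tue, 04 Jul 2006 00:00:00 GMT</pubDate>'
    . '<description>Hello</description></item>'
    . '</channel></rss>';

foreach (parseRssItems($sample) as $news) {
    echo $news['date'], ': ', $news['title'], "\n";
}
```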
Koobi, posted July 8, 2006:
I wouldn't use cURL for something like this. Here's a modified snippet of something I already had:
[code=php:0]<?php
$source = 'http://www.example.com/';
$destination = './news.txt';

// return the page contents as a string (false on failure)
function getData($url)
{
    return file_get_contents($url);
}

// write $data to $location; returns true on success, false on failure.
// you'll have to implement proper error handling; returning false is the bare minimum
function writeData($data, $location)
{
    // is_writable() is false for a file that doesn't exist yet, so check its directory instead
    $target = file_exists($location) ? $location : dirname($location);
    if (!is_writable($target)) {
        // file is not writable
        return false;
    }
    // I use the b in the mode for portability of code
    $handle = fopen($location, 'w+b');
    if ($handle === false) {
        // couldn't open file
        return false;
    }
    $written = fwrite($handle, $data);
    fclose($handle);
    // fwrite() returns false if the file could not be written
    return $written !== false;
}

$data = getData($source);
if ($data === false) {
    echo $source . ' could not be read';
} elseif (writeData($data, $destination)) {
    echo $destination . ' was written to';
} else {
    echo $destination . ' could not be written';
}
?>[/code]
Let me know if you don't understand anything there.
Edit: after reading effigy's post I think I know what you meant... but yeah, like effigy said, you might have to use regular expressions, depending on what the HTML is like.
Eugene (author), posted July 8, 2006:
[quote author=Koobi link=topic=99889.msg393691#msg393691 date=1152389812]I wouldn't use CURL for something like this. Here's a modified snippet of something I already had: ...[/quote]
It wrote the entire source code to the news.txt file. I could have done that myself. How can I get the contents of the txt file between two certain tags?
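When the start and end markers are fixed literal strings, a plain string search is enough to get the text between two tags, before reaching for regular expressions. This is a minimal sketch; `textBetween()` is a hypothetical helper and it returns only the first occurrence.

```php
<?php
// Hypothetical helper: return the text between the first occurrence of
// $start and the next occurrence of $end after it, or null if either
// marker is missing.
function textBetween(string $haystack, string $start, string $end): ?string
{
    $from = strpos($haystack, $start);
    if ($from === false) {
        return null;
    }
    $from += strlen($start);           // skip past the start marker itself
    $to = strpos($haystack, $end, $from);
    if ($to === false) {
        return null;
    }
    return substr($haystack, $from, $to - $from);
}

// Usage against the saved file (path as in the thread):
// $html = file_get_contents('./news.txt');
// echo textBetween($html, '<dt>', '</dt>');
```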
Koobi, posted July 8, 2006:
[quote author=G__F__D link=topic=99889.msg393708#msg393708 date=1152391129]It wrote the entire source code to the news.txt file. I could have done that myself. How can I get the contents of the txt file between two certain tags?[/quote]
Your initial question wasn't clear, which is why I posted that function: I thought that was what you were asking for. For the answer to your question, refer to effigy's post in this thread.
Eugene (author), posted July 8, 2006:
[quote author=effigy link=topic=99889.msg393688#msg393688 date=1152389790]...you'll need to determine how the news is represented in the HTML and use regular expressions. I assume this web site does not have any feeds (RSS), which would make this easier?[/quote]
No idea what he means by regular expressions. The website doesn't have RSS feeds, and there's no direct URL to the news.
Koobi, posted July 8, 2006:
A regular expression is a pattern that describes a whole set of strings, so you can match text when you don't know exactly what it contains, only what surrounds it.
So if your tags are "<myTag>" and "</myTag>":
[code=php:0]<?php
$tagName = 'myTag';
// preg_quote() escapes any regex metacharacters in the tag name.
// (.*?) is lazy, so it stops at the first closing tag; the s modifier lets . match newlines.
$pattern = '%<(' . preg_quote($tagName, '%') . ')>(.*?)</\1>%is';
$subject = file_get_contents('http://example.org/'); // getData() from my earlier post does the same
preg_match($pattern, $subject, $matches);
print_r($matches);
?>[/code]
Now, depending on what exactly you want, you can write out the contents of $matches. $matches[0] holds the whole match (tags included), $matches[1] the tag name captured by the first group, and $matches[2] the text between the tags, which is usually the part you want.
The expression above should work; let me know if it doesn't, I'm a bit rusty with my regular expressions.
I'm assuming there will only be one instance of <myTag>. If there are more, consider using [url=http://www.php.net/manual/en/function.preg-match-all.php]preg_match_all() in the manual[/url].
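For the case of several `<myTag>` elements, the same pattern can be passed to preg_match_all(), which fills `$matches[2]` with every captured content string. A minimal sketch; `extractTagContents()` and the sample string are hypothetical.

```php
<?php
// Hypothetical helper: return the contents of every <$tag>...</$tag> pair.
// The lazy (.*?) stops at the nearest closing tag, and the s modifier lets
// . match newlines, so multi-line content is captured too.
function extractTagContents(string $html, string $tag): array
{
    $pattern = '%<(' . preg_quote($tag, '%') . ')>(.*?)</\1>%is';
    // preg_match_all() returns the number of matches (0 if none, false on error).
    if (!preg_match_all($pattern, $html, $matches)) {
        return [];
    }
    // $matches[0]: whole matches, $matches[1]: tag names, $matches[2]: contents.
    return $matches[2];
}

$html = "<myTag>first</myTag>\n<myTag>second\nline</myTag>";
print_r(extractTagContents($html, 'myTag'));
```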
Eugene (author), posted July 8, 2006:
[quote author=Koobi link=topic=99889.msg393720#msg393720 date=1152393208]...I'm assuming there will only be one instance of <myTag>. If there are two, consider using [url=http://www.php.net/manual/en/function.preg-match-all.php]preg_match_all() in the manual[/url][/quote]
This is the output:
Array ( [0] => Array ( ) [1] => Array ( ) [2] => Array ( ) )
It seems to work... sort of.
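That output is what preg_match_all() produces when nothing matched: with the default PREG_PATTERN_ORDER, `$matches` still gets one empty array per group (the whole match plus the two capture groups). Checking the return value makes this explicit. A minimal sketch; the subject string is made up.

```php
<?php
// With the default PREG_PATTERN_ORDER, a preg_match_all() that finds nothing
// still fills $matches with one empty array per group, which prints as
// "Array ( [0] => Array ( ) [1] => Array ( ) [2] => Array ( ) )".
$pattern = '%<(myTag)>(.*?)</\1>%is';
$subject = '<p>no such tag here</p>'; // made-up subject without <myTag>

$count = preg_match_all($pattern, $subject, $matches);
if ($count === false) {
    echo "The pattern itself is invalid.\n";
} elseif ($count === 0) {
    echo "No matches: check the tag name against the real HTML.\n";
} else {
    print_r($matches[2]);
}
```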
Koobi, posted July 9, 2006:
Show me the entire HTML and indicate which part of it you want to grab.
Eugene (author), posted July 9, 2006:
[quote author=Koobi link=topic=99889.msg393882#msg393882 date=1152440427]Show me the entire HTML and indicate which part of it you want to grab.[/quote]
[quote]<div class="narrowscroll-bg"><div class="narrowscroll-bgimg"><div class="narrowscroll-content">
<dl class="news scroll">
<dt><span class="newsdate">04-Jul-2006</span>[color=red]***[/color]</dt>
<dd><table width="100%"><tr><td style="text-align: justify; vertical-align: top;">[color=red]***[/color]</td>
<td style="padding-left: 1em; text-align: right; vertical-align: top;"></div>
</dd>
<dt><span class="newsdate">04-Jul-2006</span>[color=red]***[/color]</dt>
<dd><table width="100%"><tr><td style="text-align: justify; vertical-align: top;">[color=red]***[/color]</td>
<td style="padding-left: 1em; text-align: right; vertical-align: top;"> </td></tr></table>
</dd>
<dt><span class="newsdate">27-Jun-2006</span>[color=red]***[/color]</dt>
<dd><table width="100%"><tr><td style="text-align: justify; vertical-align: top;">[color=red]***[/color]</td>
<td style="padding-left: 1em; text-align: right; vertical-align: top;"></td></tr></table></dd>
</dl></div></div></div></div>[/quote]
[color=red]***[/color] = the parts I need to get.
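Given the layout posted above, one regular-expression approach is to take the date and title from each `<dt>` (after the `newsdate` span) and the news text from the justified `<td>` in each `<dd>`, then pair them up in order. This is a sketch assuming the real page matches the posted markup; `parseNews()` is hypothetical, and the sample keeps the posted structure (including its stray `</div>`) with the `***` placeholders replaced by made-up text.

```php
<?php
// Hypothetical parser for the posted layout: each news item is a <dt>
// (date span + title) followed by a <dd> whose justified <td> holds the text.
function parseNews(string $html): array
{
    // Date and title from each <dt>; (.*?) is lazy so it stops at the first </dt>.
    preg_match_all('%<dt><span class="newsdate">([^<]*)</span>(.*?)</dt>%is', $html, $heads);
    // Body text from the justified cell in each <dd>.
    preg_match_all('%<td style="text-align: justify; vertical-align: top;">(.*?)</td>%is', $html, $bodies);

    $news = [];
    foreach ($heads[1] as $i => $date) {
        $news[] = [
            'date'  => $date,
            'title' => trim($heads[2][$i]),
            'body'  => trim($bodies[1][$i] ?? ''),
        ];
    }
    return $news;
}

$sample = '<dl class="news scroll">'
    . '<dt><span class="newsdate">04-Jul-2006</span>Title A</dt> '
    . '<dd><table width="100%"><tr><td style="text-align: justify; vertical-align: top;">Text A</td>'
    . ' <td style="padding-left: 1em; text-align: right; vertical-align: top;"></div> </dd>'
    . '<dt><span class="newsdate">27-Jun-2006</span>Title B</dt> '
    . '<dd><table width="100%"><tr><td style="text-align: justify; vertical-align: top;">Text B</td>'
    . ' <td style="padding-left: 1em; text-align: right; vertical-align: top;"></td></tr></table></dd></dl>';

print_r(parseNews($sample));
```

Because the patterns match the exact class and style strings, any change to the site's markup would break them, which is the usual trade-off of scraping with regular expressions.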
Koobi, posted July 10, 2006:
Hmm, I don't think you posted the correct HTML: there seems to be a closing div in there that doesn't belong.
Let me know if this is the exact HTML and I will write a regular expression for you.
And please enclose your code in proper code tags next time.
Eugene (author), posted July 11, 2006:
I got it, thanks.