curl and fread


13 replies to this topic

#1 Eugene

Eugene
  • Members
  • PipPipPip
  • Advanced Member
  • 126 posts

Posted 08 July 2006 - 07:07 PM

OK, I need a lot of help. I recently made a little script that downloads a webpage and saves it to a text file:
<?php
$ch = curl_init("http://url.com");
$fp = fopen("news.txt", "w");

curl_setopt($ch, CURLOPT_FILE, $fp);  // write the response body into news.txt
curl_setopt($ch, CURLOPT_HEADER, 0);  // leave the HTTP headers out of the file

curl_exec($ch);
curl_close($ch);
fclose($fp);
?>

How can I select only the news from the .txt file, since I can't grab just the news from the website with cURL?
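For what it's worth, cURL can also hand the page back as a string instead of writing it to a file, which lets you search the HTML in memory. A minimal sketch, assuming the cURL extension is installed; `fetchPage()` is a hypothetical helper name, not part of cURL:

```php
<?php
// Hypothetical helper: fetch a URL with cURL and return the body as a string
// (CURLOPT_RETURNTRANSFER) instead of streaming it to a file (CURLOPT_FILE).
function fetchPage($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // make curl_exec() return the body
    curl_setopt($ch, CURLOPT_HEADER, false);        // body only, no HTTP headers
    $body = curl_exec($ch);                         // string on success, false on failure
    // the handle is released when $ch goes out of scope at the end of the function
    return $body;
}
```

Usage would be something like `$html = fetchPage('http://url.com');`, checking `$html !== false` before searching it.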

#2 Koobi

Koobi
  • Staff Alumni
  • Advanced Member
  • 419 posts
  • LocationColombo, Sri Lanka | South Asia

Posted 08 July 2006 - 07:55 PM

I'm not sure what you mean by "how can I select only the news from the txt file".

Question: why not use file_get_contents() to retrieve the page and then fopen/flock/fwrite/fclose to do the writing? Any particular reason you want to use cURL?
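A minimal sketch of that suggestion, assuming `allow_url_fopen` is enabled so file_get_contents() can read URLs; `savePage()` is a hypothetical helper name:

```php
<?php
// Hypothetical helper: read a page with file_get_contents(), then write it with
// fopen/flock/fwrite/fclose so two runs of the script can't interleave their writes.
function savePage($url, $path)
{
    $data = file_get_contents($url);     // false if the page couldn't be read
    if ($data === false) {
        return false;
    }

    $fp = fopen($path, 'cb');            // 'c': create if missing, but don't truncate yet
    if ($fp === false) {
        return false;
    }

    $ok = false;
    if (flock($fp, LOCK_EX)) {           // take an exclusive lock before touching the contents
        ftruncate($fp, 0);               // only now is it safe to empty the file
        $ok = fwrite($fp, $data) !== false;
        fflush($fp);                     // push buffered output to disk before unlocking
        flock($fp, LOCK_UN);
    }
    fclose($fp);
    return $ok;
}
```

`file_put_contents($path, $data, LOCK_EX)` is a shorter alternative that also holds an exclusive lock while writing.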

#3 Eugene

Posted 08 July 2006 - 08:08 PM

The news I want to extract from the webpage isn't readable if I use fread().
So I'm using cURL to write the contents of the page to a .txt file and then trying to extract the news, but I don't know how.

#4 effigy

effigy
  • Staff Alumni
  • Advanced Member
  • 3,600 posts
  • LocationIL

Posted 08 July 2006 - 08:16 PM

If the website does not have a URL where you can access the news by itself, you'll need to determine how the news is represented in the HTML and use regular expressions. I assume this website does not have any (RSS) feeds, which would make this easier?
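A minimal sketch of the regex approach, assuming the news sits between two fixed pieces of markup that you pass in as literal strings; `extractBetween()` is a hypothetical helper name:

```php
<?php
// Hypothetical helper: return every piece of text found between two literal markers.
// preg_quote() escapes regex metacharacters in the markers, the lazy (.*?) stops at
// the nearest closing marker, and the 's' modifier lets '.' match newlines.
function extractBetween($html, $start, $end)
{
    $pattern = '~' . preg_quote($start, '~') . '(.*?)' . preg_quote($end, '~') . '~s';
    preg_match_all($pattern, $html, $matches);
    return $matches[1];   // index 1 = the capture group, i.e. the text between the markers
}
```

This only stays reliable as long as the markup around the news is consistent.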

#5 Koobi

Posted 08 July 2006 - 08:16 PM

I wouldn't use cURL for something like this.

Here's a modified snippet of something I already had:
<?php

	$source = 'http://www.example.com/';
	$destination = './news.txt';

	//function to return file data from URL (false on failure)
	function getData($url)
	{
		return file_get_contents($url);
	}


	//function to write file to disk
	//you'll have to implement error handling. I simply return errors but this is not good practice
	function writeData($data, $location)
	{
		//is_writable() is false for a file that doesn't exist yet,
		//so also allow creating it inside a writable directory
		if(is_writable($location) || (!file_exists($location) && is_writable(dirname($location))))
		{
			// I use the b in the mode for portability of code
			if(!$handle = fopen($location, 'w+b'))
			{
				//couldn't open file
				return false;
			}

			if(fwrite($handle, $data) === false)
			{
				//file could not be written
				fclose($handle);
				return false;
			}

			fclose($handle);
			return true;
		}
		else
		{
			//file is not writable
			return false;
		}
	}

$data = getData($source);
//only try to write if the download worked
$result = ($data !== false) ? writeData($data, $destination) : false;
if($result !== false)
{
	echo $destination . ' was written to';
}
else
{
	echo $destination . ' could not be written';
}
?>


Let me know if you don't understand anything there.

:edit:
After reading effigy's post, I think I know what you meant... but yeah, like effigy said, you might have to use regular expressions, depending on what the HTML looks like.

#6 Eugene

Posted 08 July 2006 - 08:38 PM

It wrote the entire source code to news.txt. I could have done that myself. How can I get the contents of the .txt file between two certain tags?

#7 Koobi

Posted 08 July 2006 - 08:53 PM

Your initial question wasn't clear, which is why I posted that function; I thought that was what you were asking for.

For the answer to your question, refer to effigy's post in this thread.

#8 Eugene

Posted 08 July 2006 - 09:03 PM

No idea what he means by regular expressions. The website doesn't have RSS feeds, and there's no direct URL to the news.

#9 Koobi

Posted 08 July 2006 - 09:13 PM

A regular expression is a pattern that can match a whole family of strings, which is useful when you don't know exactly what text you're looking for.
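For example, a minimal sketch where one pattern matches any date written in the `04-Jul-2006` style (two digits, a capitalised three-letter month, four digits); `findDates()` is a hypothetical helper name:

```php
<?php
// Hypothetical helper: one pattern that matches every date like 04-Jul-2006.
// \d{2} = two digits, [A-Z][a-z]{2} = a capitalised three-letter month, \d{4} = the year;
// \b keeps the pattern from matching inside longer runs of characters.
function findDates($text)
{
    preg_match_all('/\b\d{2}-[A-Z][a-z]{2}-\d{4}\b/', $text, $m);
    return $m[0];   // index 0 = the full matches
}
```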


So if your tags are "<myTag>" and "</myTag>":
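<?php
// getData() is the helper from my earlier snippet
$tagName = 'myTag';
// (.*?) is lazy so the match stops at the first closing tag; 's' lets '.' match newlines
$pattern = '%<(' . preg_quote($tagName, '%') . ')>(.*?)</\1>%is';
$subject = getData('http://example.org/');
preg_match($pattern, $subject, $matches);
print_r($matches);
?>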
Now, depending on exactly what you want, you can write out the contents of $matches. $matches[0] holds the whole match and $matches[1] holds the tag name (because of the first group in my expression), so the text between the tags is in $matches[2].
The above regular expression should work... let me know if it doesn't; I'm a bit rusty with my regular expressions.

I'm assuming there will only be one instance of <myTag>.
If there are more, look up preg_match_all() in the manual.
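A minimal sketch of that preg_match_all() variant, assuming the same `<myTag>` markup; `extractTagContents()` is a hypothetical helper name:

```php
<?php
// Hypothetical helper: collect the contents of every <tagName>...</tagName> pair.
// The backreference \1 makes each match close with the tag name it opened with,
// (.*?) is lazy so each match stops at the nearest closing tag,
// 'i' ignores case and 's' lets '.' cross newlines.
function extractTagContents($html, $tagName)
{
    $pattern = '%<(' . preg_quote($tagName, '%') . ')>(.*?)</\1>%is';
    preg_match_all($pattern, $html, $matches);
    return $matches[2];   // [0] = full matches, [1] = tag names, [2] = contents
}
```

With no matches, preg_match_all() still fills in one empty array per group, so `$matches[2]` is simply an empty array.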

#10 Eugene

Posted 08 July 2006 - 09:52 PM

Array ( [0] => Array ( ) [1] => Array ( ) [2] => Array ( ) )

That's the output. Seems to work... sort of.

#11 Koobi

Posted 09 July 2006 - 10:20 AM

Show me the entire HTML and indicate which part of it you want to grab.

#12 Eugene

Posted 09 July 2006 - 03:51 PM

[quote]
<div class="narrowscroll-bg">
<div class="narrowscroll-bgimg">
<div class="narrowscroll-content">
<dl class="news scroll">
<dt><span class="newsdate">04-Jul-2006</span>***</dt>
<dd><table width="100%"><tr><td style="text-align: justify; vertical-align: top;">***</td> <td style="padding-left: 1em; text-align: right; vertical-align: top;">
</div>
</dd>
<dt>
<span class="newsdate">04-Jul-2006</span>***</dt> <dd><table width="100%"><tr><td style="text-align: justify; vertical-align: top;">***</td> <td style="padding-left: 1em; text-align: right; vertical-align: top;">
</td></tr></table> </dd><dt><span class="newsdate">27-Jun-2006</span>***</dt> <dd><table width="100%"><tr><td style="text-align: justify; vertical-align: top;">***</td> <td style="padding-left: 1em; text-align: right; vertical-align: top;"></td></tr></table></dd></dl></div></div></div></div>
[/quote]

*** = What I need to get.
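A sketch of a regex for the posted markup, assuming it is exactly as shown: each item is a `<dt>` holding a `<span class="newsdate">` plus the headline, followed by a `<dd>` whose first `text-align: justify` cell holds the body. `extractNews()` is a hypothetical helper name, and the pattern breaks if the site changes those attributes:

```php
<?php
// Hypothetical helper for the HTML posted above. Assumes each news item looks like
// <span class="newsdate">DATE</span>HEADLINE</dt> ... <td style="text-align: justify; ...">BODY</td>
function extractNews($html)
{
    $pattern = '~<span class="newsdate">(.*?)</span>(.*?)</dt>'                       // date, headline
             . '.*?<td style="text-align: justify; vertical-align: top;">(.*?)</td>~s'; // body

    $items = array();
    if (preg_match_all($pattern, $html, $sets, PREG_SET_ORDER)) {
        foreach ($sets as $m) {            // $m = [full match, date, headline, body]
            $items[] = array(
                'date'  => trim($m[1]),
                'title' => trim($m[2]),
                'body'  => trim($m[3]),
            );
        }
    }
    return $items;
}
```

PREG_SET_ORDER groups each item's date, headline and body together, which is easier to loop over than the default per-group arrays.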

#13 Koobi

Posted 10 July 2006 - 11:33 AM

Hmm, I don't think you posted the correct HTML... there seems to be a closing div in there that doesn't belong.
Let me know if this is the exact HTML and I will write a regular expression for you.

And please enclose your code in proper code tags next time.

#14 Eugene

Posted 11 July 2006 - 05:36 PM

I got it, thanks.



