
The simplest way to do this is to go to the website, copy the text, and put it into a string:

<?php
// Headline and summary copied by hand from the BBC homepage
$title = "Zimbabwe PM pledges 'new chapter'";
$text = "Zimbabwe's new Prime Minister Morgan Tsvangirai vows to stabilise the shattered economy and end political violence.";

echo "$title <br> $text";
?>


lol, I am sure he wants it done automatically/dynamically.

 

You need to look at the HTML source, find tags that uniquely surround the content you want, then use either preg_match or a series of explode calls to grab the data. Of course, to retrieve the site data you will first have to pull in the full page's HTML source with file_get_contents and/or cURL.
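For the "series of explode calls" route, here is a minimal sketch that works on a string already holding the page source. The helper name between() is made up for this example, and the markers come from the BBC markup discussed in this thread; a real page will need its own markers.

```php
<?php
// Hypothetical helper: return the text between the first $start marker and the
// next $end marker, or null if either marker is missing.
function between(string $haystack, string $start, string $end): ?string
{
    // Split once on the opening marker and keep what follows it
    $afterStart = explode($start, $haystack, 2);
    if (count($afterStart) < 2) {
        return null;
    }
    // Split the remainder once on the closing marker and keep what precedes it
    $parts = explode($end, $afterStart[1], 2);
    if (count($parts) < 2) {
        return null;
    }
    return $parts[0];
}

// Trimmed stand-in for the downloaded page source
$html = '<div id="hpFeatureBoxInt"><h3><a href="#">Zimbabwe PM pledges \'new chapter\'</a></h3>'
      . '<p>Zimbabwe\'s new Prime Minister Morgan Tsvangirai vows to stabilise the shattered economy and end political violence.</p></div>';

$box   = between($html, '<div id="hpFeatureBoxInt">', '</div>');
$title = trim(strip_tags(between($box, '<h3>', '</h3>')));
$text  = trim(strip_tags(between($box, '<p>', '</p>')));

echo "$title <br> $text";
```

Each explode() call uses a limit of 2, so only the first occurrence of a marker matters.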

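<?php
// Fetch the full HTML source of the page
$content = file_get_contents('http://www.bbc.co.uk/');
if ($content === false) {
    die('Could not retrieve the page.');
}

// Capture everything between the feature box's opening tag and the next </div>
if (!preg_match('~"hpFeatureBoxInt">(.+?)</div>~s', $content, $matches)) {
    die('Feature box not found.');
}

// The headline ends at </h3>; the summary follows it
list($title, $body) = explode("</h3>", $matches[1], 2);
$title = trim(strip_tags($title));
$body = trim(strip_tags($body));

echo "Title: $title <br />";
echo "Body: $body <br />";
?>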

 

First, I viewed the source of the BBC page you wanted to parse. Second, I looked for a tag that identifies the part of the page you wanted.

 

I found

<div id="hpFeatureBoxInt">
<h2><span class="dy">Top News Story</span></h2>
<h3><a href="/go/homepage/i/int/news/world/1/-/news/1/hi/world/africa/7884282.stm"><img width="201" height="150" src="/feedengine/homepage/images/_45468316_84737466_201x150.jpg" alt="Morgan Tsvangirai addresses crowds"/>Zimbabwe PM pledges 'new chapter'</a></h3>
	<p>Zimbabwe's new Prime Minister Morgan Tsvangirai vows to stabilise the shattered economy and end political violence.</p>

	<p id="fbilisten"><a href="/go/homepage/i/int/news/heading/-/news/">More from BBC News</a>
</p>
</div>

 

I did a quick search for "hpFeatureBoxInt"> and did not find any other matches on the page, which means searching for it will return the right block.
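The same uniqueness check can be done in code before trusting the marker. This sketch uses a short made-up $html string in place of the real page source:

```php
<?php
// Made-up stand-in for the page source, with one unrelated box
$html = '<div id="hpFeatureBoxInt"><h3>Headline</h3><p>Summary</p></div>'
      . '<div id="otherBox"><p>Unrelated</p></div>';

// The marker we plan to anchor the regex on
$marker = '"hpFeatureBoxInt">';
$count  = substr_count($html, $marker);

if ($count === 1) {
    echo "Marker is unique; safe to anchor on it.";
} else {
    echo "Marker found $count times; pick a more specific marker.";
}
```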

 

Next, I used file_get_contents to retrieve the HTML source of the page you wanted to parse and store it in a string.
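If you want file_get_contents to fail more predictably, a stream context can set a timeout and a User-Agent header. This is a sketch: fetch_html() is a hypothetical helper, and the 10-second timeout and agent string are arbitrary example values, not anything the BBC site requires.

```php
<?php
// Hypothetical helper: fetch a URL (or local path) into a string, throwing on failure
function fetch_html(string $url): string
{
    // Options under 'http' apply to http:// and https:// URLs only
    $context = stream_context_create([
        'http' => [
            'timeout'    => 10, // seconds; arbitrary example value
            'user_agent' => 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)', // made-up agent string
        ],
    ]);

    // @ hides the PHP warning; the exception below reports the failure instead
    $html = @file_get_contents($url, false, $context);
    if ($html === false) {
        throw new RuntimeException("Could not retrieve $url");
    }
    return $html;
}

// Usage: $content = fetch_html('http://www.bbc.co.uk/');
```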

 

I then ran that string through preg_match with the regex '~"hpFeatureBoxInt">(.+?)</div>~s', which finds the tag and captures everything between it and the next closing div tag, storing the results in the $matches array. The captured text is in index 1 of the array; index 0 holds the whole match with the original tags intact, so we do not want that one.
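To see the difference between index 0 and index 1, here is the same pattern run on a small made-up string. It also shows why the s modifier matters: without it, . does not match newlines, so the pattern fails when the box content spans several lines.

```php
<?php
// Made-up input whose box content spans several lines
$html = "<div id=\"hpFeatureBoxInt\">\n<h3>Headline</h3>\n<p>Summary</p>\n</div>\n<div>Other</div>";

// ~ is the delimiter; s lets . match newlines; +? is lazy, so the match
// stops at the FIRST </div> rather than the last one
preg_match('~"hpFeatureBoxInt">(.+?)</div>~s', $html, $matches);

var_dump($matches[0]); // whole match, including "hpFeatureBoxInt"> and </div>
var_dump($matches[1]); // only the inner HTML between them

// Same pattern without the s modifier: no match, because the content starts with a newline
$withoutS = preg_match('~"hpFeatureBoxInt">(.+?)</div>~', $html);
```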

 

Next, I used explode to split the match into two variables, $title and $body. I split it on </h3> because that tag separates the two.
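A slightly more defensive version of the explode step, as a sketch: passing a limit of 2 keeps any later </h3> tags inside $body, and checking the number of pieces avoids an undefined-index notice when the separator is missing. The sample $inner string is made up.

```php
<?php
// Made-up inner HTML, with a second <h3> to show what the limit does
$inner = "<h3><a href=\"#\">Headline</a></h3>\n<p>Summary</p><h3>Another</h3>";

// A limit of 2 splits on the first </h3> only; later ones stay in the second piece
$parts = explode('</h3>', $inner, 2);

if (count($parts) === 2) {
    list($title, $body) = $parts;
} else {
    // No </h3> found: explode() returned the whole string as one element
    $title = $parts[0];
    $body  = '';
}

$title = trim(strip_tags($title));
$body  = trim(strip_tags($body));

echo "Title: $title <br />";
echo "Body: $body <br />";
```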

 

Finally, I used strip_tags to remove any remaining HTML tags and trim to remove the extra whitespace. Now you have the two items in strings and can display them however you want.
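One caveat: with the markup quoted above, $title will also contain the "Top News Story" label from the <h2>, and $body will end with the "More from BBC News" link text, because both sit inside the captured block. Here is a sketch that targets the <h3> and the first plain <p> directly, run against a trimmed copy of that markup (URLs replaced with "#"):

```php
<?php
// Trimmed copy of the feature-box markup quoted above
$box = <<<'HTML'
<div id="hpFeatureBoxInt">
<h2><span class="dy">Top News Story</span></h2>
<h3><a href="#"><img width="201" height="150" src="#" alt="Morgan Tsvangirai addresses crowds"/>Zimbabwe PM pledges 'new chapter'</a></h3>
<p>Zimbabwe's new Prime Minister Morgan Tsvangirai vows to stabilise the shattered economy and end political violence.</p>
<p id="fbilisten"><a href="#">More from BBC News</a>
</p>
</div>
HTML;

// Capture only the <h3> contents for the headline
$title = preg_match('~<h3>(.+?)</h3>~s', $box, $m) ? trim(strip_tags($m[1])) : '';

// Capture only the first attribute-less <p>, skipping <p id="fbilisten">
$text = preg_match('~<p>(.+?)</p>~s', $box, $m) ? trim(strip_tags($m[1])) : '';

echo "Title: $title <br />";
echo "Body: $text <br />";
```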

 

If you have questions, let me know.

Didn't work for me; I just see "Body:" and that's it, lol.

 

Works great on my end. Are you using a host that has allow_url_fopen disabled? If so, grabbing the data via cURL should solve that problem and make it work.

 

Either way, I did test it and it is working great on my box.

No clue. My bet is that file_get_contents is not working on your end, for whatever reason. Maybe you are blocked from viewing that site?

 

There are a lot of scenarios that would stop it from working. Try using cURL to retrieve the page data and see if that makes it work. If you are trying this on a shared host, chances are they have allow_url_fopen disabled, which stops file_get_contents from opening remote URLs.
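For the cURL route, here is a minimal sketch assuming the PHP cURL extension is installed; fetch_with_curl() is a hypothetical helper name and the timeout is an arbitrary choice. Its return value can take the place of the file_get_contents call in the script above.

```php
<?php
// Hypothetical helper: fetch a URL with cURL, throwing on failure.
// Requires the cURL extension (check with function_exists('curl_init')).
function fetch_with_curl(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body as a string instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects
        CURLOPT_TIMEOUT        => 10,   // give up after 10 seconds (arbitrary)
    ]);

    $html  = curl_exec($ch);
    $error = curl_error($ch);
    curl_close($ch);

    if ($html === false) {
        throw new RuntimeException("cURL could not retrieve $url: $error");
    }
    return $html;
}

// Usage: $content = fetch_with_curl('http://www.bbc.co.uk/');
```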

Can any of these settings affect file_get_contents?

 

max_execution_time = 900      ; Maximum execution time of each script, in seconds
max_input_time = 60           ; Maximum amount of time each script may spend parsing request data
;max_input_nesting_level = 64 ; Maximum input variable nesting level
memory_limit = 128M           ; Maximum amount of memory a script may consume (128MB)
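The directive that decides whether file_get_contents can open http:// URLs is allow_url_fopen, which is not among the lines above. A quick sketch to print it alongside a few related settings on your host:

```php
<?php
// Settings worth checking when file_get_contents('http://...') fails
$settings = [
    'allow_url_fopen',        // must be on ("1") for remote URLs to open
    'default_socket_timeout', // default timeout, in seconds, for socket-based streams
    'max_execution_time',
    'memory_limit',
];

$report = [];
foreach ($settings as $name) {
    // ini_get() returns false only if the directive does not exist
    $report[$name] = ini_get($name);
    printf("%-24s %s\n", $name, var_export($report[$name], true));
}
```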

 

 
