Hi,

I am having a problem scraping data from a website. After scraping, I can't output the data in my PHP script; the page comes out empty.

Here is the HTML source I want to scrape:

<span id="row3Time" class="zc-ssl-pg-time">11:00 AM</span>
<a id="rowTitle3" class="zc-ssl-pg-title" href='http://tvlistings.zap2it.com/tv/sportscenter/EP00019917'>SportsCenter</a>
<ul class="zc-icons">
<li class="zc-ic zc-ic-span"><span class="zc-ic-live">LIVE</span></li></ul>
</li>
<li class="zc-ssl-pg" id="row1-4" style="">

<span id="row4Time" class="zc-ssl-pg-time">12:00 PM</span>
<a id="rowTitle4" class="zc-ssl-pg-title" href='http://tvlistings.zap2it.com/tv/sportscenter/EP00019917'>SportsCenter</a>
<ul class="zc-icons">
<li class="zc-ic zc-ic-span"><span class="zc-ic-live">LIVE</span></li></ul>
</li>
<li class="zc-ssl-pg" id="row1-5" style="">

<span id="row5Time" class="zc-ssl-pg-time">1:00 PM</span>
<a id="rowTitle5" class="zc-ssl-pg-title" href='http://tvlistings.zap2it.com/tv/sportscenter/EP00019917'>SportsCenter</a>
<ul class="zc-icons">
<li class="zc-ic zc-ic-span"><span class="zc-ic-live">LIVE</span></li></ul>

Here is the PHP source:


<?php

$contents = file_get_contents('http://tvlistings.zap2it.com/tvlistings/ZCSGrid.do?stnNum=10179');
preg_match('/<a id="rowTitle3" class="zc-ssl-pg-title"[.*]<\/a>/i', $data, $matches);
$rowtitle = $matches[1];
echo $rowtitle."<br>\n";
?>



And here is the PHP output:

<br>



Does anyone know how I can scrape the data from that website, starting at <a id=rowTitle3 and going to the end of the page?

Any advice would be much appreciated.

Thanks in advance

Edited by mark103

Try changing


$contents = file_get_contents('http://tvlistings.zap2it.com/tvlistings/ZCSGrid.do?stnNum=10179');

to


$data = file_get_contents('http://tvlistings.zap2it.com/tvlistings/ZCSGrid.do?stnNum=10179');

Also remember that arrays normally start at element 0, not 1, so you are looking for $matches[0] if the data is put into an array. ($matches[0] holds the full match; $matches[1] only exists when the pattern contains a capture group.)

Thank you very much for your help, but there is still a problem. There is no output when I use this:

<?php

$data = file_get_contents('http://tvlistings.zap2it.com/tvlistings/ZCSGrid.do?stnNum=10179');
$p = "/a id='rowTitle1' class='zc-ssl-pg-title'>(.*)<\/a>/";
preg_match($p, $html, $match);
echo $match[0];
?>

I am not really sure if I have done it wrong.

 

Can you help?

Edited by mark103

The problem is in your regular expression. In your first post, you can fix the regex by simply removing the square brackets ([]) and leaving the characters inside. That matches the sample input you gave in your first post, but your newest expression is completely different, so I'm not sure what exactly you are trying to match.
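To illustrate the square-bracket point, here is a minimal sketch run against one line of the sample HTML from the first post rather than the live page. The variable names are only for illustration.

```php
<?php
// Sample line copied from the first post (the live page may differ).
$line = <<<'HTML'
<a id="rowTitle3" class="zc-ssl-pg-title" href='http://tvlistings.zap2it.com/tv/sportscenter/EP00019917'>SportsCenter</a>
HTML;

// [.*] is a character class: exactly ONE character that is "." or "*".
$broken = '/<a id="rowTitle3" class="zc-ssl-pg-title"[.*]<\/a>/i';
// .* means "any run of characters", which is what was intended.
$fixed  = '/<a id="rowTitle3" class="zc-ssl-pg-title".*<\/a>/i';

// No match: the character after title" is a space, not "." or "*".
$brokenCount = preg_match($broken, $line, $m1);
// Match: $m2[0] holds the whole <a>...</a> tag.
$fixedCount  = preg_match($fixed, $line, $m2);

var_dump($brokenCount, $fixedCount);
```

Note that neither pattern has parentheses, so only index 0 of the match array is filled in.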

 

You probably want to do something like this:

$data = file_get_contents('http://tvlistings.zap2it.com/tvlistings/ZCSGrid.do?stnNum=10179');
preg_match_all('/<a id="rowTitle\d+" class="zc-ssl-pg-title".*<\/a>/im', $data, $matches);
$titles = $matches[0];

print_r($titles);

If you are NOT trying to get all the titles, which ones do you want?

You can use parentheses to capture segments:

$data = file_get_contents('http://tvlistings.zap2it.com/tvlistings/ZCSGrid.do?stnNum=10179');
preg_match_all('/<a id="rowTitle\d+" class="zc-ssl-pg-title"[^>]*>([^<]+)<\/a>/im', $test, $matches);
$titles = $matches[1];

print_r($titles);

Sorry, I switched the variables by accident. This should work:

$data = file_get_contents('http://tvlistings.zap2it.com/tvlistings/ZCSGrid.do?stnNum=10179');
preg_match_all('/<a id="rowTitle\d+" class="zc-ssl-pg-title"[^>]*>([^<]+)<\/a>/im', $data, $matches);
$titles = $matches[1];

print_r($titles);
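If you also want the time next to each title, the same capture-group idea can be extended. This is only a sketch against the sample HTML from the first post (shortened to the relevant lines), not the live page, and it assumes each time span is followed directly by its title link. PREG_SET_ORDER groups the captures per listing.

```php
<?php
// Sample HTML based on the first post; the real page may be laid out differently.
$html = <<<'HTML'
<span id="row3Time" class="zc-ssl-pg-time">11:00 AM</span>
<a id="rowTitle3" class="zc-ssl-pg-title" href='http://tvlistings.zap2it.com/tv/sportscenter/EP00019917'>SportsCenter</a>
<span id="row4Time" class="zc-ssl-pg-time">12:00 PM</span>
<a id="rowTitle4" class="zc-ssl-pg-title" href='http://tvlistings.zap2it.com/tv/sportscenter/EP00019917'>SportsCenter</a>
<span id="row5Time" class="zc-ssl-pg-time">1:00 PM</span>
<a id="rowTitle5" class="zc-ssl-pg-title" href='http://tvlistings.zap2it.com/tv/sportscenter/EP00019917'>SportsCenter</a>
HTML;

// Group 1 = time text, group 2 = title text.
$pattern = '/<span id="row\d+Time" class="zc-ssl-pg-time">([^<]+)<\/span>\s*'
         . '<a id="rowTitle\d+" class="zc-ssl-pg-title"[^>]*>([^<]+)<\/a>/i';

// PREG_SET_ORDER returns one array per listing instead of one array per group.
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$listings = array();
foreach ($matches as $m) {
    $listings[] = array('time' => trim($m[1]), 'title' => trim($m[2]));
}

foreach ($listings as $listing) {
    echo $listing['time'] . ' ' . $listing['title'] . "<br>\n";
}
```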

You could either use cURL (or similar) to send the correct cookie for your timezone (I see that is an option on the site), or you could combine the day headers with the time and use strtotime() with the correct time addition to create a timestamp of the correct date/time.
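As a rough sketch of the strtotime() route: a day header and a listing time can be joined into one string and shifted by a number of hours. The helper name shiftListing(), the fixed UTC zone, and the "now" value are assumptions made so the example is reproducible; they do not come from the site.

```php
<?php
// Assumption: a fixed time zone so the result does not depend on the server.
date_default_timezone_set('UTC');

// Hypothetical helper: combine a day header ("Today", "Tomorrow", ...) with a
// listing time and shift the result by $offsetHours. $base is the "now" that
// relative words such as "Today" are resolved against.
function shiftListing($dayLabel, $time, $offsetHours, $base)
{
    // Builds a string such as "Today 10:00 PM -5 hours".
    $expr = sprintf('%s %s %+d hours', trim($dayLabel), trim($time), $offsetHours);
    return strtotime($expr, $base); // false if the string cannot be parsed
}

// Assumed "now" for the example: April 15, 2013, 12:00 noon.
$base = mktime(12, 0, 0, 4, 15, 2013);

$evening  = shiftListing('Today', '10:00 PM', -5, $base);
$midnight = shiftListing('Tomorrow', '12:00 AM', -5, $base);

echo date('M j, Y g:iA', $evening) . "\n";
echo date('M j, Y g:iA', $midnight) . "\n";
```

Passing the base timestamp explicitly keeps the date math testable; in a real script you would leave it out or pass time().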

Thanks. Could you please post code that uses cURL or strtotime() to get the listings for the time 5 hours behind my current time? For example, if my current time is 10 PM, I want the time 5 hours earlier, which is 5 PM, and the data that is shown for 5 PM.

Edited by mark103

You might have to choose a timezone that is in the US for the cURL method to work. I'm not sure where you are, but I could only get 4:00 and 6:00 by trying Hawaii and Alaska respectively. If you can get the website to show the correct time while you are browsing it, let me know how and I can help. Otherwise, you might have to use the other method.

What you have to do is find a relationship between the dates and the times. Usually the only way is to relate their physical positions on the page. Fortunately, the HTML actually has id numbers that relate the two, so I've adjusted the regex accordingly. Once all the variables are in a format where they can be related, they can be iterated through. Since you want to do date math, a listing's date can change when the adjusted time carries over to a different day. Because of this, the output probably shouldn't be done until after all the time adjustments are complete.

 

Here is an example of how this works. I've included the original scraped text in parentheses in the output so you can see what it was converted to. You should be able to take this code and adjust the output to meet your needs.

$test = file_get_contents('http://tvlistings.zap2it.com/tvlistings/ZCSGrid.do?stnNum=10179');

//Find all header dates
preg_match_all('/<li class="zc-ssl-sp" id="dayLabel(\d+-\d+)">([^<]+)<\/li>/mi', $test, $matches);
//Find all listings
preg_match_all('/<li class="zc-ssl-pg" id="row(\d+-\d+)" style="">[^<]+<span id="row\d+Time" class="zc-ssl-pg-time">([^<]+)<\/span>[^>]+>([^<]+)<\/a>/mi', $test, $matches2);

//Set arrays
$days = $matches[2];
$day_nums = $matches[1];
$listing_nums = $matches2[1];
$listing_times = $matches2[2];
$listing_titles = $matches2[3];

$j=0;	//listings pointer
foreach ($day_nums as $i => $day_num)
{
	$date = fixDate($days[$i]);	//Change words that strtotime can't parse
	$next = $i+1;
	if (!isset($day_nums[$next]))
		break;
	while (isset($listing_nums[$j]) && $listing_nums[$j] != $day_nums[$next])	//loop through until the listing number matches the next header number (or the listings run out)
	{
		$time = trim($listing_times[$j]);
		$datetime = date('M j, Y g:iA', strtotime($date . ' ' . $time . ' -5 hours'));
		echo '('.$days[$i].'-'.$listing_times[$j].') '.$datetime . ' - ' . $listing_titles[$j] .'<br/>';
		$j++;
	}
}

function fixDate($date)
{
	$find = array(
		'/Last Night/',
		'/(?:^[^,]+,)|(?:Night)/',
		'/Tonight/'
	);
	$replace = array(
		'Yesterday',
		'',
		'Today',
	);
	
	return preg_replace($find, $replace, $date);
}
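One thing to watch for: strtotime() returns false when it cannot parse the string it is given, and date() will then typically format that as timestamp 0, the Unix epoch (which shows up as Dec 31, 1969 in US time zones). A small guard like the sketch below makes that failure visible. The helper name formatListingTime() and the fixed UTC zone are assumptions for the example.

```php
<?php
// Assumption: a fixed time zone so the example is reproducible.
date_default_timezone_set('UTC');

// Hypothetical helper: format a scraped date/time string, or return null when
// strtotime() cannot parse it, instead of silently printing the epoch.
function formatListingTime($text, $base)
{
    $ts = strtotime(trim($text), $base);
    if ($ts === false) {
        return null; // unparseable input; let the caller decide what to do
    }
    return date('M j, Y g:iA', $ts);
}

// Assumed "now" for the example: April 15, 2013, 12:00 noon.
$base = mktime(12, 0, 0, 4, 15, 2013);

$good = formatListingTime(' Today 7:00 PM ', $base);
$bad  = formatListingTime('not a date', $base);

var_dump($good, $bad);
```

If a value comes back as null, print the raw scraped text to see what needs to be cleaned up before it is handed to strtotime().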

I hope that helps.

Thanks, I have put the code in my PHP and I saw the list of titles including the times. But you have got it wrong and you don't understand what I want to achieve. Let me explain again. I want to scrape the data starting from the current time in the USA, which is 5 hours behind my current time: my current time is 3:00 AM and the USA time is 10:00 PM.

 

Please see the data shown from the programme's current time, like this:

10:00 PM Baseball Tonight

    LIVE

11:00 PM SportsCenter

    LIVE

Tomorrow
12:00 AM SportsCenter

    LIVE

1:00 AM SportsCenter

    LIVE

2:00 AM SportsCenter

    LIVE

3:00 AM SportsCenter

4:00 AM SportsCenter

Now I hope you get my point?

Edited by mark103

Yes, BUT I SAID I WANT TO SCRAPE THE TITLES THAT ARE ON TODAY, FROM THE CURRENT TIME TO THE END OF THE PAGE, AND NOT YESTERDAY'S. I WANT TO DISPLAY THEM IN MY PHP:

 

The USA current time is 10:00PM

10:00 PM Baseball Tonight

    LIVE

11:00 PM SportsCenter

    LIVE

Tomorrow
12:00 AM SportsCenter

    LIVE

1:00 AM SportsCenter

    LIVE

2:00 AM SportsCenter

    LIVE

3:00 AM SportsCenter

4:00 AM SportsCenter

Not like this:

( Yesterday-7:00 PM) Dec 31, 1969 7:00PM - Around the Horn
( Yesterday-7:00 PM) Dec 31, 1969 7:00PM - Pardon the Interruption
( Yesterday-7:00 PM) Dec 31, 1969 7:00PM - SportsCenter
( Yesterday-7:00 PM) Dec 31, 1969 7:00PM - SportsCenter Special
(Last Night-7:00 PM) Dec 31, 1969 7:00PM - SportsCenter Special: On the Clock
(Last Night-7:00 PM) Dec 31, 1969 7:00PM - NFL Live
(Last Night-7:00 PM) Dec 31, 1969 7:00PM - Baseball Tonight
(Last Night-7:00 PM) Dec 31, 1969 7:00PM - SportsCenter
( Today-7:00 PM) Dec 31, 1969 7:00PM - SportsCenter
( Today-7:00 PM) Dec 31, 1969 7:00PM - SportsCenter
( Today-7:00 PM) Dec 31, 1969 7:00PM - SportsCenter
( Today-7:00 PM) Dec 31, 1969 7:00PM - SportsCenter
( Today-7:00 PM) Dec 31, 1969 7:00PM - SportsCenter
( Today-7:00 PM) Dec 31, 1969 7:00PM - SportsCenter
( Today-7:00 PM) Dec 31, 1969 7:00PM - SportsCenter
( Today-7:00 PM) Dec 31, 1969 7:00PM - SportsCenter
( Today-7:00 PM) Dec 31, 1969 7:00PM - SportsCenter
( Today-7:00 PM) Dec 31, 1969 7:00PM - SportsCenter
( Today-7:00 PM) Dec 31, 1969 7:00PM - SportsCenter
( Today-7:00 PM) Dec 31, 1969 7:00PM - SportsCenter
( Today-7:00 PM) Dec 31, 1969 7:00PM - SportsCenter
( Today-7:00 PM) Dec 31, 1969 7:00PM - SportsCenter
( Today-7:00 PM) Dec 31, 1969 7:00PM - SportsCenter
( Today-7:00 PM) Dec 31, 1969 7:00PM - Outside the Lines
( Today-7:00 PM) Dec 31, 1969 7:00PM - College Football Live

Are you thick???????
