mark103 Posted April 15, 2013

Hi,

I am having a problem scraping data from a website. After scraping the page, I am unable to output the data from my PHP script; it just shows an empty page.

Here is the HTML source I want to scrape:

    <span id="row3Time" class="zc-ssl-pg-time">11:00 AM</span>
    <a id="rowTitle3" class="zc-ssl-pg-title" href='http://tvlistings.zap2it.com/tv/sportscenter/EP00019917'>SportsCenter</a>
    <ul class="zc-icons"> <li class="zc-ic zc-ic-span"><span class="zc-ic-live">LIVE</span></li></ul>
    </li>
    <li class="zc-ssl-pg" id="row1-4" style="">
    <span id="row4Time" class="zc-ssl-pg-time">12:00 PM</span>
    <a id="rowTitle4" class="zc-ssl-pg-title" href='http://tvlistings.zap2it.com/tv/sportscenter/EP00019917'>SportsCenter</a>
    <ul class="zc-icons"> <li class="zc-ic zc-ic-span"><span class="zc-ic-live">LIVE</span></li></ul>
    </li>
    <li class="zc-ssl-pg" id="row1-5" style="">
    <span id="row5Time" class="zc-ssl-pg-time">1:00 PM</span>
    <a id="rowTitle5" class="zc-ssl-pg-title" href='http://tvlistings.zap2it.com/tv/sportscenter/EP00019917'>SportsCenter</a>
    <ul class="zc-icons"> <li class="zc-ic zc-ic-span"><span class="zc-ic-live">LIVE</span></li></ul>

Here is the PHP source:

    <?php
    $contents = file_get_contents('http://tvlistings.zap2it.com/tvlistings/ZCSGrid.do?stnNum=10179');
    preg_match('/<a id="rowTitle3" class="zc-ssl-pg-title"[.*]<\/a>/i', $data, $matches);
    $rowtitle = $matches[1];
    echo $rowtitle."<br>\n";
    ?>

And here is the PHP output:

    <br>

Does anyone know how I can scrape the data from that website, starting at <a id=rowTitle3 and going to the end of the page? Any advice would be much appreciated.

Thanks in advance.
davidannis Posted April 15, 2013

It looks like you put the data into $contents and then look for a match in $data. I think you need to use a single variable.
mark103 Posted April 15, 2013

Yes, I am trying to match the data. Could you please post the code with the variable I should use to match the data and then output it from my PHP?
davidannis Posted April 15, 2013

Try changing

    $contents = file_get_contents('http://tvlistings.zap2it.com/tvlistings/ZCSGrid.do?stnNum=10179');

to

    $data = file_get_contents('http://tvlistings.zap2it.com/tvlistings/ZCSGrid.do?stnNum=10179');

Also remember that arrays normally start at element 0, not 1, so you are looking for $matches[0] if the data is put into an array.
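For anyone following along, a minimal sketch of how preg_match() fills its $matches array may help with the index question: index 0 holds the whole matched text, and index 1 onward hold the capture groups, so $matches[1] only exists when the pattern contains parentheses. The $html string is a one-line sample based on the markup in the first post, not a live fetch, and the patterns are simplified for illustration.

```php
<?php
// Sample line based on the markup quoted in the first post (not fetched live).
$html = '<a id="rowTitle3" class="zc-ssl-pg-title" href=\'http://tvlistings.zap2it.com/tv/sportscenter/EP00019917\'>SportsCenter</a>';

// No parentheses in the pattern: only $noGroup[0] (the whole match) is set.
preg_match('/<a id="rowTitle3"[^>]*>[^<]*<\/a>/i', $html, $noGroup);

// One pair of parentheses: $withGroup[1] holds the captured link text.
preg_match('/<a id="rowTitle3"[^>]*>([^<]*)<\/a>/i', $html, $withGroup);

echo count($noGroup), "\n"; // number of entries when there is no capture group
echo $withGroup[1], "\n";   // the captured link text
```

Reading $matches[1] from a pattern without parentheses gives an undefined index, which would explain the bare `<br>` in the first post once the variable name is fixed.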
mark103 Posted April 15, 2013

Thank you very much for your help, but there is a problem. There is no output when I use this:

    <?php
    $data = file_get_contents('http://tvlistings.zap2it.com/tvlistings/ZCSGrid.do?stnNum=10179');
    $p = "/a id='rowTitle1' class='zc-ssl-pg-title'>(.*)<\/a>/";
    preg_match($p, $html, $match);
    echo $match[0];
    ?>

I am not really sure if I have done it wrong. Can you help?
lemmin Posted April 15, 2013

The problem is in your regular expression. In your first post, you can fix the regex by simply removing the square brackets ([]) while leaving the characters inside them. That matches the sample input you gave in your first post, but your newest expression is completely different, so I'm not sure exactly what you are trying to match. You probably want to do something like this:

    $data = file_get_contents('http://tvlistings.zap2it.com/tvlistings/ZCSGrid.do?stnNum=10179');
    preg_match_all('/<a id="rowTitle\d+" class="zc-ssl-pg-title".*<\/a>/im', $data, $matches);
    $titles = $matches[0];
    print_r($titles);

If you are NOT trying to get all the titles, which ones do you want?
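To illustrate the bracket problem: inside square brackets, `.` and `*` are literal characters, so `[.*]` matches exactly one dot or one asterisk rather than "any run of characters". A small sketch, using a made-up one-line sample ($line and its `href="#"` are assumptions, not the live page):

```php
<?php
// Hypothetical single-line sample in the shape of the markup from the first post.
$line = '<a id="rowTitle3" class="zc-ssl-pg-title" href="#">SportsCenter</a>';

// Character class: requires one literal "." or "*" right after the class attribute, so it fails.
$withBrackets = preg_match('/class="zc-ssl-pg-title"[.*]<\/a>/i', $line);

// Without the brackets: ".*" consumes everything up to the closing tag.
$withoutBrackets = preg_match('/class="zc-ssl-pg-title".*<\/a>/i', $line);

var_dump($withBrackets);    // int(0)
var_dump($withoutBrackets); // int(1)
```

Note that `.*` is greedy, so if several links sit on the same line it runs through to the last `</a>` on that line.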
mark103 Posted April 15, 2013

Thanks, that is what I really want, but the problem is that it outputs the href link. I only want to output the text without the href. Do you know how I can extract just the text and output it in my PHP without the href link?
lemmin Posted April 15, 2013

You can use parentheses to capture segments:

    $data = file_get_contents('http://tvlistings.zap2it.com/tvlistings/ZCSGrid.do?stnNum=10179');
    preg_match_all('/<a id="rowTitle\d+" class="zc-ssl-pg-title"[^>]*>([^<]+)<\/a>/im', $test, $matches);
    $titles = $matches[1];
    print_r($titles);
mark103 Posted April 15, 2013

Thanks, but it still shows no output. Please fix the code.
lemmin Posted April 15, 2013

Sorry, I switched the variables by accident. This should work:

    $data = file_get_contents('http://tvlistings.zap2it.com/tvlistings/ZCSGrid.do?stnNum=10179');
    preg_match_all('/<a id="rowTitle\d+" class="zc-ssl-pg-title"[^>]*>([^<]+)<\/a>/im', $data, $matches);
    $titles = $matches[1];
    print_r($titles);
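As a possible extension of the capture-group idea, the sketch below pairs each time with its title in one pass. It runs against rows copied from the first post rather than the live page, and the combined pattern and the PREG_SET_ORDER flag are choices made for this illustration, not something the thread itself used.

```php
<?php
// Rows copied from the first post (not fetched live).
$data = <<<'HTML'
<span id="row3Time" class="zc-ssl-pg-time">11:00 AM</span>
<a id="rowTitle3" class="zc-ssl-pg-title" href='http://tvlistings.zap2it.com/tv/sportscenter/EP00019917'>SportsCenter</a>
<span id="row4Time" class="zc-ssl-pg-time">12:00 PM</span>
<a id="rowTitle4" class="zc-ssl-pg-title" href='http://tvlistings.zap2it.com/tv/sportscenter/EP00019917'>SportsCenter</a>
HTML;

// Group 1 = time text, group 2 = title text; \s* bridges the whitespace between the two tags.
$pattern = '/<span id="row\d+Time"[^>]*>([^<]+)<\/span>\s*<a id="rowTitle\d+"[^>]*>([^<]+)<\/a>/i';

// PREG_SET_ORDER returns one array per listing instead of one array per group.
preg_match_all($pattern, $data, $rows, PREG_SET_ORDER);

$listings = array();
foreach ($rows as $row) {
    $listings[] = trim($row[1]) . ' ' . trim($row[2]);
}
print_r($listings);
```

With PREG_SET_ORDER each element of $rows is one listing, which keeps a time and its title together when looping.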
mark103 Posted April 15, 2013

Thanks for your help. I am unable to output the correct data for the current time, e.g. my local time is 9:00pm and the current time for the data is 4:00pm. I can only output the data before the current time. Can you help?
lemmin Posted April 15, 2013

You could either use cURL (or similar) to send the correct cookie for your timezone (I see that is an option on the site), or you could combine the day headers with the time and use strtotime() with the correct time addition to create a timestamp of the correct date/time.
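The second option can be sketched roughly as follows. This swaps in PHP's DateTime and DateTimeZone classes instead of strtotime() with a manual hour offset, so that daylight saving is handled by the timezone database. The two zone names and the sample date string are assumptions for illustration; the thread never states which US timezone the site uses. The cookie approach is not sketched because the cookie name is not given anywhere in the thread.

```php
<?php
// Assumption: listing times are US Eastern and the reader is in the UK.
$siteZone  = new DateTimeZone('America/New_York');
$localZone = new DateTimeZone('Europe/London');

// A date and time as they might look after combining a day header with a scraped time (sample value).
$scraped = '2013-04-15 4:00 PM';

// The leading "!" resets any field that the format does not mention.
$listing = DateTime::createFromFormat('!Y-m-d g:i A', $scraped, $siteZone);

// Convert the same instant to the reader's zone.
$listing->setTimezone($localZone);

echo $listing->format('M j, Y g:iA'), "\n"; // Apr 15, 2013 9:00PM
```

For this sample date the two zones are five hours apart, which matches the 4:00pm versus 9:00pm gap described above.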
mark103 Posted April 16, 2013

Thanks. Could you please post code that uses cURL or strtotime() to get the time 5 hours back from my current time, so that I get the correct data from that website? E.g. my current time is 10pm, so I look for the time 5 hours earlier, which is 5pm, and get the data shown at 5pm.
lemmin Posted April 16, 2013

You might have to choose a timezone that is in the US for the cURL method to work. I'm not sure where you are, but I could only get 4:00 and 6:00 by trying Hawaii and Alaska respectively. If you can get the website to show the correct time while you are browsing it, let me know how and I can help. Otherwise, you might have to use the other method.
mark103 Posted April 16, 2013

I am from the UK, so I don't know how to use cURL to set the timezone before scraping the data in the row whose time matches my current time. There is no other website I can use; this is the only one. Could you please help?
lemmin Posted April 16, 2013

Are you scraping the day information as well? Or does that not matter?
mark103 Posted April 16, 2013

For the days from today to Wednesday.
lemmin Posted April 16, 2013

What you have to do is find a relationship between the dates and the times. Usually the only way is by relating their physical locations in the page. Fortunately, the HTML actually has numbers that relate them, so I've adjusted the regex accordingly. After putting all the variables into a format where they can be related, they can be iterated through. Since you want to do date math, the dates' relationships to their times will actually change when the time carries over to a different day. Because of this, the output probably shouldn't be done until after all the time adjustments are complete.

Here is an example of how this works. I've included the original scraped text in parentheses in the output so you can see what it was converted to. You should be able to take this code and adjust the output to meet your needs.

    $test = file_get_contents('http://tvlistings.zap2it.com/tvlistings/ZCSGrid.do?stnNum=10179');

    //Find all header dates
    preg_match_all('/<li class="zc-ssl-sp" id="dayLabel(\d+-\d+)">([^<]+)<\/li>/mi', $test, $matches);

    //Find all listings
    preg_match_all('/<li class="zc-ssl-pg" id="row(\d+-\d+)" style="">[^<]+<span id="row\d+Time" class="zc-ssl-pg-time">([^<]+)<\/span>[^>]+>([^<]+)<\/a>/mi', $test, $matches2);

    //Set arrays
    $days = $matches[2];
    $day_nums = $matches[1];
    $listing_nums = $matches2[1];
    $listing_times = $matches2[2];
    $listing_titles = $matches2[3];

    $j=0; //listings pointer
    foreach ($day_nums as $i => $day_num)
    {
        $date = fixDate($days[$i]); //Change words that strtotime can't parse
        $next = $i+1;
        if (!isset($day_nums[$next]))
            break;
        //loop through until the header number matches the listing number
        //(the isset() check stops the loop if the listings run out first)
        while (isset($listing_nums[$j]) && $listing_nums[$j] != $day_nums[$next])
        {
            $time = trim($listing_times[$j]);
            $datetime = date('M j, Y g:iA', strtotime($date . ' ' . $time . ' -5 hours'));
            echo '('.$days[$i].'-'.$listing_times[$j].') '.$datetime . ' - ' . $listing_titles[$j] .'<br/>';
            $j++;
        }
    }

    function fixDate($date)
    {
        $find = array(
            '/Last Night/',
            '/(?:^[^,]+,)|(?:Night)/',
            '/Tonight/'
        );
        $replace = array(
            'Yesterday',
            '',
            'Today',
        );
        return preg_replace($find, $replace, $date);
    }

I hope that helps.
mark103 Posted April 17, 2013

Thanks, I have put the code in my PHP and I see the list of titles with their times. But you have got it wrong, and you don't understand what I want to achieve. Let me explain again. I want to scrape the data starting from the current time in the USA, which is 5 hours behind my current time: my current time is 3:00am and the USA time is 10:00pm. Please see the data shown from the programme's current time, like this:

    10:00 PM Baseball Tonight LIVE
    11:00 PM SportsCenter LIVE
    Tomorrow
    12:00 AM SportsCenter LIVE
    1:00 AM SportsCenter LIVE
    2:00 AM SportsCenter LIVE
    3:00 AM SportsCenter
    4:00 AM SportsCenter

Now I hope you get my point?
lemmin Posted April 17, 2013

I thought you said you were five hours behind. Just change the minus (-) to a plus (+) in strtotime() and it will add five hours instead of subtracting them.

    $datetime = date('M j, Y g:iA', strtotime($date . ' ' . $time . ' +5 hours'));
mark103 Posted April 17, 2013

Yes, BUT I SAID I WANT TO SCRAPE THE TITLES THAT ARE ON TODAY, FROM THE CURRENT TIME TO THE END OF THE PAGE, AND NOT YESTERDAY. I WANT TO DISPLAY THEM IN MY PHP:

The USA current time is 10:00PM

    10:00 PM Baseball Tonight LIVE
    11:00 PM SportsCenter LIVE
    Tomorrow
    12:00 AM SportsCenter LIVE
    1:00 AM SportsCenter LIVE
    2:00 AM SportsCenter LIVE
    3:00 AM SportsCenter
    4:00 AM SportsCenter

Not like this:

    ( Yesterday-7:00 PM) Dec 31, 1969 7:00PM - Around the Horn
    ( Yesterday-7:00 PM) Dec 31, 1969 7:00PM - Pardon the Interruption
    ( Yesterday-7:00 PM) Dec 31, 1969 7:00PM - SportsCenter
    ( Yesterday-7:00 PM) Dec 31, 1969 7:00PM - SportsCenter Special
    (Last Night-7:00 PM) Dec 31, 1969 7:00PM - SportsCenter Special: On the Clock
    (Last Night-7:00 PM) Dec 31, 1969 7:00PM - NFL Live
    (Last Night-7:00 PM) Dec 31, 1969 7:00PM - Baseball Tonight
    (Last Night-7:00 PM) Dec 31, 1969 7:00PM - SportsCenter
    ( Today-7:00 PM) Dec 31, 1969 7:00PM - SportsCenter
    ( Today-7:00 PM) Dec 31, 1969 7:00PM - SportsCenter
    ( Today-7:00 PM) Dec 31, 1969 7:00PM - SportsCenter
    ( Today-7:00 PM) Dec 31, 1969 7:00PM - SportsCenter
    ( Today-7:00 PM) Dec 31, 1969 7:00PM - SportsCenter
    ( Today-7:00 PM) Dec 31, 1969 7:00PM - SportsCenter
    ( Today-7:00 PM) Dec 31, 1969 7:00PM - SportsCenter
    ( Today-7:00 PM) Dec 31, 1969 7:00PM - SportsCenter
    ( Today-7:00 PM) Dec 31, 1969 7:00PM - SportsCenter
    ( Today-7:00 PM) Dec 31, 1969 7:00PM - SportsCenter
    ( Today-7:00 PM) Dec 31, 1969 7:00PM - SportsCenter
    ( Today-7:00 PM) Dec 31, 1969 7:00PM - SportsCenter
    ( Today-7:00 PM) Dec 31, 1969 7:00PM - SportsCenter
    ( Today-7:00 PM) Dec 31, 1969 7:00PM - SportsCenter
    ( Today-7:00 PM) Dec 31, 1969 7:00PM - SportsCenter
    ( Today-7:00 PM) Dec 31, 1969 7:00PM - Outside the Lines
    ( Today-7:00 PM) Dec 31, 1969 7:00PM - College Football Live

Are you thick???????
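A note on the `Dec 31, 1969 7:00PM` lines: that is what date() prints for timestamp 0 in a timezone five hours behind UTC, and timestamp 0 is what date() ends up using when strtotime() fails and returns false. The thread does not establish which part of the input failed to parse. Below is a minimal sketch of one way to guard against that and keep only listings from the current time onward. The rows, the fixed "now" and the America/New_York zone are all assumptions made for the example, not values taken from the site.

```php
<?php
date_default_timezone_set('America/New_York'); // assumption: listings are US Eastern

// Hypothetical scraped rows: day label, time, title.
$rows = array(
    array('Last Night', '8:00 PM',  'NFL Live'), // raw label that the thread says strtotime() can't parse
    array('Today',      '9:00 PM',  'SportsCenter'),
    array('Today',      '10:00 PM', 'Baseball Tonight'),
    array('Today',      '11:00 PM', 'SportsCenter'),
    array('Tomorrow',   '12:00 AM', 'SportsCenter'),
);

// Fixed reference time so the example is repeatable; time() would give the real clock.
$now = strtotime('2013-04-16 22:00:00');

// Timestamp 0 in this zone reproduces the date seen in the output above.
$epochText = date('M j, Y g:iA', 0);

$upcoming = array();
foreach ($rows as $row) {
    list($day, $time, $title) = $row;
    // Resolve "Today"/"Tomorrow" relative to $now rather than the server clock.
    $stamp = strtotime(trim($day) . ' ' . trim($time), $now);
    if ($stamp === false) {
        continue; // unparseable label: skip it instead of printing an epoch date
    }
    if ($stamp >= $now) {
        $upcoming[] = date('g:i A', $stamp) . ' ' . $title;
    }
}

echo $epochText, "\n";
print_r($upcoming);
```

Comparing timestamps rather than the displayed strings lets the list carry over midnight into "Tomorrow" without special handling.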
Joshua F Posted April 18, 2013

I think it would possibly be faster and easier to do with this. Just a suggestion.
trq Posted April 18, 2013

Closing this topic. OP no longer deserves help on the subject.