seany123 Posted May 25, 2011
Is it possible to collect data from another website and insert it into my DB? For example, from http://www.imdb.com/title/tt0285331/episodes#season-1, could I somehow get the episode name, e.g. "Episode 1: 12:00 a.m.-1:00 a.m.", and the description, "Jack Bauer is called to his office because there's a threat on the life of a US Senator who's running for President; Jack also discovers that his daughter has skipped out her bedroom window.", and place that into a table in my DB? Any help would be great.
Fadion Posted May 25, 2011
Simply put, you can use file_get_contents() to get the page output and run a regex to filter out the information you want. That's basic scraping. From what I saw, the season information is laid out in a structure like this one:

```html
<div class="filter-all filter-year-2001">
<hr />
<table cellspacing="0" cellpadding="0">
<tr>
<td valign="top">
<div class="episode_slate_container"><div class="episode_slate_missing"></div></div>
</td>
<td valign="top">
<h3>Season 1, Episode 1: <a href="/title/tt0502165/">12:00 a.m.-1:00 a.m.</a></h3>
<span class="less-emphasis">Original Air Date—<strong>6 November 2001</strong></span><br>
Jack Bauer is called to his office because there's a threat on the life of a US Senator who's running for President...
</td>
</tr>
</table>
</div>
```

Running a regex to get what's inside <div class="filter-all filter-year-2001"> will return all the episode information for that season. I'm not a regex expert, but I wrote a simple one to get you that information. You'll have to figure out for yourself how to get the title and description without any HTML around them.

```php
<?php
$str = file_get_contents('http://www.imdb.com/title/tt0285331/episodes#season-1');
// 's' lets '.' match across newlines; the lazy '+?' stops at the end of each block
// instead of running on to the last </div> on the page
preg_match_all('|<div class="filter-all filter-year-2001">(.+?)</table>\s*</div>|s', $str, $matches);
print_r($matches);
?>
```

EDIT: I noticed that the year in <div class="filter-all filter-year-2001"> changes with each season. You can easily run a loop from the starting year to the ending one.

Scraping is a b*tch
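A minimal sketch of the year loop from the EDIT above, plus one way to get the title and description without the surrounding HTML. It assumes every episode sits in a `<div class="filter-all filter-year-YYYY">` block that ends in `</table></div>`, like the snippet above. `season_blocks()` and `parse_episode()` are hypothetical helper names, and the markup is an inline sample (the 2002 entry is a made-up placeholder), not a live download from IMDb.

```php
<?php
// Hypothetical helper: inner HTML of every <div class="filter-all filter-year-YYYY">
// block for the years $from..$to.
function season_blocks(string $html, int $from, int $to): array
{
    $blocks = array();
    for ($year = $from; $year <= $to; $year++) {
        // 's' lets '.' cross newlines; the lazy '+?' stops at the block's closing </table></div>
        $pattern = '|<div class="filter-all filter-year-' . $year . '">(.+?)</table>\s*</div>|s';
        if (preg_match_all($pattern, $html, $m)) {
            foreach ($m[1] as $inner) {
                $blocks[] = $inner;
            }
        }
    }
    return $blocks;
}

// Hypothetical helper: plain-text title and description from one block.
function parse_episode(string $block): array
{
    // Title = the link text inside <h3>
    $title = preg_match('|<h3>[^<]*<a[^>]*>([^<]+)</a>|', $block, $t) ? $t[1] : '';
    // Description = everything after the first <br>, with the remaining tags stripped
    $parts = explode('<br>', $block, 2);
    $description = isset($parts[1]) ? trim(strip_tags($parts[1])) : '';
    return array(
        'title' => html_entity_decode($title),
        'description' => html_entity_decode($description),
    );
}

// Inline sample standing in for the downloaded page; the 2002 entry is a placeholder
$html = <<<HTML
<div class="filter-all filter-year-2001">
<table><tr><td valign="top">
<h3>Season 1, Episode 1: <a href="/title/tt0502165/">12:00 a.m.-1:00 a.m.</a></h3>
<span class="less-emphasis">Original Air Date: <strong>6 November 2001</strong></span><br>
Jack Bauer is called to his office because there's a threat on the life of a US Senator who's running for President.
</td></tr></table>
</div>
<div class="filter-all filter-year-2002">
<table><tr><td valign="top">
<h3>Season 2, Episode 1: <a href="/title/tt0000000/">Placeholder title</a></h3>
<span class="less-emphasis">Original Air Date: <strong>placeholder</strong></span><br>
Placeholder description.
</td></tr></table>
</div>
HTML;

foreach (season_blocks($html, 2001, 2002) as $block) {
    $ep = parse_episode($block);
    echo $ep['title'], ' | ', $ep['description'], "\n";
}
```

On the real page you would replace the sample with the file_get_contents() result and set the year range to the show's first and last seasons. Splitting on `<br>` only works while the description follows the air-date line like this; a layout change means adjusting the helpers.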
seany123 (Author) Posted May 25, 2011
Thank you, I will definitely be looking into this more. You say it's a b*tch, but it beats having to C+P every single episode, lol.
Maq Posted May 25, 2011
It's called "screen scraping", Google it.
seany123 (Author) Posted May 26, 2011
I'm still struggling.

1: I'm not really understanding how the preg_match_all() line with the filter-year-2001 pattern works. How can I make it loop so it picks up all of the years?

2: How do I then create a loop so it picks up all the episodes, grabs certain things like the episode name, and assigns them to variables? And then how do I place them into a MySQL DB?
xyph Posted May 26, 2011
I think you're out of your league here. It's going to be hard to provide you with a solution that isn't simply doing the work for you. You're going to have to read up a LOT on RegEx, or grab some sort of PHP HTML parser class that you can use to filter out specific parts of the website you plan on scraping. If you can show me that you can grab the source code of the pages you want to scrape, I'll continue to help.
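The HTML-parser route mentioned above can be sketched with PHP's built-in DOM extension (`DOMDocument::loadHTML()` plus `DOMXPath`) rather than a third-party class. This is a minimal sketch assuming the `<h3>label: <a>title</a></h3>` layout from Fadion's snippet; `episode_titles()` is a hypothetical name and the markup is an inline sample, not live IMDb output.

```php
<?php
// Hypothetical helper: [label, title] pairs from <h3> headings that contain a link.
function episode_titles(string $html): array
{
    $doc = new DOMDocument();
    // Scraped HTML is rarely valid; keep libxml warnings out of the output
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $episodes = array();
    foreach ($xpath->query('//h3[a]') as $h3) {
        // The first child of <h3> is the text node "Season 1, Episode 1: "
        $label = rtrim(trim($h3->firstChild->textContent), ':');
        $title = trim($xpath->query('a', $h3)->item(0)->textContent);
        $episodes[] = array($label, $title);
    }
    return $episodes;
}

// Inline sample based on the markup Fadion posted (not a live download)
$html = '<div class="filter-all filter-year-2001"><table><tr><td valign="top">'
      . '<h3>Season 1, Episode 1: <a href="/title/tt0502165/">12:00 a.m.-1:00 a.m.</a></h3>'
      . '</td></tr></table></div>';

print_r(episode_titles($html));
```

Compared with a regex, the XPath query does not care about attribute order or line breaks inside the markup, which tends to make it less brittle when the page's formatting shifts.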
seany123 (Author) Posted May 26, 2011
Yes, I'm a lot out of my league here; however, if I wasn't, I wouldn't be on this forum seeking help, now would I, lol. As for showing you I can grab the source, it seems rather pointless, as GuiltyGear has already posted code above that shows all the data I need.
xyph Posted May 26, 2011
The problem with asking for help when dealing with things out of your league is that you won't understand the solution given. Once you understand how something works, we're here to help you fit it into your current solution. If you don't understand what's going on, when things go wrong you're right back here, rather than trying to debug it yourself. You don't learn.

Sorry, I figured you wanted to grab multiple seasons per script execution, with a loop of some sort. Once you have the HTML, it's simple RegEx to extract the parts you want. I'll give you a start.

```php
<?php
$html = file_get_contents( 'http://www.imdb.com/title/tt0285331/episodes#season-1' );
$pattern = '%<h3>(.+?): <[^>]++>([^<]++)</a>%';
preg_match_all($pattern, $html, $result, PREG_SET_ORDER);
print_r( $result );
?>
```
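To see what `$result` actually contains, here is a sketch that runs xyph's pattern against an inline sample (the second episode is a made-up placeholder) with both result orders. With the default `PREG_PATTERN_ORDER`, entries are grouped by capture group; with `PREG_SET_ORDER`, as in the post, there is one entry per episode, where index 1 is the `<h3>` label and index 2 the episode name.

```php
<?php
$html = '<h3>Season 1, Episode 1: <a href="/title/tt0502165/">12:00 a.m.-1:00 a.m.</a></h3>'
      . '<h3>Season 1, Episode 2: <a href="/title/tt0000000/">Placeholder title</a></h3>';
$pattern = '%<h3>(.+?): <[^>]++>([^<]++)</a>%';

// Default PREG_PATTERN_ORDER: $byGroup[1] holds every label, $byGroup[2] every title
preg_match_all($pattern, $html, $byGroup);

// PREG_SET_ORDER (as in the post above): one entry per episode
preg_match_all($pattern, $html, $bySet, PREG_SET_ORDER);

foreach ($bySet as $match) {
    // $match[0] = whole match, $match[1] = label, $match[2] = episode name
    echo $match[1], ' => ', $match[2], "\n";
}
```

`PREG_SET_ORDER` is usually the more convenient shape for database work, since each entry already corresponds to one row.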
seany123 (Author) Posted May 26, 2011
You are correct, helping me just gave me 10 more questions *sigh*. Is there any way to use $result to insert into my MySQL DB? Once I get it into my MySQL DB, I can just use PHP/MySQL to get everything I need.
xyph Posted May 26, 2011
Yes, there is a way, but again, if you're not sure how to take an array of values and insert them into a DB, you have much more to learn before jumping into something like this. I suggest hiring a programmer to get this done for you if you don't have the time or care to learn; that way, when you run into other datasets that don't match the RegEx sample I provided, they'll be able to support it and make the changes you need.

Here's sample code that takes a multi-dimensional array and inserts it into a MySQL database:

```php
<?php
$sql = new mysqli( 'localhost', 'root', '', 'test' );
if ($sql->connect_error) {
    die('Connect Error (' . $sql->connect_errno . ') ' . $sql->connect_error);
}

$data = array(
    array( 'foo', 'bar' ),
    array( 'hello', 'world' ),
    array( 'more', 'data' )
);

// Escape every value first: scraped text such as "there's" contains quotes
// that would otherwise break (or inject into) the query
foreach( $data as $key => $val ) {
    $escaped = array_map( array( $sql, 'real_escape_string' ), $val );
    $data[$key] = '\'' . implode( '\',\'', $escaped ) . '\'';
}
$query_data = '(' . implode( '),(', $data ) . ')';
$query = 'INSERT INTO `table` (`col1`,`col2`) VALUES ' . $query_data;

if( $sql->query($query) === TRUE )
    echo 'Data added';
else
    echo 'Data failed to add: ' . $sql->error;

// close
$sql->close();
?>
```
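An alternative sketch using mysqli prepared statements, so values never have to be quoted or escaped by hand. The `episodes` table and its `label`/`title` columns are hypothetical, `insert_episodes()` and `rows_from_matches()` are hypothetical helper names, and the rows are assumed to come from the `PREG_SET_ORDER` `$result` in xyph's earlier regex example. Only `insert_episodes()` needs a live MySQL server.

```php
<?php
// Hypothetical schema: CREATE TABLE episodes (label VARCHAR(64), title VARCHAR(255))
function insert_episodes(mysqli $db, array $rows): int
{
    $stmt = $db->prepare('INSERT INTO `episodes` (`label`, `title`) VALUES (?, ?)');
    if ($stmt === false) {
        throw new RuntimeException('Prepare failed: ' . $db->error);
    }
    $label = $title = '';
    // bind_param binds by reference, so re-assigning $label/$title
    // before each execute() sends the new values
    $stmt->bind_param('ss', $label, $title);
    $inserted = 0;
    foreach ($rows as $row) {
        list($label, $title) = $row;
        if ($stmt->execute()) {
            $inserted++;
        }
    }
    $stmt->close();
    return $inserted;
}

// Turn PREG_SET_ORDER matches into [label, title] rows (drops the whole-match entry)
function rows_from_matches(array $result): array
{
    $rows = array();
    foreach ($result as $match) {
        $rows[] = array($match[1], $match[2]);
    }
    return $rows;
}

// Usage (needs a running MySQL server, so left commented out):
// $db = new mysqli('localhost', 'root', '', 'test');
// echo insert_episodes($db, rows_from_matches($result)), " rows inserted\n";
```

One prepared statement executed per row is simpler to get right than building a single multi-row string, at the cost of one round trip per episode, which hardly matters for a one-off import of a few hundred rows.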
seany123 (Author) Posted May 26, 2011
I don't have the funds that would be required to pay for a programmer, considering this is for a non-profit website. This is just a long way over my head and I doubt I could learn it in any reasonable time frame. Looks like I will just have to resort to the old-fashioned C+P. Thanks for all the help received.
Maq Posted May 26, 2011
You've been a moderately active member here for 2 1/2 years; surely you know how to insert into a database.
xyph Posted May 26, 2011
Dunno why you're being so pessimistic. RegEx is a relatively easy syntax to learn. If you can learn PHP/C++, I don't see why you should have issues with RegEx.
seany123 (Author) Posted May 26, 2011
Quoting Maq: "You've been a moderately active member here for 2 1/2 years, surely you know how to insert into a database."
Yes, I know how to insert normal variables etc. into a table, but as has been pointed out, this is a 'multi-dimensional array', which I have no experience with. And 2 1/2 years, wow, has it really been that long?
Quoting xyph: "Dunno why you're being so pessimistic. RegEx is a relatively easy syntax to learn. If you can learn PHP/C++, I don't see how you should have issues with RegEx."
I'm not being pessimistic; I'm just weighing up the time it would take to learn the language and then write a reasonably decent script against the time it would take to C+P. For now C+P is the best option, although I know that in the long run learning the language would be better.
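On the multi-dimensional array point: such an array is just an array whose elements are themselves arrays, so an outer `foreach` hands you one row at a time and each row can be unpacked into ordinary variables, the same kind you already insert. A minimal sketch with placeholder values shaped like the `[label, title]` pairs discussed in this thread:

```php
<?php
// Each inner array is one row: [label, title]. Values are placeholders.
$data = array(
    array('Season 1, Episode 1', '12:00 a.m.-1:00 a.m.'),
    array('Season 1, Episode 2', 'Placeholder title'),
);

$lines = array();
foreach ($data as $row) {
    // Unpack the row into normal variables, just like single-value inserts
    list($label, $title) = $row;
    $lines[] = $label . ': ' . $title;
}

echo $data[1][0], "\n";          // direct access: row 1, column 0
echo implode("\n", $lines), "\n";
```

Inside that loop, `$label` and `$title` are plain strings, so a single-row insert you already know how to write can go there, one query per row.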
xyph Posted May 26, 2011
Oh, I thought C+P was a typo for C++, not copy and paste. If you plan on doing any string-oriented programming, knowing RegEx will be an asset and worth learning.
seany123 (Author) Posted May 27, 2011
Yes, maybe I'll learn this in the future. Out of curiosity, how much do you think a job like the one above would cost?
xyph Posted May 27, 2011
It really depends on how flexible the script has to be, and whether you want a front end, a user/pass system, etc. If this is just a one-time, execute-and-delete kind of thing, I don't see it being more than an hour's worth of code. $25-50 would be what I would charge. That would be a PHP script with variables at the top for you to modify, and no real front end.