bissquitt Posted December 31, 2008
I am thinking PHP would be easiest since this is for a webpage, but I am a novice at best. Any help would be appreciated. The regex for parsing each page I should be able to work out myself with some time. The portion of the code to "visit each page" is what I am having trouble with.
The whole script will read in a list of CINs (probably from MySQL, I can do that easily), go to the following website with the variable CIN, rip the info from the page, and store it in the database.
http://bookstore.umbc.edu/SelectCourses.aspx?src=2&type=2&stoid=9&trm=Spring%2009&cid=2072
It will return: CID / Name / Section / Instructor / Title / Author / ISBN / Edition / New Price / Used Price (if there is one). If there are multiple books, it returns the above again (repeating the name / section / instructor).
Link to comment https://forums.phpfreaks.com/topic/138986-scripting-help/
premiso Posted December 31, 2008
So are you wanting someone to write this for you? Do you have code started?
If you want to do this yourself, I would suggest cURL. If cURL is not available, file_get_contents or file will also work. To parse the result, preg_match, split, strstr, and list are all functions you would want to look at.
Good luck!
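As a rough illustration of the preg_match family premiso mentions, the sketch below pulls two fields out of a made-up HTML fragment. The markup, the class names, and the function name `parse_books` are assumptions for illustration only; the real bookstore page will need its own pattern.

```php
<?php
// Hypothetical parser: the table markup matched here is invented,
// not taken from the real bookstore page.
function parse_books($html)
{
    $pattern = '~<td class="title">(.*?)</td>\s*<td class="isbn">(.*?)</td>~s';
    $books = array();
    // PREG_SET_ORDER groups the captures per book rather than per column.
    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $books[] = array(
                'title' => trim(strip_tags($m[1])),
                'isbn'  => trim(strip_tags($m[2])),
            );
        }
    }
    return $books;
}

// Placeholder input, only to show the call.
$sample = '<tr><td class="title">Example Book</td> <td class="isbn">0000000000</td></tr>';
print_r(parse_books($sample));
```

preg_match_all is used rather than preg_match because a course can list several books, and each one needs its own row.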
Maq Posted December 31, 2008
(Quoting bissquitt's opening post in full.)
Do you have a question?
bissquitt (Author) Posted December 31, 2008
premiso: If someone wants to volunteer and do it then I would be grateful, but my impression was that this wasn't that kind of site. I will take a look at those functions. As far as the parsing is concerned, I plan to use preg_match().
Maq: The question was how to approach the problem of retrieving the web pages in a form that I can read and parse. To elaborate on the question: should I load all the pages into one giant file and then parse that, or load one class at a time, parse, store, repeat?
premiso Posted December 31, 2008
Quoting bissquitt: "If someone wants to volunteer and do it then I would be grateful but my impression was that this wasn't that kind of site. I will take a look at those functions. As far as the parsing is concerned I plan to use preg_match()."
Well good, because if you did I was just going to direct you to the freelance section.
If it were my script, I would do it 3-5 pages at a time to avoid a script timeout and memory issues. If you find 3-5 is too much, limit it to 2-3. This depends on the server's connection to the site and how much data is being retrieved on each call.
PHP's default script timeout is 30 seconds (max_execution_time); most browsers give up after about 2-5 minutes without data being sent to the page. For PHP's timeout, set_time_limit should do the trick. If you want to keep the browser alive, look into the ob_flush and flush functions.
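A minimal sketch of that batching idea, assuming a hypothetical helper named `process_in_batches` and a caller-supplied handler that does the fetch, parse, and store for one course. The batch size and the 30-second figure are placeholders to tune, not recommendations from the thread.

```php
<?php
// Hypothetical batch runner: works through the course IDs a few at a time.
function process_in_batches(array $cids, $batchSize, $handler)
{
    $done = 0;
    foreach (array_chunk($cids, $batchSize) as $batch) {
        if (function_exists('set_time_limit')) {
            set_time_limit(30);        // restart the execution timer for each batch
        }
        foreach ($batch as $cid) {
            call_user_func($handler, $cid);
            $done++;
        }
        echo "Processed $done of " . count($cids) . "\n";
        if (ob_get_level() > 0) {
            ob_flush();                // push PHP's output buffer, if one is active
        }
        flush();                       // ask the server to send the output to the browser
    }
    return $done;
}

// Placeholder IDs and an empty handler, only to show the call.
process_in_batches(array(2070, 2071, 2072, 2073), 3, function ($cid) {
    // fetch, parse, and store one course here
});
```

Echoing a progress line after each batch is what gives ob_flush and flush something to send, which is the part that keeps the browser from giving up.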
Maq Posted December 31, 2008
Quoting bissquitt: "If someone wants to volunteer and do it then I would be grateful but my impression was that this wasn't that kind of site."
Some people may give you their pre-made scripts, but they usually don't build them from scratch.
IMO this is what you should do, pseudo:
for ($i = 0; $i < $num_cids; $i++) {
    cURL("http://bookstore.umbc.edu/SelectCourses.aspx?src=2&type=2&stoid=9&trm=Spring%2009&cid=$i"); // notice the $i var
    // use your regex to extract the appropriate information
    // store it somewhere, CSV maybe?
}
dennismonsewicz Posted December 31, 2008
This has nothing to do with this thread, but what does IMO mean? LOL, I keep seeing it but have no idea what it means.
premiso Posted December 31, 2008
Quoting dennismonsewicz: "what does IMO mean?"
In My Opinion.
dennismonsewicz Posted December 31, 2008
Oh well, that was simple enough... carry on with the thread... *bows out*
Maq Posted December 31, 2008
IMHO = in my honest opinion
@bissquitt You're better off trying to write this script yourself and coming back with specific questions.
bissquitt (Author) Posted December 31, 2008
Quoting Maq: "IMO this is what you should do, pseudo:
for ($i = 0; $i < $num_cids; $i++) {
    cURL("http://bookstore.umbc.edu/SelectCourses.aspx?src=2&type=2&stoid=9&trm=Spring%2009&cid=$i"); // notice the $i var
    // use your regex to extract the appropriate information
    // store it somewhere, CSV maybe?
}"
When using cURL, how do I access the page contents? Are they dumped into an array, the way mysql_fetch_array does it? While I know what you provided is pseudo-code, I would imagine your cURL line would really be many lines, though I am the one requesting your assistance, so I could be wrong.
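To make the cURL question concrete: with CURLOPT_RETURNTRANSFER set, curl_exec() hands back the whole page as one string (or false on failure), not an array. The sketch below expands Maq's pseudo-code under that setting. The function names `course_url`, `fetch_course_page`, and `scrape_courses` and the 20-second timeout are assumptions, and the code presumes the cURL extension is enabled.

```php
<?php
// Build the lookup URL for one course ID (query string copied from the thread).
function course_url($cid)
{
    return 'http://bookstore.umbc.edu/SelectCourses.aspx?src=2&type=2&stoid=9'
         . '&trm=Spring%2009&cid=' . urlencode((string) $cid);
}

// Fetch one page; returns the HTML as a single string, or false on failure.
function fetch_course_page($cid)
{
    $ch = curl_init(course_url($cid));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of echoing it
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);          // assumed limit, in seconds
    return curl_exec($ch);
}

// Hypothetical driver: maps each course ID to its raw HTML.
function scrape_courses(array $cids)
{
    $pages = array();
    foreach ($cids as $cid) {
        $html = fetch_course_page($cid);
        if ($html === false) {
            continue;                               // skip pages that could not be fetched
        }
        $pages[$cid] = $html;                       // run preg_match() on $html, then store
    }
    return $pages;
}
```

Calling `scrape_courses(array(2072))` would request the single course page from the thread; in the real script the ID list would come from MySQL instead.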
premiso Posted December 31, 2008
If it does return an array, use implode to put it into one single string =)
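For context, file() is the fetch function that returns an array (one element per line), whereas file_get_contents() and curl_exec() with CURLOPT_RETURNTRANSFER already return a string. A small sketch of the implode step, using a literal array in place of a real download; the helper name `lines_to_string` is made up for the example.

```php
<?php
// Join the array that file() returns into one string.
// Each element keeps its own trailing newline, so the glue is empty.
function lines_to_string(array $lines)
{
    return implode('', $lines);
}

// Stand-in for: $lines = file($url);
$lines = array("<html>\n", "<body>Example</body>\n", "</html>\n");
$page = lines_to_string($lines);
echo $page;
```

Having the page as one string matters for the regex step, since a pattern cannot match across separate array elements.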
Maq Posted December 31, 2008
Please read the cURL documentation. For your circumstances you may want to use file_get_contents() instead; it's a little easier to use. You should also Google "screen scrape", because there are many ready-made classes that handle what you're looking for.
bissquitt (Author) Posted January 1, 2009
OK, so I got it working with the code below; cURL seemed overly complicated for what I wanted to do. I am having an issue with the results, though. It appears to return just the HTML of the page without any of the database queries that make it useful. (This is just a debug test page on my way to the full script, so all it does is query one class.)
http://bookscrooge.com/test/parsebook.php is my page.
http://bookstore.umbc.edu/SelectCourses.aspx?src=2&type=2&stoid=9&trm=Spring%2009&cid=2072 is the actual site.
Thoughts on why this may be or how to overcome it?
// open the remote page as a stream and echo it line by line
$infile = fopen("http://bookstore.umbc.edu/SelectCourses.aspx?src=2&type=2&stoid=9&trm=Spring%2009&cid=2072", "r");
while (($line = fgets($infile)) !== FALSE) {
    echo $line;
}
fclose($infile);
bissquitt (Author) Posted January 1, 2009
So I was fooling around with cURL after the previous issue and got the following error message. I put the page back the way I had it, so the other issue can still be seen.
Warning: curl_setopt() [function.curl-setopt]: CURLOPT_FOLLOWLOCATION cannot be activated when in safe_mode or an open_basedir is set in /f1/content/books/public/test/parsebook.php on line 24
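That warning means the host will not let cURL follow redirects automatically. One common workaround, sketched here rather than confirmed by anyone in the thread, is to leave CURLOPT_FOLLOWLOCATION off, ask for the response headers, and follow any Location header by hand. The function names and the five-hop limit are assumptions, and whether the bookstore page redirects at all is not established here.

```php
<?php
// Pull the redirect target out of a raw header block, or return null if none.
function redirect_target($headers)
{
    if (preg_match('/^Location:[ \t]*(\S+)/mi', $headers, $m)) {
        return $m[1];
    }
    return null;
}

// Hypothetical fetcher that follows up to $maxHops redirects manually.
function fetch_following_redirects($url, $maxHops = 5)
{
    for ($hop = 0; $hop <= $maxHops; $hop++) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response as a string
        curl_setopt($ch, CURLOPT_HEADER, true);         // include the headers in that string
        $response = curl_exec($ch);
        if ($response === false) {
            return false;
        }
        $status     = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
        $headers = substr($response, 0, $headerSize);
        $body    = (string) substr($response, $headerSize);
        $next    = redirect_target($headers);
        if ($status < 300 || $status >= 400 || $next === null) {
            return $body;                               // not a redirect: done
        }
        $url = $next; // assumes an absolute URL; relative ones need resolving first
    }
    return false;     // too many redirects
}
```

Capping the number of hops keeps a redirect loop on the remote site from hanging the script.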
Maq Posted January 2, 2009
Quoting bissquitt: "It appears to return just the html of the page without any of the database queries that make it useful."
You will never be able to get the queries, or any server-side code for that matter. It all just gets rendered to HTML and sent to the browser.