Scripting help

bissquitt · December 31, 2008

I am thinking PHP would be easiest since this is for a webpage but I am a novice at best. Any help would be appreciated. The regex for parsing each page I should be able to get myself with some time. The portion of the code to "visit each page" is what I am having trouble with.

The whole script will read in a list of CINs (probably from MySQL, which I can do easily), request the following page once per CIN, rip the info from each page, and store it in the database.

http://bookstore.umbc.edu/SelectCourses.aspx?src=2&type=2&stoid=9&trm=Spring%2009&cid=2072

It will return:

CID / Name / Section / Instructor / Title / Author / ISBN / Edition / New Price / Used price (if there is one)

If there are multiple books, it returns the above again for each one (repeating the name / section / instructor).

premiso · December 31, 2008

So are you wanting someone to write this for you? Do you have code started?

If you want to do this yourself, I would suggest cURL. If cURL is not available, file_get_contents or file will also work.

To parse it, preg_match, split, strstr and list are all functions you would want to look at.

Good luck!
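A minimal sketch of the "cURL, else file_get_contents" suggestion above. The helper names build_course_url() and fetch_page() are made up for illustration, and the 20-second timeout is an arbitrary assumption; only the URL comes from the thread.

```php
<?php
// Build the bookstore URL for one course ID. The query string is copied from
// the thread; only the cid value changes.
function build_course_url($cid)
{
    return 'http://bookstore.umbc.edu/SelectCourses.aspx'
        . '?src=2&type=2&stoid=9&trm=Spring%2009&cid=' . urlencode((string) $cid);
}

// Fetch a page as one string, or return false on failure.
// fetch_page() is a hypothetical helper name, not a PHP built-in.
function fetch_page($url)
{
    if (function_exists('curl_init')) {
        $ch = curl_init($url);
        // Return the body as a string instead of printing it directly.
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 20); // assumed value, tune as needed
        return curl_exec($ch); // string on success, false on failure
    }
    // Fallback when the cURL extension is missing; this path needs
    // allow_url_fopen to be enabled in php.ini.
    return @file_get_contents($url);
}
```

Calling fetch_page(build_course_url(2072)) would then give the page HTML as a single string, ready for preg_match.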

Maq · December 31, 2008

Do you have a question?

bissquitt · December 31, 2008

premiso: If someone wants to volunteer and do it then I would be grateful, but my impression was that this wasn't that kind of site. I will take a look at those functions. As far as the parsing is concerned, I plan to use preg_match().

Maq: the question was how to approach the problem of fetching the web pages in a way that lets me read and parse them. To elaborate: should I load all the pages into one giant file and then parse that, or load one class at a time, parse, store, repeat?
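For the parsing step, a rough sketch of pulling values out of fetched HTML with preg_match_all(). The sample markup and the helper name extract_cells() are hypothetical; the real bookstore page almost certainly uses different tags, so the pattern would need adjusting after looking at its source.

```php
<?php
// Extract the text of every <td> cell from an HTML fragment.
// extract_cells() is a hypothetical helper name.
function extract_cells($html)
{
    // "i" = case-insensitive, "s" = let "." match newlines inside a cell.
    preg_match_all('#<td[^>]*>(.*?)</td>#is', $html, $matches);
    // Drop any nested tags, then trim surrounding whitespace.
    return array_map('trim', array_map('strip_tags', $matches[1]));
}

// Made-up sample row, only to show the shape of the result.
$sample = '<tr><td>2072</td><td> Intro to Widgets </td><td class="p">$45.00</td></tr>';
$cells = extract_cells($sample);
```

Parsing one page at a time like this keeps memory use small compared with loading every page into one giant file first.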

premiso · December 31, 2008

Well good, cause if you did I was just gonna direct you to the freelance section.

If it were my script, I would do it 3-5 pages at a time to avoid a script timeout and memory issues. If you find 3-5 is too much, limit it to 2-3.

This depends on the server's connection to the site and how much data each call retrieves. PHP's default execution time limit is 30 seconds, and most browsers give up after about 2-5 minutes without data being sent to the page. For PHP's timeout, set_time_limit should do the trick. If you want to keep the browser alive, look into the ob_flush and flush functions.
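One way the batching advice above could look, as a sketch only. process_in_batches() and the $handler callback are hypothetical names; the batch size and the 30-second value are taken from the numbers in the post.

```php
<?php
// Process course IDs in small batches so no single stretch of work runs into
// PHP's execution time limit.
// process_in_batches() and $handler are hypothetical names.
function process_in_batches(array $cids, $batchSize, $handler)
{
    $done = 0;
    foreach (array_chunk($cids, $batchSize) as $batch) {
        set_time_limit(30); // restart the timeout counter for this batch
        foreach ($batch as $cid) {
            call_user_func($handler, $cid); // fetch + parse + store one course
            $done++;
        }
        // Send a progress line so the browser keeps receiving data.
        echo "Processed $done of " . count($cids) . "\n";
        if (ob_get_level() > 0) {
            ob_flush(); // push PHP's output buffer, if one is active
        }
        flush(); // ask the web server to pass the output on to the browser
    }
    return $done;
}
```

Note that set_time_limit() has no effect when PHP runs in safe mode, so on a restricted host the batches may need to be split across separate requests instead.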

Maq · December 31, 2008

premisio: If someone wants to volunteer and do it then I would be grateful but my impression was that this wasn't that kind of site.

Some people may give you their pre-made scripts but usually don't build them from scratch.

IMO this is what you should do, pseudo:

for ($i = 0; $i < $num_cids; $i++) {
    cURL("http://bookstore.umbc.edu/SelectCourses.aspx?src=2&type=2&stoid=9&trm=Spring%2009&cid=$i"); //notice the $i var
    //use your regex to extract the appropriate information
    //store it somewhere, CSV maybe?
}
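The loop above could be fleshed out roughly as follows. This is a sketch under assumptions: scrape_courses(), $fetch and $parse are hypothetical names, the fetching and parsing are passed in as callbacks so they can be swapped, and it walks a list of real course IDs (as the original post describes) rather than counting up from 0.

```php
<?php
// Visit one page per course ID, extract rows, and append them to a CSV file.
// scrape_courses(), $fetch and $parse are hypothetical names:
//   $fetch takes a URL and returns the HTML string (or false on failure),
//   $parse takes HTML and returns an array of rows (each row an array).
function scrape_courses(array $cids, $fetch, $parse, $csvPath)
{
    $base = 'http://bookstore.umbc.edu/SelectCourses.aspx'
          . '?src=2&type=2&stoid=9&trm=Spring%2009&cid=';
    $out = fopen($csvPath, 'a');
    if ($out === false) {
        return 0; // could not open the output file
    }
    $rows = 0;
    foreach ($cids as $cid) {
        $html = call_user_func($fetch, $base . urlencode((string) $cid));
        if ($html === false) {
            continue; // skip pages that failed to load
        }
        foreach (call_user_func($parse, $html) as $row) {
            fputcsv($out, $row, ',', '"', '\\');
            $rows++;
        }
    }
    fclose($out);
    return $rows; // number of CSV rows written
}
```

Writing to a database instead of CSV would only change the inner loop body.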

dennismonsewicz · December 31, 2008

This has nothing to do with this thread but what does IMO mean? LOL I keep seeing it but have no idea what it means

premiso · December 31, 2008

In My Opinion.

dennismonsewicz · December 31, 2008

oh well that was simple enough... carry on with the thread... *bows out*

Maq · December 31, 2008

IMHO = in my honest opinion

@bissquitt

You're better off trying to write this script yourself and coming back with specific questions.

bissquitt · December 31, 2008

premisio: If someone wants to volunteer and do it then I would be grateful but my impression was that this wasn't that kind of site.

Some people may give you their pre-made scripts but usually don't build them from scratch.

IMO this is what you should do, pseudo:
for ($i = 0; $i < $num_cids; $i++) {
    cURL("http://bookstore.umbc.edu/SelectCourses.aspx?src=2&type=2&stoid=9&trm=Spring%2009&cid=$i"); //notice the $i var
    //use your regex to extract the appropriate information
    //store it somewhere, CSV maybe?
}

When using cURL, how do I access the page info? Is it dumped into an array, like the result of mysql_fetch_array()? While I know what you provided is pseudocode, I would imagine your cURL line would really be many lines, though I am the one requesting your assistance so I could be wrong.

premiso · December 31, 2008

If it does return an array, use implode() to put it into one single string. =)
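To make the array question concrete: file() returns the page as an array of lines, while file_get_contents() (and curl_exec() with CURLOPT_RETURNTRANSFER set) return one string. A small sketch of the implode() step, using a made-up three-line array in place of a real file($url) result; lines_to_string() is a hypothetical helper name.

```php
<?php
// file() returns the page as an array of lines; implode() joins them back
// into one string so a regex can match across line breaks.
// lines_to_string() is a hypothetical helper name.
function lines_to_string(array $lines)
{
    // Each element from file() still ends in its own "\n", so join with ''.
    return implode('', $lines);
}

// Stand-in for what file($url) might return for a three-line page.
$lines = array("<table>\n", "<tr><td>2072</td></tr>\n", "</table>\n");
$html = lines_to_string($lines);
```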

Maq · December 31, 2008

Please read the cURL documentation.

For your circumstances you may want to use file_get_contents() instead; it's a little easier to use.

You should also Google "screen scrape", because there are many ready-made classes that handle what you're looking for.

bissquitt · January 1, 2009

OK, so I got it working with the code below, but I am having an issue with the results. It appears to return just the HTML of the page without any of the database queries that make it useful. cURL seemed overly complicated for what I wanted to do. (This is just a debug test page on my way to the full script, so all it does is query one class.)

http://bookscrooge.com/test/parsebook.php is my page

http://bookstore.umbc.edu/SelectCourses.aspx?src=2&type=2&stoid=9&trm=Spring%2009&cid=2072 is the actual site

Thoughts on why this may be or how to overcome it?

$infile = fopen("http://bookstore.umbc.edu/SelectCourses.aspx?src=2&type=2&stoid=9&trm=Spring%2009&cid=2072", "r");
while (($line = fgets($infile)) !== FALSE) {
    echo $line;
}
fclose($infile);

bissquitt · January 1, 2009

So I was fooling around with cURL after the previous issue and got the following error message. I put the page back the way I had it so the other issue can still be seen.

Warning: curl_setopt() [function.curl-setopt]: CURLOPT_FOLLOWLOCATION cannot be activated when in safe_mode or an open_basedir is set in /f1/content/books/public/test/parsebook.php on line 24
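The warning means the host will not allow CURLOPT_FOLLOWLOCATION. If the page really does redirect (the thread does not confirm this), one possible workaround is to leave that option off and follow the Location header by hand. The sketch below assumes absolute URLs in Location; extract_location() and fetch_following_redirects() are hypothetical helper names.

```php
<?php
// Pull the Location header out of a raw HTTP header block, or return null.
function extract_location($rawHeaders)
{
    if (preg_match('/^Location:\s*(\S+)/mi', $rawHeaders, $m)) {
        return $m[1];
    }
    return null;
}

// Follow redirects manually instead of using CURLOPT_FOLLOWLOCATION.
function fetch_following_redirects($url, $maxHops = 5)
{
    for ($hop = 0; $hop <= $maxHops; $hop++) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_HEADER, true); // include headers in the result
        $response = curl_exec($ch);
        if ($response === false) {
            return false; // network or protocol error
        }
        $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
        $next = extract_location(substr($response, 0, $headerSize));
        if ($status < 300 || $status >= 400 || $next === null) {
            return substr($response, $headerSize); // not a redirect: body only
        }
        $url = $next; // assumes Location holds an absolute URL
    }
    return false; // gave up after too many redirects
}
```

A relative Location value would need to be resolved against the current URL before the next request, which this sketch does not attempt.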

Maq · January 2, 2009

It appears to return just the html of the page without any of the database queries that make it useful.

You will never be able to get the queries, or any server-side code for that matter. It all just gets rendered to HTML and sent to the browser.
