
Scripting help


bissquitt


I am thinking PHP would be easiest since this is for a webpage, but I am a novice at best, so any help would be appreciated. The regex for parsing each page I should be able to work out myself with some time; the portion of the code that "visits each page" is what I am having trouble with.

 

 

The whole script will read in a list of CINs (probably from MySQL; I can do that easily), go to the following website with the CIN as a variable, rip the info from the page, and store it in the database.

 

http://bookstore.umbc.edu/SelectCourses.aspx?src=2&type=2&stoid=9&trm=Spring%2009&cid=2072

 

It will return:

 

CID / Name / Section / Instructor / Title / Author / ISBN / Edition / New Price / Used price (if there is one)

 

If there are multiple books, it returns the above again (repeating the name / section / instructor).


Do you have a question?


premisio: If someone wants to volunteer and do it then I would be grateful, but my impression was that this wasn't that kind of site. I will take a look at those functions. As far as the parsing is concerned, I plan to use preg_match().

 

Maq: the question was how to approach the problem of retrieving the web pages in a way that I can parse and read them. And to elaborate on the question: should I load all the pages into one giant file and then parse that, or load one class at a time, parse, store, repeat?



 

Well, good, because if you did want someone else to do it, I was just going to direct you to the freelance section.

 

If it were my script, I would do it 3-5 at a time to avoid a script timeout and memory issues. If you find 3-5 is too much, then limit it to 2-3.

 

This depends on the server's connection to the site and how much data is being retrieved on each call. PHP has a default timeout of 30 seconds, and most browsers give up after about 2-5 minutes without data being sent to the page. For PHP's timeout, set_time_limit() should do the trick; if you want to keep the browser alive, look into the ob_flush() and flush() functions.
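
Roughly what I mean, just as a sketch (the fetch/parse part is left as a comment since that is your end of it, and the CID list here is made up):

set_time_limit(0);             // lift PHP's 30-second limit for this run
$cids = array(2072, 2073);     // placeholder list; yours would come out of MySQL
foreach ($cids as $cid) {
    // ...fetch the page for this $cid, run the regex, store the rows...
    echo ".";                  // send a little something to the browser each pass
    if (ob_get_level() > 0) {
        ob_flush();            // empty PHP's output buffer, if one is active
    }
    flush();                   // and have the server push it out so the browser stays alive
}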


premisio: If someone wants to volunteer and do it then I would be grateful, but my impression was that this wasn't that kind of site.

 

Some people may give you their pre-made scripts, but they usually won't build one from scratch for you.

 

IMO this is what you should do, in pseudo-code:

 

for ($i = 0; $i < $num_cids; $i++) {
    cURL("http://bookstore.umbc.edu/SelectCourses.aspx?src=2&type=2&stoid=9&trm=Spring%2009&cid=$i"); // notice the $i var
    // use your regex to extract the appropriate information
    // store it somewhere, CSV maybe?
}
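
Just so it's clear, cURL() there is shorthand. Inside that loop the real call would look something like this (only a sketch; with CURLOPT_RETURNTRANSFER the page comes back as one big string, not an array):

$ch = curl_init("http://bookstore.umbc.edu/SelectCourses.aspx?src=2&type=2&stoid=9&trm=Spring%2009&cid=$i");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // hand the page back as a string instead of printing it
$html = curl_exec($ch);                         // $html now holds the raw markup of the page
curl_close($ch);
// run preg_match()/preg_match_all() against $html here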

 

 

When using cURL, how is it that I access the page info? Is it dumped into an array like mysql_fetch_query? While I know what you provided is pseudo-code, I would imagine your cURL line would be many lines, though I am the one requesting your assistance, so I could be wrong.


OK, so I got it working with the code below. I am having an issue with the results, though: it appears to return just the HTML of the page without any of the database queries that make it useful. cURL seemed overly complicated for what I wanted to do. (This is just a debug test page on my way to the full script, so all it does is query one class.)

 

http://bookscrooge.com/test/parsebook.php is my page

 

http://bookstore.umbc.edu/SelectCourses.aspx?src=2&type=2&stoid=9&trm=Spring%2009&cid=2072 is the actual site

 

Thoughts on why this may be or how to overcome it?

 

$infile = fopen("http://bookstore.umbc.edu/SelectCourses.aspx?src=2&type=2&stoid=9&trm=Spring%2009&cid=2072", "r"); // remote fopen() needs allow_url_fopen enabled
while (($line = fgets($infile)) !== FALSE) {
    echo $line;
}
fclose($infile);

 


So I was fooling around with cURL after the previous issue and got the following error message. I put the page back to the way I had it so the other issue can still be seen.

 

 

Warning: curl_setopt() [function.curl-setopt]: CURLOPT_FOLLOWLOCATION cannot be activated when in safe_mode or an open_basedir is set in /f1/content/books/public/test/parsebook.php on line 24
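
From what I can gather, when safe_mode or open_basedir is on you simply can't let cURL follow redirects for you. The usual workaround seems to be to leave CURLOPT_FOLLOWLOCATION off, read the Location header yourself, and request again; a rough sketch, assuming at most one redirect and that $url holds the bookstore address:

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, true);          // keep the headers so any redirect shows up
$response = curl_exec($ch);
$status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
if (($status == 301 || $status == 302) && preg_match('/^Location:\s*(\S+)/mi', $response, $m)) {
    // repeat the same request against $m[1], the address the site wanted to send us to
}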


It appears to return just the HTML of the page without any of the database queries that make it useful.

 

You will never be able to get the queries, or any server-side code for that matter. It all just gets rendered to HTML and sent to the browser.
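
So whatever your script pulls down is just that markup, and the regex has to work off it. Something along these lines (the pattern here is a placeholder; you would have to shape it around the bookstore page's real HTML, and $url is just whatever address you are fetching):

$html = file_get_contents($url);   // or however you end up grabbing the page
// placeholder pattern: grab the text of each table cell
if (preg_match_all('/<td[^>]*>\s*([^<]+?)\s*<\/td>/i', $html, $matches)) {
    foreach ($matches[1] as $cell) {
        // map the cells back to CID / Title / Author / ISBN / prices and insert into MySQL
    }
}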

