Zane Posted December 6, 2006

Oh, this is such a hard one.

Alright: as far as I can tell from limited sources, my school has NO database to store all the courses it offers. They use DOS programs to produce whatever output they need, so I'm sure they at least have some sort of old-school CSV file somewhere that I can't find.

Anyway, I'm trying to port all the available data, which is just HTML pages (horribly organized ones), to a MySQL database. I've done great so far with all the courses at http://www.southwesterncc.edu/acadprog/desc/aca-bus.htm and I got all of them sorted.

I've now moved on to another, equally horrifying HTML file that I want to port, but I'm having trouble finding any patterns since it's just text and spaces. My goal is to get every class listed at http://www.southwesterncc.edu/acadprog/spring/index.htm into a database, so I could sort by teacher, location, etc.

Does anyone have any ideas here? I know I'm going to have to use regex, most definitely, unless I want to go insane. I'm not asking you to write me a regex pattern, though. I'm just asking if anyone can spot any logical patterns, or describe a logical procedure I could use to at least get each class into an array of some sort.
jsladek Posted December 6, 2006

Access does a pretty good job of importing things like that. Here is a quick run-through: http://www.iobe.net/Temp.csv

You would still have to go in and verify that everything looks good, and it looks like you would have to fix some of the URLs. Or take another run at importing it and delete some of the separators that I did not delete at the end.

-John
c4onastick Posted December 6, 2006

I'd use that "Crdt Building... etc" header at the top to start the regex match. I think in this situation you can get away with using things like "\s+" between fields, since I have no idea whether those gaps are all spaces (guessing yes, maybe with some tabs). I'm sure you see the regex patterns you could apply.

I've had decent luck pulling things like this into Excel (probably very similar to the Access route above). From there I've got a nice little script I found a while ago to generate SQL from the Excel XML. (It's easy to verify the data and manually fix it in Excel if need be.)
c4onastick Posted December 6, 2006

I know you didn't really want a regex pattern, but I started on one (as a little challenge for myself, and I had some spare time) and it seems to be working. Thought I'd share it with you to save you some time.

[code]([A-Z]{3} [0-9]{3}[A-Z]?)\s+([A-Z]{2}[1-9])\s([A-Za-z ]+)\s([1-5]\.0)\s([A-Za-z]+, [A-Za-z]+)\s+([A-Za-z]+(?: [A-Za-z]+)?)\s+([-MTWHFSU]{1,5})\s+(\d\d:\d\d-\d\d:\d\d[AP])[/code]

I tried to do it all in one fell swoop, and it'll pull all the fields out. A few drawbacks: it only gets classes with a one- or two-word location, and I can't figure out how to get the comments that come after some classes, like "See advisor for placement."
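For anyone who wants to try the pattern above, here is a minimal Python sketch of how it might be applied to one row at a time. Python is used purely for illustration. The field names and the sample row are made up for this example (the real page was not checked), so treat them as assumptions rather than the school's actual layout.

```python
import re

# The pattern from the post above, split per capture group for readability.
CLASS_ROW = re.compile(
    r"([A-Z]{3} [0-9]{3}[A-Z]?)\s+"            # course code, e.g. "ACA 111"
    r"([A-Z]{2}[1-9])\s"                       # section
    r"([A-Za-z ]+)\s"                          # title
    r"([1-5]\.0)\s"                            # credits
    r"([A-Za-z]+, [A-Za-z]+)\s+"               # instructor as "Last, First"
    r"([A-Za-z]+(?: [A-Za-z]+)?)\s+"           # location (one or two words)
    r"([-MTWHFSU]{1,5})\s+"                    # meeting days
    r"(\d\d:\d\d-\d\d:\d\d[AP])"               # time range
)

# Hypothetical field names; the thread does not spell out the column headers.
FIELD_NAMES = ("course", "section", "title", "credits",
               "instructor", "location", "days", "time")

def parse_row(line):
    """Return a dict of fields for one class row, or None if it does not match."""
    match = CLASS_ROW.search(line)
    if match is None:
        return None
    # Strip each group because the title group can swallow trailing spaces.
    return dict(zip(FIELD_NAMES, (g.strip() for g in match.groups())))

# Invented sample row, shaped to fit the pattern.
SAMPLE_ROW = ("ACA 111    AA1 College Student Success  1.0 Smith, John"
              "    Balsam    MW    09:00-09:50A")

if __name__ == "__main__":
    print(parse_row(SAMPLE_ROW))
```

One thing worth noticing: the pattern uses a single `\s` (not `\s+`) between credits and instructor, so a row with more than one space there will not match unless that part is loosened.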
Zane (thread author) Posted December 6, 2006

Thanks a lot, man, for that pattern. It helped a little.

I don't know why I didn't think of it earlier, but I stared at the page long enough and realized that if there is more than one space between two pieces of text, they weren't meant to be in the same field. So I replaced every run of more than one space with a comma.

I think I'm getting closer. It looks WAY more feasible now.
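The "more than one space means a new field" idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the sample row is invented, and the function names are hypothetical. It splits on runs of two or more whitespace characters and then lets the `csv` module do the comma-joining, because a field such as "Smith, John" already contains a comma and would otherwise break into two columns.

```python
import csv
import io
import re

def split_fields(line):
    """Split a fixed-width text row wherever two or more whitespace chars occur."""
    return re.split(r"\s{2,}", line.strip())

def to_csv_line(fields):
    """Join fields into one CSV line, quoting any field that contains a comma."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="").writerow(fields)
    return buffer.getvalue()

# Invented sample row with at least two spaces between every field.
SAMPLE_ROW = ("ACA 111  AA1  College Student Success  1.0  Smith, John"
              "  Balsam  MW  09:00-09:50A")

if __name__ == "__main__":
    print(to_csv_line(split_fields(SAMPLE_ROW)))
```

A plain replace of the space runs with "," works too, as long as no field contains a comma of its own. Rows where two fields happen to be separated by only a single space would still need fixing by hand.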