
I almost see a pattern, but don't


Zane


Oh, this is a hard one...

Alright, from what limited information I have, my school has NO database storing all the courses it offers.
They use DOS programs to produce whatever output they need, so I'm sure there's at least some sort of old-school CSV file somewhere that I can't find.
But anyway...

I'm trying to port all the available data, which is just (horribly organized) HTML pages, into a MySQL database.
So far I've done great with all the courses:
http://www.southwesterncc.edu/acadprog/desc/aca-bus.htm
I got all of them sorted.

I've now moved on to another, even more horrifying HTML file that I want to port, but I'm having trouble finding any patterns since it's just text and spaces.
My goal is to get every class listed here
http://www.southwesterncc.edu/acadprog/spring/index.htm
into a database, so I can sort by teacher, location, etc.

Does anyone have any ideas?
I know I'm going to have to use regex, unless I want to go insane.
I'm not asking you to write me a regex pattern, though; I'm just asking whether anyone can spot any logical patterns,
or describe a logical procedure I could use to at least get each class into an array of some sort.
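For the end goal of sorting by teacher or location, a table along these lines would be enough. This is only a sketch: it uses Python's built-in `sqlite3` as a stand-in for MySQL so it runs anywhere, and the table name `classes`, its column names, and the helpers `load_classes` / `classes_by` are hypothetical, guessed from the fields that the regex posted later in this thread pulls out.

```python
import sqlite3

# Hypothetical schema; sqlite3 stands in for MySQL in this sketch.
SCHEMA = """
CREATE TABLE IF NOT EXISTS classes (
    course     TEXT,
    section    TEXT,
    title      TEXT,
    credits    REAL,
    instructor TEXT,
    location   TEXT,
    days       TEXT,
    time       TEXT
)
"""

# Only these columns may be used for sorting, so the ORDER BY clause stays safe.
SORTABLE = {"course", "instructor", "location", "days"}


def load_classes(conn, rows):
    """Create the table if needed and insert rows given in schema column order."""
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO classes VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()


def classes_by(conn, column):
    """Return all classes sorted by a whitelisted column, then by course."""
    if column not in SORTABLE:
        raise ValueError(f"cannot sort by {column!r}")
    return conn.execute(f"SELECT * FROM classes ORDER BY {column}, course").fetchall()
```

With a MySQL driver the same idea applies, but the parameter placeholder differs (for example `%s` instead of `?` in MySQL Connector/Python or PyMySQL).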

Access does a pretty good job of importing things like that. Here is a quick run-through: http://www.iobe.net/Temp.csv

You would still have to go in and verify that everything looks good, and it looks like you would have to fix some of the URLs... Or do another import run and delete some of the separators that I didn't delete at the end...

-John

I'd use that "Crdt  Building... etc" header line at the top as the starting point for the regex match. I think in this situation you can get away with using things like "\s+", since I have no idea whether those are all spaces (guessing yes, maybe some tabs). I'm sure you can see the regex patterns you could apply. I've had decent luck pulling things like this into Excel (probably very similar to the Access route above). From there, I have a nice little script I found a while ago that generates SQL from the Excel XML. (It's easy to verify the data in Excel and fix it manually if need be.)
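A minimal Python sketch of the "start at the header" idea. It assumes (unverified against the actual page) that `text` is the schedule's plain text with the HTML tags already stripped, and that each table begins with a line containing `Crdt`, some whitespace, then `Building`; `HEADER_RE` and `lines_after_header` are hypothetical names. Using `\s+` in the header pattern covers the spaces-or-tabs uncertainty.

```python
import re

# Assumption: every table on the page starts with a header row containing
# "Crdt" and "Building" separated by whitespace (spaces or tabs).
HEADER_RE = re.compile(r"Crdt\s+Building")


def lines_after_header(text):
    """Yield non-blank lines that follow a header row, skipping the headers themselves."""
    in_table = False
    for line in text.splitlines():
        if HEADER_RE.search(line):
            in_table = True  # start (or restart) collecting after each header row
            continue
        if in_table and line.strip():
            yield line.rstrip()
```

Because the flag is simply set again on every header, a page that repeats the header for each department still works; everything above the first header (titles, navigation) is ignored.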

I know you didn't really want a regex pattern, but I started on one (as a little challenge for myself, since I had some spare time) and it seems to be working. Thought I'd share it with you and save you some time.
[code]([A-Z]{3} [0-9]{3}[A-Z]?)\s+([A-Z]{2}[1-9])\s([A-Za-z ]+)\s([1-5]\.0)\s([A-Za-z]+, [A-Za-z]+)\s+([A-Za-z]+(?: [A-Za-z]+)?)\s+([-MTWHFSU]{1,5})\s+(\d\d:\d\d-\d\d:\d\d[AP])[/code]
Tried to do it all in one fell swoop. A few drawbacks: it only gets classes with a one- or two-word location, and I can't figure out how to get the comments that follow some classes, like "See advisor for placement." Otherwise it'll pull all the fields out.
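One way to run that pattern over the schedule line by line and still catch the trailing comments: any non-blank line that doesn't match is attached to the most recently matched class. This is a Python sketch that assumes each class sits on a single line; the pattern is copied unchanged from the post above, while `FIELDS`, `parse_schedule`, and the sample data in the test are made up for illustration.

```python
import re

# Pattern copied unchanged from the post above, split across adjacent raw strings.
CLASS_RE = re.compile(
    r"([A-Z]{3} [0-9]{3}[A-Z]?)\s+([A-Z]{2}[1-9])\s([A-Za-z ]+)\s([1-5]\.0)\s"
    r"([A-Za-z]+, [A-Za-z]+)\s+([A-Za-z]+(?: [A-Za-z]+)?)\s+"
    r"([-MTWHFSU]{1,5})\s+(\d\d:\d\d-\d\d:\d\d[AP])"
)

# Hypothetical names for the eight capture groups, in order.
FIELDS = ("course", "section", "title", "credits",
          "instructor", "location", "days", "time")


def parse_schedule(lines):
    """Return one dict per matched class; non-matching, non-blank lines become
    comments on the preceding class (e.g. "See advisor for placement.")."""
    classes = []
    for line in lines:
        match = CLASS_RE.search(line)
        if match:
            # strip() removes padding the title group can pick up before the credits.
            record = dict(zip(FIELDS, (g.strip() for g in match.groups())))
            record["comments"] = []
            classes.append(record)
        elif classes and line.strip():
            classes[-1]["comments"].append(line.strip())
    return classes
```

One catch: department headings or other stray text between classes would also land in `comments`, so the results still need a look (or combine this with skipping everything before the header row).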

Thanks a lot, man, for that pattern...

It helped a little.

I don't know why I didn't think of it earlier, but I stared at it long enough and realized that wherever there was more than one space, the text on either side wasn't meant to be in the same field, so I replaced every run of two or more spaces with a comma.
So I think I'm getting closer;
it looks WAY more feasible now.
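In code, that observation amounts to splitting on runs of two or more spaces. One thing to watch: if instructor names are written "Last, First" (as the regex earlier in the thread assumes), replacing the gaps with bare commas will split each name in two. Letting Python's `csv` module write the rows quotes such fields automatically. A minimal sketch; `split_fields` and `to_csv` are hypothetical names, and it assumes columns are always separated by at least two spaces while values never contain two spaces in a row.

```python
import csv
import io
import re

# Two or more consecutive spaces mark a field boundary.
FIELD_GAP = re.compile(r" {2,}")


def split_fields(line):
    """Split one schedule line into fields on runs of 2+ spaces."""
    return FIELD_GAP.split(line.strip())


def to_csv(lines):
    """Write the split fields of every non-blank line as properly quoted CSV."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for line in lines:
        if line.strip():
            writer.writerow(split_fields(line))
    return buf.getvalue()
```

The resulting text can be opened in Excel or loaded with MySQL's `LOAD DATA INFILE` using `FIELDS ... OPTIONALLY ENCLOSED BY '"'`, since the comma inside `"Smith, Jane"` is then protected by the quotes.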
