
Two Noob coding Q's on searching URLs


Agrajag


I'm brand new to PHP coding (but was an old-time VB coder). I figured out how to search a URL for a string. What I'm looking to figure out is two-fold:

1. I want to search a specific site and grab some content there that's date-specific. That date is listed in the site's title tag, so the title tag will say something like "This weekend's stats for February 22-23, 2013 ---".

I can search for the title tag, but how do I go about actually turning just the key piece I need into a string?

2. Also on this page I need to find roughly 80 pieces of data that will all be the same except for one unknown string in the middle of 80 TD/URL/name tags (movie titles). How do I get the 80 items into an array? I'm sure I'll use a FOR loop or WHILE loop, but what does the code for this look like to get those lines into strings?

Thanks.


This is probably going to centre around two main functions.

 

Firstly, use file_get_contents (http://www.php.net/manual/en/function.file-get-contents.php) to grab the page data. You can pass a URL as the file name, as long as the allow_url_fopen setting is enabled.
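A minimal sketch of that first step, assuming you just want the whole page as one string. The helper name fetchPage and the example.com URL are hypothetical; the only real API used is file_get_contents(), which returns false on failure rather than throwing.

```php
<?php
// Hypothetical helper: fetch a page's raw HTML as a single string.
// file_get_contents() also reads local files; URLs need allow_url_fopen enabled.
function fetchPage(string $url): string
{
    // @ silences PHP's warning on failure; the false return value is checked instead.
    $html = @file_get_contents($url);
    if ($html === false) {
        throw new RuntimeException("Could not fetch $url");
    }
    return $html;
}

// Usage with a hypothetical URL standing in for the stats site:
// $html = fetchPage('http://www.example.com/weekend-stats.htm');
```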

 

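Then you'll want to break out your regular expressions and use a function such as preg_match (http://php.net/manual/en/function.preg-match.php). It returns 1 if the pattern matched (0 if not) and fills the array you pass as its third argument with the matched text; preg_match_all does the same for every match on the page.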
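A rough sketch of how preg_match hands back the captured text, assuming the page HTML is already in a string (a hard-coded sample here instead of a real fetch). The pattern is a simple illustration and only handles a plain <title> tag without attributes.

```php
<?php
$html = "<html><head><title>This weekend's stats for February 22-23, 2013 ---</title></head><body></body></html>";

$title = null;
// The return value only says whether it matched; the text lands in $matches.
// ~ is the delimiter, i = case-insensitive, s = let . span newlines.
if (preg_match('~<title>(.*?)</title>~is', $html, $matches) === 1) {
    // $matches[0] is the whole match, $matches[1] is the first capture group.
    $title = trim($matches[1]);
}

echo $title, "\n";
```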


Actually, I would not recommend regular expressions for this job. For parsing HTML, the DOMDocument class is a much better choice.

If it were just grabbing the title, or parts of it, then RegExps might have been serviceable, but since you want to grab more stuff from the page you should use a proper parser.
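A minimal sketch of the DOMDocument route for the title, assuming the HTML is already in a string and the DOM extension is available. libxml_use_internal_errors() is only there so warnings about imperfect markup don't get printed.

```php
<?php
$html = "<html><head><title>This weekend's stats for February 22-23, 2013 ---</title></head><body></body></html>";

$doc = new DOMDocument();
// Real-world HTML is rarely perfectly valid; collect libxml's warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Take the text of the first <title> element, if there is one.
$titles = $doc->getElementsByTagName('title');
$title = $titles->length > 0 ? trim($titles->item(0)->textContent) : null;

echo $title, "\n";
```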

 

Incidentally: to grab the date from the title, after you've fetched it with DOMDocument, you will probably need to use a RegExp. But by then it's just a simple string you're feeding it, which makes it a rather trivial problem. ;)
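Once the title is a plain string, a small pattern can pull the date apart. This sketch assumes the title always reads "stats for <Month> <day>-<day>, <year>" like the example; a weekend spanning two months (e.g. "February 28-March 2") would not match and would need a looser pattern.

```php
<?php
$title = "This weekend's stats for February 22-23, 2013 ---";

// Assumed format: "<Month> <first day>-<last day>, <year>" right after "stats for ".
$pattern = '~stats for ([A-Za-z]+) (\d{1,2})-(\d{1,2}), (\d{4})~';

if (preg_match($pattern, $title, $m) === 1) {
    $dateText = "{$m[1]} {$m[2]}-{$m[3]}, {$m[4]}"; // the date as one string
    $month    = $m[1];
    $firstDay = (int) $m[2];
    $lastDay  = (int) $m[3];
    $year     = (int) $m[4];
} else {
    $dateText = null; // title did not follow the assumed format
}
```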

Edited by Christian F.

Thanks guys. First I'd like to tackle this the non-DOM way and then go about it the DOM way to round out my understanding. For now, I'm still unclear on the first way, as the dynamic nature of the strings involved is throwing me for a loop here.

Can you provide a code segment to explain this? For now I'm doing a file_get_contents to load up the whole HTML page. Once I have that I can easily find the title tag. However, since the month/date (plus spaces and a dash) is dynamic AND I need the text after it, I'm not at all sure how to segment out the middle section. I don't believe explode is going to cut it, as the delimiter isn't the same on both sides. Remember the string is essentially:

 

<title>This weekend's stats for February 22-23, 2013 ---</title>
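Since explode() wants the same delimiter on both sides, one non-regex option that maps closely onto VB's InStr/Mid is strpos() plus substr(): find a fixed marker before the dynamic part, find another after it, and cut out what lies between. The helper name textBetween is hypothetical, and the markers "stats for " and " ---" are assumptions based on the sample title.

```php
<?php
// Hypothetical helper: return the text between $start and $end, or null if either is missing.
function textBetween(string $haystack, string $start, string $end): ?string
{
    $from = strpos($haystack, $start);    // like InStr
    if ($from === false) {
        return null;
    }
    $from += strlen($start);              // skip past the start marker
    $to = strpos($haystack, $end, $from); // look for the end marker after it
    if ($to === false) {
        return null;
    }
    return substr($haystack, $from, $to - $from); // like Mid
}

$html = "<title>This weekend's stats for February 22-23, 2013 ---</title>";
$date = textBetween($html, 'stats for ', ' ---');
```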

 

Then the next hurdle is how to adjust that to do the same thing for the movie titles, where a preg_match seems the obvious choice. I need an array of upwards of 80 movies to select from. Each one looks like this (assuming I can paste this):

 

<td><font size="2"><a href="/movies/?id=safehaven.htm"><b>Safe Haven</b></a></font></td>

 

The text I'll need there is the URL and, separately, the textual movie title.
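For many rows at once, preg_match_all() with one capture group for the URL and one for the title fills an array in a single call, so no manual FOR/WHILE loop over the page is needed. This sketch assumes every row uses exactly the markup quoted above; the second sample row is made up for illustration, and any change in the site's markup would break the pattern.

```php
<?php
// Sample input: the row quoted above plus one hypothetical row.
$html = <<<'HTML'
<td><font size="2"><a href="/movies/?id=safehaven.htm"><b>Safe Haven</b></a></font></td>
<td><font size="2"><a href="/movies/?id=example.htm"><b>Example Movie</b></a></font></td>
HTML;

// Group 1 = the href value, group 2 = the text inside <b>...</b>.
$pattern = '~<td><font size="2"><a href="([^"]+)"><b>(.*?)</b></a></font></td>~i';

$movies = [];
// PREG_SET_ORDER gives one array per row: [full match, url, title].
if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        $movies[] = [
            'url'   => $match[1],
            // Turn entities such as &amp; back into plain characters.
            'title' => html_entity_decode($match[2], ENT_QUOTES),
        ];
    }
}
```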

 

Thanks again.

Edited by Agrajag

Jessica, 

 

I don't understand the "right" way at all based on the docs I looked at. I'd like to understand it the way that's more parallel to the old way I learned and then apply that to the new way. Come on... Your post offered nothing of value. How about saying, "I think a better approach would be to do it the more accepted way first, and that's done like this:"
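For comparison with the regex approach, here is a rough sketch of the DOM way for the movie rows, using DOMDocument with DOMXPath. It assumes each movie link sits inside a <td> and that its href starts with "/movies/", as in the sample row; the second row is made up for illustration.

```php
<?php
// Sample markup: the row quoted in this thread plus one hypothetical row.
$html = '<table>'
    . '<tr><td><font size="2"><a href="/movies/?id=safehaven.htm"><b>Safe Haven</b></a></font></td></tr>'
    . '<tr><td><font size="2"><a href="/movies/?id=example.htm"><b>Example Movie</b></a></font></td></tr>'
    . '</table>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // don't print warnings about sloppy real-world HTML
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$movies = [];
// Every <a> somewhere inside a <td> whose href begins with "/movies/" (assumed pattern).
foreach ($xpath->query('//td//a[starts-with(@href, "/movies/")]') as $link) {
    $movies[] = [
        'url'   => $link->getAttribute('href'),
        'title' => trim($link->textContent), // text of the <b> inside the link
    ];
}
```

Unlike the regex, this keeps working if the site adds attributes or reorders the <font>/<b> wrappers, since it only relies on the link being inside a table cell.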
