
Extracting Website Source Information


jamesbrauman


Hey all, brace yourselves, this is going to be a somewhat long post...

 

I'm not sure whether this is possible in PHP, but I thought I would at least try, as PHP is the only scripting language I am somewhat comfortable with, with the exception of HTML and CSS.

 

There is a website, http://www.ahajokes.com/, which contains an archive of funny jokes. Each of these jokes is stored on a different page. What I am attempting to do, through PHP, is navigate to each of those pages, save the joke to my database, and move on to the next.
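
For the "navigate to each page" part, a minimal sketch might look like the following. It assumes `allow_url_fopen` is enabled so that `file_get_contents()` can read `http://` URLs; the function name `fetchPageSource` is made up for illustration.

```php
<?php
/**
 * Fetches the raw source of a page.
 * Returns null if the page could not be read.
 * (Hypothetical helper name; assumes allow_url_fopen is on for http:// URLs.)
 */
function fetchPageSource(string $url): ?string
{
    // The @ hides the warning PHP raises on failure; we check the result instead.
    $source = @file_get_contents($url);

    return $source === false ? null : $source;
}
```

Calling `fetchPageSource('http://www.ahajokes.com/')` would then give back the index page as one string, or `null` if the request failed.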

 

There will be quite a few steps involved, I believe. First of all, we need to open the source of the index page of http://www.ahajokes.com/. The source for this page is here:

 

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<title>Aha! Jokes: Clean Humor and Funny Pictures!</title>
<meta name="keywords" content="jokes,funny,pictures,humor,cartoons,clean,laugh,comedy,hilarious,free">
<meta name="description" content="Jokes, funny pictures, free cartoons, humor, fun pages, and more!">
</head>
<body text="#000000" link="#0000FF" vlink="#0000FF" bgcolor="#FFFFFF">
<!-- Jokes Clean Humor Cartoons Funny Pictures -->
<table align=center bgcolor="#000000" border=0 cellpadding=5 cellspacing=4 width=600>
<tr bgcolor="#FFFFAA">
<td width=123 valign=top><br><br><br><br><img src="g/brief.gif" alt="Humor"></td>
<td bgcolor="#DDDDDD"><center><font face="times new roman,helvetica"><b><font size="+4" color="#000099" face="comic sans ms,times new roman">AhaJokes.com</font></b><br><img src="g/black.gif" height=2 width=440 vspace=7></center>Thousands of clean jokes, funny pictures, cartoons, funny audio, funny videos, and more. Search for humor by keyword, by topic, or even by date!  Plus, sign up to get humor in your e-mail three times a week!</td></tr>
</table>
<br>
<table align=center bgcolor="#006699" border=0 cellpadding=0 cellspacing=0 width=600>
<tr>
<td><a href="http://www.ahajokes.com/"><img src="g/smlogo.gif" height=64 width=132 alt="Jokes" border=0></a></td>
<td bgcolor="#FFFFFF" valign=bottom>
<CENTER>
<a href="http://www.needmypassword.com">
<object width="468" height="60">
<param name="movie" value="online-password-storage.swf">
<embed src="online-password-storage.swf" width="468" height="60">
</embed>
</object></a>
Never Forget Your Passwords Again!  <a href="http://www.needmypassword.com">Register for Free</a>
</CENTER></td>
</tr>
<tr><td align=center bgcolor="#000000"> </td><td align=center bgcolor="#000000"> <font color="#FFFFFF" face="arial,helvetica" size=2><b>The Leader in Clean Humor and Uncontrollable Laughter!</b></font></td></tr>
<tr><td valign=top width=132><font face="times new roman,helvetica">
<br>
<table align=center width="120" bgcolor="#FFFF00" border="0" cellpadding="0" cellspacing="0">
<tr><td align=center colspan=3 bgcolor="#000000"><font face="arial,helvetica" size="3" color="#FFFFFF"><b>Mailing List</b></font></td></tr>
<tr><td width="1" bgcolor="#000000"><img src="g/clr.gif" width="1" height="1" alt=""></td>
<td valign="top" align=center width=118><font face="arial" size="2"><b>Enter your E-MAIL address BELOW to GET FREE daily HUMOR by e-mail!</b></font><form method=post action="http://www.ahajokes.com/mlm/mlm.cgi"><input type="text" name="address" size="12"><input type=hidden name=action value="Subscribe"><br><input type="submit" value="Subscribe!"><br><br></td></form>
<td width="1" bgcolor="#000000"><img src="g/clr.gif" width="1" height="1" alt=""></td></tr>
<tr><td colspan=3 bgcolor="#000000"><img src="g/clr.gif" width="1" height="1" alt=""></td></tr>
</table><br>
<table align=center width="120" bgcolor="#BBCCFF" border="0" cellpadding="0" cellspacing="0">
<tr><td align=center colspan=3 bgcolor="#000000"><font face="arial" size="2" color="#FFFFFF"><b>Laugh Links</b></font></td></tr>
<tr><td width="1" bgcolor="#000000"><img src="g/clr.gif" width="1" height="1" alt=""></td>
<td valign="top" width=118>
<font face="arial,helvetica" size="-1">
<b>
- <a href="http://www.ahajokes.com/funny_cartoons.html" style="text-decoration:none">Funny Cartoons</a><br>
- <a href="http://www.ahajokes.com/random_jokes.html" style="text-decoration:none">Random Jokes</a><br>
- <a href="http://www.ahajokes.com/fun_pages.html" style="text-decoration:none">Fun Pages</a><br>
- <a href="http://www.ahajokes.com/funny_videos.html" style="text-decoration:none">Funny Videos</a><br>
- <a href="http://www.ahajokes.com/funny_audio.html" style="text-decoration:none">Funny Audio</a><br>
- <a href="http://www.ahajokes.com/fun_downloads.html" style="text-decoration:none">Fun Downloads</a><br>
- <a href="http://www.ahajokes.com/fun_games.html" style="text-decoration:none">Fun Games</a><br>
- <a href="http://www.ahajokes.com/funny_links.html" style="text-decoration:none">Funny Links</a><br>
> Featured Today<br>
- <a href="whatnew.html" style="text-decoration:none">What's new?</a><br>
- <a href="http://www.ahajokes.com/joke_of_the_day.shtml" style="text-decoration:none">Joke of the Day</a><br>
- <a href="http://www.ahajokes.com/funny_pic_of_the_day.shtml" style="text-decoration:none">Funny Pic of Day</a><br>
> Other Options<br>
- <a href="contact.html" style="text-decoration:none">Contact us</a><br>
- <a href="link.html" style="text-decoration:none">Link to us</a><br>
- <a href="http://www.ahajokes.com/submit_a_joke.html" style="text-decoration:none">Submit a Joke</a><br>
</b>
</font>
</td>
<td width="1" bgcolor="#000000"><img src="g/clr.gif" width="1" height="1" alt=""></td></tr>
<tr><td colspan=3 bgcolor="#000000"><img src="g/clr.gif" width="1" height="1" alt=""></td></tr>
</table><br>
<table align=center width="120" bgcolor="#BBCCFF" border="0" cellpadding="0" cellspacing="0">
<tr><td align=center colspan=3 bgcolor="#000000"><font face="arial" size="2" color="#FFFFFF"><b>Come back soon!</b></font></td></tr>
<tr><td width="1" bgcolor="#000000"><img src="g/clr.gif" width="1" height="1" alt=""></td><td valign="top" align=center width=118><font face="arial" size="2"><b>
<SCRIPT LANGUAGE="JavaScript">
if ((navigator.appName == "Microsoft Internet Explorer") && (parseInt(navigator.appVersion) >= 4)) {
var url="http://www.ahajokes.com";
var title="Aha! Jokes";
document.write('<A HREF="javascript:window.ext');
document.write('ernal.AddFavorite(url,title);" ');
document.write('onMouseOver=" window.status=');
document.write("'Add our site to your favorites!'; return true ");
document.write('"onMouseOut=" window.status=');
document.write("' '; return true ");
document.write('">Add our site to your favorites!</a>');
}
else {
var msg = "Don't forget to bookmark this ";
if(navigator.appName == "Netscape") msg += "site! (CTRL+D)";
document.write(msg); }
</script>
<noscript>Add our site to your favorites!</noscript><br>
<img src="g/black.gif" width=105 height=1 vspace=1>
<a href="startpage.html">Make this site<br>your start page!</a><br><br>
</b></font></td><td width="1" bgcolor="#000000"><img src="g/clr.gif" width="1" height="1" alt=""></td></tr>
<tr><td colspan=3 bgcolor="#000000"><img src="g/clr.gif" width="1" height="1" alt=""></td></tr>
</table><br>
</font></td>
<td height=400 width=468 bgcolor="#D3D3D3" valign=top>
<table align=center border=0 cellpadding=0 cellspacing=0 width=450><tr><td><font face="times new roman,helvetica">
<br>
<table align=center border=0 cellspacing=1 cellpadding=2 width="98%" bgcolor="#000000">
<tr>
<td bgcolor="#990099"><font color="#FFFFFF"><b>Latest Updates and Announcements!</b></font></td>
</tr>
</table>
<br>
<ul>
<li>5:01 PM 6/19/2005: <a href="holiday_jokes.html">Holiday Humor</a>: <a href="fathers_day_jokes.html">Father's Day Jokes</a>
<li>2:48 PM 3/3/2005: <a href="funny_cartoons.html">Funny Cartoons</a>: <a href="crt470.html">In Case of Emergency</a>
<li>2:43 PM 3/3/2005: <a href="funny_cartoons.html">Funny Cartoons</a>: <a href="crt102.html">The Amazing Bill</a>
<li>2:40 PM 3/3/2005: <a href="funny_cartoons.html">Funny Cartoons</a>: <a href="crt599.html">Always Stay in Stock</a>
<li>6:26 PM 3/2/2005: <a href="funny_cartoons.html">Funny Cartoons</a>: <a href="crt254.html">Throw More Than Snow</a>
<li>6:25 PM 3/2/2005: <a href="funny_cartoons.html">Funny Cartoons</a>: <a href="crt371.html">Your Tax Dollars</a>
<li>5:38 PM 3/2/2005: <a href="funny_cartoons.html">Funny Cartoons</a>: <a href="crt249.html">Alien Brooklyn Landing</a>
<li><b><a href="whatnew.html">See recent updates, changes, and new announcements</a>...</b>
</ul>
</font>
<table align=center border=0 cellspacing=1 cellpadding=2 width="98%" bgcolor="#000000">
<tr>
<td bgcolor="#CC0000"><font color="#FFFFFF"><b>Search for Jokes by Category</b></font></td>
</tr>
</table>
<br>
[b]<table align=center border=0 width=444>
<tr>
<td width=148 valign=top><b><a href="animal_jokes.html">Animals</a></b></td>
<td width=148 valign=top><b><a href="funny_answering_machine_messages.html">Answer Machine</a></b></td>
<td width=148 valign=top rowspan=6><img src="g/small.gif" alt="Jokes by Category" width=114 height=137></td>
</tr>
<tr>
<td width=148 valign=top><b><a href="aviation_jokes.html">Aviation</a></b></td>
<td width=148 valign=top><b><a href="bar_jokes.html">Bar Jokes</a></b></td>
</tr>
<tr>
<td width=148 valign=top><b><a href="blind_jokes.html">Blind Jokes</a></b></td>
<td width=148 valign=top><b><a href="blonde_jokes.html">Blonde Jokes</a></td>
</tr>
<tr>
<td width=148 valign=top><b><a href="business_jokes.html">Business</a></b></td>
<td width=148 valign=top><b><a href="computer_jokes.html">Computers</a></b></td>
</tr>
<tr>
<td width=148 valign=top><b><a href="crazy_jokes.html">Crazy Jokes</a></b></td>
<td width=148 valign=top><b><a href="dumb_laws.html">Dumb Laws</a></b></td>
</tr>
<tr>
<td width=148 valign=top><b><a href="english_jokes.html">English</a></b></td>
<td width=148 valign=top><b><a href="funny_bumper_stickers.html">Car Bumpers</a></b></td>
</tr>
<tr>
<td width=148 valign=top><b><a href="ethnic_jokes.html">Ethnic Jokes</a></b></td>
<td width=148 valign=top><b><a href="farmer_jokes.html">Farmers</a></b></td>
<td width=148 valign=top><b><a href="fishing_jokes.html">Fishing</a></b></td>
</tr>
<tr>
<td width=148 valign=top><b><a href="food_jokes.html">Food Jokes</a></b></td>
<td width=148 valign=top><b><a href="funny_ads.html">Funny Ads</a></b></td>
<td width=148 valign=top><b><a href="funny_guides.html">Funny Guides</a></b></td>
</tr>
<tr>
<td width=148 valign=top><b><a href="funny_insults.html">Insults</a></b></td>
<td width=148 valign=top><b><a href="funny_puns.html">Funny Puns</a></b></td>
<td width=148 valign=top><b><a href="funny_tests.html">Funny Tests</a></b></td>
</tr>
<tr>
<td width=148 valign=top><b><a href="funny_true_stories.html">True Stories</a></b></td>
<td width=148 valign=top><b><a href="funny_thoughts.html">Thoughts</a></b></td>
<td width=148 valign=top><b><a href="gender_jokes.html">Gender</a></b></td>
</tr>
<tr>
<td width=148 valign=top><b><a href="golf_jokes.html">Golf Jokes</a></b></td>
<td width=148 valign=top><b><a href="heaven_jokes.html">Heaven, Hell</a></b></td>
<td width=148 valign=top><b><a href="holiday_jokes.html">Holidays</a></b></td>
</tr>
<tr>
<td width=148 valign=top><b><a href="idiot_jokes.html">Idiots</a></b></td>
<td width=148 valign=top><b><a href="indian_jokes.html">Indians</a></b></td>
<td width=148 valign=top><b><a href="kids_jokes.html">Kids Jokes</a></b></td>
</tr>
<tr>
<td width=148 valign=top><b><a href="military_jokes.html">Military</a></b></td>
<td width=148 valign=top><b><a href="lawyer_jokes.html">Lawyers</a></b></td>
<td width=148 valign=top><b><a href="light_bulb_jokes.html">Light Bulbs</a></b></td>
</tr>
<tr>
<td width=148 valign=top><b><a href="math_jokes.html">Math Jokes</a></b></td>
<td width=148 valign=top><b><a href="marriage_jokes.html">Marriage</a></b></td>
<td width=148 valign=top><b><a href="medical_jokes.html">Medical</a></b></td>
</tr>
<tr>
<td width=148 valign=top><b><a href="music_jokes.html">Music</a></b></td>
<td width=148 valign=top><b><a href="office_jokes.html">Office Jokes</a></b></td>
<td width=148 valign=top><b><a href="old_age_jokes.html">Old Age</a></b></td>
</tr>
<tr>
<td width=148 valign=top><b><a href="funny_one_liners.html">One Liners</a></b></td>
<td width=148 valign=top><b><a href="parent_jokes.html">Parent</a></b></td>
<td width=148 valign=top><b><a href="police_jokes.html">Police Jokes</a></b></td>
</tr>
<tr>
<td width=148 valign=top><b><a href="political_jokes.html">Political</a></b></td>
<td width=148 valign=top><b><a href="redneck_jokes.html">Redneck</a></b></td>
<td width=148 valign=top><b><a href="religious_jokes.html">Religious</a></b></td>
</tr>
<tr>
<td width=148 valign=top><b><a href="school_jokes.html">School</a></b></td>
<td width=148 valign=top><b><a href="science_jokes.html">Science</a></b></td>
<td width=148 valign=top><b><a href="shopping_jokes.html">Shopping</a></b></td>
</tr>
<tr>
<td width=148 valign=top><b><a href="sick_jokes.html">Sick Jokes</a></b></td>
<td width=148 valign=top><b><a href="sports_jokes.html">Sports</a></b></td>
<td width=148 valign=top><b><a href="state_jokes.html">State Jokes</a></b></td>
</tr>
<tr>
<td width=148 valign=top><b><a href="top_ten_lists.html">Top Lists</a></b></td>
<td width=148 valign=top><b><a href="travel_jokes.html">Travel Jokes</a></b></td>
<td width=148 valign=top><b><a href="yo_mama_jokes.html">Yo Mama</a></b></td>
</tr>
</table>[/b]
<br>
<table align=center border=0 cellspacing=1 cellpadding=2 width="98%" bgcolor="#000000">
<tr>
<td bgcolor="#009999"><font color="#FFFFFF"><b>Funny Features for Today!</b></font></td>
</tr>
</table>
<br>
<table border=0 width="100%" cellspacing=0 cellpadding=0>
<tr>
<td><img src="g/small3.gif" width=129 height=108 alt="Funny features for today" hspace=4></td>
<td>
<b>Funny Joke:</b> View the <a href="joke_of_the_day.shtml">Joke of the Day</a><br>
<b>Cartoon:</b> <a href="funny_picture_of_the_day.shtml">Funny Picture of the Day</a><br>
<b>Fun download:</b> <a href="fun_download_of_the_day.shtml">Fun Download of the Day</a><br>
<b>Funny audio:</b> <a href="audio_file_of_the_day.shtml">Funny Audio of the Day</a><br>
<b>Funny video:</b> <a href="funny_video_of_the_day.shtml">Funny Video of the Day</a><br>
</td>
</tr></table>
<br>
<table align=center border=0 cellspacing=1 cellpadding=2 width="98%" bgcolor="#000000">
<tr>
<td bgcolor="#000099"><font color="#FFFFFF"><b>Want Other Types of Humor?</b></font></td>
</tr>
</table>
<br>
<table align=center border=0 width=440>
<tr>
<td width=148 valign=top><b><a href="funny_videos.html">Funny Videos</a></b></td>
<td width=148 valign=top><b><a href="fun_downloads.html">Fun Downloads</a></b></td>
<td width=148 valign=top><b><a href="funny_audio.html">Funny Audio</a></b></td>
</tr>
</table>
<br>
<table align=center border=0 cellspacing=1 cellpadding=2 width="98%" bgcolor="#000000">
<tr>
<td bgcolor="#007700"><font color="#FFFFFF"><b>Search for Funny Pictures by Category</b></font></td>
</tr>
</table>
<br>
<table align=center border=0 width="100%">
<tr>
<td>
<li><a href="funny_animal_pictures.html">Funny Animal Pictures</a><BR>
<li><a href="funny_business_pictures.html">Business, Money, and Work</a><BR>
<li><a href="funny_computer_cartoons.html">Funny Computer Cartoons</a><BR>
<li><a href="funny_food_pictures.html">Food, Eating, and Dining</a><BR>
<li><a href="funpics05.html">Parents, Babies, and Kids</a><BR>
<li><a href="funpics06.html">Battle of the Sexes, Gender</a><BR>
<li><a href="funpics07.html">Miscellaneous Funny Pictures</a><BR>
<li><a href="funpics08.html">General Public and People</a><BR>
<li><a href="funpics09.html">Politics and the Government</a><BR>
<li><a href="funpics10.html">Holidays, Seasonal, Travel</a><BR>
<li><a href="funpics11.html">Sports, Fitness, Football</a><BR>
<li><a href="funpics12.html">Travel, Vehicles, and Autos</a><BR>
<li><a href="funny_faces.html">Funny Faces</a>, Strange, Morph<BR>
<li>View all of the <a href="funny_cartoons.html">funny cartoons</a>!<BR>
</td>
<td>
<center><img src="g/small9.gif" width=156 height=186 alt="Funny Pictures" vspace=5 hspace=10></center>
</td>
</tr>
</table>
<br>
<table align=center border=0 cellspacing=1 cellpadding=2 width="98%" bgcolor="#000000">
<tr>
<td bgcolor="#000066"><font color="#FFFFFF"><b>Search for Jokes by Keyword</b></font></td>
</tr>
</table>
<br>
</font>
<CENTER><TABLE BORDER="0" CELLPADDING="1" CELLSPACING="0" BGCOLOR="#aaaaaa"><TR><FORM METHOD="POST" ACTION="search/search.pl"><TD><TABLE BORDER="0" CELLPADDING="12" CELLSPACING="0" BGCOLOR="#ffffcc">
<TR><TD ALIGN="center" VALIGN="middle">Match <SELECT NAME="Match"><OPTION VALUE=1>All Terms<OPTION VALUE=0 SELECTED>Any Term</SELECT> in Search Index: <SELECT NAME="Realm"><OPTION VALUE="All">[ All ]</SELECT><BR><INPUT NAME="Terms" SIZE="42" VALUE="" STYLE="font-family: Courier New; font-size:10pt;"><INPUT TYPE="submit" VALUE=" Search " WIDTH="120"></TD>
</TR></TABLE></TD></FORM></TR></TABLE><BR></CENTER>
<table align=center border=0 cellspacing=1 cellpadding=2 width="98%" bgcolor="#000000">
<tr>
<td bgcolor="#770077"><font color="#FFFFFF"><b>More Fun and Humor!</b></font></td>
</tr>
</table>
<br>
<table align=center border=0 width=440>
<tr>
<td width=146 valign=top><b><a href="link.html">Link to Aha!</a></b></td>
<td width=146 valign=top rowspan=5><br><img src="g/small2.gif" width=101 height=92 alt="More fun and humor" vspace=5></td>
<td width=148 valign=top><b><a href="weblib.html">Online Weblibs</a></b></td>
</tr>
<tr>
<td width=146 valign=top><b><a href="random_jokes.html">Random Humor</a></b></td>
<td width=146 valign=top><b><a href="survey.html">Survey Aha!</a></b></td>
</tr>
<tr>
<td width=148 valign=top><b><a href="funny_links.html">Funny Links</a> <sup><font size="-1" color="#FF0000" face="verdana">New!</font></sup></b></td>
<td valign=top><b><a href="archive.html">Laughing Gas</a></b></td>
</tr>
<tr>
<td valign=top><b><a href="shop.html">Shop for comedy</a></b></td>
<td width=146 valign=top><b><a href="tell.html">Tell a Friend!</a></b></td>
</tr>
<tr>
<td width=146 valign=top><b><a href="about.html">About Aha!</a></b></td>
<td valign=top><b><a href="fun_pages.html">Fun Pages</a> <sup><font size="-1" color="#FF0000" face="verdana">New!</font></sup></b></td>
</tr>
</table>
<br>
<table align=center border=0 cellspacing=1 cellpadding=2 width="98%" bgcolor="#000000">
<tr>
<td bgcolor="#999900"><font color="#FFFFFF"><b>Voice Your Opinion!</b></font>
</td>
</tr>
</table>
<br>
<a name="#poll"></a>
<table align=center border=0 width=440>
<tr><td colspan=3><b>What would you rather give up in life?</b></td></tr>
<tr>
<form action=http://www.ahajokes.com/vote/vote.pl method=get>
<input type="hidden" name="name" value="first">
<td width=146 valign=top><input type="radio" name="a" value="1">Computers<br></b></td>
<td width=146 valign=top><input type="radio" name="a" value="2">Showers<br></b></td>
<td width=148 valign=top><input type="radio" name="a" value="3">Television<br></b></td>
</tr>
<tr>
<td width=146 valign=top><input type="radio" name="a" value="4">Sports<br></b></td>
<td width=146 valign=top><input type="radio" name="a" value="5">Telephone<br></b></td>
<td width=148 valign=top><input type="radio" name="a" value="6">Driving<br></b></td>
</tr>
<tr><td colspan=3><center><input type=submit value="Cast your vote!"> <font size="-1"><a href=http://www.ahajokes.com/vote/vote.pl?name=first&action=view>View Results</a></font></center></td>
</form>
</tr>
</table>
<br>
<table align=center border=0 cellspacing=1 cellpadding=2 width="98%" bgcolor="#000000">
<tr>
<td bgcolor="#009999"><font color="#FFFFFF"><b>Other Humor Sites...</b></font>  </td>
</tr>
</table>
<br>
<table border=0 width="98%">
<tr>
<td><li><a href="http://www.asianjoke.com/" target="_blank">Asian Joke</a></td>
<td><li><a href="http://www.getfunnypictures.com/" target="_blank">Funny Pictures</a></td>
<td><li><a href="http://www.wowfunny.com/" target="_blank">Wow Funny</a></td>
</tr>
<tr>
<td><li><a href="http://www.veryfunnyvideos.com/" target="_blank">Funny Videos</a></td>
<td><li><a href="http://www.randomlaughs.com/" target="_blank">Random Jokes</a></td>
<td><li><a href="http://www.jokeswarehouse.com/" target="_blank">Jokes Warehouse</a></td>
</tr>
<tr>
<td><li><a href="http://www.funny-spot.com/" target="_blank">Funny Spot</a></td>
<td><li><a href="http://www.lifeisajoke.com/" target="_blank">Life is a Joke</a></td>
<td><li><a href="http://www.spicyjoke.com/" target="_blank">Spicy Joke</a></td>
</tr>
</table><br>
</td></tr></table></td></tr>
<tr><td bgcolor="#000000" colspan=2><img src="g/clr.gif" height=1 width=600 alt=""></td></tr>
<tr><td valign=bottom align=left><img src="g/bleft.gif" height=10 width=10 alt=""></td><td valign=bottom align=right><img src="g/bright.gif" height=10 width=10 alt=""></td></tr>
<tr><td align=center bgcolor="#FFFFFF" colspan=2><small><font face="times new roman,helvetica">Copyright © 2005. Reproduction of this site in part or whole is strictly prohibited. Use subject to <a href="terms.html">terms</a>. [ <a href="corporate.html">Corporate Center<a> | <a href="tell.html">Tell a Friend About Our Site</a> | <a href="contact.html">Contact Us</a> ]</FONT></SMALL></td></tr>
</table>
</body></html>

The part I have highlighted in bold is what I am interested in, as it contains the hyperlinks to the list of jokes for each category. I need to obtain the href of each of those links in the table and save them for later use, preferably in an array.
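
One possible way to pull those hrefs into an array is PHP's DOM extension, sketched below. This is only a sketch under assumptions: it identifies the category table by its `width=444` attribute, which is specific to the source pasted above and would break if the site changes its layout, and the function name `extractCategoryLinks` is made up for illustration.

```php
<?php
/**
 * Returns the absolute URL of every link inside the category table.
 * Assumption: the category table is the only table with width="444",
 * as in the index page source quoted above.
 */
function extractCategoryLinks(string $html, string $baseUrl): array
{
    $doc = new DOMDocument();

    // The page is old, untidy HTML, so keep libxml's parse warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = [];

    foreach ($xpath->query('//table[@width="444"]//a[@href]') as $anchor) {
        $href = $anchor->getAttribute('href');

        // Links in that table are relative (e.g. "animal_jokes.html"),
        // so prefix them with the site address.
        if (!preg_match('#^https?://#i', $href)) {
            $href = rtrim($baseUrl, '/') . '/' . ltrim($href, '/');
        }

        $links[] = $href;
    }

    return $links;
}
```

Given the index source as a string in `$html`, `extractCategoryLinks($html, 'http://www.ahajokes.com/')` should return an array starting with `http://www.ahajokes.com/animal_jokes.html`. A regular expression could do the same job, but a DOM parser tends to cope better with markup this loose.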

 

The next step in this somewhat complex idea is to visit each of those pages, obtain the link to each joke on that page, and save it in an array for later use. I will be using a while loop for this (while we are not at the end of the links, save each link). The source of each of those pages (I just picked the first one) looks like this:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<title>Animal Jokes, Pet Jokes, Creature Jokes, and More!</title>
<meta name="description" content="Animal Jokes, pet jokes, and creature jokes about wildlife, dogs, cats, and much more!">
<meta name="keywords" content="Animal Jokes,pets,creatures,dogs,cats,mice,tigers,chickens,frogs,birds">
</head>
<body text="#000000" link="#0000FF" vlink="#0000FF" bgcolor="#FFFFFF">
<!-- Animal Jokes -->
<table border=0 bgcolor="#000000" align=center width=600><tr><td>
<table align=center bgcolor="#000000" border=0 cellpadding=2 cellspacing=3 width=600>
<tr bgcolor="#FFFFAA">
<td width=123 valign=top><br><br><br><br><img src="g/brief2.gif" width=117 height=64 alt="Animal Jokes"></td>
<td bgcolor="#D3D3D3"><center><font face="times new roman,helvetica"><h1>Animal Jokes<hr size=1></h1></center>Animal Jokes and humor about pets, creatures, dogs, cats, mice, frogs, tigers, wildlife, and much more! For more animal jokes, also check out the <a href="http://www.ahajokes.com/animal_jokes_for_kids.html">animal jokes for kids</a> section of Aha! Jokes!</td></tr>
</table>
</td></tr></table>
<br>
<table align=center bgcolor="#006699" border=0 cellpadding=0 cellspacing=0 width=600>
<tr>
<td><a href="http://www.ahajokes.com/"><img src="g/smlogo.gif" height=64 width=132 alt="Jokes" border=0></a></td>
<td bgcolor="#FFFFFF" valign=bottom>
<CENTER>
<a href="http://www.needmypassword.com">
<object width="468" height="60">
<param name="movie" value="online-password-storage.swf">
<embed src="online-password-storage.swf" width="468" height="60">
</embed>
</object></a>
Never Forget Your Passwords Again!  <a href="http://www.needmypassword.com">Register for Free</a>
</CENTER></td>

</tr>
<tr><td bgcolor="#000000" colspan=2> <font color="#FFFFFF" face="arial,helvetica" size=2><b>Location: <a href="http://www.ahajokes.com/clean_jokes.html"><font color="#FFFFFF">Clean Jokes</font></a> > Animal Jokes</b></font></td>
<tr><td valign=top width=132><font face="times new roman,helvetica">
<br>
<table align=center width="120" bgcolor="#FFFF00" border="0" cellpadding="0" cellspacing="0">
<tr><td align=center colspan=3 bgcolor="#000000"><font face="arial,helvetica" size="3" color="#FFFFFF"><b>Mailing List</b></font></td></tr>
<tr><td width="1" bgcolor="#000000"><img src="g/clr.gif" width="1" height="1" alt=""></td>
<td valign="top" align=center width=118><font face="arial" size="2"><b>Enter your E-MAIL address BELOW for FREE daily JOKES by E-MAIL!</b></font><form method=post action="http://www.ahajokes.com/mlm/mlm.cgi"><input type="text" name="address" size="12"><input type=hidden name=action value="Subscribe"><br><input type="submit" value="Subscribe!"><br><br></td></form>
<td width="1" bgcolor="#000000"><img src="g/clr.gif" width="1" height="1" alt=""></td></tr>
<tr><td colspan=3 bgcolor="#000000"><img src="g/clr.gif" width="1" height="1" alt=""></td></tr>
</table><br>
<table align=center width="120" bgcolor="#BBCCFF" border="0" cellpadding="0" cellspacing="0">
<tr><td align=center colspan=3 bgcolor="#000000"><font face="arial" size="2" color="#FFFFFF"><b>Laugh Links</b></font></td></tr>
<tr><td width="1" bgcolor="#000000"><img src="g/clr.gif" width="1" height="1" alt=""></td>
<td valign="top" width=118>
<font face="arial,helvetica" size="-1">
<b>
- <a href="http://www.ahajokes.com/funny_jokes.html" style="text-decoration:none">Funny Jokes</a><br>
- <a href="http://www.ahajokes.com/funny_cartoons.html" style="text-decoration:none">Funny Cartoons</a><br>
- <a href="http://www.ahajokes.com/random_jokes.html" style="text-decoration:none">Random Jokes</a><br>
- <a href="http://www.ahajokes.com/fun_pages.html" style="text-decoration:none">Fun Pages</a><br>
- <a href="http://www.ahajokes.com/funny_videos.html" style="text-decoration:none">Funny Videos</a><br>
- <a href="http://www.ahajokes.com/funny_audio.html" style="text-decoration:none">Funny Audio</a><br>
- <a href="http://www.ahajokes.com/fun_downloads.html" style="text-decoration:none">Fun Downloads</a><br>
- <a href="http://www.ahajokes.com/funny_links.html" style="text-decoration:none">Funny Links</a><br>
> Featured Today<br>
- <a href="whatnew.html" style="text-decoration:none">What's new?</a><br>
- <a href="http://www.ahajokes.com/joke_of_the_day.shtml" style="text-decoration:none">Joke of the Day</a><br>
- <a href="http://www.ahajokes.com/funny_pic_of_the_day.shtml" style="text-decoration:none">Funny Pic of Day</a><br>
> Other Options<br>
- <a href="contact.html" style="text-decoration:none">Contact us</a><br>
- <a href="link.html" style="text-decoration:none">Link to us</a><br>
- <a href="http://www.ahajokes.com/submit_a_joke.html" style="text-decoration:none">Submit a Joke</a><br>
</b>
</font>
</td>
<td width="1" bgcolor="#000000"><img src="g/clr.gif" width="1" height="1" alt=""></td></tr>
<tr><td colspan=3 bgcolor="#000000"><img src="g/clr.gif" width="1" height="1" alt=""></td></tr>
</table><br>
</font></td>
<td height=400 width=468 bgcolor="#D3D3D3" valign=top>
<table align=center border=0 cellpadding=0 cellspacing=0 width=450><tr><td><br><font face="times new roman,helvetica">

<b><font face="arial,helvetica" size="-1"><center>Joke Links: [ <a href="http://www.ahajokes.com/funny_ads.html">Funny Ads</a> | <a href="http://www.ahajokes.com/">Jokes</a> | <a href="#funny_animal_pictures">Funny Animal Pictures</a> ]</center></font></b><br>
<center><b><font color="#990099">Each and every day, click to read the <a href="animal_joke_of_the_day.shtml">animal joke of the day</a>!</font></b></center><br>
<table align=center border=0 cellspacing=1 cellpadding=3 width="98%" BGCOLOR="#000000">
<tr>
<td bgcolor="#990000"><font color="#FFFFFF"><b>Here are Animal Jokes ... Click on the joke's link to read it ...</b></font></td>
</tr>
</table>
<br>
[b]<table border=0 align=center width="99%">
<tr><td valign=top>
<li><a href="ani001.html">Question and Answer</a>
<li><a href="ani002.html">Buying a New Bird</a>
<li><a href="ani003.html">Boasting about Races</a>
<li><a href="ani004.html">Frog Calls a Psychic</a>
<li><a href="ani005.html">Chickens with Books</a>
<li><a href="ani006.html">Tiger Walking Guide</a>
<li><a href="ani007.html">Snail with a Fast Car</a>
<li><a href="ani008.html">Very Insulting Parrot</a>
<li><a href="ani009.html">Pet's Guide to Life</a>
<li><a href="ani010.html">Dog Who Plays Chess</a>
<li><a href="ani011.html">Three Tough Mice</a>
<li><a href="ani012.html">Steven Wright Dogs</a>
<li><a href="ani013.html">The Mad Cow Disease</a>
<li><a href="ani014.html">Where Bats Get Blood</a>
<li><a href="ani015.html">Don't Talk to Parrots</a>
<li><a href="ani016.html">The Seeing Eye Dog</a>
<li><a href="ani017.html">Feeding Pigs Faster</a>
<li><a href="ani018.html">A Horrible Dog Fight</a>
<li><a href="ani019.html">A Very Smart Dog</a>
<li><a href="ani020.html">Animal Football Game</a>
<li><a href="ani021.html">The Cat's Dictionary</a>
<li><a href="ani022.html">Two Fools Flying</a>
<li><a href="ani023.html">A Dog Who Can Fly</a>
<li><a href="ani024.html">The Plumber is Here</a>
<li><a href="ani025.html">Cat's Chalkboard</a>
<li><a href="ani026.html">A Preacher's Parrot</a>
<li><a href="ani027.html">A Burglar's Problems</a>
<li><a href="ani028.html">A Smart Talking Dog</a>
<li><a href="ani029.html">Handling Baby Bears</a>
<li><a href="ani030.html">A Human's Chalkboard</a>
<li><a href="ani031.html">Instrument Flying Guide</a>
<li><a href="ani032.html">Two Roaches Talking</a>
<li><a href="ani033.html">Incredibly Smart Dog</a>
</td>
<td valign=top>
<img src="g/parrot.gif" width=124 height=131 alt="Animal Jokes" vspace=16>
<li><a href="ani034.html">The Dog's Chalkboard</a>
<li><a href="ani035.html">I Think I'm a Chicken</a>
<li><a href="ani036.html">Bad Dog for the Blind</a>
<li><a href="ani037.html">Cat Technical Support</a>
<li><a href="ani038.html">Terrible Car Accident</a>
<li><a href="ani039.html">Question and Answer</a>
<li><a href="ani040.html">Question and Answer</a>
<li><a href="ani041.html">Question and Answer</a>
<li><a href="ani042.html">Question and Answer</a>
<li><a href="ani043.html">Question and Answer</a>
<li><a href="ani044.html">Cat on the Internet</a>
<li><a href="ani045.html">Dogs Don't Understand</a>
<li><a href="ani046.html">Buy Alligator Shoes</a>
<li><a href="ani047.html">Cow on Train Tracks</a>
<li><a href="ani048.html">Two Angry Neighbors</a>
<li><a href="ani049.html">I'm at the Worst Zoo</a>
<li><a href="ani050.html">Sounds of the Wild</a> <sup><font color="#FF0000" size="-1">New!</font></sup>
<li><a href="ani051.html">Giving Cats Pills</a> <sup><font color="#FF0000" size="-1">New!</font></sup>
<li><a href="ani052.html">Feline Physics Laws</a> <sup><font color="#FF0000" size="-1">New!</font></sup>
<li><a href="ani053.html">The Feline Diet</a> <sup><font color="#FF0000" size="-1">New!</font></sup>
<li><a href="ani054.html">Dogs and Light Bulbs</a> <sup><font color="#FF0000" size="-1">New!</font></sup>
<li><a href="ani055.html">Dog Property Rules</a> <sup><font color="#FF0000" size="-1">New!</font></sup>
<li><a href="ani056.html">Horses at the Race</a> <sup><font color="#FF0000" size="-1">New!</font></sup>
<li><a href="ani057.html">Installing a Carpet</a> <sup><font color="#FF0000" size="-1">New!</font></sup>
</td>
</tr>
</table>[/b]
<br>
<table align=center border=0 cellspacing=1 cellpadding=3 width="98%" bgcolor="#000000">
<tr>
<td bgcolor="#007700"><font color="#FFFFFF"><b>Search for Jokes by Keyword</b></font></td>
</tr>
</table>
<br>
<CENTER><TABLE BORDER="0" CELLPADDING="3" CELLSPACING="1" BGCOLOR="#000000" width="98%"><TR><FORM METHOD="POST" ACTION="search/search.pl"><TD BGCOLOR="#FFFFAA" height=70><CENTER>
<b><font face="verdana,arial" size="-1">I Want to Match <SELECT NAME="Match"><OPTION VALUE=1>All Terms<OPTION VALUE=0 SELECTED>Any Term</SELECT> in Search Index:</font> <SELECT NAME="Realm"><OPTION VALUE="All">All</SELECT><BR><INPUT NAME="Terms" SIZE="40" VALUE="" STYLE="font-family: Courier New; font-size:10pt;"> <INPUT TYPE="submit" VALUE=" Search " WIDTH="120">
</CENTER></TD></FORM></TR></TABLE><a name="#funny_animal_pictures"><BR></CENTER>
<table align=center border=0 cellspacing=1 cellpadding=3 width="98%" bgcolor="#000000">
<tr>
<td bgcolor="#772277"><font color="#FFFFFF"><b>Animal Cartoons and Funny Pictures!</b></font></td>
</tr>
</table>
<br>
<table border=0><tr><td>
<ul>
<li><a href="crt039.html">What an amazing dog</a>
<li><a href="crt068.html">A mouse is in trouble</a>
<li><a href="crt248.html">Dog with sunglasses</a>
<li><a href="crt247.html">The dog goes diving</a>
<li><a href="crt023.html">No feeding the bears</a>
<li><a href="crt212.html">Cat got the mouse</a>
<li><a href="crt018.html">Mice are exploring here</a>
<li><a href="crt167.html">Tough dog on his bike</a>
<li><a href="crt155.html">Walking many dogs</a>
<li><a href="crt153.html">The dog is a gangster</a>
<li><a href="crt287.html">When you shave a cat</a>
<li><a href="crt281.html">Frogs in a classroom</a>
<li><a href="crt277.html">Fish drinking water</a>
<li><a href="crt125.html">Squirrel is eating nuts</a>
<li><a href="crt122.html">Turtle is on a journey</a>
<li><a href="crt100.html">Counterfeiting animals</a>
<li><a href="crt079.html">The cat caught the mouse</a>
<li><a href="crt076.html">Observe the huge cat</a>
<li><a href="crt063.html">This cat needs coffee</a>
<li><a href="crt045.html">More mouse revenge</a>
</ul>
</td>
<td>
<img src="g/cartoon.gif" width=130 height=122 alt="Related cartoons" hspace=50 vspace=15><br>
<ul>
<li><a href="crt036.html">Revenge of the mouse</a>
<li><a href="crt043.html">The chocolate rabbits</a>
<li><a href="crt006.html">This squirrel likes beer</a>
<li><a href="crt128.html">The Jail Bird</a>
<li><a href="crt008.html">The revenge of Rudolph</a>
<li><a href="crt252.html">Cat is an alcoholic?</a>
<li><a href="crt254.html">Dog is eating a cat</a>
<li><a href="crt243.html">Watch out for deer</a>
<li><a href="crt241.html">Dog has a death wish</a>
<li><a href="crt211.html">The well-trained cat</a>
<li><a href="crt208.html">A kangaroo problem</a>
<li><a href="crt189.html">Elephant is in trouble</a>
</ul>
</td>
</tr></table>
<table align=center border=0 cellspacing=0 cellpadding=2 width="98%">
<tr>
<td bgcolor="#000066"><font color="#FFFFFF"><b>Also see</b></font></td>
</tr>
</table>
<br>
<ul>
<li>Aha! <a href="http://www.ahajokes.com/">Jokes</a> > <a href="http://www.ahajokes.com/random_animal_jokes.shtml">Random Animal Jokes</a>
<li>Aha! <a href="http://www.ahajokes.com/">Jokes</a> > <a href="http://www.ahajokes.com/animal_jokes_for_kids.html">Animal Jokes for Kids</a><br>
</ul>
</font></td></tr></table></td></tr>
<tr><td bgcolor="#000000" colspan=2><img src="g/clr.gif" height=1 width=600 alt=""></td></tr>
<tr><td valign=bottom align=left><img src="g/bleft.gif" height=10 width=10 alt=""></td><td valign=bottom align=right><img src="g/bright.gif" height=10 width=10 alt=""></td></tr>
<tr><td align=center bgcolor="#FFFFFF" colspan=2><small><font face="times new roman,helvetica">Copyright © 2005. Reproduction of this site in part or whole is strictly prohibited. Use subject to <a href="terms.html">terms</a>.<br>[ <b><a href="http://www.ahajokes.com/">Jokes</a></b> | <a href="corporate.html">Corporate Center<a> | <a href="http://www.getfunnypictures.com/" target="_blank">Funny Pictures</a> ]</FONT></SMALL></td></tr>
</table>
</body></html>

Again, the part I have highlighted in that page source is what I am interested in. It is a table of links, each leading to a joke page. I need to obtain the href address of each of these links and save them to an array for later use, then continue with the aforementioned loop until we have an array holding the address of every joke page.
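For what it's worth, here is a minimal sketch of the "save every href to an array" step. It assumes PHP's bundled DOM extension is enabled; the function name `extractHrefs` is made up for illustration, and the sample string is just two of the links from the source above.

```php
<?php
// Sketch only: collect the href of every <a> element in an HTML string.
// extractHrefs() is a hypothetical helper name, not part of PHP.
function extractHrefs(string $html): array
{
    $doc = new DOMDocument();

    // The page is old, loosely written HTML, so collect parser warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        // getAttribute() returns an empty string when the attribute is missing,
        // which skips anchors such as <a name="...">.
        $href = $anchor->getAttribute('href');
        if ($href !== '') {
            $hrefs[] = $href;
        }
    }
    return $hrefs;
}

// Two list items copied from the page source above.
$sample = '<ul><li><a href="ani045.html">Dogs Don\'t Understand</a>'
        . '<li><a href="ani046.html">Buy Alligator Shoes</a></ul>';
print_r(extractHrefs($sample));
```

This returns every link on the page, not only the ones in the highlighted table, so the result still needs narrowing down before it is used.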

 

After those two steps I think I could figure out the last step on my own. If anyone could point me in the right direction for starting this project (or even tell me whether it is possible), it would be greatly appreciated. My concern is:

- Once I have the web source in a string, I know I could save every href address in that table. My question is: how do I find that table, and how do I know it is the right one? (It could be a different table!)
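One possible way around the "which table?" problem is to not look for the table at all and filter the links by file name instead. In the source above the joke pages are named like `ani045.html`, while the cartoon pages are named like `crt039.html`. The sketch below relies on that naming; the function name `filterJokeLinks` is hypothetical, and the idea that other categories use a similar prefix-plus-number scheme is an assumption I have not checked.

```php
<?php
// Sketch only: keep the hrefs that look like "<prefix><digits>.html".
// filterJokeLinks() is a hypothetical helper name; the prefix convention
// is inferred from the page source above, not documented by the site.
function filterJokeLinks(array $hrefs, string $prefix): array
{
    // preg_quote() keeps the prefix literal in case it contains regex characters.
    $pattern = '/^' . preg_quote($prefix, '/') . '\d+\.html$/';

    // preg_grep() keeps the original keys, so reindex the result from 0.
    return array_values(preg_grep($pattern, $hrefs));
}

// A mix of link targets taken from the page source above.
$links = ['ani045.html', 'crt039.html', 'terms.html', 'ani046.html'];
print_r(filterJokeLinks($links, 'ani'));
```

The hrefs are relative, so each one would still need the site address (http://www.ahajokes.com/) put in front of it before the page can be requested.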

 

Also, if anyone could point me in the right direction for reading the page source of a URL into a string using PHP, it would be greatly appreciated.
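As a starting point, `file_get_contents()` can read a URL into a string when the `allow_url_fopen` setting is enabled. The sketch below wraps it in a hypothetical helper, `fetchPageSource`; the 10 second timeout and the user agent string are arbitrary example values.

```php
<?php
// Sketch only: read the source of a page (or a local file) into a string.
// fetchPageSource() is a hypothetical helper name. Fetching http:// URLs
// this way requires allow_url_fopen to be enabled in php.ini.
function fetchPageSource(string $url): ?string
{
    // Example values; these options apply to http:// requests only.
    $context = stream_context_create([
        'http' => [
            'timeout'    => 10,
            'user_agent' => 'joke-archiver-example/0.1',
        ],
    ]);

    // file_get_contents() returns false on failure; the @ hides the warning
    // so the caller can simply check for null.
    $source = @file_get_contents($url, false, $context);

    return $source === false ? null : $source;
}
```

Calling `fetchPageSource('http://www.ahajokes.com/')` would then give either the page source as a string or `null` if the request failed.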

 

Thank you for reading!!  :D

I know this sounds silly, and I don't mean it as a smart remark, but why not just ask for the jokes? I had a challenge like this once, and the solution was simply to ask for the database. It looks as though almost everything has its own HTML page, but that can be deceiving sometimes. I'm not totally sure of the answer to your question, other than that it will involve some pain. I wish you luck.
