
Only getting entries which contain a certain text, ignoring others



I'm building a service where I need to retrieve and parse information from another page on the internet.  It's a page from Bungie.net's Halo 3 fileshare.  Here's mine:

 

http://www.bungie.net/stats/halo3/fileshare.aspx?gamertag=HaLo2FrEeEk

 

I need to get all the film and film clip entries off the page.  Only a few features identify a slot as a film or film clip.  One is the thumbnail image, the URL of which is either:

.../images/halo3stats/fileshareiconssm/filmclips/sm/...

or:

.../images/halo3stats/fileshareiconssm/films/sm/...

The other is the Film Length field.  Now, I need to pull out the title, film length, and the h3fileid from every slot that is a film, but ONLY if it's a film.  Is there a way I can do this efficiently?

 

What I have in mind now is sorta hard on the server.  The regex parses the contents and gets a list of the h3fileids for each film slot (using the image URL), then loops through and retrieves the page for each of those files, one request per file, to get its title and film length.  It's a lot of work for the server, I fear, and it will probably take too long.  Since the information I need is all on the actual fileshare page, is there a way I can just get the title, film length, and h3fileid for each film and filmclip item, and only those items?

 

Thank you in advance for any help.

Sorry for the double post, I would have edited, but it wouldn't let me.  Here was the edit:

 

Actually, now that I think about it, I only need the h3fileid and title of each item; I'll get the film length when the user selects which film he wants.  So can I just get the title and h3fileid for only film and filmclip items?

 

Here is an example of a NON film/filmclip item:

 

<div class="slotWrap"><div id="ctl00_ctl00_mainContent_shareRepeater_ctl00_fileshareitem_messageBoxPanelPanel">
<div id="ctl00_mainContent_shareRepeater_ctl00_fileshareitem_messageBoxPanel">

</div>
</div>
<div id="ctl00_mainContent_shareRepeater_ctl00_fileshareitem_outerShell" class="user_content_mini_outer_shell">


<div id="ctl00_ctl00_mainContent_shareRepeater_ctl00_fileshareitem_screenshotBoxPanelPanel">
	<div id="ctl00_mainContent_shareRepeater_ctl00_fileshareitem_screenshotBoxPanel" class="user_content_mini_box">

	<div id="ctl00_mainContent_shareRepeater_ctl00_fileshareitem_ajaxLoadingPanel" style="display:none;">

		<img src="/images/ajax-loading-horizontal.gif" alt="Loading..." width="30px" height="13px" style="text-align: center;" />

		</div>
	<div id="ctl00_ctl00_mainContent_shareRepeater_ctl00_fileshareitem_fsItemPanelPanel">
			<div id="ctl00_mainContent_shareRepeater_ctl00_fileshareitem_fsItemPanel" class="user_content_mini_box_inner">

	    <div class="shareTitle">
		    <ul class="infoA">
			    <li class="float_right">0%</li>
			    <li><h3><a id="ctl00_mainContent_shareRepeater_ctl00_fileshareitem_titleLink" href="/Online/Halo3UserContentDetails.aspx?h3fileid=61127153" target="primaryWindow">Valhalla X-Mas</a></h3></li>
			    <li id="ctl00_mainContent_shareRepeater_ctl00_fileshareitem_fileFlagsListItem" class="float_right">
				    <a id="ctl00_mainContent_shareRepeater_ctl00_fileshareitem_hlFavIcon" href="/Online/Halo3UserContentDetails.aspx?h3fileid=61127153" target="primaryWindow"></a>
				    <a id="ctl00_mainContent_shareRepeater_ctl00_fileshareitem_hlFileSetIcon" href="/Online/Halo3UserContentDetails.aspx?h3fileid=61127153" target="primaryWindow"><img id="ctl00_mainContent_shareRepeater_ctl00_fileshareitem_set_flag_image" src="/images/halo3stats/fileshareicons/linkedfile_icon.gif" alt="Part of File Set" style="height:16px;width:16px;border-width:0px;" /></a>
			    </li>
			    <li>Created 12.22.2008 by <a id="ctl00_mainContent_shareRepeater_ctl00_fileshareitem_authorLink" href="/Stats/Halo3/Default.aspx?player=HaLo2FrEeEk++++" target="primaryWindow">HaLo2FrEeEk    </a></li>
		    </ul>
		</div>
		<div class="share-mid">
		<a id="ctl00_mainContent_shareRepeater_ctl00_fileshareitem_thumbnailLink" class="relative_image_container" href="/Online/Halo3UserContentDetails.aspx?h3fileid=61127153" target="primaryWindow"><img id="ctl00_mainContent_shareRepeater_ctl00_fileshareitem_fileThumbnail" class="not_screenshot_pic" src="/images/halo3stats/fileshareiconssm/maps/sm/valhalla.gif" style="border-width:0px;" /></a>


		<div class="shareCommon">
		    <ul class="infoC">
			    <li id="ctl00_mainContent_shareRepeater_ctl00_fileshareitem_xboxDownload_listitem"><div id="ctl00_ctl00_mainContent_shareRepeater_ctl00_fileshareitem_xboxDownloadButtonPanel">
					<a id="ctl00_mainContent_shareRepeater_ctl00_fileshareitem_xboxDownloadButton" href="javascript:__doPostBack('ctl00$mainContent$shareRepeater$ctl00$fileshareitem$xboxDownloadButton','')">Download to Halo 3</a>
				</div></li><li id="ctl00_mainContent_shareRepeater_ctl00_fileshareitem_dl_listitem">22 downloads</li><li id="ctl00_mainContent_shareRepeater_ctl00_fileshareitem_map_listitem">Map: Valhalla</li>
			    
		    </ul>
		</div>
		</div>
		<div class="clear"></div>
		<!--[if IE]><div class="IE_description_fix"><![endif]-->
     		 <div id="ctl00_mainContent_shareRepeater_ctl00_fileshareitem_bottomArea" class="description">
					   
			 The crew of V-398 barely survived their unplanned landing in...	    		
	    
				</div>
	    <!--[if IE]></div><![endif]-->
		<div class="ssMoreDetails"><div id="ctl00_ctl00_mainContent_shareRepeater_ctl00_fileshareitem_moreDetailsLinkPanel">
					<a id="ctl00_mainContent_shareRepeater_ctl00_fileshareitem_moreDetailsLink" href="/Online/Halo3UserContentDetails.aspx?h3fileid=61127153" target="primaryWindow">more details</a>
				</div></div>


			</div>
		</div>


	</div><div id="ctl00_ctl00_mainContent_shareRepeater_ctl00_fileshareitem_moreDetailsLinkPanel">

	</div><div id="ctl00_ctl00_mainContent_shareRepeater_ctl00_fileshareitem_reportSpamLinkPanel">

	</div><div id="ctl00_ctl00_mainContent_shareRepeater_ctl00_fileshareitem_reportResultsLabelPanel">

	</div><div id="ctl00_ctl00_mainContent_shareRepeater_ctl00_fileshareitem_saveGalleryLinkButtonPanel">

	</div><div id="ctl00_ctl00_mainContent_shareRepeater_ctl00_fileshareitem_fsItemPanelPanel">

	</div>
</div>	


</div>
<div id="ctl00_ctl00_mainContent_shareRepeater_ctl00_fileshareitem_empty_fsItemPanelPanel">

</div>
	<div id="ctl00_ctl00_mainContent_shareRepeater_ctl00_fileshareitem_unpinned_ssPanelPanel">

</div>
<div id="ctl00_ctl00_mainContent_shareRepeater_ctl00_fileshareitem_actionBarPanel">
<div id="ctl00_mainContent_shareRepeater_ctl00_fileshareitem_actionBar" class="bottom_bar">

		<ul class="links">
			<li class="slotNum">Slot : 1 </li>

			<li id="ctl00_mainContent_shareRepeater_ctl00_fileshareitem_copyListItem"><div id="ctl00_ctl00_mainContent_shareRepeater_ctl00_fileshareitem_shareCopyButtonPanel">
		<a id="ctl00_mainContent_shareRepeater_ctl00_fileshareitem_shareCopyButton" href="javascript:__doPostBack('ctl00$mainContent$shareRepeater$ctl00$fileshareitem$shareCopyButton','')">copy to my share</a>
	</div></li>
			<li id="ctl00_mainContent_shareRepeater_ctl00_fileshareitem_removeListItem"><div id="ctl00_ctl00_mainContent_shareRepeater_ctl00_fileshareitem_removeButtonPanel">
		<a onclick="return confirm('Are you sure you wish to remove this item?');" id="ctl00_mainContent_shareRepeater_ctl00_fileshareitem_removeButton" href="javascript:__doPostBack('ctl00$mainContent$shareRepeater$ctl00$fileshareitem$removeButton','')">delete</a>
	</div></li>
			<li id="ctl00_mainContent_shareRepeater_ctl00_fileshareitem_groupListItem"><a id="ctl00_mainContent_shareRepeater_ctl00_fileshareitem_groupButton" onclick="return openFileSetAddWindow(61127153,0,'ctl00_topLevelControls_fileSetWindow');" href="../StatControls/#">add to file set</a></li>
		</ul>			

</div><div id="ctl00_ctl00_mainContent_shareRepeater_ctl00_fileshareitem_shareCopyButtonPanel">

</div><div id="ctl00_ctl00_mainContent_shareRepeater_ctl00_fileshareitem_trophyLinkButtonPanel">

</div>
</div>
	<div id="ctl00_ctl00_mainContent_shareRepeater_ctl00_fileshareitem_adminInfoPanel">

</div>
</div>

 

And here is one that is a filmclip:

 

<div class="slotWrap"><div id="ctl00_ctl00_mainContent_shareRepeater_ctl01_fileshareitem_messageBoxPanelPanel">
<div id="ctl00_mainContent_shareRepeater_ctl01_fileshareitem_messageBoxPanel">

</div>
</div>
<div id="ctl00_mainContent_shareRepeater_ctl01_fileshareitem_outerShell" class="user_content_mini_outer_shell">


<div id="ctl00_ctl00_mainContent_shareRepeater_ctl01_fileshareitem_screenshotBoxPanelPanel">
	<div id="ctl00_mainContent_shareRepeater_ctl01_fileshareitem_screenshotBoxPanel" class="user_content_mini_box">

	<div id="ctl00_mainContent_shareRepeater_ctl01_fileshareitem_ajaxLoadingPanel" style="display:none;">

		<img src="/images/ajax-loading-horizontal.gif" alt="Loading..." width="30px" height="13px" style="text-align: center;" />

		</div>
	<div id="ctl00_ctl00_mainContent_shareRepeater_ctl01_fileshareitem_fsItemPanelPanel">
			<div id="ctl00_mainContent_shareRepeater_ctl01_fileshareitem_fsItemPanel" class="user_content_mini_box_inner">

	    <div class="shareTitle">
		    <ul class="infoA">
			    <li class="float_right">0%</li>
			    <li><h3><a id="ctl00_mainContent_shareRepeater_ctl01_fileshareitem_titleLink" href="/Online/Halo3UserContentDetails.aspx?h3fileid=21305912" target="primaryWindow">flare</a></h3></li>
			    
			    <li>Created 11.17.2007 by <a id="ctl00_mainContent_shareRepeater_ctl01_fileshareitem_authorLink" href="/Stats/Halo3/Default.aspx?player=Chewyy+++++++++" target="primaryWindow">Chewyy         </a></li>
		    </ul>
		</div>
		<div class="share-mid">
		<a id="ctl00_mainContent_shareRepeater_ctl01_fileshareitem_thumbnailLink" class="relative_image_container" href="/Online/Halo3UserContentDetails.aspx?h3fileid=21305912" target="primaryWindow"><img id="ctl00_mainContent_shareRepeater_ctl01_fileshareitem_fileThumbnail" class="not_screenshot_pic" src="/images/halo3stats/fileshareiconssm/filmclips/sm/construct.gif" style="border-width:0px;" /></a>


		<div class="shareCommon">
		    <ul class="infoC">
			    <li id="ctl00_mainContent_shareRepeater_ctl01_fileshareitem_xboxDownload_listitem"><div id="ctl00_ctl00_mainContent_shareRepeater_ctl01_fileshareitem_xboxDownloadButtonPanel">
					<a id="ctl00_mainContent_shareRepeater_ctl01_fileshareitem_xboxDownloadButton" href="javascript:__doPostBack('ctl00$mainContent$shareRepeater$ctl01$fileshareitem$xboxDownloadButton','')">Download to Halo 3</a>
				</div></li><li id="ctl00_mainContent_shareRepeater_ctl01_fileshareitem_dl_listitem">0 downloads</li><li id="ctl00_mainContent_shareRepeater_ctl01_fileshareitem_length_listitem">Film Length: 00:00:05</li>
			    
		    </ul>
		</div>
		</div>
		<div class="clear"></div>
		<!--[if IE]><div class="IE_description_fix"><![endif]-->
     		 <div id="ctl00_mainContent_shareRepeater_ctl01_fileshareitem_bottomArea" class="description">
					   
			 suicide by flare	    		
	    
				</div>
	    <!--[if IE]></div><![endif]-->
		<div class="ssMoreDetails"><div id="ctl00_ctl00_mainContent_shareRepeater_ctl01_fileshareitem_moreDetailsLinkPanel">
					<a id="ctl00_mainContent_shareRepeater_ctl01_fileshareitem_moreDetailsLink" href="/Online/Halo3UserContentDetails.aspx?h3fileid=21305912" target="primaryWindow">more details</a>
				</div></div>


			</div>
		</div>


	</div><div id="ctl00_ctl00_mainContent_shareRepeater_ctl01_fileshareitem_moreDetailsLinkPanel">

	</div><div id="ctl00_ctl00_mainContent_shareRepeater_ctl01_fileshareitem_reportSpamLinkPanel">

	</div><div id="ctl00_ctl00_mainContent_shareRepeater_ctl01_fileshareitem_reportResultsLabelPanel">

	</div><div id="ctl00_ctl00_mainContent_shareRepeater_ctl01_fileshareitem_saveGalleryLinkButtonPanel">

	</div><div id="ctl00_ctl00_mainContent_shareRepeater_ctl01_fileshareitem_fsItemPanelPanel">

	</div>
</div>	


</div>
<div id="ctl00_ctl00_mainContent_shareRepeater_ctl01_fileshareitem_empty_fsItemPanelPanel">

</div>
	<div id="ctl00_ctl00_mainContent_shareRepeater_ctl01_fileshareitem_unpinned_ssPanelPanel">

</div>
<div id="ctl00_ctl00_mainContent_shareRepeater_ctl01_fileshareitem_actionBarPanel">
<div id="ctl00_mainContent_shareRepeater_ctl01_fileshareitem_actionBar" class="bottom_bar">

		<ul class="links">
			<li class="slotNum">Slot : 2 </li>

			<li id="ctl00_mainContent_shareRepeater_ctl01_fileshareitem_copyListItem"><div id="ctl00_ctl00_mainContent_shareRepeater_ctl01_fileshareitem_shareCopyButtonPanel">
		<a id="ctl00_mainContent_shareRepeater_ctl01_fileshareitem_shareCopyButton" href="javascript:__doPostBack('ctl00$mainContent$shareRepeater$ctl01$fileshareitem$shareCopyButton','')">copy to my share</a>
	</div></li>
			<li id="ctl00_mainContent_shareRepeater_ctl01_fileshareitem_removeListItem"><div id="ctl00_ctl00_mainContent_shareRepeater_ctl01_fileshareitem_removeButtonPanel">
		<a onclick="return confirm('Are you sure you wish to remove this item?');" id="ctl00_mainContent_shareRepeater_ctl01_fileshareitem_removeButton" href="javascript:__doPostBack('ctl00$mainContent$shareRepeater$ctl01$fileshareitem$removeButton','')">delete</a>
	</div></li>
			<li id="ctl00_mainContent_shareRepeater_ctl01_fileshareitem_groupListItem"><a id="ctl00_mainContent_shareRepeater_ctl01_fileshareitem_groupButton" onclick="return openFileSetAddWindow(21305912,0,'ctl00_topLevelControls_fileSetWindow');" href="../StatControls/#">add to file set</a></li>
		</ul>			

</div><div id="ctl00_ctl00_mainContent_shareRepeater_ctl01_fileshareitem_shareCopyButtonPanel">

</div><div id="ctl00_ctl00_mainContent_shareRepeater_ctl01_fileshareitem_trophyLinkButtonPanel">

</div>
</div>
	<div id="ctl00_ctl00_mainContent_shareRepeater_ctl01_fileshareitem_adminInfoPanel">

</div>
</div>

<pre>
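<?php
// Fetch the fileshare page once; everything needed is on this page.
$content = file_get_contents('http://www.bungie.net/stats/halo3/fileshare.aspx?gamertag=HaLo2FrEeEk');
if ($content === false) {
	die('Could not retrieve the fileshare page.');
}
// Each slot's title link and thumbnail come after its shareTitle div.
$pieces = explode('<div class="shareTitle">', $content);
foreach ($pieces as $piece) {
	// Skip any slot whose thumbnail is not a film or film clip icon.
	if (!preg_match('%/images/halo3stats/fileshareiconssm/film(?:clip)?s/sm/%', $piece)) {
		continue;
	}
	// Modifiers: x = free-spacing, i = case-insensitive, s = dot matches newlines.
	if (!preg_match('%
		fileshareitem_titleLink[^>]+?
		h3fileid=(?P<h3fileid>\d+)
		[^>]+>
		(?P<title>.+?)
		</a>
		.+?
		Film\s+Length:\s+
		(?P<length>[\d:]+)
	%xis', $piece, $matches)) {
		continue; // the markup did not match; skip rather than print an empty result
	}
	// Keep only the named captures (h3fileid, title, length).
	$result = array();
	foreach ($matches as $key => $value) {
		if (!is_numeric($key)) {
			$result[$key] = $value;
		}
	}
	print_r($result);
}
?>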
</pre>
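For the follow-up, where only the title and h3fileid are needed, the same split-and-filter idea might be trimmed down to something like the sketch below. It assumes the page markup matches the two slot examples posted above; `extractFilmSlots` is a made-up helper name, not anything Bungie.net or PHP provides, and it takes the HTML as a string so it can be tried without hitting the site.

```php
<?php
// Hypothetical helper (name and signature are assumptions, not from the thread):
// returns the h3fileid and title of every film or film clip slot in the given HTML.
function extractFilmSlots(string $html): array
{
    $films = array();
    // Each slot's title link and thumbnail come after its own shareTitle div.
    $pieces = explode('<div class="shareTitle">', $html);
    array_shift($pieces); // discard everything before the first slot
    foreach ($pieces as $piece) {
        // Keep only slots whose thumbnail is a film or film clip icon.
        if (!preg_match('%/fileshareiconssm/film(?:clip)?s/sm/%i', $piece)) {
            continue;
        }
        // The title link carries both the h3fileid (in its href) and the title text.
        $pattern = '%fileshareitem_titleLink[^>]+?h3fileid=(?P<h3fileid>\d+)[^>]*>(?P<title>.*?)</a>%is';
        if (preg_match($pattern, $piece, $m)) {
            $films[] = array(
                'h3fileid' => $m['h3fileid'],
                'title'    => html_entity_decode(trim($m['title']), ENT_QUOTES, 'UTF-8'),
            );
        }
    }
    return $films;
}
```

Calling `extractFilmSlots(file_get_contents(...))` on the fileshare URL would then need only the one request; the film length can be looked up later for whichever file the user picks.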
