
Expert needed!


drisate


Hey guys, I need to retrieve values from a foreign website, but I am very bad with regular expressions. From a board page, I need to retrieve the username and posted date for every thread on the page. The HTML looks like this, and I need to retrieve the red parts:

 

-------------------------------------

<table cellpadding="4" cellspacing="1" border="0" style="width:100%" class="tableinborder">

  <tr align="left">

    <td class="tablea" valign="top"><a name="post7580219" id="post7580219"></a>

    <table style="width:100%" cellpadding="4" cellspacing="0" border="0" class="tablea_fc">

      <tr>

        <td style="width:100%" class="smallfont"><span class="normalfont"><b>

        <a href="profile.php?userid=31486"> Da LaZ </a></b></span> <br />

        Heavy Fighter <br />

        <img src="en_images_ogame/star2.gif" border="0" alt title /><img src="en_images_ogame/star2.gif" border="0" alt title /><img src="en_images_ogame/star2.gif" border="0" alt title />

        <br />

        <br />

        <img src="images/avatars/avatar-49031.jpg" border="0" alt="images/avatars/avatar-49031.jpg" title /><br />

        <br />

        Registration Date: 03-10-2006<br />

        Posts: 1,102<br />

        Universe: uni1<br />

        Alliance: pirates<br />

        <br />

        <img src="en_images_ogame/spacer.gif" width="159" height="1" border="0" alt title /></td>

        </tr>

        </table>

        </td>

        <td class="tablea" valign="top" style="width:100%">

        <table style="width:100%" cellpadding="4" cellspacing="0" border="0" class="tablea_fc">

          <tr>

            <td style="width:100%" class="normalfont" align="left">

            <table style="width:100%" cellpadding="4" cellspacing="0" border="0" class="tablea_fc">

              <tr>

                <td><span class="smallfont"><b>evil vs POW</b></span></td>

                <td align="right" nowrap="nowrap">

                <a href="addreply.php?postid=7580219">

                <img src="en_images_ogame/replypost.gif" border="0" alt="Reply to this Post" title="Reply to this Post" /></a>

                <a href="addreply.php?action=quote&postid=7580219">

                <img src="en_images_ogame/quote.gif" border="0" alt="Post Reply with Quote" title="Post Reply with Quote" /></a>

                <a href="editpost.php?postid=7580219">

                <img src="en_images_ogame/editpost.gif" border="0" alt="Edit/Delete Posts" title="Edit/Delete Posts" /></a>

                <a href="report.php?postid=7580219">

                <img src="en_images_ogame/report.gif" border="0" alt="Report Post to a Moderator" title="Report Post to a Moderator" /></a>       

                <a href="javascript:self.scrollTo(0,0);">

                <img src="en_images_ogame/goup.gif" border="0" alt="Go to the top of this page" title="Go to the top of this page" /></a></td>

              </tr>

            </table>

            <hr size="1" class="threadline" />

            <div align="center">

              <br />

              message</div>

            </td>

            </tr>

          </table>

          </td>

        </tr>

        <tr>

          <td class="tablea" align="center" nowrap="nowrap">

          <span class="smallfont">

          <a href="thread.php?postid=7580219#post7580219">

          <img src="en_images_ogame/posticon.gif" border="0" alt title /></a> 03-24-2009

          <span class="time">02:37</span></span>  </td>

          <td class="tablea" align="left" style="width:100%" valign="middle">

          <span class="smallfont">

          <img src="en_images_ogame/user_offline.gif" border="0" alt="Da LaZ is offline" title="Da LaZ is offline" />

          <a href="search.php?action=user&userid=31486">

          <img src="en_images_ogame/search.gif" border="0" alt="Search for Posts by Da LaZ" title="Search for Posts by Da LaZ" /></a>

          <a href="usercp.php?action=buddy&add=31486">

          <img src="en_images_ogame/homie.gif" border="0" alt="Add Da LaZ to your Buddy List" title="Add Da LaZ to your Buddy List" /></a>

          <a href="pms.php?action=newpm&userid=31486">

          <img src="en_images_ogame/pm.gif" border="0" alt="Send a Private Message to Da LaZ" title="Send a Private Message to Da LaZ" /></a>

          </span></td>

        </tr>

        </table>

        </td>

        </tr>

        </table>

-------------------------------------

 

So objective 1: loop over the page for every thread.

Objective 2: in each loop iteration, extract the username and posted date.

 

If you need a full page example, here is one: http://board.ogame.org/thread.php?threadid=537635


I'll supply the meat and potatoes, you supply the gravy if you get my drift ;)

 

$userName = array();
$postDate = array();
date_default_timezone_set('America/Montreal'); // *set this value to correct timezone of server in question

$dom = new DOMDocument;
@$dom->loadHTMLFile('http://board.ogame.org/thread.php?threadid=537635');
$xpath = new DOMXPath($dom);
$aTag = $xpath->query('//a[substring(@href,1,19) ="profile.php?userid="]'); // extract user
$spanTag = $xpath->query('//td[@class="tablea" or @class="tableb"]/span'); // extract post date

foreach ($aTag as $aVal) {
	$userName[] = $aVal->nodeValue; // store user name in $userName
}

foreach ($spanTag as $spanVal) {
	// match either a full date (mm-dd-yyyy) or 'Today,', followed by the time (hh:mm)
	if (preg_match('#(?:(?:\d{2}-){2}\d{4}|Today,) \d{2}:\d{2}#', $spanVal->nodeValue, $match)) {
		$match[0] = str_replace('Today,', date('m-d-Y'), $match[0]); // if found, replace 'Today,' with today's date in xx-xx-2009 format
		$postDate[] = $match[0]; // store post date in $postDate
	}
}

echo '<pre>'.print_r($userName, true); // outputs all user names
echo '<pre>'.print_r($postDate, true); // outputs all post dates
// *We set the default time zone in case the sequence 'Today,' is found within the time entry, which we convert to today's date using date(). Otherwise, there will be a Strict Standards Notice
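
The date pattern can be tried out on plain strings without fetching the page. The sketch below is a variant, not the exact pattern from the snippet: it assumes the date and the time may be separated by any whitespace (in the pasted HTML they sit on different lines), and it also accepts 'Yesterday,'. The helper name `extractStamp` is made up for illustration, and the nullable return type needs PHP 7.1 or later.

```php
<?php
// Assumed variant of the thread's pattern: \s+ instead of a single space,
// and 'Yesterday,' added as a third alternative.
$pattern = '#((?:\d{2}-){2}\d{4}|Today,|Yesterday,)\s+(\d{2}:\d{2})#';

// Hypothetical helper: returns [datePart, timePart], or null when no timestamp is found.
function extractStamp(string $text, string $pattern): ?array
{
    return preg_match($pattern, $text, $m) ? [$m[1], $m[2]] : null;
}

// Date and time split over two lines, as in the pasted HTML
var_dump(extractStamp("03-24-2009\n          02:37", $pattern));
```

Capturing the date and the time separately makes it easy to rebuild a clean "date time" string afterwards, whatever whitespace the page puts between them.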

 

Output:

Array
(
    [0] => Da LaZ
    [1] => Zombie
    [2] => .GameOver
    [3] => .GameOver
    [4] => .GameOver
    [5] => kepone factory
    [6] => Necessary Evil
    [7] => greenie
)
Array
(
    [0] => 03-24-2009 02:37
    [1] => 03-26-2009 19:13
    [2] => 03-27-2009 23:45
    [3] => 03-29-2009 22:45
    [4] => 04-01-2009 20:24
    [5] => 04-01-2009 20:39
    [6] => 04-01-2009 20:49
    [7] => 04-10-2009 03:40
)


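I forgot about 'Yesterday,' as a possibility. The pattern only matches a full date or 'Today,', so first add 'Yesterday,' to its alternation (...|Today,|Yesterday,). Then, after this line: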

$match[0] = str_replace('Today,', date('m-d-Y'), $match[0]); // if found, replace 'Today,' with today's date in xx-xx-2009 format

You can add:

$match[0] = str_replace('Yesterday,', date('m-d-Y', strtotime("-1 day")), $match[0]); // if found, replace 'Yesterday,' with today's date -1 day in xx-xx-2009 format
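
If you prefer, both replacements can be folded into one small helper. This is only a sketch: `normalizePostDate` and its `$now` parameter are names invented here, it assumes the board prints relative dates exactly as 'Today,' and 'Yesterday,', and it needs PHP 7.1 or later for the nullable type.

```php
<?php
// Hypothetical helper: turns "Today, 02:37" / "Yesterday, 02:37" into "mm-dd-yyyy hh:mm".
// $now is injectable so the function can be checked against a fixed reference date.
function normalizePostDate(string $raw, ?DateTimeImmutable $now = null): string
{
    $now = $now ?? new DateTimeImmutable('now');

    $map = [
        'Today,'     => $now->format('m-d-Y'),
        'Yesterday,' => $now->modify('-1 day')->format('m-d-Y'),
    ];

    // Collapse runs of whitespace (including line breaks) into single spaces
    $raw = trim(preg_replace('/\s+/', ' ', $raw));

    // strtr() replaces each key at most once per position and never rescans its own output
    return strtr($raw, $map);
}

$reference = new DateTimeImmutable('2009-04-11 12:00:00');
echo normalizePostDate('Yesterday, 03:40', $reference), PHP_EOL;
```

Passing the reference date in, rather than calling date() inside, keeps the function easy to test; in the scraper you would just call it with one argument.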


As I went out for a walk, I pondered this thread and what I provided, and thought it could be faster (how much, I don't know; I didn't time it).

Therefore, I tweaked the snippet here and there. This version has tighter entry extraction and doesn't use regex (and now I'm done):

 

$userName = array();
$postDate = array();
date_default_timezone_set('America/Montreal'); // *set this value to correct timezone of server in question

$dom = new DOMDocument;
@$dom->loadHTMLFile('http://board.ogame.org/thread.php?threadid=537635');
$xpath = new DOMXPath($dom);
$aTag = $xpath->query('//a[substring(@href,1,19) ="profile.php?userid="]'); // extract user
$spanTag = $xpath->query('//td[@class="tablea" or @class="tableb"]/span[contains(.,":")]'); // extract post date

foreach ($aTag as $aVal) {
	$userName[] = $aVal->nodeValue; // store user name in $userName
}

foreach ($spanTag as $spanVal) {
	if (strlen($spanVal->nodeValue) < 27) { // skip longer spans that merely contain a ':'
		$value = trim($spanVal->nodeValue);
		if (substr($value, 0, 6) == 'Today,') {
			$value = str_replace('Today,', date('m-d-Y'), $value); // 'Today,' becomes today's date
		} elseif (substr($value, 0, 10) == 'Yesterday,') {
			$value = str_replace('Yesterday,', date('m-d-Y', strtotime('-1 day')), $value); // 'Yesterday,' becomes today's date -1 day
		}
		$postDate[] = $value; // store post date in $postDate
	}
}

echo '<pre>'.print_r($userName, true); // outputs all user names
echo '<pre>'.print_r($postDate, true); // outputs all post dates

 

Either version should accomplish the same thing.
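
Both versions leave you with two parallel arrays, so to get one record per post you still have to pair them up by index. A rough sketch of that step follows, using the first three entries from the output above as sample data; `pairPosts` is a made-up name, and the length check is an assumption about how you would want a mismatch handled. Note that array_combine() is not an option here, because the same user (.GameOver) posts more than once and would collapse into a single key.

```php
<?php
// First three entries from the output shown earlier in the thread
$userName = ['Da LaZ', 'Zombie', '.GameOver'];
$postDate = ['03-24-2009 02:37', '03-26-2009 19:13', '03-27-2009 23:45'];

// Hypothetical helper: zip the two lists into one row per post.
function pairPosts(array $users, array $dates): array
{
    // If the two XPath queries return different counts, pairing by index would be wrong
    if (count($users) !== count($dates)) {
        throw new LengthException('User and date counts differ');
    }

    return array_map(
        function ($user, $date) {
            // trim() strips the padding around names such as "<a> Da LaZ </a>"
            return ['user' => trim($user), 'date' => $date];
        },
        $users,
        $dates
    );
}

$posts = pairPosts($userName, $postDate);
foreach ($posts as $post) {
    echo $post['user'], ' | ', $post['date'], PHP_EOL;
}
```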
