How do I open another site's page and strip text from it for my page?


babysitter

I want to go to another site, get a specific page, collect the html and read specific parts of it and then display those bits in my page.

 

I have used the following in a website to collect the lat and long from one site to use in Google maps as in the UK we cannot use Geocode because the postcode/zipcode data is copyrighted.

 

I am afraid I can't work out exactly what it is doing to get the lat and long. My reverse engineering skills are letting me down.

 

Can anyone help please?

 

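<?php

// Only run when a search term was passed in, e.g. page.php?s=LA9+4PH
if (isset($_GET['s'])) {

    // file() returns the remote page as an array of lines (false on failure)
    $lines = file("http://www.schoolswebdirectory.co.uk/schoolinfo2.php?ref=29571" . str_replace(" ", "", $_GET['s']) . "&advanced=&client=public&addr2=&quicksearch=" . urlencode($_GET['s']) . "&addr3=&addr1=");

    if ($lines !== false) {
        // Join the lines into one string and remove all HTML tags
        $html = strip_tags(implode("", $lines));

        // Cut the text at every "(" and keep what comes before the next ")"
        $page = explode("(", $html);
        $lat = explode(")", $page[2]);
        $longg = explode(")", $page[3]);

        // explode() returns arrays, so print element 0 rather than the array
        echo $lat[0];
        echo $longg[0];
    }
}

?>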

I had a similar problem, which was solved in the "Regex within PHP" area. This may help you: http://www.phpfreaks.com/forums/index.php/topic,120594.0.html

It is breaking up the page at the "(" and ")" characters. It relies on there being a fixed number of "(" before the latitude and longitude turn up. It's a very unsophisticated method :)

 

Try printing out each variable used during the processing to get a better understanding of what's going on.

 

Particularly print out $page (use var_dump($page))
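
To make the "(" and ")" splitting concrete, here is a minimal sketch run on a made-up string instead of the real page. The sample text, the coordinates in it and the helper name extractLatLong() are all invented for illustration; the indexes 2 and 3 are the ones used in the code above.

```php
<?php
// Split the text on "(" and then on ")" to pull out two bracketed values.
// Indexes 2 and 3 only work while the latitude sits inside the second
// pair of brackets and the longitude inside the third.
function extractLatLong(string $text): array
{
    $page = explode("(", $text);      // one chunk per "("
    $lat = explode(")", $page[2]);    // chunk 2 starts with the latitude
    $longg = explode(")", $page[3]);  // chunk 3 starts with the longitude
    return [$lat[0], $longg[0]];
}

// Made-up stand-in for the page text after strip_tags().
$sample = "Map of LA9 4PH (zoom) Lat (54.3280) Lon (-2.7460) footer";

var_dump(explode("(", $sample));      // shows the four chunks
list($lat, $long) = extractLatLong($sample);
echo $lat, ", ", $long, "\n";
```

If the remote page ever gains or loses a bracket before the coordinates, the indexes shift and the wrong chunks come back, which is why dumping $page is the quickest way to see what is happening.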

Of course, it would help if I had included the original code and not the piece I was working on! Doh!

 

Thank you for your replies, much appreciated.

 

 

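<?php

// Only run when a postcode was passed in, e.g. page.php?s=LA9+4PH
if (isset($_GET['s'])) {

    // file() returns the remote page as an array of lines (false on failure)
    $lines = file("http://www.multimap.com/map/browse.cgi?client=public&search_result=&db=pc&lang=&keepicon=true&pc=" . str_replace(" ", "", $_GET['s']) . "&advanced=&client=public&addr2=&quicksearch=" . urlencode($_GET['s']) . "&addr3=&addr1=");

    if ($lines !== false) {
        // Join the lines into one string and remove all HTML tags
        $html = strip_tags(implode("", $lines));

        // Cut the text at every "(" and keep what comes before the next ")"
        $page = explode("(", $html);
        $lat = explode(")", $page[2]);    // latitude ends up in $lat[0]
        $longg = explode(")", $page[3]);  // longitude ends up in $longg[0]
    }
}

?>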
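
As a side note on the long URL in that code, one possible way to assemble it is with http_build_query(), which URL-encodes each value. This is only a sketch: buildLookupUrl() is a made-up helper name, the parameter names are copied from the URL above with the empty ones left out, and whether the remote site accepts the shorter form is an assumption.

```php
<?php
// Build the lookup URL with each value URL-encoded.
// Parameter names are copied from the URL in the post; the empty ones
// (search_result, lang, advanced, addr1..addr3) are left out here.
function buildLookupUrl(string $postcode): string
{
    $params = [
        'client'      => 'public',
        'db'          => 'pc',
        'keepicon'    => 'true',
        'pc'          => str_replace(' ', '', $postcode), // postcode without spaces
        'quicksearch' => $postcode,                       // postcode as typed
    ];
    // Pass "&" explicitly so the result does not depend on php.ini settings.
    return 'http://www.multimap.com/map/browse.cgi?' . http_build_query($params, '', '&');
}

echo buildLookupUrl('LA9 4PH'), "\n";
```

The point of the helper is that a postcode containing a space no longer puts a raw space into the URL handed to file().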

I am getting increasingly frustrated trying to extract information from the following html as there are so many pieces of html code next to the text I want to extract.  I am finding it very difficult to create a regular expression that works to extract the highlighted bits below.  Is there anyone who can help me with the regular expressions please?

 

 

<tr align="left">

            <td width="80" height="22" valign="bottom" class="listtext1"><div align="right"><font color="#000000" size="1">School: </font></div></td>

            <td width="227" valign="bottom" class="listtext1"><font size=2>Kendal Nursery School</font></td>

          </tr>

          <tr align="left">

            <td height="22" valign="bottom" class="listtext1"><div align="right"><font color="#000000" size="1">Street: </font></div></td>

            <td height="22" valign="bottom" class="listtext1"><font size=2>Queens Road</font></td>

          </tr>

          <tr align="left">

            <td height="22" valign="bottom" class="listtext1"><div align="right"><font color="#000000" size="1">Town: </font></div></td>

            <td height="22" valign="bottom" class="listtext1"><font size=2>Kendal</font></td>

          </tr>

          <tr align="left">

            <td height="22" valign="bottom" class="listtext1"><div align="right"><font color="#000000" size="1">County: </font></div></td>

            <td height="22" valign="bottom" class="listtext1"><font size=2>Cumbria</font></td>

          </tr>

          <tr align="left">

            <td height="22" valign="bottom" class="listtext1"><div align="right"><font color="#000000" size="1">Postcode: </font></div></td>

            <td height="22" valign="bottom" class="listtext1"><font size=2>LA9 4PH</font></td>

          </tr>

          <tr align="left">

            <td height="22" valign="top" class="listtext1"> </td>

            <td height="22" valign="bottom" class="listtext1"> </td>

<td height="22" valign="top" class="listtext1">

 

 

 

 

      </td>

          </tr>

          <tr align="left">

            <td height="23" valign="bottom" class="listtext1"><div align="right"><font color="#000000" size="1">School Website: </font> </div></td>

            <td height="23" colspan="2" valign="bottom" class="listtext1"><font size="2"><A HREF='http://www.kendalnurseryschoolbrantfield.co.uk/'>http://www.kendalnurseryschoolbrantfield.co.uk/</A><BR></font></td>

 

 

 

The full HTML is below if required

 

 

 

 

 

 

 

 

 

 

<body bgcolor="#FEF5DE">

 

<CENTER>

  <table width="120" height="8" border="0" cellspacing="0" cellpadding="0">

    <tr>

      <td> </td>

    </tr>

  </table>

  <table width="615" height="363" border="1" cellpadding="0" cellspacing="0" bordercolor="#999999" bgcolor="#FFFFFF">

    <tr>

      <td width="611" height="361"><table width="96%" height="431" border="0" align="center" cellpadding="2" cellspacing="0" bgcolor="#FFFFFF">

        <tbody>

          <tr align="left">

            <td colspan="2" valign="top" class="pagetext2"></td>

            <td valign="top" class="pagetext2"></td>

          </tr>

          <tr align="left" valign="top">

            <td height="64" colspan="3" class="pagetext2"><div align="left"><font color="#FF0000" size="6" face="Arial, Helvetica, sans-serif">schools</font><font color="#009900" size="6" face="Arial, Helvetica, sans-serif"><font color="#FF6600">web</font><font color="#FF0000">directory</font></font><font color="#FF6600" size="4">.co.uk</font></div>

                <font size="2" color="#000000"><u>School

                  Information</u></font>

                </td>

          </tr>

          <tr align="left" valign="middle">

            <td height="23" valign="bottom" class="listtext1"><div align="left">

              <div align="right"><font color="#000000" size="1">Our Ref No : </font></div>

            </div></td>

            <td height="23" valign="bottom" class="pagetext2"><font size= 2 color = red >31443</font> </td>

            <td width="268" rowspan="6" class="pagetext2"><div align="center"> <a href="image-link.php?ref=31443"> <img src="images/180x120.jpg" width="180" height="120" border="1" /></a> <br />

            </div></td>

          </tr>

          <tr align="left">

            <td width="80" height="22" valign="bottom" class="listtext1"><div align="right"><font color="#000000" size="1">School: </font></div></td>

            <td width="227" valign="bottom" class="listtext1"><font size=2>Kendal Nursery School</font></td>

          </tr>

          <tr align="left">

            <td height="22" valign="bottom" class="listtext1"><div align="right"><font color="#000000" size="1">Street: </font></div></td>

            <td height="22" valign="bottom" class="listtext1"><font size=2>Queens Road</font></td>

          </tr>

          <tr align="left">

            <td height="22" valign="bottom" class="listtext1"><div align="right"><font color="#000000" size="1">Town: </font></div></td>

            <td height="22" valign="bottom" class="listtext1"><font size=2>Kendal</font></td>

          </tr>

          <tr align="left">

            <td height="22" valign="bottom" class="listtext1"><div align="right"><font color="#000000" size="1">County: </font></div></td>

            <td height="22" valign="bottom" class="listtext1"><font size=2>Cumbria</font></td>

          </tr>

          <tr align="left">

            <td height="22" valign="bottom" class="listtext1"><div align="right"><font color="#000000" size="1">Postcode: </font></div></td>

            <td height="22" valign="bottom" class="listtext1"><font size=2>LA9 4PH</font></td>

          </tr>

          <tr align="left">

            <td height="22" valign="top" class="listtext1"> </td>

            <td height="22" valign="bottom" class="listtext1"> </td>

<td height="22" valign="top" class="listtext1">

 

 

 

 

      </td>

          </tr>

          <tr align="left">

            <td height="23" valign="bottom" class="listtext1"><div align="right"><font color="#000000" size="1">School Website: </font> </div></td>

            <td height="23" colspan="2" valign="bottom" class="listtext1"><font size="2"><A HREF='http://www.kendalnurseryschoolbrantfield.co.uk/'>http://www.kendalnurseryschoolbrantfield.co.uk/</A><BR></font></td>

            </tr>

          <tr align="left">

            <td height="23" valign="bottom" class="listtext1"><div align="right"><font color="#000000" size="1">Alumni Website: </font> </div></td>

            <td height="23" colspan="2" valign="bottom" class="listtext1"><font size="2"><A HREF=''></A><BR></font></td>

            </tr>

          <tr align="left">

            <td height="23" valign="bottom" class="listtext1"><div align="right"><font color="#000000" size="1">Tel: </font> </div></td>

            <td height="23" valign="bottom" class="listtext1"><font size="2">01539 773626</font></td>

            <td height="23" valign="top" class="listtext1"><font size="2">click for <a href="http://uk8.multimap.com/map/browse.cgi?client=public&db=pc&addr1=&client=public&addr2=&advanced=&addr3=&pc=LA9 4PH" target="_blank" class="link_submenu">map</a></font></td>

          </tr>

          <tr align="left">

            <td height="23" valign="bottom" class="listtext1" ><div align="right" class="listtext1">Fax:</div></td>

            <td height="23" valign="bottom" class="listtext1"><font size="2">01539 773626</font></td>

            <td valign="top" class="pagetext2"></td>

          </tr>

          <tr align="left">

            <td height="22" valign="bottom" class="listtext1"><div align="right"><font color="#000000" size="1">Type:</font></div></td>

            <td height="22" valign="bottom" class="listtext1">

nur   <font size = -1 color = red>Independent</font></td>

            <td valign="top" class="pagetext2"><span class="tabletext1"><a href="i-spy360.php"></a></span></td>

          </tr>

          <tr align="left">

            <td height="22" valign="bottom" class="listtext1"><div align="right"><font color="#000000" size="1">LEA:</font></div></td>

            <td height="22" valign="bottom" class="listtext1">Cumbria</td>

            <td valign="top" class="pagetext2"><a href="i-spy360.php"></a></td>

          </tr>

          <tr align="left">

            <td height="23" valign="bottom" class="tabletext1"> </td>

            <td height="23" valign="bottom" class="pagetext2"> </td>

            <td valign="top" class="pagetext2"><div align="right"><a href="javascript: history.back(1)" class="link_submenu">BACK</a> </div></td>

             

<!-- http://www.multimap.com/map/browse.cgi?client=public&pc=NR2%204DX -->

          </tr>

          <tr align="left" valign="middle">

            <td height="74" colspan="3" class="pagetext2"><div  class="listtext2" align="center"><font face="Geneva, Arial, Helvetica, sans-serif" size="1"><br />

                  </font><font face="Geneva, Arial, Helvetica, sans-serif">To correct or change any of the above information</font><font face="Geneva, Arial, Helvetica, sans-serif" size="1"><span class="listtext2">please email us <a href="mailto:[email protected]?Subject=School Ref 31443, Kendal Nursery School, LA9 4PH - Updates!">here</a></span> </font></font></div>

              <p> </p>

              <div align="center"><font face="Geneva, Arial, Helvetica, sans-serif" size="1">Copyright

              © Deepspace Web Services Ltd 1999-2007, All rights reserved</font></div></td>

          </tr>

        </tbody>

      </table></td>

    </tr>

  </table>

  <p> </p>

</CENTER>

<p> </p>

</body>
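
One possible regular expression for the label/value rows is sketched below. It assumes every wanted row keeps the shape seen in the HTML above: a label cell containing `<font color="#000000" size="1">Label: </font>` closed by `</div></td>`, followed directly by the value cell. The function name extractFields() is made up, and the sample is two rows copied from the post. Rows with a different shape (for example "Our Ref No" and "Fax") are not matched by this pattern.

```php
<?php
// Collect "Label => value" pairs from rows shaped like the ones in the post.
function extractFields(string $html): array
{
    // Group 1: label text up to the colon. Group 2: raw content of the next cell.
    $pattern = '~<font color="#000000" size="1">([^<:]+):\s*</font>\s*</div></td>\s*<td[^>]*>(.*?)</td>~is';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $fields = [];
    foreach ($matches as $m) {
        // strip_tags() removes the <font>, <A> and <BR> tags around the value.
        $fields[trim($m[1])] = trim(strip_tags($m[2]));
    }
    return $fields;
}

// Two rows copied from the HTML in the post.
$sample = <<<'HTML'
<tr align="left">
  <td width="80" height="22" valign="bottom" class="listtext1"><div align="right"><font color="#000000" size="1">School: </font></div></td>
  <td width="227" valign="bottom" class="listtext1"><font size=2>Kendal Nursery School</font></td>
</tr>
<tr align="left">
  <td height="22" valign="bottom" class="listtext1"><div align="right"><font color="#000000" size="1">Postcode: </font></div></td>
  <td height="22" valign="bottom" class="listtext1"><font size=2>LA9 4PH</font></td>
</tr>
HTML;

$fields = extractFields($sample);
print_r($fields);
```

Matching on the label cell and then taking whatever is in the next cell avoids writing a separate expression per field. If the site changes its markup the pattern will simply stop matching, so checking that the expected keys are present before using them is a sensible precaution.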

 
