
php pull html from another page


digitalrjs


This code can read a directory and list the files in it. Do I need to set the page up as strings to do the same thing with it?

 

<?php
$dirpath = "images/news/";
$limit   = 6;        // number of characters of each file name to show
$max     = 3;        // number of files to list
$files   = array();
$count   = 0;

$dl = opendir($dirpath);
while (false !== ($file = readdir($dl)) && $count < $max) {
    // skip sub directories (this also skips "." and "..")
    if (!is_dir($dirpath . $file)) {
        $files[$file] = filemtime($dirpath . $file);   // store modification time in array
        $count++;
    }
}
closedir($dl);

asort($files);                                     // sort array by date

foreach ($files as $fn => $mod)
{
    echo "
<br>
<table>
<tr>
<td align='left'>
<img src='images/pdf.jpg' alt='adobe pdf' align='left' height='24' hspace='8' vspace='0' width='25'>
<a href='$dirpath$fn'>" . substr($fn, 0, $limit) . "</a>
</td>
</tr>
</table>
";
}
?>

Here is the page this.html.

 

<table border="0" width="517" height="160" align="left">
<tbody>
<tr>
<td colspan="3" valign="top">
<strong>August 19, 2008 <br>
WEST WATEREE CHRONICLE <br>
</strong><u><font color="#0000ff"><a href="images/emailupdates/duipdf.pdf" target="_blank" title="GENERAL 2008 LEGISLATIVE UPDATE">TOUGHER PENALTIES COULD REDUCE DUI FATALITIES - GUEST EDITORIAL BY JOEL LOURIE</a></font></u> 
<p>
<strong>August 8, 2008 <br>
</strong><u><font color="#0000ff"><a href="images/emailupdates/joellouriepdf.pdf" target="_blank" title="GENERAL 2008 LEGISLATIVE UPDATE">2008 LEGISLATIVE UPDATE</a></font></u>

</p>
<strong>July 17, 2008</strong><br />
   <em><strong>THE COUNTRY CHRONICLE<br /></strong></em><u><font color="#0000ff"><a href="images/emailupdates/pdfonline.pdf" target="_blank" title="GENERAL ASSEMBLY TAKES STAND AGAINST DARFUR - GUEST EDITORIAL BY JOEL LOURIE">GENERAL ASSEMBLY TAKES STAND AGAINST DARFUR - GUEST EDITORIAL BY JOEL LOURIE</a></font></u><a href="images/emailupdates/nenewscigtax.pdf" target="_blank" title="The Northeast News - Raise the Cigarette Tax"><br />
<br>
<strong>July 17, 2008</strong><br />
   <em><strong>THE COUNTRY CHRONICLE<br /></strong></em><u><font color="#0000ff"><a href="images/emailupdates/pdfonline.pdf" target="_blank" title="GENERAL ASSEMBLY TAKES STAND AGAINST DARFUR - GUEST EDITORIAL BY JOEL LOURIE">GENERAL ASSEMBLY TAKES STAND AGAINST DARFUR - GUEST EDITORIAL BY JOEL LOURIE</a></font></u><a href="images/emailupdates/nenewscigtax.pdf" target="_blank" title="The Northeast News - Raise the Cigarette Tax"><br />
<br>
<strong>July 17, 2008</strong><br />
   <em><strong>THE COUNTRY CHRONICLE<br /></strong></em><u><font color="#0000ff"><a href="images/emailupdates/pdfonline.pdf" target="_blank" title="GENERAL ASSEMBLY TAKES STAND AGAINST DARFUR - GUEST EDITORIAL BY JOEL LOURIE">GENERAL ASSEMBLY TAKES STAND AGAINST DARFUR - GUEST EDITORIAL BY JOEL LOURIE</a></font></u><a href="images/emailupdates/nenewscigtax.pdf" target="_blank" title="The Northeast News - Raise the Cigarette Tax"><br />
   </a>
   </td>
   </tr>
   </tbody>
   </table>

 

 


I need PHP code that pulls the list of PDFs from this page the way it is, but only the first 3.

Where are you getting this file from? It's horribly formatted. There are duplicate entries and some anchor tags that aren't even closed.

 

You're gonna wanna use regex to parse this file, but I'm not going to attempt it with such inconsistent formatting. There are duplicate titles with different anchor text, duplicate anchors, and as stated before, unclosed anchor tags.

 

If you can give me a clean source, I can write a regex to scrape the URLs.
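
In the meantime, here is a rough sketch of what such a regex could look like. It assumes you only need the link URLs (not the dates or headlines around them) and that every PDF link has a quoted href ending in .pdf. The function name first_pdf_links and the sample markup are made up for illustration. Duplicates are dropped before the first three are taken, and since only the opening tag is matched, the unclosed anchors don't matter here.

```php
<?php
// Hypothetical helper: returns the first $max unique PDF URLs found in $html.
function first_pdf_links(string $html, int $max = 3): array
{
    // Capture the href value of every opening <a> tag that points at a .pdf file.
    preg_match_all('~<a\s[^>]*href=["\']([^"\']+\.pdf)["\']~i', $html, $matches);

    // Drop repeated URLs, keeping the first occurrence in document order.
    $unique = array_values(array_unique($matches[1]));

    return array_slice($unique, 0, $max);
}

// Made-up sample markup; for the real page you would load it instead,
// e.g. $html = file_get_contents('this.html');
$html = <<<HTML
<a href="images/emailupdates/a.pdf" target="_blank">First</a>
<a href="images/emailupdates/b.pdf" target="_blank">Second</a>
<a href="images/emailupdates/a.pdf" target="_blank">First again</a>
<a href="images/emailupdates/c.pdf" target="_blank"><br />
<a href="images/emailupdates/d.pdf" target="_blank">Fourth</a>
HTML;

foreach (first_pdf_links($html) as $url) {
    $safe = htmlspecialchars($url, ENT_QUOTES);
    echo "<a href='$safe'>" . htmlspecialchars(basename($url)) . "</a><br>\n";
}
```

If you need the date and headline text as well, a regex gets fragile fast on markup like this, so treat the above as a starting point only.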

