schapel Posted June 28, 2009

Hey all, I've run into a little problem with a script I've been working on. The script grabs a bunch of HTML from a separate web page and then displays the results on my page. I have already put in some basic filtering using PHP's strip_tags function so that only A, TR, TD, TH and P are allowed as HTML tags when the code is put in.

My next step is to convert the links that come back so they carry a specific absolute URL prefix. For example, the links are returned as <a href="/test/blabla.html">text link</a>, which matches how they appear on the page they were grabbed from. I need to loop through all the A tags and add the source site's URL in front of each HREF so that the links actually work when clicked. Left as is, each link behaves as if the target were located on my site, which it isn't.

So my question is: is there a simple function for this in any of the major PHP frameworks, or in core PHP, that I have missed? I found some long code examples that do this, but I was hoping there is a simple function I can use instead.

Thanks!

Link to comment https://forums.phpfreaks.com/topic/164009-filtering-scraped-content-with-php-add-absolute-url-to-relative-url/
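For readers following along, the filtering step described in the first post might look roughly like the sketch below. The `$scraped` fragment is made up for illustration; only the list of allowed tags comes from the post.

```php
<?php
// Hypothetical scraped fragment; in the thread it comes from another site.
$scraped = '<div><p>Intro <b>bold</b></p><table><tr><th>Name</th>'
         . '<td><a href="/test/blabla.html">text link</a></td></tr></table></div>';

// The second argument lists the tags to keep; all other tags are removed,
// but the text inside them stays.
$filtered = strip_tags($scraped, '<a><tr><td><th><p>');

echo $filtered, "\n";
```

Note that strip_tags leaves the attributes of allowed tags untouched, so the relative href values pass through unchanged and still need to be rewritten.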
dzelenika Posted June 28, 2009

Take a look at PHP's DOM extension. An HTML page can be handled much like an XML document, and the easiest way to change XML is through the DOM.
schapel Posted June 28, 2009 Author Share Posted June 28, 2009 I'm using regular expressions to scrape the html I want to begin with, as from what I understood it is the fastest way. Wouldn't the output need to be converted to XML first, and then run through DOM parameters? It seems like a very indirect way of doing it with some unnecessary steps, but then again I'm not really sure... Link to comment https://forums.phpfreaks.com/topic/164009-filtering-scraped-content-with-php-add-absolute-url-to-relative-url/#findComment-865204 Share on other sites More sharing options...
cunoodle2 Posted June 28, 2009

Maybe I'm missing something here, but couldn't you just do something like this?

<?php
$url = "http://www.SiteYouWantToAdd.Com";
// Concatenate $url: variables are not expanded inside single-quoted strings.
$text = str_replace('<a href="/', '<a href="' . $url . '/', $text);
?>
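A plain string replacement only catches links written exactly as `<a href="/`. If the scraped markup also uses single quotes or puts other attributes before href, a regular expression is one possible variant. The sketch below is an assumption-laden illustration, not a robust HTML parser: the sample `$text` is invented, and the pattern deliberately skips absolute and protocol-relative (`//host/...`) links.

```php
<?php
$url  = 'http://www.SiteYouWantToAdd.Com';
// Made-up sample input covering a few link spellings.
$text = '<a href="/test/blabla.html">one</a> '
      . '<a class="x" href=\'/two.html\'>two</a> '
      . '<a href="http://example.org/">absolute</a> '
      . '<a href="//cdn.example.org/x">protocol-relative</a>';

// Capture everything up to the opening quote of href, then require a single
// leading slash. (?!/) keeps protocol-relative URLs from being rewritten.
$pattern = '~(<a\b[^>]*?\bhref\s*=\s*["\'])/(?!/)~i';

// ${1} re-inserts the captured prefix; the braces keep it from being
// misread if the replacement text happened to start with a digit.
$text = preg_replace($pattern, '${1}' . $url . '/', $text);

echo $text, "\n";
```

Only the first two links are rewritten; the absolute and protocol-relative ones are left alone.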
dzelenika Posted June 28, 2009

schapel wrote: "I'm using regular expressions to scrape the html I want to begin with, as from what I understood it is the fastest way. Wouldn't the output need to be converted to XML first, and then run through DOM parameters? It seems like a very indirect way of doing it with some unnecessary steps, but then again I'm not really sure..."

Does this look like a complicated or indirect way to solve your problem?

<?php
$doc = new DOMDocument;
// loadHTMLFile() tolerates ordinary HTML; load() expects well-formed XML.
$doc->loadHTMLFile('page.html');
$items = $doc->getElementsByTagName('a');
// Walk every <a> element and overwrite its href attribute.
for ($i = 0; $i < $items->length; $i++) {
    $items->item($i)->setAttribute('href', '/test/blabla.html');
}
?>
schapel (Author) Posted June 28, 2009

No, I was just asking. That script doesn't really fit what I'm doing, but I get the gist of what you're saying. I already have the scraped content stored in a string variable, nicely filtered, and I would need to ADD a partial string to the beginning of each HREF attribute rather than just cycle through and replace all of them. I suppose I'll have to read up more on DOM. Thanks for your help.
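For the case described in the last post (content already in a string, prefix to be added rather than the whole href replaced), a DOM-based approach could be sketched as follows. The `$base` URL and the `$html` fragment are placeholders, and the wrapper `<div>` is just one way to get the fragment back out without the `<html>` and `<body>` elements the parser adds. Treat it as a starting point under those assumptions, not a finished solution.

```php
<?php
// Hypothetical base URL and scraped fragment, for illustration only.
$base = 'http://www.example.com';
$html = '<p><a href="/test/blabla.html">text link</a> '
      . '<a href="http://other.example/">absolute</a></p>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
// loadHTML() parses an HTML string directly; no XML conversion is needed.
$doc->loadHTML('<div>' . $html . '</div>');

foreach ($doc->getElementsByTagName('a') as $a) {
    $href = $a->getAttribute('href');
    // Prefix only root-relative links such as "/test/blabla.html";
    // leave absolute and protocol-relative ("//host/...") links alone.
    if (strpos($href, '/') === 0 && strpos($href, '//') !== 0) {
        $a->setAttribute('href', $base . $href);
    }
}

// Serialize only the children of the wrapper <div>.
$out = '';
foreach ($doc->getElementsByTagName('div')->item(0)->childNodes as $child) {
    $out .= $doc->saveHTML($child);
}

echo $out, "\n";
```

One caveat: the filtered content in this thread contains TR, TD and TH tags without an enclosing TABLE, and an HTML parser may handle such fragments differently from a browser, so it is worth checking the output on real data, or rewriting the links before running strip_tags.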
Archived
This topic is now archived and is closed to further replies.