Lukus Posted January 29, 2008

Hi guys

I've had a search around, but haven't found anything that suits my needs, and my regex isn't good enough to adapt what I have found. So I'd appreciate any advice you can give me to help with my problem.

Basically, I've written a script which pulls HTML data from 2000+ pages, and then puts the data for each page into a new file with a new template wrapped around it. Everything works fine, except now I'd like to tidy the data I've output by removing any absolute links found, but leaving relative links intact.

Here's an example of the content I'm dealing with (it's all stored in a variable, say $content):

[<a href="http://www.one-absolute-link.com/test.html">site map </a>]</em></p>
[<a href="http://www.absolute-link.com/test.html">comments</a>]
[<a href="http://www.another-absolute-link.com/test.html">search </a>]</em></p>
<h4 align="left"><img src="../images/arrow.gif" width="15" height="12"> Header</h4>
<ul> <li><a href="members.html">Text1</a></li> </ul>
<ul> <li><a href="agendas/index.html">Text2</a><br> </li>
<li><a href="minutes/index.html">Text3</a><br> </li>
<li><a href="papers/index.html">Text4</a></li>
<li><a href="reports/index.html">Text5</a><br> </li> </ul>
<p align="left">[<a href="http://www.one-absolute-link.com/test.html">Abs Link</a>] [<a href="http://www.one-absolute-link.com/test.html">Abs Link]</a></p>

Note how messy the HTML is. This is one of the problems I was having, as links often span multiple lines. I'd like to be able to run a function on $content which removes any absolute links it finds, but leaves relative links intact.
My ideal output would be:

<h4 align="left"><img src="../images/arrow.gif" width="15" height="12"> Header</h4>
<ul> <li><a href="members.html">Text1</a></li> </ul>
<ul> <li><a href="agendas/index.html">Text2</a><br> </li>
<li><a href="minutes/index.html">Text3</a><br> </li>
<li><a href="papers/index.html">Text4</a></li>
<li><a href="reports/index.html">Text5</a><br> </li> </ul>
<p align="left"></p>

(I still expect the output HTML to be just as messy and unformatted, but this can't really be helped when dealing with so many pages.)

If anyone could point me in the right direction I'd be extremely grateful.

Thanks, and good morning
Luke
rajivgonsalves Posted January 29, 2008

Try this out. It needs more work, but you can give it a shot:

echo preg_replace('#<a\s+href=["\']?http://[^>]*>.*?</a>#is', '', $content);
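To make the suggestion above easier to try, here is a minimal sketch wrapped in a function. The name remove_absolute_links is hypothetical (it does not appear in the thread), and the pattern rests on a few assumptions: absolute links start with http:// or https://, the href may come after other attributes, and no attribute value contains a ">" character. The "s" modifier lets "." match newlines, which is what allows links that span several lines to be caught, and "i" makes the match case-insensitive.

```php
<?php
// Hypothetical helper, not from the thread: removes whole <a>...</a> elements
// whose href starts with http:// or https://. Relative links are left alone.
function remove_absolute_links(string $content): string
{
    // <a\s+              opening tag followed by whitespace (may be a newline)
    // (?:[^>]*?\s)?      optional other attributes before href
    // href\s*=\s*["']?   the href attribute, quoted or unquoted
    // https?://[^>]*>    an absolute URL, then the rest of the opening tag
    // .*?</a>            link text up to the first closing tag
    $pattern = '#<a\s+(?:[^>]*?\s)?href\s*=\s*["\']?https?://[^>]*>.*?</a>#is';

    $result = preg_replace($pattern, '', $content);

    // preg_replace() returns null on error; fall back to the original text.
    return $result === null ? $content : $result;
}

// Sample input assumed for illustration: one absolute link that spans two
// lines, followed by one relative link.
$content = "[<a href=\"http://www.absolute-link.com/test.html\">comments\n</a>]\n"
         . "<li><a href=\"members.html\">Text1</a></li>";

echo remove_absolute_links($content);
```

This only strips the anchor elements themselves. The square brackets around each link, and stray tags such as </em></p> that the ideal output above also drops, would need a separate pass. For HTML this messy, a regex will miss some cases (protocol-relative URLs, for instance), so it is worth spot-checking a handful of the generated pages.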