jjacquay712 Posted October 27, 2008

I need to extract the text from the first <p> tag in a web page for my spider. I'm using preg_match_all with this pattern: /<p>([^<]*)<\/p> but it's not working. I have no idea how to use regular expressions, so any help would be appreciated.
Jeremysr Posted October 27, 2008

The only problem with your regex is that [^<]* stops at the first '<', so it won't match a <p> tag that contains another tag (for example <b> or <a>). Also, since you only want the first paragraph, preg_match is enough; it stops after the first match. This should work:

    if (preg_match('/<p>(.*?)<\/p>/s', $text, $matches)) { // the s modifier lets . match newlines too
        $p_tag_text = $matches[1];
    }
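To see the difference, here is a small comparison of the two patterns. The $html string is made-up sample markup, not from a real page; its first paragraph contains a <b> tag, so [^<]* can never reach that paragraph's </p>:

```php
<?php
// Hypothetical sample markup: the first paragraph contains an inline <b> tag.
$html = "<html><body><p>Hello <b>world</b></p><p>Second</p></body></html>";

// Original pattern: [^<]* cannot cross the '<' of <b>, so the engine gives up
// on the first <p> and matches the next paragraph that contains no tags.
preg_match('/<p>([^<]*)<\/p>/', $html, $m1);

// Lazy pattern: .*? grows one character at a time until the first </p>.
preg_match('/<p>(.*?)<\/p>/s', $html, $m2);

echo $m1[1], "\n"; // Second
echo $m2[1], "\n"; // Hello <b>world</b>
```

In this example the original pattern doesn't fail outright; it quietly skips to the next tag-free paragraph, which is easy to mistake for "not working".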
msiekkinen Posted October 27, 2008

Quoting Jeremysr: "The only problem with your regex is that it won't work if there is a '<' inside the <p> tags."

A more robust regex, which also tolerates attributes and extra whitespace inside the tags, ignores case (i) and lets . match newlines (s):

    preg_match('~<\s*p\b[^>]*>(.*?)<\s*/\s*p\s*>~is', $text, $matches);

However, if you have something like

    <p> some text <p>More text</p> Final text </p>

you'll only capture up to "More text", which I imagine might not be what you want. A better approach is to use Tidy or PHP's DOM extension to find the first <p> element, so you can properly get all of its children.
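Here is a minimal sketch of the DOM approach, using PHP's bundled DOM extension (DOMDocument). The helper name first_paragraph_text() and the sample markup are hypothetical, and it assumes the DOM extension is enabled, as it is in most PHP builds:

```php
<?php
/**
 * Return the text of the first <p> element, or null if there is none.
 * first_paragraph_text() is a hypothetical helper name.
 */
function first_paragraph_text(string $html): ?string
{
    if (trim($html) === '') {
        return null; // loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    // Real pages are rarely valid HTML; collect libxml warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $paragraphs = $doc->getElementsByTagName('p');
    if ($paragraphs->length === 0) {
        return null;
    }
    // textContent joins the text of all descendant nodes,
    // so inline tags such as <b> or <a> are stripped away.
    return $paragraphs->item(0)->textContent;
}

$html = '<div><p class="intro">Some <b>bold</b> text</p><p>Second</p></div>';
echo first_paragraph_text($html), "\n"; // Some bold text
```

Unlike the simple regexes, this also finds <p class="intro">. If you need the inner HTML rather than plain text, you could pass each child node of the paragraph to $doc->saveHTML().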
ghostdog74 Posted October 28, 2008

Quoting jjacquay712: "I have no idea how to use Regular Expressions, so any help would be appreciated."

There's no need to use regex if you're not familiar with it. PHP has plenty of string functions you can use instead, such as strpos:

    $startpos = strpos($data, "<p>");
    $endpos = strpos($data, "</p>", (int) $startpos); // look for the close only after the open
    if ($startpos !== false && $endpos !== false) {
        $start = $startpos + strlen("<p>"); // skip past the opening tag
        echo substr($data, $start, $endpos - $start);
    }
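If you go the string-function route, it may be worth wrapping it in a small helper. This sketch (text_between() is a hypothetical name) uses stripos, the case-insensitive version of strpos, so <P> and <p> both match, and it returns null instead of a wrong substring when a marker is missing. Like the original regex, it only finds a bare <p>, not <p class="...">:

```php
<?php
/**
 * Return the text between the first $open marker and the next $close marker
 * after it (case-insensitive), or null if either marker is missing.
 * text_between() is a hypothetical helper name.
 */
function text_between(string $haystack, string $open, string $close): ?string
{
    $start = stripos($haystack, $open);
    if ($start === false) {
        return null;
    }
    $start += strlen($open); // skip past the opening marker
    $end = stripos($haystack, $close, $start); // only search after the open
    if ($end === false) {
        return null;
    }
    return substr($haystack, $start, $end - $start);
}

$data = '<html><body><p>First paragraph</p><p>Second</p></body></html>';
echo text_between($data, '<p>', '</p>') ?? '(not found)', "\n";         // First paragraph
echo text_between($data, '<table>', '</table>') ?? '(not found)', "\n"; // (not found)
```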