
Regexp For Getting First <P> Tag?


AA_Haider


I have a multi-line text and I want to get only the first paragraph, like this:

......................................…

<p>this is a paragraph

number 1</p>

<p>this is a paragraph

number 2</p>

......................................…

I want to get only

..................................

this is a paragraph

number 1

................................

How can I do that with a RegExp?

Please tell me.


I have this regex and I am using the preg_match function:

/(<p[^>]*>.*?<\/p>)/m

but this is not working.

I do not know why. If each paragraph is on a single line, it works fine.

I need it for paragraphs that span multiple lines.

Edited by AA_Haider

While I agree with Jessica that a DOM parser is the best way to go, there are a couple of things you can try with the regexp.

 

1) The dot in your pattern will NOT match a newline unless you add the "s" modifier.

2) Your pattern will not match a paragraph tag that is in uppercase, unless you use the "i" modifier.

3) Since you are not trying to anchor at the beginning or end of the string, you do NOT really need the "m" modifier.

4) Since you are looking for the match of the entire pattern, you do not need the capturing parentheses.

 

/<p[^>]*>.*?<\/p>/si

might give you better luck.
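
As a rough illustration of the pattern above, here is a minimal sketch using preg_match on the sample text from the question. The variable names are made up for the example, and the capturing group around `.*?` is my own addition (not part of the pattern above) so that only the text between the tags is returned, which is what the original post asked for.

```php
<?php
// Sample input modelled on the question: two paragraphs spanning several lines.
$html = "<p>this is a paragraph\n\nnumber 1</p>\n\n<p>this is a paragraph\n\nnumber 2</p>";

// "s" lets the dot match newlines; "i" also accepts <P>...</P>.
// The group (.*?) is an addition for this example: it captures the inner text only.
$pattern = '/<p[^>]*>(.*?)<\/p>/si';

$firstParagraph = null;
if (preg_match($pattern, $html, $matches) === 1) {
    // $matches[0] holds the whole <p>...</p> match; $matches[1] holds the inner text.
    $firstParagraph = $matches[1];
}

echo $firstParagraph, "\n";
```

preg_match stops at the first match, so the second paragraph is never looked at. One caveat: `<p[^>]*>` will also match other tags that start with "p", such as `<pre>`; `<p\b[^>]*>` is a stricter variant if that matters for your input.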

 

 

P.S. If you are attempting to scrape a site without permission of the site owner, please ignore my advice and repent from your evil ways.
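
For the DOM parser route mentioned at the top of this reply, something along these lines should work. This is only a sketch, assuming PHP's bundled DOM extension is enabled; the variable names are hypothetical, and real pages may need extra care with character encoding.

```php
<?php
// Same sample input as in the question.
$html = "<p>this is a paragraph\n\nnumber 1</p>\n\n<p>this is a paragraph\n\nnumber 2</p>";

// Collect libxml parse warnings internally instead of printing them,
// since real-world markup is rarely perfectly valid.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// getElementsByTagName returns the <p> elements in document order;
// item(0) is the first one, or null if there is none.
$firstP = $doc->getElementsByTagName('p')->item(0);
$firstParagraph = $firstP !== null ? $firstP->textContent : null;

echo $firstParagraph, "\n";
```

Unlike the regexp, this does not care about upper or lower case tags, attributes, or line breaks inside the paragraph, and textContent also drops any nested tags from the result.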

