dmaksimov Posted July 4, 2011

I have some HTML code and I'm trying to extract the text between two HTML tags with preg_match. I have the following regular expression:

$regex = "/class=\"BVRRReviewText description\">[[:alnum:][:punct:][:word:]\s]+<\/span>/";

This is one of the matches that comes back. I'm trying to retrieve the text between the tags:

class="BVRRReviewText description">Bought this hat as a gift for two men. The fit was very snug for one of them, and too small for the other. Might work for a child.</span><span class="BVRRReviewTextSuffix">"</span>

Why does it stop at the second </span> and not the first? I've tried multiple methods and haven't got it to work. Also, is it possible to return just the text without the HTML tags?

Thanks.
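The match runs on to the second </span> because two things work together. First, + is greedy, so the engine consumes as much as it can. Second, [:punct:] covers <, >, /, = and ", so the character class can swallow the markup itself. The engine reaches the end of the input and then backtracks only as far as the last </span> it can find. Here is a minimal sketch that reproduces this. The $html string is an assumption, a shortened version of the sample above.

```php
<?php
// Hypothetical sample: the markup from the question, shortened.
$html = '<span class="BVRRReviewText description">Bought this hat as a gift.</span>'
      . '<span class="BVRRReviewTextSuffix">"</span>';

// The pattern from the question: greedy "+" over a class that includes [:punct:].
$greedy = "/class=\"BVRRReviewText description\">[[:alnum:][:punct:][:word:]\s]+<\/span>/";

// [:punct:] matches the characters the tags are made of.
var_dump(preg_match('/^[[:punct:]]+$/', '</>="')); // int(1)

// The greedy match therefore extends to the LAST </span> in the input.
preg_match($greedy, $html, $m);
echo $m[0], "\n";
```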
xyph Posted July 4, 2011

Use a lazy quantifier (+? instead of +), so the match ends at the first </span> instead of the last. Watch out for embedded span tags, though. A lazy match stops at the first closing tag it finds, even if that tag belongs to a nested span. You're better off parsing this with a PHP DOM parser.
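Here is a minimal sketch of the DOM route using PHP's bundled DOM extension (DOMDocument plus DOMXPath). It assumes the extension is enabled, which it usually is, and that review spans carry the class BVRRReviewText as in the question. The function name extractReviews is hypothetical. textContent returns the element's text with any nested tags removed, which sidesteps the embedded-span problem.

```php
<?php
// Hypothetical helper: return the plain text of every BVRRReviewText span.
function extractReviews(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match spans whose class list contains the exact token "BVRRReviewText"
    // (so "BVRRReviewTextSuffix" is not matched).
    $nodes = $xpath->query(
        '//span[contains(concat(" ", normalize-space(@class), " "), " BVRRReviewText ")]'
    );

    $reviews = [];
    foreach ($nodes as $node) {
        // textContent concatenates all descendant text, dropping nested tags.
        $reviews[] = trim($node->textContent);
    }
    return $reviews;
}

// Usage: the nested <span> would cut a lazy regex short, but not this.
print_r(extractReviews(
    '<div><span class="BVRRReviewText description">Great <span>fit</span>, small.</span>'
    . '<span class="BVRRReviewTextSuffix">"</span></div>'
));
```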
derwert Posted July 5, 2011

dmaksimov, you need to use a capturing group to specify which part you want to get. You create one by putting () around the part of the pattern you want to capture. If you pass $matches as the third argument to preg_match, the captured text ends up in $matches[1] without the surrounding tags. You also need to make your pattern non-greedy by adding a question mark after the quantifier. So if you wanted to use your pattern, you would change it to:

$regex = "/class=\"BVRRReviewText description\">([[:alnum:][:punct:][:word:]\s]+?)<\/span>/";

When I parse data with preg functions I usually write a quick and dirty pattern. In this case I'd do something along the lines of:

$regex = '~class="BVRRReviewText description"\>(.*?)\</span\>~';

Also note that . \ + * ? [ ^ ] $ ( ) { } | are metacharacters in PCRE, so escape them, and whichever delimiter you use, when you mean them literally. preg_quote() additionally escapes = ! < > : -. These only become special in particular positions, for example after (? or inside a character class, so escaping them is harmless but optional. A fully escaped version of the pattern would be something along the lines of:

$regex = '~class\="BVRRReviewText description"\>([[:alnum:][:punct:][:word:]\s]+?)\</span\>~';
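Here is a minimal sketch that puts the capture-group and lazy-quantifier advice together. It assumes every review uses the same class attribute as in the question. The function name reviewTexts and the sample $html are hypothetical. The sketch builds the quick-and-dirty pattern with preg_quote() for the literal part and uses preg_match_all so every review on the page is returned, with group 1 holding just the text. It also adds the s modifier, which is not in the post. Without it, . does not match a newline, whereas the original \s-based class did.

```php
<?php
// Hypothetical helper: return the captured text of every review on the page.
function reviewTexts(string $html): array
{
    // Escape the literal class value for use inside a "~"-delimited pattern.
    $class = preg_quote('BVRRReviewText description', '~');
    // (.*?) is lazy and capturing; "s" lets "." match newlines too.
    $pattern = '~class="' . $class . '">(.*?)</span>~s';

    preg_match_all($pattern, $html, $matches);
    // $matches[0] holds the full matches (with tags), $matches[1] just group 1.
    return $matches[1];
}

// Hypothetical sample: one review spanning two lines, a suffix span, a second review.
$html = '<span class="BVRRReviewText description">Bought this hat' . "\n"
      . 'as a gift.</span><span class="BVRRReviewTextSuffix">"</span>'
      . '<span class="BVRRReviewText description">Too small.</span>';

print_r(reviewTexts($html));
```

Like any lazy pattern, this stops at the first </span>, so a review containing its own nested span would be cut short at that tag. The DOM parser approach avoids that.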