What is wrong with my regex preg_match? (I'm new to it..)!

physaux · January 19, 2010

$regex = '/<form.+>(.+?)<\/form/';
preg_match($regex, $pagehtmlcode, $output);
echo count($output) . '<br>';
foreach ($output as $instanceoutput) {
	echo "[" . $instanceoutput . "]<br>";
}

As you can see, I am trying to print out my results, but I am not getting anything printed out. My "counter" simply prints out 0.

I am trying to grab all the text between form tags; for example:

*There are line breaks in "WANT THIS TEXT", and there is more than one "<form>" pair in the code. I am sure of that, as I have printed it out.

So, is there a problem in my regex that I don't see :confused:

Thanks!!

cags · January 19, 2010

If your pattern needs to match across line breaks, you will need to add the s modifier so that the dot also matches newlines. You also say there is more than one form element. Once you have added the s modifier you will probably find that your pattern returns only one result: you have used .+ in the opening tag without making it lazy, so it will match much more than you want. In place of your first .+ you would be much better off using [^>]*, so that you only match up to the greater-than sign that closes the tag.
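A minimal sketch of both points, run against a made-up two-form page (the contents of $html are an assumption for illustration, not the original poster's markup):

```php
<?php
// Hypothetical sample page: two forms, each spanning several lines.
$html = "<form action=\"a.php\">\nfirst\n</form>\n<p>between</p>\n"
      . "<form action=\"b.php\">\nsecond\n</form>";

// No s modifier: "." cannot cross a newline, so nothing matches.
$noModifier = preg_match_all('/<form.+>(.+?)<\/form/', $html, $m1);

// s modifier, but greedy ".+" in the opening tag: it runs on to the last
// usable ">", so both forms collapse into a single match.
$greedy = preg_match_all('/<form.+>(.+?)<\/form/s', $html, $m2);

// [^>]* stops at the ">" that closes the opening tag, so each form
// is matched on its own.
$bounded = preg_match_all('/<form[^>]*>(.+?)<\/form/s', $html, $m3);

echo $noModifier, ' ', $greedy, ' ', $bounded, "\n";
```

With this sample input the three calls report 0, 1 and 2 matches respectively.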

physaux · January 19, 2010

Ok so, I'm not really exactly sure what you said. How do I apply a modifier? Here are some changes I thought I should do:

-I tried to change the regex how you said :-\

-I changed "preg_match" to "preg_match_all"

-I added a flag, "PREG_SET_ORDER"

$regex = '/<form.+>([^>]*?)<\/form/';
preg_match_all($regex,$pagehtmlcode,$output, PREG_SET_ORDER);

What exactly do you mean by all those changes? I don't really understand how I should change my current regex expression :shrug: :confused:

**The form tag can't just match to the next "<", because there are more tags inside of the form tags, i.e "<input", and so on

cags · January 19, 2010

Well, I didn't think the first change was too complicated: as I said, replace the first .+ with the section of pattern I provided. If you don't know how to add a modifier you are very much fighting an uphill battle. I suggest you read through the official manual for PCRE; modifiers are covered fairly early on. A modifier is placed after the closing delimiter of the pattern.
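To illustrate where a modifier goes, here is a small sketch; the $text sample and the <b> tag are assumptions chosen only to keep it short:

```php
<?php
// Hypothetical input with a line break inside the text we want.
$text = "<b>line one\nline two</b>";

// Layout of a pattern string: opening delimiter "/", the pattern itself,
// closing delimiter "/", and then any modifiers.
$withoutS = preg_match('/<b>(.+)<\/b>/', $text, $a);  // "." stops at the newline
$withS    = preg_match('/<b>(.+)<\/b>/s', $text, $b); // s lets "." match newlines too

var_dump($withoutS, $withS);
```

The first call finds no match and the second finds one, with the two lines of text in $b[1].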

physaux · January 20, 2010

aha thank you for the great resource, I read some of it, then read what you said, then was confused again. I took a short break, read it again, made your changes, and now it is working perfectly! Thanks for your patience in helping me! :)

Here is what I did:

$regex = '/<form[^>]*>(.+?)<\/form/s';

I also switched to preg_match_all.

yay!!
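For anyone following along, the pieces from this thread can be put together roughly as follows. This is only a sketch: the value of $pagehtmlcode is a made-up stand-in for the fetched page, and the trim/htmlspecialchars calls are additions for readable output.

```php
<?php
// Hypothetical stand-in for the fetched page.
$pagehtmlcode = "<form id=\"one\">\n<input name=\"a\" />\n</form>\n"
              . "<form id=\"two\">\n<input name=\"b\" />\n</form>";

$regex = '/<form[^>]*>(.+?)<\/form/s';

// PREG_SET_ORDER groups the results per match:
// $set[0] is the whole match, $set[1] is the captured form body.
$count = preg_match_all($regex, $pagehtmlcode, $output, PREG_SET_ORDER);

echo $count, "<br>\n";
foreach ($output as $set) {
    // Escape the captured markup so a browser shows it as text.
    echo '[', htmlspecialchars(trim($set[1])), "]<br>\n";
}
```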

physaux · January 20, 2010

Ah ok, so I have run into another problem, albeit a small one. Now, instead of searching for <form> </form> tags, I need to search for <input ..../> tags. Here is my regex for that:

$regex = '/<input[^>]*>(.+?)\/>/s';

It works great! But now I want to get more specific: I want it to include only input fields that do not contain the text type="hidden". Any idea how I can modify my regex to accomplish this? I am only doing this so that I can count the number of visible input fields per form. Perhaps DOMDocument would be better for this? I don't know; what is your opinion?

cags · January 20, 2010

Generally speaking, using DOMDocument/DOMXPath is a more correct way of parsing HTML. For the regex route, I think you would need a negative lookahead assertion to achieve that aim. Completely untested, but...

$regex = '/<input(?!type="hidden")[^>]*>(.+?)\/>/s';
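One caveat on the untested pattern: the lookahead is evaluated directly after "<input", where the next character is normally a space, so it may never reject a tag. A possible variant lets the lookahead scan the rest of the tag. Since an input tag has no body, the sketch below also drops the trailing (.+?)\/> part, which is not needed for counting. The $formHtml sample is hypothetical, the DOM version assumes PHP's DOM extension is available, and both checks match type="hidden" literally (same case, double quotes).

```php
<?php
// Hypothetical form markup for illustration.
$formHtml = '<form><input type="text" name="user" />'
          . '<input type="hidden" name="token" />'
          . '<input type="password" name="pass" /></form>';

// Regex variant: the lookahead scans the rest of the tag, up to its ">",
// and rejects the tag if type="hidden" appears anywhere inside it.
$visibleByRegex = preg_match_all(
    '/<input\b(?![^>]*type="hidden")[^>]*>/i',
    $formHtml,
    $matches
);

// DOM variant: parse the markup, then select every input element whose
// type attribute is not exactly "hidden".
$dom = new DOMDocument();
$dom->loadHTML($formHtml);
$xpath = new DOMXPath($dom);
$visibleByDom = $xpath->query('//input[not(@type="hidden")]')->length;

echo $visibleByRegex, ' ', $visibleByDom, "\n";
```

Both approaches count two visible fields for this sample. The DOM route does not depend on attribute order or on how the tag is closed, which is part of why it is usually the safer choice.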
