
What is wrong with my regex preg_match? (I'm new to it..)!


physaux


$regex = '/<form.+>(.+?)<\/form/';
preg_match($regex, $pagehtmlcode, $output);
echo count($output) . '<br>';
foreach ($output as $instanceoutput) {
    echo "[" . $instanceoutput . "]<br>";
}

 

As you can see, I am trying to print out my results, but I am not getting anything printed out. My "counter" simply prints out 0.

I am trying to grab all the text between form tags; for example:

<form property1="value1" ...>WANT THIS TEXT</form ...>

*There are line breaks in "WANT THIS TEXT", and there is more than one <form> pair in the code. I am sure of that, as I have printed it out.

 

So, is there a problem in my regex that I don't see :confused: :confused:

 

Thanks!!

If there are line breaks that you wish to match in your pattern, you will need to add the s modifier, which lets . match newlines as well. You also state that there is more than one form element. Once you have added the s modifier, you will probably find that your pattern matches only one result, because you have used .+ in the opening tag without making it lazy, so it will match much more than you want. In place of your first .+ you would be much better off using [^>]*, so that you only match up to the greater-than sign that closes the tag.
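To make the difference concrete, here is a rough sketch comparing the three patterns. The sample HTML in $pagehtmlcode is made up for illustration (it is not the poster's real page), and the variable names are hypothetical.

```php
<?php
// Hypothetical sample page: two forms, each with line breaks inside.
$pagehtmlcode = "<form action=\"a.php\">\nFIRST\n</form>\n"
              . "<form action=\"b.php\">\nSECOND\n</form>";

// Original pattern: without the s modifier, "." cannot cross a line break.
$original = '/<form.+>(.+?)<\/form/';
// s modifier only: the greedy ".+" runs past the first ">" and spans both forms.
$greedy = '/<form.+>(.+?)<\/form/s';
// s modifier plus [^>]*: the opening tag stops at its own ">".
$fixed = '/<form[^>]*>(.+?)<\/form/s';

// preg_match_all returns the number of full-pattern matches.
$countOriginal = preg_match_all($original, $pagehtmlcode, $m1);
$countGreedy   = preg_match_all($greedy, $pagehtmlcode, $m2);
$countFixed    = preg_match_all($fixed, $pagehtmlcode, $m3);

echo "original: $countOriginal, greedy: $countGreedy, fixed: $countFixed\n";
```

On this sample, only the last pattern finds both forms; the captured text for each is in $m3[1].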

Ok so, I'm not really exactly sure what you said. How do I apply a modifier? Here are some changes I thought I should do:

-I tried to change the regex how you said :-\

-I changed "preg_match" to "preg_match_all"

-I added a flag, "PREG_SET_ORDER"

$regex = '/<form.+>([^>]*?)<\/form/';
preg_match_all($regex,$pagehtmlcode,$output, PREG_SET_ORDER);

 

What exactly do you mean by all those changes? I don't really understand how I should change my current regex expression  :shrug::confused:

 

**The form tag can't just match to the next "<", because there are more tags inside of the form tags, i.e "<input", and so on

Well, the first change I didn't think was too complicated: as I said, replace the first .+ with the section of pattern I provided. If you don't know how to add a modifier, you are very much fighting an uphill battle. I suggest you read through the official manual for PCRE; modifiers are covered fairly early on. A modifier is placed after the closing delimiter of the pattern.
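As a small illustration of where a modifier goes, here is a sketch using a made-up two-line string. The variable names are hypothetical; only the placement of the letters after the closing / matters.

```php
<?php
// Hypothetical test string containing a line break.
$text = "line one\nline two";

// No modifier: "." does not match the newline, so there is no match.
$without = preg_match('/one.line/', $text);

// The s modifier goes right after the closing delimiter.
$with = preg_match('/one.line/s', $text);

// Several modifiers can be combined, here s plus i (case-insensitive).
$combined = preg_match('/ONE.LINE/si', $text);

var_dump($without, $with, $combined);
```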

aha thank you for the great resource, I read some of it, then read what you said, then was confused again. I took a short break, read it again, made your changes, and now it is working perfectly! Thanks for your patience in helping me! :) :)

 

Here is what I did:

$regex = '/<form[^>]*>(.+?)<\/form/s';

as well as switching to preg_match_all
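For anyone reading later, this is roughly how the results can be walked when PREG_SET_ORDER is passed, as in my earlier attempt. The sample HTML and the $bodies array are made up for illustration.

```php
<?php
// Hypothetical sample page with two forms.
$pagehtmlcode = '<form id="one"><input name="a"/></form>' . "\n"
              . '<form id="two">text</form>';

$regex = '/<form[^>]*>(.+?)<\/form/s';
preg_match_all($regex, $pagehtmlcode, $output, PREG_SET_ORDER);

$bodies = [];
foreach ($output as $match) {
    // With PREG_SET_ORDER, each $match holds one complete match:
    // $match[0] is the whole matched text, $match[1] is the first capture group.
    $bodies[] = $match[1];
}

echo count($output), " forms found\n";
print_r($bodies);
```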

 

yay!! :)

Ah ok, so I have run into another problem, albeit a small one. Now, instead of searching for <form> </form> tags, I need to search for <input ..../> tags. Here is my regex for that:

 

$regex = '/<input[^>]*>/';

 

It works great! But now I want to get more specific: I want it to only include input fields that do not contain the text type="hidden". Any idea how I can modify my regex to accomplish this? I am only doing this so that I can count the number of visible input fields per form. Perhaps DOMDocument would be better for this? What is your opinion?

Generally speaking, using DOMDocument/DOMXPath is a more correct way of parsing HTML. With a regex, I think you would need a negative lookahead assertion to achieve that aim. Completely untested, but...

 

$regex = '/<input(?![^>]*\stype="hidden")[^>]*>/i';
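For the DOMDocument/DOMXPath route mentioned above, a minimal sketch might look like the following. The sample HTML and the $counts array are made up for illustration, and it assumes hidden fields are written exactly as type="hidden" in lowercase.

```php
<?php
// Hypothetical sample page, not taken from the thread.
$pagehtmlcode = '<form action="/a">'
    . '<input type="text" name="user"/>'
    . '<input type="hidden" name="token"/>'
    . '</form>'
    . '<form action="/b">'
    . '<input type="text" name="x"/>'
    . '<input type="password" name="p"/>'
    . '<input type="hidden" name="h"/>'
    . '</form>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($pagehtmlcode);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$counts = [];
foreach ($xpath->query('//form') as $form) {
    // Inputs inside this form whose type attribute is not "hidden"
    // (inputs with no type attribute are counted as visible too).
    $visible = $xpath->query('.//input[not(@type="hidden")]', $form);
    $counts[] = $visible->length;
}

print_r($counts);
```

Each entry in $counts is the number of visible input fields in one form, in document order.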

