
Getting string inside string


Lautarox


I'm trying to extract a value from a page's HTML. For example, from action="pageinside" name="id">, I want to get the value inside the quotes (/page.php in my case), but I get confused when I try to escape the special characters. It would be great if someone could provide an example showing how they should be escaped.


Thanks for your answer.

I'm actually trying to get webpage.php from: <form id="login_form" action="webpage.php"><input...

using:

$pattern = '~id=\"login_form\" action=\"(.+)\">~';
preg_match_all($pattern, $subject, $matches);

I'm getting the page name, but also all the code that follows it.

What am I doing wrong?

Thanks in advance.
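A minimal reproduction of the problem, assuming a made-up snippet with more markup after the form tag. `(.+)` is greedy: it runs to the end of the subject and then backs up only as far as the last `">`, which is why the capture swallows the code that follows. Making the quantifier lazy with `(.+?)` is one possible fix, shown alongside for comparison.

```php
<?php
// Hypothetical HTML: the form tag is followed by more markup ending in '">'.
$subject = '<form id="login_form" action="webpage.php"><input type="text" name="user">';

// Greedy: (.+) grabs as much as it can, then backtracks only until the
// LAST '">' in the subject, so the capture runs past the closing quote.
preg_match('~id="login_form" action="(.+)">~', $subject, $greedy);

// Lazy: (.+?) stops at the FIRST '">' that lets the rest of the pattern match.
preg_match('~id="login_form" action="(.+?)">~', $subject, $lazy);

echo $greedy[1], "\n"; // webpage.php"><input type="text" name="user
echo $lazy[1], "\n";   // webpage.php
```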


The most effective regex is the shortest one.

How many login forms are there on the page? Use preg_match to match only one (the only?) occurrence, and preg_match_all for multiple occurrences.

There's also no need to escape " if you use a single-quoted string.

preg_match_all('#id="login_form" action="([^"]+)#', $webpage, $match);
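A runnable sketch of the suggestion above, assuming `$webpage` already holds the page source (a made-up snippet here) and that there is only one login form, so it uses preg_match rather than preg_match_all. It also checks the escaping point: inside a single-quoted PHP string, `"` needs no backslash.

```php
<?php
// Hypothetical page source; in practice $webpage would hold the fetched HTML.
$webpage = '<form id="login_form" action="webpage.php"><input type="text" name="user">';

// Inside a single-quoted string, " needs no backslash, so both strings are identical.
var_dump('id="login_form"' === "id=\"login_form\""); // bool(true)

// [^"]+ means "one or more characters that are not a double quote", so the
// capture cannot run past the closing quote of the action value.
if (preg_match('#id="login_form" action="([^"]+)#', $webpage, $match)) {
    echo $match[1], "\n"; // webpage.php
}
```

`$match[1]` holds the first capture group; `$match[0]` holds the whole matched text.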



That's not always true; you also have to anticipate variations when working with regex. The code you posted won't allow spaces around the "=" in either the id or the action attribute, which is valid syntax.
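The whitespace-tolerant pattern referred to later in the thread isn't shown here, so this is a hypothetical sketch of the idea: `\s*` around each `=` allows optional spaces, and `\s+` requires at least one whitespace character between the attributes. The sample inputs are made up.

```php
<?php
// Hypothetical inputs: the same tag without and with spaces around "=".
$samples = [
    '<form id="login_form" action="webpage.php">',
    '<form id = "login_form"   action = "webpage.php">',
];

// \s* = optional whitespace around each "=", \s+ = at least one space between attributes.
$pattern = '#id\s*=\s*"login_form"\s+action\s*=\s*"([^"]+)"#';

$results = [];
foreach ($samples as $html) {
    // Store the captured action value, or null when the pattern does not match.
    $results[] = preg_match($pattern, $html, $m) ? $m[1] : null;
}
print_r($results);
```

It still assumes double quotes and that id comes directly before action.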


He's not attempting to PARSE HTML. RegEx isn't meant to parse markup.

There's a ton of perfectly valid markup that would cause your expression to fail as well.

Simple is generally better with RegEx. You want a fast way to make a complex match in a string. If you want to account for variable syntax and constantly changing markup, you may want to use an HTML parser.
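A minimal sketch of the parser route using PHP's built-in DOM extension (DOMDocument and DOMXPath), assuming ext-dom is enabled, as it is in most builds. The sample markup is made up; it deliberately reverses the attribute order, uses single quotes, and puts spaces around "=", all of which the parser handles without extra work.

```php
<?php
// Hypothetical page source with single quotes, spaces around "=", and reversed attribute order.
$webpage = "<html><body><form action = 'webpage.php?var=something' id=\"login_form\">"
         . '<input type="text" name="user"></form></body></html>';

$doc = new DOMDocument();
// libxml may warn about imperfect HTML; collect those errors instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($webpage);
libxml_clear_errors();

// Select the form whose id attribute is exactly "login_form".
$xpath = new DOMXPath($doc);
$forms = $xpath->query('//form[@id="login_form"]');

$action = $forms->length > 0 ? $forms->item(0)->getAttribute('action') : null;
echo $action, "\n"; // webpage.php?var=something
```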



Bottom line: the regex he posted won't match anything if the markup includes spaces, as I said above.


Bottom line, RegEx isn't meant to parse markup. HTML has a very loose syntax. I'm not saying that's a bad thing, but accounting for all of it with RegEx will make an ugly and slow expression.

Even something simple like yours: your RegEx doesn't take single quotes into account. Keep in mind that something like attribute='valwith"doublequote' is fine markup. Yours won't account for action="something.php?var=something" either.

See what I'm getting at? If you want to account for every markup variation with loose syntax, use a parser, not RegEx.
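To make the quoting point concrete, here is a hypothetical sketch of what it takes just to accept either quote style: capture the opening quote and require the same character to close the value with a backreference. The samples are made up, apart from the two attribute values mentioned in the post above.

```php
<?php
// Hypothetical inputs: double quotes, single quotes, a single-quoted value
// containing a double quote, and a value with a query string.
$samples = [
    '<form id="login_form" action="webpage.php">',
    "<form id='login_form' action='webpage.php'>",
    "<form id='login_form' action='valwith\"doublequote'>",
    '<form id="login_form" action="something.php?var=something">',
];

// (["']) captures whichever quote opens a value; \1 and \2 require the SAME
// quote to close it; (.*?) lazily takes the action value (group 3).
$pattern = '#id\s*=\s*(["\'])login_form\1\s+action\s*=\s*(["\'])(.*?)\2#';

$results = [];
foreach ($samples as $html) {
    $results[] = preg_match($pattern, $html, $m) ? $m[3] : null;
}
print_r($results);
```

Even this still assumes id comes right before action with nothing in between, which shows how quickly the expression grows.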


You have to decide if RegEx will work well in your specific situation. If the string you're searching stays quite static, RegEx will work fine.

I see your point about my code not accepting single quotes, and you're right on that one. I'm not a big fan of using regex on HTML anyway; there are too many things to take into account, and the markup needs to stay consistent throughout to get the desired results.

