Lautarox Posted July 20, 2011
I'm trying to extract part of a page's HTML with a regular expression. For example, from action="pageinside" name="id"> I want to get the /page.php inside the " ", but I get confused when I try to escape the special characters. It would be great if someone could show an example of how they should be escaped.
AyKay47 Posted July 20, 2011
$subject = '<form action ="test.php" name="test">';
// (?:php|asp|html|htm) is an alternation group; [php|asp|html|htm] would be a character class matching a single character
$pattern = '~action\s*=\s*\"(\w+\.(?:php|asp|html|htm))\"~';
preg_match($pattern, $subject, $matches);
print_r($matches);
Lautarox Posted July 21, 2011
Thanks for your answer. I'm actually trying to extract the action from: <form id="login_form" action="webpage.php"><input... , using:
$pattern = '~id=\"login_form\" action=\"(.+)\">~';
preg_match_all($pattern, $subject, $matches);
I'm getting the page, but also all the code that follows it. What am I doing wrong? Thanks in advance.
silkfire Posted July 21, 2011
The most effective Regex is the shortest one.
How many login forms are there on the page? Use preg_match to match a single (and only?) occurrence, and preg_match_all for multiple occurrences. There is also no need to escape " if you use single quotes.
preg_match_all('#id="login_form" action="([^"]+)#', $webpage, $match);
Lautarox Posted July 21, 2011
Thanks, I'll use preg_match instead. The ([^"]+) means all the characters inside except ", right? Thanks for answering.
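To make the difference concrete, here is a minimal sketch comparing the greedy (.+) from the earlier attempt with the negated class ([^"]+). The sample markup is hypothetical (the real page is not shown in the thread), and the variable names are only for illustration.

```php
<?php
// Hypothetical sample markup; the real page is not shown in the thread.
$subject = '<form id="login_form" action="webpage.php"><input type="text" name="user">';

// Greedy: (.+) grabs as much as it can and only backtracks to the LAST `">`,
// so the capture includes the markup that follows the action value.
preg_match('~id="login_form" action="(.+)">~', $subject, $greedy);

// Negated class: ([^"]+) matches one or more characters that are not a
// double quote, so it stops at the quote that closes the action value.
preg_match('~id="login_form" action="([^"]+)~', $subject, $negated);

echo $greedy[1], "\n";   // webpage.php"><input type="text" name="user
echo $negated[1], "\n";  // webpage.php
```

A lazy quantifier, (.+?) followed by the closing quote, would also stop at the first quote, but the negated class states the intent more directly.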
AyKay47 Posted July 21, 2011
Quoting silkfire: "The most effective Regex is the shortest one."
That is not always true; you also have to be preemptive when working with regex. The code you posted does not allow whitespace around the "=" in either the id or the action attribute of the form, even though that is acceptable and valid syntax.
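A minimal sketch of what tolerating that whitespace could look like, assuming the same id-then-action attribute order and double-quoted values as the pattern under discussion. The sample markup is hypothetical, and this still fails on single quotes or reordered attributes.

```php
<?php
// Hypothetical markup with spaces around "=" and extra space between attributes.
$subject = '<form id = "login_form"  action = "webpage.php">';

// \s* allows optional whitespace around each "="; \s+ requires at least one
// whitespace character between the two attributes.
$pattern = '~id\s*=\s*"login_form"\s+action\s*=\s*"([^"]+)"~';

$action = null;
if (preg_match($pattern, $subject, $match)) {
    $action = $match[1];
}

echo $action, "\n"; // webpage.php
```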
xyph Posted July 21, 2011
He's not attempting to PARSE html. RegEx isn't meant to parse markup.
There's a ton of perfectly valid markup that would cause your expression to fail as well.
Simple is generally better with RegEx. You want a fast way to make a complex match in a string. If you want to account for variable syntax and constantly changing markup, you may want to use an HTML parser.
AyKay47 Posted July 21, 2011
Quoting xyph: "He's not attempting to PARSE html. RegEx isn't meant to parse markup."
Bottom line, the regex he posted won't catch anything if the markup includes spaces around the "=", as I said above.
xyph Posted July 21, 2011
Bottom line, RegEx isn't meant to parse markup. HTML has a very loose syntax. I'm not saying that's a bad thing, but accounting for all of it with RegEx will make an ugly and slow expression.
That applies even to something simple like yours. Your RegEx doesn't take single quotes into account. Keep in mind that something like attribute='valwith"doublequote' is fine markup. Yours won't account for action="something.php?var=something" either.
See what I'm getting at? If you want to account for every markup variation with loose syntax, use a parser, not RegEx.
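For comparison, here is a minimal sketch of the parser route using PHP's bundled DOM extension (DOMDocument), assuming that extension is enabled. The sample markup is hypothetical; it uses single quotes, spaces around "=", and a query string, the cases raised above.

```php
<?php
// Hypothetical markup: single-quoted attribute values, spaces around "=",
// and a query string in the action.
$html = '<html><body><form id = \'login_form\' action = \'something.php?var=something\'>'
      . '<input name="user"></form></body></html>';

// Collect libxml warnings internally instead of emitting them;
// real-world HTML is rarely perfectly valid.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Walk the <form> elements and pick the one whose id is "login_form".
$action = null;
foreach ($doc->getElementsByTagName('form') as $form) {
    if ($form->getAttribute('id') === 'login_form') {
        $action = $form->getAttribute('action');
        break;
    }
}

echo $action, "\n";
```

The parser handles the quoting and spacing, so the lookup only has to say which element and attribute it wants. Whether this is faster than a regex for a given page is something to measure rather than assume.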
Lautarox Posted July 21, 2011
Yes, I understand, but it will work fine as a way to learn RegEx. Will an HTML parser be faster?
xyph Posted July 21, 2011
You have to decide if RegEx will work well in your specific situation. If the string you're searching stays quite static, RegEx will work fine.
AyKay47 Posted July 22, 2011
Quoting xyph: "If the string you're searching stays quite static, RegEx will work fine."
I see your point about my code not accepting single quotes, and you're right on that one. I'm not a big fan of using regex on HTML anyway: there are too many things to take into account, and the markup needs to stay consistently static to get the desired results.