d.shankar Posted August 17, 2007

I need to extract the textbox names and form names (the action value) inside the form tag from the HTML source of any site. Consider this code:

<form action="new.asp" method="post">
<input type="text" name="txt1">
</form>

From this code I have to extract new.asp and txt1. Please help.

Link to thread: https://forums.phpfreaks.com/topic/65381-greatest-problem-on-earth/
markjoe Posted August 18, 2007

If this is the greatest problem on Earth, there's a lot you're leaving out. (hehe)

I am new to regular expressions, but I believe I'm on the right track here.

/action="([^"]+)"/ should match new.asp and similar.
/name="(\w+)"/ should match any name value.

You may want to add to the second one if you only want to find names of input elements.
d.shankar Posted August 18, 2007 (Author)

Mark! Once again I say this is the toughest problem in the world. What if you have two or three forms in a single page? For example, I have this source:

<form action="form1.asp" method="get">
<input type="text" name="val1">
</form>
<form action="form2.asp" method="post">
<input type="text" name="val2">
</form>

I have to capture these results into an array, i.e.

array(0)=form1.asp & val1
array(1)=form2.asp & val2

Hope you understand my problem.
MadTechie Posted August 18, 2007

Try this:

<?php
$data = '<form action="form1.asp" method="get">
<input type="text" name="val1">
</form>
<form action="form2.asp" method="post">
<input type="text" name="val2">
</form>
';
preg_match_all('/(?<=\<form )action="(\w+\.\w+)".*?name="(\w+)"/si', $data, $result, PREG_PATTERN_ORDER);
$forms = $result[1];
$values = $result[2];
echo "<pre>";
print_r($forms);
print_r($values);
echo "or<br />";
$newarray = array();
foreach($forms as $K => $V)
{
    $newarray[] = "$V & {$values[$K]}";
}
print_r($newarray);
?>
d.shankar Posted August 18, 2007 (Author)

Hi MT, thanks for the reply. You did a great job. But I actually capture the source of a website in the variable $data, which in the previous post held the <form> sample. I am not getting any results in the variable with this code:

<?php
$url="www.google.com";
$ch = curl_init();
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch,CURLOPT_FOLLOWLOCATION,1);
curl_setopt($ch,CURLOPT_FAILONERROR,true);
curl_setopt($ch,CURLOPT_SSL_VERIFYPEER,false);
$data = curl_exec($ch); //here $data variable contains the source of google.com
preg_match_all('/(?<=\<form )action="(\w+\.\w+)".*?name="(\w+)"/si', $data, $result, PREG_PATTERN_ORDER);
$forms = $result[1];
$values = $result[2];
echo "<pre>";
print_r($forms);
print_r($values);
echo "or<br />";
$newarray = array();
foreach($forms as $K => $V)
{
    $newarray[] = "$V & {$values[$K]}";
}
print_r($newarray);
?>

Any idea, buddy?
MadTechie Posted August 18, 2007

Yep.. not sure what you want this for, but..

<?php
$url="www.google.com";
$ch = curl_init();
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch,CURLOPT_FOLLOWLOCATION,1);
curl_setopt($ch,CURLOPT_FAILONERROR,true);
curl_setopt($ch,CURLOPT_SSL_VERIFYPEER,false);
$data = curl_exec($ch); //here $data variable contains the source of google.com
preg_match_all('/(?<=\<form )action="([\S]+)".*?name=(\w+)/s', $data, $result, PREG_PATTERN_ORDER);
$forms = $result[1];
$values = $result[2];
echo "<pre>";
print_r($forms);
print_r($values);
echo "or<br />";
$newarray = array();
foreach($forms as $K => $V)
{
    $newarray[] = "$V & {$values[$K]}";
}
print_r($newarray);
?>
d.shankar Posted August 18, 2007 (Author)

MT.. Actually I am working on a spidering/crawling project and I know nothing about regex. That's why I need these accurate details.

Back to the code.. it works only for google.com, buddy. Why is that? Does the regex need to be changed, or is it fine as it is?
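A possible explanation, sketched on made-up markup rather than on any real site: the pattern in the previous post needs `action` to come directly after `<form `, a double-quoted action value, and an unquoted `name` value. The three sample tags and the `$loose` pattern below are assumptions for illustration only; `$loose` captures just the action value and tolerates the attribute order and quoting differences.

```php
<?php
// Pattern from the post above.
$strict = '/(?<=\<form )action="([\S]+)".*?name=(\w+)/s';
// Hypothetical, more tolerant pattern for the action value only.
$loose = '/<form\b[^>]*?\baction\s*=\s*["\']?([^"\'\s>]+)/i';

// Made-up form tags covering three markup styles.
$samples = array(
    'unquoted name'  => '<form action="/search" name=f>',
    'quoted name'    => '<form action="form1.asp" method="get"> <input type="text" name="val1"> </form>',
    'action not 1st' => '<form method="post" action="x.php"> <input name=q> </form>',
);

$strictHits = array();
$looseHits = array();
foreach ($samples as $label => $html) {
    // preg_match returns 1 on a match and 0 otherwise.
    $strictHits[$label] = preg_match($strict, $html);
    $looseHits[$label] = preg_match($loose, $html, $m) ? $m[1] : null;
    echo $label, ': strict=', $strictHits[$label], ' loose=', $looseHits[$label], "\n";
}
```

Only the first sample satisfies the strict pattern, which would fit the "works only for google.com" symptom if other sites quote their names or order attributes differently.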
MadTechie Posted August 18, 2007

The only way I can think of doing this is something like this:

<?php
$url="www.google.com";
$ch = curl_init();
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch,CURLOPT_FOLLOWLOCATION,1);
curl_setopt($ch,CURLOPT_FAILONERROR,true);
curl_setopt($ch,CURLOPT_SSL_VERIFYPEER,false);
$data = curl_exec($ch); //here $data variable contains the source of google.com
preg_match_all('/.*action\s?=(?:\'|"|\s)?([^\'"\s\>]*)(?:\'|"|\s)?.*?name=(?:\'|"|\s)?([^\'"\s\>]*)(?:\'|"|\s)?/s', $data, $result, PREG_PATTERN_ORDER);
$forms = $result[1];
$values = $result[2];
echo "<pre>";
print_r($forms);
print_r($values);
echo "or<br />";
$newarray = array();
foreach($forms as $K => $V)
{
    $newarray[] = "$V & {$values[$K]}";
}
print_r($newarray);
?>

Have a go and see if it works.
d.shankar Posted August 18, 2007 (Author)

Thanks MT for spending your precious time. Every time I post asking for help I really feel guilty. I am really sorry, buddy.

Your code works perfectly well, but I come back to the main requirement: the code is able to fetch the values for only a single form.

FYI: this page www.dnschart.com/name.php contains two forms. Your superb code fetched one but ignored the other. It fetched the form "whois/results.php" with the variable "query", and it left out "recordres2.php" with the variable "domain".

I changed the for loop but it didn't help me. Any suggestions, MT?
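One likely cause of the single result: the pattern in MadTechie's previous post starts with `.*`, and with the `s` modifier that greedy prefix runs to the end of the page and backtracks to the last `action`, so `preg_match_all` finds one match only. A minimal sketch on a made-up two-form snippet (not the real dnschart.com markup), comparing that pattern with the same pattern minus the leading `.*`:

```php
<?php
// Hypothetical two-form page standing in for the fetched source.
$data = '<form action="form1.asp" method="get"> <input type="text" name="val1"> </form> '
      . '<form action="form2.asp" method="post"> <input type="text" name="val2"> </form>';

// Pattern from the earlier post: the leading .* is greedy.
$greedy = '/.*action\s?=(?:\'|"|\s)?([^\'"\s\>]*)(?:\'|"|\s)?.*?name=(?:\'|"|\s)?([^\'"\s\>]*)(?:\'|"|\s)?/s';
// Same pattern without the leading .*, so every action can start a match.
$noPrefix = '/action\s?=(?:\'|"|\s)?([^\'"\s\>]*)(?:\'|"|\s)?.*?name=(?:\'|"|\s)?([^\'"\s\>]*)(?:\'|"|\s)?/s';

preg_match_all($greedy, $data, $g);
preg_match_all($noPrefix, $data, $n);

echo 'with .*    : ', implode(', ', $g[1]), "\n";
echo 'without .* : ', implode(', ', $n[1]), "\n";
```

Even without the prefix, this still pairs each action with only the first `name=` that follows it, wherever that is, so it is not a full fix for forms with several inputs or none.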
d.shankar Posted August 19, 2007 (Author)

Hi.. I have switched to DOM for the same problem. It is working, but I need a small alteration.

I already mentioned that two or more forms would be trouble. Actually I need the output this way:

form1
    var1a
    var1b
form2
    var2b
form3
    var3a
    var3b
    var3c

Each variable should be grouped under its parent form. I have coded this with DOM and I am close to a solution, but I need help. Here is the code..

<?php
$target_url = "www.dnschart.com";
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL,$target_url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$html= curl_exec($ch);
$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$hrefs1=$dom->getElementsByTagName("form");
for ($i = 0; $i < $hrefs1->length; $i++)
{
    $href = $hrefs1->item($i);
    $url = $href->getAttribute('action');
    echo $url;
    echo "<br>";
    flush();
    $hrefs2=$dom->getElementsByTagName("input");
    for ($j = 0;$j < $hrefs2->length; $j++)
    {
        $hrefx = $hrefs2->item($j);
        $urlx = $hrefx->getAttribute('text');
        if(strtolower($urlx)=='text')
        {
            $urlx = $hrefx->getAttribute('name');
            echo $urlx;
            flush();
        }
    }
}
?>
Azu Posted August 19, 2007

Quote (d.shankar): "I need to extract the textbox names and form names under the form tag of any html source file of a site. Consider the code <form action="new.asp" method="post"> <input type="text" name="txt1"> </form> In this code , i have to extract new.asp and txt1. Please help."

Wow... THIS is the greatest problem on Earth!? AMAZING!!!
d.shankar Posted August 20, 2007 (Author)

Could someone help me?
rea|and Posted August 20, 2007

Try something like this... I've tried this code only against the last page you posted. Currently the regexps work only with double/single quotes (so not with something like name=namefield).

<pre><?php
$text='your html code';
$form=array();
preg_replace_callback('/<form[^>]+action=("|\')(.+?)(\\1).+?<\/form>/is','cb_form',$text);
function cb_form($mth){
    global $form;
    preg_match_all('/<input(?=[^>]+type="text")[^>]+name=("|\')(.+?)(\\1)/is',$mth[0],$names);
    // for more than one type attribute
    // '/<input(?=[^>]+type="(?:text|hidden)")[^>]+name=("|\')(.+?)(\\1)/is'
    $form[$mth[2]]=$names[2];
}
print_r($form);
?></pre>
d.shankar Posted August 20, 2007 (Author)

Quote (rea|and): "Currently the regexps work only with double/single quotes (so not something like name=namefield)."

What does that mean?

preg_match_all('/<input(?=[^>]+type="text")[^>]+name=("|\')(.+?)(\\1)/is',$mth[0],$names);

This regex works only for type="text". If I need both "password" and "text", where should I add it?
rea|and Posted August 20, 2007

Quote (d.shankar): "What does that mean ? [...] this regex works only for type="text" , if i need for both "password" and "text" where i should i add ?"

It means that if some forms don't use quotes, like google's, my code doesn't match the name fields. I could add that if you need it, but for now let's check whether it works.

For the password problem, I wrote a comment just below the preg_match_all line that shows how to allow more than one type value.
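A minimal sketch of what that comment describes, assuming a made-up form fragment: the alternation inside the lookahead is the only change from the pattern in the earlier post, so inputs whose type is `text` or `password` are both picked up.

```php
<?php
// Made-up form fragment for illustration.
$formHtml = '<input type="text" name="user"> '
          . '<input type="password" name="pass"> '
          . '<input type="submit" name="go">';

// The lookahead accepts type="text" or type="password"; the rest is unchanged.
$pattern = '/<input(?=[^>]+type="(?:text|password)")[^>]+name=("|\')(.+?)(\\1)/is';

preg_match_all($pattern, $formHtml, $names);
print_r($names[2]); // the captured name values
```

Like the original, this variant still expects quoted attribute values.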
d.shankar Posted August 20, 2007 (Author)

Yes realand, it works. I changed the regex as you described in the comment. So how can you make it work with google.com?

I also have this site, http://www.vizual.co.in/enquiry_form.html, where the input boxes exist but their source looks like this:

<INPUT name=name22 id=name22 value="" class="formfield">

There is no type="text" attribute. Can you change the regex for that? Thanks in advance for the help.
MadTechie Posted August 20, 2007

Here's one for extraction. I broke it down to make it a little easier.

<?php
$data ="";//the html page
//get input
preg_match_all('/<input ([^>]*)>/si', $data, $regs, PREG_PATTERN_ORDER);
$inputs = $regs[1];
foreach($inputs as $input)
{
    $key = "type";
    if (preg_match('/type\s?=\s?(?:\'|")?(\w+)(?:\'|")?/si', $input, $regs))
    {
        $type = $regs[1];
    }else{
        $type = "";
    }
    echo "$key=$type|";
    //----other key
    $key = "name";
    // look for the name attribute here (the type pattern was repeated by mistake)
    if (preg_match('/name\s?=\s?(?:\'|")?(\w+)(?:\'|")?/si', $input, $regs))
    {
        $name = $regs[1];
    }else{
        $name = "";
    }
    echo "$key=$name<br />";
}
?>

Problem is, my client was just on the phone, so I have to do some paid work.. will look at this later tonight.
d.shankar Posted August 20, 2007 (Author)

Thank you MT, but the form names are not being fetched. There should be a link between each form and its input variables. Anyway, thanks. I am still waiting.
rea|and Posted August 20, 2007

Quote (d.shankar): "so how can you make it to work with google.com ... ? [...] <INPUT name=name22 id=name22 value="" class="formfield"> there is no type="text" attribute . so can you change the regex ?"

Back from lunch. The page you posted doesn't have any form tags, so the regex can't match anything there. I've modified the regexp to handle inputs with no type attribute and values with no quotes, and it seems to work:

<pre><?php
$form=array();
preg_replace_callback('/<form[^>]+action=("|\')(.+?)(\\1).+?<\/form>/is','cb_form',$text);
function cb_form($mth){
    global $form;
    preg_match_all('/<input(?(?=[^>]+type=)(?=[^>]+type=(?(?="|\')("|\')(?:text|password)(?:\\1)|(?:text|password))))[^>]+name=(?(?="|\')("|\')(.+?)(?:\\2)|(\S+))/is',$mth[0],$names);
    $form[$mth[2]]=($names[2][0]!='')?$names[3]:$names[4];
}
print_r($form);
?></pre>
d.shankar Posted August 20, 2007 (Author)

Thanks a lot, buddy.. your array declarations are really tricky. How can I access the entries one by one?
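A short sketch of one way to walk that array, assuming `$form` ends up keyed by the form action with a list of input names as each value (which is what `$form[$mth[2]]=...` in the callback appears to build). The sample values are the ones reported for dnschart.com earlier in the thread.

```php
<?php
// Assumed shape of $form after the callback has run: action => list of input names.
$form = array(
    'whois/results.php' => array('query'),
    'recordres2.php'    => array('domain'),
);

// Outer loop: one pass per form; inner loop: one pass per input name.
foreach ($form as $action => $names) {
    echo $action, "\n";
    foreach ($names as $name) {
        echo '    ', $name, "\n";
    }
}

// A single entry can also be read directly by action and position.
$firstName = $form['recordres2.php'][0];
```

Because the action is the array key, two forms with the same action would overwrite each other in this layout.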
d.shankar Posted August 21, 2007 (Author)

Regex is really driving me mad. It doesn't match on all sites. Are you guys comfortable with DOM?

Here is a small piece of code that returns the form actions of any website without any restrictions.

<?php
$target_url = "www.dnschart.com";
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL,$target_url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$html= curl_exec($ch);
$dom = new DOMDocument();
@$dom->loadHTML($html);
$hrefs = $dom->getElementsByTagName('form');
foreach($hrefs as $href)
{
    echo $href->getAttribute('action');
    echo "<br>";
}
?>

This code works fine, but I am unable to access the child nodes (i.e. the type=text variables). Is it possible to build on this code and get it to work?
d.shankar Posted August 21, 2007 (Author)

Any ideas or suggestions?
effigy Posted August 21, 2007

Try using childNodes like this example.
d.shankar Posted August 22, 2007 (Author)

I tried my best but I still can't figure it out. I am unable to fetch the input variables of each form.

<?php
$target_url = "www.dnschart.com";
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL,$target_url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$html= curl_exec($ch);
$dom = new DOMDocument();
@$dom->loadHTML($html);
$params = $dom->getElementsByTagName('form');
foreach ($params as $param)
{
    echo $param -> getAttribute('name').'<br>';
    if($param->hasChildNodes())
    {
        echo "true";
        echo "<br>";
        $children = $param->childNodes;
        echo $children->getElementsByTagName('input').'<br>';
        foreach($children as $child)
        {
            echo $child->getAttribute('name');
        }
    }
    else
    {
        echo "false";
        echo "<br>";
    }
?>
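A minimal sketch of the DOM approach under stated assumptions: the HTML is a made-up two-form page (in the thread it would be the cURL result in `$html`), and the variable names are illustrative. The main difference from the attempt above is that `getElementsByTagName('input')` is called on each form element itself rather than on its `childNodes` list, so inputs nested inside other tags are found too. Treating a missing `type` as a text field is also an assumption made here.

```php
<?php
// Made-up page with two forms, standing in for the fetched source.
$html = '<html><body>'
      . '<form action="form1.asp" method="get">'
      . '<input type="text" name="val1"><input type="submit" name="go"></form>'
      . '<form action="form2.asp" method="post">'
      . '<p><input name="val2"></p><input type="password" name="pw"></form>'
      . '</body></html>';

$dom = new DOMDocument();
@$dom->loadHTML($html); // @ hides parser warnings on sloppy real-world markup

$result = array();
foreach ($dom->getElementsByTagName('form') as $formNode) {
    $action = $formNode->getAttribute('action');
    $result[$action] = array();
    // Search inside this form only, not the whole document.
    foreach ($formNode->getElementsByTagName('input') as $input) {
        // getAttribute returns '' when the attribute is missing.
        $type = strtolower($input->getAttribute('type'));
        if ($type === '' || $type === 'text' || $type === 'password') {
            $result[$action][] = $input->getAttribute('name');
        }
    }
}
print_r($result);
```

Keying `$result` by action keeps each input under its parent form, at the cost of merging forms that share the same action.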
shoaiblatif Posted September 5, 2007

Dear Shankar,

I have solved your problem and can show you the results you want. Contact me at this ID: [email protected]. I will be available from 4:00 p.m. to 10:00 p.m. (GMT+5).

Regards,
Shoaib Latif
This topic is now archived and is closed to further replies.