hassank1 Posted January 26, 2011

I am working on a small script that uploads a PHP file, opens it, reads it into a string, and then saves the function data into a database. Can you help me parse the functions using preg? (Separate the function name, parameters, and function body.)

Example:

function x1($p1,$p2)
{
    echo 'test';
    echo 'test';
}

function x2()
{
    echo 'hi';
}

Output:

array[0] => x1, array[1] => $p1,$p2, array[2] => echo 'test'; echo 'test';
array1[0] => x2, array1[1] => (empty), array1[2] => echo 'hi';
etc.
50jkelly Posted January 27, 2011

Hey hassank1. I had fun with this regex, it's a cool idea! I've come up with a regex that produces the output you want:

/^\s*function ([\w\-]*)\((.*)(?=\))\)[\s\r\n]*\{([\w\s\r\n\{\}\$\(\)\=\'\"\!;]*)(?=\}[\s\r\n]*$)/

Using this in PHP:

preg_match('/^\s*function ([\w\-]*)\((.*)(?=\))\)[\s\r\n]*\{([\w\s\r\n\{\}\$\(\)\=\'\"\!;]*)(?=\}[\s\r\n]*$)/', $test_string, $matches);

In this case the $matches array will contain:

[0]: the whole function matched
[1]: function name
[2]: parameters
[3]: function body

It's a bit messy and could probably do with some tidying up, but try it and see what you think. If you want it explained, just let me know.
hassank1 (Author) Posted January 27, 2011

Thank you, I will check it now.
hassank1 (Author) Posted January 27, 2011

In ([\w\-]*), shouldn't it be '_' instead of '-'? You are not allowed to put '-' in a function name in PHP.
50jkelly Posted January 27, 2011

Doh! You're right, although you don't need \_ as it's already covered by \w.
hassank1 (Author) Posted January 27, 2011

I have tested it on multiple functions in the same file and it didn't work correctly. There is also a problem when I have nested { } inside a function. By the way, I had to remove the $ from the end of the regex to get it to work. Can you help me solve the nested {} issue?
50jkelly Posted January 27, 2011

Hey, I had been thinking about this and I believe this regex is better (it allows any characters in the function body, and should fix your problem with nested brackets):

/^\s*function (\w*)\(([\$\w\s\,\-]*)(?=\))\)[\s\r\n]*\{(.*)(?=\}[\s\r\n])/s

You're right though, this expression only works on one function at a time. I'll think about this today and see what I can come up with.
.josh Posted January 27, 2011

There is no way to gracefully handle nested {..} (or any nested delimiters, for that matter) with regex, which is the same reason why you shouldn't use regex to parse or traverse a DOM (you should use DOM for that instead).

The easiest way to do this with multiple functions would be to place, and look for, a unique delimiter between functions. Pretty much anything will work, but as an example:

/** function x1 **/
function x1($p1,$p2)
{
    echo 'test';
    echo 'test';
}
/** function x2 **/
function x2()
{
    echo 'hi';
}

You can then use that comment tag as a marker to know the boundaries of each function. Based on the above example, here is my take:

preg_match_all("~function\s+(\w+)\s*\(([^\)]*)\)\s*\{([^}]*)\}\s*(?:/\*\* function[^/]+/|$)~is",$data,$matches);
array_shift($matches);
$n = count($matches[0]);
for ($c = 0; $c < $n; $c++) {
    $functions[] = array('name' => trim($matches[0][$c]),
                         'args' => trim($matches[1][$c]),
                         'body' => trim($matches[2][$c]));
}
echo "<pre>";print_r($functions); echo "</pre>";

Output:

Array
(
    [0] => Array
        (
            [name] => x1
            [args] => $p1,$p2
            [body] => echo 'test'; echo 'test';
        )

    [1] => Array
        (
            [name] => x2
            [args] =>
            [body] => echo 'hi';
        )

)
.josh Posted January 27, 2011

Okay, scratch the code above. I got a bit ahead of myself and then timed out on editing the post. This is the correct code. Basically, you first split at the delimiter you added and then parse each function:

$fs = preg_split('~/\*\* function[^/]+/~i',$data);
array_shift($fs); // drop whatever precedes the first delimiter
$functions = array();
foreach ($fs as $v) {
    preg_match("~function\s+(\w+)\s*\(([^\)]*)\)\s*\{(.*)\}~is",$v,$match);
    $functions[] = array('name' => trim($match[1]),
                         'args' => trim($match[2]),
                         'body' => trim($match[3]));
}
echo "<pre>";print_r($functions);echo "</pre>";

I know you can skip the preg_split and use preg_match_all instead of preg_match if you use a negative lookahead for the delimiter, but I haven't quite wrapped my head around how to actually do that, so I opted for preg_split instead :/
50jkelly Posted January 27, 2011

Yeah, I've been thinking about this all day and I agree with Crayon Violent. Even if there were an effective way of doing it all in one regex, heaven help anyone who goes back to edit or maintain it, and splitting the functions in PHP is relatively straightforward...
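For readers wondering what a single-regex version might look like: PCRE, the engine behind PHP's preg functions, supports recursive subpatterns, so balanced braces can be matched without a delimiter. The following is only a minimal sketch, not code from the thread. The group names (name, args, block) and the variable $source are illustrative, and it assumes no braces appear inside strings or comments, which is exactly where this approach breaks down.

```php
<?php
// Sample input: two functions, the second with a nested block.
$source = <<<'SRC'
function x1($p1,$p2)
{
    echo 'test';
    echo 'test';
}
function x2($a)
{
    if (1 > 0) {
        echo 'hi';
    }
}
SRC;

// (?<block>...) matches one balanced {...} group; (?&block) recurses into
// nested groups. Braces inside strings or comments are NOT handled.
$pattern = '~function\s+(?<name>\w+)\s*\((?<args>[^)]*)\)\s*(?<block>\{(?:[^{}]++|(?&block))*\})~';

preg_match_all($pattern, $source, $matches, PREG_SET_ORDER);

$functions = array();
foreach ($matches as $m) {
    $functions[] = array(
        'name' => $m['name'],
        'args' => trim($m['args']),
        // Strip the outer braces from the captured block.
        'body' => trim(substr($m['block'], 1, -1)),
    );
}

print_r($functions);
```

The maintainability concern raised above still applies: the pattern is short but hard to read, and it will miscount as soon as a string literal such as '}' shows up in a function body.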
hassank1 (Author) Posted January 30, 2011

Thanks for the help, guys. I will use the regex you suggested to parse the functions, but I want the code to work on any PHP file (even one that is not mine), which means I can't place a unique delimiter. So I will write code that works on any PHP file, and I will keep you up to date with my progress and with the reason why I am doing this project. It is a good idea, and I will share the code as an open source project.
sasa Posted January 30, 2011

Try:

<?php
$test = 'function x1($p1,$p2)
{
    echo \'test\';
    echo \'test\';
}
function x2($a)
{
    if(1> 0){
        echo \'hi\';
    }
}';
// Prefix a space so a match at the very start is not offset 0 (falsy).
$test = ' '.$test;
$fun = array();
$of = 0;
while($sta = stripos($test, 'function', $of)){
    // $sta now points at the last character of the keyword "function".
    $ob = stripos($test, '(', $sta+=7);
    $name = substr($test, $sta+1, $ob-$sta-1);
    $cb = stripos($test, ')',$ob);
    $param = substr($test, $ob+1, $cb-$ob-1);
    // Walk forward counting braces until the opening one is balanced.
    $cnt = 1;
    $start = strpos($test, '{', $cb);
    $ss = $start;
    while ($cnt){
        $s = strpos($test, '{', $ss+1);
        $e = strpos($test, '}', $ss+1);
        if($s < $e and $s > 0){$cnt++; $ss = $s;}
        else {$cnt--; $ss = $e;}
    }
    $end = $ss;
    $body = substr($test, $start+1, $end - $start-1);
    $of = $ob;
    $fun[]= array(trim($name), trim($param), trim($body));
}
print_r($fun);
?>
hassank1 (Author) Posted February 2, 2011

Thanks sasa. I still need one small regex: it should detect a line that contains require, require_once, etc. and extract the file that was included, and it should work with both ' and ". For example:

require("file.php"); => output1: require, output2: file.php
require_once('file.php'); => output1: require_once, output2: file.php
etc.
BlueSkyIS Posted February 2, 2011

$line1 = 'require("file.php");'."\n";
$line2 = "require_once('file.php');\n";
$pattern = '/^(require_once|require|include_once|include)\((.*)\);/';
$res1 = preg_match($pattern,$line1,$matches1);
$res2 = preg_match($pattern,$line2,$matches2);
print_r($matches1);
echo "<br />";
print_r($matches2);

Output:

Array
(
    [0] => require("file.php");
    [1] => require
    [2] => "file.php"
)
Array
(
    [0] => require_once('file.php');
    [1] => require_once
    [2] => 'file.php'
)
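Note that the pattern above captures the file name with its quotes, while the request was for the bare file name. One possible variation, offered as a sketch rather than a complete solution, captures the opening quote and uses a backreference to require the same quote at the end. It also tolerates leading whitespace and the form without parentheses. The path lib/util.php and the names $lines and $results are made up for illustration, and the pattern only handles a single literal string, not expressions such as concatenation.

```php
<?php
$lines = array(
    'require("file.php");',
    "require_once('file.php');",
    "include 'lib/util.php';", // hypothetical path, no parentheses
);

// Group 1: the keyword. Group 2: the opening quote, which \2 requires
// again at the end. Group 3: the path between the quotes.
$pattern = '/^\s*(require_once|require|include_once|include)\s*\(?\s*([\'"])(.*?)\2\s*\)?\s*;/';

$results = array();
foreach ($lines as $line) {
    if (preg_match($pattern, $line, $m)) {
        $results[] = array('keyword' => $m[1], 'file' => $m[3]);
    }
}

print_r($results);
```

To scan a whole file held in one string instead of line by line, the same pattern could be used with the m modifier and preg_match_all.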
DavidAM Posted February 2, 2011

Have you considered using the built-in PHP Tokenizer? It takes some work to understand the results, but it parses PHP into an array that you might be able to use.
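As a rough illustration of the tokenizer approach, here is a minimal sketch built on token_get_all(). The helper name extract_functions is hypothetical and not from the thread. It assumes the input begins with an opening PHP tag, skips anonymous functions and by-reference function names, and does not look for functions declared inside another function's body. Because string literals arrive as single tokens, a '}' inside a string does not upset the brace count.

```php
<?php
// Hypothetical helper: collect name, parameters and body of each named
// function, using the token stream instead of a regex.
function extract_functions($code)
{
    $tokens = token_get_all($code);
    $count = count($tokens);
    $functions = array();

    for ($i = 0; $i < $count; $i++) {
        if (!is_array($tokens[$i]) || $tokens[$i][0] !== T_FUNCTION) {
            continue;
        }
        // The name is the first non-whitespace token after "function".
        $i++;
        while ($i < $count && is_array($tokens[$i]) && $tokens[$i][0] === T_WHITESPACE) {
            $i++;
        }
        if ($i >= $count || !is_array($tokens[$i]) || $tokens[$i][0] !== T_STRING) {
            continue; // anonymous function or unsupported form: skip it
        }
        $name = $tokens[$i][1];

        // Collect everything between the matching parentheses.
        while ($i < $count && $tokens[$i] !== '(') {
            $i++;
        }
        $args = '';
        $depth = 0;
        for (; $i < $count; $i++) {
            $tok = $tokens[$i];
            if ($tok === '(') {
                if ($depth++ === 0) {
                    continue; // outermost "(" is not part of the arguments
                }
            } elseif ($tok === ')') {
                if (--$depth === 0) {
                    break;
                }
            }
            $args .= is_array($tok) ? $tok[1] : $tok;
        }

        // Find the opening brace; a ";" first means there is no body.
        while ($i < $count && $tokens[$i] !== '{' && $tokens[$i] !== ';') {
            $i++;
        }
        if ($i >= $count || $tokens[$i] === ';') {
            continue;
        }
        $body = '';
        $depth = 0;
        for (; $i < $count; $i++) {
            $tok = $tokens[$i];
            // "{$x}" and "${x}" inside strings open with special tokens
            // but close with a plain "}", so count them as openers too.
            $opens = $tok === '{' || (is_array($tok) &&
                ($tok[0] === T_CURLY_OPEN || $tok[0] === T_DOLLAR_OPEN_CURLY_BRACES));
            if ($opens) {
                if ($depth++ === 0) {
                    continue; // outermost "{" is not part of the body
                }
            } elseif ($tok === '}') {
                if (--$depth === 0) {
                    break;
                }
            }
            $body .= is_array($tok) ? $tok[1] : $tok;
        }

        $functions[] = array('name' => $name, 'args' => trim($args), 'body' => trim($body));
    }
    return $functions;
}

$code = <<<'SRC'
<?php
function x1($p1,$p2)
{
    echo 'test';
    echo 'test';
}
function x2($a)
{
    if (1 > 0) {
        echo '}';
    }
}
SRC;

print_r(extract_functions($code));
```

The trade-off compared with the regex and strpos versions earlier in the thread is more code in exchange for letting PHP itself decide where strings and comments begin and end.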