Regex to extract PHP function data


hassank1

I am working on a small script that uploads a PHP file, opens it, reads it into a string, and then saves the function data into a database.

 

Can you help me parse the functions using preg? (Separate the function name, parameters, and function body.)

 

ex1:

function x1($p1,$p2)
{
echo 'test';
echo 'test';
}

function x2()
{
echo 'hi';
}

 

output:

array[0]=>x1 , array[1]=> $p1,$p2 , array[2] => echo'test'; echo 'test';

 

array1[0]=>x2 , array1[1]=> (empty) , array1[2] => echo'hi';

 

etc..


Hey hassank1.  I had fun with this regex, it's a cool idea!  I've come up with a regex that produces the output you want:

/^\s*function ([\w\-]*)\((.*)(?=\))\)[\s\r\n]*\{([\w\s\r\n\{\}\$\(\)\=\'\"\!;]*)(?=\}[\s\r\n]*$)/

 

Using this in PHP:

preg_match('/^\s*function ([\w\-]*)\((.*)(?=\))\)[\s\r\n]*\{([\w\s\r\n\{\}\$\(\)\=\'\"\!;]*)(?=\}[\s\r\n]*$)/', $test_string, $matches);

 

In this case the $matches array will contain: [0]: The whole function matched, [1]: Function Name, [2]: Parameters, [3]: Function body

 

It's a bit messy and could probably do with some tidying up, but try it and see what you think.  If you want it explained just let me know  :D

 


I have tested it on multiple functions in the same file and it didn't work correctly. There's also a problem when I have nested { } inside a function.

By the way, I had to remove the $ from the end of the regex in order for it to work.

Can you help me solve the nested {} issue?


Hey, I had been thinking about this and I believe this regex is better (it allows any characters in the function body, and should fix your problem with nested brackets).

/^\s*function (\w*)\(([\$\w\s\,\-]*)(?=\))\)[\s\r\n]*\{(.*)(?=\}[\s\r\n])/s

 

You're right, though: this expression only works on one function at a time. I'll think about this today and see what I can come up with.

 


There is no way to gracefully handle nested {..} (or any nested delimiters, for that matter) with regex, which is the same reason why you shouldn't use regex to try to parse/traverse a DOM (you should instead use DOM for that).
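
For what it's worth, PCRE (the engine behind PHP's preg_* functions) does support recursive subpatterns, so balanced braces can at least be matched. Whether that counts as graceful is debatable, and a stray brace inside a string or comment will still throw it off. A minimal sketch, assuming the input looks like the examples in this thread (the sample data and variable names are made up for illustration):

```php
<?php
// Sample input (hypothetical); x2 contains a nested block.
$data = <<<'SRC'
function x1($p1,$p2)
{
echo 'test';
}

function x2($a)
{
    if (1 > 0) {
        echo 'hi';
    }
}
SRC;

// Group 3 is the whole {...} block; (?3) recurses into it for each nested pair.
// Group 4 is the text between the outermost braces.
$pattern = '~function\s+(\w+)\s*\(([^)]*)\)\s*(\{((?:[^{}]++|(?3))*)\})~';

$functions = array();
if (preg_match_all($pattern, $data, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        $functions[] = array('name' => $m[1],
                             'args' => trim($m[2]),
                             'body' => trim($m[4]));
    }
}
print_r($functions);
```

This only balances braces character by character, so treat it as a starting point rather than a parser.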

 

The easiest way to do this with multiple functions would be to place a unique delimiter between functions and look for it. Pretty much anything will work, but as an example:

 

/** function x1 **/
function x1($p1,$p2)
{
echo 'test';
echo 'test';
}

/** function x2 **/
function x2()
{
echo 'hi';
}

 

You can then use that comment tag as a marker to know the boundaries of each function.  Based on the above example, here is my take:

 

preg_match_all("~function\s+(\w+)\s*\(([^\)]*)\)\s*\{([^}]*)\}\s*(?:/\*\* function[^/]+/|$)~is",$data,$matches);
array_shift($matches);

$n = count($matches[0]);
for ($c = 0; $c < $n; $c++) {
  $functions[] = array('name' => trim($matches[0][$c]),
                       'args' => trim($matches[1][$c]),
                       'body' => trim($matches[2][$c]));
}

echo "<pre>";print_r($functions); echo "</pre>";

 

output:

Array
(
    [0] => Array
        (
            [name] => x1
            [args] => $p1,$p2
            [body] => echo 'test';
echo 'test';
        )

    [1] => Array
        (
            [name] => x2
            [args] => 
            [body] => echo 'hi';
        )

)

 


Okay, scratch the code above. I got a bit ahead of myself and then timed out on editing the post.

 

This is the correct code.  Basically, you first split at the delimiter you added and then parse each function.

 

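$fs = preg_split('~/\*\* function[^/]+/~i',$data);
array_shift($fs); // drop whatever precedes the first marker
$functions = array();
foreach ($fs as $v) {
  // the greedy (.*) runs to the last } in the chunk, so nested braces stay in the body
  if (!preg_match("~function\s+(\w+)\s*\(([^\)]*)\)\s*\{(.*)\}~is",$v,$match)) continue;
  $functions[] = array('name' => trim($match[1]),
                       'args' => trim($match[2]),
                       'body' => trim($match[3]));
}
echo "<pre>";print_r($functions);echo "</pre>";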

 

I know you could skip the preg_split and use preg_match_all instead of preg_match with a negative lookahead for the delimiter, but I haven't quite wrapped my head around how to actually do that, so I opted for preg_split instead :/
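
One way the single-pass idea could look, sketched with a lazy body and a positive lookahead for the next marker or the end of input (so not quite the negative lookahead mentioned above). It assumes every function is preceded by a `/** function name **/` marker as in the example; the sample data is invented for illustration:

```php
<?php
// Sample input (hypothetical) using the comment markers described above.
$data = <<<'SRC'
/** function x1 **/
function x1($p1,$p2)
{
echo 'test';
echo 'test';
}

/** function x2 **/
function x2()
{
if (true) {
echo 'hi';
}
}
SRC;

// (.*?) grows lazily until a } is followed only by whitespace and then
// either the next marker or the end of the input (\z).
$pattern = '~function\s+(\w+)\s*\(([^)]*)\)\s*\{(.*?)\}\s*(?=/\*\* function|\z)~is';

$functions = array();
if (preg_match_all($pattern, $data, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        $functions[] = array('name' => $m[1],
                             'args' => trim($m[2]),
                             'body' => trim($m[3]));
    }
}
print_r($functions);
```

Because the body can only end right before a marker, nested braces inside a function stay in its body.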


Yeah, I've been thinking about this all day and I agree with Crayon Violent.  Even if there were an effective way of doing it all in one regex, heaven help anyone who goes back to edit or maintain it, and splitting the functions in PHP is relatively straightforward...


Thanks, guys, for the help. I will use the regex you suggested to parse the functions, but I want the code to work on any PHP file (even one that is not mine), which means I can't place a unique delimiter. So I will write code that works on any PHP file, and I will keep you up to date on my progress and on the reason why I am doing this project. It is a good idea, and I will share the code as an open source project.


try

<?php
$test = 'function x1($p1,$p2)
{
echo \'test\';
echo \'test\';
}

function x2($a)
{
    if(1> 0){
        echo \'hi\';
    }
}';
// pad with a space so a match at offset 0 is not mistaken for false
$test = ' '.$test;
$fun = array();
$of = 0;
while($sta = stripos($test, 'function', $of)){
    // $sta+=7 moves to the last letter of "function"; the name runs from there to "("
    $ob = stripos($test, '(', $sta+=7);
    $name = substr($test, $sta+1, $ob-$sta-1);
    $cb = stripos($test, ')',$ob);
    $param = substr($test, $ob+1, $cb-$ob-1);
    // count brace depth until the opening "{" of the function is closed
    $cnt = 1;
    $start = strpos($test, '{', $cb);
    $ss = $start;
    while ($cnt){
        $s = strpos($test, '{', $ss+1);
        $e = strpos($test, '}', $ss+1);
        if($s < $e  and $s > 0){$cnt++; $ss = $s;} else {$cnt--; $ss = $e;}
    }
    $end = $ss;
    $body = substr($test, $start+1, $end - $start-1);
    // resume the search after this function's "("
    $of = $ob;
    $fun[]= array(trim($name), trim($param), trim($body));
}
print_r($fun);
?>
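
The character scan above counts every brace, including ones that sit inside strings or comments. As a rough alternative, PHP's built-in tokenizer (`token_get_all`) can do the same walk on tokens instead of characters. The sketch below is only that, a sketch: `extract_functions` is a made-up helper name, the source must include the `<?php` open tag, and closures and by-reference functions are skipped.

```php
<?php
// extract_functions() is a hypothetical helper name used for this sketch.
function extract_functions($source)
{
    $tokens = token_get_all($source);
    $count = count($tokens);
    $functions = array();
    for ($i = 0; $i < $count; $i++) {
        if (!is_array($tokens[$i]) || $tokens[$i][0] !== T_FUNCTION) {
            continue;
        }
        // The name is the first non-whitespace token after "function".
        $j = $i + 1;
        while ($j < $count && is_array($tokens[$j]) && $tokens[$j][0] === T_WHITESPACE) {
            $j++;
        }
        if ($j >= $count || !is_array($tokens[$j]) || $tokens[$j][0] !== T_STRING) {
            continue; // closures and by-reference functions are skipped here
        }
        $name = $tokens[$j][1];

        // Parameters: everything between the matching parentheses.
        $params = '';
        $depth = 0;
        for ($j++; $j < $count; $j++) {
            $tok = $tokens[$j];
            if ($tok === '(' && $depth++ === 0) {
                continue;
            }
            if ($tok === ')' && --$depth === 0) {
                break;
            }
            if ($depth > 0) {
                $params .= is_array($tok) ? $tok[1] : $tok;
            }
        }

        // Body: everything between the matching braces. "{$" and "${" inside
        // double-quoted strings are separate token types closed by a plain "}".
        $body = '';
        $depth = 0;
        for ($j++; $j < $count; $j++) {
            $tok = $tokens[$j];
            $opens = $tok === '{' || (is_array($tok)
                && in_array($tok[0], array(T_CURLY_OPEN, T_DOLLAR_OPEN_CURLY_BRACES), true));
            if ($depth === 0) {
                if ($tok === ';') {
                    break; // abstract or interface method: no body
                }
                if ($opens) {
                    $depth = 1;
                }
                continue;
            }
            if ($opens) {
                $depth++;
            } elseif ($tok === '}' && --$depth === 0) {
                break;
            }
            $body .= is_array($tok) ? $tok[1] : $tok;
        }

        $functions[] = array('name' => $name, 'args' => trim($params), 'body' => trim($body));
        $i = $j; // continue after this function; nested functions are not listed separately
    }
    return $functions;
}

// Sample input (hypothetical); note the brace inside the string literal.
$source = <<<'SRC'
<?php
function x1($p1,$p2)
{
echo 'test';
echo 'test';
}

function x2($a)
{
    if(1> 0){
        echo 'hi { not a brace';
    }
}
SRC;

$fns = extract_functions($source);
print_r($fns);
```

Since string literals and comments arrive as single tokens, a brace inside them never touches the depth counter.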


Thanks, sasa.

I still need one small regex: it should detect a line that contains require, require_once, etc. and extract the file that was included, and it should work with both ' and ".

ex: it should detect

require("file.php"); => output1:require, output2: file.php

require_once('file.php'); => output1:require_once, output2: file.php

etc..


$line1 = 'require("file.php");'."\n";
$line2 = "require_once('file.php');\n";

$pattern = '/^(require_once|require|include_once|include)\((.*)\);/';
$res1 = preg_match($pattern,$line1,$matches1);
$res2 = preg_match($pattern,$line2,$matches2);

print_r($matches1);
echo "<br />";
print_r($matches2);

 

output:

Array ( [0] => require("file.php"); [1] => require [2] => "file.php" )
Array ( [0] => require_once('file.php'); [1] => require_once [2] => 'file.php' ) 
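
The pattern above keeps the quotes in the captured file name. If the bare name is wanted, as in the request, one option is to capture the opening quote and require the same quote again with a backreference. A minimal sketch, assuming the path is a plain string literal (`parse_include` is a made-up helper name; expressions such as concatenation are not handled):

```php
<?php
// parse_include() is a hypothetical helper name, not part of PHP.
function parse_include($line)
{
    // Group 1: keyword, group 2: opening quote, group 3: path up to the same quote.
    // The parentheses are optional, so require 'file.php'; matches as well.
    $pattern = '/^\s*(require_once|require|include_once|include)\s*\(?\s*([\'"])(.*?)\2\s*\)?\s*;/';
    if (!preg_match($pattern, $line, $m)) {
        return null;
    }
    return array('keyword' => $m[1], 'file' => $m[3]);
}

print_r(parse_include('require("file.php");'));
print_r(parse_include("require_once('file.php');"));
```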

