php code regex


beta0x64

Hey guys, do you think that this is a good regex to capture a class declaration?

 

/^(?P<type>abstract\s|final\s)*class\s(?P<name>[a-z0-9_]+)\s*(extends\s(?P<parent>[a-z0-9_]+)\s*)?(implements\s(?P<interfaces>([a-z0-9_,\s])+)\s*)?\{/imS

 

Am I missing anything? Can a class be both abstract and final? Also, is there a way to break the interfaces up into predictable named subpatterns (inter1, inter2, inter3, etc.), or will I have to split the captured list afterwards with explode() or something similar, as I'm assuming?

 

Output on example subject:

 

Array
(
    [0] => Array
        (
            [0] => class patsSQL extends MySQL implements patsInfo, patsDisplay {
            [1] => class Controller {
        )

    [type] => Array
        (
            [0] =>
            [1] =>
        )

    [1] => Array
        (
            [0] =>
            [1] =>
        )

    [name] => Array
        (
            [0] => patsSQL
            [1] => Controller
        )

    [2] => Array
        (
            [0] => patsSQL
            [1] => Controller
        )

    [3] => Array
        (
            [0] => extends MySQL
            [1] =>
        )

    [parent] => Array
        (
            [0] => MySQL
            [1] =>
        )

    [4] => Array
        (
            [0] => MySQL
            [1] =>
        )

    [5] => Array
        (
            [0] => implements patsInfo, patsDisplay
            [1] =>
        )

    [interfaces] => Array
        (
            [0] => patsInfo, patsDisplay
            [1] =>
        )

    [6] => Array
        (
            [0] => patsInfo, patsDisplay
            [1] =>
        )

    [7] => Array
        (
            [0] =>
            [1] =>
        )

)
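
On the interfaces question: as far as I know, PCRE keeps only the last iteration of a repeated group, so a single pattern cannot produce a variable number of named subpatterns like inter1, inter2, and so on. Here is a minimal sketch of the split-afterwards approach, assuming the pattern from the post above. splitInterfaces() is a hypothetical helper name, not something from the thread.

```php
<?php
// Hypothetical helper (not from the thread): turn the comma-separated
// "interfaces" capture into a clean list of names.
function splitInterfaces($captured)
{
    // The capture can carry trailing whitespace, so trim before splitting.
    return preg_split('/\s*,\s*/', trim($captured), -1, PREG_SPLIT_NO_EMPTY);
}

// Pattern copied from the post above.
$pattern = '/^(?P<type>abstract\s|final\s)*class\s(?P<name>[a-z0-9_]+)\s*(extends\s(?P<parent>[a-z0-9_]+)\s*)?(implements\s(?P<interfaces>([a-z0-9_,\s])+)\s*)?\{/imS';
$subject = 'class patsSQL extends MySQL implements patsInfo, patsDisplay {';

$interfaces = array();
if (preg_match($pattern, $subject, $m) && !empty($m['interfaces'])) {
    $interfaces = splitInterfaces($m['interfaces']);
}

print_r($interfaces);
```

preg_split() is used here instead of explode() only so that the whitespace around the commas is dropped in the same step.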

Link to comment
https://forums.phpfreaks.com/topic/202598-php-code-regex/
Share on other sites

Well, I'm trying to make a program that will split source code files into classes, functions, and everything else.

 

const C_pattern = "/^(?P<type>abstract\s+|final\s+)?class\s(?P<name>[a-z_][a-z0-9_]*)\s*(extends\s(?P<parent>[a-z0-9_]+)\s*)?(implements\s(?P<interfaces>([a-z0-9_,\s])+)\s*)?\{/imS";
const F_pattern = "/^function\s+(?P<name>[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*)\s*\((?P<operands>[\$a-z0-9_,\s]*)\)\s*\{/imS"; 

 

The one thing I am concerned about is nesting (functions inside of functions, classes inside of functions, functions inside of classes, etc.) when the user does not indent properly. I think that I can determine the offset of each match, then replace that match with a require_once(), like nothing happened! This would work even inside of a class, correct?

 

In order to handle the functions inside of classes problem, well I just parse classes first and delete them! (Don't worry, I plan on doing all of this inside a tmp file, so deletion is not a problem)

 

What do you guys think?
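
For the offset idea, preg_match_all() accepts the PREG_OFFSET_CAPTURE flag, which returns the byte offset of every match alongside its text. Here is a rough sketch, assuming F_pattern from above and a made-up $source string. Note that the pattern only matches the declaration header up to the opening brace, so the offset says where a function starts, not where its body ends.

```php
<?php
// F_pattern from the post above, in single quotes so the backslashes
// reach PCRE unchanged.
$pattern = '/^function\s+(?P<name>[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*)\s*\((?P<operands>[\$a-z0-9_,\s]*)\)\s*\{/imS';

// Made-up sample source, for illustration only.
$source = "<?php\nfunction foo() {\n    return 1;\n}\n\nfunction bar(\$a, \$b) {\n    return \$a + \$b;\n}\n";

preg_match_all($pattern, $source, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

$found = array();
foreach ($matches as $match) {
    // With PREG_OFFSET_CAPTURE every group is array(text, byte offset).
    $found[$match['name'][0]] = $match[0][1];
}

print_r($found);
```

Locating the matching closing brace would still need some kind of brace counting on top of this.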

 

 


You know, there is no need to write a PHP parser yourself; there is already one in... PHP.

 

Consider the following file (test.php):

<?php
function foo() {
    echo 'hello world';
}

echo 2 + 4;

function
hello
()
{
    // aslkjdlakjds
    return 'abc' +
        
        1;}

echo 'more junk here';
?>

 

Then you can extract all the functions in that file like this:

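<?php
$tokens = token_get_all(file_get_contents('test.php'));

$started = false;       // true while inside a function declaration
$open = 0;              // current brace nesting depth
$functions = array();
$tmp = '';              // source text of the function being collected

foreach ($tokens as $token) {
    if (!$started && is_array($token) && $token[0] === T_FUNCTION) {
        $started = true;
        $tmp .= $token[1];
    }
    else if ($started) {
        if (is_array($token)) {
            $tmp .= $token[1];

            // "{$var}" and "${var}" open with an array token but close
            // with a plain '}', so count them to keep the depth balanced.
            if ($token[0] === T_CURLY_OPEN || $token[0] === T_DOLLAR_OPEN_CURLY_BRACES) {
                ++$open;
            }
        }
        else {
            $tmp .= $token;

            if ($token === '{') {
                ++$open;
            }
            else if ($token === '}') {
                // Depth back at zero: the function body is complete.
                if (--$open === 0) {
                    $started = false;
                    $functions[] = $tmp;
                    $tmp = '';
                }
            }
        }
    }
}

var_dump($functions);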

 

The output of running that would be:

array(2) {
  [0]=>
  string(42) "function foo() {
    echo 'hello world';
}"
  [1]=>
  string(79) "function
hello
()
{
    // aslkjdlakjds
    return 'abc' +
        
        1;}"
}
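
The is_array() checks above exist because token_get_all() returns two kinds of items: array(id, text, line) for most tokens, and a bare one-character string for punctuation such as braces and semicolons. Here is a small sketch to see this for yourself. The inline source string is just an illustration.

```php
<?php
$tokens = token_get_all('<?php function foo() { return 1; }');

$summary = array();
foreach ($tokens as $token) {
    if (is_array($token)) {
        // Array tokens are array(id, text, line number).
        $summary[] = token_name($token[0]);
    } else {
        // Punctuation such as ( ) { } ; arrives as a bare string.
        $summary[] = $token;
    }
}

print_r($summary);
```

That is also why the brace counting only has to look at the string tokens.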


Awww, but now how will I show off my l33t regex skillz?  ::)

 

Anyway, the Tokenizer also grabs functions inside of classes (but strangely not functions inside of functions, hmmm), which is not what I want, per se. I think I should stick with my current M.O., especially because I've already coded most of it...

 

Thanks, though!


Well, it was not meant to be a complete solution for you. You were supposed to extend it in the same manner to capture classes.

 

Take a look at how it works. When a function declaration starts it ignores the tokens and just adds them to the string until the function declaration ends. You can do the same with classes. When a class declaration starts, ignore everything until it ends.


For what it's worth, this will extract functions, classes and interfaces from a file. It took me ten minutes to write. I bet those regular expressions took longer.

 

Sample file (test.php):

<?php
function foo() {
    echo 'hello world';
}

echo 2 + 4;

function
hello
()
{
    // aslkjdlakjds
    return 'abc' +
        
        1;}

class Hello
{
    function thisMethodWontGetIncludedInFunctions() {
        echo 'foo';
    }
}
abstract class Foo {}
final class Bar {}
interface Baz {}

echo 'more junk here';
?>

 

Parser:

<?php
class FileParser
{
    private $_path;
    private $_parsed = false;

    private $_classes = array();
    private $_functions = array();
    private $_interfaces = array();

    public function __construct($path)
    {
        $this->_path = $path;
    }

    private function _parse()
    {
        if ($this->_parsed) return;

        $parsing = null;
        $tmp = '';
        $open = 0;

        foreach (token_get_all(file_get_contents($this->_path)) as $token) {
            if ($parsing === null && is_array($token)) {
                switch ($token[0]) {
                    case T_FUNCTION:
                        $parsing = T_FUNCTION;
                        break;
                    case T_CLASS:
                    case T_ABSTRACT:
                    case T_FINAL:
                        $parsing = T_CLASS;
                        break;
                    case T_INTERFACE:
                        $parsing = T_INTERFACE;
                        break;
                }
                if ($parsing !== null) $tmp .= $token[1];
            }
            else if ($parsing !== null) {
                // Only collect tokens while inside a declaration; otherwise
                // stray punctuation from top-level code leaks into $tmp.
                if (is_array($token)) {
                    $tmp .= $token[1];

                    // "{$var}" and "${var}" open with an array token but
                    // close with a plain '}', so count them as well.
                    if ($token[0] === T_CURLY_OPEN || $token[0] === T_DOLLAR_OPEN_CURLY_BRACES) {
                        ++$open;
                    }
                }
                else {
                    $tmp .= $token;

                    switch ($token) {
                        case '{':
                            ++$open;
                            break;
                        case '}':
                            if (--$open === 0) {
                                switch ($parsing) {
                                    case T_FUNCTION:
                                        $this->_functions[] = $tmp;
                                        break;
                                    case T_CLASS:
                                        $this->_classes[] = $tmp;
                                        break;
                                    case T_INTERFACE:
                                        $this->_interfaces[] = $tmp;
                                        break;
                                }
                                $parsing = null;
                                $tmp = '';
                            }
                            break;
                    }
                }
            }
        }

        $this->_parsed = true;
    }

    public function getClasses()
    {
        $this->_parse();
        return $this->_classes;
    }

    public function getFunctions()
    {
        $this->_parse();
        return $this->_functions;
    }

    public function getInterfaces()
    {
        $this->_parse();
        return $this->_interfaces;
    }
}

$parser = new FileParser('test.php');

var_dump(
    $parser->getFunctions(),
    $parser->getClasses(),
    $parser->getInterfaces()
);

 

Output:

array(2) {
  [0]=>
  string(42) "function foo() {
    echo 'hello world';
}"
  [1]=>
  string(79) "function
hello
()
{
    // aslkjdlakjds
    return 'abc' +
        
        1;}"
}
array(3) {
  [0]=>
  string(95) "class Hello
{
    function thisMethodWontGetIncludedInFunctions() {
        echo 'foo';
    }
}"
  [1]=>
  string(21) "abstract class Foo {}"
  [2]=>
  string(18) "final class Bar {}"
}
array(1) {
  [0]=>
  string(16) "interface Baz {}"
}


Archived

This topic is now archived and is closed to further replies.
