beta0x64 Posted May 22, 2010

Hey guys, do you think that this is a good regex to capture a class declaration?

/^(?P<type>abstract\s|final\s)*class\s(?P<name>[a-z0-9_]+)\s*(extends\s(?P<parent>[a-z0-9_]+)\s*)?(implements\s(?P<interfaces>([a-z0-9_,\s])+)\s*)?\{/imS

Am I missing anything? Is it possible for a class to be both abstract and final, too?

Can I do something to break the interfaces up into named subpatterns that are predictable (inter1, inter2, inter3, etc.), or will I have to do that with explode or something as I'm assuming?

Output on example subject:

Array
(
    [0] => Array
        (
            [0] => class patsSQL extends MySQL implements patsInfo, patsDisplay {
            [1] => class Controller {
        )
    [type] => Array
        (
            [0] =>
            [1] =>
        )
    [1] => Array
        (
            [0] =>
            [1] =>
        )
    [name] => Array
        (
            [0] => patsSQL
            [1] => Controller
        )
    [2] => Array
        (
            [0] => patsSQL
            [1] => Controller
        )
    [3] => Array
        (
            [0] => extends MySQL
            [1] =>
        )
    [parent] => Array
        (
            [0] => MySQL
            [1] =>
        )
    [4] => Array
        (
            [0] => MySQL
            [1] =>
        )
    [5] => Array
        (
            [0] => implements patsInfo, patsDisplay
            [1] =>
        )
    [interfaces] => Array
        (
            [0] => patsInfo, patsDisplay
            [1] =>
        )
    [6] => Array
        (
            [0] => patsInfo, patsDisplay
            [1] =>
        )
    [7] => Array
        (
            [0] =>
            [1] =>
        )
)
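On the question about predictable interface subpatterns: a repeated capturing group in PCRE keeps only its last iteration, so a single match cannot hand back inter1, inter2, inter3. Splitting the captured list afterwards is the usual route. A minimal sketch, assuming a slightly tightened variant of the pattern above and a made-up `$subject` (both are illustrations, not the poster's final code):

```php
<?php
// Sketch only: the pattern is adapted from the post above; $subject is a made-up sample.
$pattern = '/^(?P<type>abstract\s+|final\s+)?class\s+(?P<name>[a-z_][a-z0-9_]*)\s*'
         . '(?:extends\s+(?P<parent>[a-z_][a-z0-9_]*)\s*)?'
         . '(?:implements\s+(?P<interfaces>[a-z0-9_,\s]+?)\s*)?\{/im';

$subject = "class patsSQL extends MySQL implements patsInfo, patsDisplay {";

$interfaces = array();
if (preg_match($pattern, $subject, $m) && !empty($m['interfaces'])) {
    // Split on commas with optional surrounding whitespace; drop empty pieces.
    $interfaces = preg_split('/\s*,\s*/', trim($m['interfaces']), -1, PREG_SPLIT_NO_EMPTY);
}

print_r($interfaces);
```

The lazy `+?` keeps trailing whitespace out of the `interfaces` capture, and `PREG_SPLIT_NO_EMPTY` guards against a stray trailing comma.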
Mchl Posted May 22, 2010

A class cannot be both abstract and final.
Daniel0 Posted May 22, 2010

What exactly are you trying to do? Any reason why you can't use reflection to get information about classes?
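For reference, a rough sketch of what the reflection route looks like. The three declarations are hypothetical and exist only so the example runs on its own. Note that reflection works on classes that are already loaded, so a source file would have to be included before it can be inspected this way.

```php
<?php
// Hypothetical classes for illustration; they are not from the thread.
interface Describable {}
abstract class Base {}
final class Concrete extends Base implements Describable {}

$ref = new ReflectionClass('Concrete');

// Collect the same facts the regex tries to capture.
$info = array(
    'name'       => $ref->getName(),
    'abstract'   => $ref->isAbstract(),
    'final'      => $ref->isFinal(),
    'parent'     => $ref->getParentClass() ? $ref->getParentClass()->getName() : null,
    'interfaces' => $ref->getInterfaceNames(),
);

var_dump($info);
```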
beta0x64 (Author) Posted May 23, 2010

Well, I'm trying to make a program that will split source code files into classes, functions, and everything else.

const C_pattern = "/^(?P<type>abstract\s+|final\s+)?class\s(?P<name>[a-z_][a-z0-9_]*)\s*(extends\s(?P<parent>[a-z0-9_]+)\s*)?(implements\s(?P<interfaces>([a-z0-9_,\s])+)\s*)?\{/imS";
const F_pattern = "/^function\s+(?P<name>[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*)\s*\((?P<operands>[\$a-z0-9_,\s]*)\)\s*\{/imS";

The one thing I am concerned about is functions inside of functions, classes inside of functions, functions inside of classes, etc. if the user does not use proper tabs. I think that I can determine the offset of the match, then replace that match with a require_once(), like nothing happened! This would work even inside of a class, correct?

In order to handle the functions inside of classes problem, well I just parse classes first and delete them! (Don't worry, I plan on doing all of this inside a tmp file, so deletion is not a problem.)

What do you guys think?
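The "determine the offset of the match" step can be sketched with the `PREG_OFFSET_CAPTURE` flag. This is only an illustration: `$source` is a made-up sample and the pattern is a simplified form of F_pattern. It finds where a declaration starts, not where its closing brace is; locating the end still needs brace matching that is aware of strings and comments.

```php
<?php
// Sketch only: $source is a made-up sample; the pattern is a simplified F_pattern.
$source = "echo 1;\nfunction foo(\$a, \$b) {\n    return \$a + \$b;\n}\n";

$pattern = '/^function\s+(?P<name>[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*)\s*\(/m';

preg_match_all($pattern, $source, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

foreach ($matches as $match) {
    // With PREG_OFFSET_CAPTURE every entry is array(text, byte offset).
    list($name, $nameOffset) = $match['name'];
    $start = $match[0][1];
    echo "function $name: declaration starts at byte $start, name at byte $nameOffset\n";
}
```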
Daniel0 Posted May 23, 2010

You know, there is no need to write a PHP parser yourself; there is already one in... PHP.

Consider the following file (test.php):

<?php
function foo()
{
    echo 'hello world';
}

echo 2 + 4;

function hello () { // aslkjdlakjds
    return 'abc' + 1;}

echo 'more junk here';
?>

Then you can extract all the functions in that file like this:

<?php
$tokens = token_get_all(file_get_contents('test.php'));

$started = false;   // true while inside a function declaration
$open = 0;          // brace depth within that declaration
$functions = array();
$tmp = '';

foreach ($tokens as $token) {
    if (!$started && $token[0] === T_FUNCTION) {
        $started = true;
        $tmp .= $token[1];
    } else if ($started) {
        if (is_array($token)) {
            $tmp .= $token[1];
        } else {
            $tmp .= $token;
            if ($token === '{') {
                ++$open;
            } else if ($token === '}') {
                // Depth back to zero: the declaration is complete.
                if (--$open === 0) {
                    $started = false;
                    $functions[] = $tmp;
                    $tmp = '';
                }
            }
        }
    }
}

var_dump($functions);

The output of running that would be:

array(2) {
  [0]=>
  string(42) "function foo() { echo 'hello world'; }"
  [1]=>
  string(79) "function hello () { // aslkjdlakjds return 'abc' + 1;}"
}
beta0x64 (Author) Posted May 23, 2010

Awww, but now how will I show off my l33t regex skillz?

Anyway, the Tokenizer also grabs functions inside of classes (but strangely not functions inside of functions, hmmm), which is not what I want, per se. I think I should stick with my current M.O., especially because I've already coded most of it... Thanks, though!
Daniel0 Posted May 23, 2010

Well, it was not meant to be a complete solution for you. You were supposed to extend it in the same manner to capture classes. Take a look at how it works. When a function declaration starts it ignores the tokens and just adds them to the string until the function declaration ends. You can do the same with classes. When a class declaration starts, ignore everything until it ends.
beta0x64 (Author) Posted May 23, 2010

I understand that; I just find that my way is actually easier.
salathe Posted May 23, 2010

"I just find that my way is actually easier."

Yet you come here asking for help with it? :S
Daniel0 Posted May 23, 2010

For what it's worth, this will extract functions, classes and interfaces from a file. Took me ten minutes to write. I bet writing those regular expressions took longer.

Sample file (test.php):

<?php
function foo()
{
    echo 'hello world';
}

echo 2 + 4;

function hello () { // aslkjdlakjds
    return 'abc' + 1;}

class Hello
{
    function thisMethodWontGetIncludedInFunctions()
    {
        echo 'foo';
    }
}

abstract class Foo {}
final class Bar {}
interface Baz {}

echo 'more junk here';
?>

Parser:

<?php
class FileParser
{
    private $_path;
    private $_parsed = false;
    private $_classes = array();
    private $_functions = array();
    private $_interfaces = array();

    public function __construct($path)
    {
        $this->_path = $path;
    }

    private function _parse()
    {
        if ($this->_parsed) return;

        $parsing = null;   // kind of declaration being captured, or null
        $tmp = '';
        $open = 0;         // brace depth within the current declaration

        foreach (token_get_all(file_get_contents($this->_path)) as $token) {
            if ($parsing === null) {
                // Outside a declaration, skip single-character tokens such as
                // '+' or ';' so they do not leak into the next capture.
                if (!is_array($token)) continue;

                switch ($token[0]) {
                    case T_FUNCTION:
                        $parsing = T_FUNCTION;
                        break;
                    case T_CLASS:
                    case T_ABSTRACT:
                    case T_FINAL:
                        $parsing = T_CLASS;
                        break;
                    case T_INTERFACE:
                        $parsing = T_INTERFACE;
                        break;
                }

                if ($parsing !== null) $tmp .= $token[1];
            } else {
                if (is_array($token)) {
                    $tmp .= $token[1];
                } else {
                    $tmp .= $token;
                    switch ($token) {
                        case '{':
                            ++$open;
                            break;
                        case '}':
                            // Depth back to zero: store the finished declaration.
                            if (--$open === 0) {
                                switch ($parsing) {
                                    case T_FUNCTION:
                                        $this->_functions[] = $tmp;
                                        break;
                                    case T_CLASS:
                                        $this->_classes[] = $tmp;
                                        break;
                                    case T_INTERFACE:
                                        $this->_interfaces[] = $tmp;
                                        break;
                                }
                                $parsing = null;
                                $tmp = '';
                            }
                            break;
                    }
                }
            }
        }

        $this->_parsed = true;
    }

    public function getClasses()
    {
        $this->_parse();
        return $this->_classes;
    }

    public function getFunctions()
    {
        $this->_parse();
        return $this->_functions;
    }

    public function getInterfaces()
    {
        $this->_parse();
        return $this->_interfaces;
    }
}

$parser = new FileParser('test.php');
var_dump(
    $parser->getFunctions(),
    $parser->getClasses(),
    $parser->getInterfaces()
);

Output:

array(2) {
  [0]=>
  string(42) "function foo() { echo 'hello world'; }"
  [1]=>
  string(79) "function hello () { // aslkjdlakjds return 'abc' + 1;}"
}
array(3) {
  [0]=>
  string(95) "class Hello { function thisMethodWontGetIncludedInFunctions() { echo 'foo'; } }"
  [1]=>
  string(21) "abstract class Foo {}"
  [2]=>
  string(18) "final class Bar {}"
}
array(1) {
  [0]=>
  string(16) "interface Baz {}"
}
beta0x64 (Author) Posted May 24, 2010

OK, looking at it now, the Tokenizer library does in fact add in an offset, which is what my main problem with using it was. I'll use it. Thanks guys!
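A note on positions: each array token from `token_get_all()` carries its line number as the third element. If a byte offset is needed instead, one way to get it is to keep a running total of token lengths, since the token texts concatenate back to the original source. A minimal sketch, with a made-up `$code` sample standing in for a real file:

```php
<?php
// Sketch only: $code is a made-up sample, not a file from the thread.
$code = "<?php\necho 2 + 4;\nfunction hello() { return 1; }\n";

$offset = 0;
$found = array();

foreach (token_get_all($code) as $token) {
    // Array tokens are array(id, text, line); single-character tokens are plain strings.
    $text = is_array($token) ? $token[1] : $token;

    if (is_array($token) && $token[0] === T_FUNCTION) {
        $found[] = array('line' => $token[2], 'offset' => $offset);
    }

    // Running length of everything seen so far is the offset of the next token.
    $offset += strlen($text);
}

print_r($found);
```

The recorded offset can then be used with `substr()` or `substr_replace()` on the original source, for example to swap a declaration for a `require_once` line.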