regex project (P.S. I'm a noob at it)

scs · February 21, 2007

OK, so I know just about the ins and outs of PHP. I own the PHP core book, and all my scripts are custom. But I haven't gotten into regex that much, though I think it's the coolest part of PHP. Since I don't really know how to use it, I usually find some other, longer way around it. I hope I won't have to do that anymore.

OK, so I have this project for myself, and I would appreciate all the help I can get.

I'm looking over the syntax of the Perl functions; I've used them more than the regular ones. I'm going to ask some pretty basic regex questions and, if possible, would like people to explain in some detail what they're doing. I can look back at the syntax, so you don't have to spell everything out. I basically want to learn this as I go through it. It shouldn't be too hard for people to answer.

OK, so the text that is going to be passed through the functions will look something like this. Oh, and I would also like people to suggest which function is best.

<name:type prop="attribute" prop2="attribute">
      <name2:type2 prop="attribute" prop2="attribute" />
</name>

You should have figured out by now what I'm after. But let me just start off small and work my way up.

How do I get "name" as an array with a child name? For instance:

<name>
    <name2 />
</name>

So that I get output like

$output['name']['name2']

Thanks A BUNCH in advance

Zach

effigy · February 22, 2007

That's XML, and those are namespaces; an XML parser would be the proper tool for the job. (Although some parsers don't take well to namespaces.)

Are you trying to get the attributes as well?

c4onastick · February 22, 2007

Are you looking for the "attribute" part? Or just name?

Just as a heads-up, nested matches are pretty difficult in regex (doable for 2-3 levels deep, but they get ugly after that). You might want to look into an XML parser like effigy said; it might give you more of what you want.

Other than that, this is how I would tackle this:

preg_match_all('#<(\w+)[^>]*>[^<]*<(\w+)[^<]*</\1>#s', $text, $matches, PREG_SET_ORDER);

That'll give you an array that looks like:

Array (
   Array (
      Whole first match,
      Name1,
      Name2,
   ),
   Array (
      Whole second match,
      Name1,
      Name2,
   ),
...
)

A quick break down:

# is the delimiter that I chose (it tells the engine where the pattern starts and ends)
\w is shorthand for the character class [A-Za-z0-9_] (I'm capturing the result with parentheses)
[^>]* means match zero or more characters that are not a '>' (this lets you get away with using greedy quantifiers)
the \1 at the end is a backreference to the first set of parentheses; it says to match the closing tag whose name is the same as the opening tag's

Check out the link in effigy's signature to get to some tutorials on regex.
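
For anyone who wants to try the pattern above, here is a minimal sketch that runs it against the sample markup from the first post and builds the $output['name']['name2'] shape that was asked for. The $text value and the loop are assumptions added for illustration; only the pattern itself comes from the post. It handles exactly one child per parent, as the pattern does.

```php
<?php
// Sample markup copied from the first post (illustrative test data).
$text = <<<'EOT'
<name:type prop="attribute" prop2="attribute">
      <name2:type2 prop="attribute" prop2="attribute" />
</name>
EOT;

// Pattern from the post: parent name, one child name, then the parent's closing tag.
preg_match_all('#<(\w+)[^>]*>[^<]*<(\w+)[^<]*</\1>#s', $text, $matches, PREG_SET_ORDER);

$output = array();
foreach ($matches as $m) {
    // $m[0] is the whole match, $m[1] the parent name, $m[2] the child name.
    $output[$m[1]][$m[2]] = array();
}

print_r($output);
```

With this input there is a single match, so $output ends up holding one key, 'name', whose value is an array with the key 'name2'.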

scs · February 22, 2007

Thanks for the responses.

First, yes, it is XML, but not standard XML. Notice the : in the names of the tags. An XML parser would not recognize that. It's more complex than XML.

It may seem weird for me not to just use an XML parser, but that's the easy way, and I'm looking for something more than what XML can offer me. Like I said, this is a project. I'm going a lot further with it once I get the hang of it.

And sorry if I was unclear at the end. I'm just starting with grabbing the names of the tags. I'll work on the attributes once I get this part down.

And thanks for the code and tips, c4onastick. I was actually already looking at effigy's tutorial. Oh, and by the way, thanks, effigy, for the crontab tutorial. I've been looking for one.

Anyway, since your code was made for parsing nested tags, and you brought up that it can get messy, could you show me a pattern for one line only, one that would parse any of the lines? That way I can loop through each line and parse them into their own arrays, so that later I can compile them into one.

It's already getting complex. Let me know if I was unclear so that I can rephrase.

Thanks again for the responses.

Zach

c4onastick · February 23, 2007

Anyway, since your code was made for parsing nested tags, and you brought up that it can get messy, could you show me a pattern for one line only, one that would parse any of the lines? That way I can loop through each line and parse them into their own arrays, so that later I can compile them into one.

It's already getting complex. Let me know if I was unclear so that I can rephrase.

Zach,

That's kind of a moot point (or maybe not; I'm thinking of the most general case). If you do it line by line, you'll have to assume that there are only X 'name2's below each 'name1'. The beauty of regex is that you can do the whole thing in one fell swoop. Really, what I wrote above is, IMHO, the simplest you can get while still getting the job done. If you just want to match the 'name' part of each tag (assuming all the names are in the '<name:type' format), then you can use something like this:

preg_match_all('#<(\w+):#i', $text, $matches);

(I purposely avoided lookarounds here so as not to get too complex too fast.)
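
As a rough illustration of what that simpler pattern returns, here is a small sketch run against the sample markup from the first post. The $text and $names variables are assumptions added for the example. Note that a tag with no colon after its name, such as a plain <name>, would not match at all.

```php
<?php
// Sample markup copied from the first post (illustrative test data).
$text = <<<'EOT'
<name:type prop="attribute" prop2="attribute">
      <name2:type2 prop="attribute" prop2="attribute" />
</name>
EOT;

// Without PREG_SET_ORDER the results are grouped by capture group:
// $matches[0] holds the full matches ("<name:", "<name2:"),
// $matches[1] holds just the captured names.
preg_match_all('#<(\w+):#i', $text, $matches);

$names = $matches[1];
print_r($names);
```

Here $names is a flat list containing 'name' and 'name2'. The closing </name> tag is skipped because the character after its '<' is a '/', which \w does not match.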

This won't preserve any of the hierarchy that you have in your data, though.
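
If the hierarchy does matter, one possible approach (not something proposed in the thread) is to match one tag at a time and keep track of which tags are currently open. The sketch below is only that, a sketch: parse_tag_tree() is a hypothetical helper name, it assumes tags are properly nested, it assumes attribute values never contain a '>', and sibling tags with the same name overwrite each other.

```php
<?php
// Hypothetical helper (not from the thread): builds a nested array of tag names.
function parse_tag_tree($text)
{
    $tree = array();
    $path = array(); // names of the currently open tags, outermost first

    // Group 1: "/" for a closing tag. Group 2: the tag name.
    // Group 3: "/" when the tag closes itself, as in <name2 ... />.
    preg_match_all('#<(/?)(\w+)[^>]*?(/?)>#', $text, $tags, PREG_SET_ORDER);

    foreach ($tags as $tag) {
        list(, $closing, $name, $selfClosing) = $tag;

        if ($closing === '/') {
            array_pop($path); // leave the innermost open tag
            continue;
        }

        // Walk down to the innermost open tag, then add this tag beneath it.
        $node = &$tree;
        foreach ($path as $step) {
            $node = &$node[$step];
        }
        $node[$name] = array();
        unset($node); // drop the reference before the next tag

        if ($selfClosing !== '/') {
            $path[] = $name; // this tag stays open until its closing tag
        }
    }

    return $tree;
}

$sample = <<<'EOT'
<name:type prop="attribute" prop2="attribute">
      <name2:type2 prop="attribute" prop2="attribute" />
</name>
EOT;

$output = parse_tag_tree($sample);
print_r($output);
```

For the sample this gives the same $output['name']['name2'] shape as the single-pattern approach, but it is not limited to one child per parent or to a fixed nesting depth.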
