
Regex project (P.S. I'm a noob at it)


OK, so I know just about the ins and outs of PHP. I own the PHP core book, and all my scripts are custom. But I haven't gotten into regex that much, though I think it's the coolest part of PHP. Since I don't really know how to use it, I usually find some other, longer way around it. I hope I won't have to do that anymore.

 

Ok so I have this project for myself and I would appreciate all the help I can get.

I'm looking over the syntax of the Perl-compatible (preg_*) functions. I've used them more than the regular ones. I'm going to ask some pretty basic regex questions and, if possible, would like people to explain in some detail what they're doing. I can look up the syntax myself, so you don't have to spell out every detail. I basically want to learn this as I go. It shouldn't be too hard for people to answer.

 

 

OK, so the script that is going to be passed through the functions will look something like this. Oh, and I would also like people to suggest which function is best.

 

<name:type prop="attribute" prop2="attribute">
      <name2:type2 prop="attribute" prop2="attribute" />
</name>

 

You should have figured out by now what I'm after. But let me start off small and work my way up.

How do I get "name" as an array with a child name? For instance:

<name>
    <name2 />
</name>

 

So I get output like:

$output['name']['name2']

 

Thanks A BUNCH in advance

Zach


Are you looking for the "attribute" part? Or just name?

 

Just as a heads-up, nested matches are pretty difficult in regex (doable for 2-3 levels deep, but they get ugly after that). You might want to look into an XML parser like effigy said; it might give you more of what you want.

 

Other than that, this is how I would tackle this:

preg_match_all('#<(\w+)[^>]*>[^<]*<(\w+)[^<]*</\1>#s', $text, $matches, PREG_SET_ORDER);

 

That'll give you an array that looks like:

Array (
   Array (
      Whole first match,
      Name1,
      Name2,
   ),
   Array (
      Whole second match,
      Name1,
      Name2,
   ),
...
)

 

A quick breakdown:

  • # is the delimiter that I chose (tells the engine where the pattern starts and ends)
  • \w is shorthand for the character class [A-Za-z0-9_] (I'm capturing the result with parentheses)
  • [^>]* matches zero or more characters that are not a '>' (the negated class lets you get away with a greedy quantifier, because it can never run past the end of the tag)
  • the \1 at the end is a backreference to the first set of parentheses; it says the closing tag must have the same name that the first group captured from the opening tag
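
To make the breakdown concrete, here is a minimal sketch that runs the pattern above against the sample from the first post and folds the matches into the $output['name']['name2'] shape that was asked for. The empty array stored for each child is just a placeholder I'm assuming; the thread doesn't say what the final value should be.

```php
<?php
// Sample input copied from the first post.
$text = <<<'XML'
<name:type prop="attribute" prop2="attribute">
      <name2:type2 prop="attribute" prop2="attribute" />
</name>
XML;

// Same pattern as above; PREG_SET_ORDER gives one sub-array per match.
preg_match_all('#<(\w+)[^>]*>[^<]*<(\w+)[^<]*</\1>#s', $text, $matches, PREG_SET_ORDER);

// $m[1] is the parent tag name, $m[2] is the child tag name.
// The empty array is a placeholder (an assumption, not from the thread).
$output = [];
foreach ($matches as $m) {
    $output[$m[1]][$m[2]] = [];
}

print_r($output);
```

Note that this pattern only picks up one child per parent, so it is a starting point rather than a general parser.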

 

Check out the link in effigy's signature to get to some tutorials on regex.

Thanks for the responses.

 

First, yes, it is XML, but not standard XML. Notice the : in the names of the tags. An XML parser would not recognize that. It's more complex than XML.

 

It is weird for me not to just use an XML parser. But that's easy, and I'm looking for something more than what XML can offer me. Like I said, this is a project. I'm going a lot further with it once I get the hang of it.

 

And sorry if I was unclear at the end. I'm just starting with grabbing the names of the tags. I'll work on the attributes once I get this part down.

 

And thanks for the code and tips, c4onastick. I was actually already looking at effigy's tutorial. Oh, and by the way, thanks effigy for the crontab tutorial. I've been looking for one.

 

Anyway, since your code was made for parsing nested tags, and you brought up that it can get messy, could you show me a pattern for one line only, one that would parse any of the lines? That way I can loop through each line and parse them into their own arrays, so that later I can compile them into one.

It's already getting complex. Let me know if I was unclear so that I can rephrase.

 

Thanks again for the responses.

Zach

Zach,

 

That's kind of a moot point (or maybe not, I'm thinking of the most general case). If you do it line by line you'll have to assume that there's only X number of 'name2's below each 'name1'. The beauty of regex is that you can do the whole thing in one fell swoop. Really, what I wrote above is, IMHO, the simplest you can get while still getting the job done. If you just want to match the 'name' part of each tag (assuming all the names are in the '<name:type' format) then you can use something like this:

preg_match_all('#<(\w+):#i', $text, $matches);

(I purposely avoided lookarounds here so as not to get too complex too fast.)
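
For reference, a quick sketch of what that call returns on the sample from the first post. Without a flag, preg_match_all uses PREG_PATTERN_ORDER, so the captured names all land in $matches[1]. The $names variable is just my own label.

```php
<?php
// Sample input copied from the first post.
$text = <<<'XML'
<name:type prop="attribute" prop2="attribute">
      <name2:type2 prop="attribute" prop2="attribute" />
</name>
XML;

// Default flag is PREG_PATTERN_ORDER: $matches[0] holds the full matches,
// $matches[1] holds everything captured by the first group.
preg_match_all('#<(\w+):#i', $text, $matches);

// The closing tag </name> is skipped because '/' is not a \w character
// and there is no ':' after the name.
$names = $matches[1];
print_r($names);
```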

This won't preserve any of the hierarchy that you have in your data, though...
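
If you do want to go line by line and still keep the hierarchy, one possible approach (not something posted earlier in this thread, just a rough sketch) is to track the current nesting path while looping over the lines. The function name parse_tag_lines is hypothetical, and the sketch assumes at most one tag per line, that self-closing tags end in "/>", and that sibling tags have different names (duplicates would overwrite each other).

```php
<?php
/**
 * Hypothetical helper: builds a nested array of tag names from input
 * that has at most one tag per line. A sketch under the assumptions
 * stated above, not a full parser.
 */
function parse_tag_lines($text)
{
    $output = [];
    $path = []; // names of the currently open parent tags, outermost first

    foreach (preg_split('/\R/', $text) as $line) {
        if (preg_match('#^\s*</#', $line)) {
            // Closing tag: step back up one level.
            array_pop($path);
        } elseif (preg_match('#^\s*<(\w+)[^>]*?(/?)>#', $line, $m)) {
            // Walk down to the array of the innermost open parent.
            $node = &$output;
            foreach ($path as $key) {
                $node = &$node[$key];
            }
            $node[$m[1]] = [];
            unset($node); // drop the reference before the next iteration

            // A tag that is not self-closing becomes the new parent.
            if ($m[2] === '') {
                $path[] = $m[1];
            }
        }
    }

    return $output;
}

$text = <<<'XML'
<name:type prop="attribute" prop2="attribute">
      <name2:type2 prop="attribute" prop2="attribute" />
</name>
XML;

print_r(parse_tag_lines($text));
```

The regex only has to understand a single tag here; the nesting is handled by the $path array, which is why this scales to any depth where the one-shot pattern does not.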
