
Find XML namespace and replace


ILMV


Hello all,

 

I need to find this:

 

<somerandomtext:somemoretext

 

and replace with this:

 

<somerandomtext_somemoretext

 

Only the : changes to a _. I don't want to search the entire document, just the places where a < is followed by any number of alphanumeric characters and then a :, nothing else.

 

 

I am a super n00b with regex; if you need any more info, I can provide it.

 

 

Many thanks,

Ben

https://forums.phpfreaks.com/topic/171763-find-xml-namespace-and-replace/

I have found this, which removes the : altogether, which I guess solves the problem, but if anyone can help me modify it so that it replaces the colon with an underscore, that would be fantastic.

 

$namespaceFree = preg_replace('/([<<\/])([a-z0-9]+):/i','$1$2',$xml);

Using what you have, just add the underscore into the replacement param:

 

$namespaceFree = preg_replace('/([<<\/])([a-z0-9]+):/i','$1_$2',$xml);

 

Edit: No, my mistake, I didn't think about what the regexp was doing. 2 minutes!

 

Edit 2: This should do the trick:

 

$namespaceFree = preg_replace('/<([a-z0-9]+):([a-z0-9]+)/i', '<$1_$2', $xml);
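For anyone following along, here is a minimal sketch of why the first attempt went wrong and what the second one does instead. The sample `$xml` string is made up for illustration and is not from the original post. In the first pattern the colon is matched but never captured, so putting the underscore between `$1` and `$2` drops it in front of the prefix rather than in the colon's place.

```php
<?php
// Hypothetical sample input, not from the original thread.
$xml = '<ns:item>value</ns:item>';

// First attempt: $1 is "<" or "/", $2 is the prefix, the colon is not captured.
// The underscore therefore lands BEFORE the prefix and the colon is dropped.
$first = preg_replace('/([<\/])([a-z0-9]+):/i', '$1_$2', $xml);

// Second attempt: capture both sides of the colon and put "_" between them.
// This pattern only starts at "<" + alphanumerics, so "</ns:item" is not matched.
$second = preg_replace('/<([a-z0-9]+):([a-z0-9]+)/i', '<$1_$2', $xml);

echo $first, PHP_EOL;   // <_nsitem>value</_nsitem>
echo $second, PHP_EOL;  // <ns_item>value</ns:item>
```

Note that the second version leaves the closing tag untouched, because a `/` sits between the `<` and the prefix there.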

Thanks MrAdam!

 

I have tweaked it a bit to catch </ as well:

 

$xml = preg_replace('/<([\w]+):([\w]+)/', '<$1_$2', $xml);
$xml = preg_replace('/<\/([\w]+):([\w]+)/', '</$1_$2', $xml);

 

 

Thanks again :)

 

p.s. learnt a bit about regex too ;)

Heh no problem. I actually changed it from the \w character class after I realized that would match underscores as well (remembering that you'd said alphanumerical).
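To illustrate the difference between the two character classes, here is a small sketch using a made-up tag whose prefix contains an underscore. The input string and variable names are assumptions for the example only.

```php
<?php
// Hypothetical input: the prefix "my_ns" contains an underscore.
$xml = '<my_ns:item>';

// \w matches letters, digits AND underscore, so the whole prefix is accepted.
$withWord = preg_replace('/<([\w]+):([\w]+)/', '<$1_$2', $xml);

// [a-z0-9] stops at the underscore, so no ":" follows and nothing is replaced.
$alnumOnly = preg_replace('/<([a-z0-9]+):([a-z0-9]+)/i', '<$1_$2', $xml);

echo $withWord, PHP_EOL;   // <my_ns_item>
echo $alnumOnly, PHP_EOL;  // <my_ns:item>
```

Which behaviour is wanted depends on whether prefixes with underscores can occur in the input; the original question asked for alphanumeric characters only.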

 

To build on yours you could quite easily turn that into one single replace:

 

$xml = preg_replace('/<(\/)?([a-z0-9]+):([a-z0-9]+)/i', '<$1$2_$3', $xml);
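A quick sketch of the single-replace version on a made-up input (the sample string is an assumption, not from the thread). The optional group `(\/)?` captures the slash of a closing tag; when it does not take part in the match, `$1` is substituted as an empty string, so the same replacement works for both opening and closing tags.

```php
<?php
// Hypothetical input with an opening tag, a closing tag and a colon in the text.
$xml = '<ns:item>time 10:30</ns:item>';

// $1 = optional "/", $2 = prefix, $3 = local name.
$result = preg_replace('/<(\/)?([a-z0-9]+):([a-z0-9]+)/i', '<$1$2_$3', $xml);

echo $result, PHP_EOL;  // <ns_item>time 10:30</ns_item>
```

The colon in `10:30` is left alone because it is not preceded by `<` plus alphanumerics.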


 

This could also be simplified to:

$xml = preg_replace('#</?[a-z0-9]+\K:#i', '_', $xml);

 

This assumes you don't need to match the whole tag, but only the front end portion of it, as in </randomText: which would become </randomText_
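As a rough illustration of what `\K` does here (the sample input is made up): everything matched before `\K` is still required for the match to succeed, but it is dropped from the reported match, so only the colon itself gets replaced.

```php
<?php
// Hypothetical input, not from the original thread.
$xml = '<ns:item>10:30</ns:item>';
$pattern = '#</?[a-z0-9]+\K:#i';

// The reported match is only what comes after \K, i.e. the colon.
preg_match($pattern, $xml, $m);

// Replacing the match therefore swaps just the colon for an underscore.
$result = preg_replace($pattern, '_', $xml);

echo $m[0], PHP_EOL;    // :
echo $result, PHP_EOL;  // <ns_item>10:30</ns_item>
```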

This way, we get to axe the need for captures altogether. If however we need to match both sides of the colon, the pattern could become:

 

$xml = preg_replace('#</?[a-z0-9]+\K:(?=[a-z0-9]+)#i', '_', $xml);
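Here is a small sketch of the difference the lookahead makes, using a made-up input where one colon is followed by a name and another is not. The lookahead `(?=[a-z0-9]+)` checks what follows the colon without consuming it, so the text after the colon stays in place.

```php
<?php
// Hypothetical input: the second "tag" has nothing alphanumeric after the colon.
$xml = '<ns:item/> <ns: >';

// Without the lookahead, every colon after "<" + alphanumerics is replaced.
$loose = preg_replace('#</?[a-z0-9]+\K:#i', '_', $xml);

// With the lookahead, the colon must also be followed by an alphanumeric.
$strict = preg_replace('#</?[a-z0-9]+\K:(?=[a-z0-9]+)#i', '_', $xml);

echo $loose, PHP_EOL;   // <ns_item/> <ns_ >
echo $strict, PHP_EOL;  // <ns_item/> <ns: >
```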

Yeah, I only learned of that one myself not too long ago. It comes in nice and handy more often than I realise (it's my new regex best friend... that is, till I find something cooler and more useful down the line ;) ). I wrote a blog post about it here in case it can help you out further.

