Find XML namespace and replace


ILMV

Hello all,

I need to find this:

<somerandomtext:somemoretext

and replace it with this:

<somerandomtext_somemoretext

The : changes to a _. I don't want to search the entire document, only the places where a < is followed by any number of alphanumeric characters and then a :, nothing else.

I am a super n00b with regex; if you need any more info, I can provide it.

Many thanks,

Ben


I have found this, which removes the : altogether. I guess that solves the problem, but if anyone can help me modify it so that it replaces the : with an underscore instead, that would be fantastic.

$namespaceFree = preg_replace('/([<<\/])([a-z0-9]+):/i','$1$2',$xml);


Using what you have, just add the underscore into the replacement param:

$namespaceFree = preg_replace('/([<<\/])([a-z0-9]+):/i','$1_$2',$xml);

Edit: No, my mistake, I didn't think about what the regexp was doing. Give me 2 minutes!

Edit 2: This should do the trick:

$namespaceFree = preg_replace('/<([a-z0-9]+):([a-z0-9]+)/i', '<$1_$2', $xml);


Thanks MrAdam!

I have tweaked it a bit to catch </ as well:

$xml = preg_replace('/<([\w]+):([\w]+)/', '<$1_$2', $xml);
$xml = preg_replace('/<\/([\w]+):([\w]+)/', '</$1_$2', $xml);

Thanks again :)

P.S. Learnt a bit about regex too ;)


Heh no problem. I actually changed it from the \w character class after I realized that would match underscores as well (remembering that you'd said alphanumerical).
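
To show the difference, here is a small sketch. The sample string is made up for illustration and is not from the original XML. It shows that \w also accepts an underscore in the prefix, while [a-z0-9] with the i flag does not:

```php
<?php
// Hypothetical sample: the prefix itself contains an underscore.
$xml = '<my_ns:item>';

// \w matches letters, digits and the underscore, so "my_ns" is captured as the prefix.
$withWord = preg_replace('/<(\w+):(\w+)/', '<$1_$2', $xml);

// [a-z0-9] stops at the underscore, so "<my" is never followed by ":" and nothing is replaced.
$withAlnum = preg_replace('/<([a-z0-9]+):([a-z0-9]+)/i', '<$1_$2', $xml);

echo $withWord, PHP_EOL;   // <my_ns_item>
echo $withAlnum, PHP_EOL;  // <my_ns:item> (unchanged)
```

Which one you want depends on whether underscores can legitimately appear in your prefixes.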

 

To build on yours, you could quite easily turn that into a single replace:

$xml = preg_replace('/<(\/)?([a-z0-9]+):([a-z0-9]+)/i', '<$1$2_$3', $xml);
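
As a quick check of the single-replace version, here is a minimal sketch run against a made-up input (the element name is only an example). When the optional slash group does not take part in the match, $1 is substituted as an empty string, so opening and closing tags are both handled:

```php
<?php
// Hypothetical input with one namespaced element.
$xml = '<ns:item>text</ns:item>';

// Group 1 is the optional slash, group 2 the prefix, group 3 the local name.
$result = preg_replace('/<(\/)?([a-z0-9]+):([a-z0-9]+)/i', '<$1$2_$3', $xml);

echo $result, PHP_EOL;  // <ns_item>text</ns_item>
```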



This could also be simplified to:

$xml = preg_replace('#</?[a-z0-9]+\K:#i', '_', $xml);

This assumes you don't need to match the whole tag, only the front portion of it, as in </randomText: which would become </randomText_

The \K resets the start of the reported match, so only the colon is replaced and we get to axe the need for captures altogether. If, however, we need to match both sides of the colon, the pattern could become:

$xml = preg_replace('#</?[a-z0-9]+\K:(?=[a-z0-9]+)#i', '_', $xml);
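
Here is a small sketch of the \K plus lookahead pattern in action. The input is invented for this example and deliberately includes colons in an attribute value and in the text content, which should be left alone because they do not directly follow a < (or </) and an alphanumeric run:

```php
<?php
// Hypothetical input: namespaced tags, plus colons in an attribute value and in the text.
$xml = '<ns:item time="10:30">a:b</ns:item>';

// \K drops everything matched so far from the result, so only the colon is replaced.
// The lookahead requires alphanumerics after the colon without consuming them.
$result = preg_replace('#</?[a-z0-9]+\K:(?=[a-z0-9]+)#i', '_', $xml);

echo $result, PHP_EOL;  // <ns_item time="10:30">a:b</ns_item>
```

Note that this only rewrites the first colon after the tag opener; a namespaced attribute such as xmlns:ns="..." inside the tag would not be touched by this pattern.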


Yeah, I only learned of \K myself not too long ago. It comes in handy more often than I realise (it's my new regex best friend... that is, till I find something cooler and more useful down the line ;) ). I wrote a blog post about it here in case it can help you out further.
