Need help getting a value inbetween pattern

andyhajime · May 20, 2010

Hi everyone,

I need some pointers on how I can do this...

This is the content:

Taken on 24th Feb 2010, where [[Azrael]], [[HarryEx]] and [[Krisgage]] ventures in search for a certain 1936 British Outpost which was reported by [[Redstone]].

With a special assistance from a lovely lady from RMBR, the explorers of UESG locates a different sets of 1936 British Outpost that took part the "Battle of Pasir Panjang" history and presented these photos for publicity views.

The photos are contributed, credited and copyrights to [[HarryEx]] and [[Krisgage]], with a special thanks to [[Redstone]] and RMBR.

Notice the "[[" and "]]" around the names... I want to write a function that replaces whatever is in between the [[ ]] with a URL link to that person's profile. But I am having difficulty getting these values out so I can do the SQL query, e.g. Azrael, HarryEx, etc.

At first I looked at preg_replace and ereg_replace, but I can't find a good tutorial to guide me.

Please help this poor soul and point me to the PHP function I should use for this task.

premiso · May 20, 2010

There is probably a better way to write the regex but here is one that works:

preg_match_all('~\[\[([\w\d]+?)]]~', $content, $matches);

Then looping through $matches[1] with foreach should pull out just the names.

EDIT:

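Removed the .* so the pattern matches only word characters (letters, digits and underscore; \w already covers everything \d does). If you need to be "all inclusive", change the [\w\d]+? to .+? and keep it lazy so each match stops at the first ]].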
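To make the extraction step concrete, here is a minimal sketch using the pattern above. The sample text is shortened from the original question, and the de-duplication step is my own assumption (it just avoids looking up the same name twice), not something the question asked for.

```php
<?php
// Sample text from the original question (shortened), with two names repeated.
$content = 'Taken on 24th Feb 2010, where [[Azrael]], [[HarryEx]] and [[Krisgage]] '
         . 'ventures in search for a certain 1936 British Outpost which was reported '
         . 'by [[Redstone]]. The photos are credited to [[HarryEx]] and [[Krisgage]].';

// $matches[0] holds the full matches ("[[Azrael]]"),
// $matches[1] holds just the captured names ("Azrael").
preg_match_all('~\[\[([\w\d]+?)]]~', $content, $matches);

// Drop duplicates and reindex so each name is only handled once.
$names = array_values(array_unique($matches[1]));

foreach ($names as $name) {
    echo $name, "\n"; // this is where a profile lookup would go
}
```

The loop prints each distinct name once, in the order it first appears in the text.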

Daniel0 · May 20, 2010

Might want to use replace instead of match though.

premiso · May 20, 2010

Daniel0 wrote: "Might want to use replace instead of match though."

Maybe, it depends on what he needs. I was going off the part of the question about getting the data out:

andyhajime wrote: "But I am having difficulty getting these values out so I can do the SQL query"

But yeah, that is probably only half of the solution...
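If both halves are needed at once (pull the name out, look it up, then put a link back in), preg_replace_callback() is one way to do it in a single pass. This is only a sketch: the $profiles array and the linkNames() function are hypothetical stand-ins for whatever SQL lookup and URL scheme the site really uses.

```php
<?php
// Hypothetical lookup table standing in for the SQL query from the question.
$profiles = [
    'Azrael'  => '/profile.php?id=1',
    'HarryEx' => '/profile.php?id=2',
];

// Hypothetical helper: replaces each [[Name]] with a link if the name is known.
function linkNames(string $content, array $profiles): string
{
    return preg_replace_callback(
        '~\[\[(\w+)\]\]~',
        function (array $m) use ($profiles): string {
            $name = $m[1]; // text captured between [[ and ]]
            if (!isset($profiles[$name])) {
                return $name; // unknown member: keep the bare name, drop the brackets
            }
            // Escape both values before writing them into HTML.
            return '<a href="' . htmlspecialchars($profiles[$name]) . '">'
                 . htmlspecialchars($name) . '</a>';
        },
        $content
    );
}

echo linkNames('Photos by [[HarryEx]] and [[Krisgage]].', $profiles), "\n";
```

The callback receives the same array preg_match() would fill in, so $m[1] is the name, and whatever the callback returns replaces the whole [[...]] match.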

foxsoup · May 20, 2010

EDIT: As I was writing this monolithic post there were three replies with much more direct answers to the problem. Ah well...

---

preg_replace() is definitely the way forward with this problem (the ereg functions are deprecated as of PHP 5.3 and were later removed altogether, so best to start on the right foot and use preg). Unfortunately regex is a pretty complicated and powerful beast, so I'll stick to addressing the problem in question and try to explain it as best I can.

So I'm assuming that you'd like to change a paragraph of text from this:

Taken on 24th Feb 2010, where [[Azrael]], [[HarryEx]] and [[Krisgage]] ventures in search for a certain 1936 British Outpost which was reported by [[Redstone]].

to something like this:

Taken on 24th Feb 2010, where <a href="Azrael">Azrael</a>, <a href="HarryEx">HarryEx</a> and <a href="Krisgage">Krisgage</a> ventures in search for a certain 1936 British Outpost which was reported by <a href="Redstone">Redstone</a>.

Simply put, you can do this with the following code:

<?php

$data = 'Taken on 24th Feb 2010, where [[Azrael]], [[HarryEx]] and [[Krisgage]] ventures in search for a certain 1936 British Outpost which was reported by [[Redstone]].';

$output = preg_replace('/\[\[(.+)\]\]/iU', '<a href="\1">\1</a>', $data);

echo $output;

?>

Briefly, you can see that the preg_replace call takes three arguments separated by commas. The third is pretty obvious - it's the variable containing the data we want to work on. But what do the first two do? And how the hell do you make sense of them?

Well, the first part:

'/\[\[(.+)\]\]/iU'

is the search pattern. Basically you're looking for some text wedged in between sets of [[ and ]]. Regular expressions use symbols called metacharacters, which we use when we don't know exactly what we're looking for but we do know the pattern. For example, we know that a date (in the UK anyway) has the format DD/MM/YYYY. We might not know exactly what the date is, but we do know to expect two numbers, followed by a slash, followed by two more numbers, followed by another slash, followed by four numbers. Since I don't know exactly what to expect between your [[ and ]], I can use the combination of metacharacters '.' and '+'. The period basically means 'this can be any single character' (except a newline, by default), while the plus sign following it means 'one or more of what came in front of me'. So '.+' means 'look for one or more of any character'.
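The date example can be written out as a pattern too. This is just an illustrative sketch: the \d and {n} metacharacters and the $datePattern variable are my additions, and it only checks the shape of the text, not whether the date actually exists.

```php
<?php
// \d matches a single digit and {n} repeats the previous item exactly n times.
// '~' is used as the delimiter so the slashes in the date need no escaping.
$datePattern = '~^\d{2}/\d{2}/\d{4}$~';

var_dump(preg_match($datePattern, '24/02/2010'));  // int(1): fits the pattern
var_dump(preg_match($datePattern, '24 Feb 2010')); // int(0): does not fit

// '.+' needs at least one character, so it matches 'a' but not an empty string.
var_dump(preg_match('~^.+$~', 'a')); // int(1)
var_dump(preg_match('~^.+$~', ''));  // int(0)
```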

So you'd think that the complete regex should be '[[.+]]'. Well unfortunately life isn't quite that simple. You see, the square bracket characters also have a special meaning in regex (they enclose what's called a character class), so in order to use them literally we need to escape them by preceding each one with a backslash. So now the regex looks like '\[\[.+\]\]'.

Next we want to copy the actual text between the square brackets so we can write it into the <a> tag. We do this using subpatterns and backreferences. We can set a subpattern around the stuff we want to keep by putting regular brackets (parentheses) around it, thus: '\[\[(.+)\]\]'. This means that the blurb in between the square brackets will be available to be inserted into the hyperlink.

Finally the entire regex (in preg functions anyway) has to be enclosed in delimiter characters. These can be any non-alphanumeric, non-backslash, non-whitespace character (as long as both are the same) but typically the / character is used. After the ending delimiter we can add modifiers to extend the functionality of the search. In this case I've put in a lowercase 'i' (to make the search case-insensitive, which makes no real difference here since the pattern contains no letters) and an uppercase 'U' (to make the search 'ungreedy' - i.e. each match is as short as possible, stopping at the first ]] rather than the last).
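To see why the 'U' modifier matters for this problem, here's a small comparison. The shortened $data string and the '<\1>' replacement are only there to keep the output easy to read; they aren't part of the solution above.

```php
<?php
$data = 'where [[Azrael]] and [[HarryEx]] ventures';

// Greedy (the default): '.+' runs on to the LAST ']]',
// so both names end up inside a single match.
$greedy = preg_replace('/\[\[(.+)\]\]/', '<\1>', $data);

// Ungreedy via the U modifier: each match stops at the first ']]'.
$ungreedy = preg_replace('/\[\[(.+)\]\]/U', '<\1>', $data);

// Same effect without U, by making just this one quantifier lazy with '?'.
$lazy = preg_replace('/\[\[(.+?)\]\]/', '<\1>', $data);

echo $greedy, "\n";   // where <Azrael]] and [[HarryEx> ventures
echo $ungreedy, "\n"; // where <Azrael> and <HarryEx> ventures
echo $lazy, "\n";     // where <Azrael> and <HarryEx> ventures
```

The '.+?' form is handy when only one part of a bigger pattern should be ungreedy, since 'U' flips the behaviour of every quantifier in the pattern.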

For a much better and in-depth description of all that stuff, check this manual section of php.net - http://uk.php.net/manual/en/reference.pcre.pattern.syntax.php

The second part of the preg_replace function is the text that we're replacing the original with, and is comparatively simple:

'<a href="\1">\1</a>'

As you can see it's basically just an <a> tag with the phrase '\1' in it twice. This is the backreference for the subpattern we defined earlier - the bit that we put in regular brackets will be copied into here. If we had defined more than one subpattern then we'd use \1 to reference the first one, \2 for the second, and so on.
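As a quick illustration of more than one subpattern, here's a sketch that reorders the pieces of a UK-style date. The date text and the $iso variable are made up for the example and have nothing to do with the [[name]] problem itself.

```php
<?php
$text = 'Taken on 24/02/2010.';

// Three subpatterns: \1 = day, \2 = month, \3 = year.
// The replacement writes them back in a different order.
$iso = preg_replace('~(\d{2})/(\d{2})/(\d{4})~', '\3-\2-\1', $text);

echo $iso, "\n"; // Taken on 2010-02-24.
```

PHP also accepts $1, $2 and so on in the replacement string, which means exactly the same thing as \1, \2.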

I hope at least some of all that made sense, it was certainly a lot more than I originally intended to write! For more details about regex and the many, many, many functions of it, check this site out - http://www.regular-expressions.info

I go drink beer now.

andyhajime · May 29, 2010

Thanks everyone for helping, especially to Foxsoup for writing it out. You're the man!
