
Strip everything before the first occurrence of a period



Hello,

 

I've done quite a bit of searching trying to figure out how to accomplish this.

 

We receive strings like the following:

1. Some text with commas and periods.

20. S.A.T.S - School Exam

3523. 5 Stars.

 

Basically, I need to strip everything up to and including the first period, leaving just the text after it. I don't know much about regular expressions, so could someone please help?

 

Regards.

Thanks for your speedy reply. Here is an example of the data we get from a batch of MP3 song titles (we run a DJ system):

 

8. 3 Doors Down - Kryptonite
207. Aerosmith - I Don't Want To Miss A Thing
1096. Coldplay - The Scientist
1097. Coldplay - Trouble
1832. FatBoy Slim - Praise You
1833. FatBoy Slim - Right Here, Right Now
2068. Green Day - Time your life

And this is what we need to end up with:

3 Doors Down - Kryptonite
Aerosmith - I Don't Want To Miss A Thing
Coldplay - The Scientist
Coldplay - Trouble
FatBoy Slim - Praise You
FatBoy Slim - Right Here, Right Now
Green Day - Time your life

 

The index numbers can go up into the high thousands, so I can't rely on a fixed range of digits.

 

Regards.

Try that:

 

<?php

$data = <<<DATA
8. 3 Doors Down - Kryptonite
207. Aerosmith - I Don't Want To Miss A Thing
1096. Coldplay - The Scientist
1097. Coldplay - Trouble
1832. FatBoy Slim - Praise You
1833. FatBoy Slim - Right Here, Right Now
2068. Green Day - Time your life
DATA;

$result = preg_replace('#^[^\s]+ (.*?)$#m', '$1', $data);

echo $result;

?>

 

 

Orio.

I've added the 'm' modifier so each line is treated separately (^ matches at the start of each line and $ at the end of each line).

[^\s]+ matches one or more non-whitespace characters, so it consumes the number and the dot. Then comes a literal space to match the space after the dot. Finally, (.*?) captures everything up to the end of the line; because it's in parentheses, it's "saved" as $1. The whole match is replaced by $1, so you get only the song names.
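To see what the capture actually holds, here is a small sketch that runs Orio's pattern line by line with preg_match() and then on the whole block with preg_replace(). The two sample lines come from the data above; single-quoted strings are used as a precaution so PHP's string handling leaves the pattern and '$1' untouched.

```php
<?php
// Two sample lines from the thread.
$lines = [
    "8. 3 Doors Down - Kryptonite",
    "1833. FatBoy Slim - Right Here, Right Now",
];

// Group 1 on each line: [^\s]+ eats "8." / "1833.", the literal space is
// skipped, and (.*?)$ keeps the rest of the line.
$captured = [];
foreach ($lines as $line) {
    if (preg_match('#^[^\s]+ (.*?)$#', $line, $m)) {
        $captured[] = $m[1];
    }
}
echo implode("\n", $captured), "\n";

// The same pattern with the m modifier, applied to the whole block at once.
$result = preg_replace('#^[^\s]+ (.*?)$#m', '$1', implode("\n", $lines));
echo $result, "\n";
```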

 

Orio.

You don't need a regex for a simple task like this. PHP comes with plenty of string functions you can use.

 

<?php
$data = <<<DATA
8. 3 Doors Down - Kryptonite
207. Aerosmith - I Don't Want To Miss A Thing
1096. Coldplay - The Scientist
1097. Coldplay - Trouble
1832. FatBoy Slim - Praise You
1833. FatBoy Slim - Right Here, Right Now
2068. Green Day - Time your life
DATA;
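// split() was removed in PHP 7; explode() does the same job for a plain "\n".
foreach (explode("\n", $data) as $k => $v) {
    // Limit of 2: split only at the first period, so titles that contain
    // periods (e.g. "S.A.T.S") stay intact.
    $s = explode(".", $v, 2);
    // $s[0] is the index number, $s[1] the title with a leading space.
    echo trim($s[1] ?? $v) . "\n";
}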
?>
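For comparison, here is a regex-free sketch along the same lines that uses strpos() to find the first period and substr() to keep what follows. The function name stripBeforeFirstPeriod() is made up for this example, and lines without a period are assumed to be left alone.

```php
<?php
// Hypothetical helper: keep everything after the first period on a line.
function stripBeforeFirstPeriod(string $line): string
{
    $pos = strpos($line, '.');
    if ($pos === false) {
        return $line; // no period: leave the line untouched
    }
    // Take the text just past the period, then drop the leading space(s).
    return ltrim(substr($line, $pos + 1));
}

$data = implode("\n", [
    "8. 3 Doors Down - Kryptonite",
    "20. S.A.T.S - School Exam",
    "3523. 5 Stars.",
]);

$out = array_map('stripBeforeFirstPeriod', explode("\n", $data));
echo implode("\n", $out), "\n";
```

Because only the first period is located, titles such as "S.A.T.S - School Exam" keep their inner periods.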

Just to add to ghostdog's response: you could also use array_shift() to remove the first element (the index number) and glue the rest back together.

 

<?php
$data = <<<DATA
8. 3 Doors Down - Kryptonite
207. Aerosmith - I Don't Want To Miss A Thing
1096. Coldplay - The Scientist
1097. Coldplay - Trouble
1832. FatBoy Slim - Praise You
1833. FatBoy Slim - Right Here, Right Now
2068. Green Day - Time your life
DATA;
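foreach (explode("\n", $data) as $k => $v) {
    $broken = explode(".", $v);
    array_shift($broken); // drop the index number
    // Glue the rest back with "." so periods inside a title survive,
    // then trim the leading space and print one title per line.
    $songinfo = trim(implode('.', $broken));
    echo $songinfo . "\n";
}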
?>

There's no point in capturing more than you need:

 

echo $result = preg_replace('/^.*\.\s+/m', '', $data);

 

Hmm... I wonder which method is more efficient, Effigy: yours or Orio's.

 

Sure, Orio's solution involves a capture (I'm not sure how 'heavy' that actually is), but when I examined your solution, Effigy, I found it interesting that you used .* in conjunction with the m modifier. If I understand this correctly, from the start of each line (since you're using the m modifier) the .* matches everything to the end of the line, and then the regex engine backtracks character by character until it reaches a dot followed by whitespace, and replaces that...

 

I wonder aloud which is more work: all that backtracking, or Orio's straightforward capturing. Looking at Orio's method, it captures everything after the first space. Side note: I do wonder about the lazy quantifier in this case; it may not be necessary?

 

At the outset, I have to admit I like Orio's solution best of all in this thread (just my opinion, of course).

 

I guess what I'm getting at is that even though you use the m modifier, I'm wary of .* usage, as it matches as much as it can before backtracking (which may or may not be costly, depending on how much backtracking is needed).

 

Perhaps I'm misunderstanding something?

 

Cheers,

 

NRG

^.*\.\s+

 

 

^ is an anchor, meaning from the start of the line

. means any character (except a newline)

* means zero or more times

\. means literal character .

\s means space character (" " for example)

+ means 1 or more times

 

So all combined:

 

From the start of the line, anything until a period and then a space after it.
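Written out with the x (extended) modifier, the same pattern can carry a comment per token. This sketch is just an equivalent spelling of Effigy's regex for readability; the sample string is two lines from the thread.

```php
<?php
// Under the x modifier, whitespace in the pattern is ignored and
// "#" starts a comment that runs to the end of the line.
$pattern = '/
    ^       # start of a line (because of the m modifier)
    .*      # any characters except newline, as many as possible
    \.      # a literal period
    \s+     # one or more whitespace characters
/mx';

$data = "8. 3 Doors Down - Kryptonite\n207. Aerosmith - I Don't Want To Miss A Thing";
$result = preg_replace($pattern, '', $data);
echo $result, "\n";
```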

 

 

The .* can't run past the end of the current line (. doesn't match a newline), so any backtracking is confined to that line. There could, however, be issues with a string such as:

 

1.  Some. Thing here

 

 

That would give back "Thing here".
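The difference between the greedy .* and a lazy .*? on that string can be checked directly. This is a minimal sketch using the example line above, with the lazy variant added for comparison.

```php
<?php
// Greedy .* backtracks from the end of the line, so it stops at the LAST
// period followed by whitespace; lazy .*? grows from the left, so it stops
// at the FIRST such period.
$line = "1.  Some. Thing here";

$greedy = preg_replace('/^.*\.\s+/m', '', $line);
$lazy   = preg_replace('/^.*?\.\s+/m', '', $line);

echo $greedy, "\n"; // Thing here
echo $lazy, "\n";   // Some. Thing here
```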

\s means space character (" " for example)

 

For those not aware, a more complete explanation is that \s is shorthand for a character class covering several kinds of whitespace (spaces, tabs, carriage returns, newlines and so on).
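Because \s includes newlines, \s+ can run past the end of a line that itself ends in a period (like "3523. 5 Stars." from the first post) and swallow the line break, so the greedy pattern deletes that whole line. A minimal sketch; \h (PCRE's horizontal-whitespace class, spaces and tabs only) is swapped in here as one possible fix, not something proposed in the thread.

```php
<?php
$data = "3523. 5 Stars.\n4. Next";

// \s+ matches the "\n" after "Stars.", so line one vanishes entirely.
$withS = preg_replace('/^.*\.\s+/m', '', $data);
// \h+ cannot cross the line break, so the match falls back to "3523. ".
$withH = preg_replace('/^.*\.\h+/m', '', $data);

echo json_encode($withS), "\n"; // "Next"
echo json_encode($withH), "\n"; // "5 Stars.\nNext"
```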

"\s means space character (" " for example)"

 

 

Was meant to read "\s means a space character (" " for example)"

 

In case you were correcting me: I know what it means.

 

 

If that wasn't aimed at me, errr... ignore this comment.

This might be even faster:

<?php
$data = <<<DATA
8. 3 Doors Down - Kryptonite
207. Aerosmith - I Don't Want To Miss A Thing
1096. Coldplay - The Scientist
1097. Coldplay - Trouble
1832. FatBoy Slim - Praise You
1833. FatBoy Slim - Right Here, Right Now
2068. Green Day - Time your life
DATA;
echo preg_replace('/^(?>\d+)\.\s+/m', '', $data);
?>

 

The non-backtracking (atomic) subpattern for the digits is probably much faster.
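To illustrate the atomic group, this sketch runs DarkWater's pattern next to the possessive form \d++ (PCRE shorthand for the same atomic behavior) and the plain \d+. It only checks that the three give identical results on a few sample lines; it doesn't measure speed, so "faster" remains something to benchmark.

```php
<?php
$data = implode("\n", [
    "8. 3 Doors Down - Kryptonite",
    "1096. Coldplay - The Scientist",
    "2068. Green Day - Time your life",
]);

// (?>\d+): once the digits are matched, none are ever given back.
$atomic     = preg_replace('/^(?>\d+)\.\s+/m', '', $data);
// \d++: possessive quantifier, equivalent to the atomic group above.
$possessive = preg_replace('/^\d++\.\s+/m', '', $data);
// Plain \d+ for comparison.
$plain      = preg_replace('/^\d+\.\s+/m', '', $data);

echo $atomic, "\n";
```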

DarkWater, I tested your snippet.. nothing displayed.

 

Here is my attempt:

 

<?php
$data = <<<DATA
8. 3 Doors Down - Kryptonite
207. Aerosmith - I Don't Want To Miss A Thing
1096. Coldplay - The Scientist
1097. Coldplay - Trouble
1832. FatBoy Slim - Praise You
1833. FatBoy Slim - Right Here, Right Now
2068. Green Day - Time your life
DATA;

echo preg_replace("#^\d+\. #m", '', $data);

 

All I did here was match, from the start of each line (in multiline mode), one or more consecutive digits, a dot and then a space, and replace that with nothing.

No backtracking or capturing is involved.
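As a quick check, this sketch runs the same digits-dot-space pattern over the full sample from the thread, both as one multi-line string and as an array of titles. The array form is an assumption about how the data might arrive; preg_replace() accepts an array of subjects and returns an array, and then ^ anchors to the start of each element, so the m modifier isn't needed.

```php
<?php
$input = [
    "8. 3 Doors Down - Kryptonite",
    "207. Aerosmith - I Don't Want To Miss A Thing",
    "1096. Coldplay - The Scientist",
    "1097. Coldplay - Trouble",
    "1832. FatBoy Slim - Praise You",
    "1833. FatBoy Slim - Right Here, Right Now",
    "2068. Green Day - Time your life",
];

// One subject per title: no m modifier required.
$result = preg_replace('/^\d+\. /', '', $input);

// The whole batch as a single string: m makes ^ match at every line start.
$joined = preg_replace('#^\d+\. #m', '', implode("\n", $input));

echo implode("\n", $result), "\n";
```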

 

I suppose one could also use:

echo preg_replace("#^[^.]+\. #m", '', $data);

 

This would also catch the case where the characters before the dot aren't all digits.
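One caveat with [^.]+: a negated class also matches newlines, so a line that has no period can be swallowed together with the next line's index. The sketch below shows this with two hypothetical lines ("No index here" and "A1. Odd index") and uses [^.\n]+, a tweak that is not from the thread, to keep each match on one line.

```php
<?php
$data = implode("\n", [
    "20. S.A.T.S - School Exam",
    "No index here",   // hypothetical line without a period
    "A1. Odd index",   // hypothetical index that is not all digits
]);

// [^.]+ crosses the line break after "No index here" and reaches "A1. ".
$loose = preg_replace('#^[^.]+\. #m', '', $data);
// [^.\n]+ stops at the end of each line, so "No index here" survives.
$tight = preg_replace('#^[^.\n]+\. #m', '', $data);

echo json_encode($loose), "\n"; // "S.A.T.S - School Exam\nOdd index"
echo json_encode($tight), "\n"; // "S.A.T.S - School Exam\nNo index here\nOdd index"
```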

 

 

 

That's odd, it seemed to have stripped a ' or something.

<?php
$data = <<<DATA
8. 3 Doors Down - Kryptonite
207. Aerosmith - I Don't Want To Miss A Thing
1096. Coldplay - The Scientist
1097. Coldplay - Trouble
1832. FatBoy Slim - Praise You
1833. FatBoy Slim - Right Here, Right Now
2068. Green Day - Time your life
DATA;
echo preg_replace('/^(?>\d+)\.\s+/m', '', $data);
?>

 

Try that.

 

EDIT: Wth. It still stripped a '. If the pattern shows up without its opening quote, add a ' right after the opening paren of preg_replace().
