AshleyS
Posted October 17, 2008

Hello,

I've done quite a bit of searching to try to figure out how to accomplish this. We receive strings like the following:

1. Some text with commas and periods.
20. S.A.T.S - School Exam
3523. 5 Stars.

Basically, I need to be able to strip everything up to and including the first period, leaving just the text after it. I do not have much knowledge of regular expressions, so could someone please assist?

Regards.

Link to thread: https://forums.phpfreaks.com/topic/128828-strip-everything-before-first-occurence-of-period/

Orio
Posted October 17, 2008

Can you show an example of the raw data you get (in code tags!) and the output you're expecting?

Orio.

AshleyS
Posted October 17, 2008

Thanks for your speedy reply. Here is an example of the data we get from a batch of mp3 song titles. (We run a DJ system.)

Raw data:

8. 3 Doors Down - Kryptonite
207. Aerosmith - I Don't Want To Miss A Thing
1096. Coldplay - The Scientist
1097. Coldplay - Trouble
1832. FatBoy Slim - Praise You
1833. FatBoy Slim - Right Here, Right Now
2068. Green Day - Time your life

Expected output:

3 Doors Down - Kryptonite
Aerosmith - I Don't Want To Miss A Thing
Coldplay - The Scientist
Coldplay - Trouble
FatBoy Slim - Praise You
FatBoy Slim - Right Here, Right Now
Green Day - Time your life

The index numbers can go up into the high thousands, so I cannot specify an upper limit for them.

Regards.

Orio
Posted October 17, 2008

Try this:

<?php
$data = <<<DATA
8. 3 Doors Down - Kryptonite
207. Aerosmith - I Don't Want To Miss A Thing
1096. Coldplay - The Scientist
1097. Coldplay - Trouble
1832. FatBoy Slim - Praise You
1833. FatBoy Slim - Right Here, Right Now
2068. Green Day - Time your life
DATA;

// On each line (m modifier), match the leading non-whitespace run and the
// space after it, capture the rest of the line, and keep only the capture.
$result = preg_replace('#^[^\s]+ (.*?)$#m', '$1', $data);
echo $result;
?>

Orio.

AshleyS
Posted October 17, 2008

Thanks for the code, Orio. It works just as I needed. If you have the time, would you be able to explain each segment of what is used in the preg_replace?

Regards.

Orio
Posted October 17, 2008

I've added the 'm' modifier, so each line is treated separately (so ^ matches at the start of a line and $ at the end of one). [^\s]+ matches everything until whitespace is met, so this way it skips the number and the dot. Then comes a literal space to match the space that comes after the dot. Then (.*?) captures everything until the end of the line (and because it's in parentheses, it "saves" it as $1). The whole match is replaced by $1, so you get only the song names.

Orio.

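To make the explanation above concrete, here is a minimal sketch that runs Orio's pattern through preg_match() on a single line, so the whole match and capture group 1 can be inspected separately. The variable names are chosen for this example only.

```php
<?php
// Sketch: apply Orio's pattern to one sample line from the thread and
// look at what ends up in the whole match and in capture group 1.
$line = "207. Aerosmith - I Don't Want To Miss A Thing";

// ^        start of a line (because of the m modifier)
// [^\s]+   one or more non-whitespace characters: "207."
// (space)  the literal space after the dot
// (.*?)$   lazily capture everything up to the end of the line
$pattern = '#^[^\s]+ (.*?)$#m';

preg_match($pattern, $line, $matches);

echo $matches[0], "\n"; // the whole match, here the full line
echo $matches[1], "\n"; // group 1, the song title only

// preg_replace() swaps the whole match for the contents of group 1.
$title = preg_replace($pattern, '$1', $line);
echo $title, "\n";
```

Because the whole line is matched and only group 1 is written back, the index and the space after it disappear.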
AshleyS
Posted October 17, 2008

Thank you Orio, you've explained it very well and I can understand how it operates.

Many thanks.

effigy
Posted October 17, 2008

There's no point in capturing more than you need:

echo $result = preg_replace('/^.*\.\s+/m', '', $data);

ghostdog74
Posted October 17, 2008

You don't need a regex to do a simple task like this. PHP comes with many string functions you can use.

<?php
$data = <<<DATA
8. 3 Doors Down - Kryptonite
207. Aerosmith - I Don't Want To Miss A Thing
1096. Coldplay - The Scientist
1097. Coldplay - Trouble
1832. FatBoy Slim - Praise You
1833. FatBoy Slim - Right Here, Right Now
2068. Green Day - Time your life
DATA;

// explode() replaces the original split(), which was removed in PHP 7.
foreach (explode("\n", $data) as $v) {
    $s = explode(".", $v);
    // Note: $s[0] is the part before the first period, i.e. the index number.
    echo $s[0] . "\n";
}
?>

DarkWater
Posted October 17, 2008

Just to add to ghostdog's response, you'd want to use array_shift() to get the first element off.

<?php
$data = <<<DATA
8. 3 Doors Down - Kryptonite
207. Aerosmith - I Don't Want To Miss A Thing
1096. Coldplay - The Scientist
1097. Coldplay - Trouble
1832. FatBoy Slim - Praise You
1833. FatBoy Slim - Right Here, Right Now
2068. Green Day - Time your life
DATA;

// explode() replaces the original split(), which was removed in PHP 7.
foreach (explode("\n", $data) as $v) {
    // Limit 2 splits at the first period only, so periods in a title survive.
    $broken = explode(".", $v, 2);
    array_shift($broken); // drop the index number
    $songinfo = implode('', array_map('trim', $broken));
    echo $songinfo . "\n"; // newline added so the titles do not run together
}
?>

nrg_alpha
Posted October 18, 2008

Quote from effigy:
    There's no point in capturing more than you need:
    echo $result = preg_replace('/^.*\.\s+/m', '', $data);

Hmm.. I wonder which method is more efficient, effigy: yours or Orio's. Sure, Orio's solution involves a capture (not sure how 'heavy' this actually is), but when I examine your solution, I find it interesting that you used .* in conjunction with the m modifier. If I understand this correctly, this implies that from the start of each line (as you are using the m modifier), you match everything to the end of the line, then have the regex engine backtrack character by character until it reaches (and thus matches) the dot and space, and replace that...

I wonder aloud which is more work: all that backtracking, or Orio's straightforward capturing. Looking at Orio's method, it starts matching everything after the first space. Side note: I do wonder about the lazy quantifier in this case; it may not be necessary?

At the outset, I have to admit, I like Orio's solution best of all in this thread (this is just my opinion of course). I guess what I'm getting at is that even though you use the m modifier, I am wary of .* usage, as it matches as much as it can prior to backtracking (which may or may not be costly, depending on how much backtracking is involved). Perhaps I'm misunderstanding something?

Cheers,

NRG

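One way to approach the efficiency question empirically is to time both patterns on the same input. The following is only a rough micro-benchmark sketch: the helper timeReplace(), the synthetic input, and the run counts are assumptions made for this example, and the timings will vary with the machine and the PHP/PCRE version, so no numbers are claimed here.

```php
<?php
// Rough micro-benchmark sketch; results depend on machine and PHP/PCRE version.
$line = "1096. Coldplay - The Scientist\n";
$data = str_repeat($line, 1000); // synthetic input, not taken from the thread

// Hypothetical helper: runs one replacement $runs times and returns
// [elapsed seconds, result of the last run].
function timeReplace(string $pattern, string $replacement, string $data, int $runs = 200): array
{
    $result = '';
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        $result = preg_replace($pattern, $replacement, $data);
    }
    return [microtime(true) - $start, $result];
}

[$orioTime, $orioOut] = timeReplace('#^[^\s]+ (.*?)$#m', '$1', $data);
[$effigyTime, $effigyOut] = timeReplace('/^.*\.\s+/m', '', $data);

printf("Orio:   %.4f s\n", $orioTime);
printf("effigy: %.4f s\n", $effigyTime);

// Sanity check: both patterns should produce the same text for this input.
var_dump($orioOut === $effigyOut);
```

Checking that both outputs are identical before comparing timings guards against measuring two patterns that do different work.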
corbin
Posted October 18, 2008

^.*\.\s+

^ is an anchor, meaning from the start of the line
. means any character (except a newline)
* means any number of times
\. means the literal character .
\s means space character (" " for example)
+ means 1 or more times

So all combined: from the start of the line, anything up to a period that is followed by whitespace.

The .* is greedy, so it does run to the end of the line and then backtracks, stopping at the last period that is followed by whitespace. Because of that, there could be issues with a string such as:

1. Some. Thing here

That would give back "Thing here".

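The edge case above can be checked directly. This is a minimal sketch comparing effigy's greedy pattern with a variant that uses a negated character class; the variant and the variable names are illustrative choices for this example.

```php
<?php
// Sketch: greedy .* versus a negated character class on corbin's edge case.
$line = '1. Some. Thing here';

// Greedy: .* grabs the whole line, then gives characters back until
// "\.\s+" can match, so it stops at the LAST period followed by whitespace.
$greedy = preg_replace('/^.*\.\s+/m', '', $line);

// [^.]* cannot cross a period, so it stops at the FIRST one.
$firstOnly = preg_replace('/^[^.]*\.\s+/m', '', $line);

echo $greedy, "\n";    // Thing here
echo $firstOnly, "\n"; // Some. Thing here
```

For the song list in this thread both patterns give the same result, since each line contains only one period followed by a space.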
nrg_alpha
Posted October 18, 2008

Quote from corbin:
    \s means space character (" " for example)

A more complete explanation, for those not aware, is that it is shorthand for a character class that encompasses many forms of whitespace (such as tabs, literal spaces, carriage returns and newlines).

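A short sketch to illustrate the point: it tests a handful of single characters against \s. The selection of characters and the display labels are choices made for this example, not an exhaustive list of what \s covers.

```php
<?php
// Sketch: test a few single characters against \s.
// The array keys are display labels chosen for this example.
$candidates = [
    'space'           => ' ',
    'tab'             => "\t",
    'newline'         => "\n",
    'carriage return' => "\r",
    'form feed'       => "\f",
    'letter a'        => 'a',
];

$results = [];
foreach ($candidates as $label => $char) {
    // preg_match() returns 1 on a match and 0 otherwise.
    $results[$label] = preg_match('/\s/', $char) === 1;
    printf("%-16s %s\n", $label, $results[$label] ? 'matches \s' : 'does not match \s');
}
```

One practical consequence is that \s+ after the dot can also consume a newline, whereas a literal space in the pattern cannot.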
corbin
Posted October 18, 2008

"\s means space character (" " for example)"

was meant to read

"\s means a space character (" " for example)"

in case you were correcting me. I know what it means.

If that wasn't aimed at me, errr... ignore this comment.

nrg_alpha
Posted October 18, 2008

Nope.. my comment was not directed at you. It was just for those who are not aware. I realize you used " " as an example (implying there are more kinds of whitespace).

DarkWater
Posted October 19, 2008

This might be even faster:

<?php
$data = <<<DATA
8. 3 Doors Down - Kryptonite
207. Aerosmith - I Don't Want To Miss A Thing
1096. Coldplay - The Scientist
1097. Coldplay - Trouble
1832. FatBoy Slim - Praise You
1833. FatBoy Slim - Right Here, Right Now
2068. Green Day - Time your life
DATA;

// (?>\d+) is an atomic group: once the digits are matched, they are never given back.
echo preg_replace('/^(?>\d+)\.\s+/m', '', $data);
?>

The non-backtracking subpattern for just digits is probably much faster.

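What "non-backtracking" means can be seen on a tiny input. The sketch below uses a made-up subject string and two patterns constructed purely to expose the difference; it says nothing about speed.

```php
<?php
// Sketch: an atomic group refuses to give back what it has matched.
$subject = '123'; // made-up input for this example

// \d+ first takes "123", then gives back "3" so the trailing \d can match.
$normal = preg_match('/^\d+\d/', $subject);

// (?>\d+) takes "123" and keeps it, so the trailing \d has nothing left.
$atomic = preg_match('/^(?>\d+)\d/', $subject);

var_dump($normal); // int(1)
var_dump($atomic); // int(0)

// On the thread's data the two forms strip the same prefix.
$line = '1096. Coldplay - The Scientist';
$plainOut = preg_replace('/^\d+\.\s+/m', '', $line);
$atomicOut = preg_replace('/^(?>\d+)\.\s+/m', '', $line);
var_dump($plainOut === $atomicOut); // bool(true)
```

The atomic group only changes behaviour when a match would otherwise need digits to be handed back, which does not happen for well-formed lines like those in the thread.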
ghostdog74
Posted October 19, 2008

Quote from DarkWater:
    Just to add to ghostdog's response, you'd want to use array_shift() to get the first element off.

My bad, I misread the requirement.

// explode() replaces the original split(), which was removed in PHP 7.
foreach (explode("\n", $data) as $v) {
    $s = explode(". ", $v);
    echo end($s) . "\n"; // the text after the last ". "
}

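As a variation on the string-function approach, here is a minimal sketch that cuts at the first period only, so titles that contain periods themselves (such as the "S.A.T.S" example from the opening post) stay intact. The helper name stripIndex() is hypothetical and not from the thread.

```php
<?php
// Hypothetical helper (not from the thread): drop everything up to and
// including the first period, without a regular expression.
function stripIndex(string $line): string
{
    // Limit 2 splits at the FIRST period only; later periods stay in the title.
    $parts = explode('.', $line, 2);

    // No period at all: return the line unchanged.
    if (count($parts) < 2) {
        return $line;
    }

    // Remove the whitespace that followed the period.
    return ltrim($parts[1]);
}

$lines = [
    '8. 3 Doors Down - Kryptonite',
    '20. S.A.T.S - School Exam',
    '3523. 5 Stars.',
];

foreach ($lines as $line) {
    echo stripIndex($line), "\n";
}
```

Returning the line unchanged when no period is found is a design choice for this sketch; a caller might prefer to treat that case as an error.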
nrg_alpha
Posted October 19, 2008

DarkWater, I tested your snippet.. nothing displayed.

Here is my attempt:

<?php
$data = <<<DATA
8. 3 Doors Down - Kryptonite
207. Aerosmith - I Don't Want To Miss A Thing
1096. Coldplay - The Scientist
1097. Coldplay - Trouble
1832. FatBoy Slim - Praise You
1833. FatBoy Slim - Right Here, Right Now
2068. Green Day - Time your life
DATA;

// From the start of each line: digits, a dot, then one literal space.
echo preg_replace('#^\d+\. #m', '', $data);

So all I did here was, from the start of each line (in multiline mode), match all consecutive digits, a dot, then a space, and replace that with nothing. No backtracking or capturing involved.

I suppose one could also use:

echo preg_replace('#^[^.]+\. #m', '', $data);

This would ensure that a line is still matched even if the characters before the dot accidentally aren't all digits.

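One possible pitfall with the [^.]+ variant is that a negated class also matches newlines, so a line without a period can be swallowed together with the next line's index. The sketch below uses made-up input with an unindexed first line, which does not occur in the thread's data, and adds a third variant that excludes newlines as one way around it.

```php
<?php
// Sketch: the first line has no index, an assumption made up for this
// example; it does not occur in the thread's data.
$data = "Untitled track\n2. Coldplay - Trouble";

// Digits only: the unindexed line is left alone.
$digitsOnly = preg_replace('#^\d+\. #m', '', $data);

// [^.]+ also matches "\n", so the match runs from line 1 into line 2.
$notPeriod = preg_replace('#^[^.]+\. #m', '', $data);

// Excluding "\n" as well keeps each match on its own line.
$notPeriodOrNewline = preg_replace('#^[^.\n]+\. #m', '', $data);

var_dump($digitsOnly);         // "Untitled track\nColdplay - Trouble"
var_dump($notPeriod);          // "Coldplay - Trouble"
var_dump($notPeriodOrNewline); // "Untitled track\nColdplay - Trouble"
```

If every line is known to carry an index, as in the thread's sample data, all three patterns behave the same.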
DarkWater
Posted October 19, 2008

That's odd, it seems to have stripped a ' or something.

<?php
$data = <<<DATA
8. 3 Doors Down - Kryptonite
207. Aerosmith - I Don't Want To Miss A Thing
1096. Coldplay - The Scientist
1097. Coldplay - Trouble
1832. FatBoy Slim - Praise You
1833. FatBoy Slim - Right Here, Right Now
2068. Green Day - Time your life
DATA;

echo preg_replace('/^(?>\d+)\.\s+/m', '', $data);
?>

Try that.

EDIT: Wth. It still stripped a '. Add a ' right after the opening paren of preg_replace().

nrg_alpha
Posted October 19, 2008

Oh yeah.. How did I miss that missing ' character? Must have been brain dead.

So we have a couple of solutions at our disposal in this thread. To the OP, pick your poison.