adp1207 Posted August 10, 2007

I have a bunch of text, and I'm looking to recognize runs of 2-5 UPPER CASE letters within a paragraph. A sample paragraph is this:

==================
United Airlines' parent UAL (UAUA) sprang a surprise in July when it posted second-quarter earnings that handily beat Street forecasts. Another surprise was how the stock has bounced up after UAL emerged from bankruptcy in February, 2006--despite high oil prices, rising competition, and public outcry about the airlines' disruptive flight cancellations and delays. The stock climbed from 21 in February to 49 on July 23, 2007, the day before the market's plunge.
==================

The regular expression that I'm currently using is:

\(.*?\)

However, it recently failed on the following paragraph (for obvious reasons). Regular Expressions are just so pesky!

======= FAILED ON THIS PARAGRAPH =======
Sweden's Elekta, which trades only in Stockholm, is getting more attention from American investors. (Its Bloomberg symbol is EKTAB.) Some 30% of its stock is owned by U.S. institutions, including Fidelity Investments and Lazard (LAZ). And nearly half of the global sales of Elekta's cancer radiation equipment are in the U.S. In that market Elekta is second only to Varian Medical Systems (VAR) in Palo Alto, Calif. No. 3 is Siemens (SI).
=====================================

It picked up the following:

Its Bloomberg symbol is EKTAB.

I was HOPING it would only pick up:

EKTAB

Any help would be greatly appreciated!
adp1207 (Author) Posted August 10, 2007

Sorry, I forgot to mention that I strip off the parentheses in a later piece of code. I also tried this regular expression, to no avail:

\([A-Z](2-5)\s\)
effigy Posted August 10, 2007

(2-5) is looking to capture the literal "2-5"; what you're looking for is an interval: {2,5}. The pattern below should work as long as you only expect to find one stock symbol within a set of parentheses. If you need to look for more, you'll need to perform two searches.

<pre>
<?php
$tests = array(
    "United Airlines' parent UAL (UAUA) sprang a surprise in July when it posted second-quarter earnings that handily beat Street forecasts. Another surprise was how the stock has bounced up after UAL emerged from bankruptcy in February, 2006--despite high oil prices, rising competition, and public outcry about the airlines' disruptive flight cancellations and delays. The stock climbed from 21 in February to 49 on July 23, 2007, the day before the market's plunge.",
    "Sweden's Elekta, which trades only in Stockholm, is getting more attention from American investors. (Its Bloomberg symbol is EKTAB.) Some 30% of its stock is owned by U.S. institutions, including Fidelity Investments and Lazard (LAZ). And nearly half of the global sales of Elekta's cancer radiation equipment are in the U.S. In that market Elekta is second only to Varian Medical Systems (VAR) in Palo Alto, Calif. No. 3 is Siemens (SI)."
);
foreach ($tests as $test) {
    preg_match_all('/
        \(
        (?:
            \b([A-Z]{2,5})\b   # a stand-alone run of 2-5 capitals, captured
            |
            [^)]               # or any other character except the closing parenthesis
        )*
        \)
    /x', $test, $matches);
    // $matches[0] holds each full "(...)" match, $matches[1] the symbol found inside it.
    print_r($matches);
}
?>
</pre>
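The "two searches" idea mentioned above could look roughly like the sketch below: the first pass collects whatever sits inside each pair of parentheses, and the second pass pulls every stand-alone run of 2-5 capitals out of that text. The function name extract_symbols and the sample sentence are made up for illustration and are not from the thread; the approach assumes parentheses are never nested.

```php
<?php
// Hypothetical helper (not from the thread): two-pass symbol extraction.
function extract_symbols(string $text): array
{
    $symbols = [];

    // Pass 1: capture the contents of every (...) group; assumes no nesting.
    preg_match_all('/\(([^)]*)\)/', $text, $groups);

    foreach ($groups[1] as $inside) {
        // Pass 2: every stand-alone run of 2-5 upper-case letters in that group.
        preg_match_all('/\b[A-Z]{2,5}\b/', $inside, $found);
        $symbols = array_merge($symbols, $found[0]);
    }

    return $symbols;
}

// Made-up sentence with two symbols inside one set of parentheses.
$sample = "Compare the two carriers (UAUA and LAZ) with Siemens (SI).";
print_r(extract_symbols($sample)); // UAUA, LAZ, SI
```

Because the second search runs only on the text found between parentheses, capitalised words elsewhere in the paragraph (UAL, U.S.) are never considered.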
adp1207 (Author) Posted August 13, 2007

Effigy: Thanks for the reply. What does "?:" mean? I know what the question mark is, but I've never seen the colon (:) used. Also, what is /x? Thanks. -Allan
MadTechie Posted August 14, 2007

OK, in regex you can use parentheses to capture a part of the string, but sometimes you want to group a few letters without using what they match. To speed things up, you use (?:blar) instead of (blar), i.e.

$theData = "hello world";
preg_match('/(hello) (world)/si', $theData, $result);
array_shift($result); // drop the full match, keep only the captured groups
print_r($result);

returns an array
[0] = hello
[1] = world

$theData = "hello world";
preg_match('/(?:hello) (world)/si', $theData, $result);
array_shift($result); // drop the full match, keep only the captured groups
print_r($result);

returns an array
[0] = world
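On the /x part of the question: in PHP's PCRE functions, the x modifier turns on "extended" mode, where unescaped whitespace in the pattern is ignored and # starts a comment that runs to the end of the line. A minimal sketch, assuming a simplified pattern (not the one posted earlier) and a sentence taken from the sample paragraph, shows the same expression written compactly and in extended form:

```php
<?php
// Sentence taken from the thread's second sample paragraph.
$text = "In that market Elekta is second only to Varian Medical Systems (VAR) in Palo Alto, Calif. No. 3 is Siemens (SI).";

// Compact form: everything on one line.
$compact = '/\((?:[A-Z]{2,5})\)/';

// Extended form: same pattern, spread out and commented thanks to the x modifier.
$extended = '/
    \(              # literal opening parenthesis
    (?:             # group without capturing
        [A-Z]{2,5}  # two to five upper-case letters
    )
    \)              # literal closing parenthesis
/x';

preg_match_all($compact, $text, $a);
preg_match_all($extended, $text, $b);

var_dump($a === $b); // bool(true): both forms match identically
print_r($b);         // one sub-array only, holding "(VAR)" and "(SI)"
```

Since the group is non-capturing, the result has only index 0 (the full matches); swapping (?: for a plain ( would add a second sub-array holding VAR and SI without the parentheses.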