johnsmith153 Posted September 16, 2012
I have this:
$value = '/\\' . $currency . '[0-9]*[.]?[0-9]{0,2}/';
...which gets all the price information on a page. The problem is that if the price is "$1,000", it only returns $1, so I need it to pick up the comma as well. I'm guessing this is easy, but I can't imagine how to solve it.
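A quick way to see the problem, assuming $currency is '$' and using a made-up sample string: the pattern has no place for a comma, so each match stops just before one.

```php
<?php
// Assumed currency symbol for this illustration.
$currency = '$';
// The original pattern; this builds /\$[0-9]*[.]?[0-9]{0,2}/
$value = '/\\' . $currency . '[0-9]*[.]?[0-9]{0,2}/';

$page = 'Was $1,000, now only $899.99!';
preg_match_all($value, $page, $matches);

// "$1,000" comes back as "$1" because nothing in the pattern accepts the comma.
print_r($matches[0]);
```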
Christian F. Posted September 16, 2012
You need to add a grouping to that RegExp that allows up to three digits followed by a thousands separator. Note, however, that this separator depends on the locale, so if you change currency you might need to change the separator too. The same goes for the decimal separator, btw. As for how to solve it: you need to learn to write Regular Expressions.
PS: This is a PHP RegExp issue, so it would have been better to post it in that section.
ignace Posted September 16, 2012
$value = '/'. preg_quote($currency) .'\s?(\d{1,3}(,\d{3})+|\d{1,3})(\.\d{2,})?/';
You can try it yourself here: http://refiddle.com/4xr
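Here is ignace's pattern run against a made-up snippet, assuming $currency is '$'. The only change is the second argument to preg_quote(), which also escapes the '/' delimiter in case a currency string ever contains it.

```php
<?php
// Assumed currency symbol for this example.
$currency = '$';
// ignace's pattern, with '/' passed to preg_quote() as the delimiter.
$value = '/' . preg_quote($currency, '/') . '\s?(\d{1,3}(,\d{3})+|\d{1,3})(\.\d{2,})?/';

$page = 'Deluxe: $1,000. Basic: $ 49.95. Bulk: $12,500,000.';
preg_match_all($value, $page, $matches);
print_r($matches[0]);
```

Note that (\.\d{2,})? needs at least two decimal digits, so something like "$5.5" would come back as "$5".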
ignace Posted September 16, 2012
If you also want to match negative or positive values:
$value = '/'. preg_quote($currency) .'\s?[-+]?(\d{1,3}(,\d{3})+|\d{1,3})(\.\d{2,})?/';
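With the signed version the sign has to sit between the symbol and the digits. In this sketch (assumed '$' symbol, made-up ledger text), "-$7.50" is still found, but only as "$7.50", because the minus comes before the symbol.

```php
<?php
// Assumed currency symbol for this example.
$currency = '$';
// ignace's signed pattern, with '/' passed to preg_quote() as the delimiter.
$value = '/' . preg_quote($currency, '/') . '\s?[-+]?(\d{1,3}(,\d{3})+|\d{1,3})(\.\d{2,})?/';

$ledger = 'Refund: $-25.00, charge: $+1,200.50, fee: $3, credit: -$7.50';
preg_match_all($value, $ledger, $matches);

// The leading "-" in "-$7.50" is outside the match.
print_r($matches[0]);
```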
Christian F. Posted September 16, 2012
Not quite how I'd write it, and it doesn't take the locale differences into consideration, but a good start for the OP nonetheless. There's one hidden bug in it, though: it allows a number of up to 6 digits without any thousands separator, which may or may not be desirable.
johnsmith153 (author) Posted September 16, 2012
Perfect, thanks ignace. How would you write it, then, ChristianF?
Christian F. Posted September 16, 2012
Hehe, trying to get something for free, eh? Anyway, I'll bite.
$RegExp = '/'.$curSym.'\\s?\\d{1,3}(?:'.$kiloSep.'\\d{3})*(?:'.$decSep.'\\d{1,2})?/';
This one matches, in order:
- The currency symbol.
- 0 or 1 whitespace character (normally a space).
- 1 to 3 digits.
- 0 or more of a group consisting of a thousands separator followed by 3 digits.
- 0 or 1 of a group consisting of a decimal separator followed by 1 or 2 digits.
PS: Note that I've worked on the assumption that all three variables have been run through preg_quote() first. Ideally you'd do that only when constructing the RegExp, as in ignace's example, but I chose not to do it here for readability.
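A sketch of Christian F.'s pattern with preg_quote() applied while the RegExp is built, as his PS suggests. The function name buildPriceRegex() and the sample text are hypothetical; the pattern pieces are his, with '\\s' written as '\s' (both produce the same string inside PHP single quotes).

```php
<?php
/**
 * Hypothetical helper: builds Christian F.'s price pattern, quoting each
 * variable part (symbol, thousands separator, decimal separator) once here.
 */
function buildPriceRegex(string $curSym, string $kiloSep, string $decSep): string
{
    return '/' . preg_quote($curSym, '/')
        . '\s?\d{1,3}(?:' . preg_quote($kiloSep, '/') . '\d{3})*'
        . '(?:' . preg_quote($decSep, '/') . '\d{1,2})?/';
}

$usd = buildPriceRegex('$', ',', '.');
preg_match_all($usd, 'Now $1,299.99, was $1,500', $matches);
print_r($matches[0]);
```

Keeping the quoting inside the builder means callers can pass raw separators such as '.' without remembering to escape them.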
ignace Posted September 16, 2012
Quote: "it doesn't take the locale differences into consideration"
The OP did not mention he needed that. And if he does want it, that is easily fixed by swapping the . and the ,
Quote: "There's one hidden bug with it, in that it allows for a (up to) 6 digit number without any thousands separator. Which may or may not, be desirable."
You mean like $123456? It only matches $123. Can you give an example?
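ignace's point can be checked directly: with '$' as the assumed symbol, an unseparated six-digit amount yields a three-digit match rather than a six-digit one. If a truncated match is worse than no match at all, one option (not from the thread) is a trailing (?!\d) lookahead; the $strict variable below is a hypothetical illustration of that.

```php
<?php
// ignace's pattern with '$' as the assumed currency symbol.
$value = '/' . preg_quote('$', '/') . '\s?(\d{1,3}(,\d{3})+|\d{1,3})(\.\d{2,})?/';

// An unseparated six-digit amount is only partially matched.
preg_match($value, '$123456', $m);
echo $m[0], PHP_EOL; // $123

// Hypothetical variant: replace the closing delimiter with a (?!\d) lookahead
// so the match fails instead of stopping after three digits.
$strict = substr($value, 0, -1) . '(?!\d)/';
var_dump(preg_match($strict, '$123456'));  // int(0)
var_dump(preg_match($strict, '$123,456')); // int(1)
```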
Christian F. Posted September 16, 2012
Quote: "(quoting "it doesn't take the locale differences into consideration") The OP did not mention he needed that. And if he does want it that is easily fixed by swapping the . and ,"
I'm afraid it's not that simple, as there are (at least) three different thousands separators in use: dot, comma and space. Dot and space both use a comma as the decimal separator, and, as previously mentioned, which one applies depends on the chosen locale. So do the currency and its symbol, which is why you need the separators to be as easily interchangeable as the currency symbol. Even if the OP didn't specify this, it follows from the fact that he supports multiple currencies (and thus locales).
Quote: "You mean like $123456 ? It only matches $123. Can you give an example?"
Ah, yes. On a closer look it seems I misread how a couple of the parentheses pair up, so it will not allow numbers of more than 3 digits without a separator. Sorry about that.
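To illustrate why the separators have to be swappable as a pair, here is the same kind of builder fed three separator conventions. The function name, labels, symbols and sample strings are all hypothetical, and the symbol is kept in front only because the pattern expects it there. The last lines show what happens when a comma/dot pattern meets dot/comma text: the match is silently wrong rather than missing.

```php
<?php
// Hypothetical builder following Christian F.'s pattern, quoting each piece once.
function buildPriceRegex(string $curSym, string $kiloSep, string $decSep): string
{
    return '/' . preg_quote($curSym, '/')
        . '\s?\d{1,3}(?:' . preg_quote($kiloSep, '/') . '\d{3})*'
        . '(?:' . preg_quote($decSep, '/') . '\d{1,2})?/';
}

// Illustrative conventions: [symbol, thousands separator, decimal separator, sample text].
$formats = [
    'comma/dot'   => ['$', ',', '.', 'Price: $1,234.56'],
    'dot/comma'   => ['€', '.', ',', 'Price: €1.234,56'],
    'space/comma' => ['€', ' ', ',', 'Price: €1 234,56'],
];

$found = [];
foreach ($formats as $label => [$sym, $kilo, $dec, $text]) {
    preg_match(buildPriceRegex($sym, $kilo, $dec), $text, $m);
    $found[$label] = $m[0];
    echo $label, ': ', $m[0], PHP_EOL;
}

// Comma/dot separators applied to dot/comma text: a wrong match, not a failed one.
preg_match(buildPriceRegex('€', ',', '.'), 'Price: €1.234,56', $m);
$wrong = $m[0];
echo 'mismatched: ', $wrong, PHP_EOL; // €1.23
```

Real pages may also use a non-breaking space as the thousands separator, which a plain ' ' separator will not match.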
ignace Posted September 16, 2012
Quote: "Afraid it's not that simple, as there are (at least) three different thousands separators in use: Dot, comma and space. Dot and space both use comma as the decimal separator, and as previously mentioned this changes depending upon the chosen locale. As does the currency and its related symbol, which is why you need to have them as easily interchangeable as the currency symbol."
The same goes for the currency symbol: it's not a given that it is always at the front. Since this is about scraping ("which will get all price information on a page."), the symbol might just as well be at the end, and it might not even be a symbol, just the abbreviation (e.g. EUR). And since NASA is upping its efforts, maybe, to make it complete, we should assume that an alien civilization may enter our solar system in the near future with a different formatting altogether, and account for that too.
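Taking the first half of ignace's point seriously (aliens aside), a pattern can accept the currency marker on either side of the amount. This sketch fixes the amount format to comma thousands and dot decimals, and assumes '$' and 'USD' as the markers for a single currency; both choices are illustrative, not part of the thread.

```php
<?php
// Amount format fixed to comma thousands / dot decimals for brevity (an assumption).
$amount = '\d{1,3}(?:,\d{3})*(?:\.\d{1,2})?';
// Assumed markers for one currency: the symbol or its ISO code.
$marker = '(?:' . preg_quote('$', '/') . '|USD)';
// Marker before the amount, or marker after it.
$value = '/' . $marker . '\s?' . $amount . '|' . $amount . '\s?' . $marker . '/';

$page = 'Either $1,000 or 1,000 USD or USD 1,000.50';
preg_match_all($value, $page, $matches);
print_r($matches[0]);
```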