ryandward Posted May 27, 2011

I am trying to match all the text in a document except the date:

(\d{1,2} (?!\d)\w+ \d{4})

The match must end where the date begins:

(?=(\d{1,2} (?!\d)\w+ \d{4}))

I cannot for the life of me figure out how to NOT match a general pattern.
.josh Posted May 27, 2011

Instead of trying to match everything except the date, how about matching just the date and then removing it with preg_replace? Or perhaps I misunderstand your overall goal here... perhaps it would help if you showed an actual example.
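A minimal sketch of that suggestion, assuming the date pattern from the first post and a made-up sample string (the variable names are illustrative only). It matches just the date and replaces it with an empty string, so everything else is kept:

```php
<?php
// Date pattern from the first post: day, month name, four-digit year.
$datePattern = '/\d{1,2} (?!\d)\w+ \d{4}/';

// Hypothetical sample string, used only for illustration.
$text = 'China – Scallops Contaminated with Paralytic Shellfish Poisoning 27 January 2010 Summary: Pre-packaged frozen half-shell scallops';

// Replace every date with an empty string; the rest of the text is untouched.
$withoutDates = preg_replace($datePattern, '', $text);

echo $withoutDates, PHP_EOL;
```

Note that removing the date leaves the spaces that surrounded it, so a follow-up cleanup of doubled spaces may be wanted.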
ryandward Posted May 27, 2011

Alright, here is a snippet from the text. What I am trying to do is match the bolded text and nothing else:

Press Releases: Test results on formaldehyde in noodlefish. December 17, 2009. http://www.info.gov.hk/gia/general/200912/17/P200912170203.htm (accessed January 26, 2010). Formaldehyde found in noodlefish sample. December 17, 2009. http://news.gov.hk/en/category/healthandcommunity/091217/html/091217en05004.htm (accessed January 26, 2010). China – Scallops Contaminated with Paralytic Shellfish Poisoning 27 January 2010 Summary: Pre-packaged frozen half-shell scallops contaminated with Paralytic Shellfish Poisoning (PSP) are being sold at supermarkets in Hong Kong and vicinity. Several cases of PSP have resulted. Synthesis/Analysis: The sample containing the toxin was taken from the Kai Bo Food Supermarket on Wu Kwong Street. The product contained PSP toxin levels of 114 μg/100 g. The current legal allowable level of PSP toxins is 800 μg/kg.
ryandward Posted May 27, 2011

But obviously there are going to be hundreds of these types of matches I need in the text. So, I am using preg_match_all.
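One way this could look with preg_match_all, as a rough sketch rather than a finished solution. A lookahead alone leaves the date unconsumed, so the next match would begin with that date; this version consumes the date instead and captures the text in front of it in group 1. The sample string and variable names are made up for illustration:

```php
<?php
// Pattern from the first post, with the text before the date captured in group 1.
// The /s modifier lets "." cross line breaks in a multi-line document.
$pattern = '/(.+?)\s*\d{1,2} (?!\d)\w+ \d{4}/s';

// Hypothetical sample with two headline-plus-date entries.
$text = 'Title A 27 January 2010 Summary one. Title B 3 February 2010 Summary two.';

preg_match_all($pattern, $text, $matches);

// $matches[1] holds the group-1 captures; trim stray whitespace around each.
$beforeDates = array_map('trim', $matches[1]);

print_r($beforeDates);
```

Because nothing marks where a match should start, the second capture runs from the end of the first date and so picks up the previous entry's summary along with the next headline.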
.josh Posted May 27, 2011

Hmm... what about the green part below? Does that count as a match too? Or is that all part of the same block of code, and the black bolded part is all you want to match?

Press Releases: Test results on formaldehyde in noodlefish. December 17, 2009. http://www.info.gov.hk/gia/general/200912/17/P200912170203.htm (accessed January 26, 2010). Formaldehyde found in noodlefish sample. December 17, 2009. http://news.gov.hk/en/category/healthandcommunity/091217/html/091217en05004.htm (accessed January 26, 2010). China – Scallops Contaminated with Paralytic Shellfish Poisoning 27 January 2010 Summary: Pre-packaged frozen half-shell scallops contaminated with Paralytic Shellfish Poisoning (PSP) are being sold at supermarkets in Hong Kong and vicinity. Several cases of PSP have resulted. Synthesis/Analysis: The sample containing the toxin was taken from the Kai Bo Food Supermarket on Wu Kwong Street. The product contained PSP toxin levels of 114 μg/100 g. The current legal allowable level of PSP toxins is 800 μg/kg.

My thought is to use the red stuff below as delimiters, but that would also match the green stuff, and I don't know whether that is a separate entry you're trying to match:

Press Releases: Test results on formaldehyde in noodlefish. December 17, 2009. http://www.info.gov.hk/gia/general/200912/17/P200912170203.htm (accessed January 26, 2010). Formaldehyde found in noodlefish sample. December 17, 2009. http://news.gov.hk/en/category/healthandcommunity/091217/html/091217en05004.htm (accessed January 26, 2010). China – Scallops Contaminated with Paralytic Shellfish Poisoning 27 January 2010 Summary: Pre-packaged frozen half-shell scallops contaminated with Paralytic Shellfish Poisoning (PSP) are being sold at supermarkets in Hong Kong and vicinity. Several cases of PSP have resulted. Synthesis/Analysis: The sample containing the toxin was taken from the Kai Bo Food Supermarket on Wu Kwong Street. The product contained PSP toxin levels of 114 μg/100 g. The current legal allowable level of PSP toxins is 800 μg/kg.

So here's the thing: you say you want to match the text up until the date, but you don't say what the starting point is, and you have highlighted only one bit of text even though there are several places where "match text up until date" applies. What I'm getting at is that you need a clear boundary (delimiter) for what you're trying to match. There are lots of dates in the chunk of text you posted (in different formats, I might add), so right now I'm not seeing a clear boundary.

Another question: are you sure this is what the real content you are trying to match looks like? No HTML code in there somewhere?
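To make the boundary problem concrete, here is a small sketch that lists every date in an abridged copy of the posted text. The combined pattern is an assumption written for this example: it covers the two formats visible in the snippet ("27 January 2010" and "December 17, 2009") and nothing more.

```php
<?php
// Two hypothetical alternatives: "27 January 2010" and "December 17, 2009".
$anyDate = '/\b(?:\d{1,2} [A-Za-z]+ \d{4}|[A-Za-z]+ \d{1,2}, \d{4})\b/';

// Abridged from the posted text (URLs left out for brevity).
$text = 'Formaldehyde found in noodlefish sample. December 17, 2009. '
      . '(accessed January 26, 2010). China – Scallops Contaminated with '
      . 'Paralytic Shellfish Poisoning 27 January 2010 Summary: Pre-packaged';

preg_match_all($anyDate, $text, $matches);

// Every entry here is a candidate end-of-match boundary.
print_r($matches[0]);
```

Even this short excerpt yields three dates, which is why a date by itself does not identify the one piece of text to match.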