Trying to match all text except date


ryandward

I am trying to match all of the text in a document except the date. This pattern matches the date:

 

(\d{1,2} (?!\d)\w+ \d{4})

 

But then the match must end where the date begins:

 

(?=(\d{1,2} (?!\d)\w+ \d{4}))

 

I cannot for the life of me figure out how to NOT match a general pattern.
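One way to read "not matching" the date is to match lazily up to a lookahead for the date, or to split the text on the date pattern. The sketch below is in Python and is only an illustration, not necessarily what the thread settled on. The names `DATE`, `text_before_date` and `split_on_dates` are made up for this example, and the month part is tightened to letters only (`[A-Za-z]+` instead of `(?!\d)\w+`), which is an assumption about the data.

```python
import re

# Day-month-year date such as "27 January 2010".
# Assumption: the month is written in plain ASCII letters.
DATE = r"\b\d{1,2} [A-Za-z]+ \d{4}\b"


def text_before_date(text):
    """Return the text before the first date, or all of it if no date is found."""
    # Lazy .*? grows one character at a time until the lookahead sees
    # either a date (with optional leading whitespace) or the end of the text.
    m = re.match(rf"(.*?)(?=\s*{DATE}|\Z)", text, flags=re.DOTALL)
    return m.group(1)


def split_on_dates(text):
    """Return the non-empty chunks of text that lie between dates."""
    return [part.strip() for part in re.split(DATE, text) if part.strip()]


sample = (
    "China – Scallops Contaminated with Paralytic Shellfish Poisoning "
    "27 January 2010 Summary: Pre-packaged frozen half-shell scallops"
)

print(text_before_date(sample))
print(split_on_dates(sample))
```

The lookahead version stops at the first date it sees; the split version keeps everything except the dates themselves. Neither one says where a match should start, so both begin at the start of the string.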

Alright, here is a snippet from the text. What I am trying to do is have it match the bolded text and nothing else:

 

Press Releases: Test results on formaldehyde in noodlefish. December 17, 2009. http://www.info.gov.hk/gia/general/200912/17/P200912170203.htm (accessed January 26, 2010). Formaldehyde found in noodlefish sample. December 17, 2009. http://news.gov.hk/en/category/healthandcommunity/091217/html/091217en05004.htm (accessed January 26, 2010). China – Scallops Contaminated with Paralytic Shellfish Poisoning 27 January 2010 Summary: Pre-packaged frozen half-shell scallops contaminated with Paralytic Shellfish Poisoning (PSP) are being sold at supermarkets in Hong Kong and vicinity. Several cases of PSP have resulted. Synthesis/Analysis: The sample containing the toxin was taken from the Kai Bo Food Supermarket on Wu Kwong Street. The product contained PSP toxin levels of 114 μg/100 g. The current legal allowable level of PSP toxins is 800 μg/kg.

 

 

hmm...what about the green part below? Does that count as a match too? Or is that all part of the same block of code and that black bolded part is all you want to match?

 

Press Releases: Test results on formaldehyde in noodlefish. December 17, 2009. http://www.info.gov.hk/gia/general/200912/17/P200912170203.htm (accessed January 26, 2010). Formaldehyde found in noodlefish sample. December 17, 2009. http://news.gov.hk/en/category/healthandcommunity/091217/html/091217en05004.htm (accessed January 26, 2010). China – Scallops Contaminated with Paralytic Shellfish Poisoning 27 January 2010 Summary: Pre-packaged frozen half-shell scallops contaminated with Paralytic Shellfish Poisoning (PSP) are being sold at supermarkets in Hong Kong and vicinity. Several cases of PSP have resulted. Synthesis/Analysis: The sample containing the toxin was taken from the Kai Bo Food Supermarket on Wu Kwong Street. The product contained PSP toxin levels of 114 μg/100 g. The current legal allowable level of PSP toxins is 800 μg/kg.

 

My thought is to use the red stuff below as delimiters. That would also match the green stuff, though, and I don't know whether that is a separate entry you're trying to match.

 

Press Releases: Test results on formaldehyde in noodlefish. December 17, 2009. http://www.info.gov.hk/gia/general/200912/17/P200912170203.htm (accessed January 26, 2010). Formaldehyde found in noodlefish sample. December 17, 2009. http://news.gov.hk/en/category/healthandcommunity/091217/html/091217en05004.htm (accessed January 26, 2010). China – Scallops Contaminated with Paralytic Shellfish Poisoning 27 January 2010 Summary: Pre-packaged frozen half-shell scallops contaminated with Paralytic Shellfish Poisoning (PSP) are being sold at supermarkets in Hong Kong and vicinity. Several cases of PSP have resulted. Synthesis/Analysis: The sample containing the toxin was taken from the Kai Bo Food Supermarket on Wu Kwong Street. The product contained PSP toxin levels of 114 μg/100 g. The current legal allowable level of PSP toxins is 800 μg/kg.

 

So here's the thing...you say you want to match the text up until the date...but you don't say what the starting point is, and you seem to only have highlighted one bit of text, even though there are several places where your "match text up until date" is applicable. I guess what I'm getting at is that you need a clear boundary (delimiter) for what you're trying to match, and there are lots of dates in this chunk of text you posted (in different formats, I might add..), so as of right now, I'm not seeing a clear boundary.

 

Also...another question...are you sure this is what the real content you are trying to match looks like? No HTML code in there somewhere?
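To make the boundary problem concrete, here is a rough Python sketch that lists every date in a shortened copy of the sample and labels its format. The pattern names `mdy` and `dmy` and the helper `find_dates` are invented for this illustration, and the patterns assume capitalized English month names.

```python
import re

# Two date formats that appear in the sample text.
# Assumption: months are capitalized English words.
DATES = re.compile(
    r"(?P<mdy>\b[A-Z][a-z]+ \d{1,2}, \d{4}\b)"   # e.g. "December 17, 2009"
    r"|(?P<dmy>\b\d{1,2} [A-Z][a-z]+ \d{4}\b)"   # e.g. "27 January 2010"
)


def find_dates(text):
    """Return (format_name, matched_text) for every date found, in order."""
    return [(m.lastgroup, m.group(0)) for m in DATES.finditer(text)]


sample = (
    "Formaldehyde found in noodlefish sample. December 17, 2009. "
    "http://news.gov.hk/en/category/healthandcommunity/091217/html/091217en05004.htm "
    "(accessed January 26, 2010). China – Scallops Contaminated with Paralytic "
    "Shellfish Poisoning 27 January 2010 Summary: Pre-packaged frozen half-shell scallops"
)

for kind, text in find_dates(sample):
    print(kind, text)
```

Three dates in two formats turn up in this short excerpt alone, so "everything up to the date" has several possible answers until a start delimiter is chosen.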
