
Struggling with a regex match. Appreciate any help.



Here is the text:

 

<div class="left">Lorem Ipsum is simply dummy text of the printing and</div> typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scramble'd it to make-shift type <a href="google.com">specimen book</a> and something [tag]else[/tag].

 

Essentially what I'm trying to do is extract all of the words above while abiding by these rules:

 

1. A word can contain a dash or an apostrophe (scramble'd and make-shift above)

2. A word cannot be within a link tag

3. A word cannot be within a block tag such as [tag]

4. A word cannot be part of a tag name or HTML attribute (class in class=", div, a, tag, etc.)

 

That's about it. Any advice on where I might start with that? I'm currently experimenting with it but all I have is this:

 

(\s?)[a-zA-Z0-9\'\-]+(\s?|\,|\.)

 

Shameful, I know.

Thanks for the response.

 

strip_tags only removes the tags themselves, so the link text would still be there. I need to remove all links, and tags of certain other types, together with their content. So I want to drop the links and their text, but keep the content inside the div.

 

Thanks.

Please specify what you mean by "within."  Which of these words is not "within" a tag:

WordA <a href="http://www.wordb.com" class="wordc">WordD</a>

If your answer is "wordA and wordD," then remove the HTML tags with strip_tags, emulate the same functionality for the [tag] markup with preg_replace and the pattern /\[[^\]]+\]/, and continue with your matching.
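A rough sketch of that first case, assuming PHP. The function name extract_words_keep_tag_text, the shortened sample string, and the word pattern are made up for illustration and are not from the original post. The word pattern treats a word as runs of letters or digits joined by single dashes or apostrophes.

```php
<?php
// Shortened version of the sample text from the first post (illustrative only).
$sample = <<<'HTML'
<div class="left">dummy text</div> scramble'd make-shift <a href="google.com">specimen book</a> and [tag]else[/tag].
HTML;

// Hypothetical helper: removes the markup but keeps the text inside every tag.
function extract_words_keep_tag_text(string $text): array
{
    // Drop HTML tags; the text between them (including link text) stays.
    $plain = strip_tags($text);

    // Drop the [tag] and [/tag] markers themselves; their content stays.
    $plain = preg_replace('/\[[^\]]+\]/', '', $plain);

    // A word: letters/digits, optionally joined by single dashes or apostrophes.
    preg_match_all("/[a-zA-Z0-9]+(?:['-][a-zA-Z0-9]+)*/", $plain, $matches);

    return $matches[0];
}

$words = extract_words_keep_tag_text($sample);
echo implode(', ', $words), PHP_EOL;
```

Note that this keeps "specimen", "book" and "else", so it only fits if words inside links and [tag] blocks are allowed.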

 

If your answer is "wordA only" you're going to have a lot more trouble.  It's going to be extremely difficult, especially when you start doing nested tags or erroneous HTML.
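For simple, well-formed input like the sample, one possible approach is to delete the link elements and [tag] blocks together with their content before stripping the remaining tags. This is only a sketch under that assumption: the function name extract_words_outside_links and the patterns are my own illustration, and they will misbehave on nested tags of the same name or on broken HTML, which is exactly the trouble described above.

```php
<?php
// Shortened version of the sample text from the first post (illustrative only).
$sample = <<<'HTML'
<div class="left">dummy text</div> scramble'd make-shift <a href="google.com">specimen book</a> and [tag]else[/tag].
HTML;

// Hypothetical helper: drops <a> elements and [name]...[/name] blocks with their
// content, then strips the remaining tags (such as <div>) but keeps their text.
function extract_words_outside_links(string $text): array
{
    // Remove <a ...>...</a> including the link text.
    // Non-greedy, case-insensitive (i), and "." also matches newlines (s).
    $text = preg_replace('#<a\b[^>]*>.*?</a>#is', '', $text);

    // Remove [name]...[/name] blocks; \1 makes the closing name match the opening one.
    $text = preg_replace('#\[(\w+)\b[^\]]*\].*?\[/\1\]#is', '', $text);

    // Remaining tags (e.g. the div) are removed, but their content is kept.
    $text = strip_tags($text);

    // A word: letters/digits, optionally joined by single dashes or apostrophes.
    preg_match_all("/[a-zA-Z0-9]+(?:['-][a-zA-Z0-9]+)*/", $text, $matches);

    return $matches[0];
}

$words = extract_words_outside_links($sample);
echo implode(', ', $words), PHP_EOL;
```

If the real input can contain nested or malformed markup, a proper HTML parser is likely a safer starting point than these patterns.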

 

-Dan
