
Remove WYSIWYG &nbsp with regex


Hybride


I have a WYSIWYG editor that converts the text to HTML format after submitting.

 

I have text that looks like this:

Test with:

^blah blah/XLR

^Another test/?                     

Synonyms 

Some Syndrome 

^ED test syndrome

Synonyms/acronyms

Test

^Another

 

The editor converts that to something ugly like this (the whitespace at the end of some lines is made up of &nbsp; entities):

<p>
                              Test with:<br />
                                    ^blah blah/XLR                                    <br />
                                    ^Another test/?                      <br />
                                        Synonyms   <br />
                                            Some Syndrome  <br />
                                    ^ED test syndrome<br />
                                        Synonyms/acronyms<br />
                                            Test<br />
                                            ^Another</p>

 

Doing this regex

\^(.*)

Gets me the ^words that I need, but it also picks up the &nbsp; entities on the right-hand side, which I don't want. How can I modify the regex to capture only what comes before the &nbsp;, without including it?


There are several solutions.

 

1. Use str_replace() to remove the &nbsp; after extracting the line you need.

Does every line with '^' have an '&nbsp;' at the end? If so,

2. If you don't expect the ampersand to be included in the data, you can change the regex to \^([^&]*)&nbsp;

3. If the ampersand can be in the data you need, then you can use this regex: \^(.*?)&nbsp; (the lazy .*? stops at the first &nbsp;). You may need to escape some of those characters.
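The three options above might look like this in PHP. This is only a sketch: `$line` is a made-up sample, the variable names are hypothetical, and the patterns assume the entity appears literally as `&nbsp;` in the submitted HTML.

```php
<?php
// Hypothetical sample line, as the editor might emit it.
$line = '^Another test/?&nbsp;&nbsp;&nbsp;<br />';

// Option 1: capture everything after the caret, then strip the entities
// (and, in this sketch, the trailing <br /> tag as well).
preg_match('/\^(.*)/', $line, $m);
$option1 = trim(str_replace(['&nbsp;', '<br />'], '', $m[1]));

// Option 2: stop at the first ampersand (only safe if the data has no '&').
preg_match('/\^([^&]*)&nbsp;/', $line, $m);
$option2 = $m[1];

// Option 3: lazy match up to the first literal &nbsp;.
preg_match('/\^(.*?)&nbsp;/', $line, $m);
$option3 = $m[1];

var_dump($option1, $option2, $option3);
```

On this sample all three variables end up holding `Another test/?`. Options 2 and 3 only match lines that actually contain an `&nbsp;`, which matters if some lines end in a plain line break.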

Thanks, Psycho! I actually changed my regex to this:

\^(.*)(|(&nbsp;))

And for now, looks like it's working.

 

The '^' is at the beginning of each line that I need to run the regex on, but the &nbsp; may or may not be there (it could be just a line break). Realizing this, I modified it to allow two '^' in case anyone uses them within a paragraph (such as '^test this^'). The regex for that is

((\^(.*)\^))|\^(.*)(|(&nbsp;))|
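For the case where the &nbsp; may or may not follow the term, one alternative (not the regex posted above) is a lazy capture with a lookahead. A minimal sketch, assuming the entity is a literal `&nbsp;` and that a term never contains `<`; `$html` and `$terms` are hypothetical names:

```php
<?php
// Hypothetical editor output: one line ends in &nbsp; entities, one does not,
// and one has no caret at all.
$html = "^Another test/?&nbsp;&nbsp;<br />\n^ED test syndrome<br />\nSynonyms<br />";

// Lazy capture after the caret, stopping before the first &nbsp;, the first
// tag, or the end of the line, whichever comes first. The 'm' flag lets $
// match at the end of every line.
preg_match_all('/\^(.*?)(?=&nbsp;|<|$)/m', $html, $matches);

$terms = array_map('trim', $matches[1]);
print_r($terms);
```

Because the lookahead does not consume anything, neither the entity nor the `<br />` ends up in the captured group. This sketch does not handle the inline `^test this^` form.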

 

 

/^((?:&nbsp;|\s)+.*?)&nbsp;/

That's a slightly better RegExp, as it'll match one or more "&nbsp;" or whitespace characters, as many as it can, before finding content that isn't one of them. It then captures that content until it hits the first "&nbsp;" again (which is not included).
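A quick way to see what that pattern captures, sketched in PHP with a made-up `$line`. Note that here `^` is the start-of-string anchor, not a literal caret, so group 1 still includes the leading whitespace and the caret itself:

```php
<?php
// Hypothetical single line with leading indentation and trailing entities.
$line = "    ^blah blah/XLR&nbsp;&nbsp;<br />";

if (preg_match('/^((?:&nbsp;|\s)+.*?)&nbsp;/', $line, $m)) {
    // Group 1 holds the leading whitespace, the caret, and the term,
    // so strip the whitespace and the caret to get the bare term.
    $term = ltrim(trim($m[1]), '^');
    echo $term, "\n";
}
```

The pattern only matches when the line starts with at least one whitespace character or `&nbsp;` and contains a later `&nbsp;`, so lines that end in a plain line break would be skipped.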

 
