
Remove Non-Standard Microsoft Office Tags


jimmyelewis


I'm trying to use regular expressions to remove some of the non-standard tags from text that is copied from Microsoft Word into a text area.  So far I have:

[code]
$search = array(
'/<city[^>]*>(.*?)<\/city[^>]*>/is',
'/<place[^>]*>(.*?)<\/place[^>]*>/is',
'/mso-[^"]*|mso-[^;]*mso-[^"]*/is',
'/<formulas>(.*?)<\/formulas>/is',
'/o:[^=]*="[^"]*"/is'
);
$replace = array(
'$1',
'$1',
'',
'',
''
);

echo preg_replace($search,$replace,$row['contents']);
[/code]

One of the values for $row['contents'] is:
[code]
<p class="MsoNormal"><span class="EmailStyle41"><font face="Arial" color="#003300" size="2"><span style="FONT-SIZE: 10pt">The Semiconductor Power and Electronics Center (SPEC) welcomes Dr. Engelbert Hetzmannseder as he&nbsp;presents an overview of&nbsp;the Eaton Corporation focused on the Eaton Electric Group and an overview of the mission and capabilities of the&nbsp;
<place w:st="on">
<placename w:st="on">Eaton</placename>
<placename w:st="on">Innovation</placename>
<placetype w:st="on">Center in Milwaukee, Wisconsin</placetype>
</place>
. </span></font></span></p>
<span class="EmailStyle41"><font face="Arial" color="#003300" size="2"><span style="FONT-SIZE: 10pt">
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt"><strong style="mso-bidi-font-weight: normal"><span style="FONT-SIZE: 14pt; mso-bidi-font-size: 10.0pt"><font color="#000000"><font face="Times New Roman">The Speaker</font></font></span></strong></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt"><strong style="mso-bidi-font-weight: normal"><span style="FONT-SIZE: 14pt; mso-bidi-font-size: 10.0pt"><font color="#000000"><font face="Times New Roman">Engelbert Hetzmannseder</font></font></span></strong></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt"><strong style="mso-bidi-font-weight: normal"><span style="FONT-SIZE: 14pt; mso-bidi-font-size: 10.0pt"></span></strong></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt"><strong style="mso-bidi-font-weight: normal"><span style="FONT-SIZE: 14pt; mso-bidi-font-size: 10.0pt"></span></strong><font size="3"><font color="#000000"><font face="Times New Roman"><strong style="mso-bidi-font-weight: normal">Engelbert Hetzmannseder</strong> was born in Klaffer,
<place w:st="on">Upper Austria</place>
.<span style="mso-spacerun: yes">&nbsp; </span>He received his Dipl.-Ing. (B.S., M.S., 1990) and Dr. techn. (Ph.D., 1994) degree in Electrical Engineering from the Technical University of Vienna, Austria.<span style="mso-spacerun: yes">&nbsp; </span>Since February 1995 he has been with Eaton Corporation /
<placename w:st="on">Innovation</placename>
<placetype w:st="on">Center</placetype>
in
<place w:st="on"><city w:st="on">Milwaukee</city>, <state w:st="on">WI</state>, </place>
, involved with fundamental and applied research on contacts, switching arc phenomena, and arc fault detection for industrial, aerospace, and automotive products.<span style="mso-spacerun: yes">&nbsp; </span>He holds 8 patents and published 20 papers at international conferences.<span style="mso-spacerun: yes">&nbsp; </span></font></font></font></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt"><font size="3"><font color="#000000"><font face="Times New Roman"><span style="mso-spacerun: yes"></span><br />Since 2000 he has been Technology Manager of the Electrical Architecture &amp; Systems department at the
<place w:st="on">
<placename w:st="on">Eaton</placename>
<placename w:st="on">Innovation</placename>
<placetype w:st="on">Center</placetype>
</place>
.<span style="mso-spacerun: yes">&nbsp; </span>Capabilities of the EAS group include:</font></font></font></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt 0.35in; TEXT-INDENT: -0.2in; LINE-HEIGHT: normal; mso-list: l0 level1 lfo1"><span style="COLOR: black; FONT-FAMILY: Symbol; mso-fareast-font-family: Symbol; mso-bidi-font-family: Symbol"><span style="mso-list: Ignore"><font size="3">&middot;</font><span style="FONT: 7pt &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></span></span><font face="Times New Roman" color="#000000" size="3">Electric Power Management</font></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt 0.35in; TEXT-INDENT: -0.2in; LINE-HEIGHT: normal; mso-list: l0 level1 lfo1"><span style="COLOR: black; FONT-FAMILY: Symbol; mso-fareast-font-family: Symbol; mso-bidi-font-family: Symbol"><span style="mso-list: Ignore"><font size="3">&middot;</font><span style="FONT: 7pt &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></span></span><font face="Times New Roman" color="#000000" size="3">Power electronic, Power quality, and power conversion architectures</font></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt 0.35in; TEXT-INDENT: -0.2in; LINE-HEIGHT: normal; mso-list: l0 level1 lfo1"><span style="COLOR: black; FONT-FAMILY: Symbol; mso-fareast-font-family: Symbol; mso-bidi-font-family: Symbol"><span style="mso-list: Ignore"><font size="3">&middot;</font><span style="FONT: 7pt &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></span></span><font face="Times New Roman" color="#000000" size="3">Diagnostics &amp; prognostics of electrical components and systems</font></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt 0.35in; TEXT-INDENT: -0.2in; LINE-HEIGHT: normal; mso-list: l0 level1 lfo1"><span style="COLOR: black; FONT-FAMILY: Symbol; mso-fareast-font-family: Symbol; mso-bidi-font-family: Symbol"><span style="mso-list: Ignore"><font size="3">&middot;</font><span style="FONT: 7pt &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></span></span><font face="Times New Roman" color="#000000" size="3">Electro-mechanical switching technologies, Arc &amp; plasma science, Contact physics</font></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt 0.35in; TEXT-INDENT: -0.2in; LINE-HEIGHT: normal; mso-list: l0 level1 lfo1"><span style="COLOR: black; FONT-FAMILY: Symbol; mso-fareast-font-family: Symbol; mso-bidi-font-family: Symbol"><span style="mso-list: Ignore"><font size="3">&middot;</font><span style="FONT: 7pt &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></span></span><font face="Times New Roman" color="#000000" size="3">Power systems modeling: magnetic, electric, thermal, electro-mechanical</font></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt 0.35in; TEXT-INDENT: -0.2in; LINE-HEIGHT: normal; mso-list: l0 level1 lfo1"><span style="COLOR: black; FONT-FAMILY: Symbol; mso-fareast-font-family: Symbol; mso-bidi-font-family: Symbol"><span style="mso-list: Ignore"><font size="3">&middot;</font><span style="FONT: 7pt &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></span></span><font face="Times New Roman" color="#000000" size="3">Mechanism synthesis and modeling</font></p>
<p class="MsoHeader" style="MARGIN: 24pt 0in 0pt; LINE-HEIGHT: normal; tab-stops: .5in">
<p><font face="Times New Roman" color="#000000" size="3">&nbsp; <stroke joinstyle="miter"></stroke>
<formulas>
<f eqn="if lineDrawn pixelLineWidth 0"></f><f eqn="sum @0 1 0"></f><f eqn="sum 0 0 @1"></f><f eqn="prod @2 1 2"></f><f eqn="prod @3 21600 pixelWidth"></f><f eqn="prod @3 21600 pixelHeight"></f><f eqn="sum @0 0 1"></f><f eqn="prod @6 1 2"></f><f eqn="prod @7 21600 pixelWidth"></f><f eqn="sum @8 21600 0"></f><f eqn="prod @7 21600 pixelHeight"></f><f eqn="sum @10 21600 0"></f>
</formulas>
<path o:extrusionok="f" gradientshapeok="t" o:connecttype="rect"></path>
<lock v:ext="edit" aspectratio="t"></lock><shape id="_x0000_s1026" style="MARGIN-TOP: 48.4pt; Z-INDEX: 1; MARGIN-LEFT: 401.55pt; WIDTH: 157.1pt; POSITION: absolute; HEIGHT: 213.55pt" type="#_x0000_t75"><imagedata o:title="ENGELBERT 2 005 head" src="file:///C:\DOCUME~1\jakirk\Local%20Settings\Temp\msohtml1\01\clip_image001.jpg"></imagedata></shape> <stroke joinstyle="miter"></stroke>
<formulas>
<f eqn="if lineDrawn pixelLineWidth 0"></f><f eqn="sum @0 1 0"></f><f eqn="sum 0 0 @1"></f><f eqn="prod @2 1 2"></f><f eqn="prod @3 21600 pixelWidth"></f><f eqn="prod @3 21600 pixelHeight"></f><f eqn="sum @0 0 1"></f><f eqn="prod @6 1 2"></f><f eqn="prod @7 21600 pixelWidth"></f><f eqn="sum @8 21600 0"></f><f eqn="prod @7 21600 pixelHeight"></f><f eqn="sum @10 21600 0"></f>
</formulas>
<path o:extrusionok="f" gradientshapeok="t" o:connecttype="rect"></path>
<lock v:ext="edit" aspectratio="t"></lock><shape id="_x0000_s1026" style="MARGIN-TOP: 48.4pt; Z-INDEX: 1; MARGIN-LEFT: 401.55pt; WIDTH: 157.1pt; POSITION: absolute; HEIGHT: 213.55pt" type="#_x0000_t75"><imagedata o:title="ENGELBERT 2 005 head" src="file:///C:\DOCUME~1\jakirk\Local%20Settings\Temp\msohtml1\01\clip_image001.jpg"></imagedata></shape></font></p>
</p>
<p class="MsoNormal">
<p>&nbsp;</p>
</p>
</span></font></span>
[/code]

What's returned from the preg_replace:
[code]
<p class="MsoNormal"><span class="EmailStyle41"><font face="Arial" color="#003300" size="2"><span style="FONT-SIZE: 10pt">The Semiconductor Power and Electronics Center (SPEC) welcomes Dr. Engelbert Hetzmannseder as he&nbsp;presents an overview of&nbsp;the Eaton Corporation focused on the Eaton Electric Group and an overview of the mission and capabilities of the&nbsp;

<placename w:st="on">Eaton
Innovation
Center in Milwaukee, Wisconsin
</place>
. </span></font></span></p>

<span class="EmailStyle41"><font face="Arial" color="#003300" size="2"><span style="FONT-SIZE: 10pt">
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt"><strong style=""><span style="FONT-SIZE: 14pt; "><font color="#000000"><font face="Times New Roman">The Speaker</font></font></span></strong></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt"><strong style=""><span style="FONT-SIZE: 14pt; "><font color="#000000"><font face="Times New Roman">Engelbert Hetzmannseder</font></font></span></strong></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt"><strong style=""><span style="FONT-SIZE: 14pt; "></span></strong></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt"><strong style=""><span style="FONT-SIZE: 14pt; "></span></strong><font size="3"><font color="#000000"><font face="Times New Roman"><strong style="">Engelbert Hetzmannseder</strong> was born in Klaffer,
Upper Austria
.<span style="">&nbsp; </span>He received his Dipl.-Ing. (B.S., M.S., 1990) and Dr. techn. (Ph.D., 1994) degree in Electrical Engineering from the Technical University of Vienna, Austria.<span style="">&nbsp; </span>Since February 1995 he has been with Eaton Corporation /
Innovation
Center
in
Milwaukee, <state w:st="on">WI</state>,
, involved with fundamental and applied research on contacts, switching arc phenomena, and arc fault detection for industrial, aerospace, and automotive products.<span style="">&nbsp; </span>He holds 8 patents and published 20 papers at international conferences.<span style="">&nbsp; </span></font></font></font></p>

<p class="MsoNormal" style="MARGIN: 0in 0in 0pt"><font size="3"><font color="#000000"><font face="Times New Roman"><span style=""></span><br />Since 2000 he has been Technology Manager of the Electrical Architecture &amp; Systems department at the

<placename w:st="on">Eaton
Innovation
Center
</place>
.<span style="">&nbsp; </span>Capabilities of the EAS group include:</font></font></font></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt 0.35in; TEXT-INDENT: -0.2in; LINE-HEIGHT: normal; "><span style="COLOR: black; FONT-FAMILY: Symbol; "><span style=""><font size="3">&middot;</font><span style="FONT: 7pt &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></span></span><font face="Times New Roman" color="#000000" size="3">Electric Power Management</font></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt 0.35in; TEXT-INDENT: -0.2in; LINE-HEIGHT: normal; "><span style="COLOR: black; FONT-FAMILY: Symbol; "><span style=""><font size="3">&middot;</font><span style="FONT: 7pt &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></span></span><font face="Times New Roman" color="#000000" size="3">Power electronic, Power quality, and power conversion architectures</font></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt 0.35in; TEXT-INDENT: -0.2in; LINE-HEIGHT: normal; "><span style="COLOR: black; FONT-FAMILY: Symbol; "><span style=""><font size="3">&middot;</font><span style="FONT: 7pt &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></span></span><font face="Times New Roman" color="#000000" size="3">Diagnostics &amp; prognostics of electrical components and systems</font></p>

<p class="MsoNormal" style="MARGIN: 0in 0in 0pt 0.35in; TEXT-INDENT: -0.2in; LINE-HEIGHT: normal; "><span style="COLOR: black; FONT-FAMILY: Symbol; "><span style=""><font size="3">&middot;</font><span style="FONT: 7pt &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></span></span><font face="Times New Roman" color="#000000" size="3">Electro-mechanical switching technologies, Arc &amp; plasma science, Contact physics</font></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt 0.35in; TEXT-INDENT: -0.2in; LINE-HEIGHT: normal; "><span style="COLOR: black; FONT-FAMILY: Symbol; "><span style=""><font size="3">&middot;</font><span style="FONT: 7pt &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></span></span><font face="Times New Roman" color="#000000" size="3">Power systems modeling: magnetic, electric, thermal, electro-mechanical</font></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt 0.35in; TEXT-INDENT: -0.2in; LINE-HEIGHT: normal; "><span style="COLOR: black; FONT-FAMILY: Symbol; "><span style=""><font size="3">&middot;</font><span style="FONT: 7pt &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></span></span><font face="Times New Roman" color="#000000" size="3">Mechanism synthesis and modeling</font></p>
<p class="MsoHeader" style="MARGIN: 24pt 0in 0pt; LINE-HEIGHT: normal; tab-stops: .5in">
<p><font face="Times New Roman" color="#000000" size="3">&nbsp; <stroke joinstyle="miter"></stroke>

<path  gradientshapeok="t" ></path>
<lock v:ext="edit" aspectratio="t"></lock><shape id="_x0000_s1026" style="MARGIN-TOP: 48.4pt; Z-INDEX: 1; MARGIN-LEFT: 401.55pt; WIDTH: 157.1pt; POSITION: absolute; HEIGHT: 213.55pt" type="#_x0000_t75"><imagedata  src="file:///C:\DOCUME~1\jakirk\Local%20Settings\Temp\msohtml1\01\clip_image001.jpg"></imagedata></shape> <stroke joinstyle="miter"></stroke>

<path  gradientshapeok="t" ></path>
<lock v:ext="edit" aspectratio="t"></lock><shape id="_x0000_s1026" style="MARGIN-TOP: 48.4pt; Z-INDEX: 1; MARGIN-LEFT: 401.55pt; WIDTH: 157.1pt; POSITION: absolute; HEIGHT: 213.55pt" type="#_x0000_t75"><imagedata  src="file:///C:\DOCUME~1\jakirk\Local%20Settings\Temp\msohtml1\01\clip_image001.jpg"></imagedata></shape></font></p>
</p>
<p class="MsoNormal">
<p>&nbsp;</p>
</p>
</span></font></span>
[/code]

Things like:
[code]
<placename w:st="on">Eaton
Innovation
Center in Milwaukee, Wisconsin
</place>

<placename w:st="on">Eaton
Innovation
Center
</place>
[/code]

remain while the others are replaced.  I think it might have something to do with those tags being embedded in other tags. I've tried different things, but I'm not sure what to do.

Yes, regex engines generally don't cope well with nested patterns. But in this case I guess you could define one more pattern for the tags whose names begin with <place:
first replace the tags whose names start with place and continue with more letters, such as <placename> and <placetype> (something like '/<place\B[^>]*>(.*?)<\/place\B[^>]*>/is' ), and then replace the remaining ones (those that match <place exactly).
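
For what it's worth, here is a minimal sketch of that two-pass idea. It rests on a few assumptions: the $html sample is a shortened, hypothetical stand-in for the pasted Word markup, and the second pattern uses \b (word boundary) so that it only matches the plain <place> tag. In the original array, '<place[^>]*>' also matches '<placename ...>', so the lazy (.*?) stops at the first '</placename>', which appears to be why the half-stripped fragments are left behind.

[code]
<?php
// Hypothetical sample, shortened from the markup posted in the question.
$html = 'the <place w:st="on"><placename w:st="on">Eaton</placename> '
      . '<placetype w:st="on">Center</placetype></place>.';

$search = array(
    // Pass 1: tags whose name only starts with "place" (placename, placetype).
    // \B requires another word character directly after "place".
    '/<place\B[^>]*>(.*?)<\/place\B[^>]*>/is',
    // Pass 2: the plain <place> wrapper.
    // \b keeps "place" from also matching "placename" or "placetype".
    '/<place\b[^>]*>(.*?)<\/place\b[^>]*>/is',
);
$replace = array('$1', '$1');

// With array arguments, preg_replace applies the patterns in order,
// each one working on the result of the previous one.
$clean = preg_replace($search, $replace, $html);

echo $clean, "\n";
[/code]

Because the inner tags are gone by the time the second pattern runs, the outer <place> ... </place> pair is matched as a whole. The leftover <state w:st="on"> tags in the output above would presumably need a pattern of the same shape.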
