
Converting plain text to XML with PHP


behindspace


Maybe I'm going about this the wrong way. I've got numerous regular expressions that take plain text from a form and convert it to XML. The purpose is to create a book index file from a Word document; Word's "XML" is complete trash (as is its HTML).

Perhaps someone has already written a utility that does this.

Basically, I need to take a line like this:

Aardvark, 100, 110-12

and convert it to this:

[code]
<term>
<name>Aardvark</name>
<page>100</page>
<page>110-12</page>
</term>
[/code]
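For a single, un-nested line, splitting on commas may be simpler than a chain of regular expressions. Below is a minimal sketch, not a finished tool. It assumes a modern PHP (7 or later), assumes names never contain commas, and uses a hypothetical helper name, `entryToXml()`:

```php
<?php
// Hypothetical helper: convert one index line such as
// "Aardvark, 100, 110-12" into a <term> element.
// Assumes the first comma-separated field is the name and every
// remaining field is a page reference (names must not contain commas).
function entryToXml(string $line): string
{
    // Split on commas and trim the whitespace around each field
    $fields = array_map('trim', explode(',', $line));
    $name = array_shift($fields);

    // htmlspecialchars() escapes &, <, > and quotes, so "AT&T" stays valid XML
    $xml = "<term>\n<name>" . htmlspecialchars($name, ENT_QUOTES) . "</name>\n";
    foreach ($fields as $page) {
        if ($page === '') {
            continue; // skip the empty field left by a trailing comma
        }
        $xml .= '<page>' . htmlspecialchars($page, ENT_QUOTES) . "</page>\n";
    }
    return $xml . "</term>\n";
}

echo entryToXml('Aardvark, 100, 110-12');
```

Because each field is escaped as it is written out, no separate pass for characters like `&` is needed afterwards.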

But I also have nested terms, which should end up looking like this:

[code]
<term>
<name>Chainsaw</name>
<page>50</page>
     <term>
     <name>juggling</name>
     <page>210</page>
     </term>
</term>
[/code]
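One way to get this nesting is to process the text line by line and remember whether a main term is still open, writing its `</term>` only when the next main entry (or the end of the input) arrives. This is a minimal sketch under assumptions: PHP 7+, only one level of nesting, and that a subentry line starts with a lowercase letter, as "juggling" does here. `termBody()` and `indexToXml()` are hypothetical names:

```php
<?php
// Hypothetical helper: the <name> and <page> children for one line
function termBody(string $line, string $indent): string
{
    $fields = array_map('trim', explode(',', $line));
    $out = $indent . '<name>' . htmlspecialchars(array_shift($fields), ENT_QUOTES) . "</name>\n";
    foreach ($fields as $page) {
        if ($page !== '') {
            $out .= $indent . '<page>' . htmlspecialchars($page, ENT_QUOTES) . "</page>\n";
        }
    }
    return $out;
}

// Assumes (hypothetically) that a line starting with a lowercase letter
// is a subentry of the most recent main entry
function indexToXml(string $text): string
{
    $xml = '';
    $mainOpen = false; // is a main <term> still waiting for its </term>?

    // \R matches \r\n, \r or \n, so text posted from a textarea splits cleanly
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue;
        }
        if (preg_match('/^[a-z]/', $line)) {
            // Subentry: a complete <term> nested inside the open main term
            $xml .= "    <term>\n" . termBody($line, '    ') . "    </term>\n";
        } else {
            // New main entry: only now close the previous main term
            if ($mainOpen) {
                $xml .= "</term>\n";
            }
            $xml .= "<term>\n" . termBody($line, '');
            $mainOpen = true;
        }
    }
    if ($mainOpen) {
        $xml .= "</term>\n"; // close the last main term at the end of input
    }
    return $xml;
}

echo indexToXml("Chainsaw, 50\njuggling, 210\nAardvark, 100, 110-12");
```

If subentries are marked differently in the real data (for example, by indentation), only the `preg_match('/^[a-z]/', ...)` test needs to change.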
https://forums.phpfreaks.com/topic/4211-converting-plain-text-to-xml-with-php/

OK, forgive me for being a complete n00b with regex. Until recently I've never had to delve into them like this, so I'm sure this code could be written FAR better; I'm just awful and out of practice.

So forgive my n00bness, and don't laugh too hard at me :(

FYI, I'm running this application on PHP 4.0.2 with Apache on my Windows box (no Linux box access in the office).

[code]
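<?php
    // Text pasted into the form, one index entry per line
    $word = $_POST['word'];

    // nl2br() inserts <br /> before each newline (the newline itself stays);
    // shorten the tag to <br>
    $clean = nl2br($word);
    $str = preg_replace('/<br \/>/', '<br>', $clean);

    // Wrap some four-digit numbers that follow a comma in <y>...<y> markers
    // (the markers are stripped again near the end)
    $strxx = preg_replace('/(\D), ([1-9][0-9][2-9][0-9])/', '$1, <y>$2<y>', $str);
    $strx1 = preg_replace('/(\D), ([1-9][1-9][0-9][0-9])/', '$1, <y>$2<y>', $strxx);

    // Pad every comma with spaces, then collapse runs of spaces
    $str01 = str_replace(',', ' , ', $strx1);
    $str02 = preg_replace('/ +/', ' ', $str01);

    // More <y> markers: four-digit numbers after a hyphen, then anywhere
    $str03 = preg_replace('/\-([1-9][0-9][2-9][0-9])/', '-<y>$1<y>', $str02);
    $str0A = preg_replace('/\-([1-9][1-9][0-9][0-9])/', '-<y>$1<y>', $str03);
    $str0B = preg_replace('/([1-9][1-9][0-9][0-9])/', '<y>$1<y>', $str0A);
    $str04 = preg_replace('/([2-9][0-9][0-9][0-9])/', '<y>$1<y>', $str0B);

    // Replace the en dash, curly quotes and accented letters with ASCII
    $str0C = preg_replace('/\–/', '-', $str04);
    $str0D = preg_replace('/á/', 'a', $str0C);
    $str0E = preg_replace('/\”/', '"', $str0D);
    $str0F = preg_replace('/\“/', '"', $str0E);
    $str0G = preg_replace('/Á/', 'A', $str0F);
    $str0H = preg_replace('/ú/', 'u', $str0G);
    $str0I = preg_replace('/ñ/', 'n', $str0H);

    // Mark where each <name> ends and where each <page> starts and ends
    $str05 = preg_replace('/(\D) , ([0-9])/', '$1</name><page>$2', $str0I);
    $str06 = preg_replace('/(\D)<br>/', '$1</name><br>', $str05);
    $str07 = preg_replace('/([0-9])<br>/', '$1</page><br>', $str06);
    $str08 = preg_replace('/([0-9]) , ([0-9])/', '$1</page><page>$2', $str07);
    $str8A = preg_replace('/<page>([0-9]{1,4}). /', '<page>$1</page>', $str08);
    $str8B = preg_replace('/<page>([0-9]{1,4})-([0-9]{1,2}). /', '<page>$1-$2</page>', $str8A);

    // Start a <name> at every new line; a capitalized entry first closes the
    // previous </term>, and every <name> then gets a <term> opened before it
    $str09 = preg_replace('/\n/', '<name>', $str8B);
    $str10 = preg_replace('/<name>([A-Z])/', '</term><name>$1', $str09);
    $str11 = preg_replace('/<name>/', '<term><name>', $str10);

    // Strip the helper <br> and <y> markers
    $str12 = preg_replace('/<br>/', '', $str11);
    $strXB = preg_replace('/<y>/', '', $str12);

    // Escape < and > so the browser displays the tags instead of parsing them
    $str13 = preg_replace('/</', '&lt;', $strXB);
    $str14 = preg_replace('/>/', '&gt;', $str13);

    // Add visible line breaks between adjacent tags and around <term> tags
    $str15 = preg_replace('/&gt;&lt;/', '&gt;<br>&lt;', $str14);
    $str16 = preg_replace('/&lt;term&gt;/', '<br>&lt;term&gt;', $str15);
    $str17 = preg_replace('/&lt;\/term&gt;/', '<br>&lt;/term&gt;', $str16);

    // XML needs & written as &amp;; escaping it once more makes the browser
    // display AT&amp;T, so the copied text is valid XML
    $str18 = preg_replace('/AT&T/', 'AT&amp;amp;T', $str17);

    // Send the result to the browser for copying
    echo $str18;

?>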
[/code]
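The run of single-character replacements (`$str0C` through `$str0I`) could be collapsed into one `strtr()` call with a replacement map. A sketch, assuming the posted text is UTF-8 (so this file must be saved as UTF-8 for the map keys to match) and using a hypothetical helper name, `normalizeChars()`:

```php
<?php
// Hypothetical helper: map typographic characters to plain ASCII.
// Assumes UTF-8 input; the array keys below are UTF-8 strings.
function normalizeChars(string $text): string
{
    $map = [
        '–' => '-',  // en dash
        '“' => '"',  // left curly double quote
        '”' => '"',  // right curly double quote
        'á' => 'a',
        'Á' => 'A',
        'ú' => 'u',
        'ñ' => 'n',
    ];
    // With an array argument, strtr() tries the longest keys first and
    // never re-scans text it has already replaced
    return strtr($text, $map);
}

echo normalizeChars("“Núñez” – 12");
```

New characters can then be handled by adding one entry to `$map` rather than another `preg_replace()` line.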

If you're curious about what I was thinking on any given line, just ask... <_<

Another note:

I'm echoing the results back as HTML so they can be copied and pasted into a text document.
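An alternative to escaping `<` and `>` by hand and inserting `<br>` tags is to build plain XML first and escape it once, right before display. A minimal sketch, assuming the generated XML is already in a string; `xmlForDisplay()` is a hypothetical helper name:

```php
<?php
// Hypothetical helper: wrap escaped XML in <pre> so the browser shows
// the tags and line breaks exactly as generated
function xmlForDisplay(string $xml): string
{
    // &, <, > become &amp;, &lt;, &gt;; an XML-escaped "&amp;" in $xml
    // therefore displays as "&amp;", which stays valid XML when copied
    return '<pre>' . htmlspecialchars($xml, ENT_QUOTES) . '</pre>';
}

// Example input, assumed for illustration
$xml = "<term>\n<name>AT&amp;T</name>\n<page>12</page>\n</term>\n";
echo xmlForDisplay($xml);
```

Sending the output with `header('Content-Type: text/plain; charset=UTF-8')` instead would make the browser show the raw XML with no escaping at all.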

A few issues I haven't figured out yet:

1.) Nested terms: I can't figure out how to keep the first term open until after its last nested term, so that the output ends with:

</term>
</term>

2.) I'm still stumped on how to move the "See also..." text from after the page numbers to the end of the text inside the <name></name> tags, where it belongs.
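For the "See also" problem, one option is to split the cross-reference off the line before the page numbers are parsed, then append it to the name. This is a sketch under a loudly stated assumption: that input lines look like `Aardvark, 100, 110-12. See also Anteater`, with the cross-reference after the last page number and separated by "." or ";". The real format may differ, and `entryWithSeeAlso()` is a hypothetical name:

```php
<?php
// Hypothetical helper. Assumes lines such as
//   "Aardvark, 100, 110-12. See also Anteater"
// where "See also ..." follows the last page after "." or ";".
function entryWithSeeAlso(string $line): string
{
    $seeAlso = '';
    // Split off the cross-reference first, so commas inside it
    // ("See also Anteater, Armadillo") are not treated as page separators
    if (preg_match('/^(.*?)([.;])\s*(See also\b.*)$/i', trim($line), $m)) {
        $line = $m[1];
        $seeAlso = $m[2] . ' ' . $m[3]; // keep the original separator
    }

    $fields = array_map('trim', explode(',', $line));
    $name = array_shift($fields) . $seeAlso; // cross-reference ends the name

    $xml = "<term>\n<name>" . htmlspecialchars($name, ENT_QUOTES) . "</name>\n";
    foreach ($fields as $page) {
        if ($page !== '') {
            $xml .= '<page>' . htmlspecialchars($page, ENT_QUOTES) . "</page>\n";
        }
    }
    return $xml . "</term>\n";
}

echo entryWithSeeAlso('Aardvark, 100, 110-12. See also Anteater');
```

Lines without a cross-reference fall through the `preg_match()` unchanged and are converted as ordinary entries.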

Any help would be GREATLY appreciated!

Archived

This topic is now archived and is closed to further replies.
