
[SOLVED] Parse Data between two tags


sh44n


The data file looks like this:

Line1
Line2
Line3
Line4
<tag1>

Tag1 Line1
Tag1 Line2
Tag1 Line3
Tag1 Line4


<tag2>

Tag2 Line1
Tag2 Line2
Tag2 Line3
Tag2 Line4

<tag3>
.
.
.

How can I use a regex to get the first 4 lines before <tag1>, then grab the data between <tag1> and <tag2> and associate it with the index tag1, and so on? In short:

tag1 data = between <tag1> and <tag2>
tag2 data = between <tag2> and <tag3>
.
.
.
and so on.

Looking forward to your help, guys.

 


<?php

$data = <<<DATA
Line1
Line2
<tag1>

Tag1 Line1
Tag1 Line2

<tag2>

Tag2 Line1
Tag2 Line2

DATA;

preg_match_all("/(<tag\d>)?(.+?)(<tag\d>|$)/is", $data, $result);

// $result[2] holds the text of each section, without the <tagN> markers
$your_data = $result[2];
print_r($your_data);

?>

 

Orio.
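The key piece of the pattern above is the lazy quantifier in (.+?): it takes as few characters as possible, so each match ends at the next <tagN> (or at the end of the string), and the s modifier lets . cross newlines. A minimal sketch comparing it with a greedy (.+), on a hypothetical sample string shaped like the data in the question:

```php
<?php
// Hypothetical sample data in the same shape as the thread's example
// (no trailing newline, to keep the output short).
$data = "Line1\nLine2\n<tag1>\nTag1 Line1\n<tag2>\nTag2 Line1";

// Lazy (.+?): stops at the FIRST following <tagN> (or the end of the string),
// so each match is one section.
preg_match_all('/(.+?)(<tag\d>|$)/s', $data, $lazy);

// Greedy (.+): grabs as much as it can and only gives characters back
// until the rest of the pattern fits, which here already happens at the
// very end of the string, so everything ends up in a single match.
preg_match_all('/(.+)(<tag\d>|$)/s', $data, $greedy);

$lazySections   = array_map('trim', $lazy[1]);
$greedySections = array_map('trim', $greedy[1]);

print_r($lazySections);   // 3 elements: the leading lines, then one per tag
print_r($greedySections); // 1 element: the whole string
```

With the greedy version the tags never split anything, because (<tag\d>|$) can already match at the end of the string after (.+) has swallowed everything.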


Thank you very much, Orio. It works perfectly, but how can I keep the tag names generic instead of hardcoding <tag1> in the regex? For example:

Line1
Line2
Line3
Line4
<directors>

Tag1 Line1
Tag1 Line2
Tag1 Line3
Tag1 Line4


<address>

Tag2 Line1
Tag2 Line2
Tag2 Line3
Tag2 Line4

<copyright>
.
.
.

Also, is it possible for the $result array to contain the data as follows:

$result['tag1'] = its respective data
$result['tag2'] = its respective data
$result['tag3'] = its respective data


<pre>
<?php
$data = <<<DATA
Line1
Line2
Line3
Line4
<directors>

Tag1 Line1
Tag1 Line2
Tag1 Line3
Tag1 Line4


<address>

Tag2 Line1
Tag2 Line2
Tag2 Line3
Tag2 Line4

<copyright>
.
.
.
DATA;

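// Split before every line that starts with a <tag>; PREG_SPLIT_DELIM_CAPTURE
// keeps the tags themselves as separate elements of $pieces.
$pieces = preg_split('/^(<[^>]+>)\s*/m', $data, -1, PREG_SPLIT_DELIM_CAPTURE);
$result = array();
$count = -1;
foreach ($pieces as $piece) {
    ++$count;
    // Tag elements start with "<"; skip the text elements between them.
    if (!preg_match('/^</', $piece)) {
        continue;
    }
    // "<directors>" -> "directors"; the element after a tag is its text.
    $piece = preg_replace('/[<>]/', '', $piece);
    $result[$piece] = $pieces[$count + 1];
}
print_r($result);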

?>
</pre>
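The loop above skips the text before the first tag, although the first post also asked for those first lines. A small variant of the same preg_split idea (a sketch, not effigy's code; the sample data is hypothetical) stores that leading text under key 0, as Orio's version does, trims the whitespace around each section, and walks the pieces in pairs instead of checking each one for a leading "<":

```php
<?php
// Hypothetical sample data; tag names may contain spaces.
$data = "Line1\nLine2\n<board of directors>\nboard1name\nboard2name\n<address at place>\naddress 1\naddress 2\n";

// Split on any <...> that starts a line, keeping the tags as separate pieces.
$pieces = preg_split('/^(<[^>]+>)\s*/m', $data, -1, PREG_SPLIT_DELIM_CAPTURE);

// With PREG_SPLIT_DELIM_CAPTURE the layout is:
//   [0] => text before the first tag, [1] => tag, [2] => its text, [3] => tag, ...
$result = array(0 => trim($pieces[0]));
for ($i = 1; $i < count($pieces); $i += 2) {
    $name = trim($pieces[$i], '<>'); // "<board of directors>" -> "board of directors"
    $result[$name] = trim($pieces[$i + 1]);
}

print_r($result);
```

Because the tag names come from <[^>]+>, tags with spaces such as <board of directors> keep their full name as the key.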


That's what I managed to piece together while effigy posted his...

Kinda ugly I guess, but it works fine.

 

<?php

$data = <<<DATA
Line1
Line2

<directors>

Tag1 Line1
Tag1 Line2

<address>

Tag2 Line1
Tag2 Line2

DATA;

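// Group 3 = the tag name (\w+ before ">"), group 4 = the text up to the next "<".
// The first match has no tag name: it is anchored at ^ (the start of the data).
preg_match_all("/(((\w+)>)|^)(.+?)(<|$)/is", $data, $matches);

$result = array();
$result[0] = trim($matches[4][0]); // The first lines, before the first tag
$num_tags = count($matches[1]) - 1;
for ($i = 1; $i <= $num_tags; $i++) {
    $result[$matches[3][$i]] = trim($matches[4][$i]);
}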

echo "<pre>";
print_r($result);
echo "</pre>";

?>

 

 

Orio.
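Another way to build the same array, not what Orio posted: PCRE named groups let preg_match_all return the tag names and section bodies under named keys, so array_combine can pair them directly. This sketch assumes every tag sits at the start of its own line; [^>]+ accepts any tag name, spaces included. The sample data is hypothetical.

```php
<?php
// Hypothetical sample data in the thread's format.
$data = "Line1\nLine2\n\n<directors>\n\nTag1 Line1\nTag1 Line2\n\n<address>\n\nTag2 Line1\nTag2 Line2\n";

// m: ^ matches at the start of every line; s: . also matches newlines.
// (?<name>...) and (?<body>...) are named groups, so preg_match_all also
// returns $m['name'] and $m['body'] alongside the numbered groups.
// The body is taken lazily up to the next line starting with "<" or the
// end of the data (\z); the surrounding \s* trims the blank lines.
preg_match_all('/^<(?<name>[^>]+)>\s*(?<body>.*?)\s*(?=^<|\z)/ms', $data, $m);

// Text before the first line that starts with "<" (the "first lines").
$head = preg_split('/^</m', $data, 2);

$result = array(0 => trim($head[0])) + array_combine($m['name'], $m['body']);

print_r($result);
```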


Thank you all for taking time out to help. Your effort in this regard is highly appreciated.

 

Orio, the examples work perfectly, but only the last word of the tag name ends up as the array key when the data looks like this:

line1
line2
line3

<board of directors>

board1name
board2name
board3name

<address at place>

address 1
address 2
address 3

....

In the resulting array I get

$result['directors'] = its respective data, NOT $result['board of directors']
$result['place'] = its respective data, NOT $result['address at place']

Fortunately, the sample data I'm working with still gives each section a unique index, but as someone currently learning regex, I would like to understand how to handle this. A short explanation of the difference in the regex would be much appreciated as well.

Thanks again in advance.

 

 

 


I think this should solve the spaced keys problem.

 

<?php

$data = <<<DATA
Line1
Line2

<directors>

Tag1 Line1
Tag1 Line2

<address>

Tag2 Line1
Tag2 Line2

DATA;

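// [a-z0-9_ ]+ (case-insensitive because of the i modifier) is like \w+ but
// also allows spaces, so the whole tag name ends up in group 3.
preg_match_all("/((([a-z0-9_ ]+)>)|^)(.+?)(<|$)/is", $data, $matches);

$result = array();
$result[0] = trim($matches[4][0]); // The first lines, before the first tag
$num_tags = count($matches[1]) - 1;
for ($i = 1; $i <= $num_tags; $i++) {
    $result[$matches[3][$i]] = trim($matches[4][$i]);
}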

echo "<pre>";
print_r($result);
echo "</pre>";

?>

 

Orio.
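On the question of why the first version produced "directors": \w only matches letters, digits and underscore, not spaces. A minimal sketch isolating just the tag-name part of the two patterns, on a sample line in the format from the question:

```php
<?php
// Sample tag line in the format from the question.
$text = "<board of directors>\nboard1name";

// \w matches letters, digits and underscore, but not a space. Starting at "b",
// \w+ matches "board", but the next character is a space rather than ">",
// so that attempt fails and the engine keeps retrying at later positions
// until it reaches "directors>".
preg_match('/(\w+)>/', $text, $word);

// Putting a space in the character class lets the whole name match
// (the i modifier makes a-z case-insensitive).
preg_match('/([a-z0-9_ ]+)>/i', $text, $spaced);

echo $word[1], "\n";   // directors
echo $spaced[1], "\n"; // board of directors
```

[a-z0-9_ ] still rejects characters such as "-" or "&" in a tag name; a negated class like [^>]+, as in effigy's <[^>]+>, accepts anything up to the closing ">".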
