
[SOLVED] Looking for Solution / Workaround for Preg_Match with Array as Subject


mr.rum

Recommended Posts

Hello,

 

After a longer break from coding I needed to figure something out, and it did not turn out as it should. In the end it came down to the fact that preg_match does not accept arrays as subjects.

 

The idea is to extract the individual articles from one big document and save each in a separate file named after its date. Each article is delimited by <ppsarticle></ppsarticle>; each article consists of multiple <paragraph></paragraph> tags and has a <date></date> tag whose content should become the filename.

 

So the plan is simply to look for the paragraph tags, copy all the content between the ppsarticle tags into a file, and continue with the next article.

 

At first I thought my foreach loops were wrong and tried it with for loops. That seemed to work, but the output was Array, Array, Array, etc.

 

It would be great if someone could give me advice on how to solve this problem when an array is given as the subject for preg_match.

 

 

 

<?php
$data = file_get_contents('http://domain.tld/article_data.txt');

$regex_article = '/<ppsarticle\b[^>]*>(.*?)<\/ppsarticle>/s';
preg_match_all($regex_article, $data, $all_articles, PREG_SET_ORDER);

foreach ($all_articles as $individual_article) {
    // $individual_article is an array here (PREG_SET_ORDER), not a string,
    // which is where preg_match complains about the subject.
    $regex_date = '/<date\b[^>]*>(.*?)<\/date>/s';
    preg_match($regex_date, $individual_article, $article_date);

    $regex_paragraphs = '/<paragraph\b[^>]*>(.*?)<\/paragraph>/s';
    preg_match_all($regex_paragraphs, $individual_article, $all_article_paragraphs);

    $output_file = fopen($article_date[0], "w");
    foreach ($all_article_paragraphs as $individual_article_paragraphs) {
        // $individual_article_paragraphs is also an array, which is
        // why the output ends up as "Array".
        fwrite($output_file, $individual_article_paragraphs);
    }
    fclose($output_file);
}
?>

 

Thanks a lot

 

:shrug: Mr.Rum

Your problem is that $all_articles is a multi-dimensional array and you are looping through the top level. Do this:

 

echo "<pre>"; print_r($all_articles);

 

So you can see the hierarchy of results.
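To spell out what that hierarchy looks like: with PREG_SET_ORDER, each element of $all_articles is itself an array where index 0 holds the full match and index 1 holds the first capture group. A minimal sketch (the sample data here is made up for illustration):

```php
<?php
// Hypothetical sample data to show the PREG_SET_ORDER layout.
$data = '<ppsarticle>one</ppsarticle><ppsarticle>two</ppsarticle>';
$regex_article = '/<ppsarticle\b[^>]*>(.*?)<\/ppsarticle>/s';
preg_match_all($regex_article, $data, $all_articles, PREG_SET_ORDER);

// Each $all_articles[$i] is array(0 => full match, 1 => capture group),
// so the string to feed into the next preg_match is $all_articles[$i][1].
echo $all_articles[0][1]; // "one"
echo $all_articles[1][1]; // "two"
```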

 

Alternatively, considering you are using tags to mark your content, you might want to consider using DOM or an XML parser instead.
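If the file is well-formed XML, something along these lines should work — a sketch using SimpleXML, assuming the feed has a single root element (here a hypothetical <articles>) wrapping the ppsarticle tags:

```php
<?php
// Sketch: parse the articles with SimpleXML instead of regex.
// Assumes valid XML with a root element, e.g. <articles>...</articles>.
$xml = simplexml_load_string('<articles>
  <ppsarticle>
    <date>2010-05-01</date>
    <paragraph>First paragraph.</paragraph>
    <paragraph>Second paragraph.</paragraph>
  </ppsarticle>
</articles>');

foreach ($xml->ppsarticle as $article) {
    $filename = trim((string) $article->date) . '.txt';
    $text = '';
    foreach ($article->paragraph as $paragraph) {
        $text .= trim((string) $paragraph) . ' ';
    }
    // One file per article, named after its date.
    file_put_contents($filename, rtrim($text));
}
```

No counting or nested match arrays needed; the parser hands you each article and its children directly.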

 

Thanks for your reply,

 

I thought that I had ruled this out, because I also tried the following code:

 

$number_articles = (count($all_articles, COUNT_RECURSIVE) - count($all_articles)) / 2;

for ($i = 0; $i <= $number_articles - 1; $i++) {
    $regex_date = '/<date\b[^>]*>(.*?)<\/date>/';
    preg_match_all($regex_date, $all_articles[$i][1], $article_date);
    echo $article_date;
}

 

This should go through each of the articles and extract the date. However, it only prints "Array" eight times for a document that contains 8 articles.
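That output is expected: echo on an array always prints the literal string "Array", and preg_match_all nests its matches one level deeper than preg_match, so the date string sits at $article_date[1][0]. A small illustration (sample data made up):

```php
<?php
$regex_date = '/<date\b[^>]*>(.*?)<\/date>/';
preg_match_all($regex_date, '<date>2010-05-01</date>', $article_date);

// echo $article_date;    // would print "Array" (plus a conversion notice)
echo $article_date[1][0]; // prints "2010-05-01"
```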

 

 

I was also surprised by the return value of is_string(). However, it is solved now. The following code does what it should.

 

<?php
$data = file_get_contents('http://domain.tld/data.xml');

$regex_article = '/<ppsarticle\b[^>]*>(.*?)<\/ppsarticle>/s';
preg_match_all($regex_article, $data, $all_articles, PREG_SET_ORDER);

$number_articles = (count($all_articles, COUNT_RECURSIVE) - count($all_articles)) / 2;

for ($i = 0; $i <= $number_articles - 1; $i++) {
    $regex_date = '/<date\b[^>]*>(.*?)<\/date>/';
    preg_match($regex_date, $all_articles[$i][1], $article_date);

    $output_file = fopen($article_date[1] . "-" . $i . ".txt", "w");

    $regex_paragraphs = '/<paragraph\b[^>]*>(.*?)<\/paragraph>/s';
    preg_match_all($regex_paragraphs, $all_articles[$i][1], $all_article_paragraphs);

    $number_paragraphs = (count($all_article_paragraphs, COUNT_RECURSIVE) - count($all_article_paragraphs)) / 2;
    for ($j = 0; $j <= $number_paragraphs - 1; $j++) {
        fwrite($output_file, trim($all_article_paragraphs[1][$j]) . " ");
    }

    fclose($output_file);
    echo 'Created <a href="http://domain.tld/' . $article_date[1] . '-' . $i . '.txt">' . $article_date[1] . '-' . $i . '.txt</a><br>';
}
?>
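For what it's worth, the counting arithmetic can be avoided entirely by looping with foreach over the PREG_SET_ORDER sets — a sketch equivalent to the code above (inline sample data stands in for the fetched file):

```php
<?php
// Equivalent sketch using foreach instead of counted for loops;
// assumes the same tag names and regexes as above.
$data = '<ppsarticle><date>2010-05-01</date><paragraph>Hello.</paragraph></ppsarticle>';
preg_match_all('/<ppsarticle\b[^>]*>(.*?)<\/ppsarticle>/s', $data, $all_articles, PREG_SET_ORDER);

foreach ($all_articles as $i => $article) {
    // $article[1] is the string between the ppsarticle tags.
    preg_match('/<date\b[^>]*>(.*?)<\/date>/', $article[1], $article_date);
    preg_match_all('/<paragraph\b[^>]*>(.*?)<\/paragraph>/s', $article[1], $paragraphs);

    $body = '';
    foreach ($paragraphs[1] as $paragraph) { // index 1 holds the capture groups
        $body .= trim($paragraph) . ' ';
    }
    file_put_contents($article_date[1] . '-' . $i . '.txt', $body);
}
```

Same output files, but no COUNT_RECURSIVE bookkeeping and no manual index arithmetic.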

Archived

This topic is now archived and is closed to further replies.
