[SOLVED] Looking for Solution / Workaround for Preg_Match with Array as Subject

mr.rum · October 2, 2009

Hello,

After a long break from coding, I needed to work something out, and it did not turn out as it should. In the end it came down to the fact that preg_match() does not accept arrays as subjects.

The idea is to extract the individual articles from one big document and save each one in a separate file named after its date. Each article is enclosed in <ppsarticle></ppsarticle>, consists of multiple <paragraph></paragraph> tags, and has a <date></date> tag whose content should become the filename.

So the idea is simple: look for the paragraph tags, copy all the content between the ppsarticle tags into a file, and continue with the next article.

At first I thought my foreach loops were wrong, so I tried for loops instead. That seemed to work, but the output was Array, Array, Array, etc.

It would be great if someone could advise me on how to solve this problem when an array is given as the subject for preg_match().

<?php
$data = file_get_contents('http://domain.tld/article_data.txt');
$regex_article = '/<ppsarticle\b[^>]*>(.*?)<\/ppsarticle>/s';
preg_match_all($regex_article, $data, $all_articles, PREG_SET_ORDER);
foreach ($all_articles as $individual_article) {
	$regex_date = '/<date\b[^>]*>(.*?)<\/date>/s';
	preg_match($regex_date, $individual_article, $article_date);
	$regex_paragraphs = '/<paragraph\b[^>]*>(.*?)<\/paragraph>/s';
	preg_match_all($regex_paragraphs, $individual_article, $all_article_paragraphs);
	$output_file = fopen($article_date[0], "w");
	foreach ($all_article_paragraphs as $individual_article_paragraphs) {
		fwrite($output_file, $individual_article_paragraphs);
	}
	fclose($output_file);
}
?>

Thanks a lot

:shrug: Mr.Rum
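For reference: preg_match() and preg_match_all() only accept a string as the subject, so when the data is an array the pattern has to be applied to each string element in turn. A minimal sketch, assuming the input is a plain list of article strings; the helper name `extract_dates()` and the sample data are hypothetical. preg_grep() is the built-in exception: it takes an array and returns the entries that match, with their keys preserved.

```php
<?php
// Hypothetical helper: run one pattern against every string in an array.
// preg_match() itself only accepts a string subject, so we loop.
function extract_dates(array $articles)
{
    $dates = [];
    foreach ($articles as $key => $article) {
        // $m[1] holds the first capture group when the pattern matches.
        if (preg_match('/<date\b[^>]*>(.*?)<\/date>/s', $article, $m)) {
            $dates[$key] = $m[1];
        }
    }
    return $dates;
}

$articles = [
    '<date>2009-10-01</date><paragraph>First</paragraph>',
    '<paragraph>No date here</paragraph>',
    '<date>2009-10-02</date><paragraph>Second</paragraph>',
];

// Only the entries that contained a date appear, under their original keys.
print_r(extract_dates($articles));

// preg_grep() accepts an array and keeps only the entries matching the pattern.
print_r(preg_grep('/<date\b/', $articles));
```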

.josh · October 3, 2009

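Your problem is that $all_articles is a multidimensional array and you are only looping through the top level, so each $individual_article is itself an array (the full match plus the capture group), not a string. Do this: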

echo "<pre>"; print_r($all_articles);

That way you can see the hierarchy of the results.
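To illustrate that hierarchy: with PREG_SET_ORDER, preg_match_all() produces one array per match, where index 0 is the full match and index 1 is the first capture group. So the string to search further is `$set[1]`, not `$set` itself. A small sketch with made-up sample data:

```php
<?php
$data = '<ppsarticle><date>2009-10-01</date></ppsarticle>'
      . '<ppsarticle><date>2009-10-02</date></ppsarticle>';

preg_match_all('/<ppsarticle\b[^>]*>(.*?)<\/ppsarticle>/s', $data, $all_articles, PREG_SET_ORDER);

foreach ($all_articles as $set) {
    // $set is an array:
    //   $set[0] is the whole <ppsarticle>...</ppsarticle> match
    //   $set[1] is only the text between the tags (a string, usable as a subject)
    echo $set[1], "\n";
}
```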

Alternatively, since you are using tags to mark up your content, you might want to consider using the DOM or XML functions instead.
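A sketch of the DOM route, assuming the file is well-formed XML apart from lacking a single root element (so the snippet wraps it in a hypothetical `<root>` element; a leading `<?xml ...?>` declaration would break that wrapping) and that each <ppsarticle> has at most one <date> child. The helper name `articles_from_xml()` is made up for illustration; DOMDocument::loadXML(), getElementsByTagName() and textContent are standard parts of PHP's DOM extension.

```php
<?php
// Hypothetical helper: parse articles with the DOM extension instead of regexes.
// Returns a list of ['date' => ..., 'paragraphs' => [...]] arrays.
function articles_from_xml($xml)
{
    $doc = new DOMDocument();
    // Assumption: the source has no single root element, so wrap it in one.
    if (!$doc->loadXML('<root>' . $xml . '</root>')) {
        return [];
    }

    $result = [];
    foreach ($doc->getElementsByTagName('ppsarticle') as $article) {
        // item(0) returns null when the article has no <date> element.
        $date = $article->getElementsByTagName('date')->item(0);
        $paragraphs = [];
        foreach ($article->getElementsByTagName('paragraph') as $p) {
            $paragraphs[] = trim($p->textContent);
        }
        $result[] = [
            'date'       => $date ? trim($date->textContent) : null,
            'paragraphs' => $paragraphs,
        ];
    }
    return $result;
}

$xml = '<ppsarticle><date>2009-10-01</date>'
     . '<paragraph> One </paragraph><paragraph>Two</paragraph></ppsarticle>';
print_r(articles_from_xml($xml));
```

Unlike the regex version, the parser handles nesting and attributes on the tags for you, at the cost of requiring well-formed input.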

mr.rum · October 3, 2009

Thanks for your reply,

I thought I had ruled this out, because I also tried the following code:

$number_articles = (count($all_articles, COUNT_RECURSIVE) - count($all_articles))/2;

for ($i = 0; $i <= $number_articles - 1; $i++) {
	$regex_date = '/<date\b[^>]*>(.*?)<\/date>/';
	preg_match_all($regex_date, $all_articles[$i][1], $article_date);
	echo $article_date;
}

This should go through each of the articles and extract the date. However, it only prints "Array" 8 times for a document that contains 8 articles.
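As an aside on the counting line in that loop: with PREG_SET_ORDER and one capture group, each match set holds exactly two entries, so the COUNT_RECURSIVE arithmetic does work out to the number of articles. But count($all_articles), or the return value of preg_match_all() itself, gives the same number more directly. A quick check with hypothetical sample data:

```php
<?php
$data = str_repeat('<ppsarticle><date>2009-10-01</date></ppsarticle>', 3);

// preg_match_all() returns the number of full pattern matches.
$n = preg_match_all('/<ppsarticle\b[^>]*>(.*?)<\/ppsarticle>/s', $data, $all_articles, PREG_SET_ORDER);

// The recursive-count formula: (3 top-level + 3 * 2 nested - 3) / 2 = 3.
$number_articles = (count($all_articles, COUNT_RECURSIVE) - count($all_articles)) / 2;

echo $n, ' ', count($all_articles), ' ', $number_articles, "\n"; // 3 3 3
```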

mr.rum · October 3, 2009

I do not see the problem in the loop above. When I call is_string(), it says it is a string, yet the loop only prints the word "Array". :wtf:

Help my brain please

cags · October 3, 2009

I'm not sure how or why is_string($article_date) would return true, but the variable that preg_match_all fills in is an array.

echo $article_date[1];
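The shape of the result array depends on which function fills it, which is why echoing it can still print "Array". With preg_match(), `$m[1]` is the first capture group as a string. With preg_match_all() in its default PREG_PATTERN_ORDER mode, `$m[1]` is an array of that group's captures from every match, so the first one is `$m[1][0]`. A small sketch with a made-up subject:

```php
<?php
$subject = '<date>2009-10-01</date> <date>2009-10-02</date>';
$regex_date = '/<date\b[^>]*>(.*?)<\/date>/';

// preg_match(): stops at the first match; $one[1] is a string.
preg_match($regex_date, $subject, $one);
echo $one[1], "\n";                // 2009-10-01

// preg_match_all() (PREG_PATTERN_ORDER by default): $all[1] is an array.
preg_match_all($regex_date, $subject, $all);
echo $all[1][0], "\n";             // 2009-10-01
echo implode(', ', $all[1]), "\n"; // 2009-10-01, 2009-10-02
```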

mr.rum · October 3, 2009

I was also surprised by the result of is_string(). However, it is solved now.

The following code does what it should.

<?php

$data = file_get_contents('http://domain.tld/data.xml');
$regex_article   = '/<ppsarticle\b[^>]*>(.*?)<\/ppsarticle>/s';
$regex_date      = '/<date\b[^>]*>(.*?)<\/date>/';
$regex_paragraph = '/<paragraph\b[^>]*>(.*?)<\/paragraph>/s';

// With PREG_SET_ORDER, each element of $all_articles is an array:
// [0] is the full <ppsarticle>...</ppsarticle> match, [1] is the captured content.
preg_match_all($regex_article, $data, $all_articles, PREG_SET_ORDER);

foreach ($all_articles as $i => $article) {
	// Search the captured article body (a string), not the match array itself.
	preg_match($regex_date, $article[1], $article_date);

	$filename = $article_date[1] . '-' . $i . '.txt';
	$output_file = fopen($filename, 'w');

	// preg_match_all defaults to PREG_PATTERN_ORDER, so [1] holds every captured paragraph body.
	preg_match_all($regex_paragraph, $article[1], $all_article_paragraphs);
	foreach ($all_article_paragraphs[1] as $paragraph) {
		fwrite($output_file, trim($paragraph) . ' ');
	}

	fclose($output_file);
	echo 'Created <a href="http://domain.tld/' . $filename . '">' . $filename . '</a><br>';
}
?>
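A variation on this solution that keeps the parsing separate from the file writing, so it can be tested without touching the filesystem. The helper name `split_articles()` and its return shape (filename mapped to text) are hypothetical; the regexes, the date-index naming scheme and the "trimmed paragraph plus a space" output come from the post. Articles without a <date> are skipped here, which is an assumption: the code in the post would try to read an undefined index in that case.

```php
<?php
// Hypothetical helper: map "date-index.txt" filenames to the joined paragraph text.
function split_articles($data)
{
    $regex_article   = '/<ppsarticle\b[^>]*>(.*?)<\/ppsarticle>/s';
    $regex_date      = '/<date\b[^>]*>(.*?)<\/date>/';
    $regex_paragraph = '/<paragraph\b[^>]*>(.*?)<\/paragraph>/s';

    preg_match_all($regex_article, $data, $all_articles, PREG_SET_ORDER);

    $files = [];
    foreach ($all_articles as $i => $article) {
        // Assumption: skip articles that have no <date> tag.
        if (!preg_match($regex_date, $article[1], $article_date)) {
            continue;
        }
        preg_match_all($regex_paragraph, $article[1], $paragraphs);
        // Same output format as the post: each trimmed paragraph followed by a space.
        $text = '';
        foreach ($paragraphs[1] as $paragraph) {
            $text .= trim($paragraph) . ' ';
        }
        $files[$article_date[1] . '-' . $i . '.txt'] = $text;
    }
    return $files;
}

// Writing the files is then a simple loop, for example:
// foreach (split_articles($data) as $name => $text) { file_put_contents($name, $text); }
```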
