
[SOLVED] Looking for Solution / Workaround for Preg_Match with Array as Subject



Hello,

After a longer break from coding I needed to figure something out, and it did not turn out as it should. In the end it came down to the fact that preg_match does not accept arrays as subjects.

The idea is to extract the individual articles from one big document and save each of them in a separate file named after its date. Each article is delimited by <ppsarticle></ppsarticle>; each article consists of multiple <paragraph></paragraph> tags and has a <date></date> tag whose content should become the filename.

So the idea is simple: look for the paragraph tags, copy all the content between the ppsarticle tags into a file, and continue with the next article.

At first I thought my foreach loops were wrong and tried it with for loops instead. That seemed to work, but the output was Array, Array, Array, etc.

It would be great if someone could give me advice on how to solve this problem when arrays are given as the subject for preg_match.

<?php
$data = file_get_contents('http://domain.tld/article_data.txt');
$regex_article = '/<ppsarticle\b[^>]*>(.*?)<\/ppsarticle>/s';
preg_match_all($regex_article, $data, $all_articles, PREG_SET_ORDER);
foreach ($all_articles as $individual_article) {
	$regex_date = '/<date\b[^>]*>(.*?)<\/date>/s';
	preg_match($regex_date, $individual_article, $article_date);
	$regex_paragraphs = '/<paragraph\b[^>]*>(.*?)<\/paragraph>/s';
	preg_match_all($regex_paragraphs, $individual_article, $all_article_paragraphs);
	$output_file = fopen($article_date[0], "w");
	foreach ($all_article_paragraphs as $individual_article_paragraphs) {
		fwrite($output_file, $individual_article_paragraphs);
	}
	fclose($output_file);
}
?>

Thanks a lot

Mr.Rum

Your problem is that $all_articles is a multidimensional array and you are looping through the top level. Do this:

echo "<pre>"; print_r($all_articles);

so you can see the hierarchy of the results.
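A minimal sketch of what the reply means, run on a small made-up sample string rather than the real data: with PREG_SET_ORDER, each element of $all_articles is itself an array whose index [0] holds the whole <ppsarticle>...</ppsarticle> match and [1] the captured inner content. Only [1] is a string, so that is what preg_match() should receive as its subject.

```php
<?php
// Made-up two-article sample standing in for the real document
$data = '<ppsarticle><date>d1</date><paragraph>A</paragraph></ppsarticle>'
      . '<ppsarticle><date>d2</date><paragraph>B</paragraph></ppsarticle>';

$regex_article = '/<ppsarticle\b[^>]*>(.*?)<\/ppsarticle>/s';
preg_match_all($regex_article, $data, $all_articles, PREG_SET_ORDER);

foreach ($all_articles as $individual_article) {
    // $individual_article is an array: [0] = full match, [1] = inner content
    echo is_array($individual_article) ? 'array' : 'string', ' -> ',
         $individual_article[1], "\n";
}

// Passing the string element works as a subject for preg_match()
preg_match('/<date\b[^>]*>(.*?)<\/date>/s', $all_articles[0][1], $article_date);
echo $article_date[1], "\n"; // d1
```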

Alternatively, since you are using tags to mark your content, you might want to consider using a DOM or XML parser instead.
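Following the DOM suggestion, here is a minimal sketch using PHP's DOMDocument instead of regular expressions. It assumes the real file is well-formed XML with a single root element (the <articles> root below is a made-up wrapper), that every article has a <date>, and that the dom extension is available (it usually is in standard PHP builds). The sample data is invented for illustration.

```php
<?php
// Invented sample; <articles> is a hypothetical root element
$xml = '<articles>'
     . '<ppsarticle><date>d1</date><paragraph> A </paragraph><paragraph>B</paragraph></ppsarticle>'
     . '<ppsarticle><date>d2</date><paragraph>C</paragraph></ppsarticle>'
     . '</articles>';

$doc = new DOMDocument();
$doc->loadXML($xml);

$result = []; // date => paragraph texts joined with spaces
foreach ($doc->getElementsByTagName('ppsarticle') as $article) {
    // Assumes each article contains a <date> element
    $date = $article->getElementsByTagName('date')->item(0)->textContent;

    $texts = [];
    foreach ($article->getElementsByTagName('paragraph') as $paragraph) {
        $texts[] = trim($paragraph->textContent);
    }
    $result[$date] = implode(' ', $texts);
}
print_r($result);
```

Keying the result by date means articles that share a date overwrite each other; appending an index to the name, as the thread's final solution does, avoids that.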

Thanks for your reply,

I thought I had ruled this out, because I also tried the following code:

$number_articles = (count($all_articles, COUNT_RECURSIVE) - count($all_articles)) / 2;

for ($i = 0; $i <= $number_articles - 1; $i++) {
	$regex_date = '/<date\b[^>]*>(.*?)<\/date>/';
	preg_match_all($regex_date, $all_articles[$i][1], $article_date);
	echo $article_date; // $article_date is an array, so echo prints "Array"
}

This should go through each of the articles and extract the date. However, for a document containing 8 articles it only prints "Array" 8 times.
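A small sketch (again on a made-up sample) of why that loop prints "Array": preg_match_all() always fills its third argument with nested arrays, by default in PREG_PATTERN_ORDER, where [0] holds the list of full matches and [1] the list of captured groups, so the date string sits at $article_date[1][0]. It also checks that, with PREG_SET_ORDER and a single capture group, the COUNT_RECURSIVE formula is simply count($all_articles).

```php
<?php
// Made-up two-article sample standing in for the real document
$data = '<ppsarticle><date>d1</date></ppsarticle><ppsarticle><date>d2</date></ppsarticle>';
preg_match_all('/<ppsarticle\b[^>]*>(.*?)<\/ppsarticle>/s', $data, $all_articles, PREG_SET_ORDER);

// Each set has 2 elements ([0] full match, [1] capture), so COUNT_RECURSIVE
// counts n + 2n entries and the formula reduces to (3n - n) / 2 = n
$number_articles = (count($all_articles, COUNT_RECURSIVE) - count($all_articles)) / 2;
echo $number_articles, ' ', count($all_articles), "\n"; // 2 2

// preg_match_all() fills $article_date with nested arrays (PREG_PATTERN_ORDER):
// [0] = list of full matches, [1] = list of captured dates
preg_match_all('/<date\b[^>]*>(.*?)<\/date>/', $all_articles[0][1], $article_date);
echo is_array($article_date[1]) ? 'array' : 'string', "\n"; // array
echo $article_date[1][0], "\n"; // d1

// preg_match() (without _all) puts the first captured date directly at [1]
preg_match('/<date\b[^>]*>(.*?)<\/date>/', $all_articles[1][1], $single_date);
echo $single_date[1], "\n"; // d2
```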

I was also surprised by the return value of is_string(). However, it is solved now.

The following code does what it should:

<?php
// Load the whole source document (URL as in the original post)
$data = file_get_contents('http://domain.tld/data.xml');

$regex_article   = '/<ppsarticle\b[^>]*>(.*?)<\/ppsarticle>/s';
$regex_date      = '/<date\b[^>]*>(.*?)<\/date>/';
$regex_paragraph = '/<paragraph\b[^>]*>(.*?)<\/paragraph>/s';

// PREG_SET_ORDER: one array per article, [0] = full match, [1] = inner content
preg_match_all($regex_article, $data, $all_articles, PREG_SET_ORDER);

foreach ($all_articles as $i => $article)
{
	// Pass the captured string $article[1] as the subject, not the array $article
	preg_match($regex_date, $article[1], $article_date);

	// File name: <date>-<article index>.txt
	$filename = $article_date[1] . "-" . $i . ".txt";
	$output_file = fopen($filename, "w");

	// Default PREG_PATTERN_ORDER: [0] = full matches, [1] = paragraph texts
	preg_match_all($regex_paragraph, $article[1], $all_article_paragraphs);
	foreach ($all_article_paragraphs[1] as $paragraph)
	{
		fwrite($output_file, trim($paragraph) . " ");
	}

	fclose($output_file);
	echo 'Created <a href="http://domain.tld/' . $filename . '">' . $filename . '</a><br>';
}
?>
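For testing, the same extraction logic can be wrapped in a function that returns file names and contents instead of writing files. This is a sketch, not the poster's code: split_articles() is a hypothetical name, articles without a <date> are skipped rather than triggering an undefined-index error, and paragraphs are joined with single spaces instead of each getting a trailing space.

```php
<?php
// Hypothetical helper: returns [filename => text] instead of writing files,
// so the extraction logic can be checked without touching the file system.
function split_articles(string $data): array
{
    $files = [];
    preg_match_all('/<ppsarticle\b[^>]*>(.*?)<\/ppsarticle>/s', $data, $all_articles, PREG_SET_ORDER);

    foreach ($all_articles as $i => $article) {
        $content = $article[1]; // captured inner content (a string)

        // Skip articles without a <date> tag instead of using an undefined index
        if (!preg_match('/<date\b[^>]*>(.*?)<\/date>/s', $content, $article_date)) {
            continue;
        }

        preg_match_all('/<paragraph\b[^>]*>(.*?)<\/paragraph>/s', $content, $all_article_paragraphs);
        $texts = array_map('trim', $all_article_paragraphs[1]);

        // "<date>-<index>.txt" naming, as in the solution above
        $files[trim($article_date[1]) . '-' . $i . '.txt'] = implode(' ', $texts);
    }
    return $files;
}

// Made-up sample: the second article has no <date> and is skipped
$sample = '<ppsarticle><date>d1</date><paragraph> A </paragraph><paragraph>B</paragraph></ppsarticle>'
        . '<ppsarticle><paragraph>no date</paragraph></ppsarticle>'
        . '<ppsarticle><date>d3</date><paragraph>C</paragraph></ppsarticle>';

foreach (split_articles($sample) as $name => $text) {
    echo $name, ': ', $text, "\n";
}
```

To actually create the files, loop over the returned array and call file_put_contents($name, $text) for each entry.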
