mr.rum Posted October 2, 2009

Hello,

After a longer break from coding I needed to figure something out, and it did not turn out as it should. In the end it came down to the fact that preg_match does not accept arrays as subjects.

The idea is to extract individual articles from one big document and save each one in a separate file named by its date. Each article is delimited by <ppsarticle></ppsarticle>; each article consists of multiple <paragraph></paragraph> tags and has a <date></date> tag, which should be the filename. So the idea is simply to look for the paragraph tags, copy all the content between the ppsarticle tags into a file, and continue with the next article.

At first I thought my foreach loops were wrong and tried for loops instead. That seemed to work, but the output was Array, Array, Array, etc.

It would be great if someone could give me advice on how to solve this problem when arrays are given as a subject for preg_match.

<?php
$data = file_get_contents('http://domain.tld/article_data.txt');

$regex_article = '/<ppsarticle\b[^>]*>(.*?)<\/ppsarticle>/s';
preg_match_all($regex_article, $data, $all_articles, PREG_SET_ORDER);

foreach ($all_articles as $individual_article) {
    $regex_date = '/<date\b[^>]*>(.*?)<\/date>/s';
    preg_match($regex_date, $individual_article, $article_date);

    $regex_paragraphs = '/<paragraph\b[^>]*>(.*?)<\/paragraph>/s';
    preg_match_all($regex_paragraphs, $individual_article, $all_article_paragraphs);

    $output_file = fopen($article_date[0], "w");
    foreach ($all_article_paragraphs as $individual_article_paragraphs) {
        fwrite($output_file, $individual_article_paragraphs);
    }
    fclose($output_file);
}
?>

Thanks a lot
Mr.Rum
.josh Posted October 3, 2009

Your problem is that $all_articles is a multi-dimensional array and you are looping through only the top level, so each $individual_article is itself an array. Do this:

echo "<pre>"; print_r($all_articles);

so you can see the hierarchy of results.

Alternatively, since you are using tags to mark up your content, you might want to consider using DOM or XML functions instead.
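To make the point about the nested result concrete, here is a minimal sketch of what PREG_SET_ORDER produces and how the foreach loop could index into it. The sample string in $data and the $dates array are made up for illustration; the tag names and regexes come from the original post.

```php
<?php
// Hypothetical sample input; only the tag names are taken from the original post.
$data = '<ppsarticle><date>2009-10-02</date><paragraph> First. </paragraph>'
      . '<paragraph>Second.</paragraph></ppsarticle>'
      . '<ppsarticle><date>2009-10-03</date><paragraph>Third.</paragraph></ppsarticle>';

$regex_article = '/<ppsarticle\b[^>]*>(.*?)<\/ppsarticle>/s';
preg_match_all($regex_article, $data, $all_articles, PREG_SET_ORDER);

$dates = [];
foreach ($all_articles as $individual_article) {
    // Each $individual_article is an array, not a string:
    //   [0] = the full match including the <ppsarticle> tags
    //   [1] = the first capture group, i.e. the inner content
    $inner = $individual_article[1];

    // Pass the string element, not the whole array, as the subject.
    if (preg_match('/<date\b[^>]*>(.*?)<\/date>/s', $inner, $article_date)) {
        $dates[] = $article_date[1];
    }
}

print_r($dates);
```

The only change from the loop in the question is that the subject passed to preg_match is $individual_article[1] rather than $individual_article.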
mr.rum (Author) Posted October 3, 2009

Thanks for your reply. I thought I had ruled this out, because I also tried the following code:

$number_articles = (count($all_articles, COUNT_RECURSIVE) - count($all_articles))/2;
for ($i = 0; $i <= $number_articles - 1; $i++) {
    $regex_date = '/<date\b[^>]*>(.*?)<\/date>/';
    preg_match_all($regex_date, $all_articles[$i][1], $article_date);
    echo $article_date;
}

It should go through each of the articles and extract the date. However, it only prints "Array" 8 times for a document that contains 8 articles.
mr.rum (Author) Posted October 3, 2009

I do not see the problem in the loop above. When I call is_string() it says the value is a string, yet the loop only prints the word "Array". Help my brain, please.
cags Posted October 3, 2009

I'm not sure how or why is_string($article_date) would return true, but the matches variable filled in by preg_match_all is an array, so echoing it prints "Array". Echo an element of it instead:

echo $article_date[1];

Note that with preg_match_all, $article_date[1] is itself an array of captures, so the first date is $article_date[1][0]. With preg_match, $article_date[1] is the captured string.
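As a rough illustration of why echo prints "Array" here, the sketch below compares what preg_match and preg_match_all put into their matches argument. The $article string and the variable names $single, $all and $count are invented for the example.

```php
<?php
// Hypothetical article content; the regex is the one used in the thread.
$article = '<date>2009-10-02</date><paragraph>Text</paragraph>';
$regex_date = '/<date\b[^>]*>(.*?)<\/date>/';

// preg_match fills a flat array: [0] = full match, [1] = first capture group.
preg_match($regex_date, $article, $single);
echo $single[1], "\n";

// preg_match_all returns the number of matches and, in its default
// PREG_PATTERN_ORDER mode, fills an array of arrays:
// $all[0] = list of full matches, $all[1] = list of first-group captures.
$count = preg_match_all($regex_date, $article, $all);
echo $all[1][0], "\n";
```

Both echo statements output the date string, because each one addresses a string element rather than an array.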
mr.rum (Author) Posted October 3, 2009

I was also surprised by the return value of is_string(). However, it is solved now. The following code does what it should.

<?php
$data = file_get_contents('http://domain.tld/data.xml');

// Collect every article; with PREG_SET_ORDER each entry is array(full match, inner content).
$regex_article = '/<ppsarticle\b[^>]*>(.*?)<\/ppsarticle>/s';
preg_match_all($regex_article, $data, $all_articles, PREG_SET_ORDER);
$number_articles = (count($all_articles, COUNT_RECURSIVE) - count($all_articles))/2;

for ($i = 0; $i <= $number_articles - 1; $i++) {
    // The date becomes part of the filename.
    $regex_date = '/<date\b[^>]*>(.*?)<\/date>/';
    preg_match($regex_date, $all_articles[$i][1], $article_date);
    # echo $article_date[1]."-".$i;
    $output_file = fopen($article_date[1]."-".$i.".txt", "w");
    # echo $article_date[1]."-".$i.".txt";

    // Write each trimmed paragraph followed by a space.
    $regex_paragraphs = '/<paragraph\b[^>]*>(.*?)<\/paragraph>/s';
    preg_match_all($regex_paragraphs, $all_articles[$i][1], $all_article_paragraphs);
    $number_paragraphs = (count($all_article_paragraphs, COUNT_RECURSIVE) - count($all_article_paragraphs))/2;
    for ($j = 0; $j <= $number_paragraphs - 1; $j++) {
        fwrite($output_file, trim($all_article_paragraphs[1][$j])." ");
    }
    fclose($output_file);

    echo 'Created <a href="http://domain.tld/'.$article_date[1].'-'.$i.'.txt">'.$article_date[1].'-'.$i.'.txt'.'</a><br>';
}
?>
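For comparison, the same logic can probably be written with foreach and without the COUNT_RECURSIVE arithmetic. This is only a sketch under assumptions: the helper name split_articles and the $sample string are hypothetical, and the function returns a filename-to-text map instead of writing files so that it can be tried without touching the filesystem.

```php
<?php
/**
 * Hypothetical helper: maps each output filename to the joined paragraph
 * text of one article. It does not write anything to disk.
 */
function split_articles($data)
{
    $files = [];
    preg_match_all('/<ppsarticle\b[^>]*>(.*?)<\/ppsarticle>/s', $data, $all_articles, PREG_SET_ORDER);

    foreach ($all_articles as $i => $article) {
        $body = $article[1]; // inner content of this article

        // Skip articles that have no <date> tag instead of using an undefined index.
        if (!preg_match('/<date\b[^>]*>(.*?)<\/date>/', $body, $article_date)) {
            continue;
        }

        // Default PREG_PATTERN_ORDER: $paragraphs[1] is the list of captured texts.
        preg_match_all('/<paragraph\b[^>]*>(.*?)<\/paragraph>/s', $body, $paragraphs);

        $text = '';
        foreach ($paragraphs[1] as $paragraph) {
            $text .= trim($paragraph) . ' ';
        }

        $files[$article_date[1] . '-' . $i . '.txt'] = $text;
    }

    return $files;
}

// Made-up sample document with a single article.
$sample = '<ppsarticle><date>2009-10-02</date>'
        . '<paragraph> One </paragraph><paragraph>Two</paragraph></ppsarticle>';

foreach (split_articles($sample) as $name => $text) {
    // Swap this echo for file_put_contents($name, $text) to create the files.
    echo $name, ': ', $text, "\n";
}
```

Because the date comes from remote content and ends up in a filename, it may be worth validating it before writing files.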