
[SOLVED] preg's parsed string limit


TruthSeeker


OK, here's the status:

 

Introduction

 

I've built a framework in PHP that is still under development...

Anyway, it's now running on a few websites and I've just found its first limitation:

The framework separates HTML from PHP code 100% and is mainly XML/PHP driven; the XML is a simple skeleton that organizes HTML fragments, which are also loaded and parsed by the framework. As you can see, it's a bit complex to explain.

So here's a small sample:

 

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE waveml PUBLIC "-//WAVER//DTD WAVER 1.0//EN" "http://www.gimmestock.com/Waver/lib/dtd/1_0.dtd">
<waveml error_page="/application_error.html">

<!-- header -->
<include file="src:index.xml" part="header"/>

<!--actions-->
<run file="src:site_source/php/search.php" />

<!--body-->
<?php $varBox['page_title']='Royalty Free Image : '.addslashes($_GET['search']['str']).' : Gimmestock.com';?>
	<process html="src:site_source/xhtml/search.html" part="mainContent" key="mainContent">
		<?php
			$_SESSION['comeback']=$Waver->Waver_Commons->thisUrl();
		?>
		<?if $varBox['search_total_results']>0;?>
		<run file="src:site_source/php/search_engine.php" />
		<process html="src:site_source/xhtml/search.html" part="results_layout" key="results_layout"/>
		<?else?>
		<?php
			$varBox['search_total_results']='"0"';
			$Waver->Waver_Notices->addInfo("No images were found matching your search.", true);
			$varBox['results_layout']='noShow';
		?>
		<?endif?>
		<include file="src:index.xml" part="searchBox"/>
		<include file="src:index.xml" part="user_console"/>
	</process>

<!-- finish -->
<include file="src:index.xml" part="finish"/>
</waveml>

 

The example above is taken from the original source code of the http://www.gimmestock.com search page.

"site_source/php/search.php" contains the PHP that builds the MySQL query strings later used by "site_source/php/search_engine.php", which takes care of listing results, pagination, etc.

All HTML files are static and loaded into the application as strings; every PHP file is loaded the same way and later run through eval.

As it stands, the framework has full control, including error control, over all code built on it. It can load and run separate parts of its XML, PHP and HTML files instead of whole files as standard PHP does, while keeping standard PHP's full power.

It corrects HTML paths relative to the execution location. This means my static source website, my PHP and my XML code can live in whatever folder I wish, and everything will run and display correctly wherever my main XML execution file is.

 

I can now ask my designer to make whatever new layout he wants for the application; it's nearly just a matter of copying the HTML/images/CSS files to the site_source/html folder as a static website, and the application works on it like magic. As I said, 100% code/HTML separation.

 

As an example, the latest full layout change of the application above took about 4 hours to implement.

 

We're thinking of releasing it as a beta, maybe in a few months.

 

Problem:

 

Anyway, to the point: the whole application breaks down when a string of more than roughly 75,000 chars is parsed by some of the core regular expressions (preg_replace_callback, preg_match_all, etc.).

When this happens, simply nothing is returned. The preg functions just blow everything up when asked to parse such strings.

Apparently there is a limit on the string length the preg functions can parse.

 

Any ideas for an efficient resolution or workaround?

 

PS: most of these core regexes do different and very complex things, so it would be hard to chop the string into smaller bits, as there is a risk of something important not being parsed...

Link to comment
Share on other sites

I'm not aware of any limits myself, other than those of the OS. I assume "normal" string functions (substr, strpos, etc.) work OK on these large strings? Maybe there's something in the patterns that could be optimized. Do you see any errors when error_reporting is E_ALL? What about the web server's error log?
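One more thing worth checking alongside the logs, offered as a guess rather than something established in this thread: the preg functions report internal PCRE failures (for example hitting the `pcre.backtrack_limit` ini setting) by returning NULL, and they do that without raising a PHP warning, so E_ALL can stay silent. `preg_last_error()` gives the reason. A minimal sketch of a checking wrapper; the name `checkedPregReplace` is made up for illustration and is not part of the framework:

```php
<?php
// Hypothetical helper (an assumption, not code from the framework):
// wraps preg_replace() and turns a silent NULL result into an exception.
function checkedPregReplace(string $pattern, string $replacement, string $subject): string
{
    $result = preg_replace($pattern, $replacement, $subject);
    if ($result === null) {
        // preg_last_error() returns one of the PREG_*_ERROR constants,
        // e.g. PREG_BACKTRACK_LIMIT_ERROR or PREG_BAD_UTF8_ERROR.
        throw new RuntimeException('preg_replace failed, error code ' . preg_last_error());
    }
    return $result;
}
```

Comparing the reported code against `PREG_BACKTRACK_LIMIT_ERROR` would show whether a PCRE limit, rather than a string length limit, is what is being hit.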

Link to comment
Share on other sites

Framework in debug mode:

E_ALL shows nothing.

<!-- Final string lenght: 0 -->

 

This only happens when the total HTML string is supposed to be around 80,000 chars.

The same page works fine if less information is loaded (up to around 75,000 chars).

 

The HTML before loading the listed information is around 10,000 chars.

The HTML of the listed information is about 72,000 chars.

 

The info HTML is inserted into the base HTML through a str_replace of its destination key (keys look like <?/*key*/?>).

So no problem here, I think.
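The key substitution described above could be sketched roughly like this. The function name `fillKey` and the sample template are assumptions for illustration, not the framework's actual code:

```php
<?php
// Hypothetical sketch: replace a "<?/*key*/?>" placeholder with an HTML fragment.
// Plain str_replace() does no regex work, so PCRE limits do not apply here.
function fillKey(string $template, string $key, string $html): string
{
    return str_replace('<?/*' . $key . '*/?>', $html, $template);
}

$base = '<div id="main"><?/*mainContent*/?></div>';
$page = fillKey($base, 'mainContent', '<p>Search results</p>');
```

Because this step is a literal string replacement, its cost grows linearly with the string length, which fits the guess that the trouble starts only in the later regex passes.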

 

The problem should be when the keys are reloaded after this process.

Here are some examples (random lines of code):

$string=preg_replace('#<\?/\*BEGIN (.*?)\*/\?>(.*?)<\?/\*END \\1\*/\?>#', '<?/*\\1*/?>', $string);

$string=preg_replace('#<\?/\*(.*?)\*/\?>#', '', $string);


preg_match_all("|<\?/\*(.*)\*/\?>|U", $this->fullString, $keys, PREG_PATTERN_ORDER);

preg_match_all("|<\?/\*BEGIN (.*?)\*/\?>(.*?)<\?/\*END \\1\*/\?>|U", $tpl, $keys, PREG_PATTERN_ORDER);

preg_replace('#<\?/\*BEGIN (.*?)\*/\?>(.*?)<\?/\*END \\1\*/\?>#', "\n" . '$this->'.$varName.'[\'\\1\'] = \'\\2\';', $value);
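A side note on the two preg_match_all examples, which may or may not matter for the framework: the U modifier inverts greediness, so under U a plain `.*` is lazy while `.*?` becomes greedy. The small sketch below uses a made-up subject string to show the difference:

```php
<?php
// With the U (PCRE_UNGREEDY) modifier, ".*" is lazy and ".*?" is greedy.
$subject = '<?/*one*/?> text <?/*two*/?>';

preg_match('|<\?/\*(.*)\*/\?>|U', $subject, $lazy);    // ".*" under U: lazy
preg_match('|<\?/\*(.*?)\*/\?>|U', $subject, $greedy); // ".*?" under U: greedy

echo $lazy[1], "\n";
echo $greedy[1], "\n";
```

Here the first capture is just `one`, while the second runs through to the last key in the string. If the BEGIN/END pattern combined with U is meant to be lazy, dropping either the `?` or the `U` would be the thing to try.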


Or in a major preg_replace_callback that corrects all HTML/CSS paths relative to the execution location:

$string=preg_replace_callback(
'#(<(?![?/!]|[aA] ))(.*?[^?])(>)#', 		//gets every <...> except <a...> and </...>
 create_function(  //takes care of <***>, transforms paths and prints it back on string
           '&$matches',
           '$inside=preg_replace_callback(     
	        \'#(src|background|bgimage|href|style)=("|\\\')(.*?)\\2#\',
		    create_function(
           			\'$matches2\',
				\'
				  $start=&$matches2[1];
				  
				  if($matches2[1]=="style"){ 
					  preg_match_all("#^(.*?)(url\()(.*?)(\).*?)$#", $matches2[3], $regs);
					  if (@$regs[2][0]!="url("){
						  return $matches2[0];
					  }
					  $start.="=\"".$regs[1][0].$regs[2][0];
					  $url=&$regs[3][0];
					  $end=&$regs[4][0];
				  } else {
				  	  $start.="=\"";
				  	  $url=&$matches2[3];
				  	  $end="";
				  }
				  $end.="\"";
				  if (!preg_match("#^\/|((?i:http|ftp|gopher|file|wais|javascript):\/\/)#", $url)){
					  $url=&$GLOBALS["Waver"]->Waver_Commons->urlCombine($url, "'.$path.'");
				  }
				  return $start.$url.$end;
				\'),
				$matches[2]);
            return $matches[1].$inside.$matches[3];'
		),
		$string);

 

As you can see, it would be hard and resource-consuming to do this with string functions alone (substr, strpos, etc.).
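For readers on newer PHP versions: create_function() was deprecated in PHP 7.2 and removed in PHP 8.0, so the nested callbacks above would need closures today. The following is a simplified sketch of the same idea, not a drop-in replacement. It leaves out the style/url() handling, and `urlCombine` here is a hypothetical stand-in for the framework's Waver_Commons->urlCombine(), whose real behaviour is not shown in this thread:

```php
<?php
// Hypothetical stand-in for the framework's urlCombine(); an assumption.
function urlCombine(string $url, string $path): string
{
    return rtrim($path, '/') . '/' . $url;
}

// Rewrites relative URLs in src/background/href attributes of every tag
// except <a ...>, closing tags, comments and PHP-style tags.
// Returns null if PCRE reports an error in the outer pass.
function fixPaths(string $html, string $path): ?string
{
    return preg_replace_callback(
        '#<(?![?/!]|a\s)[^>]*>#i',
        function (array $tag) use ($path): string {
            $fixed = preg_replace_callback(
                '#\b(src|background|href)=(["\'])(.*?)\2#i',
                function (array $attr) use ($path): string {
                    $url = $attr[3];
                    // Leave root-relative and absolute URLs untouched.
                    if (!preg_match('#^(?:/|(?:https?|ftp|file|javascript):)#i', $url)) {
                        $url = urlCombine($url, $path);
                    }
                    return $attr[1] . '=' . $attr[2] . $url . $attr[2];
                },
                $tag[0]
            );
            // Fall back to the original tag if the inner pass fails.
            return $fixed ?? $tag[0];
        },
        $html
    );
}
```

Using `[^>]*` for the tag body instead of a lazy `.*?` keeps the outer pattern from stepping through the string one character at a time, which should also be gentler on PCRE's backtracking limit.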

It always worked fine until a couple of days ago, when the amount of information grew. It could be some kind of processing memory limit.

 

Nothing in PHP's error_log or Apache's error_log either.

 

This is really getting me stuck. Errrr...

Link to comment
Share on other sites

What if you echo the string after each preg_replace_*? Perhaps some are working and others are not. Is there anything in the data that would cause your expressions to return a 0 length string, i.e., they're working correctly?

Link to comment
Share on other sites

OK, I've put lots of strlen calls throughout the app to check exactly where the string disappears.

You're right, it actually only happens once, after the JS replacement step, which adds the JS functionality at the very end of the framework's processing.

I've purposely increased the size up to 150,000 chars and it still only happens here, never on the other preg_replace / preg_match calls, etc.

 

For those interested, here's where it blew up:

 

I probably copied it from somewhere else in the framework source code and it ended up like this:

$string  = str_replace("\n", '&-:newLine:-;', $string);
$string=preg_replace('#^(.*?<head>.*?)(<\/head>.*?)$#i', 
'\\1'.$this->initializations.'&-:newLine:-;<script type="text/javascript" language="javascript">'.$this->snippets.'&-:newLine:-;</script>&-:newLine:-;\\2', 
$string);
$string = str_replace('&-:newLine:-;', "\n", $string);

This one can actually work much better with str functions.

 

I know, I know... nearly a Daily WTF post lol

Well, now it looks like this:

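$refPos = stripos($string, "</head>");
if ($refPos !== false) { // leave $string untouched when there is no </head>
    $after  = substr($string, $refPos);    // "</head>" and everything after it
    $string = substr($string, 0, $refPos); // everything before "</head>"
    // double quotes, so that "\n" is a real newline and not a literal backslash-n
    $string .= "\n".$this->initializations."\n".$this->snippets."\n".$after;
}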
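The same string-function approach can be wrapped into a small reusable function. This is only a sketch: the name `insertBeforeHeadClose` is invented here, and it assumes the text should go right before the first `</head>`, matched case-insensitively:

```php
<?php
// Hypothetical helper: inserts $insert on its own line right before the
// first </head> (case-insensitive). Returns $html unchanged if none is found.
function insertBeforeHeadClose(string $html, string $insert): string
{
    $pos = stripos($html, '</head>');
    if ($pos === false) {
        return $html;
    }
    return substr($html, 0, $pos) . "\n" . $insert . "\n" . substr($html, $pos);
}
```

Since stripos and substr never backtrack, this runs in a single pass however large the page gets.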

 

Done! It works!! \o/

Though I can't seem to find a reason for it not working before, now I can have a good night's sleep again muahahah lol
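A likely reason, though nobody in this thread confirmed it: after the newlines are swapped for a placeholder, the whole page is one long line, and the lazy `.*?` in front of `<head>` advances one character at a time, each step counting against PHP's `pcre.backtrack_limit`. As far as I recall, older PHP 5 releases shipped with a default of 100000, later raised to 1000000, which would sit in the same range as the sizes reported here. The sketch below reproduces the effect with a deliberately low limit; the function name and the filler subject are made up for the demonstration:

```php
<?php
// Hypothetical demo: runs the thread's <head> pattern against a long
// single-line string under a given pcre.backtrack_limit and returns
// the resulting preg_last_error() code.
function headPatternError(int $length, string $limit): int
{
    // Assumption: switch the PCRE JIT off so the limit is counted by the
    // interpreter in a predictable way.
    ini_set('pcre.jit', '0');
    $old = ini_set('pcre.backtrack_limit', $limit);

    $subject = str_repeat('x', $length) . '<head></head>';
    preg_replace('#^(.*?<head>.*?)(<\/head>.*?)$#i', '\\1<!-- js -->\\2', $subject);
    $error = preg_last_error();

    if ($old !== false) {
        ini_set('pcre.backtrack_limit', $old); // restore the previous limit
    }
    return $error;
}
```

Calling it with a long subject and a small limit should return `PREG_BACKTRACK_LIMIT_ERROR`, while a generous limit should return `PREG_NO_ERROR`. Raising `pcre.backtrack_limit` would be the other way out, but the string-function version avoids the problem altogether.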

 

Tanx a lot. ;)

 

 

Link to comment
Share on other sites
