
get_headers() returns empty when called from another php file :(


Seda145


Hi everyone!

I've been working on a PHP script that replaces links containing a query with direct links to the files they redirect to.
The script has to work through a large XML file, which might take a few minutes.
Because of the XML's size, the script never finishes: the PHP execution time limit stops it first, and I can't set the limit to a higher value.

I rewrote my script so that it splits the XML file into multiple "chunk" files that are processed later.
For each chunk, I call a second PHP script from within the first, so that the chunks can be processed at the same time.
For example, for 20 XML chunks, the second PHP script is launched 20 times, once per chunk.
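
A note on running chunks "at the same time": the backtick operator behaves like shell_exec() and waits for the command to finish, so workers started that way run one after another. The sketch below shows one way to start several workers before collecting their output, using proc_open(). The function name `launchWorkers` is hypothetical, and it assumes proc_open() is not disabled on the host.

```php
<?php
// Hypothetical helper: start every command first, then collect the output of each.
function launchWorkers(array $commands): array
{
    $running = [];
    foreach ($commands as $key => $command) {
        // Only stdout is piped; stderr is inherited unless the command adds "2>&1".
        $process = proc_open($command, [1 => ['pipe', 'w']], $pipes);
        if (is_resource($process)) {
            $running[$key] = [$process, $pipes[1]];
        }
    }

    $outputs = [];
    foreach ($running as $key => [$process, $stdout]) {
        // Blocks until this worker closes stdout; the other workers keep running meanwhile.
        $outputs[$key] = stream_get_contents($stdout);
        fclose($stdout);
        proc_close($process);
    }
    return $outputs;
}
```

For the scripts in this thread, the commands would look like `php export_filter_chunk_processor.php 0 0 2>&1`, one entry per chunk.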

To convert the queries, the second script calls get_headers($url, 1) on each link in the XML chunk, which should then return the direct link as $headers['Location'].
BUT the headers always come back empty.
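
When reading the result of get_headers($url, 1), three details are easy to miss: the function returns false on failure, header names keep whatever casing the server sent, and a chain of redirects yields an array of Location values. A minimal sketch that handles all three follows. The name `extractLocation` is hypothetical, and picking the last value is an assumption about which target is wanted.

```php
<?php
// Hypothetical helper: pick the final redirect target from a get_headers($url, 1) result.
function extractLocation($headers)
{
    if (!is_array($headers)) {
        return null; // get_headers() returns false when the request fails
    }
    foreach ($headers as $name => $value) {
        // Numeric keys hold status lines; header names are compared case-insensitively.
        if (is_string($name) && strcasecmp($name, 'Location') === 0) {
            if (is_array($value)) {
                $value = end($value); // several redirects: take the last one
            }
            return ($value === '' || $value === false) ? null : $value;
        }
    }
    return null;
}
```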

For some reason I could get the headers when I called get_headers() from the first PHP script, without calling the second script that processes the chunks...
I spent a lot of time trying to figure out what is going on. I've included the files I'm using.
Can someone tell me what I'm doing wrong?
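
One thing worth comparing is the environment of the two scripts: the first runs under the web server, while a script started with `php ...` from the shell runs under the CLI, which can load a different php.ini. The sketch below prints the settings that URL access typically depends on, so the output of both contexts can be compared. That a configuration difference is the cause here is only an assumption, and `describeHttpEnvironment` is a hypothetical name.

```php
<?php
// Hypothetical diagnostic: report settings that URL access via get_headers() depends on.
function describeHttpEnvironment(): array
{
    return [
        'sapi' => PHP_SAPI,                     // e.g. "cli" or "fpm-fcgi"
        'ini_file' => php_ini_loaded_file(),    // false when no php.ini was loaded
        'allow_url_fopen' => filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN),
        'openssl_loaded' => extension_loaded('openssl'), // needed for https:// URLs
        'default_socket_timeout' => (int) ini_get('default_socket_timeout'),
        'cwd' => getcwd(),                      // relative paths resolve against this
    ];
}

print_r(describeHttpEnvironment());
```

Running this once through the browser and once through the same `php` command the export filter uses shows whether the two contexts differ.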

In this case I have to rewrite the XML with this script. The situation is not optimal: I'm running the chunk script because we're stuck with the execution time limit too.

Export filter: This loops over an XML file and splits it into chunks.

<?php
//ini_set('max_execution_time', 10);

// ---- includes
if ( ! defined('ABSPATH') ) {
    require_once( dirname( __FILE__ ) . '/wp-load.php' );
}


// ---- end includes
// console
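function vwaconsole($input) {
	$disabled = false;
	
	if ($disabled === false) {
		// print the message on the page
		echo $input.'<br>';
		// json_encode() escapes quotes and newlines so the message cannot break the JS string
		echo "<script>console.log(" . json_encode('--log--: ' . $input) . ");</script>";
	}
}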
//end console

// settings //
	$chunk_size = 20;
//


$home = constant( 'ABSPATH' );
$xml_path = $home."/wp-content/uploads/wpallimport/files/Bastiaansen.xml";
// is_readable() checks the file without leaving an open file handle behind
if (!is_readable($xml_path)) {
	vwaconsole("xml file does not exist or is not readable");
	exit();
}

$xml = new DOMDocument();
$xml->formatOutput = true;
$xml->preserveWhiteSpace = false;	
$xml->load($xml_path);

// Main program
// creates another file to be filled by sub scripts
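// an object is never empty(); documentElement is null when the XML failed to load
if ($xml->documentElement !== null) {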
	vwaconsole("Running main program");
	// backup xml first
	$date = "_".date("Y M D h i");
	$date = str_replace(' ', '_', $date);	
	

	$xpath = new DOMXpath($xml);
	$items = $xpath->query("//aanbiedingen//item"); 
	$loopcount = 0;
	$processedamount = 0;
	$islast = 0;
		
	vwaconsole('total items: '.$items->length);
	vwaconsole('chunk size: '.$chunk_size);
	echo'<br>';
	
	
	/* chunks processing: */ 
	$chunkxml = new DOMDocument();
	$chunkxml->formatOutput = true;
	$chunkxml->preserveWhiteSpace = false;		
	$counter = 0;
	$chunkroot = null;
	
	foreach($items as $item) {
		//vwaconsole('processing item');
		if ($chunkroot === null) {
			$chunkroot = $chunkxml->createElement('root');	
			$chunkxml->appendChild($chunkroot);
			//vwaconsole('created root');
		}
		
		if (($processedamount + $chunk_size) > $items->length) {
			//vwaconsole("last chunk in progress...");
			$islast = 1;				
		}	

		$chunkitem = $chunkxml->createElement($item->nodeName);
		$chunkroot->appendChild($chunkitem);
		//vwaconsole('appended child item to root');

		foreach($item->childNodes as $spec) {
			//vwaconsole('processing specs in item');

			$chunkspec = $chunkxml->createElement($spec->nodeName);
			$chunkitem->appendChild($chunkspec);

			$chunkspectext = $chunkxml->createTextNode($spec->nodeValue);
			$chunkspec->appendChild($chunkspectext);			
		}
		
		$counter++;
		$processedamount++;	
		
		if ($counter >= $chunk_size) {
			$chunkxml->save("wp-content/uploads/wpallimport/files/chunks_bast/bast_chunk_".$loopcount.".xml");
			vwaconsole("saved array chunk");
			
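			// backticks work like shell_exec(): the call blocks until the processor has finished,
			// so chunks run one after another; 2>&1 also captures error output of the CLI script
			$output = `php export_filter_chunk_processor.php $loopcount $islast 2>&1`;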
			vwaconsole($output);
			
			vwaconsole("creating new array chunk");
			$chunkxml = new DOMDocument();
			$chunkxml->formatOutput = true;
			$chunkxml->preserveWhiteSpace = false;	
		
			$loopcount++;	
			$counter = 0;
			$chunkroot = null;
		}	
			
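		// save and process the final partial chunk; $counter is 0 when the last full chunk was just flushed
		if ($items->length === $processedamount && $counter > 0) {
			$chunkxml->save("wp-content/uploads/wpallimport/files/chunks_bast/bast_chunk_".$loopcount.".xml");
			vwaconsole("finished saving last chunk");
			
			$output = `php export_filter_chunk_processor.php $loopcount 1 2>&1`;
			vwaconsole($output);
		}	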
		
	}	
	

	
	// merge  documents later..
	/*
	$newxml = new DOMDocument("1.0", "utf-8");
	$newxml->formatOutput = true;
	$newxml->preserveWhiteSpace = false;	
	$itemContainer = $newxml->createElement('aanbiedingen');
	
	
	$newxml->appendChild($itemContainer);
	//$newxml->save("wp-content/uploads/wpallimport/files/TEMP_Bastiaansen.xml");	
	*/
	
	vwaconsole("main ending");
	exit();
	
} else {
	//vwaconsole("xml is empty ?! exiting");
	exit();
}	


?>
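
The export filter rebuilds every item by hand with createElement() and createTextNode(), which drops attributes and nested elements. As an alternative sketch, DOMDocument::importNode() with its second argument set to true deep-copies a whole element into another document. The function name `splitItemsIntoChunks` and the `root` wrapper are illustrative assumptions, not the code from the post.

```php
<?php
// Hypothetical helper: split the elements matched by $query into chunk documents.
function splitItemsIntoChunks(DOMDocument $source, string $query, int $chunkSize): array
{
    $xpath = new DOMXPath($source);
    // Copy the node list into a plain array so array_chunk() can group it.
    $items = iterator_to_array($xpath->query($query));

    $chunks = [];
    foreach (array_chunk($items, $chunkSize) as $group) {
        $chunk = new DOMDocument('1.0', 'utf-8');
        $chunk->formatOutput = true;
        $root = $chunk->appendChild($chunk->createElement('root'));
        foreach ($group as $item) {
            // importNode(..., true) deep-copies the element, its children and attributes.
            $root->appendChild($chunk->importNode($item, true));
        }
        $chunks[] = $chunk;
    }
    return $chunks;
}
```

Each returned document can then be written with save(), for example to bast_chunk_0.xml, bast_chunk_1.xml and so on.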

 

Export chunk processor: The XML was split by the previous script. This one takes one of the chunks and calls get_headers() on each link containing a query. Multiple instances of this script run at the same time. The queries are then picked up by the last PHP file.

<?php

//ini_set('max_execution_time', 10);

// console
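function vwaconsole($input) {
	$disabled = false;
	
	if ($disabled === false) {
		// print the message on the page
		echo $input.'<br>';
		// json_encode() escapes quotes and newlines so the message cannot break the JS string
		echo "<script>console.log(" . json_encode('--log--: ' . $input) . ");</script>";
	}
}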
//end console


//echo'<br>';
print_r("called chunk processor > Chunk processor started. ");
//echo'<br>';

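// isset() avoids "undefined offset" notices when an argument is missing
$loopcount = isset($argv[1]) ? $argv[1] : null;
$islast = isset($argv[2]) ? $argv[2] : null;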

if ($loopcount === null || $islast === null) {
	print_r("CHUNK PROCESSOR ERROR > loop count is empty");
	exit();
} else {
	print_r("CHUNK PROCESSOR variables set. loop count: ".$loopcount." is last: ".$islast."<br>");
}	

if ( ! defined('BAST_ROOT_DIR') ) {
    define('BAST_ROOT_DIR', __DIR__);
}
$home = constant( 'BAST_ROOT_DIR' );

$xml_path = $home."/wp-content/uploads/wpallimport/files/chunks_bast/bast_chunk_".$loopcount.".xml";
// is_readable() checks the file without leaving an open file handle behind
if (!is_readable($xml_path)) {
	print_r("chunk file was not found at path: ".$xml_path);
	exit();
}


$xmlChunk = new DOMDocument();
//$xmlChunk = new DOMDocument();
$xmlChunk->formatOutput = true;
$xmlChunk->preserveWhiteSpace = false;	
$xmlChunk->load($xml_path);


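// an object is never empty(); documentElement is null when the chunk failed to load
if ($xmlChunk->documentElement === null) {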
	print_r("chunk data is empty, exiting");
	exit();
} else {
	print_r("got chunk data <br>");
}	

$xpath = new DOMXpath($xmlChunk);
$item = $xpath->query("//root//item"); 

//$items = $xmlChunk->getElementsByTagName('//root//item');
foreach ($item as $node) {
	echo'new item: <br>';
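	// childNodes is a live list; iterate over a snapshot because nodes are removed and appended below
	foreach (iterator_to_array($node->childNodes) as $spec) {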
		//vwaconsole("processing spec: ".$spec->nodeName);
		if (($spec->nodeName == 'itemlink')) {					
			$memValue = $spec->nodeValue;		
			
			//setting new url spec node; a text node keeps "&" in query strings intact
			$spec->nodeValue = '';
			$spec->appendChild($xmlChunk->createTextNode(processItemLink($memValue)));

		// add new itemfoto nodes to xml 
		} elseif(($spec->nodeName == 'itemfoto1')) {
			$memValue = $spec->nodeValue;	
			$imgurls = array();
			
			$first = true;
			for ($i = 1; $i < 50; $i++) {
				if ($first === false) {
					$memValue = str_replace('bast_get_itemfoto_id='.strval($i-1), 'bast_get_itemfoto_id='.strval($i), $memValue);
				}	

				
				vwaconsole("trying to get a itemfoto with id: ".strval($i)." link: " .$memValue);
				$triedFotoUrl = processItemLink($memValue);
				vwaconsole("tried url: ".$triedFotoUrl);
				
				if($triedFotoUrl !== $memValue) {
					$imgurls[] = $triedFotoUrl;
				} else {
					vwaconsole("tried url and processed url are equal. end of loop. No new images left or error happened.");
					break;
				}	
				$first = false;	
			}	
			
			//remove the old single itemfoto from xml
			$spec->parentNode->removeChild($spec);
			
			// add sorted nodes from image array
			if (!empty($imgurls)) {	
				sort($imgurls, SORT_NATURAL);
				
				print_r('<br> after sort;');
				print_r($imgurls);
				print_r('<br>');
				

				$arlength = count($imgurls);
				for($x = 0; $x < $arlength; $x++) {
				
					// $xml does not exist in this script; the nodes must be created by $xmlChunk
					$newItemFoto = $xmlChunk->createElement('itemfoto'.strval($x+1));
					$newItemFotoText = $xmlChunk->createTextNode($imgurls[$x]);
					$newItemFoto->appendChild($newItemFotoText);
					$node->appendChild($newItemFoto);
				}
				
				unset($imgurls);
			}			
		
		} 

	}
	echo'<br>';	
}	

//returns string url
//follow the query url from xml, return a direct link or returns input on failure. query is processed in Bastiaansen.php.

function processItemLink($url) {
	if (!empty($url)) {
		
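		$headers = get_headers($url, 1);
		if ($headers === false) {
			// get_headers() returns false when the request itself fails
			vwaconsole("get_headers() failed for: ".$url);
		} elseif (!empty($headers['Location'])) {
			// when several redirects are followed, 'Location' is an array; use the last value
			$location = is_array($headers['Location']) ? end($headers['Location']) : $headers['Location'];
			vwaconsole('test returning header location: '.$location);
			return $location;
		} else {
			vwaconsole("header empty ?!?! can't convert query");
		}	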
	}
	return $url;
}


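// use the same absolute base path as for reading, so the result does not depend on the working directory
$xmlChunk->save($home."/wp-content/uploads/wpallimport/files/chunks_bast/bast_chunk_test_".$loopcount.".xml");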






?>
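
The chunk processor removes and appends children of an item while looping over `$node->childNodes`. That list is live, so the loop can skip nodes or visit the ones it has just appended. A minimal sketch of the same replacement done over a snapshot of the list is shown below. `replaceFotoNodes` is a hypothetical name, and the list of URLs is assumed to be already resolved.

```php
<?php
// Hypothetical helper: replace each <itemfoto1> child with numbered <itemfotoN> children.
function replaceFotoNodes(DOMElement $item, array $urls): void
{
    $doc = $item->ownerDocument;
    // Snapshot the live NodeList so removals and appends do not disturb the loop.
    foreach (iterator_to_array($item->childNodes) as $child) {
        if ($child->nodeName !== 'itemfoto1') {
            continue;
        }
        $item->removeChild($child);
        foreach (array_values($urls) as $index => $url) {
            $new = $doc->createElement('itemfoto' . ($index + 1));
            // createTextNode() stores "&" from query strings as literal text.
            $new->appendChild($doc->createTextNode($url));
            $item->appendChild($new);
        }
    }
}
```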

 

When the chunk processor requests a query link with get_headers(), the request is handled by the next script, the query converter:

<?php
if ( ! defined('ABSPATH') ) {
    require_once( dirname( __FILE__ ) . '/wp-load.php' );
}

// ---- vars
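// isset() avoids "undefined index" notices when a parameter is missing
$machineID = isset($_GET['bast_get_machine_id']) ? $_GET['bast_get_machine_id'] : '';
$itemlink = isset($_GET['bast_get_itemlink']) ? $_GET['bast_get_itemlink'] : '';
$itemfoto_id = isset($_GET['bast_get_itemfoto_id']) ? $_GET['bast_get_itemfoto_id'] : '';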
if (!empty($itemlink)) {
	$request = 'bast_get_itemlink';
	$value = $itemlink;
} elseif (!empty($itemfoto_id)) {
	$request = 'bast_get_itemfoto_id';
	$value = $itemfoto_id;
} 
// ---- end vars

if (empty($machineID)) {
	echo 'please enter query ?bast_get_machine_id=(number) first';
	exit();
}
if (empty($request) || empty($value)) {
	echo 'Valid query request would be: bast_get_itemlink <br>';
	echo 'Valid query request would be: bast_get_itemfoto_id <br>';
	exit();	
}	

// register custom query options 
// a closure with use() is needed here: a named function cannot see the global $request
add_filter( 'query_vars', function ( $vars ) use ( $request ) {
    // get the right custom field names
	$vars[] = $request;
	$vars[] = 'pa_'.$request;
    return $vars;
} );

//returns null or string url
function fetch($machineID,$request,$value) {
	$return = null;
	$args = array(
		'orderby' => 'meta_value_num',
		'meta_key' => 'bast_get_machine_id',
		'meta_type' => 'NUMERIC',
		'post_type' => 'product',
		'posts_per_page' => -1,
		'order' => 'ASC',
		'meta_query' => array(
			array(
				'key' => 'bast_get_machine_id',
				'type' => 'NUMERIC',
				'value' => $machineID,
				// 'EXISTS' ignores 'value' and matches every product that has the key; '=' matches this machine ID
				'compare' => '=',
			)
		)
	);	

	$wp_query = new WP_Query($args);
	if ( $wp_query->have_posts() ) {
		while ( $wp_query->have_posts() ) { 
			$wp_query->the_post();
			apply_filters( 'the_content', 'filter_post_content' );
			echo 'found product " '.get_the_title().' " on machine ID '.$machineID.'<br>';
			echo 'requested: '.$request.'<br>';
			
			if ($request === 'bast_get_itemlink') {
				// product url
				$return = get_permalink(get_the_ID());
				break;
			} elseif ($request === 'bast_get_itemfoto_id') {
				
				// image attachment 
			    $attachments = get_posts(array(
					'post_type' => 'attachment',
					//'post_mime_type' => 'image',
					'posts_per_page' => -1,
					'post_parent' => get_the_ID()
					//'exclude'     => get_post_thumbnail_id()
				));
				// as the requested value should be 1, but the array starts at 0.. remove 1 from value
				if (!empty($attachments) && ($attachments != false) && !empty($attachments[$value-1]) && ($attachments[$value-1] != false) ) {
				
					$return = wp_get_attachment_image_src( $attachments[$value-1]->ID, 'full')[0];
					
					echo 'attachment found';
					
				} else {
					
					echo 'attachment empty...';	
				
				} 
				echo 'total amount of images found on this machine: '.count($attachments);
				break;	
			}		
			break;			
		}	
		/* Reset Post Data after loop */
		wp_reset_postdata();
	} else {
		echo 'query on machine ID '.$machineID.' found no product. Exit.';	
	}	
	
	return $return;	
}	

function filter_post_content( $content ) {
    // Check if we're inside the main loop in a single post page.
    if ( is_single() && in_the_loop() && is_main_query() ) {
        return $content;
    }
    return $content;
}

$finaldestination = fetch($machineID,$request,$value);
if (empty($finaldestination)) {
	exit();
}	

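//echo 'destination set: '.$finaldestination.'<br>';
// fetch() echoes debug text first, so the header can only be sent if output buffering is enabled
if (headers_sent($sentFile, $sentLine)) {
	error_log('redirect not sent, output already started at '.$sentFile.':'.$sentLine);
	exit();
}
header("Location: ".$finaldestination, true, 302);
exit();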


?>
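
The query converter echoes status text before it calls header(). Without output buffering, PHP has then already sent the response headers and the Location header is dropped, so the caller sees no redirect. Whether that happened here is not confirmed by the thread. The sketch below shows one way to make the failure visible. The name `redirectTo` is hypothetical.

```php
<?php
// Hypothetical helper: send a 302 redirect, or log why it could not be sent.
function redirectTo(string $url): bool
{
    // headers_sent() fills $file and $line with the place where output started.
    if (headers_sent($file, $line)) {
        error_log("Cannot redirect to $url, output started at $file:$line");
        return false;
    }
    header('Location: ' . $url, true, 302);
    return true;
}
```

Writing debug messages with error_log() instead of echo keeps the response body empty, so the redirect can always be sent.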

 


@requinix

I accidentally made the previous post about this without content. I haven't cleaned up or completely finished the script yet, but the following works:
The XML is split into chunks, each chunk is read by a separately called script, and the URLs are then written back into an XML chunk.

For some reason the header location is always empty, so the query is written back instead of a direct link.
When I try to get the headers from the first script, it returns an array with the location perfectly fine. From the second script (using the same query URL) it returns empty, which is the main problem.

 


  • 2 weeks later...

I rewrote the script so that I can process the whole XML by running it multiple times. I never figured out why the chunk processing in the old script didn't work, and didn't get anything useful in my logs. My new script works, so problem solved.
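
The rewritten script is not shown in the thread. As an illustration of the "run it multiple times" idea only, the sketch below processes a limited batch per run and stores its progress in a small state file, so each run continues where the previous one stopped. The function `processNextBatch`, the state file and the batch size are all assumptions.

```php
<?php
// Hypothetical helper: handle up to $batchSize items per run, remembering progress in $stateFile.
function processNextBatch(array $items, callable $handler, string $stateFile, int $batchSize): bool
{
    // Resume from the stored offset, or start at 0 on the first run.
    $offset = is_file($stateFile) ? (int) file_get_contents($stateFile) : 0;

    foreach (array_slice($items, $offset, $batchSize) as $item) {
        $handler($item);
        $offset++;
        // Persist after every item so an aborted run resumes at the right place.
        file_put_contents($stateFile, (string) $offset, LOCK_EX);
    }

    // True once every item has been handled.
    return $offset >= count($items);
}
```

Each run stays well below the execution time limit as long as the batch size is small enough for the slowest links.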


Archived

This topic is now archived and is closed to further replies.
