Seda145 Posted March 15, 2019 Share Posted March 15, 2019 Hi everyone! I've been working on a php script to replace links that contain a query with direct links to the files they would redirect to. The script has to work with a large xml file, which might take a few minutes. Because of the xml size the script will never finish. The php execution time limit prevents the script from finishing and can't be set to a higher value. I rewrote my script, so it splits the xml file into multiple "chunk" files that will be processed later. For each chunk, I call a second php script from within the first. So each xml chunk can be processed at the same time. For example, for 20 xml chunks, the second php script will be launched 20 times per chunk. To convert the queries , the second script uses get_headers($url, 1) on each link in the xml chunk, which then should return a direct link as $headers['Location']; BUT, the header always returns empty. For some reason I could get the headers when I used get_headers() from the first php script without calling the second ones that process the chunks... I spend a lot of time trying to figure out what is going on, I included the files I'm using. Can someone tell me what I'm doing wrong? In this case I have to rewrite the xml with this script. Situation is not optimal, I'm running the chunk script because we're stuck with the execution time limit too. Export filter: This loops over a xml file and splits it into chunks <?php //ini_set('max_execution_time', 10); // ---- includes if ( ! defined('ABSPATH') ) { require_once( dirname( __FILE__ ) . '/wp-load.php' ); } // ---- end includes // console function vwaconsole($input) { $disabled = false; if ($disabled === false) { $a = print_r($input.'</br>'); $a = $input; echo "<script>console.log( '--log--: " . $a . "' );</script>"; } } //end console // settings // $chunk_size = 20; // $home = constant( 'ABSPATH' ); $xml_path = $home."/wp-content/uploads/wpallimport/files/Bastiaansen.xml"; if (fopen($xml_path,"r") != true) { vwaconsole("xml file does not exist"); exit(); } $xml = new DOMDocument(); $xml->formatOutput = true; $xml->preserveWhiteSpace = false; $xml->load($xml_path); // Main program // creates another file to be filled by sub scripts if (!empty($xml)) { vwaconsole("Running main program"); // backup xml first $date = "_".date("Y M D h i"); $date = str_replace(' ', '_', $date); $xpath = new DOMXpath($xml); $items = $xpath->query("//aanbiedingen//item"); $loopcount = 0; $processedamount = 0; $islast = 0; vwaconsole('total items: '.$items->length); vwaconsole('chunk size: '.$chunk_size); echo'<br>'; /* chunks processing: */ $chunkxml = new DOMDocument(); $chunkxml->formatOutput = true; $chunkxml->preserveWhiteSpace = false; $counter = 0; $chunkroot = null; foreach($items as $item) { //vwaconsole('processing item'); if ($chunkroot === null) { $chunkroot = $chunkxml->createElement('root'); $chunkxml->appendChild($chunkroot); //vwaconsole('created root'); } if (($processedamount + $chunk_size) > $items->length) { //vwaconsole("last chunk in progress..."); $islast = 1; } $chunkitem = $chunkxml->createElement($item->nodeName); $chunkroot->appendChild($chunkitem); //vwaconsole('appended child item to root'); foreach($item->childNodes as $spec) { //vwaconsole('processing specs in item'); $chunkspec = $chunkxml->createElement($spec->nodeName); $chunkitem->appendChild($chunkspec); $chunkspectext = $chunkxml->createTextNode($spec->nodeValue); $chunkspec->appendChild($chunkspectext); } $counter++; $processedamount++; if ($counter >= $chunk_size) { $chunkxml->save("wp-content/uploads/wpallimport/files/chunks_bast/bast_chunk_".$loopcount.".xml"); vwaconsole("saved array chunk"); $output = `php export_filter_chunk_processor.php $loopcount $islast `; vwaconsole($output); vwaconsole("creating new array chunk"); $chunkxml = new DOMDocument(); $chunkxml->formatOutput = true; $chunkxml->preserveWhiteSpace = false; $loopcount++; $counter = 0; $chunkroot = null; } if ($items->length === $processedamount) { $chunkxml->save("wp-content/uploads/wpallimport/files/chunks_bast/bast_chunk_".$loopcount.".xml"); vwaconsole("finished saving last chunk"); } } // merge documents later.. /* $newxml = new DOMDocument("1.0", "utf-8"); $newxml->formatOutput = true; $newxml->preserveWhiteSpace = false; $itemContainer = $newxml->createElement('aanbiedingen'); $newxml->appendChild($itemContainer); //$newxml->save("wp-content/uploads/wpallimport/files/TEMP_Bastiaansen.xml"); */ vwaconsole("main ending"); exit(); } else { //vwaconsole("xml is empty ?! exiting"); exit(); } ?> Export chunk processor: The xml was split by the previous script. This one takes one of the chunks and calls get_headers() , sending a link containing a query multiple of this script run at same time. The queries are then picked up by the last php file. <?php //ini_set('max_execution_time', 10); // console function vwaconsole($input) { $disabled = false; if ($disabled === false) { $a = print_r($input.'</br>'); $a = $input; echo "<script>console.log( '--log--: " . $a . "' );</script>"; } } //end console //echo'<br>'; print_r("called chunk processor > Chunk processor started. "); //echo'<br>'; $loopcount=$argv[1]; $islast=$argv[2]; if ($loopcount === null || $islast === null) { print_r("CHUNK PROCESSOR ERROR > loop count is empty"); exit(); } else { print_r("CHUNK PROCESSOR variables set. loop count: ".$loopcount." is last: ".$islast."<br>"); } if ( ! defined('BAST_ROOT_DIR') ) { define('BAST_ROOT_DIR', __DIR__); } $home = constant( 'BAST_ROOT_DIR' ); $xml_path = $home."/wp-content/uploads/wpallimport/files/chunks_bast/bast_chunk_".$loopcount.".xml"; if (fopen($xml_path,"r") != true) { print_r("chunk file was not found at path: ".$xml_path); exit(); } $xmlChunk = new DOMDocument(); //$xmlChunk = new DOMDocument(); $xmlChunk->formatOutput = true; $xmlChunk->preserveWhiteSpace = false; $xmlChunk->load($xml_path); if (empty($xmlChunk)) { print_r("chunk data is empty, exiting"); exit(); } else { print_r("got chunk data <br>"); } $xpath = new DOMXpath($xmlChunk); $item = $xpath->query("//root//item"); //$items = $xmlChunk->getElementsByTagName('//root//item'); foreach ($item as $node) { echo'new item: <br>'; foreach ($node->childNodes as $spec) { //vwaconsole("processing spec: ".$spec->nodeName); if (($spec->nodeName == 'itemlink')) { $memValue = $spec->nodeValue; //setting new url spec node $spec->nodeValue = processItemLink($memValue); // add new itemfoto nodes to xml } elseif(($spec->nodeName == 'itemfoto1')) { $memValue = $spec->nodeValue; $imgurls = array(); $first = true; for ($i = 1; $i < 50; $i++) { if ($first === false) { $memValue = str_replace('bast_get_itemfoto_id='.strval($i-1), 'bast_get_itemfoto_id='.strval($i), $memValue); } vwaconsole("trying to get a itemfoto with id: ".strval($i)." link: " .$memValue); $triedFotoUrl = processItemLink($memValue); vwaconsole("tried url: ".$triedFotoUrl); if($triedFotoUrl !== $memValue) { $imgurls[] = $triedFotoUrl; } else { vwaconsole("tried url and processed url are equal. end of loop. No new images left or error happened."); break; } $first = false; } //remove the old single itemfoto from xml $spec->parentNode->removeChild($spec); // add sorted nodes from image array if (!empty($imgurls)) { sort($imgurls, SORT_NATURAL); print_r('<br> after sort;'); print_r($imgurls); print_r('<br>'); $arlength = count($imgurls); for($x = 0; $x < $arlength; $x++) { $newItemFoto = $xml->createElement('itemfoto'.strval($x+1)); $newItemFotoText = $xml->createTextNode($imgurls[$x]); $newItemFoto->appendChild($newItemFotoText); $node->appendChild($newItemFoto); } unset($imgurls); } } } echo'<br>'; } //returns string url //follow the query url from xml, return a direct link or returns input on failure. query is processed in Bastiaansen.php. function processItemLink($url) { if (!empty($url)) { $headers = get_headers($url, 1); if(!empty($headers['Location'])) { vwaconsole('test returning header location: '.$headers['Location']); return $headers['Location']; } else { vwaconsole("header empty ?!?! can't convert query"); } } return $url; } $xmlChunk->save("wp-content/uploads/wpallimport/files/chunks_bast/bast_chunk_test_".$loopcount.".xml"); ?> When the chunk processor calls a query link with get_headers(), the query will be noticed by the next script, the query converter: <?php if ( ! defined('ABSPATH') ) { require_once( dirname( __FILE__ ) . '/wp-load.php' ); } // ---- vars $machineID = $_GET['bast_get_machine_id']; $itemlink = $_GET['bast_get_itemlink']; $itemfoto_id = $_GET['bast_get_itemfoto_id']; if (!empty($itemlink)) { $request = 'bast_get_itemlink'; $value = $itemlink; } elseif (!empty($itemfoto_id)) { $request = 'bast_get_itemfoto_id'; $value = $itemfoto_id; } // ---- end vars if (empty($machineID)) { echo 'please enter query ?bast_get_machine_id=(number) first'; exit(); } if (empty($request) || empty($value)) { echo 'Valid query request would be: bast_get_itemlink <br>'; echo 'Valid query request would be: bast_get_itemfoto_id <br>'; exit(); } // register custom query options function sm_register_query_vars( $vars ) { // get the right custom field names $vars[] = $request; $vars[] = 'pa_'.$request; return $vars; } add_filter( 'query_vars', 'sm_register_query_vars' ); //returns null or string url function fetch($machineID,$request,$value) { $return = null; $args = array( 'orderby' => 'meta_value_num', 'meta_key' => 'bast_get_machine_id', 'meta_type' => 'NUMERIC', 'post_type' => 'product', 'posts_per_page' => -1, 'order' => 'ASC', 'meta_query' => array( array( 'key' => 'bast_get_machine_id', 'type' => 'NUMERIC', 'value' => $machineID, 'compare' => 'EXISTS', ) ) ); $wp_query = new WP_Query($args); if ( $wp_query->have_posts() ) { while ( $wp_query->have_posts() ) { $wp_query->the_post(); apply_filters( 'the_content', 'filter_post_content' ); echo 'found product " '.get_the_title().' " on machine ID '.$machineID.'<br>'; echo 'requested: '.$request.'<br>'; if ($request === 'bast_get_itemlink') { // product url $return = get_permalink(get_the_ID()); break; } elseif ($request === 'bast_get_itemfoto_id') { // image attachment $attachments = get_posts(array( 'post_type' => 'attachment', //'post_mime_type' => 'image', 'posts_per_page' => -1, 'post_parent' => get_the_ID() //'exclude' => get_post_thumbnail_id() )); // as the requested value should be 1, but the array starts at 0.. remove 1 from value if (!empty($attachments) && ($attachments != false) && !empty($attachments[$value-1]) && ($attachments[$value-1] != false) ) { $return = wp_get_attachment_image_src( $attachments[$value-1]->ID, 'full')[0]; echo 'attachment found'; } else { echo 'attachment empty...'; } echo 'total amount of images found on this machine: '.count($attachments); break; } break; } /* Reset Post Data after loop */ wp_reset_postdata(); } else { echo 'query on machine ID '.$machineID.' found no product. Exit.'; } return $return; } function filter_post_content( $content ) { // Check if we're inside the main loop in a single post page. if ( is_single() && in_the_loop() && is_main_query() ) { return $content; } return $content; } $finaldestination = fetch($machineID,$request,$value); if (empty($finaldestination)) { exit(); } //echo 'destination set: '.$finaldestination.'<br>'; header("Location: ".$finaldestination, true, 302); ?> Quote Link to comment Share on other sites More sharing options...
Seda145 Posted March 16, 2019 Author Share Posted March 16, 2019 (edited) @requinix I accidentally made the previous post about this without content ? I didn't clean or completely finish the script yet, but the following works: xml is split into chunks, chunks are read by a called script per chunk, then urls are written back into an xml chunk. For some reason the header location is always empty, so the query is written back instead of a direct link. When I try to get the header from the first script, It returns an array and location perfectly fine. From the second script (using same query url) it returns empty which is the main problem. Edited March 16, 2019 by Seda145 Quote Link to comment Share on other sites More sharing options...
requinix Posted March 17, 2019 Share Posted March 17, 2019 Make sure you have PHP errors being reported or logged correctly: if get_headers() is not returning a Location then either the URL is not sending one or get_headers() somehow failed. Quote Link to comment Share on other sites More sharing options...
Seda145 Posted March 28, 2019 Author Share Posted March 28, 2019 I rewrote the script, so I could process the whole xml by running it multiple times. Never figured out why the chunk processing in the old script didn't work, didn't get anything useful in my logs. My new script works so problem solved ? Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.