codeinphp, posted May 25, 2015

I am attempting to parse an HTML page and save the results to an XML file. The purpose of the script is to create a TV program guide. The script first looks at each 'img' element. If one is found and matches the criteria, the script then gets the program name, program time and description. I can get it to do all of this, but it saves all the channels into each XML file, not just the info for that particular channel. For example, it reads the HTML and gets the first channel, say A&E. It then parses the info for all the channels in the HTML and saves all of that program info to every XML file. I end up with 25 XML files, each named after a different channel, but all containing program info for all channels. I suspect I have something wrong in the loop but can't locate it. Any help appreciated. The code below leaves out the curl call that gets $html, since it isn't really needed for the problem.

<?php
#CREATE DOM PARSER
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$dom->formatOutput = true;
$dom->preserveWhiteSpace = true;

$images = $dom->getElementsByTagName('img');
$childprogram = $xpath->query('//span[@class="prog_name"]');
$childtime = $xpath->query('//div[@class="prog_time"]');
$childdescrip = $xpath->query('//div[@class="prog_desc"]');

foreach ($images as $img) {
    $xml = new DOMDocument("1.0");
    $root = $xml->createElement("programme");
    $book = $xml->createElement("tvprogram");
    $icon = $img->getAttribute('src');

    if (preg_match('/\.(jpg|jpeg|gif)(?:[\?\#].*)?$/i', $icon)) { //only matching types
        $channel = $img->getAttribute('alt');

        foreach ($childprogram as $programname) {
            foreach ($childtime as $programtime) {
                foreach ($childdescrip as $descrip) {
                    $xml->appendChild($root);
                    $title = $xml->createElement("Channel");           //CHANNEL NAME
                    $showname = $xml->createElement("programname");    //PROGRAM NAME
                    $showtime = $xml->createElement("programtime");    //PROGRAM TIME
                    $descriptime = $xml->createElement("description"); //PROGRAM DESCRIPTION

                    $titleText = $xml->createTextNode($channel);
                    $shownameText = $xml->createTextNode($programname->nodeValue);
                    $showtimeText = $xml->createTextNode($programtime->nodeValue);
                    $showdescripText = $xml->createTextNode($descrip->nodeValue);

                    $title->appendChild($titleText);
                    $showname->appendChild($shownameText);
                    $showtime->appendChild($showtimeText);
                    $descriptime->appendChild($showdescripText);

                    $book->appendChild($title);
                    $book->appendChild($showname);
                    $book->appendChild($showtime);
                    $book->appendChild($descriptime);
                }
            }
        }
        $root->appendChild($book);
    }//END OF LOOP
    $xml->formatOutput = true;
    $xml->save(dirname(__FILE__)."/streamguideXML/".$channel.".xml") or die("Error");
} //END OF FUNCTION
?>

Link to comment: https://forums.phpfreaks.com/topic/296480-parse-html-save-to-xml/
Psycho, posted May 25, 2015

At the very end of the script you have this:

    $xml->save(dirname(__FILE__)."/streamguideXML/".$channel.".xml") or die("Error");
} //END OF FUNCTION

However, there is no function in your script. That final bracket is the closing bracket for the first foreach loop, so, yes, you are creating a file for each execution of the loop.
codeinphp (author), posted May 25, 2015

Thank you, that //END OF FUNCTION comment was left over from something; I removed it. I can get the script to create a new XML file for each channel name (a&e.xml, abc.xml, cbs.xml and so forth), but each file has program info for all channels, not just the specific one. For example, when I run the script, all the XML files are created, but if I open a&e.xml I have programs not only for A&E but also for ABC, CBS, etc. It's not closing and saving the file for the specific channel. Thanks again.
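A likely cause, shown as a minimal, self-contained demo: the three `$xpath->query()` calls in the original all start with `//`, so each returns every matching node on the whole page, and nothing ties those nodes to the `<img>` currently being processed. `DOMXPath::query()` and `DOMXPath::evaluate()` accept an optional context node as their second argument; an expression starting with `.//` is then evaluated only inside that node. (The three nested `foreach` loops also pair every program name with every time and every description, so even one channel would get N × N × N entries for N programs.) The two-channel HTML below is made up, and wrapping each channel in `<div class="row">` is an assumption at this point in the thread.

```php
<?php
// Hypothetical two-channel sample standing in for the real guide page.
libxml_use_internal_errors(true); // keep libxml's HTML warnings out of the output

$html = '<div class="row"><img src="abc.jpg" alt="ABC"><span class="prog_name">News</span></div>'
      . '<div class="row"><img src="cbs.jpg" alt="CBS"><span class="prog_name">Sports</span>'
      . '<span class="prog_name">Weather</span></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// "//" starts at the document root: all three programs, whatever the channel.
$all = $xpath->query('//span[@class="prog_name"]');
echo 'document-wide matches: ', $all->length, "\n";

// ".//" plus a context node: only the programs inside the current row.
$perChannel = [];
foreach ($xpath->query('//div[@class="row"]') as $row) {
    $channel = $xpath->evaluate('string(.//img/@alt)', $row);
    foreach ($xpath->query('.//span[@class="prog_name"]', $row) as $span) {
        $perChannel[$channel][] = $span->nodeValue;
    }
}
print_r($perChannel);
```

The document-wide query sees every program, while the row-scoped loop gives ABC one program and CBS two.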
QuickOldCar, posted May 26, 2015 (edited May 26, 2015)

I would make each unique channel a key of an associative array inside the loop, then loop over the unique channels and save each one as an XML file. Nothing in the current code distinguishes one channel from the others inside the loop; every channel gets everything. An example of the HTML would make this easier for us.

Something along these lines:

//before loop
$array = array();

//inside loop
$array[$channel][] = array("name"=>$programname->nodeValue, "time"=>$programtime->nodeValue, "description"=>$descrip->nodeValue);

Later on, outside the loop:

foreach($array as $key=>$value){
    //make your xml and save
}
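QuickOldCar's "make your xml and save" step, filled out as a minimal sketch: a fresh `DOMDocument` is built for each key of `$array`, so nothing carries over from one channel's file to the next. The element names (`programme`, `tvprogram`, `Channel`, `programname`, `programtime`, `description`) come from the original script, but writing one `<tvprogram>` per show is a layout choice of this sketch. The helpers `writeChannelFiles()` and `addTextChild()` are hypothetical, the ABC entry is invented, and the output directory is a parameter so the demo can write to the system temp directory.

```php
<?php
// Hypothetical helper: append <$name>$text</$name> to $parent.
function addTextChild(DOMDocument $doc, DOMNode $parent, string $name, string $text): void
{
    $el = $doc->createElement($name);
    $el->appendChild($doc->createTextNode($text)); // escapes "&" in names like "A&E"
    $parent->appendChild($el);
}

// Hypothetical writer: one XML file per channel key in $array.
function writeChannelFiles(array $array, string $dir): array
{
    $written = [];
    foreach ($array as $channel => $programs) {
        // A new document per channel, so programs never leak between files.
        $xml = new DOMDocument('1.0', 'UTF-8');
        $xml->formatOutput = true;
        $root = $xml->createElement('programme');
        $xml->appendChild($root);

        foreach ($programs as $program) {
            // One <tvprogram> per show keeps each show's fields together.
            $show = $xml->createElement('tvprogram');
            addTextChild($xml, $show, 'Channel', (string) $channel);
            addTextChild($xml, $show, 'programname', $program['name']);
            addTextChild($xml, $show, 'programtime', $program['time']);
            addTextChild($xml, $show, 'description', $program['description']);
            $root->appendChild($show);
        }

        $path = $dir . '/' . $channel . '.xml';
        if ($xml->save($path) === false) {
            throw new RuntimeException("Could not write $path");
        }
        $written[] = $path;
    }
    return $written;
}

// Sample data in the $array[$channel][] shape; the ABC entry is made up.
$array = [
    'A&E' => [
        ['name' => 'Parking Wars', 'time' => 'May 27, 2015, 7:00 am - 8:00 am',
         'description' => 'An angry mother and daughter confront a booter in Detroit; and an irate Philadelphia citizen says he got a ticket while trying to help his physically disabled son.'],
        ['name' => 'Dog the Bounty Hunter', 'time' => 'May 27, 2015, 8:00 am - 10:00 am',
         'description' => 'Dog pursues two fugitives whose drug problems have hurt their families.'],
    ],
    'ABC' => [
        ['name' => 'Example Show', 'time' => 'May 27, 2015, 7:00 am - 8:00 am', 'description' => 'Placeholder.'],
    ],
];

$dir = sys_get_temp_dir() . '/streamguideXML';
if (!is_dir($dir)) {
    mkdir($dir, 0777, true);
}
$files = writeChannelFiles($array, $dir);
echo implode("\n", $files), "\n";
```

With this structure, A&E.xml holds only the two A&E shows and ABC.xml only the ABC one.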
codeinphp (author), posted May 27, 2015

Thank you for the advice; this is what I was looking for. I am new to PHP and was not sure of the best way to accomplish this. Thanks again.
codeinphp (author), posted May 27, 2015

Here's an example of the HTML. This is what should be parsed for A&E.xml:

<div class="row">
  <div class="col th">
    <a class="channel_sched_link" href="javascript:void(0)" title="View A&E full schedule" data-channelid="9">
      <img src="http://static.ilive.to/images/tv/AE.JPG" width="30" height="20" alt="A&E" />A&E
    </a>
  </div>
  <div class="prog_cols">
    <div class="col ts ts_1 prog_907477 ps_0" data-catid="" >
      <span class="prog_name">Parking Wars</span>
      <div class="prog_time">May 27, 2015, 7:00 am - 8:00 am</div>
      <a class="btn_watchlist " href="javascript:void(0)" data-progid="907477">(+) add to watchlist</a>
      <div class="prog_desc">
        An angry mother and daughter confront a booter in Detroit; and an irate Philadelphia citizen says he got a ticket while trying to help his physically disabled son.<br/>
        <a class="watchnow" href="http://www.streamlive.to/channels/?q=A%26E">Watch Now</a>
      </div>
    </div>
    <div class="col ts ts_3 prog_907478 ps_1" data-catid="" >
      <span class="prog_name">Dog the Bounty Hunter</span>
      <div class="prog_time">May 27, 2015, 8:00 am - 10:00 am</div>
      <a class="btn_watchlist " href="javascript:void(0)" data-progid="907478">(+) add to watchlist</a>
      <div class="prog_desc">
        Dog pursues two fugitives whose drug problems have hurt their families.<br/>
        <a class="watchnow" href="http://www.streamlive.to/channels/?q=A%26E">Watch Now</a>
      </div>
    </div>
  </div>
  <a class="watchnow" href="http://www.streamlive.to/channels/?q=A%26E">Watch Now</a>
</div>
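Given markup like this, one possible approach (a sketch, not the only way) walks each `div.row`, reads the channel from its `<img>`, and then reads name, time and description from inside each program's own `div`, instead of running three nested loops over page-wide node lists. Assumptions: the class names match the posted snippet; `.//div[span[@class="prog_name"]]` is used to select "the div that directly contains a `prog_name` span", because the program divs carry several classes; `text()[1]` keeps only the text before the `<br/>`, so "Watch Now" is left out. The HTML is the posted A&E row, trimmed (watchlist links dropped) and with `&` written as `&amp;`; `parseGuide()` is a hypothetical helper.

```php
<?php
libxml_use_internal_errors(true); // ignore libxml warnings about imperfect HTML

// Hypothetical helper: returns $guide[$channel][] = ['name' => ..., 'time' => ..., 'description' => ...].
function parseGuide(string $html): array
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    $guide = [];

    foreach ($xpath->query('//div[@class="row"]') as $row) {
        $channel = $xpath->evaluate('string(.//img/@alt)', $row);
        $icon = $xpath->evaluate('string(.//img/@src)', $row);
        // Same image-type filter as the original script.
        if ($channel === '' || !preg_match('/\.(jpg|jpeg|gif)(?:[\?\#].*)?$/i', $icon)) {
            continue;
        }
        // Each program is the div that directly contains a prog_name span.
        foreach ($xpath->query('.//div[span[@class="prog_name"]]', $row) as $prog) {
            $guide[$channel][] = [
                'name'        => $xpath->evaluate('normalize-space(span[@class="prog_name"])', $prog),
                'time'        => $xpath->evaluate('normalize-space(div[@class="prog_time"])', $prog),
                // First text node only, so the "Watch Now" link text is skipped.
                'description' => $xpath->evaluate('normalize-space(div[@class="prog_desc"]/text()[1])', $prog),
            ];
        }
    }
    return $guide;
}

$html = <<<'HTML'
<div class="row">
  <div class="col th">
    <a class="channel_sched_link" href="javascript:void(0)" data-channelid="9">
      <img src="http://static.ilive.to/images/tv/AE.JPG" width="30" height="20" alt="A&amp;E" />A&amp;E
    </a>
  </div>
  <div class="prog_cols">
    <div class="col ts ts_1 prog_907477 ps_0">
      <span class="prog_name">Parking Wars</span>
      <div class="prog_time">May 27, 2015, 7:00 am - 8:00 am</div>
      <div class="prog_desc"> An angry mother and daughter confront a booter in Detroit; and an irate Philadelphia citizen says he got a ticket while trying to help his physically disabled son.<br/>
        <a class="watchnow" href="http://www.streamlive.to/channels/?q=A%26E">Watch Now</a>
      </div>
    </div>
    <div class="col ts ts_3 prog_907478 ps_1">
      <span class="prog_name">Dog the Bounty Hunter</span>
      <div class="prog_time">May 27, 2015, 8:00 am - 10:00 am</div>
      <div class="prog_desc"> Dog pursues two fugitives whose drug problems have hurt their families.<br/>
        <a class="watchnow" href="http://www.streamlive.to/channels/?q=A%26E">Watch Now</a>
      </div>
    </div>
  </div>
</div>
HTML;

$guide = parseGuide($html);
print_r($guide);
```

The result has the `$array[$channel][]` shape from QuickOldCar's reply, so his per-channel save loop can consume it directly.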
sKunKbad, posted May 29, 2015

Having spent way too much time parsing and creating XML, I have to ask why XML would be anyone's first choice. I prefer to just serialize the data. If the goal is to store data so you can use it later, then not creating XML means not having to parse it later. Serialization retains the data types, which is handy. If you don't need to retain data types, then JSON encoding is a good option.
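sKunKbad's alternative as a minimal sketch: the grouped array is written with `json_encode()` and read back with `json_decode($json, true)`, so no XML has to be built or parsed later. `serialize()` / `unserialize()` is the PHP-specific format he mentions. The `$guide` data and the `guide.json` location are hypothetical.

```php
<?php
// Hypothetical grouped data, same shape as $array[$channel][] in the thread.
$guide = [
    'A&E' => [
        ['name' => 'Parking Wars', 'time' => 'May 27, 2015, 7:00 am - 8:00 am'],
        ['name' => 'Dog the Bounty Hunter', 'time' => 'May 27, 2015, 8:00 am - 10:00 am'],
    ],
];

$file = sys_get_temp_dir() . '/guide.json'; // hypothetical location

// JSON: plain text; passing true to json_decode() returns associative arrays, not objects.
file_put_contents($file, json_encode($guide, JSON_PRETTY_PRINT));
$fromJson = json_decode(file_get_contents($file), true);

// serialize(): PHP-specific format; unserialize() can also rebuild objects of their original class.
$fromSerialized = unserialize(serialize($guide));

echo $fromJson['A&E'][1]['name'], "\n";
```

For plain arrays of strings like this, both round trips give back exactly the original array.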
codeinphp (author), posted May 29, 2015

Well, I am using the XML at a later time. What is going on is that my PHP script will run every hour or so to update the programming information. Each time it runs, it will overwrite the existing XML. So when I go to a particular channel, it will display the info in the XML for that channel. Either way, I have to get each channel's programs and times together, and that's where I am not successful.
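For the display side described here, a minimal sketch of reading one channel's file back with SimpleXML. It assumes a layout with one `<tvprogram>` element per show (the original script puts everything in a single `<tvprogram>`), and a file name pattern of `<dir>/<channel>.xml` like the original's `streamguideXML/` files. `loadChannel()` and the `DemoChannel.xml` demo file are hypothetical.

```php
<?php
// Hypothetical reader: returns the shows stored in <dir>/<channel>.xml.
function loadChannel(string $dir, string $channel): array
{
    $path = $dir . '/' . $channel . '.xml';
    if (!is_file($path)) {
        return []; // no guide written for this channel yet
    }
    $xml = simplexml_load_file($path);
    if ($xml === false) {
        return []; // file exists but is not well-formed XML
    }
    $shows = [];
    foreach ($xml->tvprogram as $show) {
        $shows[] = [
            'name'        => (string) $show->programname,
            'time'        => (string) $show->programtime,
            'description' => (string) $show->description,
        ];
    }
    return $shows;
}

// Demo file; file_put_contents() replaces any existing file, as an hourly run would.
$dir = sys_get_temp_dir();
file_put_contents($dir . '/DemoChannel.xml', <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<programme>
  <tvprogram>
    <Channel>DemoChannel</Channel>
    <programname>Dog the Bounty Hunter</programname>
    <programtime>May 27, 2015, 8:00 am - 10:00 am</programtime>
    <description>Dog pursues two fugitives whose drug problems have hurt their families.</description>
  </tvprogram>
</programme>
XML
);

$shows = loadChannel($dir, 'DemoChannel');
foreach ($shows as $show) {
    echo $show['time'], '  ', $show['name'], "\n";
}
```

Returning an empty array for a missing or unreadable file lets the page show "no listings" instead of failing while the hourly job is rewriting a file.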