
DOM Experiments


phpsane



Folks,

 

I don't really understand the DOM. Would someone care to explain it in layman's terms?

 

I am told this line creates a DOM from a URL or file:

 


 
// Create DOM from URL or file
$html = file_get_html('https://www.youtube.com/feed/trending?gl=GB');
 

 

A PHP object was just created with the YouTube page structure.
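
To make "a PHP object with the page structure" more concrete, here is a minimal sketch. It assumes PHP's built-in DOMDocument class instead of simple_html_dom, so that it runs without any download, and the markup in $markup is made up for illustration. The idea is the same: the HTML text is parsed into a tree of element objects that you can search and read attributes from.

```php
<?php
declare(strict_types=1);

// A tiny HTML document standing in for a downloaded page (made up for illustration).
$markup = '<html><body><ul>'
        . '<li><a href="/watch?v=abc" title="First video">First</a></li>'
        . '<li><a href="/watch?v=def" title="Second video">Second</a></li>'
        . '</ul></body></html>';

// Parse the text into a tree of nodes: this tree is the DOM.
$dom = new DOMDocument();
$dom->loadHTML($markup);

// Walk the tree: every <a> element is an object whose attributes can be read.
$links = [];
foreach ($dom->getElementsByTagName('a') as $anchor) {
    $links[] = [
        'title' => $anchor->getAttribute('title'),
        'url'   => $anchor->getAttribute('href'),
    ];
}

print_r($links);
```

So the DOM is not a copy of the page as text. It is the page's elements arranged as nested objects, which is why the script can ask for "every a element" rather than searching a string.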

 

I grabbed the code from this tutorial: http://blog.endpoint.com/2016/07/scrape-web-content-with-php-no-api-no.html


 

Q1. So the DOM is a copy of a page's structure, is it?

 

Anyway, look at what the following code does. It extracts the video links from YouTube's trending feed:

 


 

<?php
 
/*
ERROR HANDLING
*/
declare(strict_types=1);
ini_set('display_errors', '1');
ini_set('display_startup_errors', '1');
error_reporting(E_ALL);
mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);
 
// Tutorial or code from: http://blog.endpoint.com/2016/07/scrape-web-content-with-php-no-api-no.html
 
require('simple_html_dom.php'); //http://simplehtmldom.sourceforge.net/
 
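// Create DOM from URL or file; file_get_html() returns false on failure
$html = file_get_html('https://www.youtube.com/feed/trending?gl=GB');
if ($html === false) {
        exit('Could not load or parse the page.');
}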
 
// creating an array of elements
$videos = [];
 
// Find top ten videos
$i = 1;
foreach ($html->find('li.expanded-shelf-content-item-wrapper') as $video) {
        if ($i > 10) {
                break;
        }
 
        // Find item link element (find() returns null when nothing matches)
        $videoDetails = $video->find('a.yt-uix-tile-link', 0);
        if ($videoDetails === null) {
                continue;
        }
 
        // get title attribute
        $videoTitle = $videoDetails->title;
 
        // get href attribute
        $videoUrl = 'https://youtube.com' . $videoDetails->href;
 
        // push to a list of videos
        $videos[] = [
                'title' => $videoTitle,
                'url' => $videoUrl
        ];
 
        $i++;
}
 
var_dump($videos);
 
?>
 

 

I would like the results to be shown structured like this, line by line:


array(10) {
  [0]=>
  array(2) {
    ["title"]=>
    string(90) "Enzo Amore & Big Cass help John Cena even the odds against The Club: Raw, July 4, 2016"
    ["url"]=>
  }
  [1]=>
  array(2) {
    ["title"]=>
    string(77) "Loose Women Reveal Sex Toys Confessions In Hilarious Discussion | Loose Women"
    ["url"]=>
  }
  [2]=>
  array(2) {
    ["title"]=>
    string(51) "Tinie Tempah - Mamacita ft. Wizkid (Official Video)"
    ["url"]=>
  }
  [3]=>
  array(2) {
    ["title"]=>
    string(54) "Michael Gove's Shows you What's Under his Kilt"
    ["url"]=>
  }
  [4]=>
  array(2) {
    ["title"]=>
    string(25) "Deception, Lies, and CSGO"
    ["url"]=>
  }
  [5]=>
  array(2) {
    ["title"]=>
    string(68) "Last Week Tonight with John Oliver: Independence Day (Web Exclusive)"
    ["url"]=>
  }
  [6]=>
  array(2) {
    ["title"]=>
    string(21) "Last Week I Ate A Pug"
    ["url"]=>
  }
  [7]=>
  array(2) {
    ["title"]=>
    string(59) "PEP GUARDIOLA VS NOEL GALLAGHER | Exclusive First Interview"
    ["url"]=>
  }
  [8]=>
  array(2) {
    ["title"]=>
    string(78) "Skins, lies and videotape - Enough of these dishonest hacks. [strong language]"
    ["url"]=>
  }
  [9]=>
  array(2) {
    ["title"]=>
    string(62) "We Are America ft. John Cena | Love Has No Labels | Ad Council"
    ["url"]=>
  }
}



 

But my results are shown all huddled-up like this:

 

array(10) { [0]=> array(2) { ["title"]=> string(34) "FAMILY REACT TO MY NEXT DISS TRACK" ["url"]=> string(39) "https://youtube.com/watch?v=TLdGFtErSE4" } [1]=> array(2) { ["title"]=> string(94) "Grace Davies goes back to her Roots with heartfelt song | Auditions Week 1 | The X Factor 2017" ["url"]=> string(39) "https://youtube.com/watch?v=YTNd3qhgv8M" } [2]=> array(2) { ["title"]=> string(43) "Giving My Best-Guy-Mate A Makeover | Zoella" ["url"]=> string(39) "https://youtube.com/watch?v=J7jYSPoAowk" } [3]=> array(2) { ["title"]=> string(30) "Wardrobe DIY! | The Weekly #33" ["url"]=> string(39) "https://youtube.com/watch?v=sI4DrRyES2U" } [4]=> array(2) { ["title"]=> string(39) "Taylor Swift - ...Ready For It? (Audio)" ["url"]=> string(39) "https://youtube.com/watch?v=T62maKYX9tU" } [5]=> array(2) { ["title"]=> string(44) "The Try Guys Throw A $300,000 Bachelor Party" ["url"]=> string(39) "https://youtube.com/watch?v=BLvbtXzGI48" } [6]=> array(2) { ["title"]=> string(24) "I FLEW THIS TINY PLANE!!" ["url"]=> string(39) "https://youtube.com/watch?v=9G1v5iytjjI" } [7]=> array(2) { ["title"]=> string(44) "Real-Time HOMEWARE Haul! Fleur De Force (Ad)" ["url"]=> string(39) "https://youtube.com/watch?v=y4RFqzuSVbY" } [8]=> array(2) { ["title"]=> string(39) "50 AMAZING Facts to Blow Your Mind! #87" ["url"]=> string(39) "https://youtube.com/watch?v=LaSdXVgxk3M" } [9]=> array(2) { ["title"]=> string(50) "Enjoy or Destroy?! | 10 Ridiculous Amazon Products" ["url"]=> string(39) "https://youtube.com/watch?v=zhK-XjrFwhg" } }

 

 

How do I fix this? Where should I add the br HTML tag? I tried it in many places but had no luck. I kept getting errors, so I took the tags out again.

Any suggested spots for the br HTML tag in the PHP script flow?


If you want to explore advanced topics such as this, perhaps you should first advance your knowledge by doing some reading and learning. Have you read any books or manuals on PHP? And for this particular question, you might want to learn about some other HTML tags for your output.


var_dump() doesn't have any formatting. It's meant to dump the variable so you can see what it contains. It's for troubleshooting. Why would you need a "pretty print" when it's just for troubleshooting?
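
If the goal is output meant for visitors rather than for troubleshooting, one option is to loop over the array and build the HTML yourself. The following is only a sketch: renderVideoList() is a hypothetical helper name, and the sample rows are made up, though they have the same 'title' and 'url' keys as the $videos array in the script above.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: turn rows of ['title' => ..., 'url' => ...]
 * into an HTML unordered list, one video per line.
 */
function renderVideoList(array $videos): string
{
    $items = '';
    foreach ($videos as $video) {
        // Escape both values so characters like & and < cannot break the page.
        $title = htmlspecialchars($video['title']);
        $url   = htmlspecialchars($video['url']);
        $items .= "  <li><a href=\"{$url}\">{$title}</a></li>\n";
    }
    return "<ul>\n{$items}</ul>\n";
}

// Made-up sample rows in the same shape as the scraper's $videos array.
$videos = [
    ['title' => 'Cats & Dogs', 'url' => 'https://youtube.com/watch?v=abc'],
    ['title' => 'Second video', 'url' => 'https://youtube.com/watch?v=def'],
];

$listHtml = renderVideoList($videos);
echo $listHtml;
```

Each list item lands on its own line in the browser, so no br tags are needed, and var_dump() can stay reserved for debugging.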

 

Yes. I know. var_dump() does not use any formatting. I had missed the var_dump() in that tutorial's code:

http://blog.endpoint.com/2016/07/scrape-web-content-with-php-no-api-no.html

 

Look over there at how nicely they presented the result. Neat. They probably used some HTML tag in their code that they left out of the sample they gave their readers.

I will look up the pre HTML tag, as ginerjm suggested.
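
For anyone landing here later, a minimal sketch of the pre approach follows. It assumes the stock var_dump() (an extension such as Xdebug may already emit its own HTML), and the sample rows are made up to mirror the shape of the $videos array. var_dump() already prints newlines and indentation; the browser just collapses them unless the output sits inside a pre element.

```php
<?php
declare(strict_types=1);

// Made-up sample rows in the same shape as the scraper's $videos array.
$videos = [
    ['title' => 'First video',  'url' => 'https://youtube.com/watch?v=abc'],
    ['title' => 'Second video', 'url' => 'https://youtube.com/watch?v=def'],
];

// Capture what var_dump() prints so it can be escaped before it is sent.
ob_start();
var_dump($videos);
$dump = ob_get_clean();

// <pre> tells the browser to keep the newlines and indentation as they are.
$output = '<pre>' . htmlspecialchars($dump) . '</pre>';
echo $output;
```

In the original script this amounts to replacing the bare var_dump($videos); line with the capture-and-wrap steps. Echoing '<pre>' before and '</pre>' after the var_dump() call has the same visual effect, just without the escaping.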
