TheStudent2023 Posted May 2, 2023
Programmers, which of the following two is newer, and why do some prefer one over the other? What are their strengths and weaknesses compared with each other? DOMDocument vs. simple_html_dom.
TheStudent2023 Posted May 2, 2023
Fellow programmers,

Plain PHP to extract meta tags:

    <?php
    $meta_tags = get_meta_tags('http://www.example.com/');
    print_r($meta_tags);
    ?>

Output:

    Array ( [keywords] => this is the keywords [description] => this is the description )

DOMDocument to extract meta tags:

    <?php
    // Fetch a URL with cURL and return the response body as a string
    function file_get_contents_curl($url)
    {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        $data = curl_exec($ch);
        curl_close($ch);
        return $data;
    }

    $url = "http://www.example.com/code";
    $html = file_get_contents_curl($url);

    // Load HTML to DOM object
    $doc = new DOMDocument();
    @$doc->loadHTML($html);

    // Parse DOM to get Title data (empty string if the page has no <title>)
    $nodes = $doc->getElementsByTagName('title');
    $title = $nodes->length ? $nodes->item(0)->nodeValue : '';

    // Parse DOM to get metadata; start with defaults so the echoes below never hit undefined variables
    $description = '';
    $keywords = '';
    $metas = $doc->getElementsByTagName('meta');
    for ($i = 0; $i < $metas->length; $i++) {
        $meta = $metas->item($i);
        if ($meta->getAttribute('name') == 'description')
            $description = $meta->getAttribute('content');
        if ($meta->getAttribute('name') == 'keywords')
            $keywords = $meta->getAttribute('content');
    }

    echo "Title: $title". '<br/><br/>';
    echo "Description: $description". '<br/><br/>';
    echo "Keywords: $keywords";
    ?>

Can anyone shorten the above without sacrificing quality?
Edited May 2, 2023 by TheStudent2023
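One possible way to shorten the DOMDocument version is to let DOMXPath do the searching. The following is only a sketch: the sample HTML string is a hypothetical stand-in for a fetched page, and the `name` values are matched case-sensitively, which real pages may not respect.

```php
<?php
// Hypothetical sample page; a real script would fetch this with cURL or file_get_contents().
$html = <<<HTML
<html><head>
<title>Example Page</title>
<meta name="description" content="this is the description">
<meta name="keywords" content="this is the keywords">
</head><body></body></html>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// string(...) yields '' when nothing matches, so no null checks are needed
$title       = $xpath->evaluate('string(//title)');
$description = $xpath->evaluate('string(//meta[@name="description"]/@content)');
$keywords    = $xpath->evaluate('string(//meta[@name="keywords"]/@content)');

echo "Title: $title\nDescription: $description\nKeywords: $keywords\n";
```

Using `libxml_use_internal_errors(true)` instead of the `@` operator keeps malformed-HTML warnings quiet without hiding other errors from `loadHTML()`.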
TheStudent2023 Posted May 2, 2023
Can someone show me how to extract meta tags and the page title using simple_html_dom? I want to compare its code with DOMDocument's code. That is all.
Edited May 2, 2023 by TheStudent2023
TheStudent2023 Posted May 2, 2023
@mac_gyver If you do not mind, can you show me two things? How to extract link anchors using: A) DOMDocument; B) simple_html_dom. And show me where you learnt the snippets from, so I can learn more tag extractions from the documentation rather than needing to pester you like this.
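For the DOMDocument half of that question, a minimal sketch might look like the following. The HTML fragment is made up for illustration, and the choice to skip anchors without an `href` is an assumption about what counts as a link.

```php
<?php
// Hypothetical fragment standing in for a downloaded page
$html = '<p><a href="/about" title="About us">About</a> '
      . '<a href="https://www.example.com/">Home</a> '
      . '<a name="top">no href</a></p>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // tolerate sloppy real-world markup
$doc->loadHTML($html);
libxml_clear_errors();

$links = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    // Named anchors without an href are not links, so skip them
    if (!$a->hasAttribute('href')) {
        continue;
    }
    $links[] = [
        'href' => $a->getAttribute('href'),
        'text' => trim($a->textContent),
    ];
}

print_r($links);
```

The methods used here (`getElementsByTagName`, `hasAttribute`, `getAttribute`, `textContent`) are all described in the DOM section of the PHP manual.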
Barand Posted May 2, 2023
https://enb.iisd.org/_inc/simple_html_dom/manual/manual.htm
TheStudent2023 Posted May 2, 2023
@requinix I believe you have done web scraping before, but with what? DOMDocument or simple_html_dom? Do you know how to scrape an HTML form's dropdown options? Say you want to scrape all the options from the dropdown you see here: https://www.w3schools.com/tags/tryit.asp?filename=tryhtml_select
How would you write the code with DOMDocument, and how would you with simple_html_dom? If you can be kind enough to show me these two, I will try learning them and then try writing a scraper myself to scrape options from a radio button and from a checkbox. Then I can go deeper and scrape options from multi-select dropdowns and so on. Go deeper into the rabbit hole. As of now, I have no clue where to start. So, care to guide me?
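As a starting point for the DOMDocument side, here is a rough sketch of reading dropdown options. The form markup is a hypothetical example (the field name `cars` and its options are assumptions, not taken from the linked page).

```php
<?php
// Hypothetical form markup; a real scraper would load the fetched page instead
$html = <<<HTML
<form>
  <select name="cars" id="cars">
    <option value="volvo">Volvo</option>
    <option value="saab" selected>Saab</option>
    <option value="opel">Opel</option>
  </select>
</form>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Map each option's value attribute to its visible label
$options = [];
foreach ($xpath->query('//select[@name="cars"]/option') as $option) {
    $options[$option->getAttribute('value')] = trim($option->textContent);
}

print_r($options);
```

The same pattern should carry over to radio buttons and checkboxes by changing the XPath expression, for example `//input[@type="radio"]` or `//input[@type="checkbox"]`, and reading their `name` and `value` attributes.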
TheStudent2023 Posted May 2, 2023
@barand Thanks a lot. I appreciate it. Unfortunately it is nearly sunrise here, so I have to check your link out tomorrow night. In the meantime, care to show me two snippets (one using DOMDocument, the other using simple_html_dom) for scraping text inputs (<input type="text">) from an HTML form or from a search box (like Google's)? Once I have learnt that from you, I will try scraping text from a <textarea>. I am getting warmed up to learn scraping, as it would make it easier to finish building my web crawler (spider). Thanks!
Edited May 2, 2023 by TheStudent2023
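A sketch of the DOMDocument approach for text inputs and textareas could look like this. The form below is invented for illustration. Note that scraping only sees the default values present in the markup, not anything a visitor later types into the page.

```php
<?php
// Hypothetical search form; field names and values are assumptions
$html = <<<HTML
<form action="/search">
  <input type="text" name="q" value="php dom">
  <input type="hidden" name="lang" value="en">
  <textarea name="notes">First line</textarea>
</form>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Collect text inputs and textareas in document order, keyed by field name
$fields = [];
foreach ($xpath->query('//input[@type="text"] | //textarea') as $field) {
    $name = $field->getAttribute('name');
    // A textarea keeps its text as element content; an input keeps it in the value attribute
    $fields[$name] = $field->nodeName === 'textarea'
        ? $field->textContent
        : $field->getAttribute('value');
}

print_r($fields);
```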
TheStudent2023 Posted May 4, 2023
@barand OK. This is how to extract meta tags with DOMDocument. Before I look into your suggested link to learn how to extract meta tags using the simple_html_dom parser, what do you think of the following code?

    <?php
    $url = ""; // URL not shown in the post
    $html = file_get_contents($url);

    // Initiate ability to manipulate the DOM and load that baby up
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html, LIBXML_COMPACT|LIBXML_NOERROR|LIBXML_NOWARNING);
    libxml_clear_errors();

    // Fetch all <meta> tags
    $meta_tags = $doc->getElementsByTagName('meta');
    if ($meta_tags->length > 0) {
        foreach ($meta_tags as $tag) {
            // e.g. name="robots" and content="noindex"
            $name = $tag->getAttribute('name');
            $content = $tag->getAttribute('content');
            // Escape scraped values before printing them into HTML
            echo '<b>Meta Name: </b>' . htmlspecialchars($name) . '<br>';
            echo '<b>Meta Content: </b>' . htmlspecialchars($content) . '<br>';
        }
    }

Can this be improved in quality and cut down in quantity (lines of code)? Do you see any errors in the code?
TheStudent2023 Posted May 4, 2023

    // Find all anchors and images with the "title" attribute
    $ret = $html->find('a[title], img[title]');
TheStudent2023 Posted May 4, 2023
https://enb.iisd.org/_inc/simple_html_dom/manual/manual.htm#section_find

    // Find all anchors and images with the "title" attribute
    $ret = $html->find('a[title], img[title]');

I do not understand the comment. What does that code do? How can an anchor have a "title" attribute? Show me an example. The same goes for an img with a "title" attribute.
Edited May 4, 2023 by TheStudent2023
Barand Posted May 4, 2023
Believe it or not, it finds all images and anchors that have a "title" attribute.
9 minutes ago, TheStudent2023 said: How can an anchor have a "title" attribute?
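To make that concrete: `title` is the attribute browsers typically show as a tooltip, and it is valid on anchors and images alike. The sketch below uses made-up markup and DOMXPath rather than simple_html_dom, where `//a[@title] | //img[@title]` plays roughly the role of the `a[title], img[title]` selector.

```php
<?php
// Hypothetical markup: one anchor and one image carry a title attribute, the others do not
$html = <<<HTML
<a href="/help" title="Opens the help page">Help</a>
<a href="/plain">No tooltip</a>
<img src="logo.png" alt="Logo" title="Company logo">
<img src="photo.png" alt="Photo">
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// [@title] keeps only elements that have the attribute, whatever its value
$titles = [];
foreach ($xpath->query('//a[@title] | //img[@title]') as $node) {
    $titles[] = $node->nodeName . ': ' . $node->getAttribute('title');
}

print_r($titles);
```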
TheStudent2023 Posted May 4, 2023
@Barand Thanks! I forgot about that tooltip, and that it is called title. In fact, I wrote this code a few weeks ago:

    <?php
    function item_submission_form_part_one()
    {
    ?>
    <div style='font-family:verdana;font-size:15px;color:black;text-align:center;' name="item_submission_form" id="item_submission_form" align="center" size="50%">
    <form style="background-color:white;" method="POST" action="" name="submit_form_p1" id="submit_form_p1">
    <fieldset>
    <legend align="center"><h3 style="color:black;">Link Submission Form - Part 1/3</h3></legend>
    <label for="product_type">Product Type:</label>
    <select name="product_type" id="product_type" title="Select product type">
    <option value=""></option>
    <?php // Each option value must match the string it is compared against, or it is never re-selected ?>
    <option value="physical product" <?php if (!empty($_POST['product_type']) && $_POST['product_type'] == 'physical product') { $product_type = $_POST['product_type']; echo 'selected'; } ?>>Physical Product</option>
    <option value="intangible product" <?php if (!empty($_POST['product_type']) && $_POST['product_type'] == 'intangible product') { $product_type = $_POST['product_type']; echo 'selected'; } ?>>Intangible Product</option>
    <option value="service" <?php if (!empty($_POST['product_type']) && $_POST['product_type'] == 'service') { $product_type = $_POST['product_type']; echo 'selected'; } ?>>Service</option>
    </select>
    <br>
    Listing Type:
    <input type="radio" name="listing_type" id="wanted" title="Check listing type" value="wanted" <?php if (!empty($_POST['listing_type']) && $_POST['listing_type'] == 'wanted') { $listing_type = $_POST['listing_type']; echo 'checked'; } ?>>
    <label for="wanted">Wanted Item:</label>
    <input type="radio" name="listing_type" id="have" title="Check listing type" value="have" <?php if (!empty($_POST['listing_type']) && $_POST['listing_type'] == 'have') { $listing_type = $_POST['listing_type']; echo 'checked'; } ?>>
    <label for="have">Have Item:</label>
    <br>
    <?php // Submitted values are escaped with htmlspecialchars() before being echoed back into attributes ?>
    <label for="item">Item</label>
    <input type="text" name="item" id="item" size="50" minlength="2" maxlength="255" title="Input your item" <?php if (!empty($_POST['item'])) { $item = $_POST['item']; echo 'value="' . htmlspecialchars($item) . '"'; } else { echo 'placeholder="Item ..."'; } ?>>
    <br>
    <label for="manufacturer">Manufacturer</label>
    <input type="text" name="manufacturer" id="manufacturer" size="50" minlength="2" maxlength="255" title="Input your item manufacturer" <?php if (!empty($_POST['manufacturer'])) { $manufacturer = $_POST['manufacturer']; echo 'value="' . htmlspecialchars($manufacturer) . '"'; } else { echo 'placeholder="Manufacturer ..."'; } ?>>
    <br>
    <label for="brand">Brand</label>
    <input type="text" name="brand" id="brand" size="50" minlength="2" maxlength="255" title="Input your item brand" <?php if (!empty($_POST['brand'])) { $brand = $_POST['brand']; echo 'value="' . htmlspecialchars($brand) . '"'; } else { echo 'placeholder="Brand ..."'; } ?>>
    <br>
    <label for="model">Model</label>
    <input type="text" name="model" id="model" size="50" minlength="2" maxlength="255" title="Input your item model" <?php if (!empty($_POST['model'])) { $model = $_POST['model']; echo 'value="' . htmlspecialchars($model) . '"'; } else { echo 'placeholder="Model ..."'; } ?>>
    <br>
    <label for="serial_number">Serial Number</label>
    <input type="text" name="serial_number" id="serial_number" size="50" minlength="2" maxlength="255" title="Input your item serial_number" <?php if (!empty($_POST['serial_number'])) { $serial_number = $_POST['serial_number']; echo 'value="' . htmlspecialchars($serial_number) . '"'; } else { echo 'placeholder="Serial Number ..."'; } ?>>
    <br>
    <label for="year">Year</label>
    <input type="text" name="year" id="year" size="50" minlength="2" maxlength="255" title="Input your item year" <?php if (!empty($_POST['year'])) { $year = $_POST['year']; echo 'value="' . htmlspecialchars($year) . '"'; } else { echo 'placeholder="Year ..."'; } ?>>
    <br>
    <label for="currency">Currency</label>
    <input type="text" name="currency" id="currency" size="50" minlength="2" maxlength="255" title="Input your item currency" <?php if (!empty($_POST['currency'])) { $currency = $_POST['currency']; echo 'value="' . htmlspecialchars($currency) . '"'; } else { echo 'placeholder="Currency ..."'; } ?>>
    <br>
    <label for="price">Price</label>
    <input type="text" name="price" id="price" size="50" minlength="2" maxlength="255" title="Input your item price" <?php if (!empty($_POST['price'])) { $price = $_POST['price']; echo 'value="' . htmlspecialchars($price) . '"'; } else { echo 'placeholder="Price ..."'; } ?>>
    <br>
    <label for="title">Title</label>
    <input type="text" name="title" id="title" size="50" minlength="2" maxlength="255" title="Input your product page's title" <?php if (!empty($_POST['title'])) { $title = $_POST['title']; echo 'value="' . htmlspecialchars($title) . '"'; } else { echo 'placeholder="product page title"'; } ?>>
    <br>
    </fieldset>
    <fieldset>
    <button type="submit" name="submit_button_1" id="submit_button_1" title="Submit Form - Part 1/3">submit - Part 1/3</button>
    </fieldset>
    </form>
    </div>
    <?php
    }

Anyway, that DOM parser manual is not showing me how to extract meta tags and page titles. The closest match I found is this, under the attribute filters tab: "[attribute]: Matches elements that have the specified attribute." But that does not help with extracting meta tags and page titles. Do you know where in the parser manual it teaches what I am looking for? https://enb.iisd.org/_inc/simple_html_dom/manual/manual.htm#section_find
Or better, if you know, show me the code snippets, and show me where in the docs you found them. It must extract using the simple_html_dom parser. And can you check my DOMDocument code above? Thanks!
Edited May 4, 2023 by TheStudent2023
TheStudent2023 Posted May 4, 2023
@kicken Tonight, can you teach me how to read the simple_html_dom parser syntax? I want to extract the page title. On this page: https://stackoverflow.com/questions/11385774/how-to-extract-title-and-meta-description-using-php-simple-html-dom-parser
I found four programmers showing four different ways to code it using the simple_html_dom parser. Look:

    1. $meta_title = $html->find("meta[name='title']", 0)->content;
    2. $title = $html->find('title',0)->innertext;
    3. $title = array_shift($html->find('title'))->innertext;
    4. $title = $html->load('title')->simpletext; //<title>**Text from here**</title>

Q1. Where did they find these syntaxes in the parser's manual? I cannot find any of them! Check the mini doc: https://enb.iisd.org/_inc/simple_html_dom/manual/manual.htm#section_find
It seems I am not looking where they are looking, and I have to learn to look in the right direction. So I need your assistance again, I'm afraid.
Q2. From your experience, can you rank these four snippets, with the best on top? And let me know why you ranked them the way you did. This should teach me to spot good, effective coding practice. Thanks!
Edited May 4, 2023 by TheStudent2023
TheStudent2023 Posted May 6, 2023
In PHP OOP, what does this marker mean: -> ?
kicken Posted May 7, 2023
8 hours ago, TheStudent2023 said: In the php oop, what does this marker mean
See: Reference Guide: What does this symbol mean in PHP? (PHP Syntax)
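In short, `->` is PHP's object operator: it reaches a property or calls a method on an object instance, which is why it appears in every DOMDocument and simple_html_dom snippet in this thread. A tiny illustration, using a hypothetical `Page` class invented for this example:

```php
<?php
// Hypothetical class, only here to demonstrate the -> operator
class Page
{
    public $title = 'Example Page';

    public function shout()
    {
        // Inside a method, $this-> refers to the current object
        return strtoupper($this->title);
    }
}

$page = new Page();

echo $page->title, "\n";    // property access, prints "Example Page"
echo $page->shout(), "\n";  // method call, prints "EXAMPLE PAGE"
```

So `$doc->loadHTML($html)` reads as "call the `loadHTML` method on the object stored in `$doc`", and `$nodes->item(0)->nodeValue` chains a method call and a property access.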