Extract values of attributes of elements with Simple HTML-DOM-Parser

dil_bert · February 14, 2018

hello dear freaks,

I am trying to extract some lines from a webpage by extracting the values of attributes of elements with Simple HTML-DOM-Parser.

Here is what I have gathered and learned so far:

I am trying to retrieve the contents of a div from an external site with PHP and XPath. Below is an excerpt from the page showing the relevant markup. Note: I tried adding @ on the class and a at the end of my query; after that, I use saveHTML() to get the result. See my test:

Here the example; view-source:https://wordpress.org/plugins/participants-database/

and https://wordpress.org/plugins/participants-database/

Goal: I need the following data:

Version:
Last updated:
Active installations:
Tested up to:

view-source:https://wordpress.org/plugins/participants-database/

<div class="entry-meta">
        <div class="widget plugin-meta">
        <h3 class="screen-reader-text">Meta</h3>

        <ul>
            
            <li>Version: <strong>1.7.7.6</strong></li>
            <li>
                Last updated: <strong><span>5 days</span> ago</strong>            </li>
            <li>Active installations: <strong>10,000+</strong></li>

                            <li>
                Requires WordPress Version:<strong>4.0</strong>                </li>
            
                            <li>Tested up to: <strong>4.9.4</strong></li>

or here : view-source:https://wordpress.org/plugins/wp-job-manager/

</ul>
<p>See additional changelog items in changelog.txt</p></div>
    </div><!-- .entry-content -->

    <div class="entry-meta">
        <div class="widget plugin-meta">
        <h3 class="screen-reader-text">Meta</h3>

        <ul>
            
            <li>Version: <strong>1.29.3</strong></li>
            <li>
                Last updated: <strong><span>2 weeks</span> ago</strong>            </li>
            <li>Active installations: <strong>100,000+</strong></li>

                            <li>
                Requires WordPress Version:<strong>4.3.1</strong>                </li>
            
                            <li>Tested up to: <strong>4.9.4</strong></li>

What I did: I checked the source of the webpage and tried to find out whether the text follows some kind of pattern.
I looked closely and found that all of the items sit inside an element with class="widget plugin-meta".
That should make extracting them a piece of cake. I tried the code below, which is meant to filter HTML elements based on the values of their attributes.

<?php

include('simple_html_dom');
$url = 'https://wordpress.org/plugins/wp-job-manager/';
$html = file_get_html($url);
$text = array();
// Collect the plain text of every matching element.
foreach ($html->find('a[class="widget plugin-meta"]') as $element) {
    $text[] = $element->plaintext;
}
print_r($text);

?>

But unfortunately this ends up with a bad result.

dalecosp · February 15, 2018

Looks to me like the only element with a plugin-meta class attribute is a DIV, not an A. You might start there?
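
A minimal sketch of that idea, in case it helps. Note that it swaps in PHP's built-in DOMDocument and DOMXPath instead of Simple HTML DOM, so it runs without an extra library. The sample markup is copied from the excerpt quoted above; the helper name extractPluginMeta() is made up for illustration, and the live page may differ from the excerpt.

```php
<?php
// Sample markup copied from the excerpt quoted earlier in the thread.
$sample = <<<'HTML'
<div class="entry-meta">
  <div class="widget plugin-meta">
    <h3 class="screen-reader-text">Meta</h3>
    <ul>
      <li>Version: <strong>1.29.3</strong></li>
      <li>
        Last updated: <strong><span>2 weeks</span> ago</strong>      </li>
      <li>Active installations: <strong>100,000+</strong></li>
      <li>
        Requires WordPress Version:<strong>4.3.1</strong>      </li>
      <li>Tested up to: <strong>4.9.4</strong></li>
    </ul>
  </div>
</div>
HTML;

/**
 * Hypothetical helper: returns label => value pairs from the plugin-meta list.
 */
function extractPluginMeta(string $html): array
{
    $doc = new DOMDocument();
    // Real pages are rarely valid HTML; keep parser warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match a DIV (not an A) whose class list contains "plugin-meta".
    $query = '//div[contains(concat(" ", normalize-space(@class), " "), " plugin-meta ")]//li';

    $meta = [];
    foreach ($xpath->query($query) as $li) {
        // Collapse the whitespace the page puts around each item.
        $line = trim(preg_replace('/\s+/', ' ', $li->textContent));
        // Split "Label: value" at the first colon only.
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            $meta[trim($parts[0])] = trim($parts[1]);
        }
    }
    return $meta;
}

print_r(extractPluginMeta($sample));
```

To try it on the live page, the sample string could be replaced with the result of file_get_contents($url), assuming allow_url_fopen is enabled on your setup.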

dil_bert · February 16, 2018

hello dear dalecosp,

many thanks - I will look into this.

I will try this out:

<?php

include('simple_html_dom');
$url = 'https://wordpress.org/plugins/wp-job-manager/';
$html = file_get_html($url);
$text = array();
// Collect the plain text of every matching element.
foreach ($html->find('div[class="widget plugin-meta"]') as $element) {
    $text[] = $element->plaintext;
}
print_r($text);

?>

I will try this out later - at the moment I am in the office.

greetings

dil_bert · February 16, 2018

hello dear all,

after a quick try I got this back:

martin@linux-3645:~/dev/php> php p100.php

PHP Warning:  include(simple_html_dom): failed to open stream: No such file or directory in /home/martin/dev/php/p100.php on line 4
PHP Warning:  include(): Failed opening 'simple_html_dom' for inclusion (include_path='.:/usr/share/php5:/usr/share/php5/PEAR') in /home/martin/dev/php/p100.php on line 4
PHP Fatal error:  Call to undefined function file_get_html() in /home/martin/dev/php/p100.php on line 6
martin@linux-3645:~/dev/php>

I guess I have to take a closer look at what goes wrong here.

Output: the output is empty.

Background: I use Google Chrome to get the XPath. These are the pages I want to get data from:

https://wordpress.org/plugins/wp-job-manager/
https://wordpress.org/plugins/participants-database/
https://wordpress.org/plugins/amazon-link/
https://wordpress.org/plugins/simple-membership/
https://wordpress.org/plugins/scrapeazon/

Goal: I need the following data:

Version:
Last updated:
Active installations:
Tested up to:

See, for example, view-source:https://wordpress.org/plugins/wp-job-manager/

Version: 1.29.3
Last updated: 5 days ago
Active installations: 100,000+

I want to have a little database that runs locally and holds this data for my favorite plugins.

So I want to fetch the data automatically with a cron job.
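
A rough sketch of what the storage side might look like. Everything here is an assumption: SQLite through PDO (needs the pdo_sqlite extension) is just one possible local database, the table and column names are made up, and $meta is assumed to be an array of label => value pairs using the labels listed above.

```php
<?php
// Assumption: SQLite via PDO; the database choice is not fixed yet.
function openPluginDb(string $dsn): PDO
{
    $pdo = new PDO($dsn);
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    // Hypothetical schema: one row per plugin, keyed by its slug.
    $pdo->exec(
        'CREATE TABLE IF NOT EXISTS plugin_meta (
            slug TEXT PRIMARY KEY,
            version TEXT,
            last_updated TEXT,
            active_installations TEXT,
            tested_up_to TEXT,
            fetched_at TEXT
        )'
    );
    return $pdo;
}

function savePluginMeta(PDO $pdo, string $slug, array $meta): void
{
    // INSERT OR REPLACE overwrites the old row, so repeated runs keep one row per plugin.
    $stmt = $pdo->prepare(
        'INSERT OR REPLACE INTO plugin_meta
            (slug, version, last_updated, active_installations, tested_up_to, fetched_at)
         VALUES (:slug, :version, :last_updated, :active, :tested, :fetched_at)'
    );
    $stmt->execute([
        ':slug'         => $slug,
        ':version'      => $meta['Version'] ?? null,
        ':last_updated' => $meta['Last updated'] ?? null,
        ':active'       => $meta['Active installations'] ?? null,
        ':tested'       => $meta['Tested up to'] ?? null,
        ':fetched_at'   => gmdate('c'),
    ]);
}

// In-memory database for a quick try; a file DSN would persist between runs.
$pdo = openPluginDb('sqlite::memory:');
savePluginMeta($pdo, 'wp-job-manager', [
    'Version'              => '1.29.3',
    'Last updated'         => '5 days ago',
    'Active installations' => '100,000+',
]);
```

A cron entry would then only need to call the script with the PHP command-line binary at whatever interval suits; the exact path and schedule depend on the machine.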

taquitosensei · February 16, 2018

PHP can't find simple_html_dom: the include is missing the .php file extension.

It should probably be

include('simple_html_dom.php');

If the file is not in the same folder as your p100.php, you'll need to use the relative path to it.
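
A small sketch of one way to make the path independent of the directory you start the script from, assuming simple_html_dom.php sits next to p100.php. The helper name loadLibrary() is made up for illustration.

```php
<?php
// Hypothetical helper: load a library file that sits next to the current script.
function loadLibrary(string $fileName): bool
{
    // __DIR__ is the directory of this file, so the lookup does not depend on
    // the working directory or on include_path.
    $path = __DIR__ . DIRECTORY_SEPARATOR . $fileName;
    if (!is_file($path)) {
        return false;
    }
    require_once $path;
    return true;
}

if (!loadLibrary('simple_html_dom.php')) {
    echo "simple_html_dom.php was not found next to this script\n";
}
```

Checking first gives a clear message instead of the include warnings followed by the fatal "undefined function file_get_html()" error.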
