dil_bert

Extract values of attributes of elements with Simple HTML-DOM-Parser


hello dear freaks,

 

 

I am trying to extract a few lines from a web page by filtering elements on their attribute values with Simple HTML DOM Parser.

Here is what I have gathered and learned so far:

I want to retrieve the contents of a div from an external site with PHP and XPath. Below is an excerpt from the page showing the relevant markup. Note: I tried several variants of the query, including adding @ in front of class and an a at the end. After that, I use saveHTML() to get the result. See my test further down.

Here is the example page: https://wordpress.org/plugins/participants-database/

and its source: view-source:https://wordpress.org/plugins/participants-database/

Goal: I need the following data:

Version:
Last updated:
Active installations:
Tested up to:





view-source:https://wordpress.org/plugins/participants-database/
 

<div class="entry-meta">
        <div class="widget plugin-meta">
        <h3 class="screen-reader-text">Meta</h3>

        <ul>
            
            <li>Version: <strong>1.7.7.6</strong></li>
            <li>
                Last updated: <strong><span>5 days</span> ago</strong>            </li>
            <li>Active installations: <strong>10,000+</strong></li>

                            <li>
                Requires WordPress Version:<strong>4.0</strong>                </li>
            
                            <li>Tested up to: <strong>4.9.4</strong></li>
            




Or here: view-source:https://wordpress.org/plugins/wp-job-manager/
 

</ul>
<p>See additional changelog items in changelog.txt</p></div>
    </div><!-- .entry-content -->

    <div class="entry-meta">
        <div class="widget plugin-meta">
        <h3 class="screen-reader-text">Meta</h3>

        <ul>
            
            <li>Version: <strong>1.29.3</strong></li>
            <li>
                Last updated: <strong><span>2 weeks</span> ago</strong>            </li>
            <li>Active installations: <strong>100,000+</strong></li>

                            <li>
                Requires WordPress Version:<strong>4.3.1</strong>                </li>
            
                            <li>Tested up to: <strong>4.9.4</strong></li>
            



Procedure: I checked the source of the web page and tried to find out whether the text follows some kind of pattern.
I looked closely and found that in all cases the data sits inside an element with class="widget plugin-meta".
That should make extracting it a piece of cake. I tried the code below, which is meant to filter HTML elements based on the values of their attributes.
 

<?php
 
include('simple_html_dom');
$url = 'https://wordpress.org/plugins/wp-job-manager/';
$html = file_get_html($url);
$text = array();
foreach($html->find('a[class="widget plugin-meta"]') as $text) {
 $text[] = $text->plaintext;
}
print_r($headlines);

?>


But unfortunately this does not return a usable result.
 


hello dear dalescop,

 

 

Many thanks, I will look into this.

I will try this out:

<?php
 
include('simple_html_dom');
$url = 'https://wordpress.org/plugins/wp-job-manager/';
$html = file_get_html($url);
$text = array();
foreach($html->find('DIV[class="widget plugin-meta"]') as $text) {
 $text[] = $text->plaintext;
}
print_r($headlines);

?>

I will try this out later. At the moment I am at the office.

 

greetings


hello dear all

 

With a quick try I got this back:

martin@linux-3645:~/dev/php> php p100.php

PHP Warning:  include(simple_html_dom): failed to open stream: No such file or directory in /home/martin/dev/php/p100.php on line 4
PHP Warning:  include(): Failed opening 'simple_html_dom' for inclusion (include_path='.:/usr/share/php5:/usr/share/php5/PEAR') in /home/martin/dev/php/p100.php on line 4
PHP Fatal error:  Call to undefined function file_get_html() in /home/martin/dev/php/p100.php on line 6
martin@linux-3645:~/dev/php>



I guess I have to take a closer look at what goes wrong here.

 

 

 

Output: so far I get no output at all.

Background: I use Google Chrome to find the XPath. These are the web pages I want to get data from:

https://wordpress.org/plugins/wp-job-manager/
https://wordpress.org/plugins/participants-database/
https://wordpress.org/plugins/amazon-link/
https://wordpress.org/plugins/simple-membership/
https://wordpress.org/plugins/scrapeazon/

Goal: I need the following data:

Version:
Last updated:
Active installations:
Tested up to:

See for example the following values from view-source:https://wordpress.org/plugins/wp-job-manager/

Version: 1.29.3
Last updated: 2 weeks ago
Active installations: 100,000+
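
As a rough illustration of the goal above, here is a minimal sketch that pulls these label/value pairs out of the markup. It is not the Simple HTML DOM approach from this thread: it swaps in PHP's built-in DOMDocument and DOMXPath so that it runs without any extra file. The sample markup is a trimmed copy of the wp-job-manager excerpt posted earlier, and the function name extractPluginMeta is made up for this example.

```php
<?php
// Sample markup: trimmed copy of the wp-job-manager excerpt from this thread.
// On a live page you would fetch the HTML first; that step is left out here.
$html = <<<'HTML'
<div class="entry-meta">
  <div class="widget plugin-meta">
    <h3 class="screen-reader-text">Meta</h3>
    <ul>
      <li>Version: <strong>1.29.3</strong></li>
      <li>
        Last updated: <strong><span>2 weeks</span> ago</strong>      </li>
      <li>Active installations: <strong>100,000+</strong></li>
      <li>
        Requires WordPress Version:<strong>4.3.1</strong>      </li>
      <li>Tested up to: <strong>4.9.4</strong></li>
    </ul>
  </div>
</div>
HTML;

// Hypothetical helper: returns an array of label => value pairs.
function extractPluginMeta($html)
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match any <li> below a <div> whose class list contains "plugin-meta".
    $query = '//div[contains(concat(" ", normalize-space(@class), " "), " plugin-meta ")]//li';

    $meta = array();
    foreach ($xpath->query($query) as $li) {
        $strong = $xpath->query('.//strong', $li)->item(0);
        if ($strong === null) {
            continue; // no value element in this list item
        }
        // Collapse runs of whitespace so multi-line items become one line.
        $text  = trim(preg_replace('/\s+/', ' ', $li->textContent));
        $value = trim(preg_replace('/\s+/', ' ', $strong->textContent));
        $pos   = strpos($text, ':');
        if ($pos === false) {
            continue; // no "Label:" prefix, skip
        }
        $label = trim(substr($text, 0, $pos));
        $meta[$label] = $value;
    }
    return $meta;
}

$meta = extractPluginMeta($html);
print_r($meta);
```

The div is matched on the single class plugin-meta, not on the full attribute string, so the query should keep working if the order of the classes changes. Whether the live wordpress.org pages still use this markup is something you would have to check in the page source.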

 

 

I want to have a little database that runs locally and holds this data for my favorite plugins.

So I want to fetch the data automatically with a cron job.


PHP can't find simple_html_dom because the include is missing the .php file extension.

It should probably be

include('simple_html_dom.php');

If the file is not in the same folder as your p100.php, you'll need to use the relative path to it.
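
One way to make the path independent of the directory you start the script from is to build it from __DIR__. This is only a sketch: it assumes simple_html_dom.php sits right next to your script, which may not match your setup.

```php
<?php
// Assumption: simple_html_dom.php lives in the same folder as this script.
// If it lives elsewhere, adjust the relative part,
// e.g. __DIR__ . '/lib/simple_html_dom.php' (hypothetical folder name).
$library = __DIR__ . '/simple_html_dom.php';

if (is_file($library)) {
    // The path is absolute, so include_path is not consulted.
    require_once $library;
    $loaded = function_exists('file_get_html');
} else {
    $loaded = false;
    echo "Cannot find $library\n";
}
```

With require_once the script stops with a fatal error right at this line if the file cannot be loaded, which is easier to spot than the later "undefined function file_get_html()" error.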
