
Extract values of attributes of elements with Simple HTML-DOM-Parser


dil_bert


Hello dear freaks,

I am trying to extract a few lines from a web page by reading the values of element attributes with Simple HTML DOM Parser.

Here is what I have gathered and learned so far:

I am trying to retrieve the contents of a div from an external site with PHP and XPath. Below is an excerpt from the page showing the relevant markup. Note: I also tried adding @ to the class and a at the end of my query; after that, I use saveHTML() to get the result. See my test:

Here is the example: view-source:https://wordpress.org/plugins/participants-database/

and https://wordpress.org/plugins/participants-database/

Goal: I need the following data:

Version:
Last updated:
Active installations:
Tested up to:





view-source:https://wordpress.org/plugins/participants-database/
 

<div class="entry-meta">
        <div class="widget plugin-meta">
        <h3 class="screen-reader-text">Meta</h3>

        <ul>
            
            <li>Version: <strong>1.7.7.6</strong></li>
            <li>
                Last updated: <strong><span>5 days</span> ago</strong>            </li>
            <li>Active installations: <strong>10,000+</strong></li>

                            <li>
                Requires WordPress Version:<strong>4.0</strong>                </li>
            
                            <li>Tested up to: <strong>4.9.4</strong></li>
            




Or here: view-source:https://wordpress.org/plugins/wp-job-manager/
 

</ul>
<p>See additional changelog items in changelog.txt</p></div>
    </div><!-- .entry-content -->

    <div class="entry-meta">
        <div class="widget plugin-meta">
        <h3 class="screen-reader-text">Meta</h3>

        <ul>
            
            <li>Version: <strong>1.29.3</strong></li>
            <li>
                Last updated: <strong><span>2 weeks</span> ago</strong>            </li>
            <li>Active installations: <strong>100,000+</strong></li>

                            <li>
                Requires WordPress Version:<strong>4.3.1</strong>                </li>
            
                            <li>Tested up to: <strong>4.9.4</strong></li>
            



Procedure: I checked the source of the web page and tried to find out whether the text follows some kind of pattern.
I looked closely and found that all of the values sit inside an element with class="widget plugin-meta".
That should make extracting them a piece of cake. I tried the code below, which is meant to filter HTML elements by the value of an attribute.
 

<?php
 
include('simple_html_dom');
$url = 'https://wordpress.org/plugins/wp-job-manager/';
$html = file_get_html($url);
$text = array();
foreach($html->find('a[class="widget plugin-meta"]') as $text) {
 $text[] = $text->plaintext;
}
print_r($headlines);

?>


Unfortunately, this gives a bad result.
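A few things in the snippet above would explain a bad result: the selector starts with `a`, which only matches anchor elements, while the excerpt shows the list inside a `div`; the loop reuses `$text` for both the array and the loop variable; and `print_r($headlines)` prints a variable that is never set. Since XPath and saveHTML() were mentioned, here is a minimal sketch of the same extraction with PHP's built-in DOM extension rather than Simple HTML DOM. The function name `extractPluginMeta` is hypothetical, and `$sample` is a tidied copy of the excerpt quoted above, not a live fetch.

```php
<?php
// Sample markup copied from the wp-job-manager excerpt (closing tags added).
$sample = <<<'HTML'
<div class="entry-meta">
    <div class="widget plugin-meta">
        <h3 class="screen-reader-text">Meta</h3>
        <ul>
            <li>Version: <strong>1.29.3</strong></li>
            <li>
                Last updated: <strong><span>2 weeks</span> ago</strong>            </li>
            <li>Active installations: <strong>100,000+</strong></li>
            <li>
                Requires WordPress Version:<strong>4.3.1</strong>                </li>
            <li>Tested up to: <strong>4.9.4</strong></li>
        </ul>
    </div>
</div>
HTML;

/**
 * Hypothetical helper: returns label => value pairs from the plugin-meta list.
 *
 * @param string $html
 * @return array
 */
function extractPluginMeta($html)
{
    $doc = new DOMDocument();
    // Real pages are rarely valid HTML; keep parser warnings out of the output.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Match the div by one of its class tokens, then every <li> that contains a <strong>.
    $items = $xpath->query(
        '//div[contains(concat(" ", normalize-space(@class), " "), " plugin-meta ")]//li[strong]'
    );

    $meta = array();
    foreach ($items as $li) {
        // Label: the first text node of the <li>, without the trailing colon.
        $label = rtrim($xpath->evaluate('normalize-space(text()[1])', $li), ':');
        // Value: the full text of the <strong>, including nested <span> text.
        $value = $xpath->evaluate('normalize-space(strong)', $li);
        $meta[$label] = $value;
    }
    return $meta;
}

print_r(extractPluginMeta($sample));
```

For a live page, the HTML string would come from something like file_get_contents($url) instead of `$sample`; whether the site's current markup still matches the excerpt is an assumption to check first.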
 


Hello dear dalescop,

Many thanks, I will look into this.

I will try this out:

<?php
 
include('simple_html_dom');
$url = 'https://wordpress.org/plugins/wp-job-manager/';
$html = file_get_html($url);
$text = array();
foreach($html->find('DIV[class="widget plugin-meta"]') as $text) {
 $text[] = $text->plaintext;
}
print_r($headlines);

?>

I will try this out later; at the moment I am in the office.

Greetings


Hello dear all,

With a quick try I got this back:

martin@linux-3645:~/dev/php> php p100.php

PHP Warning:  include(simple_html_dom): failed to open stream: No such file or directory in /home/martin/dev/php/p100.php on line 4
PHP Warning:  include(): Failed opening 'simple_html_dom' for inclusion (include_path='.:/usr/share/php5:/usr/share/php5/PEAR') in /home/martin/dev/php/p100.php on line 4
PHP Fatal error:  Call to undefined function file_get_html() in /home/martin/dev/php/p100.php on line 6
martin@linux-3645:~/dev/php>



I guess I have to take a closer look at what goes wrong here.
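Reading the log: the two warnings say PHP found no file named `simple_html_dom` in the include path, and the fatal error follows because file_get_html() is defined inside that file. As far as I know the library is distributed as a single file called `simple_html_dom.php`, so the include needs the `.php` extension and the file has to be on disk. A minimal sketch of a guarded include follows; the helper name `loadParser` and the location next to the script are assumptions.

```php
<?php
/**
 * Hypothetical helper: loads the parser file only if it is readable,
 * so a missing file is reported once instead of causing a fatal error later.
 *
 * @param string $path
 * @return bool
 */
function loadParser($path)
{
    if (!is_readable($path)) {
        return false;
    }
    require_once $path;
    // file_get_html() comes from the library, so its presence confirms the load.
    return function_exists('file_get_html');
}

// Assumption: simple_html_dom.php was downloaded into the same directory as this script.
$parserLoaded = loadParser(__DIR__ . '/simple_html_dom.php');
if (!$parserLoaded) {
    echo "simple_html_dom.php is missing; download it before calling file_get_html().\n";
}
```

Using an absolute path built from `__DIR__` avoids depending on the include_path shown in the warning.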

Output: nothing; the result is empty.

Background:

My way of getting the XPath is to use Google Chrome. These are the pages I want to get data from:

https://wordpress.org/plugins/wp-job-manager/
https://wordpress.org/plugins/participants-database/
https://wordpress.org/plugins/amazon-link/
https://wordpress.org/plugins/simple-membership/
https://wordpress.org/plugins/scrapeazon/

Goal: I need the following data:

Version:
Last updated:
Active installations:
Tested up to:

See, for example, view-source:https://wordpress.org/plugins/wp-job-manager/

Version: 1.29.3
Last updated: 5 days ago
Active installations: 100,000+

I want to have a small database that runs locally, holding this data for my favorite plugins.

So I want to fetch the data automatically with a cron job.
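The plan above (several plugin pages, four fields each, a local database, a scheduled run) could be sketched roughly as follows. Everything specific here is an assumption: SQLite through PDO as the local database, the table name `plugin_meta`, and the helper names `pluginSlug`, `field`, `openStore` and `updateStore`. The step that downloads and parses a page is passed in as a callable, so the storage part can be tried without network access.

```php
<?php
// Plugin pages listed in the post.
$urls = array(
    'https://wordpress.org/plugins/wp-job-manager/',
    'https://wordpress.org/plugins/participants-database/',
    'https://wordpress.org/plugins/amazon-link/',
    'https://wordpress.org/plugins/simple-membership/',
    'https://wordpress.org/plugins/scrapeazon/',
);

/** Hypothetical helper: derives the plugin slug from its URL. */
function pluginSlug($url)
{
    return basename(rtrim(parse_url($url, PHP_URL_PATH), '/'));
}

/** Hypothetical helper: returns a field if the page provided it, else null. */
function field(array $meta, $label)
{
    return isset($meta[$label]) ? $meta[$label] : null;
}

/** Hypothetical helper: opens the database and creates the table if needed. */
function openStore($dsn)
{
    $pdo = new PDO($dsn);
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $pdo->exec(
        'CREATE TABLE IF NOT EXISTS plugin_meta (
            slug TEXT PRIMARY KEY,
            version TEXT,
            last_updated TEXT,
            active_installations TEXT,
            tested_up_to TEXT,
            fetched_at TEXT
        )'
    );
    return $pdo;
}

/**
 * Inserts or refreshes one row per plugin and returns the number of rows written.
 * $fetchMeta takes a URL and returns label => value pairs such as 'Version' => '1.29.3'.
 */
function updateStore(PDO $pdo, array $urls, $fetchMeta)
{
    $stmt = $pdo->prepare(
        'INSERT OR REPLACE INTO plugin_meta
         VALUES (:slug, :version, :updated, :installs, :tested, :fetched)'
    );
    $count = 0;
    foreach ($urls as $url) {
        $meta = call_user_func($fetchMeta, $url);
        $stmt->execute(array(
            ':slug'     => pluginSlug($url),
            ':version'  => field($meta, 'Version'),
            ':updated'  => field($meta, 'Last updated'),
            ':installs' => field($meta, 'Active installations'),
            ':tested'   => field($meta, 'Tested up to'),
            ':fetched'  => gmdate('c'),
        ));
        $count++;
    }
    return $count;
}
```

A script that calls `updateStore(openStore('sqlite:/path/to/plugins.sqlite'), $urls, $yourFetchFunction)` could then be scheduled with a crontab entry along the lines of `0 6 * * * php /path/to/update.php`; both paths and the schedule are placeholders. The slug is the primary key so that repeated runs refresh a plugin's row instead of piling up duplicates.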

