dil_bert Posted February 14, 2018

Hello dear freaks,

I am trying to extract some lines from a web page by extracting the values of attributes of elements with the Simple HTML DOM Parser. Here is what I have gathered and learned so far.

I want to retrieve the contents of a div from an external site with PHP and XPath. Note: I tried several variants of the query, including adding @ before class and an a at the end; after that, I use saveHTML() to get the result. See my test below.

Here is the example: view-source:https://wordpress.org/plugins/participants-database/ and https://wordpress.org/plugins/participants-database/

Goal: I need the following data:

Version:
Last updated:
Active installations:
Tested up to:

This is an excerpt from the page, showing the relevant code (view-source:https://wordpress.org/plugins/participants-database/):

<div class="entry-meta">
<div class="widget plugin-meta">
<h3 class="screen-reader-text">Meta</h3>
<ul>
<li>Version: <strong>1.7.7.6</strong></li>
<li> Last updated: <strong><span>5 days</span> ago</strong> </li>
<li>Active installations: <strong>10,000+</strong></li>
<li> Requires WordPress Version:<strong>4.0</strong> </li>
<li>Tested up to: <strong>4.9.4</strong></li>

Or here (view-source:https://wordpress.org/plugins/wp-job-manager/):

</ul>
<p>See additional changelog items in changelog.txt</p></div>
</div><!-- .entry-content -->
<div class="entry-meta">
<div class="widget plugin-meta">
<h3 class="screen-reader-text">Meta</h3>
<ul>
<li>Version: <strong>1.29.3</strong></li>
<li> Last updated: <strong><span>2 weeks</span> ago</strong> </li>
<li>Active installations: <strong>100,000+</strong></li>
<li> Requires WordPress Version:<strong>4.3.1</strong> </li>
<li>Tested up to: <strong>4.9.4</strong></li>

Procedure: I checked the source of the web page.
I tried to find out whether the text follows some kind of pattern. I looked closely and found that all of the items sit inside an element with class="widget plugin-meta". That should make extracting them a piece of cake. I tried the code below, which is meant to filter HTML elements based on the values of their attributes:

<?php
include('simple_html_dom');
$url = 'https://wordpress.org/plugins/wp-job-manager/';
$html = file_get_html($url);
$text = array();
foreach($html->find('a[class="widget plugin-meta"]') as $text) {
    $text[] = $text->plaintext;
}
print_r($headlines);
?>

Unfortunately, this ends up in a bad result.

Link to comment: https://forums.phpfreaks.com/topic/306539-extract-values-of-attributes-of-elements-with-simple-html-dom-parser/
dalecosp Posted February 15, 2018

Looks to me like the only element with a plugin-meta class attribute is a DIV, not an A. You might start there?
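The point about the element type can be checked offline. The following is a minimal sketch that uses PHP's bundled DOM extension (DOMDocument and DOMXPath) rather than Simple HTML DOM, run against a shortened copy of the excerpt from the first post; the variable names are illustrative only.

```php
<?php
// Shortened copy of the markup quoted in the first post.
$html = '<div class="entry-meta"><div class="widget plugin-meta">'
      . '<h3 class="screen-reader-text">Meta</h3><ul>'
      . '<li>Version: <strong>1.29.3</strong></li>'
      . '<li>Tested up to: <strong>4.9.4</strong></li>'
      . '</ul></div></div>';

$doc = new DOMDocument();
// @ hides parser warnings that real-world pages often trigger.
@$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Same class value, two different element names.
$anchors = $xpath->query('//a[@class="widget plugin-meta"]');
$divs    = $xpath->query('//div[@class="widget plugin-meta"]');

echo "a matches:   ", $anchors->length, "\n";
echo "div matches: ", $divs->length, "\n";
```

With this markup the a query finds nothing and the div query finds one element, which is consistent with the empty result reported in the question.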
dil_bert (Author) Posted February 16, 2018

Hello dear dalecosp,

many thanks, I will look into this. I will try the following:

<?php
include('simple_html_dom');
$url = 'https://wordpress.org/plugins/wp-job-manager/';
$html = file_get_html($url);
$text = array();
foreach($html->find('DIV[class="widget plugin-meta"]') as $text) {
    $text[] = $text->plaintext;
}
print_r($headlines);
?>

I will try this out later; at the moment I am in the office.

Greetings
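One thing worth checking in the snippet above, independent of the selector: the loop variable is also named $text, so each iteration overwrites the $text array, and print_r() is given $headlines, which is never set. Below is a minimal sketch of the same loop shape with distinct names. It uses the bundled DOM extension on inline markup instead of Simple HTML DOM (an assumption made so that it runs without the library); $lines and $node are illustrative names.

```php
<?php
$html = '<div class="widget plugin-meta"><ul>'
      . '<li>Version: <strong>1.29.3</strong></li>'
      . '<li>Active installations: <strong>100,000+</strong></li>'
      . '</ul></div>';

$doc = new DOMDocument();
@$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$lines = array();   // result array, never reused as the loop variable
foreach ($xpath->query('//div[@class="widget plugin-meta"]//li') as $node) {
    // $node is the current <li>; its text is appended to $lines.
    $lines[] = trim($node->textContent);
}

print_r($lines);    // print the array that was actually filled
```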
dil_bert (Author) Posted February 16, 2018

Hello dear all,

with a quick try I got this back:

martin@linux-3645:~/dev/php> php p100.php
PHP Warning: include(simple_html_dom): failed to open stream: No such file or directory in /home/martin/dev/php/p100.php on line 4
PHP Warning: include(): Failed opening 'simple_html_dom' for inclusion (include_path='.:/usr/share/php5:/usr/share/php5/PEAR') in /home/martin/dev/php/p100.php on line 4
PHP Fatal error: Call to undefined function file_get_html() in /home/martin/dev/php/p100.php on line 6
martin@linux-3645:~/dev/php>

I guess I have to take a closer look at what goes wrong here. So far the output is empty.

Background: I use Google Chrome to get the XPath. These are the web pages I want to get data from:

https://wordpress.org/plugins/wp-job-manager/
https://wordpress.org/plugins/participants-database/
https://wordpress.org/plugins/amazon-link/
https://wordpress.org/plugins/simple-membership/
https://wordpress.org/plugins/scrapeazon/

Goal: I need the following data:

Version:
Last updated:
Active installations:
Tested up to:

See for example view-source:https://wordpress.org/plugins/wp-job-manager/

Version: 1.29.3
Last updated: 5 days ago
Active installations: 100,000+

I want to have a small database that runs locally with this data for my favorite plugins, so I want to fetch the data automatically with a cron job.
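For the stated goal, each li in the posted excerpt carries a label followed by a strong element holding the value. A minimal sketch of turning that list into label => value pairs, assuming the layout matches the excerpt and using the bundled DOM extension on inline markup instead of a live request; parse_plugin_meta() is a hypothetical helper name, not part of any library.

```php
<?php
// Hypothetical helper: turn the plugin-meta list into label => value pairs.
function parse_plugin_meta($html)
{
    $doc = new DOMDocument();
    @$doc->loadHTML($html);   // @ hides warnings from imperfect markup
    $xpath = new DOMXPath($doc);

    $meta = array();
    foreach ($xpath->query('//div[contains(@class, "plugin-meta")]//li') as $li) {
        $strong = $xpath->query('.//strong', $li)->item(0);
        $before = strstr($li->textContent, ':', true);   // text before first colon
        if ($strong === null || $before === false) {
            continue;         // skip items that do not follow the label/value shape
        }
        // Collapse whitespace so nested tags read as one value, e.g. "2 weeks ago".
        $value = trim(preg_replace('/\s+/', ' ', $strong->textContent));
        $meta[trim($before)] = $value;
    }
    return $meta;
}

// Inline sample based on the wp-job-manager excerpt from the first post.
$sample = '<div class="widget plugin-meta"><ul>'
        . '<li>Version: <strong>1.29.3</strong></li>'
        . '<li> Last updated: <strong><span>2 weeks</span> ago</strong> </li>'
        . '<li>Active installations: <strong>100,000+</strong></li>'
        . '<li>Tested up to: <strong>4.9.4</strong></li>'
        . '</ul></div>';

$meta = parse_plugin_meta($sample);
print_r($meta);
```

For a live page, the string passed in would be the fetched HTML, for example from file_get_contents() if allow_url_fopen is enabled. The markup served by wordpress.org may differ from the 2018 excerpt, so the query would need to be checked against the current source.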
taquitosensei Posted February 16, 2018

PHP can't find simple_html_dom because the .php file extension is missing. It should probably be include('simple_html_dom.php'); and if that file is not in the same folder as your p100.php, you'll need to use the relative path to it.
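The suggestion above can be sketched as follows, assuming the library file is named simple_html_dom.php and sits next to the script (both are assumptions about the local setup). __DIR__ makes the path relative to the script's own folder rather than the current working directory, and the is_file() check gives a clearer message than the later fatal error on file_get_html().

```php
<?php
// Assumed location: same folder as this script. Adjust the relative path if not.
$library = __DIR__ . '/simple_html_dom.php';

if (is_file($library)) {
    require_once $library;   // quoted file name, including the .php extension
    $loaded = function_exists('file_get_html');
} else {
    $loaded = false;
    echo "Cannot find $library\n";
}

echo $loaded ? "Simple HTML DOM loaded\n" : "Simple HTML DOM not loaded\n";
```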