dil_bert
Posted February 7, 2018

Hello dear PHP-Freaks,

for a little program I want to fetch data about various WordPress plugins. To be concrete, it is about 50 plugins, each of which has its own page (see below). The following data are needed: "Version", "Active installations" and "Tested up to".

Question: I can use simplehtmldom or BS4 - which solution is more appropriate?

The project: a list of WordPress plugins, approx. 50 of which are of interest:

https://wordpress.org/plugins/wp-job-manager
https://wordpress.org/plugins/ninja-forms
https://wordpress.org/plugins/participants-database

and so on and so forth.

These plugins are listed in my favorites, so if I create a login with BS4, I can log in and parse all those favorite pages.

The first approach: alternatively, I can loop through a set of URLs to fetch all the necessary pages.

I need the data of the following three lines - in the above-mentioned example: https://wordpress.or.../wp-job-manager

Version: <strong>1.29.3</strong>
Active installations: <strong>100,000+</strong>
Tested up to: <strong>4.9.4</strong>

Possible solutions: we can solve this task with methods other than BeautifulSoup alone, for example with BS plus regular expressions. Assuming we are able to do this with a regular expression, we need to locate the script tag in the HTML. The idea is to define a regular expression that would be used both for locating the element with BeautifulSoup and for extracting the above-mentioned text.

But I guess that we can also do this with a DOM parser, cf. http://simplehtmldom.sourceforge.net/manual.htm

    // Create DOM from URL or file
    $html = file_get_html('http://www.google.com/');

    // Find all images
    foreach($html->find('img') as $element)
        echo $element->src . '<br>';

    // Find all links
    foreach($html->find('a') as $element)
        echo $element->href . '<br>';

How to create HTML DOM object?

    // Create a DOM object from a string
    $html = str_get_html('<html><body>Hello!</body></html>');

    // Create a DOM object from a URL
    $html = file_get_html('http://www.google.com/');

    // Create a DOM object from a HTML file
    $html = file_get_html('test.htm');

How to access the HTML element's attributes?

    // Find all anchors, returns an array of element objects
    $ret = $html->find('a');

    // Find (N)th anchor, returns element object or null if not found (zero based)
    $ret = $html->find('a', 0);

    // Find last anchor, returns element object or null if not found (zero based)
    $ret = $html->find('a', -1);

    // Find all <div> with the id attribute
    $ret = $html->find('div[id]');

    // Find all <div> whose attribute id=foo
    $ret = $html->find('div[id=foo]');

How to traverse the DOM tree?

    // Example
    echo $html->find("#div1", 0)->children(1)->children(1)->children(2)->id;
    // or
    echo $html->getElementById("div1")->childNodes(1)->childNodes(1)->childNodes(2)->getAttribute('id');

and ....

    function my_callback($element) {
        // Hide all <b> tags
        if ($element->tag=='b')
            $element->outertext = '';
    }

    // Register the callback function with its function name
    $html->set_callback('my_callback');

    // Callback function will be invoked while dumping
    echo $html;
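For what it's worth, the regular-expression idea mentioned in the post can be sketched in plain Python without BS4. This is only a rough sketch: it assumes the page markup really looks like the three quoted lines (a label, a colon, then the value inside <strong>), which may not hold for the live wordpress.org pages. The function name `extract_plugin_fields` and the `sample` string are made up for illustration.

```python
import re

# Labels taken from the post. The "<label>: <strong>value</strong>" layout is an
# assumption based on the quoted snippet, not on the live page markup.
FIELDS = ("Version", "Active installations", "Tested up to")


def extract_plugin_fields(html: str) -> dict:
    """Return the text inside <strong> after each label, or None if not found."""
    result = {}
    for label in FIELDS:
        # Match e.g. "Version: <strong>1.29.3</strong>", tolerating extra whitespace.
        pattern = re.escape(label) + r":\s*<strong>\s*([^<]*?)\s*</strong>"
        match = re.search(pattern, html, flags=re.IGNORECASE)
        result[label] = match.group(1) if match else None
    return result


# Hypothetical input: the three lines quoted in the post, joined together.
sample = (
    "Version: <strong>1.29.3</strong>"
    "Active installations: <strong>100,000+</strong>"
    "Tested up to: <strong>4.9.4</strong>"
)
fields = extract_plugin_fields(sample)
print(fields)
```

To cover the whole list, one would fetch each plugin URL in a loop and pass the returned HTML to the same function. A missing label yields None rather than an exception, so a single page with different markup does not abort the run.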