dil_bert Posted March 1, 2020

Hello dear PHP experts,

I am fairly new to simple_html_dom and its methods. I know the parser a little, and I want to gather some information from this site:

https://europa.eu/youth/volunteering/organisations_en#open

Is it possible to get the content of, let us say, the last 10 or 20 records on that page and subsequently store it in my MySQL database?

    <?php
    // Report all PHP errors (see changelog)
    error_reporting(E_ALL);

    include('inc/simple_html_dom.php');

    // base url
    $base = 'https://europa.eu/youth/volunteering/organisations_en#open';

    // home page HTML
    $html_base = file_get_html( $base );

    // get all category links
    foreach($html_base->find('a') as $element) {
        echo "<pre>";
        print_r( $element->href );
        echo "</pre>";
    }

    $html_base->clear();
    unset($html_base);
    ?>

I have the above code and I'm trying to get certain elements of the page, but it isn't returning anything. Is it possible that certain PHP functions are disabled on the server to prevent that? The above code works perfectly on other sites. Is there any workaround?

By the way, I have created a small snippet as a proof of concept to run this with Python and BeautifulSoup:

    import requests
    from bs4 import BeautifulSoup

    url = 'https://europa.eu/youth/volunteering/organisations_en#open'
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'lxml')
    print(soup.find('title').text)
    block = soup.find('div', class_="eyp-card block-is-flex")

and this is what it gives back:

    European Youth Portal
    >>> block.a
    <a href="/youth/volunteering/organisation/48592_en" target="_blank">"Academy for Peace and Development" Union</a>
    >>> block.a.text
    '"Academy for Peace and Development" Union'
    >>> block.select_one('div > div > p:nth-child(9)')
    <p><strong>PIC:</strong> 948417016</p>
    >>> block.select_one('div > div > p:nth-child(9)').text
    'PIC: 948417016'

What I am aiming for in the end: I want to gather the first 20 results of the page and put them into an SQL database, or alternatively show the information in a little widget.

Link to comment: https://forums.phpfreaks.com/topic/310162-simple_html_dom-simple-use-case-to-get-back-data-for-storing-in-sqlite-db/
gw1500se Posted March 1, 2020

It's possible that simple_html_dom is not installed on that server. Do you have error reporting turned on?
ginerjm Posted March 1, 2020

You set up which errors you wanted to see, but you didn't actually turn on displaying them in your client.

    error_reporting(E_ALL);
    ini_set('display_errors', '1');
dil_bert Posted March 2, 2020

Good day dear ginerjm, hello dear gw1500se,

many thanks for your replies. Great to hear from you.

12 hours ago, ginerjm said:

    You set up what errors you wanted to see but you didn't actually turn it on to see in your client.
    error_reporting(E_ALL);
    ini_set('display_errors', '1');

I did a quick test. I am travelling and do not have a prepared notebook with me, but I will do further tests within the next few days. This is what I got:

    WARNING error_reporting() has been disabled for security reasons on line number 3

    FATAL ERROR Uncaught Error: Call to undefined function file_get_html() in /home4/php/public_html/code.php70(5) : eval()'d code:11
    Stack trace:
    #0 /home4/php/public_html/code.php70(5): eval()
    #1 {main}
      thrown on line number 11

I will perform further tests in the next few hours. Many thanks for any and all hints with this task. I appreciate any hint or idea for testing this tiny little script.
dil_bert Posted March 2, 2020

Hi there, good day dear gw1500se and ginerjm,

after the first try I did another one, and I still get the following errors:

    Warning: include(inc/simple_html_dom.php): failed to open stream: No such file or directory in [...][...] on line 5

    Warning: include(): Failed opening 'inc/simple_html_dom.php' for inclusion (include_path='.:') in [...][...] on line 5

    Fatal error: Uncaught Error: Call to undefined function file_get_html() in [...][...]:11
    Stack trace:
    #0 {main}
      thrown in [...][...] on line 11

and this one:

    FATAL ERROR syntax error, unexpected '<', expecting end of file on line number 1

Hmm, I think I have to make some corrections. I have to investigate the target to find out what is missing and what I have to correct in my test code. I will come back later in the day.

Regards
dil_bert Posted March 2, 2020

I received those annoying errors on including files. I need more insight into parsing a DOM, and I have to check whether the included file exists.
dil_bert Posted March 2, 2020

Findings: as the error states, I think that simple_html_dom.php either doesn't exist or isn't in the right location. Furthermore, since file_get_html() is not a native PHP function, it resides within my included file, which cannot be found, so I need to fix that first. Checking the inc folder in the project root shows that the file does not exist. I am going to fix it.

So this is my project folder:

    /project
    /project/includes/

Question: does the above-mentioned file belong in that includes folder?

Finally I have it like this:

    C:\Users\Kasper\Documents\_mk_\_dev_\php\           -> here is my project file
    C:\Users\Kasper\Documents\_mk_\_dev_\php\includes   -> here goes the "simplehtmldom-parser" from https://sourceforge.net/projects/simplehtmldom/

I am now going to test-run the whole thing on my machine, using Atom. I will come back later in the day. Love to hear from you.

Regards

See the attached picture.

Edited March 2, 2020 by dil_bert
Barand Posted March 2, 2020

3 hours ago, dil_bert said:

    this is my project-folder: /project /project/includes/

So why would you expect to find it by defining a path starting with "inc/"?
dil_bert Posted March 3, 2020

4 hours ago, Barand said:

    so why would you expect to find it by defining a path starting with "inc/"?

Hello dear Barand, many thanks! Sure thing, you're right: I have a folder 'includes', whereas the script looks in 'inc'. Many thanks for this valuable hint! I am going to correct it.

Regards
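The inc/includes mismatch discussed above can be avoided by building the include path from the script's own location and checking it before including. The following is only a minimal sketch: it assumes the parser file sits at includes/simple_html_dom.php next to the script, and the helper name resolve_library_path() is made up for illustration.

```php
<?php
// Hypothetical helper: builds the absolute path to a library file inside a
// sub-folder of $baseDir and returns null when the file is not there.
function resolve_library_path(string $baseDir, string $folder, string $file): ?string
{
    $path = $baseDir . DIRECTORY_SEPARATOR . $folder . DIRECTORY_SEPARATOR . $file;
    return is_file($path) ? $path : null;
}

// Assumption: the parser was unpacked to <script folder>/includes/simple_html_dom.php
$library = resolve_library_path(__DIR__, 'includes', 'simple_html_dom.php');

if ($library === null) {
    // Report the problem up front instead of failing later on file_get_html()
    echo "simple_html_dom.php not found under " . __DIR__ . "/includes\n";
} else {
    require_once $library; // require_once stops with a fatal error if the include fails
    echo "Loaded parser from $library\n";
}
```

Using __DIR__ makes the path independent of the current working directory, which is what a relative path such as 'inc/...' silently depends on.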
dil_bert Posted March 6, 2020

Hi there, good day dear Barand, ginerjm and gw1500se, 😍

the "semantic" class is supposed to be "eyp-card".

    function get_eyp_cards_data() {
        $dom = new DOMDocument();
        $my_cards = array();

        // Real-world HTML is rarely well-formed; collect libxml warnings instead of printing them
        libxml_use_internal_errors(true);

        // loadHTMLFile() parses HTML (load() expects XML); returns true or false
        // https://www.php.net/manual/en/domdocument.loadhtmlfile.php
        if ( $dom->loadHTMLFile('https://europa.eu/youth/volunteering/organisations_en') ) {
            $domx = new DOMXPath($dom);

            // The leading // matches div elements anywhere in the document; returns DOMNodeList
            // https://www.php.net/manual/en/class.domnodelist.php
            $eyp_cards = $domx->query('//div[contains(@class,"eyp-card")]');

            if ( $eyp_cards->length > 0 ) { // length IS a property of DOMNodeList. Works but looks a bit JSy
                foreach ( $eyp_cards as $eyp_card ) {
                    // Debug: echo '<pre>', var_dump($eyp_card), '</pre>';
                    $title = $eyp_card->getElementsByTagName('h5')->item(0); // null if the card has no h5
                    $my_cards[] = array(
                        'title'   => $title ? trim($title->nodeValue) : '',
                        'content' => trim($eyp_card->textContent), // includes title
                    );
                }
            }
        }
        libxml_clear_errors();

        return !empty($my_cards) ? $my_cards : false;
    }

    $my_cards = get_eyp_cards_data();

Reference for the selector: https://stackoverflow.com/questions/1390568/how-can-i-match-on-an-attribute-that-contains-a-certain-string

Note: there are approximately 200 pages or more. I guess that I will rework this and enhance it to get some data stored in an SQL database.

Many thanks for all your feedback and your hints.
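For the storage step mentioned at the end of this post, the array returned by get_eyp_cards_data() could be written to SQLite through PDO. This is only a sketch under assumptions: the table name eyp_cards, its columns and the helper store_cards() are made up for illustration, the pdo_sqlite extension must be enabled, and the sample rows stand in for real scraped data.

```php
<?php
// Hypothetical helper: stores an array of ['title' => ..., 'content' => ...]
// rows and returns how many rows were written.
function store_cards(PDO $pdo, array $cards): int
{
    // Assumed schema, not taken from the thread
    $pdo->exec(
        'CREATE TABLE IF NOT EXISTS eyp_cards (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            title TEXT NOT NULL,
            content TEXT NOT NULL
        )'
    );

    // A prepared statement keeps quotes in titles from breaking the SQL
    $stmt = $pdo->prepare('INSERT INTO eyp_cards (title, content) VALUES (:title, :content)');

    // One transaction for all rows, so either every card is stored or none
    $pdo->beginTransaction();
    $written = 0;
    foreach ($cards as $card) {
        $stmt->execute([':title' => $card['title'], ':content' => $card['content']]);
        $written++;
    }
    $pdo->commit();

    return $written;
}

// In-memory database for the demo; a file would be 'sqlite:' . __DIR__ . '/cards.sqlite'
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Sample rows standing in for the result of get_eyp_cards_data();
// the second row is an invented placeholder
$sample = [
    ['title' => '"Academy for Peace and Development" Union', 'content' => 'PIC: 948417016'],
    ['title' => 'Example organisation', 'content' => 'placeholder text'],
];

$count = store_cards($pdo, $sample);
echo "Stored $count cards\n";
```

To keep only the first 20 results, the scraped array could be cut down with array_slice($my_cards, 0, 20) before it is passed to the helper.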
dil_bert Posted March 6, 2020

To find out more about how to work with DOMDocument, I went ahead like so. I have this HTML code:

    <html>
    <head> ... </head>
    <body>
    <div>
    <div class="foo" data-type="bar">
    SOMECONTENTWITHMORETAGS
    </div>
    </div>
    </body>
    </html>

Now I'd like to return the full HTML of a DOMElement, including the element's own tag and its attributes. How can I achieve that?

    private function get_html_from_node($node) {
        $html = '';
        $children = $node->childNodes;
        foreach ($children as $child) {
            // Copy each child into a temporary document and serialise it
            $tmp_doc = new DOMDocument();
            $tmp_doc->appendChild($tmp_doc->importNode($child, true));
            $html .= $tmp_doc->saveHTML();
        }
        return $html;
    }

With the function above I can already get the "foo" element, but only its content. I guess it is well worth taking a closer look at the optional argument to DOMDocument::saveHTML, which says "output this element only":

    return $node->ownerDocument->saveHTML($node);

Note that this argument has been available since PHP 5.3.6, so it is available in PHP 7. Before that, you would need to use DOMDocument::saveXML instead. The good thing is that the results may be very helpful. Also, if we already have a reference to the document, we can just do this:

    $doc->saveHTML($node);

Okay, and now I will work on the above-mentioned europa.eu example.
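To make the difference between the two approaches concrete, here is a small sketch that runs against markup shaped like the sample in this post rather than the live site. It assumes the dom extension is enabled; the XPath expression and the variable names are chosen for the demo only. It serialises the "foo" element once including its own tag and attributes, and once as children only.

```php
<?php
// Stand-in for the sample document; the inner <p> replaces SOMECONTENTWITHMORETAGS
$html = '<html><head><title>t</title></head><body><div>'
      . '<div class="foo" data-type="bar"><p>some <b>content</b></p></div>'
      . '</div></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

// Pick the element with class="foo"
$xpath = new DOMXPath($doc);
$node  = $xpath->query('//div[@class="foo"]')->item(0);

// Outer HTML: the element itself, with its attributes
$outer = $doc->saveHTML($node);

// Inner HTML: serialise each child node and join the pieces
$inner = '';
foreach ($node->childNodes as $child) {
    $inner .= $doc->saveHTML($child);
}

echo $outer, "\n";
echo $inner, "\n";
```

The first line printed should contain the opening div tag with class and data-type, while the second should hold only what is inside it. Serialising the children through the owner document also avoids creating a temporary DOMDocument per child.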