
dilbertone (Members, 122 posts)
Everything posted by dilbertone
-
How to test if the DOMDocument [class] exists?
dilbertone replied to dilbertone's topic in PHP Coding Help
Hello dear Dj KAT, good evening! Again, many thanks! I did the test:

suse-linux:~ # martin@suse-linux:~/perl> $ php -r 'echo (class_exists("DOMDocument")) ? "It exists \n" : "It Does NOT exist \n";'
-bash: martin@suse-linux:~/perl: No such file or directory
suse-linux:~ #

I guess that I have to install it now! Then I can run the code you posted below (in the other thread). I will try to install DOMDocument, and I will come back and report all my findings! Greetings, db1
-
Hello dear friends, I want to test whether the DOMDocument class exists. Can I do this in the shell (on openSUSE 11.3)?

bool class_exists ( string $class_name [, bool $autoload = true ] )
bool class_exists ( string $DOMdocument [, bool $autoload = true ] )

Or do I have to create a file that I then call from the shell? I look forward to an idea, a hint, or a tip. Regards, db1
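A minimal sketch of the file-based variant asked about above. The function name describeClass and the file name check_dom.php are hypothetical; class_exists() is the built-in named in the post. Saved as check_dom.php, it would be run with "php check_dom.php" typed after the shell prompt (without pasting the prompt itself).

```php
<?php
// Report whether a class is available to this PHP installation.
// describeClass() is a hypothetical helper name used only for this sketch.
function describeClass(string $name): string
{
    return class_exists($name)
        ? "$name exists"
        : "$name does NOT exist";
}

// DOMDocument is provided by PHP's DOM extension.
echo describeClass('DOMDocument'), PHP_EOL;
```

If the script reports that the class does not exist, the DOM extension is probably not installed or not enabled for the command-line PHP.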
-
Simple HTML DOM Parser: Starting points for a very easy example
dilbertone replied to dilbertone's topic in PHP Coding Help
Hello dear Dj Kat, good evening! Many, many thanks for the answer and the hints! Yes, I want to scrape the mentioned URL. I will try this out and run the mentioned parser.

Why not use DOMDocument instead?

<?php
$dom = new DOMDocument();
// @ suppresses the warnings that invalid markup would otherwise trigger
@$dom->loadHTMLFile('http://schulen.bildung-rp.de/gehezu/startseite/einzelanzeige.html?tx_wfqbe_pi1[uid]=60119');
$divElement = $dom->getElementById('wfqbeResults');
$innerHTML = '';
$children = $divElement->childNodes;
foreach ($children as $child) {
    // serialize each child node to rebuild the inner HTML of the div
    $innerHTML .= $child->ownerDocument->saveXML($child);
}
echo $innerHTML;

Again, thanks. I will run the code and do some tests, and I will come back and report all my findings. Have a great day! Greetings, dilbertone
-
Simple HTML DOM Parser: Starting points for a very easy example
dilbertone replied to dilbertone's topic in PHP Coding Help
Hello, thanks for answering! No, I do not think that Simple HTML DOM Parser is part of TYPO3.

My example: I want to parse the following information (in the block), consisting of the following 11 labels and their corresponding values. See the page: http://schulen.bildung-rp.de/gehezu/startseite/einzelanzeige.html?tx_wfqbe_pi1[uid]=60119
BTW: Sorry for the funny-looking URL, but it is the real URL!

Schulart: BBS
Schulnummer: 60119
Anschrift: Berufsbildende Schule Boppard Antoniusstr. 21 56154 Boppard
Telefon: (0 67 42) 80 61-0
Telefax: (0 67 42) 80 61-29
E-Mail: [email protected]
Internet: http://www.bbs-boppard.de
Träger: Kreisverwaltung Rhein-Hunsrück-Kreis
letzte Änderung: 08 Feb 2010 14:33:12 von 60119

I am trying to get this information with the Simple HTML DOM Parser. I am not very familiar with it, and I thought that I have to give some attributes. Is this right? Greetings, dbone
-
Hello dear friends, first of all: merry, merry Xmas! I want to parse a page with the Simple HTML DOM Parser. I am pretty new to PHP and to the Simple HTML DOM Parser.

My example: http://schulen.bildung-rp.de/gehezu/startseite/einzelanzeige.html?tx_wfqbe_pi1[uid]=60119

I want to collect the data in the block. I have investigated the source code and found that the element of interest should be this one:

<div class="content"><!-- TYPO3SEARCH_begin -->

Here is the code from my trials:

// include the Simple HTML DOM Parser
include_once('simple_html_dom.php');
// get the file we want to parse and create a DOM
$html = file_get_html('');
// simple_html_dom::find() creates a new simple_html_dom object
// that consists of the corresponding child elements
foreach ($html->find('class: content ') as $h3) {
    // get the text inside the tag
    if ($h3->innertext == 'Text of a H3 Tag') {
        break;
    }
}
// simple_html_dom::next_sibling() gives the next element
$table = $h3->next_sibling();

But believe me, it does not give back what I am aiming for. What have I done wrong? dbone
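A hedged sketch of selecting the block by its class. It uses PHP's built-in DOMDocument and DOMXPath rather than Simple HTML DOM Parser (where the equivalent CSS selector would be 'div.content'). The sample markup and the function name textOfClass are assumptions made for illustration, not the real page.

```php
<?php
// Hypothetical sample markup standing in for the real page.
$sample = '<html><body><div class="content"><h3>Schulart</h3><p>BBS</p></div></body></html>';

// Return the text of every <div> whose class attribute contains $class.
// textOfClass() is a hypothetical helper name.
function textOfClass(string $html, string $class): array
{
    $dom = new DOMDocument();
    // Suppress the warnings that invalid real-world markup would trigger.
    @$dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    // Match the class as a whole word inside a space-separated class list.
    $query = sprintf(
        '//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );
    $texts = [];
    foreach ($xpath->query($query) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

print_r(textOfClass($sample, 'content'));
```

The point of the sketch is that the selector names an element and a class; a string such as 'class: content ' is not a selector either library understands.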
-
Good day, dear community. I need to build a function which parses the domain from a URL. I have used various ways to parse HTML sources, but this one is a bit tricky! See the target I want to parse; it has some invalid markup: http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=644.0013008534253&SchulAdresseMapDO=194190

What do you think, can I apply this code here?

<?php
require_once('config.php'); // call config.php for db connection
$filename = "url.txt"; // the text file that holds the URLs
$each_line = file($filename);
foreach ($each_line as $line_num => $line) {
    $line = trim($line);
    $content = file_get_contents($line);
    //echo ($content)."<br>";
    $pattern = '/<td>(.*?)<\/td>/si';
    preg_match_all($pattern, $content, $matches);
    foreach ($matches[1] as $match) {
        $match = strip_tags($match);
        $match = trim($match);
        //var_dump($match);
        // $connect comes from config.php; escape the value before inserting it
        $sql = mysqli_query($connect, "insert into tablename(contents) values ('" . mysqli_real_escape_string($connect, $match) . "')");
        //echo $match;
    }
}
?>

I have to rework the parser part of this script. I need to parse in a somewhat different way, since I have a different site here. Can anybody help me get a better regex, or a better way to parse this site? Any and all help will be greatly appreciated. Regards, db1
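For the first sentence of the post (a function that parses the domain from a URL), a minimal sketch using the built-in parse_url(). The function name domainOf is hypothetical.

```php
<?php
// Return the host part of a URL, or null when the string has no host.
// domainOf() is a hypothetical helper name.
function domainOf(string $url): ?string
{
    $host = parse_url(trim($url), PHP_URL_HOST);
    return is_string($host) ? strtolower($host) : null;
}

echo domainOf('http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=644.0013008534253&SchulAdresseMapDO=194190'), PHP_EOL;
```

Note that parse_url() only splits the string; it does not check that the host actually exists.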
-
Hello everybody! This is just a basic script that I am trying to modify for my needs and play with. I want to parse some data. The whole script has three parts:

1. Fetching
2. Parsing
3. Storing

I want to put all three into one script. Two are already put together, and there everything seems to be clear. So this thread asks how to combine two parts of a script and how to pass a variable between them.

What has happened until now: first I need a connection to a database, let's say MySQL. I would suggest using mysqli instead of mysql. Okay, I save this as db.php:

<?php
$host = "localhost"; // database hostname
$username = "******"; // database username
$password = "******"; // database password
$database = "******"; // database name
?>

Now I take a new script and save it as config.php:

<?php
require_once('db.php'); // call db.php
$connect = mysqli_connect($host, $username, $password); // connect to MySQL through mysqli
if (!$connect) {
    die("Cannot connect to host, please try later."); // stop if the connection fails
} else {
    $select_db = mysqli_select_db($connect, $database); // select database
    if (!$select_db) {
        die("Site Database is down at the moment, Please check later. We will be back shortly."); // stop if the database cannot be selected or does not exist
    }
}
?>

Now I have to take care of the script that fetches the files. (Note that this is very basic; it is only a proof of concept. In the real situation I will use cURL, since cURL is much nicer, more elegant, and faster.)

<?php
require_once('config.php'); // call config.php for db connection
$content = file_get_contents("<-here the path to the file goes in-> Position XY! an URL is here ");
var_dump($content);
$pattern = '/<td>(.*?)<\/td>/si';
preg_match_all($pattern, $content, $matches);
foreach ($matches[1] as $match) {
    $match = strip_tags($match);
    $match = trim($match);
    var_dump($match);
    // $connect comes from config.php; escape the value before inserting it
    $sql = mysqli_query($connect, "insert into tablename(contents) values ('" . mysqli_real_escape_string($connect, $match) . "')");
}
?>

Note: this is just a basic script that you can modify to your taste and play with.

Question: If I have stored the URLs that I want to parse in a local file, how do I "call" them in the script? There are more than 2500 URLs that have to be parsed. How do I read them from the file at the following position?

$content = file_get_contents("<-here the path to the file goes in-> Position XY! an URL is here ");

The file with the URLs is stored in the same folder as the scripts!

Many thanks for all hints and for a starting point! If I have to write more, if you need more information, or if I have to be more concrete, just let me know! I would love to hear from you! db1
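One possible answer to the question above, as a sketch: read the URL list into an array with file() and loop over it, so that each URL takes the place of the "Position XY" placeholder. The function name readUrls is hypothetical, and the file name url.txt (one URL per line, next to the script) is an assumption.

```php
<?php
// Read one URL per line from a text file, skipping blank lines.
// readUrls() is a hypothetical helper name.
function readUrls(string $path): array
{
    if (!is_file($path) || !is_readable($path)) {
        return [];
    }
    $urls = [];
    foreach (file($path, FILE_IGNORE_NEW_LINES) as $line) {
        $line = trim($line);
        if ($line !== '') {
            $urls[] = $line;
        }
    }
    return $urls;
}

// Assumed location: url.txt in the same folder as this script.
foreach (readUrls(__DIR__ . '/url.txt') as $url) {
    // In the real script, this is where file_get_contents($url) would go.
    echo "would fetch: $url", PHP_EOL;
}
```

With more than 2500 URLs it may be worth pausing between requests, for example with sleep(), so the remote server is not flooded.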
-
Hello dear community, I need some help with preg_match. I want to optimize code that already runs very well! I want to get only the results, not the overhead of HTML tags in the result. How can I improve the (already very nice) code?

<?php
$content = file_get_contents("http://schulnetz.nibis.de/db/schulen/schule.php?schulnr=94468&lschb");
var_dump($content);
$pattern = '/<td>(.*?)<\/td>/si';
preg_match_all($pattern, $content, $matches);
foreach ($matches[1] as $match) {
    $match = strip_tags($match);
    $match = trim($match);
    var_dump($match);
}
?>

Note: I want to store the results in a database. How do I do that? Each idea and tip will be greatly appreciated. Regards, db1
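A sketch of both halves of the question, under stated assumptions: extractCells() returns only the cleaned cell text, and storeCells() inserts it with a mysqli prepared statement. Both function names are hypothetical, the table tablename(contents) is borrowed from the poster's other scripts, and storeCells() is only defined here, not called, because it needs a live database connection.

```php
<?php
// Return the text of every <td> cell, without tags or surrounding whitespace.
// extractCells() is a hypothetical helper name.
function extractCells(string $html): array
{
    // [^>]* also accepts cells that carry attributes, e.g. <td class="x">.
    preg_match_all('/<td[^>]*>(.*?)<\/td>/si', $html, $matches);
    $cells = [];
    foreach ($matches[1] as $raw) {
        $text = trim(html_entity_decode(strip_tags($raw), ENT_QUOTES, 'UTF-8'));
        if ($text !== '') {
            $cells[] = $text;
        }
    }
    return $cells;
}

// Insert each cell as one row; assumes a table tablename(contents) exists.
// storeCells() is a hypothetical helper name.
function storeCells(mysqli $db, array $cells): void
{
    $stmt = $db->prepare('INSERT INTO tablename (contents) VALUES (?)');
    if ($stmt === false) {
        throw new RuntimeException($db->error);
    }
    foreach ($cells as $cell) {
        $stmt->bind_param('s', $cell);
        $stmt->execute();
    }
    $stmt->close();
}

print_r(extractCells('<table><tr><td><b>Tel:</b></td><td> 0123 </td></tr></table>'));
```

A prepared statement is used instead of building the SQL string by hand, so quotes inside a cell cannot break the query.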
-
Hello QuickOldCar, great to hear from you again! Overwhelming! I am very happy. You have given me many hints and a great learning curve for diving into PHP programming. I like it very much that you refer to PHP Simple HTML DOM Parser. That is great. I have PHP Simple HTML DOM Parser running here.

I am trying to get some insights into your code. It is great, and has some non-trivial parts. I am trying to apply it to the target, this very simple-looking site, which I want to parse: http://schulnetz.nibis.de/db/schulen/schule.php?schulnr=35877&lschb=

Note: these five or six labels are all I want to parse! If we can do it with the PHP Simple HTML DOM Parser, I am happy. QuickOldCar, you are a great coder, and I like this introduction that takes me into this great technique.

Did I put the URL at the right position? I am not very sure.

<?php
include('simple_html_dom.php');

function getHost($url) {
    $parseUrl = parse_url(trim($url));
    return trim($parseUrl['host'] ? $parseUrl['host'] : array_shift(explode('/', $parseUrl['path'], 2)));
}

$url = mysql_real_escape_string($_GET['http://schulnetz.nibis.de/db/schulen/schule.php?schulnr=35877&lschb=']);
// simple way to add the http:// that dom requires, using curl is a better option
if (substr($url, 0, 4) != "http") {
    $url = "http://$url";
}
$parsed_url = getHost($url);
$http_parsed_host = "http://$parsed_url/";
$html = file_get_html($url);
foreach($html->find('a') as $element)
    $dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a");
for ($i = 0; $i < $hrefs->length; $i++) {
    $href = $hrefs->item($i);
    $href_link = $href->getAttribute('href');
    $parse_count = count("$http_parsed_host");
    $substr_count = +7;
    if (substr($href_link, 0, $substr_count) == "mailto:") {
        $mail_link = $href_link;
        $href_link = trim($mail_link, $href_link);
    }
    if (substr($href_link, 0, 1) == "/") {
        $href_link = trim($href_link, "/");
    }
    if (substr($href_link, 0, 2) == "//") {
        $href_link = trim($href_link, "//");
    }
    if (substr($href_link, 0, 3) == "///") {
        $href_link = trim($href_link, "///");
    }
    if ((substr($href_link, 0, 8) == "https://") OR (substr($href_link, 0, 12) == "https://www.") OR (substr($href_link, 0, 7) == "http://") OR (substr($href_link, 0, 11) == "http://www.") OR (substr($href_link, 0, 6) == "ftp://") OR (substr($href_link, 0, 11) == "feed://www.") OR (substr($href_link, 0, 7) == "feed://")) {
        $final_href_link[] = $href_link;
    } else {
        if (substr($href_link, 0, 1) != "/") {
            $final_href_link[] = "$http_parsed_host$href_link";
        }
    }
}
$links_array = array_unique($final_href_link);
sort($links_array);
foreach ($links_array as $links) {
    //echo "$links<br />";
    echo "<a href='$links'>$links</a><br />";
}
echo "<a href='$mail_link'>$mail_link</a><br />";
?>

I would love to hear from you. Greetings, dilbertone
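On the question of where the URL goes, a hedged sketch: $_GET is indexed by a query parameter name, not by the URL itself, so one option is to read a parameter (called url here, which is an assumption) and fall back to the fixed target URL when it is missing. The function name targetUrl is hypothetical.

```php
<?php
// Pick the URL to parse: the "url" query parameter if present, else a default.
// targetUrl() and the parameter name "url" are assumptions for this sketch.
function targetUrl(array $query, string $default): string
{
    $url = isset($query['url']) ? trim((string) $query['url']) : '';
    if ($url === '') {
        $url = $default;
    }
    // Add the scheme when it is missing, as the original script does.
    if (substr($url, 0, 4) !== 'http') {
        $url = "http://$url";
    }
    return $url;
}

echo targetUrl($_GET, 'http://schulnetz.nibis.de/db/schulen/schule.php?schulnr=35877&lschb='), PHP_EOL;
```

Called as script.php?url=example.com/page the function would return the given address; called without a parameter it returns the default.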
-
Hello QuickOldCar, many, many thanks for the quick reply! This is more than I expected. I am trying to figure out what your code does. It is great! Note that I only wanted to get the six or seven values for the labels out of the example; your code does much, much more. Many thanks, this gives me a great starting point! Greetings, dilbertone
-
Hello QuickOldCar, many thanks for the help. It is great to see these results. I am trying to figure out what went wrong here. At the end of the day, I want to parse all the content: stripping the tags and getting all the values for the labels. Do you want to share your code? I look forward to hearing from you.
-
Hi BlueSkyIs, many thanks for posting. You would do it with regex, but I am not very familiar with regex. Note that I tried to work with Simple HTML DOM Parser to get all the content within the table, and it failed. Can you give me a helping hand and some starting points for the regex approach? That would be great!

PS: my attempt only spits out the e-mail address. I had other attempts that tried to get the whole class, but without any luck! See the code snippet here:

<?php
include('simple_html_dom.php');
// Create DOM from URL or file
$html = file_get_html('nrw_test.html');
// Find all links
foreach ($html->find('a') as $element)
    echo $element->href . '<br>';
?>

I would love to see the snippet that you were successful with! Greetings, dilbertone

PS: my friends always say that PHP regular expressions seem to be quite a complicated area, especially if you are not an experienced Unix user. So I think I have to get started with this technique.
-
Hi everyone, I'm trying to select either a class or an id using PHP Simple HTML DOM Parser, with absolutely no luck. My example is very simple and seems to comply with the examples given in the manual (http://simplehtmldom.sourceforge.net/manual.htm), but it just won't work, and it's driving me up the wall. Here is my example: http://schulnetz.nibis.de/db/schulen/schule.php?schulnr=94468&lschb=

I think the HTML is invalid: I cannot parse it. I need more examples; probably I have overlooked something! If anybody has a working example for Simple HTML DOM Parser, I would be happy. The examples on the developer site are not very helpful. Yours, dilbertone
-
Hello dear community, I have a document that I need to parse, and I want to spit out only this part of the table: see http://schulnetz.nibis.de/db/schulen/schule.php?schulnr=67003&lschb=

How do I parse this? With Perl or PHP? Note that I have the XPaths (see below). Sadly, I cannot apply them with Simple HTML DOM Parser, since that parser does not work with XPaths but with CSS selectors. I want to get all the data within the table that has class="fliess". How do I dump all the results?

BTW: thinking about the most elegant way, I think the prettiest would be to do it with Perl, so I can try it with HTML::TableExtract. What do you suggest? Which way should I choose to do this [very] simple thing? I look forward to hearing from you!

See the XPaths:

Schule: /html/body/center/table/tbody/tr[2]/td[1]
Strasse: /html/body/center/table/tbody/tr[3]/td[1]
Ort: /html/body/center/table/tbody/tr[4]/td[1]
Tel: /html/body/center/table/tbody/tr[5]/td[1]
Schulgliederungen: /html/body/center/table/tbody/tr[6]/td[1]
Besonderheiten: /html/body/center/table/tbody/tr[7]/td[1]
E-Mail: /html/body/center/table/tbody/tr[8]/td[1]
Schulnummer: /html/body/center/table/tbody/tr[9]/td[1]
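The XPaths above can be applied in PHP with the built-in DOMXPath class. The following is a sketch under assumptions: the sample markup and the function name valuesByXPath are hypothetical, and the paths are rewritten relative to the table with class "fliess". Browsers insert a tbody element into the tree they display, which the raw HTML often lacks, so paths copied from a browser tool may not match what DOMDocument sees; the relative form works in both cases.

```php
<?php
// Hypothetical sample markup standing in for the real page.
$sample = '<html><body><center><table class="fliess">'
    . '<tr><td>Header</td></tr>'
    . '<tr><td>Schule Musterstadt</td></tr>'
    . '<tr><td>Musterstr. 1</td></tr>'
    . '</table></center></body></html>';

// Evaluate each labelled XPath and return the text of the first match.
// valuesByXPath() is a hypothetical helper name.
function valuesByXPath(string $html, array $paths): array
{
    $dom = new DOMDocument();
    // Suppress the warnings that invalid markup would trigger.
    @$dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    $values = [];
    foreach ($paths as $label => $path) {
        $nodes = $xpath->query($path);
        $values[$label] = ($nodes !== false && $nodes->length > 0)
            ? trim($nodes->item(0)->textContent)
            : null;
    }
    return $values;
}

// Row positions follow the list above (tr[2] = Schule, tr[3] = Strasse).
$paths = [
    'Schule'  => '//table[@class="fliess"]//tr[2]/td[1]',
    'Strasse' => '//table[@class="fliess"]//tr[3]/td[1]',
];

print_r(valuesByXPath($sample, $paths));
```

The remaining labels would follow the same pattern with tr[4] through tr[9].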
-
Hello dear community, I have a large document, and I want to parse it and spit out only this part: schule.php?schulnr=80287&lschb=

Question: how do I parse this? I am trying it with Firebug and FirePath (the XPath tool), like this:

* load the document into my browser, if possible
* start the Firebug extension/add-on
* run the FirePath extension
* run the XPath //a[contains(@href, "schule")]/@href
* click the "Eval" button

I find 2030 results, which are marked. How do I get them out of Firebug to work with them? If I have to be more precise, please let me know! Perhaps I have to write more and add more information. See the screenshot here: http://img259.imageshack.us/img259/7360/sceenshoteval5.jpg

How can I copy and paste the results in order to do further processing from there on? I look forward to hearing from you. Regards
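An alternative to copying the results out of Firebug is to run the same XPath in a script. This is a sketch, not the Firebug workflow itself: it uses PHP's built-in DOMDocument and DOMXPath, the sample markup is hypothetical, and schoolLinks is a made-up function name.

```php
<?php
// Hypothetical sample markup standing in for the large document.
$sample = '<html><body>'
    . '<a href="schule.php?schulnr=80287&amp;lschb=">A</a>'
    . '<a href="impressum.php">B</a>'
    . '<a href="schule.php?schulnr=67003&amp;lschb=">C</a>'
    . '</body></html>';

// Collect every href that contains "schule", without duplicates.
// schoolLinks() is a hypothetical helper name.
function schoolLinks(string $html): array
{
    $dom = new DOMDocument();
    // Suppress the warnings that invalid markup would trigger.
    @$dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    $links = [];
    // Same expression as the one evaluated in FirePath.
    foreach ($xpath->query('//a[contains(@href, "schule")]/@href') as $attr) {
        $links[] = $attr->nodeValue;
    }
    return array_values(array_unique($links));
}

print_r(schoolLinks($sample));
```

The resulting array could then be written to a text file, one link per line, with file_put_contents() and implode(), ready for further processing.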