
dilbertone

Members
  • Posts

    122

Everything posted by dilbertone

  1. Hello dear Dj Kat, good evening! Again, many thanks! I ran the test:
       suse-linux:~ # martin@suse-linux:~/perl> $ php -r 'echo (class_exists("DOMDocument")) ? "It exists \n" : "It Does NOT exist \n";'
       -bash: martin@suse-linux:~/perl: No such file or directory
       suse-linux:~ #
     I guess I have to install it now. Then I can run the code you posted below (in the other thread). I will try to install DOMDocument and come back to report all my findings. Greetings, db1
  2. Hello dear friends, I want to test whether the DOMDocument class exists. Can I do this in the shell (on openSUSE 11.3)?
       bool class_exists ( string $class_name [, bool $autoload = true ] )
       bool class_exists ( string $DOMdocument [, bool $autoload = true ] )
     Or do I have to create a file that I then call from the shell? I look forward to any idea, hint, or tip. Regards, db1
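A minimal sketch of the check asked about above. `class_exists()` takes the class name as a string and returns a boolean; the file name `check_dom.php` and the helper `describeClass()` are made up for this illustration.

```php
<?php
// check_dom.php (hypothetical file name)

// Report whether a class is available in this PHP installation.
function describeClass(string $className): string
{
    return class_exists($className)
        ? "$className exists"
        : "$className does NOT exist";
}

echo describeClass('DOMDocument'), PHP_EOL;
```

Saved to a file, this can be run from the shell with `php check_dom.php`. The same check also works as a one-liner, `php -r 'var_dump(class_exists("DOMDocument"));'`, as long as only the command itself is typed and not the prompt text in front of it.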
  3. Hello dear Dj Kat, good evening! Many, many thanks for the answer and the hints! Yes, I want to scrape the mentioned URL. I will try this out and run the mentioned parser. Why not use DOMDocument instead?
       <?php
       $dom = new DOMDocument();
       // the @ suppresses the warnings raised for invalid HTML
       @$dom->loadHTMLFile('http://schulen.bildung-rp.de/gehezu/startseite/einzelanzeige.html?tx_wfqbe_pi1[uid]=60119');
       $divElement = $dom->getElementById('wfqbeResults');
       $innerHTML = '';
       $children = $divElement->childNodes;
       foreach ($children as $child) {
           $innerHTML .= $child->ownerDocument->saveXML($child);
       }
       echo $innerHTML;
     Again, thanks. I will run the code and do some tests, then come back and report all my findings. Have a great day! Greetings, dilbertone
  4. Hello, thanks for answering! No, I do not think Simple HTML DOM Parser is part of TYPO3. My example: I want to parse the block consisting of the following 11 labels and their corresponding values. See the page: http://schulen.bildung-rp.de/gehezu/startseite/einzelanzeige.html?tx_wfqbe_pi1[uid]=60119 BTW: sorry for the funny-looking URL, but it is the real URL!
       Schulart: BBS
       Schulnummer: 60119
       Anschrift: Berufsbildende Schule Boppard Antoniusstr. 21 56154 Boppard
       Telefon: (0 67 42) 80 61-0
       Telefax: (0 67 42) 80 61-29
       E-Mail: [email protected]
       Internet: http://www.bbs-boppard.de
       Träger: Kreisverwaltung Rhein-Hunsrück-Kreis
       letzte Änderung: 08 Feb 2010 14:33:12 von 60119
     I am trying to get this information with Simple HTML DOM Parser. I am not very familiar with it; I thought that I have to pass some attributes. Is this right? Greetings, dbone
  5. Hello dear friends, first of all: merry Xmas! I want to parse a page with Simple HTML DOM Parser. I am pretty new to PHP and to Simple HTML DOM Parser. My example: http://schulen.bildung-rp.de/gehezu/startseite/einzelanzeige.html?tx_wfqbe_pi1[uid]=60119 I want to collect the data in the block. I have investigated the source code and found that the attribute of interest should be this one: <div class="content"><!-- TYPO3SEARCH_begin --> Here is the code from my trials:
       // include the Simple HTML DOM Parser
       include_once('simple_html_dom.php');
       // get the file we want to parse right now and create a DOM
       $html = file_get_html('');
       // simple_html_dom::find() creates a new simple_html_dom object
       // that consists of the matching child elements
       foreach($html->find('class: content ') as $h3) {
           // get the text inside a tag
           if($h3->innertext == 'Text of a H3 Tag') {
               break;
           }
       }
       // simple_html_dom::next_sibling() gives the next element
       $table = $h3->next_sibling();
     But believe me, it does not give back what I am aiming for. What have I done wrong? dbone
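One way to pick out a block by its class is sketched below. It uses PHP's built-in DOMDocument and DOMXPath rather than Simple HTML DOM Parser, so it runs without the extra include. The sample markup and the helper name `textOfClass()` are assumptions for illustration; the real page's structure may differ.

```php
<?php
// Hypothetical sample markup standing in for the real page.
$sampleHtml = '<html><body>'
    . '<div class="content"><!-- TYPO3SEARCH_begin --><p>Schulart: BBS</p></div>'
    . '<div class="footer">ignored</div>'
    . '</body></html>';

// Return the text of every <div> that carries the given class.
function textOfClass(string $html, string $class): array
{
    // Collect libxml warnings about invalid markup instead of printing them.
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // Match the class as a whole word inside the class attribute.
    $query = sprintf(
        '//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );

    $texts = [];
    foreach ($xpath->query($query) as $div) {
        $texts[] = trim($div->textContent);
    }
    return $texts;
}

print_r(textOfClass($sampleHtml, 'content'));
```

With Simple HTML DOM Parser the comparable selector would be a CSS-style one such as `div.content` rather than `class: content`.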
  6. Good day dear community. I need to build a function which parses the domain from a URL. I have used various ways to parse HTML sources, but this one is a bit tricky. See the target I want to parse; it has some invalid markup: http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=644.0013008534253&SchulAdresseMapDO=194190 What do you think, can I apply this code here?
       <?php
       require_once('config.php'); // call config.php for the db connection
       $filename = "url.txt"; // the text file that holds the URLs
       $each_line = file($filename);
       foreach($each_line as $line_num => $line) {
           $line = trim($line);
           $content = file_get_contents($line);
           //echo ($content)."<br>";
           $pattern = '/<td>(.*?)<\/td>/si';
           preg_match_all($pattern,$content,$matches);
           foreach ($matches[1] as $match) {
               $match = strip_tags($match);
               $match = trim($match);
               //var_dump($match);
               // $connect is assumed to be the mysqli link opened in config.php
               $match = mysqli_real_escape_string($connect, $match);
               $sql = mysqli_query($connect, "insert into tablename(contents) values ('$match')");
               //echo $match;
           }
       }
       ?>
     I have to rework the parser part of this script. I need to parse somewhat differently, since I have a different site here. Can anybody help me get a better regex, or a better way to parse this site? Any and all help will be greatly appreciated. Regards, db1
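For the first part of the question, getting the domain out of a URL, a small sketch using PHP's built-in `parse_url()` might look like this. The function name `domainOf()` is made up for the example.

```php
<?php
// Return the lower-cased host part of a URL, or null when none can be determined.
function domainOf(string $url): ?string
{
    $host = parse_url(trim($url), PHP_URL_HOST);
    return is_string($host) ? strtolower($host) : null;
}

echo domainOf('http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=644.0013008534253&SchulAdresseMapDO=194190'), PHP_EOL;
```

`parse_url()` only splits the string; it does not check that the host actually exists, so a malformed line in the URL list simply yields null here.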
  7. Hello everybody! This is just a basic script that I am trying to modify for my needs. I want to parse some data. The whole script has three parts: 1. fetching, 2. parsing, 3. storing. I want to put all of them into one script. Two are already put together, and there everything seems to be clear. So this thread asks how to combine two parts of a script, and how to pass a variable between them. What has happened until now: first I need a connection to a database, let's say MySQL. I suggest using mysqli instead of mysql. I save this as db.php:
       <?php
       $host="localhost"; //database hostname
       $username="******"; //database username
       $password="******"; //database password
       $database="******"; //database name
       ?>
     Now I take a new script and save it as config.php:
       <?php
       require_once('db.php'); //call db.php
       $connect=mysqli_connect($host,$username,$password); //connect to mysql through mysqli
       if(!$connect){
           die("Cannot connect to host, please try later."); //stop with an error if there is any problem
       } else {
           $select_db=mysqli_select_db($connect,$database); //select database
           if(!$select_db){
               die("Site Database is down at the moment, Please check later. We will be back shortly."); // error if cannot connect to database or db does not exist
           }
       }
       ?>
     Now I have to take care of the script that fetches the files. Note that this is very basic; it is only a proof of concept. In the real situation I will use cURL, since I find cURL much nicer, more elegant, and faster.
       <?php
       require_once('config.php'); // call config.php for the db connection
       $content = file_get_contents("<-here the path to the file goes in-> Position XY! an URL is here ");
       var_dump($content);
       $pattern = '/<td>(.*?)<\/td>/si';
       preg_match_all($pattern,$content,$matches);
       foreach ($matches[1] as $match) {
           $match = strip_tags($match);
           $match = trim($match);
           var_dump($match);
           $match = mysqli_real_escape_string($connect, $match);
           $sql = mysqli_query($connect, "insert into tablename(contents) values ('$match')");
       }
       ?>
     Note: this is just a basic script that you can modify to your taste and play with. Question: if I have stored the URLs that I want to parse in a local file, how do I "call" them in the script? There are more than 2500 URLs to parse. How do I read them from the file at the following position?
       $content = file_get_contents("<-here the path to the file goes in-> Position XY! an URL is here ");
     The file with the URLs is stored in the same folder as the scripts. Many thanks for all hints and for a starting point! If I have to write more, if you need more information, or if I have to be more concrete, just let me know. I would love to hear from you! db1
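The question above, how to feed a list of URLs from a local file into the fetch step, could be sketched as follows. It assumes one URL per line; the helper name `readUrlList()` and the temporary demo file are illustrative only, and no network request is made here.

```php
<?php
// Read one URL per line, trimming whitespace and skipping blank lines.
function readUrlList(string $path): array
{
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return []; // file missing or unreadable
    }
    $urls = [];
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line !== '') {
            $urls[] = $line;
        }
    }
    return $urls;
}

// Demo with a temporary file standing in for the real URL list.
$demoFile = tempnam(sys_get_temp_dir(), 'urls');
file_put_contents($demoFile, "http://example.com/a\n\n  http://example.com/b  \n");

foreach (readUrlList($demoFile) as $url) {
    // In the real script, file_get_contents($url) and the parsing would go here.
    echo $url, PHP_EOL;
}
unlink($demoFile);
```

For a list that sits next to the script, the path can be built with `__DIR__`, for example `readUrlList(__DIR__ . '/url.txt')`, where `url.txt` is whatever the list file is actually called.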
  8. Hi Shawn, many, many thanks for everything. You are great. Thanks a lot! Greetings, dilbertone
  9. Hello dear community, I need some help with preg_match. I want to optimize code that already runs very well. I want to get only the results, not the overhead of HTML tags in the result. How can I improve the (already very nice) code?
       <?php
       $content = file_get_contents("http://schulnetz.nibis.de/db/schulen/schule.php?schulnr=94468&lschb");
       var_dump($content);
       $pattern = '/<td>(.*?)<\/td>/si';
       preg_match_all($pattern,$content,$matches);
       foreach ($matches[1] as $match) {
           $match = strip_tags($match);
           $match = trim($match);
           var_dump($match);
       }
       ?>
     Note: I want to store the results in a database. How do I do that? Every idea and tip will be greatly appreciated. Regards, db1
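A sketch of one way to get only the cell text, without HTML overhead, from the regex approach above. The function name `extractCells()` and the sample string are made up for illustration. The pattern is widened to `<td[^>]*>` so that cells carrying attributes also match, which is a change from the original `<td>`.

```php
<?php
// Return the cleaned text of every non-empty <td> cell found in $html.
function extractCells(string $html): array
{
    preg_match_all('/<td[^>]*>(.*?)<\/td>/si', $html, $matches);
    $cells = [];
    foreach ($matches[1] as $match) {
        // Drop nested tags, then collapse runs of whitespace.
        $text = trim(preg_replace('/\s+/', ' ', strip_tags($match)));
        if ($text !== '') {
            $cells[] = $text;
        }
    }
    return $cells;
}

// Hypothetical sample; the real page's markup may differ.
$sample = '<table><tr><td class="x"><b>Schulnummer:</b></td><td> 94468 </td></tr>'
    . '<tr><td></td></tr></table>';

print_r(extractCells($sample));
```

Returning a plain array keeps the extraction separate from the storing step, so the same result can be dumped for inspection or passed on to a database insert.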
  10. Hi QuickOldCar, you are the man of the year! Many thanks! Wow, did I get you right: I can access any URL and get some results? That sounds great. I will test-run this code later today, then come back and report all my findings. Many thanks for everything so far! Greetings, dilbertone
  11. Hello QuickOldCar, great to hear from you again! Overwhelming! I am very happy. You give me many hints and a great learning curve for diving into PHP programming. I like it very much that you refer to PHP Simple HTML DOM Parser; I have it running here. I am trying to get some insight into your code. It is great and has some non-trivial parts. I am trying to apply it to the target, this very simple-looking site that I want to parse: http://schulnetz.nibis.de/db/schulen/schule.php?schulnr=35877&lschb= Note: these five or six labels are all I want to parse. If we can do it with PHP Simple HTML DOM Parser, I am happy. QuickOldCar, you are a great coder, and I like this introduction to this great technique. Did I put the URL in the right position? I am not very sure.
        <?php
        include('simple_html_dom.php');
        function getHost($url) {
            $parseUrl = parse_url(trim($url));
            return trim($parseUrl['host'] ? $parseUrl['host'] : array_shift(explode('/', $parseUrl['path'], 2)));
        }
        $url = mysql_real_escape_string($_GET['http://schulnetz.nibis.de/db/schulen/schule.php?schulnr=35877&lschb=']);
        // simple way to add the http:// that dom requires, using curl is a better option
        if (substr($url, 0, 4) != "http") {
            $url = "http://$url";
        }
        $parsed_url = getHost($url);
        $http_parsed_host = "http://$parsed_url/";
        $html = file_get_html($url);
        foreach($html->find('a') as $element)
        $dom = new DOMDocument();
        @$dom->loadHTML($html);
        $xpath = new DOMXPath($dom);
        $hrefs = $xpath->evaluate("/html/body//a");
        for ($i = 0; $i < $hrefs->length; $i++) {
            $href = $hrefs->item($i);
            $href_link = $href->getAttribute('href');
            $parse_count = count("$http_parsed_host");
            $substr_count = +7;
            if (substr($href_link, 0, $substr_count) == "mailto:") {
                $mail_link = $href_link;
                $href_link = trim($mail_link,$href_link);
            }
            if (substr($href_link, 0, 1) == "/") {
                $href_link = trim($href_link,"/");
            }
            if (substr($href_link, 0, 2) == "//") {
                $href_link = trim($href_link,"//");
            }
            if (substr($href_link, 0, 3) == "///") {
                $href_link = trim($href_link,"///");
            }
            if ((substr($href_link, 0, 8) == "https://")
                OR (substr($href_link, 0, 12) == "https://www.")
                OR (substr($href_link, 0, 7) == "http://")
                OR (substr($href_link, 0, 11) == "http://www.")
                OR (substr($href_link, 0, 6) == "ftp://")
                OR (substr($href_link, 0, 11) == "feed://www.")
                OR (substr($href_link, 0, 7) == "feed://")) {
                $final_href_link[] = $href_link;
            } else {
                if (substr($href_link, 0, 1) != "/") {
                    $final_href_link[] = "$http_parsed_host$href_link";
                }
            }
        }
        $links_array = array_unique($final_href_link);
        sort($links_array);
        foreach ($links_array as $links) {
            //echo "$links<br />";
            echo "<a href='$links'>$links</a><br />";
        }
        echo "<a href='$mail_link'>$mail_link</a><br />";
        ?>
      I would love to hear from you. Greetings, dilbertone
  12. Hello QuickOldCar, many, many thanks for the quick reply! This is more than I expected. I am trying to figure out what your code does. It is great! Note: I only wanted to get the six or seven values for the labels out of the example. Your code does much more. Many thanks, this gives me a great starting point! Greetings, dilbertone
  13. Hello QuickOldCar, many thanks for the help. Great to see these results. I am trying to figure out what went wrong here. At the end of the day I want to parse all the content: stripping the tags and getting all the values for the labels. Do you want to share your code? I look forward to hearing from you.
  14. Hi BlueSkyIs, many thanks for posting. You would do it with a regex; I am not very familiar with regexes. Note: I tried to work with Simple HTML DOM Parser to get all the content within the table, but it failed. Can you give me a helping hand and some starting points for the regex approach? That would be great! PS: my trials only spit out the e-mail address. I had other trials that tried to get the whole class, but without any luck. See the code snippet here:
        <?php
        include('simple_html_dom.php');
        // Create DOM from URL or file
        $html = file_get_html('nrw_test.html');
        // Find all links
        foreach($html->find('a') as $element)
            echo $element->href . '<br>';
        ?>
      I would love to see the snippet that you were successful with! Greetings, dilbertone. PS: my friends always say that PHP regular expressions seem to be a quite complicated area, especially if you are not an experienced Unix user. So I think I have to get started with this technique.
  15. Hi everyone, I'm trying to select either a class or an id using PHP Simple HTML DOM Parser, with absolutely no luck. My example is very simple and seems to comply with the examples given in the manual (http://simplehtmldom.sourceforge.net/manual.htm), but it just won't work; it's driving me up the wall. Here is my example: http://schulnetz.nibis.de/db/schulen/schule.php?schulnr=94468&lschb= I think the HTML is invalid: I cannot parse it. I need more examples; probably I have overlooked something. If anybody has a working example of Simple HTML DOM Parser, I would be happy. The examples on the developer site are not very helpful. Yours, dilbertone
  16. Hello dear community, I have a document that I need to parse so that it spits out only this part of the table; see http://schulnetz.nibis.de/db/schulen/schule.php?schulnr=67003&lschb= How do I parse this? With Perl or PHP? Note that I have the XPaths (see below). Sadly, I cannot apply them with Simple HTML DOM Parser, since this parser does not work with XPaths but with CSS selectors. I want to get all the data within the table that has class="fliess". How do I dump all the results? BTW, thinking about the most elegant way, I think it would be prettiest to do it with Perl, so I can try it with HTML::TableExtract. What do you suggest? Which way should I choose to do this (very) simple thing? I look forward to hearing from you! See the XPaths:
        Schule: /html/body/center/table/tbody/tr[2]/td[1]
        Strasse: /html/body/center/table/tbody/tr[3]/td[1]
        Ort: /html/body/center/table/tbody/tr[4]/td[1]
        Tel: /html/body/center/table/tbody/tr[5]/td[1]
        Schulgliederungen: /html/body/center/table/tbody/tr[6]/td[1]
        Besonderheiten: /html/body/center/table/tbody/tr[7]/td[1]
        E-Mail: /html/body/center/table/tbody/tr[8]/td[1]
        Schulnummer: /html/body/center/table/tbody/tr[9]/td[1]
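XPaths like the ones listed can be evaluated in PHP with the built-in DOMXPath class. The sketch below dumps every row of a table selected by its class. The sample markup, its values, and the helper name `tableRows()` are assumptions for illustration. It queries `//tr` rather than `/tbody/tr`, because browser tools such as Firebug show a `<tbody>` that browsers insert themselves and that is often absent from the raw source.

```php
<?php
// Hypothetical sample; the real page's markup and values may differ.
$sampleHtml = '<html><body><center><table class="fliess">'
    . '<tr><th>Feld</th><th>Wert</th></tr>'
    . '<tr><td>Schule</td><td>Beispielschule</td></tr>'
    . '<tr><td>Ort</td><td>Musterstadt</td></tr>'
    . '</table></center></body></html>';

// Return the <td> texts of each row in the table with the given class.
function tableRows(string $html, string $tableClass): array
{
    // Collect libxml warnings about invalid markup instead of printing them.
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $rows = [];
    foreach ($xpath->query("//table[@class='$tableClass']//tr") as $tr) {
        $cells = [];
        // The second argument makes the query relative to the current row.
        foreach ($xpath->query('td', $tr) as $td) {
            $cells[] = trim($td->textContent);
        }
        if ($cells !== []) {
            $rows[] = $cells; // header rows with only <th> cells are skipped
        }
    }
    return $rows;
}

print_r(tableRows($sampleHtml, 'fliess'));
```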
  17. Hello dear community, I have a large document, and I want to parse it and spit out only this part: schule.php?schulnr=80287&lschb= Question: how do I parse this? I tried it with Firebug and FirePath (the XPath tool) as follows:
        * load the document into my browser, if possible
        * start the Firebug extension/add-on
        * run the FirePath extension
        * run the XPath //a[contains(@href, "schule")]/@href
        * click the "Eval" button
      I find 2030 results, which are highlighted. How do I get them out of Firebug to work with them? If I have to be more precise, or to add more information, please let me know. See the screenshot here: http://img259.imageshack.us/img259/7360/sceenshoteval5.jpg How can I copy and paste the results in order to do further processing from there? I look forward to hearing from you. Regards
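Instead of copying the matches out of the browser, the same XPath can be run in a script. This is a sketch using PHP's built-in DOMDocument and DOMXPath. The sample markup and the function name `schoolLinks()` are assumptions for illustration.

```php
<?php
// Return the distinct href values that contain "schule".
function schoolLinks(string $html): array
{
    // Collect libxml warnings about invalid markup instead of printing them.
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $links = [];
    // Same expression as the one entered in FirePath.
    foreach ($xpath->query('//a[contains(@href, "schule")]/@href') as $attr) {
        $links[] = $attr->nodeValue;
    }
    return array_values(array_unique($links));
}

// Hypothetical sample standing in for the large document.
$sampleHtml = '<html><body>'
    . '<a href="schule.php?schulnr=80287&amp;lschb=">A</a>'
    . '<a href="impressum.html">B</a>'
    . '<a href="schule.php?schulnr=67003&amp;lschb=">C</a>'
    . '</body></html>';

echo implode(PHP_EOL, schoolLinks($sampleHtml)), PHP_EOL;
```

The resulting array can be written out for further processing, for example with `file_put_contents('links.txt', implode(PHP_EOL, $links))`, where `links.txt` is a made-up file name.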