Simple HTML DOM Parser: Starting points for a very easy example

dilbertone · December 24, 2010

Hello dear friends,

First of all: merry, merry Xmas!

I want to parse a page with the Simple HTML DOM Parser.

Well, I am pretty new to PHP and to the Simple HTML DOM Parser.

My example: http://schulen.bildung-rp.de/gehezu/startseite/einzelanzeige.html?tx_wfqbe_pi1[uid]=60119

I want to collect the data in the block:

I have investigated the source code and found that the attribute of interest should be this one: class="content", i.e. <div class="content">.

Here is the code I have tried:


<?php
// include the Simple HTML DOM Parser
include_once('simple_html_dom.php');

// get the file we want to parse and create a DOM
$html = file_get_html('');

// simple_html_dom::find() returns the matching
// child elements as simple_html_dom_node objects

foreach($html->find('class: content ') as $h3) {

  // innertext: the text inside a tag
  if($h3->innertext == 'Text of a H3 Tag') {
    break;
  }
}

// next_sibling() returns the next sibling element
$table = $h3->next_sibling();

But believe me, it does not give me back what I am aiming for.

What have I done wrong?

dbone
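A note for later readers: 'class: content ' is not a selector the parser understands. Simple HTML DOM's find() takes CSS-style selectors, so something like $html->find('div.content') or $html->find('div[class=content]') is probably what was intended (please check the library's own manual). Below is a minimal sketch of the same lookup using PHP's built-in DOMDocument and DOMXPath instead, since those need no extra download; the function name findDivTextsByClass and the sample HTML are made up for illustration.

```php
<?php
// Sketch: find every <div class="content"> with PHP's built-in DOM extension
// (DOMDocument + DOMXPath) instead of the Simple HTML DOM Parser.
// The HTML below is a made-up sample, not the real school page.

// Return the trimmed text of every <div> whose class list contains $class.
// Assumes $class contains no quote characters (it is pasted into the XPath).
function findDivTextsByClass(string $html, string $class): array
{
    $dom = new DOMDocument();
    // Collect libxml parse warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // XPath equivalent of the CSS selector "div.content": pad the class list
    // with spaces so "content" does not also match "main-content".
    $query = sprintf(
        "//div[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $class
    );

    $texts = [];
    foreach ($xpath->query($query) as $div) {
        $texts[] = trim($div->textContent);
    }
    return $texts;
}

$sample = '<html><body>'
    . '<div class="header">Menu</div>'
    . '<div class="content">Schulart: BBS</div>'
    . '<div class="main-content">not this one</div>'
    . '</body></html>';

print_r(findDivTextsByClass($sample, 'content'));
```

The space-padding trick matters because a div can carry several classes (class="box content"), which a plain @class='content' comparison would miss.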

RichardRotterdam · December 24, 2010

Is simple_html_dom.php part of TYPO3? Also, what is it you want to accomplish? I don't see a question anywhere.

dilbertone · December 24, 2010

Hello - thanks for answering!

No, I do not think the Simple HTML DOM Parser is part of TYPO3!

My example: I want to parse the page and get the information in the block, which consists of the following 11 lines of labels and corresponding values.

See the page: http://schulen.bildung-rp.de/gehezu/startseite/einzelanzeige.html?tx_wfqbe_pi1[uid]=60119

BTW: sorry for the funny-looking URL, but it is the real URL!

Schulart:	BBS
Schulnummer:	60119
Anschrift:	Berufsbildende Schule Boppard
Antoniusstr. 21
56154 Boppard
Telefon:	(0 67 42) 80 61-0
Telefax:	(0 67 42) 80 61-29
E-Mail:	[email protected]
Internet:	http://www.bbs-boppard.de
Träger:	Kreisverwaltung Rhein-Hunsrück-Kreis
letzte Änderung:	08 Feb 2010 14:33:12 von 60119
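Once the block is located, label/value lines like the ones above can be collected into an associative array. The sketch below assumes (I have not checked the real markup) that the block is a table with one label cell and one value cell per row, and that multi-line values such as the address are split with <br>; the function name parseLabelValueTable and the sample HTML are hypothetical.

```php
<?php
// Sketch: turn "Label: value" table rows into an associative array.
// ASSUMPTION: each row is <tr><td>Label:</td><td>Value</td></tr> and
// multi-line values use <br>. The real school page may be structured differently.

function parseLabelValueTable(string $html): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $result = [];
    // Only rows with exactly two cells are treated as label/value pairs.
    foreach ($xpath->query('//tr[count(td) = 2]') as $row) {
        $cells = $xpath->query('td', $row);
        // Label cell: strip surrounding whitespace and the trailing colon.
        $label = rtrim(trim($cells->item(0)->textContent), ':');
        // Value cell: keep each non-empty text fragment (one per <br> line).
        $lines = [];
        foreach ($xpath->query('.//text()', $cells->item(1)) as $textNode) {
            $line = trim($textNode->nodeValue);
            if ($line !== '') {
                $lines[] = $line;
            }
        }
        $result[$label] = implode(', ', $lines);
    }
    return $result;
}

$sample = '<table>'
    . '<tr><td>Schulart:</td><td>BBS</td></tr>'
    . '<tr><td>Schulnummer:</td><td>60119</td></tr>'
    . '<tr><td>Anschrift:</td><td>Berufsbildende Schule Boppard<br>Antoniusstr. 21<br>56154 Boppard</td></tr>'
    . '</table>';

print_r(parseLabelValueTable($sample));
```

Joining the address lines with ", " is just one choice; returning the $lines array per label would keep the line breaks.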

I am trying to get this information with the Simple HTML DOM Parser.

Well, I am not very familiar with the Simple HTML DOM Parser; I thought I had to specify some attributes.

Is this right?

Greetings, dbone

RichardRotterdam · December 24, 2010

So you want to scrape the following URL:

http://schulen.bildung-rp.de/gehezu/startseite/einzelanzeige.html?tx_wfqbe_pi1[uid]=60119

And filter out the following data:

Schulart:	BBS
Schulnummer:	60119
Anschrift:	Berufsbildende Schule Boppard
Antoniusstr. 21
56154 Boppard
Telefon:	(0 67 42) 80 61-0
Telefax:	(0 67 42) 80 61-29
E-Mail:	[email protected]
Internet:	http://www.bbs-boppard.de
Träger:	Kreisverwaltung Rhein-Hunsrück-Kreis
letzte Änderung:	08 Feb 2010 14:33:12 von 60119

Why not use DOMDocument instead?

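<?php
// load the page into PHP's built-in DOM (@ suppresses warnings about invalid HTML)
$dom = new DOMDocument();
@$dom->loadHTMLFile('http://schulen.bildung-rp.de/gehezu/startseite/einzelanzeige.html?tx_wfqbe_pi1[uid]=60119');

// the results block is the element with id="wfqbeResults"
$divElement = $dom->getElementById('wfqbeResults');

// serialize every child node to build the block's inner HTML
$innerHTML = '';
$children = $divElement->childNodes;
foreach ($children as $child) {
    $innerHTML .= $child->ownerDocument->saveXML($child);
}
echo $innerHTML;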
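A variation of the DOMDocument approach that can be tried without fetching the live page: it parses an inline HTML string, collects libxml warnings with libxml_use_internal_errors() instead of hiding them with @, checks for a missing id (getElementById() returns null then, and the original code would read childNodes from null), and serializes the children with saveHTML(). The helper name innerHtmlById and the sample HTML are hypothetical.

```php
<?php
// Sketch: inner HTML of the element with a given id, or null if there is none.
// Differences from the snippet above are my own choices, not the original reply's.

function innerHtmlById(string $html, string $id): ?string
{
    $dom = new DOMDocument();
    // Collect parse warnings instead of suppressing them with @.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $element = $dom->getElementById($id);
    if ($element === null) {
        return null; // no element with that id in the document
    }

    // Serialize each child node as HTML and concatenate.
    $inner = '';
    foreach ($element->childNodes as $child) {
        $inner .= $dom->saveHTML($child);
    }
    return $inner;
}

$page = '<html><body><div id="wfqbeResults"><p>Schulart: BBS</p></div></body></html>';
echo innerHtmlById($page, 'wfqbeResults'), "\n";
```

For the live page, the HTML string can come from file_get_contents() on the URL above; keeping fetching and parsing separate also makes the parsing easy to test offline.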

dilbertone · December 24, 2010

Hello dear Dj Kat,

Good evening, and many thanks for the answer and the hints!

Yes, I want to scrape the mentioned URL.

I will try this out and run the suggested parser.


Thanks again! I will run the code and do some tests. I will come back and report all my findings.

Have a great day!

greetings

dilbertone

jr_developer · February 19, 2013

Hi,

I'm pretty new to Simple HTML DOM.

Can anyone help me code this example? I want to try to collect some data from this website:

4D88.com - Latest 4D Results

Thanks
