
Simple HTML DOM Parser: Starting points for a very easy example


dilbertone


Hello dear friends,

First of all: merry Christmas! :D

I want to parse a page with the Simple HTML DOM Parser.

Well, I am pretty new to PHP and to the Simple HTML DOM Parser.

My example: http://schulen.bildung-rp.de/gehezu/startseite/einzelanzeige.html?tx_wfqbe_pi1[uid]=60119

I want to collect the data in the block:

I have investigated the source code and found that the element of interest should be this one: <div class="content"><!-- TYPO3SEARCH_begin -->

Here is my code, my attempts so far:


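// include the Simple HTML DOM Parser
include_once('simple_html_dom.php');

// fetch the page we want to parse and create a DOM from it
$html = file_get_html('http://schulen.bildung-rp.de/gehezu/startseite/einzelanzeige.html?tx_wfqbe_pi1[uid]=60119');
if ($html === false) {
  die('Could not load the page');
}

// simple_html_dom::find() takes a CSS-style selector and returns an
// array of matching elements; 'div.content h3' means every <h3>
// inside <div class="content"> ('class: content ' is not a valid selector)
$found = null;
foreach ($html->find('div.content h3') as $h3) {

  // ->innertext is the content inside a tag
  if ($h3->innertext == 'Text of a H3 Tag') {
    $found = $h3;
    break;
  }
}

// simple_html_dom::next_sibling() gives the next element on the same level;
// only call it if the heading was actually found
if ($found !== null) {
  $table = $found->next_sibling();
}

But believe me, it does not give me back what I am aiming for.

What have I done wrong...? ::)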
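
For comparison, here is a minimal sketch of the same idea using PHP's built-in DOM extension (DOMDocument plus DOMXPath) instead of Simple HTML DOM, so it can be run without simple_html_dom.php: find the <h3> inside <div class="content"> whose text matches, then step to the next element. Unlike Simple HTML DOM's next_sibling(), DOMDocument's nextSibling can return a whitespace text node, so the loop skips ahead to the next element node. The sample HTML and the helper name nextElementAfterHeading are hypothetical; 'Text of a H3 Tag' is the placeholder from the code above.

```php
<?php
// Minimal sketch using PHP's built-in DOM extension instead of Simple HTML DOM.
// The sample HTML and the helper name nextElementAfterHeading are hypothetical.

function nextElementAfterHeading(string $html, string $headingText): ?DOMElement
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // collect HTML parse warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // every <h3> inside a <div> whose class list contains "content"
    $headings = $xpath->query(
        '//div[contains(concat(" ", normalize-space(@class), " "), " content ")]//h3'
    );

    foreach ($headings as $h3) {
        if (trim($h3->textContent) !== $headingText) {
            continue;
        }
        // nextSibling may be a whitespace text node, so walk on to the next element
        $node = $h3->nextSibling;
        while ($node !== null && $node->nodeType !== XML_ELEMENT_NODE) {
            $node = $node->nextSibling;
        }
        return $node; // the element after the heading, or null if there is none
    }
    return null; // heading not found
}

$sample = '<div class="content">
  <h3>Text of a H3 Tag</h3>
  <table><tr><td>Schulart:</td><td>BBS</td></tr></table>
</div>';

$next = nextElementAfterHeading($sample, 'Text of a H3 Tag');
echo $next === null ? 'not found' : $next->nodeName, "\n"; // table
```

Returning null when the heading is missing avoids the original loop's pitfall of silently reusing the last $h3 after the foreach ends.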

dbone

Hello, thanks for answering!

No, I do not think the Simple HTML DOM Parser is part of TYPO3!

My example: I want to parse the page and get the following information (in the block), consisting of the following nine labels and their corresponding values (11 lines in all, since the address spans three lines).

See the page: http://schulen.bildung-rp.de/gehezu/startseite/einzelanzeige.html?tx_wfqbe_pi1[uid]=60119

BTW: sorry for the funny-looking URL, but it is the real URL!

Schulart:	BBS
Schulnummer:	60119
Anschrift:	Berufsbildende Schule Boppard
Antoniusstr. 21
56154 Boppard
Telefon:	(0 67 42) 80 61-0
Telefax:	(0 67 42) 80 61-29
E-Mail:	[email protected]
Internet:	http://www.bbs-boppard.de
Träger:	Kreisverwaltung Rhein-Hunsrück-Kreis
letzte Änderung:	08 Feb 2010 14:33:12 von 60119

I am trying to get this information with the Simple HTML DOM Parser.

Well, I am not very familiar with the Simple HTML DOM Parser; I thought that I had to pass it some attributes.

Is this right?

greetings dbone
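
A minimal sketch of turning label/value pairs like the ones listed above into an associative array, using PHP's built-in DOMDocument and DOMXPath rather than Simple HTML DOM. It assumes (not checked against the real page) that each label sits in the first cell of a table row and its value in the second, inside the element with id wfqbeResults (the id used in the DOMDocument reply in this thread). The sample markup and the helper name extractLabelValues are hypothetical; only the values are taken from the list above.

```php
<?php
// Minimal sketch: collect "Label: value" rows into an associative array.
// ASSUMPTION (not checked against the real page): each label is in the first
// <td> of a row and its value in the second, inside the element id="wfqbeResults".
// The sample markup and the helper name extractLabelValues are hypothetical.

function extractLabelValues(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // collect HTML parse warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);

    // replace <br> with a space so multi-line values (e.g. the address) stay readable
    foreach ($xpath->query('//br') as $br) {
        $br->parentNode->replaceChild($dom->createTextNode(' '), $br);
    }

    $data = [];
    // every row inside the results block that has at least two cells
    foreach ($xpath->query('//*[@id="wfqbeResults"]//tr[count(td) >= 2]') as $row) {
        $cells = $xpath->query('td', $row);
        $label = rtrim(trim($cells->item(0)->textContent), ':'); // "Schulart:" -> "Schulart"
        $value = trim(preg_replace('/\s+/', ' ', $cells->item(1)->textContent));
        $data[$label] = $value;
    }
    return $data;
}

$sample = '<div id="wfqbeResults"><table>
  <tr><td>Schulart:</td><td>BBS</td></tr>
  <tr><td>Schulnummer:</td><td>60119</td></tr>
  <tr><td>Anschrift:</td><td>Berufsbildende Schule Boppard<br>Antoniusstr. 21<br>56154 Boppard</td></tr>
  <tr><td>Telefon:</td><td>(0 67 42) 80 61-0</td></tr>
</table></div>';

print_r(extractLabelValues($sample));
```

So yes, you do need to tell the parser where to look, but an id or class on the surrounding element is usually enough; the rows inside can then be walked generically.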

So you want to scrape the following URL:

http://schulen.bildung-rp.de/gehezu/startseite/einzelanzeige.html?tx_wfqbe_pi1[uid]=60119
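
The unusual part of that URL is the bracketed parameter name tx_wfqbe_pi1[uid], which PHP treats as an array-style query parameter. A minimal sketch of building and reading such a query string with PHP's own http_build_query() and parse_str(); the helper name schoolUrl is hypothetical. Note that http_build_query() percent-encodes the brackets as %5B and %5D.

```php
<?php
// Minimal sketch: build and read the bracketed query parameter with PHP built-ins.
// The helper name schoolUrl is hypothetical.

function schoolUrl(int $uid): string
{
    $base = 'http://schulen.bildung-rp.de/gehezu/startseite/einzelanzeige.html';
    // http_build_query() turns the nested array into tx_wfqbe_pi1%5Buid%5D=<uid>
    return $base . '?' . http_build_query(['tx_wfqbe_pi1' => ['uid' => $uid]]);
}

$url = schoolUrl(60119);
echo $url, "\n";

// parse_str() decodes the brackets back into a nested array
parse_str(parse_url($url, PHP_URL_QUERY), $params);
echo $params['tx_wfqbe_pi1']['uid'], "\n"; // 60119
```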

And filter out the following data:

Schulart:	BBS
Schulnummer:	60119
Anschrift:	Berufsbildende Schule Boppard
Antoniusstr. 21
56154 Boppard
Telefon:	(0 67 42) 80 61-0
Telefax:	(0 67 42) 80 61-29
E-Mail:	[email protected]
Internet:	http://www.bbs-boppard.de
Träger:	Kreisverwaltung Rhein-Hunsrück-Kreis
letzte Änderung:	08 Feb 2010 14:33:12 von 60119

Why not use DOMDocument instead?

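<?php
$dom = new DOMDocument();
// @ suppresses the warnings DOMDocument emits for real-world (non-valid) HTML
@$dom->loadHTMLFile('http://schulen.bildung-rp.de/gehezu/startseite/einzelanzeige.html?tx_wfqbe_pi1[uid]=60119');

// the results are inside the element with id="wfqbeResults"
$divElement = $dom->getElementById('wfqbeResults');
if ($divElement === null) {
    die('Element with id "wfqbeResults" not found');
}

// build the "inner HTML" of that element by serializing each child node
$innerHTML = '';
foreach ($divElement->childNodes as $child) {
    $innerHTML .= $child->ownerDocument->saveXML($child);
}
echo $innerHTML;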
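
To try the same id lookup without fetching the live page, here is a minimal sketch that wraps it in a hypothetical helper, innerHtmlById, and runs it on a made-up HTML string. It serializes each child with $dom->saveHTML($child) (PHP 5.3.6 or later) instead of saveXML(), so the fragment comes back as HTML, and it returns null when the id is missing instead of failing on a null element.

```php
<?php
// Minimal sketch: the getElementById + inner-HTML idea as a reusable helper,
// run on an inline HTML string. The helper name and sample markup are hypothetical.

function innerHtmlById(string $html, string $id): ?string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // collect HTML parse warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();

    $element = $dom->getElementById($id);
    if ($element === null) {
        return null; // no element with that id
    }

    $inner = '';
    foreach ($element->childNodes as $child) {
        // saveHTML() with a node argument serializes only that node (PHP >= 5.3.6)
        $inner .= $dom->saveHTML($child);
    }
    return $inner;
}

$sample = '<div id="wfqbeResults"><b>Schulart:</b> BBS</div>';
echo innerHtmlById($sample, 'wfqbeResults'), "\n";
```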

Hello dear Dj Kat,

Good evening, and many thanks for the answer and the hints!

Yes, I want to scrape the mentioned URL.

I will try this out and run the code you suggested.


Thanks again! I will run the code and do some tests, then come back and report all my findings.

Have a great day!

greetings

dilbertone

Archived

This topic is now archived and is closed to further replies.
