Simple HTML DOM Parser: Starting points for a very easy example



Hello dear friends,

 

first of all: Merry Christmas! :D

I want to parse a page with the Simple HTML DOM Parser.

I am pretty new to PHP and to the Simple HTML DOM Parser.

 

My example: http://schulen.bildung-rp.de/gehezu/startseite/einzelanzeige.html?tx_wfqbe_pi1[uid]=60119

 

I want to collect the data in the block on that page.

I looked at the source code and found that the element of interest seems to be this one: <div class="content"><!-- TYPO3SEARCH_begin -->

Here is the code from my attempts:

 


// include the Simple HTML DOM Parser
include_once('simple_html_dom.php');

// fetch the page we want to parse and create a DOM from it
$html = file_get_html('');

// simple_html_dom::find() returns the child elements
// that match the given selector as objects

foreach($html->find('class: content ') as $h3) {

  // innertext holds the text inside a tag
  if($h3->innertext == 'Text of a H3 Tag') {
    break;
  }
}

// simple_html_dom::next_sibling() returns the
// next sibling element
$table = $h3->next_sibling();

 

 

But believe me, it does not return what I am aiming for.

What have I done wrong? ::)

 

dbone
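
A note on the selector in the snippet above: as far as I know, Simple HTML DOM's find() expects CSS-style selectors, so a class would be written as 'div.content' (or '.content') rather than 'class: content', and file_get_html() needs the page URL instead of an empty string. Because that library may not be installed everywhere, here is a minimal sketch of the same "select by class" idea using PHP's bundled DOM extension. The sample markup and the variable names are made up for illustration; the real page may be structured differently.

```php
<?php
// Hypothetical sample markup standing in for the real page.
$sample = '<html><body><div class="header">Menu</div>'
        . '<div class="content"><h3>Schule</h3>'
        . '<table><tr><td>Schulart:</td><td>BBS</td></tr></table></div>'
        . '</body></html>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($sample);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Match div elements whose class list contains the word "content".
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " content ")]';

$texts = [];
foreach ($xpath->query($query) as $block) {
    $texts[] = trim($block->textContent);
}

print_r($texts);
```

The concat/normalize-space expression is there so that a class such as "contents" or "main-content" does not match by accident.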

Hello, thanks for answering!

No, I do not think the Simple HTML DOM Parser is part of TYPO3.

My example: I want to parse the page and get the following information (in the block),

consisting of the following 11 labels and corresponding values.

 

 

 

 

see the page: http://schulen.bildung-rp.de/gehezu/startseite/einzelanzeige.html?tx_wfqbe_pi1[uid]=60119

 

BTW: Sorry for the funny-looking URL, but it is the real URL!

 

 

 

Schulart:	BBS
Schulnummer:	60119
Anschrift:	Berufsbildende Schule Boppard
Antoniusstr. 21
56154 Boppard
Telefon:	(0 67 42) 80 61-0
Telefax:	(0 67 42) 80 61-29
E-Mail:	[email protected]
Internet:	http://www.bbs-boppard.de
Träger:	Kreisverwaltung Rhein-Hunsrück-Kreis
letzte Änderung:	08 Feb 2010 14:33:12 von 60119

 

I am trying to get this information with the Simple HTML DOM Parser.

I am not very familiar with the Simple HTML DOM Parser; I thought that I have to give it some attributes.

Is this right?

 

greetings dbone

So you want to scrape the following URL:

http://schulen.bildung-rp.de/gehezu/startseite/einzelanzeige.html?tx_wfqbe_pi1[uid]=60119

 

And filter out the following data:

Schulart:	BBS
Schulnummer:	60119
Anschrift:	Berufsbildende Schule Boppard
Antoniusstr. 21
56154 Boppard
Telefon:	(0 67 42) 80 61-0
Telefax:	(0 67 42) 80 61-29
E-Mail:	[email protected]
Internet:	http://www.bbs-boppard.de
Träger:	Kreisverwaltung Rhein-Hunsrück-Kreis
letzte Änderung:	08 Feb 2010 14:33:12 von 60119

 

Why not use DOMDocument instead?

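<?php
// Collect HTML parser warnings instead of silencing them with @
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTMLFile('http://schulen.bildung-rp.de/gehezu/startseite/einzelanzeige.html?tx_wfqbe_pi1[uid]=60119');
libxml_clear_errors();

// Look up the element with the id "wfqbeResults"
$divElement = $dom->getElementById('wfqbeResults');
if ($divElement === null) {
    exit('Element #wfqbeResults not found');
}

// Build the inner HTML by serializing each child node
$innerHTML = '';
foreach ($divElement->childNodes as $child) {
    $innerHTML .= $dom->saveHTML($child);
}
echo $innerHTML;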
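
To go from the block's HTML to the individual labels and values, one option is to walk the table rows with DOMXPath and build an array. This is only a sketch: the id wfqbeResults comes from the code above, but the sample markup (one table row per field, label in the first cell, value in the second) is an assumption, so check it against the real page source first.

```php
<?php
// Hypothetical markup: the real page's table structure is an assumption here.
$sample = <<<HTML
<div id="wfqbeResults">
  <table>
    <tr><td>Schulart:</td><td>BBS</td></tr>
    <tr><td>Schulnummer:</td><td>60119</td></tr>
    <tr><td>Telefon:</td><td>(0 67 42) 80 61-0</td></tr>
  </table>
</div>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($sample);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$fields = [];

// Visit every table row inside the result block.
foreach ($xpath->query('//div[@id="wfqbeResults"]//tr') as $row) {
    $cells = $xpath->query('./td', $row);
    if ($cells->length < 2) {
        continue; // skip rows without a label/value pair
    }
    // First cell is the label (trailing colon removed), second cell is the value.
    $label = rtrim(trim($cells->item(0)->textContent), ':');
    $fields[$label] = trim($cells->item(1)->textContent);
}

print_r($fields);
```

For the live page you would replace loadHTML($sample) with loadHTMLFile() and the URL, and you may need to handle the page's character encoding so that umlauts come through intact.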

Hello dear Dj Kat,

good evening! Many thanks for the answer and the hints!

Yes, I want to scrape the mentioned URL.

I will try this out and run the mentioned parser.

 

 


 

Again, thanks. I will run the code and do some tests, then come back and report my findings.

 

Have a great day!

 

greetings

dilbertone

 

 
