
How to extract data using a pattern with Simple HTML DOM?


torontobb



Hi Everyone,

 

I just started using Simple HTML DOM today and have spent four hours without getting the result I want.

 

I want to be able to extract the following information:

 

<div class="listing_content">
	<span class="serialNumb" style="line-height: 21px;">77777</span>
<br />
444 ASDF, Alpha, Tango, Beta
<br />
77777 Director:99999
              <div>
<img title='web' src='http://cpgimg.com/images/icon_sm_web.gif' alt='web'/>  <a href='javascript:void(0)' onClick="window.open('/redir.jsp?p_url=http:%2f%2fwww.cnn.com&p_cid=2707304&p_hid=279E00&p_ct=3527&p_pr=KO&p_fr=U');" class='listing_link'>website</a>  <img title='email' src='http://cpgimg.com/images/icon_sm_mail.gif' alt='email'/>  <a class='listing_link' href="javascript:void(0)" onclick="popupEmail('/email.jsp?lang=0&p_cid=2707304');(new Image()).src='/redir.jsp?p_url=&p_cid=2707304&p_hid=279E00&p_ct=3527&p_pr=ON&p_fr=E&msec='+(new Date()).getMilliseconds()">E-mail</a>  
               </div>
</div>

               

The values I need to pull out separately from the excerpt above are:

1- serialNumb = 77777

2- 444 ASDF, Alpha, Tango, Beta

3- 77777 Director:99999

4- www.cnn.com

 

I want each value stored in its own variable so I can load them into MySQL.

 

Any help with this is much appreciated. I don't have to use Simple HTML DOM, but from my searching it seems to be the best tool (although I haven't had much luck with it so far).

 

Keep in mind that the page is full of <div>, <br />, <img>, and other tags. The quoted part is just one excerpt, but it contains two markers that each appear only once on the page: the attribute style="line-height: 21px;" and, for the URL portion, the string ('/redir.jsp?p_url.
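
For anyone reading later, here is a minimal sketch of the pattern-based extraction described above. It is not Simple HTML DOM: it uses only Python's standard library, and the names `SAMPLE`, `PATTERN`, and `extract_listing` are made up for illustration. It assumes the serial number span is followed by exactly two <br />-separated text lines and that the first redir.jsp link after them carries the website in its p_url parameter, as in the excerpt (the link markup is shortened here). The regular expression itself should carry over to PHP's preg_match with the s modifier, but that is untested.

```python
import re
from urllib.parse import unquote

# Shortened copy of the excerpt from the post (assumption: the real page
# keeps the same order of span, <br />, text, <br />, text, link).
SAMPLE = '''<div class="listing_content">
	<span class="serialNumb" style="line-height: 21px;">77777</span>
<br />
444 ASDF, Alpha, Tango, Beta
<br />
77777 Director:99999
              <div>
<a href='javascript:void(0)' onClick="window.open('/redir.jsp?p_url=http:%2f%2fwww.cnn.com&p_cid=2707304');" class='listing_link'>website</a>
               </div>
</div>'''

PATTERN = re.compile(
    # 1. text inside the serial number span
    r'<span class="serialNumb"[^>]*>(?P<serial>[^<]*)</span>\s*'
    # 2. first text line after a <br />
    r'<br\s*/?>\s*(?P<address>[^<]*?)\s*'
    # 3. second text line, ending at the next tag
    r'<br\s*/?>\s*(?P<director>[^<]*?)\s*<'
    # 4. value of p_url in the first redir.jsp link that follows
    r".*?redir\.jsp\?p_url=(?P<url>[^&']*)",
    re.DOTALL,
)


def extract_listing(html):
    """Return the four fields as a dict, or None if the pattern is absent."""
    match = PATTERN.search(html)
    if match is None:
        return None
    fields = {name: value.strip() for name, value in match.groupdict().items()}
    # p_url is percent-encoded (http:%2f%2f...); decode it and drop the scheme.
    decoded = unquote(fields.pop("url"))
    fields["website"] = re.sub(r"^https?://", "", decoded)
    return fields


if __name__ == "__main__":
    print(extract_listing(SAMPLE))
```

Each value ends up under its own key (`serial`, `address`, `director`, `website`), which maps directly onto separate columns for a MySQL insert.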

 

Thanks again.

 

I had a look at it, but I think you picked up on a minor part of my post that isn't a problem for me and pointed me to that.

 

I need to parse an HTML file. That is it in a nutshell.

 

I have already overcome a lot of issues, but I still have a problem with the whitespace in the HTML file.

 

If anyone has experience with HTML parsing, please let me know how you would extract the address from this HTML excerpt (note: all of the whitespace shown here exists in the HTML source file):

 

<span class="basic_serial">(777) 777-7777</span>

												<br />









										1111 ABCD, EFGH, IJKL

										<br />
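
One way to cope with the blank lines and tabs is to let the pattern skip any amount of whitespace around the <br /> tags and then collapse whatever whitespace remains inside the captured text. The sketch below is standard-library Python rather than Simple HTML DOM, and `SAMPLE`, `ADDRESS`, and `extract_address` are hypothetical names. It assumes the address is the only text between the first two <br /> tags after the basic_serial span.

```python
import re

# The excerpt from the post, with its tabs and blank lines written as escapes.
SAMPLE = (
    '<span class="basic_serial">(777) 777-7777</span>\n\n'
    '\t\t\t\t\t\t\t\t\t\t\t\t<br />\n\n\n\n\n\n\n\n\n\n'
    '\t\t\t\t\t\t\t\t\t\t1111 ABCD, EFGH, IJKL\n\n'
    '\t\t\t\t\t\t\t\t\t\t<br />'
)

# \s* absorbs spaces, tabs, and newlines between the tags and the text.
ADDRESS = re.compile(
    r'<span class="basic_serial">[^<]*</span>\s*'
    r'<br\s*/?>\s*([^<]*?)\s*<br\s*/?>'
)


def extract_address(html):
    """Return the address with whitespace normalised, or None if not found."""
    match = ADDRESS.search(html)
    if match is None:
        return None
    # split() with no arguments splits on any run of whitespace, so joining
    # with single spaces also cleans up stray tabs or newlines inside the text.
    return " ".join(match.group(1).split())


if __name__ == "__main__":
    print(extract_address(SAMPLE))
```

Because the whitespace is matched with \s* instead of literal spaces, the amount of padding in the source file should not matter.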

 

 

Thanks,

 

Archived

This topic is now archived and is closed to further replies.
