
Cleaning Up Data for Display


clowes


I would really appreciate some advice here. I have spent countless hours trying to come up with a suitable solution.

Essentially, I have a script that grabs data from various sources. All the sources provide the same data, but in different formats.

The main aim of the script is to get the data and display the important parts in a clean, easy-to-read manner.

All the data contains some common attributes, the important stuff: names, addresses, emails, and websites.

In addition, some data sources contain completely irrelevant data.

The ideal solution would be to grab the important data from each source and simply display it. I can get emails and websites using regular expressions; however, as far as I am aware, extracting names and addresses that way is impossible.
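For the part that already works, here is a minimal Python sketch of pulling emails and website URLs out of arbitrary text with regular expressions (the same patterns work with PHP's preg_match_all). The patterns are deliberately loose assumptions, not full RFC validators, and `extract_contacts` is a hypothetical name.

```python
import re

# Deliberately loose patterns: fine for WHOIS-style text,
# not full RFC 5322 / RFC 3986 validation.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def extract_contacts(text):
    """Return (emails, urls) found in text, de-duplicated in first-seen order."""
    emails = list(dict.fromkeys(EMAIL_RE.findall(text)))
    urls = list(dict.fromkeys(URL_RE.findall(text)))
    return emails, urls


sample = "email: jnj@b-one.net admin-c: CCOM-387512 jnj@b-one.net see http://www.example.com/whois"
print(extract_contacts(sample))
```

The `dict.fromkeys` trick removes repeats (WHOIS records often list the same address for several contacts) while keeping the original order.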

 

Some data sources say:

Name: Jack Johnson

Others say:

First Name..... Jack

Last Name..... Johnson
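Both layouts are still "label, separator, value" on one line, so a single pattern can normalise them. A Python sketch, assuming the label is made of letters, spaces, hyphens or underscores and the separator is either a colon or a run of two or more dots; `parse_pairs` is a hypothetical helper.

```python
import re

# One pattern for both "Label: value" and "Label..... value" lines.
# Label: letters, spaces, hyphens, underscores. Separator: ":" or 2+ dots.
PAIR_RE = re.compile(r"^\s*([A-Za-z][A-Za-z _-]*?)\s*(?::|\.{2,})\s*(.*?)\s*$")


def parse_pairs(text):
    """Yield (normalised_label, value) for every line that looks like a field."""
    for line in text.splitlines():
        m = PAIR_RE.match(line)
        if m:
            # Lower-case the label and collapse internal whitespace.
            label = " ".join(m.group(1).lower().split())
            yield label, m.group(2)


source_a = "Name: Jack Johnson"
source_b = "First Name..... Jack\nLast Name..... Johnson"
print(list(parse_pairs(source_a)))
print(list(parse_pairs(source_b)))
```

Once every source is reduced to lower-cased (label, value) pairs, the remaining per-source work is only deciding which labels mean "name" or "address".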

 

My first question is as to whether I am correct in believing it impossible to extract the names/addresses when they are always displayed in a variety of different forms?

 

 

--------

 

The approach I am currently taking, although not ideal, is simply to remove the data I don't want and display the rest.

For example, one data source separates sections with a run of ten '-' characters, so I have just used str_replace to remove them.
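str_replace only matches the exact string it is given, so separators of a different length slip through. A more forgiving option is a regular expression that blanks out any line made only of separator characters; in PHP, preg_replace with the /m modifier does the same. This Python sketch assumes a separator line is three or more of `-`, `=`, `_` or `*`, which is a guess to tune per source.

```python
import re

# A line consisting only of 3+ separator characters, optionally padded with
# spaces/tabs. [ \t] is used instead of \s so the match cannot run across
# newlines. re.MULTILINE makes ^ and $ match at every line boundary.
SEPARATOR_LINE_RE = re.compile(r"^[ \t]*[-=_*]{3,}[ \t]*$", re.MULTILINE)


def strip_separator_lines(text):
    """Blank out separator-only lines; dashes inside real values are kept."""
    return SEPARATOR_LINE_RE.sub("", text)


raw = "Registrant:\n----------\nB-one\nDubai Internet City"
print(strip_separator_lines(raw))
```

Note that the hyphen in "B-one" survives because the pattern only matches lines that contain nothing but separator characters. The removed line leaves an empty line behind.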

 

The problem with this method is that each source is laid out differently. After removing the rubbish, I am left with data that has, for example, varying numbers of line breaks between lines. I cannot simply remove all line breaks, because that would leave a clump of ugly text, so I have hit another obstacle.
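Rather than removing line breaks entirely, the spacing can be normalised: keep single line breaks so each field stays on its own line, and collapse any run of blank lines down to one. A Python sketch; `tidy_line_breaks` is a hypothetical name, and treating whitespace-only lines as blank is an assumption.

```python
import re


def tidy_line_breaks(text):
    """Normalise newlines and collapse runs of blank lines to a single one.

    Single line breaks are kept, so each field stays on its own line.
    """
    # Unify Windows (\r\n) and old Mac (\r) line endings to \n.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Drop trailing spaces/tabs so whitespace-only lines count as blank.
    text = re.sub(r"[ \t]+\n", "\n", text)
    # Three or more newlines (two or more blank lines) become one blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip("\n")


messy = "Domain: one.com\r\n\r\n\r\n   \r\nOwner: n/a\n\n\n\n\nStatus: lock\n"
print(tidy_line_breaks(messy))
```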

 

 

Does anyone have any suggestions on a suitable way to approach the task outlined above?

Thank you.


Sadly, the data is not returned as XML.

The data comes from domain WHOIS servers. I have included three examples of the raw data returned below.

In its simplest form, I can just run the data through nl2br and print the output.
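If nl2br remains the display step, it is worth escaping the text first, since WHOIS output is third-party data and may contain characters such as `<` or `&`. In PHP that means `nl2br(htmlspecialchars($raw))`, escaping before adding the `<br />` tags. A Python sketch of the same idea, with `to_html` as a hypothetical name:

```python
import html


def to_html(raw):
    """Escape raw WHOIS text, then turn each line break into <br />.

    Escaping must come first; otherwise the inserted <br /> tags would
    themselves be escaped.
    """
    lines = raw.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    # Like nl2br, keep the newline and put <br /> in front of it.
    return "<br />\n".join(html.escape(line) for line in lines)


print(to_html("organization: B-one\nnote: <b>&</b>"))
```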

 

Any advice would be greatly appreciated.

Thanks

 

domain: one.com
owner: n/a
organization: B-one
email: jnj@b-one.net
address: Dubai Internet City
address: Building 9
city: Dubai
postal-code: 500401
country: AE
phone: +45.46907100
admin-c: CCOM-387512 jnj@b-one.net
tech-c: CCOM-387512 jnj@b-one.net
billing-c: CCOM-387512 jnj@b-one.net
nserver: a.b-one-dns.net
nserver: b.b-one-dns.net
status: lock
created: 1992-02-12 00:00:00 UTC
modified: 2008-03-18 09:07:18 UTC
expires: 2015-02-13 05:00:00 UTC
contact-hdl: CCOM-387512
person: n/a
organization: B-one
email: jnj@b-one.net
address: Dubai Internet City
address: Building 9
city: Dubai
postal-code: 500401
country: AE
phone: +45.46907100
source: joker.com live whois service
query-time: 0.013596
db-updated: 2010-05-29 17:30:47
NOTE: By submitting a WHOIS query, you agree to abide by the following
NOTE: terms of use: You agree that you may use this data only for lawful
NOTE: purposes and that under no circumstances will you use this data to:
NOTE: (1) allow, enable, or otherwise support the transmission of mass
NOTE: unsolicited, commercial advertising or solicitations via direct mail,
NOTE: e-mail, telephone, or facsimile; or (2) enable high volume, automated,
NOTE: electronic processes that apply to Joker.com (or its computer systems).
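The first example is a plain `key: value` format in which some keys (address, nserver) repeat, so a dictionary of lists keeps every line in order. A Python sketch, assuming the raw response has one field per line; treating NOTE lines as skippable legal text is an assumption, and `parse_key_value` is a hypothetical name.

```python
from collections import defaultdict


def parse_key_value(raw, skip_keys=("note",)):
    """Collect 'key: value' lines into {key: [values...]}, keeping repeats in order."""
    fields = defaultdict(list)
    for line in raw.splitlines():
        # Split at the FIRST colon only, so times like 00:00:00 stay in the value.
        key, sep, value = line.partition(":")
        key = key.strip().lower()
        value = value.strip()
        # Skip lines without a colon, boilerplate keys, and empty values.
        if not sep or key in skip_keys or not value:
            continue
        fields[key].append(value)
    return dict(fields)


raw = """domain: one.com
organization: B-one
address: Dubai Internet City
address: Building 9
city: Dubai
nserver: a.b-one-dns.net
nserver: b.b-one-dns.net
created: 1992-02-12 00:00:00 UTC
NOTE: By submitting a WHOIS query, you agree to abide by the following"""

fields = parse_key_value(raw)
print(fields["address"])
print(fields["nserver"])
```

In the full response the contact-hdl block repeats organization and address, so on real data the lists may contain duplicates; `list(dict.fromkeys(values))` removes them while keeping order.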

 

 

Domain Name.......... abcd.com
Creation Date........ 1995-04-06
Registration Date.... 2009-06-02
Expiry Date.......... 2011-04-08
Organisation Name.... Disney Enterprises, Inc.
Organisation Address. 500 S. Buena Vista Street
Organisation Address. 506 Second Ave. Suite 2100
Organisation Address. Burbank
Organisation Address. 91521
Organisation Address. CA
Organisation Address. UNITED STATES
Admin Name........... Domain Registrar
Admin Address........ Attn Phil Wahl 500 S Buena Vista Street
Admin Address........
Admin Address........ Burbank
Admin Address........ 91521
Admin Address........ CA
Admin Address........ UNITED STATES
Admin Email.......... domain.registrar@ONLINE.DISNEY.COM
Admin Phone.......... +1.8186233325
Admin Fax............ +1.8186233555
Tech Name............ Domain Registrar
Tech Address......... Attn Phil Wahl 500 S Buena Vista Street
Tech Address.........
Tech Address......... Burbank
Tech Address......... 91521
Tech Address......... CA
Tech Address......... UNITED STATES
Tech Email........... domain.registrar@ONLINE.DISNEY.COM
Tech Phone........... +1.8186233325
Tech Fax............. +1.8186233555
Name Server.......... sens01.dig.com
Name Server.......... sens02.dig.com
Name Server.......... orns01.dig.com
Name Server.......... orns02.dig.com

 

 

Domain Name: FMA.COM
Registrar: MONIKER

Registrant [1690]:
com fma dns-admin@fma.net
Future Media Architects, Inc.
P.O. Box 71
Road Town
Tortola
99999
VG

Administrative Contact [1690]:
com fma dns-admin@fma.net
Future Media Architects, Inc.
P.O. Box 71
Road Town
Tortola
99999
VG
Phone: +1.2844945870
Fax: +1.2844948586

Billing Contact [1690]:
com fma dns-admin@fma.net
Future Media Architects, Inc.
P.O. Box 71
Road Town
Tortola
99999
VG
Phone: +1.2844945870
Fax: +1.2844948586

Technical Contact [1690]:
com fma dns-admin@fma.net
Future Media Architects, Inc.
P.O. Box 71
Road Town
Tortola
99999
VG
Phone: +1.2844945870
Fax: +1.2844948586

Domain servers in listed order:
NS1.US.FMA.NET 72.32.55.82
NS2.US.FMA.NET 72.3.153.73

Record created on: 2002-01-18 02:37:00.0
Database last updated on: 2010-04-15 06:24:37.183
Domain Expires on: 2012-01-18 02:37:00.0
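The third example has no per-line labels for the address; instead a header such as `Registrant [1690]:` is followed by the contact lines. One option is to split the response into blocks keyed by header. A Python sketch, assuming a header is a line ending in a colon with nothing after it, and that a blank line ends a block; the original line layout is not visible in the flattened paste, so the sample's line breaks are also an assumption, and `parse_blocks` is hypothetical.

```python
import re

# A header line: words, an optional [handle], then a colon and nothing else,
# e.g. "Registrant [1690]:" or "Domain servers in listed order:".
HEADER_RE = re.compile(r"^([A-Za-z][A-Za-z ]*?)\s*(?:\[[^\]]*\])?\s*:\s*$")


def parse_blocks(raw):
    """Return {header: [lines...]} for every header-led block in raw."""
    blocks = {}
    current = None
    for line in raw.splitlines():
        stripped = line.strip()
        m = HEADER_RE.match(stripped)
        if m:
            current = m.group(1).strip()
            blocks[current] = []
        elif not stripped:
            current = None  # a blank line ends the current block
        elif current is not None:
            blocks[current].append(stripped)
    return blocks


raw = """Domain Name: FMA.COM
Registrar: MONIKER

Registrant [1690]:
    com fma dns-admin@fma.net
    Future Media Architects, Inc.
    P.O. Box 71
    Road Town
    Tortola
    99999
    VG

Domain servers in listed order:
    NS1.US.FMA.NET 72.32.55.82
    NS2.US.FMA.NET 72.3.153.73
"""

blocks = parse_blocks(raw)
print(blocks["Registrant"][1:])
```

Lines such as `Domain Name: FMA.COM` are not headers (they have text after the colon), so they are ignored here and can be handled by an ordinary key/value parser.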

My first question is as to whether I am correct in believing it impossible to extract the names/addresses when they are always displayed in a variety of different forms?

 

No, it's not impossible. Check for separate first-name and last-name fields first, and fall back to a single name field only as a last resort.
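That suggestion amounts to a priority order. A Python sketch, assuming the fields have already been parsed into a dict with lower-cased labels. "first name", "last name", "name", "person" and "owner" appear in this thread; the other spellings and the "n/a"/"none" placeholders treated as missing are hypothetical examples to extend per source.

```python
FIRST_KEYS = ("first name", "firstname", "given name")
LAST_KEYS = ("last name", "lastname", "surname")
NAME_KEYS = ("name", "person", "owner")
MISSING = {"", "n/a", "none"}  # placeholder values treated as "not present"


def first_present(fields, keys):
    """Return the first non-placeholder value among keys, or None."""
    for key in keys:
        value = fields.get(key, "").strip()
        if value.lower() not in MISSING:
            return value
    return None


def extract_name(fields):
    """Prefer separate first/last-name fields; fall back to a single name field."""
    first = first_present(fields, FIRST_KEYS)
    last = first_present(fields, LAST_KEYS)
    if first or last:
        return " ".join(part for part in (first, last) if part)
    return first_present(fields, NAME_KEYS)


print(extract_name({"first name": "Jack", "last name": "Johnson"}))
print(extract_name({"name": "Jack Johnson"}))
print(extract_name({"person": "n/a", "owner": "n/a"}))
```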


Using the examples above, this is what I would like to return:

 

Address:

B-one

Dubai Internet City

Building 9

Dubai

500401

AE

+45.46907100

 

Email:

jnj@b-one.net

 

Nameservers:

a.b-one-dns.net

b.b-one-dns.net

 

Creation:

1992-02-12

Updated:

2008-03-18

Expiration:

2015-02-13

 

 

 

 

Address:

Disney Enterprises, Inc.

500 S. Buena Vista Street

506 Second Ave. Suite 2100

Burbank

91521

CA

UNITED STATES

+1.8186233325

 

Email:

domain.registrar@ONLINE.DISNEY.COM

 

Nameservers:

sens01.dig.com

sens02.dig.com

orns01.dig.com

orns02.dig.com

 

Creation:

1995-04-06

Updated:

2009-06-02

Expiration:

2011-04-08

 

 

Address:

com fma

Future Media Architects, Inc.

P.O. Box 71

Road Town

Tortola

99999

VG

+1.2844945870

 

Email:

dns-admin@fma.net

 

Nameservers:

NS1.US.FMA.NET

NS2.US.FMA.NET

 

Creation:

2002-01-18

Updated:

2010-04-15

Expiration:

2012-01-18
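The sources format timestamps differently (`1992-02-12 00:00:00 UTC`, `2002-01-18 02:37:00.0`, plain `1995-04-06`), while the output above only wants the date. All three examples put an ISO `YYYY-MM-DD` date first, so taking the first match of that pattern is enough for these samples. A Python sketch; sources using other date orders would need their own handling, which this does not attempt, and `date_only` is a hypothetical name.

```python
import re

# First ISO-style YYYY-MM-DD date in a string.
ISO_DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def date_only(value):
    """Return the first YYYY-MM-DD date found in value, or None."""
    m = ISO_DATE_RE.search(value)
    return m.group(0) if m else None


for raw in ("1992-02-12 00:00:00 UTC", "2002-01-18 02:37:00.0", "2011-04-08"):
    print(date_only(raw))
```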

 

I can already extract the nameservers, email addresses, and dates. It is the address and phone details I am having trouble with, as every source provides them in a different way. Any advice would be great.
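One way to tackle the address and phone problem is to stop treating each source as free text. Instead, keep a small table per source that maps that source's labels onto canonical fields, and build the display only from the canonical fields. A Python sketch, assuming each response has already been split into (label, value) pairs. The source names `"joker"` and `"dotted"` and the alias tables are hypothetical, filled in from the first two examples in this thread.

```python
# Hypothetical per-source alias tables: source label (lower-cased) -> canonical field.
# Sources that use header-led blocks instead of labels need a block parser instead.
ALIASES = {
    "joker": {
        "organization": "address", "address": "address", "city": "address",
        "postal-code": "address", "country": "address", "phone": "phone",
    },
    "dotted": {
        "organisation name": "address", "organisation address": "address",
        "admin phone": "phone",
    },
}


def canonical_contact(source, pairs):
    """Map (label, value) pairs onto {'address': [...], 'phone': [...]}."""
    table = ALIASES[source]
    result = {"address": [], "phone": []}
    for label, value in pairs:
        field = table.get(label.strip().lower())
        value = value.strip()
        # Skip unknown labels, empty values, and exact repeats.
        if field and value and value not in result[field]:
            result[field].append(value)
    return result


pairs = [
    ("organization", "B-one"), ("address", "Dubai Internet City"),
    ("address", "Building 9"), ("city", "Dubai"), ("postal-code", "500401"),
    ("country", "AE"), ("phone", "+45.46907100"), ("nserver", "a.b-one-dns.net"),
]
contact = canonical_contact("joker", pairs)
print("\n".join(contact["address"] + contact["phone"]))
```

Adding a new source then means adding one alias table rather than writing new cleanup code, and unknown labels (such as `nserver` here) are ignored automatically.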

 

Thanks

