Jump to content

Archived

This topic is now archived and is closed to further replies.

James_S

Need help with splitting an address string into multiple parts

Recommended Posts

Long time ago when i created my first database, i thought it was smart to store address, zip code, and city as one string in one DB-field.

Now i have created a new database and splitted these into seperate fields.

But i would really want to transfer the old data into the new database.

Does anyone have a suggestion how i would achieve this:

//current - note that it may contain line feeds, or commas.
$address = "Thestreet 123
3131 MYPLACE
COUNTRY";

//wanted
$address = "Thestreet 123, COUNTRY" (for the ease, i can live with the country thing still being there)
$zip = "3131"
$place = "MYPLACE"

In words:

- Search for a 4-digit number and "cut" this and save to $zip instead
- If zip is found, the next word is "place", so this should also be cut and saved to $place
(- trim the string, and replace linefeeds with commas.)

Note: The original address string varies a lot, since users entered this by them selves. So it's not always the same format, thus making it harder to solve...

If someone wants to help me, i would be really thankful.

Share this post


Link to post
Share on other sites
[quote author=redarrow link=topic=106443.msg425732#msg425732 date=1157086534]
hard code it.

blue fingers.
[/quote]

I think you misunderstand. I'm not trying to fix user input when it's being posted - then i would of course have separated these fields on the form instead.

I already have 10 000 records, that i need to convert.

So i need to create a function, so that i can loop through all records and apply it.

Share this post


Link to post
Share on other sites
sorry i diffently was wrong so sorry hope you get this sorted 10.000 no way your have no hands good luck.

Share this post


Link to post
Share on other sites
I'm not sure i want to do that since it's my customers real addresses...

But the content shouldn't matter the function works like i wrote in words:

- Search the string for a 4-digit number and "cut" this and save to $zip instead
- If the 4 digit zip was found, the next word is for sure "place", so this should also be cut and saved to $place. If zip was not found, don't do this.
(- trim the string, and replace linefeeds with commas.)

Share this post


Link to post
Share on other sites
[code]
<?php

$address = "The street 123 3131 MYPLACE COUNTRY";

preg_match('/^(.*)\s+([0-9]+)\s+([0-9]+)\s+([aA-zZ]+\s+?[aA-zZ]+)+?/i',$address, $newadd);

$zip = $newadd[3];
$place = $newadd[4];

?>[/code]

Share this post


Link to post
Share on other sites
I guess you are assuming that the formating will always be the same there. Unfortunately, it is not... Sometimes the entire field will be empty, sometimes normal address without zip/country, sometimes even more details etc.

I'm pretty sure that the only way to be sure that there is a zip code in the string, is to actually search for 4 digits.

Thanks for all reaplies though. Fast responses here  :o

Share this post


Link to post
Share on other sites
This might be a mad idear but maybe it will be best to email all the users to reenter there address send them to the page and enter into the new database colum.

copy all the ids to the new colum then use an update.


the only other way is to look at all the possablitys of sentence format then use preg_match
good luck.

Share this post


Link to post
Share on other sites
If the format and order are going to vary, it's going to be a bitch to get it working well.

This will extract only 4 digits numbers.

[code]<?php

$address = "Thestreet 123 3131 MYPLACE COUNTRY";
preg_match('/[0-9][0-9][0-9][0-9]/i',$address, $zip);

echo"$zip[0]";

?>[/code]

Share this post


Link to post
Share on other sites
ceaser is it possable to format the data output as it is in a good order then insert the output  into the new database colum.

Share this post


Link to post
Share on other sites
It will be dificult if your initial data is not in any particular order...considering both country and street names can have one or two words...street numbers can also be 4 digits.

The truth is, you should do some error checking in the forms, to make sure addresses are submitted correctly by people.

Share this post


Link to post
Share on other sites
that's why i was hoping you could do a data dump, to confirm that it probably won't be possible to make a fool proof auto conversion.  Unfortunately, you are going to have to do some manual labor in your list conversion.  Some suggestions on making it easier:

- Make an "update your information form." For your new table of information.  Make a script that upon user login, check to see if they have their info in the new table. If not, prompt the user to update their information, except this time, have the input fields in seperate fields.  This solution may not take care of all 10,000 clients, but it should at least take care of the active ones, and reduce the amount of clients you will have to manually convert. 

- Make a script to query the db and do a dump of all the rows.  Or a little at a time, or whatever, since it's a long list.  Have it display the info in some text input fields, as the field value, so that it is "auto filled." Then use a special character of your choosing, like a ~ or something, and type it into the parts that need to be seperated.  When you submit the update, you can then split the string at the ~ or whatever, and update your new table with the split data.  Or hell, if  you want to get really fancy, you can throw in some javascripting so all you have to do is point and click and it will auto insert some seperater for you, before you click on submit.  Long tedious work? Yep. But better than manually re-entering it 100%.

Probably you can do a combination of the 2 things: Setup a client update info script and let that run for a bit, until you get all your active clients updated, then do the 2nd thing after that. 

Good luck!

Share this post


Link to post
Share on other sites
I like the idea of having them update their own information.

You should also do a check when the new information is being submitted.

Example: If their street address does not match " 123 Somewhere Street (And maybe Apartment/Suite# 9/A)", then you echo out error messages and don't allow the information to be submitted. Etc...and so on. You check to make sure each field follows a certain format, with only allowable characters.

A clean database, is a happy database!!

Share this post


Link to post
Share on other sites
Of course, the new database is done properly, and form have all fields, and they are properly validated before insert.
I just want to transfer the old data to the new database.

I have made a fairly good solution now:
[code]
<?php

$address = "Thestreet 123 ,
   3131 MYPLACE,COUNTRY"; //formatting messed up on purpose

$cutLinefeeds = str_replace ( "\r\n", "", $address);
$cutCommas = str_replace ( ",", " ", $cutLinefeeds);
$cutSpaces = preg_replace('/\s\s+/', ' ', $cutCommas);

preg_match('/[0-9][0-9][0-9][0-9]/i',$cutSpaces, $zip);

$theZip = $zip[0];

if(!empty($theZip))
{

$adressArray = explode(" ", $cutSpaces);
$index = array_search($theZip, $adressArray);
$city = $adressArray[$index+1];

$streetArray = array_slice($adressArray, 0, $index);
$street = trim(implode(" ", $streetArray));
} else

$street = $cutSpaces;

echo "Street: $street<br>";
echo "Zip: $theZip<br>";
echo "City: $city<br>";

?>[/code]

Comments:
- First i'm removing linefeeds and commas, so that every part have only a space between.
- I assume that city will always come right after the zip.
- I assume that if theres anything after city, it'is garbage. Usually theres nothing, but theres something it's always(?) country. Since all my customers are from same country, I don't need it.

The most obvious bug i find now, is that if the street address also contains 4 digits, it will assume that is the zip.

Does anyone have any other suggestions/improvements?

Thanks!

Share this post


Link to post
Share on other sites
lol.. by the time you "perfect" this "one time transfer" code... it may have taken you less time to just do it manually, lol.  I mean, even if you manage to make conditions to handle all your possible exceptions, you're still going to have to double check the list manually to make sure it did it right, right?  So tell me how having one person check 10,000 records is easier than 10,000 people each checking 1 record? Or how you going from looking at 5,000 remaining non-active clients' info manually is really that much faster than doing that anyways, but adding an extra click here and there to seperate it with a ~ ?

i admire your enthusiasm, and maybe your data is not as unique as it seems; maybe there is a relatively small pattern to it, and it might really be easier to continue on this quest for the perfect one-time-transfer script.  that's why i wanted to see a snapshot of the data. so you know more about it than we do.  good luck!

Share this post


Link to post
Share on other sites
Reasons why i'm doing this:

1. Emailing customers and ask to update is not an option. Maybe 10% will do it? 95% are one-time-customers - i wouldn't bother my self if i got such email - at least not until next time i shop there. And i really don't like emailing customers without good reasons (such email is close to what i call spam: emails you delete pretty instantly). But of course - next time i will email my customers, i will ask them to ALSO double check their addresses.

2. It's not a dissaster if the columns for address, zip or city are not perfect. The fields are not related to other tables with possible values (for instance a table of all existing zips) that needs to be matched. So no matter what i do, there will always be errors in the database, for instance because of typos in address users enter.

3. It's better to have 10% fields "not perfect" (and 90% perfect), than leaving everything completely out?

4. No, i will not double check the results, because of reason #2. :)

I will do a try on the whole database afterwords, and do a quick scan over the results to see if i find any obvious faults. My guess is that at least 90% will be perfect.

I've spent approx 2 hours on this project now, and i'm pretty sure that i would spend at least 100 hours to do this manually!

Share this post


Link to post
Share on other sites

×

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.