Hi,

I\'m developing a personals site whereby users can search profiles using zip codes.

One of the most important features of this site will be the ability to search for profiles within a certain mile radius of a zip code.

e.g. I am an x looking for a y within a z mile radius of \'A\'

(where A is the searchers ZIP CODE)

I have a zip code database which consists of 42,000 rows each entry has

CITY,ZIPCODE,AREACODE,STATENAME,LATITUDE,LONGITUDE,COUNTYNAME

Rather than put a huge stress on the server by calculating distances between A and B on the fly, I thought I would create a matrix whereby each unique zip code would be cross-referenced with every other unique zip code. A distance in miles (as the crow flies) would be calculated using PHP (Long,Lat) and inserted for each unique combination of 2 zip codes.

So,

TABLE would resemble something like.

ZIP1,ZIP2,30

ZIP1,ZIP3,47

ZIP1,ZIP4,47

ZIP2,ZIP3,17

ZIP2,ZIP4,50

ZIP3,ZIP4,32

IF I have 42,000 unique zip codes,

1. How many rows will I end up with (approximately)?

2. Is this the best approach?

I envisaged that queries where:

ZIP1=\'A\' AND DISTANCE<z

would work quite well?

Any input greatly appreciated.

PS.

(I\'m in the UK, dont know much about zip codes, are there really 42,000 or do I have too many!?)

# BIG Database: input please

Started by gcheers, May 01 2003 03:21 AM

3 replies to this topic

### #1

Posted 01 May 2003 - 03:21 AM

### #2

Posted 02 May 2003 - 04:32 AM

greetings g. my math is shaky, but i\'m guessing you\'d have 881,979,000 rows :shock: does the user have to be logged in to preform the search? if so, perhaps you could store their zip\'s lattitude and longitude in a cookie or session and calulating on the fly would be more feasible? i have no clue about the amount...

### #3

Posted 05 May 2003 - 08:13 AM

Hi,

You have 42000 cities and apparently a lot of UK cities have identical zipcodes or UK has 42000 unique zipcodes !! (which wouldn\'t surprise me, no offense )

Try

select distinct ZIPCODE from TABLE_42000_row;

It will output all the unique zipcodes...

About the rest: It is probably a good idea to calculate it once and store it (remember to index it on the distance, speeds things up considerably when having 882 million rows... ) Also, note that this is only half the matrix, so you would have to select it like

select * from distances where zip1 = $ZIP or zip2 = $ZIP and distance < $distance....

But it depends a little bit on the structure of the DB, if you have 42000 unique zips, no problem - otherwise you\'ll have to define the \"center\" of a zipcode and then do the math....

Alternatively you could probably do something like

select (arccos(cos($a1)*cos($b1)*cos(LATITUDE)*cos(LONGITUDE) + cos($a1)*sin($b1)*cos(LATITUDE)*sin(LONGITUDE) + sin($a1)*sin($a2)) * 6378) as distance....

6378 is the earth radius in km, hence the units are km.

I\'m not sure if mysql supports arccos - but it would probably be too slow anyway...

Greetings,

P.

You have 42000 cities and apparently a lot of UK cities have identical zipcodes or UK has 42000 unique zipcodes !! (which wouldn\'t surprise me, no offense )

Try

select distinct ZIPCODE from TABLE_42000_row;

It will output all the unique zipcodes...

About the rest: It is probably a good idea to calculate it once and store it (remember to index it on the distance, speeds things up considerably when having 882 million rows... ) Also, note that this is only half the matrix, so you would have to select it like

select * from distances where zip1 = $ZIP or zip2 = $ZIP and distance < $distance....

But it depends a little bit on the structure of the DB, if you have 42000 unique zips, no problem - otherwise you\'ll have to define the \"center\" of a zipcode and then do the math....

Alternatively you could probably do something like

select (arccos(cos($a1)*cos($b1)*cos(LATITUDE)*cos(LONGITUDE) + cos($a1)*sin($b1)*cos(LATITUDE)*sin(LONGITUDE) + sin($a1)*sin($a2)) * 6378) as distance....

6378 is the earth radius in km, hence the units are km.

I\'m not sure if mysql supports arccos - but it would probably be too slow anyway...

Greetings,

P.

Palle Villesen, www.birc.dk
[br]Bioinformatics Research Center

### #4

Posted 05 May 2003 - 08:26 AM

Sorry.... the values in the formula are in radians by hte way...

1. convert lat, long to decimal format

2. convert to radians

3. perform the calculation...

$radians = $degrees*(pi/180); (eastern or north)

$radians = -1 * $degrees*(pi/180); (western or southern)

... this gets too heavy for mysql.... you should do the precalculations, remember to index it on zip1 and zip2 - since they are defining the subset you want to search for distances < something...

P.

1. convert lat, long to decimal format

2. convert to radians

3. perform the calculation...

$radians = $degrees*(pi/180); (eastern or north)

$radians = -1 * $degrees*(pi/180); (western or southern)

... this gets too heavy for mysql.... you should do the precalculations, remember to index it on zip1 and zip2 - since they are defining the subset you want to search for distances < something...

P.

Palle Villesen, www.birc.dk
[br]Bioinformatics Research Center

#### 0 user(s) are reading this topic

0 members, 0 guests, 0 anonymous users