Jump to content

Archived

This topic is now archived and is closed to further replies.

gcheers

BIG Database: input please

Recommended Posts

Hi,

 

I\'m developing a personals site whereby users can search profiles using zip codes.

 

One of the most important features of this site will be the ability to search for profiles within a certain mile radius of a zip code.

 

e.g. I am an x looking for a y within a z mile radius of \'A\'

(where A is the searchers ZIP CODE)

 

I have a zip code database which consists of 42,000 rows each entry has

 

CITY,ZIPCODE,AREACODE,STATENAME,LATITUDE,LONGITUDE,COUNTYNAME

 

Rather than put a huge stress on the server by calculating distances between A and B on the fly, I thought I would create a matrix whereby each unique zip code would be cross-referenced with every other unique zip code. A distance in miles (as the crow flies) would be calculated using PHP (Long,Lat) and inserted for each unique combination of 2 zip codes.

 

So,

 

TABLE would resemble something like.

 

ZIP1,ZIP2,30

ZIP1,ZIP3,47

ZIP1,ZIP4,47

ZIP2,ZIP3,17

ZIP2,ZIP4,50

ZIP3,ZIP4,32

 

IF I have 42,000 unique zip codes,

1. How many rows will I end up with (approximately)?

2. Is this the best approach?

 

I envisaged that queries where:

ZIP1=\'A\' AND DISTANCE<z

would work quite well?

 

Any input greatly appreciated.

 

PS.

(I\'m in the UK, dont know much about zip codes, are there really 42,000 or do I have too many!?)

Share this post


Link to post
Share on other sites

greetings g. my math is shaky, but i\'m guessing you\'d have 881,979,000 rows :shock: does the user have to be logged in to preform the search? if so, perhaps you could store their zip\'s lattitude and longitude in a cookie or session and calulating on the fly would be more feasible? i have no clue about the amount...

Share this post


Link to post
Share on other sites

Hi,

 

You have 42000 cities and apparently a lot of UK cities have identical zipcodes or UK has 42000 unique zipcodes !! (which wouldn\'t surprise me, no offense :) )

 

Try

 

select distinct ZIPCODE from TABLE_42000_row;

 

It will output all the unique zipcodes...

 

About the rest: It is probably a good idea to calculate it once and store it (remember to index it on the distance, speeds things up considerably when having 882 million rows... ) Also, note that this is only half the matrix, so you would have to select it like

 

select * from distances where zip1 = $ZIP or zip2 = $ZIP and distance < $distance....

 

 

But it depends a little bit on the structure of the DB, if you have 42000 unique zips, no problem - otherwise you\'ll have to define the \"center\" of a zipcode and then do the math....

 

Alternatively you could probably do something like

 

select (arccos(cos($a1)*cos($b1)*cos(LATITUDE)*cos(LONGITUDE) + cos($a1)*sin($b1)*cos(LATITUDE)*sin(LONGITUDE) + sin($a1)*sin($a2)) * 6378) as distance....

 

6378 is the earth radius in km, hence the units are km.

 

I\'m not sure if mysql supports arccos - but it would probably be too slow anyway...

 

Greetings,

 

P.

Share this post


Link to post
Share on other sites

Sorry.... the values in the formula are in radians by hte way...

 

1. convert lat, long to decimal format

2. convert to radians

3. perform the calculation...

 

$radians = $degrees*(pi/180); (eastern or north)

$radians = -1 * $degrees*(pi/180); (western or southern)

 

... this gets too heavy for mysql.... you should do the precalculations, remember to index it on zip1 and zip2 - since they are defining the subset you want to search for distances < something...

 

P.

Share this post


Link to post
Share on other sites

×

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.