Jump to content


Photo

BIG Database: input please


  • Please log in to reply
3 replies to this topic

#1 gcheers

gcheers
  • New Members
  • Pip
  • Newbie
  • 1 posts
  • LocationUK

Posted 01 May 2003 - 03:21 AM

Hi,

I\'m developing a personals site whereby users can search profiles using zip codes.

One of the most important features of this site will be the ability to search for profiles within a certain mile radius of a zip code.

e.g. I am an x looking for a y within a z mile radius of \'A\'
(where A is the searchers ZIP CODE)

I have a zip code database which consists of 42,000 rows each entry has

CITY,ZIPCODE,AREACODE,STATENAME,LATITUDE,LONGITUDE,COUNTYNAME

Rather than put a huge stress on the server by calculating distances between A and B on the fly, I thought I would create a matrix whereby each unique zip code would be cross-referenced with every other unique zip code. A distance in miles (as the crow flies) would be calculated using PHP (Long,Lat) and inserted for each unique combination of 2 zip codes.

So,

TABLE would resemble something like.

ZIP1,ZIP2,30
ZIP1,ZIP3,47
ZIP1,ZIP4,47
ZIP2,ZIP3,17
ZIP2,ZIP4,50
ZIP3,ZIP4,32

IF I have 42,000 unique zip codes,
1. How many rows will I end up with (approximately)?
2. Is this the best approach?

I envisaged that queries where:
ZIP1=\'A\' AND DISTANCE<z
would work quite well?

Any input greatly appreciated.

PS.
(I\'m in the UK, dont know much about zip codes, are there really 42,000 or do I have too many!?)

#2 effigy

effigy
  • Staff Alumni
  • Advanced Member
  • 3,600 posts
  • LocationIL

Posted 02 May 2003 - 04:32 AM

greetings g. my math is shaky, but i\'m guessing you\'d have 881,979,000 rows :shock: does the user have to be logged in to preform the search? if so, perhaps you could store their zip\'s lattitude and longitude in a cookie or session and calulating on the fly would be more feasible? i have no clue about the amount...
Regexp | Unicode Article | Letter Database
/\A(e)?((1)?ff(?:(?:ig)?y)?|f(?:ig)?)\z/

#3 pallevillesen

pallevillesen
  • Members
  • PipPipPip
  • Advanced Member
  • 135 posts
  • LocationDenmark

Posted 05 May 2003 - 08:13 AM

Hi,

You have 42000 cities and apparently a lot of UK cities have identical zipcodes or UK has 42000 unique zipcodes !! (which wouldn\'t surprise me, no offense :) )

Try

select distinct ZIPCODE from TABLE_42000_row;

It will output all the unique zipcodes...

About the rest: It is probably a good idea to calculate it once and store it (remember to index it on the distance, speeds things up considerably when having 882 million rows... ) Also, note that this is only half the matrix, so you would have to select it like

select * from distances where zip1 = $ZIP or zip2 = $ZIP and distance < $distance....


But it depends a little bit on the structure of the DB, if you have 42000 unique zips, no problem - otherwise you\'ll have to define the \"center\" of a zipcode and then do the math....

Alternatively you could probably do something like

select (arccos(cos($a1)*cos($b1)*cos(LATITUDE)*cos(LONGITUDE) + cos($a1)*sin($b1)*cos(LATITUDE)*sin(LONGITUDE) + sin($a1)*sin($a2)) * 6378) as distance....

6378 is the earth radius in km, hence the units are km.

I\'m not sure if mysql supports arccos - but it would probably be too slow anyway...

Greetings,

P.
Palle Villesen, www.birc.dk [br]Bioinformatics Research Center

#4 pallevillesen

pallevillesen
  • Members
  • PipPipPip
  • Advanced Member
  • 135 posts
  • LocationDenmark

Posted 05 May 2003 - 08:26 AM

Sorry.... the values in the formula are in radians by hte way...

1. convert lat, long to decimal format
2. convert to radians
3. perform the calculation...

$radians = $degrees*(pi/180); (eastern or north)
$radians = -1 * $degrees*(pi/180); (western or southern)

... this gets too heavy for mysql.... you should do the precalculations, remember to index it on zip1 and zip2 - since they are defining the subset you want to search for distances < something...

P.
Palle Villesen, www.birc.dk [br]Bioinformatics Research Center




0 user(s) are reading this topic

0 members, 0 guests, 0 anonymous users