Jump to content

Postcode search based on distance


speckytwat

Recommended Posts

Hi, I'm trying to find a way to allow users to search based on UK Postcode and have results pulled from a database of members all of whom have a postcode associated with them (i.e their location). For example the user types in W1 1AA and the query pulls in members in order of how far away they are from that postcode.

I've seen a number of scripts that offer a database of postcodes and calculate distances between them, but I need this to work based on comparing the postcode entered in a form, with a fairly small list of members' postcodes, and then listing the members starting with the geographically closest at the top. The solutions I've seen seem to be pretty complex and I only need this for a small-scale solution.

I already have a search script for actual postcode but it searches based on it being a literal string so ignores actual distance.

Anyone know how this can be set up? Thanks!

Link to comment
Share on other sites

Just because you see this as a "small" problem does not affect the complexity of the problem. If you have a script that does what you want then that is what you should be using. Regardless of scale. If it doesn't do what you want, then why are we discussing it?

Link to comment
Share on other sites

"If it doesn't do what you want, then why are we discussing it?" The answer is in my original post. It searches by literal string so "W1" brings up everything with W1 in it - it doesn't calculate distance and output accordingly. And so, as you pointed out, the script doesn't do what I need it to... hence my reason for discussing it. Thank you.

 

Going back to the problem in hand - getting the database is one thing, but I also need a script that will do the calculation based on an inputted postcode- so does anyone know if something suitable exists that can be used?

Link to comment
Share on other sites

I must of misunderstood the following line from your post:

 

 

I've seen a number of scripts that offer a database of postcodes and calculate distances between them,

 

What does that mean if not that it is doing what you seem to be asking for? They have a db of codes and they do a calculation. Can you not adapt that for your purposes?

Link to comment
Share on other sites

It's just math. Great Circle might not matter much to you, and if you don't need exact distances - merely want to sort by proximity - then you can calculate distance between two lat/long coordinates using basic geometric algebra:

not quite the distance = (lat 1 - lat 2) ** 2 + (long 1 - long 2) ** 2
Actual distance needs a square root there at the end, but for your purposes it won't change the ordering and so is unnecessary computation cycles that can be left out.

 

Once you have the database in your... database... construct an appropriate query that will tie the postcodes to the appropriate lat/long coordinates, then incorporate that formula into the ORDER BY.

Link to comment
Share on other sites

To get something efficient requires some form of pre-computation and indexing in a relational database. If you don't have precomputation, you will be left with a feature that requires you to:

 

Have a table with all the postalcodes associated with a longitude,latitude combination

Get a list of all the postalcodes associated with users

query against a table of precomputed distances between coordinate points

 

The computation as described by Requinix involves spherical geometry, and can be performed without issue using the haversine formula.

 

To create the cross-reference table you would do this:

 

Compute using the haversine formula, the distance between the submitted postalcode and all the other postalcodes, then returning only those that are less than or equal to the desired range.

 

This will degrade substantially as the number of users and corresponding postalcodes increase. In the US we have a zipcode system that is far less granular with something like 43000 different zipcodes in total. This number is important as I'll explain in a moment.

 

To work with a relational database, what sort of table of values would one need so you did not need to calculate distances between other postalcodes on the fly?

 

The answer as is typically implemented, is to create a table that relates postalcodes to each other with the pre-computed distance between them. That table would look something like this:

 

postalcode_id

distance

from_postalcode_id

If you are working with mysql, you would want this table to be an innodb table, with the Primary key including all 3 columns, ensuring that it is a "clustered index". This is often confusing, given that most people consider an index to be a seperate thing, but what it actually means is that the table itself is equivalent to an index. Performing queries against this table will be as optimal as possible and no additional index is required.

 

Now to calculate (for the US) a cross-reference from every zipcode to every other zipcode would require a 1.8b row table.

 

Assuming you wanted to actually all the cross references, you would actually only need to do half the calculations, as you already have the reciprocal distance for any pair of zipcodes.

 

Still 1.8b rows is not a viable table size for a small/medium server implementation. What you would want to do in order to limit the table size down to a manageable size, would be to start with a reasonable parameter constraining the maximum distance between 2 points. Let's say for example that you decide that 50 or 100 miles is the maximum distance between users you will allow in a search. At that point, your calculation script would discard any zipcodes that exceed the maximum distance. So long as your table remains in the millions to tens of millions, you will most likely have a viable feature that can be run on a small to medium footprint server.

 

The actual calculation in a php implementation can be found in many places. This one comes from the rosetta code project which provides algorithm implementation for a lot of different formulas and algorithms across various computer languages.

 

class POI {
    private $latitude;
    private $longitude;
    public function __construct($latitude, $longitude) {
        $this->latitude = deg2rad($latitude);
        $this->longitude = deg2rad($longitude);
    }
    public function getLatitude() return $this->latitude;
    public function getLongitude() return $this->longitude;
    public function getDistanceInMetersTo(POI $other) {
        $radiusOfEarth = 6371000;// Earth's radius in meters.
        $diffLatitude = $other->getLatitude() - $this->latitude;
        $diffLongitude = $other->getLongitude() - $this->longitude;
        $a = sin($diffLatitude / 2) * sin($diffLatitude / 2) +
            cos($this->latitude) * cos($other->getLatitude()) *
            sin($diffLongitude / 2) * sin($diffLongitude / 2);
        $c = 2 * asin(sqrt($a));
        $distance = $radiusOfEarth * $c;
        return $distance;
    }
}
Once your database was created, a query for all zipcodes (joined to users) within ie

 

For UK postalcodes, this really is only a viable solution if you can limit your postalcode list to the outcodes, you can utilize this strategy to create a cross reference table which would be completely viable at >= 9m rows.

 

For the full postal code, the number of combinations runs into the millions, and this strategy becomes essentially non-viable, even if you were to obtain and utilize a full postalcode database table with all long/lat combinations.

 

 

An alternative to consider that could work for you is to utilize MongoDb which has built-in features that aid greatly when dealing with geography. It has for example, a geosphere index that, assuming you loaded the complete list of UK postalcodes into a collection, would allow you to avoid all this overhead and get results on the fly.

 

You can look at this page to see a description of this feature.

 

You could for example, utilize mongo to get the list of matching zipcodes and feed that into an IN () clause in your relational query of matching users although you'd have to inject the associated distance you got from mongo to get the ordering you want.

 

Depending on where you are in the development process, mongodb might be a better backend for the application, saving you from having to deal with the complexity and overhead of multiple persistence servers.

Link to comment
Share on other sites

Archived

This topic is now archived and is closed to further replies.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.