Spatial Indexing


Release 0.7.5 included a major change to how spatial data is indexed and queried. This post is for those of you that have some database background and are interested in the details.

Cache Table
BCaching uses a shared Cache table where data loaded from pocket queries is stored. This data includes CacheID, Latitude, Longitude, Waypoint, Name, Terrain, Difficulty, Archived, and so on. Previously there was also a spatial 2D Point column named “Location” that represented the Latitude and Longitude of each Cache. There was a spatial index on Location which allowed location-based queries to quickly identify a list of caches in a requested region.

CacherCacheAccess Table
Per groundspeak rules, cachers are only allowed access to caches they have uploaded themselves. Therefore there is also a CacherCacheAccess table which holds a record for every cacher and cache they have access to. The CacherCacheAccess table is also a convenient place to store cacher-specific attributes such as a FoundIt flag, IgnoreIt flag, private corrected coordinates, and some other things.

So every Cache query, location-based or not, needs to join to CacherCacheAccess to make sure the requesting cacher has access. Most queries also include filters to exclude archived caches and possibly found or ignored caches.

The Problem
Since a spatial index cannot include any other attributes besides a spatial column, every location-based query will identify all caches in a region, THEN filter them based on the other attributes like archived, cacherid, foundit, ignoreit, etc.

If a cacher is in a region that has lots of caches he’s already found and he wants to see the nearest unfound caches, the database will have to take the time to needlessly identify possibly hundreds of caches to join to CacherCacheAccess before filtering them back out again.

It sure would be nice if we could have an index that includes those other attributes as well as the spatial data.

Enter geocells.
A geocell coordinate system divides the world up into rectangular cells that are each represented by a hexadecimal digit. Each cell is then subdivided again and is represented by the next digit. Each subsequent digit represents an increasingly granular region. See the geocell link for more nitty gritty details.

For example, Groundspeak Headquarters (GCK25B) has posted coordinates of N47°38.000 W122°20.000. That location has a corresponding geocell of “a44b8aa9f4313” at 13-digit accuracy which is a range of about 2 feet. The shorter string “a44b8a” represents a 6 mile geocell that also includes that point.

Since geocells are represented text strings, they can be indexed as a text column along with other attributes. Searching a particular region simply requires identifying a short list of geocells that contain that region, and those geocells can be included along with the other criteria.

So the spatial column and index on the Cache table is now gone and a copy of the Cache Latitude, Longitude, and Geocell is kept in the CacherCacheAccess table where it can be overridden by each cacher with alternate corrected coordinates. Finally, there is a composite index used for proximity searches which includes CacherID, FoundIt, IgnoreIt, and Geocell.

Now each proximity query can use a single index to identify a much smaller set of candidate records before applying a join and additional criteria.

Have you noticed any improvements in performance? The Map and Nearest functions have been noticeably faster for me.

It’s only been 3 days, but the data looks good so far. The average proximity search response times on the desktop map view 2 weeks prior to the release was 2.3 seconds and after was 1.1 seconds.

That may not sound like much, but when queries are taking less CPU, less memory, and fewer disk reads, that frees up resources for other requests and more users.

Advertisements

No comments yet

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: