What’s new database?

Every month bcaching grows a little bigger with more users and more data. The database has grown to over 12 GB to accommodate 1.5 million geocache records and over 32 million finder logs. One of the persistent problems has been how to keep that geocache data up to date as efficiently as possible while still running fast enough on reasonably priced hosting (currently a single medium-sized Windows-based VPS).

Over 700 GPX files are processed every day. Over the past 24 hours, 890 GPX files were processed containing 388,891 geocaches, 1,902,996 finder logs, 106,832 waypoints, and 85,612 travel bugs. It took 17 hours and 41 minutes of processing time to read and load those files. That’s 2,341 data objects per minute, and it’s too slow even after several performance improvements over the years. It also takes resources away from serving web and API requests, even though GPX processing runs at a lower priority than other work.
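For anyone who wants to check the arithmetic, here is a small Python sketch that recomputes the throughput from the figures above. The variable names are only illustrative, and the "share of the day" figure is an extra derived number that simply divides the processing time by 24 hours.

```python
# Figures quoted above for the past 24 hours of GPX processing.
counts = {
    "geocaches": 388_891,
    "finder_logs": 1_902_996,
    "waypoints": 106_832,
    "travel_bugs": 85_612,
}
processing_minutes = 17 * 60 + 41  # 17 hours 41 minutes

total_objects = sum(counts.values())
objects_per_minute = total_objects / processing_minutes

# Share of a 24-hour day spent on GPX processing (derived, not quoted above).
busy_fraction = processing_minutes / (24 * 60)

print(f"{total_objects:,} objects in {processing_minutes:,} minutes")
print(f"{objects_per_minute:,.0f} objects per minute")
print(f"{busy_fraction:.0%} of the day spent processing")
```

In other words, the loader is busy for roughly three quarters of every day, which is why it competes with web and API traffic even at low priority.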

For the past couple of months I have spent some time investigating, testing and implementing a completely new back-end database using MongoDB to replace MySQL. Mongo is fast. It sacrifices features that could slow it down and requires the application to take on responsibility for more functions, but it provides excellent performance in return.

The jump from a traditional SQL database to a schema-less database required a complete rewrite of all the data access logic, but it simplified some of the logic as well (especially the GPX file processing). Some of the data model also had to be reorganized to take best advantage of certain MongoDB features.

One of the remaining problems is how to migrate from the old database to the new one. Normally a release includes only minor database changes and can be completed in a couple of hours or less, but a full database migration would take the better part of a day and I’m not willing to shut down the site for that long. Another approach would be to synchronize the two databases while the site is live, then shut down the site only long enough for a final synchronization before switching over. That is not an option either, because there is not enough disk space to hold two full databases at the same time.

The remaining option (and current plan) is to synchronize some of the data, then switch to the new database but continue to use the old database in a temporary hybrid mode until the remaining data can be moved. The new database is now being synchronized with everything but finder logs (logs take up the largest percentage of space), and the new application can load logs from both databases and merge the results. There is a definite performance hit to loading the logs this way, but it’s not much worse than before and it will get better after the migration is complete.
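To give a rough idea of what merging logs in hybrid mode can look like, here is a minimal Python sketch. It is not the actual bcaching code: the function name, the field names (`log_id`, `found_on`, `finder`), and the rule that the copy in the new database wins when a log exists in both are all assumptions made for illustration. The two input lists stand in for the results of querying the old and new databases for a single geocache.

```python
from datetime import date


def merge_finder_logs(old_logs, new_logs):
    """Merge log records from two stores, de-duplicating by log id.

    Hypothetical helper: both arguments are lists of dicts that have
    already been loaded from the old and new databases respectively.
    """
    merged = {log["log_id"]: log for log in old_logs}
    # Assumption: if a log has already been migrated, prefer the new copy.
    merged.update({log["log_id"]: log for log in new_logs})
    # Newest logs first, which is how a cache page would usually list them.
    return sorted(merged.values(), key=lambda log: log["found_on"], reverse=True)


# Made-up sample data standing in for the two query results.
old_db_logs = [
    {"log_id": 1, "finder": "alice", "found_on": date(2011, 5, 1)},
    {"log_id": 2, "finder": "bob", "found_on": date(2011, 6, 12)},
]
new_db_logs = [
    {"log_id": 2, "finder": "bob", "found_on": date(2011, 6, 12)},
    {"log_id": 3, "finder": "carol", "found_on": date(2011, 7, 4)},
]

logs = merge_finder_logs(old_db_logs, new_db_logs)
print([log["log_id"] for log in logs])
```

The extra cost in hybrid mode comes from making two queries and merging in the application; once all logs live in the new database, the merge step goes away.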

There is still more testing to do, so I haven’t scheduled a release date, but I wanted to give everyone a heads up. There will also be at least one new feature with this release: support for Cache Attributes! If you’re interested in testing the new site, you can use your existing credentials at http://test.bcaching.com, but any uploads or logs may be overwritten by the nightly sync from the main database.

Questions or comments are welcome here or in the forums.

