Web server power failure

I don’t know all the details at the moment, but apparently there was a power failure last night around 10 PM EST. The server restarted but not cleanly and needed some manual intervention to get things running correctly again.

Alert emails were sent last night, but I did not check my email until this morning.

Sorry for any inconvenience this may have caused.

Release 0.8.1

0.8.1 is a minor release with a few enhancements.

You can now read your autogpx email messages online, and GPS management has been improved on the mobile map view and the mobile location page.

See the forum post for more details:

Happy July 4th Weekend

We’re on our way to GeoWoodstock IX. Maybe we’ll see a few of you there!

M & W

Site maintenance

Some of you may have noticed bcaching was out of commission for a lengthy period from Thursday night until Friday morning. This was in order to complete as much of the database reorganization (a.k.a. phase III) as possible.

When cache logs were migrated to the new database (MongoDB) using the same layout as before (a single list of logs keyed by cache log ID, with a secondary index on cache ID), it caused slowdowns similar to those we had on MySQL. It’s just too much data and the indexes are too large for the limited resources we have available.

The reorg (to a list of cache documents, each containing all of its related logs) had been going painfully slowly for about a week and a half when I finally decided to shut the site down for an extended period and use all the server resources to try to just get it done. It was going reasonably well (maybe 85% complete) until around 5:30 in the morning, when mongo decided it needed to allocate more space in the filesystem, even though more than enough space had already been freed up by dropping the old cache logs table.

The problem was likely due to fragmentation within the database files. A “repairdatabase” would have solved it by defragmenting and really freeing up the unused space, but like MySQL, MongoDB (1.8) requires free disk space to create a new clean copy of the database before it deletes the old one. I didn’t have the space. Luckily MongoDB has excellent backup and restore functions that allow backups to be done to and from a separate server, so that’s what I did. Incidentally, the next release (2.0) will support an in-place repairdatabase function.

So after the database was restored, I brought the system back up with partially migrated cache logs. You may encounter a few caches that have no logs, but don’t be alarmed. They will reappear over the next few days.

Update 6/19/2011: The cache logs migration is complete. Finally!

MongoDB data conversion phase II complete

The data conversion from MySQL to MongoDB is finally done. Phase I was to move everything except cache logs and was completed as part of the upgrade to 0.8. Phase II was to move the cache logs out of the old database and into the new database.

Since there is not enough disk space to support two full copies of the logs (for 32 million logs, 8-11 GB per copy depending on indexes) I planned to free up disk space out of the old database periodically. That turned out to be impossible since it was too expensive to delete logs from MySQL and freeing up space requires making a new copy of the data before the old one is deleted… did I mention I don’t have enough disk space?

Instead I migrated about a third of the cache logs (about 10 million) then dumped the rest to an off-line archive. Then I took the MySQL DB offline, freed up the space, and started loading the archived logs into MongoDB.

Now for something a little more technical…

In the MySQL database, cache logs were stored in a single table with the cache log ID as the primary key and a secondary index on cache ID + log time. That structure requires more than 2 GB of index storage. That’s a lot of index data to traverse, especially when looking up logs for many caches at once (as would be the case for the synchronization API). With limited memory resources it also means that very little of the index can be cached in memory, so MOST cache log lookups will require quite a few disk seeks.

But, since cache logs are always accessed with the cache ID available, a much more efficient structure would be to group all cache logs in a “table” organized by cache ID. For the current 1.5 million cache records, the index storage would be less than 200 MB.

Unlike MySQL, MongoDB allows for unstructured data storage, and it would be easy to do this as a table of cache documents (one for each cache ID), with each document containing an array of all the related cache logs. Cache log retrieval would require a single index lookup per cache, followed by (approximately) one disk seek to retrieve ALL of that cache’s logs. The downside to this structure is that when logs are added to a cache document, the document grows and may have to be moved (on disk) if there is not enough free space in its current location. Even so, the significantly reduced index size is a strong enough advantage, and it will free up memory resources for other indexes and data.
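To make the two layouts concrete, here is a minimal sketch in plain Python (no database involved). The field names (`log_id`, `cache_id`, `time`, `text`) and the sample values are made up for illustration and are not bcaching’s actual schema; the point is only that the grouped layout needs one index entry per cache rather than one per log.

```python
from collections import defaultdict

# Hypothetical sample logs in the flat layout; field names are illustrative only.
flat_logs = [
    {"log_id": 101, "cache_id": "GC1", "time": "2011-05-01", "text": "Found it"},
    {"log_id": 102, "cache_id": "GC2", "time": "2011-05-02", "text": "DNF"},
    {"log_id": 103, "cache_id": "GC1", "time": "2011-05-03", "text": "TFTC"},
]


def group_by_cache(logs):
    """Convert one-record-per-log into one document per cache,
    each holding an array of its logs sorted by time."""
    grouped = defaultdict(list)
    for log in logs:
        # The cache ID becomes the document key, so drop it from each entry.
        entry = {k: v for k, v in log.items() if k != "cache_id"}
        grouped[log["cache_id"]].append(entry)
    return {
        cache_id: {"_id": cache_id, "logs": sorted(entries, key=lambda e: e["time"])}
        for cache_id, entries in grouped.items()
    }


cache_docs = group_by_cache(flat_logs)

# Flat layout: one index entry per log.
flat_index_entries = len(flat_logs)
# Grouped layout: one index entry per cache.
grouped_index_entries = len(cache_docs)

# Retrieving every log for a cache is a single keyed lookup.
gc1_logs = cache_docs["GC1"]["logs"]
```

With the real numbers, that is the difference between roughly 32 million index entries and roughly 1.5 million.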

That WAS my original plan, but it was taking so long to extract the logs from MySQL by cache ID that I ended up migrating the data using the SAME structure, organized by cache LOG ID. As a result, the performance under the new database is only marginally better than before.

So now there is a “Phase III” where the old structure will be converted and migrated to the new structure.

Stay tuned.

Release 0.8

It’s been a while since there was a major bcaching release. There are two new features:

  • Add support for geocaching Cache Attributes. Note that you must set your GPX version to 1.0.1 or later for cache attributes to be included. You can set the preference on your geocaching.com account details page. Cache attributes are displayed on the mobile cache details page.
  • Add support for Metric distance units. You can set your preference on your bcaching profile page or in the mobile options page.

The most significant change is a migration of the database from MySQL to MongoDB. It may not sound like much, but it required a major rewrite of a lot of behind-the-scenes logic and some reorganization of the data model.

Due to limited server resources (disk space), it was not possible to migrate all the finder logs at once, so logs are still being retrieved from MySQL for now and will be migrated gradually over the next week or two. During that period the synchronization process with Geobeagle/Geohunter/OpenGPX may be a little slower since logs are being retrieved from both databases, but performance will be better overall once the migration is complete.

There were a lot of changes and the risk of bugs and issues is high so please be on the lookout for any problems and report them on the forums.

What’s new database?

Every month bcaching grows a little bigger with more users and more data. The database has grown to over 12 GB in order to accommodate 1.5 million geocache records and over 32 million finder logs. One of the persistent problems has been how to keep that geocache data up-to-date as efficiently as possible and working fast enough on reasonably-priced hosting services (currently a single medium-sized Windows based VPS).

Over 700 GPX files are processed every day. Over the past 24 hours, 890 GPX files were processed containing 388,891 geocaches, 1,902,996 finder logs, 106,832 waypoints, and 85,612 travel bugs. It took 17 hours and 41 minutes of processing time to read and load those files. That’s 2,341 data objects per minute and it’s too slow even after several performance improvements over the years. It also takes resources away from serving web and api requests, even though GPX processing is run at a lower priority than other work.
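For anyone who wants to check the arithmetic, here is a quick sketch using only the figures quoted above (the variable names are mine):

```python
# Figures quoted for the past 24 hours of GPX processing.
geocaches = 388_891
finder_logs = 1_902_996
waypoints = 106_832
travel_bugs = 85_612

total_objects = geocaches + finder_logs + waypoints + travel_bugs
processing_minutes = 17 * 60 + 41  # 17 hours and 41 minutes

objects_per_minute = total_objects / processing_minutes

print(total_objects, processing_minutes, int(objects_per_minute))
# prints: 2484331 1061 2341
```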

For the past couple of months I have spent some time investigating, testing and implementing a completely new back-end database using MongoDB to replace MySQL. Mongo is fast. It sacrifices features that could slow it down and requires the application to take on responsibility for more functions, but it provides excellent performance in return.

The jump from a traditional SQL database to a schema-less database required a complete rewrite of all the data access logic but it simplified some of the logic as well (especially the GPX file processing). Some of the data model also had to be reorganized to best take advantage of certain mongo features.

One of the remaining problems is how to migrate from the old database to the new one. Normally a release includes only minor database changes and can be completed in a couple hours or less but a full database migration would take the better part of a day and I’m not willing to shut down the site for that long. Another approach would be to synchronize the two databases while the site is live, then shut down the site only long enough for a final synchronization before switching over. That is not an option either because there is not enough disk space to support two full databases at the same time.

The remaining option (and current plan) is to synchronize some of the data, then switch to the new database but continue to use the old database in a temporary hybrid mode until the remaining data can be moved. The new database is now being synchronized with everything but finder logs (logs take up the largest percentage of space) and the new application can load logs from both databases and merge the results. There is a definite performance hit to loading the logs this way, but it’s not much worse than before and it will get better after the migration is complete.
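As a rough illustration of the hybrid mode, merging the logs for one cache from both databases could look something like the sketch below. This is not the actual bcaching code: the function name, the field names, the rule that the new database wins when a log appears in both, and the newest-first ordering are all assumptions made for the example.

```python
def merge_logs(new_db_logs, old_db_logs):
    """Merge one cache's logs from both databases, newest first.

    Assumption: if the same log ID appears in both lists,
    the copy from the new database is kept.
    """
    by_id = {log["log_id"]: log for log in old_db_logs}
    by_id.update({log["log_id"]: log for log in new_db_logs})
    return sorted(by_id.values(), key=lambda log: log["time"], reverse=True)


# Hypothetical sample data; "source" is only here to show which copy survived.
old_logs = [
    {"log_id": 1, "time": "2011-01-01", "source": "mysql"},
    {"log_id": 2, "time": "2011-02-01", "source": "mysql"},
]
new_logs = [
    {"log_id": 2, "time": "2011-02-01", "source": "mongo"},
    {"log_id": 3, "time": "2011-03-01", "source": "mongo"},
]

merged = merge_logs(new_logs, old_logs)
```

The extra cost comes from having to query both databases for every cache before the merge can happen.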

There is still more testing to do so I haven’t scheduled a release date but I wanted to give everyone a heads up. There will also be at least one new feature with this release: support for Cache Attributes! If you’re interested in testing the new site, you can use your existing credentials at http://test.bcaching.com but any uploads or logs may be overwritten by the nightly sync from the main database.

Questions or comments are welcome here or in the forums.

Site semi-outage

Some of you may have had problems getting to bcaching this afternoon and evening. It appears to have been related to a DDoS (Distributed Denial of Service) attack on our name-server provider.

Until now we have had two free name servers, but this evening we purchased a 3rd backup server that should keep things a little more reliable.

Forum Logins

After putting up with an endless barrage of spam user registrations I finally took the time to integrate the forum authentication / users with the main bcaching.com user database.

It is still necessary to log in to the forums separately, but you must now use your main bcaching.com login credentials. Your old forum login credentials, if they were different, will NO LONGER WORK.

In order to log in and POST on the forums, your main bcaching.com account must be fully registered, including having been synchronized with your geocaching.com account. If you’re having problems with that, try reading the 2. Getting Started section of the User Guide forum, or send an email to bcaching support for help.

Site Upgrade Update

The migration of the main bcaching database from the MySQL 5.1 engine to 5.5 was completed this evening and was fairly uneventful (a good thing).