Today Amazon announced the launch of Amazon Glacier, a low-cost archiving service designed for rarely accessed data.
As Werner Vogels described it in his blog post this morning:
“Amazon Glacier provides the same high durability guarantee as Amazon S3 but relaxes the access times to a few hours. This is the right service for customers who have archival data that requires highly reliable storage but for which immediate access is not needed.”
At first sight, Amazon Glacier looks to be a fantastic fit for archiving the raw event logs generated by the Snowplow collector (whether the CloudFront collector or alternatives such as SnowCannon). Once the nightly Snowplow ETL has been run on your raw event logs, you shouldn’t need to access those raw logs frequently. However, we would always recommend retaining them, as there may well be a reason to revisit them in the future. We never recommend throwing away atomic source data!
This is where Amazon Glacier comes in - at the proposed price of $0.01 per gigabyte per month, you could archive 2 terabytes of raw Snowplow data for around $20 a month. That is significantly cheaper than storing your raw logs in Amazon S3, which is the current Snowplow approach.
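To make the arithmetic concrete, here is a rough sketch of the monthly storage cost for 2 terabytes under each service. The Glacier rate is the announced $0.01 per gigabyte per month; the S3 rate is an illustrative assumption rather than a quoted price, so substitute the current rate for your region. The function name is hypothetical, and the calculation ignores request, retrieval and data transfer fees.

```python
# Back-of-the-envelope monthly storage cost. Prices are USD per GB per month.
GLACIER_PRICE_PER_GB = 0.01  # announced Glacier storage price
S3_PRICE_PER_GB = 0.125      # ASSUMPTION: illustrative S3 rate, check current pricing


def monthly_storage_cost(terabytes: float, price_per_gb: float) -> float:
    """Return the monthly cost in USD of storing `terabytes` of data.

    Covers storage only: request, retrieval and transfer fees are ignored.
    """
    gigabytes = terabytes * 1024
    return gigabytes * price_per_gb


if __name__ == "__main__":
    for name, price in [("Glacier", GLACIER_PRICE_PER_GB), ("S3", S3_PRICE_PER_GB)]:
        print(f"{name}: ${monthly_storage_cost(2, price):.2f} per month")
```

Whatever S3 rate you plug in, the gap scales linearly with the volume of raw logs you retain, so the saving grows as your archive does.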
Moreover, Werner has indicated that:
“In the coming months, Amazon S3 will introduce an option that will allow customers to seamlessly move data between Amazon S3 and Amazon Glacier based on data lifecycle policies.”
Once Amazon has launched this feature, we’ll get this automatic S3-to-Glacier archiving process working internally, and then release a how-to for Snowplow users so you can do the same and start archiving your raw Snowplow logs in Amazon Glacier!
Exciting times for everybody who likes storing atomic event data cheaply and safely - stay tuned!