Using an AWS Storage Gateway – Keep in mind a couple of things

Just thought I’d share a quick note about my experience using AWS Storage Gateway.

After troubleshooting multiple issues on a Windows 2012 R2 file server with volumes presented over iSCSI by Storage Gateway, both I and some AWS contacts learned a few things.

So if you’re using or looking at deploying an AWS Storage Gateway (in “Gateway-Cached” mode) for cheap storage in S3, it’s a great idea and works well; just be mindful of a few things before implementing it. One of the common use cases Amazon lists is corporate file sharing (AWS Storage Gateway Overview).

  1. Make sure you configure your AV policy to perform real-time virus scanning only (Managing Volumes for Gateway-Cached Volumes). Note: The article states “we don’t recommend using virus scanning software that scans the entire cached volume. Such a scan, whether on-demand or scheduled, will cause all data stored in Amazon S3 to be downloaded locally for scanning, which results in high bandwidth usage.”
  2. Customize the Windows iSCSI timeout to 600 seconds. This makes the Windows iSCSI initiator more tolerant of network disruptions. (The how-to can be found here: Customizing Your Windows iSCSI Settings and Customizing Your Linux iSCSI Settings)
  3. Windows Volume Shadow Copy (VSS) is not supported. VSS will work initially, but once the data you’re snapshotting via VSS grows to double the cache size you will start to see VSS failing or not working at all. This was an interesting point, as even a number of AWS professionals had expected it to work. (An AWS case was logged confirming that it isn’t supported.)
  4. Windows data de-dupe can work but needs a chunk of the server’s resources. You might get some significant drive space savings by enabling it, but as the Storage Gateway is S3 backed you’re paying peanuts for the storage anyway, so unless you have a decent business case I would leave it off.
  5. Be aware of how the Storage Gateway behaves when it is restarted and what the volume status values mean (Understanding Volume Status)
  6. Patching/Updates to Storage Gateway. Updates generally don’t cause an outage; it’s more of a service degradation. This is due to how iSCSI holds requests while the update is in progress or the Gateway is rebooting (another reason why you should increase the iSCSI timeout). See Managing Gateway Updates Using the AWS Storage Gateway Console
  7. Sizing of the cache and upload buffer. Start with what you believe are the minimum sizes, the reason being that once you add volumes to the cache or upload buffer you can’t (at this time) remove them. So if you have over-provisioned, you would need to rebuild the Storage Gateway and migrate the data in order to shrink the volumes, which is just a messy process.
  8. Limits to be aware of (AWS Storage Gateway Limits)
  9. And finally, monitoring. Keep a close eye on the gateway metrics, as performance degradation can show up in different areas. One to be aware of is the cache hit ratio of the volumes. You want to see around 80% ideally, as this seems to be the sweet spot. If you’re seeing a low value, data is having to be downloaded to the cache from S3, so you might need to add more space to your cache. (Cache usage should also be hovering at 99.9%.) Read and understand the metrics so you can make decisions about possible bottlenecks. See Monitoring Your Gateway
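
To make the monitoring tip a bit more concrete, here is a rough Python sketch of checking cache hit samples against the 80% rule of thumb mentioned above. It assumes you have already pulled the percentage values from somewhere (for example the CacheHitPercent metric that Storage Gateway publishes to CloudWatch). The function name, the threshold constant and the advice strings are made up for illustration, so treat it as a starting point rather than an official method.

```python
from statistics import mean

# Rule-of-thumb target from this post, not an official AWS figure.
HIT_RATIO_TARGET = 80.0


def assess_cache(hit_percent_samples, min_hit=HIT_RATIO_TARGET):
    """Average a list of cache-hit percentages and return (average, advice).

    hit_percent_samples: numbers between 0 and 100, e.g. datapoints
    collected for a volume over the last few hours (hypothetical input).
    """
    if not hit_percent_samples:
        # Nothing to judge; the caller should check that metrics are flowing.
        return None, "no data"

    avg = mean(hit_percent_samples)
    if avg >= min_hit:
        return avg, "ok"
    # A low hit ratio means reads are going back to S3 for the data.
    return avg, "low hit ratio: consider adding cache"


if __name__ == "__main__":
    print(assess_cache([92.0, 85.5, 88.0]))
    print(assess_cache([40.0, 55.0, 61.0]))
```

Averaging over a window rather than alerting on a single sample avoids reacting to a one-off burst of cold reads. You may want a different window or threshold for your workload.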

Hope these tips help you out!