Linking to files on AWS S3
Linking to files on S3 from RSpace
This document explains how to configure RSpace so that your users can create links to files and objects in one or more S3 buckets. This could be useful if your organisation holds large, private research datasets on S3.
This feature relies on an AWS service called AWS Storage Gateway. The storage gateway can be deployed as an EC2 instance (see https://docs.aws.amazon.com/storagegateway/latest/userguide/ec2-gateway-file.html). It acts as a bridge between Samba or NFS file system clients and S3.
This enables RSpace to link to files and objects in S3 using the same functionality used to link to regular Samba file systems.
After setting up the gateway, you can create file-shares and use their configuration details to connect RSpace to them. Each gateway can support up to 10 S3 buckets.
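As a rough planning aid, the per-gateway limit above can be turned into a small calculation. The sketch below is illustrative only: the function name `assign_buckets_to_gateways` and the bucket names are hypothetical, and the default of 10 is simply the figure quoted in this document, so check it against current AWS quotas before relying on it.

```python
from typing import List


def assign_buckets_to_gateways(
    buckets: List[str], max_per_gateway: int = 10
) -> List[List[str]]:
    """Split bucket names into groups, one group per gateway.

    max_per_gateway defaults to the limit quoted in this document
    (an assumption; confirm against your AWS account quotas).
    """
    if max_per_gateway < 1:
        raise ValueError("max_per_gateway must be at least 1")
    # Take consecutive slices of at most max_per_gateway buckets.
    return [
        buckets[i : i + max_per_gateway]
        for i in range(0, len(buckets), max_per_gateway)
    ]


if __name__ == "__main__":
    # Hypothetical bucket names, for illustration only.
    names = [f"research-data-{n}" for n in range(23)]
    for index, group in enumerate(assign_buckets_to_gateways(names), start=1):
        print(f"gateway {index}: {len(group)} bucket(s)")
```

With 23 buckets and the default limit, this yields three groups, so three gateways would be needed.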
Prerequisites
- VPC (Virtual Private Cloud) set up, along with VPC endpoints for both Storage Gateway and S3
- IAM roles: one allowing access to Storage Gateway, one for S3 with the required regions, and optionally one for the S3 bucket you wish to create the file share on.
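To make the optional bucket-level role more concrete, here is a minimal sketch of the two JSON policy documents such a role usually needs: a trust policy that lets the Storage Gateway service assume the role, and an access policy scoped to a single bucket. The helper names, the example bucket name and the exact list of S3 actions are assumptions for illustration; adjust them to your organisation's security requirements and to the current AWS documentation.

```python
import json


def gateway_trust_policy() -> dict:
    """Trust policy allowing the Storage Gateway service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "storagegateway.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def bucket_access_policy(bucket_name: str, read_only: bool = True) -> dict:
    """Access policy for one bucket (the action list is an assumed minimum)."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    object_actions = ["s3:GetObject", "s3:GetObjectVersion"]
    if not read_only:
        # Write access is only needed if users will modify files via the share.
        object_actions += [
            "s3:PutObject",
            "s3:DeleteObject",
            "s3:AbortMultipartUpload",
        ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
                "Resource": bucket_arn,
            },
            {
                # Object-level actions apply to keys inside the bucket.
                "Effect": "Allow",
                "Action": object_actions,
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


if __name__ == "__main__":
    # "example-research-data" is a hypothetical bucket name.
    print(json.dumps(gateway_trust_policy(), indent=2))
    print(json.dumps(bucket_access_policy("example-research-data"), indent=2))
```

The printed JSON can be pasted into the IAM console or supplied to whatever provisioning tool you use. A read-only policy is the default here because linking from RSpace only requires that files be readable.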
How could this be used?
There are several use-cases this could support:
- Customer already has an AWS FileGateway backed by one or more S3 buckets. In this case, RSpace could connect to file-shares that exist already.
- Customer has data in one or more S3 buckets. A FileGateway and file-shares could be created to enable RSpace to connect to these buckets.
Here is a diagram showing the various components.
What are likely problems or challenges?
Permissions and access control would depend on who has access to the S3 bucket and the file-shares, and how that relates to permissions and identities in RSpace. There are many possible scenarios.
Related documents
Storing RSpace archives on S3 describes a simpler scenario, where RSpace XML and HTML archives can be stored directly on S3.