Storing RSpace archives on S3
This page explains how to configure RSpace so that archive exports (HTML or XML) are sent to an AWS S3 bucket, rather than stored on the RSpace server.
Background
With a standard setup, when a user makes an export of their work, the generated archive is stored on the RSpace server for a short time (usually 24 or 48 hours), and then deleted. The user receives a link which allows them to download the exported content to their own computer. This works well for exports made for personal use, but is cumbersome if the purpose of the export is long-term backup, or if generated archive files are very large.
This article describes a feature that sends RSpace archive exports to an S3 bucket, rather than storing them locally. The advantages are:
- S3's unlimited storage space and low cost mean that archives can be kept for much longer - potentially forever - and the lifecycle period is controlled by the bucket's owner, not RSpace.
- The archive files can be accessed directly from S3, as well as from an RSpace-generated link. This expands the potential for downstream processes to work with RSpace archive files.
- More efficient transport of large archives. Previously, to put an archive in an S3 bucket, you had to download the export to your own computer and then upload it to S3.
- Users can access their exports exactly as they have always done, via the RSpace-generated link.
Current limitations
- All archives go to a single bucket. This feature doesn't yet support user-specified buckets.
- It's not yet possible to send a single archive file to multiple buckets at once.
Setting up S3 export
1. AWS credentials & authentication
For the RSpace server to send exports to S3, the server (specifically, the AWS S3 SDK that runs as part of RSpace) needs permission to access the S3 bucket. This can be accomplished through various AWS IAM mechanisms, such as:
- An IAM role assigned to the instance (if RSpace is running on an AWS EC2 instance)
- An IAM identity configured on the RSpace server, accessible to the account running the RSpace (Tomcat) process. AWS documentation lists options ranging from SSO scenarios to standalone access through config/credentials files. The IAM identity used for sending exports should have as restrictive a policy as possible, e.g. scoped to the specific bucket. Here is an example minimum IAM policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:CreateBucket",
        "s3:ListBucket",
        "s3:HeadBucket",
        "s3:PutObjectAcl"
      ],
      "Resource": [
        "arn:aws:s3:::MyExampleBucket",
        "arn:aws:s3:::MyExampleBucket/*"
      ]
    }
  ]
}
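For the standalone config/credentials-file option, the SDK reads credentials from the home directory of the account running the Tomcat process via its default credential provider chain. A minimal sketch, assuming a default profile and the `eu-west-2` region used later in this guide (the key values are placeholders, not real credentials):

```ini
# ~/.aws/credentials  (readable only by the account running Tomcat)
[default]
aws_access_key_id = AKIAXXXXXXXXXXXXXXXX
aws_secret_access_key = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

# ~/.aws/config
[default]
region = eu-west-2
```

With these files in place, no further RSpace-side configuration is needed for authentication itself.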
2. Configuring RSpace
Use the following deployment properties to enable and configure S3 access:
aws.s3.hasS3Access=true
aws.s3.bucketName=test-my-export-bucket
## for RSpace SaaS customers, archivePath should always be `export-archives`
aws.s3.archivePath=export-archives
aws.s3.region=eu-west-2
3. Testing
Restart RSpace, then make an XML or HTML export. Once the export completes, the archive should appear in your S3 bucket under the configured archive path.
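If the archive does not appear, you can check connectivity from the server with the AWS CLI, assuming it is installed and picks up the same credentials as the Tomcat process. The bucket and path names below match the example properties above; substitute your own values:

```shell
# Confirm which IAM identity the server authenticates as
aws sts get-caller-identity

# List the archives uploaded so far under the configured path
aws s3 ls s3://test-my-export-bucket/export-archives/
```

An AccessDenied error at either step points to the IAM policy or credential setup from step 1 rather than to RSpace configuration.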
Related documents
Linking to files on AWS S3 describes a different scenario, where RSpace users can link to files on S3 via AWS Storage Gateway.