Storing RSpace archives on S3

The following information is only applicable to an Enterprise instance of RSpace.

This page explains how archive exports (HTML or XML) can be sent to and stored in an AWS S3 bucket.

Background

Previously, when a user made an export of their work, it was stored on the RSpace server for a short time (usually 24 or 48 hours) and then deleted. The user received a link which enabled them to download the export to their own computer. This worked well for exports made for personal use, but is cumbersome if the purpose of the export is for long-term backup, or is a very large file.

This new feature enables RSpace exports to be optionally sent to an S3 bucket.

Export to S3

From RSpace 1.69.45, archive exports can be configured to be sent to an Amazon S3 bucket. The advantages are:

  • S3's unlimited storage space and low cost mean that archives can be kept for much longer - potentially forever - and the lifecycle period is controlled by the bucket's owner, not RSpace.
  • The archive files can be accessed directly from S3, as well as from an RSpace-generated link. This expands the potential for downstream processes to work with RSpace archive files.
  • More efficient transport of large archives. Previously, to put an archive in an S3 bucket, you had to download the export to your computer, then upload to S3.
  • Users can access their exports exactly as they have always done, via the RSpace-generated link.

Current limitations

  • All archives go to a single bucket. This feature doesn't yet support user-specified buckets.
  • It's not yet possible to send a single archive file to multiple buckets at once.

Setting up S3 export

1. AWS credentials

In order for the RSpace server to send exports to S3, it needs permission to access the S3 bucket. This can be accomplished by various AWS IAM mechanisms, such as:

  • An IAM role assigned to the instance (if RSpace is running on an AWS EC2 instance)
  • Access credentials for an IAM identity, which are used by RSpace. The attached policy should be as restrictive as possible, e.g. scoped to the specific bucket. Note that the object-level actions (s3:PutObject, s3:GetObject, s3:PutObjectAcl) apply to objects rather than the bucket itself, so the Resource list needs both the bucket ARN and the bucket ARN with a /* suffix. Here is an example minimum IAM policy:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "VisualEditor0",
          "Effect": "Allow",
          "Action": [
            "s3:PutObject",
            "s3:GetObject",
            "s3:CreateBucket",
            "s3:ListBucket",
            "s3:HeadBucket",
            "s3:PutObjectAcl"
          ],
          "Resource": [
            "arn:aws:s3:::MyExampleBucket",
            "arn:aws:s3:::MyExampleBucket/*"
          ]
        }
      ]
    }
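If you are using access credentials for an IAM user rather than an instance role, a policy like the one above can be attached via the AWS CLI. The user name, policy name, and file path below are illustrative assumptions, not values required by RSpace:

```shell
# Assumption: the policy JSON above has been saved as policy.json,
# and an IAM user (here called "rspace-export") already exists.
aws iam put-user-policy \
  --user-name rspace-export \
  --policy-name rspace-s3-export \
  --policy-document file://policy.json

# Sanity-check which identity the configured credentials resolve to
aws sts get-caller-identity
```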
2. Configuring RSpace

Set these properties in /etc/rspace/deployment.properties:

aws.s3.hasS3Access=true

changing the following values as appropriate:

aws.s3.bucketName=test-my-export-bucket
aws.s3.archivePath=export-archives
aws.s3.region=eu-west-2

aws.s3.archivePath is set as a bucket prefix. (For RSpace SaaS customers, this prefix should always be `export-archives`.)
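To illustrate how the prefix is used, an uploaded archive ends up at an object key of the form prefix/filename. This is a sketch using the example property values above; the archive filename is hypothetical:

```shell
# Illustrative only: composing the S3 object URI from the example
# deployment properties. The export filename is hypothetical.
BUCKET="test-my-export-bucket"
ARCHIVE_PATH="export-archives"
EXPORT_FILE="RSpace-export-user1.zip"
echo "s3://${BUCKET}/${ARCHIVE_PATH}/${EXPORT_FILE}"
```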

3. Testing

Restart RSpace, and make an XML or HTML export. Once the export completes, the archive file should appear in your S3 bucket under the configured prefix.
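You can also confirm the upload from the AWS CLI by listing the configured prefix (the bucket name, prefix, and region here match the example values above):

```shell
# List exported archives under the configured prefix
aws s3 ls s3://test-my-export-bucket/export-archives/ --region eu-west-2
```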

Linking to files on AWS S3 describes a more complex scenario, where RSpace can link to files on S3 via AWS FileGateway.

