How does RSpace store files? Are there configurable options?
There are four ways that RSpace can store and link to files. This document provides some background on how RSpace can work with new or existing data in various locations.
1. RSpace Internal file store
By default, RSpace stores all uploaded files, unaltered, on a file system outside the RSpace database. We store references to the files in the database, and text-based files (including Word, PDF, etc.) are indexed for full-text search from the ELN.
The file system and directory structure is opaque to end users; they have no direct access to it, because it is on the RSpace server. From the user's perspective, files added to RSpace reside in their own personal storage area called the Gallery. Metadata such as permissions and file ownership is stored in the database. This has several advantages compared to storing the files directly as BLOBs in the database, for example:
- The files can all be backed up, copied, audited or retrieved using standard Unix commands.
- The database remains a manageable size for backup and recovery.
This comes at the cost of some additional complexity (hidden in the internals of RSpace) to coordinate file locations and database metadata.
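The split between files on disk and references in the database can be illustrated with a minimal sketch. This is not RSpace's actual implementation: the table name `file_ref`, the digest-based directory layout, and the function names are all assumptions made for illustration.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def make_db() -> sqlite3.Connection:
    """Create an in-memory database holding only file references (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE file_ref ("
        "id INTEGER PRIMARY KEY, owner TEXT, name TEXT, rel_path TEXT, size INTEGER)"
    )
    return db


def store_file(db: sqlite3.Connection, root: Path, owner: str, name: str, data: bytes) -> int:
    """Write the bytes unaltered to disk and record a reference row in the database."""
    digest = hashlib.sha256(data).hexdigest()
    # Assumed layout: shard directories by the first two hex characters of the digest.
    target = root / digest[:2] / digest
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    cur = db.execute(
        "INSERT INTO file_ref (owner, name, rel_path, size) VALUES (?, ?, ?, ?)",
        (owner, name, str(target.relative_to(root)), len(data)),
    )
    db.commit()
    return cur.lastrowid


def load_file(db: sqlite3.Connection, root: Path, file_id: int, user: str) -> bytes:
    """Check ownership in the database, then read the bytes from the file system."""
    row = db.execute(
        "SELECT owner, rel_path FROM file_ref WHERE id = ?", (file_id,)
    ).fetchone()
    if row is None:
        raise KeyError(file_id)
    owner, rel_path = row
    if owner != user:
        raise PermissionError(f"{user} may not read file {file_id}")
    return (root / rel_path).read_bytes()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db = make_db()
        file_id = store_file(db, Path(tmp), "alice", "notes.pdf", b"example bytes")
        print(load_file(db, Path(tmp), file_id, "alice"))
```

In this sketch the database rows stay small regardless of file size, and the directory under `root` can be backed up or inspected with ordinary file-system tools.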
The internal file store works well for files that can be conveniently uploaded through the web interface or API: Office documents, PDFs, most images, etc. The maximum allowable size for files stored in RSpace is configurable on a server-by-server basis using a setting in the server properties file. For example, you can set this to something quite small (e.g. 250 KB) if you want to encourage users to keep most of their files somewhere else, or to something larger (e.g. 1 GB) if you want users to enjoy the convenience of drag-and-drop addition of larger files into RSpace. We generally don't recommend setting the maximum file size higher than about 2 GB. For files larger than that, we generally recommend external storage (see below).
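As a rough illustration of how a size limit read from a properties file might be applied, here is a small sketch. The property name `example.max.upload.size` is made up for this example and is not the real RSpace setting; check your server's configuration documentation for the actual name and accepted format.

```python
# Decimal units are assumed here (1 KB = 1000 bytes).
UNITS = {"KB": 1000, "MB": 1000**2, "GB": 1000**3}


def parse_size(text: str) -> int:
    """Convert a string such as '250 KB' or '1GB' into a number of bytes."""
    text = text.strip().upper()
    for unit, factor in UNITS.items():
        if text.endswith(unit):
            return int(float(text[: -len(unit)]) * factor)
    return int(text)  # a bare number is treated as bytes


def read_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def upload_allowed(size_bytes: int, props: dict[str, str]) -> bool:
    """Return True if a file of this size fits under the configured limit."""
    # Hypothetical property name, used for illustration only.
    return size_bytes <= parse_size(props["example.max.upload.size"])


if __name__ == "__main__":
    props = read_properties("example.max.upload.size = 1 GB")
    print(upload_allowed(500 * 1000**2, props))
```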
2. RSpace External File Systems
Researchers may already have large volumes of data on existing institutional file servers, or their files may be too large to conveniently upload through a web interface. RSpace can connect to these file systems using the SMB (Samba) or SFTP protocols.
Normally this is used read-only to make links to datasets from the ELN; in the future we plan to support file writing as well. This solution is good for linking to large files that you do not want to physically copy to RSpace. However, for SMB/SFTP file systems that users can also access from outside RSpace, it can be a challenge to deal with broken links, deleted files, etc.
One solution to the problem of broken links to external files is to make use of RSpace's ability to link to files in your University's iRODS middleware layer.
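To make the broken-link problem concrete, the sketch below checks a set of stored links against a directory tree and reports those whose targets have disappeared. It is an illustration only: a local directory stands in for an SMB or SFTP share, and the function name and the link-ID-to-path mapping are assumptions, not part of RSpace.

```python
import tempfile
from pathlib import Path


def find_broken_links(links: dict[str, str], mount_root: Path) -> list[str]:
    """Return the IDs of links whose target no longer exists under mount_root."""
    broken = []
    for link_id, rel_path in links.items():
        if not (mount_root / rel_path).exists():
            broken.append(link_id)
    return sorted(broken)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "data").mkdir()
        (root / "data" / "run1.csv").write_text("a,b\n")
        links = {"link-1": "data/run1.csv", "link-2": "data/deleted.csv"}
        print(find_broken_links(links, root))
```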
RSpace can be configured to link to files in an unlimited number of External File Systems, and these can work alongside the RSpace File Store as well.
Linking to files in your existing Amazon S3 buckets is also an option.
3. Linking to Cloud File App files (Dropbox, Google Drive, Box, OneDrive, ownCloud)
RSpace can link to files in existing popular cloud-based file stores. In this way, disparate resources held across multiple environments can be brought together with the lab's research documents in RSpace.
4. RSpace Remote File Store
We recently completed a substantial refactoring of the RSpace FileStore so that it can store files in a remote back end, using Egnyte (https://www.egnyte.com) as a proof of principle. Our plan is to enable RSpace to work with a variety of file back ends, including object stores such as S3, MinIO or Azure Blob Storage.
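The general idea of a file store that can sit on top of different back ends can be sketched with a small interface. The class and method names below (`FileBackend`, `put`, `get`) are hypothetical and do not reflect RSpace's internal API; the in-memory class simply stands in for a remote object store.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class FileBackend(ABC):
    """Minimal contract that every storage back end in this sketch must satisfy."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class LocalDiskBackend(FileBackend):
    """Stores each key as a file under a root directory."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def put(self, key: str, data: bytes) -> None:
        target = self.root / key
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


class InMemoryBackend(FileBackend):
    """Stand-in for a remote object store; keeps objects in a dictionary."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def get(self, key: str) -> bytes:
        return self.objects[key]


def round_trip(backend: FileBackend, key: str, data: bytes) -> bool:
    """Store and re-read data through any back end; True if the bytes match."""
    backend.put(key, data)
    return backend.get(key) == data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for backend in (LocalDiskBackend(Path(tmp)), InMemoryBackend()):
            print(type(backend).__name__, round_trip(backend, "gallery/notes.txt", b"hi"))
```

Code that calls only `put` and `get` does not need to know which back end is in use, which is what makes it possible to swap a local file system for a remote store.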