Xythos, the web file server application that manages Blackboard Learn content files (other than legacy storage), separates a file's metadata (information about the file such as its name, path, and permissions) from the file's data itself.
In other words, if 500 people each upload the same ½ megabyte clip-art file to 500 different courses, the expected total disk consumption is still ½ megabyte, not 250 megabytes. Xythos hashes the file's data and recognizes that it is already stored. The new copy of the data is discarded, and Xythos creates additional metadata records that point to the existing data. The flip side is that no space is recovered until every last "pointer" to the data has been removed. In the clip-art scenario above, all 500 courses must be deleted to recover the space the clip art occupies.
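The store-once behavior described above can be sketched as a small content-addressed store. This is a minimal illustration, not Xythos code: the class and field names are hypothetical, and SHA-256 is an assumed hash function, since the article does not say which hash Xythos uses.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class FileRecord:
    """Metadata for one logical file (hypothetical fields)."""
    course: str
    path: str
    digest: str  # points at the shared physical data


class DedupStore:
    """Content-addressed store: data is keyed by its hash and kept only once."""

    def __init__(self) -> None:
        self.blobs = {}    # digest -> physical data
        self.records = []  # one metadata record per upload

    def upload(self, course: str, path: str, data: bytes) -> FileRecord:
        digest = hashlib.sha256(data).hexdigest()  # assumed hash algorithm
        if digest not in self.blobs:
            # First time this content is seen: store the bytes.
            self.blobs[digest] = data
        # Whether or not the data was new, only a metadata record is added here.
        record = FileRecord(course, path, digest)
        self.records.append(record)
        return record

    def physical_bytes(self) -> int:
        return sum(len(b) for b in self.blobs.values())


store = DedupStore()
clip_art = b"\x00" * (512 * 1024)  # stand-in for a 1/2 MB clip-art file
for i in range(500):
    store.upload(f"COURSE-{i}", "/images/clipart.png", clip_art)

print(len(store.records), len(store.blobs), store.physical_bytes())  # 500 1 524288
```

The 500 uploads produce 500 metadata records but only one stored copy of the data.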
In practice, this means that most disk savings have already taken place, and it is common for surprisingly little space to be recovered even when large numbers of courses are deleted. Xythos optimized storage utilization from the very beginning: sites never incurred the storage that deduplication saved, so there are no "future savings" left on the vine to be harvested by deleting courses.
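To see why deleting many courses can free so little, the sketch below estimates the physical space released by deleting a set of courses: data is freed only when every course that refers to it is deleted. It is a simplified, hypothetical model. It treats all shared template content as a single piece of data, assumes every reference belongs to a course, ignores the deferral interval, and uses made-up course IDs and sizes.

```python
from collections import defaultdict

MiB = 1024 * 1024


def reclaimable_bytes(references, blob_sizes, deleted_courses):
    """Physical bytes eventually freed (after the deferral interval) by deleting courses.

    references: iterable of (course, digest) pairs, one per metadata record.
    blob_sizes: dict mapping digest -> size in bytes of the stored data.
    deleted_courses: set of course IDs to delete.
    """
    courses_by_digest = defaultdict(set)
    for course, digest in references:
        courses_by_digest[digest].add(course)
    freed = 0
    for digest, courses in courses_by_digest.items():
        # Data is released only when *every* course that refers to it is deleted.
        if courses <= deleted_courses:
            freed += blob_sizes[digest]
    return freed


def gui_bytes(references, blob_sizes, deleted_courses):
    """What a per-record (GUI-style) total suggests the same deletion would free."""
    return sum(blob_sizes[d] for c, d in references if c in deleted_courses)


# Hypothetical site: 500 sections copied from one template with 100 MiB of shared
# content, each section holding 2 MiB of unique student submissions.
courses = [f"ENG-101-{i:03d}" for i in range(500)]
blob_sizes = {"template-content": 100 * MiB}
refs = []
for c in courses:
    refs.append((c, "template-content"))
    blob_sizes[f"submissions-{c}"] = 2 * MiB
    refs.append((c, f"submissions-{c}"))

old = set(courses[:250])
print(gui_bytes(refs, blob_sizes, old) // MiB)          # 25500
print(reclaimable_bytes(refs, blob_sizes, old) // MiB)  # 500
```

Deleting half the sections releases only their unique submissions; the shared template content stays until the last section referring to it is gone.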
Xythos keeps a count of how many files refer to a given piece of data, which is represented as a physical back-end storage file. Once the count of references to a piece of data drops to zero, the data is marked as "eligible" for deletion rather than being deleted immediately. A subsystem called the XythosStorageThread examines the system periodically. Among its functions is to permanently delete back-end storage files that have been eligible for deletion for longer than the deferral interval. On Managed Hosting and Self-Hosting, the deferred deletion interval is 45 days. On SaaS it was initially 45 days but changed to 32 days in September 2020.
If someone uploads a new file that is byte-for-byte identical to data in the deferred-deletion state, that data is immediately taken out of this state and its reference count increases by one.
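The reference counting, deferred deletion, and "rescue" behavior described above can be modeled roughly as follows. This is a hypothetical sketch, not the XythosStorageThread implementation. The class and method names are invented, SHA-256 is an assumed hash, and only the 45-day and 32-day intervals come from this article.

```python
import hashlib
from datetime import datetime, timedelta


class DeferredDeleteStore:
    """Reference counting with deferred deletion (a hypothetical model, not Xythos code)."""

    def __init__(self, deferral: timedelta) -> None:
        self.deferral = deferral
        self.blobs = {}           # digest -> stored data
        self.refcount = {}        # digest -> number of files referring to the data
        self.eligible_since = {}  # digest -> time its count dropped to zero

    def add_reference(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:
            self.blobs[digest] = data
            self.refcount[digest] = 0
        # Identical data awaiting deletion is taken out of that state immediately.
        self.eligible_since.pop(digest, None)
        self.refcount[digest] += 1
        return digest

    def remove_reference(self, digest: str, now: datetime) -> None:
        self.refcount[digest] -= 1
        if self.refcount[digest] == 0:
            # Not deleted yet: only marked eligible, which starts the deferral clock.
            self.eligible_since[digest] = now

    def sweep(self, now: datetime) -> int:
        """Periodic cleanup pass; returns the number of bytes permanently deleted."""
        freed = 0
        for digest, since in list(self.eligible_since.items()):
            if now - since > self.deferral:
                freed += len(self.blobs.pop(digest))
                del self.refcount[digest]
                del self.eligible_since[digest]
        return freed


t0 = datetime(2020, 9, 1)
store = DeferredDeleteStore(deferral=timedelta(days=32))  # SaaS value since September 2020
d = store.add_reference(b"syllabus")
store.remove_reference(d, now=t0)                # count is 0: eligible, not deleted
print(store.sweep(now=t0 + timedelta(days=20)))  # 0, still inside the deferral
store.add_reference(b"syllabus")                 # identical upload rescues the data
print(store.sweep(now=t0 + timedelta(days=40)))  # 0, no longer eligible
store.remove_reference(d, now=t0 + timedelta(days=40))
print(store.sweep(now=t0 + timedelta(days=73)))  # 8, eligible for 33 days > 32
```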
On SaaS, files in the deferred-deletion state appear as "Deleted Files" in the SaaS Insight report.
A Practical Example
Most courses are copied from other courses, because most courses are offered more than once. For example, if ENG-101 is copied from a template in the spring, then from spring to summer, from summer to fall, from fall to intersession, and back to spring in the following year, the files that make up the main content of the course may incur no further storage use, because they are byte-for-byte identical.
New database records are created to record the paths, permissions, access, and so forth, but the physical back-end storage files holding the data, which represent the vast majority of space utilization, are not stored again and again. The only files that incur additional storage are new (never-before-seen) files, such as new materials that instructors upload, student submissions to assessments, attachments to discussion board posts, and similar items.
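The term-to-term pattern above can be simulated to show that physical storage grows only by the genuinely new files each term. This sketch rests on stated assumptions: each copy carries the main content forward but not the previous term's student submissions, SHA-256 stands in for whatever hash Xythos uses, and the course content and file sizes are hypothetical.

```python
import hashlib

KiB = 1024
MiB = 1024 * KiB


def physical_growth(course_content, new_files_by_term):
    """Physical bytes stored after each term when a course is copied term to term.

    course_content: list of bytes objects copied forward every term.
    new_files_by_term: one list per term of genuinely new files (e.g. submissions).
    Assumption: copies carry the main content forward but not student submissions.
    """
    stored = {}  # digest -> size; each unique piece of data is kept once
    totals = []
    for new_files in new_files_by_term:
        for data in list(course_content) + list(new_files):
            # setdefault leaves existing entries untouched: identical data adds nothing.
            stored.setdefault(hashlib.sha256(data).hexdigest(), len(data))
        totals.append(sum(stored.values()))
    return totals


# Hypothetical ENG-101: three 1 MiB lecture files, two 100 KiB submissions per term.
content = [f"lecture-{i}".encode().ljust(MiB, b".") for i in range(3)]
terms = [[f"{t}-student{s}".encode().ljust(100 * KiB, b".") for s in range(2)]
         for t in ("spring", "summer", "fall")]
print(physical_growth(content, terms))  # [3350528, 3555328, 3760128]
```

Each copied term re-adds 3 MiB of content records, but the stored total rises by only the 200 KiB of new submissions.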
Limitation of the GUI Report
The disk GUI report (course size report) does not take any of these effects into account. As a result, common questions such as "How large is this course?" and "How many megabytes of space does this term take up?" cannot be answered easily; course size is not a well-defined quantity in Learn. The GUI report reflects deletions immediately, and it counts data that is physically stored only once as many times as there are records referring to it. In other words, across the 500 courses in the first example, the GUI report counts the clip art as 250 megabytes.
The output of commands such as du on Self-Hosting, and the Managed-Hosting Insight report, will always diverge from the GUI disk report, because they take deduplication and the deferred-deletion interval into account. du shows the actual physical storage consumed, without these complications. There is no straightforward way to present a graph or other single indicator of the deduplication factor, so it is best to consider both reports together.
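The gap between the two views can be expressed as a ratio of the GUI-style total (every record counts the full size of its data) to the physical total (each piece of data counts once, including data still waiting out the deferral interval). This is a hypothetical sketch over illustrative inputs. It is not how the GUI report, the Insight report, or du actually compute their figures; du, for instance, reports allocated disk blocks, so real numbers differ slightly.

```python
def compare_reports(records, blob_sizes, awaiting_deletion=()):
    """Return (gui_style_bytes, physical_bytes, dedup_factor).

    records: (course, digest) pairs, one per metadata record.
    blob_sizes: digest -> size in bytes of the stored data.
    awaiting_deletion: digests with zero references that are still on disk
        because their deferral interval has not yet elapsed.
    """
    # GUI-style: every record counts the full size of the data it points to.
    logical = sum(blob_sizes[d] for _, d in records)
    # du-style: each piece of data counts once, including data awaiting deletion.
    on_disk = {d for _, d in records} | set(awaiting_deletion)
    physical = sum(blob_sizes[d] for d in on_disk)
    factor = logical / physical if physical else float("nan")
    return logical, physical, factor


HALF_MB = 512 * 1024
records = [(f"COURSE-{i}", "clipart") for i in range(500)]
print(compare_reports(records, {"clipart": HALF_MB}))  # (262144000, 524288, 500.0)
```

Data awaiting deferred deletion raises the physical figure without adding anything to the GUI figure, which is one reason the two reports never line up exactly.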
Exception - "Legacy Storage"
Legacy storage, as its name suggests, predates the acquisition of the Xythos company and the incorporation of its product into Learn. Legacy storage may contain older course messages and course message attachments in courses created prior to Learn 3300.0.0, as well as SCORM packages from prior to Learn 3500.3.0. There can also be very old course file attachments. However, a recently created course on a modern site will have little or no legacy storage. A few older content types, such as Self and Peer Assessments and Glossaries, still create files in legacy storage, but these rarely grow to a significant size.
Prior to Learn SaaS 3800.6.0, when a course was deleted, the 'Protected' (Internal) files described below were copied into the Legacy recycle bin.
Legacy SCORM packages and course message attachments are a common source of large storage consumption. Migrating course messages can reduce this utilization, but it is a fairly intensive process. Because legacy storage is not deduplicated at all and is not subject to deferred deletion, targeting courses that have a lot of legacy storage is an effective strategy for reducing overall storage consumption: each legacy file removed releases its own space rather than merely removing one reference to shared data.
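Because legacy bytes are not shared, choosing which courses to target can be as simple as ranking them by legacy usage. The sketch below greedily picks courses until a reclaim target is met. The per-course sizes, course IDs, and target are hypothetical; it also assumes each course's legacy usage is known (for example, from a storage report) and is released in full when that content is removed.

```python
def pick_legacy_targets(legacy_bytes_by_course, target_bytes):
    """Greedily choose the courses with the most legacy storage until the target is met.

    Because legacy storage is not deduplicated, a course's legacy bytes are
    assumed here to be freed in full when that content is removed.
    """
    ranked = sorted(legacy_bytes_by_course.items(), key=lambda kv: kv[1], reverse=True)
    chosen, total = [], 0
    for course, size in ranked:
        # Stop once the target is met, or when only courses without legacy data remain.
        if total >= target_bytes or size == 0:
            break
        chosen.append(course)
        total += size
    return chosen, total


GiB = 1024 ** 3
# Hypothetical per-course legacy usage.
legacy = {
    "ENG-101-SP15": 4 * GiB,
    "BIO-200-FA14": 1 * GiB,
    "MATH-110-SU16": 12 * GiB,
    "HIST-105-FA19": 0,
}
chosen, total = pick_legacy_targets(legacy, target_bytes=15 * GiB)
print(chosen, total // GiB)  # ['MATH-110-SU16', 'ENG-101-SP15'] 16
```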
What are 'Protected' (Internal) Files?
In addition to course archive and export files, several other attachment types are stored in the "/internal" folder. The disk GUI report refers to this as 'protected' storage. Internal storage consists mostly of student submissions, discussion board attachments, and modern (post-3300.0.0) course messages, along with some other miscellaneous files that are usually unique to a specific course. Starting with Learn 3500.3.0, SCORM packages are also stored here.
To reduce the consumption of internal storage, delete old archive and export files from each course, and check for practices such as 'icebreaker' assignments that involve recording a video of oneself speaking for a few minutes. Because phones now commonly record HD video, these submissions and discussion board posts can incur significant storage cost. Host such videos on YouTube, Kaltura, Panopto, or a similar service instead.