Active data archiving is a really simple solution
Data archiving solves many data issues
Active Data Archiving is the process of moving inactive data to a lower-cost, lower-performance storage tier while maintaining access to the archived data. The benefits of an active archive are immediate: key business data can be located easily in one centralised location for eDiscovery, HR or legal searches.
A business today generates vast amounts of data daily, and it needs to act upon this information. The generated data has a value, whether it is a technical drawing or an invoice. Data is normally stored on the fastest-performing storage, and 9 times out of 10 this is where it remains. Over time, new storage is bought and added to meet the constant flow of data, and IT then purchases a larger backup solution to back up that data.
After 90 days that data may never be accessed again, yet it still remains on expensive, high-performance storage. Why?
Because businesses do not have the correct data management systems in place to analyse the data and determine what to do with the information or where to store it.
All Data is not Equal
What to keep and what to delete
Performing a backup is hard enough, so why bother with archiving as well? Surely backup is good enough? Technically, any business generating large amounts of data should have data protection in place that includes storage, backup and data archiving. Sadly, this isn’t always the case: archiving is normally the last part to be considered, and yet it could save a business many thousands of pounds annually.
There are compelling articles from IDC, Gartner, Forrester and others informing us that we will produce more data during the next 5 years than at any other time in history, and that storing this information will create problems. If we are to believe these reports, two things are going to make life difficult for any IT department: backups will need to run 24×7, and the available backup window will disappear, unless we start doing something now.
Active Data Archiving
So, what’s the answer? Quite simply, we need to get smarter with data management, and this involves the creation of an “Active Data Archiving Solution”. Data archiving used to be sold as a method of replacing the file data with a “stub” that pointed to an alternate location. This worked for some, but when the storage was replaced or a system crashed, restoring the stubs caused problems.
An active archive, on the other hand, allows users to create policies based on a number of rules. A policy could state, for example, “all financial data should be retained for 7 years – then delete”, and could also include a rule such as “only archive financial data older than 12 months”. The archived data is fully indexed, so it can be searched for legal or eDiscovery purposes and returns results in seconds.
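As an illustration only, the two example rules above could be expressed as a small policy object. The sketch below is in Python; the names `ArchivePolicy`, `FINANCIAL_POLICY` and `decide` are hypothetical, and it assumes that both the 12-month and the 7-year periods are measured from the file's last-modified date and that a year is 365 days. A real archiving product will define these rules in its own way.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ArchivePolicy:
    """Hypothetical policy: when to archive a file and when to delete it."""
    category: str
    archive_after_days: int  # archive once the file is at least this old
    retain_days: int         # delete once the file is at least this old


# Assumption: 12 months = 365 days and 7 years = 7 * 365 days.
FINANCIAL_POLICY = ArchivePolicy("financial", 365, 7 * 365)


def decide(policy: ArchivePolicy, last_modified: date, today: date) -> str:
    """Return 'keep', 'archive' or 'delete' for a file covered by the policy."""
    age_days = (today - last_modified).days
    if age_days >= policy.retain_days:
        return "delete"
    if age_days >= policy.archive_after_days:
        return "archive"
    return "keep"


if __name__ == "__main__":
    today = date(2019, 12, 18)
    for modified in (date(2019, 9, 1), date(2017, 3, 31), date(2011, 3, 31)):
        print(modified, decide(FINANCIAL_POLICY, modified, today))
```

Checking the retention limit before the archive threshold means a file that has passed its 7 years is never archived a second time; it is simply flagged for deletion.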
What can you store using Active Data Archiving?
- Video - This can be searched for audio or video sequences
- Scanned documents - Perform full text searches
- Archive - over 1,000 file types including Microsoft Office, Office 365, Google Docs etc.
- Audio - Full search on words or phrases
- Translate - documents from foreign languages
- Email - Scan and identify messages and attachments
Questions to consider when using active data archiving
- Is the data legally required for compliance, legislation or governance?
- Can the data cause the business embarrassment and fines?
- Does the data have a value to the business but can’t be identified?
- Can we archive scanned documents going back decades?
- Can we ingest email PST files containing years of conversations?
- How much unstructured data do we have?
- How many copies of the same file do we have?
- On which systems and data storage platforms does the information reside?
- When was it created?
- When was it last accessed?
- What size is the file data?
- Who owns the files?
- When was it last modified?
- Is the data relevant to the business?
- How many copies do we have?
- Do the files need to be archived?
- Should the data be restricted?
- Who is generating this data?
- Is the data ours?
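Several of the questions above (file size, owner, last modified, last accessed, number of copies) can be answered by a simple scan of the file system. The following Python sketch is one possible approach, not a description of any archiving product: `inventory` and `duplicate_groups` are hypothetical helper names, duplicates are assumed to mean files with identical content (matched by SHA-256), and creation time is left out because it is not reported consistently across operating systems.

```python
import hashlib
from collections import defaultdict
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def inventory(root: Path) -> list[dict]:
    """Collect basic metadata for every regular file below root."""
    records = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        info = path.stat()
        records.append({
            "path": str(path),
            "size_bytes": info.st_size,
            "owner_uid": info.st_uid,  # numeric owner id; meaningful on POSIX only
            "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
            "accessed": datetime.fromtimestamp(info.st_atime, tz=timezone.utc),
            "sha256": sha256_of(path),
        })
    return records


def duplicate_groups(records: list[dict]) -> list[list[str]]:
    """Group paths whose content hash is identical (two or more copies)."""
    by_hash = defaultdict(list)
    for record in records:
        by_hash[record["sha256"]].append(record["path"])
    return [paths for paths in by_hash.values() if len(paths) > 1]


if __name__ == "__main__":
    files = inventory(Path("."))
    print(len(files), "files,", sum(f["size_bytes"] for f in files), "bytes")
    for group in duplicate_groups(files):
        print("duplicate copies:", group)
```

Note that access times are only as reliable as the file system reporting them; some systems are mounted with access-time updates reduced or disabled.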
Backup vs Archiving
Firstly, these are two very different technologies, although they are often described as if they were the same. It’s a bit like copper and brass: they are both metals but have completely different use cases.
Backup – Restore lost data to a point in time when the last backup was taken. The backup software may write this data in a proprietary format.
Archive – Immediately restore data that the business deems important from any point in the past. This data is normally read-only, as it may be used in court and the original file must remain unedited.
Active Data Archiving: how to store information
Active Data Archiving is going to be used for eDiscovery and GDPR requests, and archived data may be used as evidence in court. Therefore, any data residing in the archive should be in its native file format and retain all the original file information, including “created, last modified, accessed etc”.
There is a danger that if this information is compressed or deduplicated to save space, the original file format and file information could be lost when the file is recreated. Some archiving vendors do exactly this, and should you want to move your archive later, it could prove problematic and costly. Active archive data can be stored on local storage or in cloud storage.
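To make the idea of keeping files in their native format concrete, here is a minimal Python sketch of copying a file into an archive folder unchanged, verifying the copy and marking it read-only. The function name `archive_file` and the folder layout are hypothetical. Python's `shutil.copy2` attempts to preserve modification and access times, but its documentation notes that it cannot copy all metadata (for example, owner, group and ACLs are lost on POSIX), so a real archive would need to record that information separately.

```python
import hashlib
import os
import shutil
import stat
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's content."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_file(source: Path, archive_dir: Path) -> Path:
    """Copy source into archive_dir in its native format and verify the copy."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / source.name
    if target.exists():
        raise FileExistsError(f"{target} is already in the archive")

    # copy2 copies the content plus timestamps where the platform allows it.
    shutil.copy2(source, target)

    # Verify that the archived copy is byte-for-byte identical to the original.
    if sha256_of(source) != sha256_of(target):
        target.unlink()
        raise OSError(f"checksum mismatch while archiving {source}")

    # Mark the archived copy read-only (owner, group and others may only read).
    os.chmod(target, stat.S_IREAD | stat.S_IRGRP | stat.S_IROTH)
    return target
```

The source file is left in place; whether and when to remove it from primary storage is a policy decision rather than part of the copy step.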
Stakeholder Discussions
Any data can be archived, but doing so will involve various discussions with the business stakeholders to understand how long each department’s data needs to be retained. From these planning meetings, create a detailed analysis as per the example below:
Accounting – Keep records of financial information for 7 years. This could include backup copies of your accounting package, payroll, bank statements, document scans and email, together with details such as created by, modified by and sent by. This is where businesses tend to fall into a trap: they want to keep all the information relating to accounting, including weekly, monthly and yearly backups. In an archive it needs to be treated differently, as explained below:
Example: Active Data Archiving Rules
1 | Accounting Data – Month 1 | Retain for 12 months – then delete |
2 | Accounting Data – Month 2 | Retain for 11 months – then delete |
3 | Accounting Data – Month 3 | Retain for 10 months – then delete |
4 | Accounting Data – Month 12 | Retain for 1 month – then delete |
5 | Accounting Data – Year End | Retain for 7 years – then delete |
After 1 year you would have 13 archive copies of your data (12 monthly copies plus the year-end copy). Depending on how you created your archive, you could simplify it even more if your accounting data is rolled up into the following month.
1 | Accounting Data – Month 1 | Retain for 1 month – then delete |
2 | Accounting Data – Month 2 | Retain for 1 month – then delete |
3 | Accounting Data – Month 3 | Retain for 1 month – then delete |
4 | Accounting Data – Month 12 | Retain for 1 month – then delete |
5 | Accounting Data – Year End | Retain for 7 years – then delete |
After 1 year you would have 2 archive copies of your data.
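The copy counts quoted for the two sets of rules can be checked with a few lines of Python. This is a sketch under stated assumptions: each monthly archive is created at the end of its month, a retention of N months means the copy is deleted N months later, the count is taken at the end of month 12 when the year-end copy exists, and the months not listed in the first table (4 to 11) follow the same descending pattern. The function name `copies_held` is hypothetical.

```python
def copies_held(monthly_retention: list[int], at_month: int = 12,
                year_end_copies: int = 1) -> int:
    """Count archive copies still held at the end of at_month.

    monthly_retention[i] is the retention, in months, of the archive
    created at the end of month i + 1.
    """
    monthly = sum(
        1
        for created, keep in enumerate(monthly_retention, start=1)
        if created <= at_month and created + keep > at_month
    )
    return monthly + year_end_copies


# First set of rules: month 1 kept for 12 months, month 2 for 11, ... month 12 for 1.
staggered = [13 - month for month in range(1, 13)]
# Second set of rules: every monthly archive kept for 1 month only.
rolled_up = [1] * 12

print("staggered retention:", copies_held(staggered))
print("rolled-up retention:", copies_held(rolled_up))
```

Under these assumptions the first scheme holds all 12 monthly copies plus the year-end copy, while the second holds only the latest monthly copy plus the year-end copy, matching the figures of 13 and 2 given above.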
Remember, this isn’t a backup; it’s an archive of information that should only have “READ” access. By following the above, the business should have access to any financial data created over the 7 years and be able to find the transactions that occurred on any given day.
This type of approach needs to be applied to every department, including marketing, legal, manufacturing, sales, HR, R&D etc. Once you have written up the requirements for the departments, get the stakeholders to sign off on the data retention times for the archived information and review them every quarter.
Why use active data archiving?
It isn’t so much about why you need to archive; it is more about understanding the types of information you are archiving and then turning that data into something useful using “Data Analytics”. Many businesses have hundreds of terabytes of unstructured data. Unstructured data is everything that isn’t stored in a database. Because this data is random information, it isn’t easy to monitor, move or control, and yet it has a value to the business. Or does it?
Due to the sheer number of file types that make up unstructured data, it is difficult for businesses to make informed decisions on what to do with it. Any legal case requires many hours of searching through files, scans, email, voice recordings, CCTV etc. in order to put a case together ready to go to court. Lawyers charge upwards of £250 per hour to carry out these searches for specific information relating to a case.
If the business was using active data archiving, it could find all the information relevant to a case in minutes and save thousands of pounds in legal fees. This is just one example of how deploying an active archive to give structure to unstructured data can save a business huge sums of money.
Decreasing backup window
As businesses generate more data, protecting this data takes longer, and eventually the time available to perform a backup will disappear. If deploying an active data archiving solution reduces your backups by 80%, this has to be a good thing from an OPEX perspective, for example:
- Sweat assets for longer.
- Reduced licensing and running costs of backup hardware and software.
- Reduced hardware maintenance due to less wear and tear on equipment.
- Faster restore of systems and data in the event of a failure.
- Reduced backup complexity.
- Provides the business with greater data insights.
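The effect of an 80% reduction on the backup window is simple arithmetic, sketched below in Python. The figures used (100 TB of data and a sustained throughput of 500 MB/s) are purely hypothetical and chosen for illustration; the function names are also hypothetical, and real backup times depend on many other factors such as incremental backups, compression and network contention.

```python
def backup_hours(data_tb: float, throughput_mb_per_s: float) -> float:
    """Hours needed to back up data_tb terabytes at a sustained throughput."""
    megabytes = data_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return megabytes / throughput_mb_per_s / 3600


def remaining_after_archiving(data_tb: float, reduction: float) -> float:
    """Data left to back up after archiving removes a fraction of it."""
    return data_tb * (1 - reduction)


if __name__ == "__main__":
    total_tb = 100.0      # hypothetical amount of data on primary storage
    throughput = 500.0    # hypothetical sustained backup throughput in MB/s
    before = backup_hours(total_tb, throughput)
    after = backup_hours(remaining_after_archiving(total_tb, 0.8), throughput)
    print(f"full backup before archiving: {before:.1f} hours")
    print(f"full backup after archiving:  {after:.1f} hours")
```

Because backup time scales linearly with the amount of data in this simple model, removing 80% of the data from the backup set cuts the backup window by the same proportion.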
Written by: Ray Quattromini – 18/12/19