Unstructured data is the biggest headache today for any organisation trying to control and manage its data. It accounts for over 70% of all information stored and is growing at 61% per annum.
How IT departments deal with this information when they are struggling with reduced budgets and headcount remains to be seen.
Hopefully this white paper will give you greater insight into planning for and managing unstructured data.
Firstly, let us understand what we are dealing with. Unstructured data is information that is typically not stored in a database.
Unstructured Information manifests itself in two ways:
As we have explained, unstructured information consumes vast amounts of storage, but another consideration is legislation. Where this data resides is important if you need to retrieve the information for a compliance audit or lawsuit.
Structured data, by contrast, is organised and easily accessible, as in databases and large search indexes. It is also very fast to retrieve and interrogate for analysis or usage patterns.
Definition
How organisations identify this data is vitally important in establishing whether it has intrinsic value to the business or is the next lawsuit waiting to happen. Firstly, we need to identify the types of unstructured information and where they currently reside. From this we can make plans to carry out the following:
We have now identified our unstructured data, but what can we do with it? You should have a report informing you of the types of data that reside on your network. The trick is to find out the value of the data and how we can use it for business benefit. Once we understand this, we need to decide where to store it. If the data is valuable, you might need to keep it in two different locations on two different types of storage. Moving unstructured information into something resembling a structured form is a big challenge for any organisation.
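As an illustration of the kind of report described above, the following is a minimal sketch in Python, not a substitute for a proper data classification tool. It walks a directory tree and totals file count, size and idle data per file extension. The function name `inventory`, the report layout and the 60-day default for `idle_days` are assumptions made for this example. Last-access times are only approximate on file systems that do not update them on every read.

```python
import os
import time
from collections import defaultdict


def inventory(root, idle_days=60, now=None):
    """Summarise files under `root` by extension.

    Returns {extension: {"files", "bytes", "idle_bytes"}}, where
    idle_bytes counts data not accessed within `idle_days` days.
    """
    now = time.time() if now is None else now
    idle_seconds = idle_days * 86400
    report = defaultdict(lambda: {"files": 0, "bytes": 0, "idle_bytes": 0})

    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable, so skip it

            # Group by lower-cased extension; files without one share a bucket.
            ext = os.path.splitext(name)[1].lower() or "(none)"
            entry = report[ext]
            entry["files"] += 1
            entry["bytes"] += st.st_size
            if now - st.st_atime > idle_seconds:
                entry["idle_bytes"] += st.st_size

    return dict(report)


if __name__ == "__main__":
    # Print the extensions that hold the most data first.
    rows = sorted(inventory(".").items(), key=lambda kv: kv[1]["bytes"], reverse=True)
    for ext, row in rows:
        print(f"{ext:10} files={row['files']:6} bytes={row['bytes']:12} idle={row['idle_bytes']:12}")
```

Extensions with a large idle total are the natural first candidates when deciding what to move and where to store it.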
Assuming we can move the data, we have the logistical issue of deciding the following:
We all know Tier 1 storage is expensive to purchase and maintain. So why do we store inactive data on it? We now have the tools and technology to move this inactive data to more cost-effective storage tiers.
This is the storage tier that normally runs the fastest disk or SSD drives with the emphasis on outright performance. These systems have 99.999% availability and run the company databases, applications and user files. The time to access these files is typically milliseconds.
These systems normally run 7.2k rpm drives that are either SAS or SATA. They use high-density RAID arrays and performance is typically about 30% slower than Tier 1. They can provide 99.99% uptime and can have dual RAID controllers. The time to access these files is typically milliseconds.
This tier typically consists of storing data on a tape library or an optical jukebox using Blu-ray media.
The cloud is another option for storing data, although at around £0.01 per GB (£10 per TB) per month it can be expensive. The time to access these files is typically minutes or hours, depending on WAN performance and how your cloud provider has stored your archive data.
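To put the cloud figure into context, the arithmetic can be sketched as below. This is a rough illustration only: the function names are made up for this example, the price is the £0.01 per GB per month quoted above, 1 TB is taken as 1,000 GB, and retrieval, egress and request charges are ignored.

```python
def monthly_cloud_cost_gbp(capacity_tb, price_per_gb=0.01, gb_per_tb=1000):
    """Monthly storage charge in pounds for `capacity_tb` terabytes."""
    return round(capacity_tb * gb_per_tb * price_per_gb, 2)


def retention_cost_gbp(capacity_tb, months, price_per_gb=0.01):
    """Total storage charge in pounds over a retention period, at a flat price."""
    return round(monthly_cloud_cost_gbp(capacity_tb, price_per_gb) * months, 2)


if __name__ == "__main__":
    # Hypothetical archive: 100 TB kept for five years (60 months).
    print(monthly_cloud_cost_gbp(100))
    print(retention_cost_gbp(100, 60))
```

At these assumed figures a 100 TB archive costs about £1,000 per month, or £60,000 over five years, which is why the retention period matters as much as the headline price per GB.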
Depending on the type of data and the required retention period, consideration needs to be made regarding the use of appropriate storage technology.
Companies spend a huge amount of money purchasing storage and servers. Investment in these solutions is growing year on year. Recent reports indicate that by 2025 we will be purchasing twice as much storage capacity as we are today. These systems are typically retained for 3-5 years and then replaced.
Implementing a tiered data archive for unstructured information, and moving that information through the different storage tiers, frees up valuable disk space on the most expensive, highest performing storage. By moving this data we can slow down the ongoing investment in Tier 1 storage, giving a huge ROI benefit. An additional benefit of active archiving is that you may be able to utilise your existing older storage systems to archive data.
Where we store this unstructured data is an important consideration when looking at the overall IT budget and available resources in order to explore the business benefits and cost savings.
As mentioned, typically 70% of stored information has not been accessed within 60 days. Moving this data to optical, tape or even a high capacity SATA RAID array will save a considerable amount of energy.
It is a well-known fact:
1 WATT CONSUMED = 1 WATT TO COOL (3.412141633 BTU/h)
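The rule above can be turned into a quick estimate. The sketch below is illustrative only: the function names and the 500 W example array are assumptions, the conversion factor is the one quoted above, and the one-to-one cooling ratio is the rule of thumb from the text rather than a measured figure for any particular data centre.

```python
BTU_PER_HOUR_PER_WATT = 3.412141633  # conversion factor quoted in the text


def cooling_load_btu_per_hour(it_watts):
    """Heat that must be removed, in BTU/h, for a given electrical load."""
    return it_watts * BTU_PER_HOUR_PER_WATT


def annual_energy_kwh(it_watts, cooling_ratio=1.0):
    """Yearly energy in kWh for the equipment plus its cooling.

    cooling_ratio=1.0 applies the rule "1 watt consumed = 1 watt to cool".
    """
    total_watts = it_watts * (1 + cooling_ratio)
    return total_watts * 24 * 365 / 1000


if __name__ == "__main__":
    # Hypothetical array drawing 500 W continuously.
    print(cooling_load_btu_per_hour(500))
    print(annual_energy_kwh(500))
```

Under these assumptions a 500 W array accounts for 8,760 kWh a year once cooling is included, so every array that can be powered down or replaced by tape or optical removes twice its own draw from the energy bill.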
Many RAID storage arrays provide a function called MAID (Massive Array of Idle Disks). In effect, the array powers down the volumes that are not being used or accessed, thus saving energy. The largest shipping MAID system we supply is a 4U rack mount model containing 60 or 64 drives depending on host interface.
Clearly the growth of unstructured information is the number one problem facing IT managers today. The issue is how to control and manage this growth in an organised manner that provides a long term business benefit.
Storage arrays are offering higher performance and greater disk densities with 20+ TB drives shipping this year, LTO-9 tape storing 18TB native and Blu-ray now at 125GB.
Organisations cannot afford to be offline for prolonged periods of time. Scheduling downtime can take weeks, and if it doesn’t go according to plan it can be a huge headache come Monday morning. The systems we supply are non-disruptive, so users are unaware that data has been moved to a more efficient platform for storing unstructured data, and the business benefits from better utilisation of all available storage resources.
The cost-effective solutions we supply can be tailored according to your firm’s requirements and budget constraints. We believe the solutions we have from our vendors will provide the following:
We are able to offer comprehensive professional services to understand your current infrastructure and assist with any requirements you have.