What Is Storage Architecture? | May 20th, 2020

The storage architecture of your system is critical to transferring data and accessing vital information. It provides the foundation for data access across an enterprise. Depending on your operations and the needs of your business, specific storage architectures might be necessary to enable employees to work to their fullest potential.

So what is IT storage architecture and how does it play into the everyday tasks you need to get done? To help you understand storage optimization, we’ve outlined the details of storage architecture and what you need to know to make informed decisions about the design and maintenance of one of the most critical components of your enterprise.

What Is Network Storage Architecture?

Network storage architecture refers to the physical and conceptual organization of a network that enables data transfer between storage devices and servers. It provides the backend for most enterprise-level operations and allows users to get what they need.

The setup of a storage architecture can dictate what aspects get prioritized, such as cost, speed, scalability or security. Since different businesses have different needs, what goes into IT storage architecture can be a big factor in the success and ease-of-use of everyday operations.

The two primary types of storage systems offer similar functions but vary widely in execution. These two types are network-attached storage (NAS) and the storage area network (SAN).

1. Network-Attached Storage (NAS)

A NAS system connects a computer with a network to deliver file-based data to other devices. The files are usually held on several storage drives arranged in a redundant array of independent disks (RAID), which helps to improve performance and data security. This user-friendly approach appears as a network-mounted volume. Security, administration and access are relatively easy to control.

NAS is popular for smaller operations, as it allows for local and remote filesharing, data redundancy, around-the-clock access and easy upgrading. Plus, it isn’t very expensive and is quite flexible. The downside to NAS is that server upgrades may be necessary to keep up with growing demand. It can also struggle with latency for large files. For small file sizes, it wouldn’t likely be noticeable, but if you work with large files like videos, this latency can interrupt many processes and significantly slow you down.

2. Storage Area Network (SAN)

SAN creates a storage system that works with consolidated, block data. It bypasses many of the restrictions caused by TCP/IP protocols and congestion on the local area network, giving it higher access speed than a NAS system. Part of the reason for this improvement in speed involves the way data is served. NAS uses Ethernet to access files, whereas a SAN typically serves blocks over a dedicated, high-speed Fibre Channel network, allowing for fast access. SAN storage appears to servers like locally attached hard drives.

Due to its complexity, SAN is often reserved for big businesses that have the capital and the IT department to manage it. For businesses with high-demand files like video, the low latency and high speeds of SAN are a significant benefit. It also distributes and prioritizes bandwidth fairly throughout the network, which is great for businesses with high-speed traffic like e-commerce websites. Other bonuses of SAN include expandability and block-level access to files. The biggest downsides to SAN are its cost and the challenges of upkeep, which is why it is typically used by large corporations.

Configurations

Within these storage systems, you can find a wide variety of setups. Different structures can influence the performance of any given storage system. The components of these setups include:

The front end interface: Usually connected to the access layer of the server infrastructure, this interface is what allows users to interact with the data.
Master nodes: A master node is the one that communicates with the compute nodes using information from outside the system. It manages the compute nodes and takes care of monitoring resources and node states. Often, these are housed in a more powerful server than the compute nodes.
Compute nodes: A compute node helps to run a wide variety of operations like calculations, file manipulation and rendering.
A consistent file system: With a parallel file system shared across the server cluster, compute nodes can access file types easily and offer better performance.
A high-speed fabric: Creating communication between nodes requires a fabric that offers low latency and high bandwidth. Gigabit Ethernet and InfiniBand technologies are the primary options.

Below are some of the styles of architecture you may find.

1. Multi-Tiered Model

With a multi-tiered data center, HTTP-based applications make good use of separate tiers for web, application and database servers. It allows for distinct separation between the tiers, which improves security and redundancy. Security-wise, if one tier is compromised, the others are generally safe with the help of firewalls between them. As for redundancy, if one server goes down or needs maintenance, other servers in the same tier can keep things moving.

2. Clustered Architecture

In a clustered system, each piece of data stays behind a single compute node, and the nodes don’t share memory between them. The input-output (I/O) path is short and direct, and the system’s interconnect has exceedingly low latency. This simple approach is actually the one that touts the most features because of how easy it is to add on data services.

One approach to the clustered architecture model is to layer “federation models” on top of it to scale it out somewhat. This bounces the I/O around until it reaches the node that contains the data. These federated layers require additional code to redirect data, which adds latency to the entire process.
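The cost of federation can be sketched in a few lines. The example below is a minimal model, not a description of any real product: it assumes the nodes form a simple ring, that a request is redirected one node at a time until it reaches the owner, and that the latency figures (`local_io_us`, `redirect_us`) are hypothetical placeholders.

```python
def route_io(block_id, entry_node, owner_by_block, node_count,
             local_io_us=200.0, redirect_us=50.0):
    """Forward a request around a ring of nodes until it reaches the owner.

    Returns (hops, total_latency_us). All latency figures are hypothetical.
    """
    owner = owner_by_block[block_id]
    if not 0 <= owner < node_count or not 0 <= entry_node < node_count:
        raise ValueError("node index out of range")
    node = entry_node
    hops = 0
    while node != owner:
        node = (node + 1) % node_count  # redirect to the next node in the ring
        hops += 1
    # Every redirect adds a fixed cost on top of the local I/O itself.
    return hops, local_io_us + hops * redirect_us


# Hypothetical placement: which node holds which block.
owners = {"block-a": 0, "block-b": 2}
print(route_io("block-a", 0, owners, node_count=4))  # (0, 200.0): served locally
print(route_io("block-b", 0, owners, node_count=4))  # (2, 300.0): two redirects
```

A request that lands on the owning node pays only for the local I/O, while every redirect adds to the total, which is the latency cost that the federated layers introduce.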

3. Tightly-Coupled Architectures

These architectures distribute data between multiple nodes, running in parallel, and use a grid of multiple high-availability controllers. They have a significant amount of inter-node communication and work with several types of operations, but the master node organizes input processing. These systems were originally designed to make I/O paths symmetric throughout the nodes and limit how much drive failure can unbalance I/O operations.

With a more complex design, a tightly-coupled architecture requires much more code. This aspect limits the availability of data services, making them rarer in the core code stack. However, the more tightly coupled a storage architecture is, the better it can predictably provide low latency. The trade-off is that while tight coupling improves performance, it makes it harder to add nodes and scale up, since each addition brings more complexity to the entire system and opens you up to bugs.

4. Loosely Coupled Architectures

This type of system does not share memory between nodes. The data is distributed among them with a significant amount of inter-node communication on writes, which can make it expensive to run when you look at cycles. The data transmitted is transactional. Sometimes, the added latency gets hidden by write locations that are themselves low-latency, like SSDs or NVRAM, but there is still going to be more movement in a loosely-coupled architecture, creating extra I/Os.

Similar to the clustered architecture, this one can also follow a “federation” pattern and scale out. Usually, it entails grouping nodes into subgroups with special nodes called mappers.

This architecture is relatively simple to use and good for distributed reads where data can be in multiple places. Since the data is in more than one spot, multiple nodes can hold it and speed up access. This factor makes this architecture particularly suited for server and storage software as well as hyper-convergence on transactional workloads.

Just as the nodes don’t share memory, they also don’t share code. Each node’s code runs independently of the others. This design has a few effects. If the data is heavily distributed on writes, you’ll see higher latency and less efficiency in I/O operations per second (IOPS). If you have less distribution, you might get lower latency, but you won’t see as much parallelism on reading as you would otherwise. Finally, the loosely coupled architecture can offer all three options (low write latency, high parallelism and high scaling) if the data is sub-stratified and you don’t write a large number of copies.

5. Distributed Architectures

While it may look similar to a loosely coupled architecture, this approach works with non-transactional data. It does not share memory between the nodes, and data is distributed across them. The data gets chunked up on one node and occasionally distributed as a measure of security. This type of system uses object and non-POSIX filesystems.

This type of architecture is less common than many others but is used by extremely large enterprises, as it works easily with petabytes of storage. Its parallel processing model and speed make it a great fit for search engines. It is incredibly scalable due to its chunking methods and its independence from transactional data. Due to its simplicity, a distributed, non-shared architecture is usually software-only and lacks any dependency on hardware.

What Are the Elements of Storage Architecture?

Designing a storage architecture is often a balance of different features. Improve one aspect, and you may worsen another. You’ll have to identify what features are most critical for your type of work and how you can most effectively get the most out of them. You’ll also need to balance the cost and the needs of the organization. Here are some of the most prevalent aspects of developing storage architecture.

1. Data Pattern

Depending on the type of work you do, you may have a random or sequential pattern of I/O requests. Which type of pattern you work with most will affect the way that the components of the disk physically reach the area that contains the data.

Random: In a random pattern, the data is written and read at various locations on the disk platter, which can influence the effectiveness of a RAID system. The controller cache uses patterns to predict the data blocks it will need to access next for reading or writing. If the data is random, there is no pattern for it to work from. Another issue with a random pattern is the increase in seek time. With data spread out across data blocks, the disk head needs to move each time a piece of information is requested. The arm and disk head physically have to move there, which can add to the seek time and impact performance.
Sequential: The sequential pattern works, as you would imagine, in an ordered fashion. It is more structured and provides predictable data access. With this kind of layout, the RAID controller can more accurately guess which data blocks will need to be accessed next and cache that information. It boosts performance and keeps the arm from moving around as much. These sequential applications are usually built with throughput in mind. You’ll see sequential patterns with large filetypes, like video and backups, where they are written to the drive in continuous blocks.
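The difference in head movement between the two patterns can be illustrated with a rough simulation. This is only a sketch: it assumes block addresses map linearly to positions on the platter (real drives are more complicated), and the address ranges are made-up values chosen for illustration.

```python
import random


def total_head_travel(block_addresses, start=0):
    """Sum the distance the head moves to service requests in order.

    Assumes a simplified model where the address equals the head position.
    """
    position = start
    travel = 0
    for address in block_addresses:
        travel += abs(address - position)
        position = address
    return travel


# 100 requests for consecutive blocks, as with a video or backup stream.
sequential = list(range(1000, 1100))

# 100 requests scattered across a hypothetical 100,000-block disk.
rng = random.Random(42)  # fixed seed so the run is repeatable
scattered = [rng.randrange(0, 100_000) for _ in range(100)]

print("sequential travel:", total_head_travel(sequential))
print("random travel:    ", total_head_travel(scattered))
```

In the sequential case the head makes one long move to the first block and then advances a single block at a time, while the scattered requests force a long move for almost every request.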

In random workloads, the performance of the disk has to do with the spin speed and time it takes to access the data. As the disk spins faster, it offers more IOPS. In sequential operations, all three major disk types (SATA, SAS and SSD) offer similar performance levels. In general, though, sequential patterns often occur with large or streaming media files, which are best suited to SATA drives. Random patterns happen with small files or inconsistent storage requests, like those on virtual desktops. SAS and SSD are usually the best options for random patterns.

As far as spinning speeds and access times go, here’s how the drives compare.

SATA: SATA drives have relatively large disk platters that can struggle with random workloads due to their slow speed. The large platter size can cause longer seek times.
SAS: These drives have smaller platters with faster speeds. They can cut the seek time down significantly.
SSD: An SSD is excellent for extremely high-performance workloads. It has no moving parts, so seek times are almost nonexistent.
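A common back-of-the-envelope way to relate spin speed and seek time to random IOPS for a spinning disk is sketched below. The average rotational latency is half a revolution, and the IOPS estimate is the inverse of seek time plus rotational latency. The RPM and seek figures in the example calls are assumed, typical-looking values and do not come from any specific drive.

```python
def rotational_latency_ms(rpm):
    """Average rotational latency: the time for half a revolution."""
    return 0.5 * 60_000.0 / rpm


def estimated_iops(rpm, avg_seek_ms):
    """Rough random IOPS for one spinning disk (ignores transfer time and caching)."""
    return 1000.0 / (avg_seek_ms + rotational_latency_ms(rpm))


# Assumed figures: a 7,200 RPM drive with a slower seek
# versus a 15,000 RPM drive with a faster seek.
print(round(estimated_iops(rpm=7_200, avg_seek_ms=8.5)))   # 79
print(round(estimated_iops(rpm=15_000, avg_seek_ms=3.5)))  # 182
```

The estimate leaves out transfer time, queuing and controller cache, so it is best read as a way to compare drives rather than to predict measured performance.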

2. Layers

In data center storage architecture, you’ll typically see several layers of hardware that serve separate functions. These layers typically include the:

Core layer: This first layer creates the high-speed packet switching necessary for data transfer. It connects to many aggregation modules and uses a redundant design.
Aggregation layer: The aggregation layer is the place where traffic flows through and encounters services like a firewall, network analysis, intrusion detection and more.
Access layer: This layer is where the servers and network physically link up. It involves switches, cabling and adapters to get everything connected and allow users to access the data.

3. Performance vs. Capacity

Disk drive capabilities are always changing. Just think about how expensive a 1 terabyte (TB) hard drive was only five years ago, and how the first 1 megabyte (MB) hard drive cost $1 million. Disk capacity used to be so low that SAN systems didn’t have to worry about the number of disks not creating enough IOPS per gigabyte (GB), because they had plenty. Nowadays, a SATA array can offer the same capacity as a SAS array while using significantly fewer disks. Fewer disks reduce the number of IOPS generated per GB.
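The effect of disk count on IOPS per GB can be worked through with simple arithmetic. The sketch below uses hypothetical drive figures (a 4,000 GB drive at 80 IOPS and a 600 GB drive at 180 IOPS) purely to illustrate the ratio.

```python
import math


def iops_density(target_capacity_gb, disk_capacity_gb, disk_iops):
    """Return (disk_count, iops_per_gb) for an array built to a capacity target."""
    disks = math.ceil(target_capacity_gb / disk_capacity_gb)
    total_iops = disks * disk_iops
    total_capacity_gb = disks * disk_capacity_gb
    return disks, total_iops / total_capacity_gb


# Hypothetical drives used to reach the same 12,000 GB target.
print(iops_density(12_000, disk_capacity_gb=4_000, disk_iops=80))   # 3 large disks
print(iops_density(12_000, disk_capacity_gb=600, disk_iops=180))    # 20 small disks
```

With these assumed numbers, the three large disks deliver about 0.02 IOPS per GB, while the twenty smaller disks deliver about 0.3 IOPS per GB for the same capacity.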

If your work involves a lot of random I/O interactions or extreme demand, using SATA disks can quickly cause your IOPS to bottleneck before you reach capacity. One option here is to front the disks with a solid-state cache, which can greatly improve random I/O performance.
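One simple way to reason about a solid-state cache is a weighted average of the cache and disk latencies. The sketch below assumes a fixed hit ratio and made-up latency figures. Real caches behave less uniformly, so treat this as an approximation.

```python
def effective_latency_ms(hit_ratio, cache_ms, disk_ms):
    """Average latency when a fraction of requests is served from cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * disk_ms


# Hypothetical figures: 0.1 ms for the cache, 12 ms for a random disk read.
for ratio in (0.0, 0.5, 0.8, 0.95):
    print(ratio, round(effective_latency_ms(ratio, cache_ms=0.1, disk_ms=12.0), 2))
```

The higher the share of requests the cache absorbs, the closer the average moves toward the cache's own latency.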

4. RAID Considerations

If using a RAID system, you’ll have one more factor to think about: the parity penalty. This term refers to the performance cost of protecting data with RAID and only affects writes. If your work is write-sensitive, the parity penalty may affect you more, since RAID has to perform extra disk operations for every write. Different levels of RAID protection can also affect the level of overhead.

Determining the level of overhead is a complex calculation, one that you can figure out with some information about your prospective system.
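A widely used rule of thumb for estimating this overhead is sketched below. It splits the raw IOPS of the array into reads and writes and divides the write share by a per-level write penalty. The penalty values are the commonly cited ones (2 for RAID 1+0, 4 for RAID 5, 6 for RAID 6), and the disk count, per-disk IOPS and read/write mix are assumptions for the example.

```python
# Commonly cited back-end I/Os per front-end write for each RAID level.
WRITE_PENALTY = {"RAID0": 1, "RAID10": 2, "RAID5": 4, "RAID6": 6}


def functional_iops(disk_count, disk_iops, read_fraction, raid_level):
    """Estimate host-visible IOPS after the RAID write penalty is applied."""
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    raw_iops = disk_count * disk_iops
    write_fraction = 1.0 - read_fraction
    penalty = WRITE_PENALTY[raid_level]
    return raw_iops * read_fraction + raw_iops * write_fraction / penalty


# Assumed array: 8 disks at 150 IOPS each with a 70% read workload.
for level in ("RAID10", "RAID5", "RAID6"):
    print(level, round(functional_iops(8, 150, 0.7, level)))
```

With these assumptions the same eight disks yield roughly 1,020 IOPS under RAID 1+0, 930 under RAID 5 and 900 under RAID 6, and the gap widens as the write share grows.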

Remember that some drive types can benefit from different configurations. An SSD, for instance, can have a RAID1+0 configuration for better performance, while a SATA drive with a RAID6 configuration offers extra security during rebuilds and high capacity.
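The capacity side of that choice is easy to compute. The sketch below covers only the two levels mentioned above and assumes identical disks: RAID 1+0 mirrors every disk, so half the raw capacity is usable, while RAID 6 gives up two disks' worth of space to parity.

```python
def usable_capacity_tb(raid_level, disk_count, disk_tb):
    """Usable capacity for RAID 1+0 or RAID 6 built from identical disks."""
    if raid_level == "RAID10":
        if disk_count < 4 or disk_count % 2 != 0:
            raise ValueError("RAID 1+0 needs an even number of disks, at least 4")
        return (disk_count // 2) * disk_tb
    if raid_level == "RAID6":
        if disk_count < 4:
            raise ValueError("RAID 6 needs at least 4 disks")
        return (disk_count - 2) * disk_tb
    raise ValueError(f"unsupported RAID level: {raid_level}")


# Example: eight 4 TB disks (hypothetical sizes).
print(usable_capacity_tb("RAID10", 8, 4))  # 16
print(usable_capacity_tb("RAID6", 8, 4))   # 24
```

The RAID 6 layout keeps more usable space from the same disks, which fits the high-capacity role described for SATA drives.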

How Is Storage Architecture Designed?

Designing storage architecture asks us to look closely at the requirements set forth by the business and the environment. It probably goes without saying, but meetings and discussions will help determine your needs. You’ll also want to enlist professional services to help with the specifics and building the architecture itself.

Once you determine what your data pattern looks like, you can start to review aspects like:

Capacity needs
Throughput
IOPS
Additional functions, like replication or snapshots

If you can’t get data on these aspects, looking closely at your operating system and applications can get you started. If you find yourself with a random data pattern, try to balance capacity with IOPS requirements. For sequential workloads, prioritize capacity and throughput. Your MB per second (MB/s) ratings for sequential data will usually exceed requirements.
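Balancing capacity against IOPS can be sketched as a sizing calculation: work out how many disks each requirement needs on its own, then take the larger number. The drive figures and workload requirements below are hypothetical, and the calculation ignores RAID overhead and caching.

```python
import math


def size_disk_count(capacity_gb, required_iops, disk_capacity_gb, disk_iops):
    """Return (disk_count, binding_constraint) for a simple sizing estimate."""
    for_capacity = math.ceil(capacity_gb / disk_capacity_gb)
    for_iops = math.ceil(required_iops / disk_iops)
    if for_iops > for_capacity:
        return for_iops, "iops"
    return for_capacity, "capacity"


# Random workload (assumed): modest capacity, high IOPS, large slow disks.
print(size_disk_count(10_000, 5_000, disk_capacity_gb=4_000, disk_iops=80))
# Sequential workload (assumed): large capacity, few IOPS.
print(size_disk_count(100_000, 500, disk_capacity_gb=4_000, disk_iops=80))
```

In the first case the IOPS requirement dictates the disk count long before capacity does (63 disks instead of 3), while in the second case capacity is the binding constraint.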

Tips for Designing a Storage Architecture

Of course, we can’t put everything you need to know about storage architecture in one article, but here are a few more of our tips to help you create the ideal storage structure without too much of a headache.

Evaluate cost from the outset: Keeping cost in mind as you design from the ground up allows you to make realistic decisions that will work in the long term. You wouldn’t want to end up with an architecture that needs to be reorganized right away because upkeep is too expensive or it doesn’t meet the company’s needs. Be realistic about the costs of a storage architecture so it fits within the business budget.
Find areas where you can compromise: You won’t be able to prioritize everything. In many instances, focusing on one aspect will hurt the quality of another. A high-performance system will be costly and could be less scalable. A scalable system might require more skilled administration and could lose speed. Talk with stakeholders about what aspects are necessary for the system and why so you can evaluate possible trade-offs with business needs in mind.
Work in phases: Your first draft is not going to be the same as the final. As you work through the project, you will encounter specific challenges and learn more about the technical details of your system. Try not to lock yourself into a plan and allow the architecture to change organically as you uncover more information.
Identify your needs first: While it may be tempting to dive right into the specific components that you want to use, identifying more abstract requirements is an excellent way to start. Think about the state of your data, what formats you’ll be working with and how you want it to communicate with the server. Try to develop as much information about the required tasks as you can. This approach allows you to work your way down the chain and find solutions that match the needs of more than one operation.

Work With an IT Expert

As you’ve probably gathered, an enterprise’s storage architecture is a complicated piece of technology. And it’s too foundational to try to piece together if you don’t know what you’re doing. That’s where IT experts come in.

Here at Worldwide Services, we know data, and we know businesses. Our team of professionals can design storage architecture from the ground up with your company’s needs as their top priority. Whether you need a system that focuses on speed, scalability or something else, we can help. We can also provide maintenance for an existing storage architecture. To learn more about our services, reach out to us today.
