FCOE HANDBOOK
By Ahmad Zamer and Chip Copper
© 2010 Brocade Communications Systems, Inc. All Rights Reserved. Brocade, the B-wing symbol, BigIron, DCX, Fabric OS, FastIron, IronView, NetIron, SAN Health, ServerIron, and TurboIron are registered trademarks, and Brocade Assurance, DCFM, Extraordinary Networks, and Brocade NET Health are trademarks of Brocade Communications Systems, Inc., in the United States and/or in other countries. Other brands, products, or service names mentioned are or may be trademarks or service marks of their respective owners.

Notice: This document is for informational purposes only and does not set forth any warranty, expressed or implied, concerning any equipment, equipment feature, or service offered or to be offered by Brocade. Brocade reserves the right to make changes to this document at any time, without notice, and assumes no responsibility for its use. This informational document describes features that may not be currently available. Contact a Brocade sales office for information on feature and product availability. Export of technical data contained in this document may require an export license from the United States government.

FCoE Handbook
First Edition, April 2010; revision A, June 2010
Written by Ahmad Zamer and Chip Copper
Edited by Victoria Thomas
Illustrated by David Lehmann, Jim Heuser, and Victoria Thomas
Design and layout by Victoria Thomas
Ahmad Zamer Ahmad is responsible for new technology protocols at Brocade, helping evangelize and drive the adoption of new networking and storage technologies such as FCoE, DCB, and TRILL. Ahmad is a high tech veteran with more than 25 years of global computer networking and networked storage industry experience. Most recently, at Intel, he led the introduction of iSCSI to the marketplace and helped drive market adoption. He is a patent holder and an author with more than 50 published articles covering a wide range of technology topics and a frequent speaker at industry events.
Chip Copper As a Brocade Solutioneer, Chip tracks the SAN market and serves as a resource to customers, partners, and integrators--helping them solve real-world business problems through SANs. Copper has over 30 years of experience in program management and information systems with a broad technical background. He has a Ph.D. from the University of Pittsburgh in the area of Distributed Computing.
CONTENTS

CHAPTER 1: OVERVIEW
   Introduction
   The Challenges
   The Solution
   Ethernet and Fibre Channel Remain

CHAPTER 2: FCOE & DCB INDUSTRY STANDARDS BODIES
   INCITS Technical Committee T11—FCoE
   IEEE—Data Center Bridging
      802.1Qbb: Priority-based Flow Control (PFC)
      802.1Qaz: Enhanced Transmission Selection (ETS)
      802.1Qau: Congestion Notification (QCN)
   IETF—TRILL
   Fibre Channel over Ethernet (FCoE)
      FCoE Encapsulation
      The FCoE Protocol Stack
      FCoE Initialization Protocol (FIP)

CHAPTER 3: ARCHITECTURAL MODELS
   Logical vs. Physical Topologies
   The FCoE Controller
   FIP and MAC Addresses
   FPMA and SPMA
   Making Ethernet Lossless
   FC Transport Requirements
   The Ethernet PAUSE Function
   Priority-based Flow Control
   Enhanced Transmission Selection
      Data Center Bridge eXchange
   Building the DCB Cloud
   Congestion Notification

CHAPTER 4: TRILL—ADDING MULTI-PATHING TO LAYER 2 NETWORKS
   Why Do We Need It?
   Introducing TRILL
   The TRILL Protocol
      TRILL Encapsulation
      Link-State Protocols
      Routing Bridges
   Moving TRILL Data
   Summary

CHAPTER 5: DELIVERING DCB/FCOE
   Converged Network Adapters (CNAs)
   DCB/FCoE Switches

CHAPTER 6: FCOE IN THE DATA CENTER
   Where Will FCoE Be Deployed?
      Top of Rack
      End of Row
CHAPTER 1: OVERVIEW

INTRODUCTION
Data networking and storage networking technologies have evolved on parallel but separate paths. Ethernet has emerged as the technology of choice for enterprise data networks, while Fibre Channel (FC) became the dominant choice for enterprise shared Storage Area Networks (SANs). Ethernet and Fibre Channel continue to evolve—Ethernet is poised to reach speeds of 40 Gigabits per second (Gbps) and Fibre Channel will achieve 16 Gbps—on their way to higher speeds and new features. Most large organizations have invested in both technologies for their data center needs. Ethernet provided the front-end Local Area Networks (LANs) linking users to enterprise servers, and Fibre Channel provided the back-end SAN links between servers and storage. Today, server virtualization requires powerful CPUs and servers, placing higher demands on server networking and storage I/O interconnects. Maintaining separate data and storage networks also adds to the complexity of managing and maintaining data centers. As enterprises embrace virtualization to increase operational efficiencies and multiple applications are consolidated onto a smaller number of powerful servers, the next step appears to be simplifying server I/O links by converging the data and storage networks onto a common fabric. New industry standards that enable the convergence of the data and storage networking interconnects now exist. The new Fibre Channel over Ethernet (FCoE) protocol enables the transport of FC storage traffic over a new lossless Ethernet medium. FCoE is an encapsulation protocol and is not a replacement technology for Fibre Channel. In fact, FCoE builds on the success of FC in the data center and utilizes FC along with new lossless Ethernet to solve the server I/O challenges facing IT professionals in the data center.
This book investigates the implementation and uses of FCoE. It discusses the technologies of FCoE and lossless Ethernet, and shows how these can be combined to produce solutions resulting in a simplified physical infrastructure. This new infrastructure will consolidate I/O traffic for storage and data networks over common lossless Ethernet links. Before discussing the new technology, let’s start with a discussion of why FCoE is being considered at all. By understanding the challenges presented by consolidation, it’s easier to understand the choice of technologies selected for the solution.
THE CHALLENGES
Throughout the enterprise and especially in the data center, there is a continuing need to drive down costs as much as possible. Budgets for capital expenses (CapEx) and operating expenses (OpEx) are almost always being squeezed and rarely increased. Data center managers and architects continue to look for new and creative ways to make infrastructures and operations more cost effective. One common theme for achieving cost reduction is consolidation. An example of a successful consolidation technology is the Storage Area Network (SAN). Before SANs, each server had a certain amount of storage directly attached to it (see Figure 1). The storage on each server had to be managed independently, and available disk space on one server could not be effectively deployed on another. Data protection, backups, and migration were difficult and time consuming, since manipulating the data on storage devices meant also dealing with servers.
Figure 1. Direct-Attached Storage (DAS, left) and a simple SAN (right).

In a SAN solution, a specialized network is installed between the servers and the storage devices. This network has protocols and hardware to guarantee that storage traffic sent across it will arrive in a timely fashion and that data will be received in order, non-duplicated, and non-corrupted. By consolidating the storage, larger, more cost-effective solutions can be deployed, and the overall management of storage can be simplified. The deployment of a SAN solution involves the implementation of a separate network in the data center. In addition to the Ethernet network used for peer-to-peer or client-server traffic, the SAN uses separate network controllers called host bus adapters (HBAs), distinct cabling, and specialized switches—which all have to be configured and maintained. Because of the timing, reliability, and management differences between Ethernet networks and SANs, these two infrastructures have been deployed separately. This meant that each server had to have separate storage and networking I/O adapters, and that more cabling and more networking devices had to be used. The opportunity for cost reduction using FCoE is the reduction in the amount of hardware required to deploy a data center solution for both IP networking and storage traffic. This approach not only reduces the initial CapEx required to purchase and deploy the equipment, but also optimizes ongoing OpEx, such as electricity and cooling.
THE SOLUTION
The FCoE solution for server I/O consolidation and for reducing the amount of cabling in the data center recognizes several fundamental principles:
• TCP/IP is the protocol of choice for peer-to-peer and client-server networking.
• Fibre Channel is the protocol of choice for storage traffic in the data center.
• Ethernet is a ubiquitous networking technology. Although traditional Ethernet has several characteristics that do not work well with native storage applications, the volume of Ethernet equipment and expertise deployed worldwide makes it a natural candidate for the solution.
These factors influenced the development of FCoE. The idea behind FCoE is to encapsulate FC frames into Ethernet frames, so that they can be transported over a lossless Ethernet medium. To achieve that, Ethernet had to be enhanced with new features so that it could natively provide the lossless delivery of frames containing FC data. In creating the new FCoE stack, the higher levels of the Fibre Channel protocol are layered on top of the new lossless Ethernet. By isolating the changes to Fibre Channel to the lowest layers (FC-0 and FC-1, see Figure 4), the upper constructs of FC are preserved, allowing FCoE to integrate seamlessly into existing Fibre Channel SANs without disrupting installed resources. Although this results in a change of transport, there are no fundamental differences in the behavior of Fibre Channel whether deployed natively or across lossless Ethernet.
Figure 2. Protocols today and converged over DCB: today, operating systems/applications and the SCSI layer run over iSCSI (on TCP/IP over Ethernet) or FCP (on native Fibre Channel); converged, iSCSI and FCP over FCoE share a DCB-enhanced Ethernet link beneath the same SCSI layer.
The changes, or enhancements, made to Ethernet to support convergence do not prevent the simultaneous deployment of a TCP/IP stack. More importantly, since the new lossless Ethernet can be beneficial to the flow control aspects of TCP/IP, the lossless behavior of the new Ethernet can be turned on or off for LAN TCP/IP traffic. That gives data center professionals the ability to deploy LANs using all of the following on one converged medium:
• Traditional (lossy) Ethernet
• Ethernet with DCB features
• FCoE with DCB features enabled
The combination of DCB and FCoE technologies provides a solution for the challenges of physical infrastructure reduction and cabling simplicity. Subsequent chapters describe how this was accomplished.
ETHERNET AND FIBRE CHANNEL REMAIN
Although DCB and FCoE provide a mechanism for storage and networking traffic to share or converge on one wire, this capability is not expected to eliminate the need for either native Fibre Channel or traditional lossy Ethernet. FCoE is expected to be deployed in data centers alongside existing infrastructures or on new servers with Windows or Linux environments running virtualized tier 1 and some tier 2 applications (see Chapter 6). Both FC and Ethernet technologies will continue to evolve and be deployed in data centers, often alongside FCoE, for the foreseeable future. More often than not, SAN and IP infrastructures are managed by separate groups. They have different maintenance schedules, different service level agreements (SLAs), and different architectural goals. The advantages gained by reducing the number of adapters, wires, and infrastructure devices may not offset the perceived impact of convergence on organizational considerations. It is likely that most organizations will maintain their best IT practices and keep separate storage and networking staff, while taking advantage of role-based management to administer converged networks.
CHAPTER 2: FCoE & DCB INDUSTRY STANDARDS BODIES
The advent of Fibre Channel over Ethernet (FCoE) and Data Center Bridging (DCB) ushers in the beginning of renewed efforts to converge data center storage SAN and LAN networks. The diverse nature of the technologies needed to enable convergence requires the development of new industry standards that cover Fibre Channel, Ethernet, and Link Layer (Layer 2) routing. The FCoE- and DCB-related protocols are being developed by three different industry standards bodies, each focusing on the technology areas that fall into their domain of expertise:
INCITS TECHNICAL COMMITTEE T11—FCOE
The FCoE protocol was developed by the INCITS Technical Committee T11 as part of the T11 FC-BB-5 project. The FCoE protocol and the FCoE Initialization Protocol (FIP) are defined in FC-BB-5, which describes how Fibre Channel is transported over and mapped onto other network technologies. The T11 committee completed its technical work on FC-BB-5 in June 2009 and forwarded the draft standard to INCITS for approval and publishing. The INCITS public review was completed with no comments, which means that the standard will be published by INCITS as an industry standard very soon. The new FCoE standard is an encapsulation protocol that wraps FC storage data into Ethernet frames, enabling it to be transported over a new lossless Ethernet medium.
IEEE—DATA CENTER BRIDGING
The Data Center Bridging (DCB) effort undertaken by the IEEE 802.1 work group is aimed at adding new extensions to bridging and Ethernet, so that it becomes capable of converging LAN and storage traffic on a single link. Often, you hear that the new DCB features will make Ethernet "like Fibre Channel." That is true in the sense that the new features being added to Ethernet solve issues that FC faced and successfully resolved in the past. IEEE is expected to complete its work on the components of DCB in the second half of 2010. The new enhancements are PFC, ETS, and DCBX.
802.1Qbb: Priority-based Flow Control (PFC)
• Establishes eight priorities for flow control based on the priority code point field in the IEEE 802.1Q tags. This enables controlling individual data flows on shared lossless links. PFC capability allows FC storage traffic encapsulated in FCoE frames to receive lossless service from a link that is shared with traditional LAN traffic, which is loss-tolerant.

802.1Qaz: Enhanced Transmission Selection (ETS)
• ETS provides the capability to group each type of data flow, such as storage or networking, and assigns a group identification number to each of the groups, which are also called traffic class groups. The value of this new feature lies in its ability to manage bandwidth on the Ethernet link by allocating portions (percentages) of the available bandwidth to each of the groups. Bandwidth allocation allows traffic from the different groups to receive their target service rate (e.g., 8 Gbps for storage and 2 Gbps for LAN). Bandwidth allocation provides quality of service to applications.
• Incorporates Data Center Bridging Exchange (DCBX), a discovery and initialization protocol that discovers the resources connected to the DCB cloud and establishes cloud limits. DCBX distributes the local configuration and detects the misconfiguration of ETS and PFC between peers. It also provides the capability for configuring a remote peer with PFC, ETS, and application parameters. The application parameter is used for informing the end station which priority to use for a given application type (e.g., FCoE, iSCSI). DCBX leverages the capabilities of IEEE 802.1AB Link Layer Discovery Protocol (LLDP).
802.1Qau: Congestion Notification (QCN)
• An end-to-end congestion management mechanism that enables throttling of traffic at the edge nodes of the network in the event of traffic congestion.
IETF—TRILL
The Internet Engineering Task Force (IETF) is developing a new shortest-path frame routing protocol for multi-hop environments. The new protocol is called Transparent Interconnection of Lots of Links, or TRILL for short, and is expected to be completed in the second half of 2010.
• TRILL provides a Layer 2 (L2) multi-path alternative to the single-path, bandwidth-limiting Spanning Tree Protocol (STP) currently deployed in data center networks.
• TRILL will also deliver L2 multi-hop routing capabilities, which are essential for expanding the deployment of DCB/FCoE solutions beyond access-layer server I/O consolidation and into larger data center networks.
FIBRE CHANNEL OVER ETHERNET (FCOE)

FCoE Encapsulation
FCoE is a new industry-standard protocol aimed at enabling the transport of Fibre Channel (FC) storage traffic over new, enhanced lossless Ethernet links. To achieve that goal, FCoE simply encapsulates, or wraps, FC frames into Ethernet frames (see Figure 2) and prepares them for transport over Data Center Bridging links. It is important to note that FCoE wraps the entire FC frame as is, without any modifications. The fact that the FC payload remains intact throughout its FCoE journey means that FCoE preserves the FC constructs and services and enables FCoE solutions to utilize existing management applications. As a result, FCoE solutions are designed to integrate seamlessly into existing FC environments without introducing incompatibilities or disrupting existing infrastructures. FCoE will not, however, alleviate incompatibilities already present in FC products or environments. As an encapsulation protocol, FCoE builds on the success of Fibre Channel in the data center and serves to supplement its presence. Contrary to some beliefs, FCoE does not compete with FC, since encapsulation protocols tend to supplement rather than compete with storage interface or networking protocols such as FC.
The FCoE Protocol Stack
The FCoE protocol stack (see Figure 4) is constructed by taking the FC upper layers (FC-2, FC-3, and FC-4) and placing them on top of the Ethernet physical and Data Link layers (L2). Note that the layer labeled "Data Link (DCB)" in Figure 4 is the Ethernet Link Layer (L2), to which the DCB enhancements aimed at making Ethernet lossless are being added. Sandwiched between the FC and Ethernet layers is the FCoE layer. The FCoE layer encapsulates FC-to-DCB traffic and performs the reverse function on DCB-to-FC traffic. Ethernet frames carrying FC traffic are assigned a new EtherType code to distinguish FCoE traffic. FCoE also requires a new, larger Ethernet frame size, called baby jumbo frames, to accommodate the FC frame and the associated Ethernet and FCoE headers.
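The baby jumbo requirement falls out of simple arithmetic. A back-of-the-envelope sketch, assuming the usual FC-BB-5 encapsulation layout (a 14-byte FCoE header carrying the SOF and a 4-byte EOF trailer) and the maximum FC frame of a 24-byte header, 2112-byte payload, and 4-byte CRC; the exact MTU figures vendors configure vary:

    # Rough frame-size arithmetic behind the "baby jumbo" requirement.
    ETH_HEADER = 14                # destination MAC + source MAC + EtherType
    VLAN_TAG = 4                   # optional 802.1Q tag (carries the PFC priority)
    FCOE_HEADER = 14               # version + reserved bits + SOF byte
    FC_FRAME_MAX = 24 + 2112 + 4   # FC header + maximum payload + FC CRC
    FCOE_TRAILER = 4               # EOF byte + reserved padding
    ETH_FCS = 4                    # Ethernet frame check sequence

    total = ETH_HEADER + VLAN_TAG + FCOE_HEADER + FC_FRAME_MAX + FCOE_TRAILER + ETH_FCS
    print("largest FCoE frame on the wire:", total, "bytes")  # about 2180
    # A standard 1500-byte MTU is too small, so switches carrying FCoE are
    # typically configured with a "baby jumbo" MTU (around 2.5 KB).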
Figure 3. Ethernet frame with FC frame insertion: the Ethernet frame (DA, SA, EtherType) carries an FCoE header holding the SOF, the unmodified FC frame (FC header, data, CRC) with its EOF at the end, and finally the Ethernet CRC.
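To make the encapsulation concrete, here is a minimal sketch in Python. The FCoE EtherType (0x8906) is the value assigned by the standard; the simplified header layout and the function name `build_fcoe_frame` are ours, and the SOF/EOF code values shown are illustrative rather than checked against the FC-BB-5 tables:

    import struct

    FCOE_ETHERTYPE = 0x8906  # EtherType assigned to FCoE

    def build_fcoe_frame(dst_mac, src_mac, fc_frame, sof, eof):
        """Wrap an unmodified FC frame (header + payload + CRC) in an
        Ethernet frame. The SOF/EOF ordered sets travel as code values in
        the FCoE header and trailer rather than on the wire as in native FC."""
        eth_header = dst_mac + src_mac + struct.pack("!H", FCOE_ETHERTYPE)
        fcoe_header = bytes(13) + bytes([sof])   # version/reserved, then SOF
        fcoe_trailer = bytes([eof]) + bytes(3)   # EOF, then reserved padding
        return eth_header + fcoe_header + fc_frame + fcoe_trailer  # NIC adds FCS

    # A dummy 36-byte FC frame; real SOF/EOF code values come from FC-BB-5.
    frame = build_fcoe_frame(bytes.fromhex("0efc00010203"),
                             bytes.fromhex("00051eaabbcc"),
                             fc_frame=bytes(36), sof=0x2E, eof=0x41)
    print(len(frame), "bytes before the Ethernet FCS")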
Figure 4. FCoE protocol stack: FC-4, FC-3, and FC-2 (the Fibre Channel services) over the FCoE layer, over the DCB Data Link and Ethernet physical layers.
FCoE Initialization Protocol (FIP)
FIP is the control plane protocol used in the discovery and initialization stages of establishing links among the elements of the fabric. Once discovery and initialization are complete and the nodes have successfully performed fabric login, FCoE data plane frames are used for data transmission operations. Unlike FCoE frames, FIP frames do not transport FC data; they contain discovery and login/logout parameters. FIP frames are assigned a unique EtherType code to distinguish them from FCoE frames carrying FC storage data.
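Since FIP (EtherType 0x8914) and FCoE (EtherType 0x8906) are distinguished purely by EtherType, separating control-plane from data-plane frames is a one-field decision. A minimal dispatch sketch; the handler descriptions are ours:

    import struct

    FCOE_ETHERTYPE = 0x8906  # data plane: encapsulated FC frames
    FIP_ETHERTYPE = 0x8914   # control plane: discovery, login, keep-alive

    def classify(frame):
        """Classify a received (untagged) Ethernet frame by EtherType;
        an 802.1Q-tagged frame would carry the EtherType at offset 16."""
        (ethertype,) = struct.unpack_from("!H", frame, 12)  # after DA + SA
        if ethertype == FIP_ETHERTYPE:
            return "FIP: hand to the FCoE Controller"
        if ethertype == FCOE_ETHERTYPE:
            return "FCoE: de-encapsulate and pass the FC frame to FC-2"
        return "other: normal Ethernet processing"

    print(classify(bytes(12) + struct.pack("!H", 0x8914)))  # a FIP frame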
CHAPTER 3: ARCHITECTURAL MODELS
The Fibre Channel architectural model consists of five layers: FC-0, FC-1, FC-2, FC-3, and FC-4. Because of the many functions performed at the FC-2 layer, the T11 committee decided that all layers above and including FC-2 in the FC model should remain intact in the implementation of FCoE. This decision preserves the investments made by Fibre Channel developers and customers and allows the integration of FCoE into current FC SAN deployments. Although this handbook focuses on FCoE, Fibre Channel as a transport will continue to be developed independently of FCoE. Layers FC-0, FC-1, FC-2P, and FC-2M have ambitious roadmaps ahead of them, and the industry will continue to see more advances in these areas. Some storage architects have decided that they will not deploy FCoE at all and that native Fibre Channel will continue to be the transport of choice. By limiting the impact of FCoE to the lower layers of the protocol stack, the advances made in layers FC-2V and above will be available in all deployments independent of the technology used at the transport layers. In the FCoE architecture, the functionality of layers FC-2M and below is implemented in three layers. At the bottom, the physical interface transceiver (PHY) and media access controller (MAC) layers of lossless Ethernet are used. Above that, a new layer called the FCoE Entity is defined. This layer is responsible for managing FCoE Link End-Points (FCoE_LEPs). Each FCoE_LEP represents a virtual connection and is responsible for the encapsulation, de-encapsulation, transmission, and reception of Fibre Channel frames across the lossless Ethernet network. In Fibre Channel (FC), there is a very natural mapping of a physical port into the function it performs. For example, an N_Port, or Node Port, is not thought of as just the physical connector at the point where a Fibre Channel cable is plugged into a host or storage device. It also defines the behavior of the port and the logical end-point of storage traffic moving into or out of the device. If an FC device is described as having two N_Ports, then it is understood that there are two separate physical connectors and that each of these connectors has associated with it an independently functioning logical component representing the functionality of the N_Port.
This same concept is true for F_Ports and E_Ports. Consider a Fibre Channel switch with 32 ports. Because there are 32 connectors on the switch, it is commonly understood that the device can support up to a total of 32 F_Ports and E_Ports, and that any particular port can take on either personality. There can be, however, no more than a total of 32 F_Ports and E_Ports, because the logical functionality of a port is associated with the physical connector. This one-to-one mapping of ports to logical functionality does not exist as the lower layers of Fibre Channel are replaced with lossless Ethernet. One physical lossless Ethernet port is capable of supporting the functionality of multiple logical Fibre Channel ports. A single server with dual lossless Ethernet ports and controllers may be capable of supporting many more instances of logical Fibre Channel ports through the FCoE Entity layer. In order to distinguish between the physical ports and the logical functionality, a new nomenclature is used in FCoE. A “P” prefix on a port type name refers to the physical entity, and a “V” prefix refers to the virtual functionality. For example, a VN_Port represents the logical functionality of a Fibre Channel N_Port on an FCoE-capable server. In this example, there is no corresponding PN_Port, since all of the VN_Port’s traffic would flow through a lossless Ethernet port. This convention provides much more clarity when describing FCoE implementation and architecture. Certain characteristics such as MAC addresses can be associated not only with lossless Ethernet ports but also with the VN_Ports in a particular server. By differentiating between the physical and logical, it is clear which entity is being referenced by an attribute such as a MAC address. In the FCoE environment, the equivalent of a Fibre Channel node is an Enode. By identifying something as an Enode, the use of FCoE for FC protocol connectivity is implied. Similarly, an FCoE switch is called a Fibre Channel Forwarder (FCF). An FCF performs all of the functions and has all of the services of an FC switch, and is equipped with one or more FCoE ports. Additionally, an FCF may contain one or more lossless Ethernet bridges and possibly native Fibre Channel ports, but these elements are not necessary.
LOGICAL VS. PHYSICAL TOPOLOGIES
As discussed in the previous section, in Fibre Channel there is a one-to-one mapping between the cabling of nodes to switching elements and the logical layout of the fabric. Each PN_Port cabled to a PF_Port represents a logical link between the corresponding VN_Port and VF_Port. The reverse is also true. If a logical topology diagram shows connectivity between two VE_Ports, then there must be a cable connecting the two corresponding PE_Ports. If an additional data path is required, a cable must be added. This symmetric relationship is shown in Figure 5. In FCoE, this is not the case. Since each Enode and FCF can create multiple FCoE_LEPs, there may be many more virtual connections between devices than there are physical connections. Suppose that there are two Enodes and two FCFs, each with only one Ethernet port, communicating through a lossless Ethernet cloud. In this configuration, four different virtual connections can be established (enumerated in the sketch after this list):
• Through a single physical connection, each of the Enodes can be logically connected to each of the FCFs.
• Similarly, FCFs can establish multiple logical ISLs out of a single port. Three FCFs, each with a single port, can have six ISLs among them.
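The multiplication of logical links over a handful of physical ports is easy to enumerate. A small sketch of the two scenarios above, with illustrative names; counting each direction of a VE_Port pair separately is one way to read the six ISLs the text mentions:

    from itertools import combinations, product

    enodes = ["enode1", "enode2"]
    fcfs = ["fcf1", "fcf2"]

    # Each Enode can log in to each FCF through the same Ethernet cloud,
    # all over one physical port per device:
    virtual_links = list(product(enodes, fcfs))
    print(len(virtual_links), "VN_Port-to-VF_Port virtual links")  # 4

    # Logical ISLs among three single-port FCFs; undirected pairs would
    # give three, directed VE_Port connections give six.
    pairs = list(combinations(["fcf1", "fcf2", "fcf3"], 2))
    print(2 * len(pairs), "directed VE_Port connections")  # 6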
Figure 5. FCoE port model: the FC-3/FC-4 layers over a VN_Port at FC-2V, an FCoE_LEP within the FCoE entity, the FCoE controller, the lossless Ethernet MAC, and the Ethernet port.

Although the FCoE specification allows multiple virtual connections from a single physical port on an FCF, it requires all of the connections from a single physical port to be of the same port type. While a particular lossless Ethernet port is being used to support VE_Port connections, it cannot be used for VF_Ports, and vice versa. This is because of the way Ethernet MAC addresses are used in the discovery process, as discussed in the section "Establishing Virtual Connections."
This characteristic of not requiring a physical connection for each logical connection has significant ramifications for the design and behavior of a fabric. Although a single Enode may be able to attach to many switches logically, it is not clear that this is a desired behavior. For example, if there are 10 FCFs in a fabric, an administrator may not want a single Enode to establish a connection with each of them. This and additional topics relating to the merging of Ethernet and Fibre Channel are discussed later.
THE FCOE CONTROLLER
In Fibre Channel, the process of discovering adjacent devices is fairly straightforward. Since Fibre Channel is a point-to-point network, a neighbor can be discovered by beginning a protocol conversation on a port. If a response is received, then that response came from the device on the other end of the cable. Because FCoE allows a lossless Ethernet cloud between Enodes and FCFs, the process of discovering and establishing logical neighbors is more complex. These neighbors may be several hops away. FCoE therefore includes mechanisms to allow Enodes and FCFs to advertise themselves and to be sought out on a network. Reviewing the architectural models already presented, there is not a natural place for this discovery process to be implemented in the FCoE stack. The FC-2V layer is responsible for VN_Ports, VF_Ports, and VE_Ports, but as defined in the Fibre Channel standard, it is not responsible for seeking out and identifying other ports on the network. The FCoE Entity is responsible for the encapsulation, de-encapsulation, transmission, and reception of FC frames through a virtual link, but does not set up or tear down that link. The lossless Ethernet layers below know nothing about Fibre Channel other than that it is another payload type. FCoE therefore specifies an additional component called the FCoE Controller. One FCoE Controller is associated with each lossless Ethernet MAC. The FCoE Controller is given the responsibility of managing the logical connections for the Enode or FCF across that Ethernet MAC. A single Enode or FCF may have multiple lossless Ethernet MACs, and so will have multiple FCoE Controllers.
The FCoE Controller uses the FCoE Initialization Protocol (FIP) to communicate with other FCoE Controllers for connection management. The FIP protocol is not really an extension of Fibre Channel or FCoE, but rather exists as an independently operating protocol. It has a unique EtherType, meaning that Ethernet switches that do not understand FCoE or Fibre Channel can still distinguish FIP messages from FCoE traffic. Once an FCoE Controller has identified a peer of an appropriate type (using a mechanism to be described shortly), it establishes a connection with that peer and creates an FCoE Entity representing that connection. The partner FCoE Controller will create an FCoE Entity representing the opposite side of the connection for use by its local Fibre Channel protocol stack. Because multiple eligible partners may be discovered, more than one FCoE Entity may be created. The FCoE Controller is also responsible for ongoing maintenance of the connections. It will tear down connections when appropriate, and can optionally send keep-alive messages to insure that ongoing connectivity is maintained.
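A highly simplified sketch of the controller's bookkeeping, using invented class and method names (`FcoeController`, `on_peer_discovered`); real controllers live in adapter firmware or drivers, but the shape is the same: one controller per MAC, one FCoE Entity per established virtual link, plus keep-alive upkeep:

    import time

    class FcoeEntity:
        """One end of a virtual link; owns an FCoE_LEP for that link."""
        def __init__(self, peer_mac):
            self.peer_mac = peer_mac

    class FcoeController:
        """One controller per lossless Ethernet MAC: discovers peers via
        FIP and creates/destroys the FCoE Entities for each virtual link."""
        def __init__(self, local_mac):
            self.local_mac = local_mac
            self.entities = {}

        def on_peer_discovered(self, peer_mac):
            # One FCoE Entity per established virtual link.
            self.entities.setdefault(peer_mac, FcoeEntity(peer_mac))

        def on_peer_lost(self, peer_mac):
            self.entities.pop(peer_mac, None)  # tear the virtual link down

        def keepalive_tick(self):
            for peer_mac in self.entities:
                print(time.ctime(), "FIP keep-alive", self.local_mac, "->", peer_mac)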
FIP AND MAC ADDRESSES In a Fibre Channel network, fabric addresses are used to identify the source and destination of messages. These addresses are assigned to nodes as they log into the fabric. Each fabric address is guaranteed to be unique. In a (non-FCoE) Fibre Channel network, the fabric address is sufficient for directing messages throughout the infrastructure. In FCoE, Fibre Channel frames still contain fabric addresses, but they are encapsulated as payload for transmission. The lossless Ethernet network uses MAC addresses to determine where messages should go. The job of managing the association between MAC addresses and fabric addresses is handled by the FCoE Controller using the FIP protocol. FCoE Controllers use multicast messages to locate other FCoE Controllers on the network. The FIP protocol uses several different predefined multicast types to allow FCoE Controllers to advertise their presence and to discover peers of the appropriate type.
An FCoE Controller on an FCF can represent VE_Ports or VF_Ports. (Remember that the standard doesn't allow a single FCF port to support both types simultaneously, although a single port can support multiple logical ports of the same type at the same time.) When the FCoE Controller represents a VE_Port, it periodically multicasts an announcement specific to other VE_Port-configured FCoE Controllers, indicating its availability. This announcement includes the MAC address of the sending controller so that receiving stations can reply with a unicast request for connection if they want to. This periodic multicast also serves as a keep-alive message to other stations currently logically connected to this FCF.

Figure 6. Roles of FIP and FCoE in discovery and data transmission: FIP carries the VLAN request and notification, discovery solicitation and advertisement, and FLOGI request and accept between Enode and FCF; FCoE then carries PLOGI to the Directory Server, PLOGI accept, and data transfer.

When an FCoE Entity representing VE_Ports comes online, it multicasts a request for other VE_Port-configured FCoE Controllers. All VE_Port FCoE Controllers receiving this request respond with a unicast message if they are interested in establishing a logical connection. This response includes their MAC address so that the receiving station can contact them directly.
If an FCF happens to have multiple lossless Ethernet MACs, then each of the FCoE Controllers associated with those MACs will perform independent advertisements and solicitations. Similarly, when a new VF_Port comes online, it also sends a multicast announcement. This multicast is sent to an address representing all FCoE Controllers for Enodes on the network. It serves as a keep-alive message to all Enodes currently logically connected to that MAC and can be used by other Enodes to build a list of reachable FCFs. A new Enode coming online sends a multicast to all FCFs. FCF MACs receiving this multicast respond with a unicast message containing their MAC addresses to allow the Enode to contact them directly. When an FCF creates a new FCoE Entity, that new entity continues to be addressed using the MAC address of the lossless Ethernet controller. In the case of an Enode, however, different rules apply. The FCoE Entities in Enodes are permitted to have unique MAC addresses of their own.
FPMA AND SPMA
The FCoE specification describes two methods for assigning MAC addresses to FCoE Entities: Fabric Provided MAC Addresses (FPMA) and Server Provided MAC Addresses (SPMA). Support for FPMA is required by all FCFs and Enodes in the fabric. Support for SPMA is optional. The FPMA approach allows FCFs to respond with a MAC address for an Enode FCoE Entity when the FCoE Controller requests a connection. This MAC address is built from two components:
• The first three bytes are globally defined across the fabric and serve as a type of fabric identifier. The values for these three bytes have been reserved to insure that they will not coincide with MAC addresses provided by manufacturers.
• The last three bytes of the address are the same as the three bytes representing the FC address (FC_ID) assigned by the FCF. The FPMA scheme makes it easy for any device to determine the logical Fibre Channel source and destination for a frame given the source and destination MAC addresses (a small sketch follows this list).
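Building an FPMA is therefore just a concatenation. A minimal sketch, assuming the default FC-MAP prefix of 0E-FC-00 defined by FC-BB-5:

    DEFAULT_FC_MAP = bytes.fromhex("0efc00")  # default FC-MAP prefix

    def fpma(fc_id, fc_map=DEFAULT_FC_MAP):
        """Fabric Provided MAC Address: FC-MAP (3 bytes) || FC_ID (3 bytes)."""
        mac = fc_map + fc_id.to_bytes(3, "big")
        return ":".join(format(b, "02x") for b in mac)

    # A VN_Port whose fabric login was answered with FC address 0x010203:
    print(fpma(0x010203))  # 0e:fc:00:01:02:03
    # The FC source/destination is recoverable from the MAC address alone,
    # which is the property described above.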
FPMA also allows quick differentiation between multiple separate streams between an Enode and a single FCF. An Enode can create several logical VN_Ports, and each of those ports can have a logical link with the same FCF. By assigning a distinct MAC address to each VN_Port, multiple streams can be differentiated by the Ethernet headers alone. With SPMA, the Enode device is responsible for the management and assignment of MAC addresses to FCoE Controllers and FCoE Entities. To be sure that duplicate MAC addresses do not appear on the network, these MAC addresses must be globally defined and so cannot be dynamically created. A node implementing SPMA may choose to use the same MAC address for multiple VN_Ports and the FCoE Controllers, so it's impossible to distinguish between different streams by looking only at the MAC addresses. The frames require deeper inspection to determine the unique fabric identifiers from the encapsulated FC frame.
MAKING ETHERNET LOSSLESS
In its original form, Ethernet was designed as a best-effort delivery architecture. It does its best to be sure that a frame is correctly transmitted from one station to another until it has reached its final destination. As a result, Ethernet is a lossy transport that is not suitable for transporting storage data, which requires a lossless medium to ensure data delivery and protect against the loss of customers' valuable data. As originally designed, many Ethernet stations would listen to the same segment of media and would copy all frames they heard, keeping only those frames intended for that station. When a station wanted to transmit, it would begin by listening to the media to be sure that no one else was transmitting. If another station was transmitting, it would wait until the media was silent and then begin to modulate the frame out onto the shared media. At the same time, the station would listen to the media to be sure that it heard the same data that it was sending. If it heard the same frame from beginning to end, then the station considered the frame sent and would do no further work at the Ethernet layer to be sure that the message arrived correctly. If the station happened to hear something other than what it was transmitting, it would assume that another station began transmitting at about the same time. This station would then continue to transmit for a period to be sure that the other station was also aware of the collision, but would then abort the frame. After a random time interval, the station would reattempt to send the frame. By using this approach, a station can be confident that a frame was sent correctly but not whether the frame was received correctly. Ethernet implementations have moved from this shared-media approach to one in which each segment of media is shared by only two Ethernet stations. Dual unidirectional data paths allow the two stations to communicate with each other simultaneously without fear of collisions. Although this approach addresses how frames are delivered between Ethernet stations, it doesn't change the behavior of how frames are treated once they're received. The rules of Ethernet allow a station to throw away frames for a variety of reasons. For example, if a frame arrives with errors, it's discarded. If a non-forwarding station receives a frame not intended for it, it discards the frame. But most significantly, if a station receives an Ethernet frame and it has no data buffer in which to put it, according to the rules of Ethernet, it can discard the frame. It can do this because it's understood that stations implementing the Ethernet layer all have this behavior. If a higher-level protocol requires a lossless transmission, another protocol must be layered on top of Ethernet to provide it. Consider an implementation of the FTP protocol running across an Ethernet network. FTP is part of the TCP/IP tool suite. This means that from the bottom layer up, FTP is based on Ethernet, IP, TCP, and finally FTP itself. Ethernet does not guarantee that frames will not be lost, and neither does IP. The TCP layer is responsible for monitoring data transmitted between the FTP client and server, and if any data is lost, corrupted, duplicated, or arrives out of order, TCP will detect and correct it. It will request the retransmission of data if necessary, using the IP and Ethernet layers below it to move the data from station to station. It will continue to monitor, send, and request transmissions until all the necessary data has been received reliably by the FTP application.
FC TRANSPORT REQUIREMENTS
The architecture of the Fibre Channel protocol is different. Ethernet only guarantees the best-effort delivery of frames and allows frames to be discarded under certain circumstances. Fibre Channel, however, requires reliable delivery of frames at the equivalent level of the Ethernet layer. At this layer, a Fibre Channel switch or host is not allowed to discard frames because it does not have room for them. It accomplishes this by using a mechanism called buffer credits. A buffer credit represents a guarantee that sufficient buffer space exists in a Fibre Channel node to receive an FC frame. When a Fibre Channel node initializes, it examines its available memory space and determines how many incoming frames it can accommodate. It expresses this quantity as a number of buffer credits. A Fibre Channel node wishing to send a frame to an adjacent node must first obtain a buffer credit from that node. This is a guarantee that the frame will not be discarded on arrival because of a lack of buffer space. The rules of Fibre Channel also require a node to retain a frame until it has been reliably passed to another node or it has been delivered to a higher-level protocol. As discussed previously, implementations of FCoE replace the lower layers of Fibre Channel with Ethernet. Since the lower layers of Fibre Channel are responsible for guaranteeing the reliable delivery of frames throughout the network, that role must now fall to Ethernet. The behavior of Ethernet must therefore be changed to accommodate this new responsibility.
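A toy model of the credit handshake, with invented names; it captures only the essential guarantee that a frame is never transmitted unless buffer space is known to exist on the receiving side:

    class CreditLink:
        """Toy model of FC buffer-credit flow control on one hop."""
        def __init__(self, receiver_buffers):
            self.credits = receiver_buffers   # granted at link initialization
            self.frames_in_receiver = 0

        def send_frame(self):
            if self.credits == 0:
                return False                  # wait; never transmit and drop
            self.credits -= 1                 # consume one credit
            self.frames_in_receiver += 1      # guaranteed buffer space exists
            return True

        def receiver_frees_buffer(self):
            assert self.frames_in_receiver > 0
            self.frames_in_receiver -= 1
            self.credits += 1                 # an R_RDY returns the credit

    link = CreditLink(receiver_buffers=2)
    print(link.send_frame(), link.send_frame(), link.send_frame())  # True True False
    link.receiver_frees_buffer()
    print(link.send_frame())                                        # True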
THE ETHERNET PAUSE FUNCTION
A decision was made by the IEEE committee members that lossless behavior for Ethernet would be implemented by using a variant of the PAUSE function currently defined as part of the Ethernet standard. The PAUSE function allows an Ethernet station to send a PAUSE frame to an adjacent station. The PAUSE semantics require the receiving station not to send any additional traffic until a certain amount of time has passed. This time is specified by a field in the PAUSE frame.
Using this approach, lossless behavior can be provided if a receiving station issues PAUSE requests when it does not have any buffer space available to receive frames. It assumes that by the time the PAUSE request expires, there will be sufficient buffer space available. If not, it is the responsibility of the receiving station to issue ongoing PAUSE requests until sufficient buffer space becomes available. The PAUSE command provides a mechanism for lossless behavior between Ethernet stations, but it is only suited for links carrying one type of data flow. Recall that one of the goals of FCoE is to allow for I/O consolidation, with TCP/IP and Fibre Channel traffic converged onto the same media. If the PAUSE command is used to guarantee that Fibre Channel frames are not dropped, as is required by that protocol, then as a side effect, TCP/IP frames will also be stopped once a PAUSE command is issued. The PAUSE command doesn't differentiate traffic based on protocols. It pauses all traffic on the link between two stations, even control commands. So there is a conflict between what must be done to accommodate storage traffic in FCoE and what must be done for TCP/IP traffic—both of which need to coexist on the same segment of media. And problems could arise because one type of network traffic may interfere with the other. Suppose, for example, that storage traffic is delayed because of a slow storage device. In order to not lose any frames relating to the storage traffic, a PAUSE command is issued for a converged link carrying both FCoE and TCP/IP traffic. Even though the TCP/IP streams may not need to be delayed, they will be delayed as a side effect of having all traffic on the link stopped. This in turn could cause TCP time-outs and may even make the situation worse, as retransmit requests for TCP streams add additional traffic to the already congested I/O link. The solution to this problem is to enable Ethernet to differentiate between different types of traffic and to allow different types of traffic to be paused individually if required.
PRIORITY-BASED FLOW CONTROL
The idea of Priority-based Flow Control (PFC) is to divide Ethernet traffic into different streams or priorities. That way, an Ethernet device can distinguish between the different types of traffic flowing across a link and exhibit different behaviors for different protocols. For example, it will be possible to implement a PAUSE command to stop the flow of FCoE traffic when necessary, while allowing TCP/IP traffic to continue flowing. When such a change is made to the behavior of Ethernet, there is a strong desire to do so with minimum impact to Ethernet networks already deployed. An examination of the IEEE 802.1Q VLAN header standard reveals that a three-bit field referred to as the Priority Code Point (PCP) could be used to differentiate between eight different traffic priority levels, and therefore distinguish eight different types of traffic on the network.

Figure 7. Priority Flow Control in the IEEE 802.1Q VLAN tag. The tag sits between the source address and length/type fields of the Ethernet frame and comprises the TPID (Tag Protocol Identifier, 16 bits, indicating that the frame is 802.1Q tagged), the PCP (Priority Code Point, 3 bits, used for DCB Priority Flow Control), the CFI (Canonical Format Indicator, 1 bit), and the VID (VLAN ID, 12 bits). For proper FCoE traffic, Brocade 8000 DCB ports are set to converged mode to handle tagged frames with a PFC value.

In addition, the Ethernet PAUSE command has a sufficient number of bytes available to allow an individual pause interval to be specified for each of the eight levels, or classes, of traffic. FCoE and TCP/IP traffic types can therefore be converged on the same link but placed into separate traffic
classes. The FCoE traffic can be paused in order to guarantee the reliable delivery of frames, while TCP/IP frames are allowed to continue to flow. Not only can different traffic types coexist, but best practices for each can be implemented in a non-intrusive manner.
Figure 8. Eight priorities per link using PFC (for example, Priority 0: FCoE; Priority 1: FCoE; Priority 2: LAN; ... Priority 6: User X; Priority 7: User Z).

From another perspective, consider that PFC attempts to emulate the Virtual Channel (VC) technology widely deployed in current Brocade Fibre Channel SANs. While borrowing the lossless aspect of VCs, PFC retains the option of being configured as lossy or lossless. PFC is an enhancement to the current link-level Ethernet flow control mechanism defined in IEEE 802.3x (PAUSE). Current Ethernet protocols support the capability to assign different priorities to different applications, but the existing standard PAUSE mechanism ignores the priority information in the Ethernet frame. Triggering the PAUSE command results in all traffic on the link being halted, which impacts all applications even when only a single application is causing congestion. The current PAUSE is not suitable for links on which storage (FCoE) and networking applications share the link, because congestion caused by any one of the applications shouldn't disrupt the rest of the application traffic.
IEEE 802.1Qbb is tasked with enhancing the existing PAUSE protocol to include the priority in the frames contributing to congestion. PFC establishes eight priorities using the priority code point field in the IEEE 802.1Q tags (see Figure 8), which enables the control of individual data flows based on the frame's priority. Using the priority information, the peer (server or switch) stops sending traffic for that specific application, or priority flow, while other applications' data flows continue without disruption on the shared link. The new PFC feature allows FC storage traffic encapsulated in FCoE frames to receive lossless service from a link that is shared with traditional LAN traffic, which is loss-tolerant. In other words, separate data flows can share a common lossless Ethernet link, while each is protected from flow control problems of the other flows. Note that LAN traffic priorities can be configured with PFC off, allowing for lossy or lossless LAN transmissions.
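The difference between the two mechanisms is visible in the frames themselves. A sketch of both, using the MAC Control EtherType (0x8808), the classic PAUSE opcode (0x0001), and the PFC opcode (0x0101); field packing is simplified and padding to the 64-byte minimum frame size is omitted:

    import struct

    MAC_CONTROL_DA = bytes.fromhex("0180c2000001")  # reserved control multicast
    MAC_CONTROL_ETHERTYPE = 0x8808

    def pause_frame(src_mac, quanta):
        """Classic 802.3x PAUSE (opcode 0x0001): one timer, and all
        traffic on the link stops for the requested number of quanta."""
        return (MAC_CONTROL_DA + src_mac +
                struct.pack("!HHH", MAC_CONTROL_ETHERTYPE, 0x0001, quanta))

    def pfc_frame(src_mac, quanta_per_priority):
        """PFC (opcode 0x0101): a priority-enable vector plus a separate
        timer for each of the eight priorities."""
        assert len(quanta_per_priority) == 8
        enable = sum(1 << i for i, q in enumerate(quanta_per_priority) if q)
        return (MAC_CONTROL_DA + src_mac +
                struct.pack("!HHH", MAC_CONTROL_ETHERTYPE, 0x0101, enable) +
                struct.pack("!8H", *quanta_per_priority))

    # Pause only priority 3 (say, the FCoE class); LAN priorities keep flowing.
    frame = pfc_frame(bytes(6), [0, 0, 0, 0xFFFF, 0, 0, 0, 0])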
ENHANCED TRANSMISSION SELECTION
With the use of Priority Flow Control, it is possible to combine eight different levels or classes of traffic onto the same converged link. Each of these classes can be paused individually if necessary without interfering with other classes. PFC does not, however, specify how the bandwidth is to be allocated to separate classes of traffic. Suppose, for example, that a particular application happens to hit a hot spot that causes it to send a large number of TCP/IP messages. There is a good chance that the transmission of all these messages could interfere with the operating system's attempt to either retrieve or store block information from the storage network. Under these or similar circumstances, bandwidth starvation could cause either the application or the operating system to crash.
This situation does not occur if separate channels are used for storage and non-storage traffic. A Fibre Channel-attached server could access its block traffic independent of the messages traveling across an Ethernet TCP/IP connection. Competition for bandwidth occurs only when these two ordinarily independent streams share a common link. In order to insure that all types of traffic are given the appropriate amount of bandwidth, a mechanism called Enhanced Transmission Selection (ETS) is used with Priority Flow Control. ETS establishes priorities and bandwidth limitations to insure that all types of traffic receive the priority and bandwidth they require for the proper operation of the server and all applications. ETS establishes Priority Groups, or traffic class groups. A Priority Group is a collection of priorities as established in PFC. For example, all of the priorities associated with Inter-Process Communication (IPC) can be allocated to one Priority Group (traffic class group). All priorities assigned to FCoE can be assigned to a second traffic class group, and all IP traffic can be assigned to a third group, as shown in Figure 9. Each Priority Group has an integer identifier called the Priority Group ID (PGID) assigned to it. The value of the PGID is either 15 or a number in the range of 0 through 7. If the PGID for a Priority Group is 15, all traffic in that group is handled on a strict priority basis. That is, if traffic becomes available, it is handled before traffic in all other Priority Groups without regard for the amount of bandwidth it takes. A PGID of 15 should be used only with protocols requiring either an extremely high priority or very low latency. Examples of traffic in this category include management traffic, IPC, or audio/video bridging (AVB). The other traffic class groups, with PGID identifiers between 0 and 7, are assigned a bandwidth allocation (PG%). The sum of all bandwidth allocations should equal 100%. The bandwidth allocation assigned to a traffic class group is the guaranteed minimum bandwidth for that group assuming high utilization of the link. For example, if the PG% for the traffic class group containing all storage traffic is 60%, it is guaranteed that at
least 60% of the bandwidth available after all PGID 15 traffic has been processed will be allocated to the storage traffic Priority Group. The specification for ETS allows a traffic class group to take advantage of unused bandwidth available on the link. For example, if the storage traffic class group has been allocated 60% of the bandwidth and the IP traffic class group has been allocated 30%, the storage group can use more than 60% if the IP traffic class group does not require the entire 30%.
Figure 9. Assigning bandwidth using ETS (Priority Group 1: Storage, 60%; Priority Group 2: LAN, 30%; Priority Group 3: IPC, 10%).
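Using the Figure 9 allocation, the guarantees work out as follows; a small sketch assuming a 10 Gbps converged link:

    # ETS guarantees from the Figure 9 allocation on a 10 Gbps converged link.
    link_gbps = 10
    pg_percent = {"storage (FCoE)": 60, "LAN (TCP/IP)": 30, "IPC": 10}
    assert sum(pg_percent.values()) == 100

    for group, pct in pg_percent.items():
        print(group, "guaranteed minimum:", link_gbps * pct / 100, "Gbps")

    # The minimum is a floor, not a cap: if the LAN group only needs 1 Gbps
    # at some instant, ETS lets the storage group borrow the idle 2 Gbps.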
Data Center Bridge eXchange
In order for PFC and ETS to work as intended, both devices on a link must use the same rules. (PFC and ETS are implemented only in point-to-point, full-duplex topologies.) The role of Data Center Bridge eXchange (DCBX) is to allow two adjacent stations to exchange information about themselves and what they support. If both stations support PFC and ETS, then a lossless link can be established to support the requirements of FCoE. As with other protocols, an implementation goal of DCBX is that it must be possible to add DCBX-equipped devices to legacy networks without breaking them. That's the only way to build a lossless Ethernet sub-cloud inside a larger Ethernet data center deployment. This is discussed in more detail in a following section.
Today, nearly all Ethernet devices are equipped to support the Link Layer Discovery Protocol (LLDP). LLDP is a mechanism whereby each switch periodically broadcasts information about itself to all of its neighbors. It's a one-way protocol, meaning that there is no acknowledgement of any of the data transmitted. Broadcasted information includes a chassis identifier, a port identifier, a time-to-live (TTL) field, and other information about the state and configuration of the device. Information in an LLDP data unit (LLDPDU) is encoded using a type-length-value (TLV) convention (a minimal encoder follows this list):
• Each unit of information in the LLDPDU starts with a type field that tells the receiver what that information block contains.
• The next field, the length field, allows a receiver to determine where the next unit of information begins. By using this field, a receiver can skip over any TLVs that it either doesn't understand or doesn't want to process.
• The third element of the TLV is the value of that information unit.
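A minimal TLV encoder makes the skip-over behavior obvious; the organizationally specific type (127) is standard LLDP, while the OUI and subtype shown follow the IEEE 802.1 convention used by the standardized DCBX TLVs and should be treated as illustrative:

    import struct

    def lldp_tlv(tlv_type, value):
        """LLDP TLV header: 7-bit type and 9-bit length share 16 bits."""
        assert tlv_type < 128 and len(value) < 512
        return struct.pack("!H", (tlv_type << 9) | len(value)) + value

    def org_specific_tlv(oui, subtype, info):
        """Type 127 is the organizationally specific TLV that DCBX rides in."""
        return lldp_tlv(127, oui + bytes([subtype]) + info)

    # A receiver that doesn't understand a TLV reads the length and skips it,
    # which is why DCBX extensions don't break legacy LLDP implementations.
    tlv = org_specific_tlv(bytes.fromhex("0080c2"), 0x09, bytes(25))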
The LLDP standard defines a number of required and optional TLVs. It also allows for a unique TLV type, which permits organizations to define their own additional TLVs as required. By taking advantage of this feature, DCBX can build on LLDP to allow two stations to exchange information about their ability to support PFC and ETS. Stations that do not support PFC and ETS are not negatively impacted by the inclusion of this information in the LLDPDU, and they can just skip over it. The absence of DCBX-specific information from an LLDPDU informs an adjacent station that it is not capable of supporting those protocols. DCBX also enhances the capabilities of LLDP by including additional information that allows the two stations to be better informed about what the other station has learned, keeping the two stations in sync. For example, the addition of sequence numbers in the DCBX TLV allows each of the two stations to know that it has received the latest information from its peer and that its peer has received the latest information from it.
There are currently three different subclasses of information exchanged by DCBX:
• The first subclass is control traffic for the DCBX protocol itself. By using this subtype, state information and updates can be exchanged reliably between two peers.
• The second subtype allows the bandwidth for Traffic Class Groups to be exchanged. The first part of this data unit identifies the PGID for each of the eight message priorities. (For a review of message priorities, Priority Groups, PGIDs, and PG%, see the previous several sections.) The second part of the data unit identifies the bandwidth allocation assigned to each of the PGIDs 0 through 7. Recall that PGID 15 is a special group that always gets priority over the others, independent of any bandwidth allocation. The final part of this subtype allows a station to identify how many traffic classes it supports on the port. You can think of a traffic class as a collection of different types of traffic that are handled collectively. The limit on the number of traffic classes supported by a port may depend on physical characteristics such as the number of message queues available or the capabilities of the communication processors.
Because of the grouping of message priorities into traffic classes, it's not necessary for a communications port to be able to support as many traffic classes as there are priorities. To support PFC and ETS, a communications port needs to handle only three different traffic classes:
• One for PGID 15 high-priority traffic
• One for those classes of traffic that require PFC support, for protocols such as FCoE
• One for traffic that does not require the lossless behavior of PFC, such as TCP/IP
By exchanging the number of traffic classes supported, a station can determine whether the allocation of additional Priority Groups is possible on the peer station.
The third subtype exchanged in DCBX indicates two characteristics of the sender. First, it identifies which of the message priorities should have PFC turned on. For consistency, all of the priorities in a particular Priority Group should either require PFC or not; if priorities requiring PFC are mixed with those that do not, buffer space is wasted and traffic may be delayed. The second piece of information in this subtype indicates how many traffic classes in the sender can support PFC traffic. Because the demands of PFC-enabled traffic classes are greater than those of classes that do not require lossless behavior, the number of traffic classes supporting PFC may be less than the total number of traffic classes supported on a port. By combining the information in this subtype with that in the previous subtype, a station can determine the number of PFC-enabled and non-PFC-enabled traffic classes supported by a peer.
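A hypothetical sketch of how a station might interpret this subtype is shown below. The wire format is not reproduced here; the structure simply models the two pieces of information described above, a per-priority PFC enable bitmap and a count of PFC-capable traffic classes, plus a compatibility check between peers. Priority 3 is shown carrying FCoE, a common convention rather than a requirement.

# Illustrative model of the DCBX PFC subtype; not the wire format.

from dataclasses import dataclass

@dataclass
class PfcConfig:
    enabled: int           # bit i set => PFC on for priority i (0..7)
    pfc_capable_tcs: int   # traffic classes that can carry lossless traffic

def pfc_compatible(local: PfcConfig, peer: PfcConfig) -> bool:
    # Both ends must agree on which priorities are lossless.
    if local.enabled != peer.enabled:
        return False
    # If any priority is lossless, the peer needs a PFC-capable traffic class.
    return peer.pfc_capable_tcs >= (1 if local.enabled else 0)

local = PfcConfig(enabled=0b0000_1000, pfc_capable_tcs=1)   # priority 3 (FCoE)
peer = PfcConfig(enabled=0b0000_1000, pfc_capable_tcs=2)
print(pfc_compatible(local, peer))  # True: the link can run lossless FCoE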
BUILDING THE DCB CLOUD
The FC-BB-5 specification requires that FCoE run over lossless Ethernet. One of the first tasks of an Ethernet environment intended to run FCoE must be to determine which stations can participate in FCoE and which links can be used to connect to those stations. PFC and ETS must be supported on all links carrying this traffic, and the boundaries between lossy and lossless must be established. To accomplish this, switches and devices capable of supporting FCoE broadcast their LLDP messages as they are initialized. These messages include the DCBX extensions identifying the sender as PFC and ETS capable. If a station receives an LLDP message from a peer and that message does not contain DCBX information, then the link will not be used for lossless traffic.
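The following sketch illustrates this boundary-forming decision under a deliberately simple model: a link is eligible for lossless traffic only if the peer's LLDPDU carried DCBX TLVs advertising both PFC and ETS. The data structures are illustrative only.

# Sketch of classifying links for the lossless DCB cloud; illustrative model.

def classify_link(peer_tlvs):
    """peer_tlvs: set of feature names parsed from the peer's DCBX TLVs."""
    if {"PFC", "ETS"} <= peer_tlvs:
        return "lossless"   # eligible to carry FCoE traffic
    return "lossy"          # TCP/IP only; FCoE stays inside the boundary

links = {
    "eth0": {"PFC", "ETS"},   # DCB-capable switch
    "eth1": set(),            # legacy switch: LLDPDU had no DCBX TLVs
}
for port, tlvs in links.items():
    print(port, classify_link(tlvs))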
When a station receives a DCBX-extended LLDP message, it examines the values of the parameters for compatibility. The peer must be capable of supporting the required protocols and must have like configurations for Priority Groups and PGIDs. At first, this may sound like a daunting problem, but most experts agree that just as there are best practices for other protocols, there will be best practices for determining Priority Groups, PGIDs, and PG%s. There will be mechanisms for customizing these values for special situations, but the default configuration values will be those generally agreed upon by the industry.
As devices participating in the LLDP process establish which links will be used for lossless Ethernet traffic, a natural boundary will form. Within this boundary, FCoE traffic will be allowed to move between stations and switches. TCP/IP traffic will be allowed to travel within, across, and beyond this boundary. But to minimize the impact of TCP/IP on storage paths, a best practice will be to direct all IP traffic out of the cloud as quickly as possible toward nodes not within the lossless boundary.
Figure 10. Exchanging attributes and capabilities using DCBX. (The figure shows the link bring-up sequence on the local and remote nodes: driver initialization, auto-negotiation and speed negotiation at the MAC layer, the link declared UP, and finally the DCB parameter exchange between the upper-layer drivers.)
CONGESTION NOTIFICATION
802.1Qau: Congestion Notification (QCN)
An end-to-end congestion management mechanism enables throttling of traffic at the end stations in the network in the event of traffic congestion. When a device is congested, it sends a congestion notification message to the end station, telling it to reduce its transmission rate. End stations discover when congestion eases so that they may resume transmitting at higher rates.
CNM (Congestion Notification Message): generated and sent to the ingress end station when a bridge experiences congestion. RL (Rate Limiter): in response to a CNM, the ingress node rate-limits the flows that caused the congestion.
Figure 11. Achieving lossless transport with PFC and CN.
It is important to note that QCN is a separate protocol, independent of ETS and PFC. While ETS and PFC are dependent on each other, they do not depend on or require QCN in order to function or to be implemented in systems.
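The feedback loop can be sketched abstractly as follows. The model is hypothetical, with made-up rate adjustment constants: a CNM causes the ingress rate limiter to cut the offending flow's rate, and a quiet period lets it recover.

# Abstract sketch of the QCN feedback loop; constants are illustrative.

class RateLimiter:
    def __init__(self, line_rate_gbps):
        self.rate = line_rate_gbps

    def on_cnm(self, feedback):
        # CNM feedback scales the reduction; larger feedback = deeper cut.
        self.rate *= max(0.1, 1.0 - feedback)

    def on_quiet_period(self):
        # No CNMs for a while: congestion has eased, recover gradually.
        self.rate = min(self.rate * 1.5, 10.0)

rl = RateLimiter(10.0)
rl.on_cnm(feedback=0.5)     # a bridge reports congestion
print(rl.rate)              # 5.0 Gbps
rl.on_quiet_period()        # congestion eases, transmission ramps back up
print(rl.rate)              # 7.5 Gbps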
CHAPTER 4: TRILL—ADDING MULTI-PATHING TO LAYER 2 NETWORKS
The ever-increasing adoption of virtual environments in the data center necessitates a more resilient Layer 2 network. An efficient and reliable L2 infrastructure is needed to support the I/O demands of virtual applications, especially when applications are migrated across servers or even across data centers. Today's STP-based networks limit the available network bandwidth and cannot reliably sustain complex network architectures. The IETF is developing a new shortest-path frame routing protocol for multi-hop L2 environments. The new protocol is called Transparent Interconnection of Lots of Links, or TRILL, and it is expected to be completed in the second half of 2010. TRILL will enable multi-pathing for L2 networks and remove the restrictions that STP (single-path) networks place on data center environments. Data centers with converged networks will also benefit from the multi-hop capabilities of TRILL Routing Bridges (RBridges), as they also enable multi-hop FCoE solutions.
WHY DO WE NEED IT?
Resiliency in a network is always good. In the event of a failure in any one component, it's important to have sufficient resources to allow the network to continue functioning around the problem areas. This is especially important in converged networks with storage and IP resources, where the interruption of service may result in the disastrous failure of one or more server platforms. And to reap the greatest advantage from a network, an administrator would like to be able to use all of the available capacity of the network for moving traffic. If there are multiple paths of lowest equal cost through a network from Point A to Point B, in a best-case scenario all of those paths would be used. Unfortunately, the evolution of Ethernet has introduced restrictions that do not always allow for either the maximum resiliency or the highest capacity in a network.
Because Ethernet was originally a flat topology across a single segment of shared media (see Figure 12), you didn't need to be concerned about multiple paths through the network. Each node was logically connected directly to each of its peers, with no intermediary devices along the way. That meant that the Ethernet protocol could ignore cases in which multiple paths from a source to a destination were available. As a result, counters and header fields for management metrics such as hop count and time-out values were unnecessary and were not included in the standard Ethernet frame.
Figure 12. Basic Ethernet topology.
As Ethernet deployments became more common, new devices were introduced into the infrastructure to enable larger networks. Analog repeaters and digital bridges began to appear, and as they did, new complexities began to surface. For example, with these devices, you could design a network in which there was more than one physical path from any one source to a destination (see Figure 13).
Figure 13. More complexity in the Ethernet topology.
The problem was that a network device receiving a frame on a port didn't know if it had seen that frame before. This introduced the possibility of a single frame circulating throughout the network indefinitely (see Figure 14). Left unchecked, a network would soon be saturated with frames that couldn't be removed because they couldn't be identified or limited.
Figure 14. Circulating indefinitely…
To address this problem, a logical topological restriction in the form of a spanning tree was placed on Ethernet networks. The Spanning Tree Protocol (STP) means that although there may be many physical paths through the network at any given time, all traffic flows along paths defined by a spanning tree that includes all network devices and nodes (see Figure 15). By restricting traffic to this tree, loops in the logical topology are prevented, at the expense of blocking alternative network paths.
Figure 15. Need to detect and eliminate loops.
While STP solves the problem of traffic loops, it prevents network capacity from being fully used. Algorithms that calculate this spanning tree may take a lot of time to converge. During that time, the regular flow of traffic must be halted to prevent the type of network saturation described above. Even if multiple simultaneous spanning trees are used for separate VLANs to better distribute the traffic, traffic in any one VLAN will still suffer from the same disadvantage of not being able to use all of the available capacity in the network.
INTRODUCING TRILL
To eliminate the restriction of a single path through the network, the IETF formed a working group to study this problem. The official documentation states the goal of the group this way: "The TRILL WG will design a solution for shortest-path frame routing in multi-hop IEEE 802.1-compliant Ethernet networks with arbitrary topologies, using an existing link-state routing protocol technology." In simpler terms, the group was charged with developing a solution that:
• Uses shortest-path routing
• Works at Layer 2
• Supports multi-hop environments
• Works with an arbitrary topology
• Uses an existing link-state routing protocol
• Remains compatible with IEEE 802.1 Ethernet networks that use STP
The result was a protocol called TRILL. Although routing is ordinarily done at Layer 3 of the ISO protocol stack, making Layer 2 a routing layer lets protocols other than IP, such as FCoE, take advantage of this increased functionality. Multi-hopping allows multiple paths to be specified through the network. By working in an arbitrary topology, links that otherwise would have been blocked become usable for traffic. Finally, if the network can use an existing link-state protocol, solution providers can use protocols that have
already been developed, hardened, and optimized. This reduces the amount of work that must be done to deploy TRILL.
Figure 16. TRILL provides L2 multi-pathing (from the core through the aggregation and access layers down to the servers).
Just as important is what TRILL doesn't do. Although TRILL can serve as an alternative to STP, it doesn't require that STP be removed from an Ethernet infrastructure. Most networking administrators can't just "rip and replace" their current deployments simply for the sake of implementing TRILL, so hybrid solutions that use both STP and TRILL are not only possible but will most likely be the norm, at least in the near term. TRILL will also not automatically eliminate the risk of a single point of failure, especially in a hybrid architecture. The goals of TRILL are restricted to those explicitly listed above. Some unrealistic expectations and misrepresentations have been attached to this technology, so it's important to keep in mind the relatively narrow range of problems that TRILL can solve. Simply put, TRILL enables only two things:
• Multi-pathing for L2 networks
• Multi-hopping that can benefit FCoE
Anything else that you get with a TRILL deployment is a bonus!
THE TRILL PROTOCOL
Like other protocols, TRILL solutions have three components: a data plane, a control plane, and devices that implement the protocol.
TRILL Encapsulation
TRILL encapsulation turns Ethernet frames into TRILL frames by adding a TRILL header to the frame. The new TRILL header (see Figure 17) is framed in exactly the same way as a legacy Ethernet header, which allows bridges (switches) that are not aware of TRILL to continue forwarding frames according to the rules they've always used. The source address used in the outer header is the address of the RBridge adding the header, and the destination address is determined by consulting tables built by the link-state routing protocol. A new EtherType is assigned to TRILL. Note also the HC (hop count) field, a 6-bit field that allows for 64 hops. The HC field is used to prevent the formation of loops on the VLAN or the premature discarding of frames.
The 64-bit TRILL header consists of:
• EtherType = TRILL (2 octets, value to be assigned)
• V = Version (2 bits), R = Reserved (2 bits), M = Multi-destination (1 bit), OL = Options Length of TRILL options (5 bits), and HC = Hop Count (6 bits)
• Egress RBridge Nickname (2 octets) and Ingress RBridge Nickname (2 octets); nicknames are auto-configured 16-bit local names for RBridges
The TRILL header is preceded by an outer MAC header (destination address, source address, and VLAN tag) and followed by the original inner MAC header (inner destination and source addresses, VLAN tag, and Type/Length), the payload, and the CRC.
Figure 17. Anatomy of the 64-bit TRILL header.
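For illustration, the header fields described above can be packed with a few lines of code. This is a sketch based on the draft's bit layout, not production code; the 2-octet TRILL EtherType that precedes these fields is omitted because its value was still to be assigned.

# Sketch of packing the TRILL header fields (V, R, M, OL, HC, nicknames).

import struct

def pack_trill_header(version, multi_dest, opt_len, hop_count,
                      egress_nick, ingress_nick):
    # V (2 bits) | R (2 bits, zero) | M (1 bit) | OL (5 bits) | HC (6 bits)
    first = (version << 14) | (0 << 12) | (multi_dest << 11) \
            | (opt_len << 6) | hop_count
    return struct.pack("!HHH", first, egress_nick, ingress_nick)

hdr = pack_trill_header(version=0, multi_dest=0, opt_len=0,
                        hop_count=63, egress_nick=0x0003, ingress_nick=0x0001)
print(hdr.hex())  # 003f00030001 -> HC=63, egress RB3, ingress RB1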
Link-State Protocols
As noted earlier, TRILL will use link-state protocols to form its control plane. The purpose of the control plane is to distribute the VLAN configuration to all the RBridges on the VLAN. Link-state protocols also continuously monitor the VLAN configuration and adjust the configuration database in the event of changes. The control plane also provides the algorithms used to calculate the shortest path between any two RBridges on the VLAN. Considering that TRILL will be used in converged environments where both storage and TCP/IP networks are deployed, you can expect that link-state protocols from both worlds will be utilized by TRILL.
Routing Bridges
Routing Bridges are a new type of L2 device that implement the TRILL protocol, perform L2 forwarding, and require little or no configuration. Using the configuration information distributed by the link-state protocol, RBridges discover each other and calculate the shortest path to all other RBridges on the VLAN, as sketched below. The combination of all calculated shortest paths makes up the RBridge routing table. It is important to note that all RBridges maintain a copy of the configuration database, which helps reduce convergence time. When they discover each other, RBridges select a designated RBridge (DRB), which in turn assigns a designated VLAN and selects an appointed forwarder (AF) for the VLAN. Although the DRB can select itself as the AF, there can be only a single AF per VLAN. The AF handles native frames on the VLAN.
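The shortest-path calculation itself is ordinary link-state routing. The sketch below runs Dijkstra's algorithm over a hypothetical four-RBridge topology; because every RBridge holds the same configuration database, each computes consistent results.

# Minimal sketch of the per-RBridge shortest-path calculation.

import heapq

def shortest_paths(linkstate_db, source):
    """linkstate_db: {rbridge: {neighbor: link_cost}}. Returns cost to all."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist.get(node, float("inf")):
            continue
        for neighbor, link_cost in linkstate_db[node].items():
            new_cost = cost + link_cost
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return dist

# Every RBridge holds the same database, so each computes the same answer.
db = {"RB1": {"RB2": 1, "RB3": 1}, "RB2": {"RB1": 1, "RB4": 1},
      "RB3": {"RB1": 1, "RB4": 1}, "RB4": {"RB2": 1, "RB3": 1}}
print(shortest_paths(db, "RB1"))  # {'RB1': 0, 'RB2': 1, 'RB3': 1, 'RB4': 2}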
MOVING TRILL DATA
TRILL works by adding a new header to the beginning of an Ethernet frame. This header is added when the frame first encounters an RBridge, whether the host is directly connected to that RBridge or a non-routing Ethernet segment hands the frame off to it. If the frame can remain in a non-routing Ethernet segment without ever touching an RBridge, then no header is added, because none is needed.
Using the original destination address as a key, a list of eligible next-hop RBridges is determined. This list contains the RBridges that could be the next step along all least-cost paths to the final destination. If more than one RBridge is in the list, a hash is used to distribute the traffic load while guaranteeing that all traffic in a single stream stays on the same path, avoiding reordering overhead. The RBridge that results from this selection is placed in the TRILL header, and the frame is sent on.
Figure 18. TRILL adds a new header to the beginning of an Ethernet frame. (The figure traces a frame from host to target: 1. the host sends frames; 2. RB1 adds the TRILL header and outer MAC header; 3. RB3 removes the TRILL header; 4. the target receives the frames.)
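The per-flow hashing mentioned above, which keeps a stream on one path while spreading different streams across equal-cost paths, might look like the following sketch. The hash function and field choices here are illustrative; real implementations hash whatever flow fields the hardware supports.

# Sketch of hash-based next-hop selection among equal-cost RBridges.

import zlib

def pick_next_hop(src_mac, dst_mac, eligible_rbridges):
    """eligible_rbridges: next hops on equal-cost paths to the destination."""
    flow_key = (src_mac + dst_mac).encode()
    index = zlib.crc32(flow_key) % len(eligible_rbridges)
    return sorted(eligible_rbridges)[index]

# Two equal-cost next hops toward the target: RB2 and RB3.
print(pick_next_hop("MAC-H", "MAC-T", ["RB2", "RB3"]))
# The same flow always picks the same RBridge, so its frames never reorder;
# a different flow may hash to the other next hop, spreading the load.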
When a frame with a TRILL header is received by an RBridge, the RBridge removes the header and examines the original source and destination addresses. It then creates a new TRILL header using the method described above and forwards the frame. The last RBridge receiving a frame, prior to the delivery of the frame to either the destination or the local segment that connects to the destination, removes the TRILL header and forwards the frame (see Figure 19).
Figure 19. TRILL header added and then removed at the last RBridge.
SUMMARY
TRILL is a new draft standard being created by the IETF. The goal of TRILL is to create an L2 shortest-path routing protocol to replace STP and enable L2 multi-pathing. The resulting, more resilient L2 network will fulfill the needs of virtualized applications and data migration. It will also enable multi-hop capabilities for FCoE, which will drive the expanded adoption of the new technology in converged network environments.
CHAPTER 5: DELIVERING DCB/FCOE
It is inevitable that most new technologies bring with them the need for new software and hardware implementations to serve as delivery vehicles for the new protocols. DCB and FCoE are no exception. Delivering server I/O consolidation based on DCB/FCoE requires a new class of server adapters called Converged Network Adapters (CNAs). It also requires a new class of network switches to handle FCoE and support DCB features.
Converged Network Adapters (CNAs)
In data centers today, servers use two separate I/O adapters for storage and networking activities. Host Bus Adapters (HBAs) handle storage-related chores, and Network Interface Controllers (NICs) handle networking chores. NICs and HBAs are viewed separately by the server operating system, and each uses a separate set of drivers and its own type of cables and connectors. But managing two different I/O domains is neither economical nor practical. Converging storage and network traffic onto a shared lossless Ethernet link makes sense, but it requires the use of new CNAs that enable different types of I/O traffic to share a common link. As shown in Figures 20 and 21, CNAs are PCI adapters that physically look like 10 GbE adapters but are really very different. CNAs provide users with 10 GbE ports for connectivity to switches. The physical appearance of CNAs hides an internal architecture that consists of three building blocks:
• First, the FC module handles FC storage traffic and gives the CNA its HBA-like personality.
• Then, the NIC, or 10 GbE module, handles networking tasks and gives the CNA its NIC-like personality.
• Finally, an FCoE module handles all matters related to FCoE, such as encapsulating FC frames into Ethernet (FCoE) frames. The FCoE module is an internal entity that connects the FC and NIC modules.
Because the FCoE entity is internal to the CNA and has no contact with the PCI bus, FCoE is not exposed externally. As a result, the server operating system is not aware of FCoE, but views CNAs as adapters with
two identities or personalities. The server operating system sees the FC and NIC drivers and handles CNAs as if each contained a NIC and an HBA. In other words, the operating system's view of the I/O world does not change with the use of CNAs. This is critical to ensuring a non-disruptive introduction of FCoE into existing data center environments. It means that IT professionals can continue to deploy applications over FCoE without modifying them, and they can continue to use management tools, which are not affected by the introduction of CNAs. Simply put, CNAs connect servers to FCoE switches. CNAs are responsible for encapsulating FC traffic into FCoE frames and forwarding them to FCoE switches over 10 GbE links as part of the converged traffic.
Figure 20. High-level block diagram of Brocade 1020 CNAs (FC, FCoE, and 10 GbE NIC modules behind a PCIe interface).
Figure 21. CNAs present two drivers to server operating systems.
DCB/FCoE Switches
New DCB/FCoE switches are designed to introduce converged I/O into data centers seamlessly, without disruption to existing environments. From a physical standpoint, these new switches are different from traditional Ethernet or FC switches: they provide 10 GbE ports for server connectivity in addition to Fibre Channel ports for storage connectivity. Similar to CNAs, DCB/FCoE switches (see Figure 22) contain three major functional blocks dedicated to servicing FC, DCB, and FCoE. The FC module is in fact a full traditional FC switch with familiar FC services, and it should be capable of connecting directly to FC SANs. The capabilities of the switch's FC module are essential for ensuring seamless integration of FCoE into existing environments. Note that DCB/FCoE switches are Link Layer (L2) switches at heart and can be deployed as Ethernet switches. On top of the L2 layer are the Ethernet enhancements called DCB, and the switches also deliver FCoE and FC protocol support. In short, DCB/FCoE switches can be considered multiprotocol L2 switches.
The switch's 10 GbE ports are used for server connectivity and receive incoming converged LAN and SAN traffic destined for either the corporate LAN or the shared SAN storage. When frames arrive at the FCoE switch, the EtherType field is inspected by the onboard FCoE entity. When FCoE frames carrying FC storage data are detected, the FCoE entity de-encapsulates the FC frames by stripping off the encapsulation information and then forwards the FC frames to their destination over FC ports. LAN traffic is forwarded over the Ethernet ports to its intended destination.
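The inspection-and-split behavior can be sketched as follows. The FCoE (0x8906) and FIP (0x8914) EtherType values are the standard assignments, but the frame model and forwarding helpers are illustrative only.

# Sketch of EtherType-based traffic splitting in a DCB/FCoE switch.

ETYPE_FCOE = 0x8906   # encapsulated FC frames
ETYPE_FIP = 0x8914    # FCoE Initialization Protocol (control traffic)

def send_to_fc_port(fc_frame): print("FC ->", fc_frame)
def send_to_ethernet_port(frame): print("LAN ->", frame["payload"])
def handle_fip(frame): print("FIP control frame")

def forward(frame):
    if frame["ethertype"] == ETYPE_FCOE:
        fc_frame = frame["payload"]     # strip the Ethernet encapsulation
        send_to_fc_port(fc_frame)       # onward to the FC SAN
    elif frame["ethertype"] == ETYPE_FIP:
        handle_fip(frame)               # login and discovery traffic
    else:
        send_to_ethernet_port(frame)    # ordinary LAN traffic

forward({"ethertype": 0x8906, "payload": "FC frame to SAN"})
forward({"ethertype": 0x0800, "payload": "IPv4 packet to LAN"})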
The current generation of DCB/FCoE switches are first-hop devices. This means that they can't route FCoE traffic to other switches. Current devices receive a stream of converged traffic, inspect it, and then divide the data into two separate streams, one for FC storage traffic and the other for LAN traffic. When the data leaves the switch, it is either in the form of FC frames or Ethernet frames.
Figure 22. High-level block diagram of the Brocade 8000 FCoE switch.
CHAPTER 6: FCoE IN THE DATA CENTER
The benefits of FCoE and DCB are most visible to data center professionals: they can now consolidate server ports and the related cabling and switch ports. Server I/O consolidation relies on the new DCB/FCoE technologies to connect servers to corporate LANs and shared Fibre Channel SAN storage using FCoE switches. You can see the advantages of FCoE and DCB deployment in data center operations and management:
• Cost of acquisition (CapEx). Data center managers will see reduced up-front capital outlays with FCoE and DCB, since they will need fewer server adapters (CNAs), fewer cables, and fewer switch ports (switches) compared to the technologies used today. And in the future, when 10 GbE costs go down, these cost savings will be even greater.
• Operating costs (OpEx). Data center managers see ever-higher utility bills, and power consumption and cooling are among their highest operating expenditures. FCoE and DCB offer relief in this area: having fewer hardware components installed in each server results in lower power consumption and cooling needs, and the same applies on the switch side. Other operating costs that will start to decrease with FCoE and DCB are maintenance, SKU management costs, and other asset management expenses.
• Other cost savings. Data centers will realize significant savings when FCoE is deployed, since they will be able to continue using existing Fibre Channel management tools and not incur retraining costs. Less often cited are the time savings realized from dealing with simpler configurations and a much less cluttered environment, a result of reduced cabling and cable management. Troubleshooting and diagnostics can be performed more easily in environments in which technicians can identify and correct problems more quickly. The reality is that simpler cabling helps reduce the potential for human error.
As an encapsulation protocol, FCoE performs its functions with some overhead above that of the native FC protocol it encapsulates. In addition, FCoE represents the second attempt (after iSCSI) to converge storage data and LAN traffic over shared Ethernet links. The reality is that data centers with a genuine need for high-performing 8 Gbps FC will question the benefits of sharing a 10 GbE link with LAN traffic. For that reason, it is expected that FCoE will most likely be deployed in environments currently using 4 Gbps FC and 1 GbE links. Like most new technologies, FCoE is one that enterprises will first test and deploy in the parts of their networks in which some risk can be tolerated before they expand the deployment to other areas. It is expected that in the near term, FCoE will find a home in new server deployments in Windows and Linux environments with virtualized tier 3 and some tier 2 applications.
Where Will FCoE Be Deployed?
When you look at data center traffic, you can see tier 3 servers providing Web access and generating traffic primarily made up of TCP/IP data. CNAs can easily service this class of servers and related traffic. Tier 2 application servers with business logic tend to host applications of greater value to enterprises. These servers are normally connected to both LANs and SANs, since their traffic is divided between storage and TCP/IP. Some tier 2 applications are good candidates for convergence and would realize great benefits from server I/O consolidation using CNAs. On the other hand, tier 1 database servers host database applications that support enterprises' mission-critical business functions. It makes sense for businesses to deploy mature technologies for tier 1 servers and applications. Also, tier 1 applications have a genuine need for performance and processing power that makes them suitable for higher-performance and more reliable I/O technologies. It is unlikely that FCoE will find a home in tier 1 environments.
Figure 23. Model tiered data center and sample applications. (The figure maps sample applications such as billing systems, inventory systems, research, e-mail, and test and development across tier 1 database servers, tier 2 business logic servers, and tier 3 Web access servers, indicating where FCoE fits.)
Top of Rack
The Brocade 8000 Switch is a DCB/FCoE switch that delivers server I/O consolidation to data centers. As noted earlier, such devices are basically L2 switches, so the first deployment model for the Brocade 8000 is a top-of-rack (ToR) deployment functioning as an Ethernet switch. In this configuration, the Brocade 8000 performs as a standard Ethernet switch, providing server connectivity and delivering 10 GbE performance, as shown in Figure 24.
Figure 24. Brocade 8000 top-of-rack Ethernet deployment.
The most likely deployment scenario for the Brocade 8000 is as a top-of-rack switch in server I/O consolidation environments. In this configuration, the Brocade 8000 offers 10 GbE server connectivity and 8 Gbps Fibre Channel connectivity to shared SAN storage. Using the configuration shown in Figure 25, data centers can simplify server I/O environments with fewer cables and ports, and IT managers can take advantage of the benefits of convergence provided by the Brocade 8000.
Figure 25. Top-of-rack Brocade 8000 in a server I/O environment.
End of Row
The Brocade FCOE10-24 Blade is a DCB/FCoE blade for the Brocade DCX or DCX-4S Backbone. It brings DCB/FCoE capabilities to the backbone platforms and enables end-of-row (EoR) convergence, shown in Figure 26. It uses a built-in FCoE hardware engine to deliver Fibre Channel data to SANs using external FC ports available on other blades in the Brocade DCX. With 24 x 10 GbE ports, the FCOE10-24 also enables high-performance server connectivity.
Figure 26. Brocade FCOE10-24 Blade in an end-of-row DCB/FCoE deployment.