Friday, 06 November 2009 19:21

End to End Service Restoration

By Kireeti Kompella and Hannes Gredler


Executive Summary

Packet based transport solutions have arrived at the edge of the network. Historically, this part of the network has been built using SONET/SDH or ATM technology. The economics of packet based networks will make both SONET/SDH and ATM obsolete in the network build outs to come.

Whenever you introduce a new technology into a network, you must retain the benefits of the technology it is replacing. Two capabilities of legacy transport networks that packet based technologies have lacked are the ability to detect failures quickly and the ability to restore service within 50ms. While path oriented solutions like SONET/SDH provide path restoration on the order of 50ms, they fail to restore end-to-end services like pseudowires, VPLS and VPNs within that time.

This paper discusses general requirements for resilient system design and fast service restoration in an IP/MPLS network. Fast fault detection is a crucial component of fast restoration; there are several solutions to this problem, depending on the link type (e.g., for Ethernet one could use IEEE 802.1ag) and the network layer at which detection is desired (e.g., at the IP layer, one could use bidirectional forwarding detection (BFD)). This paper does not discuss fast detection, focusing instead on the restoration problem once a failure is detected. Many of the techniques for fast restoration (for example, loop free alternates within an IGP) have already been implemented. The others are well within the reach of IP/MPLS implementations, assuming certain characteristics of the underlying platform (such as indirection in the routing platform’s forwarding information base). The paper also introduces a novel MPLS-based local repair concept that uses a context-based MPLS forwarding information base to restore end-user services within 50ms after any link or node failure.

 

Network Restoration Requirements

A packet-based multiservice network must be able to transport real-time traffic. The building blocks to achieve this – by greatly improving restoration capabilities – are at hand. In this section, the key requirements for network restoration will be highlighted.

 

Service Restoration

An MPLS network serves today both as transport and as service. In Figure 1, the dependencies of pseudowires (a popular MPLS service) are illustrated.

Figure 1: The pseudowire (service) is dependent on the LSP (transport)

 

Service providers have to take measures to protect both their own infrastructure (P routers, for instance, and links connecting PE and P routers) and the pseudowire service between customer endpoints. Both infrastructure and services need to be protected, because a service provider cannot make any assumptions about the nature of the carried traffic in the pseudowire segment. The pseudowire serves as a transport pipeline in the network of the customer that subscribed to the service between point 1 and point 2.

 

Local Repair and Pre-provisioning 

Implementing 50ms restoration time on a packet processing device is a daunting task. Consider Figure 2. This illustrates the propagation of a network event (e.g., a link-down signal) within a modern packet processing system.

Figure 2: A Typical Propagation of a Network Event in the Routing Engine and the Line Card (varies by vendor and system)

 

Most systems are highly distributed in nature, and propagating a network event like a downed interface, plus IGP and BGP recalculation, may add up to hundreds of milliseconds. Traditionally, the routing software in the control plane makes all route recalculations and initiates all forwarding state changes in the system. To meet the 50ms boundary, two prerequisites must be established:

  • Pre-populating the FIB with a viable backup path

  • Establishing a shortcut path (local repair) for FIB updates enabling backup path switching

Reconsidering Figure 2, this means that after the link driver reports the link-down message, the ukernel immediately starts to activate the backup paths in the ukernel FIB. This is the local repair path. In parallel, the link-down signal is processed by all the software components in the system, and ultimately the routing software will recalculate the entire network topology based on the new network state. However, because the shortcut path is used for FIB updates, traffic is restored within the 50ms boundary.

The term local repair for the shortcut FIB update path will be used throughout this document.

 

Enabling Constant Local Repair Time: Indirection

Figure 2 actually provides an idealistic view of the route update timing in the system. The underlying assumption is that not too many routes depend on the link that has gone down. A linear dependency between the number of routes and the restoration time in the control path must be avoided.

To guarantee 50ms restoration times, the FIB data structure in the ukernel and the forwarding ASICs needs to provide one or more levels of indirection between interfaces and their dependent routes. Figure 3 illustrates this indirection.

Figure 3: Indirection in the FIB Data Structure

 

Rather than changing each route-to-interface pointer individually, all routes point to an indirection node within the data structure. Once an interface goes down, only one pointer (the one between the indirection node and the interface) needs to be updated. Indirection in the data path provides the constant time backup path activation on which local repair relies.
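The indirection idea can be sketched in a few lines of Python. This is a minimal illustration, not a description of any vendor's FIB: the class name, the interface names and the prefixes are all hypothetical. Every route refers to one shared indirection node, so activating the backup is a single update no matter how many routes depend on the failed interface.

```python
class IndirectNextHop:
    """Indirection node shared by all routes that use the same interface."""

    def __init__(self, primary, backup):
        self.primary = primary
        self.backup = backup      # pre-populated backup path
        self.active = primary

    def fail_over(self):
        # Local repair: one pointer update, independent of the route count.
        self.active = self.backup


# Hypothetical interface names and prefixes, for illustration only.
shared = IndirectNextHop(primary="if-primary", backup="if-backup")
fib = {f"10.0.{i}.0/24": shared for i in range(200)}

# The interface goes down: a single update repairs all 200 routes.
shared.fail_over()
```

Without the indirection node, the same repair would require visiting each of the 200 route entries, which is the linear dependency the text warns against.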

 

Choosing Node Protection over Link Protection

Service providers have diverse requirements for protection. For a large percentage of service providers, a link protection solution is considered sufficient, whereas the protection of entire network nodes would be considered desirable.

On the other hand, node protection seems to be more expensive in terms of computation and synchronizing label databases. In judging whether link or node protection is the optimal approach, one needs to first look at the micro properties of these mechanisms.

Link protection provides a backup path to the same neighbor - given that at least one further path reaching that very same neighbor exists. There is a hidden assumption in here: The Point of Local Repair (PLR) blindly hopes that the protected node does not become fully disconnected.

Node protection tries to completely bypass a neighboring router. There is no assumption here: a single link failure is treated as if the entire network node were lost. Node protection is thus more universal than link protection: it does not make assumptions as to the reachability of a node in a backup case. In a later section you will see that this property is highly beneficial in protecting label switched path (LSP) endpoints.

 

End to End Service Protection: Building Blocks 

In the previous section, all the externally visible requirements of a resiliency solution have been highlighted. Before describing the individual building blocks of end-to-end service restoration, a common network model and taxonomy are defined.

 

Scaling the Network

Introducing service restoration capabilities in flat, small networks is a relatively simple thing. However, it becomes a large problem once it has to be done at a certain scale. Doing restoration in a 50-node continental network is a much easier problem to solve than doing it at a scale of 50,000 routers with a global span.

To seriously tackle rapid service restoration, one needs to assume that the network consists of multiple OSPF areas or IS-IS levels, and perhaps multiple autonomous systems (AS). In this paper, we refer to such a network as a multi-region network, where the term region may be used interchangeably for an OSPF area or an IS-IS Level within an autonomous system, or it may mean an AS. 

Partitioning the large network into multiple smaller regions is the scaling tool of choice here.

The art of scaling is to make the partitioned regions independent of each other, so that the incremental addition of regions does not negatively impact the overall service restoration capabilities and scaling properties of the resulting network.

In the next section, a generalized model for inter-region networking is introduced.

 

A Generalized Model for Multi-region Networking

Figure 4 illustrates a generalized model for partitioning a network into smaller regions. The large network has been split up into three regions: two edge regions for terminating services and a core region for providing transport between the edge regions. In this example each region is an autonomous system, and the regions are connected together to form a multi-region network. The connectivity in this example is provided by Autonomous System Border Routers (ASBRs).

Figure 4: A Generalized Model for Partitioning a Network into Smaller Regions

 

Each multi-region network features four planes of routing information. The lowest plane contains the Local IGP and the Local transport LSP (provided by protocols like OSPF, IS-IS and LDP, RSVP). We refer to this plane as the Intra-region plane. The intra-region plane stays independent from all routing and topology information of the other regions. No attempt is made to leak any of the local intra-region information from one region to the other. That decision has an interesting property: The local IGPs and Local Transport LSPs can perform a local repair in the knowledge that the repair will not affect inter-region traffic. This is a key requirement for end-to-end service restoration.

The “glue” that provides integration of the intra-region plane with the rest of the system is the Inter-region Transport plane. The purpose of this plane is to provide MPLS connectivity between service endpoints (PE routers) when these endpoints are in different regions. This plane is independent from the Intra-region plane in the sense that only service endpoints that require connectivity to other service endpoints, plus inter-region border routers, need to take part in this plane.

The top level plane is the Service plane. This plane conveys service-related routing information. All the service-terminating routers need to take part in this plane. If the inter-AS model is Option B or Option C, the ASBR routers may take part in the service plane as well.

In order to reduce forwarding state for inter-region Transport LSPs, an optional plane called the Aggregation Transport plane is introduced. The sole purpose of this plane is to reduce forwarding state of the region border routers. In a large network consisting of up to 100,000 PE routers, the use of the Aggregation Transport plane may enable forwarding state reduction by three orders of magnitude for region border routers such as ASBRs and ABRs.

Notice that in Figure 4 there are no protocol names tied to the respective planes. This model is generic enough to support any label distribution protocol. This can be inter-AS RSVP, labeled BGP, or hierarchical LDP. Thus, providers can incrementally transition an existing network to this model. Unless noted otherwise, the examples in this paper assume that labeled BGP is used as the label distribution protocol of choice to connect the various regions together.

Figure 4 illustrates a multi-region network, where each region is an autonomous system, and the regions are connected by ASBRs. Note that as per our definition of a region earlier, a region may also be a given level or area in a single AS. Figure 5 illustrates the single-AS multi-region network, where each region is an IS-IS level or an OSPF area.

Figure 5: A Single AS Multi-region Network

 

The most obvious difference between Figure 5 and Figure 4 is that in Figure 5, the two ASBRs present in Figure 4 get folded into a single ABR present in Figure 5. What remains is the independence of the four MPLS planes.

The biggest change to existing popular deployment practice is to disable the leaking of IGP prefixes and LDP FECs between the regions. In fact, the label distribution protocol for the local transport LSP is divided into two independent instances - a transit-facing instance and an edge-facing instance.

For the next sections, Figure 4 serves as a reference model. On top of that reference model, we will describe protection for inter-region LSPs with various individual protocol stack combinations.

However, before describing an inter-region protection mechanism, we need to introduce a local intra-region protection method. The next section introduces such an intra-region protection method called “Loop Free Alternates” (LFA). Later we will illustrate the use of LFA for the protection of inter-region LSPs.

 

Intra-Region FRR: Loop Free Alternates

Common IGPs (like OSPF and IS-IS) only calculate the best path, or alternatively a set of equal cost paths, between a given source/destination router pair. Most implementations also support equal cost multi-path (ECMP) routes. There is no reason why an ECMP route could not be used as a mechanism for connectivity restoration when one of the paths becomes unavailable. Expanding on that model, it turns out that even less than equal cost routes (LECMP) can be used for this purpose. As long as the LECMP route does not cause a forwarding loop, it can be used. In order to avoid forwarding loops, a router needs to execute additional calculations to verify that an LECMP route does not generate a forwarding loop, and may therefore be elected as a viable backup route. An LECMP route that does not cause forwarding loops is called a Loop Free Alternate.
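The additional calculation is not spelled out above. A common formulation, the basic loop-free condition from RFC 5286, accepts a neighbor N of source S as an alternate toward destination D when dist(N, D) < dist(N, S) + dist(S, D), which means N's own shortest path to D does not lead back through S. The sketch below applies that condition to a small topology; the node names and link costs are hypothetical and chosen only for illustration.

```python
import heapq


def shortest_distances(graph, source):
    """Plain Dijkstra: returns {node: cost of the shortest path from source}."""
    dist = {source: 0}
    queue = [(0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist[node]:
            continue  # stale queue entry
        for neighbor, cost in graph[node].items():
            candidate = d + cost
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return dist


def loop_free_alternates(graph, source, dest):
    """Neighbors of source that are loop-free and not already a primary next hop."""
    dist_s = shortest_distances(graph, source)
    alternates = []
    for neighbor, link_cost in graph[source].items():
        dist_n = shortest_distances(graph, neighbor)
        is_primary = link_cost + dist_n[dest] == dist_s[dest]
        loop_free = dist_n[dest] < dist_n[source] + dist_s[dest]
        if loop_free and not is_primary:
            alternates.append(neighbor)
    return sorted(alternates)


# Hypothetical topology with symmetric link costs.
graph = {
    "S": {"A": 1, "B": 2, "C": 1},
    "A": {"S": 1, "D": 1},
    "B": {"S": 2, "D": 1},
    "C": {"S": 1, "D": 10},
    "D": {"A": 1, "B": 1, "C": 10},
}

alternates = loop_free_alternates(graph, "S", "D")
```

In this topology A is the primary next hop, B qualifies as a loop free alternate, and C does not, because C's shortest path to D goes back through S.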

 

LFA Coverage Gap

Depending on the actual network topology, a certain number of loop-free LECMP routes for a given source/destination router pair may exist. However, there may also be coverage gaps where no loop-free alternate route exists. If a service is required to meet a particular SLA, the network can be evaluated for LFA coverage gaps before implementation, and adjustments to the topology can be made to ensure the service meets the given performance requirements. If appropriate adjustments cannot be made (for example, due to insufficient capacity), services with stringent availability requirements may not be admitted.

There are two techniques to fix a topology to provide 100% LFA coverage:

  • Add additional links to the topology
  • Add additional label switched paths to the topology

Adding physical interfaces comes at a certain cost; however, this may be desired if the backup case might drive certain parts of the network into congestion. Adding an LSP to the topology comes almost for free, given that there is enough spare capacity for the backup case.

Also, adding LSPs produces instantly better coverage because an existing TE shortcut computation can forward traffic down these extra LSPs. An SPF implementation using TE LSPs for coverage extension should be configurable to use TE LSPs for primary and backup next hops.

Special care needs to be taken that the local forwarding next hop of the TE LSP does not share fate with a primary next hop for a route. Otherwise, the use of the TE LSP as a backup will itself fail.
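A minimal sketch of that fate-sharing check follows. It assumes a next hop can be represented as an (interface, neighbor) pair and compares only that pair; the LSP names, interface names and router names are hypothetical. A real implementation would likely consider more than the first hop.

```python
def viable_backup_lsps(primary_next_hop, candidate_lsps):
    """Keep only TE LSPs whose local forwarding next hop differs from the primary."""
    return [
        name
        for name, first_hop in candidate_lsps.items()
        if first_hop != primary_next_hop
    ]


# Hypothetical next hops, written as (outgoing interface, neighbor).
primary = ("if-1", "R4")
lsps = {
    "lsp-to-R2": ("if-1", "R4"),  # leaves over the primary next hop: shares fate
    "lsp-to-R3": ("if-2", "R5"),  # leaves over a different next hop
}

usable = viable_backup_lsps(primary, lsps)
```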

Adding TE LSPs on demand is a reactive measure. In addition to configuring the LSP, one needs to make sure that LDP routes are learned and resolved over the LSP. This requires configuring a targeted LDP session, which is again a reactive configuration measure at both ends of the LSP. In the next section, the problem of lack of LFA coverage will be illustrated. Furthermore we will introduce a simple extension for auto provisioning LSPs in order to get full LFA coverage.

 

Automated LFA Coverage Surge

Figure 6 shows a basic network that runs LDP as a local label transport protocol. This is a classical example of a network that does not have an LFA for the {PLR, R1} source destination pair. The shortest path between the PLR and R1 routers is via R4 at a cost of 9. Router R5 cannot be the LFA for R1 because it will bounce traffic back to the PLR. How could an automated shortcut setup plus automated LDP learning work?

Figure 6: Need to Find a Loop Free Alternate

 

First, the PLR needs to compute the list of next-next-hops (NNH) for a node failure of R4. The NNH list can be extracted from the link-state database, and the result is {R2, R3, R5}. R5 is eliminated from the list because it is a direct neighbor.

Next, two RSVP LSPs to R2 and R3 are signaled. Because R2 and R3 are not direct neighbors, the label-to-FEC mappings must be discovered. This is done by signaling a targeted LDP session to R2 and R3. Of course, R2 and R3 need to accept those targeted sessions. This “anonymous connect” capability is signaled using an IGP capability code. For security purposes, “anonymous connect” targeted LDP sessions should be restricted to specific source IP loopbacks, as advertised in the IGP.

Finally, the backup routes are programmed into the FIB. All LDP routes that go over R4 get an associated backup next hop that contains the corresponding RSVP LSP plus the LDP label that was learned from the LSP end.
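The first of these steps, computing the NNH list and dropping direct neighbors, can be sketched as a set operation on the link-state database. The adjacency below is an assumption: Figure 6 is not reproduced here, so the links are chosen only to be consistent with the NNH list given in the text.

```python
# Hypothetical adjacency, consistent with the NNH list described in the text.
adjacency = {
    "PLR": {"R4", "R5"},
    "R4": {"PLR", "R2", "R3", "R5"},
    "R5": {"PLR", "R4"},
    "R2": {"R4", "R1"},
    "R3": {"R4", "R1"},
    "R1": {"R2", "R3"},
}


def next_next_hops(adjacency, plr, protected_node):
    """Neighbors of the protected node, as seen from the PLR."""
    return adjacency[protected_node] - {plr}


def bypass_lsp_endpoints(adjacency, plr, protected_node):
    """NNHs that still need a signaled LSP: direct neighbors of the PLR are dropped."""
    return sorted(next_next_hops(adjacency, plr, protected_node) - adjacency[plr])


nnh = next_next_hops(adjacency, "PLR", "R4")
endpoints = bypass_lsp_endpoints(adjacency, "PLR", "R4")
```

Here `nnh` contains R2, R3 and R5, and `endpoints` contains the two routers, R2 and R3, to which RSVP LSPs and targeted LDP sessions would be set up.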

 

Label Switched Path Endpoint Protection

In the previous section, a method for protecting nodes and links that are transit with respect to a given LSP has been highlighted. MPLS-based services need a way of protecting the node that is an egress (tail) of a given label switched path. This requirement is especially visible in a multi-region network; such a network may have a large number of LSPs that terminate at boundary nodes connecting the regions.

Consider the simple network in Figure 4. There is a total of 8 LSP termination points. First, there are PE routers that terminate LSPs for the top-level service plane. Next, there are LSPs in the inter-region planes that are terminated at the two service end (PE) routers. Finally, there are LSPs in the intra-region planes that are terminated at ABRs for each region.

In the following section we will explain in a step by step fashion the principle of label switched path endpoint protection.

 

Basic Principles and Theory

Figure 7 shows a simple multi-region network that consists of two regions. The network uses BGP as the service protocol between PE routers, and labeled BGP as the label distribution protocol between regions. Thus, there are inter-region transport LSPs between PE4 and PE1, between PE4 and PE2, and between PE4 and PE3 that are established with labeled BGP.

In addition, within each region there are intra-region LSPs established with LDP. Specifically, in AS2 there are intra-region LSPs from PE4 to ASBR3, and from PE4 to ASBR4. Thus, ASBR3 and ASBR4 act as endpoints for these intra-region LSPs.

We will use this sample network to illustrate the basic problem of LSP endpoint protection, and demonstrate how it could theoretically be solved with local repair using backup context, similar to the label-to-FEC mappings that JUNOS uses in its support of loop free alternates.

As part of the process for setting up inter-region LSPs, once ASBR3 receives (via EBGP) from ASBR1 label bindings for FECs associated with the loopback addresses of routers PE1, PE2, and PE3, ASBR3 allocates labels for these FECs, and then advertises <FEC, label> bindings into IBGP inside its own AS. These bindings are shown in the table by ASBR3 in Figure 7.

ASBR4 is going to follow the same course of action. An important point to keep in mind is that ASBR3 allocates these labels on its own, without any coordination with ASBR4; likewise, ASBR4 allocates these labels on its own, without any coordination with ASBR3.

Figure 7: A Multi Region Network Requiring LSP Endpoint Protection 

 

The challenge here is how to provide a node protection for ASBR3. Let’s assume that the ingress PE router (PE4) has made a decision to use the inter-region LSP that goes through ASBR3.

We will assign a router between PE4 and ASBR3 the job of being the point of local repair (PLR). Let’s further assume that the PLR by some “magic” knows that ASBR4 is backing up ASBR3. Thus, when ASBR3 fails, we want the PLR to forward to ASBR4 all the traffic carried over the intra-region LSP whose endpoint is ASBR3.

However, doing this could result in either misrouting, or black holing, or both. This is because the labels used by the inter-region LSPs that are carried over that intra-region LSP are meaningful only to ASBR3.

To illustrate this point, let’s assume that after the failure of ASBR3 the PLR would start to forward to ASBR4 all the traffic carried over the intra-region LSP whose endpoint is ASBR3. One of the inter-region LSPs that are carried over that intra-region LSP is the LSP associated with FEC 10.1.1.1/32: that LSP has label 16 (the label assigned by ASBR3). When ASBR4 receives a packet with this label, ASBR4 will just drop the data, as its LFIB has no entry with 16 as an incoming label.

There are now two possibilities to amend this situation:

  • Make the PLR learn about the LBGP labels and swap the two top labels of the label stack, or
  • Make ASBR4 understand the labels allocated by ASBR3. To forward labels that it did not assign, ASBR4 needs to create an additional forwarding context in order to work around any possible label collisions. Once it receives backup traffic, ASBR4 needs to identify the traffic as such, and perform the label lookup in the corresponding backup forwarding table.
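The failure modes that motivate a separate forwarding context can be shown with two small label tables. Only label 16 for FEC 10.1.1.1/32 on ASBR3 comes from the text; every other label value and FEC below is hypothetical.

```python
# Each ASBR allocates labels on its own, without coordination.
asbr3_lfib = {16: "10.1.1.1/32", 17: "10.1.1.2/32"}
asbr4_lfib = {17: "10.1.1.1/32", 18: "10.1.1.2/32"}


def lookup(lfib, label):
    """Return the FEC for an incoming label, or 'drop' if there is no entry."""
    return lfib.get(label, "drop")


# Traffic labeled by ASBR3 but redirected to ASBR4's native table:
black_holed = lookup(asbr4_lfib, 16)  # no entry for 16 on ASBR4
misrouted = lookup(asbr4_lfib, 17)    # entry exists, but for a different FEC
intended = lookup(asbr3_lfib, 17)     # what ASBR3 meant by label 17
```

Label 16 is dropped, and label 17 is forwarded toward the wrong FEC. Keeping ASBR3's labels in a table of their own on ASBR4 avoids both outcomes.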

Throughout this document we refer to ASBR3’s original label allocations and LFIB as the ASBR3 native forwarding context.

Figure 8 illustrates how the ASBRs learn about the correct forwarding context.

Figure 8: How an ASBR Might Learn About the Correct Forwarding Context (Native or Backup)

 

ASBR3, in addition to its loopback address, is configured with a construct called a context identifier. When the feature is implemented, this might be an anycast IP address and it is used to identify a context.

The primary router “owns” the context identifier (this is indicated by the black dot). This context identifier is used for the incoming label mapping that ASBR3 advertises into IBGP for the native forwarding context on ASBR3. Thus, ASBR3 sets a BGP Next Hop in these advertisements for this context identifier. We say that ASBR3 “owns” this context identifier.

ASBR4 is also configured with the same context identifier. However, on ASBR4 this context identifier is not used for the incoming label mapping that ASBR4 advertises into IBGP – it is not used for the native forwarding context on ASBR4. Therefore, ASBR4 does not set a BGP Next Hop in these advertisements to this context identifier. We say that ASBR4 does not “own” this context identifier.

Both ASBR3 and ASBR4 advertise a route to the context identifier into the IGP.

If the ASBR owns the context identifier (e.g., ASBR3 in this case), then it advertises a route to the context identifier with an IGP metric of 0. If it does not own the context identifier (ASBR4 in this case), then it advertises a route to the context identifier with an IGP metric of max-metric - 1 (one less than the maximum metric). In addition, both ASBRs advertise into LDP a <FEC, label> mapping for the FEC of the context identifier.

Ownership of a context identifier matters to identify the native, as opposed to the backup, forwarding context. For example, assume that IP address 10.0.0.3 is used as the context identifier “owned” by ASBR3. If traffic is sent to an MPLS LSP associated to the FEC 10.0.0.3, then if this traffic arrives on ASBR3 it will be forwarded using ASBR3’s native forwarding context. However, if this traffic arrives on ASBR4, then it will be forwarded using the backup context that ASBR4 maintains for backing up ASBR3.
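A small sketch of this selection rule follows, using the 10.0.0.3 context identifier from the example. The class, its method and the 10.0.0.4 identifier assigned to ASBR4 are hypothetical.

```python
class Asbr:
    def __init__(self, name, owned_context_id, backed_up_context_ids=()):
        self.name = name
        self.owned_context_id = owned_context_id
        self.backed_up_context_ids = set(backed_up_context_ids)

    def context_for(self, lsp_fec):
        """Pick the forwarding context for traffic arriving on an LSP to lsp_fec."""
        if lsp_fec == self.owned_context_id:
            return "native"
        if lsp_fec in self.backed_up_context_ids:
            return "backup:" + lsp_fec
        return None  # this router is not configured with that context identifier


asbr3 = Asbr("ASBR3", owned_context_id="10.0.0.3")
# 10.0.0.4 is a hypothetical identifier owned by ASBR4 itself.
asbr4 = Asbr("ASBR4", owned_context_id="10.0.0.4",
             backed_up_context_ids=["10.0.0.3"])

on_asbr3 = asbr3.context_for("10.0.0.3")
on_asbr4 = asbr4.context_for("10.0.0.3")
```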

As long as the backup forwarding context on ASBR4 is the same as the native forwarding context on ASBR3, there is no correctness issue for the PLR, provided that the PLR marks this traffic with the correct context label, so that ASBR4 forwards it using the backup forwarding context (rather than its native forwarding context). Now the interesting question is how the backup context is populated.

 

Populating the Backup Context

Figure 7 shows an example of inter-AS option C. The PE router gets three sources of label information: the local intra-region transport LSP, the inter-region transport LSP, and the service LSP. ASBR3 ensures that its advertisements for routes and labels associated with inter-region LSPs get their BGP Next Hop set to the context identifier owned by ASBR3. As a result of the proper route resolution at the ingress PE, the ingress PE selects ASBR3 as the egress ASBR from PE’s own AS. The PE forms a proper three-label stack, and uses it to send the data.

ASBR4 is configured to provide node protection for ASBR3, and specifically to protect all the (inter-region) LSPs that transit through ASBR3. To populate the backup forwarding context, ASBR4 uses BGP routing advertisements by ASBR3. Specifically, it uses the routing advertisements whose BGP Next Hop is equal to the context identifier that is “owned” by ASBR3 (but is also configured on ASBR4, and is associated with the backup forwarding context on ASBR4). When ASBR4 receives such a route, it verifies if it has an outgoing label mapping for the FEC of the received route. If it does, then it uses the label in the route and the outgoing label mapping to populate its backup forwarding context. Figure 9 illustrates this process.
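The population rule can be sketched as a filter over the received advertisements. The tuple layout, the label values other than 16, the extra FECs and the "to-ASBR2" next hop are assumptions made for illustration.

```python
CONTEXT_ID = "10.0.0.3"  # context identifier owned by ASBR3

# Advertisements heard by ASBR4: (FEC, label, BGP next hop).
advertisements = [
    ("10.1.1.1/32", 16, "10.0.0.3"),
    ("10.1.1.2/32", 17, "10.0.0.3"),
    ("10.1.1.3/32", 18, "10.0.0.3"),   # ASBR4 has no outgoing mapping for this FEC
    ("10.9.9.9/32", 30, "10.0.0.99"),  # different next hop: not ASBR3's context
]

# ASBR4's own outgoing label mappings: FEC -> (outgoing label, next hop).
asbr4_outgoing = {
    "10.1.1.1/32": (100, "to-ASBR2"),
    "10.1.1.2/32": (101, "to-ASBR2"),
}


def populate_backup_context(advertisements, context_id, outgoing):
    """Map ASBR3's incoming labels onto ASBR4's outgoing label mappings."""
    backup_context = {}
    for fec, label, next_hop in advertisements:
        if next_hop != context_id:
            continue  # route does not belong to the protected context
        if fec not in outgoing:
            continue  # no outgoing label mapping for this FEC
        backup_context[label] = outgoing[fec]
    return backup_context


backup_mpls = populate_backup_context(advertisements, CONTEXT_ID, asbr4_outgoing)
```

The resulting table is keyed by the labels ASBR3 allocated, so a packet that was labeled for ASBR3 can be swapped directly to ASBR4's own outgoing label.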

Figure 9: Creating the Backup Forwarding Context

 

With the populated backup forwarding context, ASBR4 has now all the information that it needs in order to forward traffic downstream in case of ASBR3 failure. The prerequisite for all this to work is that the PLR sees ASBR4 on the backup path to the context identifier owned by ASBR3.

 

LFA on the PLR

There are two possibilities for making the PLR see that ASBR4 is on the backup path to the context identifier owned by ASBR3. The simplest is to make the aggregate link cost between the PLR and ASBR3 the same as that between the PLR and ASBR4.

Sometimes this is not possible, due to enforcing a certain traffic distribution. An alternative is to enable an LFA calculation such that LECMP paths to that context identifier may become available. As long as there are disjoint paths toward the two ASBRs, the LFA computation will find out that (irrespective of unequal IGP metrics), ASBR4 is a viable backup path for the context identifier owned by ASBR3.

 

Data Plane Requirements

The data plane operation for processing a packet in backup context is illustrated in Figure 10.

Figure 10: Data Plane Requirements for Processing Packets in Backup Context

The left side of Figure 10 shows a packet with three labels and an unknown payload being processed at the backup router. The first lookup is an MPLS lookup in the global label space (mpls.0). Since the top level label is the context label, the ASBR pops it and then performs a lookup in the backup forwarding context (backup-mpls.0), as determined by the context label.

The associated action is to POP off the context label and perform a subsequent MPLS lookup to process the inter-region label. The associated action for the second lookup is to swap the inter-region label of the backup context and forward the packet using the MPLS link layer (there is still the service label underneath) out on a given interface. This recursive lookup may require additional capabilities in forwarding hardware.

The right half of Figure 10 shows the processing of native IP routes in the backup forwarding context; this illustrates the process when an LSP endpoint is being protected. Here, the lookup of the backup label, plus the service label, results in a third lookup in order to find out about the IP Next Hop. Note that not all hardware may support mixed multiple, recursive lookups of MPLS and IP.
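The MPLS case on the left side of Figure 10 can be sketched as two chained table lookups. The table names mpls.0 and backup-mpls.0 follow the text; the function, the label values 300, 100 and 42, and the interface name are hypothetical, and the IP case on the right side is not covered.

```python
def process_at_backup_router(label_stack, mpls_0, context_tables):
    """Pop the context label, then swap the next label using the backup context."""
    top, *rest = label_stack
    kind, target = mpls_0[top]              # first lookup: global label space
    if kind != "context":
        raise ValueError("top label is not a context label")
    inner, *below = rest
    out_label, out_interface = context_tables[target][inner]  # second lookup
    return [out_label] + below, out_interface


# Hypothetical tables: label 300 is the context label that points to backup-mpls.0.
mpls_0 = {300: ("context", "backup-mpls.0")}
context_tables = {"backup-mpls.0": {16: (100, "to-ASBR2")}}

# Stack on arrival: context label, inter-region label, service label.
stack, interface = process_at_backup_router([300, 16, 42], mpls_0, context_tables)
```

After processing, the stack carries the swapped inter-region label on top of the untouched service label.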

The necessity of running an LFA computation on the PLR was explained in the preceding section on LFA on the PLR.

 

Protocol Extensions

For the above procedures, no further protocol extensions are required. Providing protection for LSP endpoints is a purely local matter that does not require any further protocol extensions. The only requirement is for the backup router to receive routes and label advertisements from the master router. This can be achieved by direct control-plane sessions between a master/backup router pair.

If the exchange of routes and labels between a master/backup router pair is done through a third party router like a route reflector, then some protocol extension on the route reflector for reflection of second best paths may be required.

In the next sections different deployment scenarios will be illustrated to demonstrate the generic nature of our LSP protection approach. In our initial example, we already described how our approach could be used as a protection solution for inter-region transport LSPs terminating on an ASBR. In the next example, we will demonstrate that our approach is not just applicable to transport LSPs but also to service LSPs.

 

Example: ASBR Running RSVP and LBGP

Figure 11 illustrates a simple transport network using RSVP as a transport protocol. For RSVP we will apply a similar technique as we did for LDP. All that needed to be done for LDP was to make the PLR believe that ASBR4 was a transit for ASBR3. From a physical topology point of view, ASBR4 is never a transit for ASBR3; applying MPLS terminology, it is more of a FEC-transit in the sense that it forwards the data at the originally intended egress point.

Figure 11: Transport Network Using RSVP

 

The ingress PE has an intra-region transport LSP. Its FEC is the IP address 10.0.0.3 which is the context identifier owned by ASBR3. There is one further LSP that protects the interface between ASBR3 and the PLR router. This is just an ordinary facility protection LSP. Given this simple physical topology, the C-SPF computation will find no viable backup path for the facility protection LSP.

As soon as ASBR4 is configured with the context identifier owned by ASBR3, ASBR4 creates a forwarding adjacency with this context identifier as the IP address of the far end of that forwarding adjacency. In order not to disturb the IP control plane, this forwarding adjacency is advertised with an infinite IGP metric, so that it will never be selected as a path for the IP plane. The TE metric of the forwarding adjacency is set to the actual IGP cost between ASBR4 and ASBR3. After ASBR4 advertises the forwarding adjacency, the backup path computation of the PLR will succeed. Note that the forwarding adjacency is a unidirectional link originated by ASBR4. In the normal IGP SPF calculation, only bidirectional links are supported. However, in the Constrained SPF calculation for MPLS label switched paths, unidirectional links may also pass MPLS traffic.
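The different treatment of the forwarding adjacency by the two computations can be sketched as two link filters. The link list, the metric values and the function names are hypothetical; only the rules (infinite IGP metric, unidirectional link, TE metric used by the constrained computation) come from the text.

```python
INFINITE = float("inf")

# Links as (source, destination, IGP metric, TE metric); values are hypothetical.
links = [
    ("PLR", "ASBR3", 10, 10), ("ASBR3", "PLR", 10, 10),
    ("PLR", "ASBR4", 10, 10), ("ASBR4", "PLR", 10, 10),
    # Forwarding adjacency: unidirectional, originated by ASBR4 only.
    ("ASBR4", "10.0.0.3", INFINITE, 10),
]


def spf_usable(links):
    """IGP SPF: require the reverse link and a finite IGP metric."""
    present = {(src, dst) for src, dst, _igp, _te in links}
    return [
        (src, dst, igp)
        for src, dst, igp, _te in links
        if (dst, src) in present and igp != INFINITE
    ]


def cspf_usable(links):
    """Constrained SPF: unidirectional links are allowed, and the TE metric is used."""
    return [(src, dst, te) for src, dst, _igp, te in links]


adjacency = ("ASBR4", "10.0.0.3")
in_spf = adjacency in {(s, d) for s, d, _ in spf_usable(links)}
in_cspf = adjacency in {(s, d) for s, d, _ in cspf_usable(links)}
```

The forwarding adjacency is invisible to the IP plane but available to the backup path computation, which is the behavior the paragraph above relies on.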

Next, the PLR will signal an LSP with an ERO via ASBR4, terminating at the context identifier owned by ASBR3 (10.0.0.3). As soon as ASBR4 finds itself on the ERO list, and also finds that the LSP terminates on the context identifier associated with the backup forwarding context maintained by ASBR4, ASBR4 will allocate a label that points to the backup forwarding context that ASBR4 maintains for ASBR3.

The procedure of snooping and splicing inter-region transport and service LSPs is the same as in the LDP example.

Once the link between the PLR and ASBR3 goes down, the PLR will switch over all LSPs going over that link by pushing them into the facility protection LSP. The PLR does not know that this protection LSP terminates not on ASBR3, but on ASBR4, and specifically, on the backup forwarding context that ASBR4 maintains for ASBR3. When ASBR4 processes the top level label, it will do a recursive lookup, POP one label, SWAP the next one on the stack, and forward the packet further downstream.

 

Example ABR (labeled BGP for intra-AS)

Labeled BGP is usually used for establishing inter-AS connectivity. What is being suggested here is to use labeled BGP intra-AS to solve the problem of scaling the number of PEs. It is theoretically possible to haul several thousand prefixes through the IGP; however, deployment experience has shown that the ABR functionality may become a bottleneck at some point.

Figure 12 illustrates an LBGP over LDP network. There are three local transport regions running LDP as local transport protocol. There is a two level IS-IS network and (contrary to common practice) LDP database leaking between the transit region and edge region is turned off. Route leaking for IP routes can be enabled or alternatively a summary route can be generated. In order to provide inter-region (here inter-area) connectivity, labeled BGP is deployed.

This works surprisingly simply: the ABRs just need to be configured as route reflectors so that they can pass labeled BGP routing information between their clients. In addition, inter-ABR connectivity is provided by deploying either a full mesh of labeled BGP sessions or, once the number of ABR routers grows beyond 200, some further route reflectors in Level 2.

The route reflector will do a next-hop self, which causes a new label to be advertised and sufficient FIB state to be set up. Note that next-hop self causes a forwarding hierarchy to be built up. As soon as we have forwarding hierarchies in the control and data plane, the PFE can perform local repair actions by simply flipping the indirection pointer as illustrated in Figure 3.
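
Why a forwarding hierarchy makes repair cheap can be shown with a toy FIB in Python. This is a sketch under assumptions: the class name NextHop, the prefix names and the count of 1000 routes are invented for the example, and a real PFE would hold this state in hardware rather than in a dictionary.

```python
class NextHop:
    """Shared indirection object that many routes point to."""

    def __init__(self, primary, backup):
        self.primary = primary
        self.backup = backup
        self.active = primary

    def local_repair(self):
        # A single write, independent of how many routes reference this object
        self.active = self.backup


# 1000 hypothetical PE prefixes all resolve through the same next-hop object.
shared = NextHop(primary="ABR1", backup="ABR2")
fib = {f"pe-{i}": shared for i in range(1000)}

def lookup(prefix):
    """Resolve a prefix through the indirection to the currently active next hop."""
    return fib[prefix].active

before = lookup("pe-42")
shared.local_repair()      # failure of ABR1 detected: flip the pointer once
after = lookup("pe-42")
```

Because every route entry references the same object, the repair cost does not grow with the number of routes, which is the property the flat (non-hierarchical) FIB lacks.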

Note that for every egress PE a FIB state needs to be programmed at ABR1 and ABR2. In Section “Label Aggregation & FIB state” a common scheme for reducing the number of FIB entries is introduced that is applicable for labeled BGP for intra-AS connectivity.

Figure: 12

 

Finally both the L-BGP inter-region label plus the local transport LSP towards ABR1 and ABR2 are received by PE2.

For easier incremental adoption of L-BGP it might be desirable to support a merging-of-planes function at the egress ABR. Suppose PE2 does not yet understand L-BGP, only vanilla LDP. ABR1 and ABR2 merge the inter-region label transport plane into the local transport plane of the right-hand edge region. Now the edge is kept untouched while the core is transitioned towards fast restoration.

 

Example 2547 VPN service

Figure 13 illustrates a basic 2547 VPN service. The goal is once again to protect against a node failure of the label switched path termination endpoint. The label switched path termination endpoint in this example is the service LSP as originated by router PE3. The PLR shall make the switchover decision without taking part in the Inter-region service LSP plane.

Figure: 13

 

On PE4 a label association for PE3 is configured. Both PE3 and PE4 advertise LDP label bindings for their anycast IP FECs. On PE3 the FEC 10.0.0.3 is associated with its native forwarding context. On PE4 the advertised FEC 10.0.0.3 is associated with the backup forwarding context. Both PE3 and PE4 advertise the customer route 172.16.0.0/16 plus route distinguisher and a locally assigned label. PE3 rewrites the BGP next hop to the anycast IP it owns, 10.0.0.3. PE4 snoops PE3's announcement and populates its backup context by splicing its local IP next hops towards 172.16.0.0/16 with the VPN label PE3 advertised.
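
The snoop-and-splice step on PE4 amounts to joining two tables on the customer prefix. The following Python sketch illustrates the idea only: the function name build_backup_context and the VPN label value 16003 are hypothetical, and route distinguishers are left out for brevity.

```python
def build_backup_context(snooped_vpn_routes, local_vrf_next_hops):
    """Map each VPN label the protected PE advertised to the protector's own next hop.

    snooped_vpn_routes: prefix -> VPN label, as advertised by the protected PE.
    local_vrf_next_hops: prefix -> local next hop in the protector's VRF.
    """
    context = {}
    for prefix, vpn_label in snooped_vpn_routes.items():
        next_hop = local_vrf_next_hops.get(prefix)
        if next_hop is not None:  # only prefixes the protector can reach locally
            context[vpn_label] = next_hop
    return context


pe3_advertised = {"172.16.0.0/16": 16003}  # label value is an assumption
pe4_local = {"172.16.0.0/16": "CE1"}
backup_context_pe3 = build_backup_context(pe3_advertised, pe4_local)
```

A packet that arrives at PE4 carrying PE3's VPN label under the backup context label can then be resolved directly to PE4's own link towards CE1.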

Now assume the link between the PLR router and PE3 goes down. PE4 has been computed as a Loop Free Alternate for 10.0.0.3, so the PLR sends the traffic destined to PE3 to PE4 with the backup context label for PE3 on top. After stripping off the backup context label, PE4 will do another label lookup and determine that it needs to pop the label stack and forward the traffic out on its link towards CE1. Finally the traffic arrives at CE1.

So far only examples using LDP as local transport protocol have been illustrated. Our label switched path endpoint protection solution is not limited to a specific local transport protocol. In the next example LDP will be replaced by RSVP as local transport protocol.

 

Deployment Scenarios

The most intuitive deployment scenario will likely be a classical 1:1 protection scheme. However our scheme is not limited to just 1:1. It is a true M:N protection scheme supporting all possible combinations. In this section some additional deployment scenarios will be highlighted.

Figure: 14a

Figure: 14b

 

Figure 14a illustrates an M:N protection scheme. Consider, for example, a case where for capacity reasons the connection between two regions needs to be supported by more than a pair of routers. It is not required that each border router fully backs up every other border router for this region border. In order to save backup context FIB space, each border router just backs up the next border router such that a protection chain is formed. Each border router is now 100% protected against node failure. If one wants to protect against multiple ABR failures then additional forwarding contexts need to be pre-provisioned. However, securing a network against multiple node failures becomes expensive.
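
The protection chain can be written down as a simple assignment. The Python sketch below is illustrative: the function name and router names are invented, and the wrap-around from the last router back to the first is an assumption made here so that every router in the list is protected.

```python
def protection_chain(border_routers):
    """Return {protector: protected}: each router backs up the next one in the list.

    The last router backs up the first (assumed wrap-around), so every
    router is protected while holding exactly one backup context.
    """
    n = len(border_routers)
    if n < 2:
        raise ValueError("a protection chain needs at least two border routers")
    return {border_routers[i]: border_routers[(i + 1) % n] for i in range(n)}


routers = ["ABR1", "ABR2", "ABR3", "ABR4"]
chain = protection_chain(routers)
```

Each router appears once as a protector and once as a protected node, so backup FIB space grows with one context per router rather than with the square of the number of border routers.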

Figure 14b also illustrates a scheme where a dedicated backup router (PE4) is protecting a set of PE routers. This is a commonly anticipated scenario where PE routers running older hardware and software may have no support for label switched path endpoint protection, and a single router capable of it backs up all the deployed PE routers.

 

Label Aggregation & FIB state

Before making a case for label-aggregation, let’s first evaluate scaling properties of the generalized model for ABR as per Figure 15. At first we do not run a protocol in the aggregation plane.

Figure: 15

 

The sample network consists of 200 regions with 500 PE routers each. In total there are 100,000 PE routers in the network. Each ABR terminates all inter-region labels into its area. This causes a total of 100,000 LFIB states to be programmed into the ABR router's FIB. In the generalized model the ABR router becomes the focal point for inter-region routes, as FIB state has to be set up irrespective of whether any of the locally connected PE routers uses that path. No other router in the generalized model carries as many routes as the border routers. Although the PE router does receive all the inter-region routes, only a few are likely to be used. The amount of FIB state on the PE router is determined by the number of service endpoints plus the number of service routes.

In Figure 3 we have expressed the necessity of full indirection for route entries in the forwarding hardware. However, deployed hardware may lack this support. Without full indirection support, today’s deployed hardware can fast-reroute on the order of thousands of FIB state entries within a 50ms interval. From a convergence point of view it therefore becomes desirable to reduce the amount of FIB state on the border routers.

The key to solving FIB scaling problems at the border routers is to introduce a label-aggregation plane. Figure 16 illustrates a simple aggregation scheme for inter-region routes.

Figure: 16

 

Once ABR1 receives the PE inter-region routes from its PE routers it rewrites the next-hop to self. ABR2 does not rewrite the next-hop to self as it does not want to create forwarding state for the inter-region route. In order to make the ABR1 next-hop reachable for other regions an additional route is injected into the aggregation plane: ABR1 injects its own loopback. This time the loopback gets rewritten (and a FIB state is generated) once passed down to attached regions.

PE2 now needs to assemble 4 labels into a label stack. The top label is the local transport LSP label for getting to ABR2. The next label is the aggregation label pointing to ABR1. The next label is the inter-region label pointing to PE1. Finally, the innermost label represents the service on PE1.
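
The order of the four labels can be made explicit with a short Python sketch. The label values and the function name build_label_stack are hypothetical; only the ordering reflects the description above.

```python
def build_label_stack(transport, aggregation, inter_region, service):
    """Return the label stack as a list, top of stack first.

    Labels are pushed innermost first, so the service label goes on
    first and the local transport label ends up on top.
    """
    stack = []
    for label in (service, inter_region, aggregation, transport):
        stack.insert(0, label)  # push onto the top of the stack
    return stack


# Hypothetical label values as seen by PE2
stack = build_label_stack(
    transport=100,      # local transport LSP towards ABR2
    aggregation=200,    # aggregation label pointing to ABR1
    inter_region=300,   # inter-region label pointing to PE1
    service=400,        # service label on PE1
)
```

Read from the top, the stack follows the path of the packet: the nearest hop is resolved by the outermost label and the service on PE1 by the innermost one.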

The reduction in FIB state on the border routers is remarkable. Rather than programming 100,000 route entries for reaching all the PEs, only FIB state corresponding to the number of border routers in the network is required. As per the example this would be 500 routers. Assuming dual homing to the transit region one needs to account for 1000 routes. Even with currently deployed hardware, 50ms restoration for a 100,000 PE network can be achieved today.
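
The arithmetic behind these figures is small enough to check directly. The Python lines below simply restate the numbers given in the text; the border router count of 500 is taken from the example as stated, and the variable names are invented for the illustration.

```python
regions = 200
pes_per_region = 500

# Without aggregation: one FIB entry per PE in the network
flat_entries = regions * pes_per_region

# With aggregation: one entry per border router, doubled for dual homing
border_routers = 500            # figure stated in the example
aggregated_entries = 2 * border_routers

reduction_factor = flat_entries // aggregated_entries
```

Under these numbers the border routers hold two orders of magnitude less state, which brings the table size into the range of thousands of entries that hardware without full indirection can repair within 50ms.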

One interesting property of the optional aggregation plane is that it can be commissioned and decommissioned in an incremental fashion. Once the hardware of a given ABR supports full FIB indirection, the aggregation plane is decommissioned by turning on a next-hop rewrite of all the inter-region transport routes.

 

PE-CE protection

A frequently encountered failure scenario is pictured in Figure 17. The PLR router, which is the provider router PE3, shall protect against a link failure towards CE1. The first solution is to consider multipath routes across address families.

In our example the primary route is a basic unicast route. The BGP session over which that route is learned terminates inside the customer VRF. The backup path via PE4 has been converted from a basic unicast route to a VPN unicast route, carrying additional attributes like route distinguisher and extended communities for ease of policy control.

Figure: 17

 

The PLR now needs to program a local repair action for the backup path. If the link between the PLR and CE1 goes down, the backup path kicks in. The traffic is now backhauled through the core to PE4, where the packet is decapsulated and forwarded to CE1. For link failures, plain multipath routing is an adequate solution.

Consider now a node failure of CE1. The PLR will switch to its backup path and relay the traffic to PE4. PE4 has also lost its link to CE1, so once the backup traffic arrives at PE4 it is flipped back to PE3 (the first PLR). The result is an FRR-induced forwarding loop.

A more sophisticated solution that also withstands node failures is illustrated in Figure 18. The backup path scheme integrates with the label switched path endpoint protection solution illustrated in the previous section.

Figure: 18

 

PE4 listens to PE3's outgoing VPN announcements in order to splice the incoming routes with the local next hops in the VRF to build the backup table. Label switched path endpoint protection relies on the traffic being marked as either primary or backup traffic. The same principle is applied here as well. When the link between PE3 and CE1 goes down, the traffic is sent on the backup path over the SP core to PE4. The transport LSP used for that purpose is the one to PE4's primary loopback address. In order to identify the traffic as backup traffic, another label (the backup context label) is pushed before the transport label that relays traffic from PE3 to PE4. The backup context label represents PE4's announcement of 10.0.0.3, the shared anycast virtual IP address.

On its way from PE3 to PE4 the penultimate router (router P) will pop the top level transport label off the label stack which unveils the backup context label. PE4 will identify the traffic as backup traffic and perform a lookup in its backup forwarding context.

Going back to the example of a CE1 node failure, the identified backup traffic will now simply be dropped, because there are of course no backup routes in PE4's backup forwarding context. No FRR-induced forwarding loop builds up.
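
The effect of marking backup traffic can be traced with a small simulation. The Python sketch below is a deliberately simplified model: the function name deliver, the hop limit and the representation of link state are assumptions made for the illustration, and the backup context is reduced to the rule that repaired traffic may only leave on a local CE link.

```python
def deliver(ingress, links_up, mark_backup, max_hops=8):
    """Trace one packet for CE1 that enters at `ingress` (PE3 or PE4).

    links_up: set of PEs whose link to CE1 is currently up.
    mark_backup: if True, repaired traffic carries a backup context label,
    and the receiving PE never applies its own backup path to it.
    Returns (outcome, hop trace).
    """
    peer = {"PE3": "PE4", "PE4": "PE3"}
    node, is_backup, trace = ingress, False, [ingress]
    for _ in range(max_hops):
        if node in links_up:
            return "delivered", trace + ["CE1"]
        if is_backup and mark_backup:
            return "dropped", trace  # no backup routes in the backup context
        # Local repair: relay the packet to the other PE as backup traffic
        node, is_backup = peer[node], True
        trace.append(node)
    return "loop", trace


link_failure = deliver("PE3", links_up={"PE4"}, mark_backup=True)
node_failure_plain = deliver("PE3", links_up=set(), mark_backup=False)
node_failure_marked = deliver("PE3", links_up=set(), mark_backup=True)
```

With plain multipath the packet bounces between PE3 and PE4 until the hop limit is reached; with the backup context label it is dropped at PE4 after a single relay, while the simple link failure case is still repaired.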

 

Example 2547 Carrier’s Carrier VPN

Carrier’s Carrier or recursive VPN deployments are very popular with transit service providers. Node protection of a top-level service provider has already been highlighted in the basic 2547 VPN example. The only real difference is that instead of IP routes, labeled IP routes are protected. As per Figure 19, a label switched path termination endpoint of Customer A (PEA3) shall be protected.

Figure: 19

 

The toolkit to protect router PEA3 is again a combination of the label switched path endpoint protection scheme plus PE/CE fast reroute. Keep in mind that in a recursive VPN deployment the CE is also the PE for the next recursion level. As per Figure 19, consider that PEA3 needs to be node-protected. As with plain 2547 VPNs, this is achieved using a backup context (10.0.0.3) plus multipath routes into ASBR3 and ASBR4. In order to protect against a link failure towards the CEB1 router, a backup context (11.0.0.3) plus multipath routing is required for the PEA3 and PEA4 router pair. Again the backup context makes sure that a double failure, like a node failure of CEB1, does not cause a forwarding loop of backup traffic.

 

Conclusion

In this white paper we have introduced a set of tools to provide a deterministic, local-repair, constant-time service protection solution that can be applied at any point in an MPLS network. If all elements in a network deploy the tools outlined in this paper then a set of chained local-repair elements turns into an end-to-end service protection solution. All described tools just affect local behavior of the router, meaning the solution can be deployed without any protocol extensions, reducing the deployment time significantly.

 

About the Authors: Kireeti Kompella, Hannes Gredler

Last Updated on Monday, 09 November 2009 10:21