In multicast switches, the accommodation of multicast traffic in multiple queues per input buffer reduces the throughput degradation caused by head-of-line (HOL) blocking. Such an arrangement, called multicast virtual output queuing (MC-VOQ), is very promising in theory but can be implemented in practice only with heavy approximation. Complete avoidance of HOL blocking would in fact require a distinct queue for each possible fanout set, so that the number of queues needed grows exponentially with the switch size. If only a limited number of queues can be used per input buffer, criteria must be identified for setting the number of queues, for associating fanout sets with the individual queues, and for scheduling the transmission of packets out of the queues.
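To make the exponential growth concrete: an N-output switch admits 2^N − 1 non-empty fanout sets, so exhaustive MC-VOQ needs that many queues per input buffer. A minimal sketch (the function name is ours, for illustration only):

```python
# Queues per input buffer required by exhaustive MC-VOQ: one queue for
# each non-empty subset of the N output ports.
def exhaustive_queue_count(n_ports: int) -> int:
    return 2 ** n_ports - 1

for n in (4, 8, 16, 32):
    print(n, exhaustive_queue_count(n))  # 16 ports already require 65535 queues
```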
This paper presents an analytical model for the investigation of saturation throughput and packet delay in MC-VOQ multicast switches. The model relies on the assumption of Poisson-distributed uniform input traffic and random queuing and scheduling policies. Extensive simulation experiments validate the results of the analysis for large switch sizes.
As the Internet has evolved from a convenience to a mission-critical platform for conducting research, education, and business, multicast applications such as distributed interactive simulations, distance learning, and video-on-demand are spreading rapidly. As a result, there is an increasing demand for high-performance packet switches that efficiently support multicast traffic.
The distinguishing feature of a multicast switch is its ability to deliver a multicast packet to multiple output ports. A multicast packet switch must be capable of replicating packets, so that all destinations of a multicast packet can receive their respective copies. We focus on crossbar-based switch fabrics because of their intrinsic multicast capabilities [3], [7]. We address only crossbars with speedup one (in a crossbar, the speedup is the capacity ratio between the fabric ports and the switch ports).
While virtual output queuing is a widely accepted queuing scheme for unicast traffic, it is not practical for multicast switches in its exhaustive form, which provides one queue per possible fanout set [5], [10]. An alternative queuing scheme for multicast switches allocates a single FIFO (first-in first-out) queue at each input port to handle all multicast traffic transiting through the port [9]. The performance of such multicast switches was studied by Hayes et al. [4] and Hui [5]. The former focuses on small switches and the latter on large switches; both studies assume one FIFO queue at each input port and a random packet selection policy with fanout splitting. The performance of multicast switches with one queue per input buffer and no fanout splitting was studied in [1] and [8]. Although the single-queue scheme is simple to implement, head-of-line (HOL) blocking heavily impairs its performance.
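For clarity, fanout splitting means a HOL multicast packet may be delivered to only a subset of its destinations in a given time slot, retaining a residual fanout to be served in later slots; without fanout splitting, the packet must reach all its destinations at once. One service step can be sketched as follows (helper name and the set-based representation are ours):

```python
def serve_hol_with_splitting(residual_fanout: set, granted_outputs: set) -> set:
    """One time slot of service under fanout splitting: copies are sent to
    the granted outputs, and the packet keeps the residual fanout.
    An empty result means the packet has fully departed."""
    return residual_fanout - granted_outputs

# A packet destined to outputs {1, 2, 3} that wins only output 2 this slot
# stays at the head of its queue with residual fanout {1, 3}.
remaining = serve_hol_with_splitting({1, 2, 3}, {2})
```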
To overcome HOL blocking, window-based queuing was introduced [2], allowing packets behind the HOL packet to be transmitted before it. The enlarged scheduling space yields an average throughput increase of roughly 10%. The major price paid for this enhancement is the increased complexity of the input queues, which must support random access. Another approach to alleviating HOL blocking allocates multiple multicast queues at each input port, so that the packet scheduler can choose among multiple HOL packets. Simulation studies of such multicast switches with non-exhaustive multicast virtual output queuing (MC-VOQ) were performed by Gupta and Aziz [3], Song et al. [10], and Bianco et al. [11]. Particularly interesting are the results presented in [11], which indicate that the number of input queues, the queuing policy that maps the destination masks onto the available input queues, and the scheduling policy that sets the service order over the available HOL packets are all critical to the throughput performance of the switch. These indications are obtained under gathered traffic conditions, i.e., traffic scenarios where only a small number of input ports supply multicast traffic to the switch. When the traffic feeds are uniformly distributed over all input ports, the performance impact of the different design choices becomes almost negligible.
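The queuing policy decides which of the k available queues an arriving packet's destination mask is mapped to. The actual policies studied in [11] are more elaborate; purely as an illustration, two simple mappings (the random one matching the assumption of our model, and a deterministic hash-style alternative; both function names are ours):

```python
import random

def assign_queue_random(k: int, rng=random) -> int:
    """Random queuing policy: an arriving packet joins one of the k
    queues uniformly at random, regardless of its fanout set."""
    return rng.randrange(k)

def assign_queue_by_mask(fanout_set, k: int) -> int:
    """Deterministic alternative: fold the destination bit mask into k
    queues, so packets with identical fanout sets share a queue."""
    mask = sum(1 << port for port in fanout_set)
    return mask % k
```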
The contribution of this paper is a first step toward the analytical validation and possible generalization of the simulation results presented in [11]. We analytically characterize the behavior of a large multicast switch when Poisson traffic is uniformly distributed over all input ports. Our results confirm that under non-gathered traffic conditions a small number of multicast queues suffices to maximize the saturation throughput, even when low-complexity queuing and scheduling policies are deployed. We present simulation results that match the analytical ones for reasonably large switch sizes. Our next step, to be presented in a follow-up paper, is the extension of the model to the case of gathered traffic.
The rest of the paper is organized as follows. Section 2 illustrates the multicast switch architecture used in this paper. Section 3 introduces the initial model and a modified equivalent model. The saturation throughput analysis is given in Section 4. In Section 5, we derive the packet delay and service time. The theoretical and experimental results are jointly presented in Section 6. Finally, conclusions are given in Section 7.
Using queuing theory and the M/G/1 model, we analyzed the performance of multicast switches with multiple input queues under the assumption of uniform traffic distribution over all inputs and outputs. We provided closed-form expressions for the saturation throughput, the average service time, and the average packet delay, and we performed extensive simulations to verify the analysis. Our results show that the throughput and delay performance of the multicast switch improves as the number of queues per input buffer increases. In a large multicast switch, the performance indices converge to their asymptotic values while the number of input queues is still relatively small. Thus, a small number of input queues per input buffer (no more than 10) is sufficient to achieve near-optimal performance in large switches with uniformly distributed multicast traffic. The extension of our analytical model to more general traffic conditions will be the topic of a follow-up paper currently in preparation.
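As an illustration of the kind of closed-form expression the M/G/1 model yields, the standard Pollaczek-Khinchine formula gives the mean waiting time from the arrival rate and the first two moments of the service time (this is the textbook formula, not the paper's switch-specific result):

```python
def mg1_mean_wait(arrival_rate: float, mean_s: float, mean_s2: float) -> float:
    """Pollaczek-Khinchine mean waiting time for an M/G/1 queue:
    W_q = lambda * E[S^2] / (2 * (1 - rho)), with rho = lambda * E[S]."""
    rho = arrival_rate * mean_s
    if rho >= 1.0:
        raise ValueError("queue is unstable (rho >= 1)")
    return arrival_rate * mean_s2 / (2.0 * (1.0 - rho))

# M/D/1 example: deterministic one-slot service time at load 0.5
# gives W_q = 0.5 * 1 / (2 * (1 - 0.5)) = 0.5 slots.
wq = mg1_mean_wait(0.5, 1.0, 1.0)
```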