Accesso libero

A Path-Wise Scheme for Simpler Mesh-of-Tree Model in Network-on-Chip Designs

   | 24 mag 2023
INFORMAZIONI SU QUESTO ARTICOLO

Cita

Introduction

Because of its increased efficiency and the ongoing expansion in the number of computational and storage blocks integrated in a single chip, NoC has become the trend of high-performance microprocessors. Many studies are now focusing on NoC design, including topology and routing strategy. Mesh, cmesh, torus, and MoT [1] are some of the topologies that have been proposed. Mesh has gotten the most attention because of its lower node degree. However, due of its larger diameter, it has a higher power consumption.

Root, stem, and leaf routers are used in the MoT architecture, which has a smaller diameter and lower node degree. Root routers may perform intensive communication activities on the Internet of Things, which means they use more power and are more likely to become hotspots. We proposed the SMoT topology in this study, which would diminish this phenomenon while reducing power usage. We also suggested a path-wise routing method for SMoT, which spreads communication around the network by altering packet forwarding direction to reduce network latency.

Related work

Several recent studies have aimed to create a network with low latency and low power consumption. Node-Router Decoupling was suggested by Chen et al., who used power-gating techniques to reduce power consumption [2]. Ghosal, Prasun, and colleagues proposed a Level-2 Mesh topology with two tiers of linkages, one for long-distance communication and the other for short-distance communication [3]. When compared to 3D FMT, Viswanathan designed a revolutionary 3D structure with a 75 percent reduction in the number of vertical links [4].

To minimize network latency, Koibuchi et al. created random shortcuts by supplementing conventional topologies with random links [5]. By delivering packets through less crowded routers, Ebrahimi et al. presented an adaptive routing algorithm [6]. To decrease power dissipation, Ezz-Eldin et al. introduced a low leakage power switch using power supply gating and adaptive virtual channel technique [7].

Motivation

One of the most fundamental restrictions in NoC designs is power consumption. The linked network of an MIT Raw processor with 16 PEs consumes 36 percent of the total power consumption. The Tera-scale chip was created by Intel, and its associated power usage accounted for 40% of the total power consumption. As a result, NoC has a high-power efficiency. Meanwhile, as process technology advances, communication between cores will become a constraint in further improving performance. Take the Intel Single-Chip Cloud Computer for example (SCC). Each SCC chip has 48 Pentium processors that were coupled by a 46 mesh. It takes at most 10 hops to send a packet from the source to its target node.

As a result, we presented a more power efficient SMoT architecture in this study. We also proposed a path-wise routing method for SMoT to address the requirement for reduced network latency. Our approach allowed us to reduce the cost of communication, including electricity and latency.

Architecture of the Simpler Mesh-of-Tree

There are three types of routers in the MoT topology: root, stem, and leaf. Row or column routers are the stem routers. Two cores, one row router, and one column router are connected by each leaf router. Until the final level stem routers join the root routers, the stem routers are structured in a binarytree-like design.

Based on the original design depicted in Figure 1(a), we proposed the SMoT, which uses connections instead of root routers to minimize power consumption and latency. For starters, when a network's traffic increases, root routers can quickly become hotspots. The average link use of root and other routers is shown in Figure 2(a). Because other routers have more ports than root routers, packets that arrive at a particular router might pick an appropriate port to be received or sent, or just wait for a while in the buffers. Furthermore, if a significant number of cores are merged, the number of root routers will increase. Figure 2(b) depicts the root routers' growth as the number of cores grows.

Figure 1.

(a): 32-core MoT (1:leaf, 2:stem, 3:root); (b): 32-core SMoT

Figure 2.

(a) The average link use of root and other routers; (b) Depicts the root routers' growth

Path-wise scheme

For SMoT, we describe a path-wise routing strategy. When a packet arrives at a leaf router, calculate the congestion coefficient for each neighboring router, which shows the traffic surrounding this router, and select the next hop from the shortest path with the lowest congestion coefficient. This method can minimize network congestion while also providing the shortest path.

In our approach, different routing strategies are applied to the leaf and stem routers, and a few fundamental parameters are required. The number of row routers that a packet passes through on its way from the current node to the destination is represented by Δx, while the number of column routers that a packet passes through on its way from the current node to the destination is represented by Δy. In a M*N MoT structure (M and N denote the number of rows and columns respectively), we also employ a simpler addressing method.

Figure 3.

(a) The address for a core; (b) The address for a leaf router

The following is the procedure of the proposed algorithm for leaf routers:

Step1: Packet input.

Step2: Compute the optimal port for packet output.

Step2.1: Calculate the value of Δx and Δy: Δx=Des_addrrow_addrSrc_addrrow_addr \Delta x = Des\_add{r_{row\_addr}} - Src\_add{r_{row\_addr}} Δy=Des_addrcol_addrSrc_addrcol_addr \Delta y = Des\_add{r_{col\_addr}} - Src\_add{r_{col\_addr}}

Step2.2: If Δx≠0 and Δy≠0, which indicates the source and destination nodes are in separate rows and columns, choose one neighbor router as the output:

Step2.2.1: Compute the feasibility P1 and P2 that a packet is sent to the router in the next row or column: P1=Δx+PalgorithmΔx+Δy+2Palgorithm {P_1} = {{\Delta x + {P_{algorithm}}} \over {\Delta x + \Delta y + 2{P_{algorithm}}}} P2=Δy+PalgorithmΔx+Δy+2Palgorithm {P_2} = {{\Delta y + {P_{algorithm}}} \over {\Delta x + \Delta y + 2{P_{algorithm}}}}

Where Palgorithm denotes the total number of pathways between a packet's source and destination node, in our proposed SmoT model, Palgorithm is set to 2.

Step2.2.2: Calculate the congestion coefficient for neighbor routers other than the one where the packet was sent, using the equation below: Coni=Pi×Buf_SIZEout1+Buf_SIZEout22 Co{n_i} = {P_i} \times {{Buf\_SIZ{E_{out1}} + Buf\_SIZ{E_{out2}}} \over 2}

Where Coni is the produced congestion coefficient, Pi denotes the probability that a packet is sent to the row router or column router, i=1, 2; Buf_SIZEout1 and Buf_SIZEout2 reflect the number of available buffers in the current output port as well as the number of available buffers in the next available output port.

Step2.2.3: Choose a router with the optimal Coni as the output stop for the packet, along with its output port.

Step2.3: If Δx=0 and Δy≠0, which indicates the source and destination nodes are in identical rows, identify the relevant output port for the row router as the next stop.

Step2.4: If Δx≠0 and Δy=0, which indicates the source and destination nodes are in identical columns, identify the relevant output port for the row router as the next stop.

Step2.5: If Δx=0 and Δy=0, which indicates the source and destination nodes belong to a single router, if Core_id=0, forward the incoming packet to L_Core; if Core_id=1, forward the incoming packet to R_Core.

Step2.6: When two or more packets fight for the same output port, the packet with the larger number of hops wins.

Step3: Forward the packet via the determined output port.

Meanwhile, the algorithm for l-th stem routers is presented in Figure 4. Specifically, when a packet comes in, compute the target router in level l. if the present router's location is not the same as the address of the target router, select (l+1)-th router or l-th neighbor one as the next stop, with the condition of current router's level. If not, calculate the address of the target router in level l–1, forward the packet to L_Router or R_Router based on the target router.

Figure 4.

The flow diagram of routing algorithm for leaf routers

Experiment and evaluation
Experimental setup

In this section, we choose two metrics to evaluate the performance of SMoT, i.e., the power consumption and network latency. These two metrics are well adopted in previous works.

In the evaluation of performance, power consumption is a significant factor. It will be meaningful if we can lower it while obtain comparable performance. Overall, the power comsumption is computed as follows: Network_power=Pdynamic+Pstatic Network\_power = {P_{dynamic}} + {P_{static}}

The time it takes for a packet to travel from its source to its destination is known as network latency in NoC. One of the primary performance concerns we utilize here is the average latency.

Network_latency=Total_Cycles/Total_Flits Network\_latency = Total\_Cycles/Total\_Flits
The experimental environment

We conduct experiments with the Gem5 simulator, which is a public well-known platform. We choose the PARSEC dataset for evaluation following the the common practice in prior stuides. To be concrete, the details of experimental setup is depicted in Table 1.

Environmental setup

Processor configuration
CPU_TYPE Timing Timing
CPU_NUMS 16 64
NoC configuration
TOPOLOGY Mesh MoT SMoT
SCHEDULING_POLICY xy routing scheme deterministic routing scheme path-wise routing scheme
BENCHMARK PARSEC PARSEC PARSEC
Experimental results

The network power comparison is presented in Figures 5(a) and 5(b). As is observed, our proposed SMoT has apparently gained lower power consumption. Specifically, it reduces the power consumption by 5.39%–23.3% compared to the original MoT. We argue this is mainly due to two fundamental factors. On one hand, leveraging short paths between the subnets is able to decrease the number of hotspots, since most routers hold more ports compared to the root ones, packets would obtain more choices for paths. On the other hand, the power consumption is directly reduced with less routers.

Figure 5.

(a): Network power for 16 cores; (b): Network power for 64 cores

Meanwhile, the results of network latency is shown in Figures 6(a) and 6(b). It is noticed that the network latency of the proposed SMoT was decreased by 4.23%–27.28% than a typical MoT. We believe the reason is that the path-wise scheme improves the network efficiency and further decreases the latency caused by large number of hop counts.

Figure 6.

(a): Network latency for 16 cores, (b): Network latency for 64 cores

Overall, the above experimental results verify that the proposed algorithm obtains less power consumption and lower network latency while maintaining better performance compared to previous studies. On one hand, leveraging short paths between the subnets is able to decrease the number of hotspots, since most routers hold more ports compared to the root ones, packets would obtain more choices for paths. On the other hand, the power consumption is directly reduced with less routers.

Conclusions

The architecture design plays an important role in the performance of NoC applications. In this work, we propose a SMoT architecture along with a path-wise scheme which takes into account both network topology and the congestion coefficient. Experimental results demonstrate that our proposed model obtains better performance while reducing the power consumption and network latency effectively and efficiently.

eISSN:
2470-8038
Lingua:
Inglese
Frequenza di pubblicazione:
4 volte all'anno
Argomenti della rivista:
Computer Sciences, other