Comparative Analysis of Heterogeneous Adders: Evaluating Performance across 12-bit, 14-bit, and 16-bit Configurations

Data and signal processors, along with controller-based systems, heavily depend on data manipulation for their functionality. Arithmetic operations, including addition, subtraction, multiplication, and division, are fundamental and widely utilized in various digital systems. These operations play a crucial role in architectures such as DSPs, microprocessors, microcontrollers, and data processing units. Among these operations, adders stand out as indispensable components in arithmetic circuits, particularly in processor-based systems [1].

To mitigate challenges in digital design, the implementation of efficient low-power and area design techniques becomes essential. Area-efficient designs not only result in smaller chip sizes but also contribute to lower costs and reduced device weight, making electronic devices more portable and convenient. The incorporation of these techniques aims to enhance the overall reliability of circuits by lowering the average power consumption. This reduction in power consumption not only improves circuit reliability but also diminishes cooling requirements, subsequently leading to decreased packaging and cooling costs [2].

Reference [3] demonstrates that the section-carry based carry look ahead adder (SCBCLA) design exhibits better optimization compared to the conventional carry look ahead adder (CCLA) design. Achieving superior optimization in design metrics is facilitated by the use of a heterogeneous Carry Look-Ahead (CLA) architecture when compared to a homogeneous CLA architecture [4]. The study in [5] focuses on the flexibility of reconfigurable design, exploring various combinations of adder variants to achieve low power consumption and a small area for enhanced design performance.

In this paper, diverse adder architectures are formulated with a focus on heterogeneity, and a comparative analysis is conducted based on area and power. The comparison draws on simulation results and on implementing the designs in VIVADO 2017.1. The heterogeneous adders are described in VHDL, a hardware description language [6]. VHDL allows various abstraction levels to be integrated within the same model, offering a versatile design flow for the heterogeneous adder structures.

The paper is organized as follows: Section 2 reviews related work, providing context and insights from existing research. Section 3 presents a concise explanation of the proposed model. Section 4 details the performance evaluation, covering the assessment criteria and results. Lastly, Section 5 concludes the paper, summarizing the key findings and implications of the study.

2

Related Work

2.1

RCA: Homogeneous Adder

The construction of an RCA involves the sequential connection of full adders, wherein the carry output of one full adder serves as the carry input of the subsequent stage. The full adder is the fundamental building block of this design, so an n-bit parallel adder requires n full adders. The architecture is termed an “RCA” because the carry bits propagate sequentially through each successive full adder. While an RCA is simple and quick to design, its operational speed is comparatively modest, because each full adder depends on the carry bit computed by the preceding full adder, which makes the computation sequential [7].

In Fig. 1, the n-bit inputs are denoted as A and B, C₀ represents the carry input, S signifies the n-bit output, and C_n represents the carry output. The intermediate carry signals, labeled C₁, C₂, up to C_{n-1} in the diagram, are declared as signals in the VHDL code. These intermediate carries link the stages internally, so the carry output C_n is computed from the inputs and the intermediate carry values.
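
The carry chain described above can be sketched as a small bit-level model. This is an illustrative Python sketch rather than the VHDL used in the paper; the function names `full_adder` and `ripple_carry_add` and the LSB-first bit-list convention are assumptions made here for illustration.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a_bits, b_bits, c0=0):
    """Add two equal-length bit lists (LSB first) with n chained full adders.

    Returns (sum_bits, carry_out), where carry_out corresponds to C_n.
    """
    if len(a_bits) != len(b_bits):
        raise ValueError("operands must have the same width")
    carry = c0
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        # The carry out of this stage is the carry in of the next stage.
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry


# 4-bit example: 0b1011 (11) + 0b0110 (6) + carry-in 1 = 18
s, cn = ripple_carry_add([1, 1, 0, 1], [0, 1, 1, 0], c0=1)
print(s, cn)
```

The loop makes the sequential dependency explicit: stage i cannot produce its outputs until the carry from stage i-1 is known.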

The primary drawback of the RCA is that its delay increases with the bit length, which makes it less suitable for adding a large number of bits. The principal cause of this delay is carry propagation, so it is important to compute the carry delay from input to output when evaluating performance. For an n-bit RCA, the carry delay (T_c) can be estimated as

$T_{c} = T_{FA}\big((A_{0}, B_{0}) \to C_{0}\big) + (n - 2) \times T_{FA}(C_{in} \to C_{out}) + T_{FA}\big(C_{in} \to S_{out}(n - 1)\big)$ (1)

Here, T_FA denotes the delay of a full adder along the pathway from its designated input to its output [8].
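
As a rough illustration of how the carry delay scales with the bit width, the expression can be evaluated for the three widths studied here. The per-path full-adder delays below are hypothetical placeholder values, not measurements from the paper, and the function name `rca_carry_delay` is an assumption.

```python
def rca_carry_delay(n, t_ab_to_c, t_cin_to_cout, t_cin_to_sum):
    """Carry-path delay of an n-bit RCA following equation (1)."""
    if n < 2:
        raise ValueError("equation (1) assumes at least two stages")
    return t_ab_to_c + (n - 2) * t_cin_to_cout + t_cin_to_sum


# Hypothetical full-adder path delays in nanoseconds (illustrative only).
for n in (12, 14, 16):
    print(n, rca_carry_delay(n, 0.5, 0.25, 0.5))
```

Only the middle term depends on n, so the delay grows linearly with the number of stages.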

2.2

CLA: Homogeneous Adder

A CLA enhances operational speed by minimizing the time required for carry bit determination. In contrast to an RCA, where each bit must wait for the previous carry to be calculated, the CLA computes one or more carry bits earlier than the sum, thereby reducing the waiting time for computing the results of higher-value bits [9–13]. Understanding the CLA involves manipulating Boolean expressions related to a full adder [14–16]. The key components in a full adder, the propagate “P_i” and generate “G_i”, are expressed in equation (2), delineating their roles in the adder’s functionality.

$\begin{matrix} P_{i} = x_{in} \;\mathrm{xor}\; y_{in} & \text{(carry propagate)} \\ G_{i} = x_{in} \;\mathrm{and}\; y_{in} & \text{(carry generate)} \end{matrix}$ (2)

Both the propagate and generate signals depend solely on the input bits, so they are valid after a single gate delay. Equation (3) gives the resulting expressions for the sum output and the carry-out in terms of these signals [17,18].

$\begin{matrix} Sum_{out} = S_{i} = P_{i} \;\mathrm{xor}\; C_{i} \\ Carry_{out} = C_{(i + 1)} = G_{i} + P_{i} \;\mathrm{and}\; C_{i} \end{matrix}$ (3)

The equations illustrate two scenarios in which a carry signal will be produced:

(a) If both input bits, x_in and y_in, are 1.

(b) If either x_in or y_in is 1, and simultaneously, the carry-in signal is also 1.
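
The two carry scenarios can be checked exhaustively against ordinary addition. The sketch below is illustrative Python, not the authors' implementation; `propagate_generate` and `cla_bit` are hypothetical helper names.

```python
from itertools import product


def propagate_generate(x, y):
    """Return (P, G) for one bit position, as in equation (2)."""
    return x ^ y, x & y


def cla_bit(x, y, c):
    """Sum and carry-out of one bit position expressed through P and G."""
    p, g = propagate_generate(x, y)
    return p ^ c, g | (p & c)


# Compare with plain integer addition for all eight input combinations.
rows = []
for x, y, c in product((0, 1), repeat=3):
    s, cout = cla_bit(x, y, c)
    rows.append((x, y, c, s, cout))
    assert 2 * cout + s == x + y + c
```

Case (a) corresponds to G = 1, and case (b) corresponds to P = 1 together with a carry-in of 1.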

Applying the above equations to a 4-bit adder gives the following expressions [17]:

$\begin{matrix} C(1) = G(0) + P(0) * C(0) \\ C(2) = G(1) + P(1) * C(1) = G(1) + P(1) * G(0) + P(1) * P(0) * C(0) \\ C(3) = G(2) + P(2) * C(2) = G(2) + P(2) * G(1) + P(2) * P(1) * G(0) + P(2) * P(1) * P(0) * C(0) \\ C(4) = G(3) + P(3) * C(3) = G(3) + P(3) * G(2) + P(3) * P(2) * G(1) + P(3) * P(2) * P(1) * G(0) + P(3) * P(2) * P(1) * P(0) * C(0) \end{matrix}$ (4)

Similarly, the general expression is given in equation (5):

$C_{i + 1} = G_{i} + P_{i} * G_{i - 1} + P_{i} * P_{(i - 1)} * G_{(i - 2)} + \dots + P_{i} * P_{(i - 1)} \dots P_{2} * P_{1} * G_{0} + P_{i} * P_{(i - 1)} \dots P_{1} * P_{0} * C_{0}$ (5)

Fig. 2 illustrates how the structure of a CLA can be segmented into three main components: the propagate/generate generator, the sum generator, and the carry generator.
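
The three components can be mirrored in a short behavioral sketch: P/G generation, a carry generator that evaluates the expanded form of equation (5) directly from C₀, and a sum generator. This is a minimal Python illustration under the assumption that operands are passed as non-negative integers; the names `cla_carries` and `cla_add` are hypothetical.

```python
def cla_carries(a, b, c0, width):
    """Return [C_0, C_1, ..., C_width] using the expanded form of equation (5).

    Each carry is computed from the P/G signals and C_0 only, without
    reusing the previously computed carry.
    """
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]
    carries = [c0]
    for i in range(width):
        c = 0
        prefix = 1  # running product P_i * P_(i-1) * ... down to P_(j+1)
        for j in range(i, -1, -1):
            c |= prefix & g[j]
            prefix &= p[j]
        c |= prefix & c0  # final term: P_i * ... * P_0 * C_0
        carries.append(c)
    return carries


def cla_add(a, b, c0, width):
    """Sum generator: S_i = P_i xor C_i; returns (sum, C_width)."""
    carries = cla_carries(a, b, c0, width)
    total = 0
    for i in range(width):
        p_i = ((a >> i) ^ (b >> i)) & 1
        total |= (p_i ^ carries[i]) << i
    return total, carries[width]


print(cla_add(0b1011, 0b0110, 1, 4))
```

The inner loop grows with the bit position, which reflects why wide single-level CLA carry logic becomes costly in gates.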

3

Proposed Model

In this study, heterogeneous adders of 12-bit, 14-bit, and 16-bit configurations are built from diverse combinations of m-bit CLAs and n-bit RCAs, as outlined in Table 1. The design and evaluation are executed in VIVADO 2017.1, encompassing simulation, RTL synthesis, and implementation phases to generate power summary and utilization summary reports [19–22]. The escalating need for low-power components in diverse computing applications, as highlighted in [19], emphasizes flexibility and portability. This paper presents the implementation of heterogeneous adder designs with a focus on identifying the design with the lowest power consumption. Additionally, utilization summary reports are generated to compare area performance and to determine the design with the least area utilization. The internal structure of the proposed model is depicted in Fig. 3.

Table 1.

Possible combinations

12-Bit

m-bit CLA	n-bit RCA
4	8
6	6
8	4
10	2

14-Bit

m-bit CLA	n-bit RCA
4	10
6	8
8	6
10	4
12	2

16-Bit

m-bit CLA	n-bit RCA
4	12
6	10
8	8
10	6
12	4
14	2
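
A behavioral sketch of the combinations in Table 1 is given below. It is illustrative Python, not the VHDL design of the paper. The text does not state which block handles the low-order bits, so placing the RCA on the low bits and feeding its carry-out into the CLA is an assumption made here; `add_block`, `heterogeneous_add`, and `combinations` are hypothetical names.

```python
def add_block(a, b, cin, width):
    """Behavioral model of one width-bit adder block (RCA and CLA compute the same function)."""
    mask = (1 << width) - 1
    total = (a & mask) + (b & mask) + cin
    return total & mask, total >> width


def heterogeneous_add(a, b, cin, rca_bits, cla_bits):
    """Assumed arrangement: low rca_bits in the RCA block, upper cla_bits in the CLA block."""
    low_sum, carry_mid = add_block(a, b, cin, rca_bits)
    high_sum, carry_out = add_block(a >> rca_bits, b >> rca_bits, carry_mid, cla_bits)
    return (high_sum << rca_bits) | low_sum, carry_out


# CLA widths listed in Table 1 for each total width; the RCA takes the remaining bits.
TABLE_1 = {12: (4, 6, 8, 10), 14: (4, 6, 8, 10, 12), 16: (4, 6, 8, 10, 12, 14)}


def combinations(total_bits):
    """Return the (m-bit CLA, n-bit RCA) pairs of Table 1 for a given width."""
    return [(m, total_bits - m) for m in TABLE_1[total_bits]]


for m, n in combinations(12):
    print(m, n, heterogeneous_add(0x74B, 0xCA9, 1, rca_bits=n, cla_bits=m))
```

Every partition produces the same sum and carry-out, so the combinations differ only in implementation cost, which is what the power and utilization reports compare.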

4

Performance Evaluation

The primary challenge for any VLSI designer lies in creating systems that operate at higher speeds while consuming minimal power and occupying minimal physical area. Therefore, accurate performance estimation becomes indispensable in the design process. This involves predicting and optimizing factors such as speed, power consumption, and area utilization to achieve an optimal balance and meet the specific requirements of the given application or system.

4.1

Power Dissipation

Static CMOS gates are known for their low power dissipation in idle or static conditions. In static CMOS, power dissipation mainly consists of two components: static power dissipation and dynamic power dissipation.

$P_{Total} = P_{static} + P_{dynamic}$ (6)

Static dissipation: $P_{static} = I_{static} V_{DD}$

Dynamic dissipation: $P_{dynamic} = \alpha C V_{DD}^{2} f$

Here, the activity factor α equals 1 for a clock signal, since it rises and falls in every cycle. In the VIVADO tool, the dynamic power is reported as the sum of the signal, logic, and I/O power:

$P_{dynamic} = P_{Signals} + P_{Logic} + P_{IO}$ (7)
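
The power relations above can be exercised with a few lines of Python. The component values are taken from the 12CLA row of Table 2; the capacitance, voltage, and frequency passed to `switching_power` are hypothetical placeholders, and both function names are assumptions made for this sketch.

```python
def dynamic_power(p_signals, p_logic, p_io):
    """Dynamic power as the sum of the three components in equation (7)."""
    return p_signals + p_logic + p_io


def switching_power(alpha, c_load, v_dd, freq):
    """P_dynamic = alpha * C * V_DD^2 * f."""
    return alpha * c_load * v_dd ** 2 * freq


# 12CLA row of Table 2 (all values in W).
p_total = 8.279
p_d = dynamic_power(0.292, 0.084, 7.801)
p_static = p_total - p_d  # implied by equation (6)
print(round(p_d, 3), round(p_static, 3))

# Hypothetical node: 1 pF switched at 100 MHz from a 1.0 V supply, alpha = 1.
print(switching_power(1.0, 1e-12, 1.0, 100e6))
```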

4.2

Delay Estimation

In digital design, critical paths are the paths within a circuit that impose the most significant constraints on the overall timing performance. These paths have the longest propagation delays and therefore determine the maximum achievable operating frequency of the circuit. For an RCA,

$t_{Ripple} = t_{pg} + (N - 1) t_{AO} + t_{xor}$ (8)

where t_pg is the delay of the 1-bit generate/propagate gates, t_AO is the AND_OR gate delay, and t_xor is the delay of the final-sum XOR.

The delay of a CLA can be described in terms of k groups of n bits each, where k is the number of groups into which the adder is divided. The critical path includes the calculation of the carry-out of each group, which is expressed in (9) [23–25].

$t_{CLA} = t_{PG} + t_{PG(n)} + [(n - 1) + (k - 1)] t_{AO} + t_{xor}$ (9)

where t_PG(n) is the AND_OR… gate delay to calculate n number of generated signals.
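
To compare the two delay estimates side by side, the expressions can be evaluated with unit gate delays. All delay values below are hypothetical, chosen only to show the trend, and the group count is written as `k` following the "k groups of n bits" wording in the text; `ripple_delay` and `cla_delay` are hypothetical names.

```python
def ripple_delay(n_bits, t_pg, t_ao, t_xor):
    """Equation (8): t_ripple = t_pg + (N - 1) * t_AO + t_xor."""
    return t_pg + (n_bits - 1) * t_ao + t_xor


def cla_delay(n, k, t_pg, t_pg_n, t_ao, t_xor):
    """Equation (9) for k groups of n bits each."""
    return t_pg + t_pg_n + ((n - 1) + (k - 1)) * t_ao + t_xor


# Hypothetical gate delays in arbitrary units (illustrative only).
T_PG, T_PG_N, T_AO, T_XOR = 1, 3, 2, 1

print(ripple_delay(16, T_PG, T_AO, T_XOR))            # 16-bit ripple chain
print(cla_delay(4, 4, T_PG, T_PG_N, T_AO, T_XOR))     # 4 groups of 4 bits
```

Under these assumed delays the AND_OR term is multiplied by 15 for the ripple adder but only by 6 for the grouped CLA, which is the source of the speed advantage.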

4.3

12-bit Adder Presentation

$\frac{\begin{array}{l} A \Rightarrow 011101001011 \\ B \Rightarrow 110010101001 \\ C_{in} \Rightarrow +1 \end{array}}{Sum \Rightarrow 1001111110101}$

The different combinations of the 12-bit heterogeneous adder are simulated and synthesized using the VIVADO tool, and the results are tabulated.
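
The 12-bit test vector above can be reproduced with a quick integer check. This is only a sanity check in Python, not part of the VIVADO flow.

```python
a = int("011101001011", 2)
b = int("110010101001", 2)
cin = 1

total = a + b + cin
sum_bits = format(total, "013b")  # 12 sum bits preceded by the carry-out bit
carry_out = total >> 12

print(sum_bits)  # prints 1001111110101
```

The leading 1 of the 13-bit result is the carry-out of the 12-bit adder.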

4.4

14-bit Adder Presentation

$\frac{\begin{array}{l} A \Rightarrow \mathrm{3B3D} \\ B \Rightarrow \mathrm{CB3A} \\ C_{in} \Rightarrow +1 \end{array}}{Sum \Rightarrow \mathrm{1287A}}$

The different combinations of the 14-bit heterogeneous adder are simulated and synthesized using the VIVADO tool, and the results are tabulated.

4.5

16-bit Adder Presentation

$\frac{\begin{array}{l} A \Rightarrow \mathrm{B790} \\ B \Rightarrow \mathrm{8763} \\ C_{in} \Rightarrow +1 \end{array}}{Sum \Rightarrow \mathrm{13EF4}}$

The different combinations of the 16-bit heterogeneous adder are simulated and synthesized using the VIVADO tool, and the results are tabulated.
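
The 16-bit test vector can be followed digit by digit in hexadecimal, starting from the least significant digit with the carry-in of 1. The layout below is an illustrative worked check, not output from the tool.

```latex
\begin{aligned}
0_{16} + 3_{16} + 1 &= 4_{16}  && \text{(carry 0)} \\
9_{16} + 6_{16} + 0 &= F_{16}  && \text{(carry 0)} \\
7_{16} + 7_{16} + 0 &= E_{16}  && \text{(carry 0)} \\
B_{16} + 8_{16} + 0 &= 13_{16} && \text{(digit 3, carry-out 1)}
\end{aligned}
```

Reading the digits from the bottom row up gives 13EF4, with the leading 1 being the carry-out of the 16-bit adder.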

4.6

Tabulation and Comparison

The power and area analyses of the 12-bit adder with different combinations are presented in Tables 2 and 3, respectively.

Table 2.

Power calculation of 12-bit adder

Types	Total Power (W)	Signals Power P_S (W)	Logic Power P_L (W)	IO Power P_I (W)	Dynamic Power P_D = P_S+P_L+P_I (W)
12CLA	8.279	0.292	0.084	7.801	8.177
2RCA+10CLA	8.161	0.278	0.080	7.701	8.059
4RCA+8CLA	8.141	0.279	0.074	7.686	8.039
6RCA+6CLA	8.145	0.282	0.078	7.683	8.043
8RCA+4CLA	8.145	0.283	0.077	7.683	8.043
12RCA	8.280	0.293	0.084	7.801	8.178

Table 3.

Area analysis of 12-bit adder

Types	LUT (41000)	Slice (10250)	Cells	Nets
12CLA	12	4	29	97
2RCA+10CLA	11	5	61	48
4RCA+8CLA	10	5	46	80
6RCA+6CLA	10	4	38	72
8RCA+4CLA	10	5	30	64
12RCA	12	4	12	49

From Table 2, it is observed that the heterogeneous adder designed using a 4-bit RCA and an 8-bit CLA has the least total on-chip power consumption of 8.141 W. In Table 3, the heterogeneous adder (6-bit CLA + 6-bit RCA) has the fewest slices among the heterogeneous combinations and, hence, the least area utilization.
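
The selection described above amounts to taking the minimum over the heterogeneous rows of Tables 2 and 3. A minimal Python sketch follows; the tuple layout and the variable names are assumptions made for illustration, while the numbers are copied from the two tables.

```python
# (type, total power in W from Table 2, slices from Table 3)
ROWS_12BIT = [
    ("12CLA", 8.279, 4),
    ("2RCA+10CLA", 8.161, 5),
    ("4RCA+8CLA", 8.141, 5),
    ("6RCA+6CLA", 8.145, 4),
    ("8RCA+4CLA", 8.145, 5),
    ("12RCA", 8.280, 4),
]

# Only rows that mix RCA and CLA blocks are heterogeneous.
heterogeneous = [row for row in ROWS_12BIT if "+" in row[0]]

best_power = min(heterogeneous, key=lambda row: row[1])
best_area = min(heterogeneous, key=lambda row: row[2])

print(best_power[0], best_area[0])  # prints 4RCA+8CLA 6RCA+6CLA
```

The same pattern applies to the 14-bit and 16-bit tables by swapping in their rows.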

The power and area analyses of the 14-bit adder with different combinations are tabulated in Tables 4 and 5, respectively.

Table 4.

Power calculation of 14-bit adder

Types	Total Power (W)	Signal Power P_S (W)	Logic Power P_L (W)	IO Power P_I (W)	Dynamic Power P_D = P_S+P_L+P_I (W)
14CLA	9.865	0.523	0.087	9.146	9.756
2RCA+12CLA	9.510	0.352	0.092	8.959	9.403
6RCA+8CLA	9.489	0.349	0.092	8.942	9.383
8RCA+6CLA	9.483	0.345	0.089	8.942	9.376
10RCA+4CLA	9.482	0.345	0.089	8.941	9.375
4RCA+10CLA	9.495	0.341	0.086	8.944	9.371
14RCA	9.623	0.360	0.096	9.059	9.515

Table 5.

Area analysis of 14-bit adder

Types	LUT (41000)	Slice (10250)	Cells	Nets
14CLA	14	7	48	128
2RCA+12CLA	13	6	64	104
6RCA+8CLA	12	5	48	88
8RCA+6CLA	12	6	40	80
10RCA+4CLA	12	6	32	72
4RCA+10CLA	12	6	71	111
14RCA	14	5	14	57

From Table 4, it is observed that the 10-bit RCA + 4-bit CLA heterogeneous adder has the least total on-chip power consumption of 9.482 W. In Table 5, the heterogeneous adder (6-bit RCA + 8-bit CLA) has the fewest slices among the heterogeneous combinations and, hence, the least area utilization.

The power and area analyses of the 16-bit adder with different combinations are given in Tables 6 and 7, respectively.

Table 6.

Power calculation of 16-bit adder

Types	Total Power (W)	Signal Power P_S (W)	Logic Power P_L (W)	IO Power P_I (W)	Dynamic Power P_D = P_S+P_L+P_I (W)
16CLA	10.971	0.422	0.118	10.319	10.859
2RCA+14CLA	10.848	0.410	0.108	10.218	10.736
4RCA+12CLA	10.827	0.408	0.104	10.202	10.714
6RCA+10CLA	10.830	0.410	0.108	10.200	10.718
8RCA+8CLA	10.833	0.415	0.106	10.200	10.721
10RCA+6CLA	10.828	0.412	0.105	10.200	10.717
12RCA+4CLA	10.832	0.416	0.104	10.199	10.719
16RCA	10.962	0.418	0.114	10.316	10.848

Table 7.

Area analysis of 16-bit adder

Types	LUT (41000)	Slice (10250)	Cells	Nets
16CLA	16	7	37	129
2RCA+14CLA	15	7	74	120
4RCA+12CLA	14	7	83	129
6RCA+10CLA	14	6	58	104
8RCA+8CLA	14	7	50	96
10RCA+6CLA	14	7	42	88
12RCA+4CLA	14	7	34	80
16RCA	16	6	16	65

From Table 6, it is observed that the 4-bit RCA + 12-bit CLA heterogeneous adder architecture design has the least total on-chip power consumption of 10.827 W.

In Table 7, the heterogeneous adder (6-bit RCA + 10-bit CLA) has the fewest slices, hence the least area utilization. Finally, Tables 8 and 9 summarize the best combinations for power and area analysis of 12-bit, 14-bit, and 16-bit heterogeneous adders, respectively.

Table 8.

Best power analysis of 12-bit, 14-bit & 16-bit heterogeneous adders

Types	Combinations	Total Power (in W)
12-bit	4RCA+8CLA	8.141
14-bit	10RCA+4CLA	9.482
16-bit	4RCA+12CLA	10.827

Table 9.

Best area analysis of 12-bit, 14-bit, and 16-bit heterogeneous adders

Types	Combinations	Slice
12-bit	6RCA+6CLA	4
14-bit	6RCA+8CLA	5
16-bit	6RCA+10CLA	6
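
To put the best-case figures in perspective, the total power of each best heterogeneous combination can be compared with the homogeneous CLA and RCA rows of Tables 2, 4, and 6. The sketch below is an illustrative Python calculation on the tabulated values; `saving_percent` is a hypothetical helper name.

```python
# Total on-chip power in W: (homogeneous CLA, homogeneous RCA, best heterogeneous)
POWER = {
    12: (8.279, 8.280, 8.141),
    14: (9.865, 9.623, 9.482),
    16: (10.971, 10.962, 10.827),
}


def saving_percent(reference, best):
    """Relative reduction of `best` with respect to `reference`, in percent."""
    return 100.0 * (reference - best) / reference


for width, (cla, rca, best) in POWER.items():
    print(width, round(saving_percent(cla, best), 2), round(saving_percent(rca, best), 2))
```

With the tabulated values, the reductions are on the order of one to four percent of the total on-chip power.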

5

Conclusion

This paper presents the design and analysis of 12-bit, 14-bit, and 16-bit heterogeneous adders, exploring several combinations of m-bit CLA and n-bit RCA blocks. The optimal model for power is the one with the least total on-chip power, and the optimal model for area is the one with the fewest slices. For 12-bit adders, the 8-bit CLA + 4-bit RCA combination exhibits the least total on-chip power, and the 6-bit CLA + 6-bit RCA combination demonstrates the least area utilization. For 14-bit adders, the 4-bit CLA + 10-bit RCA combination shows the least total on-chip power, and the 8-bit CLA + 6-bit RCA combination exhibits the least area utilization. For 16-bit adders, the 12-bit CLA + 4-bit RCA combination demonstrates the least total on-chip power, and the 10-bit CLA + 6-bit RCA combination has the least area utilization. These results provide insight into the power and area efficiency of different heterogeneous adder architectures across bit widths and aid the selection of a design for a given performance criterion. Therefore, the proposed heterogeneous adders with an appropriate combination of RCA and CLA can be implemented for high-speed, low-area, and low-power operation in advanced digital system applications.

Shasanka Sekhar Rout

Rajesh Kumar Patjoshi

Sarmila Garnaik

Ranjita Rout

Published Online: Feb 20, 2025

Page range: 136 - 145

DOI: https://doi.org/10.2478/ias-2024-0010

Keywords: CLA, DSP, Heterogeneous adder, RCA, RTL, VIVADO, Xilinx

© 2024 Shasanka Sekhar Rout et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
