#### **International Journal of Research** Available at https://edupediapublications.org/journals e-ISSN: 2348-6848 p-ISSN: 2348-795X Volume 05 Issue-01 January 2018 ## VLSI Design of a Novel Wallace Tree Multiplier for an FIR Filter Ahtesham Ali Mir & Towfeeq Fairooz mirali@tu.edu.sa & towfeeq.fairooz@gmail.com <sup>1,2</sup> Department of Computer Engineering, TAIF UNIVERSITY, K.S.A. Abstract: The Wallace tree multiplier is considered as faster, than a simple array multiplier and is an effective application of a digital circuit which multiplies two integers. A Wallace tree multiplier is a parallel multiplier that uses the carry save addition algorithm to reduce the quiescene. There are numerous researchers dealt on the design progressively more efficient multipliers. The main purpose is to accomplish higher speed and lower power consumption even while occupying condensed silicon region. Wallace tree multiplier are rapid multipliers, uses full and half adder .As far as range and power the execution of XOR-XNOR gates and MUX effective. The proposed Wallace multiplier technique is far superior compared to traditional method. In this paper we present FIR filter implementation of Wallace multiplier, as the Extension. **Keywords:** Wallace tree multiplier, Multiplexer, Full adder, Half adder. #### **I.INTRODUCTION** Multiplication is one amongst the more area consuming arithmetic operations in high-yielding circuits. As a result frequent research works were dealt with low power design for high speed multipliers. Multiplication involves two vital operations, the generation of the partial products and their sum, performed using two kinds of multiplication algorithms, serial and parallel. multiplication algorithms sequential circuits with feedbacks: inner products are sequentially produced and computed. Parallel multiplication algorithms often use combinational circuits and do not contain feedback structures. Multiplication of two bits produces an output which is twice that of the original bit It is typically expected to truncate the partial product bits to the expected accuracy to decrease territory cost. Fixed width multipliers, a subset of truncated multipliers, process only n most significant bits (MSBs) of the 2n-bit product for $n \times n$ multiplication and utilize additional correction/compensation circuits to decrease errors (truncation errors). This approach together considers the tree reduction, truncation, and rounding of the PP bits during design of fast parallel truncated multipliers so that the last truncated product fulfills the accuracy prerequisite. In our approach truncation error is not more than 1ulp (unit of least position), so there is no need of error compensation circuits, and the last output will be précised. - Merits: Compared to dada multiplication delay is very high in Wallace multiplication. - Power utilization in the Wallace multiplier is high. ## R #### **International Journal of Research** Available at https://edupediapublications.org/journals e-ISSN: 2348-6848 p-ISSN: 2348-795X Volume 05 Issue-01 January 2018 A multiplier designed by using Wallace tree architecture is known as a Wallace multiplier. A typical Wallace multiplier and a reduced complexity Wallace multiplier are two architectures among them. In this paper design and performance analysis typical Wallace tree multiplier and a reduced complexity Wallace multiplier are examined. Performance analysis is carried out by using Xilinx ISE design. Figure 1: Wallace tree multiplier Structure From Fig.1. the 3 rows are summed using FAs and if there are 2 speck (dots) in a specific column, half adders are used. The emerged sum and carry signals from the half and full adders are passed to the next stage. Figure 2: Block diagram of Wallace tree multiplier Fig. 2 demonstrates the block diagram of a Wallace multiplier. In this multiplier design, generation of the partial product, accumulation of partial product and final addition are done in various stages. In the last stage, final addition is done. The number of rows of partial product in a particular stage can be expressed as $Ri+1 = 2(Ri/3) + Ri \mod 3$ where, Ri gives the groups or stages and R0 = N = number of bits. Let us consider an illustration the N bits multiplication, N2 AND gates are required to generate the partial product terms and the number to decrease stages is given by S. $s = log 2^N$ . ### II. CONVENTIONAL FULL ADDER In an ordinary Wallace multiplier, partial products are created first. Then these are gathered in different stages. The method is recasted until the point where last stage contains just two lines. A 4x4 bit traditional Wallace multiplier is arranged by using Xilinx. The flowchart for arranging a traditional Wallace multiplier is showed up in Fig.2. Design of the general Wallace multiplier is done in three phases: - i) Partial product generation - ii) Accumulation of partial product - iii) Final addition Fig 3: Conventional Wallace tree multiplier ### R #### **International Journal of Research** Available at <a href="https://edupediapublications.org/journals">https://edupediapublications.org/journals</a> e-ISSN: 2348-6848 p-ISSN: 2348-795X Volume 05 Issue-01 January 2018 In the reduction phase the traditional Wallace tree multiplier uses full adder, high power consumption is caused due to bottleneck of full adder. In this paper the proposed and the existing multiplier designs are developed using Verilog HDL for 8 and 16 bits, respectively. The performance of the 8-bit proposed Wallace tree multiplier is verified through simulations using cadence tool. #### III. PROPOSED DESIGN MUX based full adder. The full adder is executed using 4:1 multiplexers as shown in fig. 3. MUX based full adder implementation in the reduction phase of a Wallace tree multiplier reduced Power. Figure 4: 4:1 MUX based Full Adder In order to reduce the power and area, the conventional full adder in reduction phase of Wallace tree multiplier is reintegrated by a modified full adder. It is noticeable that, one 4:1 MUX can be made using three 2:1 MUX. The critical path delay can be written as shown in figure 4 Delay = $NOT\Delta + 2 * MUX\Delta$ The Wallace tree multiplier can be made more effective by further reducing the critical path delay. The same can be accomplished by utilizing proposed full adder. Every multiplexer has been acknowledged by utilizing two transistors, for the acknowledgment of the full viper circuit. The main cause of power dissipation in any circuit is short circuit current, leakage current and logic transition. In this circuit there is no probability of direct path formation between source and ground. Figure 5: Proposed full adder In Wallace tree structure, the partial products are divided into certain specific levels. In each level, whenever there are three bits, full adders are used. Out of the three inputs, one input and its complement is provided as inputs to the first multiplexer. The other two inputs are given to XOR gate, the output of which will act as a select line to both the multiplexers. The inputs of the second multiplexer are the bits other than the carry bit. This unique way of designing leads to the decrement of the switching activity, which in turn decreases the power. Additionally, the critical path delay is also reduced compared to the existing designs discussed in literature, leading to increasing of speed. Operation of the proposed full adder can be clarified as - a) When B and C = 0 or 1 then sum = A; - b) When B = 0 or C = 1 vice versa then sum=A; - c) When B and C = 0 or 1 carry= B; ## R #### **International Journal of Research** Available at <a href="https://edupediapublications.org/journals">https://edupediapublications.org/journals</a> e-ISSN: 2348-6848 p-ISSN: 2348-795X Volume 05 Issue-01 January 2018 d) When B = 0 or C = 1 vice versa then carry=A. #### IV. EXTENSION. Finite impulse response (FIR) filters are broadly used filters in DSP applications. This paper depicts a way to deal with implementation of low power digital FIR filter based on field programmable gate arrays (FPGAs). The advantages of the FPGA approach to digital filter implementation include, higher sampling rates than those are available from traditional DSP chips, lower costs than an ASIC for moderate volume applications, and more flexibility than the alternate approaches. Firstly, a single MAC unit is designed, with suitable geometries that give optimized power, area and delay. Then N no. of MAC units are controlled and designed for low power using a control logic that enables each stage at appropriate time. Multiply -Accumulator unit has become one of the most important building blocks in digital signal processing applications such as digital filtering, video coding, speech processing, and cellular phone. Filters are the most important parts of digital signal processing applications. Filters have two uses, one is signal separation and the other is signal restoration. Signal separation is used only when the signal is mixed with noise or other unwanted signals. Signal restoration is used only when the signal has been distorted. In general filtering is described by simple convolution operation such as $$y[n] = x[n] * h[n] = \sum_{k=0}^{n} x[n] * h[n-k] = \sum_{k=0}^{n} h[k] * x[n-k]$$ $$y[n] = x[n] * h[n] = \sum_{k=0}^{L-1} h[k] * x[n-k]$$ Where L is the length of FIR filter, h(n) is filters impulse response coefficients, x(n) is input sequence and y(n) is output of FIR filter. The above equations can also expressed in Z domain as $$Y(Z)=X(Z)H(Z)$$ ..... Fig. 6 FIR Filter Structure Structure of FIR filter is shown in Figure 6. Normally the filter structure has one delay element, one multiplier and an adder for each number of stages. This complete element is known as tap. The number of stages is depending upon the length of the filter. A Wallace tree multiplier is an efficient hardware implementation of a digital circuit that multiplies two integers. Wallace tree reduces the number of partial products and use carry select adder for the addition of partial products. Wallace tree is known for their favorable computation time, when adding multiple operands to two outputs using 3:2 or 4:2 compressors or both. Wallace tree guarantees the lowest whole delay. #### **International Journal of Research** Available at https://edupediapublications.org/journals e-ISSN: 2348-6848 p-ISSN: 2348-795X Volume 05 Issue-01 January 2018 Fig. 7 .8 bit Wallace tree multiplier In fig. 7 blue circles represent full adder and red circle represent the half adder. Wallace tree has three steps:- - 1. Multiply each bit of multiplier with same bit position of multiplicand. Depending on the position of the multiplier bits generated partial products have different weights. - 2. Reduce the number of partial products to two by using layers of full and half adders. - 3. After second step we get two rows of sum and carry, add these rows with conventional adders. #### V. RESULTS ### Fir Wallace: Simulation | Name | Value | | 2,999,995 ps | 2,999,996 ps | 2,999,997 ps | 2,999,998 ps | 2,999,999 ps | |--------------|----------------|------------------|--------------|--------------|--------------|--------------|--------------| | Uni ci | 0 | | | | | | | | Un c2 | 0 | | | | | | | | Va a | 0 | | | | | | | | Uni c4 | 0 | | | | | | | | Va c5 | 0 | | | | | | | | Ve 6 | 0 | | | | | | | | Un σ | 0 | | | | | | | | Up as | 0 | | | | | | | | ▶ ¶ p1[7:0] | 00000000 | | | 0000 | 0000 | | | | ▶ ₱ p2(7:0) | 00000000 | | | 0000 | 0000 | | | | ▶ ₹ p3(7:0) | 00000000 | | | 0000 | 0000 | | | | ▶ 11[15:0] | 00000000000000 | | | 00000000 | 00000000 | | | | ▶ ₹ r2(15:0) | 00000000000000 | | | 00000000 | 00000000 | | | | ▶ 🦬 r3(15:0) | 00000000000000 | | | 00000000 | 00000000 | | | | ▶ ¶ r4[15:0] | 00000000000000 | | | 00000000 | 00000000 | | | | ▶ ■ r5(15:0) | 00000000000000 | | | 00000000 | 00000000 | | | | | | X1: 3,000,000 ps | | | | | | RTL #### **Design Summary** | | • | | | | | |-----------------------------------------------|------|-----------|-------------|--|--| | Device Utilization Summary (estimated values) | | | | | | | Logic Utilization | Used | Available | Utilization | | | | Number of Slices | 495 | 4656 | 10% | | | | Number of Slice Flip Flops | 171 | 9312 | 1% | | | | Number of 4 input LUTs | 940 | 9312 | 10% | | | | Number of bonded IOBs | 51 | 232 | 21% | | | | Number of GCLKs | 1 | 24 | 4% | | | #### Wallace tree: #### **Simulation** | Name | Value | 2,200 ns 2,400 ns 2,600 ns 2,800 ns | |-------------------|---------------|----------------------------------------| | ▶ <b>■</b> a[7:0] | 10101010 | 10101010 | | ▶ ■ b[7:0] | 10101011 | 10101011 | | ► ■ c[15:0] | 011100011000: | 0111000110001110 | | ► 😽 c1[7:0] | 01101110 | 01101110 | | ► € c2[7:0] | 01100100 | 01100100 | | ► ■ (3[7:0] | 01101110 | 01101110 | | ► ¶ c4[7:0] | 01100100 | 01100100 | | lle c5 | 1 | | | ▶ ₹ temp2[8:0] | 100011000 | 100011000 | | ► ₩ Cxhdl1[15:0] | 011100011000: | 0111000110001110 | | | | | | | 1 | | | | | | | | | | | | | | | | | | | | | X1: 3,000,000 ns | #### RTL **Design Summary** #### **International Journal of Research** Available at https://edupediapublications.org/journals e-ISSN: 2348-6848 p-ISSN: 2348-795X Volume 05 Issue-01 January 2018 | Device Utilization Summary (estimated values) | | | | | | |-----------------------------------------------|------|-----------|-------------|--|--| | Logic Utilization | Used | Available | Utilization | | | | Number of Slices | 84 | 4656 | 1% | | | | Number of 4 input LUTs | 148 | 9312 | 1% | | | | Number of bonded IOBs | 32 | 232 | 13% | | | #### VI.CONCLUSION In this paper, a multiplexer using full adder of Wallace tree multiplier and XOR gate gets modified so that the area can be reduced. In reduction phase by implementing a modified full adder of Wallace tree an average power, area and delay is reduced, compared to existing methods respectively is achieved.FIR filter implementation of Wallace multiplier, as the Extension, result confirms that the proposed Wallace tree multiplier is suitable for low power and small area applications. #### REFERENCES - [1] Wallace, "A suggestion for a fast multiplier," IEEE Transaction on Electronic Computers, Vol. EC-13, pp.14-17. - [2] R. S. Waters, E. E. Swartzlander, "A reduced complexity Wallace multiplier reduction," IEEE Transactions on Computers, vol. 59, no. 8, pp.1134-1137, 2010. - [3] Shahzad Asif and Yinan Kong, "Low Area Wallace Multiplier," VLSI Design, vol. 2014. - [4] M. J. Rao, S. Dubey, "A high speed and area efficient Booth recoded Wallace tree multiplier for Fast Arithmetic Circuits," in Asia Pacific Conference on Postgraduate Research in Microelectronics and Electronics - (PrimeAsia), Hyderabad, India, 5- 7 Dec. 2012, pp.220-223. - [5] Karthick, S. Karthika and S. Valannathy, "Design and Analysis of Low Power Compressors," International Journal of Advanced Research in Electrical, Electronics and instrumentation Engineering, YoU, Issue 6, Dec. 2012. - [6] S. T. Oskuii, P. G. Kjeldsberg, "Power optimized partial product reduction interconnect ordering in parallel multipliers," NORCHIP, Aalborg, 19-20 Nov. 2007, pp.1-6. - [7] M. V. P. Kumar, S. Sivanantham, S. Balamurugan, P. S. Mallick, "Low power reconfigurable multiplier with reordering of partial products," in International Conference on Signal Processing, Communication, Computing and Networking Technologies (ICSCCN), Thuckafay, 21-22 July 2011, pp.532-536. - [8] P. Anitha, P. Ramanathan, "A new hybrid multiplier using Dadda and Wallace method," in International Conference on Electronics and Communication Systems (ICECS), Coimbatore, India, 13-14 Feb. 2014, pp.1-4. - [9] Yingtao Jiang, Abdulkarim Al-Sheraidah, Yuke Wang, Edwin Sha, and Jin-Gyun Chung, "A Novel Multiplexer-Based Low-Power Full Adder," IEEE transactions on circuits and systems, vol. 51, no. 7, July 2004, pp. 345-348. - [10] S. Murugeswari, S. K. Mohideen, "Design of area efficient and low power multipliers using multiplexer based full adder," in 2nd International Conference on Current Trends in Engineering and technology (ICCTET), Enathi, India, July 2014, pp.388-392. # R R #### **International Journal of Research** Available at https://edupediapublications.org/journals e-ISSN: 2348-6848 p-ISSN: 2348-795X Volume 05 Issue-01 January 2018 [11]C. S. Wallace, "A Suggestion for a Fast Multiplier", *IEEE Trans. Electronic Computers*, vol. 13, no. 1, pp. 14-17, Feb. 1964. [12]E. Bloch, "The Engineering Design of the Stretch Computer", Proc. Eastern Joint IRE-AIEE-ACM Computer Conf., no. 16, pp. 48-58, 1959. [13]L. Dadda, "Some Schemes for Parallel Multipliers", Alta Frequenza, vol. 34, pp. 349-356, 1965. [14] Wallace C. (1964), 'A suggestion for a fast multiplier', iEEE Transactions on Electronic Computers, Vol. EC-13, pp. 14-17.