# Design of Digit-Serial FIR Filters Using GB Algorithm 

*N.MANASA<br>**V.VIJAYA<br>*M.TECH Dept of ECE, VAAGDEVI COLLEGE OF ENGINEERING<br>**Asso. prof Dept of ECE, VAAGDEVI COLLEGE OF ENGINEERING<br>***Asso. prof Dept of ECE, VAAGDEVI COLLEGE OF ENGINEERING

***G.BABU


#### Abstract

:

Many efficient algorithms and architectures have been introduced for the design of low complexity bit-parallel Multiple Constant Multiplications (MCM) operation which dominates the complexity of many digital signal processing systems. On the other hand, little attention has been given to the digit-serial MCM design that offers alternative low complexity MCM operations. In this paper, we address the problem of optimizing the gate-level area and delay in digit-serial MCM designs, for that purpose we introduce high level CSE and GB algorithms. Experimental results show the efficiency of the proposed optimization algorithms and of the digitserial MCM architectures in the design of digit-serial MCM operations and finite impulse response filters.


Index Terms-0-1 integer linear programming (ILP), digit-serial arithmetic, finite impulse response (FIR) filters, CSE and GB algorithms, multiple constant multiplications.

## 1.

## INTRODUCTION

FINITE impulse response (FIR) filters are of great importance in digital signal processing (DSP) systems since their characteristics in linear-phase and feed-forward implementations make them very useful for building stable high-performance filters [1]. These are mainly consists of multiplication of a vector of input samples with a set of constant coefficients is known as MCM operations.

Multiple constant multiplications (MCM) are typical operations in digital signal processing (DSP) as well as in the design of finite-impulse-response (FIR) filters, as shown in Fig. 1 (a).

Multiplications are expensive in terms of area and power consumption, when implemented in hardware. The relative cost of an adder and a multiplier in hardware, depends on the adder and multiplier architectures. For example, a $k \times k$ array multiplier has k times the logic (and area) and twice the latency of the slowest ripple carry adder. Since the values of the coefficients are known beforehand, the full flexibility of a multiplier is not necessary, and it can be more efficiently implemented by converting it into a sequence of additions/subtractions and shift operations are shown in fig. 1 (b).

For the shift-adds implementation of constant multiplications, a straightforward method, generally
known as digit based recoding [2], initially defines the constants in binary.


Figure 1(a): A multiplier-based MCM example


Figure 1(b): A multiplierless-based MCM example

Then, for each " 1 " in the binary representation of the constant, according to its bit position, it shifts the variable and adds up the shifted variables to obtain the result. As a simple example, consider the constant multiplications 29x and 43x. Their decompositions in binary are listed as follows:

$$
\begin{aligned}
& 29 x=(11101) \operatorname{bin} x=\mathrm{x} \ll 4+\mathrm{x} \ll 3+\mathrm{x} \ll 2+ \\
& x \\
& 43 x=(101011) \operatorname{bin} x=\mathrm{x} \ll 5+\mathrm{x} \ll 3+\mathrm{x} \ll 1 \\
& +x
\end{aligned}
$$

Which requires six addition operations as illustrated in Fig. 2 (a)

(a)

(b)

(c)

Fig. 2. Shift-adds implementations of $29 x$ and 43x. (a) Without partial product sharing [2] and with partial product sharing. (b) Exact CSE algorithm [5]. (c) Exact GB algorithm [8].

The algorithms designed for the MCM problem can be categorized in two classes: common sub expression elimination (CSE) algorithms [3]-[5] and graph-based (GB) algorithm [6]-[8].

The proposed algorithm that optimally solves this maximal sharing problem. This problem has been the subject of extensive research in recent years. Two key strategies have had a large impact in the optimization of MCMs. One is to consider not only adders, but also subtracter to combine partial
terms, thus increasing the opportunity for the sharing of common sub expressions.

The second is the usage of the Canonical Sign Digit (CSD) representation for the coefficients. This representation minimizes the number of non-zero digits; hence the maximal sub expression sharing search starts from a minimal level of complexity.

In a recent paper, Park [9] proposes the usage of a Minimal Signed Digit (MSD) representation for the coefficients. The MSD representation is obtained from the CSD representation by relaxing the requirement that there cannot be two consecutive nonzero digits. Under the MSD representation, a given numerical value can have multiple representations. However, in all of them, the number of non-zero digits is the same as the CSD representation. The algorithm proposed in [9] exploits the redundancy of the MSD representation by choosing the MSD instance that leads to a maximal sharing in the implementation of efficient FIR filters.

To the best of our knowledge, all previous solutions to this problem have been heuristic, providing no indication as to how far from the optimum their solution is. We
propose an exact algorithm that is feasible for many real situations. We model this problem as a Boolean network that covers all possible partial terms which may be used to generate the set of coefficients in the MCM instance. The inputs to this network are shifted versions of the value that serves as input to the MCM operation. Each adder and subtracter used to generate a given partial term is represented as an AND gate. All partial terms that represent the same numerical value are ORed together. There is a single output which is an AND over all the coefficients in the MCM. We cast this problem into a 0-1 Integer Linear Programming (ILP) problem by requiring: that the output is asserted,
meaning that all coefficients are covered by the set of partial terms found; while minimizing the total number of AND gates that evaluate to one, i.e., the number of adders/subtracters.

We have applied this algorithm to coefficients represented in binary, CSD and MSD representations. Note that the redundancy of the MSD representation can be readily incorporated in our model, where the equivalent MSD representations are
simply new inputs to the OR gate that generates a given coefficient.

Returning to our example in Fig. 2, the exact CSE algorithm of [9] gives a solution with four operations by finding the most common partial products $3 x=(11) \operatorname{bin} x$ and $5 x=$ (101)bin $x$ when constants are defined under binary, as illustrated in Fig. 2(b). On the other hand, the exact GB algorithm [6] finds a solution with the minimum number of operations by sharing the common partial product $7 x$ in both multiplications, as shown in Fig. 2(c). Note that the partial product $7 x$ $=(111) \operatorname{bin} x$ cannot be extracted from the binary representation of $43 x$ in the exact CSE algorithm [5].

## 2. Digit-

## Serial Arithmetic

In digit-serial arithmetic, data words are divided into digits, with a digit size of $d$ bits, which are processed in one clock cycle. The special cases of the digit-serial computation, called bit-serial and bit-parallel processing, occur when the digit size $d$ is equal to 1 and input data word length, respectively. The digit-serial computation plays an important


p-ISSN: 2348-6848<br>e-ISSN: 2348-795X<br>Volume 03 Issue 10<br>June 2016

role when the bit-serial implementations cannot meet delay requirements and the bitparallel designs require excessive hardware. Thus, an optimal tradeoff between area and delay can be obtained by changing the digit size parameter $(d)$. The fundamental digitserial operations were introduced in [10]. The digit-serial addition, subtraction, and left shift operations are depicted in Figure 3 when $d$ is equal to 3 . Notice from Figure 3(a) that in a digit-serial addition operation, in general, the number of required full adders (FAs) is equal to $d$ and the number of necessary D flip-flops is always 1. The subtraction operation (Figure 3(b)) is implemented using 2's complement, requiring the initialization of the D flip-flop with 1 and additional $d$ inverter gates with respect to the digit-serial addition operation.


Figure 3: The digit-serial operations when $d$ is 3: (a) addition operation; (b) subtraction
operation; (c) left shift by 2 times;(d)left shift by 4 times.


Figure 4: Bit-serial realization of shift-adds implementation of $29 x$ and $43 x$ given in Figure 2(c).

As an example on digit-serial realization of constant multiplications under the shift-adds architecture, Figure 4 illustrates the bit-serial implementation of $29 x$ and $43 x$ obtained by the exact GB algorithm [8] given in Figure 2(c). The network includes 2 bit serial additions, 1 bit-serial subtraction, and 5 D flip-flops for all the left shift operations. Observe from Figure 4 that at each clock cycle, one bit of the input data $x$ is applied to the
network input and one bit of the constant multiplication output is computed. Note that the digit-serial design of the MCM operation occupies significantly less area when compared to its bit-parallel design and the area of the design is not dependent on the bit-width of the input data. However, the
latency of the MCM computation is increased due to the serial processing. Suppose that $x$ is a 16 -bit input value. To obtain the actual output of $29 x$ and $43 x$ in the bit-serial network of Figure 4, 21 and 22 clock cycles are required respectively. Thus, necessary bits must be appended to the input data $x$, i.e., 0 s , if $x$ is an unsigned input or sign bits, otherwise. Moreover, in the case of the conversion of the outputs obtained in digit-serial to the bit parallel format, storage elements and control logic are required.

Note that while the sharing of addition/subtraction operations reduces the complexity of the digit-serial MCM design (since each addition and subtraction operation requires a digit-serial operation), the sharing of shift operations for a constant multiplication reduces the number of D flipflops, and consequently, the design area. Observe from Figure 4 that two D flip-flops cascaded serially to generate the left shift of $7 x$ by two can also generate the left shift of $7 x$ by one without adding any hardware cost.

GB algorithm can be applied for any coefficient pair combinations. Hence GB algorithm is used and number of operations is reduced drastically than other algorithms.


Fig5. Output for FIR filter with digit based recoding algorithm

Four filter coefficient 29,43,59,89 values are taken for digit serial FIR filter design. X(n) is taken as a input sequence and $\mathrm{Y}(\mathrm{n})$ is taken as output sequence.
3.

## SIMULATION RESULTS

 algorithm

Fig. 5 shows 4 tap FIR filter with without partial product sharing (Digit based recoding) algorithm and Fig. 6 displays 4 tap filter with CSE algorithm.


Fig8. RTL schematic for FIR filter with GB algorithm


Fig7. Output for FIR filter with GB algorithm

Fig. 7 displays 4 tap filter with GB algorithm. This simulation result was displayed by modelsim software. These are the simulation results displayed by modelsim software.

After completion of simulation process in Modelsim tool, synthesis process is takes place to calculate gate count and delay report. Fig. 8 shows the RTL schematic view of FIR filter with MCM architecture.
3.1. FIR FILTER DEVICE UTILIZATION REPORT


FIR filter with digit size $d=4$ was synthesized separately for device utilization in terms of gate count with each different algorithms. Fig 9 and Fig. 10 shows the device utilization report of FIR filter with GB algorithm. Table1

Device utilization summary:

Selected Device: 3s500efg320-4

| Number of Slices: | 87 | out of | 4656 | 18 |
| :--- | ---: | :--- | :--- | :--- |
| Number of Slice Flip Flops: | 36 | out of | 9312 | $0 \%$ |
| Number of 4 input LUTs: | 161 | out of | 9312 | $1 \%$ |
| Number of IOs: | 18 |  |  |  |
| Number of bonded IOBs: | 18 | out of | 232 | $7 \%$ |
| Number of GCLKs: | 1 | out of | 24 | 48 |

Partition Resource Summary:

Fig 9. Device utilization report
explains that the FIR filter with GB algorithm utilized less number of gate count and delay than other two algorithms.

## Table -1: Delay and Gate Count Comparison

| FIR Filter | Delay | DeviceUtilization |
| :--- | :--- | :--- |
| Normal <br> method | 14.203 | 323 |
| CSE <br> Algorithm | 15.528 | 321 |
| GB <br> Algorithm | 12.875 | 299 |

Thus the implementation of digit serial FIR filter was implemented with low complexity


MCM architectures using GB algorithm. Device utilization and delay values are compared for hardware implementation. Hence this MCM approach drastically reduces the system complexity, area and delay and FPGA hardware real time implementation has performed with spartan3 version. Future enhancement of this paper is to design MCM architecture with more coefficient pairs for FIR filter implementation.

## REFERENCES

[1] L. Wanhammar, DSP Integrated Circuits. New York: Academic, 1999.
[2] M. Ercegovac and T. Lang, Digital Arithmetic. San Mateo, CA: Morgan Kaufmann, 2003.
[3] R. Hartley, "Subexpression sharing in filters using canonic signed digit multipliers," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 43, no. 10, pp. 677-688, Oct. 1996.
[4] I.-C. Park and H.-J. Kang, "Digital filter synthesis based on minimal signed digit representation," in Proc. DAC, 2001, pp. 468-473.
[5] L. Aksoy, E. Costa, P. Flores, and J. Monteiro, "Exact and approximate
algorithms for the optimization of area and delay in multiple constant multiplications," IEEE Trans. Comput.-Aided Design Integr. Circuits

Syst., vol. 27, no. 6, pp. 1013-1026, Jun. 2008.
[6] A. Dempster and M. Macleod, "Use of minimum-adder multiplier blocks in FIR digital filters," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 42, no. 9, pp. 569-577, Sep. 1995.

## AUTHOR1:-

*N.MANASA completed her B.Tech GANAPATHY ENGINEERIMG COLLEGE in 2014 and M.Tech completed in VAAGDEVI COLLEGE OF ENGINEERING

## AUTHOR2:-

**V.VIJAYA working as Assoc. prof in Dept of ECE, VAAGDEVI COLLEGE OF ENGINEERING

## AUTHOR3:-

***MR.G.BABU working as Assoc. prof in Dept of ECE, VAAGDEVI COLLEGE OF ENGINEERING

