

# Efficient Design of Hybrid LUT/Multiplexer FPGA Logic

Kommisetti Navya, Mr. A.M.V.N. Maruti, N. Chandrashekhar

M.Tech, Dept. of ECE, Khammam Institute of Technology and Science, Khammam.

Associate Professor, Dept. of ECE, Khammam Institute of Technology and Science, Khammam.

Associate Professor & HOD, Dept. of ECE, Khammam Institute of Technology and Science, Khammam.

## Abstract

Hybrid configurable logic block architectures for field-programmable gate arrays that contain a mixture of lookup tables and hardened multiplexers are evaluated toward the goal of higher logic density and area reduction. Multiple hybrid configurable logic block architectures, both nonfracturable and fracturable, with varying MUX:LUT logic element ratios are evaluated across two benchmark suites (VTR and CHStone) using a custom tool flow consisting of LegUp-HLS, Odin-II front-end synthesis, ABC logic synthesis and technology mapping, and VPR for packing, placement, routing, and architecture exploration. [1] Technology mapping optimizations that target the proposed architectures are also implemented within ABC. Experimentally, we show that for nonfracturable architectures, without any mapper optimizations, we naturally save up to ~8% area post-place-and-route, accounting for both complex logic block and routing area, while maintaining mapping depth. With architecture-aware technology mapper optimizations in ABC, additional area is saved post-place-and-route. For fracturable architectures, experiments show that only marginal gains of up to $\sim 2\%$ are seen after place-and-route. For both nonfracturable and fracturable architectures, we see minimal impact on timing performance for the architectures with the best area efficiency.

**Key words**: Field-programmable gate array (FPGA), hybrid complex logic block, multiplexer (MUX).

# 1. INTRODUCTION

Throughout the history of field-programmable gate arrays (FPGAs), lookup tables (LUTs) have been the primary logic element (LE) used to realize combinational logic. A K-input LUT is generic and very flexible, able to implement any K-input Boolean function. The use of LUTs simplifies technology mapping, as the problem is reduced to a graph covering problem. However, an exponential area price is paid as larger LUTs are considered. A value of K between 4 and 6 is typically seen in industry and academia, and this range has been shown to offer a good area/performance compromise. Recently, a number of other works have explored alternative FPGA LE architectures for performance improvement, to close the large gap between FPGAs and application-specific integrated circuits (ASICs). In this paper, we propose incorporating (some) hardened multiplexers (MUXs) in the FPGA logic blocks as a means of increasing silicon area efficiency and logic density. [2] MUX-based logic blocks for FPGAs saw success in early commercial architectures, such as the Actel ACT-1/2/3 architectures, and efficient mapping to these structures was studied in the early 1990s. However, their use in commercial chips has waned, perhaps partly because of the ease with which logic functions can be mapped into LUTs, which simplifies the entire computer-aided design (CAD) flow. Nevertheless, it is widely understood that LUTs are inefficient at implementing MUXs, and that MUXs are frequently used in logic circuits. To underscore the inefficiency of LUTs implementing MUXs, consider that a six-input LUT (6-LUT) is essentially a 64-to-1 MUX (to select 1 of 64 truth-table rows) plus 64 SRAM configuration cells, yet it can only realize a 4-to-1 MUX (4 data + 2 select = 6 inputs).

In this paper, we present a six-input LE based on a 4-to-1 MUX, MUX4, that can realize a subset of six-input Boolean logic functions, and a new hybrid complex logic block (CLB) that contains a mixture of MUX4s and 6-LUTs. The proposed MUX4s are small compared with a 6-LUT (15% of 6-LUT area), and can efficiently map all 2- and 3-input functions and some 4-, 5-, and 6-input functions. In addition, we explore fracturability of LEs, the ability to split the LEs into multiple smaller elements, in both LUTs and MUX4s to increase logic density. The ratio of LEs that should be LUTs versus MUX4s is also explored toward optimizing logic density for both nonfracturable and fracturable FPGA architectures. To facilitate the architecture exploration, we developed a CAD flow for mapping into the proposed hybrid CLBs, created using ABC and VPR, and we describe technology mapping techniques that encourage the selection of logic functions that can be embedded into the MUX4 elements.
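The 6-LUT versus 4-to-1 MUX comparison can be made concrete with a small sketch. It models a 6-LUT as 64 SRAM cells addressed by the six inputs and programs it to act as a 4-to-1 MUX. The input ordering `(s1, s0, d3, d2, d1, d0)` and the function names are assumptions chosen for illustration, not something the paper specifies.

```python
from itertools import product

def mux4(s1, s0, data):
    """Reference 4-to-1 MUX: pick data[2*s1 + s0] from four data bits."""
    return data[2 * s1 + s0]

def program_lut6_as_mux4():
    """Fill the 64 SRAM cells of a 6-LUT so it behaves as a 4-to-1 MUX.

    Assumed (hypothetical) input order: (s1, s0, d3, d2, d1, d0),
    with s1 as the most significant address bit.
    """
    sram = [0] * 64
    for s1, s0, d3, d2, d1, d0 in product((0, 1), repeat=6):
        address = (s1 << 5) | (s0 << 4) | (d3 << 3) | (d2 << 2) | (d1 << 1) | d0
        sram[address] = mux4(s1, s0, (d0, d1, d2, d3))
    return sram

def lut6_read(sram, inputs):
    """Evaluate the LUT: the six input bits form the SRAM address (MSB first)."""
    address = 0
    for bit in inputs:
        address = (address << 1) | bit
    return sram[address]
```

All 64 configuration cells are consumed to realize a function that a hardened MUX4 provides with no truth-table storage at all, which is the inefficiency the text points to.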

# 2. RELATED WORK

### 2.1 Existing System

Recent works have shown that heterogeneous architectures and synthesis methods can have a significant impact on improving logic density and delay, narrowing the ASIC-FPGA gap. Works by Anderson and Wang with "gated" LUTs, then with asymmetric LUT LEs, show that the LUT elements present in commercial FPGAs provide unnecessary flexibility. Toward improved delay and area, macrocell-based FPGA architectures have been proposed. [3] These studies describe significant changes to traditional FPGA architectures, whereas the changes proposed here build on architectures used in industry and academia. Similarly, and-inverter cones have been proposed as replacements for LUTs, inspired by and-inverter graphs (AIGs). Purnaprajna and Ienne explored the possibility of repurposing the existing MUXs contained within the Xilinx Logic Slices. Similar to this work, they use the ABC priority cut mapper as well as VPR for packing, place, and route. However, their work is mostly delay-based, showing an average speedup of 16% using only ten of nineteen VTR7 benchmarks.

### 2.2 Proposed System

The MUX4 LE consists of a 4-to-1 MUX with optional inversion on its inputs, which allows the realization of any 2- or 3-input function, some 4- and 5-input functions, and one 6-input function: a 4-to-1 MUX itself with optional inversion on the data inputs. A 4-to-1 MUX matches the input pin count of a 6-LUT, allowing for fair comparisons with respect to connectivity and intracluster routing. [4] Naturally, any two-input Boolean function can be easily implemented in the MUX4: the two function inputs can be tied to the select lines, and the truth table values (logic-0 or logic-1) can be routed to the data inputs accordingly. Alternatively, a Shannon decomposition can be performed about one of the variables; that variable can then feed a select input. The Shannon cofactors will contain at most one variable and can, therefore, be fed to the data inputs (the optional inversion may be needed). For three-input functions, consider that a Shannon decomposition about one variable produces cofactors with at most two variables. A second decomposition of the cofactors about one of their two remaining variables produces cofactors with at most one variable. Such single-variable cofactors can be fed to the data inputs (the optional inversion may be needed), with the decomposition variables feeding the select inputs. Likewise, functions of more than three inputs can be implemented in the MUX4 as long as Shannon decomposition with respect to some pair of inputs produces cofactors with at most one input.
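The embeddability condition described above can be checked by brute force. The following is a minimal sketch, assuming a function is given as a Python callable over a tuple of input bits; the names `cofactor_support` and `mux4_embeddable` are hypothetical and do not come from ABC or any other tool. It tries every pair of inputs as select lines and accepts if all four Shannon cofactors depend on at most one remaining input.

```python
from itertools import combinations, product

def cofactor_support(f, n, fixed):
    """Return the set of free inputs that the cofactor of f actually depends on.

    f     : callable taking a tuple of n bits and returning 0 or 1
    fixed : dict {input_index: value} describing the Shannon cofactor
    """
    free = [i for i in range(n) if i not in fixed]
    support = set()
    for values in product((0, 1), repeat=len(free)):
        point = [0] * n
        for var, val in fixed.items():
            point[var] = val
        for var, val in zip(free, values):
            point[var] = val
        base = f(tuple(point))
        for var in free:
            point[var] ^= 1              # flip one free input
            if f(tuple(point)) != base:  # output changed: cofactor depends on it
                support.add(var)
            point[var] ^= 1              # restore
    return support

def mux4_embeddable(f, n):
    """True if some pair of inputs, used as selects, leaves cofactors of <= 1 input."""
    if n <= 2:
        return True  # both inputs drive the selects; data inputs are constants
    for s0, s1 in combinations(range(n), 2):
        if all(len(cofactor_support(f, n, {s0: a, s1: b})) <= 1
               for a, b in product((0, 1), repeat=2)):
            return True
    return False
```

For example, the six-input function `lambda x: x[2 + x[0] + 2 * x[1]]` (a 4-to-1 MUX with inputs 0 and 1 as selects) passes the check, while a four-input AND does not, because whichever pair is chosen as selects, one cofactor is still a two-input AND.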

# 3. IMPLEMENTATION

## Hybrid Complex Logic Block:

A variety of different architectures were considered, the first being a nonfracturable architecture. In the nonfracturable architecture, the CLB has 40 inputs and ten basic LEs (BLEs), with each BLE having six inputs and one output, following empirical data in prior work. Each BLE in this nonfracturable CLB architecture contains an optional register. We vary the ratio of MUX4s to LUTs within the ten-element CLB from 1:9 to 5:5 MUX4s:6-LUTs. The MUX4 element is proposed to work in conjunction with 6-LUTs, creating a hybrid CLB with a mixture of 6-LUTs and MUX4s (or MUX4 variants); this mixture defines the organization of our CLB and its internal BLEs. For fracturable architectures, the CLB has 80 inputs and ten BLEs, with each BLE having eight inputs and two outputs, emulating an Altera Stratix Adaptive-LUT. The same sweep of MUX4 to LUT ratios was also performed. In the fracturable architecture, each BLE has eight inputs and contains optional registers. [5]

We compare fracturable LEs against nonfracturable LEs in the context of MUX4 elements because fracturable LUTs are common in commercial architectures. For example, Altera Adaptive 6-LUTs in Stratix IV and Xilinx Virtex 5 6-LUTs can be fractured into smaller LUTs with some limitations on inputs. The crossbar for fracturable architectures is larger than that of the nonfracturable architectures for two reasons. First, due to the virtual increase in LEs, a larger number of CLB inputs is required, which increases crossbar size. Second, since there are now twice as many outputs from the LEs, these extra outputs must also be fed back into the crossbar, further increasing its size. Due to this disparity in crossbar size, fair comparisons cannot be made between fracturable and nonfracturable architectures. Therefore, in this paper, we compare nonfracturable hybrid CLB architectures to a baseline LUT-only nonfracturable architecture, and we compare fracturable hybrid CLB architectures to a baseline LUT-only fracturable architecture.

Sparse crossbars have been previously studied, and in this paper we model a 50% depopulated crossbar in the CLB for intracluster routing for both nonfracturable and fracturable architectures, in contrast to the initial publication, which only modeled a full input crossbar. This paper also includes an extended discussion of architecture modelling.
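The architecture sweep described in this section can be captured in a small data model. This is an illustrative sketch only: the class `ClbArch`, the helper `sweep`, and in particular the switch-point count (sinks times sources times population) are assumptions introduced here as a crude proxy for crossbar size, not the area model used in the paper or in VPR.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ClbArch:
    name: str
    clb_inputs: int        # inputs to the CLB (40 or 80 in the text)
    bles: int              # BLEs per CLB (ten in the text)
    ble_inputs: int        # inputs per BLE (6 or 8)
    ble_outputs: int       # outputs per BLE (1 or 2)
    mux4_count: int        # how many of the BLEs are MUX4-based
    crossbar_population: float = 0.5  # 50% depopulated crossbar

    @property
    def lut_count(self):
        return self.bles - self.mux4_count

    @property
    def crossbar_sources(self):
        # CLB inputs plus BLE outputs fed back into the crossbar
        return self.clb_inputs + self.bles * self.ble_outputs

    @property
    def crossbar_switch_points(self):
        # Assumed proxy: every BLE input pin sees a fraction of all sources
        sinks = self.bles * self.ble_inputs
        return int(sinks * self.crossbar_sources * self.crossbar_population)

def sweep(fracturable):
    """LUT-only baseline (0:10) plus the 1:9 to 5:5 MUX4:LUT ratios."""
    if fracturable:
        shape = dict(clb_inputs=80, bles=10, ble_inputs=8, ble_outputs=2)
    else:
        shape = dict(clb_inputs=40, bles=10, ble_inputs=6, ble_outputs=1)
    kind = "frac" if fracturable else "nonfrac"
    return [ClbArch(name=f"{kind}_{m}:{10 - m}", mux4_count=m, **shape)
            for m in range(0, 6)]
```

Under this proxy the fracturable crossbar has both more sources and more sinks than the nonfracturable one, which illustrates why each hybrid architecture is compared only against the LUT-only baseline of its own kind.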

## Area Modelling:

MUX4 Logic Element: Initial estimates of the MUX4 element showed that the MUX4 is ~10% of the area of a 6-LUT overall. A 4-to-1 MUX can be realized with three 2-to-1 MUXs. Hence, the MUX4 element contains seven 2-to-1 MUXs, four SRAM cells, and four inverters in total (see Fig. 1). [6] The optional inversion uses the four SRAM cells, whereas the rest of the LE configuration is performed through routing. In addition, the depth of the MUX tree is halved compared with the 6-LUT, which has six 2-to-1 MUXs on its longest paths. Conservatively, assuming constant pass transistor sizing and that the area of a 2-to-1 MUX and a six-transistor SRAM cell are roughly equal, the MUX4 element has (1/16)th the SRAM area and (1/8)th the MUX area of a 6-LUT. These estimates were revised using transistor-level modelling of the circuit blocks.

Transistor-level optimization of the constituent circuit blocks of an FPGA requires an understanding of the optimal area-delay tradeoffs for each individual circuit block. This requires extracting a representative critical path, which is a path whose composition of blocks and topology is similar to the critical path of a typical design. Extracting the representative critical path allows us to judge to what extent each individual block is timing critical, which in turn establishes area-delay tradeoff goals for each block. This is in line with the transistor-level optimization tool developed previously. We use the results of prior work to establish the optimal area-delay tradeoff for 6-LUTs in a conventional island-style FPGA architecture with typical architectural parameters. The resulting 6-LUT delay serves as a point of reference for optimizing the circuits considered in this paper: in the interest of maximizing area reduction while allowing performance to be maintained (ignoring the differences in cell counts between mapping to a traditional LUT and the LEs proposed in this paper), we attempt to match the delay of a 6-LUT while minimizing the area of each of the variants of the MUX4 circuits. [7]

Transistor-level modelling and optimizations were based on a predictive 22-nm high-performance process, while the area model provided in earlier work was used to estimate the area of the various circuit structures. With this methodology, we determined that an area-delay optimal 6-LUT has an area of 930 minimum-width transistors and a worst-case delay of 261 ps. For the MUX4 cell and the Dual MUX4 cell, a minimum-area and a minimum-delay cell were created. The minimum-area MUX4 cell has an area of 95 minimum-width transistors and a delay of 204 ps; all transistors were minimum-width in this case, as the minimum-area solution for this circuit was able to meet (and improve upon) the worst-case delay target of a 6-LUT. Similarly, the Dual MUX4 cell has an area of 249 minimum-width transistors while meeting the worst-case delay requirement. However, we chose to use the minimum-delay design for both the MUX4 and Dual MUX4 elements for the remainder of the study, as there is not a significant increase in area over the minimum-area design.
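The area figures above can be cross-checked with a few lines of arithmetic. This is a back-of-the-envelope sketch, not the paper's area model: the split of the seven 2-to-1 MUXs into three for the 4-to-1 tree plus four for optional data inversion is this sketch's reading of the text, the minimum-area MUX4 figure (95) stands in for the minimum-delay cell whose area is not given, and `clb_le_area` is a hypothetical helper that counts LE area only (no crossbar, routing, or registers, and no change in the number of LEs a circuit needs).

```python
LUT6_AREA = 930      # minimum-width transistors, from the text
MUX4_MIN_AREA = 95   # minimum-area MUX4 cell, from the text

def lut_structure(k):
    """First-order structure of a K-input LUT built as a MUX tree over SRAM."""
    return {"sram_cells": 2 ** k, "mux2": 2 ** k - 1, "mux_depth": k}

def mux4_structure():
    """First-order structure of the MUX4 LE (assumed 3 tree MUXs + 4 inversion MUXs)."""
    return {"sram_cells": 4, "mux2": 3 + 4, "mux_depth": 2 + 1}

def clb_le_area(mux4_count, bles=10):
    """LE-only area of a hybrid CLB, in minimum-width transistors."""
    return mux4_count * MUX4_MIN_AREA + (bles - mux4_count) * LUT6_AREA

lut6 = lut_structure(6)
mux4 = mux4_structure()
sram_ratio = mux4["sram_cells"] / lut6["sram_cells"]  # 4/64 = 1/16
mux_ratio = mux4["mux2"] / lut6["mux2"]               # 7/63, near the quoted 1/8
cell_ratio = MUX4_MIN_AREA / LUT6_AREA                # about 0.10
```

The ratio `cell_ratio` agrees with the ~10% initial estimate. The LE-only CLB figures from `clb_le_area` overstate the achievable savings, since the post-place-and-route results reported in this paper also include routing area and any increase in LE count after mapping.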

## Select Mapping:

Depending on the circuit, NaturalMux or MuxMap may be preferred. In select mapping, the circuit is first mapped using NaturalMux. Following from the discussion in Section III-D, we know that if a circuit's MUX4:LUT ratio is higher than the architectural ratio, the largest area reductions are realized. Therefore, if the natural ratio of the circuit is higher than our target architectural ratio, we use this mapping. Otherwise, [8] if the natural ratio is lower than the architectural ratio, we rerun the mapping with the MuxMap mapper to encourage the selection of more MUX4-embeddable LEs. Note that the technology mapping run-time is a small fraction of that required for placement and routing.
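The select mapping decision can be sketched as a small driver function. This is an assumed outline only: `natural_mux` and `mux_map` are hypothetical callables standing in for the two ABC mapper configurations, `MappingResult` is an illustrative container, and keeping the NaturalMux result when the ratios are exactly equal is a choice made here, since the text does not cover that case.

```python
from typing import Callable, NamedTuple, Tuple

class MappingResult(NamedTuple):
    mux4_les: int  # LEs whose function is MUX4-embeddable
    lut_les: int   # LEs that require a 6-LUT

def select_mapping(circuit,
                   arch_mux4: int,
                   arch_lut: int,
                   natural_mux: Callable[[object], MappingResult],
                   mux_map: Callable[[object], MappingResult]
                   ) -> Tuple[str, MappingResult]:
    """Map with NaturalMux first; fall back to MuxMap if the natural ratio is too low."""
    natural = natural_mux(circuit)
    # Compare natural MUX4:LUT ratio with arch_mux4:arch_lut by cross-multiplying,
    # which avoids dividing by zero when a mapping contains no LUTs.
    if natural.mux4_les * arch_lut >= arch_mux4 * natural.lut_les:
        return "NaturalMux", natural
    return "MuxMap", mux_map(circuit)
```

With stub mappers that return a natural ratio of 30:70 and a MuxMap ratio of 45:55, a 2:8 target architecture keeps the NaturalMux result, while a 5:5 target triggers the MuxMap rerun. The extra mapping pass is cheap relative to placement and routing, as noted above.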



Fig. 1. Architecture diagram.

# 4. EXPERIMENTAL RESULTS



### MUX4 synthesis

### MUX4 simulation

### Dual MUX4 synthesis

### Dual MUX4 simulation

# 5. CONCLUSION

We have proposed a new hybrid CLB architecture containing MUX4 hard MUX elements and have shown techniques for efficiently mapping to these architectures. Weighting of MUX4-embeddable functions with our MuxMap technique, combined with a select mapping strategy, aided circuits with low natural MUX4-embeddable ratios. [9] We also provided an analysis of the benchmark suites postmapping, discussing the distribution of functions within each benchmark suite. From our first set of experiments with nonfracturable architectures, area reductions of up to 8% were seen for a 4:6 MUX4:LUT architecture in the CHStone suite, with a 2:8 architecture most viable for the VTR suites at ~5% area savings. Our second set of experiments with fracturable architectures showed that the flexibility of a fracturable LUT is very powerful, reducing the impact of the MUX4 LEs and yielding smaller $\sim 2\% - 3\%$ area savings over the VTR7 and CHStone benchmark suites with less aggressive 2:8 and 1:9 architectures, respectively. Interestingly, we again observed that different architectural conclusions can be drawn depending on the benchmark circuits employed in an architecture study, since the CHStone benchmarks generally preferred more aggressive MUX4:LUT architecture ratios. [10] The CHStone benchmarks, being high-level synthesized with LegUp-HLS, also showed marginally better performance, and this may be due to the way LegUp performs HLS on the CHStone benchmarks themselves. Overall, the addition of MUX4s to FPGA architectures minimally impacts FMax and shows potential for improving logic density in nonfracturable architectures and modest potential for improving logic density in fracturable architectures.

# 6. REFERENCES

[1] J. Stephen Alexander Chin, Jason Luu, Safeen Huda, and Jason H., "The Hybrid LUT/Multiplexer FPGA Logic Architectures," IEEE, vol. 24, no. 4, Apr. 2016.

[2] Y. Hara, H. Tomiyama, S. Honda, and H. Takada, "Proposal and quantitative analysis of the CHStone benchmark program suite for practical C-based high-level synthesis," J. Inf. Process., vol. 17, pp. 242–254, Oct. 2009.

[3] A. Canis et al., "LegUp: High-level synthesis for FPGA-based processor/accelerator systems," in Proc. ACM/SIGDA FPGA, 2011, pp. 33–36.

[4] E. Ahmed and J. Rose, "The effect of LUT and cluster size on deep-submicron FPGA performance and density," IEEE Trans. Very Large Scale Integr. (VLSI), vol. 12, no. 3, pp. 288–298, Mar. 2004.

[5] J. Rose, R. Francis, D. Lewis, and P. Chow, "Architecture of field-programmable gate arrays: The effect of logic block functionality on area efficiency," IEEE J. Solid-State Circuits, vol. 25, no. 5, pp. 1217–1225, Oct. 1990.

[6] H. Parandeh-Afshar, H. Benbihi, D. Novo, and P. Ienne, "Rethinking FPGAs: Elude the flexibility excess of LUTs with and-inverter cones," in Proc. ACM/SIGDA FPGA, 2012, pp. 119–128.

[7] J. Anderson and Q. Wang, "Improving logic density through synthesis-inspired architecture," in Proc. IEEE FPL, Aug./Sep. 2009, pp. 105–111.

[8] J. Anderson and Q. Wang, "Area-efficient FPGA logic elements: Architecture and synthesis," in Proc. ASP DAC, 2011, pp. 369–375.

[9] J. Cong, H. Huang, and X. Yuan, "Technology mapping and architecture evaluation for k/m-macrocell-based FPGAs," ACM Trans. Design Autom. Electron. Syst., vol. 10, no. 1, pp. 3–23, Jan. 2005.

[10] Y. Hu, S. Das, S. Trimberger, and L. He, "Design, synthesis and evaluation of heterogeneous FPGA with mixed LUTs and macro-gates," in Proc. IEEE ICCAD, Nov. 2007, pp. 188–193.

# Authors' Profiles



**KOMMISETTI NAVYA** is pursuing the M.Tech (VLSI System Design) degree at Khammam Institute of Technology and Science (JNTU Hyderabad), Ponnekal, Khammam (2015-2017), and received the B.Tech degree from Medha Institute of Science and Technology for Women (JNTU Hyderabad), Saiprabhathnagar (Peddathanda), Khammam (2011-2015), both in Electronics and Communication Engineering.



**Mr. A.M.V.N. Maruti** is pursuing a Ph.D. at A.N.U., Guntur. He completed his M.Tech in Communications at A.N.U., Guntur, and is working as Associate Professor at KITS, Khammam.



**N. CHANDRASHEKHAR** completed his M.Tech in Electronics and Communication Engineering. He has published more than five papers in international journals. He is currently a research scholar at JNTU, Hyderabad, and works as Associate Professor and Head of the Department of ECE at Khammam Institute of Technology and Science.