#### **International Journal of Research**

Available at https://pen2print.org/index.php/ijr/

e-ISSN: 2348-6848 p-ISSN: 2348-795X Volume 05 Issue 20 September 2018

### A Bit-Plane Decomposition Row-Based PIPO VLSI Architecture for HEVC

Vagu. Radha Haneesha & Vemu. Srinivasa Rao

<sup>1</sup>M.Tech Scholar, Dept. of ECE, Shri Vishnu Engineering College for Women, Vishnupur, West Godavari District, Bhimavaram, A.P., India

<sup>2</sup>Associate Professor, Dept. of ECE, Shri Vishnu Engineering College for Women, Vishnupur, West Godavari District, Bhimavaram, A.P., India

ABSTRACT: High Efficiency Video Coding (HEVC) is a recent video coding standard developed jointly by the ITU-T Video Coding Experts Group (VCEG) and the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group (MPEG). A VLSI architecture is proposed for the integer transform of the HEVC encoder. The architecture is based on signed bit transform (SBT) matrices, whose elements are only 0, 1, or -1. These SBT matrices are simple and have a low bit width, and because they contain many zero elements, they reduce the number of addition operations. An adder-reuse strategy can also be applied. Hence, both power consumption and area are reduced, and the VLSI architecture can be synthesized with a reasonable area and high speed. The proposed transform hardware architecture can process video data at higher speed and with reduced area.

Keywords: High Efficiency Video Coding, Signed Bit Transform.

#### I. INTRODUCTION

Recently, the High Efficiency Video Coding (HEVC) standard has been developed as a joint video project of the ITU-T Video Coding Experts Group (VCEG) and the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group (MPEG), two standardization organizations working together in a partnership known as the Joint Collaborative Team on Video Coding (JCT-VC). HEVC is the latest video coding standard and offers higher coding performance than existing ones. Many novel coding algorithms are introduced in HEVC.

The $32 \times 32$ transform is the most complex of the HEVC transforms; thus, improving the $32 \times 32$ transform also benefits the whole transform circuit.
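
In the HEVC standard (a property of the standard's matrices, not something derived in this paper), the core transform matrices are nested: each $N$-point matrix, $N \in \{4, 8, 16\}$, equals the first $N$ columns of every $(32/N)$-th row of the $32 \times 32$ matrix, which is consistent with the remark that improving the $32 \times 32$ transform benefits the whole transform circuit. The Python sketch below rebuilds the $32 \times 32$ matrix from a table of coefficient magnitudes indexed by the cosine angle $k(2n+1)\pi/64$ and checks it against the 4- and 8-point matrices written out explicitly. The names `COEF`, `hevc_coef`, and `hevc_matrix32` are illustrative assumptions.

```python
import numpy as np

# Magnitude of the HEVC core-transform coefficient for angle index m (0..32),
# a hand-tuned integer approximation of 64*sqrt(2)*cos(m*pi/64).
# Index 0 is never reached for rows k > 0 (row 0 is all 64s).
COEF = [64, 90, 90, 90, 89, 88, 87, 85, 83, 82, 80, 78, 75, 73, 70, 67,
        64, 61, 57, 54, 50, 46, 43, 38, 36, 31, 25, 22, 18, 13, 9, 4, 0]


def hevc_coef(m: int) -> int:
    """Signed coefficient for angle m*pi/64, using cosine symmetries."""
    m %= 128
    if m > 64:
        m = 128 - m              # cos(2*pi - x) = cos(x)
    if m > 32:
        return -COEF[64 - m]     # cos(pi - x) = -cos(x)
    return COEF[m]


def hevc_matrix32() -> np.ndarray:
    """32x32 matrix: row 0 is all 64, row k uses angle k*(2n+1)*pi/64."""
    t = np.empty((32, 32), dtype=np.int64)
    for k in range(32):
        for n in range(32):
            t[k, n] = 64 if k == 0 else hevc_coef(k * (2 * n + 1))
    return t


T4 = np.array([[64, 64, 64, 64],
               [83, 36, -36, -83],
               [64, -64, -64, 64],
               [36, -83, 83, -36]])

T8 = np.array([[64, 64, 64, 64, 64, 64, 64, 64],
               [89, 75, 50, 18, -18, -50, -75, -89],
               [83, 36, -36, -83, -83, -36, 36, 83],
               [75, -18, -89, -50, 50, 89, 18, -75],
               [64, -64, -64, 64, 64, -64, -64, 64],
               [50, -89, 18, 75, -75, -18, 89, -50],
               [36, -83, 83, -36, -36, 83, -83, 36],
               [18, -50, 75, -89, 89, -75, 50, -18]])

T32 = hevc_matrix32()
# The N-point matrix is every (32/N)-th row of T32, first N columns only.
print(np.array_equal(T32[::8, :4], T4))   # True
print(np.array_equal(T32[::4, :8], T8))   # True
```

Because the smaller matrices are sub-matrices of $T_{32}$, hardware built around the 32-point structure can, in principle, also compute the 4-, 8-, and 16-point transforms.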

Applying the proposed SBT algorithm to the transform architecture, SBT matrix circuits are implemented instead of integer transform matrix circuits, and the input data are transformed by each SBT matrix circuit. Because the elements of the SBT matrices are simple, the bit widths of the intermediate transformed data and the output data are significantly reduced. The bit widths in the SBT grow slowly as the intermediate data are processed stage by stage, which shortens the circuit delay and allows a shorter clock cycle. Additionally, the SBT matrices contain many zero elements, and this sparsity can be exploited to reduce the number of operations in the transform process. An adder-reuse method based on the element redundancy of the SBT matrices is also used to reduce the number of adders.

Many research works on optimizing the transform implementation for HEVC have been reported. Meher *et al.* proposed an efficient constant matrix multiplication scheme to derive parallel transform architectures for HEVC that can support real-time ultra HD video coding. In other work, simplification strategies such as reusing the transform structure and the multiplier implementation were adopted to save hardware cost. Another work presented an architecture that uses canonical signed digit (CSD) representation and the common sub-expression elimination (CSE) technique to perform the transform multiplications with shift-and-add operations. Based on these optimizations, the transform architecture is greatly simplified for practical applications. However, with the growing use of high definition (HD) and ultra HD video coding, codecs with higher processing capacity are required. Thus, all modules in a video codec, including the transform, need to be further improved for real-time coding with low complexity.

Existing transform architectures focus more on reducing the number of arithmetic operations, such as additions and multiplications, than on the data bit width in the transform. In fact, the data bit width is also an important factor affecting the circuit speed and area of a VLSI architecture.

A circuit with a large bit width requires logic gates with a larger fan-in or fan-out, and more MOS devices are required in each logic gate. Thus, both the capacitive load and the resistance of the logic gates increase as the bit width widens. According to the first-order resistance-capacitance (RC) circuit model, the circuit delay is proportional to RC, so a large RC leads to a long circuit delay. In two typical CMOS processes, the circuit delay increases with the input bit width. For an adder, the carry chain is the critical path, and its length also depends on the input and output bit widths.
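
To make the carry-chain argument concrete, the Python sketch below models a ripple-carry adder bit by bit and reports the longest run of consecutive stages that produced a carry. Under the first-order assumption (made here, not a result of this paper) that each full-adder carry stage contributes roughly the same RC delay, the worst-case critical path grows linearly with the bit width. The function name `ripple_carry_add` is a hypothetical helper.

```python
def ripple_carry_add(a: int, b: int, width: int) -> tuple[int, int, int]:
    """Add two unsigned `width`-bit integers the way a ripple-carry adder does.

    Returns (sum modulo 2**width, carry_out, longest_carry_run), where
    longest_carry_run is the largest number of consecutive stages that
    produced a carry, a rough proxy for the delay of this addition.
    """
    carry = 0
    total = 0
    run = longest = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        total |= (x ^ y ^ carry) << i          # full-adder sum bit (uses old carry)
        carry = (x & y) | (carry & (x ^ y))    # full-adder carry-out
        run = run + 1 if carry else 0
        longest = max(longest, run)
    return total, carry, longest


# Worst case: all ones plus one, so the carry ripples through every stage.
for w in (8, 16, 32):
    _, _, chain = ripple_carry_add((1 << w) - 1, 1, w)
    print(w, chain)   # chain equals w
```

In this model, each bit removed from an adder's operands removes one stage from its worst-case carry path, which is the intuition behind narrowing the data bit width.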

Each additional bit increases the delay. Thus, besides the number of arithmetic operations, the bit width is another optimization factor for a fast transform architecture. In this paper, a new VLSI architecture that reduces the data bit widths is proposed for the integer transforms of the HEVC standard. The integer transform matrix is decomposed into several signed bit-plane transform (SBT) matrices, which are used in the proposed architecture. Moreover, a number of adders are reused based on the redundancy among the elements of the bit matrices. With this bit matrix-based transform algorithm, the proposed VLSI transform architecture can achieve a maximum throughput of 32 pixels/cycle with a very high operating frequency and a reasonable area.

#### II. EXISTING SYSTEM

To narrow the bit width of the intermediate transformed data, the bit decomposition algorithm decomposes the integer transform matrix into several SBT matrices.

Let $d_{i,j}$ be the element in the $i$th row and $j$th column of the $N \times N$ integer transform matrix $D_N$, i.e., $D_N = (d_{i,j})$. If the binary expression of the magnitude $|d_{i,j}|$ is $(b_{K-1,i,j}, \ldots, b_{1,i,j}, b_{0,i,j})_2$ with $b_{k,i,j} \in \{0, 1\}$, then there is the relation

$$d_{i,j} = \sum_{k=0}^{K-1} \operatorname{sgn}(d_{i,j})\, b_{k,i,j}\, 2^k, \quad b_{k,i,j} \in \{0, 1\}$$

where $K$ is the number of significant binary bits of $|d_{i,j}|$, $b_{k,i,j}$ denotes the $k$th bit of $|d_{i,j}|$, and $\operatorname{sgn}(\cdot)$ is the sign function, which returns 1 for a positive value and $-1$ for a negative value. Absorbing the sign into each bit, the above equation can be rewritten as

$$d_{i,j} = \sum_{k=0}^{K-1} b_{k,i,j}\, 2^k, \quad b_{k,i,j} \in \{0, \operatorname{sgn}(d_{i,j})\}$$

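This equation is the signed integer binarization of $d_{i,j}$. Collecting the $k$th signed bits of all elements into the matrix $B_k = (b_{k,i,j})$ gives the $k$th SBT matrix, whose elements are only 0, 1, or $-1$, so that $D_N = \sum_{k=0}^{K-1} 2^k B_k$.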
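
As a small worked example of the signed binarization, the Python sketch below returns the signed bits $b_{k,i,j} \in \{0, \operatorname{sgn}(d_{i,j})\}$ of a single coefficient. $K = 7$ is assumed because the largest HEVC transform coefficient magnitude is 90, which is below $2^7$; the function name `signed_bits` is illustrative. For $d = -83 = -(64 + 16 + 2 + 1)$, the nonzero signed bits are at $k = 0, 1, 4, 6$.

```python
def signed_bits(d: int, K: int = 7) -> list[int]:
    """Signed binarization: bits b_0..b_{K-1} with b_k in {0, sgn(d)}.

    Satisfies d == sum(b_k * 2**k for k in range(K)).
    """
    if abs(d) >= 1 << K:
        raise ValueError(f"|{d}| needs more than {K} bits")
    sign = 1 if d > 0 else -1        # sgn(d); irrelevant when d == 0
    mag = abs(d)
    return [sign * ((mag >> k) & 1) for k in range(K)]


bits = signed_bits(-83)
print(bits)                                        # [-1, -1, 0, 0, -1, 0, -1]
print(sum(b * 2**k for k, b in enumerate(bits)))   # -83
```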



Fig 1. Hierarchical structure of signed bit transform (SBT)

When the signed bit transform (SBT) algorithm is applied to the transform architecture, SBT matrix circuits are implemented instead of integer transform matrix circuits, and the input data are transformed by each SBT matrix circuit. Because the elements of the SBT matrices are simple, the bit widths of the intermediate transformed data and the output data are significantly reduced.

Taking the $32 \times 32$ 1-D integer transform as an example, the output bit width grows by only 5 b with the SBT algorithm, compared with 11 b for the straightforward integer transform. The bit widths in the SBT grow slowly as the intermediate data are processed stage by stage, which shortens the circuit delay and allows a shorter clock cycle. Although the bit transform algorithm reduces the delay of the integer transform circuit, more adders are required because there are more SBT matrices.
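
The 5-b and 11-b figures can be checked with a simple worst-case bound: an output $y_i = \sum_j d_{i,j} x_j$ needs at most $\lceil \log_2 \max_i \sum_j |d_{i,j}| \rceil$ more bits than the input. The Python sketch below is an illustration, not the paper's design flow: it rebuilds the HEVC $32 \times 32$ matrix, splits it into $K = 7$ SBT matrices $B_k$ (the $k$th signed bits of all elements), checks that $\sum_k 2^k B_k$ reproduces $D_{32}$, and applies the bound to $D_{32}$ and to each $B_k$. The names `COEF`, `hevc_coef`, `sbt_planes`, and `growth_bits` are assumptions introduced here.

```python
import math

import numpy as np

# HEVC 32x32 core-transform matrix rebuilt from coefficient magnitudes
# indexed by angle m (integer approximations of 64*sqrt(2)*cos(m*pi/64)).
COEF = [64, 90, 90, 90, 89, 88, 87, 85, 83, 82, 80, 78, 75, 73, 70, 67,
        64, 61, 57, 54, 50, 46, 43, 38, 36, 31, 25, 22, 18, 13, 9, 4, 0]


def hevc_coef(m: int) -> int:
    m %= 128
    if m > 64:
        m = 128 - m                                # cos(2*pi - x) = cos(x)
    return -COEF[64 - m] if m > 32 else COEF[m]    # cos(pi - x) = -cos(x)


D32 = np.array([[64 if k == 0 else hevc_coef(k * (2 * n + 1)) for n in range(32)]
                for k in range(32)], dtype=np.int64)


def sbt_planes(d: np.ndarray, K: int = 7) -> np.ndarray:
    """Return B with shape (K, N, N): B[k] = sgn(d) * (k-th bit of |d|)."""
    mag, sign = np.abs(d), np.sign(d)
    return np.stack([sign * ((mag >> k) & 1) for k in range(K)])


def growth_bits(m: np.ndarray) -> int:
    """Worst-case extra output bits: ceil(log2(max absolute row sum))."""
    return math.ceil(math.log2(int(np.abs(m).sum(axis=1).max())))


B = sbt_planes(D32)
weights = 2 ** np.arange(B.shape[0])               # 1, 2, 4, ..., 64
assert np.array_equal(np.tensordot(weights, B, axes=1), D32)

print("integer transform growth:", growth_bits(D32))               # 11
print("SBT growth (worst plane):", max(growth_bits(b) for b in B))  # 5
print("zero fraction of SBT planes:", np.mean(B == 0))
```

The 11 comes from the all-64 first row ($32 \times 64 = 2^{11}$); each SBT row has at most 32 entries of magnitude 1, hence at most $\log_2 32 = 5$ extra bits. The last line also quantifies the sparsity that the next paragraph relies on.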

However, the adders used in the SBT have such low bit widths that each addition is very fast. Additionally, the SBT matrices contain many zero elements. According to the rules of matrix multiplication, zero elements require no addition, so only a small number of addition operations are actually required. The sparsity of the SBT matrices therefore helps reduce the number of addition operations in the transform process.
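
To illustrate how sparsity and element redundancy reduce the adder count, the Python sketch below uses the HEVC 8-point matrix for brevity. It computes $y = \sum_k 2^k (B_k x)$ with SBT matrices $B_k$, counts the additions and subtractions implied by the nonzero $\pm 1$ entries (a row with $p$ nonzeros needs $p - 1$ adders), and then applies one simple reuse rule assumed here for illustration: rows that are identical up to sign, in any SBT matrix, share one adder tree because they act on the same input vector. The paper's actual reuse scheme may differ, the shift-and-accumulate adders that combine the planes are not counted, and the names `sbt_transform` and `adder_counts` are hypothetical.

```python
import numpy as np

# HEVC 8-point core-transform matrix.
D8 = np.array([[64, 64, 64, 64, 64, 64, 64, 64],
               [89, 75, 50, 18, -18, -50, -75, -89],
               [83, 36, -36, -83, -83, -36, 36, 83],
               [75, -18, -89, -50, 50, 89, 18, -75],
               [64, -64, -64, 64, 64, -64, -64, 64],
               [50, -89, 18, 75, -75, -18, 89, -50],
               [36, -83, 83, -36, -36, 83, -83, 36],
               [18, -50, 75, -89, 89, -75, 50, -18]], dtype=np.int64)
K = 7
B = np.stack([np.sign(D8) * ((np.abs(D8) >> k) & 1) for k in range(K)])


def sbt_transform(x: np.ndarray) -> np.ndarray:
    """y = sum_k 2**k * (B_k @ x); each B_k @ x needs only adds/subtracts."""
    return sum((1 << k) * (B[k] @ x) for k in range(K))


def adder_counts() -> tuple[int, int]:
    """(adders without reuse, adders when sign-equivalent rows share a tree)."""
    naive = shared = 0
    seen = set()
    for plane in B:
        for row in plane:
            nz = np.flatnonzero(row)
            if len(nz) < 2:
                continue                              # 0 or 1 term: no adder
            naive += len(nz) - 1
            canon = row if row[nz[0]] > 0 else -row   # normalise the sign
            key = tuple(canon.tolist())
            if key not in seen:
                seen.add(key)
                shared += len(nz) - 1
    return naive, shared


x = np.arange(-4, 4, dtype=np.int64)          # any integer input vector
assert np.array_equal(sbt_transform(x), D8 @ x)
print(adder_counts())
```

For example, the row for coefficients 83 and 36 has the same $\pm 1$ pattern in the planes for bits 0, 1, 4, and 6, so under this rule its adder tree is built once and reused.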

#### III. PROPOSED SYSTEM

In signal processing, data compression, source coding, or bit-rate reduction involves encoding information using fewer bits than the original representation. Compression can be either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy, so no information is lost. Lossy compression reduces bits by removing unnecessary or less important information. The process of reducing the size of a data file is referred to as data compression. In the context of data transmission, it is called source coding (encoding done at the source of the data before it is stored or transmitted), in contrast to channel coding.

Compression is useful because it reduces the resources required to store and transmit data. Computational resources are consumed in the compression process and, usually, in the reversal of the process (decompression). Data compression is therefore subject to a space-time complexity trade-off. For instance, a compression scheme for video may require expensive hardware for the video to be decompressed fast enough to be viewed as it is being decompressed, while the option to decompress the video in full before watching it may be inconvenient or require additional storage. The design of data compression schemes involves trade-offs among various factors, including the degree of compression, the amount of distortion introduced (when using lossy data compression), and the computational resources required to compress and decompress the data.

By exploiting lossless compression, the user does not lose any information or data from the image, picture, or video, and the image quality is not changed by the compression. The lossless compression process is as follows. First, the image data are converted into binary form, i.e., zeros and ones. This binary information is split into rows and columns, a step referred to as binarization. In this step, the binary digits form a sequence of bits, which are passed on as regular bits.
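
The binarization step can be illustrated with a bit-plane decomposition of an 8-bit grayscale image. The Python sketch below is an assumption-laden illustration, not the proposed system itself: it splits the image into eight binary planes, serializes each plane row by row into a packed bit stream with NumPy's `packbits`, and restores the image exactly, which is the defining property of a lossless scheme. The function names `to_bit_stream` and `from_bit_stream` are hypothetical, and no actual entropy coding is performed.

```python
import numpy as np


def to_bit_stream(img: np.ndarray) -> np.ndarray:
    """Split an 8-bit image into 8 bit planes and pack each plane row by row."""
    if img.dtype != np.uint8:
        raise TypeError("expected an 8-bit (uint8) image")
    planes = np.stack([(img >> k) & 1 for k in range(8)])   # (8, H, W), values 0/1
    return np.packbits(planes.reshape(8, -1), axis=1)        # row-major bits per plane


def from_bit_stream(stream: np.ndarray, shape: tuple[int, int]) -> np.ndarray:
    """Unpack the bit planes and reassemble the original pixels exactly."""
    h, w = shape
    planes = np.unpackbits(stream, axis=1, count=h * w).reshape(8, h, w)
    img = np.zeros((h, w), dtype=np.uint8)
    for k in range(8):
        img |= planes[k] << k                               # put bit k back in place
    return img


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(5, 7), dtype=np.uint8)
stream = to_bit_stream(image)
print(np.array_equal(from_bit_stream(stream, image.shape), image))   # True
```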



Fig 2. Proposed System

These bits are then ready to be merged: the first digit of the first row is merged with the first digit of the first column, and in this way all rows and columns are merged. The binary data are then ready to be compressed. As stated, a lossless technique is used here to transform the image, so the image is converted into binary information for lossless compression. This technique is very helpful for recovering exactly the original image. In this technique, the binary data are first converted into black-and-white format, as shown in Fig. 3.

The image is then converted into binary data, i.e., zeros and ones, as shown in the figure. Here, 64-bit compression is used, which is very helpful for the transform. Finally, the original data are recovered at the output. The final output comes without any loss of information in the image because a lossless compression technique is used. For this reason, lossless compression is highly advantageous compared with lossy compression.

A 1-bit Razor flip-flop consists of a main flip-flop, a shadow latch, an XOR gate, and a multiplexer. The main flip-flop captures the result of the combinational circuit using the normal clock signal, while the shadow latch captures the same result using a delayed clock signal. If the bit latched by the shadow latch differs from that of the main flip-flop, the path delay of the current operation has exceeded the clock period and the main flip-flop has captured an incorrect result. In that case, the Razor flip-flop sets its error signal to 1 to notify the system that the operation must be re-executed. Razor flip-flops are thus used to detect whether an operation that is expected to complete in one cycle actually finishes within that cycle; if not, the operation is re-executed over two cycles. Although re-execution may seem costly, the overall cost is low because re-execution occurs infrequently.
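
The Razor detection-and-recovery behavior described above can be summarized with a small behavioral model. The Python sketch below is a simplification under stated assumptions: the shadow latch always samples after the path has settled, a late main flip-flop still holds the previous (stale) value, and a detected error costs the failed cycle plus a two-cycle re-execution. The names `RazorOutcome`, `razor_capture`, and `expected_cycles` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RazorOutcome:
    value: int     # value finally committed to the pipeline register
    error: bool    # error signal: main flip-flop and shadow latch disagree (XOR)
    cycles: int    # cycles spent on this operation


def razor_capture(new: int, old: int, path_delay: float,
                  period: float, shadow_skew: float) -> RazorOutcome:
    """Behavioral model of one Razor-protected register (1 bit or a word)."""
    if path_delay > period + shadow_skew:
        raise ValueError("path too slow even for the delayed shadow clock")
    main = new if path_delay <= period else old   # sampled at the normal clock edge
    shadow = new                                  # sampled at the delayed clock edge
    if main != shadow:
        # Error signal raised: the operation is re-executed over two cycles,
        # after the cycle that produced the wrong value.
        return RazorOutcome(value=new, error=True, cycles=1 + 2)
    return RazorOutcome(value=main, error=False, cycles=1)


def expected_cycles(error_rate: float) -> float:
    """Average cycles per operation when a fraction `error_rate` is re-executed."""
    return (1 - error_rate) * 1 + error_rate * 3


print(razor_capture(new=1, old=0, path_delay=0.8, period=1.0, shadow_skew=0.3))
print(razor_capture(new=1, old=0, path_delay=1.2, period=1.0, shadow_skew=0.3))
print(expected_cycles(0.01))   # close to 1.02
```

With a 1% error rate, the average cost in this model is only about 2% above one cycle per operation, which reflects the argument that infrequent re-execution keeps the overall cost low.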

#### IV. RESULTS

Fig 3. RTL Schematic

| ROW_BASED_HEVC Project Status |                           |                        |                        |
|-------------------------------|---------------------------|------------------------|------------------------|
| Project File:                 | (illegible)               | Parser Errors:         | (illegible)            |
| Module Name:                  | ROW_BASED_HEVC            | Implementation State:  | Synthesized            |
| Target Device:                | (illegible)               | • Errors:              | (illegible)            |
| Product Version:              | ISE 14.7                  | • Warnings:            | 261 Warnings (261 new) |
| Design Goal:                  | Balanced                  | • Routing Results:     |                        |
| Design Strategy:              | Xilinx Default (unlocked) | • Timing Constraints:  |                        |
| Environment:                  | System Settings           | • Final Timing Score:  |                        |

| Detailed Reports              |         |                             |        |              |                     |
|-------------------------------|---------|-----------------------------|--------|--------------|---------------------|
| Report Name                   | Status  | Generated                   | Errors | Warnings     | Infos               |
| Synthesis Report              | Current | (partly illegible) 12:45:39 | 0      | 281 Warnings | 200 Infos (200 new) |
| Translation Report            |         |                             |        |              |                     |
| Map Report                    |         |                             |        |              |                     |
| Place and Route Report        |         |                             |        |              |                     |
| Power Report                  |         |                             |        |              |                     |
| Post-PAR Static Timing Report |         |                             |        |              |                     |

Fig 5. Report

#### V. CONCLUSION

The emerging HEVC standard has been developed and standardized collaboratively by the JCT-VC. A fast integer transform VLSI architecture based on the sparse signed bit transform (SBT) is proposed for real-time ultra HD video coding conforming to the HEVC standard. The integer transform matrix with a high bit width is decomposed into several low-bit-width matrices using a matrix decomposition method. A circuit reuse strategy based on the SBT matrices is used to reduce the number of adders in the VLSI architecture. The proposed transform hardware architecture can process video data at higher speed and with a reasonable area compared with previous work.
