

# Design and Analysis Vlsi Architecture For Montgomery Modular Multiplication

S .Geetha Rani

M.Tech[ES], Rajeev Gandhi Memorial College Of Engineering And Technology Nerawada 'X' Roads, Kurnool District, Nandyal, Andhra Pradesh Email:- geetharani419@gmail.com

#### Abstract—

This task goes for using the modular multiplication (MM) with the large integers is the most basic and tedious operation. Montgomery multiplication is a quick calculation to figure the Montgomery item, changing the outcome out of Montgomery from yields the established modular item abdominal muscle mod N. the proposed multiplier gets and yield the data with binary portrayal and uses just a single level carry save adder (CSA) to stay away from the carry proliferation at every expansion operation. This CSA is additionally used to perform operand pre calculation and configuration change from the carry save organization to the binary portrayal, prompting a low equipment cost and delicate basic way delay to the detriment of additional clock cycles for finishing one modular multiplication. Another SCS based Montgomery MM calculation to decrease the basic way postponement of Montgomery multiplier. The multiplier utilized one level CCSA engineering and avoided the pointless carry save expansion operations to largely decrease the basic way delay and required clock cycle for finishing one MM operation. Montgomery modular multiplier can accomplish execution and critical power defer item change when contrasted and past plan.

Index Terms— Carry-save addition, lowcost architecture,Montgomery modular multiplier, public-key cryptosystem.

# **I.INTRODUCTION**

In numerous public-key crypto systems, modular multiplication (MM) with large integers is the most basic and tedious operation. Subsequently, various algorithms and equipment usage have been displayed to carry out the MM all the more rapidly, and Montgomery's calculation is a standout amongst the most surely understood MM algorithms. Montgomery's calculation decides the remainder just relying upon the minimum critical digit of operands and replaces the entangled division in customary MM with a progression of moving modular increments to create  $S=A \times B \times R-1 \pmod{N}$ , where N is the k-bit modulus, R-1 is the backwards of R modulo N ,and  $R = 2k \mod R$ N. Subsequently, it can be effectively actualized into VLSI circuits to accelerate the encryption/decryption process.

Three-operand expansion in the cycle circle of Montgomery's calculation requires long carry engendering for large operands in binary portrayal. To tackle this issue, a few methodologies in light of carry-save expansion were proposed to accomplish a huge speedup of Montgomery MM. In view of the portrayal of information and yield operands, these methodologies can be generally partitioned into semi-carry-save (SCS) system and full carry-save (FCS) procedure.



Shamir et al. [1] have proposed a vitality proficient FCS-based multiplier (signified as FCS-MMM42 multiplier) in which the unnecessary operations of the four-to (twolevel) CSA engineering are stifled to lessen the vitality dissemination and improve the throughput. Be that as it may, the FCS-MMM42 multiplier still experiences the high territory intricacy and long basic way delay. Different procedures, for example, parallelization, high-radix calculation, and systolic exhibit plan [2]- [6], can be joined with the CSA design to additionally upgrade the execution of Montgomery multipliers. In any case, these methods likely reason a large increment in equipment multifaceted nature and power/vitality scattering, which is unfortunate for compact systems with compelled assets.

Another technique is proposed SCS-based Montgomery MM calculation to lessen the basic way deferral of Montgomery multiplier. Furthermore, the downside of more clock cycles for finishing one multiplication is additionally enhanced while keeping up the benefits of short basic way deferral and low equipment intricacy.

Literature Review

Low power equipment outline for Montgomery Modular multiplication This paper depicts the plan and

usage of low power modular multiplier of RSA and equalizations its zone and speed. By making strides Montgomery modular multiplication algorithm, upgrading basic way and utilizing a few low power techniques, this paper accomplishes low power and high speed execution.

The plan is executed utilizing SMIC 0.13um CMOS process, the normal power utilization is 106uW at 13.56MHZ when executing 1024-piece operations, the region

is around 0.17mm2and the time to complete modular multiplication are 1412 clock cycles, such amazing property make it reasonable for RSA operation.

Novel Techniques for Montgomery Modular Multiplication Algorithms for Public Key Cryptosystems.

Augmentation of Montgomery multiplication algorithms in GF(p) are considered and broke down. The time what's more, space necessities of different cutting edge algorithms are exhibited. We propose Modified Montgomery Modular Multiplication Algorithms that diminishes the quantity of computational operations such as number of increments, memory peruses and composes engaged with the current algorithms, in this manner, sparing impressive time and zone for execution. Numerous plan cases has been tackled to demonstrate the hypothetical accuracy of proposed algorithms. Many-sided the quality investigation demonstrates that Modified

Coarsely Integrated Scanning (MCIOS) expend less space and time contrasted with other changed Montgomery Algorithms. To check the intelligent accuracy, the proposed MCIOS algorithm was executed in Xilinx Spartan3E FPGA. The aggregate memory for execution of 64 – bit operand is 135484 KB for MCIOS and 140496 KB for existing Coarsely Integrated Scanning (CIOS)

technique. The proposed algorithm can be changed to be reasonable for any selfassertive Galois field measure with nearly nothing changes.

Likewise the proposed algorithm can be created as engineering appropriate for



System on Chip (SoC) usage of Elliptic bend cryptosystem.

Along these lines, the system can be created as a 3D chip

# A. Montgomery Multiplication

The Montgomery modular item S of An and B can be gotten as  $S = A \times B \times R-1$  (mod N),where R-1 is the converse of R modulo N .That is,  $R \times R-1 = 1 \pmod{N}$ . Note that, the documentation Xi the ith bit of X in binary portrayal. What's more, the documentation Xi:j demonstrates a section of X from the ith bit to jth bit

#### B. SCS-Based Montgomery Multiplication

To keep away from the long carry proliferation, the halfway outcome S of moving modular expansion can be kept in the carry save portrayal. Note that the quantity of emphasess in Fig. 1 has been changed from k to k+2 to evacuate the last correlation and subtraction. Be that as it may, the configuration change from the carry-save arrangement of the last modular item into its binary organization is required. Figure1 demonstrates the design of SCSbased MM calculation proposed in [5] (meant as SCS-MM-1 multiplier) made out of one two-level CSA engineering and one arrangement converter, where the dashed line indicates a 1-bit flag. In [5], a 32-bit CPA with multiplexers and registers (meant as CPA FC), which includes two 32-bit inputs and produces a 32-bit yield at each clock cycle, was embraced for the configuration transformation. In this way, the 32-bit CPA FC will take 32 clock cycles to finish the organization transformation of a 1024-piece SCS-based Montgomery multiplication. The additional CPA FC likely enlarges the region and the basic way of the SCS-MM-1 multiplier.

Note that the select signs of multiplexers M1 and M2 in Figure 2 created by the control part are not appeared in Fig. 4 for straightforwardness. Be that as it may, the additional clock cycles for organize change are reliant on the longest carry spread chain in SS [k+2]+SC[k+2] and about k/2 check cycles are required in the most pessimistic scenario since two-level CSA design.

# I. Existed Montgomery Multiplication

Another SCS-based Montgomery MM algorithm is utilized to diminish the basic way postponement of Montgomery multiplier. Furthermore, the downside of more clock cycles for finishing one multiplication is likewise enhanced while keeping up the upsides of short basic way deferral and low equipment multifaceted nature.

# A. Basic Path Delay Reduction

The basic way postponement of SCS-based multiplier can be decreased by consolidating the upsides of FCS-MM-2 and SCS MM-2. That is, we can precompute D =B+N and reuse the one-level CSA engineering to perform B+N and the transformation. arrangement Figure 1 the adjusted SCS-based demonstrates Montgomery multiplication (MSCS-MM) algorithm and one conceivable equipment design, individually.

The Zero\_D circuit in Figure 4 is utilized to recognize whether SC is equivalent to zero, which can be refined utilizing one NOR operation. The Q-L. The carry spread expansion operations of B+N and the configuration change are performed by the one-level CSA engineering of the MSCS-MM multiplier through over and again executing the carry-save expansion (SS,SC) = SS+SC+0 until SC =0.



What's more, we likewise precompute Ai and qi in cycle i–1 so they can be utilized to instantly choose the coveted info operand from 0, N,B, and D through the multiplexer M3 in emphasis I. Subsequently, the basic way deferral of the MSCS-MM multiplier can be lessened into T MUX4+TFA.

B. Clock Cycle Number Reduction

To diminish the clock cycle number, a CCSA engineering which can perform one three-input carry-save expansion or two serial two-input carry-save augmentations is proposed to substitute for the one-level CSA design.



Figure 1:Modified SCS-based MM multiplier

On the bases of basic way defer clock diminishment, cycle number lessening, and remainder pre-calculation said over, another SCS-based Montgomery algorithm SCS-MM-New MM (i.e., algorithm appeared in Figure 4) utilizing one-level CCSA engineering is proposed to fundamentally decrease the required clock cycles for finishing one MM. As appeared in SCS-MM-New algorithm, stages 1-5 for creating ^ B and ^D are first performed. Note that since qi + 1 and qi + 2 must be produced in the ith cycle, the iterative list I of Montgomery MM will begin from -1 rather than 0 and the relating starting estimations of  $\hat{q}$  and  $\hat{A}$  must be set to 0.

The equipment design of SCS-MM-New algorithm, indicated as SCS-MM-New multiplier, are appeared in Figure 2, which comprises of one-level CCSA engineering, two 4-to-1 multiplexers (i.e., M1 and M2), one improved multiplier SM3, one skip indicator Skip D, one zero identifier Zero D, and six registers. Skip D is created to produce skip i+1, ^q, and ^A in the ith cycle. Both M4 and M5 in Figure 6 are 3-bit 2-to-1 multiplexers and they are substantially littler than k-bit multiplexers M1, M2, and SM3.



Figure 2: SCS-MM-New multiplier II. Proposed Montgomery Multiplication This expand the existing sytemby supplanting the CSA adder obstruct with cmos full adder in fig 3





Fig 3:Cmos Full Adder

The circuit of 14T adder is a One piece full adder cell is made of seven cmos inverters that are associated as appeared in fig 3. The proposed full adder is demonstrated to have the base power utilization and less power defer item.



Fig 4: cmos full adder used in MSCS-MM

The proposed design of Montgomery Modular Multiplication utilizing Cmos full adder, which comprises of one-level cmos full adder engineering, two 4-to-1 multiplexers (M1 and M2) one disentangled multiplier SM3, one skip finder Skip\_D, one zero indicator Zero\_D, and six registers. Zero locator Zero\_D is utilized to distinguish SC is equivalent to zero. The Skip\_D is made out of four XOR doors, three AND entryways, one NOR entryway, and two 2-to-1 multiplexers the skip indicator is utilized to recognize the pointless multiplication operations



Fig 5: SCS-MM New Multiplier with cmos full adder

# IV. EXPERIMENTAL RESULTS



Fig: 4-bit Modified SCS-based MM





Fig: 4-bit SCS-MM-New multiplier

| Montgomery<br>Multiplica-<br>tion | Power  | Delay | PDP      |
|-----------------------------------|--------|-------|----------|
| 1bit Mscs-                        | 18.23u | 110.6 | 2.016238 |
| mm                                | W      | E-12  | E-15     |
| 1bit new                          | 37.81u | 60.79 | 2.298469 |
| Scs-mm                            | W      | E-12  | 9E-15    |
| 4bit Mscs-                        | 219.3u | 77.79 | 1.70593E |
| mm                                | W      | E-12  | -14      |
| 4bit new                          | 124.4u | 142.6 | 1.76648E |
| scs-mm                            | W      | E-12  | -14      |
| Proposed 1bit                     | 19.16u | 37.84 | 0.725014 |
| Mscs-mm                           | W      | E-12  | 4E-15    |
| Proposed 1bit                     | 53.50u | 45.24 | 2.22034E |
| new scs-mm                        | W      | E-12  | -15      |

| COMPARISION | TARIF |
|-------------|-------|
|             | INDLL |

#### CONCLUSION

In the proposed strategy these difficulties are handled by presenting SSC MM new multiplier. And furthermore а straightforward and effective Montgomery multiplication algorithm to such an extent the minimal effort and that elite Montgomery modular multiplier can be executed. To lessen the additional clock cycles a configurable CSA (CCSA), which could be one full-adder or two serial halfadders is utilized. What's more, a system that can recognize and avoid the superfluous carry-save expansion operations in the onelevel CCSA design while keeping up the short basic way delay is created.

To additionally confirm the proficiency of the proposed outline, we blended those Montgomery modular multipliers SCS are utilized cmos based full adder diminish the power and postponement.

#### REFERENCES

[1] S.-R. Kuang, J.-P. Wang, K.-C. Chang, and H.-W. Hsu, "Energy-efficient highthroughput Montgomery modular multipliers for RSA cryptosystems," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 21, no. 11, pp. 1999–2009, Nov. 2013.

[2] J. Han, S. Wang, W. Huang, Z. Yu, and X. Zeng, "Parallelization of radix-2 Montgomery multiplication on multicore platform," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 21, no. 12, pp. 2325–2330,

Dec. 2013.

[3] J. C. Neto, A. F. Tenca, and W. V. Ruggiero, "A parallel k-partition method to perform Montgomery multiplication," in

Proc. IEEE Int. Conf. Appl.-Specific Syst., Archit., Processors

, Sep. 2011, pp. 251–254.

[4] S.-H. Wang, W.-C. Lin, J.-H. Ye, and M.-D. Shieh, "Fast 5scalable radix-4 Montgomery modular multiplier," in Proc. IEEE Int. Symp. Circuits Syst., May 2012, pp. 3049–3052.

[5] A. Miyamoto, N. Homma, T. A oki, and A. Satoh, "Systematic design of RSA processors based on high-radix



Montgomery multipliers," IEEE Trans.
Very Large Scale Integr. (VLSI) Syst., vol.
19, no. 7, pp. 1136–1146, Jul. 2011.
[6] G. Perin, D. G. Mesquita, F. L.
Herrmann, and J. B. Martins, "Montgomery modular multiplication on reconfigurable hardware: Fully systolic array vs parallel implementation," in Proc. 6th Southern Program. Logic Conf., Mar. 2010, pp. 61–

66. [7] P. Amberg, N. Pinckney, and D. M. Harris, "Parallel high-radix Montgomery multipliers," in Proc. 42nd Asilomar Conf. Signals, Syst., Comput., Oct. 2008, pp. 772–776.

[8] Y.-Y. Zhang, Z. Li, L. Yang, and S.-W. Zhang, "An efficient CSA architecture for Montgomery modular multiplication," Microprocessors

Microsyst., vol. 31, no. 7, pp. 456–459, Nov. 2007.

[9] H. Zhengbing, R. M. Al Shboul, and V.P. Shirochin, "An efficient architecture of 1024-bits crypto processor for RSA cryptosystem based on modified Montgomery's algorithm," in

Proc. 4th IEEE Int. Workshop Intell. Data Acquisition Adv. Comput. Syst., Sep. 2007, pp. 643–646.