Design of Out-of-Order Floating-Point Unit


  • Department of ECE, SRM University, Chennai-603203, Tamil Nadu, India


Objective: Field Programmable Gate Arrays (FPGAs) are often used to accelerate systems by implementing algorithms directly in hardware. This paper presents the design and implementation of a fully pipelined single-precision Floating-Point Unit (FPU) on a Spartan-6 FPGA chip. Methods: The design is high-speed and modular, aimed at improving the performance of such hardware-accelerated applications. The proposed design performs the basic arithmetic operations and square-root extraction, and its modularity lets designers easily add functionality or remove modules that they deem unnecessary for a particular application. Findings: The investigation shows that the adder and multiplier modules can be clocked at over 300 MHz and the top module at over 200 MHz. High operating frequencies were achieved by pre-computing possible values in earlier pipeline stages and then correcting the results in later pipeline stages. It was also found that splitting longer operations in the critical path is a better alternative than processing the whole operation at once. Limiting “Max_Fanout”, an attribute provided by the Xilinx XST tool, proved valuable in reducing delays on overloaded nets. Applications: This FPU would be a worthwhile addition as a floating-point extension to fixed-point processors for applications such as spectrum analyzers, 3D graphics, and audio processing units.
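The pre-compute-then-correct idea mentioned in the Findings can be illustrated in software. The following is a minimal Python sketch, not the authors' circuit: it assumes a 24-bit addition (the width of a single-precision significand including the hidden bit) split into two 12-bit halves, and the names `stage1`, `stage2`, and `split_add24` are hypothetical. The early stage adds the low halves and pre-computes both candidate sums for the high halves; the later stage only selects the correct candidate once the low-half carry is known, so neither stage contains a full-width carry chain.

```python
# Illustrative model only: widths and function names are assumptions,
# not taken from the paper.
MASK12 = (1 << 12) - 1  # selects the low 12 bits of an operand


def stage1(a, b):
    """Early stage: add the low halves and pre-compute both high-half sums."""
    low = (a & MASK12) + (b & MASK12)   # 13-bit value; bit 12 is the carry out
    a_hi, b_hi = a >> 12, b >> 12
    hi_if_no_carry = a_hi + b_hi        # candidate if the low half does not carry
    hi_if_carry = a_hi + b_hi + 1       # candidate if the low half carries
    return low, hi_if_no_carry, hi_if_carry


def stage2(low, hi_if_no_carry, hi_if_carry):
    """Later stage: correct the result by selecting the right candidate."""
    carry = low >> 12
    hi = hi_if_carry if carry else hi_if_no_carry
    return (hi << 12) | (low & MASK12)


def split_add24(a, b):
    """Add two 24-bit unsigned values using the two-stage scheme."""
    return stage2(*stage1(a, b))
```

For any two 24-bit unsigned inputs, `split_add24(a, b)` should equal `a + b`. In hardware, the values returned by `stage1` would be held in pipeline registers between the two stages, which is the point where a long operation in the critical path is split.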


DSP48A1 Multiplier, FPU, FPGA, High-speed Pipeline, Out-of-order Processing, Non-restoring Algorithm, Spartan-6, Single-Precision





Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.