Abstract
Background: The Fast Fourier Transform (FFT) is one of the most important fundamental operations in digital signal processing systems. In this paper, a new pipelined Radix-8 based FFT architecture is designed for frequency transformation. The objective is to improve speed and to reduce area, delay and power. Methods: The Radix-8 FFT is used to improve operating speed by shortening the computational path. The proposed architecture is named the “Radix-8 combined SDF-SDC FFT”. In this architecture, the number of stages is reduced to 75%. The first two stages are designed using the Single-path Delay Commutator (SDC). Combining SDF and SDC increases the processing speed of the architecture. Findings: The existing Radix-8 SDF FFT requires a larger number of stages. To reduce the number of stages, the Radix-8 combined SDF-SDC FFT is proposed. Its performance is evaluated in a Very Large Scale Integration (VLSI) system design environment, where low area utilization, low power consumption and high speed are the main parameters. Hence, the main goal of the proposed architecture is to reduce hardware complexity and power consumption while increasing both the speed and the throughput of the system. Applications: Mobile Ad-hoc Networks (MANET), Orthogonal Frequency Division Multiplexing (OFDM) systems and telecommunication networking systems are important applications of the proposed Radix-8 combined SDF-SDC FFT.

Keywords: Fast Fourier Transform (FFT), Single-path Delay Commutator (SDC) FFT, Single-path Delay Feedback (SDF) FFT, Very Large Scale Integration (VLSI)

1. Introduction
The Fast Fourier Transform (FFT) is a fundamental component of many Digital Signal Processing (DSP) systems. It plays an important role in fields such as communication systems, biomedical applications, and sensor and radar signal processing. The FFT processor is one of the most computationally complex modules in the physical layer of Orthogonal Frequency Division Multiplexing (OFDM) systems. Various FFT processors have been proposed to meet real-time processing requirements and to reduce hardware complexity, both of which are especially important for high-throughput applications.

Pipelined FFT architectures can achieve high throughput and low latency, which makes them suitable for real-time applications. According to the dataflow scheme, they can be classified as Single-path Delay Feedback (SDF) and Multi-path Delay Commutator (MDC) architectures. To improve hardware efficiency, the SDF architecture shares the same delay elements between butterfly inputs and outputs, but it operates at a low throughput because of the single path. The MDC architecture uses switches (commutators) to send the correct data sequence into the butterfly elements. In some real-time applications such as OFDM or Ultra-Wide Band (UWB) systems, where high throughput is a requirement, it is important to be able to process the input samples in parallel.

FFT architectures can be roughly divided into two categories: memory-based and pipelined architectures. Memory-based architectures usually consist of a butterfly unit and a certain number of memory blocks, which yields low-cost designs. A pipelined architecture consists of multiple stages and provides higher throughput at the cost of more hardware. A low-power FFT architecture has been proposed for WLAN applications, but it uses a Radix-8 FFT structure. The Radix-8 FFT is only suitable for input sequence lengths that are a power of eight. A variable-length low-power FFT architecture using the Radix-8 algorithm has also been proposed for an OFDM system.

In this paper, a pipelined Radix-8 SDF-SDC FFT is designed to improve the performance of the FFT processor. The resulting frequency transformation architecture combines low area utilization, high speed and low power consumption for 3G and 4G wireless communication signals. The architecture is developed to reduce complexity in terms of the number of stages, hardware slices and LUTs of the FFT processor.

2. Related Works

The authors of [1] explained an efficient implementation of the Radix-8 FFT algorithm. They presented an efficient approach to implementing the high-radix butterfly element. In this approach, pipelining techniques are used to cascade the butterfly elements, so the silicon area of the high-radix butterfly is significantly reduced. The approach combines the advantages of high throughput and small area. Therefore, a high-radix FFT processor can be realized efficiently using the proposed high-radix butterfly element.

The authors of [2] described the implementation of a 64-point FFT/IFFT using the Radix-8 algorithm. An efficient FFT/IFFT processor for OFDM applications is presented. The processor can be used in various OFDM-based communication systems, such as Worldwide Interoperability for Microwave Access (WiMAX), Digital Audio Broadcasting (DAB) and Digital Video Broadcasting-Terrestrial (DVB-T). They adopt a single-path delay feedback architecture. To eliminate the Read-Only Memories (ROMs) used to store the twiddle factors, the architecture applies a reconfigurable complex multiplier, which yields a ROM-less FFT/IFFT processor and reduces the truncation error.

The authors of [3] explained a new radix-2/8 FFT algorithm for length-\(q \times 2^m\) DFTs. It is based on a mixture of radix-2 and radix-8 index maps. The paper examines the number of arithmetic operations required by the radix-2/8 FFT algorithm. Operations such as data transfer, address generation, and twiddle factor evaluation or access to the lookup table, which also contribute significantly to the execution time, are all substantially reduced in the algorithm. The algorithm is expressed in a simple matrix form, which facilitates implementation and allows an extension to the multidimensional case.

The authors of [4] described word length estimation for memory-efficient pipelined FFT/IFFT processors. They introduced a method that minimizes the memory requirement of the pipelined FFT processor and aims at reducing the power consumption. With variable data and coefficient word lengths, they can determine the best trade-off between the memory size and the other processing elements. The method allows the designer to control the estimation process for dedicated approaches, and it can be applied to pipelined architectures to lower the memory requirement.

3. Radix-8 FFT Architecture

The Radix-8 FFT algorithm improves operating speed by reducing the amount of computation; it is obtained by changing the base (radix) to 8. For a given transform length, a larger base means a smaller exponent and hence fewer stages. FFT algorithms with a higher radix can be designed by decomposing the frequency domain samples into more groups, at the cost of more complicated control. A radix-8 butterfly can also be realized by cascading three radix-2 stages, which is called the radix-\(2^3\) algorithm. A radix-\(r\) FFT can easily be derived from the DFT by decomposing the \(N\)-point DFT into a set of recursively related \(r\)-point transforms, provided that the length \(N\) of \(x(n)\) is a power of \(r\). In the Radix-8 algorithm, \(r\) is 8. The DIT Radix-8 FFT recursively partitions a DFT into eight DFTs of one-eighth length, each taken over every eighth sample. The outputs of these shorter FFTs are reused to compute many outputs, which greatly reduces the total computational cost. The Radix-8 Decimation-In-Time (DIT) and Decimation-In-Frequency (DIF) FFTs gain their speed by reusing the results of smaller, intermediate computations to compute multiple DFT frequency outputs. Figure 1 shows the basic structure of the Radix-8 butterfly unit. A radix-8 FFT algorithm can be constructed by following the same techniques used to derive the classical radix-2 and radix-4 algorithms.
Figure 2 shows the signal flow graph of the Radix-8 butterfly unit. The butterfly unit is designed primarily to perform the radix-8 DIF FFT algorithm, and it can also compute the radix-4 or radix-2 DIF FFT algorithm. It computes the radix-8 FFT in every computational stage.
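
The recursive partition described above can be sketched in software. The following Python fragment is only an illustrative model, not the hardware design of this paper: it splits the input into eight sub-sequences of every eighth sample, transforms each recursively, applies the twiddle factors and combines the results with an 8-point butterfly (written here as a direct 8-point DFT for readability). The function names `fft_radix8` and `dft_reference` are assumptions introduced for this sketch.

```python
import cmath


def dft_reference(x):
    """Direct O(N^2) DFT, used only to check the radix-8 result."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]


def fft_radix8(x):
    """Recursive radix-8 decimation-in-time FFT; len(x) must be a power of 8."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 8 != 0:
        raise ValueError("length must be a power of 8")
    m = n // 8
    # Eight sub-transforms of length N/8, each over every eighth sample.
    subs = [fft_radix8(x[r::8]) for r in range(8)]
    out = [0j] * n
    for k in range(m):
        # Twiddle factors W_N^(r*k) applied to the sub-transform outputs.
        t = [subs[r][k] * cmath.exp(-2j * cmath.pi * r * k / n) for r in range(8)]
        # 8-point butterfly: each group of eight outputs reuses the same inputs.
        for q in range(8):
            out[k + q * m] = sum(
                t[r] * cmath.exp(-2j * cmath.pi * r * q / 8) for r in range(8)
            )
    return out
```

Calling `fft_radix8` on a 64-sample list should agree with `dft_reference` up to floating-point rounding, while lengths that are not a power of eight raise `ValueError`, in line with the restriction noted in the Introduction.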


Single-path Delay Feedback (SDF) FFT architectures have the most efficient memory utilization among pipelined FFT processors. The pipeline exploits the different arrival times of the input data and the processed data. The first half of the input data is saved in memory, so that the delayed input is processed together with the remaining input in the butterfly unit. The output data are fed back to the input of the butterfly unit through a buffer for further processing. The architecture of the Radix-8 Single-path Delay Feedback (R8-SDF) FFT is shown in Figure 3. The SDF architecture uses the registers more efficiently by storing one output of each butterfly in feedback shift registers.
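
The delay-feedback principle can be illustrated with a small behavioral model. The sketch below is a simplification and an assumption of this illustration: it uses radix-2 stages rather than the radix-8 stages of Figure 3, because the radix-2 case shows the store, butterfly and feedback phases most plainly. Each stage holds a FIFO of `delay` words. During the first half of a block the input is stored and earlier feedback results are emitted; during the second half the butterfly combines the delayed and current samples, emits the sum and feeds the twiddled difference back into the FIFO. The names `sdf_stage`, `sdf_fft` and `bit_reverse` are hypothetical.

```python
import cmath
from collections import deque


def sdf_stage(stream, delay):
    """One radix-2 SDF stage: a butterfly plus a feedback FIFO of `delay` words."""
    fifo = deque([0j] * delay)
    out = []
    for n, x in enumerate(stream):
        delayed = fifo.popleft()
        k = n % (2 * delay)
        if k < delay:
            # First half of a block: store the input, emit the fed-back result.
            fifo.append(x)
            out.append(delayed)
        else:
            # Second half: butterfly on the delayed and the current sample.
            tw = cmath.exp(-2j * cmath.pi * (k - delay) / (2 * delay))
            out.append(delayed + x)
            fifo.append((delayed - x) * tw)  # difference goes back into the FIFO
    return out


def sdf_fft(x):
    """Chain of SDF stages with delays N/2, N/4, ..., 1 (DIF, bit-reversed output)."""
    n = len(x)
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of two, at least 2")
    stream = list(x) + [0j] * (n - 1)  # extra zeros flush the FIFOs
    delay = n // 2
    while delay >= 1:
        stream = sdf_stage(stream, delay)
        delay //= 2
    return stream[n - 1:]  # the pipeline latency is N - 1 samples


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r
```

In this model the total FIFO storage is N/2 + N/4 + ... + 1 = N - 1 words, and output position p carries the frequency bin `bit_reverse(p, log2 N)`.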

The Single-path Delay Commutator (SDC) improves the utilization of the butterfly elements by modifying them. However, it increases the memory requirement. The number of words which are required to be stored is 3N/2, 3N/8…6. The commutator is used to convert one form of the signal into another. The SDC also reaches the minimum requirement for both multipliers and adders. In the SDC architecture, the programmable complex multiplier and the memory for storing the twiddle factors are allocated in every column. The architecture of the single-path delay commutator is shown in Figure 4.

5. Proposed Radix-8 Combined SDF-SDC FFT Architecture

In this paper, the architecture of the Radix-8 combined Single-path Delay Feedback and Single-path Delay Commutator (Radix-8 SDF-SDC) FFT is designed to reduce hardware utilization and power consumption and to improve the speed of the processor. Compared to the radix-2 and radix-4 FFTs, the Radix-8 FFT has fewer stages and gives better architectural performance. The SDC FFT has a larger number of single delay commutators within one stage. In the proposed architecture, the SDC and SDF architectures are both used to improve performance with respect to the main VLSI concerns.
The number of stages is reduced in this method. Figure 5 shows the 8-point radix-8 combined SDF-SDC FFT. In the first stage of the FFT, Single-path Delay Feedback is used, and the second and third stages are designed using the multipath delay Commutator. This paper presents an efficient variable-length radix-8 FFT architecture for OFDM systems. Compared to the SDF structure, the SDC architecture has more computational paths for performing the FFT function. Delay Feedback and Delay Commutator are combined to carry out the FFT calculation. The disadvantage of the combined SDC-SDF FFT structure is the large number of delay paths needed to produce the frequency transformation signals at the appropriate clock periods.

The 8-point Radix-8 combined SDF-SDC FFT is completed in a single stage, whereas conventional Radix-2 and Radix-4 FFTs require more stages. Therefore, the number of stages and computational paths is reduced in the proposed method. The existing method uses the Radix-8 single-path delay feedback FFT. Compared to the existing method, the proposed method reduces area and power consumption.
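
The stage-count argument can be checked with a short calculation. The sketch below assumes a pure radix-\(r\) pipeline, which needs \(\log_r N\) butterfly stages for an \(N\)-point transform; the helper name `pipeline_stages` is hypothetical and the function does not model the combined SDF-SDC structure itself.

```python
def pipeline_stages(n_points, radix):
    """Number of butterfly stages of a pure radix-r pipeline, i.e. log_r(N)."""
    stages = 0
    size = 1
    while size < n_points:
        size *= radix
        stages += 1
    if size != n_points:
        raise ValueError("n_points must be a power of the radix")
    return stages


# An 8-point transform: three radix-2 stages versus one radix-8 stage.
print(pipeline_stages(8, 2), pipeline_stages(8, 8))

# A 4096-point transform for radix 2, 4 and 8.
print([pipeline_stages(4096, r) for r in (2, 4, 8)])
```

For 8 points this gives three radix-2 stages against a single radix-8 stage, which is the case discussed above; for 4096 points the counts are 12, 6 and 4 for radix 2, 4 and 8 respectively.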

6. Results and Discussion

The Radix-8 SDF-SDC FFT architecture is designed and simulated using the Modelsim 6.3C tool. The simulation result of the 8-point Radix-8 SDF-SDC FFT is shown in Figure 6. The synthesis result has been validated using the Xilinx 10.1i design tool (Package: pq208, Family: Spartan-3, Device: Xc3s200). Compared to the existing method, the proposed method uses less silicon area, runs at higher speed and consumes less power.

<table>
<thead>
<tr>
<th>Types/VLSI concerns</th>
<th>Number of Occupied Slices</th>
<th>Total Number of LUTs</th>
<th>Power (W)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Existing Method</td>
<td>333</td>
<td>574</td>
<td>0.717</td>
</tr>
<tr>
<td>Proposed Method</td>
<td>324</td>
<td>535</td>
<td>0.691</td>
</tr>
<tr>
<td>Percentage Reduction</td>
<td>2.7%</td>
<td>6.79%</td>
<td>3.62%</td>
</tr>
</tbody>
</table>
The existing Radix-8 Single-path Delay Feedback (SDF) FFT and the proposed Radix-8 combined Single-path Delay Feedback and Single-path Delay Commutator (SDF-SDC) FFT are analyzed and compared in Table 1. The comparison is also illustrated graphically in Figure 7. The proposed method offers a 2.7% reduction in slices, a 6.79% reduction in LUTs and a 3.62% reduction in power consumption. Compared to the existing method, the proposed radix-8 combined SDF-SDC FFT therefore gives better architectural performance.
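
The percentage figures follow from the synthesis values in Table 1 as (existing - proposed) / existing × 100. The short sketch below reproduces this arithmetic; the function name `percent_reduction` and the dictionary keys are assumptions made for the illustration.

```python
def percent_reduction(existing, proposed):
    """Relative saving of the proposed design over the existing one, in percent."""
    return (existing - proposed) / existing * 100.0


# (existing, proposed) values taken from Table 1.
results = {
    "slices": (333, 324),
    "luts": (574, 535),
    "power_w": (0.717, 0.691),
}

for name, (existing, proposed) in results.items():
    print(f"{name}: {percent_reduction(existing, proposed):.2f}%")
```

The slice and LUT values agree with the table. The power value evaluates to about 3.626%, which is consistent with the reported 3.62% if that figure was truncated rather than rounded.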

7. Conclusion

We designed an efficient approach to realizing high-radix butterfly elements, and the Radix-8 combined SDF-SDC FFT is proposed in this paper. In this method, pipelining techniques are used to cascade the butterfly elements, which reduces the silicon area of the high-radix butterfly. The advantages of this method are high throughput and small area. The proposed method offers a 2.7% reduction in slices, a 6.79% reduction in LUTs and a 3.62% reduction in power consumption, so its power and area are lower than those of the existing method. This type of FFT architecture can be used in most communication systems, such as LTE and wireless modems. In future work, the Radix-8 FFT architecture can be designed for real-valued signals, and higher-point FFT processors can be used to analyze different algorithms with pipelined structures.

8. References