Generic Mixed-Radix FFT Pruning
Linkai Wang,Xiaofang Zhou,Member,IEEE,Gerald E.Sobelman,Senior Member,IEEE,and Ran Liu
Abstract—Compared with traditional Fast Fourier Transform (FFT)algorithms,FFT pruning is more computationally efficient in those cases where some of the input values are zero and/or some of the output components are not needed.In this letter,a novel pruning scheme is developed for mixed-radix and high-radix FFT pruning.The proposed approach is applicable over a wide range of FFT lengths and input/output pruning patterns.In addition,it can effectively employ the benefits of high-radix FFT algorithms that have lower computational complexity.
Index Terms—Mixed-radix,fast fourier transform,pruning.
I.I NTRODUCTION
F AST FOURIER TRANSFORM(FFT)algorithms[1],[2]
play a critical role in improving the feasibility of using the Discrete Fourier Transform(DFT)in a wide range of applica-tions.In most FFT algorithms,it is assumed that each compo-nent of the input data vector is involved in the computation and that each component of the output data vector is required.
How-ever,for some applications,such as spectrum sensing in cogni-tive radio[3],modulation and demodulation in Orthogonal Fre-quency Division Multiple Access(OFDMA)systems[4]and DFT-based channel estimation[5],there exist cases where the input vector has many zero values and/or only a portion of the DFT output components are needed.Such situations can pro-vide opportunities for further optimization.
FFT pruning wasfirst proposed in[6]to address these types of situations.FFT pruning can be seen as a modification of a stan-dard FFT[7],in which the zero input values and the unnecessary output components are not involved in the computation.Early FFT pruning algorithms[6]–[10]were highly dependent on the input/output(I/O)pruning patterns.These algorithms benefit from their constrained pruning patterns and the resulting highly regular structure.However,in the case of recent communica-tion applications such as those mentioned above,pattern-ori-ented pruning algorithms lack sufficientflexibility to be opti-mally applied.
To satisfy these emerging new requirements,generic FFT pruning algorithms which are independent of the I/O pruning
Manuscript received October24,2011;revised December31,2011;accepted January09,2012.Date of p
ublication January13,2012;date of current ver-sion January26,2012.This work was supported by the National Natural Sci-ence Foundation of China(60876016),National Science and Technology Major Project of China(2011ZX03003-003-03)and State Key Laboratory of ASIC and System(11GF001and11MS003).The associate editor coordinating the review of this manuscript and approving it for publication was Prof.Jian-Kang Zhang. L.Wang,X.Zhou and R.Liu are with the State Key Laboratory of ASIC and System,Fudan University,Shanghai201203,China(e-mail:wan-glinkai@fudan.edu;xiaofangzhou@fudan.edu;rliu@fudan.edu). G.E.Sobelman is with the Department of Electrical and Computer Engi-neering,University of Minnesota,Minneapolis,MN55455USA(e-mail:so-belman@umn.edu).
Digital Object Identifier10.1109/LSP.2012.2184283pattern have been studied.In[11],a generic FFT pruning algo-rithm wasfirst proposed for radix-2FFT pruning where tags on the nodes in the signalflow graph(SFG)are employed for con-ditional FFT processing.Reference[12]gives a tracing method for acquiring the pruned SFG and presents analytic algorithm complexity results forfixed-radix FFT pruning.Reference[13] uses a novel path demarcation method for radix-2FFTs and uses a path in each butterfly as the basic unit of computation instead of a full butterfly,thereby saving computing operations.Refer-ence[14]adopts a compact encoded table that is mainly com-posed of pr
uned SFG information about phase and group demar-cation,address offsets and butterfly types,which avoids condi-tional operations in FFT processing and obtains computing ef-ficiency on instruction-level parallel architectures.
The computational complexity of a pruned FFT is deter-mined by the computational complexity of the underlying FFT and the computational savings ratio of the pruning.In existing pruning schemes,only radix-2FFT pruning algorithms have been effectively implemented.The main obstacle in adopting higher-radix FFT pruning is decreasing efficiency due to coarser pruning granularity and/or higher intra-butterfly control overhead.In this letter,a new pruning scheme is pro-posed to overcome these limitations.It provides afine pruning granularity with low control overhead for high-radix FFTs by adopting a two-level pruning scheme.In addition,the proposed mixed-radix FFT pruning algorithm can be applied for a wide range of FFT lengths and I/O pruning patterns.
II.P ROBLEM F ORMULATION
A.Generic Mixed-Radix FFT Pruning
An-point DFT computes the following:
(1) In general,is the product of factors:
(2) An-point DFT can be implemented using a computation-ally efficient FFT algorithm[2].The FFT is composed of stages in which the-th stage consists of-point DFTs mul-tiplied by appropriate twiddle factors(TFs).For afixed-radix FFT,the value of is the same in all of the stages,while for a mixed-radix FFT at least one stage uses a different value of than the other stage(s).
When the input vector contains zero-valued elements and/or only some of the components of the output vector are needed,FFT pruning may be used to further improve the computing efficiency.An FFT’s I/O pruning pattern can be
1070-9908/$31.00©2012IEEE
Fig.1.(4,3,2)ORDIF SFG for a standard24-point mixed-radix FFT. defined with a logic matrix,as shown in(3).Each element in denotes whether the corresponding input data component is nonzero and each element in denotes whether the corresponding output data is needed or not.
(3)
B.Generic Mixed-Radix Pruned Signal Flow Graph
A signalflow graph gives a convenient description for FFT algorithms and also provides the basis for t
he FFT pruning techniques.The algorithm presented in this letter is based on the ordered input,digit-reversed output,decimation-in-frequency (ORDIF)FFT algorithm and its corresponding SFG.Fig.1 shows an example of the ORDIF SFG for a standard24-point mixed-radix FFT.Here,the FFT decomposition order is(4,3, 2),and for diagrammatic simplicity the TFs between the stages are not shown.
In Fig.1,at each stage of the SFG the butterflies can be di-vided into subgroups.The number of butterfly groups in the-th stage is
(4) The number of butterflies within each group in the-th stage is:
(5) If the butterfly groups and the butterflies(BFs)within each group are ordered from top to bottom,any given butterfly in the SFG can be denoted as
in which
are the stage,group and butterfly indices,respectively.
FFT pruning is implemented by pruning the corresponding SFG.Using the above SFG as an example,consider a case in which only thefirst half of the input points are nonzero and a comb-type p
attern of output values are required,specifi
cally
Fig.2.Example of a pruned(4,3,2)SFG for a24-point FFT.
the output components0,4,8,12,16,and20.The standard method for pruning the SFG is shown in Fig.2,in which only the boldfaced portion is required for the computations.From this figure,it can be observed that a large portion of the computing operations are eliminated.
Three factors play important roles in optimizing an FFT pruning algorithm:the computational complexity of the under-lying FFT,the pruning ,the computational savings obtained from the pruning)and the pruning granularity.In prior work[11]–[14],the pruning ratio is the primary means of optimization and only pruned radix-2FFTs are implemented. The difficulty of employing higher-radix FFT pruning for lower computational complexity within existing pruning schemes is due to a decrease in the pruning efficiency.This arises from the fact that when a higher-radix FFT is used,the pruning ratio will be decreased because of the coarser granularity or additional control overhead will be needed,either of which will offset the benefit from using a higher-radix.
III.N EW P RUNING A LGORITHM FOR H IGH-R ADIX
AND M IXED-R ADIX FFT S
To solve the problem of decreasing efficiency for higher-radix FFTs in existing pruning schemes,we propose a new generic pruning algorithm that can achieve a lower computational cost without increasing the pruning granularity.Compared with pre-vious radix-2FFT pruning methods,the proposed approach pro-vides increasedflexibility for a wider range of FFT lengths and I/O pruning patterns.
A.Example of the Proposed Pruning Scheme
Our approach uses radix-[15]and similar types of SFGs together with a two-level pruning scheme.Using the pruned 24-point FFT of Section II as an example,we can apply the decomposition scheme of Fig.3.As shown in thefigure,each radix-4butterfly in the outer FFT is decomposed into a two-stage(radix-2,radix-2)FFT and each radix-6butterfly in the outer FFT is decomposed into a two-stage(radix-3,radix2)FFT. Thus,we use the notation((2,2),(3,2))to describe the decom-position scheme of this SFG.Note that while the((2,2),(3, 2))SFG has the same topology as a direct(2,2,3,2)SFG,the
WANG et al.:GENERIC MIXED-RADIX FFT PRUNING 169
TABLE I
FFT P RUNING T EST C ASES T AKEN F ROM A CTUAL A
PPLICATIONS
Fig.3.A ((2,2),(3,2))pruned SFG for the 24-point
FFT.
Fig.4.Normalized computational costs comparison.
former has lower complexity because non-trivial complex mul-tiplications are only needed between the outer FFT stages.The proposed two-level pruning scheme maintains the fine pruning granularity
while employing the lower computing complexity of a higher-radix FFT without introducing intra-butter fly con-trol overhead.
B.Generic Mixed-Radix FFT Pruning
The proposed generic mixed radix FFT pruning is composed of the following four steps:
Step 1:Select the Underlying FFT Algorithm and Decom-position Scheme:An ORDIF FFT is adopted.However,the decomposition scheme for a general mixed-radix FFT is not unique.One reasonable guideline for choosing a higher-radix FFT decomposition scheme is for the higher-radix butter fly to be implemented with no or only a few nontrivial multiplications.Step 2:De fine the Pruning Granularity:A prime-radix full butter fly in the inner FFT is adopted as the basic pruning unit.This keeps the pruning granularity small without introducing any additional intra-butter fly pruning control overhead.
Step 3:Acquire the Pruned SFG:In a generic mixed-radix ORDIF SFG,for each input or output node in the butter fly
,its normal ordered index can be expressed as
(6)
where
and
denote the BF group index,the BF index
within group and the node index within BF ,respectively.The unique values of for any normal ordered index in each stage can be expressed as follows:
(7)
De fine the status of a node as a Boolean value that denotes whether or not a node in the SFG is involved in the compu-tation.The status of all nodes can be obtained from and
through (6),as described below.
For any BF with group index and BF index within group ,the status of each of its output nodes is the logical OR of the status of each of its input nodes.Thus,for the input pruning case,the status of all no
des in the SFG can be obtained from the input status vector .Similarly,for the case of output pruning,the status of all nodes in the SFG can be obtained from the output status vector (a digit-reversed [2]version of ).Finally,the status of nodes for both input and output pruning is obtained by logically ANDing the results for input and output pruning.
For the two-level pruning scheme,the pruned SFG tracing can be directly carried out on the fine-grained SFG.Then,for any BF in the outer FFT,it is involved in the computation if any of its inner BFs are involved.
Step 4:Computation of the Pruned FFT:The calculations required are derived from the pruned SFG and are carried out as follows:
1:for each computationally involved BF in the outer FFT 2:for each computationally involved BF in the inner FFT 3:Perform the BF processing 4:end for
5:Perform the TF MUL for the required inner BF outputs 6:end for
170IEEE SIGNAL PROCESSING LETTERS,VOL.19,NO.3,MARCH2012
IV.C OMPLEXITY C OMPARISON AND A NALYSIS Several FFT pruning examples taken from actual
applications are summarized in Table I and will be used as test cases for evaluating algorithm performance.
Standard radix-2and pruned radix-2FFTs are evaluated, as are our proposed mixed-radix-4/2and mixed-radix-8/4/2 pruned FFTs.The number of complex additions and complex multiplications are normalized to those for a standard radix-2 FFT,as illustrated in Fig.4.
Fig.4shows that compared to radix-2FFT pruning,the pro-posed mixed-radix FFT pruning implementations use a smaller number of complex multiplications while keeping the same number of complex additions.Their computational advantage is due to the fact that the proposed pruning algorithm decreases the number of stages requiring complex multiplications while employing the same pruning granularity as in radix-2FFT pruning.
V.C ONCLUSION
In this letter,we have proposed a novel FFT pruning algo-rithm which is applicable to high-radix and mixed-radix FFTs and which has lower computational complexity than previous approaches.By virtue of its enhancedflexibility,the algorithm can be applied to a wide range of FFT lengths and I/O pruning patterns.The effectiveness of the algorithm has been demon-strated using several practical pruning e
xamples taken from a range of applications of interest.
R EFERENCES
[1]J.W.Cooley and J.W.Tukey,“An algorithm for the machine compu-
tation of complex fourier series,”Math.Comput.,vol.19,pp.297–301, 1965.
[2]R.Singleton,“An algorithm for computing the mixed radix fast fourier
transform,”IEEE Trans.Audio Electroacoust.,vol.AU-17,no.2,pp.
93–103,1969.
[3]R.Airoldi et al.,“Energy-efficient fast fourier transforms for cognitive
radio systems,”IEEE Micro,vol.30,no.6,pp.66–76,2010.
[4]“Evolved universal terrestrial radio access(E-UTRA):Physical chan-
nels and modulation,3GPP,”Release8Dec.2008,TR36.211v.8.5.0.
[5]M.Morelli and U.Mengali,“A comparison of pilot-aided channel es-
timation methods for OFDM systems,”IEEE Trans.Signal Process., vol.49,no.12,pp.3065–3073,2001.
reference group[6]J.Markel,“FFT pruning,”IEEE Trans.Audio Electroacoust.,vol.19,
no.4,pp.305–311,1971.
[7]H.V.Sorensen and C.S.Burrus,“Efficient computation of the DFT
with only a subset of input or output points,”IEEE Trans.Signal Process.,vol.41,no.3,pp.1184–1200,1993.
[8]T.Sreenivas and P.Rao,“FFT algorithm for both input and output
pruning,”IEEE Trans.Acoust.,Speech,Signal Process.,vol.27,no.3, pp.291–292,1979.
[9]K.Nagai,“Pruning the decimation-in-time FFT algorithm with fre-
quency shift,”IEEE Trans.Acoust.,Speech Signal Process.,vol.34, no.4,pp.1008–1010,1986.
[10]H.Shousheng and M.Torkelson,“Computing partial DFT for comb
spectrum evaluation,”IEEE Signal Process.Lett.,vol.3,no.6,pp.
173–175,1996.
[11]R.G.Alves,P.L.Osorio,and M.N.S.Swamy,“General FFT pruning
algorithm,”in Proc.43rd IEEE Midwest Symp.Circuits and Systems, 2000.
[12]H.Zhong and W.Honghui,“A novel generic fast fourier transform
pruning technique and complexity analysis,”IEEE Trans.Signal Process.,vol.53,no.1,pp.274–282,2005.
[13]S.Singh and S.Srinivasan,“Architecturally efficient FFT pruning al-
gorithm,”Electron.Lett.,vol.41,no.23,pp.1305–1306,2005. [14]L.Min et al.,“Generic multiphase software pipelined partial FFT on
instruction level parallel architectures,”IEEE Trans.Signal Process., vol.57,no.4,pp.1604–1615,2009.
[15]H.Shousheng and M.Torkelson,“A new approach to pipeline FFT
processors,”in Proc.10th Int.Parallel Processing Symp.,1996.

版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系QQ:729038198,我们将在24小时内删除。