Optimization and implementation of scaling-free CORDIC-based direct digital frequency synthesizer for body care area network systems.

Ying-Shen Juang, Lu-Ting Ko, Jwu-E Chen, Tze-Yun Sung, Hsi-Chin Hsin.

Abstract

Coordinate rotation digital computer (CORDIC) is an efficient algorithm for computing trigonometric functions. Scaling-free CORDIC is a well-known CORDIC variant with advantages in speed and area. In this paper, a novel direct digital frequency synthesizer (DDFS) based on scaling-free CORDIC is presented. The proposed multiplier-less architecture, with a small ROM and a pipelined data path, offers a high data rate, high precision, high performance, and low hardware cost. The design procedure, with performance and hardware analysis for optimization, is also given. The design is verified by Matlab simulations and then implemented on a field programmable gate array (FPGA) in Verilog. The spurious-free dynamic range (SFDR) is over 86.85 dBc, and the signal-to-noise ratio (SNR) is more than 81.12 dB. The scaling-free CORDIC-based architecture is suitable for VLSI implementations of DDFS applications in terms of hardware cost, power consumption, SNR, and SFDR. The proposed DDFS is well suited to medical instruments and body care area network systems.

Year:  2012        PMID: 23251230      PMCID: PMC3501827          DOI: 10.1155/2012/651564

Source DB:  PubMed          Journal:  Comput Math Methods Med        ISSN: 1748-670X            Impact factor:   2.238


1. Introduction

The direct digital frequency synthesizer (DDFS) has been widely used in modern communication systems. DDFS is preferable to the classical phase-locked-loop- (PLL-) based synthesizer in terms of switching speed, frequency resolution, and phase noise, which benefits high-performance communication systems. Figure 1 depicts the conventional DDFS architecture [1], which consists of a phase accumulator, a sine/cosine generator, a digital-to-analog converter (DAC), and a low-pass filter (LPF). Two inputs are used: the reference clock and the frequency control word (FCW). The phase accumulator integrates FCW to produce an angle in the interval [0, 2π), and the sine/cosine generator computes the sinusoidal values. In practice, the sine/cosine generator is implemented digitally and is therefore followed by digital-to-analog conversion and low-pass filtering to produce analogue outputs. Such systems can be applied in many fields, especially in industrial, biological, and medical applications [2-4].
Figure 1: The conventional DDFS architecture.

The simplest way to implement the sine/cosine generator is to use a ROM lookup table (LUT). However, a large ROM is needed [5]. Several efficient compression techniques have been proposed to reduce the lookup table size [5-10]. The quadrant compression technique reduces the ROM size by 75% [6]. The Sunderland architecture splits the ROM into two smaller memories [7], and the Nicholas architecture improves on the Sunderland architecture to achieve a higher ROM-compression ratio (32:1) [8]. The ROM size can be further reduced by using polynomial approximations [11-18] or the CORDIC algorithm [19-27]. In polynomial approximation-based DDFSs, the interval [0, π/4] is divided into subintervals, and the sine/cosine functions are evaluated in each subinterval. Such a DDFS requires a ROM to store the polynomial coefficients and polynomial evaluation hardware with multipliers. In the circular mode of CORDIC, which is an iterative algorithm for computing sine/cosine functions, an initial vector is rotated through a predetermined sequence of subangles such that the sum of the rotations approaches the desired angle [28, 29]. CORDIC has been widely used for the sine/cosine generator of DDFS [19-27]. Compared to the lookup table-based DDFS, the CORDIC-based DDFS avoids the exponential growth of hardware complexity as the output word size increases [30-33].

In Figure 1, the word length of the phase accumulator is v bits; thus, the period of the output signal is as follows:

$$T_o = \frac{2^{v}\,T}{\mathrm{FCW}}, \tag{1}$$

where FCW is the phase increment and $T$ denotes the sampling period. The output frequency can be written as

$$F_o = \frac{1}{T_o} = \frac{\mathrm{FCW}}{2^{v}\,T}. \tag{2}$$

According to the equation above, the minimum change of output frequency is given by

$$\Delta F_{o,\min} = \frac{\mathrm{FCW}+1}{2^{v}\,T} - \frac{\mathrm{FCW}}{2^{v}\,T} = \frac{1}{2^{v}\,T}. \tag{3}$$

Thus, the frequency resolution of DDFS is dependent on the word length of the phase accumulator as follows:

$$\Delta F_o = \frac{1}{2^{v}\,T}. \tag{4}$$

The bandwidth of DDFS is defined as the difference between the highest and the lowest output frequencies. The highest frequency is determined by either the maximum clock rate or the speed of the logic circuitry; the lowest frequency is dependent on FCW. Spurious-free dynamic range (SFDR) is defined as the ratio of the amplitude of the desired frequency component to that of the largest undesired one at the output of DDFS, which is often represented in dBc as follows:

$$\mathrm{SFDR} = 20\log_{10}\!\left(\frac{A_P}{A_S}\right), \tag{5}$$

where $A_P$ is the amplitude of the desired frequency component and $A_S$ is the amplitude of the largest undesired one.

In this paper, a novel DDFS architecture based on the scaling-free CORDIC algorithm [34] with ROM mapping is presented. The rest of the paper is organized as follows. In Section 2, CORDIC is reviewed briefly. In Section 3, the proposed DDFS architecture is presented. In Section 4, the hardware implementation of DDFS is given. The conclusion can be found in Section 5.
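The relations between the accumulator word length, the output frequency, the frequency resolution, and the SFDR can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the example values (a 500 MHz clock taken as the sampling rate 1/T, and a 32-bit accumulator) are borrowed from later sections as assumptions.

```python
import math

def output_frequency(fcw: int, f_clk: float, v: int) -> float:
    """Output frequency F_o = FCW * F_clk / 2**v, with F_clk = 1/T."""
    return fcw * f_clk / 2**v

def frequency_resolution(f_clk: float, v: int) -> float:
    """Smallest output-frequency step, i.e. the step caused by FCW -> FCW + 1."""
    return f_clk / 2**v

def sfdr_dbc(a_p: float, a_s: float) -> float:
    """SFDR in dBc from the desired (a_p) and largest undesired (a_s) amplitudes."""
    return 20.0 * math.log10(a_p / a_s)

f_clk = 500e6   # assumed sampling rate (clock rate quoted in Section 4)
v = 32          # assumed accumulator word length (Section 3.1)

print(frequency_resolution(f_clk, v))        # roughly 0.116 Hz
print(output_frequency(2**27, f_clk, v))     # equals f_clk / 32
print(sfdr_dbc(1.0, 10 ** (-86.85 / 20)))    # 86.85 dBc by construction
```

Under these assumptions, a 32-bit accumulator clocked at 500 MHz resolves output frequency steps of about 0.12 Hz.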

2. The CORDIC Algorithm

CORDIC is an efficient algorithm that evaluates various elementary functions, including the sine and cosine functions. As a hardware implementation requires only simple adders and shifters, CORDIC has been widely used in high-speed applications.

2.1. The CORDIC Algorithm in the Circular Coordinate System

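A rotation of angle θ in the circular coordinate system can be obtained by performing a sequence of micro-rotations iteratively. Specifically, a vector can be successively rotated by a sequence of predetermined step angles $\alpha(i) = \tan^{-1}(2^{-i})$. This methodology can be applied to generate various elementary functions, in which only simple adders and shifters are required. The conventional CORDIC algorithm in the circular coordinate system is as follows [28, 29]:

$$x(i+1) = x(i) - \sigma(i)\,2^{-i}\,y(i), \qquad y(i+1) = y(i) + \sigma(i)\,2^{-i}\,x(i), \tag{6}$$

$$z(i+1) = z(i) - \sigma(i)\,\alpha(i), \tag{7}$$

where $\sigma(i) \in \{-1, +1\}$ denotes the direction of the ith micro-rotation, $\sigma(i) = \mathrm{sign}(z(i))$ with $z(i) \to 0$ in the vector rotation mode [34], $\sigma(i) = -\mathrm{sign}(x(i)) \cdot \mathrm{sign}(y(i))$ with $y(i) \to 0$ in the angle accumulated mode [34], the corresponding scale factor $k(i)$ is equal to $\sqrt{1 + \sigma^{2}(i)\,2^{-2i}}$, and $i = 0, 1, \ldots, n-1$. The product of the scale factors after n micro-rotations is given by

$$K_1 = \prod_{i=0}^{n-1} k(i) = \prod_{i=0}^{n-1} \sqrt{1 + 2^{-2i}}. \tag{8}$$

In the vector rotation mode, sin θ and cos θ can be obtained with the initial value $(x(0), y(0)) = (1/K_1, 0)$. More specifically, $x_{out}$ and $y_{out}$ are computed from the initial value $(x_{in}, y_{in}) = (x(0), y(0))$ as follows:

$$\begin{bmatrix} x_{out} \\ y_{out} \end{bmatrix} = K_1 \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x_{in} \\ y_{in} \end{bmatrix}. \tag{9}$$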
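The vector rotation mode described above can be sketched in floating point as follows. This is a behavioral model only, under the assumptions that the micro-rotations follow the standard shift-and-add form with step angles tan⁻¹(2⁻ⁱ) and that sign(0) is taken as +1; the function name `cordic_rotate` is hypothetical, and the multiplications by powers of two stand in for hardware shifts.

```python
import math

def cordic_rotate(theta: float, n: int = 16) -> tuple[float, float]:
    """Return approximations of (cos(theta), sin(theta)) using n micro-rotations.

    Converges for |theta| up to about 1.74 rad (the sum of all step angles).
    """
    # Product of the scale factors, K_1 = prod sqrt(1 + 2**(-2i)).
    k1 = 1.0
    for i in range(n):
        k1 *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    x, y, z = 1.0 / k1, 0.0, theta           # initial vector (1/K_1, 0)
    for i in range(n):
        sigma = 1.0 if z >= 0 else -1.0      # direction that drives z towards 0
        x, y = x - sigma * y * 2.0 ** -i, y + sigma * x * 2.0 ** -i
        z -= sigma * math.atan(2.0 ** -i)    # subtract the step angle alpha(i)
    return x, y

c, s = cordic_rotate(math.pi / 6)
print(c, math.cos(math.pi / 6))
print(s, math.sin(math.pi / 6))
```

Pre-scaling the initial vector by 1/K_1 is what removes the accumulated gain at the output; the scaling-free variant in the next subsection avoids this constant altogether.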

2.2. Scaling-Free CORDIC Algorithm in the Circular Coordinate System

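Based on the following approximations of the sine and cosine functions:

$$\sin\alpha(i) \cong \alpha(i) = 2^{-i}, \qquad \cos\alpha(i) \cong 1 - \frac{\alpha^{2}(i)}{2} = 1 - 2^{-(2i+1)}, \tag{10}$$

the scaling-free CORDIC algorithm is obtained by using (6), (7), and the above, in which the iterative rotation is as follows:

$$\begin{bmatrix} x(i+1) \\ y(i+1) \end{bmatrix} = \begin{bmatrix} 1 - 2^{-(2i+1)} & -2^{-i} \\ 2^{-i} & 1 - 2^{-(2i+1)} \end{bmatrix} \begin{bmatrix} x(i) \\ y(i) \end{bmatrix}, \qquad z(i+1) = z(i) - 2^{-i}. \tag{11}$$

For a word length of w bits, the implementation of the scaling-free CORDIC algorithm uses four shifters and four adders for each micro-rotation in the first w/2 micro-rotations; this reduces to two shifters and two adders for each micro-rotation in the last w/2 micro-rotations, where the term $2^{-(2i+1)}$ falls below the w-bit precision [24, 34, 35].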
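The term "scaling-free" can be motivated with a short calculation. Assuming a micro-rotation built from the approximations cos α(i) ≅ 1 − 2^{−(2i+1)} and sin α(i) ≅ 2^{−i}, the squared gain of one micro-rotation is as follows (the symbol g(i) is introduced here for illustration only):

```latex
\begin{aligned}
g^{2}(i) &= \left(1 - 2^{-(2i+1)}\right)^{2} + \left(2^{-i}\right)^{2} \\
         &= 1 - 2^{-2i} + 2^{-(4i+2)} + 2^{-2i} \\
         &= 1 + 2^{-(4i+2)}, \\
g(i)     &\approx 1 + 2^{-(4i+3)} .
\end{aligned}
```

For i = 4 the gain deviates from unity by about 2^{−19}, which is below the resolution of a 16-bit word, whereas the conventional scale factor k(i) deviates from unity by roughly 2^{−(2i+1)}. For small i the deviation is not negligible (g(1) ≈ 1.0078), which is consistent with handling the leading stages separately.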

3. Design and Optimization of the Scaling-Free CORDIC-Based DDFS Architecture

In this section, the architecture of the proposed DDFS is presented together with its performance analysis. It combines the scaling-free CORDIC algorithm with a LUT; this hybrid approach takes advantage of CORDIC to achieve high precision and of the LUT to achieve a high data rate. The proposed DDFS architecture consists of a phase accumulator, a radian converter, a sine/cosine generator, and an output stage.

3.1. Phase Accumulator

Figure 2 shows the phase accumulator, which consists of a 32-bit adder that recursively accumulates the phase angle by FCW. At time n, the output of the phase accumulator is $\phi = (n \cdot \mathrm{FCW})/2^{32}$ (modulo 1, since the 32-bit adder wraps around), and the sine/cosine generator produces $\sin(2\pi\phi)$ and $\cos(2\pi\phi)$. The load control signal loads FCW into the register, and the reset signal initializes the content of the phase accumulator to zero.
Figure 2: The phase accumulator in DDFS.
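The behavior of the phase accumulator can be modeled in a few lines. The class below is a hypothetical behavioral sketch, not the Verilog of the design: the names `PhaseAccumulator`, `load`, `reset`, and `clock` are chosen here for illustration, and the phase is returned as a fraction of a full turn.

```python
class PhaseAccumulator:
    """Behavioral model of a fixed-width phase accumulator with load and reset."""

    def __init__(self, width: int = 32) -> None:
        self.width = width
        self.mask = (1 << width) - 1
        self.fcw = 0      # frequency control word register
        self.acc = 0      # accumulated phase

    def load(self, fcw: int) -> None:
        """Load a new frequency control word."""
        self.fcw = fcw & self.mask

    def reset(self) -> None:
        """Clear the accumulated phase."""
        self.acc = 0

    def clock(self) -> float:
        """Advance one clock cycle and return the phase as a fraction of a turn."""
        self.acc = (self.acc + self.fcw) & self.mask   # adder wraps modulo 2**width
        return self.acc / (1 << self.width)

pa = PhaseAccumulator()
pa.load(1 << 29)                       # FCW = 2**32 / 8: one output period per 8 clocks
phases = [pa.clock() for _ in range(9)]
print(phases)
```

The wrap-around of the adder is what makes the output periodic: with FCW = 2^29 the phase returns to zero after eight clock cycles.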

3.2. Radian Converter

In order to convert the output of the phase accumulator into its binary representation in radians, the following strategy has been adopted. An efficient ROM reduction scheme based on the symmetry of the sinusoidal wave uses simple logic operations to reconstruct the full wave from its first-quadrant part only. The first two MSBs of an angle indicate its quadrant in the circular coordinate system, and the third MSB indicates which half of the quadrant it lies in; thus, the first three MSBs of an angle are used to control the interchange/negation operations in the output stage. As shown in Figure 3, the corresponding angles ϕ′ in the second, third, and fourth quadrants can be mapped into the first quadrant by setting the first two MSBs to zero. The radian value of ϕ′ is therefore obtained as θ = (π/4)ϕ′, which can be implemented by using the simple array of shifters and adders shown in Figure 4. Note that the third MSB of any radian value in the upper half of a quadrant is 1, and the sine/cosine of an angle γ in the upper half of a quadrant can be obtained from the corresponding angle in the lower half, as shown in Figure 5. More specifically, as cos γ = sin((π/2) − γ) and sin γ = cos((π/2) − γ), the normalized angle is obtained by replacing θ with θ′ = 0.5 − θ when the third MSB is 1. When the third MSB is 0, no replacement is needed and θ′ = θ.
Figure 3: Symmetry-based map of an angle in either the second, third, or fourth quadrant to the corresponding angle in the first quadrant.

Figure 4: The constant (π/4) multiplier.

Figure 5: π/4-mirror map of an angle γ above π/4 to the corresponding angle π/2 − γ below π/4.
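A constant multiplication by π/4 needs no multiplier because the constant can be expanded into a few powers of two. The sketch below illustrates the idea only; the shift set is an assumption chosen here (π/4 ≈ 2⁻¹ + 2⁻² + 2⁻⁵ + 2⁻⁸ + 2⁻¹²) and is not taken from Figure 4, and the names `SHIFTS`, `FRAC_BITS`, and `times_pi_over_4` are hypothetical.

```python
import math

# Hypothetical shift set: pi/4 ~= 2**-1 + 2**-2 + 2**-5 + 2**-8 + 2**-12.
# The actual shifter/adder arrangement of Figure 4 is not given in the text.
SHIFTS = (1, 2, 5, 8, 12)
FRAC_BITS = 16   # assumed fixed-point format: 16 fraction bits

def times_pi_over_4(phi: int) -> int:
    """Multiply a fixed-point value by pi/4 using right shifts and additions only."""
    return sum(phi >> s for s in SHIFTS)

approx_constant = sum(2.0 ** -s for s in SHIFTS)
print(approx_constant, math.pi / 4)

phi = int(0.6 * (1 << FRAC_BITS))                  # 0.6 as a 16-bit fraction
theta = times_pi_over_4(phi) / (1 << FRAC_BITS)    # back to a real number
print(theta, 0.6 * math.pi / 4)
```

Each right shift truncates, so a hardware design would typically carry a few guard bits; the sketch keeps the plain 16-bit format to show the size of that effect.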

3.3. Sine/Cosine Generator

As the core of the DDFS architecture, the sine/cosine generator produces sinusoidal waves based on the output of the radian converter. Without loss of generality, let the output resolution be 16 bits (w = 16), so that the sine/cosine generator consists of a cascade of w processors, each of which performs a sub-rotation by a fixed angle of $2^{-i}$ radian as follows:

$$x(i+1) = \left(1 - \sigma(i)\,2^{-(2i+1)}\right) x(i) - \sigma(i)\,2^{-i}\,y(i), \qquad y(i+1) = \left(1 - \sigma(i)\,2^{-(2i+1)}\right) y(i) + \sigma(i)\,2^{-i}\,x(i). \tag{12}$$

For 8 ≤ i < 16,

$$x(i+1) = x(i) - \sigma(i)\,2^{-i}\,y(i), \qquad y(i+1) = y(i) + \sigma(i)\,2^{-i}\,x(i), \tag{13}$$

where $\sigma(i) \in \{1, 0\}$ represents a positive or zero sub-rotation, respectively. Figure 6 depicts the CORDIC processor-A for the first 7 micro-rotations, which consists of four 16-bit adders and four 16-bit shifters. The CORDIC processor-B, with two 16-bit adders and two 16-bit shifters for the last 9 micro-rotations, is shown in Figure 7.
Figure 6: The CORDIC processor-A.

Figure 7: The CORDIC processor-B.

The first m CORDIC stages can be replaced by a simple LUT to shorten the data path, at the cost of a ROM size that grows exponentially with m. Table 1 lists the hardware costs of the 16-bit DDFS with respect to the number of replaced CORDIC stages, where each 16-bit adder, 16-bit shifter, and 1-bit memory cell requires 200 gates, 90 gates, and 1 gate [36], respectively. Figure 8 shows the hardware requirements with respect to the number of replaced CORDIC stages [24]. Figure 9 shows the SFDR/SNRs with respect to the number of replaced CORDIC stages [25]. As one can expect from these figures, there is a tradeoff between hardware complexity and performance in the design of DDFS.
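Table 1: The hardware costs in 16-bit DDFS with respect to the number of the replaced CORDIC stages (m: the number of the replaced CORDIC stages; 16-bit adder: 200 gates; 16-bit shifter: 90 gates; 1-bit ROM: 1 gate).

| m | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CORDIC processor requirement: | | | | | | | | |
| CORDIC processor-A | 7 | 5 | 4 | 3 | 2 | 1 | 0 | 0 |
| CORDIC processor-B | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 8 |
| Hardware cost: | | | | | | | | |
| 16-bit adders | 46 | 38 | 34 | 30 | 26 | 22 | 18 | 16 |
| 16-bit shifters | 46 | 38 | 34 | 30 | 26 | 22 | 18 | 16 |
| ROM size (bits) | 4 × 16 | 8 × 16 | 14 × 16 | 26 × 16 | 50 × 16 | 102 × 16 | 194 × 16 | 386 × 16 |
| Total gate counts | 13404 | 11148 | 10084 | 9116 | 8340 | 8012 | 8324 | 10816 |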
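The total gate counts in Table 1 follow directly from the per-unit costs. The short check below recomputes them from the processor and ROM figures listed in the table; the variable and function names are hypothetical, and the rule of four adders and four shifters per processor-A and two of each per processor-B is taken from the processor descriptions above.

```python
ADDER_GATES, SHIFTER_GATES, ROM_BIT_GATES = 200, 90, 1

# Per-m values copied from Table 1 (list index = m).
PROC_A = [7, 5, 4, 3, 2, 1, 0, 0]
PROC_B = [9, 9, 9, 9, 9, 9, 9, 8]
ROM_WORDS = [4, 8, 14, 26, 50, 102, 194, 386]    # number of 16-bit words

def gate_count(m: int) -> int:
    """Total gate count when the first m CORDIC stages are replaced by a LUT."""
    adders = 4 * PROC_A[m] + 2 * PROC_B[m]       # processor-A: 4 adders, processor-B: 2
    shifters = adders                            # each processor has as many shifters as adders
    rom_bits = ROM_WORDS[m] * 16
    return adders * ADDER_GATES + shifters * SHIFTER_GATES + rom_bits * ROM_BIT_GATES

totals = [gate_count(m) for m in range(8)]
best_m = min(range(8), key=gate_count)
print(totals)
print(best_m)
```

Under this cost model the gate count is smallest at m = 5, beyond which the ROM growth outweighs the savings in adders and shifters.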
Figure 8: Hardware requirements with respect to the replaced CORDIC stages.

Figure 9: SFDR/SNRs with respect to the replaced CORDIC stages.
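The effect of replacing the leading stages by a table can be illustrated with a floating-point model of the cascade. The sketch below rests on several assumptions that are not stated in the text: the sub-rotations are counterclockwise, stage i is driven by the ith fraction bit of θ, the LUT is indexed by the m leading bits of θ and is modeled with `math.cos`/`math.sin`, and the function name `sincos_hybrid` is hypothetical. It is not the fixed-point data path of the design.

```python
import math

def sincos_hybrid(theta: float, m: int = 5, w: int = 16) -> tuple[float, float]:
    """Approximate (cos(theta), sin(theta)) for 0 <= theta < 1 rad.

    The m leading fraction bits of theta are resolved by a table lookup
    (modeled here with math.cos/math.sin); the remaining bits drive the
    scaling-free sub-rotations i = m+1 .. w.
    """
    bits = int(theta * (1 << w))                 # theta ~= sum of sigma(i) * 2**-i
    coarse = (bits >> (w - m)) << (w - m)        # part of theta held by the first m bits
    x = math.cos(coarse / (1 << w))              # stands in for the ROM contents
    y = math.sin(coarse / (1 << w))
    for i in range(m + 1, w + 1):
        if not (bits >> (w - i)) & 1:            # sigma(i) = 0: zero sub-rotation
            continue
        s = 2.0 ** -i                            # sin(2**-i) ~= 2**-i
        # Processor-A keeps the 2**-(2i+1) term; processor-B (i >= w/2) drops it.
        c = 1.0 - 2.0 ** -(2 * i + 1) if i < w // 2 else 1.0
        x, y = c * x - s * y, c * y + s * x
    return x, y

for m in (0, 5):
    c, s = sincos_hybrid(0.7, m=m)
    print(m, abs(c - math.cos(0.7)), abs(s - math.sin(0.7)))
```

In this model the error with m = 0 is dominated by the first few sub-rotations, where the small-angle approximations are coarse, and it drops by several orders of magnitude once those stages are taken from the table.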

3.4. Output Stage

Figure 10 shows the architecture of the output stage, which maps the computed sin θ and cos θ to the desired sin ϕ and cos ϕ. As mentioned previously, this mapping can be accomplished by simple negation and/or interchange operations. The three control signals xinv, yinv, and swap, derived from the first three MSBs of ϕ, are shown in Table 2; xinv and yinv control the negation of the outputs, and swap controls the interchange.
Figure 10: The output stage.

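Table 2: Control signals of the output stage.

| MSBs of ϕ | ϕ | xinv | yinv | swap | cos 2πϕ | sin 2πϕ |
| --- | --- | --- | --- | --- | --- | --- |
| 000 | 0 < 2πϕ < π/4 | 0 | 0 | 0 | cos θ | sin θ |
| 001 | π/4 < 2πϕ < π/2 | 0 | 0 | 1 | sin θ | cos θ |
| 010 | π/2 < 2πϕ < 3π/4 | 0 | 1 | 1 | −sin θ | cos θ |
| 011 | 3π/4 < 2πϕ < π | 1 | 0 | 0 | −cos θ | sin θ |
| 100 | −π < 2πϕ < −3π/4 | 1 | 1 | 0 | −cos θ | −sin θ |
| 101 | −3π/4 < 2πϕ < −π/2 | 1 | 1 | 1 | −sin θ | −cos θ |
| 110 | −π/2 < 2πϕ < −π/4 | 1 | 0 | 1 | sin θ | −cos θ |
| 111 | −π/4 < 2πϕ < 0 | 0 | 1 | 0 | cos θ | −sin θ |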
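The control signals of Table 2 can be exercised with a small behavioral model. The sketch below makes two assumptions: the negations are applied before the interchange (the order that reproduces the last two columns of the table), and the folded angle θ is obtained by mirroring within the octant when the third MSB is set. The core computation is stood in for by `math.cos`/`math.sin`, and the names `CONTROL` and `output_stage` are hypothetical.

```python
import math

# (xinv, yinv, swap) indexed by the three MSBs of phi, as listed in Table 2.
CONTROL = {
    0b000: (0, 0, 0), 0b001: (0, 0, 1), 0b010: (0, 1, 1), 0b011: (1, 0, 0),
    0b100: (1, 1, 0), 0b101: (1, 1, 1), 0b110: (1, 0, 1), 0b111: (0, 1, 0),
}

def output_stage(phi: float) -> tuple[float, float]:
    """Return (cos(2*pi*phi), sin(2*pi*phi)) for 0 <= phi < 1 via octant folding."""
    octant = int(phi * 8)                     # three MSBs of the phase
    frac = phi * 8 - octant                   # position inside the octant, in [0, 1)
    if octant & 1:                            # third MSB set: mirror about pi/4
        frac = 1.0 - frac
    theta = frac * math.pi / 4                # folded angle in [0, pi/4]
    x, y = math.cos(theta), math.sin(theta)   # stands in for the sine/cosine generator
    xinv, yinv, swap = CONTROL[octant]
    if xinv:                                  # negate the cosine path
        x = -x
    if yinv:                                  # negate the sine path
        y = -y
    if swap:                                  # interchange the two paths
        x, y = y, x
    return x, y

print(output_stage(0.3))
print(math.cos(2 * math.pi * 0.3), math.sin(2 * math.pi * 0.3))
```

Only angles in [0, π/4] ever reach the core in this model, which is what allows the table and the CORDIC stages to cover a single octant.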

4. Hardware Implementation of the Scaling-Free CORDIC-Based DDFS

In this section, the implementation of the proposed low-power, high-performance DDFS architecture (m = 5) is presented. Figure 11 depicts the system block diagram; the SFDR of the proposed DDFS architecture at output frequency F_clk/25 is shown in Figure 12. As can be seen, the SFDR of the proposed architecture is more than 86.85 dBc.
Figure 11: The proposed DDFS architecture.

Figure 12: SFDR of the proposed DDFS architecture at output frequency F_clk/25.

A platform for architecture development and verification has also been designed and implemented to evaluate the development cost [37-40]. The proposed DDFS architecture has been implemented on a Xilinx FPGA emulation board [41]. The Xilinx Spartan-3 FPGA is integrated with a microcontroller (MCU) and an I/O interface circuit (USB 2.0) to form the architecture development and verification platform. Figure 13 depicts the block diagram and circuit board of this platform, in which the microcontroller reads data and commands from the PC and writes the results back to the PC via the USB 2.0 bus, while the Xilinx Spartan-3 FPGA implements the proposed DDFS architecture. The hardware code in Verilog is developed on the PC with the ModelSim simulation tool [42] and the Xilinx ISE smart compiler [43]. The throughput can be improved by using the proposed architecture, while the computation accuracy is the same as that obtained with the conventional one at the same word length. Thus, the proposed DDFS architecture is able to improve the power consumption and computation speed significantly. Moreover, all the control signals are generated internally on-chip. The proposed DDFS provides both high performance and low hardware cost.
Figure 13: Block diagram and circuit board of the architecture development and verification platform.

The chip has been synthesized by using the TSMC 0.18 μm 1P6M CMOS cell libraries [44]. The physical circuit has been synthesized by the Astro tool. The circuit has been evaluated by DRC, LVS, and PVS [45]. Figure 14 shows the cell-based design flow.
Figure 14: Cell-based design flow.

Figure 15 shows the layout view of the proposed scaling-free CORDIC-based DDFS. The core size obtained by the Synopsys design analyzer is 452 × 452 μm². The power consumption obtained by PrimePower is 0.302 mW at a clock rate of 500 MHz and a supply voltage of 1.8 V. The tuning latency is 11 clock cycles. All of the control signals are generated internally on-chip. The chip provides both high throughput and low gate count.
Figure 15: Layout view of the proposed scaling-free-CORDIC-based DDFS.

5. Conclusion

In this paper, we present a novel DDFS architecture based on the scaling-free CORDIC algorithm with a small ROM and a pipelined data path. Circuit emulation shows that the proposed high-performance architecture has the advantages of high precision, high data rate, and simple hardware. For the 16-bit DDFS, the SFDR of the proposed architecture is more than 86.85 dBc. As shown in Table 3, the proposed DDFS is superior to the previous works in terms of SFDR, SNR, output resolution, and tuning latency [6, 17, 18, 26, 27]. Owing to its high performance, the proposed DDFS is well suited to medical instruments and body care network systems [46-49]. The proposed DDFS, written in portable Verilog, is a reusable IP that can be implemented in various processes with tradeoffs among performance, area, and power consumption.
Table 3: Comparisons of the proposed DDFS with other related works.

DDFS | Kang and Swartzlander, 2006 [23] | Sharma et al., 2009 [26] | Jafari et al., 2005 [17] | Ashrafi and Adhami, 2007 [18] | Yi et al., 2006 [6] | De Caro et al., 2009 [27] | This work, Juang et al., 2012
Process (μm)0.130.50.350.350.250.18
Core area (mm2)0.350.510.204
Maximum sampling rate (MHz) | 1018 | 230 | 106 | 210 | 100 | 385 | 500
Power consumption (mW)0.3431120.810.40.302
SFDR (dBc)905472.2809086.85
SNR (dB)677081.12
Output resolution (bit) | 17 | 10 | 14 | 12 | 16 | 13 | 16
Tuning latency (clock)3311
