A modular multichannel spectrometer - design study

G. Comoretto<sup>1</sup>, A. D'Ambrosi<sup>2</sup>, R. Nesti<sup>1</sup>, A. Russo<sup>3</sup>, F. Palagi<sup>4</sup>

October 17, 2006

- 1) INAF - Osservatorio di Arcetri
- 2) Università di Firenze, Dipartimento di Ingegneria Elettronica
- 3) Università di Firenze, Dipartimento di Astronomia
- 4) INAF, Istituto di Radioastronomia, sez. di Firenze

INAF - Osservatorio di Arcetri

Arcetri Technical report 4/2006 Firenze, June 2006

## Abstract

The new generation of multichannel, wideband radio receivers requires a spectrometer with a very wide input bandwidth (up to a few GHz), many simultaneous input channels (of the order of a few tens) and high spectral resolution (several thousand spectral points). With currently available programmable devices, such an instrument can be developed within a reasonable budget.

The Radioastronomy group has gained considerable experience with these systems by designing, in collaboration with the Bordeaux Observatory, a digital tunable filter board for the ALMA correlator. The same board can be reprogrammed to implement a working prototype of such an instrument.

In this report we analyze the specifications and possible performances of such a technological demonstrator, based on a few ALMA boards and a commercial fast analog-to-digital converter. The system will provide up to 16 channels with a band of up to 62 MHz each, and a spectral resolution of at least 1024 spectral points per channel, with polarimetric capabilities.

# 1 General document description

### 1.1 Introduction

The Radioastronomy laboratory of the Arcetri Observatory is involved in several projects concerning wideband, multi-feed receivers. In particular, a 7 channel, dual polarization receiver for the 18-26 GHz frequency band has been developed in the framework of the European FARADAY project. These receivers require a broadband, multi-channel spectrometer to be developed as a backend.

To explore the current technologies in digital signal processing, a system based on the ALMA tunable filterboard (TFB) [4] has been designed. Although the system is basically a low-cost technological demonstrator, it has sufficient capabilities to serve as a useful astronomical spectrometer for at least the first use of the 7 channel receiver.

The required system characteristics are:

- The system will interface directly to the antenna IF, with a bandpass filter to select an appropriate portion of the IF band, and an ADC using Nyquist sampling to convert the filtered portion to baseband (see fig. 2). Bandpass selection and fine tuning will be performed using a tunable digital receiver/filter.
- The maximum bandpass will be of the order of 60 MHz. Smaller bandpasses, with correspondingly higher frequency resolution, will be available in binary steps, down to at least 1/128 of the maximum one (0.5 MHz).
- Out-of-band rejection will be at least 50 dB. Bands will be tunable over the full input range, with a tuning accuracy much finer than the minimum bandwidth.
- The spectrometer will provide at least 1024 spectral points. A polyphase filter stage is used to provide good isolation between adjacent spectral points (at least 50 dB).
- The system will be controlled through a standard TCP/IP socket. The integration will be performed on board, with appropriate buffering and synchronization, and with a minimum integration time of the order of 0.1 s (for on-the-fly mapping).
- Input sampling will be performed with at least 6 bits accuracy. Total power measurements on the input band and the filtered band will be available.
- At least 8 input channels should be available, to cover the 7 channel (one with dual polarization) receiver.
- If a new sampler with wider bandwidth becomes available (e.g. the 4 GS/s ALMA sampler), it should fit in the system with minimum modifications. In this case, it should be possible to use a couple of ALMA boards to provide a total bandwidth of 2 GHz with 62 kHz resolution. An option with a commercial 500 MHz bandwidth sampler will also be considered.
- The system should be completely reprogrammable. All FPGA personalities should be easily alterable. It should be possible to choose between a few alternative personalities, to implement different physical instruments with the same hardware.

An important additional constraint is that the cost, at least for the first two versions, has to fit within the very low current budget of the laboratory.

The group has gained considerable experience in digital system design by developing, in collaboration with the Bordeaux Observatory and the NFRA, a tunable filterboard for the ALMA correlator. This board implements 32 digital filters, with a common input band of 2 GHz and 32 output signals of 62.5 MHz each. The board is basically a large array of field programmable gate arrays (FPGAs), and can be used for different applications by reprogramming these components. Both the experience (filter architecture, different filter topologies explored during the design phase) and the board itself can be used to implement a new spectrometer.

To minimize costs and technological risks, 3 systems with partially overlapping components have been designed:

- A single channel spectrometer, using the ALMA TFB pre-prototype. This system uses a 125 MS/s ADC and a complement of 4 Altera Stratix chips (3 Altera 1S40 and 1 Altera 1S10) to provide basic filtering, and a 1024 channel FFT spectrometer. The control is provided by a Rabbit 8 bit microprocessor.
- An 8 channel spectrometer, using the ALMA Stratix2 TFB board. The data conversion is performed in a custom ADC board, with 8 Analog Devices AD9480 ADC chips (250 MS/s). The 16 Altera 2S40 FPGAs in the TFB board provide 8 independent tunable filters and spectrometers. The control is performed by an ICOP single chip industrial PC (equivalent to a standard 166 MHz Pentium PC) running Linux.
- A scaled system of the previous one, with 4 Stratix2 TFB boards. This could provide up to 32 input channels, or increased spectral resolution. The main difference with respect to the previous version is an expanded backplane and power supply system.

In this report we describe the basic architecture of the system (chapter 2), with a more detailed description of each building block. We then describe each implementation in more detail, together with the components that are specific to each one (chapter 3). The estimated cost and timescale are described in chapter 4, together with future plans.

### 1.2 Abbreviations and Acronyms

- **ADC** Analog to Digital Converter
- **ALMA** Atacama Large Millimeter Array
- **BBC** Baseband Converter. A component (analog or digital) used to convert a portion of a radio band to a band starting at frequency zero. Also called an **SSB Converter**.
- **CMOS** Complementary Metal Oxide Semiconductor: a family of logic devices characterized by high speed and low power consumption.
- **CPLD** Complex Programmable Logic Device: a digital component that can be programmed to perform a limited set of operations. It is less complex than an FPGA, but its programming memory is permanent, and is not cleared at power-off.
- **DDS** Direct Digital Synthesizer: a digital component that generates a sinusoidal tone with a frequency in an exact ratio to a given clock frequency. The frequency is usually programmable with high accuracy.
- **FFT** Fast Fourier Transform
- **FIR** Finite Impulse Response (filter): a class of filters obtained by direct convolution of the input signal with a finite length convolving function.
- **FPGA** Field Programmable Gate Array: a digital component that can be programmed to implement a complex circuit. The program is stored in volatile memory, and the component must be reprogrammed at power-on.
- **IF** Intermediate Frequency: the section of a radio receiver between the first mixer and the final processing section.
- **TFB** Tunable Filter Board: a filter board developed for the ALMA correlator. It contains 16 large FPGAs, and was originally used to implement a bank of 32 fast digital filters.
- **VHDL** VHSIC Hardware Description Language: a high level language, similar to a programming language, used to describe digital hardware in an abstract way. Another similar language is **Verilog**.

### 1.3 Glossary

- **channel:** The signal produced by a separate radio receiver or IF section, and the portion of the instrument that processes it.
- **band:** Spectral portion of the signal that is analyzed by the instrument.
- **decimation:** The process of keeping one sample every N in a digital sequence, and discarding all the others. It can be performed after low pass filtering, to eliminate information that has become redundant.
- **personality:** (of an FPGA or CPLD) The description (as a set of programming bits) that specifies the particular circuit implemented by a programmable device. The same device can be reprogrammed with different personalities to perform different tasks. FPGAs must have their personality downloaded at every power-on.
- **polyphase filter:** A filter in which the convolving function is different for different samples. As it is not time-invariant, it can be used only for particular purposes, usually for decimating filters.
- **spectral point:** The individual element of the spectrometer output.
- **spectral resolution** (or just **resolution**): The spacing between adjacent spectral points.

### 1.4 Related control Drawings

- CORL-60.00.00-007-D-DWG CPLD2 interface schematic diagram. ALMA documentation
- CORL-60.01.04.01-001-C-DWG Tunable Filter Board Prototype schematic diagram. ALMA documentation
- CORL-60.01.07.01-001-B4-DWG Tunable Filter Board AlteraII version schematic diagram. ALMA documentation
- ARC-RADIO-SPT-01.02.001-A-DWG Multichannel spectrometer project: 2 board backplane schematic. Arcetri Observatory
- ARC-RADIO-SPT-02.01.001-A-DWG Multichannel spectrometer project: 8 channel ADC board schematic. Arcetri Observatory

# 2 General description

The basic design of the spectrometer is the same for the three versions. The differences mainly concern the number of input channels (which affects the system architecture, but not the architecture of the single spectrometer) and the bandwidth of the ADC (which has some implications for the design of the digital filter).

This architecture is shown in fig. 1. The analog signal from the receiver IF is pre-filtered by a passive analog bandpass filter. This filter selects a band that is placed between two successive multiples of half the sampling frequency. Wide guard bands are used, to simplify the filter design, so the total useful band is roughly 1/4 of the sampling frequency (see fig. 2). This band is digitized by the ADC, which must have an analog bandwidth sufficient to cover the IF band used. The ADCs used have an input bandwidth of 350 MHz (AD9433) and 750 MHz (AD9480).

The sampled signal may be pre-processed by a real-to-complex conversion stage (used only with the 250 MS/s sampler), that converts a 250 MS/s real signal to a 125 MS/s complex one.

It is then processed by a digital receiver, composed of a local oscillator/mixer and a low pass filter. The complex local oscillator and mixer downconvert a selected portion of the signal to near zero frequency, and a programmable low-pass filter selects the desired output bandwidth. The output signal is complex (with negative and positive frequencies corresponding to different physical sky frequencies), and the output sampling rate is equal to the bandwidth.



Figure 1: General architecture of a spectrometer channel



Figure 2: Processing of the signal in the frequency domain. The RF signal (a) is filtered by an analog bandpass filter (b), sampled by the ADC (c), converted to complex in the preprocessing stage (d), mixed with a local oscillator (e) and filtered by a variable band filter (f). All frequencies are indicative only

This signal is convolved with an appropriate windowing function, in a polyphase filter block, and analyzed by a radix-2 FFT block. Each FFT unit can analyze two independent signals in parallel, with a resolution of at least 1024 spectral points each. Although 1024 points will be assumed in this report, spectrometer resolutions of up to 4096 points should be possible.

The integration block computes the squared norm of the complex spectrum, and integrates it for a programmable time. The integration memory uses a double buffering scheme, where the signal is integrated in one buffer while the other is being read back by the control computer.

The control computer interfaces to all these elements using a standard interface developed for the ALMA correlator. The interface uses an 8 bit bidirectional bus to program and read back up to 256 locations in up to 16 chips in up to 16 boards. Board, chip and register addresses are specified sequentially on the 8 bit bus, and stored locally. The physical interface is implemented as a CPLD device in each board, which is programmed once with a standard personality. The interface is also used for programming each FPGA. The computer interface port is very simple, and can be implemented using a standard parallel port. It is however relatively slow, and is currently a bottleneck for reading back fast integrations.

All the elements described in the following sections have been implemented, or are currently being implemented, as design modules (*hardware cores*) in field programmable gate arrays (FPGAs). They are written in a high level hardware description language (VHDL or Verilog), and are translated to a physical design by FPGA-specific synthesis and compilation software. They can be easily ported to different devices, combined together or split between devices, and can be considered as *hardware subroutines*. This approach allows these components to be individually tested and optimized, even before the final hardware is available. In this way, it is possible to develop and test these elements using the first implementation of the spectrometer, and to combine them later in the other implementations.

### 2.1 Digital input

The radio signal is sampled by a commercial ADC chip, operating in the so-called Nyquist mode, i.e. with a sampling frequency  $f_s$  much lower than the signal frequency  $f_r$ . In this way, the signal beats with a harmonic of the sampling signal, and is downconverted to a band between zero frequency and  $f_s/2$ . To operate correctly, this method requires that the ADC analog input stage has a bandwidth wide enough to accommodate the signal, that the sampling time window is much narrower than the reciprocal of the signal frequency,  $1/f_r$ , and that the radio signal is band limited to a passband lying between two successive multiples of  $f_s/2$ .

Nyquist sampling allows the instrument to be connected directly to the receiver IF stage, without the need for an SSB conversion stage, requiring only a bandpass filter. On the other hand, an SSB stage is usually more flexible, as the total observable band of the present system (tuning range) is limited to  $f_s/2$  (62 or 125 MHz), which is usually much less than the available IF band (500 MHz to 2 GHz). For the reasons listed in the following section, the effective tuning range is usually smaller by roughly another factor of 2.

The problem can be solved using a high frequency converter, like the 4 GS/s ALMA digitizer. These devices are however quite expensive, and will be considered only in the section on future developments.

#### 2.1.1 ADC converter

The two ADCs employed in this project have the following characteristics:

| Model  | $f_s$ (MS/s) | Analog<br>BW (MHz) | N. bits |
|--------|--------------|--------------------|---------|
| AD9433 | 125          | 350                | 12      |
| AD9480 | 250          | 750                | 8       |

To interface easily with the VLBI terminal, an IF frequency around 250-300 MHz has been selected. Considering the typical performance of a passive LC filter, the following design goals have been adopted:

| Model  | $f_s$ (MS/s) | Nominal band (MHz) | Passband (MHz) | Stopband outside (MHz) |
|--------|--------------|--------------------|----------------|------------------------|
| AD9433 | 120          | 240-300            | 255-285        | 225-315                |
| AD9480 | 250          | 250-375            | 280-345        | 220-405                |

To better match the IF frequency of the H2O line in the VLBI receiver, a sampling clock of 120 MHz (instead of 125 MHz) has been used in the first case. The resulting maximum bandwidth is 30 MHz and 62.5 MHz respectively in the two cases.
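The band plan in the table can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original design files; the function names are hypothetical. It computes in which $f_s/2$-wide zone a frequency falls, where it appears after sampling, and whether a band stays between two successive multiples of $f_s/2$.

```python
def nyquist_zone(f_mhz, fs_mhz):
    """1-based index of the f_s/2-wide zone that contains f_mhz."""
    return int(f_mhz // (fs_mhz / 2.0)) + 1


def aliased_frequency(f_mhz, fs_mhz):
    """Frequency (between 0 and f_s/2) at which a tone appears after sampling."""
    r = f_mhz % fs_mhz
    return r if r <= fs_mhz / 2.0 else fs_mhz - r


def spectrum_inverted(f_mhz, fs_mhz):
    """Even zones are mirrored in frequency after sampling, odd zones are not."""
    return nyquist_zone(f_mhz, fs_mhz) % 2 == 0


def band_fits_one_zone(f_lo, f_hi, fs_mhz):
    """True if [f_lo, f_hi] lies between two successive multiples of f_s/2."""
    half = fs_mhz / 2.0
    return f_lo // half == (f_hi - 1e-9) // half


# Passbands from the table above (sampling rate and band edges in MHz).
for model, fs, lo, hi in [("AD9433", 120.0, 255.0, 285.0),
                          ("AD9480", 250.0, 280.0, 345.0)]:
    print(model,
          "zone", nyquist_zone(lo, fs),
          "baseband", aliased_frequency(lo, fs), "-", aliased_frequency(hi, fs),
          "inverted", spectrum_inverted(lo, fs),
          "single zone", band_fits_one_zone(lo, hi, fs))
```

With these numbers both passbands fall in an odd zone, so the sampled spectrum is not mirrored.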

The two ADCs are coupled to the incoming signal through an RF transformer, to remove any DC component and to provide better isolation of the input circuit. The clock signal is generated in a control FPGA using the internal PLL.

#### 2.1.2 Band limiting filter

The wide transition bands (of the order of  $f_s/4$ ) allow for a simple filter design.

For the prototype version, the bandpass filter has been designed on the basis of the five pole Chebyshev prototype [2] with 0.3 dB passband ripple. The filter is centered at 270 MHz, with a 3 dB bandwidth of about 40 MHz.

For the 8 channel version, a bandpass filter in the 125-250 MHz range has been designed. The usable bandwidth in this case is approximately 75 MHz (145-220 MHz). If a BBC is available, with an output low-pass filter of 100 MHz, no filter at all is necessary.

Surface mount lumped elements have been used and the filter has been fabricated in grounded coplanar waveguide technology by using a Duroid substrate.

In the prototype test phase, parasitic elements of the lumped devices were found to affect the filter performance, as expected at this frequency. As a result, the filter frequency response was shifted down by about 20 MHz and its shape was distorted.

The solution adopted was to use variable capacitors in the series resonators and to reduce the nominal value of the design elements. This optimization leads to the circuit topology shown in Fig. 3.



Figure 3: Bandpass filter:  $L_1=240nH$ ;  $L_2=2.8nH$ ;  $L_3=1.8nH$ ;  $C_1=(1.4-3)pF$ ;  $C_2=120pF$ ;  $C_3=180pF$ .

The planar circuit was boxed in an aluminum enclosure, and SMA connectors are used to access the filter. Four filters have been fabricated; three of them are shown in Fig. 4.



Figure 4: Photograph of the bandpass filters.

The performance of the four devices is nearly identical. All of them have an insertion loss of about 6-7 dB in the passband; a typical transmission response is shown in Fig. 5.

The obtained passband is consistently shifted towards lower frequencies, and the usable band is reduced to about 250-280 MHz.



Figure 5: Bandpass for a typical filter

#### 2.1.3 Signal conditioning

The ADC output is a real signal, with a data rate equal to the sampling frequency and a total bandwidth equal to  $f_s/2$ . Since subsequent processing is performed on complex quantities, it may be convenient to transform the data stream to complex format, at a reduced data rate. This is particularly convenient for the 250 MS/s ADC, as the sampled data is presented in time multiplexed format, while the resulting complex stream would not need multiplexing. The output of the 125 MS/s ADC can be easily processed as a real data stream, and is converted to complex format in the digital filter.

To transform a real stream to a complex one, the first stream is multiplied by  $\exp(it\pi/2)$ , where t is the sample time index. The resulting complex signal is then filtered by a lowpass filter, with a bandpass equal to half the input bandpass, and decimated by 2. A simple architecture to perform these steps is shown in figure 6. As the passband of the sampled data is roughly equal to  $f_s/4$ , a loose filter design is possible, with only a few FIR taps. Most taps have a very low value, allowing for an almost multiplier-less design. For example, a filter with a cutoff frequency of  $f_s/4$  can be implemented using a symmetric 11-tap FIR, with values listed in the table:

| Delay | Value | Delay | Value |
|-------|-------|-------|-------|
| 0     | 256   | 3     | -32   |
| 1     | 155   | 4     | -1    |
| 2     | 2     | 5     | 6     |

Most values can be implemented with simple shift (multiplication by a power of 2) and add of at most 2 elements. The only tap that requires a full multiplication is the one with delay equal to 1.
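The conversion steps can be sketched in software as follows. This is a minimal model, not the FPGA implementation: it uses the tap values from the table, while the normalization by the sum of the taps and the function name are assumptions made for illustration.

```python
import math

HALF_TAPS = [256, 155, 2, -32, -1, 6]    # delays 0..5, values from the table
TAPS = HALF_TAPS[:0:-1] + HALF_TAPS      # symmetric 11-tap impulse response
ROTATION = [1, 1j, -1, -1j]              # exp(i*pi*t/2) for t mod 4


def real_to_complex(samples):
    """Mix by exp(i*pi*t/2), low pass filter, and decimate by 2."""
    # The multiplication by exp(i*pi*t/2) needs no multiplier: it only
    # swaps and negates real and imaginary parts.
    mixed = [x * ROTATION[t % 4] for t, x in enumerate(samples)]
    gain = float(sum(TAPS))              # assumed normalization to unit DC gain
    filtered = [
        sum(c * mixed[n + k] for k, c in enumerate(TAPS)) / gain
        for n in range(len(mixed) - len(TAPS) + 1)
    ]
    return filtered[::2]                 # keep one sample every two


# A real tone at f_s/4 (the centre of the input band) becomes a constant.
tone = [math.cos(math.pi * t / 2) for t in range(64)]
print(real_to_complex(tone)[:3])
```

The tap set has a zero response at $f_s/2$ (the alternating sum of the taps vanishes), so the image created by the mixing at that frequency is removed before decimation.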

#### 2.1.4 Test signal generator

Figure 6: Real to complex conversion stage

A programmable element in the data path can be used to implement a test signal generator (fig. 7). A test *personality* has been developed to generate pseudo-random noise, with an optional sinusoidal tone added. The tone frequency can be varied continuously in a programmable sweep, which can be used to measure a filter response directly. The random data generator uses a linear feedback shift register to generate 32 uncorrelated bits every clock cycle. These bits are interpreted as a random quantity with uniform distribution. By adding together several 8 bit words from different generators, a pseudorandom quantity with approximately Gaussian statistics is generated. The four signals (ADC input, uniform noise, Gaussian noise and tone) are added together in an arbitrary linear combination.
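The noise generation scheme can be illustrated with a software model. This is a sketch under stated assumptions: the feedback polynomial (taps 32, 22, 2, 1), the seeds and the choice of four generators are illustrative and are not taken from the actual *personality*.

```python
MASK32 = 0xFFFFFFFF
POLY = 0x80200003   # assumed feedback taps (32, 22, 2, 1), not from the report


class Lfsr32:
    """32 bit Galois linear feedback shift register."""

    def __init__(self, seed):
        if seed & MASK32 == 0:
            raise ValueError("seed must be non-zero")
        self.state = seed & MASK32

    def next_word(self):
        """Advance 32 single-bit steps and return the new 32 bit state."""
        for _ in range(32):
            lsb = self.state & 1
            self.state >>= 1
            if lsb:
                self.state ^= POLY
        return self.state


def uniform_byte(gen):
    """8 bit word with approximately uniform distribution (0..255)."""
    return gen.next_word() & 0xFF


def gaussian_like(gens):
    """Sum of one uniform byte per generator: approximately Gaussian."""
    return sum(uniform_byte(g) for g in gens)


# Four independent generators with arbitrary (hypothetical) seeds.
generators = [Lfsr32(s) for s in (0xACE1ACE1, 0x12345678, 0xDEADBEEF, 0x0BADF00D)]
samples = [gaussian_like(generators) for _ in range(2000)]
print(min(samples), max(samples), sum(samples) / len(samples))
```

Summing more words brings the distribution closer to a Gaussian, at the cost of more generators.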



Figure 7: Programmable signal source. The output is a linear combination of the signal from the ADC, a uniform noise, a Gaussian noise, and a monochromatic line

### 2.2 Digital receiver and filter

The sampled signal, either real or complex, can be further filtered to reduce its bandwidth. In this way the spectrometer can analyze a subsection of the input band, with an arbitrary width and position.

The spectrometer must be able to analyze the radio signal with a spectral resolution up to 0.02 km/s, to be able to adequately sample cold thermal lines. This corresponds to 1.5 kHz resolution at 22 GHz, and 700 Hz at 10 GHz. To reach these resolutions with a 1024 point spectrometer, the total bandwidth of 125 MHz must be reduced by a factor of 70 and 183, respectively. A filter with a variable decimation factor in the range of at least 1–256 is thus required.
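The conversion between velocity resolution and frequency resolution follows from the Doppler relation $\Delta f = f \, \Delta v / c$. A minimal sketch (the function name is hypothetical):

```python
C_KM_S = 299792.458   # speed of light in km/s


def frequency_resolution_hz(sky_freq_hz, velocity_res_km_s):
    """Frequency step corresponding to a velocity step at a given sky frequency."""
    return sky_freq_hz * velocity_res_km_s / C_KM_S


for f_ghz in (22.0, 10.0):
    df = frequency_resolution_hz(f_ghz * 1e9, 0.02)
    print(f"{f_ghz:5.1f} GHz: {df:7.1f} Hz")
```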

The output data format is always complex, and thus represents a physical band equal to the output data rate. If  $f_{si}$  and  $f_{so}$  are respectively the input and output sampling rates, the output band extends from  $-f_{so}/2$  to  $f_{so}/2$ . If the input data format is real (as for the 125 MHz ADC), with a band extending from DC to  $f_{si}/2$ , the actual bandwidth reduction is equal to half the decimation factor  $f_{si}/f_{so}$ . The minimum decimation factor, used to observe the whole input band, is thus equal to 2. In this case, the local oscillator is set to  $f_{si}/4$ , to center the input band around frequency zero.

For complex input data format, a decimation factor of 1 could in principle be selected, effectively disabling the whole digital receiver. In this case, the conversion stage (local oscillator and mixer) is also disabled. However, it has proved very difficult to run the FFT engine at more than 62 MHz, and this is therefore the maximum bandwidth that can be analyzed at the moment. The minimum usable decimation factor is thus always 2.

A block schematic of the digital receiver is shown in fig. 8. The (complex or real) input is multiplied by a complex exponential  $\exp(2\pi i f_l t)$ , where the frequency  $f_l$  is generated by a direct digital synthesizer (DDS) module, and filtered in a complex low pass filter of cutoff frequency  $f_f$ . The receiver thus selects a portion of the input signal of width  $2f_f$  and central frequency  $f_l$ .
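The local oscillator and mixer can be modelled with a phase accumulator DDS. This is a sketch only: the 32 bit accumulator width, the class and function names, and the use of a floating point exponential instead of a sine lookup table are assumptions. The sign of the exponent is a matter of convention; a negative sign is used here, so that a tone at $+f_l$ lands on zero frequency.

```python
import cmath
import math


class Dds:
    """Phase accumulator synthesizer producing a complex exponential."""

    def __init__(self, f_out_hz, f_clk_hz, acc_bits=32):
        self.modulus = 1 << acc_bits
        # The output frequency is an exact ratio step/modulus of the clock.
        self.step = round(f_out_hz / f_clk_hz * self.modulus) % self.modulus
        self.tuning_step_hz = f_clk_hz / self.modulus
        self.phase = 0

    def next_sample(self):
        angle = 2.0 * math.pi * self.phase / self.modulus
        self.phase = (self.phase + self.step) % self.modulus
        return cmath.exp(-1j * angle)


def downconvert(samples, f_lo_hz, f_clk_hz):
    """Multiply the input stream by the DDS output (mixer stage only)."""
    dds = Dds(f_lo_hz, f_clk_hz)
    return [x * dds.next_sample() for x in samples]


# A real tone at f_clk/4 mixed with a local oscillator at the same frequency:
# the wanted component moves to zero frequency, the image to f_clk/2.
f_clk = 125e6
tone = [math.cos(2.0 * math.pi * 0.25 * t) for t in range(64)]
mixed = downconvert(tone, f_clk / 4, f_clk)
print(sum(mixed) / len(mixed))   # averaging stands in for the low pass filter
```

In the real receiver the average is replaced by the programmable FIR low pass filter described below, which also performs the decimation.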



Figure 8: Digital tunable filter. The band of interest is selected by mixing the input signal with a local oscillator frequency, and filtering the resulting signal by a programmable filter

The quality of the output band is determined by the filter shape. The adopted filter uses a FIR symmetric design, with a number of taps proportional to the decimation factor (input band/output band). For a half band filter (decimation factor of 2), 64 taps are used, resulting in a usable bandwidth equal to 96% of the nominal one, 0.15 dB in-band ripple, and about 50 dB of out-of-band rejection. The tap coefficients have been computed using the Remez algorithm.

It is possible to use higher decimation factors, in powers of two, up to 256. The corresponding band is 488 kHz, with a resolution of 480 Hz. The design of the filter uses the same multipliers to process successive taps of the FIR structure (tap recirculation), so the filter complexity and structure do not change with increased decimation. The filter shape scales roughly with the bandwidth, apart from minor differences due to quantization of the tap coefficients. The actual bandshapes corresponding to decimation factors between 2 (top, red) and 256 (bottom, green) are plotted in fig. 9. Filters for decimations between 32 and 256 have been optimized for stopband rejection, at the expense of a somewhat worse passband ripple, because in these bands the stopband is folded many times, and the associated noise is thus multiplied by the decimation factor.

The filter output level depends on the quantization, and on the shape of the input band. Therefore it is necessary to adjust the signal level, in order to maintain optimum signal amplitude at the filter output. The expected signal variations due to bandwidth changes are compensated automatically, while a gain control circuit allows those due to the spectral slope of the signal to be corrected.
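The relation between decimation factor, output band and spectral resolution can be tabulated with a short script. This is an illustrative sketch assuming a 125 MS/s real input and a 1024 point spectrometer; the function names are hypothetical.

```python
def output_band_hz(input_rate_hz, decimation):
    """Complex output rate, equal to the analyzed bandwidth."""
    return input_rate_hz / decimation


def spectral_resolution_hz(input_rate_hz, decimation, n_points=1024):
    """Spacing between adjacent spectral points."""
    return output_band_hz(input_rate_hz, decimation) / n_points


def bandwidth_reduction_real_input(decimation):
    """For real input (band DC..f_si/2) the band shrinks by half the decimation."""
    return decimation / 2.0


F_SI = 125e6
for k in range(1, 9):                      # decimation factors 2, 4, ..., 256
    d = 2 ** k
    print(f"D={d:3d}  band={output_band_hz(F_SI, d) / 1e3:9.2f} kHz  "
          f"resolution={spectral_resolution_hz(F_SI, d):10.2f} Hz")
```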

The filtered signal level can be monitored using a digital total power meter. The power meter result can be used to set the filter gain control.

### 2.3 FFT block

The FFT block is the core of the spectrometer. It is based on the JFFT package, developed by Mock [3]. It is composed of  $\log_2(N)$  stages, each one performing a butterfly computation on two sequences of length  $L = 2^n$ , with  $n = \log_2(N), \dots, 1$ . The structure of each stage is shown in fig. 11. As the butterfly multiplier processes two inputs at each clock cycle, it takes  $L/2$  clocks to process one sequence. A second sequence is thus processed in the remaining cycles.

The FFT block output presents the two FFTs interleaved. All even and odd spectral points are produced on output channels X and Y respectively. Spectral points are in bit reversed order, with the spectrum for channels A and B in the first and last N/2 points respectively. These points will be sorted in the integration memory.
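Sorting a bit reversed spectrum amounts to reversing the bits of the memory address. The sketch below shows only this address computation; it does not model the exact interleaving of the X/Y outputs or of channels A and B, and the function names are hypothetical.

```python
def bit_reverse(index, n_bits):
    """Reverse the lowest n_bits bits of index."""
    result = 0
    for _ in range(n_bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def natural_order(scrambled):
    """Reorder a bit-reversed sequence (length a power of two) to natural order."""
    n = len(scrambled)
    n_bits = n.bit_length() - 1
    ordered = [None] * n
    for address, value in enumerate(scrambled):
        ordered[bit_reverse(address, n_bits)] = value
    return ordered


# For an 8 point transform the output arrives as points 0, 4, 2, 6, 1, 5, 3, 7.
print(natural_order([0, 4, 2, 6, 1, 5, 3, 7]))
```

Since bit reversal is its own inverse, the same function can be used to compute either the write address or the read address of the integration memory.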



Figure 9: Bandshape of the digital filter, for decimation factor between 2 (red) and 32 (green). Left: whole band, right: passband

The FFT processor uses several counters, that must be synchronized for correct operation. This is performed using a synchronization pulse, that is active for one cycle every N. This pulse also divides the input data stream into *FFT frames*, and signals the first sample in a frame. The FFT block analyzes each frame independently, and produces one spectrum for each frame.

#### 2.3.1 Polyphase filtering

As seen in the previous section, an FFT processor analyzes finite segments (or frames) of the input data stream, of length N, producing an N point complex spectrum. Due to the finite length of the input sequence, the spectrum is convolved with a sin(x)/x function. Therefore each spectral point is heavily contaminated by the adjacent points, and a single unresolved line (e.g. man-made interference) produces spurious responses that decay very slowly with the frequency offset.

A common way to overcome the problem is to multiply the input sequence by an appropriate tapering function, but to preserve the spectral resolution the function must be much longer than N. An efficient way to perform this multiplication is to use a small FIR filter before the FFT operation, with coefficients that depend on the position in the FFT sequence. The process is depicted in fig. 12. This filter is called a *polyphase filter*<sup>1</sup>.

Figure 10: FFT block. The processor analyzes two independent signals, framed by a synchronization pulse. Each signal is pre-processed (windowed) in a polyphase filter, to control channel shape. The two complex spectra produced are squared, and integrated in a double buffer memory

Figure 11: Single stage of the FFT. The stage processes two blocks of length 2M. For the first M cycles it processes input A, and for the next M cycles input B. The complete FFT is composed of  $\log_2(N)$  stages, with  $M = N/2 \dots 1$

For a non multiplexed input stream, a simple implementation of this filter is shown in fig. 13. For our filter, a length L = 4N has been chosen. Filter parameters have been computed using a modified version of the Remez algorithm, truncating the tap coefficients to 9 bits. The resulting filter has a stopband rejection of 50 to 60 dB and an in-band ripple of 0.1 dB, and the 3 dB spectral resolution is about 1.2 spectral channels.

The response for an individual channel is shown in fig. 14. The dashed line is the full response (range on the right), the solid line is the zoomed in-band response (scale on the left), while dotted lines are the zoomed responses for the adjacent spectral points. The horizontal scale is in resolution points for the FFT.
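The equivalence between the polyphase structure and a long tapered transform can be checked numerically. The following is a minimal sketch with a small N and a direct DFT; the tapering function used here (a sinc multiplied by a Hann window) is an illustrative stand-in, not the Remez-derived coefficients of the actual design.

```python
import cmath
import math


def dft(x):
    """Direct discrete Fourier transform (adequate for small test sizes)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def taper(n_fft, n_seg=4):
    """Tapering function of length n_seg * n_fft (assumed shape: sinc x Hann)."""
    length = n_fft * n_seg
    w = []
    for t in range(length):
        u = (t - (length - 1) / 2.0) / n_fft
        sinc = 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)
        hann = 0.5 - 0.5 * math.cos(2.0 * math.pi * (t + 0.5) / length)
        w.append(sinc * hann)
    return w


def polyphase_frame(x, w, n_fft):
    """Weight n_seg blocks of n_fft samples and add them into one FFT frame."""
    n_seg = len(w) // n_fft
    return [sum(w[n + p * n_fft] * x[n + p * n_fft] for p in range(n_seg))
            for n in range(n_fft)]


N = 8
w = taper(N)
x = [complex(math.sin(0.3 * t), math.cos(0.7 * t)) for t in range(4 * N)]
short = dft(polyphase_frame(x, w, N))            # N point transform
long_ = dft([w[t] * x[t] for t in range(4 * N)])  # 4N point tapered transform
print(max(abs(short[k] - long_[4 * k]) for k in range(N)))
```

The N point transform of the folded frame equals every fourth point of the 4N point transform of the tapered data, which is why the long taper costs only four multiplications per input sample.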

#### 2.3.2 Integration block

The FFT outputs a sequence of complex spectra. The quantity of interest for astronomical observations is the power density spectrum, i.e. the average of the squared norm of the individual spectra. To compute this, the real and imaginary parts are squared, added together, and consecutive FFT output frames are integrated in a large RAM memory. The integration is thus always composed of an integer number of FFT frames.
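The integration with double buffering described in section 2 can be modelled as follows. This is a behavioural sketch with hypothetical class and method names; it does not reflect the actual memory layout or word widths.

```python
class DoubleBufferIntegrator:
    """Accumulates |spectrum|^2 in one buffer while the other can be read."""

    def __init__(self, n_points):
        self.buffers = [[0.0] * n_points, [0.0] * n_points]
        self.active = 0      # index of the buffer being integrated
        self.frames = 0      # FFT frames accumulated in the active buffer

    def add_frame(self, spectrum):
        """Add the squared norm of one complex FFT frame."""
        buf = self.buffers[self.active]
        for k, z in enumerate(spectrum):
            buf[k] += z.real * z.real + z.imag * z.imag
        self.frames += 1

    def swap(self):
        """End the integration; return (finished buffer, number of frames)."""
        done, count = self.buffers[self.active], self.frames
        self.active ^= 1
        self.buffers[self.active] = [0.0] * len(done)
        self.frames = 0
        return done, count


integrator = DoubleBufferIntegrator(2)
integrator.add_frame([3 + 4j, 1j])
integrator.add_frame([3 + 4j, 1j])
total, n_frames = integrator.swap()
print([p / n_frames for p in total])   # averaged power spectrum
```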

#### 2.3.3 Polarization observations

If the two channels of the FFT processor represent the two independent polarizations of the radio signal, a full polarimetric spectrometer can be easily implemented.

As the corresponding channels are separated by N/2 samples, a FIFO block is required to realign them. Then, in addition to the products  $|X|^2$  and  $|Y|^2$ , the complex cross product  $XY^*$  must be evaluated and integrated. The latter is provided as two real vectors, for the real and imaginary parts of  $XY^*$  respectively.

### 2.4 Interface bus
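The four real vectors produced for each pair of realigned frames can be sketched as below. The cross product is taken here as X times the complex conjugate of Y, the usual convention for a cross-spectrum; this convention and the function name are assumptions.

```python
def polarization_products(x_spectrum, y_spectrum):
    """Return (|X|^2, |Y|^2, Re(X Y*), Im(X Y*)) for one pair of FFT frames."""
    xx, yy, cross_re, cross_im = [], [], [], []
    for x, y in zip(x_spectrum, y_spectrum):
        cross = x * y.conjugate()
        xx.append(x.real * x.real + x.imag * x.imag)
        yy.append(y.real * y.real + y.imag * y.imag)
        cross_re.append(cross.real)
        cross_im.append(cross.imag)
    return xx, yy, cross_re, cross_im


print(polarization_products([1 + 1j], [1 - 1j]))
```

Each of the four vectors is then accumulated over consecutive frames in the same way as the single channel power spectrum.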

Each board and chip in the system is controlled using a standard bus developed for the ALMA correlator (fig. 15).

This interface is based on an 8 bit data bus, a 4 bit control bus, and a strobe signal. For speed and noise immunity, the strobe signal uses the LVDS standard, while all other signals are standard 3.3 V CMOS. The protocol is described in detail in Broadwell [1], and allows direct addressing of a series of registers in each chip in the system. Registers may have a width of one or more bytes, with the convention that a multi-byte quantity is written by addressing the same register several times. It is possible to broadcast a value to multiple registers, to increase programming speed.

<sup>1</sup> The term *polyphase filter* is also used in many other situations and may be ambiguous.

Figure 12: Channel shaping. The input data stream is convolved with a tapering function. As the FFT is performed piecewise, the convolution is done on blocks of length N. The input signal is delayed by multiples of N samples, blocks between dotted lines are multiplied by the appropriate segment of the tapering function, and added together

The interface is designed to be easily implemented in programmable chips (FPGAs) and includes the protocol to download the *personality* (the electronic design to be implemented) into these devices.

The communication protocol includes three addressing levels:

- The board, specified by driving the appropriate strobe signal
- The FPGA inside a board, specified by writing a register in the board interface
- The register inside a FPGA, specified by writing a *control register* in the device

Each board contains an interface chip (CPLD2), which is directly addressed from the bus. To address the other devices in the board, a set of registers must be programmed in the CPLD2 chip.
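The three addressing levels can be illustrated with a purely behavioural model. This sketch is not the CPLD2 protocol, which is specified in Broadwell [1]: the class, its methods, and the least-significant-byte-first order for multi-byte values are all assumptions made for illustration. Only the address space limits (16 boards, 16 chips, 256 registers) and the repeated-write convention come from the text.

```python
class BusModel:
    """Toy model of board / chip / register addressing on an 8 bit bus."""

    N_BOARDS, N_CHIPS, N_REGISTERS = 16, 16, 256

    def __init__(self):
        self.registers = {}    # (board, chip, register) -> list of bytes
        self.selected = None

    def select(self, board, chip, register):
        """Latch the three address levels (stored locally, as in the text)."""
        if not (0 <= board < self.N_BOARDS and 0 <= chip < self.N_CHIPS
                and 0 <= register < self.N_REGISTERS):
            raise ValueError("address out of range")
        self.selected = (board, chip, register)
        self.registers[self.selected] = []

    def write_byte(self, value):
        """One 8 bit transfer to the currently selected register."""
        self.registers[self.selected].append(value & 0xFF)

    def write_value(self, value, n_bytes):
        """Multi-byte write: the same register is addressed n_bytes times."""
        for i in range(n_bytes):           # assumed order: low byte first
            self.write_byte(value >> (8 * i))

    def read_value(self, board, chip, register):
        """Reassemble a multi-byte value from its stored bytes."""
        data = self.registers[(board, chip, register)]
        return sum(b << (8 * i) for i, b in enumerate(data))


bus = BusModel()
bus.select(board=3, chip=7, register=0x20)
bus.write_value(0x123456, 3)
print(hex(bus.read_value(3, 7, 0x20)))
print(BusModel.N_BOARDS * BusModel.N_CHIPS * BusModel.N_REGISTERS, "locations")
```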

### 2.5 Control computer

The control computer must perform a very limited set of functions. It is connected to a standard Ethernet port and to the interface bus described above, and its primary function is to translate between Ethernet commands and the CPLD2 protocol.

The first release of the software is designed with the idea that every function will be performed in the remote (observer) computer. The remote program will directly access each control location in the FPGAs, and read back the results from these registers. This is appropriate for a system that is mainly devoted to development and testing.

As the system moves towards a more stable configuration, it will be possible to move higher level functions (up to the complete management of a single integration) to the control computer. The low level functions, however, will be maintained for system control and debugging.



Figure 13: Implementation of the tapering using a polyphase filter. Filter coefficients are stored in 4 ROM memories, and applied to samples separated by N clock periods. Delay is implemented with FIFOs

The first version of the control computer uses a Rabbit microprocessor. This device provides sufficient performance for initial testing, but its TCP/IP stack is rather slow; in particular, it is not fast enough to download FPGA *personalities* through the Ethernet port. Reading back the integrated data typically takes one second, which is marginal even with a single spectrometer channel.

A more efficient version uses an industrial PC board based on a Virtex system-on-a-chip, with roughly the performance of a 166 MHz Pentium. The board runs a dedicated Linux operating system, and has a measured bandwidth in excess of 1 MB/s on the Ethernet port. The CPLD2 interface, however, is implemented using a standard parallel interface, which is limited to about 150 KB/s. This is sufficient for the required integration times, and allows *personality* download in a reasonable time.
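The effect of the 150 KB/s limit on the read-back time can be estimated with a short calculation. The word size of 4 bytes per spectral point is an assumption, as the text does not state the width of the integrated data; the function name `readout_time_s` is hypothetical.

```python
def readout_time_s(n_points, n_spectra, bytes_per_point=4, rate_bytes_per_s=150e3):
    """Time needed to read n_spectra spectra of n_points each over the bus.

    bytes_per_point is an assumed word size; rate_bytes_per_s is the
    parallel interface limit quoted in the text (taken as 150e3 bytes/s).
    """
    return n_points * n_spectra * bytes_per_point / rate_bytes_per_s


# Two 1024-point spectra, as produced by two digital receivers.
print(f"{readout_time_s(1024, 2) * 1e3:.0f} ms")
```

Under these assumptions two 1024-point spectra are read in about 55 ms, well below the one second quoted for the Rabbit based system.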

The board has sufficient on-board memory to store many different FPGA *personalities*, and can easily manage even a first display of the collected spectra, if required.

## 2.6 Control software

The control software is composed of several layers:

- A low level interface program, running on the control computer, that maps simple commands received on a TCP/IP port to commands on the CPLD2 interface. This program is also responsible for FPGA personality download, either from a local file or from the TCP/IP port.
- A corresponding low level layer on the user computer, that generates simple commands in response to a standard set of *methods*.
- A set of objects, corresponding to the hardware modules described above. For example, an object represents a digital receiver, with appropriate methods for setting the bandwidth and the frequency.
- Some debug programs, that allow simple control over these objects.

These layers are implemented using object oriented programming, for ease of modelling and reusability. Using these objects as building blocks, together with a standard graphic user interface, it is possible to build larger programs that control a specific hardware configuration.
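The layering can be sketched as follows. All names here (`RecordingLink`, `DigitalReceiver`, the register addresses, and the use of a decimation factor to set the bandwidth) are hypothetical; only the 120 MHz clock and the 16-bit DDS accumulator are taken from section 3.1.1. The low level layer is replaced by a stand-in that records the commands instead of sending them over TCP/IP.

```python
class RecordingLink:
    """Stand-in for the low level layer: records commands instead of sending them."""

    def __init__(self):
        self.sent = []

    def write(self, address, value):
        self.sent.append((address, value))


class DigitalReceiver:
    """Object layer: one digital receiver with frequency and bandwidth methods."""

    CLOCK_HZ = 120e6      # filter clock, from section 3.1.1
    ACC_BITS = 16         # DDS accumulator width, from section 3.1.1
    LO_REG = 0x10         # hypothetical register address
    DECIM_REG = 0x11      # hypothetical register address

    def __init__(self, link):
        self.link = link

    def set_frequency(self, freq_hz):
        """Program the LO and return the frequency actually obtained."""
        scale = 2 ** self.ACC_BITS
        word = round(freq_hz / self.CLOCK_HZ * scale) % scale
        self.link.write(self.LO_REG, word)
        return word * self.CLOCK_HZ / scale

    def set_bandwidth(self, decimation):
        """Select the output bandwidth through the filter decimation factor."""
        self.link.write(self.DECIM_REG, decimation)


link = RecordingLink()
receiver = DigitalReceiver(link)
receiver.set_frequency(30e6)
receiver.set_bandwidth(8)
print(link.sent)
```

Replacing `RecordingLink` with a class that forwards the same `write` calls to the control computer would leave the object layer unchanged, which is the main benefit of the separation.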

# 3 Implementation

## 3.1 Implementation details of the individual cores

The elements (cores) described in the previous section have been implemented and simulated using the Altera Quartus development system. All cores have been implemented and tested using the single channel version of the spectrometer described in section 3.2.

These elements are described using both hardware description languages (VHDL and Verilog) and schematic diagrams.



Figure 14: Response of the polyphase filter used. Dashed: full scale response (scale on right). Continuous: zoomed bandpass response (scale on left). Dotted: Adjacent channels response

### 3.1.1 Digital receiver

The digital tunable filter design is realized partly in VHDL and partly with the Altera Quartus graphic tool. Its first version, which provides 2 filter channels, has been ported to and tested on one of the three Altera Stratix 1S40F780C5 FPGAs on the TFB board, namely the one closest to the 1S10, which is used to generate the test signal sent to the spectrometer.

The filter works at 120 MHz and requires modest resources. Two independent filters require:

- 10% of the 41,250 logic elements of the chip,
- 6% of the total memory bits (which amount to 3,423,744 bits),
- 70 of the 112 9-bit hardware multipliers present on the chip.

It is realized with a complex mixer, a programmable local oscillator and a low pass filter, as shown in fig. 6.

Depending on the application, the output can be either a real data stream, sampled at twice the passband width, or a complex stream, sampled at a rate equal to the passband width. The sampled signal is frequency converted by a complex mixer, driven by a local oscillator (LO) signal. The LO phase is undetermined, because it is not locked to an absolute reference time, while the frequency is locked to the external reference clock.
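The signal path of the digital receiver (complex mixer, low pass filter, decimation) can be illustrated with a floating point sketch. It does not reproduce the VHDL design: the windowed-sinc filter, its length and the function name `digital_receiver` are assumptions chosen for brevity.

```python
import cmath
import math


def digital_receiver(samples, f_lo, f_s, decimation, n_taps=63):
    """Mix to baseband, low pass filter, and decimate. Returns complex samples."""
    # Complex mixer: multiply by exp(-j 2 pi f_lo n / f_s).
    mixed = [s * cmath.exp(-2j * math.pi * f_lo * n / f_s)
             for n, s in enumerate(samples)]
    # Windowed-sinc low pass FIR with cutoff at f_s / (2 * decimation).
    cutoff = 0.5 / decimation          # in cycles per input sample
    mid = (n_taps - 1) / 2
    taps = []
    for k in range(n_taps):
        t = k - mid
        if t == 0:
            sinc = 2 * cutoff
        else:
            sinc = math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        hamming = 0.54 - 0.46 * math.cos(2 * math.pi * k / (n_taps - 1))
        taps.append(sinc * hamming)
    # Filter and decimate in one pass: only the retained outputs are computed.
    out = []
    for n in range(n_taps - 1, len(mixed), decimation):
        out.append(sum(taps[k] * mixed[n - k] for k in range(n_taps)))
    return out


# A real tone 1 MHz above a 30 MHz LO, sampled at 120 MHz, decimated by 8.
f_s, f_lo, df, dec = 120e6, 30e6, 1e6, 8
tone = [math.cos(2 * math.pi * (f_lo + df) * n / f_s) for n in range(400)]
baseband = digital_receiver(tone, f_lo, f_s, dec)
phase_step = cmath.phase(baseband[11] * baseband[10].conjugate())
print(phase_step * f_s / dec / (2 * math.pi))   # recovered offset in Hz, close to 1e6
```

The output is a complex stream at 15 MS/s, matching the case described above in which the sample rate equals the passband width.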

The data stream uses the 1.8 V standard. The architecture is hierarchical, and is described by the top level VHDL file bbc\_vhd.vhd.

It is subdivided into:

- an interface with the CPLD2 bus, which implements the control register and the address decoding (file cpld2\_interface.vhd)
- an entity to generate bus activity signals (cs\_strobe.vhd)
- a PLL to generate the filter internal clock
- two bbc\_component instances, each including a DDS, a mixer, a total\_power and the complex.FIR.

Figure 15: Structure of the standard CPLD2 ALMA interface

The DDS is implemented with a 16-bit accumulator and an 8-bit, 128 word RAM that stores the sinusoid table. The delay between filter taps is implemented using FIFO RAM blocks. The memory size (256 words) determines the maximum decimation factor of the filter. Samples at symmetric (positive and negative) tap positions are added together before multiplication, to exploit the filter symmetry.
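The operation of the DDS can be sketched in a few lines. The accumulator width and the table size are those quoted above; the table layout (one full sine period, addressed by the 7 most significant accumulator bits) is an assumption, and the names `tuning_word` and `dds` are hypothetical.

```python
import math

ACC_BITS = 16      # accumulator width, from the text
TABLE_SIZE = 128   # sine table length, from the text

# One full sine period scaled to signed 8-bit values (assumed layout).
SINE_TABLE = [round(127 * math.sin(2 * math.pi * i / TABLE_SIZE))
              for i in range(TABLE_SIZE)]


def tuning_word(freq_hz, clock_hz):
    """Phase increment per clock cycle for the requested frequency."""
    return round(freq_hz / clock_hz * 2 ** ACC_BITS) % 2 ** ACC_BITS


def dds(word, n_samples):
    """Generate n_samples of the oscillator output for a given tuning word."""
    acc = 0
    out = []
    for _ in range(n_samples):
        # The 7 most significant accumulator bits address the table.
        out.append(SINE_TABLE[acc >> (ACC_BITS - 7)])
        acc = (acc + word) % 2 ** ACC_BITS
    return out


word = tuning_word(30e6, 120e6)
print(word, dds(word, 4))
```

With a 120 MHz clock, a 16-bit accumulator gives a tuning step of 120 MHz / 65536, about 1.83 kHz.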

### 3.1.2 Polyphase filter

The polyphase filter design is realized in VHDL and it is implemented in the second 1S40F780C5 Altera Stratix FPGA, with a resource usage (for two independent channels) of:

- only 2% of the total logic elements,
- 829440 (24%) memory bits,
- 8 hardware 9-bit multipliers.

The design consists of the component that implements the polyphase filtering (poly\_fir.vhd), while the CPLD2 interface and the signal distribution reuse the same modules as the digital filter.

Two instances of the polyphase filter are implemented, one for each data stream coming from the BBC unit, each operating according to the scheme of fig. 10. In the first block the data stream is delayed by multiples of the FFT length N, using a component (decimation.vhd) implemented with a FIFO RAM of size 4N (component ishift\_ram), divided into groups of N samples each. The 9-bit filter coefficients are stored in four ROMs, and the signed sum is computed by the VHDL component total\_sum.
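The weight-and-add operation performed before the FFT can be illustrated with a floating point sketch. The tapering function used here (a sinc multiplied by a Hann window) is an assumed shape, not the 9-bit coefficient set stored in the ROMs, and a direct DFT replaces the FFT to keep the example self-contained. The function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Direct discrete Fourier transform, adequate for short sequences."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]


def taper(n_fft, n_seg=4):
    """Tapering function of length n_seg * n_fft: a windowed sinc (assumed shape)."""
    length = n_seg * n_fft
    coeffs = []
    for i in range(length):
        t = (i - (length - 1) / 2) / n_fft
        sinc = 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)
        hann = 0.5 - 0.5 * math.cos(2 * math.pi * (i + 0.5) / length)
        coeffs.append(sinc * hann)
    return coeffs


def polyphase_spectrum(block, n_fft, coeffs):
    """Weight n_seg segments of length n_fft, add them, and transform the sum."""
    n_seg = len(coeffs) // n_fft
    folded = [sum(coeffs[s * n_fft + m] * block[s * n_fft + m]
                  for s in range(n_seg))
              for m in range(n_fft)]
    return dft(folded)


n_fft = 16
tone = [cmath.exp(2j * math.pi * 3 * n / n_fft) for n in range(4 * n_fft)]
spectrum = polyphase_spectrum(tone, n_fft, taper(n_fft))
print([round(abs(v), 2) for v in spectrum])
```

Each group of `coeffs[s * n_fft : (s + 1) * n_fft]` plays the role of one of the four coefficient ROMs, and the sum over `s` corresponds to the signed sum of the four delayed and weighted segments.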

### 3.1.3 FFT block

The outputs of the two polyphase filters are sent to the FFT block, implemented partly in VHDL and partly in Verilog and ported to the third 1S40F780C5 FPGA. The resource usage is:

- 5957 logic elements
- 471880 memory bits
- 72 9-bit hardware multipliers

The design is based on a development tool (JFFT) written by Jeff Mock for the SETI project [3].

The design computes two real-time complex 1024 point Fast Fourier Transforms, which can operate on two independent sequences because the FFT computing time is N/2 clock cycles. As the FFT is complex, two real FFTs can be computed for each complex channel, for a total of four parallel FFTs. The FFT length N is fixed by design, and is limited by the amount of memory available on the chip (N = 1024 in the current design). The clock frequency, however, can be reduced in binary steps from its maximum, the sampling frequency of the input signal, to obtain a higher resolution over a narrower passband.
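The statement that one complex transform yields two real ones relies on a standard identity: if two real sequences are packed as the real and imaginary parts of a complex input, their spectra can be separated using the Hermitian symmetry of real-input transforms. The sketch below illustrates the identity with a direct DFT on short sequences; it is not taken from the JFFT code, and the function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Direct discrete Fourier transform, adequate for short sequences."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]


def two_real_dfts(a, b):
    """Transform two real sequences with a single complex DFT."""
    n = len(a)
    z = dft([complex(p, q) for p, q in zip(a, b)])
    spec_a, spec_b = [], []
    for k in range(n):
        zk = z[k]
        zc = z[-k % n].conjugate()        # conj(Z[N - k])
        spec_a.append((zk + zc) / 2)      # spectrum of a
        spec_b.append((zk - zc) / 2j)     # spectrum of b
    return spec_a, spec_b


a = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 1.5]
b = [0.0, 1.0, -1.0, 2.0, 0.5, -0.5, 1.0, 3.0]
spec_a, spec_b = two_real_dfts(a, b)
print(max(abs(u - v) for u, v in zip(spec_a, dft(a))))   # rounding error only
```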

The design computes the power density spectra of the signals and sends them to an integrator, which can integrate for a maximum time of 5 minutes. The integration uses two independent memories: the integration memory proper, and a dual port memory to which the integrated data are transferred and from which they are read asynchronously. The design schematic is shown in figure 7.
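The role of the two memories can be illustrated with a minimal software model. The class `Integrator` and its method names are hypothetical, and the model ignores word widths and timing; it only shows that read-back uses a copy, so that the next integration can start immediately.

```python
class Integrator:
    """Software model of the integration memory and of the readout memory."""

    def __init__(self, n_points):
        self.acc = [0.0] * n_points       # integration memory
        self.readout = [0.0] * n_points   # second memory, read by the computer
        self.count = 0                    # spectra accumulated so far

    def add(self, power_spectrum):
        """Accumulate one power spectrum."""
        for k, p in enumerate(power_spectrum):
            self.acc[k] += p
        self.count += 1

    def dump(self):
        """End of integration: move the data to the readout memory and restart."""
        self.readout = self.acc
        self.acc = [0.0] * len(self.readout)
        n, self.count = self.count, 0
        return n


integ = Integrator(3)
integ.add([1.0, 2.0, 3.0])
integ.add([1.0, 0.0, 1.0])
print(integ.dump(), integ.readout)
```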

The FFT computation is realized by the Verilog file fft.v, the power computation uses the Altera Quartus library function altmult\_add, and the integration is performed by the module integ.bdf, designed with the Altera Quartus graphic tool.



## 3.2 Single channel version

Figure 16: Block diagram for the ALMA pre-prototype filterboard. The board contains one 125 MHz ADC, one 1S10 and three 1S40 FPGAs, and a CPLD2 interface

This first version of the system is based on a pre-prototype filterboard designed to test the concept of the tunable filterboard (TFB) for the ALMA correlator. This board (fig. 16) contains three Stratix EP1S40 FPGAs, one EP1S10 FPGA, one ADC (Analog Devices AD9433, 125 MHz sampling clock), one DAC, and several auxiliary circuits.

The ADC and DAC are connected to the EP1S10, that is intended to be used to generate various test signals, and to analyze the processed data from the larger EP1S40. Wide data busses connect the chips together.

The system has been available since June 2005, and is hosted, together with power supplies and a Rabbit control microprocessor, in a standard 6U Eurocard rack. The solution is not optimal from a mechanical point of view, as the rack has a standard depth of 160 mm, while the board is an oversized 220 mm. The system is thus not easily transportable, although it works well in the laboratory environment.

The system has been extensively used to implement and characterize the components described in section 2. These components will be put together to implement a complete spectrometer, with a maximum observable bandwidth of 30 MHz, in the IF band from 255 to 285 MHz. The system has a single input, but can analyze two independent portions of the input band at higher resolution.

The EP1S10 FPGA implements a test generator, and receives the sampled data from the ADC. No real-to-complex conversion is performed. The data stream is then passed to the first EP1S40, which implements two independent digital receivers. The filtered data (which must share the same bandwidth) are then processed in the second FPGA, which performs the polyphase filtering, and in the third, which implements the FFT block.

The system has no independent clock generator, and must receive an external 120 MHz clock.
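Assuming that the ADC is clocked directly by this 120 MHz signal, the 255 to 285 MHz IF band is undersampled, and its position in the sampled data stream can be found with a short calculation. The function name `alias_frequency` is hypothetical.

```python
def alias_frequency(f_hz, f_s):
    """Frequency at which a real tone at f_hz appears after sampling at f_s."""
    f = f_hz % f_s
    return f if f <= f_s / 2 else f_s - f


f_s = 120e6   # assumed ADC sampling clock
for f_if in (255e6, 270e6, 285e6):
    print(f_if / 1e6, "->", alias_frequency(f_if, f_s) / 1e6, "MHz")
```

Under this assumption the IF band falls in the range 15 to 45 MHz of the sampled signal, without spectral inversion, and is centered in the 0 to 60 MHz Nyquist band.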

The mechanical structure makes efficient forced ventilation of the FPGAs difficult. Heat sinks have therefore been glued to the FPGA packages, providing efficient thermal coupling with the airflow generated by a small fan.

## 3.3 The 8 channel version

The 8 channel version is based on the ALMA Stratix2 filterboard. It uses a larger Eurocard crate, with enough space to accommodate the large boards.

The Stratix2 board contains an array of 16 EP2S40 FPGAs, connected in a 4x4 matrix. Data inputs come from the backplane, using a 92 bit wide bus running at 125 MS/s.



Figure 17: Block diagram for the 8 channel spectrometer

The block diagram for this instrument is shown in fig. 17. The system requires an ADC board. A custom board, hosting 8 AD9480 250 MS/s converters, has been designed. Each pair of converters is connected to a XC3S200 FPGA, selected because it does not require Ball Grid Array interconnections. These devices convert the 250 MS/s real format of the ADCs to a 125 MS/s complex data stream, which can be transmitted over the backplane bus. At most 6 bits per sample can be transmitted over the backplane. This allows for a much higher sensitivity with respect to the standard 2 or 3 bit sampling of radioastronomical spectrometers. For situations where a higher dynamic range is required, up to 8 bits per sample can be transmitted for up to 6 input channels.

A small backplane interconnects the two boards. It also hosts a 125 MHz clock generation module, four switching DC-DC converters that generate the required low voltage power supplies, and the computer interface.

The ALMA filterboard is reconfigured to implement 8 digital receivers, and 4 complete 2-input FFT spectrometers. The output of each ADC pair is fed to a single Stratix2 FPGA, which implements two filters. The filter output is available to a string of three FPGAs, which will be used to implement an FFT block with the largest possible number of spectral points. Some testing is required to determine the optimal configuration of the chips, but at least 1024 points per channel can easily be obtained. Thermal dissipation is also a problem for complex designs running at high speed, and may limit the performance more than the available logic resources in the FPGA.

An industrial computer (Virtex System-on-a-chip) is used for system control. It is a standard PC running Linux, and is usually controlled remotely through an Ethernet socket. It also has a standard keyboard/monitor connection, accepts two USB devices (e.g. disks), and has sufficient computing power to host complex control software.

The crate is wide enough to accommodate a second identical 8 channel backplane, which can be controlled using a second port of the control computer. Power supply and cooling have to be dimensioned differently, however, for this 16 channel version.

## 3.4 The 32 channel version

The same system can be easily extended to 32 channels, using 4 sampler boards and 4 TFB boards. The backplane is more complex, and thermal design is critical. Extensive experience with the 8 channel system is required to optimize its design.

Crate layout, left to right: 5 V power supply, 48 V power supply, control computer (keyboard, mouse, serial, video and Ethernet connectors), followed by four pairs of ADC board and ALMA TFB board.

Figure 18: Crate for the 32 channel spectrometer

A tentative design for the crate hosting the 32 channel spectrometer is shown in fig. 18. The same structure, with only the first ADC and ALMA TFB boards, will be used for the 8 channel version.

# 4 Cost, timescale and future plans

The major cost of the system is in the Stratix2 boards. If an order can be placed during the fabrication stage for the ALMA correlator, 5 boards can be purchased for about 15 KEu. The ADC board development requires about 3 KEu, with a marginal cost of 500 Eu each for fabrication and components. The control computer, rack, power supplies and fans have been purchased within running funds, for about 600 Eu.

Fabrication of the boards is expected to occur around the end of the year, and the boards will be available in the first months of next year. The backplane has been designed and fabricated, and its components will be delivered within this year. The ADC board design is in progress, and the final design is expected to be ready for transmission to industry in November. Board fabrication and component procurement can occur at the same time, with assembly in the fall of this year.

# References

- [1] C. Broadwell: "ALMA Correlator Control Bus Manual", ALMA document CORL-60.00.00-020-B-MAN (2004).
- [2] G. Matthaei, L. Young, E. M. T. Jones: Microwave filters, impedance-matching networks, and coupling structures, Artech House, Inc., North Bergen, NJ, USA, 1980.
- [3] J. Mock: "JFFT: A tool for generating synthesizeable verilog for streaming FFTs and polyphase filter banks", http://www.mock.com/setistuff/jfft
- [4] B. Quertier, G. Comoretto, A. Baudry, A. Gunst, A. Bos: Enhancing the Baseline ALMA Correlator Performance with the Second Generation Correlator Digital Filtercard, ALMA memo n. 476 (2003).

# Contents

- 1 General document description
  - 1.1 Introduction
  - 1.2 Abbreviation and Acronymes
  - 1.3 Glossary
  - 1.4 Related control Drawings
- 2 General description
  - 2.1 Digital input
    - 2.1.1 ADC converter
    - 2.1.2 Band limiting filter
    - 2.1.3 Signal conditioning
    - 2.1.4 Test signal generator
  - 2.2 Digital receiver and filter
  - 2.3 FFT Block
    - 2.3.1 Polyphase filtering
    - 2.3.2 Integration block
    - 2.3.3 Polarization observations
  - 2.4 Interface bus
  - 2.5 Control computer
  - 2.6 Control software
- 3 Implementation
  - 3.1 Implementation details of the individual cores
    - 3.1.1 Digital receiver
    - 3.1.2 Polyphase filter
    - 3.1.3 FFT block
  - 3.2 Single channel version
  - 3.3 The 8 channel version
  - 3.4 The 32 channel version
- 4 Cost, timescale and future plans