Main technical features of the WideX correlator

    M.T. Jan, 2008 Rev Jan 2010



Scope
By the end of 2006 the usable signal bandwidth of the PdB interferometer jumped from 500 MHz to 4 GHz per polarization. To match this change, a correlator project was launched in mid-2005, with the master design guideline being to put it on the sky in the shortest possible time. WideX stands for "Wideband Express". A classical lag architecture and 2-bit sampling were adopted. For several reasons, in 2007 the initial design specification was upscaled by a factor of 2 in resolution and by another factor of 2 in the number of baselines. From a technical point of view, this changed the original cool design into something hotter. The installation on the PdB site took place in February 2010.




Characteristics

Overview

The correlator uses bandpass sampling on input analog signals ranging from 2 to 4 GHz. These are provided by the IF processor and the IF optical transmission system.

The system is built according to the diagram below. It is based on a 2048-channel, 250 MHz correlator ASIC that dissipates 1.6 W. The correlator can handle 8 antennas, although the IF transport and processing are currently limited to 6.





The total power dissipation (1.25 kW per unit) is high enough that forced-air cooling has been adopted. Cooled air is ducted from under the floor and blown through the units.



Sampling heads

The 2-4 GHz bands are digitized at 4 Gsamples/s (2 bits) by the SHERIF module. No anti-aliasing filter is required, as the bands are already conditioned by the IF processor. The 4 GHz and 250 MHz clocks need very accurate relative timing, so they are distributed via coaxial cables on the front panel.
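Because the 2-4 GHz band lies entirely in the second Nyquist zone of a 4 Gsamples/s sampler, it folds down to 0-2 GHz with its frequency axis mirrored. The short Python sketch below illustrates that folding; the function names are illustrative only and are not taken from the WideX software.

```python
FS_HZ = 4.0e9  # sampling rate stated in the text (4 Gsamples/s)


def aliased_frequency(f_hz: float, fs_hz: float = FS_HZ) -> float:
    """Return the baseband frequency (0..fs/2) at which a tone at f_hz appears."""
    f_mod = f_hz % fs_hz  # fold into one sampling period
    return f_mod if f_mod <= fs_hz / 2 else fs_hz - f_mod


def is_inverted(f_hz: float, fs_hz: float = FS_HZ) -> bool:
    """True when the tone lies in an even Nyquist zone, so the band is mirrored."""
    return (f_hz % fs_hz) > fs_hz / 2


if __name__ == "__main__":
    for f in (2.5e9, 3.0e9, 3.5e9):
        print(f / 1e9, "GHz ->", aliased_frequency(f) / 1e9, "GHz, inverted:", is_inverted(f))
```

With these definitions an input tone at 3.5 GHz shows up at 0.5 GHz in the sampled data, and one at 2.5 GHz at 1.5 GHz.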

The signal level is adjusted to the proper V/σ by means of a built-in digital attenuator (32 dB range, 0.5 dB steps). The three comparator outputs are encoded and delivered to commercial demultiplexers. Unfortunately, each demultiplexer comes with its own built-in /16 counter, and all of them (8 modules, 2 bits per module) need to be disciplined to the master 250 MHz clock (see the ALMA memo on this method). A packet of 16 (2-bit) samples is delivered every 4 ns to the delay board FPGA.
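As an illustration of what three comparators and a 1:16 demultiplexer do to the signal, here is a minimal Python model. It is a sketch under assumptions: the code assignment 0..3, the function names, and the default threshold of 0.996 σ (a commonly quoted textbook value for 4-level quantization with weights ±1, ±3) are not taken from the SHERIF design.

```python
import bisect


def quantize_2bit(samples, sigma, v_over_sigma=0.996):
    """Map analog samples to codes 0..3 using three thresholds (-V, 0, +V).

    v_over_sigma is an assumed value, not the setting used in the hardware.
    """
    v = v_over_sigma * sigma
    thresholds = [-v, 0.0, v]
    # bisect_right counts how many thresholds lie at or below the sample
    return [bisect.bisect_right(thresholds, s) for s in samples]


def packetize(codes, packet_len=16):
    """Group consecutive codes into packets of 16, one packet per 4 ns."""
    n_full = len(codes) // packet_len
    return [codes[i * packet_len:(i + 1) * packet_len] for i in range(n_full)]


if __name__ == "__main__":
    codes = quantize_2bit([-2.0, -0.5, 0.5, 2.0], sigma=1.0)
    print(codes)
    print(len(packetize(list(range(40)))))
```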


Delay boards

The delay board acts as a motherboard for the sampling heads. For individual testability, the sampling head is externally clocked  from the front panel. The data driver layers are clocked via the backplane distribution.
The data format conversion system records the data at 4 GHz in the FPGA internal RAM and plays it back at 250 Msamples/s on 16 physical streams. At this point, careful management of the write/read pointers provides the bulk delay system at zero extra cost (see presentation). The delay range is 16 microseconds, and the resolution is 16 samples, or 4 nanoseconds. Subsample resolution is obtained by linear weighting of the FFT in the real-time software. See the practical values for PdB.
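The split between the hardware bulk delay and the software residual can be sketched as follows. This is an illustration, not the real-time code: the function name is hypothetical, and rounding to the nearest 4 ns step is an assumption, chosen because it keeps the residual within half a step.

```python
SAMPLE_PERIOD_NS = 0.25                               # 1 / (4 Gsamples/s)
PACKET_SAMPLES = 16                                   # samples per packet
COARSE_STEP_NS = PACKET_SAMPLES * SAMPLE_PERIOD_NS    # 4 ns
MAX_DELAY_NS = 16_000.0                               # 16 microseconds


def split_delay(delay_ns: float):
    """Split a delay into (coarse steps of 4 ns, residual in ns).

    The coarse part would go to the delay board pointers; the residual
    would be left to the software. Rounding to nearest is an assumption.
    """
    if not 0.0 <= delay_ns <= MAX_DELAY_NS:
        raise ValueError("delay outside the 0..16 microsecond range")
    steps = round(delay_ns / COARSE_STEP_NS)
    residual_ns = delay_ns - steps * COARSE_STEP_NS
    return steps, residual_ns


if __name__ == "__main__":
    print(split_delay(1001.3))
```

For a requested delay of 1001.3 ns this gives 250 coarse steps (1000 ns) and a residual of about 1.3 ns.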

A digital total power detector measures the energy of the 2 GHz band by squaring the digital stream and integrating it over 31.25 ms. Its value is present in every dump and is used first to initialize the input attenuator, then to monitor the evolution of the receiver level. The classical S-law is corrected for in the software.
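A rough Python sketch of the detector and of one way its reading could drive the attenuator is given below. Several things here are assumptions rather than facts from the design: the sample weights (±1, ±3), the function names, and the simple dB update rule, which ignores the S-law correction mentioned above.

```python
import math


def total_power(codes, levels=(-3, -1, 1, 3)):
    """Mean squared value of a stream of 2-bit codes (0..3).

    The weights in `levels` are assumed, not taken from the hardware.
    """
    return sum(levels[c] ** 2 for c in codes) / len(codes)


def next_attenuation_db(current_db, measured_power, target_power,
                        step_db=0.5, max_db=32.0):
    """Suggest a new attenuator setting, quantized to 0.5 dB steps.

    Assumes the detector reading is proportional to input power, which
    is only approximately true before the S-law is corrected.
    """
    error_db = 10.0 * math.log10(measured_power / target_power)
    wanted = current_db + error_db
    stepped = round(wanted / step_db) * step_db
    return min(max(stepped, 0.0), max_db)


if __name__ == "__main__":
    print(total_power([0, 1, 2, 3]))
    print(next_attenuation_db(10.0, 2.0, 1.0))
```

In this sketch a reading 3 dB above target raises the attenuation from 10 dB to 13 dB.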

Backplane and timings

The backplane distributes the 16 streams from each of the 8 delay boards to the 16 correlator boards. All the connections of the backplane are point-to-point. Connections are impedance-matched coplanar differential lines carrying LVDS at 250 MHz. The clock signal is fanned out by multiple output buffers (one per client board, 16+8 in total) located on the Readout board.
Each delay board has an output of 16 samples, two bits per sample, two lines per bit (64 pins). Each correlator board processes the 8 inputs of one stream, two bits per input, two lines per bit (32 pins).




System-wide synchronization is achieved by accurate timing control of every hardware element. This solution was preferred to "automatic" alignment protocols. The basic clock period is 4 ns and the worst-case transit time across the backplane is ~2 ns. In order to ensure proper eye diagrams at every correlator board, all 512 data paths of the 22-layer backplane are equalized in time by means of printed delay lines. Similarly, the 4 GHz sampling head internal timing is achieved by construction and requires no individual adjustment. The sampling head clocks are fed by external coax cables of critical length, carrying sine waves at 4 GHz and 250 MHz from the central unit clock generator. A couple of front panel coding wheels allow fine tuning of the whole unit. Comfortable timing margins of +/- 0.9 ns are achieved.
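The pin and path counts quoted for the backplane can be cross-checked with a few lines of Python. The reading that the 512 data paths are the individual lines of the differential pairs is an interpretation, not something the text states; the function name is illustrative.

```python
DELAY_BOARDS = 8         # one per antenna input
CORRELATOR_BOARDS = 16   # one per stream
STREAMS_PER_DELAY_BOARD = 16
BITS_PER_SAMPLE = 2
LINES_PER_BIT = 2        # LVDS differential pair


def backplane_counts():
    """Return (pins per delay board, pins per correlator board, total lines)."""
    per_delay = STREAMS_PER_DELAY_BOARD * BITS_PER_SAMPLE * LINES_PER_BIT
    per_corr = DELAY_BOARDS * BITS_PER_SAMPLE * LINES_PER_BIT
    total_out = DELAY_BOARDS * per_delay
    total_in = CORRELATOR_BOARDS * per_corr
    # Point-to-point wiring: every line leaving a delay board reaches exactly one correlator board
    assert total_out == total_in
    return per_delay, per_corr, total_out


if __name__ == "__main__":
    print(backplane_counts())
```

The totals come out at 64 pins per delay board, 32 per correlator board, and 512 lines overall, matching the figures in the text.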


Correlator chip


The correlator chip is a full 2-bit, 2048-channel device with a counterflowing architecture, clocked at 250 MHz and designed by Barco-Silex to IRAM specifications.



The chip is implemented in  0.12 micrometer gate array technology from NEC.


The signal trip (clocked @ 250 MHz)

The X shifter inputs can be directed to the input pins or, should the readout system need troubleshooting, to static data.

Similarly, the Y shifter inputs can be directed to input pins, to static data, or to the end of the X shifter in order to implement the autocorrelation of the X signal.

The X and Y signal outputs can be connected either to the corresponding signal input pins (repeat mode) or to the end of the corresponding shifter (cascade mode).

When in repeat mode, the chip acts as a signal buffer that feeds the following chip. It adds one resynchronization layer in order to prevent the data timings from drifting too much along the board.


The data trip (every 31.25 ms)

The product value is integrated in a 16-bit accumulator. At the end of an integration phase, a readout sequence is triggered, causing its contents to be placed in a long shift register, which is later pushed out serially.
The accumulator is then cleared for the next phase.
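The integrate, dump and clear cycle of a lag correlator can be modelled in a few lines of Python. This is a toy model under assumptions: it computes one-sided lags only, uses unbounded Python integers instead of 16-bit accumulators, and does not reproduce the counterflowing data movement of the real chip. The class name is hypothetical.

```python
class LagCorrelatorModel:
    """Toy lag correlator: acc[k] accumulates sum over t of x[t] * y[t + k]."""

    def __init__(self, n_lags: int):
        self.acc = [0] * n_lags

    def integrate(self, x, y):
        """Add the lag products of two sample sequences to the accumulators."""
        n_lags = len(self.acc)
        n_terms = min(len(x), len(y)) - n_lags + 1
        for k in range(n_lags):
            self.acc[k] += sum(x[t] * y[t + k] for t in range(n_terms))

    def dump(self):
        """Return the accumulated lags and clear them for the next phase."""
        out, self.acc = self.acc, [0] * len(self.acc)
        return out


if __name__ == "__main__":
    x = [1, -1, 3, -3, 1, 1, -1, 3]
    y = [0, 0] + x            # y is x delayed by two samples
    model = LagCorrelatorModel(n_lags=4)
    model.integrate(x, y)
    print(model.dump())       # the largest value sits at lag 2
```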


Correlator board
 
The digitized sky signal is delivered in 16 streams of 250 Msamples/s. Every stream is completely processed (28 baselines) by one board. One chip processes one baseline. Autocorrelation may be processed by several chips (see maps). Every chip uses its built-in repeaters for data and clock signals, so point-to-point distribution is preserved within the board. There are 4 rows of 7 chips. The timing shifts continuously along the rows, from the connector to the end of the board, and is constant across the columns. This arrangement is known as "systolic timing". It has the drawback of accumulating clock jitter at every repetition layer, which can lead to problems on the last rows. As a result, the last rows are offset in time by several clock periods, but this does not alter the functionality. The generic scheme for chip interconnection is shown below.





The 16-bit integrators are pushed out serially at 65 MHz. The serial output of one chip is connected to the serial input of a neighbouring chip to form a very long (917,504-bit) shift register that contains all the data of one board. The total dump time is 15 ms per board.
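The register length and dump time follow from the numbers already given (16 bits per channel, 2048 channels per chip, 28 chips per board, 65 MHz readout clock). The Python check below uses illustrative function names; attributing the gap between the computed shift time and the quoted 15 ms to readout overhead is an assumption.

```python
BITS_PER_CHANNEL = 16
CHANNELS_PER_CHIP = 2048
CHIPS_PER_BOARD = 28
READOUT_CLOCK_HZ = 65e6


def board_register_bits() -> int:
    """Length in bits of the chained shift register of one correlator board."""
    return BITS_PER_CHANNEL * CHANNELS_PER_CHIP * CHIPS_PER_BOARD


def board_shift_time_ms() -> float:
    """Time needed to shift the whole register out at the readout clock rate."""
    return board_register_bits() / READOUT_CLOCK_HZ * 1e3


if __name__ == "__main__":
    print(board_register_bits(), "bits")
    print(round(board_shift_time_ms(), 2), "ms")
```

Pure shifting takes a little over 14 ms, which is consistent with the quoted 15 ms per board.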


Readout board

The contents of the integrators and other correlator data are collected and formatted by a hardware (FPGA) interface and transferred to a dedicated Linux PC. A high-speed optical fiber serial data link to the computer is hosted on this board. It is a CERN design named DDL that has been found very well suited to this need. More detailed information here.
The board also hosts the PLL sampling clock generator and the adjustable clock distribution; for this reason it is located at the center of the unit.
The FPGA drives various monitor and control circuits and routes them across the correlator. The commands are executed at the next 32 pps event after reception from the computer.



Processing tasks performed by the software during operation


1/ Configuration
For the 8 inputs, the attenuators are set to their optimal value, according to the digital total power detector measurements.
2/ Every second
Every second the Dual LO2 module is set to stop the residual fringe rate (at the center) of both 2 GHz subbands (see LO2 manual (PDF)).
Every second the coarse delay is sent to the delay boards.
3/  32 times per second
The analog bandwidth (2 GHz) and the baseline length (1 km) mean that the longest period during which the correlation can be considered stationary is 31.25 ms. During this period:
The raw correlations from the correlator chips are serially transmitted to the readout board, which, after some formatting, immediately transfers them to the high-speed optical link. There are 16 bits per channel, 2048 channels per chip, 28 chips per board, and 16 boards per chassis. The total dump time is 15 ms, at an average data rate of 470 Mbit/s.
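The 470 Mbit/s figure can be recovered from the counts above and the 32 dumps per second. In this Python sketch the function names are illustrative, and "average" is taken to mean averaged over the full 31.25 ms cycle, which is an interpretation.

```python
BITS_PER_CHANNEL = 16
CHANNELS_PER_CHIP = 2048
CHIPS_PER_BOARD = 28
BOARDS_PER_CHASSIS = 16
DUMPS_PER_SECOND = 32


def bits_per_dump() -> int:
    """Raw correlation bits produced by one chassis in one 31.25 ms dump."""
    return BITS_PER_CHANNEL * CHANNELS_PER_CHIP * CHIPS_PER_BOARD * BOARDS_PER_CHASSIS


def average_rate_mbit_s() -> float:
    """Average link rate of one chassis, in units of 10^6 bit/s."""
    return bits_per_dump() * DUMPS_PER_SECOND / 1e6


if __name__ == "__main__":
    print(bits_per_dump(), "bits per dump")
    print(round(average_rate_mbit_s(), 1), "Mbit/s")
```

The result is just under 470 Mbit/s per chassis.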

In the computer, and for each baseline:
-The corresponding channels emanating from the 16 boards are summed to form a 2048-point x 20-bit cross-correlation function.
-The Van Vleck correction is applied.
-The Fourier transform is applied.
-The fractional delay linear slope is applied. Since the hardware can only deal with packets of 16 samples, the range has to be +/- 8 samples, or -2 ns...+2 ns. The minimum resolution is 31 ps, or 1/8th of a sample.
-The data is integrated in one of 4 buffers, according to the Walsh function value for that sequence.
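Two of the steps above, the summation over boards and the fractional delay slope, can be sketched in Python as follows. This is a simplified illustration and not the real-time software: the function names, the channel-to-frequency mapping (channel i at i/N of the 2 GHz band) and the sign of the phase slope are all assumptions, and the Van Vleck correction, the Fourier transform and the Walsh buffers are left out.

```python
import cmath

SAMPLE_PERIOD_NS = 0.25                     # 1 / (4 Gsamples/s)
RESIDUAL_STEP_NS = SAMPLE_PERIOD_NS / 8     # 31.25 ps, 1/8th of a sample


def sum_boards(board_lags):
    """Sum the same lag channel over all boards (16 x 16-bit fits in 20 bits)."""
    return [sum(column) for column in zip(*board_lags)]


def quantize_residual(residual_ns: float) -> float:
    """Round a residual delay to the 1/8-sample grid and check the +/- 2 ns range."""
    if abs(residual_ns) > 2.0:
        raise ValueError("residual delay outside -2 ns...+2 ns")
    return round(residual_ns / RESIDUAL_STEP_NS) * RESIDUAL_STEP_NS


def apply_fractional_delay(spectrum, residual_ns, bandwidth_ghz=2.0):
    """Multiply each spectral channel by a phase that grows linearly with frequency."""
    n = len(spectrum)
    out = []
    for i, value in enumerate(spectrum):
        f_ghz = bandwidth_ghz * i / n       # assumed frequency of channel i
        out.append(value * cmath.exp(-2j * cmath.pi * f_ghz * residual_ns))
    return out


if __name__ == "__main__":
    print(sum_boards([[1, 2], [3, 4]]))
    print(quantize_residual(1.3))
    print(apply_fractional_delay([1, 1, 1, 1], 0.25))
```

A delay of one sample (0.25 ns) rotates the channel at 1 GHz by a quarter turn, which is the behaviour expected of a linear phase slope.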
4/ Storage
The PCs of the 4 units have no hard disk drives. Every second the data is transmitted from the PC to further software layers.

Tests

This machine totals 3,670,016 elementary channels (4 units x 16 boards x 28 chips x 2048 channels). For development and also for maintenance, the tests must be largely automated. Powerful test software has been developed, according to a very primitive test plan.


Visual Documentation
Detailed images of the inside of the machine can be seen here.