of the WideX correlator
M.T. Jan, 2008
Rev Jan 2010
By the end of 2006 the usable
signal bandwidth of the PdB
interferometer jumped from 500 MHz to 4 GHz per polarization.
To match this change, a correlator project was launched in mid-2005, with the
master design guideline being to put it on the sky in the shortest
possible time. WideX stands for "Wideband Express". A classical lag
architecture and 2-bit sampling were adopted. For several reasons, in
initial design specification
has been upscaled by a factor of 2 in resolution and
by another factor of 2 in the number of baselines. From a technical point of view,
this has changed the original cool design into something
hotter. The installation on the PdB site took place in February 2010.
The correlator uses bandpass sampling on input analog signals ranging
from 2 to 4 GHz. These are provided by the IF
and the optical IF
The system is built according to the diagram below.
It is based on a 2048-channel, 250 MHz correlator ASIC
that dissipates 1.6 watts. The correlator can work on 8
antennas, although the IF transport and
processing are currently limited to 6.
The power dissipation (1.25 kW per unit) is such that forced-air cooling
has been adopted. Cooled air
is ducted from under the floor and blown through the units.
The 2-4 GHz bands are digitized at 4 GS/s (2 bits) by the SHERIF
antialiasing filter is
required, as the bands are already conditioned by the IF processor. The
4 GHz frequency and the 250 MHz frequency need a very accurate relative
timing so they are distributed via coaxial cables on the front panel.
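As an illustration of the bandpass sampling described above: a 2-4 GHz band sampled at 4 GS/s sits in the second Nyquist zone, so each input frequency folds into the 0-2 GHz baseband with the spectrum reversed. The sketch below is only a numerical aid; the function name is hypothetical and not part of the WideX software.

```python
def aliased_frequency_ghz(f_in_ghz, f_sample_ghz=4.0):
    """Return the apparent baseband frequency (0..fs/2) of a sampled tone."""
    f = f_in_ghz % f_sample_ghz      # fold the tone into 0..fs
    if f > f_sample_ghz / 2:         # the upper half mirrors about fs/2
        f = f_sample_ghz - f
    return f


# A tone at 2.5 GHz appears at 1.5 GHz, one at 3.5 GHz appears at 0.5 GHz:
# the higher the sky frequency in the 2-4 GHz band, the lower it lands.
low_edge = aliased_frequency_ghz(2.5)
high_edge = aliased_frequency_ghz(3.5)
```

Because the whole 2-4 GHz band maps one-to-one onto 0-2 GHz, no information is lost as long as the input is already band-limited, which is the role the text assigns to the IF processor.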
The signal level is adjusted to the proper
means of a
built-in digital attenuator (32 dB, 0.5 dB steps). The three comparator outputs are encoded
and delivered to the
commercial demultiplexers. Unfortunately they come
built-in /16 counter and all (8 modules, 2 bits per module)
to be disciplined (see ALMA memo
on this method
master 250 MHz clock. A packet of 16 (2-bit)
samples is delivered every 4 ns to the delay board FPGA.
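The packet arithmetic above can be checked with a short sketch: demultiplexing a 4 GS/s stream by 16 yields one 16-sample packet per 250 MHz clock period, i.e. every 4 ns. The helper name `packetize` is hypothetical and only models the grouping, not the actual demultiplexer hardware.

```python
SAMPLE_RATE_MHZ = 4000   # 4 GS/s sampling rate, from the text
DEMUX_FACTOR = 16        # samples per packet, from the text


def packetize(samples, factor=DEMUX_FACTOR):
    """Group a serial sample stream into consecutive packets of `factor` samples.

    Trailing samples that do not fill a complete packet are dropped.
    """
    usable = len(samples) - len(samples) % factor
    return [samples[i:i + factor] for i in range(0, usable, factor)]


packet_rate_mhz = SAMPLE_RATE_MHZ / DEMUX_FACTOR   # 250 MHz packet clock
packet_period_ns = 1000 / packet_rate_mhz          # 4 ns between packets

packets = packetize(list(range(40)))               # 40 samples -> 2 full packets
```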
The delay board acts as a motherboard for the sampling heads.
For individual testability, the sampling head is externally
clocked from the front panel. The data driver layers are
clocked via the backplane
The data format conversion
records the data at 4 GHz in the FPGA internal RAM and
plays it back at 250 Msamples/s on
16 physical streams. At this point, careful management of the
write/read pointers provides the bulk delay system at zero extra cost (see
). The delay range is 16 microseconds, and the resolution is 16 samples, or 4 nanoseconds. Subsample resolution is
obtained by applying a linear phase slope to the FFT output in the real-time software. See the practical
A digital total power detector measures the energy of the 2 GHz band by
squaring the digital
stream and integrating it over 31.25 ms. Its value is present in every
is used to initialize the input attenuator then
the evolution of the receiver level. The classical S-law is
corrected for in the software.
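A minimal sketch of such a digital total power detector follows. The source does not give the numerical weights of the four sampler levels; the values -3, -1, +1, +3 used here are an assumption (a common choice for 4-level quantization), and the function names are hypothetical. The S-law correction mentioned above is not modelled.

```python
# Assumed weights for the four 2-bit codes (not stated in the source).
LEVELS = (-3, -1, 1, 3)

SAMPLE_RATE_HZ = 4_000_000_000
DUMPS_PER_SECOND = 32                      # one integration every 31.25 ms
SAMPLES_PER_INTEGRATION = SAMPLE_RATE_HZ // DUMPS_PER_SECOND


def total_power(codes):
    """Sum of squared sample values for a sequence of 2-bit codes (0..3)."""
    return sum(LEVELS[c] ** 2 for c in codes)


def mean_power(codes):
    """Average squared sample value, independent of integration length."""
    return total_power(codes) / len(codes)


example = [0, 1, 2, 3]                     # one sample at each level
```

With these assumed weights, a stream that spends more time in the outer levels gives a higher mean power, which is the quantity a control loop could use to set the input attenuator.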
Backplane and timings
The backplane distributes the 16 streams from each of the 8 Delay
boards to the 16 correlator boards. All the connections of the
backplane are point-to-point.
Connections are impedance-matched coplanar differential lines carrying
LVDS at 250 MHz. The
clock signal is fanned out by multiple output buffers (one per client
board, total 16+8) located on the Readout board.
Each delay board has a 16-sample, two bits per sample, two lines per
bit output (64 pins). Each correlator board processes the 8 inputs of
one stream, two bits
per input, two lines per bit (32 pins).
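The pin counts quoted above follow from simple multiplication, as this short check shows. The constant names are illustrative; the numbers are those given in the text.

```python
DELAY_BOARDS = 8          # one per antenna input
CORRELATOR_BOARDS = 16    # one per demultiplexed stream
STREAMS_PER_DELAY_BOARD = 16
BITS_PER_SAMPLE = 2
LINES_PER_BIT = 2         # differential signalling: two lines per bit

# Output side: every delay board drives 16 streams.
pins_per_delay_board = STREAMS_PER_DELAY_BOARD * BITS_PER_SAMPLE * LINES_PER_BIT

# Input side: every correlator board receives one stream from each of 8 inputs.
pins_per_correlator_board = DELAY_BOARDS * BITS_PER_SAMPLE * LINES_PER_BIT

# Point-to-point wiring: both ends of the backplane must total the same count.
total_driven = DELAY_BOARDS * pins_per_delay_board
total_received = CORRELATOR_BOARDS * pins_per_correlator_board
```

Both totals come out at 512 lines, which is consistent with a strictly point-to-point backplane where every driven line has exactly one receiver.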
System-wide synchronization is
achieved by accurate timing control of every hardware
This solution has been preferred to "automatic" alignment protocols.
The basic clock
period is 4 ns and the worst case transit time across the backplane is
~2 ns. In order to ensure proper eye diagrams at
correlator board, all the 512 data paths of the 22-layer backplane are
in time by means of printed delay lines. Similarly, the 4 GHz
sampling head internal timing
is achieved by construction and requires no individual
adjustment. The sampling head clocks are fed by
external coax cables of
critical length, carrying sinewaves at 4 GHz and 250 MHz from the
central unit clock generator. A couple of front panel coding wheels
allow fine tuning
of the whole unit. Comfortable timing margins of +/- 0.9 ns
correlator chip is a full 2-bit, 2048-channel
of counterflowing architecture clocked at 250 MHz, designed by
Barco-Silex to IRAM specifications.
The chip is implemented in 0.12 micrometer gate array technology
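To make the lag principle concrete, here is a minimal software sketch of what a lag correlator accumulates: for each lag channel, the running sum of products of one signal with a shifted copy of the other. It is a functional model only, under the assumption that the lags are centred on zero; it does not reproduce the counterflowing shifter structure or the 2-bit multiplier table of the real chip, and the function name is hypothetical.

```python
def lag_correlate(x, y, n_lags):
    """Accumulate products x[t] * y[t + lag] for lags -n_lags//2 .. n_lags//2 - 1.

    Returns a list of n_lags accumulators, most negative lag first.
    """
    half = n_lags // 2
    acc = [0] * n_lags
    for i, lag in enumerate(range(-half, n_lags - half)):
        for t in range(len(x)):
            u = t + lag
            if 0 <= u < len(y):          # ignore samples outside the record
                acc[i] += x[t] * y[u]
    return acc


# y is x delayed by one sample, so the correlation peaks at lag +1.
x = [1, -1, 3, -3, 1, 1]
y = [0, 1, -1, 3, -3, 1]
lags = lag_correlate(x, y, 4)            # lags -2, -1, 0, +1
```

The position of the peak in the lag function encodes the relative delay between the two inputs; the Fourier transform of the lag function gives the cross-power spectrum.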
The signal trip (clocked @ 250 MHz)
The X shifter inputs can
be directed to the input pins or to
static data should the readout system need troubleshooting.
Similarly the Y
shifter inputs can be directed to input pins, static data, or to the
end of the X-shifter in order to implement the autocorrelation of the X
The X and Y signal outputs can be connected either to the
corresponding signal input pins (repeat mode), or to
the end of the corresponding shifter (cascade
When in repeat mode, the chip acts as a signal buffer that feeds
the following chip. It adds one resynchronization layer in order to
prevent the data timing from drifting too much along the board.
The data trip (every 31.25 ms)
The product value is integrated in
a 16-bit accumulator. At the end of an integration phase, a readout
sequence is triggered, causing its
contents to be placed
in a long shift register which is later pushed out serially.
The accumulator is cleared for the next phase.
The digitized sky signal is delivered in 16 streams of 250 Ms/s. Every
stream is completely processed (28 baselines) by one board. One chip
processes one baseline. Autocorrelation may be processed by several
chips (see maps
). Every chip
uses the built-in
for data and clock signals, so
point-to-point distribution is preserved within the board. There are 4
rows of 7 chips. The timing continuously shifts along the
the connector to the end of the board. It is constant across the
columns. This arrangement is known as "systolic timing". It has the
drawback of accumulating the clock jitter at every repetition layer,
which can lead to problems
the last rows. As a result the last rows are offset in time
by several clock periods but this does not alter the functionality. The
generic scheme for chip interconnection
The 16-bit integrators are pushed out serially at 65 MHz. The
serial output of one chip is connected to the serial input of
the neighbour chip to form a very long (917,504-bit) shift register that
contains all the data of one card. The
time is 15 msec per board.
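The readout figures above can be cross-checked with a few lines of arithmetic. The sketch assumes that the 16 boards of a unit are read out in parallel once per 31.25 ms, that there are 4 units, and that the average rate is expressed in Mbit/s; these are assumptions drawn from the surrounding text, not statements from the hardware documentation.

```python
CHANNELS_PER_CHIP = 2048
BITS_PER_CHANNEL = 16
CHIPS_PER_BOARD = 28
BOARDS_PER_UNIT = 16
UNITS = 4                         # assumed from the "4 unit PCs" in the text
READOUT_CLOCK_HZ = 65_000_000
DUMPS_PER_SECOND = 32             # one dump every 31.25 ms

# Length of the per-board shift register.
bits_per_board = CHANNELS_PER_CHIP * BITS_PER_CHANNEL * CHIPS_PER_BOARD

# Pure shifting time at 65 MHz, excluding any protocol overhead.
shift_time_ms = bits_per_board / READOUT_CLOCK_HZ * 1e3

# Average rate for one unit, all 16 boards dumped 32 times per second.
avg_rate_mbit_s = bits_per_board * BOARDS_PER_UNIT * DUMPS_PER_SECOND / 1e6

# Total number of elementary lag channels in the machine.
total_channels = CHANNELS_PER_CHIP * CHIPS_PER_BOARD * BOARDS_PER_UNIT * UNITS
```

The shift register length evaluates to 917,504 bits, the raw shifting time to about 14.1 ms (compatible with the quoted 15 ms once overhead is included), the average rate to about 470 Mbit/s per unit, and the channel count to 3,670,016.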
The contents of the
integrators and other correlator data are collected
and formatted by a
hardware (FPGA) interface and transferred to a dedicated Linux PC.
A high-speed optical fiber serial data link to the computer is hosted on
board. It is a CERN design named DDL
that has been found well suited to this need. More
detailed information here
The board also hosts the PLL sampling clock generator and the
adjustable clock distribution; for this reason it is located
at the center of the unit.
The FPGA drives various
monitor and control circuits and
routes them across the correlator. The commands are executed at the
next 32 pps event after reception from the computer.
Processing tasks performed by the
software during operation
For the 8 inputs, the attenuators are set to their
optimal value, according to the digital total power detector
2/ Every second
Every second the Dual LO2 module is set to stop the
rate (at the center) of both 2 GHz subbands. (see LO2
Every second the coarse delay is sent to the delay boards.
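The split between the coarse delay sent to the delay boards and the residual handled in software can be sketched as follows. The step sizes (16 samples, i.e. 4 ns, for the hardware and 1/8 sample for the software) are those given in the text; the function name and the rounding policy are assumptions made for illustration.

```python
SAMPLE_PERIOD_PS = 250            # one sample at 4 GS/s
COARSE_STEP_SAMPLES = 16          # hardware resolution: 16 samples = 4 ns
FINE_STEPS_PER_SAMPLE = 8         # software resolution: 1/8 sample = 31.25 ps


def split_delay(delay_ps):
    """Split a delay into coarse hardware steps and a fine software residual.

    Returns (coarse, fine): coarse in units of 16 samples, fine in units of
    1/8 sample. The residual stays within about +/- 8 samples.
    """
    coarse_step_ps = SAMPLE_PERIOD_PS * COARSE_STEP_SAMPLES       # 4000 ps
    fine_step_ps = SAMPLE_PERIOD_PS / FINE_STEPS_PER_SAMPLE       # 31.25 ps
    coarse = round(delay_ps / coarse_step_ps)
    residual_ps = delay_ps - coarse * coarse_step_ps
    fine = round(residual_ps / fine_step_ps)
    return coarse, fine


# 9 ns -> 2 coarse steps (8 ns) plus +1 ns of residual (32 fine steps).
# 7 ns -> 2 coarse steps (8 ns) plus -1 ns of residual (-32 fine steps).
example_a = split_delay(9000)
example_b = split_delay(7000)
```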
3/ 32 times per second
The analog bandwidth (2 GHz) and the baseline length
(1 km) make that the
longest period during which the correlation can be considered
During this period:
The raw correlations from the
correlator chips are serially transmitted to the readout card. After
some formatting, the card
immediately transfers them over the high-speed optical link. There are
channel, 2048 channels per chip, 28 chips per board, and 16
chassis. The total dump time is 15
msec, at an average data rate of 470
In the computer, and for each baseline:
-The corresponding channels emanating from the 16 boards are summed to
form a 2048-point x 20-bit cross-correlation function.
-The Van Vleck correction is applied.
-The Fourier transform is applied.
-The fractional delay linear slope is applied. Since
the hardware can only deal with packets of 16 samples, the
range has to be +/- 8 samples or -2 ns...+2 ns. The minimum
resolution is 31 ps or 1/8th of a sample.
-The data is integrated in one of 4 buffers, according to the
function value for that sequence.
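Two of the per-baseline steps listed above lend themselves to a short sketch: summing the 16 board outputs, and applying the fractional delay as a linear phase slope across the spectrum. This is an illustration under assumptions, not the real-time software: the function names are hypothetical, the sign of the phase and the channel-to-frequency mapping are assumed conventions, and the Van Vleck correction and the Fourier transform itself are omitted.

```python
import cmath


def sum_boards(board_lags):
    """Sum corresponding lag channels over all boards (one list per board)."""
    return [sum(channel) for channel in zip(*board_lags)]


def apply_delay_slope(spectrum, delay_s, bandwidth_hz):
    """Multiply each spectral channel by a phase that is linear in frequency.

    Channel k of n is assumed to sit at baseband frequency k * bandwidth / n.
    """
    n = len(spectrum)
    out = []
    for k, value in enumerate(spectrum):
        f = k * bandwidth_hz / n
        out.append(value * cmath.exp(-2j * cmath.pi * f * delay_s))
    return out


# Sixteen 16-bit values always fit in 20 bits, hence the 20-bit sum.
worst_case_sum = 16 * (2 ** 16 - 1)

# One fine step (1/8 sample = 31.25 ps) applied to a flat 4-channel spectrum.
corrected = apply_delay_slope([1 + 0j] * 4, 31.25e-12, 2e9)
```

The phase grows by the same amount from one channel to the next and the amplitudes are unchanged, which is what distinguishes a pure delay correction from any other spectral weighting.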
The 4 unit PCs have no hard disk drives. Every
data is transmitted from the PC to further software layers.
3.670016 million
elementary channels. For development and also for maintenance
the tests must be largely automated. A powerful test software
developed, according to a very primitive test
Detailed images of the
inside of the machine can be seen here