Update on DiFX at the KVN
And: GPU Spectrometer-Correlator Activities
9th DiFX Meeting, University of Tasmania, Hobart, 16–18 November 2015
Jan Wagner (KASI)
Korean VLBI Network (KVN) Group + ALMA Group
Korea Astronomy and Space Science Institute
Korean and Japanese VLBI Networks (KVN, VERA)
[Figure: map of the KVN and VERA stations around the East Sea, with uv coverage]
KVN : Yonsei (Seoul), Ulsan, Tamna (Jeju)
VERA : Mizusawa, Ishigaki, Iriki, Ogasawara
KVN+VERA: beam 1.4 mas @ 22 GHz, 0.7 mas @ 43 GHz
KVN only: beam 6/3/1.5/1 mas @ 22/43/86/129 GHz
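As a rough cross-check of the beam sizes quoted above, the synthesized beam scales approximately as wavelength divided by the longest baseline. The sketch below assumes maximum baselines of about 476 km (KVN only) and about 2270 km (KVN+VERA). These lengths are not stated in the slides and are approximate assumptions, so the result only indicates the expected order of magnitude.

```python
import math

C_M_PER_S = 299_792_458.0                              # speed of light
RAD_TO_MAS = math.degrees(1.0) * 3600.0 * 1000.0       # radians -> milliarcseconds

# Assumed maximum baseline lengths (not from the slides; approximate values)
KVN_MAX_BASELINE_M = 476e3
KVN_VERA_MAX_BASELINE_M = 2270e3


def beam_mas(freq_hz, baseline_m):
    """Approximate synthesized beam (lambda / B) in milliarcseconds."""
    wavelength_m = C_M_PER_S / freq_hz
    return wavelength_m / baseline_m * RAD_TO_MAS


for freq_ghz in (22, 43, 86, 129):
    kvn = beam_mas(freq_ghz * 1e9, KVN_MAX_BASELINE_M)
    print(f"KVN only  {freq_ghz:3d} GHz: ~{kvn:.1f} mas")

for freq_ghz in (22, 43):
    kava = beam_mas(freq_ghz * 1e9, KVN_VERA_MAX_BASELINE_M)
    print(f"KVN+VERA  {freq_ghz:3d} GHz: ~{kava:.1f} mas")
```

The lambda / B estimate ignores uv weighting and source declination, which is one reason it can differ from the quoted beams by a few tens of percent.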
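The Korean Simultaneous-Multifrequency VLBI System (Han et al. 2013)
[Figure: simultaneous receiver bands at 22, 43, 86 and 129 GHz; 22 GHz is the reference (ref.), the higher bands are labelled x2, x4, x6]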
SNR ~213 (~90% of theoretical SNR)
Frequency phase transfer (e.g., to increase coherence time)
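Frequency phase transfer can be illustrated with a short sketch: the phase measured at the reference band is multiplied by the frequency ratio and subtracted from the phase at the higher band, which removes non-dispersive (delay-like) fluctuations. The code below is a minimal sketch on synthetic data, assuming an exact integer ratio of 4 and a purely non-dispersive delay; the function names and all numbers are illustrative and not taken from any KVN software.

```python
import numpy as np


def wrap(phase):
    """Wrap phases in radians into the interval (-pi, pi]."""
    return np.angle(np.exp(1j * np.asarray(phase)))


def frequency_phase_transfer(phase_ref, phase_target, ratio):
    """Subtract the ratio-scaled reference phase from the target phase."""
    return wrap(phase_target - ratio * phase_ref)


# Synthetic example: a slowly varying non-dispersive delay (e.g. troposphere)
t = np.linspace(0.0, 60.0, 601)                 # time [s]
tau = 5e-11 * np.sin(2 * np.pi * t / 60.0)      # delay fluctuation [s]
nu_ref = 22e9                                   # reference frequency [Hz]
ratio = 4                                       # assumed integer frequency ratio
source_phase = 0.3                              # constant target-band phase [rad]

phase_ref = wrap(2 * np.pi * nu_ref * tau)
phase_target = wrap(2 * np.pi * ratio * nu_ref * tau + source_phase)

residual = frequency_phase_transfer(phase_ref, phase_target, ratio)
print("target phase spread before transfer [rad]:", np.ptp(phase_target))
print("target phase spread after transfer  [rad]:", np.ptp(residual))
```

With an integer ratio, 2π wraps in the reference phase map onto whole turns at the target band, so the reference phase does not need to be unwrapped first.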
Korea-Japan Correlation Center (KJCC)
H.R. Kim (DiFX), D.G. Roh (HW), J.H. Yeom, D.K. Jung, C.S. Oh, S.J. Oh, J.S. Shin
Correlator routine operations, mainly HW
Additional people, partly in SW correlation:
Jongsoo Kim : computing cluster and distributed storage admin, DiFX testing, …
Taehyun Jung : real-time e-VLBI, PCal, K-band geo-VLBI, …
Myself : DiFX and Mark6 go-to person, developer(?), 8 Gbps+, e-VLBI, PCal, KVN-VERA DiFX tests, pulsar, …
Korea-Japan Correlation Center (KJCC) – Daejeon Correlator
[Photos: storage / buffer; correlator; “Station Units”; VERA data playback; Mark5B/5C/6 data playback]
Korea-Japan Correlation Center (KJCC) – Computing Cluster
Mainly storage upgrades since the DiFX meeting in Korea; expansions for ALMA and the Cosmology Group.
The main DiFX cluster has 35 x 16-core nodes.
The newly expanded cluster also has:
- 1 x 64-core IBM 8722C2K, 55 TB local
- 1 x 36-core node, Xeon E5-2699, 560 TB local
- 5 x 64-core Dell PowerEdge R730, 1 PB Lustre-FS
- 1 x 16-core Dell PowerEdge T630; GPU computing
Local Mark5B/5C/6 units are connected via 10G or 1G. Nodes are connected via IB; a partial FDR upgrade is ongoing.
The cluster and the KVN stations are on 10G links to KREONET (but remote correlation off stations is still problematic).
[Photo: small part of the HW/SW storage and DiFX cluster]
DiFX at the KJCC
• Now ~400/yr DiFX-correlated KVN experiments since KVN start ~2011
• Types of experiments correlated in DiFX
  Science : AGN, AGN monitoring, stellar masers, spectral line, μQSO, pulsars
  Test : real-time e-VLBI, 8 Gbps, Phase Cal, KVN-VERA DiFX, 230 GHz VLBI
[Figures: #Experiments, KVN or KaVA (KVN-VERA); #Hours, KVN/KaVA/EVN]
Obs. time ≈ correl. time
DiFX at the KJCC
• Software development related to DiFX
  • External : corrupt VDIF fixing, VEX cleanup, KVN-specific job preparations, …
  • In DiFX : minor changes for KVN 200 MHz PCal
  • In libraries : mark5access and vdifio Python bindings, Mark6sg library
Additions to DiFX : KVN PCal
KVN Phase Cal tones have 200 MHz spacing : this needed minor modifications to DiFX.
Added a new general DiFX .PCAL plotting utility, plotDiFXPCal.py.
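To make the 200 MHz tone spacing concrete, the sketch below lists which tone frequencies fall inside a recorded band. It assumes the tones sit at integer multiples of the spacing with zero offset, and the band edges used are hypothetical values chosen for illustration, not a KVN frequency setup. It is not how DiFX itself locates tones.

```python
import math


def pcal_tones_in_band(f_lo_mhz, bw_mhz, spacing_mhz=200.0, offset_mhz=0.0):
    """Return the tone frequencies (MHz) inside [f_lo, f_lo + bw].

    Assumes tones at offset + k * spacing for integer k (an assumption).
    """
    first = math.ceil((f_lo_mhz - offset_mhz) / spacing_mhz)
    last = math.floor((f_lo_mhz + bw_mhz - offset_mhz) / spacing_mhz)
    return [offset_mhz + k * spacing_mhz for k in range(first, last + 1)]


# Hypothetical 512 MHz wide band starting at 21500 MHz
wide = pcal_tones_in_band(21500.0, 512.0)
# Hypothetical 16 MHz wide band starting at 21700 MHz
narrow = pcal_tones_in_band(21700.0, 16.0)

print("tones in wide band  :", wide)
print("tones in narrow band:", narrow)
```

With such a wide spacing, a narrow band may contain no tone at all, whereas a wide band contains only a handful.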
Additions to DiFX : mark5access Python bindings
Python bindings : raw VLBI data can now be decoded quickly in Python. Primarily for data quality checks, and perhaps custom single-dish data processing.
import ctypes, numpy
import mark5access as m5lib

# Example inputs (hypothetical values; adjust to the recording at hand)
fn = 'example.vdif'          # path to the raw VLBI data file
fmt = 'VDIF_8192-1024-1-2'   # mark5access format string: <format>-<Mbps>-<nchan>-<nbit>
offset = 0                   # byte offset at which decoding starts

# Open the input file
m5file = m5lib.new_mark5_stream_file(fn, ctypes.c_longlong(offset))
m5fmt = m5lib.new_mark5_format_generic_from_string(fmt)
ms = m5lib.new_mark5_stream_absorb(m5file, m5fmt)

# Let mark5access decode a piece of data
nsamples = 8192
pdata = m5lib.helpers.make_decoder_array(ms, nsamples, dtype=ctypes.c_float)
rc = m5lib.mark5_stream_decode(ms, nsamples, pdata)

# Do something with the data of each recorded channel
ch_data = [ctypes.cast(pdata[ii], ctypes.POINTER(ctypes.c_float * nsamples))
           for ii in range(ms.contents.nchan)]
for ii in range(ms.contents.nchan):
    # View the decoded samples of channel ii as a numpy float32 array
    x = numpy.frombuffer(ch_data[ii].contents, dtype='float32')
Additions to DiFX : mark5access Python examples
m5stat.py : for data quality checking, similar to m5bstate but allows ≥2-bit data
m5spec.py : for data checking, an m5spec-like spectrometer with immediate plotting
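The kind of state statistics used for data quality checking can be sketched with numpy alone. The example below is not the m5stat.py code: it quantizes synthetic Gaussian noise to four levels and reports the fraction of samples per level. The level values (±1 and ±3.3359) and the threshold of 0.98 sigma are assumptions chosen to resemble decoded 2-bit VLBI data.

```python
import numpy as np

HIGH_LEVEL = 3.3359   # assumed magnitude of the outer 2-bit levels after decoding


def quantize_2bit(x, threshold):
    """Quantize samples to the four levels -HIGH, -1, +1, +HIGH."""
    sign = np.where(x >= 0, 1.0, -1.0)
    magnitude = np.where(np.abs(x) > threshold, HIGH_LEVEL, 1.0)
    return sign * magnitude


def state_fractions(samples):
    """Return {level: fraction of samples at that level}."""
    levels, counts = np.unique(np.asarray(samples), return_counts=True)
    return {float(lv): float(c) / len(samples) for lv, c in zip(levels, counts)}


rng = np.random.default_rng(1)
noise = rng.standard_normal(100_000)            # stand-in for a sampled IF signal
decoded = quantize_2bit(noise, threshold=0.98)  # stand-in for decoded 2-bit data
fractions = state_fractions(decoded)

for level, fraction in sorted(fractions.items()):
    print(f"level {level:+.4f}: {100.0 * fraction:5.1f} %")
```

For well-adjusted sampler thresholds the inner levels are expected to hold roughly two thirds of the samples; strongly skewed fractions hint at a level or offset problem.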
Additions to DiFX : mark5access Python examples (2)
m5iacorr.py : 2nd-order autocorrelation, i.e. intensity autocorrelation (a Hilbert transform example)
m5subband.py : extracts a spectral subband into a new VDIF file via filtering.
[Figure: spectrum of 2 MHz time-domain data extracted from a 256 MHz wide band recording of an H2O maser]
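Subband extraction by filtering can be sketched as follows: transform a block of real samples, keep only the bins of the wanted subband, and inverse-transform them at a lower sample rate. This is a minimal block-based illustration with numpy, assuming a single block without overlap or VDIF output; it is not the m5subband.py implementation, and the tone and band-edge frequencies are hypothetical.

```python
import numpy as np


def extract_subband(x, fs, f_lo, bw):
    """Extract the band [f_lo, f_lo + bw] of real data x sampled at fs.

    Returns (y, fs_out): real-valued data with the subband shifted to
    0..bw, and its new sample rate.
    """
    n = len(x)
    df = fs / n                                   # frequency resolution
    k0 = int(round(f_lo / df))                    # first bin of the subband
    m = int(round(bw / df))                       # number of bins across the subband
    spectrum = np.fft.rfft(x)
    sub = spectrum[k0:k0 + m + 1]
    # Inverse transform to 2*m real samples; rescale to preserve amplitudes
    y = np.fft.irfft(sub, n=2 * m) * (2 * m / n)
    return y, 2 * m * df


fs = 512e6                     # 256 MHz wide band, Nyquist sampled
n = 65536
t = np.arange(n) / fs
f_lo = 100e6                   # hypothetical lower edge of the 2 MHz subband
tone = f_lo + 0.5e6            # narrow line 0.5 MHz above the subband edge
x = np.cos(2 * np.pi * tone * t)

y, fs_out = extract_subband(x, fs, f_lo, 2e6)
peak_bin = int(np.argmax(np.abs(np.fft.rfft(y))))
peak_freq = peak_bin * fs_out / len(y)
print(f"{len(y)} samples at {fs_out / 1e6:.1f} Ms/s, line at {peak_freq / 1e6:.2f} MHz")
```

A production tool would process consecutive blocks with overlap to avoid edge artefacts; the single block here only shows the frequency selection and rate reduction.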
Additions to DiFX : mark6sg library
• Offers an open(), read(), close() interface for reading Mark6 scatter-gather scans.
  Throughput ~2 GB/s for 2 x 8-disk modules.
• Includes some library usage example programs: fuseMk6, mk6copy, m6sg_gather.
• Initially thought it might help to make new “DiFX Native Mark6 support” trivial.
(Un)Mounting disks of all attached modules
$ m6sg_mount [-u]
Copying : library example vs. standard ‘gather’
$ mk6copy -s test_KY_170-0829.vdif /dev/null
... copied: 27.04/303.0 GB Mark6 read rate: 2183.99 MB/s
$ gather /mnt/disks/[1-2]/[0-7]/test_KY_170-0829.vdif - | pv > /dev/null
... 57GB 0:00:39 [1.46GB/s]
File system-like access : library example vs. similar new ‘vdifuse’
$ fuseMk6 /mnt/mark6sg
$ vdifuse -a ~/vdifuse-cachefile -xm6sg -xrate=12500 /mnt/vdifuse /mnt/disks/[1-2]/[0-7]
Background
• Initially
  • Started with JS Kim’s plans for a GPU spectrometer for the ongoing Korean Focal Plane Array development for ASTE (near ALMA)
  • Basic single-pol FFT spectrometer
• Expanded to
  • General dual-pol GPU FFT and GPU Polyphase Filter Bank spectrometer
  • Candidates for wideband samplers (2–20 Gs/s) for ASTE and maybe VLBI
  • KVN Digital Spectrometer (VSI input) upgrade to a GPU-based alternative
  • ACA Total Power array (OC-192 input) software spectrometer and correlator
GPU Spectrometer-Correlator Activities
Note: the I/O bottleneck for GPU VLBI/single-dish processing is the slow network. GPU PCIe gen3/4 x16 beats 10GbE/IB.
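The note can be made concrete by converting link rates into sample rates. The sketch below assumes nominal line rates with no protocol overhead and 2-bit samples; the 12 GByte/s host-to-GPU figure is the one quoted in the performance breakdown of this talk, while the helper name is illustrative.

```python
def sample_rate_gsps(link_gbit_per_s, bits_per_sample):
    """Samples per second (in Gs/s) that a link of the given bit rate can carry."""
    return link_gbit_per_s / bits_per_sample


ten_gbe = sample_rate_gsps(10.0, 2)           # nominal 10GbE, 2-bit samples
forty_gbe = sample_rate_gsps(40.0, 2)         # nominal 40GbE, 2-bit samples
host_to_gpu = sample_rate_gsps(12.0 * 8, 2)   # 12 GByte/s over PCIe, 2-bit samples

print(f"10GbE      : {ten_gbe:.0f} Gs/s")
print(f"40GbE      : {forty_gbe:.0f} Gs/s")
print(f"Host->GPU  : {host_to_gpu:.0f} Gs/s")
```

Under these assumptions a single 10GbE link delivers roughly a tenth of the sample rate that the host-to-GPU transfer can sustain.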
• One spectrometer node houses 1–4 single-GPU cards
• Forms spectra and cross-spectra in 512 MHz – 10 GHz wide IFs
• Samples arrive via 10G/40G (KVN/ASTE; 2-bit) or “OC-192” (ALMA; 3-bit)
Cross-Power Spectrometer on nVidia GPU
Plan: GPU spectrometer rack for ACA TP (Iguchi, Asayama)
Currently: nVidia Tesla K40m and two TITAN X in a Dell PowerEdge T630
Cross-Power Spectrometer Performance
One TITAN X GPU can handle about 4 Gsamples/second = 2 GHz bandwidth = 1 GHz in dual-pol
Cross-Power Spectrometer Performance
Performance breakdown by processing steps:
• Data input (Host->GPU) : 48 Gs/s (2-bit samples @ 12 GByte/s)
• Decoding 2-bit/3-bit to float : 50–59 Gs/s (for single- and dual-pol data)
• Window function : 33 Gs/s (one cos() call per sample)
• Fourier transform : 5–16 Gs/s (CUFFT library)
• Spectral averaging (AC+XC) : 66 Gs/s (similar to VLBI averaging)
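The processing steps listed above (window, Fourier transform, spectral averaging of auto- and cross-power) can be sketched on the CPU with numpy. This is only an illustration of the signal flow, not the GPU code: it assumes already-decoded float samples and a Hann window, which is one possible window needing a single cos() per sample; the slides do not say which window is used.

```python
import numpy as np


def cross_power_spectra(x, y, nfft):
    """Time-averaged auto- and cross-power spectra of two real data streams."""
    nblocks = min(len(x), len(y)) // nfft
    # Hann window (assumed): one cos() evaluation per sample
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(nfft) / nfft)
    xb = x[:nblocks * nfft].reshape(nblocks, nfft) * window
    yb = y[:nblocks * nfft].reshape(nblocks, nfft) * window
    X = np.fft.rfft(xb, axis=1)
    Y = np.fft.rfft(yb, axis=1)
    auto_x = np.mean(np.abs(X) ** 2, axis=0)      # AC of stream x
    auto_y = np.mean(np.abs(Y) ** 2, axis=0)      # AC of stream y
    cross = np.mean(X * np.conj(Y), axis=0)       # XC between x and y
    return auto_x, auto_y, cross


# Synthetic dual-pol-like input: a common signal plus independent noise
rng = np.random.default_rng(0)
nfft, nblocks = 256, 400
common = rng.standard_normal(nfft * nblocks)
x = common + rng.standard_normal(nfft * nblocks)
y = common + rng.standard_normal(nfft * nblocks)

auto_x, auto_y, cross = cross_power_spectra(x, y, nfft)
coherence = np.abs(cross) / np.sqrt(auto_x * auto_y)
print("channels:", cross.shape[0], " mean coherence:", coherence.mean())
```

Since half of the power in each stream is common, the mean coherence is expected to come out near 0.5.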
Speed-up via multi-GPU:
• Got linear improvement (with #GPUs) by frequency/time division of raw input data
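A back-of-the-envelope estimate ties the per-step rates to the overall figure of about 4 Gs/s per GPU. The sketch below assumes the steps run strictly one after another with no overlap (an assumption; the slides do not state the scheduling), so the combined rate is the reciprocal of the summed per-sample times. Bandwidth follows from Nyquist sampling of real-valued data, and multi-GPU scaling is taken as linear, as reported.

```python
def serial_throughput(step_rates_gsps):
    """Combined rate (Gs/s) if the steps run back to back without overlap."""
    return 1.0 / sum(1.0 / rate for rate in step_rates_gsps)


def bandwidth_ghz(rate_gsps, npol=1):
    """Nyquist bandwidth per polarization for real-valued samples."""
    return rate_gsps / 2.0 / npol


def multi_gpu_rate(per_gpu_gsps, ngpu):
    """Linear scaling with the number of GPUs."""
    return per_gpu_gsps * ngpu


# Step rates in Gs/s: input, decode, window, FFT, averaging
slow = serial_throughput([48.0, 50.0, 33.0, 5.0, 66.0])    # slowest decode/FFT case
fast = serial_throughput([48.0, 59.0, 33.0, 16.0, 66.0])   # fastest decode/FFT case

print(f"estimated per-GPU rate: {slow:.1f} to {fast:.1f} Gs/s")
print(f"4 Gs/s -> {bandwidth_ghz(4.0):.0f} GHz single-pol, "
      f"{bandwidth_ghz(4.0, npol=2):.0f} GHz dual-pol")
print(f"3 GPUs at 4 Gs/s -> {multi_gpu_rate(4.0, 3):.0f} Gs/s")
```

In this model the Fourier transform dominates the total time, and the quoted ~4 Gs/s falls inside the estimated range.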
Future ‘DiFX-GPU’? DiFX core swapped for a “feature-free” GPU core?
• Performance might be similar to the windowed-FFT cross-power spectrometer?
Compensated Summation
From Constantine & Gleich, "Sparse Matrix Computations in MapReduce", 2013
[Figure: floating point errors of summation (standard approach) vs. compensated summation]
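The difference between the two approaches can be demonstrated with Kahan's compensated summation in single precision, the precision typically used for spectral accumulation on GPUs. The sketch below is a plain Python/numpy illustration, not the GPU kernel; the input of 100000 identical values is an arbitrary test case.

```python
import numpy as np


def naive_sum(values):
    """Standard running sum in float32."""
    total = np.float32(0.0)
    for v in values:
        total = np.float32(total + v)
    return total


def kahan_sum(values):
    """Kahan compensated summation in float32."""
    total = np.float32(0.0)
    comp = np.float32(0.0)            # running estimate of the lost low-order bits
    for v in values:
        y = np.float32(v - comp)      # apply the correction to the next term
        t = np.float32(total + y)     # low-order bits of y may be lost here
        comp = np.float32((t - total) - y)   # recover what was lost
        total = t
    return total


values = np.full(100_000, 0.1, dtype=np.float32)
exact = float(values[0]) * len(values)        # reference computed in double precision

naive_error = abs(float(naive_sum(values)) - exact)
kahan_error = abs(float(kahan_sum(values)) - exact)
print("naive float32 error:", naive_error)
print("Kahan float32 error:", kahan_error)
```

The compensated sum costs a few extra additions per sample but keeps the accumulated error near the rounding level of the final result, which matters when many spectra are averaged.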