

Update on DiFX at the KVN And: GPU Spectrometer-Correlator Activities

9th DiFX Meeting, University of Tasmania, Hobart, 16–18 November 2015

Jan Wagner (KASI), Korean VLBI Network (KVN) Group + ALMA Group

Korea Astronomy and Space Science Institute

Korean and Japanese VLBI Networks (KVN, VERA)

[Map: KVN and VERA station locations around the East Sea; combined uv coverage]

KVN : Yonsei (Seoul), Ulsan, Tamna (Jeju)
VERA : Mizusawa, Ishigaki, Iriki, Ogasawara

KVN+VERA: beam 1.4 mas @ 22 GHz, 0.7 mas @ 43 GHz
KVN only: beam 6/3/1.5/1 mas @ 22/43/86/129 GHz


[Figure: simultaneous observations at 22/43/86/129 GHz; phases at the higher bands scaled ×2/×4/×6 relative to the 22 GHz reference]

The Korean Simultaneous-Multifrequency VLBI System, Han et al. 2013

SNR ~213 (~90% of theoretical SNR)

Frequency phase transfer (e.g., to increase coherence time)


Korea-Japan Correlation Center (KJCC)

H.R. Kim (DiFX), D.G. Roh (HW), J.H. Yeom, D.K. Jung,
C.S. Oh, S.J. Oh, J.S. Shin

Routine correlator operations, mainly HW; additional people partly in SW correlation.

Jongsoo Kim : computing cluster and distributed storage admin, DiFX testing, …

Taehyun Jung : realtime e-VLBI, PCal, K-band geo-VLBI, …

Myself : DiFX and Mark6 go-to person, developer(?), 8 Gbps+, e-VLBI, PCal, KVN-VERA DiFX tests, pulsar, …


Korea-Japan Correlation Center (KJCC) – Daejeon Correlator

[Photos: storage/buffer, correlator "Station Units", VERA data playback, Mark5B/5C/6 data playback]


Korea-Japan Correlation Center (KJCC) – Computing Cluster

Mainly storage upgrades since the DiFX meeting in Korea, plus expansions for ALMA and the Cosmology Group.

The main DiFX cluster has 35 x 16-core nodes

Newly expanded cluster also has:
- 1 × 64-core IBM 8722C2K, 55 TB local
- 1 × 36-core Xeon E5-2699 node, 560 TB local
- 5 × 64-core Dell PowerEdge R730, 1 PB Lustre-FS
- 1 × 16-core Dell PowerEdge T630; GPU computing

Local Mark5B/5C/6 connected via 10G or 1G. Nodes connected via IB; partial FDR upgrade ongoing.

Cluster and KVN stations are on 10G links to KREONET. (But remote correlation off the stations is still problematic.)

[Photo: small part of HW/SW storage and DiFX cluster]


DiFX at the KJCC

• Now ~400/yr DiFX-correlated KVN experiments since KVN start ~2011

• Types of experiments correlated in DiFX
Science : AGN, AGN monitoring, stellar masers, spectral line, μQSO, pulsars
Test : real-time e-VLBI, 8 Gbps, Phase Cal, KVN-VERA DiFX, 230 GHz VLBI


[Charts: number of KVN/KaVA (KVN+VERA) experiments and KVN/KaVA/EVN hours per year; observing time ≈ correlation time]

DiFX at the KJCC

• Software development related to DiFX
• External : corrupt VDIF fixing, VEX cleanup, KVN-specific job preparations, …

• In DiFX : minor changes for KVN 200 MHz PCal

• In libraries : mark5access and vdifio Python bindings, mark6sg library


Additions to DiFX : KVN PCal

KVN Phase Cal has 200 MHz tone spacing, which required minor modifications to DiFX. Also added a new general DiFX .PCAL plotting utility, plotDiFXPCal.py.
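A common way to extract such tones, which can serve as a sketch of what a PCal extractor does (the function name, parameters, and this particular folding approach are illustrative, not the DiFX implementation):

```python
import numpy as np

def pcal_phases(x, fs, spacing):
    """Extract phase-cal tone phases by folding: average the signal
    over the tone comb period 1/spacing, then FFT the folded profile
    so that each bin corresponds to one tone.
    Sketch only; assumes fs is an integer multiple of spacing."""
    period = int(fs // spacing)              # samples per comb period
    n = (len(x) // period) * period          # trim to whole periods
    folded = x[:n].reshape(-1, period).mean(axis=0)
    tones = np.fft.rfft(folded)              # one bin per tone
    return np.angle(tones[1:])               # phase of each tone
```

Folding first reduces the FFT length from the full data span to a single comb period, which is why tone extraction is cheap compared to full spectrometry.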


Additions to DiFX : mark5access Python bindings

Python bindings : raw VLBI data can now be decoded quickly in Python, primarily for data quality checks and perhaps custom single-dish data processing.

import ctypes, numpy
import mark5access as m5lib

# fn (file name), fmt (format string) and offset (start byte)
# are assumed to be defined earlier

# Open the input file
m5file = m5lib.new_mark5_stream_file(fn, ctypes.c_longlong(offset))
m5fmt = m5lib.new_mark5_format_generic_from_string(fmt)
ms = m5lib.new_mark5_stream_absorb(m5file, m5fmt)

# Let mark5access decode a piece of data
nsamples = 8192
pdata = m5lib.helpers.make_decoder_array(ms, nsamples, dtype=ctypes.c_float)
rc = m5lib.mark5_stream_decode(ms, nsamples, pdata)

# Do something with the data of each recorded channel
ch_data = [ctypes.cast(pdata[ii], ctypes.POINTER(ctypes.c_float * nsamples))
           for ii in range(ms.contents.nchan)]
for ii in range(ms.contents.nchan):
    x = numpy.frombuffer(ch_data[ii].contents, dtype='float32')

Additions to DiFX : mark5access Python examples

m5stat.py : for data quality checking, similar to m5bstate but allows ≥2-bit data
m5spec.py : for data checking, an m5spec-like spectrometer with immediate plotting
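A rough sketch of the statistic such a quality check reports (assuming the samples were already decoded to floats, e.g. with the bindings above; the function and threshold value here are illustrative, not the actual m5stat code):

```python
import numpy as np

def state_counts(samples, thresh=1.0):
    """Count how often quantized samples fall in each of the four
    2-bit states (--, -, +, ++); healthy Gaussian data shows a
    characteristic split between inner and outer states."""
    s = np.asarray(samples)
    counts = np.array([np.sum(s < -thresh),
                       np.sum((s >= -thresh) & (s < 0)),
                       np.sum((s >= 0) & (s < thresh)),
                       np.sum(s >= thresh)])
    return counts / float(s.size)  # fraction of samples per state

# Gaussian noise with a threshold near one standard deviation puts
# roughly two thirds of the samples in the two inner states
frac = state_counts(np.random.randn(100000))
```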


Additions to DiFX : mark5access Python examples (2)

m5iacorr.py : 2nd-order autocorrelation, i.e. intensity autocorrelation (a Hilbert transform example)

m5subband.py : extracts a spectral subbandinto a new VDIF file via filtering.

[Figure: spectrum of 2 MHz of time-domain data extracted from a 256 MHz wide band recording of an H2O maser]
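The heart of such a subband extraction can be sketched with a single FFT over the whole recording (a simplification: the real m5subband filters segment by segment and writes re-quantized VDIF; the names and parameters below are illustrative):

```python
import numpy as np

def extract_subband(x, fs, f0, f1):
    """Extract the [f0, f1) Hz subband of real-valued data x sampled
    at fs Hz: transform, keep only the wanted bins, and inverse
    transform into a decimated complex baseband time series."""
    X = np.fft.rfft(x)
    df = fs / len(x)                    # frequency resolution
    k0, k1 = int(f0 / df), int(f1 / df)
    return np.fft.ifft(X[k0:k1])        # rate is now f1 - f0

# Toy input: a 10 MHz tone plus noise in a 32 MHz wide (64 Ms/s) band
fs = 64e6
t = np.arange(65536) / fs
x = np.sin(2 * np.pi * 10e6 * t) + 0.1 * np.random.randn(t.size)
y = extract_subband(x, fs, 8e6, 12e6)   # 4 MHz band around the tone
```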


Additions to DiFX : mark6sg library

• Offers an open(), read(), close() interface for reading Mark6 scatter-gather scans. Throughput ~2 GB/s for 2 × 8-disk modules.

• Includes some library usage example programs: fuseMk6, mk6copy, m6sg_gather.

• Initially thought it might help to make new “DiFX Native Mark6 support” trivial.

(Un)mounting disks of all attached modules:
$ m6sg_mount [-u]

Copying : library example vs. standard 'gather':
$ mk6copy -s test_KY_170-0829.vdif /dev/null
... copied: 27.04/303.0 GB Mark6 read rate: 2183.99 MB/s
$ gather /mnt/disks/[1-2]/[0-7]/test_KY_170-0829.vdif - | pv > /dev/null
... 57GB 0:00:39 [1.46GB/s]

File system–like access : library example vs. the similar new 'vdifuse':
$ fuseMk6 /mnt/mark6sg
$ vdifuse -a ~/vdifuse-cachefile -xm6sg -xrate=12500 /mnt/vdifuse /mnt/disks/[1-2]/[0-7]
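To illustrate what "gathering" a scatter-gather scan involves: payload blocks tagged with sequence numbers are spread across the module's files, and a reader reassembles them in time order. A toy version (the 8-byte little-endian header used here is invented for the sketch; the real Mark6 file and block headers that mark6sg parses differ):

```python
import struct

def gather(file_objs):
    """Reassemble a scan scattered over several module files.
    Each block is assumed to be a (block_number, length) header
    followed by its payload."""
    blocks = []
    for f in file_objs:
        while True:
            hdr = f.read(8)
            if len(hdr) < 8:
                break                      # end of this file
            num, length = struct.unpack('<ii', hdr)
            blocks.append((num, f.read(length)))
    blocks.sort(key=lambda b: b[0])        # restore time order
    return b''.join(payload for _, payload in blocks)
```

In the real library the per-file block lists are indexed up front, so reads can stream from all disks in parallel instead of sorting everything in memory.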


GPU Spectrometer-Correlator Activities

Background

• Initially
• Started with JS Kim's plans for a GPU spectrometer for the ongoing Korean Focal Plane Array development for ASTE (near ALMA)
• Basic single-pol FFT spectrometer

• Expanded to
• General dual-pol GPU FFT and GPU polyphase filter bank spectrometer
• Candidates for wideband samplers (2–20 Gs/s) for ASTE and maybe VLBI
• KVN Digital Spectrometer (VSI input) upgrade to a GPU-based alternative
• ACA Total Power array (OC-192 input) software spectrometer and correlator


GPU Spectrometer-Correlator Activities

Note: the I/O bottleneck in GPU VLBI/single-dish processing is the slow network; GPU PCIe gen3/4 ×16 beats 10GbE/IB.

• One spectrometer node houses 1–4 single-GPU cards
• Forms spectra and cross-spectra in 512 MHz – 10 GHz wide IFs
• Samples arrive via 10G/40G (KVN/ASTE; 2-bit) or "OC-192" (ALMA; 3-bit)

Cross-Power Spectrometer on nVidia GPU

Plan: GPU spectrometer rack for ACA TP (Iguchi, Asayama)

Currently: nVidia Tesla K40m and two TITAN X in a Dell PowerEdge T630

Cross-Power Spectrometer Performance

One TITAN X GPU can handle about 4 Gsamples/second = 2 GHz bandwidth, or 1 GHz in dual-pol


Cross-Power Spectrometer Performance

Performance breakdown by processing step:
• Data input (host→GPU) : 48 Gs/s (2-bit samples @ 12 GByte/s)
• Decoding 2-bit/3-bit to float : 50–59 Gs/s (for single- and dual-pol data)
• Window function : 33 Gs/s (one cos() call per sample)
• Fourier transform : 5–16 Gs/s (CUFFT library)
• Spectral averaging (AC+XC) : 66 Gs/s (similar to VLBI averaging)
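These stages can be mirrored on the CPU with NumPy to see what each one does (a sketch only: the 2-bit quantizer levels and the Hann window are assumptions, and the real pipeline runs each stage as a CUDA/CUFFT kernel):

```python
import numpy as np

# Output levels assumed when expanding 2-bit samples to float;
# the exact values are an assumption of this sketch
LUT2BIT = np.array([-3.3359, -1.0, 1.0, 3.3359], dtype=np.float32)

def cross_power(raw_x, raw_y, nfft=1024):
    """One integration of the pipeline: decode 2-bit samples,
    apply a window, FFT in segments, and average the auto- and
    cross-spectra over all segments."""
    win = np.hanning(nfft).astype(np.float32)
    x = LUT2BIT[raw_x].reshape(-1, nfft) * win   # decode + window
    y = LUT2BIT[raw_y].reshape(-1, nfft) * win
    X = np.fft.rfft(x, axis=1)                   # segment FFTs
    Y = np.fft.rfft(y, axis=1)
    ac_x = np.mean(np.abs(X) ** 2, axis=0)       # auto spectrum, pol X
    ac_y = np.mean(np.abs(Y) ** 2, axis=0)       # auto spectrum, pol Y
    xc = np.mean(X * np.conj(Y), axis=0)         # cross spectrum
    return ac_x, ac_y, xc

raw = np.random.randint(0, 4, 8 * 1024)          # fake 2-bit samples
ac_x, ac_y, xc = cross_power(raw, raw)
```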

Speed-up via multi-GPU:
• Got a linear improvement with the number of GPUs by frequency/time division of the raw input data

Future ‘DiFX-GPU’? DiFX core swapped for a “feature-free” GPU core?
• Performance might be similar to the windowed-FFT cross-power spectrometer?


Thanks!

Compensated Summation

From Constantine & Gleich, "Sparse Matrix Computations in MapReduce", 2013

Compensated summation

[Equations: standard summation vs. compensated summation; floating point errors]
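For reference, compensated (Kahan) summation keeps a running correction term for the low-order bits lost in each addition; a generic textbook version, not code from the slides:

```python
def kahan_sum(values):
    """Compensated (Kahan) summation: carry a running correction c
    for the low-order bits lost in each floating point addition."""
    s = 0.0
    c = 0.0
    for v in values:
        y = v - c          # apply the previous correction
        t = s + y          # low-order bits of y may be lost here
        c = (t - s) - y    # algebraically zero; recovers what was lost
        s = t
    return s

# Many tiny terms added to one large term: a naive running sum
# mis-rounds every step, while the compensated sum stays accurate
vals = [1e8] + [1e-8] * 100000
```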
