JURECA - An Overview


Dorian Krause, HPS group @ JSC
2015-11-26
Member of the Helmholtz-Association

JURECA

Jülich Research on Exascale Cluster Architectures
Project partners: T-Platforms, ParTec
FZJ next-generation general-purpose production system
  NIC, VSR and commercial projects
  Replaces the decommissioned JUROPA system
  Intended for mixed capacity and capability workloads
  Designed with big-data science needs in mind
Cluster architecture
  Commodity hardware
  Largely based on an open-source software stack

JURECA hardware overview

Dual-socket Intel Xeon E5-2680 v3 (Haswell) nodes
  24 cores @ 2.5 GHz
  NVIDIA K40 and K80 GPUs
  128/256/512 GiB memory per node (DDR4 @ 2133 MHz)
1884 compute nodes → 45,216 cores
  1800 TFlop/s + 430 TFlop/s peak performance
InfiniBand EDR (100 Gbit/s per link and direction)
  Full fat-tree topology
100 GiB/s I/O bandwidth to the central GPFS storage cluster
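The core count on the hardware slide follows directly from the node count, and the EDR link rate can be converted to bytes for comparison with the I/O bandwidth figure. A minimal sketch of that arithmetic; the variable names are illustrative, and the conversion uses decimal units and ignores protocol overhead.

```shell
#!/bin/sh
# Figures taken from the hardware overview slide.
compute_nodes=1884
cores_per_node=24        # dual-socket, 2 x 12 cores
edr_gbit_per_s=100       # per link and direction

total_cores=$((compute_nodes * cores_per_node))

# 8 bits per byte; work in Mbit/s to stay within integer arithmetic.
edr_mbyte_per_s=$((edr_gbit_per_s * 1000 / 8))

echo "total cores: $total_cores"
echo "EDR link:    $edr_mbyte_per_s MB/s per direction"
```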

JURECA software overview

Operating system: CentOS 7.X
Batch system based on Slurm/Parastation
  Workload management and UI → Slurm
  Resource management → Parastation (psid + psslurm)
Programming environment:
  GNU Compilers, Intel Professional Fortran, C/C++ Compilers
  OpenMP (Intel, GNU)
  CUDA
  Parastation MPI (based on MPICH3), Intel MPI, MVAPICH2-GDR
Optimized mathematical libraries (Intel Math Kernel Library, etc.) and applications (/usr/local)

JURECA node types

Login nodes
  256 GiB memory
  Intended for interactive work: development, compilation, interactive pre- and post-processing
  CPU time limits (2 hours)
Standard/slim nodes
  128 GiB memory
  Default for batch jobs (batch partition)
  Smallest allocation is one node; charging is based on wall-clock time
  No direct login → interactive sessions with salloc and srun --forward-x --pty
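A batch job on the standard nodes might look like the sketch below. It writes a small Slurm job script that requests one node, the smallest possible allocation, in the batch partition. The job name, time limit, task count and `./my_app` are placeholders chosen for illustration and do not come from the slides.

```shell
#!/bin/sh
# Write a minimal Slurm job script for one standard node.
# Job name, time limit and ./my_app are hypothetical placeholders.
cat > job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --partition=batch
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=24
#SBATCH --time=00:30:00

# One task per physical core of the node.
srun ./my_app
EOF

echo "wrote job.sh"
```

The script would be submitted with `sbatch job.sh`. Because charging is per node and wall-clock time, it usually pays to fill all 24 cores of the node.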

JURECA node types (2)

Fat (type 1)
  256 GiB memory
  --gres=mem256
  Included in the batch partition
Fat (type 2)
  512 GiB memory
  -p mem512 --gres=mem512
  Currently in a separate mem512 partition (lower memory bandwidth)
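The mapping from memory requirement to Slurm options on the slides above can be written down as a small helper. This is only a sketch: the function name is hypothetical, and the thresholds are the nominal memory sizes, whereas the memory actually usable by a job is somewhat lower.

```shell
#!/bin/sh
# Print the Slurm options from the node-type slides for a given
# per-node memory requirement in GiB. The function name is hypothetical.
mem_opts() {
    need_gib=$1
    if [ "$need_gib" -le 128 ]; then
        echo "-p batch"
    elif [ "$need_gib" -le 256 ]; then
        echo "-p batch --gres=mem256"
    elif [ "$need_gib" -le 512 ]; then
        echo "-p mem512 --gres=mem512"
    else
        echo "no fat node type offers ${need_gib} GiB" >&2
        return 1
    fi
}

mem_opts 200
```

The printed options can be passed to `sbatch` or `salloc` unchanged.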

JURECA node types (3)

Visualization nodes
  ≥512 GiB memory (2 nodes with 1 TiB), 2× NVIDIA K40
  -p vis --gres=gpu:[1-2]
  --gres=mem1024 for large memory nodes
  Client-server visualization requires ssh tunneling
  Software stack (visualization software) still
GPU nodes
  128 GiB memory, 2× NVIDIA K80 (4 visible GPUs per host)
  -p gpus --gres=gpu:[1-4]
  Available beginning of 2016
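The ssh tunneling mentioned for client-server visualization could take the following form. The sketch only prints the command so that it can be inspected first; the user name, the visualization node name and the port are hypothetical example values, and the slides do not prescribe a particular tunnel setup.

```shell
#!/bin/sh
# Print (not run) an ssh command that forwards a local port through a
# login node to a server running on a visualization node.
# User name, node name and port are hypothetical example values.
tunnel_cmd() {
    user=$1; vis_node=$2; port=$3
    echo "ssh -N -L ${port}:${vis_node}:${port} ${user}@jureca.fz-juelich.de"
}

tunnel_cmd alice visnode01 11111
```

With the tunnel open, the visualization client on the workstation would connect to `localhost` on the chosen port.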

JURECA node quantities

Node type                 #     Characteristics
Standard/Slim             1605  24 cores, 128 GiB
Fat (type 1)              128   24 cores, 256 GiB
Fat (type 2)              64    24 cores, 512 GiB
Accelerated               75    24 cores, 128 GiB, 2× K80
Login                     12    24 cores, 256 GiB
Visualization (type 1)    10    24 cores, 512 GiB, 2× K40
Visualization (type 2)    2     24 cores, 1 TiB, 2× K40
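As a consistency check, the node counts can be summed. Leaving out the login nodes reproduces the 1884 compute nodes quoted on the hardware overview slide; treating the login nodes as the only non-compute nodes is an assumption that makes the two figures agree.

```shell
#!/bin/sh
# Node counts from the node quantities table.
standard=1605; fat1=128; fat2=64; accelerated=75
login=12; vis1=10; vis2=2

all_nodes=$((standard + fat1 + fat2 + accelerated + login + vis1 + vis2))
compute_nodes=$((all_nodes - login))

echo "all nodes:     $all_nodes"
echo "compute nodes: $compute_nodes"
```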

JURECA: Accessing the system

$ ssh <userid>@jureca.fz-juelich.de
$ ssh <userid>@jureca[01-12].fz-juelich.de
Access with SSH keys
  Recommendation: 2048 bit RSA (ssh-keygen -t rsa -b 2048)
  Protection of the private key with a non-trivial passphrase is mandatory!
CPU time limits apply
  Soft limit: 2 hours
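The access steps can be collected into a few helper functions. This is a sketch under assumptions: the function names and the key file path are hypothetical, and the node argument must be a plain integer from 1 to 12, matching the `jureca[01-12]` host range.

```shell
#!/bin/sh
# Helper names and the key file path are hypothetical.

# Build the host name of login node N (an integer from 1 to 12).
login_host() {
    if [ "$1" -lt 1 ] || [ "$1" -gt 12 ]; then
        echo "login node must be between 1 and 12" >&2
        return 1
    fi
    printf 'jureca%02d.fz-juelich.de\n' "$1"
}

# Generate a 2048-bit RSA key; ssh-keygen prompts for the passphrase.
make_key() {
    ssh-keygen -t rsa -b 2048 -f "$HOME/.ssh/id_rsa_jureca"
}

# Log in to a specific login node with that key.
jureca_login() {
    host=$(login_host "$2") || return 1
    ssh -i "$HOME/.ssh/id_rsa_jureca" "$1@$host"
}

login_host 3
```

Only `login_host` is executed here. `make_key` and `jureca_login <userid> <N>` are meant to be called interactively, since both prompt for the passphrase.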

JURECA: Accessing software (hierarchical modules)

1. List available toolchains
   $ module avail
2. Load a selected toolchain
   $ module load <toolchain>
3. List available packages for the selected toolchain
   $ module avail
4. Load additional applications/libraries
   $ module load <package>
Search for an application/library
   $ module spider <name>

JURECA: Filesystems

All user filesystems are mounted from the central GPFS fileserver
  Jülich Storage Cluster (JUST)
  Exception: node-local /tmp filesystem (ext4), O(10 GiB)
$HOME
$WORK
$ARCH

JURECA: Filesystems ($HOME)

Purposes
  Storage of regularly used files and applications
  Storage of smaller files used for the current computation
Daily backup
Quota: max. 10 TiB disk space and max. 3 million inodes per group
  $ q_dataquota [-l]
Careful with chmod -R! Safer alternative: access control lists (ACL)
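To illustrate the ACL alternative to `chmod -R`, the sketch below grants a single user read access to a directory tree. It assumes that the standard POSIX ACL tools `setfacl` and `getfacl` are usable on the filesystem; the slides do not say which ACL tooling is supported, and the function names are hypothetical.

```shell
#!/bin/sh
# Grant one user read access to a directory tree through an ACL entry,
# instead of widening permissions for the whole group or everyone
# with chmod -R. Function names are hypothetical.
grant_read() {
    user=$1; dir=$2
    # rX: read everywhere; execute/search only on directories and on
    # files that are already executable.
    setfacl -R -m "u:${user}:rX" "$dir"
}

# Show the resulting ACL of a path.
show_acl() {
    getfacl "$1"
}
```

A call such as `grant_read <colleague> "$HOME/shared"` followed by `show_acl "$HOME/shared"` would then list the added entry.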

JURECA: Filesystems ($WORK)

Purpose
  Storage of large files used or generated by the current computation
Scratch filesystem with the highest performance
No backup
Files will be deleted 90 days after last usage!
  atime is not updated for performance reasons
Quota: max. 30 TiB disk space and max. 4 million inodes per group
  $ q_dataquota [-l]
Copy important files to $HOME or $ARCH
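One way to find files that are approaching the 90-day limit is to list everything not modified for a given number of days. The sketch assumes that the modification time is a reasonable stand-in for "last usage", which the slides do not confirm, and it uses GNU `touch` syntax in the demonstration. The function name and the 80-day threshold are illustrative.

```shell
#!/bin/sh
# List files under a directory whose modification time is older than
# a given number of days. Using mtime as a stand-in for "last usage"
# is an assumption; the slides do not say which timestamp is checked.
old_files() {
    dir=$1; days=$2
    find "$dir" -type f -mtime +"$days"
}

# Self-contained demonstration in a temporary directory.
demo=$(mktemp -d)
touch "$demo/fresh.dat"
touch -d '100 days ago' "$demo/stale.dat"   # GNU touch syntax
old_files "$demo" 80
```

On the system itself, `old_files "$WORK" 80` would list candidates to copy to $HOME or $ARCH.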

JURECA: Filesystems ($ARCH)

Purpose
  Storage of large files that have not been used recently
Not available on compute nodes!
Daily backup
Files are migrated to tape
Quota: no space quota and max. 2 million inodes per group
Usage recommendations
  tar/zip many small files
  Do not touch/move files
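The recommendation to tar many small files matters because the $ARCH quota counts inodes rather than space. The sketch below bundles a directory of small files into a single archive; directory and file names are made up for the demonstration.

```shell
#!/bin/sh
# Bundle a directory of small files into one compressed archive.
# Directory and file names are made up for the demonstration.
src=$(mktemp -d)
out=$(mktemp -d)
mkdir "$src/results"
for i in 1 2 3 4 5; do
    echo "sample $i" > "$src/results/run_$i.txt"
done

# -C changes into $src so the archive stores relative paths.
tar -czf "$out/results.tar.gz" -C "$src" results

# Count the data files stored in the archive.
n_files=$(tar -tzf "$out/results.tar.gz" | grep -c 'run_')
echo "archived $n_files files into a single archive"
```

The resulting archive occupies one inode and would be copied to $ARCH from a login node, since $ARCH is not available on compute nodes.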

JURECA: Sketch

[Figure: sketch of the system]

JURECA: Fat-tree InfiniBand topology

[Figure: fat-tree diagram; labels: 648p, 36p, 2×, JUST]

JURECA: NUMA architecture

[Figure: two CPUs linked by QPI, each with its own DDR4 memory; PCIe 3 attaches HCA, GPU, ...]

JURECA: Multicore

[Figure: one CPU with Core 0 to Core 11]

JURECA: Hyper-Threading Technology

[Figure: hardware threads HWT 0 to HWT 23]

JURECA: AVX 2.0 ISA extension

[Figure: element-wise multiply-add on four vector lanes, c0..c3 += a0..a3 × b0..b3]
AVX 2.0 ISA extension → two 256-bit wide multiply-adds per cycle!
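The two 256-bit multiply-adds per cycle translate into a peak floating-point rate as sketched below. The calculation assumes double precision and the nominal 2.5 GHz clock; sustained clocks under AVX load may be lower. The result is close to the 1800 TFlop/s CPU peak quoted on the hardware overview slide.

```shell
#!/bin/sh
# Peak double-precision rate implied by two 256-bit FMAs per cycle.
# Clock, core and node counts come from the hardware overview slide.
vector_bits=256
double_bits=64
fma_units=2          # two multiply-adds issued per cycle
flops_per_fma=2      # one multiply plus one add per lane

lanes=$((vector_bits / double_bits))
flops_per_cycle=$((lanes * fma_units * flops_per_fma))

clock_mhz=2500
cores_per_node=24
compute_nodes=1884

core_gflops=$((flops_per_cycle * clock_mhz / 1000))
node_gflops=$((core_gflops * cores_per_node))
system_tflops=$((node_gflops * compute_nodes / 1000))

echo "$flops_per_cycle flop/cycle, $core_gflops GFlop/s per core"
echo "$node_gflops GFlop/s per node, about $system_tflops TFlop/s in total"
```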

Further information

motd: Message of the day
  Information about preventive and emergency maintenance
  Information about system configuration changes
On-line documentation
  http://www.fz-juelich.de/ias/jsc/jureca
User support at FZJ
  [email protected]
  Phone: 02461 61-2828