TSM 5.2 Experiences
Lothar Wollschläger
Zentralinstitut für Angewandte Mathematik
Forschungszentrum Jülich
L.Wollschlaeger@fz-juelich.de

Contents
- TSM Test Configuration
- Supercomputer Data Management
- TSM-HSM for GPFS
- TSM for the installed IBM Supercomputer
- TSM for the planned IBM Supercomputer
- TSM 5.2 Backup Server

TSM Test Configuration
- 3 STK 9940B tape drives in each StorageTek 9310
- FAStT 700: 2 RAID5 volumes of 200 GB for the database (DB), 1 RAID5 volume of 270 GB for the storage pool (STG)
- Gigabit Ethernet
- P690: 2 CPUs, 4 GB, AIX 5.1

TSM Software Configuration
- TSM 5.2 Beta Server
- TSM 5.2 Beta GPFS Client
- GPFS 2.1
- Gresham EDT-DistribuTAPE 6.4.3
- STK ACSLS 6.1.1

TSM Beta Test
- Archive function: without problems
- Backup function: without problems
- Tape devices and external tape manager: without problems
- Tape performance: 30 MB/sec
- HSM function: with problems
- HSM function in the GA version: without problems

Current IBM Supercomputer
- Gigabit Ethernet
- 6 compute nodes
- SAN
- Storage devices
- 6 compute LPARs
- 1 TSM server LPAR
- SAN
- RAID controller: 10 TB
- Tape robot: 16 tape units, 1.2 PB

1 TSM Server LPAR
- 4 CPUs
- 8 GB main memory
- 14 FAStT disks (68 GB)
- SAN
- Tape robot: 16 tape units, 1.2 PB

STK Robot
- 5500 cartridges with 200 GB each
- 16 tape units 9940B
- 30 MB/sec
- 15 seconds mount time
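
As a rough cross-check of the tape figures above, the following sketch multiplies out the numbers quoted on the slide. It assumes decimal units (1 GB = 1000 MB, 1 PB = 1,000,000 GB) and ideal streaming; the function names are illustrative only. The nominal product of cartridges and cartridge size comes out slightly below the 1.2 PB quoted for the robot, and the source does not explain the difference.

```python
# Figures as quoted on the slide; decimal units are an assumption.
CARTRIDGES = 5500
CARTRIDGE_GB = 200
TAPE_UNITS = 16
DRIVE_MB_PER_S = 30
MOUNT_SECONDS = 15


def library_capacity_pb(cartridges=CARTRIDGES, cartridge_gb=CARTRIDGE_GB):
    """Nominal capacity of all cartridges in petabytes."""
    return cartridges * cartridge_gb / 1_000_000


def aggregate_rate_mb_s(units=TAPE_UNITS, rate=DRIVE_MB_PER_S):
    """Upper bound on throughput if every drive streams at full speed."""
    return units * rate


def full_cartridge_read_hours(cartridge_gb=CARTRIDGE_GB,
                              rate=DRIVE_MB_PER_S,
                              mount_seconds=MOUNT_SECONDS):
    """Time to mount one cartridge and read it end to end, in hours."""
    seconds = mount_seconds + cartridge_gb * 1000 / rate
    return seconds / 3600


print(f"nominal capacity: {library_capacity_pb():.2f} PB")
print(f"aggregate rate:   {aggregate_rate_mb_s()} MB/sec")
print(f"one cartridge:    {full_cartridge_read_hours():.2f} hours")
```

The mount time is negligible next to the streaming time for a full cartridge, so it only matters when many small datasets are spread over many cartridges.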

FAStT 700
[Figure: disk layout of the FAStT 700 across Controller A and Controller B. Labelled regions: hot spare, disks for TSM, disks for the test system, and the arrays for home1 to home5, work, and reserve.]
23.9.2003 Lothar Wollschläger
- RAID5 arrays (0.25 TB) are the building blocks (hdisks)
- 5 home filesystems and 1 work filesystem
- Each filesystem has a capacity of 1 TB
- No single point of failure
- 10 hot spares
- 2 arrays (hdisks) (2 x 0.25 TB) in reserve

Filesystem Layout
- No local user data on the compute nodes
- All user data in the General Parallel File System (GPFS)
- Each user has all of their datasets in one filesystem
- /work for temporary datasets

TSM and the Supercomputer
- Incremental backup with TSM
- Hierarchical Storage Management with TSM
- Each filesystem has its own TSM server
- All TSM servers on one LPAR
- TSM data (DB, log, STG) on FAStT
- One TSM server for backup of the normal TSM server

Recovery
300 GB filesystem with 3000 datasets
1. Recreate GPFS filesystem
2. Restore all datasets from TSM backup server: 3.5 hours (24 MB/sec)
3. Filesystem available
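
The 3.5 hour figure follows directly from the filesystem size and the restore rate. A minimal sketch of that arithmetic, assuming decimal units (1 GB = 1000 MB) and a constant sustained rate; `restore_hours` is an illustrative name, not part of TSM:

```python
def restore_hours(size_gb: float, rate_mb_per_s: float) -> float:
    """Hours needed to restore size_gb at a sustained rate in MB/sec."""
    seconds = size_gb * 1000 / rate_mb_per_s
    return seconds / 3600


# Figures from the slide: 300 GB restored at 24 MB/sec.
print(f"{restore_hours(300, 24):.2f} hours")
```

The same function can be used to estimate full restores of larger filesystems at a comparable rate.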

Recovery
Problem: recovery of large filesystems takes a long time (1 TB in 12 hours, 10 TB in 5 days)
- More than one user filesystem: if one filesystem is damaged, the other users can continue to work
- One TSM server per filesystem keeps the TSM database small
- 16 tape units allow parallel restore
- Migrate as many datasets as possible and restore only the inodes

Recovery
Filesystem defect
1. Recreate GPFS filesystem
2. Restore directory structure from TSM backup server
3. Recreate inodes of migrated datasets from TSM HSM server
4. Restore all other datasets from TSM backup server
5. Filesystem available
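
The benefit of the five-step procedure depends on how much data is already migrated, because migrated datasets only need their inodes recreated while resident datasets must be read back in full. The sketch below illustrates that split. The `Dataset` record and `plan_recovery` function are hypothetical and only model the idea; they are not TSM interfaces.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    path: str
    size_mb: float
    migrated: bool  # True if the data already resides on the HSM server


def plan_recovery(datasets):
    """Split datasets into inode-only recreation and full restore.

    Returns (stubs, full, restore_mb), where restore_mb is the amount of
    data that actually has to be read back from the backup server.
    """
    stubs = [d for d in datasets if d.migrated]
    full = [d for d in datasets if not d.migrated]
    restore_mb = sum(d.size_mb for d in full)
    return stubs, full, restore_mb


# Made-up example data, for illustration only.
example = [
    Dataset("/home1/a/run1.dat", 40_000, migrated=True),
    Dataset("/home1/a/run2.dat", 25_000, migrated=True),
    Dataset("/home1/a/notes.txt", 2, migrated=False),
    Dataset("/home1/b/input.cfg", 1, migrated=False),
]
stubs, full, restore_mb = plan_recovery(example)
print(len(stubs), "inode-only,", len(full), "full restores,", restore_mb, "MB to read")
```

The more data is migrated before a failure, the smaller the full-restore share becomes.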

Recovery
300 GB filesystem with 3000 datasets
1. Recreate GPFS filesystem
2. Restore directory structure from TSM backup server => 1 minute
3. Recreate inodes of migrated datasets from TSM HSM server => 1 minute
4. Restore all other datasets from TSM backup server => 3 minutes
5. Filesystem available => 5 minutes versus 210 minutes

Hierarchical Storage Management
- All user filesystems are controlled by TSM/HSM
- Only large datasets which have not been used for a long time are removed from disk (migrated)
- Copy all datasets to tape as fast as possible (premigration)
- Free disk space if more disk space is needed
- Large datasets first
- One TSM server per filesystem
- Integrated with the backup solution
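
The migration policy described above (large, long-unused datasets, largest first, only as much as needed) can be sketched as a simple selection routine. This is an illustration of the stated policy under assumed thresholds, not the candidate selection that TSM/HSM actually implements; `min_size_mb`, `min_idle_days`, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    path: str
    size_mb: float
    idle_days: int  # days since last access


def select_for_migration(datasets, needed_mb, min_size_mb=100, min_idle_days=30):
    """Pick datasets to migrate until needed_mb of disk space is freed.

    Only datasets that are both large and long unused are eligible;
    eligible datasets are taken largest first.
    """
    eligible = [d for d in datasets
                if d.size_mb >= min_size_mb and d.idle_days >= min_idle_days]
    eligible.sort(key=lambda d: d.size_mb, reverse=True)

    chosen, freed = [], 0.0
    for d in eligible:
        if freed >= needed_mb:
            break
        chosen.append(d)
        freed += d.size_mb
    return chosen


# Made-up example data, for illustration only.
example = [
    Dataset("a", 5000, idle_days=90),
    Dataset("b", 50, idle_days=400),   # too small to be eligible
    Dataset("c", 2000, idle_days=60),
    Dataset("d", 8000, idle_days=2),   # used too recently
]
print([d.path for d in select_for_migration(example, needed_mb=6000)])
```

Because premigration has already copied the data to tape, freeing space for the chosen datasets only requires releasing their disk blocks.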

Hierarchical Storage Management
Open problems
- dsmmigrate from all nodes in a GPFS cluster
- dsmrecall from all nodes in a GPFS cluster

Planned IBM Supercomputer (12/03)
- 35 compute nodes
- Federation
- 2 I/O nodes
- SAN
- Storage devices
- Federation
- 4 VSD server LPARs
- 2 TSM server LPARs
- SAN
- RAID controller: 50 TB user data
- Tape robot: 16 tape units, 1.2 PB

- Disk access over SAN
- No single point of failure
- GPFS over VSD
- All data on RAID5 (4+1) arrays
- 144 RAID5 arrays with 250 GB each
- 9 filesystems of 16 arrays each (4 TB)
- Enough hot spare disks
- TSM data (database, log) on RAID5 systems
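
The array and filesystem counts above are consistent with each other, which the following sketch checks. It assumes decimal units (1 TB = 1000 GB) and takes the 250 GB per array at face value; how the resulting total relates to the 50 TB figure in the diagram is not stated in the source.

```python
# Figures as quoted on the slide.
ARRAYS = 144
ARRAY_GB = 250
FILESYSTEMS = 9
ARRAYS_PER_FILESYSTEM = 16


def layout_summary():
    """Derive per-filesystem and total capacity from the array counts."""
    return {
        "arrays_used": FILESYSTEMS * ARRAYS_PER_FILESYSTEM,
        "filesystem_tb": ARRAYS_PER_FILESYSTEM * ARRAY_GB / 1000,
        "total_tb": ARRAYS * ARRAY_GB / 1000,
    }


summary = layout_summary()
print(summary)
# All arrays are accounted for by the nine filesystems.
print("all arrays assigned:", summary["arrays_used"] == ARRAYS)
```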

TSM hardware configuration production system
- 4 3590H per SCSI
- 2 RAID5 volumes with 300 GB per controller
- 2 SCSI per FC
- 2 SCSI per RAID controller
- Gigabit Ethernet
- P620: 2 CPUs, 4 GB

TSM Software Configuration
- 7 backup servers and 1 archive server
- 70 GB database each
- Shared IBM 3490 tape library
- 1 dedicated server for the tape library
- 1 backup server for the supercomputer TSM servers
- 10 TSM servers on this system

[Chart: ADSM Backup Clients, 1993 to 2003; y-axis from 0 to 3000]

[Chart: ADSM Backup, number of datasets, 1993 to 2003; y-axis from 0 to 300,000,000]

[Chart: ADSM Backup, amount of data in terabytes, 1993 to 2003; y-axis from 0 to 40]

[Chart: ADSM Archive Clients, Oct 1994 to Apr 2003; y-axis from 0 to 600]

[Chart: ADSM Archive, number of datasets, Oct 1994 to 2003; y-axis from 0 to 30,000,000]

[Chart: ADSM Archive, amount of data in terabytes, Oct 1994 to 2003; y-axis from 0 to 10]

Test of TSM 5.2 (Backup)
- Install TSM 5.2 server on a test system
- Install TSM 5.2 clients on dedicated systems (Windows, Linux, AIX, Solaris)
- Test 5.2 clients with a 5.1 server
- Test 5.2 clients with a 5.2 server
- Test old clients with a 5.2 server
- Define the test server on the 5.1 tape server to share the IBM tape library

Migration to TSM 5.2
- Each server has its own bin directory (copy of /usr/tivoli/tsm/server/bin)
- Installation of TSM 5.2.0.0
- Halt of a server
- Copy /usr/tivoli/tsm/server/bin to /server/bin
- Start of a server with /server/bin/dsmserv upgradedb
- After migration of the first server, wait a week to see problems
- Migrate the other servers
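
The staged rollout above can be written down as an ordered plan. The sketch below only generates a human-readable checklist and does not execute anything. The server names are hypothetical, the paths are copied from the slide as written, and the steps are described in words rather than as exact commands, since the slide gives only the `dsmserv upgradedb` invocation.

```python
SOURCE_BIN = "/usr/tivoli/tsm/server/bin"
SERVER_BIN = "/server/bin"  # per-server bin directory, as written on the slide


def server_steps(name):
    """Checklist entries for migrating one TSM server."""
    return [
        f"{name}: halt the server",
        f"{name}: copy {SOURCE_BIN} to {SERVER_BIN}",
        f"{name}: start with {SERVER_BIN}/dsmserv upgradedb",
    ]


def rollout_plan(servers, soak_days=7):
    """Migrate the first server, wait, then migrate the remaining ones."""
    if not servers:
        return []
    first, *rest = servers
    plan = ["install TSM 5.2.0.0"]
    plan += server_steps(first)
    plan.append(f"wait {soak_days} days and watch {first} for problems")
    for name in rest:
        plan += server_steps(name)
    return plan


# Hypothetical server names, for illustration only.
for step in rollout_plan(["backup1", "backup2", "archive1"]):
    print("-", step)
```

Keeping a separate bin directory per server is what makes this one-at-a-time upgrade possible, because the servers that are not yet migrated keep running from their own copy of the old binaries.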