Commands

# lfs df -h                               | Lustre file system disk free space
# lfs check servers                       | List the MDS and OSS servers
# cat /proc/fs/lustre/mds/lustre_MDT0000  | Recovery monitoring status
# cat /proc/fs/lustre/devices             | List the devices
# find /proc/fs/lustre -name status       | Check the status
# cat /proc/fs/lustre/health_check        | Check the file system health
# lctl device_list                        | List the devices
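The health check above can be wrapped in a small watch loop so a failing target is noticed quickly. This is only a sketch, assuming a node where /proc/fs/lustre/health_check exists; the 60-second interval and the script itself are an arbitrary addition, not from the original notes.

#!/bin/bash
# Hypothetical helper: poll the Lustre health file and report any problem.
HEALTH=/proc/fs/lustre/health_check
INTERVAL=60                              # seconds between checks (arbitrary)
while true; do
    status=$(cat "$HEALTH" 2>/dev/null)
    if [ "$status" != "healthy" ]; then
        echo "$(date): Lustre health check: $status"
        lctl device_list                 # show which device is in trouble
    fi
    sleep "$INTERVAL"
done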
Lustre File System Error Messages
Compute Node
<DATE- HOSTNAME> kernel: LustreError: 11-0: an error occurred while communicating with <OSS2-IP-Address>@o2ib. The ost_connect operation failed with -30
Master Node
<DATE - HOSTNAME> kernel: LustreError: 11-0: an error occurred while communicating with <OSS2-IP-Address>@o2ib. The ost_write operation failed with -30
#lfs check servers
lustre-MDT0000-mdc-ffff810826763800 active.
error: check 'lustre-OST0000-osc-ffff810826763800' Resource temporarily unavailable
lustre-OST0001-osc-ffff810826763800 active.
lustre-OST0002-osc-ffff810826763800 active.
lustre-OST0003-osc-ffff810826763800 active.
lustre-OST0004-osc-ffff810826763800 active.
lustre-OST0005-osc-ffff810826763800 active.
lustre-OST0006-osc-ffff810826763800 active.
lustre-OST0007-osc-ffff810826763800 active.
NOTE: The failing OST0000 is mounted on the OSS2 server.
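Before suspecting the disks, it is worth confirming that the OSS node is still reachable over the InfiniBand fabric. A minimal check with the standard lctl commands; keep the <OSS2-IP-Address> placeholder until you substitute the real address:

# lctl ping <OSS2-IP-Address>@o2ib    # succeeds if LNET connectivity to OSS2 is fine
# lctl dl | grep OST0000              # show the local device state of the failing OST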
Even though the Lustre file system was mounted, we were getting an Input/output error on the compute node while accessing files. (Error -30 is EROFS, "read-only file system": Linux typically remounts a device read-only after a disk I/O error, which matches the symptoms below.)
SOLUTION: Hard disk I/O error.
While accessing a file we were getting: Input/output error.
# The hard disk partition table may have become corrupted.
Use fsck (e2fsck for the ldiskfs-backed targets) to recover the data and fix the hard disk I/O error.
Solution: Run fsck on the affected LUN (Lustre OST0000) to fix the problem.
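A rough outline of that repair, assuming the failing OST's backing device is /dev/sdb on OSS2 (the device name and mount point here are hypothetical); the OST must be unmounted while fsck runs:

# umount /mnt/ost0000                 # hypothetical OST mount point on OSS2
# e2fsck -fp /dev/sdb                 # force a check, fix what is safe automatically
# e2fsck -fy /dev/sdb                 # if -p gives up, answer yes to every fix
# mount -t lustre /dev/sdb /mnt/ost0000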
Lustre File System Slice
1) Choosing the slice (the I/O block size) is very important.
2) HPC applications transfer data in bytes, KB, MB, and GB, and the speed also varies with the size of each transfer, so the slice plays a vital role in the Lustre file system. The dd tests below compare the two file systems at two block sizes.
/scratch - Lustre, /app - NFS

LUSTRE FILE SYSTEM Writing File.
# time dd if=/dev/zero of=/scratch/one.iso bs=1024M count=5
5+0 records in
5+0 records out
5368709120 bytes (5.4 GB) copied, 19.0871 s, 281 MB/s
real 0m19.540s
user 0m0.001s
sys 0m16.865s
# time dd if=/dev/zero of=/scratch/one.iso bs=1024k count=5120
5120+0 records in
5120+0 records out
5368709120 bytes (5.4 GB) copied, 17.2196 s, 312 MB/s
real 0m17.224s
user 0m0.007s
sys 0m15.766s
NFS FILE SYSTEM Writing File.
# time dd if=/dev/zero of=/app/one.iso bs=1024M count=5
5+0 records in
5+0 records out
5368709120 bytes (5.4 GB) copied, 23.1646 s, 232 MB/s
real 0m23.714s
user 0m0.002s
sys 0m6.494s
# time dd if=/dev/zero of=/app/one.iso bs=1024k count=5120
5120+0 records in
5120+0 records out
5368709120 bytes (5.4 GB) copied, 40.9098 s, 131 MB/s
real 0m42.588s
user 0m0.003s
sys 0m6.866s
Points
1) For small files, NFS performs well.
2) But for large files (around 5 GB), the Lustre file system performs better than NFS.
3) The write block size matters on both file systems: in these tests Lustre was fastest writing in 1 MB segments (bs=1024k), while NFS was fastest with 1 GB segments (bs=1024M).
To change these attributes we have to tune the Lustre file system.
So if we install the HPC application (which consists mostly of small files) on NFS, we get better performance than on the Lustre file system.
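Striping is the usual attribute to tune on the Lustre side. A minimal sketch with the standard lfs setstripe/getstripe commands; the directory and the values are illustrative, not from the tests above:

# lfs setstripe -S 1M -c 4 /scratch/bigfiles   # 1 MB stripe size across 4 OSTs for new files here
# lfs getstripe /scratch/bigfiles              # verify the layout new files will inherit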