The ever-expanding and accelerating generation of big data demands more efficient and robust storage filesystems. Migrating data across multiple storage media is proving to be expensive. There is a pressing need to perform in-place data analytics and to provide a single unified namespace across all types of storage media. The IBM Spectrum Scale file system and IBM Elastic Storage Server (ESS) address these requirements, and the IBM Spectrum Scale filesystem is officially certified as a storage offering for the Hortonworks HDP Hadoop distribution.
IBM Spectrum Scale offers numerous advantages over HDFS, the default storage for Hortonworks HDP clusters:
- Reduced data center footprint: no data copying or migration is required to run Hadoop analytics.
- Unified access to data using different protocols such as File, Block and Object.
- Management of geographically dispersed data, including disaster recovery.
- In-place data analytics: Spectrum Scale is POSIX compatible, which supports a wide range of applications and workloads. With the Spectrum Scale HDFS Transparency Connector, you can analyze file and object data in place with no data transfer or data movement.
- Flexible deployment modes: you can run IBM Spectrum Scale on storage-rich commodity servers, choose IBM Elastic Storage Server (ESS) as a higher-performance massive storage system for your Hadoop workload, or even deploy Spectrum Scale on a traditional SAN storage system for HDP.
- Enterprise-class data management features, including POSIX-compliant APIs and command line, unified File and Object support (NFS, SMB, Object), FIPS and NIST compliant data encryption, cold data compression, disaster recovery, snapshot support for point-in-time data captures, policy-based information lifecycle management capabilities to manage PBs of data, mature enterprise-level data backup and archive solutions (including tape), remote cluster mount support, mixed filesystem support, and seamless secure tiering to cloud object stores.
[caption id="attachment_3752" align="alignnone" width="1436"]
IBM Spectrum Scale provides unified access using different protocols and a single namespace across all kinds of storage media.[/caption]
HDFS Protocol for Spectrum Scale
Spectrum Scale provides a mechanism for accessing and ingesting data through the HDFS RPC protocol, using the HDFS file system utilities and RPC daemons such as the NameNode and DataNodes. This is achieved with the HDFS Transparency Connector, which redirects all I/O traffic from native HDFS to the Spectrum Scale file system. This allows any big data application to run seamlessly on top of the Spectrum Scale file system without any changes to the application logic.
[caption id="attachment_4442" align="alignnone" width="874"]
Comparison of Spectrum Scale HDFS Transparency Connector Architecture with Native HDFS RPC.[/caption]
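Because the connector implements the HDFS RPC protocol, existing Hadoop clients and the hadoop fs utility work unchanged. A minimal sketch (file and directory names are illustrative; the cluster shown later in this post uses /bigpfs as the Spectrum Scale mount point):
# hadoop fs -mkdir -p /tmp/demo
# hadoop fs -put /etc/hosts /tmp/demo/hosts-copy
# hadoop fs -ls /tmp/demo
The same file is also accessible through the Spectrum Scale POSIX interface under the filesystem mount point, at a path determined by the connector's data-directory setting, which is what enables in-place analytics on data ingested through NFS, SMB, Object or POSIX.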
Spectrum Scale File System Configurations on Big Data Clusters
Hortonworks HDP uses Apache Ambari for configuring, managing and creating big data clusters. The Ambari server also provides an easy-to-use GUI wizard for adding a new service to an existing cluster. The Ambari integration module links the Spectrum Scale service to the Ambari server, so that the Add Service wizard of the Ambari GUI can be used for creating the cluster.
Spectrum Scale supports the following types of configuration in the cluster:
1. Shared Nothing Architecture (FPO, File Placement Optimizer)
The Spectrum Scale FPO configuration makes use of the local disks attached to each node of the cluster. In FPO-enabled Spectrum Scale deployments, a physical disk and a Network Shared Disk (NSD) can have a one-to-one mapping. In this case, each node in the cluster is an NSD server providing access to its disks for the rest of the cluster. The NSD configuration file specifies the topology of each node of the cluster (an illustrative stanza sketch follows the figure below).
[caption id="attachment_5027" align="alignnone" width="757"]
Shared Nothing Configuration using Local Disks.[/caption]
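For reference, an NSD stanza file for this kind of setup might look roughly like the following. This is only an illustrative sketch: device names, NSD names, server names and failure groups are hypothetical, and a production FPO deployment would typically also include %pool stanzas with FPO-specific write-affinity options.
%nsd: device=/dev/sdb nsd=node1_sdb servers=node1 usage=dataAndMetadata failureGroup=1 pool=system
%nsd: device=/dev/sdc nsd=node1_sdc servers=node1 usage=dataOnly failureGroup=1 pool=datapool
%nsd: device=/dev/sdb nsd=node2_sdb servers=node2 usage=dataAndMetadata failureGroup=2 pool=system
The NSDs and the filesystem would then be created from the stanza file, for example:
# /usr/lpp/mmfs/bin/mmcrnsd -F /tmp/nsd_stanza.txt
# /usr/lpp/mmfs/bin/mmcrfs bigpfs -F /tmp/nsd_stanza.txt -T /bigpfs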
2. Shared Storage Configuration (ESS, Elastic Storage Server)
This configuration allows a local FPO-enabled Spectrum Scale cluster to be added as part of an ESS (Elastic Storage Server) file system.
[caption id="attachment_5028" align="alignnone" width="862"]
Shared Storage Configuration[/caption]
3. Remote Mount Configuration
This configuration allows the ESS Spectrum Scale file system to be mounted on the local cluster.
[caption id="attachment_5029" align="alignnone" width="840"]
Remote Cluster Mount Support[/caption]
4. Mixed Configuration Support
This configuration allows a local FPO filesystem to coexist with an external cluster, which can be either an ESS file system or a different Spectrum Scale cluster, mounted on the local cluster. Big data applications can then use the remotely mounted filesystem as well (a command-level sketch of the remote mount setup follows).
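For the remote mount and mixed configurations, the client-side setup on the local cluster roughly follows the pattern below. This is a hedged sketch: it assumes the two clusters have already exchanged and authorized keys with mmauth, and the contact node names and key file name are placeholders. The device, remote cluster and mount point names match the example shown later in this post.
# /usr/lpp/mmfs/bin/mmremotecluster add bigpfs.gpfs.net -n essio1,essio2 -k bigpfs.gpfs.net_key.pub
# /usr/lpp/mmfs/bin/mmremotefs add djremotegpfs1 -f bigpfs -C bigpfs.gpfs.net -T /djremotegpfs1 -A no
# /usr/lpp/mmfs/bin/mmmount djremotegpfs1 -a
# /usr/lpp/mmfs/bin/mmremotefs show all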
Deploying Spectrum Scale File System in Hortonworks using Ambari Blueprints
Ambari Blueprints provide a way to deploy a cluster using all the configurations of an existing cluster. Exporting the configuration of the whole cluster therefore makes it possible to deploy a new cluster with all of its services, including Spectrum Scale, in a single step without major manual intervention. A sketch of the Blueprint REST calls follows.
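A hedged sketch of the Blueprint workflow using the Ambari REST API; server addresses, cluster names, blueprint names and file names are placeholders, and the host-mapping template is specific to your cluster.
Export the configuration of an existing cluster as a blueprint:
# curl -u admin:<password> -H 'X-Requested-By: ambari' http://<ambari-server>:8080/api/v1/clusters/<existing-cluster>?format=blueprint -o bp.json
Register the blueprint on the target Ambari server and create the new cluster from it together with a host-mapping template:
# curl -u admin:<password> -H 'X-Requested-By: ambari' -X POST -d @bp.json http://<ambari-server>:8080/api/v1/blueprints/scale-bp
# curl -u admin:<password> -H 'X-Requested-By: ambari' -X POST -d @cluster-template.json http://<ambari-server>:8080/api/v1/clusters/<new-cluster>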
Spectrum Scale Ambari Management Pack
The Spectrum Scale Ambari management pack (MPack) provides a seamless way to register the Spectrum Scale filesystem with an existing Hortonworks Hadoop cluster. It enables the Ambari server GUI wizard to install the Spectrum Scale service on the existing HDP cluster.
The management pack is delivered as a tar package:
SpectrumScaleMPack-2.4.2.1-noarch.tar.gz
This tar package contains the installer, uninstaller, and MPack upgrade scripts.
$ tar -xvzf SpectrumScaleMPack-2.4.2.1-noarch.tar.gz
./SpectrumScaleExtension-MPack-2.4.2.1.tar.gz
./SpectrumScaleIntegrationPackageInstaller-2.4.2.1.bin
./SpectrumScaleMPackInstaller.py
./SpectrumScaleMPackUninstaller.py
./SpectrumScale_UpgradeIntegrationPackage
./sum.txt
[caption id="attachment_5062" align="alignnone" width="782"] Hortonworks HDP Stack before adding Spectrum Scale MPack[/caption]
The MPack installer binary adds the Spectrum Scale service to the existing cluster and creates the extension links required to link it to the current stack in the HDP cluster:
[root@c902f09x09 ~]# ./SpectrumScaleIntegrationPackageInstaller-2.4.2.1.bin
Apache License Agreement ...........................
....................................................
....................................................
Do you agree to the above license terms? [yes or no]
yes
Installing...
INFO: ***Starting the Mpack Installer***
Enter Ambari Server Port Number. If it is not entered, the installer will take default port 8080 :
INFO: Taking default port 8080 as Ambari Server Port Number.
Enter Ambari Server IP Address. Default=127.0.0.1 :
INFO: Ambari Server IP Address not provided. Taking default Amabri Server IP Address as "127.0.0.1".
Enter Ambari Server Username, default=admin :
INFO: Taking default username "admin" as Ambari Server Username.
Enter Ambari Server Password :
INFO: Verifying Ambari Server Address, Username and Password.
INFO: Verification Successful.
INFO: Adding Spectrum Scale MPack : ambari-server install-mpack --mpack=SpectrumScaleExtension-MPack-2.4.2.1.tar.gz -v
INFO: Spectrum Scale MPack Successfully Added. Continuing with Ambari Server Restart...
INFO: Performing Ambari Server Restart.
INFO: Ambari Server Restart Completed Successfully.
INFO: Running command - curl -u admin:******* -H 'X-Requested-By: ambari' -X POST -d '{"ExtensionLink": {"stack_name":"HDP", "stack_version": "2.6", "extension_name": "SpectrumScaleExtension", "extension_version": "2.4.2.1"}}' http://127.0.0.1:8080/api/v1/links/
INFO: Extension Link Created Successfully.
INFO: Starting Spectrum Scale Changes.
INFO: Spectrum Scale Changes Successfully Completed.
INFO: Performing Ambari Server Restart.
INFO: Ambari Server Restart Completed Successfully.
INFO: Backing up original HDFS files to hdfs-original-files-backup
INFO: Running command cp -f -r -p -u /var/lib/ambari-server/resources/common-services/HDFS/2.1.0.2.0/package/scripts/ hdfs-original-files-backup
Done.
[root@c902f09x09 ~]#
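The extension link created by the installer can also be confirmed through the Ambari REST API, using the same endpoint the installer posted to above (credentials and address as configured during installation):
# curl -u admin:<password> -H 'X-Requested-By: ambari' http://127.0.0.1:8080/api/v1/links/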
This allows the Spectrum Scale service to be listed in the Add Service wizard of the Ambari server.
[caption id="attachment_5058" align="alignnone" width="1285"] Spectrum Scale Service Listed after Mpack Installation.[/caption]
The Add Service Wizard helps in configuring the Spectrum Scale Service.
[caption id="attachment_5065" align="alignnone" width="1281"] Assignment of Spectrum Scale nodes on the cluster.[/caption]
[caption id="attachment_5066" align="alignnone" width="1276"] Customize Service Panel[/caption]
[caption id="attachment_5067" align="alignnone" width="1279"] Spectrum Scale Parameters Configuration [/caption]
[caption id="attachment_5071" align="alignnone" width="1278"] Spectrum Scale Service Installation on the cluster nodes.[/caption]
[caption id="attachment_5073" align="alignnone" width="1280"] Service Addition completed.[/caption]
[caption id="attachment_5129" align="alignnone" width="779"] Spectrum Scale Filesystem Added as a service in the existing HDP cluster.[/caption]
The HDFS Transparency daemon status can also be verified:
# /usr/lpp/mmfs/hadoop/sbin/mmhadoopctl connector getstate
Node1 : namenode running as process 8280.
Node2 : datanode running as process 15192.
Node3 : datanode running as process 31595.
Node4 : datanode running as process 25271.
Node5 : datanode running as process 10777.
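If the connector daemons are not running on some nodes, the same utility can start or stop them; a sketch assuming the default HDFS Transparency installation path (verify the subcommands against your installed version):
# /usr/lpp/mmfs/hadoop/sbin/mmhadoopctl connector start
# /usr/lpp/mmfs/hadoop/sbin/mmhadoopctl connector stop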
The Spectrum Scale cluster configuration can also be verified:
# /usr/lpp/mmfs/bin/mmlscluster
GPFS cluster information
========================
GPFS cluster name: Djbigpfs.gpfs.net
GPFS cluster id: 12888843454012907741
GPFS UID domain: Djbigpfs.gpfs.net
Remote shell command: /usr/bin/ssh
Remote file copy command: /usr/bin/scp
Repository type: CCR
Node Daemon node name IP address Admin node name Designation
--------------------------------------------------------------------------
1 c902f09x09.gpfs.net 172.16.1.51 c902f09x09.gpfs.net quorum
2 c902f09x12.gpfs.net 172.16.1.57 c902f09x12.gpfs.net quorum
3 c902f09x10.gpfs.net 172.16.1.53 c902f09x10.gpfs.net
4 c902f09x11.gpfs.net 172.16.1.55 c902f09x11.gpfs.net quorum
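As a quick health check in addition to mmlscluster, the GPFS daemon state on every node can also be queried:
# /usr/lpp/mmfs/bin/mmgetstate -a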
[caption id="attachment_5354" align="alignnone" width="1201"] Spectrum Scale Service Panel in Integrated state[/caption]
# /usr/lpp/mmfs/bin/mmlsfs all
File system attributes for /dev/bigpfs:
=======================================
flag value description
------------------- ------------------------ -----------------------------------
-f 8192 Minimum fragment size in bytes (system pool)
65536 Minimum fragment size in bytes (other pools)
-i 4096 Inode size in bytes
-I 32768 Indirect block size in bytes
-m 3 Default number of metadata replicas
-M 3 Maximum number of metadata replicas
-r 3 Default number of data replicas
-R 3 Maximum number of data replicas
-j scatter Block allocation type
-D nfs4 File locking semantics in effect
-k all ACL semantics in effect
-n 32 Estimated number of nodes that will mount file system
-B 262144 Block size (system pool)
2097152 Block size (other pools)
-Q none Quotas accounting enabled
none Quotas enforced
none Default quotas enabled
--perfileset-quota No Per-fileset quota enforcement
--filesetdf No Fileset df enabled?
-V 17.00 (4.2.3.0) File system version
--create-time Fri Nov 17 08:20:27 2017 File system creation time
-z No Is DMAPI enabled?
-L 16252928 Logfile size
-E No Exact mtime mount option
-S relatime Suppress atime mount option
-K whenpossible Strict replica allocation option
--fastea Yes Fast external attributes enabled?
--encryption No Encryption enabled?
--inode-limit 3948224 Maximum number of inodes
--log-replicas 0 Number of log replicas
--is4KAligned Yes is4KAligned?
--rapid-repair Yes rapidRepair enabled?
--write-cache-threshold 0 HAWC Threshold (max 65536)
--subblocks-per-full-block 32 Number of subblocks per full block
-P system;datapool Disk storage pools in file system
-d gpfs1nsd;gpfs2nsd;gpfs3nsd;gpfs4nsd;gpfs5nsd;gpfs6nsd;gpfs7nsd;gpfs8nsd;gpfs9nsd;gpfs10nsd;gpfs11nsd;gpfs12nsd;gpfs13nsd;gpfs14nsd;gpfs15nsd;gpfs16nsd Disks in file system
-A yes Automatic mount option
-o none Additional mount options
-T /bigpfs Default mount point
--mount-priority 0 Mount priority
[caption id="attachment_5352" align="alignnone" width="1275"] HDFS Service Panel in Spectrum Scale Integrated State[/caption]
Remote Mount and Multi-Filesystem Support
A Spectrum Scale cluster can have multiple local filesystems as well as remotely mounted filesystems. The HDFS Transparency Connector supports multiple remotely mounted filesystems as well.
For example, if you have a remote filesystem mounted on your cluster:
# mmremotefs show all
Local Name Remote Name Cluster name Mount Point Mount Options Automount Drive Priority
djremotegpfs1 bigpfs bigpfs.gpfs.net /djremotegpfs1 rw no - 0
When multiple filesystems are present in the cluster, one local and one remotely mounted, both can be listed:
# mmlsfs all
File system attributes for /dev/bigpfs:
=======================================
flag value description
------------------- ------------------------ -----------------------------------
-f 8192 Minimum fragment size in bytes (system pool)
65536 Minimum fragment size in bytes (other pools)
-i 4096 Inode size in bytes
-I 32768 Indirect block size in bytes
-m 3 Default number of metadata replicas
-M 3 Maximum number of metadata replicas
-r 3 Default number of data replicas
-R 3 Maximum number of data replicas
-j scatter Block allocation type
-D nfs4 File locking semantics in effect
-k all ACL semantics in effect
-n 32 Estimated number of nodes that will mount file system
-B 262144 Block size (system pool)
2097152 Block size (other pools)
-Q none Quotas accounting enabled
none Quotas enforced
none Default quotas enabled
--perfileset-quota No Per-fileset quota enforcement
--filesetdf No Fileset df enabled?
-V 17.00 (4.2.3.0) File system version
--create-time Fri Nov 17 08:20:27 2017 File system creation time
-z No Is DMAPI enabled?
-L 16252928 Logfile size
-E No Exact mtime mount option
-S relatime Suppress atime mount option
-K whenpossible Strict replica allocation option
--fastea Yes Fast external attributes enabled?
--encryption No Encryption enabled?
--inode-limit 3948224 Maximum number of inodes
--log-replicas 0 Number of log replicas
--is4KAligned Yes is4KAligned?
--rapid-repair Yes rapidRepair enabled?
--write-cache-threshold 0 HAWC Threshold (max 65536)
--subblocks-per-full-block 32 Number of subblocks per full block
-P system;datapool Disk storage pools in file system
-d gpfs1nsd;gpfs2nsd;gpfs3nsd;gpfs4nsd;gpfs5nsd;gpfs6nsd;gpfs7nsd;gpfs8nsd;gpfs9nsd;gpfs10nsd;gpfs11nsd;gpfs12nsd;gpfs13nsd;gpfs14nsd;gpfs15nsd;gpfs16nsd Disks in file system
-A yes Automatic mount option
-o none Additional mount options
-T /bigpfs Default mount point
--mount-priority 0 Mount priority
File system attributes for bigpfs.gpfs.net:/dev/bigpfs:
=======================================================
flag value description
------------------- ------------------------ -----------------------------------
-f 8192 Minimum fragment size in bytes (system pool)
65536 Minimum fragment size in bytes (other pools)
-i 4096 Inode size in bytes
-I 32768 Indirect block size in bytes
-m 3 Default number of metadata replicas
-M 3 Maximum number of metadata replicas
-r 3 Default number of data replicas
-R 3 Maximum number of data replicas
-j scatter Block allocation type
-D nfs4 File locking semantics in effect
-k all ACL semantics in effect
-n 32 Estimated number of nodes that will mount file system
-B 262144 Block size (system pool)
2097152 Block size (other pools)
-Q none Quotas accounting enabled
none Quotas enforced
none Default quotas enabled
--perfileset-quota No Per-fileset quota enforcement
--filesetdf No Fileset df enabled?
-V 17.00 (4.2.3.0) File system version
--create-time Mon Nov 27 05:53:34 2017 File system creation time
-z No Is DMAPI enabled?
-L 16252928 Logfile size
-E No Exact mtime mount option
-S relatime Suppress atime mount option
-K whenpossible Strict replica allocation option
--fastea Yes Fast external attributes enabled?
--encryption No Encryption enabled?
--inode-limit 3883776 Maximum number of inodes
--log-replicas 0 Number of log replicas
--is4KAligned Yes is4KAligned?
--rapid-repair Yes rapidRepair enabled?
--write-cache-threshold 0 HAWC Threshold (max 65536)
--subblocks-per-full-block 32 Number of subblocks per full block
-P system;datapool Disk storage pools in file system
-d gpfs1nsd;gpfs2nsd;gpfs3nsd;gpfs4nsd;gpfs5nsd;gpfs6nsd;gpfs7nsd;gpfs8nsd;gpfs9nsd;gpfs10nsd;gpfs11nsd;gpfs12nsd;gpfs13nsd;gpfs14nsd;gpfs15nsd;gpfs16nsd Disks in file system
-A no Automatic mount option
-o none Additional mount options
-T /djremotegpfs1 Default mount point
--mount-priority 0 Mount priority
The Spectrum Scale service configuration can be changed to enable this kind of multi-filesystem support (see the sketch after the figure below).
[caption id="attachment_5339" align="alignnone" width="1009"] Spectrum Scale configuration changes for multi-filesystem support.[/caption]
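After changing the service configuration, the connector configuration typically has to be pushed to all nodes and the Transparency daemons restarted; the Ambari service panel can drive this restart from the GUI. A hedged command-line sketch, assuming the default installation paths (the syncconf subcommand and its argument are taken from the HDFS Transparency documentation and should be verified for your release):
# /usr/lpp/mmfs/hadoop/sbin/mmhadoopctl connector syncconf /usr/lpp/mmfs/hadoop/etc/hadoop
# /usr/lpp/mmfs/hadoop/sbin/mmhadoopctl connector stop
# /usr/lpp/mmfs/hadoop/sbin/mmhadoopctl connector start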
The HDFS Transparency daemons support multi-filesystem configurations that span local and remote filesystems:
# /usr/lpp/mmfs/hadoop/sbin/mmhadoopctl connector getstate
Node1 : namenode running as process 8280.
Node2 : datanode running as process 15192.
Node3 : datanode running as process 31595.
Node4 : datanode running as process 25271.
Node5 : datanode running as process 10777.
The HDFS Transparency Connector exposes the second mount point as a virtualized sub-directory under the base filesystem mount point, so that big data applications can use whichever filesystem best suits their needs and storage type.
# hadoop fs -ls /
Found 11 items
drwxrwxrwx - yarn hadoop 0 2017-12-04 11:07 /app-logs
drwxr-xr-x - hdfs root 0 2017-12-04 11:13 /apps
drwxr-xr-x - yarn hadoop 0 2017-12-04 11:07 /ats
drwxr-xr-x - hdfs hadoop 0 2017-11-27 07:10 /djremotegpfs1
drwxr-xr-x - hdfs root 0 2017-12-04 11:07 /hdp
drwxr-xr-x - mapred root 0 2017-12-04 11:07 /mapred
drwxrwxrwx - mapred hadoop 0 2017-12-04 11:07 /mr-history
drwxrwxrwx - spark hadoop 0 2017-12-04 11:18 /spark-history
drwxrwxrwx - spark hadoop 0 2017-12-04 11:18 /spark2-history
drwxrwxrwx - hdfs root 0 2017-12-04 11:10 /tmp
drwxr-xr-x - hdfs root 0 2017-12-04 11:11 /user
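Big data applications can therefore read and write to the remote filesystem through the same HDFS namespace. A small illustrative example (file names are hypothetical):
# hadoop fs -put /etc/hosts /djremotegpfs1/hosts-copy
# hadoop fs -ls /djremotegpfs1
The data lands in the remotely mounted Spectrum Scale filesystem and is also visible through its POSIX mount point /djremotegpfs1 on the cluster nodes (the exact sub-path depends on the connector's data-directory setting).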
DFSIO and TeraGen/TeraSort
The standard DFSIO read and write benchmarks can be run on Hortonworks HDP Hadoop clusters with Spectrum Scale in the integrated state.
DFSIO Write Throughput:
# yarn jar /usr/hdp/2.6.2.0-205/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 200MB 2>&1 | tee /tmp/TestDFSIO_write.deepak.txt
17/12/04 12:09:01 INFO fs.TestDFSIO: TestDFSIO.1.8
17/12/04 12:09:01 INFO fs.TestDFSIO: nrFiles = 10
17/12/04 12:09:01 INFO fs.TestDFSIO: nrBytes (MB) = 200.0
17/12/04 12:09:01 INFO fs.TestDFSIO: bufferSize = 1000000
17/12/04 12:09:01 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
17/12/04 12:09:01 INFO fs.TestDFSIO: creating control file: 209715200 bytes, 10 files
17/12/04 12:09:03 INFO fs.TestDFSIO: created control files for: 10 files
17/12/04 12:09:03 INFO client.RMProxy: Connecting to ResourceManager at c902f09x10.gpfs.net/172.16.1.53:8050
17/12/04 12:09:03 INFO client.AHSProxy: Connecting to Application History server at c902f09x10.gpfs.net/172.16.1.53:10200
17/12/04 12:09:03 INFO client.RMProxy: Connecting to ResourceManager at c902f09x10.gpfs.net/172.16.1.53:8050
17/12/04 12:09:03 INFO client.AHSProxy: Connecting to Application History server at c902f09x10.gpfs.net/172.16.1.53:10200
17/12/04 12:09:04 INFO mapred.FileInputFormat: Total input paths to process : 10
17/12/04 12:09:04 INFO mapreduce.JobSubmitter: number of splits:10
17/12/04 12:09:04 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1512403681949_0004
17/12/04 12:09:04 INFO impl.YarnClientImpl: Submitted application application_1512403681949_0004
17/12/04 12:09:04 INFO mapreduce.Job: The url to track the job: http://c902f09x10.gpfs.net:8088/proxy/application_1512403681949_0004/
17/12/04 12:09:04 INFO mapreduce.Job: Running job: job_1512403681949_0004
17/12/04 12:09:09 INFO mapreduce.Job: Job job_1512403681949_0004 running in uber mode : false
17/12/04 12:09:09 INFO mapreduce.Job: map 0% reduce 0%
17/12/04 12:09:20 INFO mapreduce.Job: map 20% reduce 0%
17/12/04 12:09:21 INFO mapreduce.Job: map 67% reduce 0%
17/12/04 12:09:23 INFO mapreduce.Job: map 70% reduce 0%
17/12/04 12:09:24 INFO mapreduce.Job: map 77% reduce 0%
17/12/04 12:09:27 INFO mapreduce.Job: map 83% reduce 0%
17/12/04 12:09:29 INFO mapreduce.Job: map 87% reduce 0%
17/12/04 12:09:31 INFO mapreduce.Job: map 90% reduce 0%
17/12/04 12:09:33 INFO mapreduce.Job: map 90% reduce 23%
17/12/04 12:09:36 INFO mapreduce.Job: map 93% reduce 23%
17/12/04 12:09:38 INFO mapreduce.Job: map 97% reduce 23%
17/12/04 12:09:39 INFO mapreduce.Job: map 97% reduce 30%
17/12/04 12:09:41 INFO mapreduce.Job: map 100% reduce 30%
17/12/04 12:09:42 INFO mapreduce.Job: map 100% reduce 100%
17/12/04 12:09:43 INFO mapreduce.Job: Job job_1512403681949_0004 completed successfully
17/12/04 12:09:43 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=856
FILE: Number of bytes written=1648330
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2450
HDFS: Number of bytes written=2097152079
HDFS: Number of read operations=43
HDFS: Number of large read operations=0
HDFS: Number of write operations=12
Job Counters
Launched map tasks=10
Launched reduce tasks=1
Data-local map tasks=10
Total time spent by all maps in occupied slots (ms)=1957054
Total time spent by all reduces in occupied slots (ms)=380820
Total time spent by all map tasks (ms)=177914
Total time spent by all reduce tasks (ms)=17310
Total vcore-milliseconds taken by all map tasks=177914
Total vcore-milliseconds taken by all reduce tasks=17310
Total megabyte-milliseconds taken by all map tasks=2004023296
Total megabyte-milliseconds taken by all reduce tasks=389959680
Map-Reduce Framework
Map input records=10
Map output records=50
Map output bytes=750
Map output materialized bytes=910
Input split bytes=1330
Combine input records=0
Combine output records=0
Reduce input groups=5
Reduce shuffle bytes=910
Reduce input records=50
Reduce output records=5
Spilled Records=100
Shuffled Maps =10
Failed Shuffles=0
Merged Map outputs=10
GC time elapsed (ms)=3866
CPU time spent (ms)=63920
Physical memory (bytes) snapshot=25849901056
Virtual memory (bytes) snapshot=139142180864
Total committed heap usage (bytes)=28442099712
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1120
File Output Format Counters
Bytes Written=79
17/12/04 12:09:43 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
17/12/04 12:09:43 INFO fs.TestDFSIO: Date & time: Mon Dec 04 12:09:43 EST 2017
17/12/04 12:09:43 INFO fs.TestDFSIO: Number of files: 10
17/12/04 12:09:43 INFO fs.TestDFSIO: Total MBytes processed: 2000.0
17/12/04 12:09:43 INFO fs.TestDFSIO: Throughput mb/sec: 13.672222146265433
17/12/04 12:09:43 INFO fs.TestDFSIO: Average IO rate mb/sec: 16.02884292602539
17/12/04 12:09:43 INFO fs.TestDFSIO: IO rate std deviation: 6.16775906029513
17/12/04 12:09:43 INFO fs.TestDFSIO: Test exec time sec: 40.033
17/12/04 12:09:43 INFO fs.TestDFSIO:
DFSIO Read Throughput:
# yarn jar /usr/hdp/2.6.2.0-205/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO -read -nrFiles 10 -fileSize 200MB 2>&1 | tee /tmp/TestDFSIO_read.deepak.txt
17/12/04 12:14:15 INFO fs.TestDFSIO: TestDFSIO.1.8
17/12/04 12:14:15 INFO fs.TestDFSIO: nrFiles = 10
17/12/04 12:14:15 INFO fs.TestDFSIO: nrBytes (MB) = 200.0
17/12/04 12:14:15 INFO fs.TestDFSIO: bufferSize = 1000000
17/12/04 12:14:15 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
17/12/04 12:14:15 INFO fs.TestDFSIO: creating control file: 209715200 bytes, 10 files
17/12/04 12:14:16 INFO fs.TestDFSIO: created control files for: 10 files
17/12/04 12:14:16 INFO client.RMProxy: Connecting to ResourceManager at c902f09x10.gpfs.net/172.16.1.53:8050
17/12/04 12:14:16 INFO client.AHSProxy: Connecting to Application History server at c902f09x10.gpfs.net/172.16.1.53:10200
17/12/04 12:14:17 INFO client.RMProxy: Connecting to ResourceManager at c902f09x10.gpfs.net/172.16.1.53:8050
17/12/04 12:14:17 INFO client.AHSProxy: Connecting to Application History server at c902f09x10.gpfs.net/172.16.1.53:10200
17/12/04 12:14:17 INFO mapred.FileInputFormat: Total input paths to process : 10
17/12/04 12:14:17 INFO mapreduce.JobSubmitter: number of splits:10
17/12/04 12:14:17 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1512403681949_0005
17/12/04 12:14:18 INFO impl.YarnClientImpl: Submitted application application_1512403681949_0005
17/12/04 12:14:18 INFO mapreduce.Job: The url to track the job: http://c902f09x10.gpfs.net:8088/proxy/application_1512403681949_0005/
17/12/04 12:14:18 INFO mapreduce.Job: Running job: job_1512403681949_0005
17/12/04 12:14:23 INFO mapreduce.Job: Job job_1512403681949_0005 running in uber mode : false
17/12/04 12:14:23 INFO mapreduce.Job: map 0% reduce 0%
17/12/04 12:14:28 INFO mapreduce.Job: map 10% reduce 0%
17/12/04 12:14:29 INFO mapreduce.Job: map 60% reduce 0%
17/12/04 12:14:30 INFO mapreduce.Job: map 70% reduce 0%
17/12/04 12:14:32 INFO mapreduce.Job: map 100% reduce 0%
17/12/04 12:14:33 INFO mapreduce.Job: map 100% reduce 100%
17/12/04 12:14:34 INFO mapreduce.Job: Job job_1512403681949_0005 completed successfully
17/12/04 12:14:34 INFO mapreduce.Job: Counters: 50
File System Counters
FILE: Number of bytes read=862
FILE: Number of bytes written=1648320
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2097154450
HDFS: Number of bytes written=81
HDFS: Number of read operations=53
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=10
Launched reduce tasks=1
Data-local map tasks=8
Rack-local map tasks=2
Total time spent by all maps in occupied slots (ms)=535414
Total time spent by all reduces in occupied slots (ms)=54450
Total time spent by all map tasks (ms)=48674
Total time spent by all reduce tasks (ms)=2475
Total vcore-milliseconds taken by all map tasks=48674
Total vcore-milliseconds taken by all reduce tasks=2475
Total megabyte-milliseconds taken by all map tasks=548263936
Total megabyte-milliseconds taken by all reduce tasks=55756800
Map-Reduce Framework
Map input records=10
Map output records=50
Map output bytes=756
Map output materialized bytes=916
Input split bytes=1330
Combine input records=0
Combine output records=0
Reduce input groups=5
Reduce shuffle bytes=916
Reduce input records=50
Reduce output records=5
Spilled Records=100
Shuffled Maps =10
Failed Shuffles=0
Merged Map outputs=10
GC time elapsed (ms)=676
CPU time spent (ms)=20380
Physical memory (bytes) snapshot=24607191040
Virtual memory (bytes) snapshot=139105824768
Total committed heap usage (bytes)=24529338368
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1120
File Output Format Counters
Bytes Written=81
17/12/04 12:14:34 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
17/12/04 12:14:34 INFO fs.TestDFSIO: Date & time: Mon Dec 04 12:14:34 EST 2017
17/12/04 12:14:34 INFO fs.TestDFSIO: Number of files: 10
17/12/04 12:14:34 INFO fs.TestDFSIO: Total MBytes processed: 2000.0
17/12/04 12:14:34 INFO fs.TestDFSIO: Throughput mb/sec: 120.43114349370747
17/12/04 12:14:34 INFO fs.TestDFSIO: Average IO rate mb/sec: 268.7469482421875
17/12/04 12:14:34 INFO fs.TestDFSIO: IO rate std deviation: 177.04005019489514
17/12/04 12:14:34 INFO fs.TestDFSIO: Test exec time sec: 18.119
17/12/04 12:14:34 INFO fs.TestDFSIO:
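Once benchmarking is complete, the generated test data under /benchmarks/TestDFSIO can be removed with the benchmark's own clean option:
# yarn jar /usr/hdp/2.6.2.0-205/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO -clean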
TeraGen benchmark for generating sample data:
# hadoop jar hadoop-mapreduce-examples.jar teragen 100000000 /user/djdeepak5/terasort-input
17/12/04 14:59:38 INFO client.RMProxy: Connecting to ResourceManager at c902f09x10.gpfs.net/172.16.1.53:8050
17/12/04 14:59:38 INFO client.AHSProxy: Connecting to Application History server at c902f09x10.gpfs.net/172.16.1.53:10200
17/12/04 14:59:54 INFO terasort.TeraSort: Generating 100000000 using 2
17/12/04 14:59:56 INFO mapreduce.JobSubmitter: number of splits:2
17/12/04 15:00:00 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1512403681949_0015
17/12/04 15:00:04 INFO impl.YarnClientImpl: Submitted application application_1512403681949_0015
17/12/04 15:00:04 INFO mapreduce.Job: The url to track the job: http://c902f09x10.gpfs.net:8088/proxy/application_1512403681949_0015/
17/12/04 15:00:04 INFO mapreduce.Job: Running job: job_1512403681949_0015
17/12/04 15:00:13 INFO mapreduce.Job: Job job_1512403681949_0015 running in uber mode : false
17/12/04 15:00:13 INFO mapreduce.Job: map 0% reduce 0%
17/12/04 15:00:24 INFO mapreduce.Job: map 1% reduce 0%
17/12/04 15:00:30 INFO mapreduce.Job: map 3% reduce 0%
17/12/04 15:00:36 INFO mapreduce.Job: map 4% reduce 0%
17/12/04 15:00:39 INFO mapreduce.Job: map 5% reduce 0%
17/12/04 15:00:42 INFO mapreduce.Job: map 8% reduce 0%
17/12/04 15:00:48 INFO mapreduce.Job: map 9% reduce 0%
17/12/04 15:00:51 INFO mapreduce.Job: map 11% reduce 0%
17/12/04 15:00:54 INFO mapreduce.Job: map 12% reduce 0%
17/12/04 15:01:03 INFO mapreduce.Job: map 15% reduce 0%
17/12/04 15:01:10 INFO mapreduce.Job: map 16% reduce 0%
17/12/04 15:01:12 INFO mapreduce.Job: map 18% reduce 0%
17/12/04 15:01:16 INFO mapreduce.Job: map 19% reduce 0%
17/12/04 15:01:18 INFO mapreduce.Job: map 20% reduce 0%
17/12/04 15:01:19 INFO mapreduce.Job: map 22% reduce 0%
17/12/04 15:01:24 INFO mapreduce.Job: map 23% reduce 0%
17/12/04 15:01:28 INFO mapreduce.Job: map 24% reduce 0%
17/12/04 15:01:30 INFO mapreduce.Job: map 26% reduce 0%
17/12/04 15:01:31 INFO mapreduce.Job: map 27% reduce 0%
17/12/04 15:01:40 INFO mapreduce.Job: map 28% reduce 0%
17/12/04 15:01:42 INFO mapreduce.Job: map 30% reduce 0%
17/12/04 15:01:45 INFO mapreduce.Job: map 31% reduce 0%
17/12/04 15:01:46 INFO mapreduce.Job: map 32% reduce 0%
17/12/04 15:01:57 INFO mapreduce.Job: map 34% reduce 0%
17/12/04 15:01:58 INFO mapreduce.Job: map 35% reduce 0%
17/12/04 15:02:01 INFO mapreduce.Job: map 36% reduce 0%
17/12/04 15:02:03 INFO mapreduce.Job: map 37% reduce 0%
17/12/04 15:02:04 INFO mapreduce.Job: map 38% reduce 0%
17/12/04 15:02:09 INFO mapreduce.Job: map 39% reduce 0%
17/12/04 15:02:10 INFO mapreduce.Job: map 40% reduce 0%
17/12/04 15:02:16 INFO mapreduce.Job: map 41% reduce 0%
17/12/04 15:02:18 INFO mapreduce.Job: map 43% reduce 0%
17/12/04 15:02:21 INFO mapreduce.Job: map 45% reduce 0%
17/12/04 15:02:24 INFO mapreduce.Job: map 46% reduce 0%
17/12/04 15:02:28 INFO mapreduce.Job: map 47% reduce 0%
17/12/04 15:02:31 INFO mapreduce.Job: map 48% reduce 0%
17/12/04 15:02:37 INFO mapreduce.Job: map 50% reduce 0%
17/12/04 15:02:40 INFO mapreduce.Job: map 51% reduce 0%
17/12/04 15:02:46 INFO mapreduce.Job: map 54% reduce 0%
17/12/04 15:02:52 INFO mapreduce.Job: map 55% reduce 0%
17/12/04 15:02:55 INFO mapreduce.Job: map 58% reduce 0%
17/12/04 15:03:01 INFO mapreduce.Job: map 59% reduce 0%
17/12/04 15:03:07 INFO mapreduce.Job: map 61% reduce 0%
17/12/04 15:03:13 INFO mapreduce.Job: map 62% reduce 0%
17/12/04 15:03:16 INFO mapreduce.Job: map 63% reduce 0%
17/12/04 15:03:19 INFO mapreduce.Job: map 65% reduce 0%
17/12/04 15:03:22 INFO mapreduce.Job: map 66% reduce 0%
17/12/04 15:03:28 INFO mapreduce.Job: map 67% reduce 0%
17/12/04 15:03:31 INFO mapreduce.Job: map 69% reduce 0%
17/12/04 15:03:34 INFO mapreduce.Job: map 70% reduce 0%
17/12/04 15:03:37 INFO mapreduce.Job: map 71% reduce 0%
17/12/04 15:03:40 INFO mapreduce.Job: map 73% reduce 0%
17/12/04 15:03:46 INFO mapreduce.Job: map 75% reduce 0%
17/12/04 15:03:49 INFO mapreduce.Job: map 77% reduce 0%
17/12/04 15:03:55 INFO mapreduce.Job: map 79% reduce 0%
17/12/04 15:03:58 INFO mapreduce.Job: map 81% reduce 0%
17/12/04 15:04:04 INFO mapreduce.Job: map 82% reduce 0%
17/12/04 15:04:07 INFO mapreduce.Job: map 83% reduce 0%
17/12/04 15:04:10 INFO mapreduce.Job: map 85% reduce 0%
17/12/04 15:04:13 INFO mapreduce.Job: map 87% reduce 0%
17/12/04 15:04:19 INFO mapreduce.Job: map 89% reduce 0%
17/12/04 15:04:22 INFO mapreduce.Job: map 90% reduce 0%
17/12/04 15:04:28 INFO mapreduce.Job: map 93% reduce 0%
17/12/04 15:04:34 INFO mapreduce.Job: map 94% reduce 0%
17/12/04 15:04:38 INFO mapreduce.Job: map 95% reduce 0%
17/12/04 15:04:41 INFO mapreduce.Job: map 96% reduce 0%
17/12/04 15:04:44 INFO mapreduce.Job: map 97% reduce 0%
17/12/04 15:04:53 INFO mapreduce.Job: map 98% reduce 0%
17/12/04 15:04:59 INFO mapreduce.Job: map 100% reduce 0%
17/12/04 15:05:18 INFO mapreduce.Job: Job job_1512403681949_0015 completed successfully
17/12/04 15:05:18 INFO mapreduce.Job: Counters: 31
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=298084
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=170
HDFS: Number of bytes written=10000000000
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Job Counters
Launched map tasks=2
Other local map tasks=2
Total time spent by all maps in occupied slots (ms)=6141058
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=558278
Total vcore-milliseconds taken by all map tasks=558278
Total megabyte-milliseconds taken by all map tasks=6288443392
Map-Reduce Framework
Map input records=100000000
Map output records=100000000
Input split bytes=170
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=716
CPU time spent (ms)=121900
Physical memory (bytes) snapshot=586973184
Virtual memory (bytes) snapshot=23502049280
Total committed heap usage (bytes)=402128896
org.apache.hadoop.examples.terasort.TeraGen$Counters
CHECKSUM=214760662691937609
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=10000000000
TeraSort on the generated data:
# hadoop jar hadoop-mapreduce-examples.jar terasort /user/djdeepak5/terasort-input /user/djdeepak5/terasort-output
17/12/04 15:08:41 INFO terasort.TeraSort: starting
17/12/04 15:08:42 INFO input.FileInputFormat: Total input paths to process : 2
Spent 182ms computing base-splits.
Spent 3ms computing TeraScheduler splits.
Computing input splits took 185ms
Sampling 10 splits of 76
Making 1 from 100000 sampled records
Computing parititions took 9525ms
Spent 9713ms computing partitions.
17/12/04 15:08:51 INFO client.RMProxy: Connecting to ResourceManager at c902f09x10.gpfs.net/172.16.1.53:8050
17/12/04 15:08:51 INFO client.AHSProxy: Connecting to Application History server at c902f09x10.gpfs.net/172.16.1.53:10200
17/12/04 15:09:05 INFO mapreduce.JobSubmitter: number of splits:76
17/12/04 15:09:08 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1512403681949_0016
17/12/04 15:09:14 INFO impl.YarnClientImpl: Submitted application application_1512403681949_0016
17/12/04 15:09:14 INFO mapreduce.Job: The url to track the job: http://c902f09x10.gpfs.net:8088/proxy/application_1512403681949_0016/
17/12/04 15:09:14 INFO mapreduce.Job: Running job: job_1512403681949_0016
17/12/04 15:09:21 INFO mapreduce.Job: Job job_1512403681949_0016 running in uber mode : false
17/12/04 15:09:21 INFO mapreduce.Job: map 0% reduce 0%
17/12/04 15:09:30 INFO mapreduce.Job: map 1% reduce 0%
17/12/04 15:09:33 INFO mapreduce.Job: map 4% reduce 0%
17/12/04 15:09:35 INFO mapreduce.Job: map 8% reduce 0%
17/12/04 15:09:37 INFO mapreduce.Job: map 10% reduce 0%
17/12/04 15:09:38 INFO mapreduce.Job: map 11% reduce 0%
17/12/04 15:09:42 INFO mapreduce.Job: map 12% reduce 0%
17/12/04 15:09:43 INFO mapreduce.Job: map 15% reduce 0%
17/12/04 15:09:44 INFO mapreduce.Job: map 18% reduce 0%
17/12/04 15:09:47 INFO mapreduce.Job: map 19% reduce 0%
17/12/04 15:09:50 INFO mapreduce.Job: map 20% reduce 0%
17/12/04 15:09:51 INFO mapreduce.Job: map 26% reduce 0%
17/12/04 15:09:52 INFO mapreduce.Job: map 28% reduce 0%
17/12/04 15:09:55 INFO mapreduce.Job: map 29% reduce 0%
17/12/04 15:09:58 INFO mapreduce.Job: map 33% reduce 0%
17/12/04 15:09:59 INFO mapreduce.Job: map 36% reduce 0%
17/12/04 15:10:00 INFO mapreduce.Job: map 37% reduce 0%
17/12/04 15:10:03 INFO mapreduce.Job: map 38% reduce 4%
17/12/04 15:10:05 INFO mapreduce.Job: map 39% reduce 4%
17/12/04 15:10:06 INFO mapreduce.Job: map 41% reduce 4%
17/12/04 15:10:07 INFO mapreduce.Job: map 43% reduce 4%
17/12/04 15:10:08 INFO mapreduce.Job: map 46% reduce 4%
17/12/04 15:10:09 INFO mapreduce.Job: map 46% reduce 5%
17/12/04 15:10:12 INFO mapreduce.Job: map 47% reduce 6%
17/12/04 15:10:15 INFO mapreduce.Job: map 48% reduce 7%
17/12/04 15:10:16 INFO mapreduce.Job: map 50% reduce 7%
17/12/04 15:10:17 INFO mapreduce.Job: map 51% reduce 7%
17/12/04 15:10:18 INFO mapreduce.Job: map 53% reduce 8%
17/12/04 15:10:19 INFO mapreduce.Job: map 55% reduce 8%
17/12/04 15:10:21 INFO mapreduce.Job: map 57% reduce 8%
17/12/04 15:10:22 INFO mapreduce.Job: map 57% reduce 9%
17/12/04 15:10:23 INFO mapreduce.Job: map 58% reduce 9%
17/12/04 15:10:25 INFO mapreduce.Job: map 59% reduce 10%
17/12/04 15:10:26 INFO mapreduce.Job: map 61% reduce 10%
17/12/04 15:10:28 INFO mapreduce.Job: map 61% reduce 11%
17/12/04 15:10:29 INFO mapreduce.Job: map 66% reduce 11%
17/12/04 15:10:31 INFO mapreduce.Job: map 67% reduce 12%
17/12/04 15:10:33 INFO mapreduce.Job: map 68% reduce 12%
17/12/04 15:10:34 INFO mapreduce.Job: map 69% reduce 12%
17/12/04 15:10:35 INFO mapreduce.Job: map 70% reduce 12%
17/12/04 15:10:37 INFO mapreduce.Job: map 72% reduce 14%
17/12/04 15:10:39 INFO mapreduce.Job: map 75% reduce 14%
17/12/04 15:10:40 INFO mapreduce.Job: map 76% reduce 14%
17/12/04 15:10:42 INFO mapreduce.Job: map 77% reduce 14%
17/12/04 15:10:45 INFO mapreduce.Job: map 78% reduce 14%
17/12/04 15:10:46 INFO mapreduce.Job: map 78% reduce 15%
17/12/04 15:10:47 INFO mapreduce.Job: map 82% reduce 15%
17/12/04 15:10:48 INFO mapreduce.Job: map 83% reduce 15%
17/12/04 15:10:49 INFO mapreduce.Job: map 86% reduce 17%
17/12/04 15:10:52 INFO mapreduce.Job: map 86% reduce 18%
17/12/04 15:10:53 INFO mapreduce.Job: map 87% reduce 18%
17/12/04 15:10:55 INFO mapreduce.Job: map 89% reduce 18%
17/12/04 15:10:56 INFO mapreduce.Job: map 91% reduce 18%
17/12/04 15:10:58 INFO mapreduce.Job: map 91% reduce 19%
17/12/04 15:10:59 INFO mapreduce.Job: map 93% reduce 19%
17/12/04 15:11:00 INFO mapreduce.Job: map 96% reduce 19%
17/12/04 15:11:01 INFO mapreduce.Job: map 96% reduce 20%
17/12/04 15:11:03 INFO mapreduce.Job: map 97% reduce 20%
17/12/04 15:11:04 INFO mapreduce.Job: map 100% reduce 22%
17/12/04 15:11:10 INFO mapreduce.Job: map 100% reduce 23%
17/12/04 15:11:13 INFO mapreduce.Job: map 100% reduce 24%
17/12/04 15:11:16 INFO mapreduce.Job: map 100% reduce 25%
17/12/04 15:11:22 INFO mapreduce.Job: map 100% reduce 26%
17/12/04 15:11:25 INFO mapreduce.Job: map 100% reduce 27%
17/12/04 15:11:31 INFO mapreduce.Job: map 100% reduce 28%
17/12/04 15:11:34 INFO mapreduce.Job: map 100% reduce 29%
17/12/04 15:11:37 INFO mapreduce.Job: map 100% reduce 30%
17/12/04 15:11:43 INFO mapreduce.Job: map 100% reduce 31%
17/12/04 15:11:49 INFO mapreduce.Job: map 100% reduce 32%
17/12/04 15:11:55 INFO mapreduce.Job: map 100% reduce 33%
17/12/04 15:12:35 INFO mapreduce.Job: map 100% reduce 38%
17/12/04 15:12:38 INFO mapreduce.Job: map 100% reduce 48%
17/12/04 15:12:41 INFO mapreduce.Job: map 100% reduce 57%
17/12/04 15:12:44 INFO mapreduce.Job: map 100% reduce 66%
17/12/04 15:12:47 INFO mapreduce.Job: map 100% reduce 67%
17/12/04 15:13:08 INFO mapreduce.Job: map 100% reduce 68%
17/12/04 15:13:17 INFO mapreduce.Job: map 100% reduce 69%
17/12/04 15:13:23 INFO mapreduce.Job: map 100% reduce 70%
17/12/04 15:13:29 INFO mapreduce.Job: map 100% reduce 71%
17/12/04 15:13:32 INFO mapreduce.Job: map 100% reduce 72%
17/12/04 15:13:38 INFO mapreduce.Job: map 100% reduce 73%
17/12/04 15:13:50 INFO mapreduce.Job: map 100% reduce 74%
17/12/04 15:13:56 INFO mapreduce.Job: map 100% reduce 75%
17/12/04 15:14:06 INFO mapreduce.Job: map 100% reduce 76%
17/12/04 15:14:09 INFO mapreduce.Job: map 100% reduce 77%
17/12/04 15:14:21 INFO mapreduce.Job: map 100% reduce 78%
17/12/04 15:14:27 INFO mapreduce.Job: map 100% reduce 79%
17/12/04 15:14:33 INFO mapreduce.Job: map 100% reduce 80%
17/12/04 15:14:39 INFO mapreduce.Job: map 100% reduce 81%
17/12/04 15:14:42 INFO mapreduce.Job: map 100% reduce 82%
17/12/04 15:14:51 INFO mapreduce.Job: map 100% reduce 83%
17/12/04 15:15:00 INFO mapreduce.Job: map 100% reduce 84%
17/12/04 15:15:03 INFO mapreduce.Job: map 100% reduce 85%
17/12/04 15:15:15 INFO mapreduce.Job: map 100% reduce 86%
17/12/04 15:15:18 INFO mapreduce.Job: map 100% reduce 87%
17/12/04 15:15:24 INFO mapreduce.Job: map 100% reduce 88%
17/12/04 15:15:30 INFO mapreduce.Job: map 100% reduce 89%
17/12/04 15:15:39 INFO mapreduce.Job: map 100% reduce 90%
17/12/04 15:15:48 INFO mapreduce.Job: map 100% reduce 91%
17/12/04 15:15:57 INFO mapreduce.Job: map 100% reduce 92%
17/12/04 15:16:00 INFO mapreduce.Job: map 100% reduce 93%
17/12/04 15:16:09 INFO mapreduce.Job: map 100% reduce 94%
17/12/04 15:16:15 INFO mapreduce.Job: map 100% reduce 95%
17/12/04 15:16:21 INFO mapreduce.Job: map 100% reduce 96%
17/12/04 15:16:31 INFO mapreduce.Job: map 100% reduce 97%
17/12/04 15:16:40 INFO mapreduce.Job: map 100% reduce 98%
17/12/04 15:16:46 INFO mapreduce.Job: map 100% reduce 99%
17/12/04 15:16:49 INFO mapreduce.Job: map 100% reduce 100%
17/12/04 15:16:57 INFO mapreduce.Job: Job job_1512403681949_0016 completed successfully
17/12/04 15:16:57 INFO mapreduce.Job: Counters: 50
File System Counters
FILE: Number of bytes read=10400000012
FILE: Number of bytes written=20811586438
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=10000010564
HDFS: Number of bytes written=10000000000
HDFS: Number of read operations=231
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=76
Launched reduce tasks=1
Data-local map tasks=75
Rack-local map tasks=1
Total time spent by all maps in occupied slots (ms)=7861986
Total time spent by all reduces in occupied slots (ms)=9260416
Total time spent by all map tasks (ms)=714726
Total time spent by all reduce tasks (ms)=420928
Total vcore-milliseconds taken by all map tasks=714726
Total vcore-milliseconds taken by all reduce tasks=420928
Total megabyte-milliseconds taken by all map tasks=8050673664
Total megabyte-milliseconds taken by all reduce tasks=9482665984
Map-Reduce Framework
Map input records=100000000
Map output records=100000000
Map output bytes=10200000000
Map output materialized bytes=10400000456
Input split bytes=10564
Combine input records=0
Combine output records=0
Reduce input groups=100000000
Reduce shuffle bytes=10400000456
Reduce input records=100000000
Reduce output records=100000000
Spilled Records=200000000
Shuffled Maps =76
Failed Shuffles=0
Merged Map outputs=76
GC time elapsed (ms)=30652
CPU time spent (ms)=882600
Physical memory (bytes) snapshot=199282565120
Virtual memory (bytes) snapshot=914234179584
Total committed heap usage (bytes)=219659370496
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=10000000000
File Output Format Counters
Bytes Written=10000000000
17/12/04 15:16:57 INFO terasort.TeraSort: done
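The sorted output can optionally be checked with TeraValidate, which verifies the ordering of the TeraSort output and writes a small report; the paths below follow the same convention as the runs above.
# hadoop jar hadoop-mapreduce-examples.jar teravalidate /user/djdeepak5/terasort-output /user/djdeepak5/terasort-validate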
Conclusion
IBM Spectrum Scale provides an enterprise-grade alternative to the HDFS filesystem used in big data clusters. HDFS Transparency supports multiple filesystems, which helps leverage filesystems created on different storage media and in different locations, reducing the need for frequent data migrations. Remote cluster mount support enables in-place analytics on remotely mounted filesystem data as well.
Multi-protocol support in the IBM Spectrum Scale filesystem makes ingesting and operating on data easier.
Hortonworks HDP support for the IBM Spectrum Scale filesystem will allow many existing users to perform data analytics on their existing filesystem data without having to migrate it to HDFS.
Related Posts:
Top Five Benefits of IBM Spectrum Scale with Hortonworks Data Platform
IBM Spectrum Scale and Hortonworks HDP for Winning Big Data Plays
Deploying IBM Spectrum Scale File System using Apache Ambari framework on Hadoop clusters
Big Blue Dancing the Hadoop Dance with Hortonworks
https://hortonworks.com/partner/ibm/
hdp-ibm-spectrum-scale-brings-enterprise-class-storage-place-analytics/
Remote Mount and Multifilesystem support in IBM Spectrum Scale.
IBM Spectrum Scale Performance Tuning
IBM Spectrum Scale System Workloads Tuning in shared nothing cluster
IBM Spectrum Scale system Spark Workloads Tuning
IBM Spectrum Scale system database workloads tuning
IBM Spectrum Scale system performance tuning for hadoop workloads
IBM Spectrum Scale system HDFS Transparency Federation support
IBM Spectrum Scale system HDFS Transparency short-circuit write support.
References:
IBM Spectrum Scale Hadoop Integration and Support for HortonWorks HDP
HDFS Transparency Protocol
IBM Knowledge Center ( Big data and analytics )
IBM Elastic Storage Server
Apache Ambari Project
Adding IBM Spectrum Scale Service to HDP cluster using existing ESS cluster
Hortonworks Data Platform with IBM Spectrum Scale
Mounting a Remote Spectrum Scale Filesystem