The ever-expanding and accelerating generation of big data demands more efficient and robust storage filesystems. Migrating data across multiple storage media is proving to be expensive. There is a pressing need to perform in-place data analytics and to provide a single unified namespace across all types of storage media. The IBM Spectrum Scale file system and IBM Elastic Storage Server (ESS) address these requirements, and the IBM Spectrum Scale filesystem is officially certified as a storage offering for the Hortonworks HDP Hadoop distribution.
IBM Spectrum Scale offers numerous advantages over HDFS, the default storage for Hortonworks HDP clusters:
- Reduced data center footprint: no data copying or migration is required to run Hadoop analytics.
- Unified access to data using different protocols such as File, Block and Object.
- Management of geographically dispersed data, including disaster recovery.
- In-place data analytics: Spectrum Scale is POSIX compatible, which supports a wide range of applications and workloads. With the Spectrum Scale HDFS Transparency Connector, you can analyze file and object data in place with no data transfer or data movement.
- Flexible deployment modes: you can run IBM Spectrum Scale on storage-rich commodity servers, choose IBM Elastic Storage Server (ESS) as a higher-performance massive storage system for your Hadoop workload, or even deploy Spectrum Scale on a traditional SAN storage system for HDP.
- Enterprise-class data management features, including POSIX-compliant APIs and command line, unified File and Object support (NFS, SMB, Object), FIPS and NIST compliant data encryption, cold data compression, disaster recovery, snapshot support for point-in-time data captures, policy-based information lifecycle management capabilities to manage PBs of data, mature enterprise-level data backup and archive solutions (including tape), remote cluster mount support, mixed filesystem support, and seamless secure tiering to cloud object stores.
[caption id="attachment_3752" align="alignnone" width="1436"]
IBM Spectrum Scale provides unified access using different protocols and a single namespace across all kinds of storage media.[/caption]
HDFS Protocol for Spectrum Scale
Spectrum Scale provides a mechanism for accessing and ingesting data through the HDFS RPC protocol, using the HDFS file system utilities and RPC daemons such as the NameNode and DataNodes. This is achieved with the HDFS Transparency Connector, which redirects all I/O traffic from native HDFS to the Spectrum Scale file system. This allows any big data application to run seamlessly on top of the Spectrum Scale file system without any changes to the application logic.
[caption id="attachment_4442" align="alignnone" width="874"]
Comparison of Spectrum Scale HDFS Transparency Connector Architecture with Native HDFS RPC.[/caption]
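Because the connector implements the HDFS RPC protocol, existing Hadoop clients and the hadoop fs utility work unchanged. A minimal sketch (file and directory names are illustrative; the cluster shown later in this post uses /bigpfs as the Spectrum Scale mount point):
# hadoop fs -mkdir -p /tmp/demo
# hadoop fs -put /etc/hosts /tmp/demo/hosts-copy
# hadoop fs -ls /tmp/demo
The same file is also accessible through the Spectrum Scale POSIX interface under the filesystem mount point, at a path determined by the connector's data-directory setting, which is what enables in-place analytics on data ingested through NFS, SMB, Object or POSIX.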
Spectrum Scale File System Configurations on Big Data Clusters
Hortonworks HDP uses Apache Ambari for configuring, managing and creating big data clusters. The Ambari server also provides an easy-to-use GUI wizard for adding a new service to an existing cluster. The Ambari integration module links the Spectrum Scale service to the Ambari server, so that the Add Service wizard of the Ambari GUI can be used for creating the cluster.
Spectrum Scale supports the following types of configuration in the cluster:
1. Shared Nothing Architecture (FPO, File Placement Optimizer)
The Spectrum Scale FPO configuration makes use of the local disks attached to each node of the cluster. In FPO-enabled Spectrum Scale deployments, a physical disk and a Network Shared Disk (NSD) can have a one-to-one mapping. In this case, each node in the cluster is an NSD server providing access to its disks for the rest of the cluster. The NSD configuration file specifies the topology of each node of the cluster (an illustrative stanza sketch follows the figure below).
[caption id="attachment_5027" align="alignnone" width="757"]
Shared Nothing Configuration using Local Disks.[/caption]
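For reference, an NSD stanza file for this kind of setup might look roughly like the following. This is only an illustrative sketch: device names, NSD names, server names and failure groups are hypothetical, and a production FPO deployment would typically also include %pool stanzas with FPO-specific write-affinity options.
%nsd: device=/dev/sdb nsd=node1_sdb servers=node1 usage=dataAndMetadata failureGroup=1 pool=system
%nsd: device=/dev/sdc nsd=node1_sdc servers=node1 usage=dataOnly failureGroup=1 pool=datapool
%nsd: device=/dev/sdb nsd=node2_sdb servers=node2 usage=dataAndMetadata failureGroup=2 pool=system
The NSDs and the filesystem would then be created from the stanza file, for example:
# /usr/lpp/mmfs/bin/mmcrnsd -F /tmp/nsd_stanza.txt
# /usr/lpp/mmfs/bin/mmcrfs bigpfs -F /tmp/nsd_stanza.txt -T /bigpfs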
2. Shared Storage Configuration (ESS, Elastic Storage Server)
This configuration allows a local FPO-enabled Spectrum Scale cluster to be added as part of an ESS (Elastic Storage Server) file system.
[caption id="attachment_5028" align="alignnone" width="862"]
Shared Storage Configuration[/caption]
3. Remote Mount Configuration
This configuration allows the ESS Spectrum Scale file system to be mounted on the local cluster.
[caption id="attachment_5029" align="alignnone" width="840"]
Remote Cluster Mount Support[/caption]
4. Mixed Configuration Support
This configuration allows a local FPO filesystem to coexist with an external cluster, which can be either an ESS file system or a different Spectrum Scale cluster, mounted on the local cluster. Big data applications can then use the remotely mounted filesystem as well (a command-level sketch of the remote mount setup follows).
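For the remote mount and mixed configurations, the client-side setup on the local cluster roughly follows the pattern below. This is a hedged sketch: it assumes the two clusters have already exchanged and authorized keys with mmauth, and the contact node names and key file name are placeholders. The device, remote cluster and mount point names match the example shown later in this post.
# /usr/lpp/mmfs/bin/mmremotecluster add bigpfs.gpfs.net -n essio1,essio2 -k bigpfs.gpfs.net_key.pub
# /usr/lpp/mmfs/bin/mmremotefs add djremotegpfs1 -f bigpfs -C bigpfs.gpfs.net -T /djremotegpfs1 -A no
# /usr/lpp/mmfs/bin/mmmount djremotegpfs1 -a
# /usr/lpp/mmfs/bin/mmremotefs show all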
Deploying Spectrum Scale File System in Hortonworks using Ambari Blueprints
Ambari Blueprints provide a way to deploy a cluster using all the configurations of an existing cluster. Exporting the configuration of the whole cluster therefore makes it possible to deploy a new cluster with all of its services, including Spectrum Scale, in a single step without major manual intervention. A sketch of the Blueprint REST calls follows.
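A hedged sketch of the Blueprint workflow using the Ambari REST API; server addresses, cluster names, blueprint names and file names are placeholders, and the host-mapping template is specific to your cluster.
Export the configuration of an existing cluster as a blueprint:
# curl -u admin:<password> -H 'X-Requested-By: ambari' http://<ambari-server>:8080/api/v1/clusters/<existing-cluster>?format=blueprint -o bp.json
Register the blueprint on the target Ambari server and create the new cluster from it together with a host-mapping template:
# curl -u admin:<password> -H 'X-Requested-By: ambari' -X POST -d @bp.json http://<ambari-server>:8080/api/v1/blueprints/scale-bp
# curl -u admin:<password> -H 'X-Requested-By: ambari' -X POST -d @cluster-template.json http://<ambari-server>:8080/api/v1/clusters/<new-cluster>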
Spectrum Scale Ambari Management Pack
The Spectrum Scale Ambari management pack (MPack) provides a seamless way to register the Spectrum Scale filesystem with an existing Hortonworks Hadoop cluster. It enables the Ambari server GUI wizard to install the Spectrum Scale service on the existing HDP cluster.
The management pack is delivered as a tar package:
SpectrumScaleMPack-2.4.2.1-noarch.tar.gz
This tar package contains the installer, uninstaller, and MPack upgrade scripts.
$ tar -xvzf SpectrumScaleMPack-2.4.2.1-noarch.tar.gz
./SpectrumScaleExtension-MPack-2.4.2.1.tar.gz
./SpectrumScaleIntegrationPackageInstaller-2.4.2.1.bin
./SpectrumScaleMPackInstaller.py
./SpectrumScaleMPackUninstaller.py
./SpectrumScale_UpgradeIntegrationPackage
./sum.txt
[caption id="attachment_5062" align="alignnone" width="782"] Hortonworks HDP Stack before adding Spectrum Scale MPack[/caption]
The MPack installer binary adds the Spectrum Scale service to the existing cluster and creates the extension links required to link it to the current stack in the HDP cluster:
[root@c902f09x09 ~]# ./SpectrumScaleIntegrationPackageInstaller-2.4.2.1.bin
Apache License Agreement ...........................
....................................................
....................................................
Do you agree to the above license terms? [yes or no]
yes
Installing...
INFO: ***Starting the Mpack Installer***
Enter Ambari Server Port Number. If it is not entered, the installer will take default port 8080 :
INFO: Taking default port 8080 as Ambari Server Port Number.
Enter Ambari Server IP Address. Default=127.0.0.1 :
INFO: Ambari Server IP Address not provided. Taking default Amabri Server IP Address as "127.0.0.1".
Enter Ambari Server Username, default=admin :
INFO: Taking default username "admin" as Ambari Server Username.
Enter Ambari Server Password :
INFO: Verifying Ambari Server Address, Username and Password.
INFO: Verification Successful.
INFO: Adding Spectrum Scale MPack : ambari-server install-mpack --mpack=SpectrumScaleExtension-MPack-2.4.2.1.tar.gz -v
INFO: Spectrum Scale MPack Successfully Added. Continuing with Ambari Server Restart...
INFO: Performing Ambari Server Restart.
INFO: Ambari Server Restart Completed Successfully.
INFO: Running command - curl -u admin:******* -H 'X-Requested-By: ambari' -X POST -d '{"ExtensionLink": {"stack_name":"HDP", "stack_version": "2.6", "extension_name": "SpectrumScaleExtension", "extension_version": "2.4.2.1"}}' http://127.0.0.1:8080/api/v1/links/
INFO: Extension Link Created Successfully.
INFO: Starting Spectrum Scale Changes.
INFO: Spectrum Scale Changes Successfully Completed.
INFO: Performing Ambari Server Restart.
INFO: Ambari Server Restart Completed Successfully.
INFO: Backing up original HDFS files to hdfs-original-files-backup
INFO: Running command cp -f -r -p -u /var/lib/ambari-server/resources/common-services/HDFS/2.1.0.2.0/package/scripts/ hdfs-original-files-backup
Done.
[root@c902f09x09 ~]#
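The extension link created by the installer can also be confirmed through the Ambari REST API, using the same endpoint the installer posted to above (credentials and address as configured during installation):
# curl -u admin:<password> -H 'X-Requested-By: ambari' http://127.0.0.1:8080/api/v1/links/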
This allows the Spectrum Scale service to be listed in the Add Service wizard of the Ambari server.
[caption id="attachment_5058" align="alignnone" width="1285"] Spectrum Scale Service Listed after Mpack Installation.[/caption]
The Add Service Wizard helps in configuring the Spectrum Scale Service.
[caption id="attachment_5065" align="alignnone" width="1281"] Assignment of Spectrum Scale nodes on the cluster.[/caption]
[caption id="attachment_5066" align="alignnone" width="1276"] Customize Service Panel[/caption]
[caption id="attachment_5067" align="alignnone" width="1279"] Spectrum Scale Parameters Configuration [/caption]
[caption id="attachment_5071" align="alignnone" width="1278"] Spectrum Scale Service Installation on the cluster nodes.[/caption]
[caption id="attachment_5073" align="alignnone" width="1280"] Service Addition completed.[/caption]
[caption id="attachment_5129" align="alignnone" width="779"] Spectrum Scale Filesystem Added as a service in the existing HDP cluster.[/caption]
The HDFS Transparency daemon status can also be verified:
# /usr/lpp/mmfs/hadoop/sbin/mmhadoopctl connector getstate
Node1 : namenode running as process 8280.
Node2 : datanode running as process 15192.
Node3 : datanode running as process 31595.
Node4 : datanode running as process 25271.
Node5 : datanode running as process 10777.
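If the connector daemons are not running on some nodes, the same utility can start or stop them; a sketch assuming the default HDFS Transparency installation path (verify the subcommands against your installed version):
# /usr/lpp/mmfs/hadoop/sbin/mmhadoopctl connector start
# /usr/lpp/mmfs/hadoop/sbin/mmhadoopctl connector stop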
The Spectrum Scale cluster configuration can also be verified:
# /usr/lpp/mmfs/bin/mmlscluster
GPFS cluster information
========================
GPFS cluster name: Djbigpfs.gpfs.net
GPFS cluster id: 12888843454012907741
GPFS UID domain: Djbigpfs.gpfs.net
Remote shell command: /usr/bin/ssh
Remote file copy command: /usr/bin/scp
Repository type: CCR
Node Daemon node name IP address Admin node name Designation
--------------------------------------------------------------------------
1 c902f09x09.gpfs.net 172.16.1.51 c902f09x09.gpfs.net quorum
2 c902f09x12.gpfs.net 172.16.1.57 c902f09x12.gpfs.net quorum
3 c902f09x10.gpfs.net 172.16.1.53 c902f09x10.gpfs.net
4 c902f09x11.gpfs.net 172.16.1.55 c902f09x11.gpfs.net quorum
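As a quick health check in addition to mmlscluster, the GPFS daemon state on every node can also be queried:
# /usr/lpp/mmfs/bin/mmgetstate -a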
[caption id="attachment_5354" align="alignnone" width="1201"] Spectrum Scale Service Panel in Integrated state[/caption]
# /usr/lpp/mmfs/bin/mmlsfs all
File system attributes for /dev/bigpfs:
=======================================
flag value description
------------------- ------------------------ -----------------------------------
-f 8192 Minimum fragment size in bytes (system pool)
65536 Minimum fragment size in bytes (other pools)
-i 4096 Inode size in bytes
-I 32768 Indirect block size in bytes
-m 3 Default number of metadata replicas
-M 3 Maximum number of metadata replicas
-r 3 Default number of data replicas
-R 3 Maximum number of data replicas
-j scatter Block allocation type
-D nfs4 File locking semantics in effect
-k all ACL semantics in effect
-n 32 Estimated number of nodes that will mount file system
-B 262144 Block size (system pool)
2097152 Block size (other pools)
-Q none Quotas accounting enabled
none Quotas enforced
none Default quotas enabled
--perfileset-quota No Per-fileset quota enforcement
--filesetdf No Fileset df enabled?
-V 17.00 (4.2.3.0) File system version
--create-time Fri Nov 17 08:20:27 2017 File system creation time
-z No Is DMAPI enabled?
-L 16252928 Logfile size
-E No Exact mtime mount option
-S relatime Suppress atime mount option
-K whenpossible Strict replica allocation option
--fastea Yes Fast external attributes enabled?
--encryption No Encryption enabled?
--inode-limit 3948224 Maximum number of inodes
--log-replicas 0 Number of log replicas
--is4KAligned Yes is4KAligned?
--rapid-repair Yes rapidRepair enabled?
--write-cache-threshold 0 HAWC Threshold (max 65536)
--subblocks-per-full-block 32 Number of subblocks per full block
-P system;datapool Disk storage pools in file system
-d gpfs1nsd;gpfs2nsd;gpfs3nsd;gpfs4nsd;gpfs5nsd;gpfs6nsd;gpfs7nsd;gpfs8nsd;gpfs9nsd;gpfs10nsd;gpfs11nsd;gpfs12nsd;gpfs13nsd;gpfs14nsd;gpfs15nsd;gpfs16nsd Disks in file system
-A yes Automatic mount option
-o none Additional mount options
-T /bigpfs Default mount point
--mount-priority 0 Mount priority
[caption id="attachment_5352" align="alignnone" width="1275"] HDFS Service Panel in Spectrum Scale Integrated State[/caption]
Remote Mount and Multi-Filesystem Support
A Spectrum Scale cluster can have multiple local filesystems as well as remotely mounted filesystems. The HDFS Transparency Connector supports multiple remotely mounted filesystems as well.
For example, if you have a remote filesystem mounted on your cluster:
# mmremotefs show all
Local Name Remote Name Cluster name Mount Point Mount Options Automount Drive Priority
djremotegpfs1 bigpfs bigpfs.gpfs.net /djremotegpfs1 rw no - 0
When multiple filesystems are present in the cluster, one local and one remotely mounted, both can be listed:
# mmlsfs all
File system attributes for /dev/bigpfs:
=======================================
flag value description
------------------- ------------------------ -----------------------------------
-f 8192 Minimum fragment size in bytes (system pool)
65536 Minimum fragment size in bytes (other pools)
-i 4096 Inode size in bytes
-I 32768 Indirect block size in bytes
-m 3 Default number of metadata replicas
-M 3 Maximum number of metadata replicas
-r 3 Default number of data replicas
-R 3 Maximum number of data replicas
-j scatter Block allocation type
-D nfs4 File locking semantics in effect
-k all ACL semantics in effect
-n 32 Estimated number of nodes that will mount file system
-B 262144 Block size (system pool)
2097152 Block size (other pools)
-Q none Quotas accounting enabled
none Quotas enforced
none Default quotas enabled
--perfileset-quota No Per-fileset quota enforcement
--filesetdf No Fileset df enabled?
-V 17.00 (4.2.3.0) File system version
--create-time Fri Nov 17 08:20:27 2017 File system creation time
-z No Is DMAPI enabled?
-L 16252928 Logfile size
-E No Exact mtime mount option
-S relatime Suppress atime mount option
-K whenpossible Strict replica allocation option
--fastea Yes Fast external attributes enabled?
--encryption No Encryption enabled?
--inode-limit 3948224 Maximum number of inodes
--log-replicas 0 Number of log replicas
--is4KAligned Yes is4KAligned?
--rapid-repair Yes rapidRepair enabled?
--write-cache-threshold 0 HAWC Threshold (max 65536)
--subblocks-per-full-block 32 Number of subblocks per full block
-P system;datapool Disk storage pools in file system
-d gpfs1nsd;gpfs2nsd;gpfs3nsd;gpfs4nsd;gpfs5nsd;gpfs6nsd;gpfs7nsd;gpfs8nsd;gpfs9nsd;gpfs10nsd;gpfs11nsd;gpfs12nsd;gpfs13nsd;gpfs14nsd;gpfs15nsd;gpfs16nsd Disks in file system
-A yes Automatic mount option
-o none Additional mount options
-T /bigpfs Default mount point
--mount-priority 0 Mount priority
File system attributes for bigpfs.gpfs.net:/dev/bigpfs:
=======================================================
flag value description
------------------- ------------------------ -----------------------------------
-f 8192 Minimum fragment size in bytes (system pool)
65536 Minimum fragment size in bytes (other pools)
-i 4096 Inode size in bytes
-I 32768 Indirect block size in bytes
-m 3 Default number of metadata replicas
-M 3 Maximum number of metadata replicas
-r 3 Default number of data replicas
-R 3 Maximum number of data replicas
-j scatter Block allocation type
-D nfs4 File locking semantics in effect
-k all ACL semantics in effect
-n 32 Estimated number of nodes that will mount file system
-B 262144 Block size (system pool)
2097152 Block size (other pools)
-Q none Quotas accounting enabled
none Quotas enforced
none Default quotas enabled
--perfileset-quota No Per-fileset quota enforcement
--filesetdf No Fileset df enabled?
-V 17.00 (4.2.3.0) File system version
--create-time Mon Nov 27 05:53:34 2017 File system creation time
-z No Is DMAPI enabled?
-L 16252928 Logfile size
-E No Exact mtime mount option
-S relatime Suppress atime mount option
-K whenpossible Strict replica allocation option
--fastea Yes Fast external attributes enabled?
--encryption No Encryption enabled?
--inode-limit 3883776 Maximum number of inodes
--log-replicas 0 Number of log replicas
--is4KAligned Yes is4KAligned?
--rapid-repair Yes rapidRepair enabled?
--write-cache-threshold 0 HAWC Threshold (max 65536)
--subblocks-per-full-block 32 Number of subblocks per full block
-P system;datapool Disk storage pools in file system
-d gpfs1nsd;gpfs2nsd;gpfs3nsd;gpfs4nsd;gpfs5nsd;gpfs6nsd;gpfs7nsd;gpfs8nsd;gpfs9nsd;gpfs10nsd;gpfs11nsd;gpfs12nsd;gpfs13nsd;gpfs14nsd;gpfs15nsd;gpfs16nsd Disks in file system
-A no Automatic mount option
-o none Additional mount options
-T /djremotegpfs1 Default mount point
--mount-priority 0 Mount priority
The Spectrum Scale service configuration can be changed to enable this kind of multi-filesystem support (see the sketch after the figure below).
[caption id="attachment_5339" align="alignnone" width="1009"] Spectrum Scale configuration changes for multi-filesystem support.[/caption]
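After changing the service configuration, the connector configuration typically has to be pushed to all nodes and the Transparency daemons restarted; the Ambari service panel can drive this restart from the GUI. A hedged command-line sketch, assuming the default installation paths (the syncconf subcommand and its argument are taken from the HDFS Transparency documentation and should be verified for your release):
# /usr/lpp/mmfs/hadoop/sbin/mmhadoopctl connector syncconf /usr/lpp/mmfs/hadoop/etc/hadoop
# /usr/lpp/mmfs/hadoop/sbin/mmhadoopctl connector stop
# /usr/lpp/mmfs/hadoop/sbin/mmhadoopctl connector start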
The HDFS Transparency daemons support multi-filesystem configurations that span local and remote filesystems:
# /usr/lpp/mmfs/hadoop/sbin/mmhadoopctl connector getstate
Node1 : namenode running as process 8280.
Node2 : datanode running as process 15192.
Node3 : datanode running as process 31595.
Node4 : datanode running as process 25271.
Node5 : datanode running as process 10777.
The HDFS Transparency Connector exposes the second mount point as a virtualized sub-directory under the base filesystem mount point, so that big data applications can use whichever filesystem best suits their needs and storage type.
# hadoop fs -ls /
Found 11 items
drwxrwxrwx - yarn hadoop 0 2017-12-04 11:07 /app-logs
drwxr-xr-x - hdfs root 0 2017-12-04 11:13 /apps
drwxr-xr-x - yarn hadoop 0 2017-12-04 11:07 /ats
drwxr-xr-x - hdfs hadoop 0 2017-11-27 07:10 /djremotegpfs1
drwxr-xr-x - hdfs root 0 2017-12-04 11:07 /hdp
drwxr-xr-x - mapred root 0 2017-12-04 11:07 /mapred
drwxrwxrwx - mapred hadoop 0 2017-12-04 11:07 /mr-history
drwxrwxrwx - spark hadoop 0 2017-12-04 11:18 /spark-history
drwxrwxrwx - spark hadoop 0 2017-12-04 11:18 /spark2-history
drwxrwxrwx - hdfs root 0 2017-12-04 11:10 /tmp
drwxr-xr-x - hdfs root 0 2017-12-04 11:11 /user
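Big data applications can therefore read and write to the remote filesystem through the same HDFS namespace. A small illustrative example (file names are hypothetical):
# hadoop fs -put /etc/hosts /djremotegpfs1/hosts-copy
# hadoop fs -ls /djremotegpfs1
The data lands in the remotely mounted Spectrum Scale filesystem and is also visible through its POSIX mount point /djremotegpfs1 on the cluster nodes (the exact sub-path depends on the connector's data-directory setting).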
DFSIO and TeraGen/TeraSort
The standard DFSIO read and write benchmarks can be run on Hortonworks HDP Hadoop clusters with Spectrum Scale in the integrated state.
DFSIO Write Throughput:
# yarn jar /usr/hdp/2.6.2.0-205/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 200MB 2>&1 | tee /tmp/TestDFSIO_write.deepak.txt
17/12/04 12:09:01 INFO fs.TestDFSIO: TestDFSIO.1.8
17/12/04 12:09:01 INFO fs.TestDFSIO: nrFiles = 10
17/12/04 12:09:01 INFO fs.TestDFSIO: nrBytes (MB) = 200.0
17/12/04 12:09:01 INFO fs.TestDFSIO: bufferSize = 1000000
17/12/04 12:09:01 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
17/12/04 12:09:01 INFO fs.TestDFSIO: creating control file: 209715200 bytes, 10 files
17/12/04 12:09:03 INFO fs.TestDFSIO: created control files for: 10 files
17/12/04 12:09:03 INFO client.RMProxy: Connecting to ResourceManager at c902f09x10.gpfs.net/172.16.1.53:8050
17/12/04 12:09:03 INFO client.AHSProxy: Connecting to Application History server at c902f09x10.gpfs.net/172.16.1.53:10200
17/12/04 12:09:03 INFO client.RMProxy: Connecting to ResourceManager at c902f09x10.gpfs.net/172.16.1.53:8050
17/12/04 12:09:03 INFO client.AHSProxy: Connecting to Application History server at c902f09x10.gpfs.net/172.16.1.53:10200
17/12/04 12:09:04 INFO mapred.FileInputFormat: Total input paths to process : 10
17/12/04 12:09:04 INFO mapreduce.JobSubmitter: number of splits:10
17/12/04 12:09:04 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1512403681949_0004
17/12/04 12:09:04 INFO impl.YarnClientImpl: Submitted application application_1512403681949_0004
17/12/04 12:09:04 INFO mapreduce.Job: The url to track the job: http://c902f09x10.gpfs.net:8088/proxy/application_1512403681949_0004/
17/12/04 12:09:04 INFO mapreduce.Job: Running job: job_1512403681949_0004
17/12/04 12:09:09 INFO mapreduce.Job: Job job_1512403681949_0004 running in uber mode : false
17/12/04 12:09:09 INFO mapreduce.Job: map 0% reduce 0%
17/12/04 12:09:20 INFO mapreduce.Job: map 20% reduce 0%
17/12/04 12:09:21 INFO mapreduce.Job: map 67% reduce 0%
17/12/04 12:09:23 INFO mapreduce.Job: map 70% reduce 0%
17/12/04 12:09:24 INFO mapreduce.Job: map 77% reduce 0%
17/12/04 12:09:27 INFO mapreduce.Job: map 83% reduce 0%
17/12/04 12:09:29 INFO mapreduce.Job: map 87% reduce 0%
17/12/04 12:09:31 INFO mapreduce.Job: map 90% reduce 0%
17/12/04 12:09:33 INFO mapreduce.Job: map 90% reduce 23%
17/12/04 12:09:36 INFO mapreduce.Job: map 93% reduce 23%
17/12/04 12:09:38 INFO mapreduce.Job: map 97% reduce 23%
17/12/04 12:09:39 INFO mapreduce.Job: map 97% reduce 30%
17/12/04 12:09:41 INFO mapreduce.Job: map 100% reduce 30%
17/12/04 12:09:42 INFO mapreduce.Job: map 100% reduce 100%
17/12/04 12:09:43 INFO mapreduce.Job: Job job_1512403681949_0004 completed successfully
17/12/04 12:09:43 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=856
FILE: Number of bytes written=1648330
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2450
HDFS: Number of bytes written=2097152079
HDFS: Number of read operations=43
HDFS: Number of large read operations=0
HDFS: Number of write operations=12
Job Counters
Launched map tasks=10
Launched reduce tasks=1
Data-local map tasks=10
Total time spent by all maps in occupied slots (ms)=1957054
Total time spent by all reduces in occupied slots (ms)=380820
Total time spent by all map tasks (ms)=177914
Total time spent by all reduce tasks (ms)=17310
Total vcore-milliseconds taken by all map tasks=177914
Total vcore-milliseconds taken by all reduce tasks=17310
Total megabyte-milliseconds taken by all map tasks=2004023296
Total megabyte-milliseconds taken by all reduce tasks=389959680
Map-Reduce Framework
Map input records=10
Map output records=50
Map output bytes=750
Map output materialized bytes=910
Input split bytes=1330
Combine input records=0
Combine output records=0
Reduce input groups=5
Reduce shuffle bytes=910
Reduce input records=50
Reduce output records=5
Spilled Records=100
Shuffled Maps =10
Failed Shuffles=0
Merged Map outputs=10
GC time elapsed (ms)=3866
CPU time spent (ms)=63920
Physical memory (bytes) snapshot=25849901056
Virtual memory (bytes) snapshot=139142180864
Total committed heap usage (bytes)=28442099712
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1120
File Output Format Counters
Bytes Written=79
17/12/04 12:09:43 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
17/12/04 12:09:43 INFO fs.TestDFSIO: Date & time: Mon Dec 04 12:09:43 EST 2017
17/12/04 12:09:43 INFO fs.TestDFSIO: Number of files: 10
17/12/04 12:09:43 INFO fs.TestDFSIO: Total MBytes processed: 2000.0
17/12/04 12:09:43 INFO fs.TestDFSIO: Throughput mb/sec: 13.672222146265433
17/12/04 12:09:43 INFO fs.TestDFSIO: Average IO rate mb/sec: 16.02884292602539
17/12/04 12:09:43 INFO fs.TestDFSIO: IO rate std deviation: 6.16775906029513
17/12/04 12:09:43 INFO fs.TestDFSIO: Test exec time sec: 40.033
17/12/04 12:09:43 INFO fs.TestDFSIO:
DFSIO Read Throughput:
# yarn jar /usr/hdp/2.6.2.0-205/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO -read -nrFiles 10 -fileSize 200MB 2>&1 | tee /tmp/TestDFSIO_read.deepak.txt
17/12/04 12:14:15 INFO fs.TestDFSIO: TestDFSIO.1.8
17/12/04 12:14:15 INFO fs.TestDFSIO: nrFiles = 10
17/12/04 12:14:15 INFO fs.TestDFSIO: nrBytes (MB) = 200.0
17/12/04 12:14:15 INFO fs.TestDFSIO: bufferSize = 1000000
17/12/04 12:14:15 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
17/12/04 12:14:15 INFO fs.TestDFSIO: creating control file: 209715200 bytes, 10 files
17/12/04 12:14:16 INFO fs.TestDFSIO: created control files for: 10 files
17/12/04 12:14:16 INFO client.RMProxy: Connecting to ResourceManager at c902f09x10.gpfs.net/172.16.1.53:8050
17/12/04 12:14:16 INFO client.AHSProxy: Connecting to Application History server at c902f09x10.gpfs.net/172.16.1.53:10200
17/12/04 12:14:17 INFO client.RMProxy: Connecting to ResourceManager at c902f09x10.gpfs.net/172.16.1.53:8050
17/12/04 12:14:17 INFO client.AHSProxy: Connecting to Application History server at c902f09x10.gpfs.net/172.16.1.53:10200
17/12/04 12:14:17 INFO mapred.FileInputFormat: Total input paths to process : 10
17/12/04 12:14:17 INFO mapreduce.JobSubmitter: number of splits:10
17/12/04 12:14:17 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1512403681949_0005
17/12/04 12:14:18 INFO impl.YarnClientImpl: Submitted application application_1512403681949_0005
17/12/04 12:14:18 INFO mapreduce.Job: The url to track the job: http://c902f09x10.gpfs.net:8088/proxy/application_1512403681949_0005/
17/12/04 12:14:18 INFO mapreduce.Job: Running job: job_1512403681949_0005
17/12/04 12:14:23 INFO mapreduce.Job: Job job_1512403681949_0005 running in uber mode : false
17/12/04 12:14:23 INFO mapreduce.Job: map 0% reduce 0%
17/12/04 12:14:28 INFO mapreduce.Job: map 10% reduce 0%
17/12/04 12:14:29 INFO mapreduce.Job: map 60% reduce 0%
17/12/04 12:14:30 INFO mapreduce.Job: map 70% reduce 0%
17/12/04 12:14:32 INFO mapreduce.Job: map 100% reduce 0%
17/12/04 12:14:33 INFO mapreduce.Job: map 100% reduce 100%
17/12/04 12:14:34 INFO mapreduce.Job: Job job_1512403681949_0005 completed successfully
17/12/04 12:14:34 INFO mapreduce.Job: Counters: 50
File System Counters
FILE: Number of bytes read=862
FILE: Number of bytes written=1648320
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2097154450
HDFS: Number of bytes written=81
HDFS: Number of read operations=53
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=10
Launched reduce tasks=1
Data-local map tasks=8
Rack-local map tasks=2
Total time spent by all maps in occupied slots (ms)=535414
Total time spent by all reduces in occupied slots (ms)=54450
Total time spent by all map tasks (ms)=48674
Total time spent by all reduce tasks (ms)=2475
Total vcore-milliseconds taken by all map tasks=48674
Total vcore-milliseconds taken by all reduce tasks=2475
Total megabyte-milliseconds taken by all map tasks=548263936
Total megabyte-milliseconds taken by all reduce tasks=55756800
Map-Reduce Framework
Map input records=10
Map output records=50
Map output bytes=756
Map output materialized bytes=916
Input split bytes=1330
Combine input records=0
Combine output records=0
Reduce input groups=5
Reduce shuffle bytes=916
Reduce input records=50
Reduce output records=5
Spilled Records=100
Shuffled Maps =10
Failed Shuffles=0
Merged Map outputs=10
GC time elapsed (ms)=676
CPU time spent (ms)=20380
Physical memory (bytes) snapshot=24607191040
Virtual memory (bytes) snapshot=139105824768
Total committed heap usage (bytes)=24529338368
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1120
File Output Format Counters
Bytes Written=81
17/12/04 12:14:34 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
17/12/04 12:14:34 INFO fs.TestDFSIO: Date & time: Mon Dec 04 12:14:34 EST 2017
17/12/04 12:14:34 INFO fs.TestDFSIO: Number of files: 10
17/12/04 12:14:34 INFO fs.TestDFSIO: Total MBytes processed: 2000.0
17/12/04 12:14:34 INFO fs.TestDFSIO: Throughput mb/sec: 120.43114349370747
17/12/04 12:14:34 INFO fs.TestDFSIO: Average IO rate mb/sec: 268.7469482421875
17/12/04 12:14:34 INFO fs.TestDFSIO: IO rate std deviation: 177.04005019489514
17/12/04 12:14:34 INFO fs.TestDFSIO: Test exec time sec: 18.119
17/12/04 12:14:34 INFO fs.TestDFSIO:
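Once benchmarking is complete, the generated test data under /benchmarks/TestDFSIO can be removed with the benchmark's own clean option:
# yarn jar /usr/hdp/2.6.2.0-205/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO -clean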
TeraGen benchmark for generating sample data:
# hadoop jar hadoop-mapreduce-examples.jar teragen 100000000 /user/djdeepak5/terasort-input
17/12/04 14:59:38 INFO client.RMProxy: Connecting to ResourceManager at c902f09x10.gpfs.net/172.16.1.53:8050
17/12/04 14:59:38 INFO client.AHSProxy: Connecting to Application History server at c902f09x10.gpfs.net/172.16.1.53:10200
17/12/04 14:59:54 INFO terasort.TeraSort: Generating 100000000 using 2
17/12/04 14:59:56 INFO mapreduce.JobSubmitter: number of splits:2
17/12/04 15:00:00 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1512403681949_0015
17/12/04 15:00:04 INFO impl.YarnClientImpl: Submitted application application_1512403681949_0015
17/12/04 15:00:04 INFO mapreduce.Job: The url to track the job: http://c902f09x10.gpfs.net:8088/proxy/application_1512403681949_0015/
17/12/04 15:00:04 INFO mapreduce.Job: Running job: job_1512403681949_0015
17/12/04 15:00:13 INFO mapreduce.Job: Job job_1512403681949_0015 running in uber mode : false
17/12/04 15:00:13 INFO mapreduce.Job: map 0% reduce 0%
17/12/04 15:00:24 INFO mapreduce.Job: map 1% reduce 0%
17/12/04 15:00:30 INFO mapreduce.Job: map 3% reduce 0%
17/12/04 15:00:36 INFO mapreduce.Job: map 4% reduce 0%
17/12/04 15:00:39 INFO mapreduce.Job: map 5% reduce 0%
17/12/04 15:00:42 INFO mapreduce.Job: map 8% reduce 0%
17/12/04 15:00:48 INFO mapreduce.Job: map 9% reduce 0%
17/12/04 15:00:51 INFO mapreduce.Job: map 11% reduce 0%
17/12/04 15:00:54 INFO mapreduce.Job: map 12% reduce 0%
17/12/04 15:01:03 INFO mapreduce.Job: map 15% reduce 0%
17/12/04 15:01:10 INFO mapreduce.Job: map 16% reduce 0%
17/12/04 15:01:12 INFO mapreduce.Job: map 18% reduce 0%
17/12/04 15:01:16 INFO mapreduce.Job: map 19% reduce 0%
17/12/04 15:01:18 INFO mapreduce.Job: map 20% reduce 0%
17/12/04 15:01:19 INFO mapreduce.Job: map 22% reduce 0%
17/12/04 15:01:24 INFO mapreduce.Job: map 23% reduce 0%
17/12/04 15:01:28 INFO mapreduce.Job: map 24% reduce 0%
17/12/04 15:01:30 INFO mapreduce.Job: map 26% reduce 0%
17/12/04 15:01:31 INFO mapreduce.Job: map 27% reduce 0%
17/12/04 15:01:40 INFO mapreduce.Job: map 28% reduce 0%
17/12/04 15:01:42 INFO mapreduce.Job: map 30% reduce 0%
17/12/04 15:01:45 INFO mapreduce.Job: map 31% reduce 0%
17/12/04 15:01:46 INFO mapreduce.Job: map 32% reduce 0%
17/12/04 15:01:57 INFO mapreduce.Job: map 34% reduce 0%
17/12/04 15:01:58 INFO mapreduce.Job: map 35% reduce 0%
17/12/04 15:02:01 INFO mapreduce.Job: map 36% reduce 0%
17/12/04 15:02:03 INFO mapreduce.Job: map 37% reduce 0%
17/12/04 15:02:04 INFO mapreduce.Job: map 38% reduce 0%
17/12/04 15:02:09 INFO mapreduce.Job: map 39% reduce 0%
17/12/04 15:02:10 INFO mapreduce.Job: map 40% reduce 0%
17/12/04 15:02:16 INFO mapreduce.Job: map 41% reduce 0%
17/12/04 15:02:18 INFO mapreduce.Job: map 43% reduce 0%
17/12/04 15:02:21 INFO mapreduce.Job: map 45% reduce 0%
17/12/04 15:02:24 INFO mapreduce.Job: map 46% reduce 0%
17/12/04 15:02:28 INFO mapreduce.Job: map 47% reduce 0%
17/12/04 15:02:31 INFO mapreduce.Job: map 48% reduce 0%
17/12/04 15:02:37 INFO mapreduce.Job: map 50% reduce 0%
17/12/04 15:02:40 INFO mapreduce.Job: map 51% reduce 0%
17/12/04 15:02:46 INFO mapreduce.Job: map 54% reduce 0%
17/12/04 15:02:52 INFO mapreduce.Job: map 55% reduce 0%
17/12/04 15:02:55 INFO mapreduce.Job: map 58% reduce 0%
17/12/04 15:03:01 INFO mapreduce.Job: map 59% reduce 0%
17/12/04 15:03:07 INFO mapreduce.Job: map 61% reduce 0%
17/12/04 15:03:13 INFO mapreduce.Job: map 62% reduce 0%
17/12/04 15:03:16 INFO mapreduce.Job: map 63% reduce 0%
17/12/04 15:03:19 INFO mapreduce.Job: map 65% reduce 0%
17/12/04 15:03:22 INFO mapreduce.Job: map 66% reduce 0%
17/12/04 15:03:28 INFO mapreduce.Job: map 67% reduce 0%
17/12/04 15:03:31 INFO mapreduce.Job: map 69% reduce 0%
17/12/04 15:03:34 INFO mapreduce.Job: map 70% reduce 0%
17/12/04 15:03:37 INFO mapreduce.Job: map 71% reduce 0%
17/12/04 15:03:40 INFO mapreduce.Job: map 73% reduce 0%
17/12/04 15:03:46 INFO mapreduce.Job: map 75% reduce 0%
17/12/04 15:03:49 INFO mapreduce.Job: map 77% reduce 0%
17/12/04 15:03:55 INFO mapreduce.Job: map 79% reduce 0%
17/12/04 15:03:58 INFO mapreduce.Job: map 81% reduce 0%
17/12/04 15:04:04 INFO mapreduce.Job: map 82% reduce 0%
17/12/04 15:04:07 INFO mapreduce.Job: map 83% reduce 0%
17/12/04 15:04:10 INFO mapreduce.Job: map 85% reduce 0%
17/12/04 15:04:13 INFO mapreduce.Job: map 87% reduce 0%
17/12/04 15:04:19 INFO mapreduce.Job: map 89% reduce 0%
17/12/04 15:04:22 INFO mapreduce.Job: map 90% reduce 0%
17/12/04 15:04:28 INFO mapreduce.Job: map 93% reduce 0%
17/12/04 15:04:34 INFO mapreduce.Job: map 94% reduce 0%
17/12/04 15:04:38 INFO mapreduce.Job: map 95% reduce 0%
17/12/04 15:04:41 INFO mapreduce.Job: map 96% reduce 0%
17/12/04 15:04:44 INFO mapreduce.Job: map 97% reduce 0%
17/12/04 15:04:53 INFO mapreduce.Job: map 98% reduce 0%
17/12/04 15:04:59 INFO mapreduce.Job: map 100% reduce 0%
17/12/04 15:05:18 INFO mapreduce.Job: Job job_1512403681949_0015 completed successfully
17/12/04 15:05:18 INFO mapreduce.Job: Counters: 31
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=298084
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=170
HDFS: Number of bytes written=10000000000
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Job Counters
Launched map tasks=2
Other local map tasks=2
Total time spent by all maps in occupied slots (ms)=6141058
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=558278
Total vcore-milliseconds taken by all map tasks=558278
Total megabyte-milliseconds taken by all map tasks=6288443392
Map-Reduce Framework
Map input records=100000000
Map output records=100000000
Input split bytes=170
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=716
CPU time spent (ms)=121900
Physical memory (bytes) snapshot=586973184
Virtual memory (bytes) snapshot=23502049280
Total committed heap usage (bytes)=402128896
org.apache.hadoop.examples.terasort.TeraGen$Counters
CHECKSUM=214760662691937609
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=10000000000
TeraSort on the generated data:
# hadoop jar hadoop-mapreduce-examples.jar terasort /user/djdeepak5/terasort-input /user/djdeepak5/terasort-output
17/12/04 15:08:41 INFO terasort.TeraSort: starting
17/12/04 15:08:42 INFO input.FileInputFormat: Total input paths to process : 2
Spent 182ms computing base-splits.
Spent 3ms computing TeraScheduler splits.
Computing input splits took 185ms
Sampling 10 splits of 76
Making 1 from 100000 sampled records
Computing parititions took 9525ms
Spent 9713ms computing partitions.
17/12/04 15:08:51 INFO client.RMProxy: Connecting to ResourceManager at c902f09x10.gpfs.net/172.16.1.53:8050
17/12/04 15:08:51 INFO client.AHSProxy: Connecting to Application History server at c902f09x10.gpfs.net/172.16.1.53:10200
17/12/04 15:09:05 INFO mapreduce.JobSubmitter: number of splits:76
17/12/04 15:09:08 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1512403681949_0016
17/12/04 15:09:14 INFO impl.YarnClientImpl: Submitted application application_1512403681949_0016
17/12/04 15:09:14 INFO mapreduce.Job: The url to track the job: http://c902f09x10.gpfs.net:8088/proxy/application_1512403681949_0016/
17/12/04 15:09:14 INFO mapreduce.Job: Running job: job_1512403681949_0016
17/12/04 15:09:21 INFO mapreduce.Job: Job job_1512403681949_0016 running in uber mode : false
17/12/04 15:09:21 INFO mapreduce.Job: map 0% reduce 0%
17/12/04 15:09:30 INFO mapreduce.Job: map 1% reduce 0%
17/12/04 15:09:33 INFO mapreduce.Job: map 4% reduce 0%
17/12/04 15:09:35 INFO mapreduce.Job: map 8% reduce 0%
17/12/04 15:09:37 INFO mapreduce.Job: map 10% reduce 0%
17/12/04 15:09:38 INFO mapreduce.Job: map 11% reduce 0%
17/12/04 15:09:42 INFO mapreduce.Job: map 12% reduce 0%
17/12/04 15:09:43 INFO mapreduce.Job: map 15% reduce 0%
17/12/04 15:09:44 INFO mapreduce.Job: map 18% reduce 0%
17/12/04 15:09:47 INFO mapreduce.Job: map 19% reduce 0%
17/12/04 15:09:50 INFO mapreduce.Job: map 20% reduce 0%
17/12/04 15:09:51 INFO mapreduce.Job: map 26% reduce 0%
17/12/04 15:09:52 INFO mapreduce.Job: map 28% reduce 0%
17/12/04 15:09:55 INFO mapreduce.Job: map 29% reduce 0%
17/12/04 15:09:58 INFO mapreduce.Job: map 33% reduce 0%
17/12/04 15:09:59 INFO mapreduce.Job: map 36% reduce 0%
17/12/04 15:10:00 INFO mapreduce.Job: map 37% reduce 0%
17/12/04 15:10:03 INFO mapreduce.Job: map 38% reduce 4%
17/12/04 15:10:05 INFO mapreduce.Job: map 39% reduce 4%
17/12/04 15:10:06 INFO mapreduce.Job: map 41% reduce 4%
17/12/04 15:10:07 INFO mapreduce.Job: map 43% reduce 4%
17/12/04 15:10:08 INFO mapreduce.Job: map 46% reduce 4%
17/12/04 15:10:09 INFO mapreduce.Job: map 46% reduce 5%
17/12/04 15:10:12 INFO mapreduce.Job: map 47% reduce 6%
17/12/04 15:10:15 INFO mapreduce.Job: map 48% reduce 7%
17/12/04 15:10:16 INFO mapreduce.Job: map 50% reduce 7%
17/12/04 15:10:17 INFO mapreduce.Job: map 51% reduce 7%
17/12/04 15:10:18 INFO mapreduce.Job: map 53% reduce 8%
17/12/04 15:10:19 INFO mapreduce.Job: map 55% reduce 8%
17/12/04 15:10:21 INFO mapreduce.Job: map 57% reduce 8%
17/12/04 15:10:22 INFO mapreduce.Job: map 57% reduce 9%
17/12/04 15:10:23 INFO mapreduce.Job: map 58% reduce 9%
17/12/04 15:10:25 INFO mapreduce.Job: map 59% reduce 10%
17/12/04 15:10:26 INFO mapreduce.Job: map 61% reduce 10%
17/12/04 15:10:28 INFO mapreduce.Job: map 61% reduce 11%
17/12/04 15:10:29 INFO mapreduce.Job: map 66% reduce 11%
17/12/04 15:10:31 INFO mapreduce.Job: map 67% reduce 12%
17/12/04 15:10:33 INFO mapreduce.Job: map 68% reduce 12%
17/12/04 15:10:34 INFO mapreduce.Job: map 69% reduce 12%
17/12/04 15:10:35 INFO mapreduce.Job: map 70% reduce 12%
17/12/04 15:10:37 INFO mapreduce.Job: map 72% reduce 14%
17/12/04 15:10:39 INFO mapreduce.Job: map 75% reduce 14%
17/12/04 15:10:40 INFO mapreduce.Job: map 76% reduce 14%
17/12/04 15:10:42 INFO mapreduce.Job: map 77% reduce 14%
17/12/04 15:10:45 INFO mapreduce.Job: map 78% reduce 14%
17/12/04 15:10:46 INFO mapreduce.Job: map 78% reduce 15%
17/12/04 15:10:47 INFO mapreduce.Job: map 82% reduce 15%
17/12/04 15:10:48 INFO mapreduce.Job: map 83% reduce 15%
17/12/04 15:10:49 INFO mapreduce.Job: map 86% reduce 17%
17/12/04 15:10:52 INFO mapreduce.Job: map 86% reduce 18%
17/12/04 15:10:53 INFO mapreduce.Job: map 87% reduce 18%
17/12/04 15:10:55 INFO mapreduce.Job: map 89% reduce 18%
17/12/04 15:10:56 INFO mapreduce.Job: map 91% reduce 18%
17/12/04 15:10:58 INFO mapreduce.Job: map 91% reduce 19%
17/12/04 15:10:59 INFO mapreduce.Job: map 93% reduce 19%
17/12/04 15:11:00 INFO mapreduce.Job: map 96% reduce 19%
17/12/04 15:11:01 INFO mapreduce.Job: map 96% reduce 20%
17/12/04 15:11:03 INFO mapreduce.Job: map 97% reduce 20%
17/12/04 15:11:04 INFO mapreduce.Job: map 100% reduce 22%
17/12/04 15:11:10 INFO mapreduce.Job: map 100% reduce 23%
17/12/04 15:11:13 INFO mapreduce.Job: map 100% reduce 24%
17/12/04 15:11:16 INFO mapreduce.Job: map 100% reduce 25%
17/12/04 15:11:22 INFO mapreduce.Job: map 100% reduce 26%
17/12/04 15:11:25 INFO mapreduce.Job: map 100% reduce 27%
17/12/04 15:11:31 INFO mapreduce.Job: map 100% reduce 28%
17/12/04 15:11:34 INFO mapreduce.Job: map 100% reduce 29%
17/12/04 15:11:37 INFO mapreduce.Job: map 100% reduce 30%
17/12/04 15:11:43 INFO mapreduce.Job: map 100% reduce 31%
17/12/04 15:11:49 INFO mapreduce.Job: map 100% reduce 32%
17/12/04 15:11:55 INFO mapreduce.Job: map 100% reduce 33%
17/12/04 15:12:35 INFO mapreduce.Job: map 100% reduce 38%
17/12/04 15:12:38 INFO mapreduce.Job: map 100% reduce 48%
17/12/04 15:12:41 INFO mapreduce.Job: map 100% reduce 57%
17/12/04 15:12:44 INFO mapreduce.Job: map 100% reduce 66%
17/12/04 15:12:47 INFO mapreduce.Job: map 100% reduce 67%
17/12/04 15:13:08 INFO mapreduce.Job: map 100% reduce 68%
17/12/04 15:13:17 INFO mapreduce.Job: map 100% reduce 69%
17/12/04 15:13:23 INFO mapreduce.Job: map 100% reduce 70%
17/12/04 15:13:29 INFO mapreduce.Job: map 100% reduce 71%
17/12/04 15:13:32 INFO mapreduce.Job: map 100% reduce 72%
17/12/04 15:13:38 INFO mapreduce.Job: map 100% reduce 73%
17/12/04 15:13:50 INFO mapreduce.Job: map 100% reduce 74%
17/12/04 15:13:56 INFO mapreduce.Job: map 100% reduce 75%
17/12/04 15:14:06 INFO mapreduce.Job: map 100% reduce 76%
17/12/04 15:14:09 INFO mapreduce.Job: map 100% reduce 77%
17/12/04 15:14:21 INFO mapreduce.Job: map 100% reduce 78%
17/12/04 15:14:27 INFO mapreduce.Job: map 100% reduce 79%
17/12/04 15:14:33 INFO mapreduce.Job: map 100% reduce 80%
17/12/04 15:14:39 INFO mapreduce.Job: map 100% reduce 81%
17/12/04 15:14:42 INFO mapreduce.Job: map 100% reduce 82%
17/12/04 15:14:51 INFO mapreduce.Job: map 100% reduce 83%
17/12/04 15:15:00 INFO mapreduce.Job: map 100% reduce 84%
17/12/04 15:15:03 INFO mapreduce.Job: map 100% reduce 85%
17/12/04 15:15:15 INFO mapreduce.Job: map 100% reduce 86%
17/12/04 15:15:18 INFO mapreduce.Job: map 100% reduce 87%
17/12/04 15:15:24 INFO mapreduce.Job: map 100% reduce 88%
17/12/04 15:15:30 INFO mapreduce.Job: map 100% reduce 89%
17/12/04 15:15:39 INFO mapreduce.Job: map 100% reduce 90%
17/12/04 15:15:48 INFO mapreduce.Job: map 100% reduce 91%
17/12/04 15:15:57 INFO mapreduce.Job: map 100% reduce 92%
17/12/04 15:16:00 INFO mapreduce.Job: map 100% reduce 93%
17/12/04 15:16:09 INFO mapreduce.Job: map 100% reduce 94%
17/12/04 15:16:15 INFO mapreduce.Job: map 100% reduce 95%
17/12/04 15:16:21 INFO mapreduce.Job: map 100% reduce 96%
17/12/04 15:16:31 INFO mapreduce.Job: map 100% reduce 97%
17/12/04 15:16:40 INFO mapreduce.Job: map 100% reduce 98%
17/12/04 15:16:46 INFO mapreduce.Job: map 100% reduce 99%
17/12/04 15:16:49 INFO mapreduce.Job: map 100% reduce 100%
17/12/04 15:16:57 INFO mapreduce.Job: Job job_1512403681949_0016 completed successfully
17/12/04 15:16:57 INFO mapreduce.Job: Counters: 50
File System Counters
FILE: Number of bytes read=10400000012
FILE: Number of bytes written=20811586438
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=10000010564
HDFS: Number of bytes written=10000000000
HDFS: Number of read operations=231
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=76
Launched reduce tasks=1
Data-local map tasks=75
Rack-local map tasks=1
Total time spent by all maps in occupied slots (ms)=7861986
Total time spent by all reduces in occupied slots (ms)=9260416
Total time spent by all map tasks (ms)=714726
Total time spent by all reduce tasks (ms)=420928
Total vcore-milliseconds taken by all map tasks=714726
Total vcore-milliseconds taken by all reduce tasks=420928
Total megabyte-milliseconds taken by all map tasks=8050673664
Total megabyte-milliseconds taken by all reduce tasks=9482665984
Map-Reduce Framework
Map input records=100000000
Map output records=100000000
Map output bytes=10200000000
Map output materialized bytes=10400000456
Input split bytes=10564
Combine input records=0
Combine output records=0
Reduce input groups=100000000
Reduce shuffle bytes=10400000456
Reduce input records=100000000
Reduce output records=100000000
Spilled Records=200000000
Shuffled Maps =76
Failed Shuffles=0
Merged Map outputs=76
GC time elapsed (ms)=30652
CPU time spent (ms)=882600
Physical memory (bytes) snapshot=199282565120
Virtual memory (bytes) snapshot=914234179584
Total committed heap usage (bytes)=219659370496
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=10000000000
File Output Format Counters
Bytes Written=10000000000
17/12/04 15:16:57 INFO terasort.TeraSort: done
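The sorted output can optionally be checked with TeraValidate, which verifies the ordering of the TeraSort output and writes a small report; the paths below follow the same convention as the runs above.
# hadoop jar hadoop-mapreduce-examples.jar teravalidate /user/djdeepak5/terasort-output /user/djdeepak5/terasort-validate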
Conclusion
IBM Spectrum Scale provides an enterprise-grade alternative to the HDFS filesystem used in big data clusters. HDFS Transparency supports multiple filesystems, which helps leverage filesystems created on different storage media and in different locations, reducing the need for frequent data migrations. Remote cluster mount support enables in-place analytics on remotely mounted filesystem data as well.
Multi-protocol support in the IBM Spectrum Scale filesystem makes ingesting and operating on data easier.
Hortonworks HDP support for the IBM Spectrum Scale filesystem will allow many existing users to perform data analytics on their existing filesystem data without having to migrate it to HDFS.
Related Posts:
Top Five Benefits of IBM Spectrum Scale with Hortonworks Data Platform
IBM Spectrum Scale and Hortonworks HDP for Winning Big Data Plays
Deploying IBM Spectrum Scale File System using Apache Ambari framework on Hadoop clusters
Big Blue Dancing the Hadoop Dance with Hortonworks
https://hortonworks.com/partner/ibm/
hdp-ibm-spectrum-scale-brings-enterprise-class-storage-place-analytics/
Remote Mount and Multifilesystem support in IBM Spectrum Scale.
IBM Spectrum Scale Performance Tuning
IBM Spectrum Scale System Workloads Tuning in shared nothing cluster
IBM Spectrum Scale system Spark Workloads Tuning
IBM Spectrum Scale system database workloads tuning
IBM Spectrum Scale system performance tuning for hadoop workloads
IBM Spectrum Scale system HDFS Transparency Federation support
IBM Spectrum Scale system HDFS Transparency short-circuit write support.
References:
IBM Spectrum Scale Hadoop Integration and Support for HortonWorks HDP
HDFS Transparency Protocol
IBM Knowledge Center ( Big data and analytics )
IBM Elastic Storage Server
Apache Ambari Project
Adding IBM Spectrum Scale Service to HDP cluster using existing ESS cluster
Hortonworks Data Platform with IBM Spectrum Scale
Mounting a Remote Spectrum Scale Filesystem