org.apache.hadoop.examples.terasort
Class TeraGenWithCRC
java.lang.Object
org.apache.hadoop.conf.Configured
org.apache.hadoop.examples.terasort.TeraGenWithCRC
- All Implemented Interfaces:
- org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool
public class TeraGenWithCRC
- extends org.apache.hadoop.conf.Configured
- implements org.apache.hadoop.util.Tool
Generate the official terasort input data set.
The user specifies the number of rows and the output directory, and this
class runs a map/reduce program to generate the data.
The format of the data is:
- (10 bytes key) (10 bytes rowid) (78 bytes filler) \r \n
- The keys are random characters from the set ' ' .. '~'.
- The rowid is the right-justified row id as an int.
- The filler consists of 7 runs of 10 characters from 'A' to 'Z'.
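The record layout above can be sketched in plain Java. This is only an illustration under stated assumptions, not the code of TeraGenWithCRC or SortGenMapper: the class name TeraRecordSketch and the method buildRecord are hypothetical, the rowid is assumed to be padded with spaces, and since 7 runs of 10 characters cover only 70 of the 78 filler bytes, the sketch assumes a shortened eighth run of 8 characters. The choice of letter for each run is also an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Random;

/** Hypothetical helper (not part of Hadoop) that builds one record in the documented layout. */
class TeraRecordSketch {
    static final int KEY_LEN = 10;
    static final int ROWID_LEN = 10;
    static final int FILLER_LEN = 78;
    static final int RECORD_LEN = KEY_LEN + ROWID_LEN + FILLER_LEN + 2; // 2 bytes for "\r\n"

    static byte[] buildRecord(long rowId, Random rnd) {
        if (rowId < 0 || rowId > 9_999_999_999L) {
            throw new IllegalArgumentException("rowId must fit in a 10-character field");
        }
        StringBuilder sb = new StringBuilder(RECORD_LEN);

        // Key: 10 random characters from the set ' ' (32) .. '~' (126).
        for (int i = 0; i < KEY_LEN; i++) {
            sb.append((char) (' ' + rnd.nextInt('~' - ' ' + 1)));
        }

        // Row id: right-justified in a 10-character field (space padding is an assumption).
        sb.append(String.format(Locale.ROOT, "%10d", rowId));

        // Filler: runs of 10 identical letters from 'A'..'Z'. Seven full runs plus an
        // assumed shortened eighth run of 8 characters reach the documented 78 bytes.
        for (int i = 0; i < FILLER_LEN; i++) {
            int run = i / 10;
            sb.append((char) ('A' + (int) Math.floorMod(rowId + run, 26L)));
        }

        sb.append("\r\n");
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] record = buildRecord(42L, new Random());
        System.out.println(record.length); // prints 100
    }
}
```

Each record built this way is exactly 100 bytes: 10 + 10 + 78 + 2.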
To run the program:
bin/hadoop jar hadoop-*-examples.jar teragen 10000000000 in-dir
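As a rough check of what the command above asks for, the documented record format gives 10 + 10 + 78 + 2 = 100 bytes per row, so 10000000000 rows come to 10^12 bytes of output. The small sketch below only does that arithmetic; the class name TeraGenSizeSketch and the method totalBytes are hypothetical and do not exist in Hadoop.

```java
/** Hypothetical helper (not part of Hadoop): output size implied by the record format. */
class TeraGenSizeSketch {
    // key + rowid + filler + "\r\n", as listed in the format description
    static final long RECORD_BYTES = 10 + 10 + 78 + 2;

    static long totalBytes(long rows) {
        // multiplyExact throws ArithmeticException instead of silently overflowing
        return Math.multiplyExact(rows, RECORD_BYTES);
    }

    public static void main(String[] args) {
        long rows = 10_000_000_000L; // row count from the example command
        System.out.println(totalBytes(rows)); // prints 1000000000000
    }
}
```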
Nested Class Summary
static class TeraGenWithCRC.SortGenMapper
The Mapper class that, given a row number, generates the appropriate
output line.
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
TeraGenWithCRC
public TeraGenWithCRC()
run
public int run(String[] args)
throws IOException
- Specified by:
run in interface org.apache.hadoop.util.Tool
- Parameters:
args - the command-line arguments
- Throws:
IOException
main
public static void main(String[] args)
throws Exception
- Throws:
Exception
Copyright © 2014 Apache Software Foundation. All Rights Reserved.