org.apache.hadoop.examples.terasort
Class TeraInputFormatWithCRC
java.lang.Object
org.apache.hadoop.mapred.FileInputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>
org.apache.hadoop.examples.terasort.TeraInputFormatWithCRC
- All Implemented Interfaces:
- org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>
public class TeraInputFormatWithCRC
- extends org.apache.hadoop.mapred.FileInputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>
An input format that reads the first 10 characters of each line as the key
and the rest of the line as the value. Both key and value are represented
as Text.
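For illustration, the fixed-width record layout this format reads can be sketched in plain Java. The class name `LineSplitSketch` and helper `splitLine` below are illustrative, not part of the Hadoop API; the real class emits `org.apache.hadoop.io.Text` pairs rather than strings.

```java
// Sketch of the documented record layout: the first 10 characters of each
// line become the key and the remainder becomes the value.
public class LineSplitSketch {
    static final int KEY_WIDTH = 10;

    /** Returns {key, value} for one input line. */
    static String[] splitLine(String line) {
        // Guard against lines shorter than the key width.
        int cut = Math.min(KEY_WIDTH, line.length());
        return new String[] { line.substring(0, cut), line.substring(cut) };
    }

    public static void main(String[] args) {
        String[] kv = splitLine("0123456789the rest of the line");
        System.out.println("key=" + kv[0]);   // key=0123456789
        System.out.println("value=" + kv[1]); // value=the rest of the line
    }
}
```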
Nested classes/interfaces inherited from class org.apache.hadoop.mapred.FileInputFormat:
org.apache.hadoop.mapred.FileInputFormat.Counter
Fields inherited from class org.apache.hadoop.mapred.FileInputFormat:
INPUT_DIR_RECURSIVE, LOG, NUM_INPUT_FILES
Method Summary
org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>
getRecordReader(org.apache.hadoop.mapred.InputSplit split,
                org.apache.hadoop.mapred.JobConf job,
                org.apache.hadoop.mapred.Reporter reporter)
org.apache.hadoop.mapred.InputSplit[]
getSplits(org.apache.hadoop.mapred.JobConf conf,
          int splits)
static void
writePartitionFile(org.apache.hadoop.mapred.JobConf conf,
                   org.apache.hadoop.fs.Path partFile)
Use the input splits to take samples of the input and generate sample keys.
Methods inherited from class org.apache.hadoop.mapred.FileInputFormat:
addInputPath, addInputPathRecursively, addInputPaths, computeSplitSize, getBlockIndex, getInputPathFilter, getInputPaths, getSplitHosts, isSplitable, listStatus, makeSplit, setInputPathFilter, setInputPaths, setInputPaths, setMinSplitSize
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
TeraInputFormatWithCRC
public TeraInputFormatWithCRC()
writePartitionFile
public static void writePartitionFile(org.apache.hadoop.mapred.JobConf conf,
org.apache.hadoop.fs.Path partFile)
throws IOException
- Use the input splits to take samples of the input and generate sample
keys. By default reads 100,000 keys from 10 locations in the input, sorts
them and picks N-1 keys to generate N equally sized partitions.
- Parameters:
conf - the job to sample
partFile - where to write the output file to
- Throws:
IOException - if something goes wrong
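The selection scheme described above (sort the sampled keys, then pick N-1 cut points that yield N roughly equal partitions) can be sketched independently of Hadoop. `PartitionSketch` and `pickSplitPoints` below are hypothetical names for illustration; the actual writePartitionFile also gathers the samples from the input splits and writes the chosen keys to partFile.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative sketch of partition-key selection: sort the sampled keys
// and choose numPartitions - 1 evenly spaced cut points.
public class PartitionSketch {
    static List<String> pickSplitPoints(List<String> samples, int numPartitions) {
        List<String> sorted = new ArrayList<>(samples);
        Collections.sort(sorted);
        List<String> cuts = new ArrayList<>();
        // One cut key every sorted.size() / numPartitions samples.
        float stepSize = sorted.size() / (float) numPartitions;
        for (int i = 1; i < numPartitions; i++) {
            cuts.add(sorted.get(Math.round(stepSize * i)));
        }
        return cuts;
    }

    public static void main(String[] args) {
        List<String> samples = List.of("d", "b", "f", "a", "c", "e", "h", "g");
        // 8 samples, 4 partitions -> 3 cut keys: [c, e, g]
        System.out.println(pickSplitPoints(samples, 4));
    }
}
```

Records with keys below the first cut key go to partition 0, those between the first and second cut keys to partition 1, and so on.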
getRecordReader
public org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text> getRecordReader(org.apache.hadoop.mapred.InputSplit split,
org.apache.hadoop.mapred.JobConf job,
org.apache.hadoop.mapred.Reporter reporter)
throws IOException
- Specified by:
getRecordReader in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>
- Specified by:
getRecordReader in class org.apache.hadoop.mapred.FileInputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>
- Throws:
IOException
getSplits
public org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf conf,
int splits)
throws IOException
- Specified by:
getSplits in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>
- Overrides:
getSplits in class org.apache.hadoop.mapred.FileInputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>
- Throws:
IOException
Copyright © 2014 Apache Software Foundation. All Rights Reserved.