org.apache.hadoop.examples.terasort
Class TeraInputFormatWithCRC

java.lang.Object
  extended by org.apache.hadoop.mapred.FileInputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>
      extended by org.apache.hadoop.examples.terasort.TeraInputFormatWithCRC
All Implemented Interfaces:
org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>

public class TeraInputFormatWithCRC
extends org.apache.hadoop.mapred.FileInputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>

An input format that reads the first 10 characters of each line as the key and the rest of the line as the value. Both key and value are represented as Text.
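The key/value split described above can be illustrated with a minimal, Hadoop-free sketch. The class name TeraLineSplitSketch and the helpers keyOf and valueOf are hypothetical and not part of this API. The sketch works on String rather than Text, and its handling of lines shorter than 10 characters (whole line as key, empty value) is an assumption, not documented behavior.

```java
// Hypothetical illustration only; not the actual record reader of this class.
final class TeraLineSplitSketch {
    // Key width taken from the class description: the first 10 characters.
    static final int KEY_LENGTH = 10;

    // Returns the first KEY_LENGTH characters of the line.
    // Assumption: a shorter line is returned whole as the key.
    static String keyOf(String line) {
        return line.substring(0, Math.min(KEY_LENGTH, line.length()));
    }

    // Returns everything after the key.
    // Assumption: a line with no characters past the key yields an empty value.
    static String valueOf(String line) {
        return line.length() > KEY_LENGTH ? line.substring(KEY_LENGTH) : "";
    }

    public static void main(String[] args) {
        String line = "AAAAAAAAAA0123456789 payload";
        System.out.println("key   = " + keyOf(line));   // key   = AAAAAAAAAA
        System.out.println("value = " + valueOf(line)); // value = 0123456789 payload
    }
}
```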


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.hadoop.mapred.FileInputFormat
org.apache.hadoop.mapred.FileInputFormat.Counter
 
Field Summary
 
Fields inherited from class org.apache.hadoop.mapred.FileInputFormat
INPUT_DIR_RECURSIVE, LOG, NUM_INPUT_FILES
 
Constructor Summary
TeraInputFormatWithCRC()
           
 
Method Summary
 org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text> getRecordReader(org.apache.hadoop.mapred.InputSplit split, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.mapred.Reporter reporter)
           
 org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf conf, int splits)
           
static void writePartitionFile(org.apache.hadoop.mapred.JobConf conf, org.apache.hadoop.fs.Path partFile)
          Use the input splits to take samples of the input and generate sample keys.
 
Methods inherited from class org.apache.hadoop.mapred.FileInputFormat
addInputPath, addInputPathRecursively, addInputPaths, computeSplitSize, getBlockIndex, getInputPathFilter, getInputPaths, getSplitHosts, isSplitable, listStatus, makeSplit, setInputPathFilter, setInputPaths, setInputPaths, setMinSplitSize
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TeraInputFormatWithCRC

public TeraInputFormatWithCRC()
Method Detail

writePartitionFile

public static void writePartitionFile(org.apache.hadoop.mapred.JobConf conf,
                                      org.apache.hadoop.fs.Path partFile)
                               throws IOException
Use the input splits to take samples of the input and generate sample keys. By default, this method reads 100,000 keys from 10 locations in the input, sorts them, and picks N-1 keys as split points, producing N partitions of approximately equal size.

Parameters:
conf - the job to sample
partFile - the path to write the output file to
Throws:
IOException - if an I/O error occurs while sampling the input or writing the output file
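The "sort the samples and pick N-1 keys" step can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the implementation of writePartitionFile: the class PartitionSampleSketch and the method splitPoints are hypothetical, keys are modeled as String instead of Text, and choosing evenly spaced indices in the sorted sample is one plausible way to pick the split points.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; not the sampler used by writePartitionFile.
final class PartitionSampleSketch {

    // Sorts the sampled keys and returns numPartitions - 1 split points
    // taken at evenly spaced positions in the sorted sample.
    static List<String> splitPoints(List<String> sampledKeys, int numPartitions) {
        if (numPartitions < 1 || numPartitions > sampledKeys.size()) {
            throw new IllegalArgumentException(
                "numPartitions must be between 1 and the number of sampled keys");
        }
        List<String> sorted = new ArrayList<>(sampledKeys);
        Collections.sort(sorted);

        // Distance, in sample positions, between consecutive split points.
        double step = sorted.size() / (double) numPartitions;

        List<String> result = new ArrayList<>();
        for (int i = 1; i < numPartitions; i++) {
            result.add(sorted.get((int) Math.round(step * i)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("d", "a", "c", "b", "f", "e", "h", "g");
        // Four partitions need three split points.
        System.out.println(splitPoints(sample, 4)); // [c, e, g]
    }
}
```

In this sketch, a key that sorts before the first split point would go to partition 0, a key between the first and second split points to partition 1, and so on, which is why N-1 keys are enough to define N partitions.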

getRecordReader

public org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text> getRecordReader(org.apache.hadoop.mapred.InputSplit split,
                                                                                                                  org.apache.hadoop.mapred.JobConf job,
                                                                                                                  org.apache.hadoop.mapred.Reporter reporter)
                                                                                                           throws IOException
Specified by:
getRecordReader in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>
Specified by:
getRecordReader in class org.apache.hadoop.mapred.FileInputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>
Throws:
IOException

getSplits

public org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf conf,
                                                       int splits)
                                                throws IOException
Specified by:
getSplits in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>
Overrides:
getSplits in class org.apache.hadoop.mapred.FileInputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>
Throws:
IOException


Copyright © 2014 Apache Software Foundation. All Rights Reserved.