org.apache.hadoop.contrib.utils.join
Class DataJoinReducerBase
java.lang.Object
org.apache.hadoop.contrib.utils.join.JobBase
org.apache.hadoop.contrib.utils.join.DataJoinReducerBase
- All Implemented Interfaces:
- Closeable, org.apache.hadoop.mapred.JobConfigurable, org.apache.hadoop.mapred.Mapper, org.apache.hadoop.mapred.Reducer
public abstract class DataJoinReducerBase
- extends JobBase
This abstract class serves as the base class for the reducer class of a data
join job. The reduce function first groups the values according to their
input tags, and then computes the cross product over the groups. For each
tuple in the cross product, it calls the following method, which is expected
to be implemented in a subclass.
protected abstract TaggedMapOutput combine(Object[] tags, Object[] values);
The above method is expected to produce one output value from an array of
records from different sources. The user code can also perform filtering here:
it can return null if it decides that the records do not meet certain
conditions.
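The group-then-cross-product flow described above can be sketched in plain Java without any Hadoop dependency. This is only an illustration under stated assumptions, not the class's actual implementation: the class name CrossProductJoinSketch, the method joinOneKey, the {tag, value} string pairs, and the sorted tag order are all hypothetical choices made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;

// Hypothetical, Hadoop-free sketch of the group-then-cross-product flow.
class CrossProductJoinSketch {

    /**
     * Joins the records that share one reduce key.
     * Each record is a {tag, value} pair; combine may return null to drop a tuple.
     */
    static List<String> joinOneKey(String[][] records,
                                   BiFunction<Object[], Object[], String> combine) {
        // Step 1: group the values by source tag (sorted here, an assumption of this sketch).
        Map<String, List<String>> groups = new TreeMap<>();
        for (String[] record : records) {
            groups.computeIfAbsent(record[0], tag -> new ArrayList<>()).add(record[1]);
        }
        List<String> output = new ArrayList<>();
        if (groups.isEmpty()) {
            return output;
        }
        Object[] tags = groups.keySet().toArray();
        List<List<String>> valuesPerTag = new ArrayList<>(groups.values());
        // Step 2: walk the cross product, taking one value per tag in each tuple.
        crossProduct(tags, valuesPerTag, 0, new Object[tags.length], combine, output);
        return output;
    }

    private static void crossProduct(Object[] tags, List<List<String>> valuesPerTag, int position,
                                     Object[] partial,
                                     BiFunction<Object[], Object[], String> combine,
                                     List<String> output) {
        if (position == tags.length) {
            // A full tuple is ready: hand a copy to combine and keep non-null results only.
            String combined = combine.apply(tags, partial.clone());
            if (combined != null) {
                output.add(combined);
            }
            return;
        }
        for (String value : valuesPerTag.get(position)) {
            partial[position] = value;
            crossProduct(tags, valuesPerTag, position + 1, partial, combine, output);
        }
    }

    public static void main(String[] args) {
        String[][] records = {
            {"customers", "Alice"}, {"orders", "book"}, {"orders", "pen"}
        };
        // One customer record crossed with two order records yields two tuples.
        System.out.println(joinOneKey(records,
                (tags, values) -> values[0] + " bought " + values[1]));
    }
}
```

Returning null from the combine function drops that tuple, which mirrors the filtering behavior described for combine.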
| Fields inherited from class org.apache.hadoop.contrib.utils.join.JobBase |
LOG |
|
Method Summary |
void |
close()
|
protected void |
collect(Object key,
TaggedMapOutput aRecord,
org.apache.hadoop.mapred.OutputCollector output,
org.apache.hadoop.mapred.Reporter reporter)
The subclass can override this method to perform additional filtering
and/or other processing logic before a value is collected. |
protected abstract TaggedMapOutput |
combine(Object[] tags,
Object[] values)
|
void |
configure(org.apache.hadoop.mapred.JobConf job)
Initializes a new instance from a JobConf. |
protected ResetableIterator |
createResetableIterator()
The subclass can provide a different implementation of ResetableIterator. |
void |
map(Object arg0,
Object arg1,
org.apache.hadoop.mapred.OutputCollector arg2,
org.apache.hadoop.mapred.Reporter arg3)
|
void |
reduce(Object key,
Iterator values,
org.apache.hadoop.mapred.OutputCollector output,
org.apache.hadoop.mapred.Reporter reporter)
|
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
reporter
protected org.apache.hadoop.mapred.Reporter reporter
largestNumOfValues
protected long largestNumOfValues
numOfValues
protected long numOfValues
collected
protected long collected
job
protected org.apache.hadoop.mapred.JobConf job
SOURCE_TAGS_FIELD
public static org.apache.hadoop.io.Text SOURCE_TAGS_FIELD
NUM_OF_VALUES_FIELD
public static org.apache.hadoop.io.Text NUM_OF_VALUES_FIELD
DataJoinReducerBase
public DataJoinReducerBase()
close
public void close()
throws IOException
- Throws:
IOException
configure
public void configure(org.apache.hadoop.mapred.JobConf job)
- Description copied from class: JobBase
- Initializes a new instance from a JobConf.
- Specified by:
configure in interface org.apache.hadoop.mapred.JobConfigurable
- Overrides:
configure in class JobBase
- Parameters:
job - the configuration
createResetableIterator
protected ResetableIterator createResetableIterator()
- The subclass can provide a different implementation of ResetableIterator.
This is necessary if the number of values in a reduce call is very high.
The default provided here uses ArrayListBackedIterator.
- Returns:
- a ResetableIterator instance.
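To illustrate why the iterator needs to be resettable, here is a minimal in-memory sketch: when computing a cross product, the values of one group are traversed once for every value of another group, so the traversal must be able to restart. The class ListBackedResettableIterator and its add and reset methods are hypothetical names for this example and are not claimed to match the library's ResetableIterator interface. An in-memory list like this one is presumably what becomes impractical when a reduce call sees a very large number of values.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical in-memory iterator that can be rewound and traversed again.
class ListBackedResettableIterator<T> implements Iterator<T> {
    private final List<T> items = new ArrayList<>();
    private int nextIndex = 0;

    /** Buffers one more item; all items stay in memory. */
    void add(T item) {
        items.add(item);
    }

    /** Rewinds to the first buffered item so the values can be traversed again. */
    void reset() {
        nextIndex = 0;
    }

    @Override
    public boolean hasNext() {
        return nextIndex < items.size();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return items.get(nextIndex++);
    }
}
```

A subclass that expects very large groups might back such an iterator with spill-to-disk storage instead of a list; that design is an assumption and is not described in this page.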
reduce
public void reduce(Object key,
Iterator values,
org.apache.hadoop.mapred.OutputCollector output,
org.apache.hadoop.mapred.Reporter reporter)
throws IOException
- Throws:
IOException
collect
protected void collect(Object key,
TaggedMapOutput aRecord,
org.apache.hadoop.mapred.OutputCollector output,
org.apache.hadoop.mapred.Reporter reporter)
throws IOException
- The subclass can override this method to perform additional filtering
and/or other processing logic before a value is collected.
- Parameters:
key -
aRecord -
output -
reporter -
- Throws:
IOException
combine
protected abstract TaggedMapOutput combine(Object[] tags,
Object[] values)
- Parameters:
tags - a list of source tags
values - a value per source
- Returns:
- combined value derived from values of the sources
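One common use of the null return is an inner join: a tuple is dropped when not every source contributed a record. The following sketch shows that logic on plain strings rather than TaggedMapOutput, so it runs without Hadoop. The class InnerJoinCombineSketch, the expectedSources parameter, and the comma-separated output format are assumptions made for the example only.

```java
// Hypothetical stand-in for a combine implementation, using strings instead of TaggedMapOutput.
class InnerJoinCombineSketch {

    /**
     * Returns the values joined with commas, or null when fewer than
     * expectedSources tags are present (inner-join filtering).
     */
    static String combine(Object[] tags, Object[] values, int expectedSources) {
        if (tags.length < expectedSources) {
            return null; // some source has no record for this key, so drop the tuple
        }
        StringBuilder joined = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                joined.append(',');
            }
            joined.append(values[i]);
        }
        return joined.toString();
    }
}
```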
map
public void map(Object arg0,
Object arg1,
org.apache.hadoop.mapred.OutputCollector arg2,
org.apache.hadoop.mapred.Reporter arg3)
throws IOException
- Throws:
IOException
Copyright © 2014 Apache Software Foundation. All Rights Reserved.