Class Vectorizer<T>

java.lang.Object
org.apache.flink.orc.vector.Vectorizer<T>
Type Parameters:
T - The type of the element
All Implemented Interfaces:
Serializable
Direct Known Subclasses:
RowDataVectorizer

@PublicEvolving public abstract class Vectorizer<T> extends Object implements Serializable
This class provides an abstracted set of methods to handle the lifecycle of VectorizedRowBatch.

Users have to extend this class and override the vectorize() method with the logic to transform the element to a VectorizedRowBatch.

  • Constructor Summary

    Constructors
    Constructor
    Description
    Vectorizer(String schema)
     
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    addUserMetadata(String key, ByteBuffer value)
    Adds arbitrary user metadata to the outgoing ORC file.
    org.apache.orc.TypeDescription
    getSchema()
    Provides the ORC schema.
    void
    setWriter(org.apache.orc.Writer writer)
    Not intended for user code; this method is called only by the OrcBulkWriter.
    abstract void
    vectorize(T element, org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch batch)
    Transforms the provided element to ColumnVectors and sets them in the exposed VectorizedRowBatch.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • Vectorizer

      public Vectorizer(String schema)
  • Method Details

    • getSchema

      public org.apache.orc.TypeDescription getSchema()
      Provides the ORC schema.
      Returns:
      the ORC schema
    • setWriter

      public void setWriter(org.apache.orc.Writer writer)
      Not intended for user code; this method is called only by the OrcBulkWriter.
      Parameters:
      writer - the underlying ORC Writer.
    • addUserMetadata

      public void addUserMetadata(String key, ByteBuffer value)
      Adds arbitrary user metadata to the outgoing ORC file.

      Users who want to add metadata dynamically, based either on the input or on an external system, can do so by calling addUserMetadata(...) inside the overridden vectorize() method.

      Parameters:
      key - a key to label the data with.
      value - the contents of the metadata.
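
      The pattern described above can be sketched as follows. TaggingVectorizer, its one-column schema, and the metadata key "last.line" are hypothetical names chosen for illustration. The sketch assumes flink-orc, orc-core, and hive-storage-api are on the classpath, and that the vectorizer is driven by an OrcBulkWriter, which supplies the underlying Writer through setWriter(...) before vectorize() runs.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

import org.apache.flink.orc.vector.Vectorizer;
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;

/** Hypothetical vectorizer that writes one string column and tags the file with metadata. */
public class TaggingVectorizer extends Vectorizer<String> {

    private static final long serialVersionUID = 1L;

    public TaggingVectorizer() {
        // Assumed schema: a single string column named "line".
        super("struct<line:string>");
    }

    @Override
    public void vectorize(String element, VectorizedRowBatch batch) throws IOException {
        byte[] bytes = element.getBytes(StandardCharsets.UTF_8);

        // Append the element as the next row of the batch.
        BytesColumnVector lineCol = (BytesColumnVector) batch.cols[0];
        int row = batch.size++;
        lineCol.setVal(row, bytes);

        // Attach metadata derived from the input; "last.line" is a made-up key.
        addUserMetadata("last.line", ByteBuffer.wrap(bytes));
    }
}
```

      Because vectorize() runs once per element, the call above repeats for every record. As far as the ORC Writer contract goes, passing the same key again replaces the earlier value, so the file would end up carrying only the most recent one.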
    • vectorize

      public abstract void vectorize(T element, org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch batch) throws IOException
      Transforms the provided element to ColumnVectors and sets them in the exposed VectorizedRowBatch.
      Parameters:
      element - The input element
      batch - The batch to write the ColumnVectors to
      Throws:
      IOException - if there is an error while transforming the input.
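
      A minimal sketch of a concrete subclass is given below. The Person type, the PersonVectorizer name, and the schema string are assumptions made for illustration and are not part of Flink. The sketch assumes flink-orc, orc-core, and hive-storage-api are on the classpath, and that the batch columns appear in the same order as the fields in the schema.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.flink.orc.vector.Vectorizer;
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;

/** Hypothetical vectorizer for a two-field record. */
public class PersonVectorizer extends Vectorizer<PersonVectorizer.Person> {

    private static final long serialVersionUID = 1L;

    /** Hypothetical element type, defined here only to keep the sketch self-contained. */
    public static class Person {
        final String name;
        final int age;

        public Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    /** Assumed schema, written in ORC TypeDescription string syntax. */
    public static final String SCHEMA = "struct<name:string,age:int>";

    public PersonVectorizer() {
        super(SCHEMA);
    }

    @Override
    public void vectorize(Person element, VectorizedRowBatch batch) throws IOException {
        // ORC string columns are backed by BytesColumnVector, int columns by LongColumnVector.
        BytesColumnVector nameCol = (BytesColumnVector) batch.cols[0];
        LongColumnVector ageCol = (LongColumnVector) batch.cols[1];

        // Claim the next free row in the batch, then fill each column at that row.
        int row = batch.size++;
        nameCol.setVal(row, element.name.getBytes(StandardCharsets.UTF_8));
        ageCol.vector[row] = element.age;
    }
}
```

      The implementation only appends a row to the batch it is handed. Creating, flushing, and resetting the batch is left to the caller, which matches the lifecycle role this class describes for itself.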