class HDFSMetadataLog[T <: AnyRef] extends MetadataLog[T] with Logging

A MetadataLog implementation based on HDFS. HDFSMetadataLog uses the specified path as the metadata storage.

When writing a new batch, HDFSMetadataLog first writes to a temp file and then renames it to the final batch file. If the rename fails, multiple writers must be competing for the same batch; only one of them succeeds and the others fail.

Note: HDFSMetadataLog doesn't support S3-like file systems, because listing a directory on them is not guaranteed to show the most recently written files.
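
The write-then-rename step described above can be sketched on a local file system with java.nio. This is only an illustration under stated assumptions, not the class's actual code: RenameCommitSketch, addBatch, and the temp-file naming are hypothetical, and a local Files.move stands in for the HDFS rename (locally, the existence check and the move are not a single atomic step).

```scala
import java.nio.file.{FileAlreadyExistsException, Files, Path}
import java.util.UUID

object RenameCommitSketch {
  /** Writes `bytes` as batch `batchId` under `dir`.
    * Returns false if a file for that batch already exists. */
  def addBatch(dir: Path, batchId: Long, bytes: Array[Byte]): Boolean = {
    val target = dir.resolve(batchId.toString)
    // Hypothetical temp-file name; the real class chooses its own.
    val tmp = dir.resolve(s".$batchId.${UUID.randomUUID()}.tmp")
    Files.write(tmp, bytes)
    try {
      // Without REPLACE_EXISTING, Files.move fails if the target exists,
      // so a writer that loses the race sees an exception here.
      Files.move(tmp, target)
      true
    } catch {
      case _: FileAlreadyExistsException => false
    } finally {
      Files.deleteIfExists(tmp) // no-op after a successful move
    }
  }
}
```

The second writer for the same batch ID gets false, and the first writer's content stays untouched.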

Linear Supertypes
Logging, MetadataLog[T], AnyRef, Any

Instance Constructors

  1. new HDFSMetadataLog(sparkSession: SparkSession, path: String)(implicit arg0: ClassTag[T])

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. def add(batchId: Long, metadata: T): Boolean

    Store the metadata for the specified batchId and return true if successful. If the batchId's metadata has already been stored, this method will return false.

    Definition Classes
    HDFSMetadataLog → MetadataLog
  5. def addNewBatchByStream(batchId: Long)(fn: (OutputStream) ⇒ Unit): Boolean

    Store the metadata for the specified batchId and return true if successful. This method fills in the metadata content by executing the given function fn. If the function throws an exception, the write is cancelled automatically and this method propagates the exception.

    If the batchId's metadata has already been stored, this method will return false.

    The metadata is written to a temp file, which is then renamed to the batch file.

    Multiple HDFSMetadataLog instances may use the same metadata path. Although this is not valid usage, it must still be prevented from destroying the files.
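
The cancel-on-exception behaviour can be sketched with a local temp file. This is a hedged illustration, not the actual implementation: StreamWriteSketch, addByStream, and the temp-file naming are hypothetical, and java.nio stands in for the checkpoint file manager.

```scala
import java.io.OutputStream
import java.nio.file.{FileAlreadyExistsException, Files, Path}
import java.util.UUID

object StreamWriteSketch {
  /** Lets `fn` write the batch content. Returns false if the batch exists;
    * rethrows whatever `fn` throws after discarding the temp file. */
  def addByStream(dir: Path, batchId: Long)(fn: OutputStream => Unit): Boolean = {
    val target = dir.resolve(batchId.toString)
    if (Files.exists(target)) false
    else {
      // Hypothetical temp-file name, for illustration only.
      val tmp = dir.resolve(s".$batchId.${UUID.randomUUID()}.tmp")
      try {
        val out = Files.newOutputStream(tmp)
        try fn(out) finally out.close()
        Files.move(tmp, target) // commit: fails if another writer won
        true
      } catch {
        case _: FileAlreadyExistsException => false
      } finally {
        // Cancels the write when fn threw; no-op after a successful move.
        Files.deleteIfExists(tmp)
      }
    }
  }
}
```

An exception from fn is not caught here, so it reaches the caller, and no batch file is left behind for that ID.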

  6. def applyFnToBatchByStream[RET](batchId: Long)(fn: (InputStream) ⇒ RET): RET

    Apply the provided function to each entry in the specified batch's metadata log.

    Unlike get, which materializes all entries in memory, this method reads and processes the entries as a stream. This avoids memory problems with very large metadata log files.

    NOTE: This no longer fails early on corruption. The caller should handle the exception properly and make sure its logic is not affected by a failure in the middle of processing.
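
The read-and-process idea can be sketched as follows. The names StreamReadSketch, applyToBatch, and countLines are hypothetical, and a local file stands in for the batch file; the point is only that the function consumes the stream incrementally and the stream is closed by the method, not by the function.

```scala
import java.io.{BufferedReader, InputStream, InputStreamReader}
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

object StreamReadSketch {
  /** Opens the batch file, hands the stream to `fn`, and always closes it. */
  def applyToBatch[R](batchFile: Path)(fn: InputStream => R): R = {
    val in = Files.newInputStream(batchFile)
    try fn(in) finally in.close()
  }

  /** Example consumer: counts entries (one per line) without keeping them. */
  def countLines(in: InputStream): Long = {
    val reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))
    var n = 0L
    while (reader.readLine() != null) n += 1
    n
  }
}
```

If fn fails halfway through, any side effects it already produced remain, which is why the caller has to handle the exception.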

  7. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  8. val batchFilesFilter: PathFilter

    A PathFilter that accepts only batch files.

    Attributes
    protected
  9. def batchIdToPath(batchId: Long): Path
    Attributes
    protected
  10. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native() @HotSpotIntrinsicCandidate()
  11. def deserialize(in: InputStream): T

    Read and deserialize the metadata from the input stream. If this method is overridden in a subclass, the overriding method should not close the given input stream, as it will be closed by the caller.

    Attributes
    protected
  12. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  13. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  14. val fileManager: CheckpointFileManager
    Attributes
    protected
  15. def get(startId: Option[Long], endId: Option[Long]): Array[(Long, T)]

    Return metadata for batches between startId (inclusive) and endId (inclusive). If startId is None, return all batches up to endId (inclusive).

    Definition Classes
    HDFSMetadataLog → MetadataLog
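
The inclusive range selection for get(startId, endId) can be sketched on plain batch IDs. RangeSketch and inRange are hypothetical names. Treating a missing endId as "no upper bound" is an assumption made here by symmetry; the description above only covers a missing startId.

```scala
object RangeSketch {
  /** Keeps the batch IDs within [startId, endId], in ascending order.
    * A bound of None leaves that side of the range open (assumed for endId). */
  def inRange(batchIds: Seq[Long], startId: Option[Long], endId: Option[Long]): Seq[Long] =
    batchIds
      .filter(id => startId.forall(id >= _) && endId.forall(id <= _))
      .sorted
}
```

For example, with batches 0 to 4 stored, startId = Some(1) and endId = Some(3) select batches 1, 2, and 3.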
  16. def get(batchId: Long): Option[T]

    Return the metadata for the specified batchId if it's stored. Otherwise, return None.

    Definition Classes
    HDFSMetadataLog → MetadataLog
  17. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  18. def getLatest(): Option[(Long, T)]

    Return the latest batch ID and its metadata, if any exist.

    Definition Classes
    HDFSMetadataLog → MetadataLog
  19. def getLatestBatchId(): Option[Long]

    Return the latest batch ID without reading the file. This method only checks that the file exists, which avoids the cost of reading and deserializing the log file.

  20. def getOrderedBatchFiles(): Array[FileStatus]

    Get an array of FileStatus referencing batch files. The array is sorted from the most recent batch file to the oldest.
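
A directory listing is enough to find the newest batch, which is the idea behind getLatestBatchId and getOrderedBatchFiles. The sketch below assumes that batch files are named by their decimal batch ID, which this page does not state; BatchListingSketch and its methods are hypothetical, and java.io.File replaces FileStatus.

```scala
import java.io.File
import java.nio.file.Path
import scala.util.Try

object BatchListingSketch {
  /** Batch IDs found under `dir`, newest first.
    * Assumption: a batch file's name is its batch ID; other names are skipped. */
  def orderedBatchIds(dir: Path): Seq[Long] = {
    // listFiles() returns null when `dir` is not a readable directory.
    val files = Option(dir.toFile.listFiles()).getOrElse(Array.empty[File])
    files.toSeq
      .filter(_.isFile)
      .flatMap(f => Try(f.getName.toLong).toOption)
      .sorted(Ordering[Long].reverse)
  }

  /** Latest batch ID, taken from file names only; no file is opened. */
  def latestBatchId(dir: Path): Option[Long] = orderedBatchIds(dir).headOption
}
```

Because the result depends entirely on the listing, a file system whose listings can lag behind recent writes would return a stale answer here.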

  21. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  22. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  23. def initializeLogIfNecessary(isInterpreter: Boolean): Unit
    Attributes
    protected
    Definition Classes
    Logging
  24. def isBatchFile(path: Path): Boolean
    Attributes
    protected
  25. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  26. def isTraceEnabled(): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  27. def log: Logger
    Attributes
    protected
    Definition Classes
    Logging
  28. def logDebug(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  29. def logDebug(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  30. def logError(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  31. def logError(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  32. def logInfo(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  33. def logInfo(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  34. def logName: String
    Attributes
    protected
    Definition Classes
    Logging
  35. def logTrace(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  36. def logTrace(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  37. def logWarning(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  38. def logWarning(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  39. val metadataPath: Path
  40. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  41. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  42. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  43. def pathToBatchId(path: Path): Long
    Attributes
    protected
  44. def purge(thresholdBatchId: Long): Unit

    Removes all log entries earlier than thresholdBatchId (exclusive).

    Definition Classes
    HDFSMetadataLog → MetadataLog
  45. def purgeAfter(thresholdBatchId: Long): Unit

    Removes all log entries later than thresholdBatchId (exclusive).
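
Both purge and purgeAfter treat the threshold as exclusive, so the entry at thresholdBatchId itself survives either call. A small sketch on plain batch IDs makes the boundary explicit; PurgeSketch and its method names are hypothetical.

```scala
object PurgeSketch {
  /** IDs that purge(threshold) would remove: strictly earlier than the threshold. */
  def idsToPurge(batchIds: Seq[Long], threshold: Long): Seq[Long] =
    batchIds.filter(_ < threshold)

  /** IDs that purgeAfter(threshold) would remove: strictly later than the threshold. */
  def idsToPurgeAfter(batchIds: Seq[Long], threshold: Long): Seq[Long] =
    batchIds.filter(_ > threshold)
}
```

With batches 0 to 4 and a threshold of 2, purge selects 0 and 1, while purgeAfter selects 3 and 4.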

  46. def serialize(metadata: T, out: OutputStream): Unit

    Serialize the metadata and write it to the output stream. If this method is overridden in a subclass, the overriding method should not close the given output stream, as it will be closed by the caller.

    Attributes
    protected
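
The "do not close the stream" contract of serialize and deserialize can be sketched with a standalone codec for String metadata. CodecSketch is hypothetical and does not extend HDFSMetadataLog; it only shows a pair of methods that read and write through streams they do not own.

```scala
import java.io.{ByteArrayOutputStream, InputStream, OutputStream}
import java.nio.charset.StandardCharsets.UTF_8

object CodecSketch {
  /** Writes the metadata as UTF-8. Flushes, but leaves closing to the caller. */
  def serialize(metadata: String, out: OutputStream): Unit = {
    out.write(metadata.getBytes(UTF_8))
    out.flush()
  }

  /** Reads the stream to its end. The caller remains responsible for closing `in`. */
  def deserialize(in: InputStream): String = {
    val buf = new ByteArrayOutputStream()
    val chunk = new Array[Byte](4096)
    var n = in.read(chunk)
    while (n != -1) {
      buf.write(chunk, 0, n)
      n = in.read(chunk)
    }
    new String(buf.toByteArray, UTF_8)
  }
}
```

Keeping ownership of the stream with the caller lets the caller decide whether to commit or cancel the underlying file after the method returns.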
  47. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  48. def toString(): String
    Definition Classes
    AnyRef → Any
  49. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  50. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  51. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )

Deprecated Value Members

  1. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] ) @Deprecated
    Deprecated
