class RocksDBFileManager extends Logging

Class responsible for syncing RocksDB checkpoint files from local disk to DFS. For each version, the checkpoint is saved in a specific directory structure that allows successive versions to reuse the SST data files and archived log files. This makes each commit incremental: only the new SST files and archived log files generated by RocksDB are uploaded. The directory structures on local disk and in DFS are as follows.

Local checkpoint dir structure
------------------------------
RocksDB generates a number of files in the local checkpoint directory. The most important among them are the SST files; they are the actual log-structured data files. The rest of the files contain the metadata necessary for RocksDB to read the SST files and start from the checkpoint. Note that the SST files are hard links to files in RocksDB's working directory, and therefore successive checkpoints can share some of the SST files. These SST files therefore have to be copied to a shared directory in DFS so that different committed versions can reuse them.

We consider both SST files and archived log files as immutable files which can be shared between different checkpoints.

localCheckpointDir
                 |
                 +-- OPTIONS-000005
                 +-- MANIFEST-000008
                 +-- CURRENT
                 +-- 00007.sst
                 +-- 00011.sst
                 +-- archive
                 |     +-- 00008.log
                 |     +-- 00013.log
                 ...

DFS directory structure after saving to DFS as version 10
-----------------------------------------------------------
The SST and archived log files are given unique file names and copied to the shared subdirectory. Every version maintains a mapping of local immutable file name to the unique file name in DFS. This mapping is saved in a JSON file (named metadata), which is zipped along with other checkpoint files into a single file [version].zip.

dfsRootDir
          |
          +-- SSTs
          |      +-- 00007-[uuid1].sst
          |      +-- 00011-[uuid2].sst
          +-- logs
          |      +-- 00008-[uuid3].log
          |      +-- 00013-[uuid4].log
          +-- 10.zip
          |      +-- metadata         <--- contains the mapping between 00007.sst and 00007-[uuid1].sst,
          |                                and the mapping between 00008.log and 00008-[uuid3].log
          |      +-- OPTIONS-000005
          |      +-- MANIFEST-000008
          |      +-- CURRENT
          |      ...
          |
          +-- 9.zip
          +-- 8.zip
          ...

Note the following.
- Each [version].zip is a complete description of all the data and metadata needed to recover a RocksDB instance at the corresponding version. The SST files and log files are not included in the zip files, so they can be shared across different versions. This is unlike the [version].delta files of HDFSBackedStateStore, where previous delta files need to be read in order to recover.
- This is safe with respect to speculatively executed tasks running concurrently in different executors, as each task would upload a different copy of the generated immutable files and atomically update the [version].zip.
- Immutable files are identified uniquely based on their file name and file size.
- Immutable files can be reused only across adjacent checkpoints/versions.
- This class is thread-safe. Specifically, it is safe to concurrently delete old files from a different thread than the task thread saving files.
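
The reuse rule described in the notes above (match an immutable file by name and size against the previous version, otherwise give it a fresh unique DFS name) can be sketched as follows. This is a minimal illustration under stated assumptions, not Spark's implementation: `ImmutableFile`, `uniqueDfsName`, `resolveDfsNames`, and the `newUuid` parameter are hypothetical names introduced here.

```scala
import java.util.UUID

object ImmutableFileMapping {
  // Hypothetical record: one local immutable file and the unique name it has in DFS.
  final case class ImmutableFile(localName: String, dfsName: String, sizeBytes: Long)

  // Builds e.g. "00007-<uuid>.sst" from "00007.sst".
  def uniqueDfsName(localName: String, uuid: String): String = {
    val dot = localName.lastIndexOf('.')
    if (dot < 0) s"$localName-$uuid"
    else s"${localName.substring(0, dot)}-$uuid${localName.substring(dot)}"
  }

  // For each local file (name -> size), reuse the previous version's DFS file when
  // both name and size match; otherwise assign a new unique name (a file to upload).
  def resolveDfsNames(
      local: Map[String, Long],
      previous: Seq[ImmutableFile],
      newUuid: () => String = () => UUID.randomUUID().toString): Seq[ImmutableFile] = {
    val prevByKey = previous.map(f => (f.localName, f.sizeBytes) -> f).toMap
    local.toSeq.sortBy(_._1).map { case (name, size) =>
      prevByKey.getOrElse(
        (name, size),
        ImmutableFile(name, uniqueDfsName(name, newUuid()), size))
    }
  }
}
```

Only the entries that received a new name would need to be uploaded; the rest already exist in the shared directory. Passing `newUuid` as a parameter is a choice made for this sketch so that the naming can be made deterministic in tests.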

Linear Supertypes
Logging, AnyRef, Any

Instance Constructors

  1. new RocksDBFileManager(dfsRootDir: String, localTempDir: File, hadoopConf: Configuration, loggingId: String = "")

    dfsRootDir

    Directory where the [version].zip files will be stored

    localTempDir

    Local directory for temporary work

    hadoopConf

    Hadoop configuration for talking to DFS

    loggingId

    Id that will be prepended in logs for isolating concurrent RocksDBs

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native() @HotSpotIntrinsicCandidate()
  6. def deleteOldVersions(numVersionsToRetain: Int): Unit

    Delete old versions by deleting the associated version and SST files. At a high level, this method finds which versions to delete and which SST files were last used in those versions. It is safe to delete these SST files because an SST file can be reused only in successive versions. Therefore, if an SST file F was last used in version V, then it won't be used in version V+1 or later, and if version V can be deleted, then F can safely be deleted as well.

    To find old files, it does the following.
    - List all the existing [version].zip files.
    - Find the minimum version that needs to be retained based on the given numVersionsToRetain.
    - Accordingly decide which versions should be deleted.
    - Resolve all SST files of all the existing versions, if not already resolved.
    - Find the latest version in which each SST file was used.
    - Delete the files that were last used in the to-be-deleted versions, as those files will not be needed any more.

    Note that it only deletes files that it knows are safe to delete. It may not delete the following files.
    - Partially written SST files.
    - SST files that were used in a version, but that version got overwritten with a different set of SST files.
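
The selection steps listed above can be sketched as a pure function over an in-memory view of the versions. This is an illustrative sketch, not the actual method body: `OldVersionCleanup`, `plan`, and the `versionToFiles` map are hypothetical, and the sketch assumes that at least one version is always retained.

```scala
object OldVersionCleanup {
  // versionToFiles: for every existing version, the DFS names of the SST files
  // that version references. Returns (versions to delete, SST files to delete).
  def plan(
      versionToFiles: Map[Long, Set[String]],
      numVersionsToRetain: Int): (Set[Long], Set[String]) = {
    require(numVersionsToRetain >= 1, "this sketch assumes at least one retained version")
    val sorted = versionToFiles.keys.toSeq.sorted
    if (sorted.size <= numVersionsToRetain) {
      (Set.empty[Long], Set.empty[String])
    } else {
      // Minimum version that must be kept; everything older is deleted.
      val minVersionToRetain = sorted.takeRight(numVersionsToRetain).head
      val versionsToDelete = sorted.filter(_ < minVersionToRetain).toSet

      // Latest version in which each SST file was used.
      val lastUsedIn: Map[String, Long] = versionToFiles.toSeq
        .flatMap { case (version, files) => files.toSeq.map(f => f -> version) }
        .groupBy(_._1)
        .map { case (file, pairs) => file -> pairs.map(_._2).max }

      // A file is deletable only if its last use falls in a to-be-deleted version.
      val filesToDelete =
        lastUsedIn.collect { case (file, v) if versionsToDelete.contains(v) => file }.toSet
      (versionsToDelete, filesToDelete)
    }
  }
}
```

For example, with versions 8, 9, and 10 referencing `{a, b}`, `{b, c}`, and `{c, d}` respectively, retaining two versions selects version 8 and file `a` for deletion; `b` survives because version 9 still uses it.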

  7. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  8. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  9. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  10. def getLatestVersion(): Long

    Get the latest version available in the DFS directory. If no data is present, it returns 0.
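
As a rough sketch of this behavior, assuming the latest version is derived from the [version].zip file names listed in the DFS root directory, the lookup could be written as below. `LatestVersion` and `latestVersion` are hypothetical names, and the listing is passed in as plain strings rather than read from a file system.

```scala
object LatestVersion {
  // Matches only names of the form "<digits>.zip"; the whole name must match.
  private val VersionZip = """(\d+)\.zip""".r

  // Returns the highest version among the listed file names, or 0 if none is found.
  def latestVersion(fileNames: Seq[String]): Long = {
    val versions = fileNames.collect { case VersionZip(v) => v.toLong }
    if (versions.isEmpty) 0L else versions.max
  }
}
```

Directory entries such as `SSTs` and `logs` do not match the pattern and are ignored.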

  11. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  12. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  13. def initializeLogIfNecessary(isInterpreter: Boolean): Unit
    Attributes
    protected
    Definition Classes
    Logging
  14. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  15. def isTraceEnabled(): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  16. def latestLoadCheckpointMetrics: RocksDBFileManagerMetrics
  17. def latestSaveCheckpointMetrics: RocksDBFileManagerMetrics
  18. def loadCheckpointFromDfs(version: Long, localDir: File): RocksDBCheckpointMetadata

    Load all the necessary files for a specific checkpoint version from DFS to the given local directory. If the version is 0, then it will delete all files in the directory. For other versions, it ensures that only the exact files generated during checkpointing will be present in the local directory.
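
One way to picture the "only the exact files" guarantee is as a reconciliation between what is already in the local directory and what the checkpoint expects. The sketch below is an assumption-laden illustration, not the method's actual logic: `LocalDirReconcile`, `Plan`, and `plan` are hypothetical, and files are compared by name and size only.

```scala
object LocalDirReconcile {
  // Hypothetical result: local files to remove and checkpoint files still to fetch.
  final case class Plan(toDelete: Set[String], toDownload: Set[String])

  // local and expected both map file name -> size in bytes.
  def plan(version: Long, local: Map[String, Long], expected: Map[String, Long]): Plan =
    if (version == 0L) {
      // Version 0 means an empty store: clear the directory, fetch nothing.
      Plan(local.keySet, Set.empty)
    } else {
      // Keep a local file only if the checkpoint expects the same name and size.
      val keep = local.filter { case (name, size) => expected.get(name).contains(size) }.keySet
      Plan(local.keySet -- keep, expected.keySet -- keep)
    }
}
```

A file whose name matches but whose size differs appears in both sets: the stale local copy is deleted and the expected one is fetched.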

  19. def log: Logger
    Attributes
    protected
    Definition Classes
    Logging
  20. def logDebug(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  21. def logDebug(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  22. def logError(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  23. def logError(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  24. def logInfo(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  25. def logInfo(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  26. def logName: String
    Attributes
    protected
    Definition Classes
    RocksDBFileManager → Logging
  27. def logTrace(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  28. def logTrace(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  29. def logWarning(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  30. def logWarning(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  31. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  32. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  33. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  34. def saveCheckpointToDfs(checkpointDir: File, version: Long, numKeys: Long): Unit

    Save all the files in the given local checkpoint directory as a committed version in DFS.

  35. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  36. def toString(): String
    Definition Classes
    AnyRef → Any
  37. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  38. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  39. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )

Deprecated Value Members

  1. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] ) @Deprecated
    Deprecated
