Class BinaryStringData

java.lang.Object
org.apache.flink.table.data.binary.LazyBinaryFormat<String>
org.apache.flink.table.data.binary.BinaryStringData
All Implemented Interfaces:
Comparable<StringData>, BinaryFormat, StringData

@Internal public final class BinaryStringData extends LazyBinaryFormat<String> implements StringData
A lazy binary implementation of StringData, backed by MemorySegments and a String.

Either MemorySegments or String must be provided when constructing BinaryStringData. The other representation will be materialized when needed.
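
The "provide one representation, materialize the other on demand" idea can be illustrated with a minimal JDK-only sketch. The class `LazyUtf8String` and all of its members are hypothetical names invented for this illustration; it uses a plain `byte[]` in place of MemorySegments and is not how Flink implements BinaryStringData.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of Flink.
final class LazyUtf8String {
    private byte[] bytes;       // binary form, null until provided or materialized
    private String javaObject;  // object form, null until provided or materialized

    private LazyUtf8String(byte[] bytes, String javaObject) {
        if (bytes == null && javaObject == null) {
            throw new IllegalArgumentException("one representation must be provided");
        }
        this.bytes = bytes;
        this.javaObject = javaObject;
    }

    static LazyUtf8String fromString(String s) {
        return new LazyUtf8String(null, s);
    }

    static LazyUtf8String fromBytes(byte[] utf8) {
        return new LazyUtf8String(utf8.clone(), null);
    }

    boolean isBinaryMaterialized() {
        return bytes != null;
    }

    boolean isObjectMaterialized() {
        return javaObject != null;
    }

    // Encodes the String to UTF-8 on first use, then returns the cached array.
    byte[] toBytes() {
        if (bytes == null) {
            bytes = javaObject.getBytes(StandardCharsets.UTF_8);
        }
        return bytes;
    }

    // Decodes the UTF-8 bytes on first use, then returns the cached String.
    @Override
    public String toString() {
        if (javaObject == null) {
            javaObject = new String(bytes, StandardCharsets.UTF_8);
        }
        return javaObject;
    }

    public static void main(String[] args) {
        LazyUtf8String s = LazyUtf8String.fromString("h\u00e9llo");
        System.out.println(s.isBinaryMaterialized());
        System.out.println(s.toBytes().length);
        System.out.println(s.isBinaryMaterialized());
    }
}
```

In this sketch `toBytes()` hands out the cached array rather than a copy, so callers that need to keep the bytes would have to copy them themselves.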

It provides many useful methods for comparison, search, and so on.

  • Field Details

  • Constructor Details

    • BinaryStringData

      public BinaryStringData()
    • BinaryStringData

      public BinaryStringData(String javaObject)
    • BinaryStringData

      public BinaryStringData(org.apache.flink.core.memory.MemorySegment[] segments, int offset, int sizeInBytes)
    • BinaryStringData

      public BinaryStringData(org.apache.flink.core.memory.MemorySegment[] segments, int offset, int sizeInBytes, String javaObject)
  • Method Details

    • fromAddress

      public static BinaryStringData fromAddress(org.apache.flink.core.memory.MemorySegment[] segments, int offset, int numBytes)
      Creates a BinaryStringData instance from the given address (base and offset) and length.
    • fromString

      public static BinaryStringData fromString(String str)
      Creates a BinaryStringData instance from the given Java string.
    • fromBytes

      public static BinaryStringData fromBytes(byte[] bytes)
      Creates a BinaryStringData instance from the given UTF-8 bytes.
    • fromBytes

      public static BinaryStringData fromBytes(byte[] bytes, int offset, int numBytes)
      Creates a BinaryStringData instance from the given UTF-8 bytes with offset and number of bytes.
    • blankString

      public static BinaryStringData blankString(int length)
      Creates a BinaryStringData instance that contains `length` spaces.
    • toBytes

      public byte[] toBytes()
      Description copied from interface: StringData
      Converts this StringData object to a UTF-8 byte array.

      Note: The returned byte array may be reused.

      Specified by:
      toBytes in interface StringData
    • equals

      public boolean equals(Object o)
      Overrides:
      equals in class Object
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object
    • toString

      public String toString()
      Description copied from interface: StringData
      Converts this StringData object to a String.
      Specified by:
      toString in interface StringData
      Overrides:
      toString in class Object
    • compareTo

      public int compareTo(@Nonnull StringData o)
      Compares two strings lexicographically. The comparison is performed directly on the binary (UTF-8) representation of the two strings.
      Specified by:
      compareTo in interface Comparable<StringData>
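
      Comparing strings through their binary form can be sketched with plain byte arrays. The example below is a JDK-only illustration under the assumption that bytes are compared as unsigned values and that a shorter string sorts before a longer one sharing its prefix; `Utf8Compare` and its methods are hypothetical names, not Flink API.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of Flink.
final class Utf8Compare {

    // Compares two UTF-8 byte arrays byte by byte, treating each byte as unsigned.
    static int compareUtf8(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            // Mask to 0..255; Java bytes are signed, and bytes >= 0x80 would
            // otherwise sort before ASCII.
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        // All shared bytes are equal: the shorter array sorts first.
        return a.length - b.length;
    }

    // Convenience wrapper that encodes both strings to UTF-8 first.
    static int compare(String x, String y) {
        return compareUtf8(
                x.getBytes(StandardCharsets.UTF_8),
                y.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(compare("apple", "banana"));
        System.out.println(compare("z", "\u00e9"));
        System.out.println(compare("\uFF5E", "\uD83D\uDE00"));
    }
}
```

      Unsigned byte order on valid UTF-8 matches Unicode code point order. `String.compareTo` compares UTF-16 code units instead, so the two orderings can disagree for characters outside the Basic Multilingual Plane.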
    • numChars

      public int numChars()
      Returns the number of UTF-8 code points in the string.
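
      The number of code points generally differs from the number of bytes. One common way to count code points in valid UTF-8 is to count every byte that is not a continuation byte (bit pattern `10xxxxxx`). The sketch below illustrates that technique with JDK classes only; `Utf8Count` is a hypothetical name and this is not necessarily how numChars is implemented.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of Flink.
final class Utf8Count {

    // Counts code points in valid UTF-8 by skipping continuation bytes (10xxxxxx).
    static int numCodePoints(byte[] utf8) {
        int count = 0;
        for (byte b : utf8) {
            if ((b & 0xC0) != 0x80) {
                count++; // lead byte of a new code point
            }
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] accented = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        System.out.println(accented.length + " bytes, "
                + numCodePoints(accented) + " code points");
    }
}
```

      The sketch assumes well-formed input; malformed byte sequences would be counted without any validation.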
    • byteAt

      public byte byteAt(int index)
      Returns the byte value at the specified index. An index ranges from 0 to binarySection.sizeInBytes - 1.
      Parameters:
      index - the index of the byte value.
      Returns:
      the byte value at the specified index of the UTF-8 bytes.
      Throws:
      IndexOutOfBoundsException - if the index argument is negative or not less than the length of the UTF-8 bytes.
    • getSegments

      public org.apache.flink.core.memory.MemorySegment[] getSegments()
      Description copied from interface: BinaryFormat
      Gets the underlying MemorySegments this binary format spans.
      Specified by:
      getSegments in interface BinaryFormat
      Overrides:
      getSegments in class LazyBinaryFormat<String>
    • getOffset

      public int getOffset()
      Description copied from interface: BinaryFormat
      Gets the start offset of this binary data in the MemorySegments.
      Specified by:
      getOffset in interface BinaryFormat
      Overrides:
      getOffset in class LazyBinaryFormat<String>
    • getSizeInBytes

      public int getSizeInBytes()
      Description copied from interface: BinaryFormat
      Gets the size in bytes of this binary data.
      Specified by:
      getSizeInBytes in interface BinaryFormat
      Overrides:
      getSizeInBytes in class LazyBinaryFormat<String>
    • ensureMaterialized

      public void ensureMaterialized()
    • materialize

      protected BinarySection materialize(org.apache.flink.api.common.typeutils.TypeSerializer<String> serializer)
      Description copied from class: LazyBinaryFormat
      Materializes the Java object to binary format. Subclasses must hold any information they need for this (for example, RawValueData needs a javaObjectSerializer).
      Specified by:
      materialize in class LazyBinaryFormat<String>
    • copy

      public BinaryStringData copy()
      Returns a copy of this BinaryStringData.
    • substring

      public BinaryStringData substring(int beginIndex, int endIndex)
      Returns a binary string that is a substring of this binary string. The substring begins at the specified beginIndex and extends to the character at index endIndex - 1.

      Examples:

       fromString("hamburger").substring(4, 8) returns binary string "urge"
       fromString("smiles").substring(1, 5) returns binary string "mile"
       
      Parameters:
      beginIndex - the beginning index, inclusive.
      endIndex - the ending index, exclusive.
      Returns:
      the specified substring. If an index is out of bounds, EMPTY_UTF8 is returned instead of throwing StringIndexOutOfBoundsException.
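
      A character-indexed substring over UTF-8 bytes that returns an empty result instead of throwing can be sketched as follows. This is a JDK-only illustration: `Utf8Substring` is a hypothetical name, an empty `byte[]` stands in for EMPTY_UTF8, and the handling of a negative beginIndex (empty result) or an endIndex past the end (clamped) is an assumption of this sketch, not confirmed behavior of the real method.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration class; not part of Flink.
final class Utf8Substring {

    // Returns the bytes of code points [beginIndex, endIndex), or an empty
    // array when the requested range selects nothing.
    static byte[] substring(byte[] utf8, int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex <= beginIndex) {
            return new byte[0];
        }
        int start = -1;        // byte offset of the first selected code point
        int end = utf8.length; // byte offset just past the last selected one
        int chars = 0;         // code points seen so far
        for (int i = 0; i < utf8.length; i++) {
            if ((utf8[i] & 0xC0) != 0x80) { // lead byte of a code point
                if (chars == beginIndex) {
                    start = i;
                }
                if (chars == endIndex) {
                    end = i;
                    break;
                }
                chars++;
            }
        }
        if (start < 0) {
            return new byte[0]; // beginIndex is past the last code point
        }
        return Arrays.copyOfRange(utf8, start, end);
    }

    // Convenience wrapper for String input and output.
    static String substring(String s, int beginIndex, int endIndex) {
        byte[] out = substring(s.getBytes(StandardCharsets.UTF_8), beginIndex, endIndex);
        return new String(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(substring("hamburger", 4, 8));
        System.out.println(substring("smiles", 1, 5));
    }
}
```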
    • contains

      public boolean contains(BinaryStringData s)
      Returns true if and only if this BinaryStringData contains the specified sequence of byte values.
      Parameters:
      s - the sequence to search for
      Returns:
      true if this BinaryStringData contains s, false otherwise
    • startsWith

      public boolean startsWith(BinaryStringData prefix)
      Tests if this BinaryStringData starts with the specified prefix.
      Parameters:
      prefix - the prefix.
      Returns:
      true if the bytes represented by the argument are a prefix of the bytes represented by this string; false otherwise. Note also that true will be returned if the argument is an empty BinaryStringData or is equal to this BinaryStringData object as determined by the equals(Object) method.
    • endsWith

      public boolean endsWith(BinaryStringData suffix)
      Tests if this BinaryStringData ends with the specified suffix.
      Parameters:
      suffix - the suffix.
      Returns:
      true if the bytes represented by the argument are a suffix of the bytes represented by this object; false otherwise. Note that the result will be true if the argument is the empty string or is equal to this BinaryStringData object as determined by the equals(Object) method.
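
      The byte-level prefix and suffix tests described for startsWith and endsWith can be sketched with plain byte arrays. `Utf8Affix` and its methods are hypothetical names for this JDK-only illustration, which works on `byte[]` rather than MemorySegments.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of Flink.
final class Utf8Affix {

    // True if `part` occurs in `data` starting exactly at byte `offset`.
    static boolean regionMatches(byte[] data, int offset, byte[] part) {
        if (offset < 0 || part.length > data.length - offset) {
            return false;
        }
        for (int i = 0; i < part.length; i++) {
            if (data[offset + i] != part[i]) {
                return false;
            }
        }
        return true; // also reached when part is empty
    }

    static boolean startsWith(byte[] data, byte[] prefix) {
        return regionMatches(data, 0, prefix);
    }

    static boolean endsWith(byte[] data, byte[] suffix) {
        // A suffix longer than data yields a negative offset and therefore false.
        return regionMatches(data, data.length - suffix.length, suffix);
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(startsWith(utf8("hamburger"), utf8("ham")));
        System.out.println(endsWith(utf8("hamburger"), utf8("burger")));
    }
}
```

      As in the descriptions above, an empty argument and an argument equal to the whole string both yield true in this sketch.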
    • trim

      public BinaryStringData trim()
      Returns a string whose value is this string, with any leading and trailing whitespace removed.
      Returns:
      A string whose value is this string, with any leading and trailing white space removed, or this string if it has no leading or trailing white space.
    • indexOf

      public int indexOf(BinaryStringData str, int fromIndex)
      Returns the index within this string of the first occurrence of the specified substring, starting at the specified index.
      Parameters:
      str - the substring to search for.
      fromIndex - the index from which to start the search.
      Returns:
      the index of the first occurrence of the specified substring, starting at the specified index, or -1 if there is no such occurrence.
    • toUpperCase

      public BinaryStringData toUpperCase()
      Converts all of the characters in this BinaryStringData to upper case.
      Returns:
      the BinaryStringData, converted to uppercase.
    • toLowerCase

      public BinaryStringData toLowerCase()
      Converts all of the characters in this BinaryStringData to lower case.
      Returns:
      the BinaryStringData, converted to lowercase.