Create an input stream that directly pulls messages from Kafka Brokers without using any receiver.
Create an input stream that directly pulls messages from Kafka Brokers without using any receiver. This stream can guarantee that each message from Kafka is included in transformations exactly once (see points below).
Points to note:
- No receivers: This stream does not use any receiver. It directly queries Kafka.
- Offsets: This does not use Zookeeper to store offsets. The consumed offsets are tracked by the stream itself. You can access the offsets used in each batch from the generated RDDs (see org.apache.spark.streaming.kafka.v09.HasOffsetRanges).
- Failure Recovery: To recover from driver failures, you have to enable checkpointing in the StreamingContext. The information on consumed offsets can be recovered from the checkpoint. See the programming guide for details (constraints, etc.).
- End-to-end semantics: This stream ensures that every record is effectively received and transformed exactly once, but gives no guarantees on whether the transformed data are output exactly once. For end-to-end exactly-once semantics, you have to either ensure that the output operation is idempotent, or use transactions to output records atomically. See the programming guide for more details.
type of Kafka message key
type of Kafka message value
JavaStreamingContext object
Class of the keys in the Kafka records
Class of the values in the Kafka records
Kafka configuration parameters. Requires "bootstrap.servers" to be set with Kafka broker(s) (NOT zookeeper servers), specified in host1:port1,host2:port2 form. If not starting from a checkpoint, "auto.offset.reset" may be set to "latest" or "earliest" to determine where the stream starts (defaults to "latest")
Names of the topics to consume
DStream of (Kafka message key, Kafka message value)
Create an input stream that directly pulls messages from Kafka Brokers without using any receiver.
Create an input stream that directly pulls messages from Kafka Brokers without using any receiver. This stream can guarantee that each message from Kafka is included in transformations exactly once (see points below).
Points to note:
- No receivers: This stream does not use any receiver. It directly queries Kafka.
- Offsets: This does not use Zookeeper to store offsets. The consumed offsets are tracked by the stream itself. You can access the offsets used in each batch from the generated RDDs (see org.apache.spark.streaming.kafka.v09.HasOffsetRanges).
- Failure Recovery: To recover from driver failures, you have to enable checkpointing in the StreamingContext. The information on consumed offsets can be recovered from the checkpoint. See the programming guide for details (constraints, etc.).
- End-to-end semantics: This stream ensures that every record is effectively received and transformed exactly once, but gives no guarantees on whether the transformed data are output exactly once. For end-to-end exactly-once semantics, you have to either ensure that the output operation is idempotent, or use transactions to output records atomically. See the programming guide for more details.
type of Kafka message key
type of Kafka message value
type returned by messageHandler
JavaStreamingContext object
Class of the keys in the Kafka records
Class of the values in the Kafka records
Class of the records in DStream
Kafka configuration parameters. Requires "bootstrap.servers" specified in host1:port1,host2:port2 form.
Per-topic/partition Kafka offsets defining the (inclusive) starting point of the stream
Function for translating each message and metadata into the desired type
DStream of R
Create an input stream that directly pulls messages from Kafka Brokers without using any receiver.
Create an input stream that directly pulls messages from Kafka Brokers without using any receiver. This stream can guarantee that each message from Kafka is included in transformations exactly once (see points below).
Points to note:
- No receivers: This stream does not use any receiver. It directly queries Kafka.
- Offsets: This does not use Zookeeper to store offsets. The consumed offsets are tracked by the stream itself. You can access the offsets used in each batch from the generated RDDs (see org.apache.spark.streaming.kafka.v09.HasOffsetRanges).
- Failure Recovery: To recover from driver failures, you have to enable checkpointing in the StreamingContext. The information on consumed offsets can be recovered from the checkpoint. See the programming guide for details (constraints, etc.).
- End-to-end semantics: This stream ensures that every record is effectively received and transformed exactly once, but gives no guarantees on whether the transformed data are output exactly once. For end-to-end exactly-once semantics, you have to either ensure that the output operation is idempotent, or use transactions to output records atomically. See the programming guide for more details.
type of Kafka message key
type of Kafka message value
StreamingContext object
Kafka configuration parameters. Requires "bootstrap.servers" to be set with Kafka broker(s) (NOT zookeeper servers), specified in host1:port1,host2:port2 form. If not starting from a checkpoint, "auto.offset.reset" may be set to "earliest" or "latest" to determine where the stream starts (defaults to "latest")
Names of the topics to consume
DStream of (Kafka message key, Kafka message value)
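For illustration, a minimal Scala sketch of the typical call, assuming String keys and values, a hypothetical topic and consumer group, and the standard Kafka 0.9 consumer properties (deserializers, group id) supplied through kafkaParams; it also shows the offset access mentioned in the notes above, assuming HasOffsetRanges and OffsetRange mirror the upstream Kafka connector:
{{{
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.v09.{HasOffsetRanges, KafkaUtils}

val ssc = new StreamingContext(new SparkConf().setAppName("DirectKafkaExample"), Seconds(5))

// "bootstrap.servers" must point at Kafka brokers, not ZooKeeper.
val kafkaParams = Map[String, String](
  "bootstrap.servers"  -> "host1:9092,host2:9092",
  "key.deserializer"   -> classOf[StringDeserializer].getName,
  "value.deserializer" -> classOf[StringDeserializer].getName,
  "group.id"           -> "example-group",   // hypothetical consumer group
  "auto.offset.reset"  -> "earliest"         // start from the earliest offsets when there is no checkpoint
)
val topics = Set("example-topic")            // hypothetical topic name

val stream = KafkaUtils.createDirectStream[String, String](ssc, kafkaParams, topics)

// Each generated RDD exposes the offsets it covers via HasOffsetRanges.
stream.foreachRDD { rdd =>
  val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  ranges.foreach(r => println(s"${r.topic} ${r.partition} ${r.fromOffset} -> ${r.untilOffset}"))
}

ssc.start()
ssc.awaitTermination()
}}}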
Create an input stream that directly pulls messages from Kafka Brokers without using any receiver.
Create an input stream that directly pulls messages from Kafka Brokers without using any receiver. This stream can guarantee that each message from Kafka is included in transformations exactly once (see points below).
Points to note:
- No receivers: This stream does not use any receiver. It directly queries Kafka.
- Offsets: This does not use Zookeeper to store offsets. The consumed offsets are tracked by the stream itself. You can access the offsets used in each batch from the generated RDDs (see org.apache.spark.streaming.kafka.v09.HasOffsetRanges).
- Failure Recovery: To recover from driver failures, you have to enable checkpointing in the StreamingContext. The information on consumed offsets can be recovered from the checkpoint. See the programming guide for details (constraints, etc.).
- End-to-end semantics: This stream ensures that every record is effectively received and transformed exactly once, but gives no guarantees on whether the transformed data are output exactly once. For end-to-end exactly-once semantics, you have to either ensure that the output operation is idempotent, or use transactions to output records atomically. See the programming guide for more details.
type of Kafka message key
type of Kafka message value
type returned by messageHandler
StreamingContext object
Kafka configuration parameters. Requires "bootstrap.servers" to be set with Kafka broker(s) (NOT zookeeper servers) specified in host1:port1,host2:port2 form.
Per-topic/partition Kafka offsets defining the (inclusive) starting point of the stream
Function for translating each message and metadata into the desired type
DStream of R
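A rough sketch of starting from explicit offsets, reusing ssc and kafkaParams from the sketch above. The key type of fromOffsets and the argument type of messageHandler are assumptions here (org.apache.kafka.common.TopicPartition and the new consumer's ConsumerRecord), so check them against the actual method signature:
{{{
import org.apache.kafka.clients.consumer.ConsumerRecord   // assumed handler input type
import org.apache.kafka.common.TopicPartition             // assumed fromOffsets key type
import org.apache.spark.streaming.kafka.v09.KafkaUtils

// Resume a hypothetical topic at partition 0, offset 42 (inclusive).
val fromOffsets = Map(new TopicPartition("example-topic", 0) -> 42L)

// Keep only the message value; the handler also sees topic, partition and offset metadata.
val messageHandler = (record: ConsumerRecord[String, String]) => record.value()

val stream = KafkaUtils.createDirectStream[String, String, String](
  ssc, kafkaParams, fromOffsets, messageHandler)
}}}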
Create an RDD from Kafka using offset ranges for each topic and partition.
Create an RDD from Kafka using offset ranges for each topic and partition. This allows you to specify the Kafka leader to connect to (to optimize fetching) and access the message as well as the metadata.
type of Kafka message key
type of Kafka message value
type returned by messageHandler
JavaSparkContext object
Kafka configuration parameters. Requires "bootstrap.servers" specified in host1:port1,host2:port2 form.
Each OffsetRange in the batch corresponds to a range of offsets for a given Kafka topic/partition
Function for translating each message and metadata into the desired type
RDD of R
Create an RDD from Kafka using offset ranges for each topic and partition.
Create an RDD from Kafka using offset ranges for each topic and partition.
type of Kafka message key
type of Kafka message value
JavaSparkContext object
Class of the keys in the Kafka records
Class of the values in the Kafka records
Kafka configuration parameters. Requires "bootstrap.servers" specified in host1:port1,host2:port2 form.
Each OffsetRange in the batch corresponds to a range of offsets for a given Kafka topic/partition
RDD of (Kafka message key, Kafka message value)
Create an RDD from Kafka using offset ranges for each topic and partition.
Create an RDD from Kafka using offset ranges for each topic and partition. This allows you to specify the Kafka leader to connect to (to optimize fetching) and access the message as well as the metadata.
type of Kafka message key
type of Kafka message value
type returned by messageHandler
SparkContext object
Kafka configuration parameters. Requires "bootstrap.servers" to be set with Kafka broker(s) (NOT zookeeper servers) specified in host1:port1,host2:port2 form.
Each OffsetRange in the batch corresponds to a range of offsets for a given Kafka topic/partition
Function for translating each message and metadata into the desired type
RDD of R
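A hedged sketch of a batch read with a custom handler, reusing the kafkaParams from the first sketch with a plain SparkContext sc. The handler's argument type is again assumed to be ConsumerRecord, and OffsetRange is assumed to be constructed as in the upstream connector (topic, partition, inclusive start, exclusive end); verify both against the actual signatures:
{{{
import org.apache.kafka.clients.consumer.ConsumerRecord   // assumed handler input type
import org.apache.spark.streaming.kafka.v09.{KafkaUtils, OffsetRange}

// Hypothetical range: partition 0, offsets 0 (inclusive) to 100 (exclusive).
val offsetRanges = Array(OffsetRange("example-topic", 0, 0L, 100L))

// Keep the topic name alongside each value.
val handler = (record: ConsumerRecord[String, String]) => (record.topic, record.value)

val rdd = KafkaUtils.createRDD[String, String, (String, String)](sc, kafkaParams, offsetRanges, handler)
}}}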
Create an RDD from Kafka using offset ranges for each topic and partition.
Create an RDD from Kafka using offset ranges for each topic and partition.
type of Kafka message key
type of Kafka message value
SparkContext object
Kafka configuration parameters. Requires "bootstrap.servers" to be set with Kafka broker(s) (NOT zookeeper servers) specified in host1:port1,host2:port2 form.
Each OffsetRange in the batch corresponds to a range of offsets for a given Kafka topic/partition
RDD of (Kafka message key, Kafka message value)
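And a self-contained sketch of the simpler pair-returning variant, under the same assumptions about deserializer configuration and OffsetRange construction:
{{{
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.kafka.v09.{KafkaUtils, OffsetRange}

val sc = new SparkContext(new SparkConf().setAppName("KafkaRDDExample"))

val kafkaParams = Map[String, String](
  "bootstrap.servers"  -> "host1:9092,host2:9092",
  "key.deserializer"   -> "org.apache.kafka.common.serialization.StringDeserializer",
  "value.deserializer" -> "org.apache.kafka.common.serialization.StringDeserializer",
  "group.id"           -> "example-batch-group"   // hypothetical consumer group
)

// Offsets 0 (inclusive) to 100 (exclusive) of partition 0 of a hypothetical topic.
val offsetRanges = Array(OffsetRange("example-topic", 0, 0L, 100L))

val rdd = KafkaUtils.createRDD[String, String](sc, kafkaParams, offsetRanges)
println(s"Read ${rdd.count()} (key, value) records")
}}}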