Commit 16a2a9f6 authored by Yong Tang, committed by Rasmus Munk Larsen

Add KafkaReader for processing streaming data with Apache Kafka (#14098)



* Add KafkaReader for processing streaming data with Apache Kafka

Apache Kafka is a widely used distributed streaming platform in
the open source community. The goal of this fix is to create a contrib
Reader op (inherits ReaderBase and is similar to
TextLineReader/TFRecordReader) so that it is possible to read
Kafka streaming data from TensorFlow in a similar fashion.

This fix uses librdkafka, a C/C++ Apache Kafka client library that
is released under the 2-clause BSD license and is widely used in
a number of Kafka bindings such as Go, Python, C#/.NET, etc.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Add KafkaReader Python wrapper.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Add BUILD file and op registration for KafkaReader.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Add C++ Kernel for KafkaReader

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Add librdkafka to third_party packages in Bazel

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Add contrib/kafka to part of the contrib bazel file.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Update workspace.bzl

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Comment out clean_deps of `tensorflow/core:framework` and `tensorflow/core:lib`

so that it is possible to build with ReaderBase.

See 1419 for details.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Add group id flag.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Sync offset

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Add test cases and script to start and stop Kafka server (with docker)

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Convert to KafkaConsumer from the legacy Consumer with librdkafka

so that thread join does not hang.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Only output offset as the key.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Add timeout attr so that Kafka Consumer could use

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Build Kafka kernels by default to get around the linkage issue.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Convert KafkaReader to KafkaDataset.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Fix workspace.bzl for kafka with tf_http_archive

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Add public visibility

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Address review feedback

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Optionally select Kafka support through ./configure

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
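
For illustration only, the reader-style contract described above (records
returned as key/value pairs, with only the offset used as the key) could be
sketched as follows. This is a minimal sketch under stated assumptions: it
does not use librdkafka or the ops added by this commit, the function
`read_partition` and the in-memory `partition` list are hypothetical, and
encoding the offset as a string is an assumption rather than something the
commit message states.

```python
from typing import Iterator, List, Tuple


def read_partition(
    messages: List[bytes], start_offset: int = 0
) -> Iterator[Tuple[str, bytes]]:
    """Yield (key, value) pairs, using the message offset as the key.

    Hypothetical helper: `messages` stands in for one Kafka partition,
    where a message's list index plays the role of its offset.
    """
    for offset in range(start_offset, len(messages)):
        # Assumption: the key is the offset rendered as a string.
        yield str(offset), messages[offset]


# Hypothetical in-memory stand-in for one Kafka partition.
partition = [b"D0", b"D1", b"D2", b"D3"]

# Read from offset 1 to the end of the partition.
records = list(read_partition(partition, start_offset=1))
```

Keying each record by its offset alone keeps the key short and lets a
consumer resume from a known position in the partition.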
parent 995378c4