Kafka-based storage for Zipkin.
                    +----------------------------*zipkin*----------------------------------------------
                    |                                     [ dependency-storage ]--->( dependencies      )
                    |                                                  ^        +-->( autocomplete-tags )
( collected-spans )-|->[ partitioning ]   [ aggregation ]    [ trace-storage ]--+-->( traces            )
  via http, kafka,  |       |                    ^    |         ^      |        +-->( service-names     )
  amq, grpc, etc.   +-------|--------------------|----|---------|------|-------------------------------
                            |                    |    |         |      |
----------------------------|--------------------|----|---------|------|-------------------------------
                            +-->( spans )--------+----+---------|      |
                                                      |         |      |
*kafka*                                               +->( traces )    |
 topics                                               |                |
                                                      +->( dependencies )
-------------------------------------------------------------------------------------------------------
Spans collected via different transports are partitioned by traceId and stored in a partitioned spans Kafka topic. Partitioned spans are then aggregated into traces and then into dependency links; both results are emitted into Kafka topics as well. These three topics are used as the source for local stores (Kafka Streams stores) that support the Zipkin query and search APIs.
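As a rough illustration of the partitioning step, the following sketch assigns spans to partitions by hashing their trace ID, so that all spans of one trace end up in the same partition. The class and method names are hypothetical, and the hash is a stand-in chosen for brevity (Kafka's default partitioner hashes the key bytes with murmur2); this is not the project's actual code.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: route spans to partitions by trace ID.
class TraceIdPartitioner {
    // Maps a trace ID to a partition index in [0, numPartitions).
    // Illustrative hash only; it is not the hash Kafka itself uses.
    static int partitionFor(String traceId, int numPartitions) {
        int hash = Arrays.hashCode(traceId.getBytes(StandardCharsets.UTF_8));
        return Math.floorMod(hash, numPartitions);
    }

    // Groups span IDs by the partition chosen for their trace ID.
    // Each input element is a {traceId, spanId} pair.
    static Map<Integer, List<String>> partition(List<String[]> spans, int numPartitions) {
        Map<Integer, List<String>> byPartition = new TreeMap<>();
        for (String[] span : spans) {
            int p = partitionFor(span[0], numPartitions);
            byPartition.computeIfAbsent(p, k -> new ArrayList<>()).add(span[1]);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        List<String[]> spans = Arrays.asList(
            new String[] {"trace-a", "span-1"},
            new String[] {"trace-b", "span-2"},
            new String[] {"trace-a", "span-3"});
        // span-1 and span-3 share a trace ID, so they always share a partition.
        System.out.println(partition(spans, 4));
    }
}
```

Keying by trace ID is what lets a single downstream task see every span of a trace without cross-partition coordination.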
A limitation of the zipkin-dependencies module is that it must be scheduled to run at a defined frequency. This batch-oriented execution leaves dependency values out of date until processing runs again.
Kafka-based storage enables aggregating dependencies as spans are received, allowing a (near-)real-time calculation of dependency metrics.
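To make the idea of deriving dependency links from a trace concrete, here is a deliberately simplified sketch. It counts a call from a parent service to a child service whenever a span's parent belongs to a different service. The `Span` class, the `link` method, and the `"parent->child"` key format are all hypothetical; Zipkin's real linking logic also takes span kinds and errors into account, which this sketch ignores.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified sketch of turning one trace into dependency links.
class DependencyLinkSketch {
    static final class Span {
        final String id;
        final String parentId; // null for the root span
        final String service;

        Span(String id, String parentId, String service) {
            this.id = id;
            this.parentId = parentId;
            this.service = service;
        }
    }

    // Returns call counts keyed by "parentService->childService".
    static Map<String, Long> link(List<Span> trace) {
        Map<String, String> serviceBySpanId = new HashMap<>();
        for (Span s : trace) {
            serviceBySpanId.put(s.id, s.service);
        }
        Map<String, Long> calls = new TreeMap<>();
        for (Span s : trace) {
            if (s.parentId == null) continue; // root span has no caller
            String parentService = serviceBySpanId.get(s.parentId);
            // Skip unknown parents and calls that stay within one service.
            if (parentService == null || parentService.equals(s.service)) continue;
            calls.merge(parentService + "->" + s.service, 1L, Long::sum);
        }
        return calls;
    }

    public static void main(String[] args) {
        List<Span> trace = Arrays.asList(
            new Span("1", null, "frontend"),
            new Span("2", "1", "backend"),
            new Span("3", "2", "db"));
        System.out.println(link(trace));
    }
}
```

Because each completed trace can be linked on its own, counts like these can be accumulated continuously as traces arrive rather than in a scheduled batch.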
To run real-time dependency aggregation on its own, other components can be disabled. A profile is prepared that enables only aggregation and search of dependency graphs.
This profile can be enabled by adding the Java option: -Dspring.profiles.active=kafka-only-dependencies
The Docker image includes an environment variable to set the profile:
MODULE_OPTS="-Dloader.path=lib -Dspring.profiles.active=kafka-only-dependencies"
To try it out, there is a Docker Compose configuration ready to test.
If an existing Kafka collector is already in place, sending traces downstream into an existing storage, another Kafka consumer group ID can be used for zipkin-storage-kafka to consume traces in parallel. Otherwise, if Kafka transport is not available, you can forward spans from another Zipkin server to zipkin-storage-kafka.
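The parallel-consumption approach relies on Kafka tracking offsets per consumer group, so two groups reading the same topic each receive every record. The sketch below only builds the two sets of consumer properties to show the difference. `bootstrap.servers` and `group.id` are standard Kafka consumer settings; the helper, the broker address, and both group names are assumptions for illustration, not values taken from this project.

```java
import java.util.Properties;

// Hypothetical sketch: two consumer configurations that differ only in group.id.
class ParallelConsumerConfig {
    // Builds minimal Kafka consumer properties for the given consumer group.
    static Properties consumerProps(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        return props;
    }

    public static void main(String[] args) {
        // Group names below are placeholders, not project defaults.
        Properties existingCollector = consumerProps("localhost:19092", "existing-collector");
        Properties storageKafka = consumerProps("localhost:19092", "zipkin-storage-kafka");
        // Different group IDs mean each consumer keeps its own offsets on the same topic.
        System.out.println(existingCollector.getProperty("group.id")
            + " / " + storageKafka.getProperty("group.id"));
    }
}
```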
To build the project you will need Java 8+.
make build
And testing:
make test
If you want to build a Docker image:
make docker-build
To run locally, first you need to get the Zipkin binaries:
make get-zipkin
By default, Zipkin will wait for a Kafka broker to be running on localhost:19092.
Then run Zipkin locally:
make run-local
To validate storage, make sure that the Kafka topics are created so that Kafka Streams instances can be initialized properly:
make kafka-topics
make zipkin-test
This will start a browser and check that a trace has been registered.
It then sends another trace after one minute (the trace timeout) plus 1 second to trigger aggregation, and visualizes the dependency graph.
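One way to read the "timeout + 1 second" step: assume a trace is treated as complete only once a later record shows that more than the trace timeout has passed since its last span, and that this observed time advances only when new records arrive. Under that assumption (it is not confirmed by the text above), sending a second trace is what pushes the first one over the threshold. The class and method below are hypothetical and only model that check.

```java
import java.time.Duration;

// Hypothetical model of a trace-timeout check driven by record timestamps.
class TraceTimeoutSketch {
    // One minute, matching the trace timeout mentioned in the text.
    static final Duration TRACE_TIMEOUT = Duration.ofMinutes(1);

    // A trace counts as complete when the newest observed record is more than
    // TRACE_TIMEOUT later than the trace's last span (assumed semantics).
    static boolean isTraceComplete(long lastSpanMillis, long latestObservedMillis) {
        return latestObservedMillis - lastSpanMillis > TRACE_TIMEOUT.toMillis();
    }

    public static void main(String[] args) {
        long firstTraceLastSpan = 0L;
        // A record 30 seconds later is still within the timeout.
        System.out.println(isTraceComplete(firstTraceLastSpan, 30_000L));
        // A record one minute plus one second later exceeds the timeout.
        System.out.println(isTraceComplete(firstTraceLastSpan, 61_000L));
    }
}
```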
If you have Docker available, run:
make run-docker
The Docker image will be built and Docker Compose will start.
To test it, run:
make zipkin-test-single
# or
make zipkin-test-distributed
- Single-node: span partitioning, aggregation, and storage happen in the same containers.
- Distributed-mode: partitioning and aggregation run in a different container than storage.
- Only-dependencies: only the components that support aggregation and search of dependency graphs.
This project is inspired by Adrian Cole's VoltDB storage: https://github.com/adriancole/zipkin-voltdb
Kafka Streams images are created with https://zz85.github.io/kafka-streams-viz/
All artifacts publish to the group ID "io.zipkin.contrib.zipkin-storage-kafka". We use a common release version for all components.
Releases are at Sonatype and Maven Central.
Snapshots are uploaded to Sonatype after commits to master.
Released versions of zipkin-storage-kafka are published to GitHub Container Registry as
beta.zipkin.io/openzipkin-contrib/zipkin-storage-kafka. See docker for details.

