For many years now, driven by big data / data-hungry projects, streaming platform architecture (Lambda or Kappa) has been the pattern proposed for ingesting business-valued data into a persistent repository (database, key/value store, object or column storage).
ESB (Enterprise Service Bus) and EAI (Enterprise Application Integration) products were also part of SOA projects, but with some notable differences:
- implementation of Enterprise Integration Patterns, such as content-based internal routing
- point-to-point communications based on "old" Message-Oriented Middleware, with poor performance for Pub/Sub channels and potential complexity (integration spaghetti)
- "hub and spoke" message broker / decouples producers & consumers for reusable data, but through a pivot format
- GUI for designing BPM workflows
- a lot of compute/network overhead due to data manipulation
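To make the first difference concrete, here is a minimal sketch of the content-based router pattern mentioned above. The channel names and message shape are illustrative assumptions, not tied to any specific ESB product:

```python
# Content-based router (Enterprise Integration Pattern): the router
# inspects the message payload and picks an output channel accordingly.
# Channel names and message fields are made up for illustration.

def route(message: dict) -> str:
    """Pick an output channel by inspecting the message content."""
    if message.get("type") == "order" and message.get("amount", 0) > 1000:
        return "high-value-orders"
    if message.get("type") == "order":
        return "orders"
    return "dead-letter"

assert route({"type": "order", "amount": 5000}) == "high-value-orders"
assert route({"type": "order", "amount": 10}) == "orders"
assert route({"type": "ping"}) == "dead-letter"
```

This routing logic living inside a central broker, rather than in the producing or consuming applications, is exactly the central-processing trait criticized below.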
Such products lead to more and more complexity, because their central-processing model breaks the Single Responsibility Principle (the "S" in SOLID).
Event-driven / streaming use cases are not adequately implemented by ESB products (write latency, scaling).
Solutions for the resiliency of streaming events
=> distributed message systems with low-latency writes and immutable storage,
such as log data stores:
- Apache Kafka
- Apache Pulsar
- Apache Bookkeeper
- Azure Cosmos DB Change Feed
- Azure EventHub
- Chronicle Queue
- Pravega …
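What these products share is the append-only, immutable log abstraction: writers only append, records get a monotonically increasing offset, and each consumer keeps its own read position so the same data can be replayed independently. A minimal sketch (names are illustrative, not any product's API):

```python
# Toy append-only log illustrating the common abstraction behind the
# products listed above: immutable records addressed by offset.

class ImmutableLog:
    def __init__(self):
        self._records = []

    def append(self, record) -> int:
        """Writers only append; the offset is assigned at write time."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset: int) -> list:
        """Replay: reading never mutates the log, so any number of
        consumers can read from their own offsets."""
        return list(self._records[offset:])

log = ImmutableLog()
for event in ["evt-1", "evt-2", "evt-3"]:
    log.append(event)

assert log.read_from(0) == ["evt-1", "evt-2", "evt-3"]  # full replay
assert log.read_from(2) == ["evt-3"]                    # late consumer
```

Because writes are sequential appends with no in-place updates, this model supports both the low write latency and the replayability that ESBs lack.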
Apache Kafka is supported (under commercial licenses) mainly by Confluent and Cloudera (Hortonworks).
October 2019: Splunk agrees to acquire Streamlio, a company focused on supporting Apache Pulsar.
Kafka implements real-time data pipelines and streaming apps. It is horizontally scalable, fault-tolerant, wicked fast, and runs in production in thousands of companies.
Kafka is a great publish/subscribe system (Pub/Sub only, no point-to-point), either distributed (with several brokers and local storage) or monolithic (with co-located storage). Data is stored in one or several partitions of a topic (local internal storage of the brokers) and is indexed by an offset id within each partition.
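A toy model of that data model may help: a topic split into partitions, each an ordered log indexed by offset, with the partition chosen by hashing the record key (mirroring the behavior of Kafka's default partitioner). This is a sketch of the concept, not the Kafka client API:

```python
import zlib

class Topic:
    """Toy Kafka-style topic: a fixed set of partitions, each an
    append-only list indexed by offset."""

    def __init__(self, name: str, num_partitions: int):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key: str, value: str):
        # Hash the key to pick a partition, as Kafka's default
        # partitioner does for keyed records.
        p = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append(value)
        return p, len(self.partitions[p]) - 1  # (partition, offset)

    def consume(self, partition: int, offset: int) -> str:
        return self.partitions[partition][offset]

t = Topic("payments", num_partitions=3)
p, off = t.produce("customer-42", "debit 10 EUR")
assert t.consume(p, off) == "debit 10 EUR"
# Records with the same key always land in the same partition,
# so per-key ordering is preserved.
assert t.produce("customer-42", "debit 5 EUR")[0] == p
```

Note that offsets are meaningful only within a partition; ordering across partitions of the same topic is not guaranteed.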
There are plenty of client libraries implementing the same API (a strong pro argument).
Kafka requires an external coordination service (Apache ZooKeeper).
Confluent proposes a SQL layer (KSQL, a proprietary library). Their strategy is to sell Kafka as a DBMS (which could lead to performance problems?).
https://etcd.io/ Distributed, reliable key/value store, embedded inside Kubernetes for service discovery and cluster state/config.
Low-latency storage service (scalable / distributed)
Message broker / in-memory (low latency) data structure, or key/value database
• Building a unified data processing stack with Apache Pulsar and Apache Spark :
Apache Pulsar is a cloud-native event streaming system. It uses a cloud-native architecture and segment-centric storage. Pulsar provides a unified data view over both historic and real-time data, hence it can easily be integrated with a unified computing engine to build a unified data processing stack.
- no limit on the number of topics
- Pub/Sub AND message queuing
- based on Apache BookKeeper (for persistence)
- less field feedback (REX) than Kafka
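Pulsar's segment-centric storage can be sketched as follows: a topic is a chain of sealed, immutable segments (BookKeeper ledgers) plus one open tail segment, and a single read path spans both, which is the "unified data view" over historic and real-time data mentioned above. Segment size and names here are illustrative assumptions, not Pulsar internals:

```python
class SegmentedTopic:
    """Toy model of segment-centric storage: sealed immutable segments
    plus one open tail segment still accepting writes."""

    def __init__(self, segment_size: int = 2):
        self.sealed = []                # closed, immutable segments
        self.open = []                  # tail segment, still writable
        self.segment_size = segment_size

    def append(self, record):
        self.open.append(record)
        if len(self.open) >= self.segment_size:
            # Seal the segment: from now on it is immutable and could
            # be offloaded to cheaper storage independently.
            self.sealed.append(tuple(self.open))
            self.open = []

    def read_all(self):
        # One continuous view over historic (sealed) and
        # real-time (open) data.
        for seg in self.sealed:
            yield from seg
        yield from self.open

t = SegmentedTopic(segment_size=2)
for e in ["a", "b", "c"]:
    t.append(e)
assert list(t.read_all()) == ["a", "b", "c"]  # 1 sealed segment + open tail
```

Because sealed segments are independent units, they can be replicated or tiered individually, which is what distinguishes this design from Kafka's partition-as-a-whole storage.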