Disclaimer: This post compiles information from various links that I found on the Internet about real-time Big Data processing. It is aimed at readers who are new to the topic.
What are some of the architectures for real-time Big Data processing?
In this post, we will take a quick look at three real-time Big Data processing architectures: Lambda, Kappa and Zeta. All three are examples of polyglot processing.
1. Lambda architecture:
Wikipedia defines Lambda architecture as a data-processing architecture designed to handle massive quantities of data by taking advantage of both batch and stream methods. This approach to architecture attempts to balance latency, throughput, and fault-tolerance by using batch processing to provide comprehensive and accurate views of batch data, while simultaneously using real-time stream processing to provide views of online data. The two view outputs may be joined before presentation. Nathan Marz came up with this architecture.
Here is the Lambda architecture diagram:
Fig 1. Lambda architecture
Reference: https://www.altusinsight.de/en/lambda-architecture
Twitter, Spotify, LivePerson and Inneractive are some of the users of this architecture.
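To make the batch and stream split described above a bit more concrete, here is a minimal sketch in Python. It is only an illustration under simplifying assumptions: the page-view events, the names `batch_view`, `SpeedLayer` and `query`, and the in-memory counters are all hypothetical stand-ins for real batch, stream and serving systems.

```python
from collections import Counter

# Hypothetical page-view events as (timestamp, page) pairs.
master_dataset = [(1, "home"), (2, "about"), (3, "home")]  # covered by the last batch run
recent_events = [(4, "home"), (5, "pricing")]              # arrived after the last batch run


def batch_view(events):
    """Batch layer: recompute the complete view from the whole master dataset."""
    return Counter(page for _, page in events)


class SpeedLayer:
    """Speed layer: keep a real-time view that is updated one event at a time."""

    def __init__(self):
        self.view = Counter()

    def ingest(self, event):
        _, page = event
        self.view[page] += 1


def query(batch, realtime, page):
    """Serving step: join the batch view and the real-time view at query time."""
    return batch[page] + realtime[page]


batch = batch_view(master_dataset)
speed = SpeedLayer()
for event in recent_events:
    speed.ingest(event)

print(query(batch, speed.view, "home"))     # 2 from the batch view + 1 from the real-time view
print(query(batch, speed.view, "pricing"))  # only seen by the speed layer so far
```

In a real system the real-time view would be discarded or trimmed once the next batch run has absorbed the same events, so that nothing is counted twice.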
2. Kappa architecture:
Kappa architecture is a simplification of Lambda architecture. According to www.kappa-architecture.com, a Kappa architecture system is like a Lambda architecture system with the batch processing system removed. Rather than a relational database or a key-value store like Cassandra, the canonical data store in a Kappa architecture system is an append-only immutable log. From the log, data is streamed through a computational system and fed into auxiliary stores for serving. To replace batch processing, historical data is simply replayed from the log through the same streaming system at high speed. Jay Kreps of LinkedIn proposed this architecture.
Here is the Kappa architecture diagram:
Fig 2. Kappa architecture
LinkedIn and Yahoo are some of the users of this architecture.
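The idea of an append-only log that is streamed into serving stores can be sketched in a few lines of Python. This is a toy, in-memory illustration and not how a production log works: `AppendOnlyLog`, `run_job` and the two counting functions are hypothetical names invented for this example.

```python
class AppendOnlyLog:
    """Canonical store: records can be appended and read, never modified."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the record just written

    def read_from(self, offset=0):
        # Yield (offset, record) pairs starting at the requested offset.
        for i in range(offset, len(self._records)):
            yield i, self._records[i]


def run_job(log, process, offset=0):
    """Stream the log through `process` and build an auxiliary serving view."""
    view = {}
    for _, record in log.read_from(offset):
        process(view, record)
    return view


def count_by_page(view, record):
    view[record["page"]] = view.get(record["page"], 0) + 1


def count_by_user(view, record):
    view[record["user"]] = view.get(record["user"], 0) + 1


log = AppendOnlyLog()
for user, page in [("ann", "home"), ("bob", "home"), ("ann", "pricing")]:
    log.append({"user": user, "page": page})

# First version of the streaming job.
view_v1 = run_job(log, count_by_page)

# The logic changed: instead of a separate batch job, replay the same log
# from offset 0 through the new code and switch to the new view when ready.
view_v2 = run_job(log, count_by_user)

print(view_v1)
print(view_v2)
```

The point of the sketch is that reprocessing uses the same code path as live processing: only the starting offset and the processing function change.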
The following table shows the differences between Lambda and Kappa architectures:
Fig 3. Lambda vs Kappa
Reference: http://www.slideshare.net/DanielMarcous/big-data-real-time-architectures-51967547
The software stacks used in these architectures are shown in the following table:
Fig 4. Tech stacks
Reference: http://www.slideshare.net/DanielMarcous/big-data-real-time-architectures-51967547
3. Zeta architecture:
Zeta is a next-generation enterprise architecture proposed by Jim Scott. It is a pluggable architecture made up of seven components: a distributed file system, real-time data storage, a pluggable compute model/execution engine, a deployment/container management system, a solution architecture, enterprise applications, and dynamic and global resource management.
Here is the Zeta architecture diagram:
Fig 5. Zeta architecture
Image courtesy of Jim Scott
Reference: http://radar.oreilly.com/2015/04/zeta-architecture-hexagon-is-the-new-circle.html
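Google is cited as operating its technology stack using these principles. The claim made for the architecture is that an organization implementing it can work towards similar scale and efficiency, although results will depend on the implementation.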
Conclusion:
In this post, we took a quick look at three architecture patterns for real-time Big Data processing: Lambda, Kappa and Zeta.
References:
http://datadventures.ghost.io/2014/07/06/polyglot-processing/
https://en.wikipedia.org/wiki/Lambda_architecture
http://lambda-architecture.net/
https://www.altusinsight.de/en/lambda-architecture/
http://milinda.pathirage.org/kappa-architecture.com/
http://www.drdobbs.com/database/applying-the-big-data-lambda-architectur/240162604
http://www.infoq.com/news/2014/09/lambda-architecture-questions
http://www.ericsson.com/research-blog/data-knowledge/data-processing-architectures-lambda-and-kappa/
http://www.slideshare.net/DanielMarcous/big-data-real-time-architectures-51967547
http://radar.oreilly.com/2015/04/zeta-architecture-hexagon-is-the-new-circle.html
https://www.mapr.com/blog/introduction-zeta-architecture-whiteboard-walkthrough
http://www.slideshare.net/MapRTechnologies/next-generation-enterprise-architecture-41966097