Distributed Systems & Micro Services

Uber started as a monolith: the following modules were integrated into a single application with a single codebase, tightly coupled to each other.

Passenger mgmt
driver mgmt
trip mgmt
billing
notification service

Challenges

Every bug fix requires rebuilding and retesting the complete application, because it is a single codebase
Bugs are hard to isolate and fix when all modules share one codebase
Scaling one module independently of the others is difficult

Micro Services - Advantages

Deploy modules independently, without changing any other module
Modules are scaled independently
Bug fixes are easier and don't interfere with other modules

API Gateway/reverse proxy

Routes each request to the right micro service
Single entry point for all requests, which helps with analytics/metrics generation
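The routing and metrics roles of the gateway can be sketched in a few lines. This is only an illustration: the path prefixes, service names and the `route` function are hypothetical, and a real gateway would forward the HTTP request instead of returning a name.

```python
# Hypothetical route table: path prefix -> service name (illustrative names only).
ROUTES = {
    "/passengers": "passenger-service",
    "/drivers": "driver-service",
    "/trips": "trip-service",
    "/billing": "billing-service",
}

# Per-service request counters: the kind of metric a single entry point can collect.
request_counts = {}


def route(path):
    """Return the service that should handle `path`, or None if nothing matches."""
    for prefix, service in ROUTES.items():
        if path == prefix or path.startswith(prefix + "/"):
            request_counts[service] = request_counts.get(service, 0) + 1
            return service
    return None
```

Because every request passes through `route`, the counters give a per-service view of traffic without touching the services themselves.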

Different modules communicate via message queues (ActiveMQ)

ActiveMQ

Async message flow
JMS Model

Queue based -> Point-to-point communication: each message is consumed by one receiver
Topic based -> Publisher/subscriber model: each message is delivered to every subscriber
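The difference between the two models can be sketched with in-memory stand-ins. This is not the JMS API; the class names `PointToPointQueue` and `Topic` are made up for illustration.

```python
from collections import deque


class PointToPointQueue:
    """Point-to-point: each message is delivered to exactly one consumer."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Removing the message means no other consumer will ever see it.
        return self._messages.popleft() if self._messages else None


class Topic:
    """Publish/subscribe: every subscriber gets its own copy of each message."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, message):
        for callback in self._subscribers:
            callback(message)
```

With the queue, a second `receive()` after the only message was taken returns `None`; with the topic, two subscribers both see the same published message.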

Receiver

Synchronously receive a message: the consumer blocks until a message arrives
Use a listener to notify asynchronous consumers that a message is available
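The two receiving styles can be sketched with Python's standard `queue` module standing in for the broker. The helper names `receive_sync` and `start_listener` are assumptions for this example, not part of ActiveMQ or JMS.

```python
import queue
import threading


def receive_sync(q, timeout=1.0):
    """Synchronous receive: block until a message arrives; None if the timeout expires."""
    try:
        return q.get(timeout=timeout)
    except queue.Empty:
        return None


def start_listener(q, on_message):
    """Asynchronous receive: a background thread calls on_message for each message.

    Putting None on the queue stops the listener (a convention of this sketch).
    """
    def loop():
        while True:
            message = q.get()
            if message is None:
                break
            on_message(message)

    worker = threading.Thread(target=loop, daemon=True)
    worker.start()
    return worker
```

In the synchronous case the caller's thread waits; in the listener case the caller continues and the callback runs when a message shows up.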

How to Handle Cascading failure in System Design Problems

If you design a distributed system, how will you handle fault tolerance? How do you enable the system to continue operating properly when some of its components fail?

eg: Uber has, say, the micro services Trip Mgmt & Billing
Trip management is invoked through a REST API
Trip management has an Executor Service (thread pool)
One of the threads from the pool is used to contact the Billing service
What if the Billing service is down, or slow because of DB slowness?

The Executor Service will keep trying to contact the Billing service; its threads stay blocked on those calls, which hurts system performance

When one service is slow, it can slow down the services that depend on it and cause a cascading impact (the Notification service can also become slow). Services connected to Notification can then become slow as well
This is a cascading failure

Circuit Breaker Pattern

Don't communicate with other services, like Billing, directly
Instead, communicate with an Interceptor, and the Interceptor will communicate with Billing
When the Interceptor detects that Billing is down, it stops calling it for some time

Advantages

Allows the Billing server to recover from an overload situation
The Interceptor responds immediately that the service is not available, so callers don't wait for a timeout
The Interceptor keeps a count of passed/failed requests to Billing
When failed requests exceed a threshold, it stops sending requests to the micro service for some time (circuit OPEN)
Once the timer expires, the Interceptor moves to the ALLOW PARTIAL (half-open) state and lets a few requests through. If those requests succeed, it returns to the CLOSED state, where all requests flow normally; if they fail, it goes back to OPEN
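The interceptor's state machine can be sketched as below. This is a minimal, single-threaded illustration, not how Hystrix or resilience4j implement it: the class, its parameters and the consecutive-failure counting are assumptions, and the partial-allow state is called half-open here, which is the usual name.

```python
import time


class CircuitBreaker:
    CLOSED, OPEN, HALF_OPEN = "closed", "open", "half-open"

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to stay open
        self.clock = clock                          # injectable so tests can control time
        self.state = self.CLOSED
        self.failures = 0
        self.opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state == self.OPEN:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Fail fast instead of making the caller wait for a timeout.
                raise RuntimeError("service unavailable (circuit open)")
            # Timer expired: let a trial request through.
            self.state = self.HALF_OPEN
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.state == self.HALF_OPEN or self.failures >= self.failure_threshold:
                self.state = self.OPEN
                self.opened_at = self.clock()
            raise
        # Success: the downstream service looks healthy again.
        self.failures = 0
        self.state = self.CLOSED
        return result
```

Trip management would wrap each Billing call as `breaker.call(charge_customer, trip_id)`, where `charge_customer` is a hypothetical client function. While the circuit is open, the caller gets an immediate error and Billing gets no traffic.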

Projects that implement the circuit breaker concept

Hystrix & resilience4j
Hystrix is built around the circuit breaker (with fallbacks and thread-pool isolation) and is now in maintenance mode
resilience4j provides

circuit breaker
rate limiter
retry
bulkhead feature

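resilience4j applies these features using the decorator design pattern: the call to the remote service is wrapped in a decorator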
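The decorator idea can be shown with a retry wrapper. This is a Python sketch of the pattern only; it does not reproduce the resilience4j API, and the `retry` name and its parameters are hypothetical.

```python
import functools
import time


def retry(max_attempts=3, delay_seconds=0.0):
    """Wrap a function so that failed calls are retried (assumes max_attempts >= 1)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            last_error = None
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as error:
                    last_error = error
                    if attempt < max_attempts:
                        time.sleep(delay_seconds)
            # Every attempt failed: surface the last error to the caller.
            raise last_error
        return wrapper
    return decorator
```

The business function stays unchanged; resilience features are layered around it, and several decorators (retry, rate limiter, circuit breaker) can be stacked on the same call.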

Handling and Managing a Cluster of Machines in a Distributed Environment

Benefits of distributed microservices

small codebase
easy to deploy
easy to scale
extensible

Challenges - managing micro services

Service Discovery

How does one micro service know the IP address of another micro service?
The service discovery component stores the addresses registered under each service id
Any service can look up a service id in service discovery and get all IP addresses for that service

Load balancing

Balance the load across the servers that support the same service
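Service discovery and load balancing can be sketched together as an in-memory registry with round-robin selection. The `ServiceRegistry` class and the addresses used with it are illustrative assumptions; real registries also handle health checks and deregistration.

```python
import itertools


class ServiceRegistry:
    def __init__(self):
        self._instances = {}  # service id -> list of registered addresses
        self._cursors = {}    # service id -> round-robin iterator over those addresses

    def register(self, service_id, address):
        self._instances.setdefault(service_id, []).append(address)
        # Rebuild the round-robin cursor whenever the instance list changes.
        self._cursors[service_id] = itertools.cycle(list(self._instances[service_id]))

    def lookup(self, service_id):
        """Return all known addresses for the service."""
        return list(self._instances.get(service_id, []))

    def next_instance(self, service_id):
        """Return the next address in round-robin order."""
        if service_id not in self._cursors:
            raise LookupError("no instances registered for " + service_id)
        return next(self._cursors[service_id])
```

Round robin is only one possible policy; it is used here because it is the simplest way to spread calls across servers that offer the same service.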

Distributed tracing

Used for debugging
A trace id follows the complete flow of a request across services
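A trace id is typically created at the edge and passed along with every downstream call. The sketch below assumes a header named `X-Trace-Id`, which is a hypothetical choice for this example, as are the two helper functions.

```python
import uuid

TRACE_HEADER = "X-Trace-Id"  # hypothetical header name used for this sketch


def ensure_trace_id(headers):
    """Return a copy of the headers that carries a trace id, creating one if missing."""
    traced = dict(headers)
    if TRACE_HEADER not in traced:
        traced[TRACE_HEADER] = uuid.uuid4().hex
    return traced


def log_line(service, headers, message):
    """Prefix every log line with the trace id so one request can be followed end to end."""
    return f"trace={headers[TRACE_HEADER]} service={service} {message}"
```

Searching the logs of all services for one trace id then reconstructs the complete flow of that request.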

Authentication/Authorization/network policy

Security

Monitoring

How to overcome the challenges

Sidecar - performs all the non-functional services listed above
Every machine has its own sidecar
Control Plane -

Manages all the sidecars -> operated by the operations team
Traffic management (configure the % of traffic sent to any specific server)
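The traffic-management item above can be illustrated with a percentage split. This is not Istio or Envoy configuration; the `pick_version` function and the "canary"/"stable" labels are assumptions made for the sketch.

```python
import zlib


def pick_version(request_id, canary_percent):
    """Send roughly canary_percent of requests to the canary, the rest to stable.

    Hashing the request id keeps the decision deterministic for a given request.
    """
    bucket = zlib.crc32(request_id.encode("utf-8")) % 100  # bucket in 0..99
    return "canary" if bucket < canary_percent else "stable"
```

Raising `canary_percent` gradually from 0 to 100 shifts traffic to the new server without redeploying the services.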

Istio - project for the control plane
Envoy - project for the sidecar

Kafka

In distributed systems, there can be hundreds of requests/events per second
A messaging system like ActiveMQ cannot handle a huge number of events in a very short time. ActiveMQ works with individual messages, not streams.
When requests arrive as streams, we need a distributed messaging system like Kafka
Kafka is highly available and resilient to node failures
It supports automatic recovery
Fast, scalable and fault tolerant
It works as a publisher-subscriber messaging system

Components of Kafka

Kafka Broker
Kafka Topic
Partitions
Consumer Group
kafka producer
kafka consumer

A Kafka cluster is a group of machines/brokers. Producers send to the cluster and consumers receive from the cluster

Kafka Topics

Amazon inventory has many suppliers of groceries
One topic for Grocery, where suppliers send the line details
Another topic for Electronics

Topics are distributed across multiple brokers
Topics are broken into partitions, and the partitions are placed on different brokers
This is how huge distributed systems are managed
One consumer can take data from different brokers
Within a consumer group, a partition is consumed by only one consumer; the same partition is never sent to multiple consumers in the group
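Partitioning and consumer-group assignment can be sketched without a Kafka client. Both functions are simplified assumptions: the hash is CRC32 rather than the murmur2-based hash used by Kafka's default partitioner, and the assignment is a plain round robin.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a message key to a partition; the same key always lands on the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer of the group (round robin)."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        assignment[consumers[index % len(consumers)]].append(partition)
    return assignment
```

For example, `assign_partitions([0, 1, 2], ["c1", "c2"])` gives partitions 0 and 2 to `c1` and partition 1 to `c2`: one consumer may read several partitions, but no partition is shared inside the group.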

Fault Tolerance

How to recover data if one broker dies
Create multiple copies of each partition and store them on multiple machines (the number of copies is called the replication factor)
For every partition, one broker is marked as the leader
Only the leader communicates with consumers & producers

Kafka uses ZooKeeper to manage the cluster of machines (newer Kafka versions can also run without ZooKeeper, in KRaft mode)

ZooKeeper maintains the list of all brokers in the system using heartbeats
If a broker node shuts down, ZooKeeper's role is to get a replica promoted to leader for the partitions that broker led
Stores the configuration of the Kafka system

It stores the number of partitions of each Kafka topic
Locations of all replicas of a particular partition
Which broker nodes are leaders and which are followers
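The metadata listed above is enough to pick a new leader when a broker dies. The sketch below is a simplification under stated assumptions: it promotes the first live replica in the list, whereas real Kafka chooses among in-sync replicas, and the `elect_leaders` function and partition names are hypothetical.

```python
def elect_leaders(replicas, live_brokers):
    """Pick a leader for each partition.

    replicas: dict mapping partition name -> ordered list of broker ids holding a copy.
    live_brokers: set of broker ids that are still sending heartbeats.
    Returns a dict mapping partition name -> leader broker id, or None if no copy is alive.
    """
    leaders = {}
    for partition, brokers in replicas.items():
        leaders[partition] = next((b for b in brokers if b in live_brokers), None)
    return leaders
```

With replicas `{"grocery-0": [1, 2], "grocery-1": [2, 3]}`, losing broker 2 leaves `grocery-0` led by broker 1 and moves leadership of `grocery-1` to broker 3.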

Concurrency and System failure issues in DB transaction in System Design

Two-Phase Commit

API gateway -> Transaction Coordinator -> Order micro service & Inventory micro service

Prepare -> each service locks the records and votes
Commit -> starts only if Prepare succeeded on all the services; otherwise the coordinator rolls back
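The two phases can be sketched with in-memory participants. This is a minimal illustration that ignores coordinator crashes, timeouts and durable logging; the `Participant` class and `two_phase_commit` function are hypothetical names.

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # A real participant would lock the records here and persist its vote.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator: commit only if every participant votes yes in the prepare phase."""
    # Phase 1: ask everyone to prepare (a list, so every participant is asked).
    votes = [p.prepare() for p in participants]
    # Phase 2: commit everywhere or roll back everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```

The coordinator waits for every vote before moving to phase two, which is where the synchronous cost of the protocol comes from.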

Disadvantages

It can cause performance issues, because the Transaction Coordinator depends on multiple services and waits for all of them
It can cause deadlock situations

SAGA -> to overcome the above disadvantages

Two-phase commit is synchronous in nature, hence slow
Saga is asynchronous. Different micro services communicate using an Event Bus/queue.
The event bus works with the pub/sub model

Flow

E-commerce API gateway -> puts a Create Order event on the Event Bus -> the Order micro service, which is subscribed, creates the order and emits another event, Order Created. This is consumed by the API gateway
Reserve event for the Inventory service -> Item Reserved -> event bus

If any step fails, the system creates compensating events to roll back the steps that already completed
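The compensation logic can be sketched as a list of (action, compensation) pairs. To keep the example short the steps are direct function calls rather than events on a bus, and the `run_saga` function is a hypothetical helper.

```python
def run_saga(steps):
    """Run each action in order; on failure, undo the completed steps in reverse order.

    steps: list of (action, compensation) pairs of zero-argument callables.
    Returns True if every action succeeded, False if the saga was rolled back.
    """
    completed = []
    for action, compensation in steps:
        try:
            action()
        except Exception:
            # Compensate only the steps that actually finished, newest first.
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensation)
    return True
```

If the order is created but the item cannot be reserved, only the order step is compensated (the order is cancelled); the failed reservation has nothing to undo.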

Distributed Transaction of backend in Distributed System

How to address system failures in the middle of a txn
Concurrency issues

ACID Properties

Atomicity - either all operations are executed or none are
Consistency - the DB should be in a consistent state after any number of txns
Isolation - every txn should be isolated. No txn should have an adverse impact on any other txn
Durability - after commit, the data should be persisted
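Atomicity inside a single database can be sketched with Python's built-in `sqlite3` module. The `wallet` and `inventory` tables and the `place_order` function are assumptions made for this example; once the two tables live in different services, a local transaction like this is no longer available.

```python
import sqlite3


def place_order(conn, owner, item, price):
    """Debit the wallet and reserve the item in one transaction; both happen or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            cur = conn.execute(
                "UPDATE wallet SET balance = balance - ? WHERE owner = ? AND balance >= ?",
                (price, owner, price),
            )
            if cur.rowcount != 1:
                raise ValueError("insufficient balance")
            cur = conn.execute(
                "UPDATE inventory SET stock = stock - 1 WHERE item = ? AND stock > 0",
                (item,),
            )
            if cur.rowcount != 1:
                raise ValueError("out of stock")
        return True
    except ValueError:
        return False
```

If the item is out of stock, the wallet debit that already ran inside the transaction is rolled back, so the balance is unchanged.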

Distributed Transaction

E-commerce -> Process order (verify wallet balance) + reserve item (verify inventory existence)

Complete steps of Building and Deploying App in Docker Container Cloud Platform

Docker Registry:

Storage and distribution system for Docker images
It has multiple repositories
We push images to repositories

product lifecycle

developers -> git repository -> Jenkins -> Registry
