Distributed Systems & Micro Services
Uber- started as monolith where following modules are into single appln integrated with each other. Tight coupling with single codebase
- Passenger mgmt
- driver mgmt
- trip mgmt
- billing
- notification service
- Every bugfix require rebuilding complete appln & testing as it is single code base
- Fixing bug is extremely difficult in single repository
- scaling one module over other is difficult
- Deploy modules independently with out changing any other modules
- Modules are scaled independently
- Bug fixing will be easy and doesn't interfere with other modules
- API Gateway/reverse proxy
- routes the requests to right micro services
- single entry points for all requests. Helps in analytics/metrics generation
- Different modules communicate via message queues(ActiveMQ)
- Async message flow
- JMS Model
- Queue based -> Point to Point communication
- Topic based -> Publisher, Subscriber model.
- Receiver
- synchronously receive a message
- Use listener to notify synchronous consumers that there is a message available
How to Handle Cascading failure in System Design Problems
- If you design a distributed system, how will you handle fault tolerance. How you enable system to continue operate properly in the event of failure of some of components
- eg: Uber has say micro services Trip Mgmt & Billing
- Trip management is invoked by REST api
- Trip management will have Executor Service(Thread pool)
- one of the thread from pool will be used to contact Billing service
- What if Billing Service is down OR Slow due to DB slowness
- Executor service will keep on trying to contact Billing service which may impact system performance
- When one of the service is slow - it may impact other services to be slow and cause cascading impact(Notification service also can go slow). Services connected to notification also can be slow
- This is cascading failures
- Circuit Breaker Pattern
- Don't communicate with other services like Billing directly
- Rather communicate with Interceptor and Interceptor will communicate with Billing
- When Interceptor detects Billing is down, it will stop calling it for sometime
- Advantages
- Allows Billing server to recover from overload situation
- Interceptor responds immediately saying service not available and timeout doesn't happen
- Interceptor maintains number of pass/fail requests to billing
- When fail requests are more than threshold it stops sending requests to the micro service for some time
- Once timer expired interceptor goes to ALLOW PARTIAL state. Later to open once the requests failures stop
- Projects that implement curcuit breaker concept
- Hystrix & resilience4j
- Hytrix provides only circuit breaker
- resilience4J provides
- circuit breaker
- rate limiter
- retry
- bulkhead feature
- Decorator design pattern
Handling and Managing the cluster of machine in Distributed Environment.
- Benefits of distributed microservices
- small codebase
- easy to deploy
- easy to scale
- extensible
- Challenges - managing micro services
- Service Discover
- How one micro service knows the ip address of other micro services
- Stores the service id for each micro service
- Any service will lookup in service discovery gets all ip address for the service
- Load balancing
- load balance within serves that support same service
- Distributed tracking
- for debugging
- traceid for the complete flow
- Authentication/Authorization/network policy
- Security -
- Monitoring
- How to overcome the challenges
- Sidecar - It performs all non-functional services listed above
- All machines will have their side car
- Control Panel -
- To manage all the side cars -> managed by operation team
- Traffic management(configure % of traffic to any specific server)
- Istio - Project for control panel
- Envoy - Project for sidecar
Kafka
- In distributed systems, there would be 100's of requests/events in a sec
- Huge number of events in a very short time - the messaging system like activeMQ cannot handle. Active MQ works on message systems and not Streams.
- When requests come in form of Streams then we need distributed messaging system like Kafka
- Kafka is highly available and resilience in node failure.
- It supports automatic recovery
- Fast, scalable and fault tolerant
- It works on publisher-subscriber messaging system
- Components of Kafka
- Kafka Broker
- Kafka Topic
- Partitions
- Consumer Group
- kafka producer
- kafka consumer
- Kafka cluster is group of machines/brokers. Producer sends to cluster and consumer receives from cluster
- Kafka Topics
- Amazon inventory has many suppliers of groceries
- one topic for Grocery for suppliers to send the line details
- another for electornics
- Topics are distributed across multiple brokers
- Topics are broken into partitions. Partitions are put into different brokers
- This is the way the huge distributed systems are managed.
- One consumer can take data from diff Brokers
- One broker cannot send data to multiple consumers within a consumer group
- Fault Tolerance
- How to recover data if one broker dies
- Create multiple copies of partitions and store in multiple machines(called as replication factor0
- For every partition - one broker is marked as leader
- Only leader will communicate with consumer & producers
- Kafka uses zookeeper to manage clusters of machines
- zookeeper maintains all the list of brokers in system using heart beat
- If a broker node is shutting down, then zookeeper role is to tell replicas to act as leaders
- Stores config of all Kafka systems
- It stores number of partitions of each Kafka topic
- Locations of all replicas of a particular partition
- which broker nodes are leaders and which are followers
- Two-Phase Commit
- API gateway -> Transaction Coordinator -> Order micro service & inventory microservices
- Prepare -> locks the records
- Commit -> starts only Prepare is successful from all the services
- Disadvantages
- Some time it can cause perf issues as Transaction coordinator is dependent on multipler services
- It can cause dead lock situations
- SAGA -> to overcome above disadvantages
- Two-phase is syncronus in nature - Hence slow
- Saga is Async. Diff micro services communicate using Event Bus/queue.
- Event bus works with pub/sub model
- Flow
- e-commerge api gateway -> inserts Create Order event in Event Bus -> Order Microservice which is subscribed will create order and emits another event Order Created. This is consumed by api gateway
- Reserve Event for Inventory service -> Item reserved -> event bus
- If any event fails - system will create compensating events to rollback
Distributed Transaction of backend in Distributed System
- How to address System failures in middle of txn
- concurrency issue
- ACID Properties
- Automicity - all operations are executed or none executed
- Consistency - DB should be in consistent state after any number of txns
- Isolation - Every txn should be isolated. No txn will have adverse impact in any existing txn
- Durability -After commit -> data should be persisted
- Distributed Transaction
- E-commerse -> Process order(verify wallet balance) + reserve item(verify inventory existence)
Complete steps of Building and Deploying App in Docker Container Cloud Platform
- Docker Registry:
- Storage and distributed system for docker images
- It has multiple repositories
- we push images to repositories
- product lifecycle
- developers -> git repository -> Jenkins -> Registry
Comments
Post a Comment