Distributed Systems & Microservices

 Uber started as a monolith: the following modules were integrated with each other in a single application, tightly coupled in a single codebase

  • Passenger mgmt
  • driver mgmt
  • trip mgmt
  • billing
  • notification service 
Challenges

  • Every bug fix requires rebuilding and retesting the complete application, because it is a single codebase
  • Tracking down and fixing a bug is difficult in one large, tightly coupled repository
  • Scaling one module independently of the others is difficult
Microservices - Advantages

  • Modules are deployed independently, without changing any other module
  • Modules are scaled independently
  • Bug fixes are easier and don't interfere with other modules
  • API Gateway/reverse proxy
    • routes each request to the right microservice
    • single entry point for all requests, which helps with analytics/metrics generation
  • Modules communicate with each other via message queues (e.g. ActiveMQ)
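
A minimal sketch of the gateway's routing role, assuming hypothetical path prefixes and service names (they are illustrative only, not Uber's actual routes). It maps a request path to the microservice that owns it and counts requests per service, the kind of metric a single entry point makes easy to collect.

```python
from collections import Counter

# Hypothetical routing table: path prefix -> microservice name.
ROUTES = {
    "/passengers": "passenger-mgmt",
    "/drivers": "driver-mgmt",
    "/trips": "trip-mgmt",
    "/billing": "billing",
}

# Per-service request counts, gathered at the single entry point.
request_counts = Counter()


def route(path: str) -> str:
    """Return the name of the service that should handle `path`."""
    for prefix, service in ROUTES.items():
        if path == prefix or path.startswith(prefix + "/"):
            request_counts[service] += 1
            return service
    raise LookupError(f"no service registered for {path}")


if __name__ == "__main__":
    print(route("/trips/42"))
    print(route("/billing/invoices/7"))
    print(dict(request_counts))
```

A real gateway would forward the HTTP request to the chosen service; here the function only returns the service name to keep the routing decision visible.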
ActiveMQ

  • Async message flow
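  • JMS model
    • Queue based -> point-to-point communication (each message goes to one consumer)
    • Topic based -> publisher/subscriber model (each message goes to every subscriber)
  • Receiver
    • Synchronous: the consumer calls receive and blocks until a message arrives
    • Asynchronous: the consumer registers a listener that is notified when a message is available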
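
The difference between the two JMS models can be sketched with a small in-memory stand-in. This is not the JMS or ActiveMQ API: the class names `PointToPointQueue` and `Topic` are hypothetical, `receive` returns `None` instead of blocking, and listeners run in the publisher's thread, whereas a real broker delivers asynchronously.

```python
from collections import deque


class PointToPointQueue:
    """Queue model: each message is delivered to exactly one consumer."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        """Pull-style receive: return the next message, or None if empty."""
        return self._messages.popleft() if self._messages else None


class Topic:
    """Topic model: each message is delivered to every subscriber."""

    def __init__(self):
        self._listeners = []

    def subscribe(self, listener):
        # Listener-style delivery: the callback runs when a message is published.
        self._listeners.append(listener)

    def publish(self, message):
        for listener in self._listeners:
            listener(message)


if __name__ == "__main__":
    queue = PointToPointQueue()
    queue.send("trip-completed")
    print(queue.receive())  # the first consumer takes the message
    print(queue.receive())  # nothing is left for a second consumer

    topic = Topic()
    billing_inbox, notification_inbox = [], []
    topic.subscribe(billing_inbox.append)
    topic.subscribe(notification_inbox.append)
    topic.publish("trip-completed")
    print(billing_inbox, notification_inbox)  # both subscribers got a copy
```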
How to Handle Cascading failure in System Design Problems
  • When designing a distributed system, how do you handle fault tolerance, i.e. how do you keep the system operating properly when some of its components fail?
    • e.g. Uber has, say, the microservices Trip Mgmt & Billing
    • Trip management is invoked through a REST API
    • Trip management has an Executor Service (thread pool)
    • One thread from the pool is used to contact the Billing service
    • What if the Billing service is down, or slow because of DB slowness?
      • Threads in the executor service stay blocked waiting on (or retrying) Billing, which can exhaust the pool and degrade Trip management
    • When one service is slow, the services that depend on it become slow too (e.g. the Notification service), and so can the services connected to those
    • This is a cascading failure
    • Circuit Breaker Pattern
      • Don't communicate with other services such as Billing directly
      • Instead, communicate with an interceptor, and the interceptor communicates with Billing
      • When the interceptor detects that Billing is down, it stops calling it for some time
        • Advantages
          • Gives the Billing server time to recover from an overload situation
          • The interceptor responds immediately with "service not available", so callers don't wait for a timeout
          • The interceptor keeps count of passed/failed requests to Billing
          • When failed requests exceed a threshold, it stops sending requests to the microservice for some time (circuit open)
          • Once the timer expires, the interceptor goes to the ALLOW PARTIAL (half-open) state and lets some requests through. If they succeed it returns to normal operation (circuit closed); otherwise it blocks requests again
        • Projects that implement the circuit breaker concept
          • Hystrix & resilience4j
          • Hystrix is built mainly around the circuit breaker (with fallbacks and thread-pool isolation)
          • resilience4J provides
            • circuit breaker
            • rate limiter
            • retry
            • bulkhead feature
        • The interceptor wraps the real call, which is the Decorator design pattern
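
A minimal sketch of the interceptor, assuming a simple count of consecutive failures and a fixed reset timeout. The class and parameter names are hypothetical, and this is not the Hystrix or resilience4j API. The states CLOSED, OPEN and HALF_OPEN follow common circuit-breaker terminology; HALF_OPEN corresponds to the ALLOW PARTIAL state in the notes above.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without contacting the service."""


class CircuitBreaker:
    """Wraps a callable (decorator style) and fails fast while the circuit is open."""

    def __init__(self, call, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self._call = call
        self._failure_threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock          # injectable so tests can control time
        self._failures = 0           # consecutive failures seen so far
        self._opened_at = None
        self.state = "CLOSED"

    def __call__(self, *args, **kwargs):
        if self.state == "OPEN":
            if self._clock() - self._opened_at < self._reset_timeout:
                raise CircuitOpenError("service unavailable, failing fast")
            self.state = "HALF_OPEN"  # timer expired: let a trial request through
        try:
            result = self._call(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self.state == "HALF_OPEN" or self._failures >= self._failure_threshold:
                self.state = "OPEN"
                self._opened_at = self._clock()
            raise
        # Success: reset the counter and resume normal operation.
        self._failures = 0
        self.state = "CLOSED"
        return result


if __name__ == "__main__":
    def flaky_billing(trip_id):
        raise ConnectionError("billing is down")

    billing = CircuitBreaker(flaky_billing, failure_threshold=2, reset_timeout=5.0)
    for _ in range(3):
        try:
            billing(42)
        except CircuitOpenError as exc:
            print("rejected:", exc)
        except ConnectionError as exc:
            print("failed:", exc)
```

Counting consecutive failures keeps the sketch short; production libraries typically track a failure rate over a sliding window instead.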
Handling and Managing a Cluster of Machines in a Distributed Environment
  • Benefits of distributed microservices
    • small codebase
    • easy to deploy
    • easy to scale
    • extensible
  • Challenges - managing microservices
    • Service Discovery
      • How does one microservice know the IP addresses of other microservices?
      • The service discovery component stores, for each service id, the addresses of its instances
      • Any service can look up a service id there and get all IP addresses for that service
    • Load balancing
      • balance load across the servers that provide the same service
    • Distributed tracing
      • for debugging
      • one trace id follows the complete request flow
    • Authentication/Authorization/network policy
      • Security
    • Monitoring
  • How to overcome the challenges
    • Sidecar - handles all the non-functional concerns listed above
    • Every machine has its own sidecar
    • Control Plane
      • Manages all the sidecars -> operated by the operations team
      • Traffic management (configure the % of traffic sent to any specific server)
    • Istio - project for the control plane
    • Envoy - project for the sidecar (proxy)
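
Service discovery and load balancing can be sketched together as a small in-memory registry. The class name `ServiceRegistry`, the service id and the addresses are all hypothetical; this is not the Istio or Envoy API, and in a service mesh this logic would live in the sidecar rather than in application code.

```python
import itertools


class ServiceRegistry:
    """Maps a service id to the addresses of its running instances."""

    def __init__(self):
        self._instances = {}   # service id -> list of addresses
        self._cursors = {}     # service id -> round-robin iterator

    def register(self, service_id, address):
        self._instances.setdefault(service_id, []).append(address)
        # Rebuild the round-robin cursor whenever the instance list changes.
        self._cursors[service_id] = itertools.cycle(list(self._instances[service_id]))

    def lookup(self, service_id):
        """Service discovery: return all known addresses for a service."""
        return list(self._instances.get(service_id, []))

    def next_instance(self, service_id):
        """Load balancing: pick one address in round-robin order."""
        if service_id not in self._cursors:
            raise LookupError(f"unknown service: {service_id}")
        return next(self._cursors[service_id])


if __name__ == "__main__":
    registry = ServiceRegistry()
    registry.register("billing", "10.0.0.1:8080")
    registry.register("billing", "10.0.0.2:8080")
    print(registry.lookup("billing"))
    for _ in range(3):
        print(registry.next_instance("billing"))
```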
Kafka
  • In distributed systems there can be hundreds of requests/events per second, or more
  • With a huge number of events in a very short time, a traditional message broker like ActiveMQ can struggle to keep up. ActiveMQ is built around individual messages, not streams.
  • When requests arrive as streams, we need a distributed messaging system like Kafka
  • Kafka is highly available and resilient to node failure
  • It supports automatic recovery
  • Fast, scalable and fault tolerant
  • It is based on the publisher-subscriber messaging model
  • Components of Kafka
    • Kafka Broker
    • Kafka Topic
    • Partitions
    • Consumer Group
    • Kafka Producer
    • Kafka Consumer
  • A Kafka cluster is a group of machines (brokers). Producers send to the cluster and consumers receive from the cluster
  • Kafka Topics
    • e.g. Amazon inventory has many suppliers of groceries
    • one topic for Grocery, to which suppliers send their line details
    • another for Electronics
  • Topics are distributed across multiple brokers
  • Topics are broken into partitions. Partitions are placed on different brokers
  • This is how huge distributed systems are managed.
  • One consumer can read data from different brokers
  • Within a consumer group, one partition cannot be read by multiple consumers
  • Fault Tolerance
    • How to recover data if one broker dies?
    • Create multiple copies of each partition and store them on multiple machines (the number of copies is called the replication factor)
    • For every partition, one broker is marked as leader
    • Only the leader communicates with consumers & producers
  • Kafka uses ZooKeeper to manage the cluster of machines (newer Kafka versions replace it with the built-in KRaft mode)
    • ZooKeeper maintains the list of all brokers in the system using heartbeats
    • If a broker node goes down, ZooKeeper's role is to get a replica promoted to leader for the affected partitions
    • Stores config of the Kafka system
      • the number of partitions of each Kafka topic
      • the locations of all replicas of a particular partition
      • which broker nodes are leaders and which are followers
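
The partitioning, consumer-group and leader ideas above can be sketched without a Kafka client. Everything here is a simplified model under stated assumptions: the function names are hypothetical, CRC32 stands in for the hash Kafka's default partitioner uses, partitions are assigned round-robin, and the new leader is simply the first replica still alive (real Kafka chooses among in-sync replicas).

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Pick a partition from the record key (CRC32 is a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer in the group, round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        assignment[consumers[index % len(consumers)]].append(partition)
    return assignment


def elect_leader(replicas, live_brokers):
    """Return the first replica that is still alive, or None if all are down."""
    for broker in replicas:
        if broker in live_brokers:
            return broker
    return None


if __name__ == "__main__":
    # Records with the same key always land on the same partition.
    print(partition_for("supplier-17", 3))
    # One consumer may own several partitions, but no partition has two owners.
    print(assign_partitions([0, 1, 2], ["consumer-a", "consumer-b"]))
    # broker-1 held the leader role and died; a surviving replica takes over.
    print(elect_leader(["broker-1", "broker-2", "broker-3"], {"broker-2", "broker-3"}))
```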
Concurrency and System failure issues in DB transaction in System Design
  • Two-Phase Commit
    • API gateway -> Transaction Coordinator -> Order microservice & Inventory microservice
      • Prepare -> each service locks its records
      • Commit -> starts only if Prepare succeeded on all the services
    • Disadvantages
      • It can cause performance issues, as the transaction coordinator has to wait on multiple services
      • It can cause deadlock situations
  • SAGA -> to overcome the above disadvantages
    • Two-phase commit is synchronous in nature, hence slow
    • Saga is async. Different microservices communicate through an event bus/queue.
    • The event bus works on the pub/sub model
    • Flow
      • E-commerce API gateway -> puts a Create Order event on the event bus -> the Order microservice, which is subscribed, creates the order and emits another event, Order Created. This is consumed by the API gateway
      • Reserve Item event for the Inventory service -> Item Reserved -> event bus
    • If any step fails, the system emits compensating events to roll back the steps already completed
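
The compensation idea can be sketched as follows. To keep it short, this uses a simple in-process coordinator that runs the steps in order, rather than the event bus described above; the function name `run_saga` and the step names are hypothetical.

```python
class SagaFailed(Exception):
    """Raised after a failed step has been compensated."""


def run_saga(steps):
    """Run (name, action, compensation) steps in order.

    If an action raises, the compensations of the steps that already
    completed are run in reverse order, then SagaFailed is raised.
    """
    completed = []
    for name, action, compensation in steps:
        try:
            action()
        except Exception as exc:
            for _, undo in reversed(completed):
                undo()
            raise SagaFailed(f"step '{name}' failed: {exc}") from exc
        completed.append((name, compensation))


if __name__ == "__main__":
    log = []

    def reserve_item():
        raise RuntimeError("out of stock")

    steps = [
        ("create order",
         lambda: log.append("order created"),
         lambda: log.append("order cancelled")),
        ("reserve item",
         reserve_item,
         lambda: log.append("item released")),
    ]
    try:
        run_saga(steps)
    except SagaFailed as exc:
        print(exc)
    print(log)  # the created order has been cancelled again
```

Only completed steps are compensated; the step that failed is assumed to have left nothing behind to undo.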
Distributed Transaction of backend in Distributed System 
  • How to address system failures in the middle of a txn
  • Concurrency issues
  • ACID Properties
    • Atomicity - either all operations are executed or none are
    • Consistency - the DB should be in a consistent state after any number of txns
    • Isolation - every txn should be isolated; no txn should adversely affect any other concurrent txn
    • Durability - after commit, the data should be persisted
  • Distributed Transaction
    • E-commerce -> process order (verify wallet balance) + reserve item (verify inventory existence)
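
Atomicity for the wallet + inventory example can be sketched with a single SQLite database. This is an assumption made for illustration: the table and function names are hypothetical, and one local database stands in for both services. In the distributed case the two updates live in separate services and databases, which is why a local transaction like this one is not enough there.

```python
import sqlite3


def make_db():
    """Create an in-memory database with one user and one out-of-stock item."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE wallet (user TEXT PRIMARY KEY, balance INTEGER)")
    conn.execute("CREATE TABLE inventory (item TEXT PRIMARY KEY, stock INTEGER)")
    conn.execute("INSERT INTO wallet VALUES ('alice', 100)")
    conn.execute("INSERT INTO inventory VALUES ('phone', 0)")
    conn.commit()
    return conn


def place_order(conn, user, item, price):
    """Debit the wallet and reserve one item atomically; return True on success.

    Assumes the user and item rows exist.
    """
    try:
        with conn:  # commits on success, rolls back if the block raises
            conn.execute("UPDATE wallet SET balance = balance - ? WHERE user = ?",
                         (price, user))
            (balance,) = conn.execute("SELECT balance FROM wallet WHERE user = ?",
                                      (user,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient wallet balance")
            conn.execute("UPDATE inventory SET stock = stock - 1 WHERE item = ?",
                         (item,))
            (stock,) = conn.execute("SELECT stock FROM inventory WHERE item = ?",
                                    (item,)).fetchone()
            if stock < 0:
                raise ValueError("item out of stock")
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    conn = make_db()
    print(place_order(conn, "alice", "phone", 60))
    # The wallet debit was rolled back together with the failed reservation.
    print(conn.execute("SELECT balance FROM wallet WHERE user = 'alice'").fetchone())
```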

Complete steps of Building and Deploying App in Docker Container Cloud Platform 
  • Docker Registry:
    • Storage and distribution system for Docker images
    • It has multiple repositories
    • We push images to repositories
  • Product lifecycle
    • developers -> git repository -> Jenkins -> Registry
