RSM (Reliability, Scalability, Maintainability) - tradeoffs exist between them
Scalability - the property of a system to maintain desired performance as resources are added to handle more work
In a highly distributed system, choose any two amongst - Consistency, Availability, and Partition Tolerance (network breaking up). Networks are always unreliable, so the choice comes down to Consistency vs Availability.
“A data store is available if and only if all get and set requests eventually return a response that’s part of their specification. This does not permit error responses, since a system could be trivially available by always returning an error.” ― CAP FAQ
Video: What is CAP Theorem?
PACELC Theorem: in case of a (P)artition, we need to choose between (A)vailability and (C)onsistency; (E)lse (even when the system is running normally in the absence of partitions), we need to choose between (L)atency and (C)onsistency.
We do have CA in non-distributed systems such as RDBMS (MySQL, Postgres, etc.); it is called ACID there and it's a fundamental property of SQL transactions.
Availability ensured by: two-server fail-over (master-master or master-slave)
Consistency ensured by: replication to multiple servers, one master and the rest slaves (or all masters)
But how to choose a master/leader?:
Max Consistency - all writes are available globally to everyone reading from any part of the system. In simple words, it ensures that every node or replica has the same view of data at a given time, irrespective of which client has updated the data.
In a distributed system, we read/write to multiple replicas of a data store from multiple client servers, handling of these incoming queries/updates needs to be performed such that data in all the replicas is in a consistent state.
In decreasing order of consistency, increasing order of efficiency:
For example, sum(1, 2) utilizes multiple keys - read(1) and read(2) - and the overall ordering of these operations decides the sum output.
Weak consistency (Causal & Eventual) levels are too loose, hence we often pair them with some sort of stronger guarantees like:
Quorum Consistency (Flexible): we can configure R and W nodes to have strong or weak consistency depending on the use case (notes)
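The usual rule of thumb: with N replicas, a write acknowledged by W replicas and a read from R replicas overlap (and hence behave strongly consistent) when R + W > N; smaller R/W trades consistency for latency. A minimal sketch, assuming hypothetical replica objects with read()/write() methods:

```python
# Sketch of quorum reads/writes over N replicas; the replica objects and
# their read()/write() methods are hypothetical placeholders.
N, W, R = 5, 3, 3   # R + W = 6 > 5 -> read and write quorums overlap

def is_strongly_consistent(n: int, w: int, r: int) -> bool:
    # Overlap rule: any read quorum intersects any write quorum.
    return r + w > n

def quorum_write(replicas, key, value, w=W):
    acks = sum(1 for replica in replicas if replica.write(key, value))
    return acks >= w   # write succeeds only if at least w replicas acked

def quorum_read(replicas, key, r=R):
    # Read from r replicas and keep the newest version (a .version field is assumed).
    responses = [replica.read(key) for replica in replicas[:r]]
    return max(responses, key=lambda resp: resp.version)
```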
Within a single database node, when we modify a piece of data from multiple transactions, isolation levels come into the picture.
From the POV of the Follower 2 replica in the diagram above: the system acts as if it were a single entity; x can be sent back immediately without blocking, even though the write was the first operation coming into the system. Both the followers and the leader will converge to x=2 eventually.
Load Balancer: can be used in between web server and service, service and database, etc. It knows which services are up and routes traffic only to them, doing health-check heartbeats for the same.
Types: layer-4 and layer-7 (/networking/notes/#load-balancers)
Balancing Algorithms:
https://www.enjoyalgorithms.com/blog/types-of-load-balancing-algorithms
To avoid it being a single-point-of-failure (SPOF), keep another LB as a fail-over (active-active or active-passive).
/microservices/patterns/#load-balancing
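As a rough illustration of two of the balancing algorithms covered in the post linked above (backend addresses and counters are made up), a minimal Python sketch:

```python
from itertools import cycle

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]   # hypothetical backends

# Round robin: hand out backends in a fixed rotation.
rr = cycle(servers)
def round_robin():
    return next(rr)

# Least connections: pick the backend with the fewest active connections.
active_connections = {s: 0 for s in servers}
def least_connections():
    server = min(active_connections, key=active_connections.get)
    active_connections[server] += 1   # caller decrements when the request ends
    return server
```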
JOINs are often done in the application code.
Reference: http://highscalability.com/blog/2010/12/6/what-the-heck-are-you-actually-using-nosql-for.html
BASE is used in NoSQL to describe its properties: Basically Available, Soft state, Eventual consistency.
Types of NoSQL Stores: key-value, document, wide-column, graph
Types of Databases - Fireship YT
NoSQL Patterns: http://horicky.blogspot.com/2009/11/nosql-patterns.html
Scaling up to your first 10 million users: https://www.youtube.com/watch?v=kKjm4ehYiMs
Partitioning Criteria: hash-based, list, round robin, composite
Sharding: divide database into multiple ones based on criteria (often non-functional such as geographic proximity to user).
JOINs across shards are very expensive. Consistent hashing can help in resolving uneven shard access patterns.
Federation: divide database into multiple ones based on functional boundaries
Denormalization: duplicate columns are often kept in multiple tables to make reads easier, though writes suffer. We often use materialized views, which are cached tables storing the result of a complex JOIN when we query across partitioned databases.
Goals: Uniform distribution of data among nodes and minimal rerouting of traffic if nodes are added or removed. Consistent Hashing is used.
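A minimal consistent-hash ring sketch (node names and the virtual-node count are arbitrary); unlike hash(key) % N, adding or removing a node only remaps the keys that fall on its portion of the ring:

```python
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes      # virtual nodes smooth out the key distribution
        self.ring = []            # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node: str):
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str):
        self.ring = [(h, n) for h, n in self.ring if n != node]

    def get(self, key: str) -> str:
        # The first virtual node clockwise from the key's hash owns the key.
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["shard-1", "shard-2", "shard-3"])
print(ring.get("user:42"))   # stays stable for most keys as nodes come and go
```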
How to copy data?
Two Generals Problem - the problem with systems that rely on ACKs for consistency is that one node doesn't know whether the update has been performed on the other node if the ACK never reaches it back; resolution - hierarchical replication.
Problems in replication:
Split-Brain Problem: happens when there is an even number of masters in a system; half of them have a value of x=1 and the other half has x=2, but we need a single unambiguous value of x. Auto-reconciliation like timestamp comparisons can be done, sometimes manual reconciliation too. It is better to have an odd number of nodes to avoid this problem; in such a system, even if there is a split, a strict quorum is possible.
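A toy illustration of timestamp-based auto-reconciliation (last-write-wins), assuming timestamps are comparable across nodes - clock skew makes that questionable, which is why manual reconciliation still exists:

```python
# Each side of the split reports (value, timestamp); keep the most recent write.
def reconcile(versions):
    value, _ts = max(versions, key=lambda vt: vt[1])
    return value

print(reconcile([(1, 1700000000.1), (2, 1700000003.7)]))   # -> 2
```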
Write Amplification Problem (WA): when replicating data from a master to replicas, we have to write n times if there are n replicas in the system. In such a system, what happens when one of the replicas fails to ack the write? Should we wait for it (indefinite wait) or move on (stale data on that replica)?
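One common middle ground is to wait for only a minimum number of acks within a timeout; a sketch under that assumption (the replica client, its write() method, and the timeout value are hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor, wait

def replicate(replicas, key, value, min_acks=2, timeout=1.0):
    # Fan the write out to every replica, but only wait `timeout` seconds for
    # `min_acks` acknowledgements; slow replicas finish in the background and
    # may be temporarily stale (to be fixed later by read-repair/anti-entropy).
    pool = ThreadPoolExecutor(max_workers=len(replicas))
    futures = [pool.submit(r.write, key, value) for r in replicas]
    done, _pending = wait(futures, timeout=timeout)
    acks = sum(1 for f in done if f.exception() is None and f.result())
    pool.shutdown(wait=False)          # don't block on stragglers
    return acks >= min_acks
```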
Transfer data from an old DB to a new one with minimal downtime.
Naive Approach:
Stop old DB
DB_copy from old
DB_upload to new
Point services to new DB
Downside is the downtime of the database during migration.
Simple Approach:
Utilities like pg_dump can take a backup without stopping the database server, so it remains operational during backup.
Creating a backup (dump) of a large database will take some time. Any inserts or updates that come in during this backup window will have to be populated into the new database, i.e. entries after timestamp x at which the backup started. These new entries will be much smaller in number than the entire existing database backup.
DB_copy to new (starts at x)
DB_copy to new with timestamp > x (starts at y)
Add trigger on old to send queries to new (starts at z)
DB_copy to new with: y < timestamp < z
If an update query on data yet to be backed up comes in through the trigger, we can do an UPSERT (INSERT if not present, UPDATE otherwise).
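For instance, with SQLite via Python's sqlite3 (table and columns invented for illustration), an UPSERT looks roughly like:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)")

def upsert(user_id, name, updated_at):
    # INSERT if the row doesn't exist yet, otherwise UPDATE it in place, so a
    # trigger-forwarded update can't be lost just because the backfill hasn't
    # reached that row yet.
    conn.execute(
        """INSERT INTO users (id, name, updated_at) VALUES (?, ?, ?)
           ON CONFLICT(id) DO UPDATE SET name = excluded.name,
                                         updated_at = excluded.updated_at""",
        (user_id, name, updated_at),
    )

upsert(1, "alice", "2024-01-01T10:00:00Z")
upsert(1, "alice2", "2024-01-01T10:05:00Z")   # second call updates the same row
```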
If we have less data, we can run the backup at x and add the trigger to the old DB at y, skipping step 2 above (the second iteration of DB_copy). After that, we only need to copy data from the backup start until the trigger was added, i.e. x < timestamp < y.
Optimized Approach:
Set up CDC or SQL triggers first, and then transfer data from old to new using a backup.
After the new DB is populated, we want to point clients to the new DB instead of the old one; all reads/writes should happen on the new DB.
Use VIEW tables as a proxy created from the new DB (transformations can be done to mimic the old DB), but name them the same as in the old DB so that code pointing to the old DB still works. Rename the old DB to “old_DB_obsolete” (unpoint).
Once everything is working fine, refactor service code to point to the new DB (some downtime to restart services). We can delete the proxy VIEWs after that.
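A toy sqlite3 illustration of the proxy-view trick (all names invented): the reworked schema lives in a new table, and a view named like the old table keeps legacy queries working. Writes through such a view would additionally need INSTEAD OF triggers (or the database's updatable-view equivalent).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# New table with the reworked schema.
conn.execute("CREATE TABLE customers_v2 (id INTEGER PRIMARY KEY, full_name TEXT)")
conn.execute("INSERT INTO customers_v2 VALUES (1, 'Ada Lovelace')")

# Proxy view named like the old table, mimicking the old column names,
# so services still querying `customers` keep working unchanged.
conn.execute("CREATE VIEW customers AS SELECT id, full_name AS name FROM customers_v2")

print(conn.execute("SELECT name FROM customers").fetchall())   # [('Ada Lovelace',)]
```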
Diff Cloud Providers:
When migrating from Azure to AWS, we can't add a trigger as in the simple approach or create a view proxy as in the optimized approach.
We use a DB Proxy server in such a case. Reads are directed to the old DB, writes to both old and new. Any views needed are present in the proxy server. The rest of the steps remain the same.
It requires two restarts though - one to point services to the DB Proxy and another to point services to AWS when we're done with the migration.
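A rough sketch of such a proxy's routing logic (connection objects and method names are hypothetical):

```python
class MigrationDBProxy:
    """Route reads to the old DB; mirror writes to both old and new DBs."""

    def __init__(self, old_db, new_db):
        self.old_db = old_db
        self.new_db = new_db

    def read(self, query, params=()):
        # The old DB stays the source of truth until cutover.
        return self.old_db.execute(query, params).fetchall()

    def write(self, query, params=()):
        # Dual-write keeps the new DB caught up while the backfill runs.
        self.old_db.execute(query, params)
        self.new_db.execute(query, params)
```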
Logs and Metrics: centralize logs with traces and gather metrics in a timeseries DB
Anomaly Detection: detecting faults
Root Cause Analysis: finding out the cause of faults via correlations in metrics