This is the first blog post in my Jargon Make Easy series, where the goal is to offer a simple, intuitive way of understanding some technical terms in Computer Science. That said, I will try to keep the concepts rigorous. For this first post, I would like to introduce the concepts of Scaling and Database Sharding.
Introduction:
Software development is a never-ending process: every piece of software needs to be maintained, and new features have to be added on top of the old ones. A critical aspect of building software is scaling. We can easily understand scaling in the context of a business. Say you open a restaurant. During the first month you may only have a few customers and one restaurant branch, like this.
After a few months, customers love your food, but your restaurant is too small, so they have to queue and wait for you to serve them, as in this picture.
So how can you serve your customers better?
You decide to put more tables inside your restaurant, but you quickly realize that only a certain number of tables fit in one restaurant, so that solution will hit a limit. This is what the concept of vertical scaling in Computer Science is all about.
Is there another solution?
You decide to open another restaurant at a nearby location to serve your customers better. This is what horizontal scaling is.
Now to give a rigorous definition:
Scaling is the process of adding more resources to a system so that it can handle a growing amount of work. There are two main types of scaling: Horizontal Scaling and Vertical Scaling. Vertical Scaling means adding computing resources such as RAM or CPU to an existing machine, while Horizontal Scaling means adding more servers or machines to the system. When talking about Database Sharding, we mostly focus on Horizontal Scaling, because Database Sharding is one way to implement it.
What is Database Sharding?
Database Sharding means splitting the tables in your database horizontally, that is, by rows. For example, suppose your current database has the table customer(userID, age, email) with multiple rows, each representing a customer. When you shard your database, you split this table into smaller tables, say shard1, shard2, ..., that all keep the same columns and are usually stored on different servers, and you decide which shard each row should go to.
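To make the idea of "deciding which row goes to which shard" concrete, here is a minimal sketch in Python. It assumes hash-based routing on userID, which is only one of several possible strategies, and it models each shard as an in-memory dictionary rather than a real database server. The names NUM_SHARDS, shard_for, insert_customer and get_customer are made up for this illustration.

```python
import hashlib

NUM_SHARDS = 3  # assumed number of shards for this sketch


def shard_for(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Map a userID to a shard index using a stable hash."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Each shard is modelled as a plain dict: userID -> row.
# In a real system each one would be a table on its own server.
shards = [dict() for _ in range(NUM_SHARDS)]


def insert_customer(user_id: int, age: int, email: str) -> None:
    """Store the row in the single shard that owns this userID."""
    row = {"userID": user_id, "age": age, "email": email}
    shards[shard_for(user_id)][user_id] = row


def get_customer(user_id: int):
    """Look up a row by asking only the shard that owns this userID."""
    return shards[shard_for(user_id)].get(user_id)


insert_customer(1, 34, "alice@example.com")
insert_customer(2, 27, "bob@example.com")
insert_customer(3, 52, "carol@example.com")

print(get_customer(2))
```

The important property is that the same userID always maps to the same shard, so a lookup by userID only has to touch one shard instead of the whole data set.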
What are the benefits of Database Sharding?
When thinking about the benefits of Database Sharding, we can recall the benefits of horizontal scaling, as the two have much in common:
1) Queries can get faster. Because the data is distributed, each shard holds fewer rows, so a query no longer has to scan the whole table on one machine; instead, multiple shards can be searched at the same time.
2) It avoids a single point of failure. If one shard has an outage, only the data on that shard is affected, so the rest of the system can keep serving customers until recovery is complete.
3) At some point, Database Sharding may be the only practical way to scale your system. Since vertical scaling has a limit, your system eventually has to be scaled horizontally, and Database Sharding helps implement horizontal scaling.
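The first benefit, looking up multiple shards at the same time, can be sketched as follows. This is an illustration only: the shards are small in-memory lists with made-up rows, and query_shard and query_all_shards are hypothetical helper names. In a real system each call would be a network request to a separate database server, which is where the parallelism actually pays off.

```python
from concurrent.futures import ThreadPoolExecutor

# Three hypothetical shards, each holding some rows of the customer table.
shards = [
    [{"userID": 1, "age": 34}, {"userID": 4, "age": 19}],
    [{"userID": 2, "age": 27}, {"userID": 5, "age": 45}],
    [{"userID": 3, "age": 52}, {"userID": 6, "age": 30}],
]


def query_shard(rows, min_age):
    """Scan a single shard; stands in for a query sent to one server."""
    return [row for row in rows if row["age"] >= min_age]


def query_all_shards(min_age):
    """Send the same query to every shard at once and merge the results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(lambda rows: query_shard(rows, min_age), shards)
        merged = [row for partial in partials for row in partial]
    return sorted(merged, key=lambda row: row["userID"])


for row in query_all_shards(30):
    print(row)
```

Each shard only scans its own rows, and the caller merges the partial results at the end.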
What are some disadvantages of Database Sharding?
Besides its benefits, Database Sharding also comes with drawbacks. These are some key drawbacks of implementing Database Sharding in a distributed system:
1) The system becomes more complex, which makes it more difficult to maintain data integrity and consistency.
2) If the shards are designed incorrectly, rows can pile up in one shard, that is, one shard ends up with far more rows than the others, and you lose the benefits of sharding.
3) Some databases do not natively support Database Sharding, and implementing it yourself in a way that exploits all of its advantages is a challenging task.
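The second drawback is easy to see with a small experiment. The sketch below uses made-up data in which every customer is between 20 and 39 years old, and compares two hypothetical ways of choosing a shard: by age range and by userID. Both functions and the data set are assumptions for illustration, not a recommendation for any particular database.

```python
from collections import Counter

NUM_SHARDS = 4

# Hypothetical customers: userIDs 1..1000, all ages between 20 and 39.
customers = [(user_id, 20 + user_id % 20) for user_id in range(1, 1001)]


def shard_by_age_range(age):
    """Range sharding on age: 0-19, 20-39, 40-59, 60 and above."""
    return min(age // 20, NUM_SHARDS - 1)


def shard_by_user_id(user_id):
    """Modulo sharding on userID."""
    return user_id % NUM_SHARDS


# Count how many rows each strategy places on each shard.
by_age = Counter(shard_by_age_range(age) for _, age in customers)
by_id = Counter(shard_by_user_id(user_id) for user_id, _ in customers)

print("rows per shard when sharding by age range:", dict(by_age))
print("rows per shard when sharding by userID:", dict(by_id))
```

With this data, sharding by age range puts every row on the same shard, so the other three sit empty, while sharding by userID spreads the rows evenly. The choice of which column to shard on matters as much as the decision to shard at all.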