Case Study
We've deployed our web application on a web server. We estimated resources based on a tentative load. In the beginning, the load on the server was nominal. Over time, the load increased and eventually users started facing delayed responses or "Service Not Available" errors. This happens when the web server can't handle any more requests. We can change settings at the web server level to handle more requests, but machine resources are limited and we can't go beyond that limit. What do we do now to handle more requests?
Solutions
What is Scalability? (From Wikipedia) "Scalability is the property of a system to handle a growing amount of work by adding resources to the system."
Vertical Scaling
One easy option is to increase the server's resources (e.g. add more RAM and processing power). This is called Vertical Scaling or Scaling Up.
This approach doesn't add much complexity or require any special considerations in the application architecture. Therefore, even without knowing about scaling, we would likely do this to handle the load. But it has a limitation: we can't add infinite resources to a single machine. And what if that one machine fails? We also need to ensure "availability".
Horizontal Scaling
Another option is to use more machines. We deploy the same application on multiple servers and use a Load Balancer as a middle layer. Each request goes to the Load Balancer, which forwards it to a machine based on that machine's current load. This type of scaling, in which we add more machines, is called Horizontal Scaling or Scaling Out. Depending on the load, you can add or remove machines. But this solution brings complexity, and we need to design our application architecture carefully.
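The forwarding step above can be sketched as a simple round-robin dispatcher. This is a minimal illustration, not a real load balancer; the server names are hypothetical, and real balancers (nginx, HAProxy, cloud load balancers) also consider each machine's current load:

```python
from itertools import cycle

# Hypothetical pool of identical application servers behind the balancer.
servers = ["server-a", "server-b", "server-c"]

# Round-robin: each incoming request is forwarded to the next server in turn.
rotation = cycle(servers)

def route(request_id):
    """Forward a request to the next server in the rotation."""
    return next(rotation)

# Three consecutive requests land on three different servers.
assignments = [route(i) for i in range(3)]
print(assignments)  # ['server-a', 'server-b', 'server-c']
```

Adding a machine is then just appending to the pool, which is exactly what makes horizontal scaling elastic.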
So what issues do we see if we have more than one web server?
Case 1
If we are using server-side sessions (in-process/in-memory) and keeping some information in them for use in future requests, we are going to have a problem. Suppose the initial request goes to Server A and a session is created on that server. Now, if the next request goes to Server B (due to load balancing), a new session will be created on Server B, and the same can happen again if the following request goes to Server C.
Solution 1
Use the Sticky Session option in the load balancer. In this case, the load balancer will send all future requests of a user (browser session) to the same server that received the first request. But then the load will not be evenly distributed.
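One common way a load balancer achieves stickiness is by hashing a session cookie (or the source IP) to a server, so the same user always maps to the same machine. A minimal sketch, with hypothetical server and cookie names:

```python
import hashlib

servers = ["server-a", "server-b", "server-c"]

def sticky_route(session_cookie):
    """Map the same browser session to the same server on every request.
    Real load balancers implement this via cookie- or source-IP-based hashing."""
    digest = hashlib.sha256(session_cookie.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

# The same user always lands on the same server, so their in-memory
# session is found; different users may land on different servers.
first = sticky_route("user-42-session")
assert all(sticky_route("user-42-session") == first for _ in range(5))
```

Note the trade-off from the text: if a few heavy users hash to the same server, that server stays overloaded while others sit idle.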
Solution 2
Use an "out-of-process session" instead of an in-process/in-memory one. It means you store your session information on some other "state server" or in a "database", for example. Then it doesn't matter whether the request goes to Server A, B, or C; the session exists on a separate, dedicated server. This costs some performance, but better scalability is achieved.
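The idea can be sketched as follows, where a plain dict stands in for the external state server or session database (in production this would be e.g. Redis or a sessions table; the function names are hypothetical):

```python
# A dict stands in for an external state server / session database
# shared by all web servers.
shared_session_store = {}

def handle_request(server_name, session_id):
    """Any server can read and update the same session, because the
    session data lives outside the web servers themselves."""
    session = shared_session_store.setdefault(session_id, {"hits": 0})
    session["hits"] += 1
    return session["hits"]

handle_request("server-a", "sess-1")          # first request served by Server A
hits = handle_request("server-b", "sess-1")   # next request served by Server B
print(hits)  # 2 -- Server B sees the session Server A created
```

The extra network hop to the state server is the performance cost the text mentions; the gain is that any server can serve any request.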
Solution 3
Try to have a session-less application, such as RESTful services, where the information required across multiple requests is carried in a token (e.g. JWT). This gives us true flexibility to scale.
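The core idea behind such tokens is that the server signs the user's claims, and any server holding the signing key can verify them without shared session state. A simplified, stdlib-only illustration of that idea (not a spec-compliant JWT; the secret is hypothetical, and real systems should use a JWT library):

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret"  # hypothetical shared signing key

def issue_token(claims):
    """Encode and sign the claims so they can travel with each request."""
    payload = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode()
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return payload + "." + sig

def verify_token(token):
    """Any server with the key can verify the token -- no session lookup."""
    payload, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("tampered token")
    return json.loads(base64.urlsafe_b64decode(payload))

token = issue_token({"user": "alice"})
assert verify_token(token) == {"user": "alice"}
```

Because no server keeps per-user state, servers can be added or removed freely, which is the "true flexibility to scale" the text refers to.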
Case 2
Normally we store uploaded documents (e.g. a user's profile picture) on the server inside the web application folder and, while accessing them, use "relative" paths. Now suppose an upload request goes to Server A and we save the file on that server. If a future request to get that document goes to Server B, the file will not be found there (as it was saved on Server A).
Solution 1
Save all uploaded files in the database instead of on the web server. Note: in a single-web-server application, saving a file on the web server vs. saving it in the database is a separate debate; both have different pros and cons.
Solution 2
Save files on a separate server and store the full path of each file in your database. So instead of building the path at run time (using relative paths), we simply get the full path from the database when we want to access/download the file. You may have your own library/API, or use third-party libraries, which upload files for you, give you the full path, and return the file when you provide that full path.
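A minimal sketch of the pattern, where a dict stands in for the documents table and the file-server URL is a hypothetical example:

```python
# A dict stands in for the "documents" table in the database.
documents = {}

def save_upload(doc_id, filename):
    """Upload the file to a dedicated file server / object store and keep
    the FULL path in the database (URL below is a hypothetical example)."""
    full_path = "https://files.example.com/uploads/" + filename
    # ...the actual transfer to the file server would happen here...
    documents[doc_id] = full_path
    return full_path

def get_download_url(doc_id):
    # No relative-path building at run time: just read the stored full path.
    return documents[doc_id]

save_upload("doc-1", "avatar.png")
print(get_download_url("doc-1"))  # https://files.example.com/uploads/avatar.png
```

Because the path no longer depends on which web server handled the upload, any server can serve the download.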
Case 3
Reading data from the database on every request hurts performance. To improve it, we use some type of cache (in-memory data storage) to hold frequently used data and read from the cache instead of going to the database again and again. Web application frameworks provide mechanisms to store data in server memory; for example, ASP.NET provides Application and Cache objects to store data at the application level. A static class can also be used for this purpose. If we are going to have multiple web servers, we need to make sure that such data is loaded/updated on every web server, or we should use a distributed cache system (independent of the web servers). For example, Memcached or Redis can be used to store data in memory.
So while designing the architecture of an application (to make it more scalable), we need to make clear decisions regarding 1) sessions and in-memory storage, and 2) saving uploads and using full paths.
Consideration of Databases for Scalability
Scalability is not limited to web or application servers; it also applies on the database side. If the database becomes a bottleneck, it will not matter whether you have one web/application server or a million.
On the database side, a technique called replication is used to create copies of the same data on multiple machines. Different DBMSs offer different types of replication; in short, we get the data copied onto multiple servers. From an application architecture perspective, one module may read/write to one DB server while other modules read/write to another to balance the load. Whether you can write to only one server or to any server in the cluster depends on the replication type used by the DB administrators. NoSQL databases should also be considered for high performance and high scalability.
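With single-primary replication, the application's routing decision reduces to "writes go to the primary, reads are spread over the replicas". A minimal sketch under that assumption (server names are hypothetical; multi-primary setups would allow writes anywhere):

```python
import random

# Hypothetical single-primary replication topology.
PRIMARY = "db-primary"
REPLICAS = ["db-replica-1", "db-replica-2"]

def pick_server(is_write):
    """Route writes to the primary and spread reads across replicas.
    Which servers accept writes depends on the replication type in use."""
    if is_write:
        return PRIMARY
    return random.choice(REPLICAS)

assert pick_server(is_write=True) == "db-primary"
assert pick_server(is_write=False) in REPLICAS
```

One caveat worth knowing: replication is usually asynchronous, so a read from a replica may briefly lag a just-committed write on the primary.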
As a side note
One of the many advantages of a microservices architecture is scalability. Likewise, the use of containers (e.g. Docker) gives us flexibility in achieving high scalability.
Summary
If we are going to design an application that may face a higher load over time, we should carefully design its architecture and give special consideration to scalability. Vertical scaling can give relief for a short time, but the actual cure is horizontal scaling. Scalability is not possible if the application is not architected in a specific way, i.e. made independent of the resources of any single server.