What is Load Balancing?
A load balancer is a core networking solution responsible for distributing incoming traffic among servers hosting the same application content. By balancing application requests across multiple servers, a load balancer prevents any one application server from becoming a single point of failure, improving overall application availability and responsiveness. For example, when one application server becomes unavailable, the load balancer simply directs all new application requests to the other available servers in the pool. Load balancing also improves server utilization and is the most straightforward method of scaling out an application server infrastructure: as application demand increases, new servers can be added to the resource pool, and the load balancer will immediately begin sending traffic to them.
How Load Balancing Works
All load balancers can make traffic decisions based on traditional OSI layer 2 and 3 information. More advanced load balancers, however, can make intelligent traffic management decisions based on specific layer 4 through 7 information contained within the request issued by the client. Such application-layer intelligence is required in many application environments, including those in which a request for application data can only be met by a specific server or set of servers. Load balancing decisions are made quickly, usually in less than one millisecond, and high-performance load balancers can make millions of decisions per second.
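As a rough illustration of a layer 7 decision, the following Python sketch routes requests to different server pools based on the URL path of the request. The pool names, addresses and path prefixes are invented for the example and are not tied to any particular product.

```python
# Hypothetical pools keyed by URL path prefix (all values illustrative).
POOLS = {
    "/images/": ["10.0.1.10", "10.0.1.11"],  # servers tuned for static content
    "/api/":    ["10.0.2.10", "10.0.2.11"],  # application servers
}
DEFAULT_POOL = ["10.0.3.10", "10.0.3.11"]

def select_pool(request_path: str) -> list[str]:
    """Pick a server pool by matching the request path against prefixes."""
    for prefix, pool in POOLS.items():
        if request_path.startswith(prefix):
            return pool
    return DEFAULT_POOL

print(select_pool("/images/logo.png"))  # ['10.0.1.10', '10.0.1.11']
print(select_pool("/checkout"))         # ['10.0.3.10', '10.0.3.11']
```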
Load balancers also typically incorporate network address translation (NAT) to hide the IP addresses of the back-end application servers. Application clients connect to a “virtual” IP address on the load balancer, rather than to the IP address of an individual server, and the load balancer relays each client request to the appropriate application server. The entire operation is transparent to the client, who appears to be connecting directly to the application server.
An administrator-selected algorithm implemented by the load balancer determines the physical or virtual server to which each request is sent. Once the request is received and processed, the application server sends its response back to the client via the load balancer. The load balancer manages all bi-directional traffic between the client and the server, mapping each application response to the right client connection so that each user receives the proper response.
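The relay behavior described above can be sketched in a few lines of Python. This is a toy model under simplifying assumptions (a hypothetical two-server pool, plain round robin selection, no error handling or connection pooling), not how a production load balancer is implemented.

```python
import itertools
import socket
import threading

# Hypothetical back-end pool behind one "virtual" IP (addresses illustrative).
BACKENDS = itertools.cycle([("10.0.0.11", 8080), ("10.0.0.12", 8080)])

def pipe(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes one way until the sender closes its side."""
    while data := src.recv(4096):
        dst.sendall(data)
    dst.close()

def serve(vip: str = "0.0.0.0", port: int = 8000) -> None:
    listener = socket.create_server((vip, port))
    while True:
        client, _ = listener.accept()
        # Pick the next back-end and open a connection on the client's behalf.
        server = socket.create_connection(next(BACKENDS))
        # Relay traffic in both directions, so each response is mapped
        # back to the originating client connection.
        threading.Thread(target=pipe, args=(client, server), daemon=True).start()
        threading.Thread(target=pipe, args=(server, client), daemon=True).start()

if __name__ == "__main__":
    serve()
```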
Load balancers can also be configured to guarantee that subsequent requests from the same user, as part of the same session, are directed to the same server as the original request. Called persistency, this capability meets a requirement for many applications that must maintain “state.”
Load balancers also monitor the availability, or health, of application servers to avoid the possibility of sending client requests to a server resource that is unable to respond. There are a variety of mechanisms to monitor server resources. For example, the load balancer can construct and issue application-specific requests to each server in its pool. The load balancer then validates the resulting responses to determine whether the server is able to handle incoming traffic. If the load balancer discovers a server that is unable to respond properly, it marks the server as “down” and no longer sends requests to that server.
Load Distribution with Load Balancing Algorithms
Load balancing algorithms define the criteria that a load balancer uses to select the server to which a client request is sent. Different load balancing algorithms use different criteria. For example, when a load balancer applies a least connection algorithm, it sends new requests to the server in the pool with the fewest active connections. Another popular algorithm is round robin, which sends incoming requests to the next available server in a fixed sequence, with no consideration of the current load on each server.
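Both approaches are easy to sketch. In the following Python fragment, the server names and connection counts are illustrative; a real load balancer would update the counts as connections open and close.

```python
import itertools

servers = ["web1", "web2", "web3"]

# Round robin: rotate through the pool in a fixed sequence,
# ignoring how busy each server currently is.
rotation = itertools.cycle(servers)

def round_robin() -> str:
    return next(rotation)

# Least connection: pick the server with the fewest active
# connections (counts shown are illustrative).
active_connections = {"web1": 12, "web2": 4, "web3": 9}

def least_connection() -> str:
    return min(active_connections, key=active_connections.get)

print(round_robin())       # web1, then web2, web3, web1, ...
print(least_connection())  # web2, the least loaded server
```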
Other algorithms use values computed from traffic characteristics such as IP addresses, port numbers and application data tokens. The most common of these are hashing algorithms, in which a hash of certain connection or header information determines the load balancing decision; a sketch follows the list below. Some load balancers also use attributes of the back-end server, such as CPU and memory utilization, to make load balancing decisions.
Commonly used load balancing algorithms include:
Weighted round robin
Source IP hash
URL hash
Domain hash
Least packets
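To illustrate the hash-based approach, the sketch below implements a simple source IP hash over a hypothetical pool. A stable digest is used instead of Python’s built-in hash(), which is randomized per process, so the same client address always maps to the same server while the pool is unchanged.

```python
import hashlib

servers = ["web1", "web2", "web3"]  # hypothetical pool

def source_ip_hash(client_ip: str) -> str:
    """Map a client IP to a server deterministically."""
    digest = hashlib.md5(client_ip.encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]

# The same client always lands on the same server.
print(source_ip_hash("203.0.113.7"))
print(source_ip_hash("203.0.113.7"))  # identical result
```

One caveat of this simple modulo scheme is that adding or removing a server remaps most clients to new servers; implementations that need to minimize such churn often use consistent hashing instead.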
Persistency of Load Distribution
Persistency is a fundamental concept in load balancing; it enables a load balancer to send all requests from a single client to a specific application server, rather than having every request from that client balanced among all servers in the pool. Persistency is required by many applications. A simple example is an e-commerce application with a shopping cart. In such an application, the server needs to maintain the state of the session by ensuring that each time a user adds an item to their shopping cart, the request is sent to the specific server managing that cart.
Persistency is usually based on client attributes such as the IP address, cookies issued by the application and resent by the client, or the SSL session ID for that user session. Persistency can also be achieved through tokens issued by either the application or the load balancer itself. The token is included in each client request and is used by the load balancer to identify which server is handling requests from that client.
Heavy use of persistency, however, usually leads to higher memory consumption, since the load balancer is not always aware of whether a user session has expired. To optimize memory usage, most load balancers allow administrators to configure a time-out value for each session: when a connection has been inactive for the specified period, the load balancer discards its persistence information.
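A minimal sketch of cookie-based persistence with an idle time-out follows; the cookie value and the 300-second timer are illustrative.

```python
import time

PERSISTENCE_TIMEOUT = 300.0  # illustrative: seconds of inactivity allowed
persistence_table: dict[str, tuple[str, float]] = {}  # cookie -> (server, last seen)

def server_for(cookie: str, pick_new) -> str:
    """Return the persisted server for this cookie, or assign a new one."""
    now = time.monotonic()
    entry = persistence_table.get(cookie)
    if entry is not None and now - entry[1] < PERSISTENCE_TIMEOUT:
        server = entry[0]    # still fresh: stick to the same server
    else:
        server = pick_new()  # expired or unknown: fall back to the algorithm
    persistence_table[cookie] = (server, now)  # refresh the idle timer
    return server

# Usage: the same session cookie keeps landing on the same server.
print(server_for("session=abc123", lambda: "web2"))  # web2 (newly assigned)
print(server_for("session=abc123", lambda: "web3"))  # web2 (persisted)
```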
Health Checks for Load Balanced Services
Most load balancers monitor the health of back-end application servers to ensure that client requests are sent only to servers available to handle traffic. There are at least two important aspects to health checks: the health of the server instance upon which the application is running, and the actual health of the application. While load balancers can use simple mechanisms such as ICMP Ping to check the health of the server instance, they can also interrogate the health of the application itself.
For example, to check the health of a web server application, the load balancer might send it a pre-defined HTTP request. The application server will be marked “down” if the load balancer does not receive a proper HTTP response within a configured time period. Many load balancers provide a comprehensive array of health check mechanisms that cover most common applications. Some load balancers also support scriptable health checks, which allow administrators to write custom scripts using different scripting languages.
Common health-check mechanisms include:
ICMP Ping
TCP
HTTP GET
UDP
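As a sketch of the HTTP GET mechanism described above, the following probe marks a server “down” when it fails to answer a pre-defined request with HTTP 200 within a configured time. The addresses, path and timeout are illustrative.

```python
import urllib.error
import urllib.request

def http_health_check(server: str, path: str = "/health", timeout: float = 2.0) -> bool:
    """Probe a server and report whether it answered 200 OK in time."""
    try:
        with urllib.request.urlopen(f"http://{server}{path}", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False  # connection refused, bad response or timeout

# Illustrative pool; unreachable servers are simply marked "down".
for server in ["10.0.0.11:8080", "10.0.0.12:8080"]:
    print(server, "up" if http_health_check(server) else "down")
```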
Summary
Load balancers have become integral to application delivery. Indeed, they are now so essential in modern networks that the technology has evolved into more powerful solutions commonly dubbed application delivery controllers, or ADCs. In addition to providing rich load balancing capabilities, ADCs include advanced functionality such as SSL offload, HTTP compression, content caching, application firewall security, TCP connection management, URL rewriting and application performance monitoring. The emergence of cloud computing, along with the rapid proliferation of mobile devices, is sure to spur a new generation of load balancers.
Reference:
http://www.citrix.com