With the rise of social media sites such as Facebook, Twitter and Reddit, the world is more connected than ever before. This new era has brought with it a host of issues, notably ones related to scalability. The potential for content to go viral has never been higher, websites experience the Slashdot effect (a sudden flood of traffic after being linked from a popular site) more often, and for many teams scalability has become a real challenge. We can look to the established literature for help, specifically to the field of distributed systems, since the World Wide Web is essentially a very large distributed system.
One answer can be found in the Shared Nothing Architecture. This distributed computing concept holds that every node in a system should be independent and self-sufficient, so that there is no single point of contention across the system. In simple terms, the nodes share neither disk nor memory. Applied to web-based systems, this architecture provides a simple yet very effective technique for achieving massive scalability. The process is explained below.
Before we dive into the specific techniques, it is important to understand what scalability really means. To quote Wikipedia, “[scalability is] the capability of a system to increase total throughput under an increased load when resources (typically hardware) are added.” In other words, the total capacity of the system should grow with the resources it is given, ideally in direct proportion to them. In practice, such scalability can only be achieved when certain design constraints are respected, and the Shared Nothing Architecture (SNA) provides exactly such a set of constraints. It imposes two restrictions: 1) no shared disk and 2) no shared memory.
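The ideal described here is linear scaling: N identical nodes deliver N times the throughput of one node. A common way to express how close a real system comes to that ideal is scaling efficiency, the measured throughput divided by the ideal linear throughput. The sketch below computes it; the function name is illustrative, and the throughput figures in the usage line are hypothetical, not measurements.

```python
def scaling_efficiency(single_node_rps: float, nodes: int, measured_rps: float) -> float:
    """Return measured throughput as a fraction of ideal linear throughput.

    1.0 means perfectly linear scaling. Values well below 1.0 are often a sign
    that some shared resource (a disk, a lock, a cache) or other overhead is
    limiting the cluster.
    """
    if single_node_rps <= 0 or nodes <= 0:
        raise ValueError("single_node_rps and nodes must be positive")
    ideal_rps = single_node_rps * nodes  # linear scaling: N nodes -> N x throughput
    return measured_rps / ideal_rps


# Hypothetical numbers: one node handles 500 req/s; four nodes handle 1,800 req/s.
print(scaling_efficiency(500, 4, 1800))  # 0.9
```

An efficiency that stays close to 1.0 as nodes are added is what the SNA constraints aim for.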
In a web-based system, a shared disk is typically needed for two purposes: file storage and database storage. The need to share a disk for files can be removed either by using a remote file system mounted on every node that needs the files, or by using dedicated storage (or a database) for files, also referred to as BLOB storage. Contention over database storage can be removed by moving the database server software (RDBMS) to a separate, dedicated server that the web nodes access over the network.
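The point of moving files off the local disk is that every web node reads and writes through the same external service, so any node can serve any file. The sketch below assumes a hypothetical `BlobStore` interface with `put` and `get` methods; `InMemoryBlobStore` is only a stand-in for a real dedicated BLOB service, which in production would run on its own server and be reached over the network, and `WebNode` is likewise an illustrative name.

```python
import hashlib
from typing import Dict, Protocol


class BlobStore(Protocol):
    """Hypothetical interface to a dedicated file (BLOB) storage service."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class InMemoryBlobStore:
    """Stand-in for an external BLOB service; a real one would live on its own server."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


class WebNode:
    """A web node that never writes uploaded files to its own disk."""

    def __init__(self, name: str, store: BlobStore) -> None:
        self.name = name
        self.store = store

    def handle_upload(self, data: bytes) -> str:
        # Content-addressed key: identical files get the same key on every node.
        key = hashlib.sha256(data).hexdigest()
        self.store.put(key, data)
        return key

    def handle_download(self, key: str) -> bytes:
        return self.store.get(key)


shared_store = InMemoryBlobStore()
node_a = WebNode("a", shared_store)
node_b = WebNode("b", shared_store)

key = node_a.handle_upload(b"avatar bytes")
print(node_b.handle_download(key))  # b'avatar bytes'
```

Because `node_b` can read the file that `node_a` stored, the two nodes never share a disk with each other; the only dependency they have in common is the storage service itself.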
Shared memory is harder to eliminate. Web applications typically keep two kinds of data in memory: session data and cache data. Session data holds user credentials, temporary data, form data and other per-user information, and there are two ways to move it out of a node's memory: keep it in a separate database dedicated to sessions, or store it entirely in the browser's cookies (limited to about 4 KB per cookie). Caching is used to speed up database operations, and here the solution is a distributed caching system such as Memcached. Alternatively, as with session data, a separate database can be set up exclusively to store and manage cache data.
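Storing session data in the browser moves that state off the server entirely, but the server must still detect tampering and stay within the cookie size limit. The sketch below serialises a session dictionary to JSON, signs it with HMAC-SHA256 using a secret key that every node loads from the same configuration, and rejects cookies larger than 4,096 bytes. `SECRET_KEY`, `encode_session` and `decode_session` are hypothetical names. Note that signing prevents tampering but does not hide the contents, so anything confidential should not be placed in such a cookie.

```python
import base64
import hashlib
import hmac
import json
from typing import Any, Dict, Optional

# Hypothetical secret: every web node loads the same value from its configuration.
SECRET_KEY = b"replace-with-a-long-random-secret"
# Rough per-cookie budget; RFC 6265 asks browsers to support at least 4096 bytes per cookie.
MAX_COOKIE_BYTES = 4096


def encode_session(session: Dict[str, Any]) -> str:
    """Serialise and sign a session so it can be stored in a cookie."""
    payload = base64.urlsafe_b64encode(json.dumps(session, separators=(",", ":")).encode())
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    cookie = payload.decode() + "." + signature
    if len(cookie) > MAX_COOKIE_BYTES:
        raise ValueError("session too large for a cookie; store it server-side instead")
    return cookie


def decode_session(cookie: str) -> Optional[Dict[str, Any]]:
    """Return the session if the signature is valid, otherwise None."""
    payload, _, signature = cookie.rpartition(".")
    expected = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison on bytes, so malformed input cannot raise here.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None  # tampered with, or signed with a different key
    return json.loads(base64.urlsafe_b64decode(payload))


cookie = encode_session({"user_id": 42, "cart": [3, 7]})
print(decode_session(cookie))        # {'user_id': 42, 'cart': [3, 7]}
print(decode_session(cookie + "0"))  # None
```

Any node holding the same key can validate the cookie, so no node needs to remember the session between requests. A production scheme would likely also embed an expiry time in the signed payload.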
Decoupling a web system in this way makes it possible to scale the system as load increases. Since each node is independent and self-sufficient (the prerequisite for an SN architecture), adding new nodes increases the capacity of the overall system. A load balancer placed in front of the node cluster receives the incoming traffic and distributes requests among the backend nodes. The resulting system can grow or shrink with demand, achieving the goal of scalability. Nothing in this technique is tied to a particular software stack or technology: the Shared Nothing Architecture is used successfully in web applications written in PHP, Ruby (on Rails), Java, Scala and many other languages, and it is actively used in the real world to solve problems of massive scalability.
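Because no node keeps per-user state of its own, a load balancer can send any request to any node, and growing the cluster amounts to adding nodes to the rotation. The sketch below models this with a simple round-robin balancer. `Node`, `RoundRobinBalancer` and the dictionary standing in for a separate session database are hypothetical; a real deployment would use a dedicated load balancer rather than application code.

```python
from typing import Dict, List


class Node:
    """A stateless web node: all per-user state lives in the external session store."""

    def __init__(self, name: str, session_store: Dict[str, int]) -> None:
        self.name = name
        self.session_store = session_store  # stand-in for a separate session database
        self.handled = 0

    def handle(self, user: str) -> str:
        # Read and update the external store, never this node's local memory.
        visits = self.session_store.get(user, 0) + 1
        self.session_store[user] = visits
        self.handled += 1
        return f"{self.name} served {user} (visit {visits})"


class RoundRobinBalancer:
    """Hands each incoming request to the next node in turn."""

    def __init__(self, nodes: List[Node]) -> None:
        self.nodes = list(nodes)
        self._next = 0

    def add_node(self, node: Node) -> None:
        # Scaling out: the new node joins the rotation for subsequent requests.
        self.nodes.append(node)

    def route(self, user: str) -> str:
        node = self.nodes[self._next % len(self.nodes)]
        self._next += 1
        return node.handle(user)


sessions: Dict[str, int] = {}
lb = RoundRobinBalancer([Node("web1", sessions), Node("web2", sessions)])
print(lb.route("alice"))  # web1 served alice (visit 1)
print(lb.route("alice"))  # web2 served alice (visit 2)

lb.add_node(Node("web3", sessions))
for _ in range(6):
    lb.route("bob")
print([n.handled for n in lb.nodes])  # [3, 3, 2]
```

Alice's visit count stays correct even though her two requests hit different nodes, which is exactly what the shared-nothing constraints make possible; round robin is only the simplest distribution policy a load balancer might use.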
About the authors
Salman Ul Haq is a techpreneur, co-founder and CEO of TunaCode, Inc., a startup that delivers GPU-accelerated computing solutions to time-critical application domains. He holds a degree in Computer Systems Engineering. His current focus is on delivering the right solution for cloud security. He can be reached at email@example.com.
Shaneeb Kamran is a computer engineer who graduated from one of the leading universities in Pakistan. His programming journey started at the age of 12, and ever since he has dabbled in every new and shiny software technology he could get his hands on. He is currently involved in a startup that is working on cloud computing products.