The thought of running transactions and analytics in a single database is not completely new, but until recently, limitations in technology and legacy infrastructure stalled adoption. Now, innovations in database architecture and in-memory computing have made running transactions and analytics in a single database a reality.
Converging transactions and analytics in a single database requires technology advances that traditional database management systems and NoSQL databases cannot support. To enable converged processing, a database must provide the following capabilities.
Storing data in memory allows reads and writes to occur orders of magnitude faster than on disk. This is especially valuable for running concurrent transactional and analytical workloads, as it alleviates bottlenecks caused by disk contention. In-memory operation is a prerequisite for converged processing, because no purely disk-based system can deliver the required input/output (I/O) throughput with any reasonable amount of hardware.
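The I/O gap is easy to observe even with an ordinary embedded database. The sketch below (illustrative only, not a benchmark of any product mentioned here) uses Python's `sqlite3` to time the same committed writes against an in-memory database and a file-backed one, where each commit must reach storage:

```python
import os
import sqlite3
import tempfile
import time

def timed_inserts(conn, n=500):
    """Insert n rows, committing each one, and return elapsed seconds."""
    conn.execute("CREATE TABLE t (id INTEGER)")
    start = time.perf_counter()
    for i in range(n):
        conn.execute("INSERT INTO t VALUES (?)", (i,))
        conn.commit()  # each commit must be made durable before continuing
    return time.perf_counter() - start

mem = sqlite3.connect(":memory:")
mem_secs = timed_inserts(mem)

disk_path = os.path.join(tempfile.mkdtemp(), "demo.db")
disk = sqlite3.connect(disk_path)
disk_secs = timed_inserts(disk)

print(f"in-memory: {mem_secs:.4f}s  on-disk: {disk_secs:.4f}s")
```

On typical hardware the file-backed run is dramatically slower because every commit forces physical I/O, which is exactly the bottleneck that disappears when the working set lives in memory.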
In addition to speed, converged processing requires the ability to compare real-time data to statistical models and aggregations of historical data. To do so, a database must be designed to facilitate two kinds of workloads: (1) high-throughput operational writes and (2) fast analytical queries. With two powerful storage engines, real-time and historical data can be converged into one database platform and made available through a single interface.
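The trade-off between the two storage engines comes down to data layout. As a minimal sketch (plain Python lists standing in for real row and column stores), a row-oriented layout keeps each record together, which suits point reads and updates, while a column-oriented layout keeps each attribute contiguous, which suits scans and aggregates:

```python
# Row-oriented layout: one record per entry (good for point lookups/updates).
rows = [{"id": i, "amount": i * 2, "region": "EU"} for i in range(1000)]

# Column-oriented layout: one contiguous array per attribute
# (good for scanning and aggregating a single column).
columns = {
    "id": [r["id"] for r in rows],
    "amount": [r["amount"] for r in rows],
    "region": [r["region"] for r in rows],
}

# An operational write touches exactly one record in the rowstore...
rows[42]["amount"] = 99

# ...while an analytical query scans one contiguous column, never
# touching the other attributes at all.
total = sum(columns["amount"])
print(total)
```

A converged database pairs both layouts behind one SQL interface, so the same data can serve the operational write above and the analytical scan below it.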
Without disk I/O, queries execute so quickly that dynamic SQL interpretation itself can become the bottleneck. One way to remove it is to compile each SQL statement into a query execution plan that can be reused; compiled query plans are core to sustaining performance advantages for converged workloads. Some databases instead place a caching layer on top of their RDBMS. Although sufficient for immutable datasets, that approach runs into cache invalidation issues against a rapidly changing dataset, and ultimately delivers little, if any, performance benefit. Executing a compiled query directly in memory is a better approach, as it maintains query performance even when data is frequently updated (Figure 4-1).
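The key idea behind plan compilation is that the expensive work happens once per query *shape*, not once per execution; parameters vary freely afterward. A toy sketch of that pattern (a hypothetical plan cache, not any vendor's implementation) using Python's own `compile`:

```python
plan_cache = {}

def compile_plan(predicate_src):
    """Return a compiled predicate, compiling once per query shape.

    The source string plays the role of a parameterized SQL statement;
    the compiled callable plays the role of the execution plan.
    """
    plan = plan_cache.get(predicate_src)
    if plan is None:
        # Parsing/compilation happens only on the first encounter.
        plan = eval(compile(predicate_src, "<plan>", "eval"))
        plan_cache[predicate_src] = plan
    return plan

table = [{"amount": a} for a in (5, 20, 35)]

plan = compile_plan("lambda row, p: row['amount'] > p")
hits = [r for r in table if plan(r, 10)]   # parameter varies, plan reused
again = compile_plan("lambda row, p: row['amount'] > p")
```

Because the cache is keyed on the query template rather than on result sets, nothing needs invalidating when the underlying rows change, which is exactly why this approach holds up under rapidly changing data where result caching does not.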
The throughput necessary to run transactions and analytics in a single database can be reached with lock-free data structures and multiversion concurrency control (MVCC). Together these allow the database to avoid locking on both reads and writes, enabling data to be accessed simultaneously. MVCC is especially critical during write-heavy workloads such as loading streaming data, where incoming data is continuous and constantly changing (Figure 4-2).
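The essence of MVCC is that writers append new versions instead of overwriting in place, so readers consult a consistent snapshot without blocking anyone. A minimal single-threaded sketch of the versioning logic (the concurrency machinery of a real engine is elided):

```python
import itertools

class MVCCStore:
    """Toy multiversion store: writers append timestamped versions;
    a reader sees the latest version committed at or before its snapshot."""

    def __init__(self):
        self.versions = {}              # key -> list of (commit_ts, value)
        self.clock = itertools.count(1) # monotonically increasing commit time

    def write(self, key, value):
        ts = next(self.clock)
        # Append a new version; earlier versions remain readable.
        self.versions.setdefault(key, []).append((ts, value))
        return ts

    def read(self, key, snapshot_ts):
        # Scan backward for the newest version visible to this snapshot.
        for ts, value in reversed(self.versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None

store = MVCCStore()
t1 = store.write("x", 1)
snapshot = t1              # an analytical reader takes its snapshot here
store.write("x", 2)        # a concurrent writer does not block the reader
old = store.read("x", snapshot)        # the reader still sees 1
new = store.read("x", next(store.clock) - 1)
```

This is why MVCC pairs so well with streaming loads: ingest keeps appending versions at full speed while long-running analytical queries read a stable snapshot, with neither side waiting on a lock.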
Fault tolerance and ACID compliance are prerequisites for any converged data processing system, as an operational data store cannot lose data. To ensure data is never lost, a database should include redundancy within the cluster and cross-datacenter replication for disaster recovery. Writing database logs and complete snapshots to disk further protects data integrity.
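The log-plus-snapshot idea can be sketched in a few lines: every write is appended and fsynced to a log before it is acknowledged, and recovery replays that log to rebuild in-memory state. This is a toy illustration of the general write-ahead-logging technique, not any particular product's durability implementation:

```python
import json
import os
import tempfile

class DurableKV:
    """Toy durable key-value store: log every write before applying it;
    recovery rebuilds the in-memory state by replaying the log."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}

    def put(self, key, value):
        with open(self.log_path, "a") as log:
            log.write(json.dumps({"k": key, "v": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())   # record must hit disk before we ack
        self.data[key] = value

    @classmethod
    def recover(cls, log_path):
        store = cls(log_path)
        if os.path.exists(log_path):
            with open(log_path) as log:
                for line in log:
                    rec = json.loads(line)
                    store.data[rec["k"]] = rec["v"]
        return store

path = os.path.join(tempfile.mkdtemp(), "wal.log")
db = DurableKV(path)
db.put("balance", 100)
db.put("balance", 250)

recovered = DurableKV.recover(path)  # simulate a restart after a crash
```

In a production system the log is periodically compacted into a complete snapshot so recovery replays only the tail, and replication within the cluster and across datacenters layers on top of this same machinery.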
Many organizations are turning to in-memory computing for the ability to run transactions and analytics in a single database of record. For data-centric organizations, this optimized way of processing data results in new sources of revenue and a simplified computing structure that reduces costs and administrative overhead.
Many databases promise to speed up applications and analytics. However, there is a fundamental difference between simply speeding up existing business infrastructure and actually opening up new channels of revenue. True “real-time analytics” does not simply mean faster response times, but analytics that capture the value of data before it reaches a specified time threshold, usually some fraction of a second.
Financial services offers a clear example: investors must be able to respond to market volatility in an instant, and any delay is money out of their pockets. Taking a single-database approach makes it possible for these organizations to respond to fluctuating market conditions as they happen, providing more value to investors.
By converging transactions and analytics, data no longer needs to move from an operational database to a siloed data warehouse or data mart to run analytics. Because ETL jobs often take hours, and in some cases longer, to complete, eliminating them frees data analysts and administrators to concentrate on business strategy.
By serving as a database of record and analytical warehouse, a hybrid database can significantly simplify an organization’s data processing infrastructure by functioning as the source of day-to-day operational workloads.
There are many advantages to maintaining a simple computing infrastructure:
Many innovative organizations are already proving that access to real-time analytics, and the ability to power applications with real-time data, brings a substantial competitive advantage to the table. For businesses to support emerging trends like the Internet of Things and the high expectations of users, they will have to operate in real time. To do so, they will turn to converged data processing, as it offers the ability to forgo ETL and simplify database architecture.