Deciding between Cloud Spanner and Microsoft Cosmos DB

November 26, 2020

Creating a globally distributed database traditionally required a lot of time and effort. You had to host the database in your own data center and bear the entire cost. Advances in cloud computing and platform as a service (PaaS) offerings have made it easier to create globally distributed, scalable databases.

In particular, databases that can be queried with SQL have become more popular in recent years. They let you sort and process large amounts of raw data in a way tailored to your needs. Many providers offer such databases, each built on a different engine, including Microsoft Azure Cosmos DB and Google Cloud Spanner.

This article will discuss the features of both Cloud Spanner and Microsoft Cosmos DB and highlight their advantages and disadvantages. It will also compare the two to help you better choose the cloud database to use in the future.

Features of Google Cloud Spanner

Google Cloud Spanner is a fully managed, scalable, relational database management service.

The platform is designed to scale to millions of nodes across multiple regions. Cloud Spanner nodes are dedicated resources that frequently perform background work to protect and optimize users' data, even when no workload is running. Besides integrating with identity and access management (IAM), it provides features such as logging and auditing.

A Cloud Spanner instance is comparable to a relational database management system (RDBMS) server. An instance contains one or more databases, which share the same compute and storage resources. These resources are allocated when the instance is created.

Two main instance configurations, Regional and Multi-Regional, control where the compute and storage resources live. Compute is decoupled from storage, so a user can increase, decrease, or relocate processing resources without changing the underlying storage.

Cloud Spanner supports automatic data replication. The instance configuration determines how many copies (replicas) are created and where they are placed. A Regional configuration replicates data across three zones, all of which must lie within a single selected region.

A Multi-Regional configuration, on the other hand, replicates data across four zones. These zones can come from different regions, depending on the continent the user specifies.

GCP zones and regions

Essentially, these two configurations (Regional and Multi-Regional) protect against the failure of zones and regions. A Regional configuration keeps data safe if an entire zone fails. A Multi-Regional configuration, on the other hand, keeps data safe if an entire region fails.
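To see why three replicas in separate zones can survive the loss of one zone, here is a minimal sketch of majority-quorum arithmetic. It assumes that a write commits once a strict majority of replicas acknowledges it, as in Paxos-style replication. The function names and zone labels are made up for illustration and are not part of any Cloud Spanner API.

```python
def quorum_size(voting_replicas: int) -> int:
    """Smallest group of replicas that forms a strict majority."""
    return voting_replicas // 2 + 1


def survives_loss(placement: list[str], lost_zone: str) -> bool:
    """True if a majority of replicas remains after one zone fails.

    placement holds one zone name per replica (an assumed, simplified model).
    """
    remaining = sum(1 for zone in placement if zone != lost_zone)
    return remaining >= quorum_size(len(placement))


# Hypothetical Regional layout: one replica in each of three zones.
regional = ["zone-a", "zone-b", "zone-c"]
print(survives_loss(regional, "zone-b"))  # True: 2 of 3 replicas remain
```

In this model, placing two of the three replicas in the same zone would make `survives_loss` return `False` for that zone, which is why the replicas are spread across zones.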

Cloud Spanner supports atomicity, consistency, isolation, durability (ACID) transactions.

Spanner offers up to 99.999 percent availability (the SLA for Multi-Regional configurations). It supports both read-only and read/write transactions.

With the read/write mode, a user can modify data through inserts, updates, and deletes. The read-only mode covers read operations only and requires no updates.
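The difference between the two modes can be illustrated with any transactional SQL engine. The sketch below uses Python's built-in `sqlite3` module as a local stand-in, not Cloud Spanner itself. The `accounts` table and the `transfer` and `balances` helpers are hypothetical names chosen for the example.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Read/write transaction: both updates commit together or not at all."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")


def balances(conn):
    """Read-only work: queries only, so there is nothing to commit."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

transfer(conn, "alice", "bob", 30)       # succeeds and commits
try:
    transfer(conn, "alice", "bob", 500)  # fails, so both updates roll back
except ValueError:
    pass

print(balances(conn))  # {'alice': 70, 'bob': 80}
```

The failed transfer leaves no partial update behind, which is the atomicity part of ACID. The read-only helper never modifies data.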

Features of Cosmos DB

Microsoft Cosmos DB is a platform as a service (PaaS) offering in Microsoft Azure.

PaaS is a deployment environment in the cloud that provides a platform for individuals and organizations to develop, manage, and run cloud-enabled enterprise applications. Azure is Microsoft's public cloud offering.

With Azure Cosmos DB, users can build and distribute their applications across Azure data centers without manual work or configuration.

The globally distributed, multi-model database is available in all regions where Azure is available. This is possible thanks to its turnkey global distribution, which automatically scales and replicates data across data centers in the Azure network.

Cosmos DB regions

Cosmos DB's multi-model design supports multiple data models, including table storage, graphs, key-value pairs, and document storage, in a single database. It offers high availability and consistency across all of these models, regardless of how the data is stored.

Cosmos DB offers comprehensive service level agreements (SLAs) of 99.99% covering latency, availability, consistency, and throughput. Latency is below 15 ms for document writes and 10 ms for document reads. When latency is kept to a minimum, users get a faster, more seamless experience.
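An availability percentage is easier to judge once it is converted into allowed downtime. The short calculation below does this for the 99.99% figure quoted here and the 99.999% figure quoted earlier for Spanner. It is plain arithmetic under the assumption of a 365-day year, and the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def max_downtime_minutes(availability_percent: float) -> float:
    """Downtime per year that still meets the given availability level."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for pct in (99.99, 99.999):
    print(f"{pct}% -> {max_downtime_minutes(pct):.2f} minutes per year")
```

This works out to roughly 52.56 minutes per year at 99.99% and roughly 5.26 minutes per year at 99.999%.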

Pros and cons of Microsoft Cosmos DB

Cosmos DB's use of data containers that can be projected as different types of datasets is unique. It provides a SQL interface for document projections, along with stored procedures and triggers. It is a simple service to use.

Microsoft joined the cloud market and repurposed its on-premises software for the cloud. This software includes Windows Server, SQL Server, Office, .NET, Microsoft Dynamics 365, and SharePoint.

Many organizations use Windows and other Microsoft software, which is essential to Cosmos DB's success. Cosmos DB integrates tightly with other Microsoft applications, so organizations that use a lot of Microsoft software find it sensible to use Cosmos DB as well. This builds on customer loyalty.

On the flip side, the Cosmos DB platform has some imperfections. It does not provide a seamless way to switch between database-provisioned throughput and container-provisioned throughput.

To switch, you have to recreate the database. In addition, Cosmos DB does not allow documents from different logical partitions to participate in the same transaction, and writes to different containers cannot be part of the same transaction.
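The partition restriction can be pictured with a toy model. The sketch below is plain Python, not the Azure SDK. `run_batch`, `CrossPartitionError`, and the sample keys are hypothetical names, and the container is assumed to be a dictionary keyed by logical partition.

```python
class CrossPartitionError(Exception):
    """Raised when a batch touches more than one logical partition."""


def run_batch(container, operations):
    """Apply all upserts in one batch, or none of them.

    container:  dict mapping partition key -> {document id: document}
    operations: list of (partition_key, document_id, document) tuples
    """
    partition_keys = {pk for pk, _, _ in operations}
    if len(partition_keys) > 1:
        # Validate before writing, so a rejected batch changes nothing.
        raise CrossPartitionError(f"batch spans partitions: {sorted(partition_keys)}")
    for pk, doc_id, doc in operations:
        container.setdefault(pk, {})[doc_id] = doc


orders = {}

# Both documents share the partition key "customer-1", so the batch is accepted.
run_batch(orders, [("customer-1", "order-1", {"total": 20}),
                   ("customer-1", "order-2", {"total": 35})])

# Mixing "customer-1" and "customer-2" is rejected as a whole.
try:
    run_batch(orders, [("customer-1", "order-3", {"total": 5}),
                       ("customer-2", "order-4", {"total": 9})])
except CrossPartitionError as err:
    print("rejected:", err)
```

In practice, this means documents that must change together should share a partition key.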

Pros and cons of Google Cloud Spanner

One of the strengths of Google Cloud Platform (GCP), the platform behind Cloud Spanner, is its strong container offering. Kubernetes, the container standard developed by Google, is now also offered by Azure. GCP focuses on compute-intensive offerings, such as big data, machine learning, and analytics, and provides significant scale and load balancing.

Spanner is capable of handling large volumes of data, but its use is not limited to large applications. Further, it lets an organization standardize on a single database engine for all workloads that require an RDBMS, which is very beneficial.

On the downside, it can be challenging to create a Cloud Spanner instance in a local environment. A development environment needs to be as close to production as possible, which is hard to achieve with Spanner because you must rely on a full Spanner instance. You may choose a single-region instance to save on costs.

Major differences between Cloud Spanner and Microsoft Cosmos DB

Google Cloud Spanner supports ACID (atomicity, consistency, isolation, and durability) guarantees for transactions that span more than one operation. Cosmos DB scopes its ACID guarantees more narrowly: a transaction cannot span logical partitions or containers. The ACID properties ensure that database transactions are processed reliably, and ACID databases guarantee data correctness and maintainability in extensive production systems.

Scalability is an essential feature of cloud databases. Google Cloud Spanner shards data automatically, while Azure Cosmos DB relies on a partition key that you choose.

A manually sharded database requires you to tell it explicitly which keys to use for sharding. In sharding, large tables are broken down into smaller chunks (shards) spread across several servers. As a subset of the whole dataset, each shard serves a portion of the overall workload.
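The idea of routing rows to shards by key can be sketched in a few lines. This is a generic hash-based scheme under assumed names (`shard_for`, `NUM_SHARDS`, and the sample user IDs). It does not reproduce how Cloud Spanner or Cosmos DB place data internally.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a sharding key to a shard index with a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


NUM_SHARDS = 4
shards = {index: [] for index in range(NUM_SHARDS)}

# Each row lands on exactly one shard, chosen only by its key.
for user_id in ["user-1", "user-2", "user-3", "user-4", "user-5", "user-6"]:
    shards[shard_for(user_id, NUM_SHARDS)].append(user_id)

for index, rows in shards.items():
    print(f"shard {index}: {rows}")
```

Because the shard index depends only on the key, every lookup for the same key goes to the same server. A poorly chosen key, however, can leave some shards much busier than others.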

Cosmos DB offers globally distributed access aimed at providing redundancy and disaster recovery. Cloud Spanner offers similar features, but only within the regions of the chosen instance configuration. A Regional instance stays in the data centers of a single region, so it best serves, say, an EU-only or US-only audience.

This is a setback because cloud services are needed globally. Spanner beats Cosmos DB on low latency and strong consistency, but this advantage matters less where cross-regional data replication is required.

Conclusion

This article compared two cloud databases: Cosmos DB and Cloud Spanner. We went over how these cloud databases can provide you with a customized experience of sorting and processing large amounts of raw data. We hope you are now better informed to make the choice that best suits your needs.


Peer Review Contributions by: Lalithnarayan C


About the author

Eric Kahuha

Eric is a data scientist interested in using scientific methods, algorithms, and processes to extract insights from both structured and unstructured data. He enjoys converting raw data into meaningful information and contributing to topical issues in data science.

This article was contributed by a student member of Section's Engineering Education Program. Please report any errors or inaccuracies to enged@section.io.