As the number of Snapchat users increased to millions, the company required a database solution that could scale and adapt to changing daily needs. Eventually, Snap’s wise selection of databases resulted in lower latency, desired scalability, and infrastructure cost savings.
“Migrating to Amazon DynamoDB has helped Snap optimize annual infrastructure costs significantly—in addition to providing low latency and operational reliability for core use cases like messaging and our friend graph,” says Jain.
Snap saw the migration of Snapchat datasets as an opportunity rather than a challenge. Accordingly, it refactored the application into a microservices architecture, carried out a live migration so as not to disrupt the user experience, and decided up front how to structure its datasets for the cloud.
What did Snapchat accomplish by selecting the right database?
- Snaps were sent with a 20% reduction in median latency.
- The system could process over 10 million searches per second on busy days.
- Users worldwide got consistent, low-latency performance while annual infrastructure costs went down.
To answer the question “How do I choose the right database on AWS?”, read the rest of this blog post. Then, let us know how we can assist you in developing a database strategy for your project.
4 questions to ask while selecting the right database
Before making a final decision on the database, a few factors must be considered. And we’ve compiled a list of practical considerations that have helped businesses achieve amazing things while avoiding potential disasters caused by poor decisions.
Availability vs Consistency vs Partitioning: Which one to prioritize?
Here is a simple breakdown of the three possible trade-offs: when ‘consistency’ and ‘availability’ should take priority over ‘partition tolerance,’ when ‘availability’ and ‘partition tolerance’ should take priority over ‘consistency,’ and when ‘consistency’ and ‘partition tolerance’ matter more than constant ‘availability.’
- Choose Consistency and Availability for non-distributed systems. This choice sacrifices Partition tolerance and is mostly used by traditional databases.
- Choose Availability and Partition tolerance for truly distributed systems. This choice sacrifices immediate Consistency of data between nodes in favor of Availability and is used by many distributed NoSQL data stores.
- Choose Consistency and Partition tolerance for distributed systems where users need a consistent view of the data more than they need the system to be always available.
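The three choices above can be sketched as a tiny decision helper. This is only an illustration: the function name `cap_priority` and its two yes/no inputs are assumptions made for this example, and real systems sit on a spectrum rather than in three neat boxes.

```python
def cap_priority(distributed: bool, needs_consistent_view: bool) -> str:
    """Return the pair of CAP properties to prioritize (hypothetical helper)."""
    if not distributed:
        # A single-node system has no network partitions to tolerate.
        return "CA"
    # A distributed system must tolerate partitions, so the real choice
    # is between a consistent view (CP) and always answering (AP).
    return "CP" if needs_consistent_view else "AP"


print(cap_priority(distributed=False, needs_consistent_view=True))  # CA
print(cap_priority(distributed=True, needs_consistent_view=True))   # CP
print(cap_priority(distributed=True, needs_consistent_view=False))  # AP
```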
What kind of scalability do you need, and do you require a scalable database?
Let’s say you need a database system that can handle tens of thousands of users on a global scale. In that case, you must decide whether to scale your database horizontally or vertically.
If you want to expand the capacity of a single system, you can add a more powerful CPU or more RAM to an existing server, which is known as vertical scaling. However, if you want to increase capacity beyond what one machine can offer, you can add more machines and spread the load across them, which is known as horizontal scaling.
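To make the difference concrete, here is a minimal capacity-planning sketch. The instance names and the requests-per-second figures are made-up assumptions for illustration, not real AWS instance sizes.

```python
import math

# Hypothetical instance sizes: name -> requests per second one server can handle.
INSTANCE_CAPACITY = {"small": 1_000, "medium": 4_000, "large": 16_000}


def scale_vertically(target_rps):
    """Pick the smallest single instance that covers the load, or None."""
    for name, capacity in sorted(INSTANCE_CAPACITY.items(), key=lambda kv: kv[1]):
        if capacity >= target_rps:
            return name
    return None  # the load exceeds the biggest single machine


def scale_horizontally(target_rps, node_rps=1_000):
    """Number of identical nodes needed to cover the load."""
    return math.ceil(target_rps / node_rps)


print(scale_vertically(3_000))     # medium
print(scale_vertically(50_000))    # None
print(scale_horizontally(50_000))  # 50
```

Vertical scaling stops working once the load exceeds the largest machine, while horizontal scaling keeps going at the price of running and coordinating more nodes.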
Read this blog to know more about scaling your Amazon RDS instance vertically and horizontally.
Legacy databases often fail to provide the scale required for modern applications, for several reasons:
- Increased operational costs due to the lack of failover capability
- Difficulties with accessibility
- Limited capacity during peak hours
A scalable database gives your application elasticity. It grows as your business does, scaling up to serve user requests when demand is high and scaling down when demand drops. It’s critical to know whether you need a scalable database, because scaling isn’t only about estimating demand and meeting it; it’s about being ready to handle a sudden surge in data or traffic.
Scaling computation and memory resources is critical in modern applications since the users’ needs change in a fraction of a second. As a result, you must select databases with both computation and storage scaling capabilities.
Amazon RDS, for example, has compute scalability capabilities of up to 32 vCPUs and 244 GiB of RAM.
Also check whether your database system can add storage as your data grows.
The Amazon Aurora database engine, for example, expands automatically as the database capacity and storage requirements grow.
What is the relationship between scalability, database, and application performance?
Slow database performance leads to slow application performance. A common cause is high CPU utilization, which leaves the server unable to keep up with requests. Heavy disk usage and a lack of memory are two other bottlenecks. When the server’s physical memory is exhausted, the database begins to use the hard drive for working data that would normally be held in RAM. Because RAM is at least 100 times faster than a hard disk, this spill-over results in poor performance. Scaling up your database is the ideal long-term solution when your application demands more memory or disk space for maximum performance, and you can achieve it with modern, purpose-built AWS databases.
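A quick back-of-the-envelope calculation shows why spilling to disk hurts so much. The latency figures below are illustrative assumptions that simply encode the “disk is 100 times slower than RAM” ratio; real hardware varies widely.

```python
def average_access_ns(ram_hit_ratio, ram_ns=100.0, disk_ns=10_000.0):
    """Average access time when only a fraction of lookups is served from RAM.

    ram_ns and disk_ns are assumed values (disk = 100x slower than RAM).
    """
    return ram_hit_ratio * ram_ns + (1.0 - ram_hit_ratio) * disk_ns


for ratio in (1.0, 0.9, 0.5):
    slowdown = average_access_ns(ratio) / average_access_ns(1.0)
    print(f"{ratio:.0%} served from RAM -> about {slowdown:.1f}x slower than all-RAM")
```

Under these assumptions, sending just 10% of lookups to disk makes the average access roughly 11 times slower, which is why running out of memory shows up so quickly in application response times.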
Do you need structured or unstructured data modeling for your application?
Another important element to consider while selecting the correct database is data modeling. It helps map your application’s characteristics to the appropriate data structure. Entities, associated properties, and required relationships can all be identified via data modeling. Therefore, choosing the proper database for your application is a fundamental decision.
Determine whether your application requires structured or unstructured data. For example, suppose you’re creating a social media platform like Facebook. The users will have structured information such as their name, birth date, gender, and so on. However, this is not the case if your application offers environmental or geospatial data with no set structure. NoSQL databases can be used in such applications. This is also true in some cases with social networking platforms like Facebook, LinkedIn, and Instagram, where users post on the news feed. As this type of data is unstructured, it is usually a better fit for NoSQL databases.
For a long time, relational databases have dominated the market. However, in today’s application market, NoSQL databases have become an essential option.
Relational databases, like spreadsheets, organize data in tables with rows and columns. A primary key uniquely identifies each row in a table. Foreign keys establish the relationships between tables, and tables require fixed schemas.
NoSQL databases, such as document databases, are more versatile because they don’t require data to be stored in tables with a specific format. Unlike SQL tables, which require predefined schemas, each stored document may contain different fields. Several document stores use embedded documents to handle complicated data structures.
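The contrast can be sketched with Python’s standard library alone. SQLite stands in for a relational database and plain JSON documents stand in for a document store; the table, column, and field names are invented for this example.

```python
import json
import sqlite3

# Relational side: a fixed schema, with a primary key identifying each row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL, birth_date TEXT)"
)
conn.execute(
    "INSERT INTO users (id, name, birth_date) VALUES (?, ?, ?)",
    (1, "Ada", "1990-01-01"),
)

# Writing a field the schema does not define is rejected.
try:
    conn.execute(
        "INSERT INTO users (id, name, nickname) VALUES (?, ?, ?)",
        (2, "Bob", "bobby"),
    )
    schema_rejected = False
except sqlite3.OperationalError:
    schema_rejected = True

# Document side: each document can carry different (and embedded) fields.
posts = [
    {"id": 1, "text": "Hello"},
    {
        "id": 2,
        "text": "Trip photos",
        "media": [{"type": "image", "url": "a.jpg"}],  # embedded documents
        "location": {"city": "Pune"},
    },
]
serialized = [json.dumps(post) for post in posts]

print("Schema rejected unknown column:", schema_rejected)
print("Fields per document:", [sorted(post) for post in posts])
```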
Have you considered the costs associated with the database?
No matter how fancy your database is or what new technology it supports, the price is the most important factor in closing the deal. So it’s crucial to assess the degree of flexibility your database solution offers compared to the costs paid. Building a system on a foundation that is too weak to meet your needs could be a huge mistake. As a result, it is recommended that you do not choose a solution with high prices and premiums, but rather one that provides value for the price you agreed to pay.
Furthermore, the total cost includes the cost of dealing with catastrophic database-related failures. Database failure is one of the causes of application failure, and it also leads to revenue loss, user dissatisfaction, and other data-related issues that businesses end up paying for.
Similarly, sharding, or splitting data across multiple databases, comes at a cost. Consider situations where you need to scale beyond the database’s maximum capacity. Some databases, such as Amazon Aurora, allow you to add multiple instances, but only as read-only nodes, which limits your ability to scale writes. Databases like MySQL and PostgreSQL require you to partition the data yourself, which leads to high costs. Application changes, hardware configuration or additions, new instances, and similar work all contribute to the database’s cost.
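One way to see where the cost of sharding comes from is a minimal sketch of hash-based shard routing. It assumes the simplest possible scheme (hash of the key modulo the number of shards); the key format and shard counts are hypothetical, and production systems often use consistent hashing or managed partitioning instead.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard index using a stable hash (sketch, not a product API)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def moved_fraction(keys, old_count, new_count):
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(shard_for(k, old_count) != shard_for(k, new_count) for k in keys)
    return moved / len(keys)


keys = [f"user-{i}" for i in range(10_000)]  # hypothetical user IDs
fraction = moved_fraction(keys, 4, 5)
print(f"Keys that must move when growing from 4 to 5 shards: {fraction:.0%}")
```

With this naive scheme, most keys land on a different shard after adding just one more shard, so the application has to copy a large share of its data. That data movement, plus the application changes needed to route queries, is a big part of the cost described above.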
We couldn’t stop ourselves from adding the CAP theorem! Choosing a database without taking this idea into account is rarely effective, even if many of you are already aware of it. So, if you’re reading about it for the first time, have a look; otherwise, feel free to skip ahead!
CAP theorem
CAP theorem, aka Brewer’s theorem, comprises 3 components: Consistency, Availability, and Partition tolerance.
Availability:
The A in CAP represents ‘availability,’ which describes a system’s ability to keep handling requests and provide a good user experience, even during peak traffic.
It’s an assurance that every request receives a response, whether it is successful or failed. So, for example, even if some nodes fail, the system continues to function and serve data.
Consistency:
Consistency means that all clients see the same data at the same time, no matter which node they connect to.
For example, different clients connect to different nodes, so data written to one node must be replicated or forwarded to the other nodes in the system. A practical example is a social media platform like Facebook, which shows the total number of friends on a user’s own profile and the number of mutual connections on other users’ profiles. When a connection is added or deleted in a database stored in India, it takes some time for the change to be reflected in a database replica stored in the United States.
Partition tolerance:
Distributed applications run as multiple processes or nodes that exchange data over a network. Partition tolerance means the system keeps working even when a network failure (a ‘partition’) cuts off communication between some of those nodes. Separately, distributed databases often divide data into multiple parts so that both read and write operations can be distributed accordingly. These smaller chunks of data are known as shards.
Network partitions are inevitable in systems that scale across many machines, so a scalable solution must be able to survive them without failing entirely.
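To tie the three properties together, here is a toy simulation of two replicas that lose contact with each other. Everything in it (the `Cluster` class, the `mode` switch, the `friends` key) is a hypothetical sketch rather than how any particular database is implemented.

```python
class Replica:
    def __init__(self):
        self.data = {}


class Cluster:
    """Two replicas with a hypothetical mode switch: 'CP' or 'AP'."""

    def __init__(self, mode):
        self.mode = mode
        self.a, self.b = Replica(), Replica()
        self.partitioned = False  # True when the replicas cannot talk

    def write(self, replica, key, value):
        other = self.b if replica is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                return False  # refuse the write: consistency over availability
            replica.data[key] = value  # accept locally: availability over consistency
            return True
        replica.data[key] = value
        other.data[key] = value  # healthy network: replicate immediately
        return True


cp, ap = Cluster("CP"), Cluster("AP")
cp.partitioned = ap.partitioned = True

print(cp.write(cp.a, "friends", 10))  # False
print(ap.write(ap.a, "friends", 10))  # True
print(ap.a.data == ap.b.data)         # False
```

During the partition, the CP cluster stays consistent by rejecting the write, while the AP cluster keeps answering but its replicas temporarily disagree.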
3 Steps to choosing the right database
#1. Evaluating the requirements
Choosing a database solution can be difficult, especially if you have a lot of criteria to meet. For example, your application may have a range of specific requirements that the database must meet, such as:
- Data needs for business intelligence
- Processing large amounts of data at top speed
- Structured data requirements
Data models, as well as other features such as querying methodologies, latency, and consistency, are crucial, as explained previously.
Today’s modern applications need everything in one solution.
Because these apps serve millions of people in a variety of smart ways, they require long-term, high-performance database foundations. As a result, you must make a decision and analyze your options. Some databases aren’t up to the mark in providing performance. But have you ever considered the underlying reasons for this?
It might be due to a lack of understanding of one’s own system, which is why analyzing the read-to-write ratio is important. Or perhaps the low-performing database offers stronger consistency, with lower performance as the trade-off. The structure in which the data is stored and retrieved is also important, and these structural requirements differ from one application to another.
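A read-to-write ratio analysis can start as simply as counting statements in a query log. The sketch below is deliberately rough: the sample log is invented, and classifying a statement by its first keyword is an assumption that ignores things like stored procedures or statements with leading comments.

```python
READ_VERBS = {"SELECT"}
WRITE_VERBS = {"INSERT", "UPDATE", "DELETE"}


def read_write_ratio(statements):
    """Return reads divided by writes for a list of SQL statements."""
    reads = writes = 0
    for sql in statements:
        words = sql.split()
        if not words:
            continue  # skip blank lines
        verb = words[0].upper()
        if verb in READ_VERBS:
            reads += 1
        elif verb in WRITE_VERBS:
            writes += 1
    return reads / writes if writes else float("inf")


sample_log = [  # hypothetical query log
    "SELECT * FROM orders WHERE id = 1",
    "select name FROM users",
    "SELECT * FROM products",
    "INSERT INTO orders VALUES (2)",
    "SELECT count(*) FROM orders",
]
print(read_write_ratio(sample_log))  # 4 reads to 1 write
```

A high ratio suggests read replicas or caching will help most, while a low ratio points to write scalability as the main concern.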
Other essential criteria to evaluate a database include:
- User-friendliness of the system
- Ease of visual analysis of data
- Scalability needs
- Administrative overhead
- Availability and durability
- Cost-effectiveness
And sometimes, especially with large-scale systems, you need a database solution that provides all these qualities. A scalable enterprise system, for example, requires structured and consistent order-related data. It also needs ACID properties and may have query patterns that differ from what a standard database offers!
So, the first step is to assess the project’s requirements. Then, based on that, learn about the various storage options, which is the next step!
#2. Available storage solutions
When it comes to storage solutions, there are several options to consider. For example, if your application is focused on Big Data analysis, such as banking or financial services, you can use data warehouses as a storage solution. Similarly, if text search is a fundamental requirement in storage solutions, such as those used by ride-sharing apps, choose a solution that includes this feature. Likewise, depending on your use case, you can focus on what matters most to you.
Let’s have a look at a few of the possibilities.
Caching solution:
Caching is useful for read-heavy systems like Facebook or Twitter, where the same data is requested often. Examples of caching solutions include Memcached and Redis.
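The usual way to put a cache in front of a database is the cache-aside pattern: look in the cache first and only go to the database on a miss. Here is a minimal in-process sketch in which a plain dictionary stands in for Memcached or Redis and `load_from_db` is a hypothetical stand-in for a real query.

```python
class CacheAside:
    """Cache-aside sketch: a dict plays the role of the cache server."""

    def __init__(self, load_from_db):
        self._load = load_from_db  # hypothetical function that queries the database
        self._cache = {}
        self.db_calls = 0

    def get(self, key):
        if key in self._cache:
            return self._cache[key]  # cache hit: no database call
        self.db_calls += 1           # cache miss: read from the database
        value = self._load(key)
        self._cache[key] = value     # populate the cache for next time
        return value


profiles = CacheAside(load_from_db=lambda user_id: {"id": user_id, "name": "demo"})
for _ in range(3):
    profiles.get(42)
print(profiles.db_calls)  # 1
```

A real deployment would also need expiry and invalidation when the underlying data changes, which this sketch leaves out.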
Search Engine for Text:
A text search engine can be used in any system that provides search functionality. Consider the Google Maps search engine or ride-sharing apps such as Uber. These applications’ databases allow “fuzzy search.” When a user misspells a keyword, Google infers the true intention. Even if a user writes “flter,” the search will still return relevant results for “Filter.” The same is true for an Uber user who types “Arpot” instead of “Airport,” and the app still returns nearby airport locations.
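Fuzzy matching of this kind can be approximated with Python’s standard `difflib` module. This is only a small sketch of the idea: the vocabulary list and the `fuzzy_lookup` helper are assumptions for the example, and dedicated search engines use far more sophisticated indexing and ranking.

```python
from difflib import get_close_matches

VOCAB = ["filter", "folder", "airport", "apartment"]  # hypothetical search terms


def fuzzy_lookup(query, vocabulary=VOCAB, cutoff=0.6):
    """Return the closest known term for a possibly misspelled query, or None."""
    matches = get_close_matches(query.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(fuzzy_lookup("flter"))  # filter
print(fuzzy_lookup("Arpot"))  # airport
print(fuzzy_lookup("zzzz"))   # None
```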
Warehouses of data:
Data warehouses serve applications that rely on Big Data, acting as a central store for data that must be analyzed later. A warehouse compiles data and information from various sources into one comprehensive database, which helps in making more informed decisions. Online database query answering systems, such as ATMs, are an example.
Database of time-series:
It’s useful in IoT and operational applications where trillions of events are analyzed every day. It is mostly used to track system health and usage indicators in real time to improve system performance and availability. DevOps, IoT, and analytics applications are some of the most common use cases.
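A typical time-series query rolls raw events up into fixed time windows. The sketch below averages readings per minute; the `(timestamp_in_seconds, value)` event format and the CPU-usage sample data are assumptions for illustration.

```python
from collections import defaultdict


def average_per_minute(events):
    """Group (timestamp_seconds, value) events into 60-second buckets and average them."""
    buckets = defaultdict(list)
    for ts, value in events:
        buckets[ts - ts % 60].append(value)  # bucket key = start of the minute
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


cpu_usage = [(0, 40.0), (30, 60.0), (60, 90.0)]  # hypothetical health metrics
print(average_per_minute(cpu_usage))  # {0: 50.0, 60: 90.0}
```

Time-series databases run this kind of windowed aggregation efficiently over very large event volumes, which is what makes them a good fit for real-time health and usage tracking.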
#3. Examining the technology stack
This is a step you should not skip before deciding on a database solution. If you already have a tech stack and want to rethink the database, look into how each database option fits with it. For instance, will you need to redesign your application to use a microservices architecture? Alternatively, if you are starting fresh, determine the tech stack you’ll need for the database you plan to use in your app.
Check the supported programming languages, compatible integrations, and cloud architecture, and make sure you thoroughly understand your application’s data structure.
If you’re a startup, an SME, or a CTO, and your team needs help making this process go more smoothly, don’t hesitate to partner with tech companies that can help with such important business decisions.
The next step is to go over a list of use cases that might fit your requirements. Check whether the list below includes a use case that matches yours, and then move on to the last step: the tailored strategy!
How to choose the right database based on the use cases?
Contemporary databases provide everything modern enterprises and applications require. From unlimited scaling to resource utilization, there are numerous database alternatives for each use case. We’ve selected a few scenarios for which obtaining the right database is difficult. Let’s get started!
Use case 1: Enterprise applications with high scalability needs
Many commercial software systems like business intelligence, inventory management, customer relationship management, enterprise resource planning, human resource management, and many others require highly scalable, cloud-native databases.
These scenarios call for databases that are simple to scale and operate. Such systems entail a number of activities, including implementing and maintaining the data management system. In this case, you’ll need a database that improves data consistency while being cost-effective and fast. For example, you can look for a database solution that allows you to normalize data when designing such high-performance systems, since large datasets must be broken into smaller tables and linked using relationships. Long reporting processes and laborious data analysis must be transformed into simplified day-to-day operations.
For example, Amazon Relational Database Service (Amazon RDS) caters to enterprise applications by automating all database administration operations. It also simplifies migration by supporting six different database engines: Amazon Aurora, MySQL, PostgreSQL, MariaDB, Oracle Database, and SQL Server.
Let me give you an example by introducing Tonkean, a San Francisco-based software-as-a-service startup known for providing enterprise-grade solutions with minimal code to simplify difficult business processes. Its SaaS platform is heavily reliant on the performance of its database, Amazon RDS for MySQL.
Tonkean had problems with its previous cloud provider, whose service was time-consuming to manage and difficult to maintain. As a result, Tonkean opted to switch to Amazon Web Services (AWS).
Tonkean went with Amazon Relational Database Service for MySQL since it allowed them to scale the database in the cloud. This move improved the product’s reliability and availability. Additionally, the developers were able to create read replicas of the database instance so that high-volume read requests are served from multiple copies of the data.
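The read-replica idea can be sketched as a small router that sends reads to replicas in turn and everything else to the primary. This is not Tonkean’s implementation or an RDS API; the `ReplicaRouter` class and the endpoint names are hypothetical, and detecting reads by a leading `SELECT` is a simplification.

```python
import itertools


class ReplicaRouter:
    """Send reads to replicas round-robin and writes to the primary (sketch)."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql):
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary


router = ReplicaRouter("primary", ["replica-1", "replica-2"])  # hypothetical endpoints
print(router.endpoint_for("SELECT * FROM tasks"))       # replica-1
print(router.endpoint_for("SELECT * FROM users"))       # replica-2
print(router.endpoint_for("UPDATE tasks SET done = 1")) # primary
```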
“All the data that drives the engine at Tonkean is inside Amazon RDS. We rely heavily on the performance of our database and the stability of the service housing it. And we’re fully confident in AWS.” Afik Udi, Senior Manager of Production and Infrastructure, Tonkean
– source
Here’s how RDS made it as simple as pie for Tonkean:
- Web application uptime increased to 99.99 percent.
- Data integrations now have a 99.94 percent uptime.
- Scalability, reliability, and performance improved significantly.
- Employee productivity increased thanks to managed services.
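To put the uptime percentages in perspective, it helps to convert them into allowed downtime. The calculation below assumes a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year


def downtime_minutes_per_year(uptime_percent):
    """Convert an uptime percentage into minutes of downtime per year."""
    return (1 - uptime_percent / 100) * MINUTES_PER_YEAR


for uptime in (99.99, 99.94):
    print(f"{uptime}% uptime -> about {downtime_minutes_per_year(uptime):.1f} minutes of downtime per year")
```

So 99.99 percent uptime leaves room for under an hour of downtime per year (about 52.6 minutes), while 99.94 percent allows a little over five hours (about 315.4 minutes).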
Product owners who want the simplicity and cost-effectiveness of open-source databases together with the speed of high-end commercial databases often choose Aurora as their preferred relational database for enterprise-scale applications.
Use case 2: eCommerce platform serving multiple regions
Consider developing an eCommerce platform that can accommodate a variety of regional inventories. You’ll require a simple and functional database design. You also can’t afford to sacrifice customer experience, so you need to provide consistent service across many regions. To create a frictionless shopping experience, such use cases need a database that is highly functional and fast enough to handle live customer interactions.
You’ll need a database that supports SQL-like queries and ACID transactions, both of which are essential for the platform.
A couple of AWS database options best meet the required characteristics. For example, Amazon DynamoDB or Amazon RDS can be employed depending on the specific use case needs.
DynamoDB is a high-performance database that supports ACID transactions and fully automated cross-region replication through global tables. You can add replicas of a table in multiple regions, and changes made to one replica are replicated to the others to preserve application reliability.
An Amazon Aurora global database can be used for application modules requiring a relational data model. It allows database reads to be scaled across regions in both the PostgreSQL-compatible and MySQL-compatible editions of Aurora.
Use case 3: IoT-based applications, applications demanding high-performance at scale
IoT-based solutions are decentralized and distributed across multiple geographical locations. Rather than relying on a single centralized cloud-based solution, more of them are now built on a combination of edge computing and cloud computing. As a result, your database must allow you to handle and synchronize data between edge servers and the cloud.
Internet of Things applications collect data from a variety of smart devices, and it can be difficult to account for all of them while developing an app. Examples include environmental monitoring apps that collect data such as air quality and weather conditions, Smart Grid systems that analyze data in real time, and intelligent traffic management systems that receive data from traffic sensors.
Here’s how Coca-Cola İçecek used AWS IoT SiteWise and Amazon DynamoDB to build a scalable analytics system.
Coca-Cola İçecek (CCI) used AWS IoT SiteWise to improve its operational performance. CCI is a major bottler in the Coca-Cola system, producing, distributing, and selling Coca-Cola beverages in ten countries across Central Asia and the Middle East.
They built communication between IoT devices and CCI operators as part of their objective to provide solutions to increase asset optimization, promote sustainability, prevent downtime, and bring more information to industrial processes.
Amazon DynamoDB was chosen for state machine processing and calculating and comparing operational data because of its capacity to manage large amounts of data at scale.
Amazon DynamoDB is a reliable NoSQL database service that makes it simple to store and retrieve device data. It’s well known for its performance at massive scale and its capacity to handle over 10 trillion requests per day, as well as its built-in security and backup features. Furthermore, you can achieve millisecond latency at scale without maintaining infrastructure.
To give you an idea,
Duolingo, a popular online language learning website and mobile application, relies heavily on DynamoDB for its high performance at a scale of 24,000 read units per second and 3,300 write units per second.
It uses Amazon DynamoDB to store 31 billion items to support an online learning platform that delivers lessons in 80 different languages.
Use case 4: Intuitive mobile and web applications for retrieving information
This use case covers scenarios such as building a creative, intuitive frontend for product search while the backend retrieves related information, for example, the product’s warranty, supplier, and shipment origin.
In this case, unstructured data such as product catalogs can be stored in JSON-based databases like DynamoDB or MongoDB. Structured, relational data such as user details, on the other hand, can be stored in a MySQL database.
Amazon Aurora MySQL is a MySQL-compatible, fully managed relational database engine. It combines the performance and dependability of high-end commercial databases with the ease and cost-effectiveness of open-source databases. It’s a drop-in alternative for MySQL that makes setting up, operating, and scaling new and existing MySQL deployments simple and cost-effective, allowing you to focus on core application features. Provisioning, patching, backup, recovery, failure detection, and repair are just a few of the routine database activities handled by Amazon RDS. Amazon RDS also offers one-click migration tools for converting your Amazon RDS for MySQL applications to Aurora MySQL.
To give you an example,
Expedia Group, a leading online travel platform, believes in constantly growing and bringing innovation to global payments. Therefore, the company decided to migrate to a microservices-based, event-driven architecture on AWS infrastructure to meet its scaling, high-traffic, and high-availability requirements.
The group used Aurora along with 20 more AWS services, successfully automated manual processes, and cut costs. This also enabled the workforce to focus more on core business products.
“On Aurora PostgreSQL, we pay only for what we use, and it automatically adjusts as our data grows.” Nirupama Jagarlamudi, Senior Director, Software Development, Expedia Group – source
The group owns 20+ booking sites, through which travelers from more than 70 countries book tickets for restaurants, flights, and more in 80+ currencies.
Expedia opted for Amazon Aurora because of delays. As the company grew, jobs could not run fast enough, which affected end users by delaying the tasks and queries they requested.
Finally, Expedia decided to move to a microservices-based architecture and designed new systems in a similar architecture pattern in order to scale further with the functionality that AWS offers.
Moreover, after performing the necessary cost analysis and comparing its SQL servers with cloud services, the group decided to go with Amazon Aurora PostgreSQL as its final choice.
By moving its traditional system to Aurora, Expedia Group was able to:
- Reduce database expenses
- Eliminate vendor lock-in
- Design a scalable system to handle traffic spikes
- Automate database administrator tasks that were previously done manually
Use case 5: Real-time applications that are hyper-sensitive to microsecond latencies
This use case includes applications requiring real-time updates, such as advertising for marketing campaigns, streaming apps, gaming dashboards, ride-sharing services, and other services that require high throughput. Such applications need microsecond latency while processing millions of requests per day, or even per minute.
These use cases require databases that can scale to meet changing demands, together with caching solutions like Amazon ElastiCache for accessing data from managed, in-memory systems. The ideal options are databases that scale seamlessly from a few gigabytes to hundreds of terabytes of storage.
Amazon provides several managed database services that address the needs of real-time applications. Amazon MemoryDB for Redis and Amazon ElastiCache are in-memory services, and DynamoDB can be paired with its in-memory cache, DynamoDB Accelerator (DAX).
Let’s look at Tinder as an example of Amazon ElastiCache’s capabilities.
Tinder, the most popular app for meeting new people, has received over 400 million downloads and is available in 190 countries and 40 languages.
The dating app used Amazon ElastiCache for Redis to deliver over 30 billion matches and 2 billion daily member actions.
Tinder used to handle Redis workloads on Amazon EC2 instances on its own, but as traffic expanded, the difficulty of managing Redis instances increased, as did the overhead.
Here’s what using ElastiCache, a fully managed caching solution, meant for Tinder:
- For Tinder’s specific use case and requirements, ElastiCache was a more efficient and cost-effective option.
- Developers were relieved of the time-consuming task of monitoring and managing cache clusters.
- Important backend needs were met: Scalability and stability
- Scaling with self-managed infrastructure was time-consuming and required far too many manual operations, but scaling with the AWS Management Console was as simple as a few clicks.
Make a strategy and move forward!
Finally, you’ve chosen your database solution, analyzed the data format, considered scalability requirements, and assessed the capacities you require.
Now is the time to form project teams, decide timeframes, and connect with architects in order to take the next step. This will be your personalized method for navigating a complex set of KPIs tailored to your business requirements.
You need solution architects and knowledgeable experts who are laser-focused on designing a specific strategy, whether it’s data migration, database selection from scratch, or long-term database solution upgrades!