First of all we want to note that this post is for a select group of people who have DBAs on staff. If you don’t, feel free to talk to Frakture about our data warehousing services.
I want to move my database to the cloud, what are some things I should consider?
Congratulations, moving your database to the cloud is a great plan! Database maintenance will be taken care of by experienced teams, your data will be backed up (in real time), you’ll be able to access the database from anywhere, the infrastructure is highly scalable and flexible, and you’ll most likely save money because you only pay for the amount of storage you use.
1. Do you want the database to be managed or do you want to install and configure it yourself?
One of the main benefits of moving to the cloud is to have your database maintenance (patches, restarts, security updates, backups and restores, etc) managed by someone else. However, there are still some options to launch an unmanaged database in the cloud — for example setting up an Oracle database on an Amazon EC2 machine. This may be useful in some licensing scenarios, or very custom workloads, but we’ve found the “managed” services to be the most valuable, and will mostly be discussing them.
At Frakture we use Amazon’s hosted, managed service known as Relational Database Service (RDS), which allows hosting of MySQL, PostgreSQL, SQLServer, Oracle, etc, in a managed environment.
2. Which SQL engine do you want to use?
For decades, virtual wars have been fought over the “best” database engine to use. The best we can offer is to group them in similar decision areas:
- Open Source – MySQL and PostgreSQL are mature, highly scalable SQL databases that long ago met or exceeded the performance of their for-profit brethren.
- “Standard” databases — SQLServer, Oracle, DB2, and a few other databases are available in different cloud hosting formats — however buyer beware, many of them have limited feature sets. Testing for short contract cycles is the rule here — cloud performance can vary dramatically from older infrastructures in unexpected ways.
- “Optimized”, hosted Databases – Some large enterprises have taken the developer interface, or even the whole set of source code of some existing databases, and souped them up for their own infrastructure. Google Big Query, Microsoft Azure Cloud SQL, and Amazon Aurora and Redshift are examples of SQL databases designed for high performance in their respective infrastructure. Each of these engines has their own pro/con matrix. For example, Amazon Redshift is a column store (vs row store) database with a PostgreSQL front end and is great for aggregate reporting across billions of records of data. Big Query is a heavily sharded database spread across Google’s massive computer farms. Aurora is fantastic for more traditional, high volume, row-based relational data.
3. What company do you choose to host?
Frakture uses Amazon Web Services (AWS) for hosting, with Aurora and SQLServer, so that’s what we’re most familiar with and will talk about here. There are a number of other competitive options, including Microsoft Azure and Google Big Query — the space has become quite mature with the large tech companies.
Overall, the cost of data storage these days is very reasonable. Amazon, Microsoft, and Google data warehousing options are similar in both function and price. At this point the open-source and/or cheaper variants (MySQL, PostgreSQL, BigQuery) meet and often exceed the performance of older platforms such as Oracle.
4. Are you going to be using a business intelligence or reporting tool?
- If you’re going to use a BI or reporting tool, do some research on what you want to use before your finalize your database back end.
- The most common data visualization tools we see nonprofits use are Tableau, PowerBI and Google Data Studio. Based on our experience we’ve discovered:
- Google Data Studio doesn’t work well with Microsoft Azure
- PowerBI doesn’t work well with Amazon
- So, run some tests before you commit!
5. Now that you’ve selected what database you’re going to use, you need to identify the data connections to transfer to the new warehouse.
- Make a list of all the data sources connected to the database. If the database has been around for awhile, it is often helpful to put a tracker on the database for about a week so you have a good idea of all the data feeding into the warehouse.
- Now you’re ready to transfer your data. You can move your data all at once or table by table.
- Then, once all the data is moved over, run both databases in parallel for at least one week to be sure they match each other.