AWS’s Database Migration Service (DMS) is often misunderstood as a service that can only migrate your data to the cloud. But the service could be very useful for replicating data between the two datastores within the cloud as well. In our case, both of our datastores – RDS (MySQL) and Redshift were already in the cloud. The MySQL database is our primary OLTP database and many of our employees login to an internal application to modify data within this database. This database contains all of our dimension tables. Our reporting dashboard connects with our data warehouse built on Redshift. This database contains all of our fact tables. In order to run the reports accurately it’s important that all the modifications done to the dimension tables in MySQL are transferred to Redshift in timely manner.
Enabling general log in your MySQL RDS instance can be very useful. Especially for auditing and accountability purposes. It’s usually useful to debug problems too. General log can log every single query on your database in mysql.general_log table. Alternatively you can also log all the queries to a log file, but in RDS instance, logging them to a table was the best option in our situation. More information about general log can be obtained from MySQL documentation.
Here is a sample records from one of our general log:
This table has the following columns: event_time, user_host, thread_id, server_id, command_type, argument. For us event time, user_host and argument (query) are the most important columns. They can tell us which application/user executed a given query at a given time.
You can enable general log by modifying the Db Parameter Group associated with your RDS instance. To enable table based log, all you really need to do is set general_log parameter to 1. The log_output is by default set to TABLE. Thus unless you want to switch your log to a file based log, you don’t have to modify anything else. If you want to set a file based log, you will need to modify log_output parameter. More detailed information about the parameters can be obtained from RDS documentation. Good news is that the general_log Db Parameter is dynamic and it means you do not have to reboot your RDS instance.
General log is a very good idea if you roll it up regularly. If you don’t roll it up, it can start occupying the disk space pretty fast. Amazon’s default rules for rolling it:
When TABLE logging is enabled, log tables are rotated every 24 hours if the space used by the table logs is more than 20 percent of the allocated storage space or the size of all logs combined is greater than 10 GB. If the amount of space used for a DB instance is greater than 90 percent of the DB instance’s allocated storage space, then the thresholds for log rotation are reduced. Log tables are then rotated if the space used by the table logs is more than 10 percent of the allocated storage space or the size of all logs combined is greater than 5 GB. You can subscribe to the low_free_storage event to be notified when log tables are rotated to free up space.
The problem with the default rules is that if your database is busy, you may accumulate hundreds of millions of rows if you wait for 20% of your disk space to fill up or the total log size to go over 10 GB. If your disk space is too big, 20% of the disk space might be too big for you to wait for rolling. Your backups will have all this data and if you regularly restore your backup to use it as a test database, the restoration process will take much longer time, will use more bandwidth and use more IO operations. Every weekend, we create a new test database from our production database. On one fine Monday we woke up to realize that our test database is still not restored. We found out that it took much longer because of the general_log table. So setting up a rollup policy became a priority.
Amazon actually makes it very easy to roll up general_log. You simply need to execute the following stored procedure – mysql.rds_rotate_general_log. Initially we decided to write a script calling this procedure every 3 days. But soon we realized that it requires master user to execute the procedure. In limited time we spent, we couldn’t figure out exact set of permissions required to roll the log using a different user. So instead of setting up a cron script and adding master user credentials to it, we decided to simply use MySQL’s own event scheduler.
MySQL events can be compared to triggers. Events get executed at a given schedule – like crontab. The event scheduler runs events at a given frequency. I created the following simple event for rolling general_log:
You can execute the above statement to add the event, but by default the event scheduler is not enabled in a RDS instance. You will need to set event_scheduler Db Parameter Group to ON in order to make sure that your event actually gets executed. Do that before you add the event. Fortunately this was also a dynamic parameter and hence we didn’t have to reboot our database. Once we set the scheduler to ON, we simply added the above event to mysql (system) schema. Walla! The log got rolled immediately!