A few weeks ago, we offered up a few tips and tricks to help you better master MySQL, the web’s most popular relational database management system. While the information we offered up in that piece was definitely useful, it really only scratched the surface. There’s a lot more to learn about MySQL, and quite a few more ‘gotchas’ that even veteran administrators need a bit of work in order to avoid.
That’s why today, we’re going to revisit the topic. Here are a few more useful pieces of information to help you better use MySQL.
On Keys and Redundancy
Our first word of advice: avoid redundant keys. Used properly, artificial primary keys can go a long way towards making your schema less volatile. However, if you rely too heavily upon them, it can introduce a ton of redundancy into your code, causing your database to become increasingly cumbersome the more frequently you use them. In certain situations – such as when storing information that’s not subject to regular change – natural keys are a better choice.
Why You Should Never Trust User Input
Regardless of what you’re using your MySQL installation for, there’s one thing you need to remember: no software is ever truly secure. With that in mind, ensure that every single string entered by anyone other than your administrators is fully sanitized and validated – don’t trust in anything less, as SQL injection attacks are actually remarkably simple to pull off if you’re not taking the proper security measures.
Know Your Data Types
There exists a wide range of different numeric, string, and time data types in MySQL. It’s imperative that you understand these various types, and use the right ones for your project. You should never, for example, store a date as a string, or a dollar value as a date/time. Further, you need to avoid inventing your own data formats if you’re using your database for development – it will only introduce needless complications into your code later down the line.
Keep Away From Transience
Transient data – such logs of user actions – is the sort of thing that can add up exponentially in a very short period of time. You need to make sure you’re regularly flushing out stuff like this, otherwise you could end up with hundreds of millions of rows of application data taking up storage space on your system. It’s data that you really shouldn’t have stored in the first place.
And in the event that you do need to store transient data, make sure you keep it separate from the rest of your data.
Always Backup Everything
It’s actually a little baffling to me how frequently people forget to back up their databases. Even in large enterprises, there are hosts of mission-critical databases that aren’t properly backed up. And even in the case that those databases are backed up, those backups aren’t validated or tested.
This can’t happen. MySQL is a finicky thing, and even the smallest change has the potential to cause unexpected data loss or corruption. Without proper backups, you could be looking at a potential catastrophe – I’ve seen databases being rolled back by over a month, as that was the only backup the administrators had available.
Bad Text Fields Can Cause Chaos
If used improperly, text fields can bring even the most well-designed database to its knees. Generally speaking, if you’re working with text tables, there’s never a need to set your max_heap_table size or tmp_table_size to an especially large value – it won’t reduce the number of temporary table builds written to disk. That’s something that will happen in MySQL regardless, and given that the Text datatype has a maximum value of 64K, fiddling with it is a waste of effort.
Remember Little’s Law When Monitoring
When monitoring your database’s performance, there’s a theorem that it’s important to remember. It’s known as Little’s Law, and is maintained by Baron Schwartz of the Percona Blog to be the fundamental element of queuing theory. I won’t bore you with the details, but according to Schwartz, from Little’s Law we have four fundamental metrics that we can track:
- The observation interval.
- The number of queries in the interval.
- The total time during which queries resided in the system — the “busy time.”
- The total execution time of all queries — the “weighted time.”
And four long-term metrics:
- Throughput: divide the number of queries in the observation interval by the length of the interval.
- Execution time: divide the weighted time by the number of queries in the interval.
- Concurrency: divide the weighted time by the length of the interval.
- Utilization: divide the busy time by the length of the interval.
Even now, we’ve only just scratched the surface. There’s a lot more to talk about as far as MySQL is concerned. But that’s a topic for another day.