As a data scientist, you’re likely interacting with a database. If you’re writing to that database, understanding how it was set up and what your transactions imply matters even more when you design your application. This article distills the most important things its author, an engineer, wishes she had known about databases as an application developer. She touches on:
- Knowing what ACID actually promises when you select a database, since its meaning differs slightly between databases, especially with respect to isolation and consistency.
- Checking the assumption that your database’s operations behave like your application’s, because frequently they don’t. Her example highlights how transactions issued sequentially are sometimes completed out of order.
- Tricks for keeping your DBA happy and your applications’ transactions more efficient by avoiding auto-incrementing primary key generation and making use of database snapshots for certain analytics tasks when slightly stale data is acceptable.
- Setting up application-specific database performance tests that measure transactions per second when selecting a database.
For someone who has not spent time in the trenches of databases, this is a useful read that may lead you to investigate further how your database operates under the hood.
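To make the last bullet concrete, here is a minimal sketch of a transactions-per-second test. It is not taken from the article: it uses an in-memory SQLite database from Python’s standard library as a stand-in for whichever database you are evaluating, and the `events` table and the `measure_tx_per_second` helper are hypothetical names chosen for illustration. Each insert uses a client-generated UUID as its key, in the spirit of the tip about avoiding auto-incrementing primary keys.

```python
import sqlite3
import time
import uuid


def measure_tx_per_second(conn, n_transactions=1000):
    """Run n small write transactions and return the observed transactions/second.

    Hypothetical helper: the table layout and workload are assumptions,
    not something prescribed by the article.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (id TEXT PRIMARY KEY, payload TEXT)"
    )
    conn.commit()

    start = time.perf_counter()
    for i in range(n_transactions):
        # "with conn" commits on success, so each insert is its own transaction.
        # The key is a client-generated UUID rather than an auto-incrementing integer.
        with conn:
            conn.execute(
                "INSERT INTO events (id, payload) VALUES (?, ?)",
                (str(uuid.uuid4()), f"event-{i}"),
            )
    elapsed = time.perf_counter() - start
    return n_transactions / elapsed


# In-memory SQLite is only a stand-in for the database under evaluation.
conn = sqlite3.connect(":memory:")
rate = measure_tx_per_second(conn, n_transactions=1000)
row_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(f"{rate:.0f} transactions/second over {row_count} inserts")
```

A single-connection loop like this only gives a rough baseline. A test that reflects your application would run against the candidate database itself, with your real schema, realistic transaction sizes, and the level of concurrency you expect in production.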
Our Thoughts
As data science practitioners, we’re expected to be versed in a wide array of computing topics, but knowing the gotchas of databases is not always our strong suit. Simply being aware of the topic areas listed here will help you build a fruitful relationship with your DBA and speak their language. You should work with your DBA to understand the knobs set on your DB and the implications they might have for your application; otherwise you’ll end up with unexpected results that can reflect poorly on your team. Thankfully, most of these concepts are durable and will stay useful for a long time.