Global AI and Data Science

Things You Should Know About Databases

By Michael Mansour posted Wed May 06, 2020 12:46 AM

  

As a data scientist, you're likely interacting with a database. If you're writing to that database, understanding how it was set up and what your transactions imply becomes even more important to the design of your application. The article summarized here covers the topics its author, an engineer, wishes she had known about databases as an application developer. She touches on:

  • Knowing what ACID actually promises in the database you select, since its meaning differs slightly between databases, especially with respect to isolation and consistency.
  • Questioning the assumption that your database's operations behave like your application's, because frequently they do not. Her example highlights how transactions issued sequentially are sometimes completed out of order.
  • Tricks for keeping your DBA happy and your application's transactions efficient: avoiding auto-incrementing primary key generation, and using database snapshots for certain analytics tasks when slightly stale data is acceptable.
  • Setting up application-specific database performance tests that measure transactions per second when selecting a database.
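The last point in the list above can be sketched in a few lines of Python. This is a minimal illustration, not the article's method: it assumes an in-memory SQLite database from the standard library as a stand-in for whichever database you are evaluating, and the `events` table and the `measure_tx_per_second` function are hypothetical names. It also uses a client-generated UUID as the primary key, one common alternative to an auto-incrementing key; whether that suits your database is something to confirm with your DBA.

```python
import sqlite3
import time
import uuid


def measure_tx_per_second(num_transactions: int = 1000) -> float:
    """Time small insert transactions against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table; the key is a client-generated UUID, not auto-increment.
    conn.execute("CREATE TABLE events (id TEXT PRIMARY KEY, payload TEXT)")

    start = time.perf_counter()
    for i in range(num_transactions):
        # `with conn` commits on success and rolls back on an exception,
        # so each loop iteration is one transaction.
        with conn:
            conn.execute(
                "INSERT INTO events (id, payload) VALUES (?, ?)",
                (str(uuid.uuid4()), f"event-{i}"),
            )
    elapsed = time.perf_counter() - start

    # Sanity check: every transaction should have committed exactly one row.
    row_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    conn.close()
    assert row_count == num_transactions
    return num_transactions / elapsed


if __name__ == "__main__":
    print(f"{measure_tx_per_second():.0f} transactions/second")
```

For a real evaluation you would swap in your own schema, a workload that mirrors your application's reads and writes, and a driver for the candidate database. Numbers from an in-memory SQLite run say nothing about a networked server.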

For someone who has not spent time in the trenches with databases, this is a useful read that may lead you to investigate further how your database operates under the hood.
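The ordering point in the list above is easy to underestimate, so here is a small simulation. It involves no real database. It assumes that each transaction is a thread and that `time.sleep` stands in for network and lock-wait latency; `run_transactions` and the delay values are hypothetical. It only shows that work submitted in one order can finish in another.

```python
import threading
import time


def run_transactions(delays):
    """Start one simulated transaction per delay; return ids in completion order."""
    completed = []
    lock = threading.Lock()

    def worker(tx_id, delay):
        time.sleep(delay)  # stands in for network and lock-wait latency
        with lock:
            completed.append(tx_id)

    threads = []
    # Transactions are submitted in order: 1, 2, 3, ...
    for tx_id, delay in enumerate(delays, start=1):
        thread = threading.Thread(target=worker, args=(tx_id, delay))
        thread.start()
        threads.append(thread)
    for thread in threads:
        thread.join()
    return completed


if __name__ == "__main__":
    # Submission order is 1, 2, 3; completion order follows the delays instead.
    print(run_transactions([0.3, 0.1, 0.2]))
```

If your application logic depends on the order in which writes land, make that order explicit, for example with a version column or a database-assigned timestamp, instead of relying on submission order.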


Our Thoughts

As data science practitioners, we're expected to be versed in a wide array of computing topics, but knowing the gotchas of databases is not always our strong suit. Simply being aware of the topic areas listed here will help you speak your DBA's language and build a fruitful relationship with them. Work with your DBA to understand the knobs set on your database and the implications they have for your application; otherwise you may end up with unexpected results that reflect poorly on your team. Thankfully, most of these concepts are timeless and will remain useful.


#GlobalAIandDataScience
#GlobalDataScience

Comments

Thu May 07, 2020 09:16 AM

This is great, and I must thank you, because I had to face this sooner or later. So basically it looks like it all comes down to good housekeeping rules and keeping the data flowing efficiently. So much to study, so many pressing issues to solve.

Thanks