Having worked in data science and analytics for seven years, I have created and queried many tables. Many times I have found myself wondering: “What does this column mean?” “Why are there two columns with the same name in table A and table B? Which one should I use?” “What is the granularity of this table?”
If you’ve faced the same frustration, this article is for you!
In this article, I will share five principles that will help you create tables that your colleagues will appreciate. Please note that this is written from the perspective of a data scientist, so it does not cover traditional database design best practices and instead focuses on strategies for making tables user-friendly.
Maintaining a single source of truth for each key data point or metric is very important for reporting and analysis. There should not be any repeated logic in multiple tables.
For convenience, we sometimes compute the same metric in multiple tables. For example, the Gross Merchandise Value (GMV) calculation might exist in the customer table, monthly financial report table, merchant table…
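To make the idea concrete, here is a minimal sketch of keeping the GMV logic in one place and deriving every other breakdown from it. Everything in it is an assumption for illustration: the `orders` table, its columns, the `order_gmv` view, the `gmv_by` helper, and the rule that GMV is `unit_price * quantity` over completed orders only. Your own definition of GMV will likely differ.

```python
import sqlite3

# In-memory database with a hypothetical orders table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER,
    merchant_id INTEGER,
    unit_price  REAL,
    quantity    INTEGER,
    status      TEXT
);
INSERT INTO orders VALUES
    (1, 10, 100, 20.0, 2, 'completed'),
    (2, 10, 200,  5.0, 1, 'completed'),
    (3, 11, 100, 50.0, 1, 'cancelled');

-- The single place where the GMV rule lives.
-- Assumed rule: price times quantity, completed orders only.
CREATE VIEW order_gmv AS
SELECT order_id, customer_id, merchant_id, unit_price * quantity AS gmv
FROM orders
WHERE status = 'completed';
""")


def gmv_by(dimension):
    """Aggregate GMV along one dimension, always reading from the shared view."""
    if dimension not in {"customer_id", "merchant_id"}:
        raise ValueError(f"unsupported dimension: {dimension}")
    rows = conn.execute(
        f"SELECT {dimension}, SUM(gmv) FROM order_gmv "
        f"GROUP BY {dimension} ORDER BY {dimension}"
    ).fetchall()
    return dict(rows)


# Both breakdowns reuse the same definition instead of recomputing it.
customer_gmv = gmv_by("customer_id")
merchant_gmv = gmv_by("merchant_id")
print(customer_gmv)
print(merchant_gmv)
```

Because both breakdowns read from the same view, a change to the GMV rule (say, excluding refunded orders) is made once and the customer and merchant totals stay in agreement.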