Fullscript Logo
Data Engineering

Unleashing the Magic of a SQL Style Guide: Your Data Analyst’s Best Friend

Author

Meinhard Mare - Profile Picture
Meinhard Mare

Date Published

Share this post

Hey there, fellow data enthusiasts! Picture this: you’re knee-deep in SQL queries, unravelling the mysteries of your datasets and weaving insights like a data wizard. But wait! Have you ever considered the power of a SQL-style guide? In this blog post, we’ll explore the captivating benefits of adopting a style guide for SQL queries.

Improved Readability: Clear and concise queries are crucial for efficient data analysis. Following a style guide, We ensure that our SQL queries are consistently formatted and structured, making them more readable. A well-organized code structure allows us to understand complex queries more efficiently, saving valuable time and reducing the risk of errors. This enhanced readability benefits us and facilitates collaboration with other data analysts, developers, or stakeholders who need to understand and work with our queries.

Streamlined Processes and Review Time: Implementing a style guide for SQL queries significantly streamlines our processes and reduces review time. Consistent formatting and styling ensure that queries are standardized, making it easier for members to share their work with teammates and receive feedback. With a clear structure and defined guidelines, the reviewing process becomes more efficient, enabling faster iterations and reducing delays in analysis projects.

Efficient Database Source Location: When working with large databases, complex joins, subqueries, CTEs and multiple tables, it can be challenging to navigate and find relevant sources. To make things easier, a standardized approach can be used. Developing a style guide that outlines how to structure queries can ensure consistent referencing of tables, columns, and aliases. This organized approach can facilitate fast and efficient access to database sources, saving time and minimizing confusion.

Enhanced Bug Detection and Issue Identification: As a data analyst, accuracy is paramount. A style guide helps in detecting bugs and identifying issues within SQL queries. Consistent formatting and naming conventions make spotting syntax errors or logical mistakes easier. By adhering to the guidelines in the style guide, We minimize the chances of introducing errors and ensure that our queries are robust and accurate. Moreover, a standardized approach allows us to quickly identify deprecated or obsolete functions or features being used, enabling us to update our queries proactively.

Conclusion:

There you go, the wonders of a SQL-style guide! With its relatively simple concepts, you’ll conquer the realm of SQL queries like a true data hero. Streamline your processes, delight in readable code, unearth those pesky bugs and find your way through the data maze effortlessly. So, don your wizard’s hat, embrace the magic, and let the SQL style guide be your trusted companion in the quest for data brilliance. Happy querying, my friends!

Bonus:

If you resonate with the post but are unsure how to set up a SQL-style guide, look at how we do it.

General Rules

  • No tabs, 2 spaces per indent
  • No trailing whitespace
  • All keywords are lowercase (e.g., select or as)
  • Only use letters, numbers and underscores in names
  • Never use consecutive underscores, or end a name with an underscore
  • Never have a table with the same name as its column, or vice versa
  • Table names, or an alias to a table name, are always to be included when there is more than one table in the query
  • Queries should start at the lowest level of detail
  • Variable names are to be underscore separated and all lowercase (e.g., total_orders, ordering_patients)
  • Subqueries and CTEs are to be labelled with a descriptive name (e.g., accounts_with_orders)
  • Aggregation functions should only be used as necessary

select

The “select” statement starts the query. Two spaces indent columns within the select statement on new lines. Leading commas separate the columns.

1select
2 table_name.column1
3 , table_name.column2

as

When using aggregates like sum or count, rename the column using “as”.

1select
2 table_name.column1
3 , count(distinct column1.id) as column1_count

case

The indentation of case statements can vary, but following the general structure guidelines is important. This structure aims to group each case statement between the “case” and “end as” points, making complex case statements easier to read by breaking them into distinct code groups. To enhance readability, aligning all “when” and “else” clauses in line is recommended, clearly indicating their beginnings and endings. Additionally, the keywords “or”, “and”, and “then” should be aligned with each other.

When using case statements, defining or grouping all possible outcomes is advisable without relying solely on the “else” clause. This ensures that the “else” clause serves as a fact-check, verifying that no combinations were unintentionally omitted, which could potentially lead to issues later on.

1select
2 table_name.column1
3 , case
4 when table_name.column2 = 'P001'
5 then 'p_one'
6 when table_name.column2 = 'P002'
7 then 'p_two'
8 when (table_name.column2 = 'C003'
9 or table_name.column2 = 'C004')
10 then 'p_three'
11 when table_name.column2 = 'T002'
12 or table_name.column2 = 'T003'
13 and table_name.column3 in ('active')
14 then 'p_four'
15 when (table_name.column2 = 'T002'
16 or table_name.column2 = 'T003')
17 and table_name.column3 not in ('active')
18 then 'p_five'
19 else 'error'
20 end as p_assignment

Case + Aggregation functions

When aggregating on a “case when” statement, it is recommended to line break after the aggregate function call and indent either from the “river” or from the aggregate function name. In this case, the latter approach is used. When finishing the aggregation, place the closing bracket on a new line. The final bracket should be aligned with either the “river” or the aggregate function, depending on how you indented your “case when” statement.

1, sum(
2 case
3 when table_name.column1 = 'active'
4 then 1
5 else 0
6 end
7 ) as active

from

Only one table should be specified when using the “from” clause.

1select
2 table_name.column1
3 , count(distinct stable_name.column1) as column1_count
4from
5 table_name

join

When using the “from” clause, specify the join type, such as “inner join” or “left join” to indicate the type of join operation to be performed.

Each join should be written on its own line, aligned with the select statement. The table name should be placed on a separate line, indented by one level, to enhance readability. Similarly, the “on” statement, which specifies the join condition, should also be placed on its own line, further indented from the table name. In the case of a left join, if there are additional filters, they should be written on their own line, directly beneath the “on” statement.

1select
2 table_name1.column1
3 , count(distinct table_name2.column1) as column1_count
4from
5 table_name1
6left join
7 table_name2
8 on table_name1.column1 = table_name2.column1
9 and table_name2.column1 = 'active'

where

The “where” clause in the query should contain all filters that apply to the entire query. Each individual filter within the “where” clause should have its own line to enhance readability and maintain a clear structure.

1select
2 table_name1.column1
3 , count(distinct table_name2.column1) as column1_count
4from
5 table_name1
6left join
7 table_name2
8 on table_name1.column1 = table_name2.column1
9 and table_name2.column1 = 'active'
10where
11 table_name1.column1 in ('complete','returned')
12 and year(and table_name2.column1) = 'returned'

group by

When listing the fields for grouping, it is considered best practice to use the field names and place each field on its own line. This approach enhances readability and clarity. However, it is also acceptable to use column numbers instead of field names for convenience and ease of understanding. It is important to note that although using column numbers is not considered best practice if requested by a reviewer or colleague, obliging and changing field names is recommended.

grouping by field names

Each field gets its own line with leading commas like so:

1select
2 table_name1.column1
3 , table_name1.column2
4 , count(table_name1.column1) as orders
5from
6 table_name1
7group by
8 column1
9 , column1

grouping by numbers

The numbers are in ascending order and are all on one line with spaces between the numbers

1select
2 table_name1.column1
3 , table_name1.column2
4 , count(table_name1.column1) as orders
5from
6 table_name1
7group by
8 1, 2

order by

In the “group by” clause, it is not customary to specify an explicit ordering. Therefore, including an “order by” within the “group by” clause is not recommended.

However, the “order by” clause is advised to follow similar conventions as in the “group by” clause. Use field names instead of numbers to specify the order. Each field should be listed on its own line to enhance readability. To indicate the sorting order, include the “asc” or “desc” keywords at the end of the field name. Although “asc” (ascending) is the default sorting order, it is still considered best practice to include the keyword for clarity and improved legibility.

1select
2 table_name1.column1
3 , table_name1.column2
4 , count(table_name1.column1) as orders
5from
6 table_name1
7group by
8 column1
9 , column1
10order by
11 column1 asc
12 , column2 desc

Subqueries and Common Table Expressions

Subqueries are a valuable tool for formatting data into the desired format or level of detail, especially when Common Table Expressions (CTEs) are unsupported. It is important to ensure that subqueries are correctly formatted, particularly when dealing with multiple or nested subqueries.

The start of a subquery should be indicated by an opening parenthesis immediately following the relevant clause (e.g., “where”, “from”, or “having”). The subquery itself should be indented with an additional level of indentation.

To clearly demarcate the boundaries of the subquery and maintain a structured format, the closing parenthesis should be placed on a new line and aligned with the opening parenthesis.

Descriptive names should be used for subqueries and CTEs, reflecting their content and purpose within the overall query. It is recommended to keep the contents of a subquery and CTEs minimal, including only the necessary and relevant fields. Unnecessary fields should be avoided, and if needed later on, they can be added during a revision of the query instead of including them “just in case.”

1left join
2 (
3 select
4 table_name2.column1
5 , table_name3.column1 as column2
6 from
7 table_name2
8 inner join
9 table_name3
10 on table_name2.column1 = table_name3.column1
11 where
12 table_name2.column1 in ('complete')
13 and table_name3.column1 is not null
14 group by
15 table_name2.column1
16 ) as sub_query1
17 on sub_query1.column1 = table_name1.column1
18 and sub_query1.column2 = table_name1.column2

Common Table Expressions (with statements)

1with cte_1 as (
2 select
3 table_name2.column1
4 , table_name3.column1 as column2
5 from
6 table_name2
7 inner join
8 table_name3
9 on table_name2.column1 = table_name3.column1
10 where
11 table_name2.column1 in ('complete')
12 and table_name3.column1 is not null
13 group by
14 table_name2.column1
15)
16select
17 table_name1.column1
18 , cte_1.column1
19 , cte_1.column2
20from
21 table_name1
22left join
23 cte_1
24 on cte_1.column1 = table_name1.column1
25 and cte_1.column2 = table_name1.column2

Share this post