Data Modeling · · 6 min read

Recursion 101

How Recursive CTEs Unlock Practical, Scalable Navigation of Organizational Hierarchies in SQL

Diagram showing an organizational hierarchy with layers of employees reporting upwards

How Recursive CTEs Unlock Practical, Scalable Navigation of Organizational Hierarchies in SQL

Ludek Stehlik, Ph.D.'s recent article, "A graph approach to reporting structures: stop wrestling recursion," offers an insightful exploration of graph-based roll-up metrics. His approach prompted me to reflect on additional ways teams can navigate reporting hierarchies in SQL. In this post, I’d like to add to the conversation by sharing how recursive CTEs can also play a valuable role in these scenarios. While graph-based metrics are excellent for complex network-like hierarchies, recursive CTEs are particularly suited for straightforward, layer-by-layer hierarchical navigation. By considering both methods side by side, analysts can select the most suitable tool for their specific scenario and expand the toolkit available for addressing various organizational data challenges. Here’s why recursive CTEs are often a practical fit for modeling organizational hierarchies:

What is a Recursive CTE?

In technical terms:

A recursive CTE (Common Table Expression) is a SQL construct that enables a query to reference itself, allowing for the retrieval of hierarchical or recursive data structures, such as organizational charts or tree structures.

In less technical terms:

A recursive CTE is like having a set of instructions to identify everyone in a company who reports directly or indirectly to the CEO. You start with the CEO at the top. Then, you look for all the people who report to the CEO. Next, you look for everyone who reports to those people, and you continue this process, going level by level, until you’ve found everyone in the organization. It’s a method of working step by step through the company’s structure, ensuring that no one is missed, regardless of the number of layers between the CEO and the rest of the staff.

Breaking Down the Structure of a Recursive CTE

A recursive CTE has four main parts:

  1. The first SELECT  statement (the anchor member): Consider this the starting point. It finds the CEO, the person at the very top of the organization.
  2. The second SELECT statement (the recursive member): This part is responsible for building out the rest of the organization. It looks for everyone who reports directly to the CEO. Then, for each of those people, it repeats the process, finding their direct reports, and so on. It continues, layer by layer, until it has reached everyone in the company, regardless of their position in the hierarchy.
  3. The UNION ALL statement: This combines the CEO’s record with those of everyone else.
  4. The search boundary (this is important!): Despite our best intentions, it’s possible with recursive CTEs to end up in a situation where the query is in an infinite loop. This commonly happens if a record in your data has the employee reporting to themselves. If this happens, the query will continue to join the employee back to themselves, and the process will never end. We use the WHERE clause in the recursive member to prevent this. I like to use a level restriction for this. The CEO starts at level 0, and for each iteration down the hierarchy, the level increases.

Termination of the recursive CTE occurs when either there are no more records to join on or the condition of the WHERE clause in the recursive member evaluates to TRUE.

Here’s a sample of what a basic recursive CTE may look like:

with recursive org (worker_id, manager_id, level) as (
--Anchor member
select
  emp.worker_id,
  emp.manager_id,
  0 as level
from employee as emp
where emp.manager_id is null

union all

--Recursive member
select
  emp.worker_id,
  emp.manager_id,
  org.level + 1 as level
from employee as emp
  inner join org
    on emp.manager_id = org.worker_id
where org.level < 10
)

select * from org

Going Beyond Basics: Capturing Reporting Paths

I like to add two path fields to my recursive CTEs for employee hierarchies.

Here’s what these fields end up looking like in practice:

worker_id
preferred_fullname
manager_id
job_title
level
path_id
path_name 
1001
David Wallace
null
CEO
0
null
null
1002
Michael Scott
1001
Regional Manager
1
1001
David Wallace
1003
Dwight Schrute
1002
Assistant to the Regional Manager
2
1001 / 1002
David Wallace / Michael Scott
1004
Jim Halpert
1002
Senior Sales Representative
2
1001 / 1002
David Wallace / Michael Scott
1005
Pam Beesly
1002
Receptionist
2
1001 / 1002
David Wallace / Michael Scott
1006
Ryan Howard
1002
Intern
2
1001 / 1002
David Wallace / Michael Scott

These fields come in handy later in subsequent work when we want to start calculating metrics for each leader, such as:

  1. Total Number of Reports - The total number of workers reporting to a leader, including both direct and indirect reports within their organizational span.
  2. Number of Direct Reports - The count of workers who report directly and immediately to a leader.
  3. Number of Indirect Reports - The count of workers who report to the leader through one or more layers of management (i.e., not directly).
  4. Level From Top - The number of steps or layers separating a leader from the topmost position in the organization (such as the CEO).
  5. Levels In Org - The number of hierarchical layers beneath a leader, down to the most junior employee within their organization. This measures the depth of the leader’s reporting structure.

Example Implementation

Here’s a SQL code sample that you can modify and adapt to use in your own organization. 

I optimized this query for a specific dataset and engine. Grouping Sets outperformed other methods by a large margin despite the unconventional code. Your experience may vary depending on your organization’s data and systems.
with recursive org (worker_id, preferred_fullname, manager_id, job_title, level, path_id, path_name) as (
--Anchor member
select
  emp.worker_id,
  emp.preferred_fullname,
  emp.manager_id,
  emp.job_title,
  0 as level,
  cast(emp.worker_id as varchar(1000)) as path_id,
  cast(emp.preferred_fullname as varchar(1000)) as path_name
from employee as emp
where emp.manager_id is null

union all

--Recursive member
select
  emp.worker_id,
  emp.preferred_fullname,
  emp.manager_id,
  emp.job_title,
  org.level + 1 as level,
  concat(concat(org.path_id, ' / '), emp.worker_id) as path_id,
  concat(concat(org.path_name, ' / '), emp.preferred_fullname) as path_name
from employee as emp
  inner join org
    on emp.manager_id = org.worker_id
where org.level < 10
),

--Calculate total reports & max level by organization level
total_unclean as (
  select
    trim(split_part(rtrim(replace(path_id, worker_id, ''), '/'), '/', 1)) as ceo,
    trim(split_part(rtrim(replace(path_id, worker_id, ''), '/'), '/', 2)) as l1,
    trim(split_part(rtrim(replace(path_id, worker_id, ''), '/'), '/', 3)) as l2,
    trim(split_part(rtrim(replace(path_id, worker_id, ''), '/'), '/', 4)) as l3,
    count(worker_id) as num_total_reports,
    max(level) as max_level
  from org
  group by grouping sets(ceo, l1, l2, l3) order by 1, 2, 3
),

--Consolidate manager IDs
total as (
  select distinct
    coalesce(nullif(ceo, ''), nullif(l1, ''), nullif(l2, ''), nullif(l3, '')) as manager_id,
    num_total_reports,
    max_level
  from total_unclean
),

--Calculate direct reports
direct as (
  select
    manager_id,
    count(worker_id) as num_direct_reports
  from org
  group by manager_id
)

--Join it back together
select
  org.worker_id,
  org.preferred_fullname,
  org.manager_id,
  org.job_title,
  org.level,
  nullif(rtrim(replace(org.path_id, org.worker_id, ''), ' / '), '') as path_id,
  nullif(rtrim(replace(org.path_name, org.preferred_fullname, ''), ' / '), '') as path_name,
  nullif(trim(split_part(rtrim(replace(org.path_name, org.preferred_fullname, ''), ' / '), '/', 2)), '') as l1,
  nullif(trim(split_part(rtrim(replace(org.path_name, org.preferred_fullname, ''), ' / '), '/', 3)), '') as l2,
  nullif(trim(split_part(rtrim(replace(org.path_name, org.preferred_fullname, ''), ' / '), '/', 4)), '') as l3,
  coalesce(dir.num_direct_reports > 0, false) as is_manager,
  coalesce(tot.num_total_reports, 0) as num_total_reports,
  coalesce(dir.num_direct_reports, 0) as num_direct_reports,
  tot.max_level - org.level as levels_in_org
from org
  left join total as tot
    on org.worker_id = tot.manager_id
  left join direct as dir
    on org.worker_id = dir.manager_id

Closing Thoughts

Once you get comfortable with the code, there are countless ways to expand its use. For instance, you can modify it to build employee rosters, incorporate logic to count only specific types of employees in your metrics, or layer in additional data to enable metrics such as the number of top performers for each leader. You might even pivot the data for a deeper look at span of control across your organization.

Ultimately, recursive CTEs remain a powerful and accessible tool for navigating organizational hierarchies in SQL. By understanding their structure and capabilities, you can surface new insights into your company’s reporting dynamics. Whether used alone or in conjunction with graph-based approaches, recursive CTEs provide analysts and engineers with a flexible and scalable method for modeling, analyzing, and adapting to the evolving needs of modern organizations.

Reference:

Stehlík, L. (2025, October 31). Ludek's blog about people analytics: A graph approach to reporting structures: Stop wrestling recursion. https://blog-about-people-analytics.netlify.app/posts/2025-10-31-org-stats-and-graph-analysis/