How modern trends can be traced back to Conway’s Law
This article was originally posted on my blog.
The article was triggered by and riffs on the “Beware of silo specialization” section of Bernd Wessely’s post Data Architecture: Lessons Learned.
Conway’s Law:
“Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization’s communication structure.” — Melvin Conway
This is playing out worldwide across hundreds of thousands of organizations, and nowhere is it more evident than in the split between software development and data analytics teams. These two groups usually have separate reporting structures that converge only at, or just below, the executive team.
This is already a problem, and it is only growing.
Jay Kreps remarked five years ago that organizations are becoming software:
“It isn’t just that businesses use more software, but that, increasingly, a business is defined in software. That is, the core processes a business executes — from how it produces a product, to how it interacts with customers, to how it delivers services — are increasingly specified, monitored, and executed in software.” — Jay Kreps
The effectiveness of this software is directly tied to the organization’s success: if the software is dysfunctional, the organization is dysfunctional. The reverse also holds, as dysfunction in the organizational structure shows up in the software. A company that wants to win in its category can therefore end up executing worse than its competitors and responding too slowly to market conditions.
When “software engineering” teams and the “data” teams operate in their own bubbles within their reporting structures, a kind of tragic comedy ensues where the biggest loser is the business as a whole.
There are growing signs that attitudes toward the “us and them” status quo are changing. Software and data teams have long worked at cross purposes, or been completely oblivious to each other’s needs, incentives, and contributions to the business’s success. Three key trends have emerged in the data analytics space over the last two years that have the potential to make real improvements. Each is still nascent but gaining momentum.
- Data engineering is a discipline of software engineering.
- Data contracts and data products.
- Shift Left.
After reading this article, I think you’ll agree that all three are tightly interwoven.
Data engineering evolved as a discipline separate from software engineering for several reasons:
- Data analytics / BI, where data engineering is practiced, has historically been a separate business function from software development. This has caused a cultural divergence where the two sides don’t listen to or learn from each other.
- Data engineering solves a different set of problems from traditional software development and has therefore developed different tools.
- Data engineering has changed dramatically over the last 25 years. Many new problems arose that required rethinking the technologies from the ground up, which resulted in a long, chaotic period of experimentation and innovation.
The dust has largely settled, though technologies are still evolving. We’ve had time to consolidate and take stock of where we are. The data community is starting to realize that many of its current problems are not so different from those on the software development side. Data teams write software and interact with software systems just as software engineers do.
The software may look different, but many software engineering practices apply equally to data and analytics engineering:
- Testing.
- Good stable APIs.
- Observability/monitoring.
- Modularity and reuse.
- Catching bugs early, since fixing them late in the development process is more costly.
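As a small illustration of the first practice, a transformation step can be written as a plain function and unit-tested like any other code. This is a minimal sketch: the `OrderEvent` record shape and the `dedupe_latest_orders` function are hypothetical examples, not taken from any particular pipeline or tool.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class OrderEvent:
    # Hypothetical record shape for an order-change event.
    order_id: str
    status: str
    updated_at: datetime


def dedupe_latest_orders(events: list[OrderEvent]) -> list[OrderEvent]:
    """Keep only the most recent event per order_id, sorted by order_id."""
    latest: dict[str, OrderEvent] = {}
    for event in events:
        current = latest.get(event.order_id)
        # Replace the stored event only if this one is strictly newer.
        if current is None or event.updated_at > current.updated_at:
            latest[event.order_id] = event
    return sorted(latest.values(), key=lambda e: e.order_id)


def test_dedupe_keeps_latest_event() -> None:
    events = [
        OrderEvent("a", "created", datetime(2024, 1, 1)),
        OrderEvent("a", "shipped", datetime(2024, 1, 3)),
        OrderEvent("b", "created", datetime(2024, 1, 2)),
    ]
    result = dedupe_latest_orders(events)
    assert [e.status for e in result] == ["shipped", "created"]


if __name__ == "__main__":
    test_dedupe_keeps_latest_event()
    print("ok")
```

Keeping the logic in a pure function, separate from the code that reads and writes tables, is what makes it cheap to test early rather than discovering the bug in production data.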
It’s time for data and analytics engineers to identify as software engineers and regularly apply the practices of the wider software engineering discipline to their own sub-discipline.
Data contracts exploded onto the data scene in 2022/2023 as a response to frustration with the constant break-fix work of broken pipelines and underperforming data teams. The idea went viral and everyone was talking about data contracts, though concrete details of how to implement them were scarce. But the objective was clear: fix the broken pipelines problem.
Pipelines broke for many reasons, including:
- Software engineers had no idea what data engineers were building on top of their application databases, and therefore offered no guarantees about table schema changes, nor even warnings of impending changes that would break the pipelines.
- Data engineers had been largely unable (due to organizational dysfunction or organizational isolation) to develop healthy peer relationships with the software teams they depend on.
Data products are very similar to REST APIs on the software side. Both come down to opening up communication channels between teams, rigorously specifying the shape of the data, evolving it carefully as inevitable changes occur, and a commitment from data producers to maintain stable data APIs for their consumers.
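To make the API analogy more concrete, here is a minimal sketch of a data contract expressed as a mapping from field names to types, plus a check a producer could run before publishing a schema change. The contract format, the `Field` and `breaking_changes` names, and the compatibility rules (removing a field, changing its type, or making a required field optional all count as breaking; adding fields does not) are assumptions chosen for illustration from the consumer’s point of view. Real data contract tooling defines its own formats and rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    # Hypothetical contract field: a type name plus whether it is always present.
    type: str
    required: bool = True


Contract = dict[str, Field]


def breaking_changes(old: Contract, new: Contract) -> list[str]:
    """Return reasons why `new` would break consumers that rely on `old`."""
    problems: list[str] = []
    for name, old_field in old.items():
        new_field = new.get(name)
        if new_field is None:
            problems.append(f"field '{name}' was removed")
        elif new_field.type != old_field.type:
            problems.append(
                f"field '{name}' changed type {old_field.type} -> {new_field.type}"
            )
        elif old_field.required and not new_field.required:
            problems.append(f"field '{name}' became optional")
    # Fields present only in `new` are additive and treated as safe here.
    return problems


ORDERS_V1: Contract = {
    "order_id": Field("string"),
    "amount_cents": Field("int"),
    "status": Field("string"),
}

ORDERS_V2: Contract = {
    "order_id": Field("string"),
    "amount": Field("decimal"),  # a rename looks like a removal to consumers
    "status": Field("string", required=False),
    "currency": Field("string"),
}

if __name__ == "__main__":
    for reason in breaking_changes(ORDERS_V1, ORDERS_V2):
        print(reason)
```

A producer team could run a check like this in CI against the published contract, so that a breaking change is caught, and discussed with consumers, before it ships rather than after a downstream pipeline fails.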
Shift Left came out of the cybersecurity space. Security has historically been another silo, with software and security teams operating under different reporting structures. The idea of Shift Left is to move the security focus left, to where software is being developed, rather than applying it after the fact.
Organizations are becoming software, and software mirrors the communication structure of the business. If we want to fix the software/data/security silo problem, the solution lies in the communication structure.
The recognition that data engineering is software engineering, the rise of data contracts and data products, and the emergence of Shift Left are all leading indicators of that change.