What do data projects and underpants-stealing gnomes have in common?


The importance of data to business is constantly growing. As the amount of data increases, accessible data services and a smooth user experience are becoming a lifeline for companies. Therefore, data platform implementations are increasingly moving to modern cloud- and API-based architectures.

A typical data project issue is well illustrated by "Gnomes," the 17th episode of the second season of South Park, which gained cult status after airing in 1998. In the episode, South Park heroes Stan, Kyle, Cartman, and Kenny, together with their buddy Tweek, find their underwear going missing. Eventually, the boys discover a community of small gnomes snatching their underpants with money and wealth in mind. When asked about the genius of their business plan, the underpants gnomes describe their idea as follows:

  • Phase 1: Collect underpants
  • Phase 2: ?
  • Phase 3: Profit

The absence of Phase 2 indicates that the gnomes did not have an actual plan for monetizing the sets of underwear they cunningly collected.

Incomplete business plan. Picture: Wikipedia (“Gnomes” South Park)

As with the South Park underwear gnomes, the lack of Phase 2 has been a typical problem in the data platform projects of many real-world companies. Specific challenges faced in data platform projects include:

  • complex business processes
  • complex source systems, closed interfaces, and integration challenges
  • siloed mentality in the business environment and knowledge ownership
  • siloed mentality in technology infrastructure (e.g., the separation of IT/OT)
  • inadequately documented data sources
  • a competence gap brought about by new technologies and a lack of skills

Data architectures in puberty

Compared to many other fields of engineering, information systems are still a young discipline. Bridge construction, one of the oldest engineering fields, is known to date back to 850 B.C., whereas the first steps toward digital information systems were taken only in 1935, when Alan Turing began the work at the University of Cambridge that led to his seminal paper on computability.

The Data Warehouse and Data Lake concepts related to Big Data and data platforms are even younger within the information systems industry, almost in their infancy. The term ‘Big Data’ did not appear in academic publications until 1997, and the first widely used products and services, such as Apache Hadoop, AWS S3, Google Storage, and Azure Data Lake, did not become common until the 2010s.

One of the basic rules of engineering is that knowledge and precision increase through experience, which is to say, through trial and error. This is also reflected in the maturity of the data architectures implemented so far.

Bridge construction skills date back about 3,000 years, but the first steps in digital information systems were taken at the University of Cambridge only in 1935.

Although the concepts under the term Data Platform have barely reached their early teens, no company wants to stand still and wait for the industry to evolve. Indeed, several companies have set off with a strong “data is the new oil” agenda, intending to provide different user communities with a buffet of data on a familiar self-service principle.

The arms race between companies has been fueled by technology vendors’ strong marketing promises about the quick profits achievable with AI- and machine-learning-based analytics. Although service providers sometimes make somewhat unrealistic promises, the reality is undeniable: data and its exploitation will inevitably play an even greater role in improving companies’ competitiveness. For many companies, skillful utilization of data is already the standard.

Typical data warehouse architecture and its challenges

Most of today’s data warehousing projects are still built on a centralized organization and a monolithic architecture. The key components of such an architecture are:

  • file-based data archive
  • relational database
  • batch-based ETL integration tool
  • BI and reporting tool
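To make the batch-oriented core of this stack concrete, the classic ETL flow can be sketched in a few lines of Python. This is a minimal illustration, not a real pipeline: the table and field names are hypothetical, and SQLite stands in for a full relational database.

```python
import csv
import io
import sqlite3

def extract(csv_text):
    """Extract: read raw rows from the file-based archive (here an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def transform(rows):
    """Transform: cast types and drop records with missing measurements."""
    return [(r["machine_id"], float(r["temperature_c"]))
            for r in rows if r.get("temperature_c")]

def load(records, conn):
    """Load: write the cleaned batch into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings (machine_id TEXT, temperature_c REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", records)
    conn.commit()
```

In a real deployment, extraction would read from the file-based archive, and an ETL tool or orchestrator would schedule the run in nightly or hourly batches.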

Typically, projects like this meet organizations’ traditional DW, reporting, and analytics needs. However, the data platform architecture of the future must also meet the needs of operational monitoring and real-time analytics. The need for real-time applications in production is a growing trend, especially in industrial companies. In the past, this need has been met by various applications from traditional automation vendors, but in the future, application development in production plants will increasingly shift to modern hybrid and cloud-based data platforms:

  • Production plant operators must be provided with real-time, up-to-date monitoring views that can be used in the factory’s control room, next to the production machines, and in management’s mobile applications.
  • Maintenance specialists should have a self-service view of the process measurement data generated by production machines, so that they can easily examine both long-term trends and raw data at millisecond resolution.
  • The on-call maintenance team must be able to receive real-time notifications about acute events and alarms, e.g., via SMS or Teams messages.
  • Partners must be provided with secure REST APIs and data and event interfaces, so the company can implement innovative development projects with its chosen partners based on the data already collected.

“The data platform architectures of the future must meet the needs of not only reporting but also operational monitoring and real-time analytics.”
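As a sketch of the notification requirement above – with hypothetical event fields, and the actual SMS or Teams delivery abstracted behind an injected send callable – the alarm filtering logic might look like this:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class ProcessEvent:
    machine_id: str
    severity: str          # e.g. "info", "warning", "alarm" (illustrative values)
    message: str
    timestamp: datetime

def format_alert(event: ProcessEvent) -> Optional[str]:
    """Turn an alarm event into a short notification text; ignore everything else."""
    if event.severity != "alarm":
        return None
    return (f"[ALARM] {event.machine_id} at "
            f"{event.timestamp.isoformat(timespec='seconds')}: {event.message}")

def dispatch(events, send) -> int:
    """Filter the event stream and push each alarm through the injected sender.
    In production, `send` would wrap an SMS gateway or a Teams webhook client."""
    sent = 0
    for event in events:
        text = format_alert(event)
        if text is not None:
            send(text)
            sent += 1
    return sent
```

Keeping the delivery channel behind a plain callable makes it easy to swap SMS for Teams, or to test the filtering logic without any external service.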

The examples above present challenges for the traditional data warehousing architecture, which arose purely from DW and business analytics needs. As a result, new, more comprehensive data platform architectures – and the technologies that support them – are emerging, and in many cases have already arrived.

Future data platform projects follow product thinking, microservices architecture and DevOps practices

One milestone of the new approach came in spring 2019, when Zhamak Dehghani published her views on the Data Mesh concept. In a Data Mesh architecture, organizations move from monolithic data warehouses to logically distributed data services implemented according to microservices principles – just as modern application architectures have already moved from monolithic systems to microservices and APIs.

Gartner advocates the same architectural mindset. Shortly after the publication of Dehghani’s article, almost every presentation at the 2020 Gartner IT Symposium/XPO was based on two themes: Composable Business Architecture and Packaged Business Capabilities.

PBC (Packaged Business Capability) has become the most popular theme, building on Zhamak Dehghani’s views on the Data Mesh concept.

In Gartner’s view, all business applications can in the future be implemented as collections of packaged digital business capabilities. These capabilities can then be recombined with each other repeatedly, which in turn creates completely new, innovative capabilities.

Both approaches – Dehghani’s Data Mesh and Gartner’s Composable Business Architecture – are firmly based on modern application development and integration principles, in which a distributed architecture and an API-based ecosystem approach play a significant role. It can be said that future data platform projects will follow these principles:

  • Product Thinking – Instead of individual data projects, diverse data products are created, which may have internal or external clients. As with any product, each data product must have a designated product owner and a team that owns it throughout its lifecycle: from definition to design, implementation, publishing, and maintenance – and, finally, a controlled phase-out. Numerous excellent articles and books on user-driven product and service design have been published recently, offering interesting perspectives and useful tips on the subject.
  • Domain-Driven Design (DDD) – Eric Evans wrote his name into history in 2003 with the publication of one of the most significant works of our era: Domain-Driven Design: Tackling Complexity in the Heart of Software. In the book, Evans summarized the principles, concepts, and methods for modeling and designing software business-first. Although the original work was written nearly 20 years ago, many of the fundamentals Evans presented still live on firmly in the software architecture community, and numerous books and courses have since built on the subject.
  • Lambda Architecture – Both batch- and event-based streaming methods are used to process data. Some data is still processed periodically using the traditional batch method, but a growing portion must be processed as it is generated in order to minimize delays. The real-time process data of production plants (IoT and telemetry measurements), for example, is of this nature.
  • Microservices and API Architecture – Microservices and APIs are the modern way to build applications and integrations as part of a data architecture. Among others, software engineering legend Martin Fowler, known for advocating agile methods, has crystallized several key ideas about microservices architecture over recent decades. The business capability that results from domain modeling and microservices architecture is published as a productized data service whose APIs hide the complexity of the underlying technologies and expose the business capabilities in a consumer-friendly form.
  • DevOps – Microservices are virtually impossible to manage without seamless DevOps practices and a high level of automation, so microservices and DevOps go hand in hand. DevOps pipelines are carefully built as part of the solution, and reusable templates are created that can be utilized across microservices. In addition, the management of environments, networks, server clusters, data warehouses, and other resources is implemented programmatically with IaC (Infrastructure-as-Code) solutions. This applies to future data architectures as well.
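The Lambda Architecture principle from the list above can be sketched in a few lines: a batch view periodically recomputed from the full history, a speed layer folding in fresh events, and a serving-side query that merges the two. The names and the mean aggregation are illustrative, not any particular product's API.

```python
from collections import defaultdict

def batch_view(history):
    """Batch layer: periodically recompute (sum, count) per sensor from the full archive."""
    view = defaultdict(lambda: [0.0, 0])
    for sensor_id, value in history:
        view[sensor_id][0] += value
        view[sensor_id][1] += 1
    return dict(view)

class SpeedLayer:
    """Speed layer: incrementally fold in events that arrived after the last batch run."""
    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    def ingest(self, sensor_id, value):
        self.totals[sensor_id] += value
        self.counts[sensor_id] += 1

def query_mean(sensor_id, batch, speed):
    """Serving layer: merge both views so queries see fresh data
    without waiting for the next batch run."""
    total, count = batch.get(sensor_id, (0.0, 0))
    total += speed.totals[sensor_id]
    count += speed.counts[sensor_id]
    return total / count if count else None
```

When the next batch run completes, the speed layer's accumulated events are absorbed into the batch view and the speed layer is reset, keeping the low-latency state small.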


Will the data warehouse, built on centralized relational database technologies, continue to be at the heart of corporate information management, and how will its role relate to that of, for example, the API platform providing digital services? Today, both have their distinct place in the architecture, but we believe this distinction will shrink in many ways in the future.

No doubt we will soon see more and more companies move from their centralized, monolithic relational data warehouses to distributed API architectures. Productized, API-enabled data services present clear and understandable business capabilities for a variety of needs: from monthly management reporting to real-time dashboards on factory production lines and mobile app notifications for maintenance workers. And why not also for the underpants gnomes, tightening up the details of their important underwear business.




At Nortal, we specialize in the design and implementation of both modern application development and data platforms. We also consult on challenging business ideas to bring about their success.

Get in touch!

Ergin Tuganay

Partner, Head of Data & IoT

Ergin Tuganay, Partner and Head of Data & IoT at Nortal, has 15+ years of experience in a wide range of areas in industrial automation and data-driven technology. By combining a business development and leadership background, he has helped several global-scale industrial clients digitalize their...
