Explore digital transformation resources.
Uncover insights, best practices and case studies.
We understand that building a data-driven enterprise is about more than just technology — it’s about assembling the right team to create a strong foundation for innovation, agility, and informed decision-making. With 25+ years of experience in forming high-performing tech teams, we partner with organizations to unlock the full potential of their data through expertly designed solutions.
Our dedicated teams, made up of data engineers, DevOps specialists, and data scientists, bring expertise to every step of your data journey. From architecting streamlined data lake solutions to consolidating data into secure, unified repositories, and implementing governance frameworks for accuracy and compliance, our experts are equipped to handle the complexities of modern enterprise data needs. These scalable, resilient solutions adapt as your business grows, transforming your data infrastructure into a strategic advantage.
But expertise doesn’t stop with delivery — we’re passionate about sharing our knowledge to help businesses thrive in their data-driven transformations.
Data is a powerful enabler across industries. Hospitals use it to maximize resources and save lives, while financial organizations leverage it to prevent IT failures, ensuring smooth operations. For businesses, more data equals more opportunities to learn, save, and improve. However, as enterprises generate vast amounts of data — over 500 petabytes for some — the question arises: Where do they store it all?
An increasingly popular solution is a data lake — a centralized repository capable of storing raw, diverse data in various formats, with the flexibility needed for advanced analytics, machine learning, and other data-driven innovations. In this article, I’ll share our insights into this powerful approach to data management. Together, we’ll explore the key challenges in building and managing data lakes, uncover common mistakes that can derail projects, and highlight how expert teams can help you overcome these obstacles to achieve lasting success. Let’s dive in!
The potential for data lake innovation is endless, with the following examples of data types being stored and managed across industries:
Healthcare: Electronic health records, medical device logs, research data, audio and written doctor notes and interviews, DNA sequencing and diagnostic imaging data.
Finance and banking: Stock prices, forex rates, economic indicators, transaction and ATM logs, bank and credit history, call centre recordings, reports, surveys, and forecasts.
Travel and hospitality: Booking details, traveller profiles, demographics, hotel occupancy, flight schedules and delays, transportation logs, ticketing and check-in data, behavioural data, GPS data, beacons in airports and hotels, weather forecasts, etc.
Transportation and logistics: GPS location, fuel consumption, engine performance, real-time traffic flow data, road conditions, IoT device data, delivery routes and shipment tracking logs.
Manufacturing: Machine logs and sensor readings, assembly line throughput, quality control data, supplier performance metrics, shipment tracking, route optimisation, energy consumption patterns, CAD files and blueprints, market demand forecasts, regulatory compliance documents, order history and trends.
Retail and e-commerce: Invoices and receipts, product images, marketing collateral, IoT in-store sensor data, inventory logs from warehouse management systems, user browsing history and product views, etc.
IT and software: Event logs, error reports, crash analytics, user interaction data, API logs, firewall, router and network traffic monitoring logs, and machine learning data sets.
Government and public sector: Population demographics, text and audio from public hearings and community meetings, imagery from mapping and urban planning, scanned historical documents and multimedia content.
What do an insurer adjusting premiums for weather risks, a traveler’s app rebooking a delayed flight, and a marketer fine-tuning local campaigns and discounts for e-commerce have in common? They all rely on data lakes to power these capabilities.
Unlike traditional data warehouses, which work only with structured, organized data, data lakes can store raw, unprocessed data in its original format. This flexibility saves time and reduces costs: instead of running extensive extraction, transformation, and loading (ETL) before data is stored, teams can load data as-is and apply structure only when they read it for a specific purpose. Data lakes also support real-time use of structured and semi-structured data, making them a strong foundation for data analytics and AI-driven services.
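Storing raw data and applying structure only when it is read is often called schema-on-read. Here is a minimal Python sketch of the idea, using only the standard library. It assumes raw events land in the lake as JSON lines exactly as produced, and that one hypothetical consumer needs only heart-rate readings; the field names and the `read_heart_rates` function are illustrative, not part of any product.

```python
import json

# Raw events are stored exactly as they arrive: no upfront schema and no
# transformation. Field names are hypothetical examples.
RAW_EVENTS = [
    '{"patient_id": "p1", "type": "heart_rate", "value": 72, "ts": "2024-05-01T08:00:00"}',
    '{"patient_id": "p2", "type": "note", "text": "Follow-up in 2 weeks", "ts": "2024-05-01T09:30:00"}',
    '{"patient_id": "p1", "type": "heart_rate", "value": "88", "ts": "2024-05-01T12:00:00"}',
]

def read_heart_rates(raw_lines):
    """Apply a schema only at read time (schema-on-read).

    Records that don't match what this consumer needs are skipped, and
    values are coerced to the expected types on the way out.
    """
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        if record.get("type") != "heart_rate":
            continue  # other consumers can read the same raw data differently
        rows.append((record["patient_id"], int(record["value"]), record["ts"]))
    return rows

print(read_heart_rates(RAW_EVENTS))
```

The doctor's note stays in the lake untouched, so a different consumer, such as a text-analytics job, can later read the same raw records with its own schema.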
Moreover, data lakes make data accessible to non-technical users. Without requiring SQL knowledge or constant reliance on IT teams, they enable business users to explore and leverage data through self-service dashboards, query tools, or visualization applications. Features like data catalogs and glossaries further simplify the process, making it easy to browse, search, and analyze data sets using indexes and keywords.
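The keyword search a data catalog offers can, at its core, be as simple as an inverted index from words and tags to data set names. The sketch below is a toy, in-memory illustration of that idea. The `DataCatalog` and `CatalogEntry` names and the sample data sets are hypothetical, and real catalogs add lineage, ownership, and access control on top.

```python
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    name: str
    description: str
    tags: set = field(default_factory=set)

class DataCatalog:
    """Toy catalog: an inverted index from keywords to data set names."""

    def __init__(self):
        self._entries = {}
        self._index = {}  # keyword -> set of data set names

    def register(self, entry):
        self._entries[entry.name] = entry
        # Index every word of the description plus every tag, lowercased.
        words = set(entry.description.lower().split())
        words |= {tag.lower() for tag in entry.tags}
        for word in words:
            self._index.setdefault(word, set()).add(entry.name)

    def search(self, *keywords):
        """Return the sorted names of data sets matching every keyword."""
        if not keywords:
            return []
        matches = [self._index.get(k.lower(), set()) for k in keywords]
        return sorted(set.intersection(*matches))

catalog = DataCatalog()
catalog.register(CatalogEntry("sales_daily", "Daily sales by store", {"retail", "finance"}))
catalog.register(CatalogEntry("store_footfall", "Hourly in-store sensor counts", {"retail", "iot"}))
print(catalog.search("retail"))           # ['sales_daily', 'store_footfall']
print(catalog.search("retail", "sales"))  # ['sales_daily']
```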
Data lake architecture is essential for modern enterprises aiming to make decisions based on data, anticipate trends and user behavior, respond dynamically to changes and events, and handle huge volumes of streaming data at scale. Without it, enterprise data use is restricted to structured information, which massively hinders innovation and progress.
In healthcare, for example, only 20% of electronic health records are structured. Without a solution to pull, store, and analyze the remaining 80%, the insights it holds stay out of reach. With data lakes that combine diverse data types, healthcare organizations can unlock valuable insights from data query and management platforms, social media, IoT and connected devices, and other systems spanning patients, medical providers, hospital operators, clinical researchers, and healthcare insurers.
This capability is particularly impactful in pharmacogenomics, for example, where vast genomic data and clinical trials inform personalized medicine, or in medical imaging analytics, working with large, unstructured image files from MRI, CT, and X-ray scans.
In retail, data lake architectures support omnichannel experiences by creating comprehensive buyer profiles favored by 7 out of 10 shoppers. They also collect and collate data from point-of-sale systems, IoT sensors, and transportation platforms to enable real-time inventory management and tracking and to power dynamic pricing and promotions.
Data lakes also enhance real-time traffic management by integrating live feeds from cameras and sensors, improving traffic flow, safety, and public transport reliability, and reducing congestion and pollution. Another interesting use of unstructured data unlocked by enterprise data lakes is precision farming, integrating satellite imagery, IoT sensors, and market data to optimize crops and reduce waste.
Data lakes enable faster decision-making by providing real-time or near-real-time access to information, empowering teams, and scaling flexibly as needs grow. They can also be more cost-effective than traditional data warehouses for high-volume data storage and raw data ingestion.
However, implementing a data lake solution in an enterprise isn’t free from challenges. Like all innovations, it requires a multidisciplinary team of data engineers, architects, governance experts, security professionals, cloud architects, and data scientists for effective setup and implementation. Strong project leadership and change management are essential to ensure alignment and progress. I’ll get to how to find and build these capabilities later, but let’s first look at the challenges this team will face.
A seamless flow of real-time, complex data is the X factor in the race for efficiency and data-driven decisions. Yet, outdated solutions are often an obstacle, incapable of handling all the complexity a modern enterprise deals with daily.
One of our clients, a global mining leader, faced this common scenario before adopting a data lake solution. They were using an outdated Azure Data Factory setup to collect data for efficiency and sustainability decisions.
Anton Gurkovsky
Solutions Architect
To unify data from globally distributed departments and embed scalability and flexibility into the client’s enterprise data management infrastructure, Nortal assembled a multifunctional, scalable expert team. The team implemented a modern enterprise data lake: a Databricks solution running on Microsoft Azure. Combining different data types, like JSON logs from web servers with SQL tables from an ERP system, required complex transformations, powerful ETL (Extract, Transform, Load) tools, and a reliable security and governance framework to reduce the risk of unauthorized access and misuse.
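As a simplified illustration of this kind of transformation, the sketch below joins semi-structured JSON web logs with a relational ERP table. It is not the client's actual pipeline: Python's built-in `sqlite3` stands in for the ERP database, and the table, field, and function names are hypothetical.

```python
import json
import sqlite3

# Hypothetical ERP table of customers, held in a relational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id TEXT PRIMARY KEY, region TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("c1", "EMEA"), ("c2", "APAC")],
)

# Hypothetical semi-structured web-server logs, one JSON object per line.
web_logs = [
    '{"customer_id": "c1", "path": "/orders", "status": 200}',
    '{"customer_id": "c2", "path": "/login", "status": 500}',
    '{"customer_id": "c3", "path": "/orders", "status": 200}',
]

def enrich_logs(conn, raw_logs):
    """Transform step: parse each JSON log and attach the ERP region."""
    # The cursor yields (customer_id, region) tuples, which dict() accepts.
    regions = dict(conn.execute("SELECT customer_id, region FROM customers"))
    enriched = []
    for line in raw_logs:
        event = json.loads(line)
        # Unknown customers are kept but flagged rather than silently dropped.
        event["region"] = regions.get(event["customer_id"], "UNKNOWN")
        enriched.append(event)
    return enriched

for row in enrich_logs(conn, web_logs):
    print(row)
```

Flagging unmatched records instead of discarding them is one way to keep integration gaps visible to later governance and data quality checks.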
Approaching the integration strategically and knowledgeably ensured data quality and consistency, which are essential for reliable and timely data analyses. Centralizing the previously distributed and siloed data into a single lake with Unity Catalog improved control over access, metadata, and governance.
Workflow automation, impossible with the previous setup, reduced operational costs by 50% while improving near-real-time data ingestion and distribution across operational systems, business intelligence and ETL tools, social media platforms, ERP, CRM, and other platforms.
Without proper governance, a centralized data lake can become a significant liability instead of an asset. Consider this scenario:
A large retail chain collects vast amounts of customer data — purchase histories, loyalty program data, heatmaps, clickstreams, and inventory records — to enhance customer experiences. While this approach sounds promising, it introduces significant risks without governance measures like access controls, data quality standards, and metadata management. For example, customers might receive duplicate promotional emails, irrelevant offers, or emails addressed to them incorrectly. These minor issues are just the tip of the iceberg.
The more severe consequences include unauthorized access to sensitive customer information, exposure of purchase and payment histories, and costly fines for non-compliance with regulations and standards such as HIPAA, GDPR, or PCI DSS.
Data governance experts must prioritize transparency and traceability across all data types and formats to mitigate these risks. A robust governance framework can address legal and financial risks while ensuring data is both secure and usable. Here's how we helped a mining industry client strengthen their data governance during a major infrastructure upgrade:
With these measures in place, the client transformed their data lake into a secure, transparent, and compliant resource that supports both operational and strategic goals.
Because they concentrate so much data in one place, data lakes can become single points of failure and are a major target for cyberattacks and accidental breaches. They are designed to hold vast volumes of information and to scale almost without limit, so one-size-fits-all security measures don’t work. Different data types and formats also require different protections, which adds to the complexity of securing a lake.
Many things could go wrong when working with data stored in a lake. For example, a financial analyst who pulls up a sales report for an external vendor could accidentally share a dataset containing sensitive customer information. Similarly, a marketing manager at a hospital might view confidential patient health records because of misconfigured access settings. Ransomware can also target data lakes, locking users out of all the data and threatening operations across the company.
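The hospital example comes down to an access policy that was either missing or misconfigured. Here is a minimal deny-by-default sketch, assuming a hypothetical `POLICY` mapping from roles to the columns each role may see in clear text. In practice, identity providers and the lake’s own access controls enforce such rules rather than application code.

```python
# Hypothetical policy: which columns each role may see in clear text.
POLICY = {
    "clinician": {"patient_id", "diagnosis", "ward"},
    "marketing": {"ward"},
}

def read_with_policy(rows, role):
    """Return rows with every column the role isn't cleared for masked.

    Roles missing from the policy see nothing in clear text (deny by default).
    """
    allowed = POLICY.get(role, set())
    return [
        {col: (val if col in allowed else "***") for col, val in row.items()}
        for row in rows
    ]

records = [{"patient_id": "p1", "diagnosis": "asthma", "ward": "B2"}]
print(read_with_policy(records, "marketing"))
# [{'patient_id': '***', 'diagnosis': '***', 'ward': 'B2'}]
```

Denying by default means a forgotten or misspelled role exposes nothing, which is the safer failure mode for the misconfiguration scenario above.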
Securing a data lake against these scenarios starts with thoughtful design. Services like Active Directory, Okta, and AWS IAM help manage user permissions, while encryption tools like Azure Key Vault and AWS Key Management Service help keep your information safe. Tokenization replaces sensitive data with tokens for added security. But the list of safeguards for enterprise data lakes doesn’t stop there.
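Tokenization can be illustrated with a small in-memory vault. This is a sketch under simplifying assumptions: the `TokenVault` class is hypothetical, keeps its mapping in memory, and uses Python’s `secrets` module to generate random tokens. A production vault would be a separately secured, persistent, and audited service.

```python
import secrets

class TokenVault:
    """Minimal vault-based tokenization sketch (in-memory, illustration only).

    Sensitive values are swapped for random tokens; the mapping lives only in
    the vault, so a leaked data set exposes tokens, not original values.
    """

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value):
        # Reuse the existing token so joins on the tokenized column still work.
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token):
        # In a real system, this call would itself be access-controlled.
        return self._token_to_value[token]

vault = TokenVault()
card = "4111 1111 1111 1111"
token = vault.tokenize(card)
print(token)  # random, differs on each run
```

Because the same value always maps to the same token, analysts can still count or join on a tokenized column without ever seeing the original value.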
You’ve also got data loss prevention systems, classification tools, monitoring and auditing platforms, backup and disaster recovery apps, and network security tools, all working together to ensure secure data handling.
Managing these security elements can be complex, and human factors remain a common cause of breaches, with three in four incidents caused by human error. To make matters worse, 21% of enterprise cyberattacks target remote employees’ home environments, so it’s essential to build relevant protections into your data lake setup to protect remote teams, too.
Ultimately, protecting your data lake isn’t just about embedding the right tools; it also requires regular training, raising awareness, and constant communication. Your team of data lake experts should lead the way, ensuring everyone understands the risks and how to minimize them.
Scalability is one of the promises of data lakes, but your infrastructure needs to be properly configured and maintained to realize it. Otherwise, you may encounter problems that can slow down your data work and affect reliability. This can lead to flawed business decisions, missed opportunities, and slower innovation, affecting your market share and profits:
To keep your data lake running smoothly, no matter how much data you have, use strategies such as indexing, partitioning (dividing your data into manageable parts), and query optimization. This requires a mix of tools, from cloud storage solutions and large-scale data processing frameworks to data ingestion and workflow management tools. Pulling everything together effectively requires a combination of expert skills, including data engineering, cloud computing, data science, analytics, and big data management.
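Partitioning pays off because queries can skip data they don’t need. The following sketch mimics a Hive-style layout (`sales/date=.../region=...`), with a Python dict standing in for object storage. The paths, fields, and functions are hypothetical, and real query engines perform this pruning using partition metadata rather than a loop like this one.

```python
from collections import defaultdict

def partition_path(record):
    # Hive-style layout, e.g. "sales/date=2024-05-01/region=EMEA"
    return f"sales/date={record['date']}/region={record['region']}"

def write_partitioned(records):
    """Group records by partition; the dict stands in for object storage."""
    store = defaultdict(list)
    for rec in records:
        store[partition_path(rec)].append(rec)
    return store

def query(store, date, region=None):
    """Read only the partitions matching the filter (partition pruning).

    Returns the matching rows and how many partitions were actually scanned.
    """
    rows, scanned = [], 0
    for path, part in store.items():
        if not path.startswith(f"sales/date={date}/"):
            continue  # pruned: this partition's data is never read
        if region is not None and not path.endswith(f"/region={region}"):
            continue
        scanned += 1
        rows.extend(part)
    return rows, scanned

records = [
    {"date": "2024-05-01", "region": "EMEA", "amount": 120},
    {"date": "2024-05-01", "region": "APAC", "amount": 80},
    {"date": "2024-05-02", "region": "EMEA", "amount": 95},
]
store = write_partitioned(records)
rows, scanned = query(store, "2024-05-01", region="EMEA")
print(rows, f"({scanned} of {len(store)} partitions scanned)")
```

Choosing partition keys that match the most common filters, such as date and region here, is what lets the engine skip most of the data; partitioning on a rarely filtered column gives little benefit.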
Building an enterprise data lake is complex. It’s not a one-person job. You need a diverse team of skilled professionals to succeed:
Given the scarcity of talent and the complexities of modern tech hiring, assembling a cohesive team of experts with all the necessary roles and skills can be a significant challenge. This is particularly true for positions like Data Scientists, Big Data Engineers, and Data Security Analysts, who are among the most in-demand roles in IT and tech. Meanwhile, the European tech talent shortage persists.
In these circumstances, partnering with a trusted, dedicated team provider can be a smart move. It allows you to quickly fill roles with experienced, vetted tech teams (within 4-8 weeks, on average) committed to your data lake project and business goals.
Challenges abound in enterprise data lake architectures, but so do the advantages. To unlock the latter, you need the skills and talent to tackle the former.
Based on my experience with clients, I’ve noticed several common mistakes in data lake implementations that often end in failed projects or poor performance and high costs. The good news is that these issues can be avoided by getting expert guidance before starting your data lake project.
Data lakes offer a simpler alternative to traditional enterprise data management systems, but deploying them isn’t as easy as it seems. While they can be set up faster than other solutions, the implementation still requires careful planning and an experienced team.
A common pitfall for companies is failing to involve DevOps experts and to hire the right data engineers. This can lead to slow development and errors, resulting in issues with data quality and performance. A lack of automation increases the risk of mistakes and potential data loss.
Moreover, without automated monitoring, your company may struggle with compliance and governance. Relying on manual processes can drive project costs up, often beyond the cost of hiring skilled DevOps professionals. That’s why it’s important to invest in the right expertise for a successful data lake implementation.
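Automated monitoring doesn’t have to start big. As an illustrative sketch, the check below flags tables that are empty or stale after a pipeline run. The thresholds, table names, and `check_table` function are assumptions chosen for the example; a real setup would feed such alerts into a scheduler or observability platform.

```python
from datetime import datetime, timedelta, timezone

def check_table(name, row_count, last_loaded, now,
                min_rows=1, max_age=timedelta(hours=24)):
    """Return alert messages for one table; an empty list means healthy."""
    alerts = []
    if row_count < min_rows:
        alerts.append(f"{name}: {row_count} rows, expected at least {min_rows}")
    age = now - last_loaded
    if age > max_age:
        alerts.append(f"{name}: stale, last loaded {age} ago")
    return alerts

# Hypothetical load metadata, e.g. collected after each pipeline run.
now = datetime(2024, 5, 2, 12, 0, tzinfo=timezone.utc)
tables = [
    ("orders", 15000, datetime(2024, 5, 2, 6, 0, tzinfo=timezone.utc)),
    ("shipments", 0, datetime(2024, 4, 30, 6, 0, tzinfo=timezone.utc)),
]
for name, rows, loaded in tables:
    for alert in check_table(name, rows, loaded, now):
        print(alert)  # in practice, route to an alerting channel instead
```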
A big mistake in setting up data lakes is not having a governance framework. Without proper oversight, data issues like inconsistencies, missing values, and duplicates can arise, creating 'data swamps' – chaotic repositories that make data access difficult instead of easy.
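Basic quality checks catch many of these problems before they spread. The sketch below counts duplicate keys and missing required fields in a batch of records. These are the kinds of issues behind the duplicate promotional emails described earlier. The `profile_quality` function and sample data are hypothetical.

```python
def profile_quality(rows, key, required):
    """Count duplicate values of `key` and rows missing required fields.

    Intended to run before a batch is published to consumers, so problems
    are caught at the door instead of accumulating in the lake.
    """
    seen = set()
    issues = {"duplicates": 0, "missing": 0}
    for row in rows:
        k = row.get(key)
        if k in seen:
            issues["duplicates"] += 1
        else:
            seen.add(k)
        if any(row.get(col) in (None, "") for col in required):
            issues["missing"] += 1
    return issues

customers = [
    {"customer_id": "c1", "email": "ana@example.com"},
    {"customer_id": "c1", "email": "ana@example.com"},
    {"customer_id": "c2", "email": ""},
]
print(profile_quality(customers, key="customer_id", required=["email"]))
# {'duplicates': 1, 'missing': 1}
```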
Poor governance can also weaken security. If access controls and procedures are unclear, it can lead to unauthorized access and data loss, resulting in expensive compliance fines.
These governance failures lead to flawed decision-making and delays. Employees waste time retrieving, validating, and analyzing data, which reduces their productivity. Organizations then face operational inefficiencies and deliver slower, less reliable products and services, eroding customer trust. That erosion ultimately undermines stakeholder and investor confidence, putting your project and teams at serious risk.
Many companies make their data lake systems too complex, often due to a lack of expertise. This complexity can lead to slower data processing and more errors.
For example, having too many data layers slows data processing and increases the likelihood of errors, while developing complex data ingestion pipelines requires more time and effort to build and maintain. Additionally, overly strict security measures can limit access to essential data for users who need it.
In the end, these issues can make it harder to access important insights, slow down operations, and increase costs. By working with skilled data engineers and DevOps teams, companies can simplify their data pipelines, create a more manageable security model, and design a clearer data structure. This makes systems more efficient, increases flexibility, and lowers costs.
Data lakes offer immense potential, but navigating their complexity can be daunting. To unlock their value, you need a strong team and a well-defined plan. At Nortal, we've spent 25+ years assembling high-performing tech teams for global companies across industries. With expertise gained from over 300 team partnerships, we understand the integration, governance, and security challenges of building and managing data lake architectures.
By partnering with us, you can access expert data engineers, DevOps specialists, and data scientists who will integrate seamlessly with your organization to help you:
Nortal is a strategic innovation and technology company with an unparalleled track record of delivering successful transformation projects over 20 years.