In mid-2018, after nearly a decade of service, the application was marked for deprecation because it depended on legacy on-premises infrastructure as well as costly databases and other third-party technologies. Modernizing the application would allow Zappos not only to reduce its licensing costs but also to leverage the scalability, reliability, and performance that modern AWS cloud infrastructure and services provide.
While the Analytics Data Loader was cutting edge at its inception, built on industry standards such as C++ and dedicated database infrastructure, its age was beginning to show. C++ continues to play a strong role in software development, but it is generally no longer the language of choice for web technologies or highly scalable applications. This made maintaining the application a challenge: most of Zappos’ engineering team was no longer proficient in C++, and intimate knowledge of the application’s inner workings was deteriorating as it was handed off from team to team over time. Additionally, the application was hosted on legacy on-premises hardware, which limited its ability to scale without significant cost and time, let alone scale dynamically the way AWS cloud-based infrastructure can. At times, latency caused the system’s output to fall out of sync with the data at the sources. Business-critical decisions about Marketing, SEM spend, and SEO were becoming problematic, and the system was beginning to be perceived as unreliable. Together, these factors led the business stakeholders and technology teams to agree that rewriting and modernizing the system was critical to Zappos’ continued leadership in the marketplace.
Nortal, a technology professional services partner of both Zappos and Amazon as well as an AWS Advanced Partner, was selected to design and implement a new solution to replace the existing Analytics Data Loader. Nortal was chosen because of its proven delivery within the greater Amazon organization, as one of a handful of partners trusted by Amazon and AWS to work on Amazon’s internal systems and projects. Nortal is deeply experienced with Amazon’s proprietary Continuous Integration and Delivery (CI/CD) pipeline, as well as other internal tool sets used to build and deploy applications to AWS. This track record, both at Amazon and with other clients, gave stakeholders confidence that the project would be completed successfully.
The Nortal development team was tasked with assessing the functionality of the existing C++ implementation and then designing and implementing an AWS cloud-based solution that would scale with Zappos in the coming years. Because site traffic levels vary widely, AWS Lambda was a logical choice for ingesting data from the incoming feed: it lets the application scale out quickly with the volume of incoming site data and parallelize the processing of the incoming protobuf data widely. Each region’s “Reader” Lambda(s) are triggered whenever new incoming site data is detected, and the data they process from the incoming streams is stored in S3 for hourly, session-based processing. Consolidating the data into a single AWS region ensures that each user session is processed in its entirety.
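Storing the Reader output by hour is what lets the later steps pick up exactly one hour of data at a time. The sketch below shows one way a Reader could group decoded events by UTC hour before writing them. It is an illustration under stated assumptions, not Nortal’s code: the `staging/dt=.../hour=.../source_region=...` key layout and the `partition_prefix` and `partition_records` helpers are hypothetical, protobuf decoding and the actual S3 writes are omitted, and events are assumed to be plain dicts with an epoch-seconds `ts` field.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_prefix(epoch_seconds: int, source_region: str) -> str:
    """Return a hypothetical S3 prefix that groups events by UTC date and hour."""
    ts = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return f"staging/dt={ts:%Y-%m-%d}/hour={ts:%H}/source_region={source_region}/"


def partition_records(records, source_region):
    """Group decoded events (dicts with an epoch-seconds 'ts' field) by hourly prefix.

    Each group would become one object in the consolidated bucket, so the hourly
    "core" step can read a single hour's data by listing a single prefix.
    """
    batches = defaultdict(list)
    for record in records:
        batches[partition_prefix(record["ts"], source_region)].append(record)
    return dict(batches)
```

For example, events at epoch seconds 0 and 3599 land under `hour=00`, and an event at 3600 lands under `hour=01`.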
At the top of each hour, a pre-configured number of “core” Lambda functions spin up to further process the data previously parsed and stored in S3 into meaningful customer interaction data. This data is divided into logical “visit” groupings using a proprietary set of business rules that Zappos has honed and fine-tuned over the last decade. The degree of parallelism of these “core” Lambda functions can be tuned to the organization’s current needs, such as cost limits and incoming dataset size; thousands of Lambda functions can run at once during this process. The resulting “visit” data is again stored in the S3 results buckets.
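Zappos’ visit rules are proprietary and are not described here, so the sketch below substitutes a common, purely illustrative rule: a visit ends after 30 minutes without activity from the same user. It also shows one way the work could be fanned out across a configurable number of “core” workers by hashing the user ID, so that each worker sees every event for its users. `shard_for`, `group_into_visits`, and `INACTIVITY_TIMEOUT_S` are hypothetical names.

```python
import zlib
from itertools import groupby

# Hypothetical stand-in for Zappos' proprietary visit rules: a visit ends after
# 30 minutes without activity from the same user.
INACTIVITY_TIMEOUT_S = 30 * 60


def shard_for(user_id: str, shard_count: int) -> int:
    """Assign a user to one of `shard_count` parallel "core" workers.

    crc32 is stable across processes (unlike Python's built-in hash() for str),
    so every event for a given user lands on the same worker.
    """
    return zlib.crc32(user_id.encode("utf-8")) % shard_count


def group_into_visits(events, timeout=INACTIVITY_TIMEOUT_S):
    """Split (user_id, epoch_seconds, payload) tuples into per-user visits."""
    visits = []
    # Sort by user, then time; the key avoids comparing payloads.
    ordered = sorted(events, key=lambda e: (e[0], e[1]))
    for user_id, user_events in groupby(ordered, key=lambda e: e[0]):
        current, last_ts = [], None
        for _, ts, payload in user_events:
            # A gap longer than the timeout closes the current visit.
            if last_ts is not None and ts - last_ts > timeout:
                visits.append({"user": user_id, "start": current[0][0], "end": last_ts, "events": current})
                current = []
            current.append((ts, payload))
            last_ts = ts
        if current:
            visits.append({"user": user_id, "start": current[0][0], "end": last_ts, "events": current})
    return visits
```

Raising or lowering the shard count is one simple way to trade cost against throughput, which matches the kind of tuning the paragraph describes.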
At the start of each day, once the “core” Lambdas finish processing the last hour of the previous day, an EMR (Elastic MapReduce) cluster is spun up on demand to finalize the day’s interaction data and load it into the proprietary BI tool. The EMR cluster reads the data and applies additional logic to “stitch” together visits that cross an hour boundary, as well as to merge all of the data into a single source for ingestion by the BI tool. As of this writing, the system processes over 10Gb per day, and much more during the holiday shopping season.
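Because each hourly run closes any open visit at the hour boundary, the daily step has to rejoin the fragments. The sketch below shows the idea in plain Python rather than as an actual EMR job, and assumes a hypothetical 30-minute inactivity rule; `stitch_visits` and `VISIT_GAP_S` are illustrative names, not the production logic.

```python
VISIT_GAP_S = 30 * 60  # hypothetical inactivity rule, same as the hourly step would use


def stitch_visits(visits, gap=VISIT_GAP_S):
    """Merge per-hour visit fragments that belong to one continuous visit.

    visits: dicts with 'user', 'start' and 'end' epoch seconds. Two fragments from
    the same user are joined when the second starts within `gap` seconds of the
    first one's end. If the hourly step already split visits on that same gap,
    this can only happen when an hour boundary cut a visit in two.
    """
    stitched = []
    for visit in sorted(visits, key=lambda v: (v["user"], v["start"])):
        prev = stitched[-1] if stitched else None
        if prev and prev["user"] == visit["user"] and visit["start"] - prev["end"] <= gap:
            prev["end"] = max(prev["end"], visit["end"])  # extend the open visit
        else:
            # Build a new dict so the caller's input is not mutated.
            stitched.append({"user": visit["user"], "start": visit["start"], "end": visit["end"]})
    return stitched
```

Two fragments ending at 3599 and starting at 3600 (one second apart, across the hour mark) merge into one visit, while a fragment starting hours later stays separate.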
Zappos continually tweaks the algorithmic logic and adds new features to its BI tool with every release. To avoid stale data and to backfill new fields with historical data, Nortal added a “Replay” function that can reprocess an entire day’s worth of data. Replay reuses the architecture of the redesigned application: when the “Reader” Lambda is given a specific date, it rereads and processes every file for that day in parallel with the system’s normal daily and hourly processing. The “Reader” Lambda then partitions the data into the “Staging” S3 bucket, and DynamoDB is used to track the lifecycle stages of each “Replay”.
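The article does not list the Replay lifecycle stages, so the stage names below (`REQUESTED`, `READING`, `STAGED`, `LOADED`, `FAILED`) are hypothetical, and an in-memory dictionary stands in for the DynamoDB table. The sketch only illustrates the general idea of tracking one replay per day and rejecting out-of-order transitions.

```python
from enum import Enum


class ReplayStage(Enum):
    # Hypothetical lifecycle stages for a one-day replay.
    REQUESTED = "REQUESTED"
    READING = "READING"  # Reader Lambdas re-reading the day's source files
    STAGED = "STAGED"    # partitioned output written to the "Staging" bucket
    LOADED = "LOADED"    # downstream processing finished
    FAILED = "FAILED"


# Which stage may follow which; anything else is rejected.
ALLOWED_TRANSITIONS = {
    ReplayStage.REQUESTED: {ReplayStage.READING, ReplayStage.FAILED},
    ReplayStage.READING: {ReplayStage.STAGED, ReplayStage.FAILED},
    ReplayStage.STAGED: {ReplayStage.LOADED, ReplayStage.FAILED},
    ReplayStage.LOADED: set(),
    ReplayStage.FAILED: set(),
}


class ReplayTracker:
    """In-memory stand-in for a DynamoDB table keyed by replay date (YYYY-MM-DD)."""

    def __init__(self):
        self._stages = {}

    def start(self, day: str) -> None:
        current = self._stages.get(day)
        # A day can be replayed again once its previous replay has finished or failed.
        if current is not None and current not in (ReplayStage.LOADED, ReplayStage.FAILED):
            raise ValueError(f"replay for {day} already in progress ({current.value})")
        self._stages[day] = ReplayStage.REQUESTED

    def advance(self, day: str, new_stage: ReplayStage) -> None:
        current = self._stages[day]
        if new_stage not in ALLOWED_TRANSITIONS[current]:
            raise ValueError(f"illegal transition {current.value} -> {new_stage.value}")
        self._stages[day] = new_stage

    def stage(self, day: str) -> ReplayStage:
        return self._stages[day]
```

Against DynamoDB itself, the same check could be expressed as an `UpdateItem` call with a `ConditionExpression` on the current stage, so that two concurrent Lambdas cannot both advance the same replay.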
Zappos’ involvement and direction were crucial to the success of this project. Zappos provided access to multiple SMEs and data experts while supporting the agile setup that helped guide the project. As with most aging applications, documentation was outdated or, in some cases, missing entirely. This made the team’s initial ramp-up slower and much more difficult, since the team had to derive some of the technical and even some of the business requirements from the existing application code. Another lesson was learned in designing the “Replay” functionality: after a few initial attempts at re-architecting, the team found that using EMR to process the results was a better design than the existing serverless implementation. This design, combined with the highly parallelized serverless architecture, also led the team to push the limits of S3’s read and write capabilities at scale. The Nortal team also ran into deployment failures when Lambda versioning failed. While investigating the root cause of these failed deployments, Nortal learned that it had to delete the Lambda function from CloudFormation entirely and redeploy it from scratch in production, which ultimately made the development process more cumbersome.
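The article does not say how the team dealt with S3’s limits, so the following is only one common mitigation, not necessarily Nortal’s. S3 applies per-prefix request rate limits and responds to excess load with HTTP 503 Slow Down; retrying with capped exponential backoff and “full jitter” keeps thousands of parallel workers from retrying in lockstep. `ThrottledError` and `with_backoff` are hypothetical; in a real client you would map the SDK’s throttling error to the retry path, and AWS SDKs also provide configurable built-in retry behavior.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for an S3 503 Slow Down response."""


def with_backoff(op, max_attempts=5, base_delay=0.1, max_delay=5.0,
                 sleep=time.sleep, rng=random.random):
    """Call op(), retrying on ThrottledError with capped exponential backoff.

    `sleep` and `rng` are injectable so the retry schedule can be tested.
    """
    for attempt in range(max_attempts):
        try:
            return op()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the throttling error
            # Full jitter: wait a random time between 0 and the capped exponential delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng() * cap)
```

Randomizing the whole delay, rather than adding a small jitter to a fixed delay, spreads retries more evenly when many workers are throttled at the same moment.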
Nortal and Zappos’ dedicated efforts and deep industry expertise enabled the team to deliver the new Analytics Loader with zero data loss and no system outages. In addition to the new system, Nortal delivered a number of enhancements in the new AWS cloud-based application and repaired some long-standing known bugs from the original system. The new Analytics Loader leverages AWS Lambda serverless architecture, EMR processing and scalability, and S3 storage. With the new cloud-based loader, Zappos has seen improvements in performance, scale, and data accuracy, along with flexible data error correction, a full CI/CD pipeline, and lightweight infrastructure-as-code (IaC) management. Because of these enhancements, Zappos is more confident that decisions made with its new cloud-based BI platform are founded on the most accurate and timely data, and will have long-term positive impacts on relevance, customer experience, and overall business performance.