With storage becoming cheaper by the day, companies are collecting more and more customer data. The retail industry is no exception to this trend and is gradually adopting more technology solutions to boost sales across all verticals. Ten years ago most retailers were using simple statistical software; nowadays they leverage the power of data through the complex machine learning algorithms that flourished over the last decade.
Retailers are trying to increase sales, yes. But not only that! They’re also seeking to stay relevant to their customers. The nature of the beast has changed: competition between retailers is no longer only about cheaper or higher-quality products, but also about products that actually match customer needs.
Recommender systems, also known as personalization engines, have become a must-have for everyone who wants to succeed in retail. One of the big questions is how efficiently the engineering team can process all the data in order to provide the business team with useful information for building relevant marketing strategies.
10-15 years ago we would have said that our challenge was to scale the storage system in order to cope with the high volume of data being acquired. Nowadays, we have multiple solutions for scaling the storage system and processing the data efficiently. The challenge now is to leverage existing solutions by combining them efficiently into one data-intensive application that satisfies the business requirements.
Time to market is crucial, even more so when we want to deploy a new marketing strategy in production. Leveraging existing solutions reduces not only development and maintenance time but also development and operational costs. Retail business requirements can shift from one week to the next, and new marketing strategies are developed to address customer behaviour: being able to ingest new data sources and make them available to data scientists and business analysts is of the essence.
Recommendation systems are all around us in our everyday lives and have redefined the way customers look at their favorite subscriptions. The shopping experience has been greatly impacted for the better. Retailers that collect sufficient data from enough channels can provide relevant, matching recommendations and greatly improve the customer experience. But even the best recommendation systems fall short when they don't have a constant supply of data, or when the data isn't properly ingested or stored. Without the technology to process and interpret massive amounts of data quickly, the system is simply not able to provide meaningful and actionable insights in real time, which are the basis of all recommendations.
A solution proven to work for most of our customers is Apache Spark deployed alongside custom open source extensions tailored to their specific needs. For this use case we chose PySpark and Jupyter Notebook because we wanted to benefit from the Python ecosystem.
Given that the team using the system would be made up of business analysts, data scientists and data engineers, it made perfect sense to bring them together around Python. Apache Spark is flexible and comes with a diverse set of built-in adapters for ingesting data from a large range of storage systems, in batch or streaming mode. Spark is also well suited to iterative algorithms such as the ones we use, because it tries to keep computation in memory as much as possible. It also ships with a built-in machine learning library, MLlib, which the team can leverage without additional integrations.
Therefore our pipelines are built on top of Apache Spark, with a tool for each role:
Python for the data engineers to implement business requirements and APIs
Jupyter Notebook for the data scientists to crunch numbers and generate insights and models using the PySpark APIs, Pandas or custom APIs built by the engineering team
and last but not least Spark SQL for business analysts, who are more comfortable using SQL to look at data
We are big fans of cloud platforms, be it Azure, AWS or GCP. Every cloud platform has its particularities, but they all converge on the same goal: reducing operations overhead. In this particular case we chose Azure Batch because our client was already using Azure to run other solutions, but either of the other two would have been as good for our purpose. Both offer a service equivalent to Batch and support Spark, which is one reason we think Apache Spark is a core component of so many analytics solutions out there.
Our jobs are batch and relatively small, so it makes sense to cut costs by using low-priority nodes. The chances of our machines being reclaimed and our job failing because of it are very small, so the extra cost of dedicated nodes is not worth it. Dedicated nodes are better suited to long-running jobs for which we cannot afford to lose machines and crash the Spark jobs.
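The trade-off can be made explicit with some back-of-the-envelope arithmetic. The sketch below uses a deliberately simple model that is our assumption, not a description of Azure pricing: a job is rerun from scratch until it succeeds, each attempt fails with the same independent probability, and a failed attempt is billed in full. Under that model the expected number of attempts is 1 / (1 - p), so low-priority nodes are cheaper as long as p stays below 1 - (low-priority price / dedicated price). All prices and probabilities are placeholders.

```python
def expected_cost(nodes, hours, price_per_node_hour, failure_prob=0.0):
    """Expected cost of a batch job that is retried from scratch until it succeeds.

    Assumes every attempt fails independently with probability `failure_prob`
    and that a failed attempt still costs the full run (a pessimistic simplification).
    """
    if not 0.0 <= failure_prob < 1.0:
        raise ValueError("failure_prob must be in [0, 1)")
    expected_attempts = 1.0 / (1.0 - failure_prob)  # mean of a geometric distribution
    return nodes * hours * price_per_node_hour * expected_attempts


def break_even_failure_prob(low_priority_price, dedicated_price):
    """Failure probability at which low-priority and dedicated nodes cost the same."""
    return 1.0 - low_priority_price / dedicated_price
```

For example, with a hypothetical low-priority price at a quarter of the dedicated price, break_even_failure_prob(0.25, 1.0) shows that the job would have to fail in three out of four attempts before dedicated nodes became the cheaper option. For short jobs the real failure rate is far below that, which matches the reasoning above.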
As for storage, we ingest from and write to Azure Storage, which works out of the box with Azure Batch and Spark and also provides data versioning if required. The format we use is Apache Parquet, which suits us well since all the data is tabular. Parquet also helps by persisting data partitioning and other metadata between Spark jobs. It works so well that in some instances it replaces databases altogether.
Overall this gives us a flexible and extensible solution. We were able to understand our client's use case and business metrics and translate these into the right technology. We provided them with a recommender system which was the missing link between their existing technology systems and marketing plans. As of August 2020, the system has been in production for one year and we can clearly state that it has been able to deliver new marketing strategies on a monthly basis, keeping the focus on business problems rather than tech problems. This aspect alone is an essential gain, which has helped our client be extremely responsive to the market’s volatility. But there is more! By using highly personalized marketing they are seeing ROI results similar to those of Netflix and Amazon:
Increased basket size
By looking at these measurements we can clearly state that customers' purchasing behaviour has changed for the better. They are more loyal and have a higher satisfaction rate than before.