#BIZML: Customer Lifetime Value Modeling as a Win-Win for Both the Vendor and the Customer

 

Author: Janne Flinck, Codento

Introduction to Customer Lifetime Value

Customer analytics is not about squeezing out every penny from a customer, nor should it be about short-term thinking and actions. Customer analytics should seek to maximize the full value of every customer relationship. This metric of “full value” is called the lifetime value (LTV) of a customer. 

Obviously a business should look at how valuable customers have been in the past, but purely extrapolating that value into the future might not be the most accurate metric.

The more valuable a customer is likely to be to a business, the more that business should invest in that relationship. One should think about customer lifetime value as a win-win situation for the business and the customer. The higher a customer’s LTV is to your business, the more effort your business should put into addressing their needs.

The so-called Pareto principle is often invoked here: roughly 20% of your customers account for 80% of your sales. What if you could identify these customers, not just in the past but in the future as well? Predicting LTV is a way of identifying those customers in a data-centric manner.

 

Business Strategy and LTV

There are some more or less “standard” ways of calculating LTV that I will touch upon in this article a little later. These out-of-the-box calculation methods can be good but more importantly, they provide good examples to start with.

What I mean by this is that determining which factors are included in the LTV calculation is something that a business leader will have to consider and weigh in on. LTV should set the direction for your business, because LTV is also about business strategy: it will not be the same for every business, and it might even change over time for the same business.

If your business strategy is about sustainability, then the LTV should include some factors that measure it. Perhaps a customer has more strategic value to your business if they buy the more sustainable version of your product. This is not a set-and-forget metric either: it should be revisited over time to check that it still reflects your business strategy and goals.

The LTV is also important because other major metrics and decision thresholds can be derived from it. For example, the LTV is naturally an upper limit on the spending to acquire a customer, and the sum of the LTVs for all of the customers of a brand, known as the customer equity, is a major metric for business valuations.

 

Methods of Calculating LTV

At their core, LTV models can be used to answer these types of questions about customers:

  • How many transactions will the customer make in a given future time window?
  • How much value will the customer generate in a given future time window?
  • Is the customer in danger of becoming permanently inactive?

When you are predicting LTV, there are two distinct problems which require different data and modeling strategies:

  • Predict the future value for existing customers
  • Predict the future value for new customers

Many companies predict LTV only by looking at the total monetary amount of sales, without using context. For example, a customer who makes one big order might be less valuable than another customer who buys multiple times, but in smaller amounts.

LTV modeling can help you better understand the buying profile of your customers and help you value your business more accurately. By modeling LTV, an organization can prioritize its actions, for example to:

  • Decide how much to invest in advertising
  • Decide which customers to target with advertising
  • Plan how to move customers from one segment to another
  • Plan pricing strategies
  • Decide which customers to dedicate more resources to

LTV models are used to quantify the value of a customer and estimate the impact of actions that a business might take. Let us take a look at two example scenarios for LTV calculation.

The non-contractual and the contractual settings are two common ways of approaching LTV, each suited to a different type of business or product. Other types include multi-tier products, cross-selling of products, and ad-supported products, among others.

 

Non-contractual Business

One of the most basic ways of calculating LTV is by looking at your historical figures of purchases and customer interactions and calculating the number of transactions per customer and the average value of a transaction.

Then by using the data available, you need to build a model that is able to calculate the probability of purchase in a future time window per customer. Once you have the following three metrics, you can get the LTV by multiplying them:

LTV = Number of transactions x Value of transactions x Probability of purchase
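
As a minimal sketch of the formula above, the three inputs can be multiplied per customer. The class name, field names and numbers below are illustrative assumptions, and the purchase probability is assumed to come from a separately built model.

```python
from dataclasses import dataclass


@dataclass
class CustomerStats:
    """Per-customer inputs; all field names are illustrative assumptions."""
    expected_transactions: float   # expected number of transactions in the window
    avg_transaction_value: float   # average value per transaction, however "value" is defined
    purchase_probability: float    # modeled probability of purchasing in the window


def non_contractual_ltv(stats: CustomerStats) -> float:
    """LTV = number of transactions x value of transactions x probability of purchase."""
    if not 0.0 <= stats.purchase_probability <= 1.0:
        raise ValueError("purchase_probability must be between 0 and 1")
    return (
        stats.expected_transactions
        * stats.avg_transaction_value
        * stats.purchase_probability
    )


# One big order versus many small ones (made-up numbers).
one_off = CustomerStats(expected_transactions=1, avg_transaction_value=500.0, purchase_probability=0.1)
regular = CustomerStats(expected_transactions=12, avg_transaction_value=40.0, purchase_probability=0.9)

print(non_contractual_ltv(one_off))
print(non_contractual_ltv(regular))
```

With these made-up inputs, the customer who buys often in small amounts ends up with the higher LTV, which is the point made earlier about looking beyond the total monetary amount of sales.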

There are some gotchas in this way of modeling the problem. First of all, as discussed earlier, what is value? Is it revenue or profit or quantity sold? Does a certain feature of a product increase the value of a transaction? 

The value should be something that adheres to your business strategy and discourages short-term profit seeking and instead fosters long-term customer relationships.

Second, as mentioned earlier, predicting LTV for new customers will require different methods as they do not have a historical record of transactions.

 

Contractual Business

For a contractual business with a subscription model, such as a magazine with a monthly subscription or a streaming service, the LTV calculation is different because a customer is locked into buying from you for the duration of the contract. You can also directly observe churn, since customers who churn do not re-subscribe.

For such products, one can calculate the LTV from the expected number of months for which the customer will re-subscribe.

LTV = Survival rate x Value of subscription x Discount rate

The survival rate by month would be the proportion of customers that maintain their subscription. This can be estimated from the data by customer segment using, for example, survival analysis. The value of a subscription could be revenue minus cost of providing the service and minus customer acquisition cost.

Again, your business has to decide what is considered value. The discount rate is included because the subscription payments arrive in the future, and money received later is worth less than money received today.
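
One way to read the contractual formula is as a sum over future months: for each month, multiply the share of customers still subscribed by the value of one month of subscription and by a discount factor. The sketch below takes that reading. The function name, the conversion from an annual to a monthly discount rate and the survival curve are all assumptions made for illustration.

```python
def contractual_ltv(monthly_survival, monthly_value, annual_discount_rate):
    """Sum of survival x value x discount factor over the months in the horizon.

    monthly_survival[t] is the assumed share of customers still subscribed
    in month t + 1 (for example, estimated per segment with survival analysis).
    """
    # Assumption: convert the annual rate to an equivalent monthly rate.
    monthly_rate = (1.0 + annual_discount_rate) ** (1.0 / 12.0) - 1.0
    ltv = 0.0
    for month, survival in enumerate(monthly_survival, start=1):
        discount_factor = 1.0 / (1.0 + monthly_rate) ** month
        ltv += survival * monthly_value * discount_factor
    return ltv


# Made-up survival curve: constant 90% monthly retention over 24 months.
survival_curve = [0.9 ** month for month in range(1, 25)]
print(round(contractual_ltv(survival_curve, monthly_value=10.0, annual_discount_rate=0.08), 2))
```

Here `monthly_value` stands for whatever the business has decided counts as value per month, for example revenue minus the cost of providing the service.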

 

Actions and Measures

So you now have an LTV metric that decision makers in your organization are happy with. Now what? Do you just slap it on a dashboard? Do you recalculate the metric once a month and show the evolution of this metric on a dashboard?

Is LTV just another metric that the data analysis team provides to stakeholders and expects them to somehow use it to “drive business results”? Those are fine ideas but they don’t drive action by themselves. 

The LTV metric can be used in multiple ways. For example, in marketing one can design treatments by segment and run experiments to see what kind of treatments maximize LTV instead of short-term profit.

The probability that a customer reacts favorably to a designed treatment, multiplied by the LTV, is the expected reward. That reward minus the treatment cost gives us the expected business value. Thus, one gets the expected business value of each treatment and can choose the one with the best effect for each customer or customer segment.

Doing this calculation for our entire customer base will give a list of customers for whom to provide a specific treatment that maximizes LTV given our marketing budget. LTV can also be used to move customers from one segment to another.
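
The treatment calculation described above can be sketched as follows. The treatment names, response probabilities and costs are hypothetical, and the sketch does not handle the marketing budget constraint.

```python
def expected_business_value(response_probability, ltv, treatment_cost):
    """Expected reward (probability x LTV) minus the cost of the treatment."""
    return response_probability * ltv - treatment_cost


def best_treatment(ltv, treatments):
    """Return (name, value) of the treatment with the highest expected business value.

    treatments maps a treatment name to (response_probability, treatment_cost).
    """
    scored = {
        name: expected_business_value(probability, ltv, cost)
        for name, (probability, cost) in treatments.items()
    }
    name = max(scored, key=scored.get)
    return name, scored[name]


# Hypothetical treatments for one customer segment.
treatments = {
    "no_action": (0.00, 0.0),
    "email_offer": (0.05, 1.0),
    "phone_call": (0.12, 15.0),
}
print(best_treatment(ltv=400.0, treatments=treatments))
print(best_treatment(ltv=50.0, treatments=treatments))
```

With these made-up numbers, the more expensive treatment only pays off for the customer with the higher LTV.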

For pricing, one could estimate how different segments of customers react to different pricing strategies and use price to affect the LTV trajectory of their customer base towards a more optimal LTV. For example, if using dynamic pricing algorithms, the LTV can be taken into account in the reward function.

Internal teams should track KPIs that will have an effect on the LTV calculation over which they have control. For example, in a non-contractual context, the product team can be measured on how well they increase the average number of transactions, or in a contractual context, the number of months that a typical customer stays subscribed.

The support team can be measured on the way that they provide customer service to reduce customer churn. The product development team can be measured on how well they increase the value per transaction by reducing costs or by adding features. The marketing team can be measured on the effectiveness of treatments to customer segments to increase the probability of purchase. 

After all, you get what you measure.

 

A Word on Data

LTV models generally aim to predict customer behavior as a function of observed customer features. This means that it is important to collect data about interactions, treatments and behaviors. 

Purchasing behavior is driven by fundamental factors such as valuation of a product or service compared with competing products or services. These factors may or may not be directly measurable but gathering information about competitor prices and actions can be crucial when analyzing customer behavior.

Other important data is created by the interaction between a customer and a brand. These properties characterize the overall customer experience, including customer satisfaction and loyalty scores.

The most important category of data is observed behavioral data. This can be in the form of purchase events, website visits, browsing history, and email clicks. This data often captures interactions with individual products or campaigns at specific points in time. From purchases one can quantify metrics like frequency or recency of purchases. 
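
As an illustration of quantifying recency and frequency from purchase events, here is a small sketch using only the Python standard library. The event format (customer id plus purchase date) and the function name are assumptions.

```python
from collections import defaultdict
from datetime import date


def recency_frequency(purchases, as_of):
    """Return {customer_id: (days_since_last_purchase, number_of_purchases)}.

    purchases is an iterable of (customer_id, purchase_date) tuples.
    """
    dates_by_customer = defaultdict(list)
    for customer_id, purchase_date in purchases:
        if purchase_date <= as_of:  # ignore events after the reference date
            dates_by_customer[customer_id].append(purchase_date)

    return {
        customer_id: ((as_of - max(dates)).days, len(dates))
        for customer_id, dates in dates_by_customer.items()
    }


# Made-up purchase events.
events = [
    ("alice", date(2023, 1, 5)),
    ("alice", date(2023, 3, 20)),
    ("bob", date(2022, 11, 2)),
]
print(recency_frequency(events, as_of=date(2023, 4, 1)))
```

Features like these are typical inputs to a model that predicts the probability of purchase in a future time window.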

Behavioral data carry the most important signals needed for modeling as customer behavior is at the core of our modeling practice for predicting LTV.

The data described above should also be augmented with additional features from your business’s side of the equation, such as catalog data, seasonality, prices, discounts, and store-specific information.

 

Prerequisites for Implementing LTV

Thus far in this article we have discussed why LTV is important, shown some examples of how to calculate it, and briefly discussed how to make it actionable. Here are some questions that need to be answered before implementing an LTV calculation method:

  • Do we know who our customers are?
  • What is the best measure of value?
  • How do we incorporate business strategy into the calculation?
  • Is the product a contractual or non-contractual product?

If you can answer these questions then you can start to implement your first actionable version of LTV.

 

 

About the author: Janne Flinck is an AI & Data Lead at Codento. Janne joined Codento from Accenture in 2022 with extensive experience in Google Cloud Platform, Data Science, and Data Engineering. His interests are in creating and architecting data-intensive applications and tooling. Janne has three professional certifications and one associate certification in Google Cloud and a Master’s Degree in Economics.

 

Please contact us for more information on how to utilize machine learning to optimize your customers’ LTV.

#GOOGLECLOUDJOURNEY: Cloud Digital Leader Certification – Why’s and How’s?

Author: Anthony Gyursanszky, CEO, Codento

 

Foreword

As our technical consultants here at Codento have been busy completing their professional Google certifications, my colleagues in business roles and I have tried to keep up with the pace by obtaining Google’s sales credentials (which were required for company-level partner status) and studying the basics with Coursera’s Google Cloud Fundamental Courses. While the technical labs in the latter courses were interesting and concrete, they were not really needed in our roles and were a small source of frustration.

Then the question arose: what is the proper way to obtain adequate knowledge of cloud technology and digital transformation from the business perspective, as well as to learn the latest about Google Cloud products and roadmap?

I recently learned that many of my colleagues in other ecosystem companies have earned their Google Cloud Digital Leader certifications. My curiosity was aroused: would this be one for me as well?

 

Why bother in the first place?

In Google’s words “a Cloud Digital Leader is an entry level certification exam and a certified leader can articulate the capabilities of Google Cloud core products and services and how they benefit organizations. The Cloud Digital Leader can also describe common business use cases and how cloud solutions support an enterprise.”

I had earlier assumed that this certification covers both Google Cloud and Google Workspace, and especially how cultural transformation is led in the Workspace area, but this assumption turned out to be completely wrong. There is nothing at all covering Workspace here; it is all about Google Cloud. This was good news to me because, even though we are satisfied Workspace users internally, our consultancy business is solely with Google Cloud.

So what does the certificate cover? I would describe the content as follows:

  • Fundamentals of cloud technology impact and opportunities for organizations
  • Different data challenges and opportunities and how cloud and Google Cloud could be of help including ML and AI
  • Various paths by which organizations should move to the cloud and how Google Cloud can be utilized in modernizing their applications
  • How to design, run and optimize cloud, mainly from a business and compliance perspective

If these topics are relevant to you and you want to take the certification challenge, Cloud Digital Leader is for you.

 

How to prepare for the exam?

As I moved on with my goal of obtaining the actual certification, I learned that Google offers free training modules for partners. The full partner technical training catalog is available for partners on Google Cloud Skills Boost for Partners. If you are not a Google Cloud partner, the same training is also available free of charge here.

The training modules are of high quality, super clear and easy to follow. There is a student slide deck for each of the four modules with about 70 slides in each. The amount of text and information per slide is limited, and it does not take many minutes to go through them.

The actual videos can be played at double speed, and a passing rate of 80% is required in the quizzes after each section. Compared with the actual certification test, the quizzes turned out to be slightly more difficult, as questions with several correct answers were also presented.

In my experience, it will take about 4-6 hours to go through the training and to ensure good chances of obtaining the actual certification. So this is far from the extent required to pass a professional technical certification, where we are talking about weeks of effort and plenty of prerequisite knowledge.

 

How to register for a test?

The easiest way is to book an online proctored test through Webassessor. The cost is 99 USD plus VAT, which you need to pay in advance. There are plenty of available time slots for remote tests at 15-minute intervals on basically any weekday. And yes, if you are wondering, the time slots are presented in your local time even though this is not mentioned anywhere.

How to complete the online test? There are a few prerequisites before the test:

  • A room where you can work in privacy
  • Your table needs to be clean
  • IDs need to be available
  • You need to install a secure browser and upload your photo in advance (at least 24 hours beforehand, as I learned)
  • Other instructions as given in the registration process

The exam link will appear on the Webassessor site a few minutes before the scheduled slot. You will first wait 5-15 minutes in a lobby and then be guided through a few steps, such as showing your ID and showing your room and table with your web camera. This part will take some 5-10 minutes.

After you start the test, a timer is shown throughout the exam. While the maximum time is 90 minutes, it will likely take only some 30 minutes to answer all 50-60 questions. The questions are pretty short and simple. Four alternatives are proposed and only one is correct. If you hesitate between two possible correct answers (as happened to me a few times), you can come back to them at the end. Some sources on the web indicate that 70% of the questions need to be answered correctly.

Once you submit your answers, you will be immediately notified whether you passed or not. No information about grades or right/wrong answers will be provided, though. Google will come back to you with an actual certification letter in a few business days. A retake, if needed, can be scheduled after 14 days at the earliest.

 

Was it worthwhile – my two cents

A Cloud Digital Leader certification is not counted as a professional certification and is not included in any of the company-level partner statuses or specializations. This might, however, change in the future.

I would assume that Google has the following objectives for this certification:

  • To provide role-independent entry certifications, also for general management, as in other ecosystems (Azure / AWS Fundamentals)
  • To bring the Google Cloud ecosystem better together with a proper common language and vision, including partners, developers, Google employees and customer decision makers
  • To align business and technical people to work better together, speak the same language and understand high-level concepts in the same way
  • To provide basic sales training to a wider audience so that sales people can feel “certified” like technical people

The certification is valid for three years, but while the basic principles will still apply in the future, the Google Cloud product knowledge will become obsolete pretty quickly.

Was it worth it? For me, definitely yes. I practically went through the material in one afternoon and booked a certification test for the next morning, so not too much time was spent in vain. But as I am already something of a cloud veteran and Google Cloud advocate, I would assume that this would be a more valuable eye-opener for AWS/Azure lovers who have not yet understood the broad potential of Google Cloud. Thumbs up also for all of us business people in the Google ecosystem – this is a must-have entry point for working in our ecosystem.

 

 

About the author:

Anthony Gyursanszky, CEO, joined Codento in late 2019 with more than 30 years of experience in the IT and software industry. Anthony has previously held management positions at F-Secure, SSH, Knowit / Endero, Microsoft Finland, Tellabs, Innofactor and Elisa. Gyursanszky has also served on the boards of software companies, including Arc Technology and Creanord. Anthony also works as a senior consultant for Value Mapping Services. Anthony’s experience covers business management, product management, product development, software business, SaaS business, process management and software development outsourcing. And now Anthony is also a certified Cloud Digital Leader.

 

 

Contact us for more information about Codento services:

#NEXTGENCLOUD: Codento Community Blog: Six Pitfalls of Digitalization – and How to Avoid Them

By Codento consultants

 

Introduction

We at Codento have been working hard over the last few months as consultants on various digitalization projects and have faced dozens of different customer situations. At the same time, we have paused to notice how often the same pitfalls are encountered in these projects, pitfalls that could have been avoided in advance.

The mission of a consulting firm like Codento is arguably to provide a two-pronged vision for our clients: to replicate the successes generally observed and, on the other hand, to avoid pitfalls.

Drifting into avoidable, recurring pitfalls always causes a lot of disappointment and frustration, so we sat down with the entire Codento team of consultants to reflect and put together our own ideas, especially on how to avoid these pitfalls.

A lively and multifaceted communal exchange of ideas was born, which, based on our own experience and vision, was condensed into six root causes and themes:

  1. Starting out by solving the wrong problem
  2. Remaining bound to existing applications and infrastructure
  3. Being stuck with the current operating models and processes
  4. The potential of new cloud technologies is not being optimally exploited
  5. Data is not sufficiently utilized in business
  6. The utilization of machine learning and artificial intelligence does not lead to a competitive advantage

Next, we will go through this interesting dialogue with Codento consultants.

 

Pitfall 1: Starting out by solving the wrong problem

How many Design Sprints and MVPs in the world have been implemented to create new solutions in such a way that the original problem setting and customer needs were based on false assumptions or otherwise incomplete?

Or how many problems more valuable to the business have remained unsolved because they were left in the backlog? Choosing between an off-the-shelf product and custom software, for example, is often the easiest step.

There is nothing wrong with the Design Sprint or Minimum Viable Product methodology per se: they are very well suited to uncertainty and an experimental approach, and they help avoid unnecessary implementation work, but there is certainly room for improvement in which problems they are applied to.

Veera also recalls one situation: “You start solving the problem with an MVP mindset, without thinking very far about how the app should work in different use cases. The application can become a collection of special cases with nothing connecting them. Later, major rework may be required when the original architecture or data model does not stretch far enough.”

Markku smoothly lists the typical problems associated with the conceptualization and MVP phase: “A certain rigidity in rapid and continuous experimentation, a tendency to perfection, a misunderstanding of the end customer, the wrong technology or operating model.”

“My own solution is always to reduce the problem definition to a sub-problem small enough that it is faster to solve and more effective to learn from. At the same time, the mood stays positive when something visible is always achieved,” adds Anthony.

Toni sees three essential steps as a solution: “A lot of different problem candidates are needed. One of them is selected for closer study on the basis of common criteria. The problem definition is then worked on both broadly and deeply. Only then should you go to a Design Sprint.”

 

Pitfall 2: Trapped with existing applications and infrastructure

Things are easy in “greenfield” projects that start with a clean slate, but what do you do when a dusty application and IT environment built up over the years stands in the way of an ambitious digital vision?

Olli-Pekka starts: “Software is not finished until it is taken out of production. Until then, money will keep sinking into it, more or less, and it would be nice to get that money back, either as working time saved or simply as income. If the systems in production are not kept up to date, the costs sinking into them are guaranteed to surpass the benefits sooner or later. This is due to inflation and the exponential development of technology.”

“A really old system that supports a company’s business and is virtually impossible to replace,” continues Jari T. “Its low turnover and the age of its technology mean that the system is not worth replacing. The system will be shut down as soon as the last parts of the business have been phased out.”

“A monolithic system that cannot be renewed part by part comes to mind. Renewing the entire system would cost too much,” adds Veera.

Olli-Pekka outlines three different situations: “Depending on the user base, the pressures for modernization are different, but the need for it will not disappear at any stage. Let’s take a few examples.

Consumer products – There is no market for antiques in this industry unless your business is based on the sale of NFTs from Doom’s original source code, and even then. Or when was the last time you admired Win-XP CDs on a store shelf?

Business products – a slightly more complicated case. The point here is that in order for the system you use to be relevant to your business, it needs to play nicely with the other systems your organization uses. Otherwise, a replacement will be found for it, because manual steps in the process are both expensive and error-prone. However, there is no problem if no one else updates their products either. I would not count on that.

Internal use – no need to modernize? All you have to do here is train the new replacement staff yourself, because no one else teaches your stack anymore. Also, remember to hope that the people you manage to lure into this technological dead end do not all think to peek over the fence. And also remember to set aside a little extra funds for maintenance contracts, as outside vendors may raise their prices when the number of users for their sunset products drops.”

A few concepts immediately come to Iiro’s mind: “Path dependency and the sunk cost fallacy. Could one write a separate blog post about each of them?”

“What are the reasons or inconveniences for different studies?” ask Sami and Marika.

“At least I recall the budgetary challenges, the complexity of the environments, the lack of integration capacity, data security and legislation. So what would be the solution?” Anthony answers and asks at the same time.

Olli-Pekka’s three ideas emerge quickly: “Map your system – you should also use external pairs of eyes for this, because they can spot even the details that your own eye has already grown used to. An external expert can also ask the right questions and fish for the answers. Plan your route out of the trap – you should rarely rush blindly in every direction at the same time. It is enough to cut an opening where the fence is weakest. From there you can then start expanding and building new pastures at a pace that suits you. Invest in know-how – the easiest way to make a hole in a fence is with the right tools. And a skilled worker will cut the opening so that it remains easy to pass through without tearing your clothes. Do not lull yourself into believing you will find this skill in-house, because if that were the case, the opening would already be there. Or the process is broken. In any case, help is needed.”

 

Pitfall 3: Remaining captive to current policies

“Which is the bigger obstacle in the end: infrastructure and applications or our own operating models and lack of capacity for change?” Tommi ponders.

“I would lean towards operating models myself,” says Samuel. “I am strongly reminded of the silo between business and IT, the high level of risk aversion, the lack of resilience, the vagueness of the guiding digital vision, and the lack of vision.”

Veera adds, “Old processes get modeled as they are for a new application, instead of thinking about how to change the processes and benefit from better processes at the same time.”

Elmo immediately lists a few practical examples: “Word + SharePoint documentation is limiting, yet it stays because ‘this is how it has always been done’. Resistance to change means that modern practices and the latest tools cannot be used, so some contributions are never made. This limits the user base, as the organisation’s cross-boundary expertise cannot be put to use.”

Anne continues: “Excel + Word documentation models result in information that is scattered and difficult to maintain. Information flows by e-mail. The biggest obstacle is the culture and the way we do things, not the technology itself.”

“What should I do and where can I get motivation?” Perttu ponders and continues with a proposed solution: “Quick small wins – low-hanging fruit should be picked. The longer the inefficient operation lasts, the more expensive it is to get out of it. The sunk cost fallacy could be loosely linked to this.”

“There are limitless areas to improve.” Markku opens a range of options: “Business collaboration, product management, application development, DevOps, testing, integration, outsourcing, further development, management, resourcing, subcontracting, tools, processes, documentation, metrics. There is no need to be world-class in everything, but it is good to improve the area or areas that have the greatest impact with optimal investment.”

 

Pitfall 4: The potential of new cloud technologies is not being exploited

Google Cloud, Azure, AWS or multi-cloud? Is this the most important question?

Markku answers: “I don’t think so. Financial control metrics move cloud costs away from depreciation to lines higher up the income statement, and the target setting of many companies does not accommodate this, although in reality it would have a very positive effect on cash flow in the long run.”

A few new situations come to Sanna’s mind: “A technology is chosen because it is believed to best suit the needs, without comprehensive enough knowledge and experience of the existing technologies and their potential. One may therefore end up in a situation where a lot of logic and features have already been built on top of the chosen technology when it turns out that another model would have suited the use case better. Real-life experience: ‘With these functions, this can be done quickly’, and two years later: ‘Why wasn’t the IoT hub chosen?’”

Perttu emphasizes: “The use of digital platforms at work (e.g. Drive, Meet, Teams) is closer to everyday business than the cold and technical core of cloud technology. This is especially so as the public debate has recently revolved around the guidelines of a few big companies instructing employees to return to on-site work.”

Perttu continues: “Compared to this, the services offered by digital platforms make operations more agile, enable a wider range of lifestyles and streamline business operations. It must be remembered, of course, that physical encounters are also important to people, but one could assume that experts in any field are best placed to define effective ways of working for themselves. Win-win, right?”

So what’s the solution?

“I think the most important thing is that the cloud capabilities to be deployed are matched to the selected short- and long-term use cases,” concludes Markku.

 

Pitfall 5: Data is not sufficiently utilized in business

Few companies can claim to have the bulk of their data well under control and in good shape. But what are the different challenges involved?

Aleksi explains: “The practical obstacle to the wider use of data in an organization is quite often the poor visibility of the available data. There may be many hidden data sets whose existence is known to only a couple of people. These may only be found by chance by talking to the right people.

Another similar problem is that for some data sets, the content, structure, origin or the way the data was generated is no longer really known – and there is little documentation of it.”

Aleksi continues, “An overly absolute and early-applied business case approach prevents data from being exploited in experiments and development that involve a “research aspect”. This is the case, for example, in many new machine learning use cases: it is not clear in advance what can be expected, or even whether anything usable can be achieved. Thus, such early-stage work is difficult to justify using a normal business case.

It could be better to assess the potential benefits that the approach could have if successful. If these benefits are large enough, you can start experimenting, review the situation continuously, and quickly drop ideas that turn out to be bad. The time for a business case may come later.”

 

Pitfall 6: The use of machine learning and artificial intelligence will not lead to a competitive advantage

It seems to be fashionable these days for business managers to attend various machine learning courses, and a varying number of experiments are underway in organizations. Yet things have not progressed very far, have they?

Aleksi shares his experience: “Over time, the current ‘traditional’ approach has been honed quite well, and there is very little potential left for improvement. The first experiments in machine learning do not produce a better result than the current approach, so it is decided to stop exploring and developing them. In many cases, however, the potential of the current operating model has been almost completely exhausted over time, while on the machine learning side the potential for improvement would reach a much higher level. It is as if we are locked into the current way of working only because the first attempts did not immediately bring improvement.”

Anthony summarizes the challenges into three components: “Business value is unclear, data is not available and there is not enough expertise to utilize machine learning.”

Jari R. wants to plug his own earlier talk at the spring’s business-oriented online machine learning event. “If I remember correctly, I compiled a list of as many as ten pitfalls that fit this topic. They are easy to read in the event material:

  1. The specific business problem is not properly defined.
  2. No target is defined for model reliability or the target is unrealistic.
  3. The choice of data sources is left to data scientists and engineers and the expertise of the business area’s experts is not utilized.
  4. The ML project is carried out exclusively by the IT department itself. Experts from the business area will not be involved in the project.
  5. The data needed to build and utilize the model is left fragmented across different systems, and cloud platform data solutions are not utilized.
  6. The retraining of the model in the cloud platform is not taken into account in the development phase.
  7. The most fashionable algorithms are chosen for the model. The appropriateness of the algorithms is not considered.
  8. The root causes of the errors made by the model are not analyzed; instead, statistical accuracy metrics are relied on blindly.
  9. The model is built to run on the data scientist’s own machine, and its portability to the cloud platform is not considered during the development phase.
  10. The ability of the model to analyze real business data is not systematically monitored and the model is not retrained.”

This would serve as a good example of the thoroughness of our data scientists. It is easy to agree with that list and believe that we at Codento have a vision for avoiding pitfalls in this area as well.

 

Summary – Avoid pitfalls in a timely manner

To keep you from falling into these pitfalls, Codento consultants have promised to offer free two-hour workshops to interested organizations, each focusing on one of these pitfalls:

  1. Digital Value Workshop: Clarified and understandable business problem to be solved in the concept phase
  2. Application Renewal Workshop: A prioritized roadmap for modernizing applications
  3. Process Workshop: Identifying potential policy challenges for the evaluation phase
  4. Cloud Architecture Workshop: Helps identify concrete steps toward high-quality cloud architecture and its further development
  5. Data Architecture Workshop: Preliminary current situation of data architecture and potential developments for further design
  6. Artificial Intelligence Workshop: Prioritized use case descriptions for more detailed planning from a business feasibility perspective

Ask us for more information and we will make an appointment for August, so the autumn will start comfortably, avoiding the pitfalls.

 

#BIZML: Piloting Machine Learning at Speed – Utilizing Google Cloud and AutoML

Piloting machine learning at speed – Utilizing Google Cloud and AutoML

 

Can modern machine learning tools do a week’s work in an afternoon? The development of machine learning models has traditionally been a very iterative process. A traditional machine learning project starts with selecting the data sets and then cleaning and pre-processing them. Only then can the actual development work on the machine learning model begin.

It is very rare, virtually impossible, for a new machine learning model to make sufficiently good predictions on the first try. Indeed, development work traditionally involves a significant number of failures both in the selection of algorithms and in their fine-tuning, known in technical terms as hyperparameter tuning.

All of this requires working time, in other words, money. What if, after cleaning the data, all the steps of development could be automated? What if the whole development project could be carried through as a single, fast-paced one-day sprint?

 

Machine learning and automation

In recent years, the automation of building machine learning models (AutoML) has taken significant leaps. Roughly speaking, in traditional machine learning a data scientist builds a machine learning model and trains it with a large dataset. AutoML, on the other hand, is a relatively new approach in which the machine learning model builds and trains itself using a large dataset.

All the data scientist needs to do is tell the tool what the problem is. This can be a problem in machine vision, pricing or text analysis, for example. However, data scientists will not be made redundant by AutoML models. The workload shifts from fine-tuning the model to validating it and using Explainable AI tools.

 

Google Cloud and AutoML used to solve a practical challenge

Some time ago, we at Codento tested Google Cloud AutoML-based machine learning tools [1]. Our goal was to find out how well the Google Cloud AutoML tool solves the Kaggle House Prices – Advanced Regression Techniques challenge [2].

The goal of the challenge is to build the most accurate possible tool for predicting the selling prices of homes based on their characteristics. The data set used in building the pricing model contained data on approximately 1,400 homes: a total of 80 different parameters that could potentially affect the price, as well as the actual sale prices. Some of the parameters were numerical, some were categorical.

 

Building a model in practice

The data used was pre-cleaned, so the first phase of building the machine learning model was already complete. First, the data set, a file in CSV format, was uploaded as is to the Google Cloud BigQuery data warehouse. The upload took advantage of BigQuery’s ability to identify the database schema directly from the file structure. The AutoML Tabular feature found in the Vertex AI tool was used to build the actual model.

After some clicking, the tool had been told which of the price-predicting parameters were numeric and which were categorical variables. In addition, the tool was told which column contains the value to be predicted. All this took about an hour of work. After that, the training was started and we began waiting for the results. About 2.5 hours later, the Google Cloud robot sent an email stating that the model was ready.
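To make the idea of schema detection and the numeric/categorical split more concrete, here is a minimal sketch in plain Python. It is not BigQuery's or Vertex AI's actual detection logic, only a simple stand-in that labels a column numeric when every non-empty value parses as a number. The column names and rows are hypothetical.

```python
import csv
import io


def infer_column_types(csv_text):
    """Label each CSV column 'numeric' or 'categorical' based on its values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    # Start optimistic: a column stays numeric until a value fails to parse.
    types = {name: "numeric" for name in fieldnames}
    for row in reader:
        for name in fieldnames:
            value = row.get(name)
            if not value:
                continue  # empty or missing cell: no evidence either way
            try:
                float(value)
            except ValueError:
                types[name] = "categorical"
    return types


# Hypothetical sample rows, not taken from the actual competition data.
SAMPLE = """area_m2,neighborhood,sale_price
120,Lakeside,208500
95,Old Town,181500
140,Lakeside,223500
"""

column_types = infer_column_types(SAMPLE)
print(column_types)
```

In the project described here this step was done in the tool's user interface; the sketch only shows the kind of decision being made for each column.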

 

The final result was a positive surprise

The accuracy of the model created by AutoML surprised the developers. Google Cloud AutoML was able to independently build a pricing model that predicts home prices with approximately 90% accuracy. The level of accuracy per se does not differ from the general level of accuracy of pricing models. It is noteworthy here, however, that the development of this model took a total of half a working day.
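The article does not state how the 90% figure was measured. One common reading for a price model is 100% minus the mean absolute percentage error (MAPE). The sketch below assumes that reading and uses invented prices purely for illustration.

```python
def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (0.1 means 10%)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Invented example values, not results from the AutoML model.
actual_prices = [200_000, 150_000, 300_000, 250_000]
predicted_prices = [180_000, 165_000, 330_000, 225_000]

error = mape(actual_prices, predicted_prices)
accuracy = 1 - error
print(f"MAPE: {error:.1%}, accuracy: {accuracy:.1%}")
```

With these numbers every prediction is off by exactly 10%, so the sketch reports a MAPE of 10.0% and an accuracy of 90.0%.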

However, the benefits of GCP AutoML do not end there. This model could be integrated into a Google Cloud data pipeline with very little effort. The model could also be exported as a container and deployed on other cloud platforms.

 

An approach that pays off in the future as well

For good reason, tools based on AutoML can be considered the latest major development in machine learning. Thanks to these tools, the development of an individual machine learning model no longer has to be thought of as a project or an investment. Utilizing the full potential of these tools, models can be built with an approximately zero budget. New forecasting models based on machine learning can be built almost on a whim.

However, the effective deployment of AutoML tools requires a significant initial investment. The entire data infrastructure, data warehouses and lakes, data pipelines, and visualization layers, must first be built with cloud-native tools. Codento’s certified cloud architects and data engineers can help with these challenges.

 

Sources:

[1] Google Cloud AutoML, https://cloud.google.com/automl/

[2] Kaggle, House Prices – Advanced Regression Techniques, https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques/

 

The author of the article is Jari Rinta-aho, Senior Data Scientist & Consultant, Codento. Jari is a consultant and physicist interested in machine learning and mathematics, with extensive experience in utilizing machine learning in nuclear energy. He has also taught physics at several universities and led international research projects. Jari’s interests include ML-Ops, AutoML, Explainable AI and Industry 4.0.

 


#GOOGLECLOUDJOURNEY, Certificates Create Purpose

#GCPJOURNEY, Certificates Create Purpose

Author: Jari Timonen, Codento Oy

What are IT certifications?

Personal certifications provide an opportunity for IT service companies to describe the level and scope of expertise of their own consultants. For an IT service provider, certifications, at least in theory, guarantee that a person knows their stuff.

The certificate test is performed under controlled conditions and usually includes multiple-choice questions. In addition, there are also task-based exams on the market, in which case the required assignment is done freely at home or at work.

There are many levels of certifications for different target groups. Usually they are hierarchical, so you can start even a completely unfamiliar topic from the easiest level. At the highest level are the most difficult and most respected certificates.

At Codento, personal certifications are an integral part of self-development. They are one measure of competence. We support the completion of certificates by letting you spend working time on studying and by paying for the courses and the exam itself. Google’s selection includes a certification of the right level and subject area for everyone.

An up-to-date list of certifications can be found on the Google Cloud website.

Purposefulness at the center

Collecting certificates merely to hang them on the wall is not a very sensible approach. A certification should be seen as a goal that gives structure to your studying. This means that your self-development has a common thread to follow.

The goal may be to complete only one certificate or, for example, a planned path through three different levels. This way, self-development is much easier than reading an article here and there without a goal.

Schedule as a basis for commitment

After setting the goal, a schedule for the exam should be chosen. This varies a lot depending on your starting level and the certification being pursued. If you already have existing knowledge, studying may be a mere recap. Generally speaking, a few months should be set aside for studying. Studying over a longer period makes the material stick better and is thus more useful.

Practice exams should be taken from time to time. They help to determine which parts of the exam need more study and which areas you have already mastered. Practice exams should be taken in the early stages of studying, even if the result is poor. This is how you gain experience for the actual exam, so that its questions do not come as a complete surprise.

The exam should be booked approximately 3-4 weeks before the planned completion date. During this time, you can take enough practice exams and strengthen your skills.

Reading both at work and in your free time

It is a good idea to start by understanding the scope of the exam. This means finding out how the exam is weighted across topics and listing them. It is also a good idea to make a rough study plan, scheduled by topic area.

After the plan, you can start studying one topic at a time. Topics can be approached from the top down, that is, first try to understand the whole, then go into the details. For cloud service certifications, one of the most important learning tools is hands-on practice. Things should be done by yourself, and not just read from books. The memory trace is much stronger when you get to experiment with how the services work yourself.

Reading and doing should happen both at work and in your free time. It is usually a good idea to set aside time in your calendar to study. Do the same for your free time, if possible. That way, the studying is much more likely to actually get done.

Studying regularly is worth it

Over the years, I have completed several different certifications in various subject areas: Sun Microsystems, Oracle, AWS, and GCP. In all of these, your own passion and desire to learn are decisive. Previous certifications always provide a basis for the next one, so studying becomes easier over time. For example, if you have completed AWS Architect certifications, you can build on them when working toward the corresponding Google Cloud certifications. The technologies are different, but there is little difference in architecture, because cloud-native architecture is not tied to a particular cloud.

The most important thing I’ve learned: Study regularly and one thing at a time.

Concluding remarks: Certificates and hands-on experience together guarantee success

Certificates are useful tools for self-development. On their own they do not guarantee full competence, but they provide a good basis for striving to become a professional. Certification combined with everyday work is one of the strongest ways to learn about modern cloud services, and it benefits everyone (employee, employer and customer) regardless of skill level.

The author of the blog, Jari Timonen, is an experienced software professional with more than 20 years of experience in the IT field. Jari’s passion is to build bridges between the business and the technical teams, which he did, for example, in his previous position at Cargotec. At Codento, he is in his element piloting customers towards future-compatible cloud and hybrid cloud environments.

#BIZML, Business-driven Machine Learner with Google Cloud

Business-driven Machine Learner with Google Cloud: Multilingual Customer Feedback Classifier

Author: Jari Rinta-aho, Codento

At Codento, we have rapidly expanded our services to demanding implementations and services for data and machine learning. When discussing with our customers, the following business goals and expectations have often come to the fore:

  • Uncovering hidden patterns in data
  • Automation of analysis
  • Minimizing human error
  • New business models and opportunities
  • Improving and safeguarding competitiveness
  • Processing multidimensional and diverse data

In this blog post, I will go through the lessons from our recent customer case.

Competitive advantage from a deep understanding of customer feedback

A very concrete business need arose this spring for a Finnish B-to-C player: huge amounts of customer feedback data were coming in, but how could that feedback be used intelligently to make the right business decisions?

Codento recommended the use of machine learning

Codento’s recommendation was to take on the challenge with a machine learning approach and Google Cloud off-the-shelf features, in order to get the customer feedback classifier ready within a week.

The goal was to automatically classify short customer feedback texts into three baskets: positive, neutral, and negative. The feedback consisted mainly of short Finnish texts. However, there were also a few texts written in Swedish and English. The classifier therefore also had to be able to recognize the language of the source text automatically.

Can you really expect results in a week?

The project was both ambitious and on a tight schedule. There was no time to waste: in practice, the results had to be obtained on the first try. Codento therefore decided to make the most of ready-made cognitive services.

Google Cloud plays a key role

It was decided to implement the classifier by combining two ready-made tools found in the Google Cloud Platform: the Translate API and the Natural Language API. The idea was to machine-translate the texts into English and then determine their tone. Because the Translate API is able to automatically detect the source language from about a hundred different languages, the tool met the requirements, at least on paper.
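The two-step design can be sketched as a small pipeline. This is a minimal illustration and not the code used in the project. The translation and sentiment steps are passed in as plain functions, and the stand-ins below use invented lookups instead of calling Google Cloud. The Natural Language API reports sentiment as a score between -1.0 and 1.0. The width of the neutral band (0.25) is an assumption chosen for this example.

```python
def to_basket(score, neutral_band=0.25):
    """Map a sentiment score in [-1.0, 1.0] to one of three baskets."""
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


def classify_feedback(text, translate, score_sentiment, neutral_band=0.25):
    """Translate the text to English, score its tone, and pick a basket."""
    english = translate(text)
    return to_basket(score_sentiment(english), neutral_band)


# Hypothetical stand-ins for the two cloud services (invented values).
def fake_translate(text):
    lookup = {
        "Ei valittamista": "No complaints",
        "Erinomainen palvelu": "Excellent service",
    }
    return lookup.get(text, text)  # unknown texts are assumed to be English


def fake_sentiment(english_text):
    scores = {
        "No complaints": 0.1,
        "Excellent service": 0.9,
        "Terrible delivery": -0.8,
    }
    return scores.get(english_text, 0.0)


feedback = ["Ei valittamista", "Erinomainen palvelu", "Terrible delivery"]
baskets = [classify_feedback(t, fake_translate, fake_sentiment) for t in feedback]
print(list(zip(feedback, baskets)))
```

Keeping the two services behind plain function arguments makes the thresholding logic easy to test without cloud credentials; in a real setup the stand-ins would be replaced by calls to the actual APIs.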

Were the results useful?

Random sampling and manual work were used to validate the results. From the existing data, 150 texts were selected at random for the validation of the classifier. First, these texts were sorted by hand into three categories: positive, neutral, and negative. After that, the same classification was made with the tool we developed. In the end, the results of the tool and the manual classification were compared.
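The random draw itself could look something like the following sketch. The feedback list and the seed are hypothetical; a fixed seed is used only so that the same sample can be reproduced later.

```python
import random


def draw_validation_sample(texts, sample_size=150, seed=42):
    """Pick sample_size distinct texts at random, reproducibly."""
    rng = random.Random(seed)  # local generator, leaves global state alone
    return rng.sample(texts, sample_size)


# Hypothetical placeholder for the full set of feedback texts.
all_feedback = [f"feedback text {i}" for i in range(5000)]
validation_sample = draw_validation_sample(all_feedback)
print(len(validation_sample))
```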

What was achieved?

The tool and the human validator agreed on about 80% of the feedback. In no case did they reach opposite conclusions. The validation results were summarized in a confusion matrix.

The numbers 18, 30, and 75 on the diagonal of the confusion matrix are the counts of feedback texts on which the validator and the tool agreed about the tone. In a total of 11 texts, the validator considered the tone positive but the tool neutral.
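For readers unfamiliar with confusion matrices, the sketch below shows how one is built from two lists of labels and how the agreement rate is read off its diagonal. The six labelled texts are a toy example, not the project data. Only the final calculation uses the figures reported above, where the diagonal sums to 123 of 150 texts, or 82%.

```python
from collections import Counter

LABELS = ("negative", "neutral", "positive")


def confusion_matrix(validator_labels, tool_labels):
    """Rows follow the validator's label, columns the tool's label."""
    counts = Counter(zip(validator_labels, tool_labels))
    return [[counts[(v, t)] for t in LABELS] for v in LABELS]


def agreement_rate(matrix):
    """Share of items on the diagonal, i.e. where both sides agree."""
    total = sum(sum(row) for row in matrix)
    agreed = sum(matrix[i][i] for i in range(len(matrix)))
    return agreed / total


# Toy data for illustration only.
validator = ["positive", "positive", "neutral", "negative", "positive", "neutral"]
tool = ["positive", "neutral", "neutral", "negative", "positive", "neutral"]

toy_matrix = confusion_matrix(validator, tool)
toy_rate = agreement_rate(toy_matrix)

# Agreement computed from the diagonal reported in the article.
reported_agreement = (18 + 30 + 75) / 150

print(toy_matrix, round(toy_rate, 2), reported_agreement)
```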

 

The most significant factor explaining the tool’s different interpretation is the cultural context of the wording of the customer feedback: when a Finn says “No complaints”, it is praise.

Coming from an American, the same words would be neutral feedback. This cultural difference alone is sufficient to explain why the largest single error group was “positive in the view of the validator, neutral in the view of the tool.” Otherwise, the errors are explained by the difficulty of distinguishing between borderline cases. It is impossible to say unambiguously when slightly positive feedback turns neutral and vice versa.

Utilizing the solution in business

The data-validated approach was well suited to solving the challenge and is an excellent starting point for understanding the nature of feedback in the future, developing further models for more detailed analysis, speeding up analysis and reducing manual work. The solution can also be applied to a wide range of similar situations and needs in other processes or industries.

The author of the article is Jari Rinta-aho, Senior Data Scientist & Consultant, Codento. Jari is a consultant and physicist interested in machine learning and mathematics, who has extensive experience in utilizing machine learning, e.g. in nuclear technologies. He has also taught physics at the university and led international research projects.