What Is Data Science Good For?
In light of the recent AI hype, I often hear ideas to use neural networks or ML to solve problems that are better off without them. Surprisingly, those ideas often come from “the tech guys” who should’ve known better but for some reason started to believe in silver bullets. There’s nothing “magical” about machine learning and data science, and the fundamental ideas behind them are simple. Here I’ll try to explain to business owners, in simple words, when ML techniques shine and when to choose another approach.
It’s all about averages
The first thing everyone should understand is that, for the most part, data science is about approximation. ML algorithms operate on assumptions, not on exact instructions. A funny example: if you created a calculator using a neural net, it would solve 2 + 2 as "is likely 4". In fact, you’ll never get an exact answer, but it is always “good enough”. We consider a model good if it gets most of the answers right on data it has never seen, and we always expect errors. (Go away nerds, I know what you’d say)
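To make the calculator joke concrete, here is a minimal sketch. It is not a real neural net, just a single linear unit (the simplest stand-in for one) trained on noisy examples of addition. The function name, the noise level and the training settings are all my own assumptions for illustration.

```python
import random

def train_adder(samples=200, epochs=20, lr=0.001, seed=0):
    """Fit y ≈ w1*a + w2*b to noisy examples of addition with plain SGD."""
    rng = random.Random(seed)
    # Training data: pairs of numbers and their sum, with a little noise,
    # because real-world labels are rarely perfectly clean (assumption).
    data = []
    for _ in range(samples):
        a, b = rng.uniform(0, 10), rng.uniform(0, 10)
        data.append((a, b, a + b + rng.gauss(0, 0.1)))

    w1, w2 = 0.0, 0.0  # the model starts out knowing nothing
    for _ in range(epochs):
        rng.shuffle(data)
        for a, b, target in data:
            error = (w1 * a + w2 * b) - target
            # Nudge each weight slightly in the direction that reduces the error.
            w1 -= lr * error * a
            w2 -= lr * error * b
    return w1, w2

w1, w2 = train_adder()
prediction = w1 * 2 + w2 * 2
print(f"2 + 2 is likely {prediction:.4f}")
```

The model never sees the rule "add the numbers". It only sees examples, so its answer lands close to 4 rather than exactly on it.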
What do we use ML for?
Approximation works well at scale. With enough voice samples we can build speech recognition. With enough health data, a smart watch will be able to predict a heart attack before it happens. But it is not always about scale or big data - sometimes a “good enough” choice is all you need because it works in 90% of cases. Some recommendation engines and trading algorithms work this way: it is enough to cut losses, and the training data may expire by the next period. This is why most of the time ML is used in combination with other techniques as part of a bigger strategy.
When to choose AI?
Of course, everything depends, but as a rule of thumb: “If your business can bear some small operational errors in automated decisions, then ML can do wonders for you. Especially if you know how to mitigate such errors.” As part of a system, strategy or algorithm, the applications of ML may be endless. ML by itself, as a standalone algorithm, is usually worthless, because you need to “mitigate AI errors in the same way as human errors”.
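One common way to mitigate such errors is to automate only the decisions the model is confident about and hand the rest to a person, the same way a junior employee escalates unclear cases. A minimal sketch, assuming the model reports a confidence score between 0 and 1 (the `Decision` class, the `route` function and the 0.9 threshold are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class Decision:
    label: str         # what the model suggests, e.g. "approve"
    confidence: float  # the model's own score between 0 and 1 (assumed available)

def route(decision: Decision, threshold: float = 0.9) -> str:
    """Automate only confident decisions; send everything else to a person."""
    if decision.confidence >= threshold:
        return f"auto:{decision.label}"
    return "human_review"

print(route(Decision("approve", 0.97)))  # confident, so it is automated
print(route(Decision("reject", 0.62)))   # unsure, so a human takes a look
```

The threshold is a business decision, not a technical one: the more an error costs you, the higher you set it and the more work stays with people.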
When not to use ML/AI?
Instead of debating AI cultists, I’ll just remind you that there’s no “silver bullet”. If someone tells you that something (AI, blockchain, etc.) solves all your problems - run away.
The rule of thumb here is what every good decision maker does: consider everything through a “gains vs losses” perspective. Consider what could be improved or mitigated with AI, how much you would spend and what you would gain. Yes, it is that simple.
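The “gains vs losses” check fits in a few lines of arithmetic. This is a rough sketch with made-up numbers, assuming every automated task either succeeds and earns a fixed value or fails and costs a fixed amount (all names and figures are hypothetical):

```python
def expected_net_gain(tasks, value_per_success, error_rate, cost_per_error, project_cost):
    """Rough net result of automating `tasks` decisions with an imperfect model."""
    successes = tasks * (1 - error_rate)
    errors = tasks * error_rate
    return successes * value_per_success - errors * cost_per_error - project_cost

# Same model, same error rate; only the price of a mistake differs.
cheap_mistakes = expected_net_gain(10_000, 5.0, 0.25, 2.0, 20_000)
costly_mistakes = expected_net_gain(10_000, 5.0, 0.25, 40.0, 20_000)
print(cheap_mistakes)   # 12500.0
print(costly_mistakes)  # -82500.0
```

The same model with the same error rate is a win in one business and a loss in another. The difference is only in what an error costs.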
Real-life examples
I am not at liberty to discuss my clients, but I can describe similar situations. Consider the following as common cases unrelated to any particular person or business:
A negative example
I use LLMs daily, and in my case ~25-40% of their suggestions are wrong. That means 60-75% of my routine is offloaded to LLMs. Good, huh? So, depending on the task, I usually give a well-trained LLM an optimistic 25% error chance, and if a business use case cannot bear that, the idea is not viable. A year ago I gave a similar conclusion to a business owner who wanted to create AI reports in their ERP. The problem was in the way they wanted to implement it, so they got a “no” from me. Eventually someone promised them that GPT would solve it, so they went with the hype and spent 250k on something that doesn’t work the way they want. I am not at liberty to share details, but you can guess the cost of a hallucination, and in this case it turned out higher than that of a human error. The funny thing is that the problem is in the implementation and the requirements the business set, not in AI itself, and in fact AI could help a lot if used the right way. So the story goes as usual - a business wanted some magic, so a yes-man became a magician, and it is not his fault (seriously, it isn’t).
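“Cannot bear it” can be made concrete with a break-even check: how often may the model be wrong before errors eat all the value? A minimal sketch, ignoring project cost; the 25% is my optimistic error estimate from above, while the money figures and the function name are hypothetical:

```python
def break_even_error_rate(value_per_success: float, cost_per_error: float) -> float:
    """Error rate at which the value of successes equals the cost of errors."""
    return value_per_success / (value_per_success + cost_per_error)

LLM_ERROR_RATE = 0.25  # optimistic estimate, not a measured constant

# A routine task where a wrong suggestion is cheap to spot and fix.
routine = break_even_error_rate(value_per_success=5.0, cost_per_error=2.0)
# A report where a hallucinated number is expensive.
report = break_even_error_rate(value_per_success=5.0, cost_per_error=40.0)

print(f"routine: viable={LLM_ERROR_RATE < routine}")
print(f"report:  viable={LLM_ERROR_RATE < report}")
```

If the estimated error rate sits above the break-even rate, no amount of hype fixes the math.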
A positive example
A popular web resource hired me to implement a better system for working with search engines (Google, Bing, etc.). You might know that Google has a “crawl budget” per site, and their website at the time had ~12 million unindexed pages and growing. The questions: how would you prioritize which pages get into the search index first? Which pages should be updated daily, monthly, etc.? The business understood some of the parameters for improving SEO, but previous attempts to implement them had failed. So we wrote an algorithm to classify and prioritize the pages. It was a complex task that consisted of different improvements. Training a neural net was a small but important part: integrating machine learning into their system was ~10% of the overall work. As a result, the business got +3% ad revenue in a month and +15% six months later, which were big numbers at scale. AI was just one part of an algorithm that was integrated into their business processes. It was 5 years ago, and my only regret is that this algorithm is not my IP, though it could have been if I had been more experienced back then.
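To give a feel for what “ML as a small part of an algorithm” means, here is a toy sketch. It is not the client’s algorithm: the signals, the weights and every name are invented for illustration. A model supplies one number per page (`predicted_value`), and plain, boring code decides what fits into the crawl budget.

```python
import heapq
from dataclasses import dataclass

@dataclass
class Page:
    url: str
    predicted_value: float  # hypothetical model score in [0, 1]
    days_since_crawl: int

def priority(page: Page) -> float:
    # Staleness saturates at 30 days so very old pages cannot dominate.
    staleness = min(page.days_since_crawl, 30) / 30
    # The weights are arbitrary assumptions; in practice they would be tuned.
    return 0.7 * page.predicted_value + 0.3 * staleness

def pick_for_crawl(pages, budget):
    """Return the `budget` highest-priority pages, best first."""
    return heapq.nlargest(budget, pages, key=priority)

pages = [
    Page("/popular-article", 0.9, 10),
    Page("/old-tag-page", 0.1, 400),
    Page("/fresh-news", 0.8, 1),
    Page("/empty-profile", 0.05, 3),
]
chosen = pick_for_crawl(pages, budget=2)
print([p.url for p in chosen])
```

The model only fills in one field. Everything around it (the budget, the staleness rule, the ranking) is ordinary engineering, and that is where most of the work goes.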
Peace ✌️