Let's talk Data

A few weeks back I was speaking with a colleague who had just bought a new laptop. Unsurprisingly, a lot had changed about laptops since his last purchase. They’re slimmer and faster, the screens brighter, the battery life longer. They hold more data, can connect to your wireless headphones with ease, and built-in disc drives are long gone. Most everything about a laptop purchased today is better than one purchased in 2016.

But my colleague did have one complaint that felt fairly unexpected — the machine’s built-in spell check function was markedly worse. Not only was it missing mistakes, it was actively recommending spelling, punctuation and tenses that would have rendered the content grammatically incorrect.

Now, spell check may very well be the first automation software most people over the age of 30 ever interacted with. It was the industry’s first consumer-focused foray into Artificial Intelligence: the simple recognition of repeating patterns based on historical data. It’s incredibly useful and difficult to improve upon, so it has remained part of our personal computers ever since. But we’re used to a reality in which ubiquitous and long-lasting technologies get better as time goes on. The camera gets better with each iPhone generation. Gas mileage continues to improve in cars. Home security systems have become more accessible and user-friendly. Even the most basic toaster ovens now have air fryers built in. Consumers expect the technologies they use to improve over time. So why would a technology as basic and foundational as spell check take steps backward, particularly at a time when technological innovation is advancing across the board?

The answer is simple: data, data, and more data.

On its surface, this assertion might not make a whole ton of sense. We live in the most data-rich era in human history. Shouldn’t more data mean more insight? Shouldn’t it mean smarter decisions, and better outcomes? Sure, in theory. But not if it’s the wrong data – or worse, bad data. Artificial Intelligence is only as good as the data on which it was trained.

Which brings us back to spell check. The earliest iterations were trained on a wide array of published texts that had passed a stringent set of editorial standards simply to be published in the first place. Today, nearly everyone has a personal publishing platform: the Internet. Even publications with strict editorial standards have had to relax them in the name of the 24-hour news cycle and consumer demand for information delivered quickly. In short, there is vastly more written text available in the world today than there was 30 years ago, and the average editorial quality of that text is lower. The result? Legacy software that comes standard with every personal computer no longer performs like it once did.
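To make the point concrete, here is a minimal sketch of a frequency-based spell checker. It is a toy, not how any shipping spell check actually works: the two corpora are invented for illustration, and `train` and `suggest` are hypothetical helper names. The only thing it is meant to show is that the same code gives a worse answer once unedited text outnumbers edited text in the training data.

```python
from collections import Counter
from difflib import get_close_matches


def train(corpus):
    """Count how often each lowercase word appears in the training text."""
    counts = Counter()
    for sentence in corpus:
        counts.update(sentence.lower().split())
    return counts


def suggest(word, counts):
    """Among known words that resemble `word`, return the most frequent one."""
    close = get_close_matches(word, list(counts), n=5, cutoff=0.6)
    if not close:
        return word  # nothing similar was seen in training; leave the word alone
    return max(close, key=lambda candidate: counts[candidate])


# Hypothetical corpora: a small edited one and a larger unedited one.
edited = ["we will definitely ship it"] * 5
unedited = ["we will definately ship it"] * 20

clean_model = train(edited)
noisy_model = train(edited + unedited)

clean_choice = suggest("definatly", clean_model)
noisy_choice = suggest("definatly", noisy_model)
print(clean_choice, noisy_choice)
```

The model trained only on edited text recommends the correct spelling, while the model trained on the mixed corpus recommends the misspelling, simply because the misspelling is now the more common pattern.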

Why is data integrity important?

When people think of data integrity, they’re typically envisioning one of two things. The first is privacy: everything involved in keeping customer information safe and secure, so that bad actors cannot access that data or use it for illegitimate purposes. This has become table stakes across the industry, and anyone who aspires to be a provider of modern software has built enough internal controls, in both product and practices, to ensure that data is governed in accordance with industry best practices.

Data Integrity is critical to more accurate and useful outcomes in AI

The second is about the visualization of data, most often referred to as insights. This is where both the challenge and the opportunity lie. We need to know how and why a customer is consuming a product or service in order to deliver a continually improving experience. Again, these insights are only as good as the data being analyzed. The accuracy of those insights is paramount to building legitimacy and trust with the user.

As a runner, I’ve always found it helpful to have a Garmin watch with me that can track my vitals. I’ve invested in six or seven over the years, upgrading to new models as they’ve become available. I find the insights the watch provides about my health to be invaluable; long-distance running is hard on the body, and monitoring one’s own health is critical. Recently, I bought the latest and greatest watch. But on my next run, I was both surprised and concerned to find that my heart rate was unusually high. This continued for several more runs, and I had long business travel on the horizon. I was worried enough that I drove myself to urgent care to get checked out before I got on a plane and flew thousands of miles away from home. The doctor gave me a clean bill of health, and I later came to find that my new watch simply had a software glitch. Inaccurate insights erode trust and confidence in the technology. At least for me, it was enough to make me consider abandoning a brand I’d been loyal to for nearly 20 years.

What has changed, and why does that matter?

In the early days of data analysis, the incoming data was highly structured. It was a simple matter of math and computation that would spit out a chart or insight. As the world has evolved to produce more unstructured data, the statistical models built on that data can sometimes misinterpret it. For example, I was driving with my co-founder Henry in his Tesla recently, and he expressed concern that a few of the new software updates seemed to have regressed the Full Self-Driving (FSD) experience in his car. It was enough to make him worry about using the feature at all. A lot of engineering goes into improving the algorithms and models, but if the accuracy of the data on which they’re built is compromised, then instead of being helpful, automation simply creates new pain points.

The influx of data from numerous devices is only useful with proper analysis

There is no such thing as too much data in today’s world – if we can harness it in a meaningful way. Otherwise, the sheer volume can hurt our ability to analyze and act on that data in the right way. Every connected device in use spits out tons of data. There are millions of sensors, industrial machines and IoT devices generating more data than we can possibly care about. If we can’t develop the intelligence to glean the right insights and act on them effectively, then the data itself is meaningless.

The world of statistical models and machine learning is rife with experimentation and innovation that is consistently improving algorithms and outcomes. I’m optimistic that over time, they will continue to get better. The challenge is ingesting new information. Often it is the incoming data sets that fall outside the boundaries of what the model was trained on that lead to erroneous behavior. To customers it feels regressive, and they become upset. Trust erodes.
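One simple way to picture "outside the boundaries" is a range check: remember the minimum and maximum of each input seen during training, and flag new inputs that fall outside them before trusting a prediction. The sketch below is a deliberately basic illustration under that assumption. The feature names and readings are made up, `fit_ranges` and `out_of_range` are hypothetical helpers, and real systems typically use more sophisticated drift and anomaly detection.

```python
from dataclasses import dataclass


@dataclass
class Range:
    low: float
    high: float


def fit_ranges(rows):
    """Record the min and max of every feature seen in the training rows."""
    ranges = {}
    for row in rows:
        for name, value in row.items():
            if name not in ranges:
                ranges[name] = Range(value, value)
            else:
                seen = ranges[name]
                seen.low = min(seen.low, value)
                seen.high = max(seen.high, value)
    return ranges


def out_of_range(row, ranges):
    """List features that are unknown or fall outside the training range."""
    return [
        name
        for name, value in row.items()
        if name not in ranges
        or not (ranges[name].low <= value <= ranges[name].high)
    ]


# Hypothetical training data and one suspicious new reading.
training = [
    {"heart_rate": 58.0, "pace_min_per_km": 5.4},
    {"heart_rate": 162.0, "pace_min_per_km": 4.1},
]
ranges = fit_ranges(training)
flagged = out_of_range({"heart_rate": 231.0, "pace_min_per_km": 4.8}, ranges)
print(flagged)
```

A flagged input does not mean the reading is wrong, only that the model has never seen anything like it, so its output deserves a second look before anyone acts on it.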

At Ushur, we get a firsthand look at the importance of maintaining trust every day. Customers in the finance, healthcare and insurance spaces are incredibly risk averse. For the vast majority of professionals working in these industries, it is better not to take any action than to take an action that might lead to business risk. It’s one of the most commonly referenced barriers to the adoption of new technology within these highly regulated industries. Attention to detail, great care and accurate results are a must. If a hospital is applying AI to read a CT scan that will determine someone’s health diagnosis, that result simply cannot be anything less than 100% accurate.

But fascinating and life-altering advancements have been made in the way machines are being trained to detect diseases or conditions that the human eye cannot. As long as the trained models are incorporated correctly, the outcomes we can drive for real people will be both predictable and transformative. But 80% accuracy won’t do the trick. Even 95% accuracy won’t cut it. Nobody wants to be the victim of that 5%.
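A quick back-of-the-envelope calculation shows why those percentages matter. The sketch below assumes a hypothetical volume of 10,000 scans and treats accuracy as the fraction of scans read correctly; both are illustrative assumptions, not figures from any real deployment.

```python
def expected_misreads(total_cases, accuracy):
    """Expected number of wrong results at a given accuracy (0.0 to 1.0)."""
    return round(total_cases * (1.0 - accuracy))


scans = 10_000  # hypothetical volume, for illustration only
for accuracy in (0.80, 0.95, 0.99):
    misreads = expected_misreads(scans, accuracy)
    print(f"{accuracy:.0%} accurate: about {misreads} of {scans} scans misread")
```

At this assumed volume, 95% accuracy still leaves roughly 500 people with a wrong result.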

The way we leverage data, how we analyze and present it to others, and the actions we take on those insights represent a massive opportunity for the world. But they also pose a huge risk if done incorrectly, sloppily or thoughtlessly. These are powerful tools and applications being put into the hands of the general public. If the data on which our assumptions are based lacks integrity, the results will be ineffectual at best and catastrophic at worst. That is why the next frontier of innovation in AI revolves around eliminating bias and hallucinations and defining guardrails for LLMs, along with considerations for data privacy and data security. It’s all about data!


Simha Sadasiva
