How to find & use deviation information hidden in unstructured data


Unravelling threads and finding patterns in data to understand quality deviations is fundamental to manufacturing productivity. Yet here’s a surprising fact: only 3% of the data associated with quality deviations is held in structured databases. That means 97% is hidden in your organisation’s unstructured data and can't be found. Finding the information and linking the reasons for deviations is hard because there’s no way to discover what’s in unstructured data at scale and in detail.

Life Sciences technology veteran John Harris and Tom Owen, Solutions Consultant at Exonar, spoke at Pharma Tech Integrates. They explored how to discover and reveal what’s in an organisation’s structured and unstructured data estate at petabyte scale, in order to increase data management and manufacturing efficiencies and drive business value.

Watch the live presentation or, if you don't have time, skip to the full transcript below:

John Harris: Hello everyone – great to see you all today. I’m John Harris, former CTO at Glaxo, Mundipharma and other life sciences organisations, currently advising an exciting startup Exonar. I’m speaking alongside my colleague from Exonar, Tom Owen.

Tom Owen: Hello everyone.

John: Today Tom and I will be talking about how to unravel the tangled threads in your unstructured data to find and use quality and deviation management information.

Exonar is developing a ground-breaking data discovery technology that organisations globally, including pharma companies, are using to find and understand their data at scale, so they can keep it safe and realise its value.

Every organisation faces the same problem: not knowing what's in their data, particularly across legacy, unmanaged and unstructured data stores. We think there's a huge opportunity to find, secure and use unstructured data to measure and improve operational processes and supply chain quality.

As a bit of background, I was a client of Exonar’s in the past and was intrigued and impressed by their software. It’s not that often that a technology comes along that changes the way you think about an issue, but Exonar had done exactly that in terms of how you can use your data. For too long everyone has thought of data as needing to be structured, managed, processed and catalogued before it can be used to derive insight, but Exonar changes the game and allows you to see, interrogate and use both structured and unstructured data for business insights and value.

As an example, we’ve been working with a large pharma company on a data cleansing exercise which is critical to a merger & acquisition. Whether the data’s in SharePoint, SAP, email or any other database, the purpose of the project is to make sure the right data moves with the right entities.

I’ve also seen cases where we used Exonar’s technology to find technical and difficult R&D data. We needed to find a patent lost in time, which traditionally would have been found through people and long-term organisational memory: that is, by interviewing the people involved or trawling through as much old paperwork as we could get our hands on until we eventually found the answer. In this case we gave Exonar a real-life challenge that had taken us 2 months to solve, yet using Exonar we found the answer in 10 minutes. It was a game changer, and suddenly we saw that you could ask different questions of your data in a different way.

We believe this has huge overlaps with technical manufacturing, and we can see many use cases to which it could be applied. When you look at deviations and quality management processes, typically 3% of the data that can be interrogated is in structured systems, which means of course that 97% is in unstructured data, so just the ability for a quality officer to find the root cause of a deviation is powerful. More than that, you can’t usually go deep into related events and information outside the normal quality systems. So just imagine being able to search across the estate and join the dots. For the first time we can do some of this interrogation of unstructured data using Exonar.

I’m going to get my colleague Tom to show you how that might work, live in the Exonar technology.

Tom is using the Exonar demo system, which is populated with data from public sources: the Enron and Sony hacks and WikiLeaks. We are going to use an example related to manufacturing or engineering processes. In this case it’s actually turbines, as this is the data in the demo system that’s the most analogous to manufacturing.

So, Tom, imagine I’m an engineer who’s had a machine failure or quality issue with a turbine, and I want to understand if this has happened elsewhere in the past, so I can understand why it’s happening in my plant and what the solution was. How would you help me?

Tom: OK, first let me start by showing the extent of my data. There’s 10TB in the estate and here’s how it’s broken down in the Overview Dashboard. This is an aggregate of all the data in my organisation, structured and unstructured. However, I’m going to be looking for only a handful of documents amongst that massive 10TB.

John: Tom, why don’t you search for a list of turbine faults.

Tom: OK, there are various ways I can try to find the data, for instance by clicking through any of these charts, but I’m going to create a search with a list of known turbine faults.

The system has returned 102 documents that relate to my search, which I’m going to visualise as charts so it’s easier to understand what’s in them.


You’ll see a timeline of when the documents were created, the author names, and the words that topic extraction has pulled out of the documents.

I’m going to click on the word ‘turbines’ in the Topics which takes me to a smaller subset of documents relating to turbines.

In terms of authors, I recognise Michelle as previously working at my site. If I click on her name, I’m down to only 2 documents.
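As an illustration of the narrowing described above, here is a minimal Python sketch of a keyword search followed by topic and author facets. It is not taken from the talk and is not Exonar's API: the `Doc` record, the `search`, `facet_counts` and `refine` functions, and all of the sample documents are hypothetical and exist only to show the idea.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    # Hypothetical record; a real index would hold far more metadata.
    title: str
    author: str
    topics: frozenset
    text: str


def search(docs, terms):
    """Return docs whose text mentions at least one term (case-insensitive)."""
    lowered = [t.lower() for t in terms]
    return [d for d in docs if any(t in d.text.lower() for t in lowered)]


def facet_counts(docs):
    """Count documents per author and per topic, as a dashboard chart might."""
    authors = Counter(d.author for d in docs)
    topics = Counter(t for d in docs for t in d.topics)
    return authors, topics


def refine(docs, topic=None, author=None):
    """Narrow a result set by a chosen topic and/or author facet."""
    return [
        d for d in docs
        if (topic is None or topic in d.topics)
        and (author is None or d.author == author)
    ]


# Made-up sample data, for illustration only.
DOCS = [
    Doc("Site A monthly report", "Michelle", frozenset({"turbines"}),
        "Tower vibration noted; gearbox oil pressure outstanding."),
    Doc("Site B monthly report", "Michelle", frozenset({"turbines"}),
        "Gearbox oil pressure on two turbines; fine filter installed."),
    Doc("Turbine inspection", "Priya", frozenset({"turbines"}),
        "Tower vibration within limits."),
    Doc("Pump review", "Alex", frozenset({"pumps"}),
        "Gearbox oil pressure low on pump skid."),
    Doc("Canteen menu", "Sam", frozenset({"catering"}),
        "Soup of the day."),
]

hits = search(DOCS, ["tower vibration", "gearbox oil pressure"])
authors, topics = facet_counts(hits)
turbine_hits = refine(hits, topic="turbines")
shortlist = refine(turbine_hits, author="Michelle")
```

Each step only filters the previous result set, which is why the shortlist shrinks with every click. The sketch scans every document in memory; at estate scale this would need a prebuilt index.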

John: Bear in mind that what you’re looking at here is live data; this isn’t a canned demo. So we’ve gone from a million documents to 2 in just a few seconds. The system is incredibly powerful and incredibly fast. In most systems this kind of search would take hours.

Tom: Thanks John. I’m going to click into the first of the two reports here and can see we don’t have much information on faults except for tower vibration – and we can see we’ve got some keywords around it.

On the left hand side you can see the body of the document we are looking at, in plain text, and on the right hand side is a list of the keywords we searched for and where the system has found them in the document.
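The keyword panel described above can be pictured as a list of matches with their positions in the plain text. The following is a rough sketch under that assumption, not a description of how Exonar implements it; `keyword_hits` and the sample text are hypothetical.

```python
import re


def keyword_hits(text, keywords, context=30):
    """Return (keyword, offset, snippet) for every case-insensitive match,
    sorted by position in the document."""
    hits = []
    for kw in keywords:
        for m in re.finditer(re.escape(kw), text, flags=re.IGNORECASE):
            start = max(0, m.start() - context)
            end = min(len(text), m.end() + context)
            hits.append((kw, m.start(), text[start:end]))
    return sorted(hits, key=lambda h: h[1])


# Made-up document body, for illustration only.
sample_text = (
    "Tower vibration noted. Gearbox oil pressure is an outstanding issue. "
    "gearbox oil pressure rechecked."
)
found = keyword_hits(sample_text, ["gearbox oil pressure", "tower vibration"])
```

Keeping the character offset alongside each match is what would let a viewer jump straight to the place in the document where a keyword occurs.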

I’m actually interested in the gear box oil pressure so let’s have a look and see if there are any more references to it.

I do a quick search through the document and there are a few more references, but I can see mainly that it’s an outstanding issue.

If I go back to our shortlist of 2 documents, the second is for a site called Somerset which is a different site to the one I work at.

On the right here we can see what the issues are, e.g. gearbox oil pressure. So I’m searching through the document to find the context. And I can see we’ve got a list of the turbines and this was an issue for Turbine 5 and 6. So it looks like the solution was that a fine filter was installed.

If I come out, refine my search to include fine filters and search again, we can take another look at the Somerset monthly report. Now I can see ‘fine filters’ as a keyword, and I can jump straight to where we can see the serial number of the fine filter which was used to fix the fault.

I can tag these two documents so they can be found again, so I’ve labelled them with the serial number and ‘fine filter’.

John: Thanks Tom. How about if you wanted to be able to track that search over time, because it’s something that comes up frequently or you think it could help others with the same issue?

Tom: That’s a good point John. We can turn any of these searches into a Workflow, so if you want to track over time any documents that are being created with the term “Turbine fine filters”, you can create it here and run it with the frequency you require. As documents are created, the system is picking them up and showing them on a graph that a user can click into to get the details and view the documents.


Here’s a Workflow that’s been running for a year. It shows any items that have been found and any new items that have been created in that time. Below is the full history of all the documents that came in on 27th November 2020. You can share this with other users, or give them rights to edit it.
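The Workflow idea above, a saved search that is re-run at a chosen frequency and reports what is new each time, can be sketched as follows. This is an illustrative assumption about the behaviour, not Exonar's implementation; the `SavedSearch` class, the document ids and the dates are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class SavedSearch:
    term: str
    seen: set = field(default_factory=set)        # ids found on earlier runs
    history: dict = field(default_factory=dict)   # run date -> newly found ids

    def run(self, run_date, docs):
        """docs maps document id -> text. Record only ids not seen before."""
        needle = self.term.lower()
        matches = {doc_id for doc_id, text in docs.items() if needle in text.lower()}
        new = sorted(matches - self.seen)
        self.seen |= matches
        self.history[run_date] = new
        return new


# Made-up estate and run dates, for illustration only.
workflow = SavedSearch("turbine fine filters")
estate = {"r1": "Turbine fine filters fitted during the outage."}
first = workflow.run(date(2021, 1, 1), estate)

estate["r2"] = "Order placed for turbine fine filters."
estate["r3"] = "Canteen menu."
second = workflow.run(date(2021, 1, 2), estate)
```

The per-date `history` is the kind of data a graph of new documents over time could be drawn from; a scheduler would call `run` at whatever frequency is required.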

John: Thank you, Tom. The important thing to say here is all the information Tom has just shown you has been dug out of unstructured, legacy data and couldn’t have been found any other way or so quickly and accurately. Exonar can ingest any data set, so if we decided we wanted to pull in information from the Data Historian system we could do that and we could drill into all of it including the free text fields.

We would love to have the chance to talk to you all about this more. As I’ve said, we are a start-up with credentials and experience in Life Sciences, keen to have the chance to explore what your challenges are around finding Deviation Management information.

We are looking for the right partners willing to do early adopter work and test how this would work and would love to have the chance to run a pilot or a ‘Test Drive’ on the Deviations use case so we can prove our hypothesis that our technology will solve that problem.

If you’re interested in doing something pioneering with data, or just want to find out more, please do come and visit us on our virtual exhibition stand! Just head over to the Exhibition Hall, click on Meet Exhibitors and scroll down - you’ll find us there and we’d love to say hello in person. Or alternatively go to the Meeting Hub and ping me or Tom a direct message from there.

Thank you all again for coming today. We look forward to chatting further!