What is data discovery?


Data discovery involves systematically searching an organisation's data repositories to reveal everything about what's inside, and then acting on that knowledge to protect the information or extract value from it. It applies to unstructured information such as documents, emails and files, as well as to database content. Companies hold huge amounts of data of all sorts, which makes data discovery and visibility at scale a major technical challenge to overcome.

Every organisation has become a 'data company' in recent years, due to the explosion of technology. This has been driven by the convenience, cost savings and utility of technology in all corners of our lives. The adoption of technology means that around 98% of all digital data was created in the last 10 years.

In 2021, mankind is likely to create around 59 zettabytes (that's a lot) of information, and that figure is expected to grow exponentially every year. A huge percentage of this data is collected, generated and stored by organisations in serving their customers or carrying out their business activities.

A growing problem to address

Until now, organisations simply stored the data they created or collected as their IT infrastructure grew and evolved. While newer systems have become better at protecting data as it's generated, legacy data has become 'dark' and been lost along the way in terrifying volumes.

As with all 'revolutions', regulation tends to lag behind innovation. After a string of data breaches involving sensitive and personal information, the regulatory environment is fast catching up with organisations and holding companies to account.

Why organisations are embarking on data discovery

The problem is that, while organisations know their data isn't in good shape, only recently has this lack of visibility of data at scale become something that seriously needs to be addressed. Without the technology to make it possible, let alone the willpower, cleaning up the data estate has been an almost impossible task, so it has been put off to a future date.

From around 2019 we noticed a big increase in the number of organisations that realised they need to take action on their data - that doing nothing is no longer an option. 

There are many everyday data-related business activities that our customers are tackling more effectively through the use of data discovery technology, such as:

  • Understanding and preparing data prior to cloud migration;
  • Splitting or combining data sets after re-organisation of business activities;
  • Data-related activity before and after business mergers and acquisitions;
  • Discovering, protecting and removing sensitive data for privacy or data governance;
  • Investigations and audits of data for legal or commercial reasons;
  • Data management and reduction in the volume of data storage;
  • Extracting hidden corporate knowledge trapped in legacy data; and
  • Understanding and connecting data into a 'single view' as part of digital transformation.

At Exonar, we believe that once data has been discovered, creating a single index of everything your organisation holds is the best way to keep it visible to the business and ready to be protected or extracted in the future.

The challenge of managing unstructured data

In the course of everyday business, people create, share, edit and store files containing a range of data and information. This is known as 'unstructured data', and it includes documents, emails, spreadsheets, letters and contracts. It creates a challenge for organisations because employees need access to information to do their jobs, but the nature of unstructured data makes it hard to keep track of.

The problem is that when people save documents, files and data, they do so in various systems, applications, shared file stores and databases, making data protection and confidentiality a challenge. Some of this data is neatly structured in protected systems and databases, but the majority of it is lying unstructured across the estate, potentially unprotected against a breach.   

It is estimated that unstructured data accounts for about 90% of the digital universe [1], and it’s growing at a rate of 65% per year [2]. Perhaps unsurprisingly, the majority (95%) of businesses cite the need to manage unstructured data as a problem for their business [3].

The biggest issue with unstructured data is that it poses a high risk to the organisation – two-thirds (65%) of IT professionals believe unstructured data makes it a struggle to keep their business secure against an information security breach [4].  

It’s only when organisations know what data they have and where it is stored that they can start to do something to protect it. That’s why the smart ones use data discovery technology.

What is data discovery technology? 

Good data discovery software provides unprecedented visibility of organisational data in terms of scale, coverage and detail. This means it can identify ALL data types (structured, semi-structured and unstructured) across all data repositories, whether they’re based in the cloud or on-premises. The data estate is visualised through clickable dashboards that give an overview of what is where, with drill-down capabilities that show the detail of every document and data type.

That snapshot of the data estate in its entirety is going to contain:

  • The good: where data is nicely organised and properly secured.
  • The bad: where data has fallen out of governance so action needs taking.
  • The ugly: data that has been over-retained, lost or forgotten.

The good news is that once the data is discovered, appropriate remediation actions can be taken to govern what’s sensitive, delete what’s no longer needed and use what’s valuable. 

Most data discovery tools will simply scan an organisation’s data estate upon command to show where information is located and who has access to it. The limitation with scanning technology is that it is slow (every item must be re-scanned with every search) and requires the user to know what they are searching for.

There's only one data discovery product that takes a different approach: Exonar. The software ingests an organisation's data from any data store and categorises the information intelligently in an instantly searchable index. It uses pattern matching and machine learning to identify characteristics within the data, such as personal information, and to understand its context, for example that a document is a CV.

Now organisations can see the information they knew they had, as well as the data they didn't know was there, and can query their data in any way they choose to produce real-time search results.
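The difference between re-scanning and querying an index can be illustrated with a generic inverted index, a standard search technique. This is a minimal sketch, assuming simple lower-case word tokens and an in-memory dictionary; it is not a description of how Exonar's engine is built, and the file names and the helpers `scan_search`, `build_index` and `index_search` are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Deliberately simple tokenizer: lower-case runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def scan_search(documents: dict[str, str], term: str) -> list[str]:
    # Scanning approach: every document is re-read for every single search.
    term = term.lower()
    return sorted(doc_id for doc_id, text in documents.items() if term in tokenize(text))


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    # Indexing approach: read each document once and map every token
    # to the set of documents that contain it.
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def index_search(index: dict[str, set[str]], term: str) -> list[str]:
    # A query is now a dictionary lookup instead of a pass over the whole estate.
    return sorted(index.get(term.lower(), set()))


# Hypothetical documents standing in for a data estate.
docs = {
    "hr/cv_jsmith.docx": "Curriculum vitae of John Smith, phone 01632 960001",
    "finance/q3.xlsx": "Quarterly revenue forecast",
    "shared/old_notes.txt": "Notes from John about the forecast",
}
index = build_index(docs)
print(index_search(index, "forecast"))  # ['finance/q3.xlsx', 'shared/old_notes.txt']
```

Both approaches return the same documents, but `scan_search` rereads everything on each call, while `index_search` pays the reading cost once in `build_index` and then answers each query with a lookup.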

There are a number of valuable use cases for data discovery: 

  • Mitigate risk: identify unstructured data that needs moving to a secure location or deleting. 
  • Find company sensitive information: find a single document lost for years within a huge data estate with laser precision. 
  • Compliance: 'know your data' for both regulatory and contractual purposes. 
  • Power the organisation: distil valuable insights in the data to drive the business forward.  

The benefits of data discovery 

Data discovery mitigates risk 
When an organisation has an index of its data at scale, it can see exactly what’s in its estate. With full visibility of all unstructured and structured data, the organisation can be confident about what data it has and why, where that data is and who has access to it. 

In the event of an external breach, even if an attacker did enter the infrastructure, it would be very hard for them to access and extract sensitive, personal or confidential data, because appropriate remediation actions would already have been taken to secure unstructured datasets.

The organisation also has the opportunity to nudge individuals towards the right behaviours and so reduce accidental internal data breaches, by flagging where someone is acting against corporate policy, such as emailing an encrypted file and its password together.
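A policy check like the encrypted-file-plus-password example could, in principle, be expressed as a simple rule. This is a minimal sketch under loud assumptions: the article does not say how Exonar detects this behaviour, the list of "encrypted" file extensions and the password phrasing pattern are hypothetical, and a real control would inspect attachments rather than trust their names.

```python
import re
from dataclasses import dataclass, field

# Hypothetical: attachment extensions treated as "probably encrypted" containers.
# A real control would inspect the file itself rather than trust its name.
ENCRYPTED_EXTENSIONS = (".zip", ".7z", ".gpg", ".pgp")

# Hypothetical: a password is being shared if the body says "password: ..." or "password is ...".
PASSWORD_HINT = re.compile(r"\b(?:password|passcode)\s*(?:is\b|:)", re.IGNORECASE)


@dataclass
class Email:
    sender: str
    body: str
    attachments: list[str] = field(default_factory=list)


def breaches_password_policy(email: Email) -> bool:
    """Flag an email that carries an encrypted-looking attachment AND its password."""
    has_encrypted_attachment = any(
        name.lower().endswith(ENCRYPTED_EXTENSIONS) for name in email.attachments
    )
    return has_encrypted_attachment and PASSWORD_HINT.search(email.body) is not None


outbox = [
    Email("alice@example.com", "Payroll attached. Password: Spring2021", ["payroll.zip"]),
    Email("bob@example.com", "Archive attached, password to follow by phone.", ["archive.7z"]),
]
for message in outbox:
    if breaches_password_policy(message):
        # The "nudge": tell the sender rather than silently block the message.
        print(f"Policy nudge for {message.sender}: send the password separately.")
```

Only the first message is flagged; the second sends the password through a different channel, which is the behaviour such a nudge is meant to encourage.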

And if the organisation is embarking on a digital transformation project, data discovery will significantly mitigate the risks associated with migrating data to the cloud. Rather than ‘lift and shift’ everything, the organisation can clean its data first and move only suitable workloads.

Data discovery provides laser precision 
Once an organisation has created an index of its data, it’s very easy for users to query that data in real-time such as searching for a specific piece of information, intellectual property, research or insight in a huge data estate.

For some sectors, such as legal, pharmaceuticals and technology, data discovery can save valuable time and resources in eDiscovery.

Read how a global pharmaceutical company uses data discovery to save seven-figure sums in litigation...

Data discovery supports compliance  
Compliance with data protection legislation is never a tick-box exercise. It must be ongoing and operationalised because data estates are always changing and expanding.

Under data protection and privacy legislation, like GDPR, organisations have certain legal responsibilities for their data. But every business also has contractual obligations to its clients, who want to know what data it holds and what it’s doing with it.

With complete oversight of the data estate, data discovery enables organisations to satisfy both requirements. Not only does data discovery demonstrate that an organisation has good data governance, it also makes certain processes, like subject access requests, effortless – simply run the query and every result is instantly returned.
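Why a subject access request becomes a single query once an index exists can be sketched with a small entity index. This assumes, purely for illustration, that the data subject is identified by an e-mail address found with a simplified regular expression; a real request usually involves names, account numbers and other identifiers, and the results still need human review. The function names and documents are hypothetical.

```python
import re
from collections import defaultdict

# Simplified e-mail address pattern; real-world address matching is messier.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def build_subject_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Map each e-mail address found in the estate to the documents that mention it."""
    subjects: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for address in EMAIL_PATTERN.findall(text):
            # Normalise case so Jane.Doe@ and jane.doe@ are treated as one subject.
            subjects[address.lower()].add(doc_id)
    return subjects


def subject_access_request(subjects: dict[str, set[str]], address: str) -> list[str]:
    """Answer a request for one data subject with a single lookup."""
    return sorted(subjects.get(address.lower(), set()))


estate = {
    "crm/export.csv": "jane.doe@example.com,Jane Doe,Premium",
    "mail/2019-04.eml": "From: Jane.Doe@example.com\nSubject: refund",
    "mail/2020-01.eml": "From: bob@example.org\nSubject: invoice",
}
subjects = build_subject_index(estate)
print(subject_access_request(subjects, "jane.doe@example.com"))
# ['crm/export.csv', 'mail/2019-04.eml']
```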

Read how Arrow Global uses data discovery for compliance… 

Data discovery powers the organisation 
Because data discovery technology can classify data down as far as the metadata level, it enables immediate search results with virtually no latency, which makes data exploration both rapid and complete. Now, as well as protecting data, organisations can commercialise it for competitive gain.

Massive scalability, combined with fast data ingestion, means that organisations gain the ability to interrogate and analyse content-rich datasets in bulk to distil valuable insights. Furthermore, when pattern matching and machine learning are added to the mix, they improve the contextual understanding of the information, which in turn improves search accuracy.

Data discovery with Exonar 

Good data discovery software should continuously index every aspect of the data on a network, or within externally hosted cloud systems, so that it is always up to date.

Exonar is unique in the way it discovers data and creates an always-up-to-date index of everything. We think of Exonar like a reference book. Rather than start at the beginning every time to search for a precise piece of information (assuming the user even knows what this piece of information is), Exonar unlocks the power of the index. Now every piece of data is identified and recorded, with the technology constantly re-indexing the data to identify what’s changed, what’s new and to understand the context of the data.
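Identifying "what's changed and what's new" on each re-indexing pass can be sketched with content fingerprints. This is one generic way to do change detection, not necessarily Exonar's: the sketch assumes the whole estate fits in a dictionary of paths to bytes, and `reindex_pass` is a hypothetical helper. Only new and changed items would then need to be re-read and re-classified.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """A content hash: identical bytes always give the same fingerprint."""
    return hashlib.sha256(content).hexdigest()


def reindex_pass(previous: dict[str, str], estate: dict[str, bytes]):
    """Compare the current estate with fingerprints stored on the last pass.

    Returns (new, changed, deleted, fingerprints); only `new` and `changed`
    items would need to be re-read and re-classified.
    """
    current = {path: fingerprint(data) for path, data in estate.items()}
    new = sorted(path for path in current if path not in previous)
    changed = sorted(
        path for path in current if path in previous and previous[path] != current[path]
    )
    deleted = sorted(path for path in previous if path not in current)
    return new, changed, deleted, current


# First pass: everything is new.
new, changed, deleted, state = reindex_pass({}, {"a.txt": b"v1", "b.txt": b"same"})
# Second pass: a.txt has been edited and c.txt created.
new, changed, deleted, state = reindex_pass(
    state, {"a.txt": b"v2", "b.txt": b"same", "c.txt": b"fresh"}
)
print(new, changed, deleted)  # ['c.txt'] ['a.txt'] []
```

Hashing still has to read every byte, so in practice a system would probably check cheaper signals first, such as modification times or a storage platform's change notifications, and fingerprint only the candidates.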

Content and meta descriptions are added to enrich the index. And then it’s augmented through: 

  • Pattern matching: resolves number and text patterns to identify information, like credit card numbers, government IDs, or phone numbers.  
  • Natural language processing: understanding and extracting the context of words, such as names, ethnicity and places.  
  • Machine learning: identifies or resolves additional contexts, such as topics and types of document, like contracts or CVs.  
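The pattern-matching step for something like credit card numbers is often done in two stages: a regular expression finds digit runs of plausible length, and the Luhn checksum, which payment card numbers satisfy, discards most random digit strings. This is a minimal sketch of that common technique, not Exonar's rule set; the 13 to 19 digit range and the accepted separators are assumptions.

```python
import re

# Candidate card numbers: 13-19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, test mod 10."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return normalised digit strings that look like card numbers and pass Luhn."""
    results = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            results.append(digits)
    return results


print(find_card_numbers("Card: 4111-1111-1111-1111, expires 12/25."))  # ['4111111111111111']
```

The checksum step matters because a regular expression alone would also flag order references and other long numbers; context from the surrounding text, as the NLP and machine learning steps describe, can reduce false positives further.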

Exonar is the layer in the technology stack that allows organisations to get the most out of their enforcement technologies. Because Exonar is focused on discovering data at scale, it’s specifically built to plug into third-party systems to carry out the appropriate remediation actions, so users retain full control of their data estate and reduce the security threat.  

You can find out more about Exonar Reveal here.

The best way to discover the power of data discovery for your business is to see it in action. But unlike other technology companies who prefer to plug in dummy data, we’re happy to show you what Exonar can do with your real data. 

Book a demo or talk to us today and see Exonar in action!

References

  1. https://www.cio.com/article/3406806/ai-unleashes-the-power-of-unstructured-data.html 
  2. https://www.forbes.com/sites/bernardmarr/2019/10/16/what-is-unstructured-data-and-why-is-it-so-important-to-businesses-an-easy-explanation-for-anyone/?sh=aee7ba715f64 
  3. https://www.forbes.com/sites/rkulkarni/2019/02/07/big-data-goes-big/?sh=477fd7c420d7 
  4. https://blog.exonar.com/news-and-opinion/infographic-it-professionals-disagree-over-their-ability-to-keep-businesses-secure