Leveraging text analysis techniques within ControlTower

With the rise of technology like ChatGPT, people around the world are interacting with text in ways they never have before. Users work through an interface that sits on top of a complex algorithm and returns a response based on the prompts provided. ControlTower works in a similar way, but with a specific task in mind.

ControlTower is an accelerator developed by Dufrain that helps organisations quickly and accurately scan through large amounts of data, particularly sensitive data. It does this to ensure compliance with various regulations and policies, as well as to provide benefits across the wider business. ControlTower extracts important information from unstructured data using advanced text analysis, allowing organisations to make informed decisions.

Files can be analysed in various ways, and ControlTower groups its methods into two categories: metadata and sensitivity. This article breaks down both.


Assessing file properties

How does ControlTower assess file properties?

The first stage of the process, and one of the key features of ControlTower, is its metadata group of functions. It is the first component a file reaches during a scan, and it can also be the last if a basic scan is selected in the configuration.

The metadata scan looks at the properties of the files being scanned, such as creation date, last-modified date and file extension, and checks them against various business and compliance rules.

This information is passed through our custom-built metadata engine, which informs the Data Owner whether each file is compliant.
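As a rough illustration of the idea (not ControlTower's actual metadata engine), a metadata check can be sketched with the Python standard library. The allowed extensions, the seven-year retention period and the `check_metadata` function are all assumptions made for this example. Creation time is not reported consistently across operating systems, so the sketch uses the last-modified time only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from pathlib import Path

# Hypothetical rules for illustration only; not ControlTower's rule set.
ALLOWED_EXTENSIONS = {".txt", ".csv", ".pdf", ".docx"}
MAX_AGE = timedelta(days=365 * 7)  # assumed retention period


@dataclass
class MetadataResult:
    path: str
    extension: str
    last_modified: datetime
    violations: list


def check_metadata(path, now=None):
    """Read basic file properties and test them against the assumed rules."""
    now = now or datetime.now(timezone.utc)
    p = Path(path)
    stat = p.stat()
    last_modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    extension = p.suffix.lower()

    violations = []
    if extension not in ALLOWED_EXTENSIONS:
        violations.append(f"extension {extension or '(none)'} is not on the allowed list")
    if now - last_modified > MAX_AGE:
        violations.append("last modified date exceeds the retention period")

    return MetadataResult(str(p), extension, last_modified, violations)
```

An empty `violations` list means the file passed every rule in this sketch; anything else could be reported to the Data Owner.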


Assessing file contents

How does ControlTower assess file contents?

The second stage of the scanning process is our sensitivity group of functions, which look at the contents of each file and perform a variety of analysis techniques using Natural Language Processing (NLP). Dufrain has developed a sophisticated algorithm which takes the results of these techniques and predicts the sensitivity and classification of each file. ControlTower can identify sensitive information in all types of unstructured data, such as documents, images, and videos, and automatically classify them accordingly. ControlTower can use the following techniques:

Regular expressions & keyword matching

  • Regular expression and keyword pattern matching is a technique used to identify specific patterns or keywords that indicate the presence of sensitive information, such as credit card numbers or Social Security numbers. This allows ControlTower to flag files that contain sensitive information and ensure they are managed appropriately.
  • Regular expressions can be used where the exact match (keyword) is not known but the format is, for example the digit grouping of a card number.
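The following is a minimal sketch of pattern and keyword matching, assuming simplified patterns and an invented keyword list; it is not ControlTower's rule set. A Luhn checksum is added as an extra filter so that arbitrary long digit runs are not reported as card numbers.

```python
import re

# Illustrative patterns only; real-world formats vary and need tuning.
PATTERNS = {
    "credit_card": re.compile(r"\b(?:\d[ -]?){12,18}\d\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}
KEYWORDS = {"confidential", "passport", "salary"}  # assumed keyword list


def luhn_valid(number):
    """Return True if the digits in the string pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text):
    """Return (label, matched_text) pairs for pattern and keyword hits."""
    hits = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            value = m.group()
            if label == "credit_card" and not luhn_valid(value):
                continue  # looks like a card number but fails the checksum
            hits.append((label, value))
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in KEYWORDS:
            hits.append(("keyword", word))
    return hits
```

Keyword matching catches known terms, while the expressions catch values whose exact content cannot be listed in advance.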

Named entity recognition (NER) 

  • Named Entity Recognition (NER) uses artificial intelligence to identify and classify named entities within the text, such as people, organisations, and locations.
  • This technique helps ControlTower identify potential compliance issues and sensitive information. It is used alongside our regular expressions and keyword patterns to increase accuracy and confidence.
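One way to picture how entity recognition and pattern matching can reinforce each other is the sketch below. It assumes the entities have already been produced by an NER model and are supplied as simple tuples. The `Entity` type, the 50-character window and the confidence values are all hypothetical placeholders, not ControlTower's scoring.

```python
import re
from typing import NamedTuple


class Entity(NamedTuple):
    text: str
    label: str   # e.g. "PERSON", "ORG", "LOC"
    start: int   # character offset in the source text
    end: int


SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # illustrative pattern
WINDOW = 50  # assumed proximity window, in characters


def score_matches(text, entities):
    """Score each pattern match; a nearby PERSON entity raises confidence."""
    people = [e for e in entities if e.label == "PERSON"]
    results = []
    for m in SSN.finditer(text):
        near_person = any(
            e.start <= m.end() + WINDOW and e.end >= m.start() - WINDOW
            for e in people
        )
        confidence = 0.9 if near_person else 0.6  # placeholder scores
        results.append((m.group(), confidence))
    return results
```

In this sketch a number that sits close to a person's name is treated as more likely to be personal data than the same number on its own.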

Sentiment analysis

  • This technique identifies the emotional tone or sentiment of a document, which can be useful in understanding its context and identifying potential issues.
  • Our sentiment analysis engine can be used for compliance as well as other business needs, such as assessing customer reviews.
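To make the idea concrete, here is a minimal lexicon-based sketch of sentiment scoring. It stands in for, and is much simpler than, a trained sentiment model; the word lists and the `sentiment_score` function are invented for this example.

```python
import re

# Tiny illustrative lexicon; a production engine would use a trained model
# or a much larger, validated word list.
POSITIVE = {"good", "great", "excellent", "happy", "helpful", "love"}
NEGATIVE = {"bad", "poor", "terrible", "angry", "unhappy", "complaint"}
NEGATORS = {"not", "never", "no"}


def sentiment_score(text):
    """Return a score in [-1, 1]; negative values indicate negative tone."""
    words = re.findall(r"[a-z']+", text.lower())
    score = 0
    hits = 0
    for i, word in enumerate(words):
        polarity = (word in POSITIVE) - (word in NEGATIVE)
        if polarity == 0:
            continue
        if i > 0 and words[i - 1] in NEGATORS:
            polarity = -polarity  # "not good" counts as negative
        score += polarity
        hits += 1
    return score / hits if hits else 0.0
```

A strongly negative score on, say, a customer email could be used to route the document for review, while a score of zero simply means no lexicon words were found.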

Text summarisation

  • This technique identifies the most important information within a document and summarises it in a shorter form. This is helpful in identifying the main topics and themes of a document and determining whether it contains sensitive information.
  • One use case is feeding these summaries into vital repositories such as Data Dictionaries, Catalogues and Inventories. This saves employees from searching through large documents for relevant information.
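A simple way to sketch summarisation is frequency-based extractive summarisation: sentences whose words occur most often across the document are kept. This is one basic approach chosen for illustration, not necessarily the method ControlTower uses; the stop-word list and the `summarise` function are assumptions.

```python
import re
from collections import Counter

# Small illustrative stop-word list; real lists are much longer.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
             "was", "it", "this", "that", "for", "on", "with", "as", "by"}


def _tokens(text):
    """Lower-case words with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarise(text, max_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(_tokens(text))

    def score(sentence):
        tokens = _tokens(sentence)
        # Average word frequency, so long sentences are not favoured unfairly.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)
```

The selected sentences could then be stored alongside a file's entry in a catalogue or inventory as a short description.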

OCR

  • Optical Character Recognition (OCR) converts scanned images into text, which can then be analysed by ControlTower’s other techniques. This is useful for identifying sensitive information in scanned documents such as invoices or contracts, particularly where the document follows a fixed structure or layout.

Conclusion

ControlTower combines metadata checks, pattern matching, NER, sentiment analysis, text summarisation and OCR, with each technique reinforcing the others to produce great outcomes for our clients. By leveraging these techniques within ControlTower, organisations can automate and streamline compliance and data protection efforts, reducing the risk of breaches and violations.

To discover more about Dufrain’s leading accelerators, read our recent overview.
