14 Aug 2020

Cluster similar documents with HighQ AI

Product: HighQ Collaborate
Product area: AI Hub, Files

Contract clustering uses AI to group similar documents. This feature allows you to identify contracts that are, for example, part of a series of revisions or branches of the same original contract, and helps to clarify the history of the signed and executed contract.

As with contract deviation analysis, HighQ AI bases clustering on a deviation score. You can adjust the acceptable deviation to reflect changes that are typical in your workflow. You can then organise the clustered files as required, for example, by storing signed contracts in a contract 'database' for quick reference and security.

Clustering similar documents can also find duplicate copies of a contract and allow you to easily assign a contract template to further analyse revisions.

Please do not use clustering on sites that contain placeholder files, as they can interfere with the clustering process.

Opening the Clusters window

If you are a site admin, click Admin, then under AI Hub choose Clusters to open the Clusters window. If this option is not available, check your AI Hub configuration to enable file clustering.

When you first open the Clusters window, the list of files is empty.

Setting a deviation level

Adjust the deviation level to control how similar the files must be to be grouped together. If you select a deviation of 0, or close to 0, the files must be very similar to be clustered. If you choose a higher number, more variation is allowed within each cluster.

For most purposes, start with a deviation level of 30; this should give an indication of how varied your files are. You may adjust the value after the first analysis.
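The effect of the deviation level can be illustrated with a minimal Python sketch. It does not show how HighQ AI calculates its score, which is not described in this article. The `deviation` and `cluster` functions, the sample file names and the use of Python's `difflib` as the similarity measure are all assumptions made for illustration.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def deviation(a: str, b: str) -> float:
    """Hypothetical deviation score: 0 means identical text, 100 means nothing in common."""
    return (1.0 - SequenceMatcher(None, a, b).ratio()) * 100.0


def cluster(docs: dict[str, str], max_deviation: float) -> list[list[str]]:
    """Greedy grouping: each document joins the first cluster whose
    representative (the first member) is within max_deviation of it."""
    clusters: list[list[str]] = []
    for name, text in docs.items():
        for members in clusters:
            representative_text = docs[members[0]]
            if deviation(representative_text, text) <= max_deviation:
                members.append(name)
                break
        else:
            # No existing cluster is close enough, so start a new one.
            clusters.append([name])
    return clusters


# Illustrative sample texts only, not real contracts.
docs = {
    "nda_v1.txt": "The parties agree to keep all shared information confidential.",
    "nda_v2.txt": "The parties agree to keep all shared information strictly confidential.",
    "lease.txt": "Rent is due monthly.",
}

strict = cluster(docs, max_deviation=0)   # only identical text is grouped
loose = cluster(docs, max_deviation=30)   # small revisions are grouped together
```

In this sketch, a deviation of 0 leaves every file in its own cluster, while a deviation of 30 groups the two NDA revisions and keeps the unrelated file separate.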

Click Save and cluster to start the analysis.

A Cluster window confirms how many files will be analysed. Click Cluster to continue.

All Word and PDF documents on your site will be analysed. You can choose to ignore a selection of documents after they are added to the list of clusters.

As the analysis may take some time, a status message is displayed in the top navigation bar.

If you need to stop or change the analysis, click Cancel in the Clusters window. You may then change the deviation and restart the process.

When the results are ready, a list of documents is shown.

Click on the arrow next to a clustered document to see the files in that cluster, each with a reason and a percentage score for the match.

The top-level 'representative' file is often the signed contract.

The Template column shows which standard template the file is linked to.

Contracts and templates must be in Word, PDF or TXT format. 
If a Word document is compared with its PDF equivalent, the match may be lower than expected (that is, more changes are reported) because differences in the underlying file formats may be counted as changes to the text.
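To see why a format difference can lower a match, consider the following minimal Python sketch. It assumes, purely for illustration, that text taken from a PDF keeps the line breaks of the page layout while the Word text does not. The `match_percent` and `normalise` helpers are hypothetical and are not part of HighQ.

```python
import re
from difflib import SequenceMatcher


def match_percent(a: str, b: str) -> float:
    """Hypothetical match score: 100 means the two texts are identical."""
    return SequenceMatcher(None, a, b).ratio() * 100.0


def normalise(text: str) -> str:
    """Collapse every run of whitespace (including line breaks) to one space."""
    return re.sub(r"\s+", " ", text).strip()


# Same wording; the second version carries layout line breaks.
word_text = "The supplier shall deliver the goods within thirty days of the order."
pdf_text = "The supplier shall deliver the\ngoods within thirty days of\nthe order."

raw_match = match_percent(word_text, pdf_text)
normalised_match = match_percent(normalise(word_text), normalise(pdf_text))
```

Here `raw_match` is below 100 even though the wording is unchanged, and `normalised_match` is 100 once the layout differences are removed. Where possible, comparing files of the same format avoids this effect.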

Exact matches

If a cluster includes an Exact match, then it is likely that the files are identical. This gives you an opportunity to manage duplicate files.
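If you have copies of the files outside HighQ and want to confirm that suspected duplicates really are identical, one option is to compare file hashes. The sketch below is an assumption-based illustration, not a HighQ feature: `find_exact_duplicates` is a hypothetical helper and the file names and contents are invented.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict


def find_exact_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose contents are byte-for-byte identical."""
    by_digest: dict[str, list[str]] = defaultdict(list)
    for name, content in files.items():
        by_digest[hashlib.sha256(content).hexdigest()].append(name)
    # Keep only the groups that contain more than one file.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


# Invented sample data standing in for downloaded file contents.
files = {
    "contract.pdf": b"signed contract body",
    "contract (copy).pdf": b"signed contract body",
    "draft.pdf": b"draft contract body",
}

duplicates = find_exact_duplicates(files)
```

A byte-level comparison is stricter than a text comparison: two files with the same wording but different metadata would not be reported as duplicates by this check.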

New documents

Select Save and cluster again to include any documents added to the Files module since the last analysis.

This also removes any manual changes. The exception is documents that have been ignored (see below), which are not included in any future clusters.

Viewing a document from the Cluster window

Click on a document to open it in the document viewer. Additional actions are available in the document viewer window.


Actions in the Cluster window

Select one or more files in the Cluster window and click Action to manage the files.

  • Move or copy - move or copy the file to an existing folder in the Files module.
  • Set as representative - select a document then choose this option to define that file as the representative document. 
  • Start new cluster - move the document out of a cluster. It moves to the top level in the cluster list.
  • Move to another cluster - select to move files that have not been clustered, or have been clustered with inappropriate files, to a different cluster. Select a cluster and click Move to other cluster.
  • Ignore this file - remove the selected document from the list of clusters.
  • Assign contract template - assign a template to the document. Click Save and analyse to apply the change.

If you select the representative document for a cluster and then choose Assign contract template, all documents in that cluster are also assigned that template.

If a document is moved, or the representative document changes, the percentage match is reset to zero because no comparison has been made for the new relationship.

System and site admin

If the AI Hub is enabled on your instance, clustering is normally enabled by default at the instance level but switched off for each site. Both system and site options are provided so that you can enable or disable the feature, or check its settings.

If clustering is not available on your instance, check with HighQ support.

System admin

To enable file clustering on your instance, the HighQ AI Hub must be enabled and Enable file clustering must be ON in the Third party services section of System Admin, System Settings.

Site admin

At the site level, enable or disable AI file clustering in Admin, AI Hub, Configure, Advanced settings.

