2023: Update 1
The original post is quite dated and has not kept pace with the innovation we have been doing in this space over the past year. LLMs power a number of different contract review and drafting use cases inside ContractKen. There are 4 major initiatives within our engineering team around LLMs:
Older post:
Let us start with opening a sample contract, in MS Word:
Here is a Merger Agreement between two media companies, focused on a variety of issues, transactions, etc. This is a massive document spanning 82+ pages, not including a large number of exhibits & schedules.
Typically, the execution of such a contract is the result of months, if not years, of contracting work between all parties involved. It is easy to imagine the amount of drafting, review, and iteration that such a large agreement would take.
Reviewing such a large contract is surely not for the faint-hearted or the impatient! This is where an area of AI called Natural Language Processing (NLP) steps in.
Clause 1:
During the Term and for a period of two years thereafter, or for a period of seven years from the date of creation of the Records (whichever is longer) the Supplier shall keep full, true and accurate Records to show compliance with its obligations under this Agreement together with any other records that are required by any professional rules of any regulatory body which apply to the activities of the Supplier or as may from time to time be agreed in writing between the Company and the Supplier.
Clause 2:
During the Term and for a period of two years thereafter, or for a period of seven years from the date of creation of the Records (whichever is longer) the Supplier shall keep full, true and accurate Records to show compliance with its obligations under this Agreement together with such other records as may from time to time be agreed in writing between the Company and the Supplier.
All of this functionality has multiple NLP models working in unison in the background. However, there are two broad types of algorithms deployed - Pattern Recognition & Deep Learning.
Let's take a look under each one's hood.
We use algorithms like K-Nearest Neighbors (KNN) to recognize patterns in training data. The following diagrams (oversimplified to 3 dimensions) show how a pattern recognition algorithm solves the (relatively) easier problem of identifying contract metadata.
The clustering shown here is a type of 'Unsupervised Learning', i.e. the machine automatically detects patterns of similarity or dissimilarity (across n dimensions) and sorts the data points into various 'clusters'. (Strictly speaking, KNN itself is a supervised method that labels a point by its nearest labelled neighbours; grouping unlabelled points around cluster centers is what a clustering algorithm such as k-means does.) In this example, after our data pipelines pre-process and tokenize the training documents and feed them into the algorithm, the model creates 3 distinct clusters, corresponding to the key terms ‘Governing Law’, ‘Effective Date’ and ‘Expiry Date’.
When a new data point is fed into the system (in production use), the model calculates the distance of the new data point from the center (in an n-dimensional space) of each of the clusters that the model has identified. The model will assign this new data point to the nearest cluster.
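The assignment step described above can be sketched in a few lines of Python. This is only an illustration: the 3-dimensional cluster centers below are made-up numbers, and `nearest_cluster` is a hypothetical helper name, not part of our production code.

```python
import math

# Hypothetical 3-D cluster centers; the labels mirror the example terms above.
CENTROIDS = {
    "Governing Law": (0.9, 0.1, 0.2),
    "Effective Date": (0.1, 0.8, 0.3),
    "Expiry Date": (0.2, 0.3, 0.9),
}


def nearest_cluster(point, centroids=CENTROIDS):
    """Return (label, distance) of the cluster center closest to `point`."""
    best_label, best_dist = None, math.inf
    for label, center in centroids.items():
        dist = math.dist(point, center)  # Euclidean distance in n dimensions
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist


# A new data point (again, illustrative numbers) is assigned to the nearest cluster.
label, dist = nearest_cluster((0.15, 0.75, 0.35))
print(label, round(dist, 3))
```

A real system works with many more dimensions than three, but the distance comparison is the same.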
This is an over-simplified example of how basic pattern recognition algorithms can be deployed to detect contract terms on the basis of their meaning, rather than through a keyword-search approach.
There are broadly 2 types of models being used here:
Q&A
This is to detect the presence of key contract clauses and identify their location in the document. We follow the SQuAD approach and fine-tune several pre-trained language models using the HuggingFace Transformers library. Because the prediction task is similar to extractive question answering, we use the QuestionAnswering models in the Transformers library. Each ‘Question’ identifies the label category (clause) under consideration. Fine-tuning a pre-trained model on a new task in this way is called ‘Transfer Learning’ in ML.
Take these two sentences, for example: 1. “I like to play football” and 2. “I am watching the Julius Caesar play”. The word ‘play’ has a different meaning in each. These models use neural networks as their foundation and consider the semantics of the text.
The model returns the precise location of each detected clause in the document (starting position and length), which our Word add-in then uses to highlight the relevant text.
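To make the ‘Question per clause’ idea concrete, here is a minimal sketch of how one training record could be laid out in the SQuAD style (a question, a context, and the answer text with its character offset). The question wording, the helper name `make_squad_example` and the sample contract text are assumptions for illustration only, not our actual prompts or data.

```python
def make_squad_example(example_id, clause_label, contract_text, clause_text):
    """Build one SQuAD-style record asking the model to locate `clause_label`.

    If the clause text does not occur in the contract, the answer lists are
    left empty, which is how SQuAD 2.0 marks an unanswerable question.
    """
    start = contract_text.find(clause_text)
    found = start != -1
    return {
        "id": example_id,
        # Hypothetical question template; one question per clause category.
        "question": f'Which part of this contract relates to "{clause_label}"?',
        "context": contract_text,
        "answers": {
            "text": [clause_text] if found else [],
            "answer_start": [start] if found else [],
        },
    }


contract = (
    "This Agreement shall be governed by the laws of England. "
    "The Supplier shall keep full, true and accurate Records."
)
example = make_squad_example(
    "doc1-governing-law",
    "Governing Law",
    contract,
    "This Agreement shall be governed by the laws of England.",
)
print(example["answers"])
```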
We’re using a Transformers-based DL algorithm to detect the presence and location of many key commercial clauses and terms. To understand more about Transformers, the following article is perhaps the best out there: https://jalammar.github.io/illustrated-transformer/
Primary task formulation: The model should predict which substrings of the contract document are related to each clause label category. The model learns the start and end token positions of the substring. This formulation follows the SQuAD 2.0 setup.
The algorithm that we’re using is BERT (short for Bidirectional Encoder Representations from Transformers), introduced in the original BERT paper from Google Research. We have used multiple variations of BERT to optimize the overall Precision & Recall scores, and continue to test variations of simple algorithms, new data, and model parameters to get higher coverage (i.e. more terms/clauses getting predicted), better accuracy, and superior inference performance.
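For readers unfamiliar with the two metrics, here is a simplified sketch of Precision and Recall computed over the set of clause labels detected in one document. Treating predictions as a plain set of labels is an assumption made for brevity; a real evaluation would also need to compare the predicted text spans.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for predicted vs. actual clause labels."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    # Precision: what fraction of the predicted clauses are really present.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: what fraction of the clauses really present were predicted.
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Illustrative labels only.
p, r = precision_recall(
    ["Governing Law", "Effective Date", "Expiry Date"],
    ["Governing Law", "Effective Date", "Indemnity", "Termination"],
)
print(round(p, 3), round(r, 3))
```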
Named Entity Recognition (NER)
This is to identify key business entities in the contract document, for example party names, financial values, etc. At ContractKen, we’ve deployed multiple variants of the NER algorithm for specific commercial entities.
This is a fast-changing domain with ever larger and better language models coming into the open source domain every month. At ContractKen, we’re excited and committed to deploying the best-in-class technology to solve a wide variety of challenging problems with the document review process.