Hey there,
Welcome to the new subscribers who joined our mortgage tech newsletter last week.
Today, let's talk about operational mortgage data and data pipelines.
At this point, data is a buzzword in operations. Everyone has heard about the importance of being data-driven and all the lofty promises that come with it.
But how can this abstract concept be applied to improve mortgage operations and move the bottom line?
In this issue, I'll share an overview of operational data and try to make its application to mortgage operations more practical.
The fact that the term is overused doesn't make it any less relevant. The mortgage industry is highly competitive, and lenders who know how to leverage their operational data will have an edge.
The issue below consists of 3 parts:
1. An overview of what operational data is
2. My analysis of how it applies to mortgage companies
3. How to collect operational data
1. OPERATIONAL DATA
The term “data” has been around for a long time, well before computers existed.
There are plenty of definitions of data out there. Some are pretty difficult to understand.
I find it easier to grasp its practical applications by distilling it down to:
Data is raw, unprocessed facts about the world.
When we observe an event or an action, we’re collecting raw data.
We process data to extract insights, make decisions, and perform actions.
Some data is recorded, and some is not.
We collect a lot more data than we record.
Data can be recorded in text, numbers, dates, images, audio, video, etc.
Data records can be in digital or physical formats.
Digital data records can be processed by software.
Data is one of the three core layers of most software applications.
Software transforms, searches, and updates data, and uses it to perform actions.
Digital data can be classified into different types based on how it is represented in computer systems: structured, semi-structured, or unstructured.
Let’s have a look into what makes data operational.
If data is unprocessed facts about the world, then operational data is unprocessed facts about day-to-day business operations.
This data includes customer interactions, actions within processes, and external facts brought into the business by customers, employees, and others.
Operational data is essential for businesses to carry out their core functions.
Operational data can be characterized by two types of facts: events and entities.
There is a significant difference in how these data types are collected, stored, and processed.
Understanding this distinction is crucial for using operational data effectively.
Below is a more detailed overview of each type, along with examples.
Definition:
Event data refers to facts describing occurrences of an event or action at a specific time. This data type records the interactions, transactions, or events over time.
Characteristics:
Event data is time-stamped, immutable (a recorded event doesn't change), and accumulates continuously as operations run.
Examples:
A borrower submits an application, a processor uploads a document, an underwriter issues a decision.
An event can be characterized by facts such as who acted, what happened, what it happened to, and when it occurred.
Purpose:
Event data is usually used for tracking activity over time, measuring process performance, and triggering automated actions.
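To make this concrete, here's a minimal sketch of what a recorded event might look like (the field names are illustrative, not a standard):

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)  # an event is an immutable fact: once recorded, it doesn't change
class Event:
    actor: str             # who performed the action
    action: str            # what happened
    subject: str           # what it happened to
    occurred_at: datetime  # when it happened

# A borrower submitting a loan application, recorded as an event
event = Event(
    actor="borrower:jane-doe",
    action="application_submitted",
    subject="loan:2024-0042",
    occurred_at=datetime(2024, 3, 1, 14, 32, tzinfo=timezone.utc),
)
```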
Definition:
Entity data refers to facts that describe the properties of objects, people, or concepts existing within business operations over time. This type of data characterizes the attributes or characteristics of these entities.
Characteristics:
Entity data describes the current state of an object and is updated in place as that state changes.
Examples:
A borrower, a loan, a property, a document.
Facts that describe an entity:
Unlike events, entities share few common facts; the types of facts depend heavily on the entity you're describing.
Purpose:
Entity data provides the context needed to interpret event data and perform actions.
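And here's a matching sketch of entity data, reusing the same hypothetical loan: unlike the event above, this record is updated in place as the loan's state changes.

```python
# Entity data: a snapshot of a loan's current attributes (illustrative fields).
loan = {
    "loan_id": "loan:2024-0042",
    "borrower": "borrower:jane-doe",
    "amount": 425_000,
    "status": "in_underwriting",  # mutable: changes as events occur
}

# When an "underwriting approved" event occurs, the entity's state is updated:
loan["status"] = "approved"
```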
Operational data, in its broadest sense, refers to facts about day-to-day business operations.
Software-ready operational data consists of facts about day-to-day business operations that software applications can directly process without further manipulation or transformation.
For data to be effectively processed by operational software, it should be structured, machine-readable, and consistent.
The value of software-ready operational data lies in its ability to enable organizations to automate workflows, make better decisions using analytics, and build more effective tools.
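A quick illustration of the difference, using a made-up borrower-income fact recorded two ways; only the second is software-ready:

```python
# 1. A free-text note: software can't reliably act on this without
#    further processing (parsing, extraction, validation).
note = "Spoke with the borrower; she makes about 85k/yr at Acme Corp."

# 2. A structured record: software can filter, aggregate, and act on it directly.
income_record = {
    "borrower_id": "borrower:jane-doe",
    "annual_income_usd": 85_000,
    "employer": "Acme Corp",
    "verified": False,
}
```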
As is true for “data” generally, only a fraction of “operational data” is recorded.
An even smaller fraction is recorded in a way that software can directly process.
This discrepancy is what I refer to as the operational data gap.
An operational data gap is the difference between operational data produced and operational data ready for software processing.
The smaller the gap, the more operational data the software can process.
The more data available for software processing, the higher the potential for automation and analytics.
2. OPERATIONAL DATA AND MORTGAGE
Operational data is present in the day-to-day operations of any business.
So, what's different about operational data in mortgage companies?
Mortgage companies handle the same basic data types, entities and events, as other businesses. However, the main differences lie in where the data is produced and recorded, the formats it's recorded in, and the types of entities and events it describes.
Let's explore what's unique about each of these aspects.
So, the first difference in operational mortgage data is where data is recorded and produced.
Each industry has its own software suite that they rely on to run day-to-day operations.
This software is usually where operational data is produced and captured.
In the case of the mortgage industry, these are usually loan origination systems (LOS), point-of-sale (POS) systems, and CRMs.
The second difference is in the format in which operational data is recorded.
Every industry works with the same basic formats, e.g., documents, XML files, and structured data.
The difference is in the ratio: which formats are more dominant than others.
In mortgage, the most common data formats are documents (e.g., PDFs), XML files, and structured records in databases.
The types of entities businesses interact with in day-to-day operations are mainly unique to each industry.
Therefore, operational data in the mortgage industry deals with entities specifically relevant to its domain.
Here are some examples of entity types unique to the mortgage industry: loans, borrowers, properties, and appraisals.
Just as entity types are unique to each industry, so are event types.
Mortgage operational data describes events specific to the mortgage industry, such as an application being submitted, a rate being locked, an appraisal being ordered, or a loan being cleared to close.
Mortgage companies already utilize operational data daily.
The real question is how reducing the software-ready data gap can enhance mortgage operations.
More software-ready operational data enables companies to leverage more software and technology.
Implementing software or technology aims to increase operational efficiency, reduce human error, save time, and lower costs, all while ensuring compliance with regulatory standards.
So, the use cases of operational data in mortgage operations are defined by HOW software uses this data to achieve those goals.
I see that software can help increase operational efficiency in 3 ways: analytics, automation, and tools.
Below is a deeper overview of each of them.
Operational mortgage analytics refers to applying data analysis tools and methodologies specifically to the operational aspects of mortgage lending.
Operational mortgage analytics uses operational data to answer questions about workflow efficiency, process bottlenecks, and staff productivity.
Insights from operational mortgage analytics can inform decision-making and facilitate operational improvements.
Operational analytics can then measure the impact of the changes, creating a continuous loop of operational improvements.
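As a minimal sketch of the idea (the stages, dates, and field layout are made up), here's how event data could be used to measure how long loans spend in each stage and spot a bottleneck:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event log: (loan_id, stage_entered, timestamp), ordered by time.
events = [
    ("loan-1", "processing",   datetime(2024, 3, 1)),
    ("loan-1", "underwriting", datetime(2024, 3, 6)),
    ("loan-1", "closing",      datetime(2024, 3, 16)),
    ("loan-2", "processing",   datetime(2024, 3, 2)),
    ("loan-2", "underwriting", datetime(2024, 3, 12)),
    ("loan-2", "closing",      datetime(2024, 3, 15)),
]

# Group events per loan, then measure the time spent in each stage.
by_loan = defaultdict(list)
for loan_id, stage, ts in events:
    by_loan[loan_id].append((stage, ts))

durations = defaultdict(list)
for stages in by_loan.values():
    for (stage, entered), (_, left) in zip(stages, stages[1:]):
        durations[stage].append((left - entered).days)

for stage, days in durations.items():
    print(f"{stage}: avg {sum(days) / len(days):.1f} days in stage")
# processing: avg 7.5 days, underwriting: avg 6.5 days
```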
If you want to learn more about operational mortgage analytics, here’s an in-depth article.
Automation refers to using software to perform tasks that typically require human intervention.
Operational automation refers to using software to automate routine and repetitive tasks in business operations.
Operational mortgage automation specifically applies automation to the operational aspects of mortgage lending.
Operational mortgage automation uses operational data to trigger and perform an action.
Event data typically triggers the action and transfers data between steps within a workflow and across multiple workflows, while entity data provides the information the automation needs to perform the action.
The value of operational automation is that it reduces the number of manual tasks within a workflow that humans need to complete.
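Here's a minimal sketch of that pattern (the event shape, loan record, and notify helper are all illustrative): the event triggers the automation, and entity data supplies the details the action needs.

```python
# Entity data: current loan attributes, keyed by loan id.
LOANS = {
    "loan-42": {"processor_email": "pat@lender.example", "borrower": "Jane Doe"},
}

def notify(to: str, message: str) -> None:
    # Stand-in for a real integration (email, LOS task, Slack message, etc.)
    print(f"to={to}: {message}")

def on_event(event: dict) -> None:
    # The event triggers the automation...
    if event["action"] == "appraisal_received":
        # ...and entity data tells it who to notify and about what.
        loan = LOANS[event["loan_id"]]
        notify(loan["processor_email"],
               f"Appraisal in for {event['loan_id']} ({loan['borrower']}): "
               f"${event['appraised_value']:,}")

on_event({"action": "appraisal_received", "loan_id": "loan-42",
          "appraised_value": 430_000})
```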
If you want to learn more about operational mortgage automation, here’s an in-depth article.
An operational mortgage tool is software designed to facilitate, execute, or manage specific tasks within mortgage operation workflows.
Unlike automation, a tool is a facilitator of work designed to be used by humans to accomplish tasks more effectively. Think digital hammer.
Operational mortgage tools enable users to access, interact with, and manage operational data.
Individuals use tools to perform tasks more efficiently, accurately, or effectively than manually.
If you want to learn more about operational mortgage tools, here’s an in-depth article.
3. HOW TO COLLECT OPERATIONAL MORTGAGE DATA
Data engineering is a subset of software engineering focused on building infrastructure for collecting, storing, and transforming digital data.
In operations, a primary use case for data engineering is to bridge the operational data gap.
Recall that the operational data gap is the difference between operational data produced and operational data ready for software processing.
Data engineers close operational data gaps through operational data pipelines.
An operational data pipeline is a series of operations that automates the collection and storage of raw operational data and its transformation into a format ready for software processing.
An operational data pipeline systematically moves data from its sources to the destinations where software can utilize it for operational purposes; those pieces of software are referred to as data consumers.
In operational software, data consumers include analytics platforms, automation systems, and internal tools.
A modern operational data pipeline consists of multiple data stores and processing layers.
Data moves through three data stores: data sources, a data lake, and a data warehouse.
Three processing layers facilitate that movement: the extraction layer, the transformation layer, and the semantic layer.
Below is a detailed overview of the data stores and processing layers.
Data sources are the origins from which data is gathered.
Data is collected from various operational sources, such as applications, databases, flat files, APIs, or real-time data sources like IoT devices (sensors).
In the mortgage industry, common data sources include the LOS, POS, and CRM systems mentioned above.
An extraction layer is a node in a data pipeline that facilitates data movement from the Data source to the Data lake.
This layer performs two primary operations: extracting data from the source and loading it into the data lake.
An extraction layer usually consists of the Source and Destination connectors.
Source connectors know how to extract data from the specific Data source (e.g., Encompass).
The destination connector knows how to load data into the specific Data lake (e.g., BigQuery).
In most cases, the extraction layer doesn’t transform the extracted data. Instead, it replicates the data as-is into the Data lake.
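A toy version of the extract-and-load pattern, assuming a JSON API as the source and a folder of files standing in for the data lake (the URL and layout are illustrative):

```python
import json
import urllib.request
from datetime import datetime, timezone
from pathlib import Path

def extract(source_url: str) -> list[dict]:
    # Source connector: knows how to pull records from this specific source.
    with urllib.request.urlopen(source_url) as resp:
        return json.load(resp)

def load(records: list[dict], lake_dir: Path) -> Path:
    # Destination connector: replicates the raw records into the data lake,
    # timestamped so each extraction run is preserved. No transformation.
    lake_dir.mkdir(parents=True, exist_ok=True)
    out = lake_dir / f"loans_{datetime.now(timezone.utc):%Y%m%dT%H%M%S}.json"
    out.write_text(json.dumps(records))
    return out

# load(extract("https://example.com/api/loans"), Path("lake/raw/loans"))
```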
A data lake is a data storage designed to store structured and unstructured data.
It serves as the first destination for data, where the Extraction layer loads data extracted from Data sources.
Unlike a data warehouse, you can load data as-is without first transforming it.
Having an intermediary data store like a data lake before transforming data enables companies to decouple extraction from transformation, keep a complete history of raw data, and reprocess it later as needs change.
A data lake stores the extracted data in its native format. This means that data lakes can handle many data types and structures—from CSV and JSON files to more complex and unstructured data like images and documents.
The transformation layer in a data pipeline is the stage where data undergoes processing to convert from its raw form into a format ready for consumption by operational software.
The transformation layer is critical for ensuring that data is accurate, consistent, and in a usable format for operational software.
Here are the primary operations that happen in the transformation layer: cleaning, standardizing, deduplicating, and restructuring data to fit a predefined schema.
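Here's a small sketch of those operations on made-up raw records replicated from a source system:

```python
from datetime import datetime

# Raw records as replicated from the source (note the duplicate row).
raw = [
    {"LoanNumber": "2024-0042", "Amount": "425,000", "ClosedDate": "03/15/2024"},
    {"LoanNumber": "2024-0042", "Amount": "425,000", "ClosedDate": "03/15/2024"},
]

def transform(record: dict) -> dict:
    return {
        "loan_id": record["LoanNumber"],                   # standardize field names
        "amount": int(record["Amount"].replace(",", "")),  # cast text to numbers
        "closed_date": datetime.strptime(                  # normalize date formats
            record["ClosedDate"], "%m/%d/%Y").date().isoformat(),
    }

seen, clean = set(), []
for r in raw:
    row = transform(r)
    if row["loan_id"] not in seen:  # deduplicate on the loan's key
        seen.add(row["loan_id"])
        clean.append(row)

print(clean)  # one clean, typed, consistently named record
```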
A Data warehouse is a data storage designed to store structured data.
A data warehouse is the 2nd data destination, where the Transformation layer loads transformed data.
Unlike a data lake, you can't load data as-is; data warehouses require data to be structured and follow a predefined schema.
The strict requirements for data schema in a data warehouse ensure that the data is in a format immediately usable by operational software.
Data warehouses are highly optimized for fast query performance on structured data, making them ideal for operational software.
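To illustrate (using Python's built-in sqlite3 purely as a stand-in for a real warehouse like BigQuery): the schema is fixed up front, and that structure is what makes fast operational queries possible.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# The schema is defined before any data arrives; rows that don't fit it are rejected.
db.execute("""
    CREATE TABLE loans (
        loan_id     TEXT PRIMARY KEY,
        amount      INTEGER NOT NULL,
        closed_date TEXT
    )
""")
db.execute("INSERT INTO loans VALUES (?, ?, ?)", ("2024-0042", 425000, "2024-03-15"))

# Typed, structured columns make queries like this fast and predictable.
total, = db.execute("SELECT SUM(amount) FROM loans").fetchone()
print(total)  # 425000
```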
A Semantic layer controls how data moves out of the data warehouse.
You can think of a Semantic layer as a "kitchen" in a "data restaurant."
It ensures that when Data consumers order a "Data burger," they receive exactly what they expect, following a consistent recipe and using the same ingredients, without needing to venture into the restaurant's kitchen.
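In code terms, the "recipe" is a shared metric definition that every consumer uses instead of re-deriving it from the warehouse themselves. A minimal sketch (the metric and its inputs are illustrative):

```python
from datetime import date

def avg_days_to_close(loans: list[dict]) -> float:
    """One agreed-upon recipe: application date to closing date, closed loans only."""
    closed = [l for l in loans if l["closed_date"] is not None]
    return sum((l["closed_date"] - l["applied_date"]).days for l in closed) / len(closed)

# Every dashboard, report, and tool calls the same definition,
# so "time to close" means the same thing everywhere.
loans = [
    {"applied_date": date(2024, 2, 1),  "closed_date": date(2024, 3, 15)},
    {"applied_date": date(2024, 2, 10), "closed_date": None},  # still in process
]
print(avg_days_to_close(loans))  # 43.0
```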
A semantic layer usually consists of 4 components:
Development of an operational data pipeline should start with identifying what you're trying to achieve with operational data. The intended use case is the biggest factor in defining what data you need to collect and how.
Here’s a high-level overview of the process: identify the use case, identify the data sources, set up extraction into a data lake, transform the data, load it into a data warehouse, and expose it to data consumers through a semantic layer.
There are plenty of data engineering products for each of these stages that speed up development.
An in-depth article on building operational mortgage apps is coming soon.
I hope this post gave you insight into Operational Mortgage Data and how it can be used in mortgage operations.
If you’d like to stay on top of the latest mortgage tech and how it can be applied to mortgage operations, consider joining our mortgage technology newsletter.