Operational Mortgage Data Pipeline

What it is, how it works, and how to use it in mortgage operations.
Vova Pylypchatin
CTO @ MortgageFlow

Hey there,

Welcome to the new subscribers who joined our mortgage tech newsletter last week.

Today, let's talk about operational mortgage data and data pipelines.

At this point, data is a buzzword in operations. Everyone has heard about the importance of being data-driven and all the lofty promises that come with it.

But how can this abstract concept be applied to improve mortgage operations and move the bottom line?

In this issue, I'll share an overview of operational data and try to make its application to mortgage operations more practical.

The fact that the term "data" is overused doesn't make it any less relevant. The mortgage industry is highly competitive, and lenders who know how to leverage their operational data will have an edge.

The issue below consists of 3 parts.

In the 1st part, you can find an overview of what operational data is:

  • What's data: a practical definition
  • What operational data is, and how it's different
  • What is software-ready data and the operational data gap

In the 2nd part, you can find my analysis of how it applies to mortgage companies:

  • What's unique about operational mortgage data
  • How operational data can be used in mortgage operations

And in the 3rd part, you can find info on how to collect operational data:

  • What data engineering and operational data pipelines are
  • How an operational data pipeline works
  • How to build an operational data pipeline

1. OPERATIONAL DATA

What’s data: a practical definition

The term “data” has been used for a while. Way before computers existed.

There are plenty of definitions of data out there. Some are pretty difficult to understand.

I find it easier to grasp its practical applications by distilling it down to:

Data is raw, unprocessed facts about the world.

When we observe an event or an action, we’re collecting raw data.

We process data to extract insights, make decisions, and perform actions.

Some data is recorded, and some is not.

We collect a lot more data than we record.

Data can be recorded in text, numbers, dates, images, audio, video, etc.

Data records can be in digital or physical formats.

Digital data records can be processed by software.

Data is one of the three core layers of most software applications, alongside the interface and the business logic.

Software transforms, searches, and updates data, and uses it to perform actions.

Digital data can be classified into different types based on the way it is represented in computer systems:

  • Structured Data: This is highly organized and easily processed by machines. It follows a schema that defines the structure of the data, including tables, fields, and relationships between fields. Examples include SQL databases and spreadsheets.
  • Unstructured Data: This type of data does not follow a specific format or structure, making it more complex for data processing algorithms to understand and analyze. Examples include text documents, videos, audio files, and images.
  • Semi-structured Data: Semi-structured data is a form that does not reside in a relational database but has some organizational properties that make it easier to analyze than unstructured data. Examples include JSON and XML files.
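
To make the distinction concrete, here's a small sketch (in Python, with hypothetical loan IDs and values) of the same fact recorded in each of the three forms:

    import json

    # The same fact ("loan LN-1001 funded for $425,000 on March 15") in three forms.

    # Structured: a row following a fixed schema (loan_id, amount, funded_date).
    structured_row = ("LN-1001", 425000.00, "2024-03-15")

    # Semi-structured: JSON has no rigid schema, but its keys give it shape.
    semi_structured = '{"loan_id": "LN-1001", "amount": 425000, "funded": "2024-03-15"}'
    print(json.loads(semi_structured)["amount"])  # software can parse it directly

    # Unstructured: free text; software needs extra processing (parsing, OCR,
    # NLP) before it can use the fact.
    unstructured = "Hi team, the loan for 123 Main St funded today at $425k."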

What’s operational data, and how is it different

Let’s have a look into what makes data operational.

If data is unprocessed facts about the world,

then operational data is unprocessed facts about day-to-day business operations.

This data includes customer interactions, actions within processes, and external facts brought into the business by customers, employees, etc.

Operational data is essential for businesses to carry out their core functions.

Operational data can be characterized by two types of facts:

  • Facts about events → Event data
  • Facts about entities → Entity data

There is a significant difference in how these data types are collected, stored, and processed.

Understanding this distinction is crucial for using operational data effectively.

Below is a more detailed overview of each type, along with examples.

Event data

Definition:

Event data refers to facts describing occurrences of an event or action at a specific time. This data type records interactions, transactions, and other occurrences over time.

Characteristics:

  • Time-based: Event data is inherently time-based, describing when an event occurred.
  • Action-oriented: In operations, event data is action-oriented and usually represents a completed task within a specific process.
  • Continuous: Event data is generated continuously as new events occur, leading to a dynamic dataset that grows and changes over time.
  • Immutable: Events are immutable by nature, meaning that once an event occurs, nothing about it changes.

Examples:

  • Application taken
  • Loan estimate sent
  • Loan taken into processing
  • Loan status changed from "under review" to "approved"
  • Loan funded

Facts that describe an event:

An event can be characterized by the following facts:

  • Type of the Event: What happened
  • Time of the Event: When it happened
  • Context of the Event: Who initiated or performed the action, and who or what was affected
  • Properties of the Event: Characteristics unique to this particular event
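
Put together, a single event record might look like the following minimal sketch (Python, with hypothetical field names and values):

    # One event record carrying the four kinds of facts listed above.
    event = {
        "type": "loan_status_changed",          # what happened
        "occurred_at": "2024-03-15T14:32:00Z",  # when it happened
        "context": {                            # who acted, what was affected
            "actor": "underwriter_042",
            "loan_id": "LN-1001",
        },
        "properties": {                         # facts unique to this event
            "from_status": "under_review",
            "to_status": "approved",
        },
    }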

Purpose:

Event data is usually used for:

  • Analyzing behaviors, patterns, or trends over time
  • Understanding changes in entity data over time
  • Performing actions in response to specific events

Entity data

Definition:

Entity data refers to facts that describe the properties of objects, people, or concepts existing within business operations over time. This type of data characterizes the attributes or characteristics of these entities.

Characteristics:

  • Timeless: Entity data isn't tied to a specific moment in time; a recorded fact is treated as current until it is updated.
  • Descriptive: Entity data describes static facts about objects, people, or concepts, such as a person's name, a property's address, or a loan amount.
  • Discrete: Entity data does not continuously grow or change with every passing moment. Instead, it is updated or added in increments based on events that can be distinctly counted or observed.
  • Mutable: Unlike events, the properties of an entity can change after being recorded. For example, a loan amount or a living address might change.

Examples:

  • Borrower information (e.g., name, income, credit score).
  • Property details (e.g., location, valuation, property type).
  • Loan characteristics (e.g., loan amount, interest rate, loan term).

Facts that describe an entity:

  • Type of the Entity: What object or concept the data describes
  • Properties of the Entity: Characteristics unique to this particular entity

Unlike events, which share a common set of facts, entities share few facts with one another.

Types of facts are highly dependent on the entity you're describing.
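
As a minimal sketch (Python, hypothetical field names), two entity records of different types share almost no properties:

    # Two entity records: a type plus properties specific to that type.
    borrower = {
        "entity_type": "borrower",
        "properties": {
            "name": "Jane Doe",
            "annual_income": 96000,
            "credit_score": 742,  # mutable: updated when a new report arrives
        },
    }

    loan = {
        "entity_type": "mortgage_loan",
        "properties": {
            "loan_amount": 425000,
            "interest_rate": 6.875,
            "term_months": 360,
        },
    }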

Purpose:

Entity data provides the context needed to interpret event data and perform actions.

What’s software-ready data and the operational data gap

Operational data, in its broadest sense, refers to facts about day-to-day business operations.

Software-ready operational data consists of facts about day-to-day business operations that software applications can directly process without further manipulation or transformation.

For data to be effectively processed by operational software, it should be:

  • Recorded: Captured in a manner that allows it to be processed later.
  • Digital: Able to be processed by a computer.
  • Structured: Recorded in a format that software can easily process.
  • Real-time: Available for processing near the moment the fact was observed.
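
Here's a hedged sketch of what those four criteria can look like in code. The record schema and the five-minute freshness threshold are assumptions for illustration:

    from datetime import datetime

    def is_software_ready(record: dict, max_lag_seconds: float = 300.0) -> bool:
        """Check a record against the four criteria above (illustrative schema)."""
        # Recorded, digital, structured: it exists as a dict with known fields.
        try:
            observed = datetime.fromisoformat(record["occurred_at"])
            recorded = datetime.fromisoformat(record["recorded_at"])
        except (KeyError, TypeError, ValueError):
            return False
        # Real-time: recorded close to the moment the fact was observed.
        return (recorded - observed).total_seconds() <= max_lag_seconds

    event = {
        "type": "loan_funded",
        "occurred_at": "2024-03-15T14:32:00+00:00",
        "recorded_at": "2024-03-15T14:33:10+00:00",
    }
    print(is_software_ready(event))  # True: 70 seconds of lag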

The value of software-ready operational data lies in its ability to enable organizations to automate workflows, make better decisions using analytics, and build more effective tools.

As is true for “data” generally, only a fraction of “operational data” is recorded.

An even smaller fraction is recorded in a way that software can directly process.

This discrepancy is what I refer to as the operational data gap.

The operational data gap is the difference between the operational data produced and the operational data that is ready for software processing.

The smaller the gap, the more operational data the software can process.

The more data available for software processing, the higher the potential for automation and analytics.

2. OPERATIONAL DATA AND MORTGAGE

What's unique about operational mortgage data

Operational data is present in the day-to-day operations of any business.

So, what's different about operational data in mortgage companies?

Mortgage companies handle the same basic data types, including entities and events, as other businesses. However, the main differences lie in:

  • Data sources: Where the data is produced and recorded.
  • Data format: The structure in which the data is recorded.
  • Entity types: The specific kinds of entities the data describes.
  • Event types: The specific kinds of events the data captures.

Let's explore what's unique about each of these aspects.

Mortgage data sources

So, the first difference in operational mortgage data is where data is recorded and produced.

Each industry has its own software suite that they rely on to run day-to-day operations.

This software is usually where operational data is produced and captured.

In the case of the mortgage industry, these are usually:

  • CRM (e.g., Salesforce, Jungo)
  • Point of Sale software (e.g., Floify, Blend)
  • Loan Origination System (e.g., Encompass, LendingPad)
  • Loan Servicing System (e.g., Sagent, Fintech Market)
  • Email, SMS, messengers (e.g., Gmail, WhatsApp)
  • Credit reporting agencies (e.g., Xactus)
  • Open banking data providers (e.g., Plaid)
  • Property Appraisal service providers
  • Employment Verification Services
  • etc.

Mortgage data format ratio

The second difference is in the format in which operational data is recorded.

Every industry works with the same basic formats, e.g., documents, XML files, and structured records.

The difference is the ratio: which formats dominate over the others.

In mortgage, the most common data formats are:

  • Unstructured data, e.g., documents and images
  • Structured data that’s stored within existing software systems

Mortgage entity types

The types of entities businesses interact with in day-to-day operations are mainly unique to each industry.

Therefore, operational data in the mortgage industry deals with entities specifically relevant to its domain.

Here are some examples of entity types unique to the mortgage industry:

  • Borrower
  • Property
  • Loan Application Title Holder
  • Loan Application Asset
  • Loan Application Liability
  • Loan Application Income
  • Loan Applicant Employment
  • Loan Application Address
  • Mortgage Loan
  • Branch
  • Loan Product
  • Payments
  • Lender
  • Lien
  • Rate lock
  • etc.

Mortgage event types

Just as entity types are unique to each industry, so are event types.

Mortgage operational data describes events specific to the mortgage industry, such as:

  • Offer accepted
  • Application taken
  • Loan application taken into processing
  • Appraisal ordered
  • Loan application submitted to UW
  • Loan application approved
  • Rate locked
  • etc.

How to use operational data to improve mortgage operations

Mortgage companies already utilize operational data daily.

The real question is how reducing the operational data gap can enhance mortgage operations.

More software-ready operational data enables companies to leverage more software and technology.

Implementing software or technology aims to increase operational efficiency, reduce human error, save time, and lower costs, all while ensuring compliance with regulatory standards.

So, use cases for operational data in mortgage operations are defined by HOW software uses this data to achieve those goals.

I see that software can help increase operational efficiency in 3 ways:

  • Making better decisions by using operational analytics
  • Reducing the number of manual actions through workflow automation
  • Increasing manual action efficiency with better operational tools

Below is a deeper overview of each of them.

Operational mortgage analytics

Operational mortgage analytics refers to applying data analysis tools and methodologies specifically to the operational aspects of mortgage lending.

Operational mortgage analytics uses operational data to answer questions about workflow efficiency, process bottlenecks, and staff productivity.

Insights from operational mortgage analytics can inform decision-making and facilitate operational improvements.

Operational analytics can then measure the impact of the changes, creating a continuous loop of operational improvements.
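
As a minimal sketch of what this looks like in practice (Python, with hypothetical event names and timestamps), here's an approval-cycle metric computed straight from event data:

    from collections import defaultdict
    from datetime import datetime

    # Measure how long loans spend between application and approval.
    events = [
        {"loan_id": "LN-1001", "type": "application_taken", "occurred_at": "2024-03-01T10:00:00"},
        {"loan_id": "LN-1001", "type": "loan_approved", "occurred_at": "2024-03-18T16:00:00"},
        {"loan_id": "LN-1002", "type": "application_taken", "occurred_at": "2024-03-05T09:00:00"},
        {"loan_id": "LN-1002", "type": "loan_approved", "occurred_at": "2024-03-27T11:00:00"},
    ]

    timestamps = defaultdict(dict)
    for e in events:
        timestamps[e["loan_id"]][e["type"]] = datetime.fromisoformat(e["occurred_at"])

    cycle_days = [
        (ts["loan_approved"] - ts["application_taken"]).days
        for ts in timestamps.values()
        if "application_taken" in ts and "loan_approved" in ts
    ]
    print(sum(cycle_days) / len(cycle_days))  # average days from application to approval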

If you want to learn more about operational mortgage analytics, here’s an in-depth article.

Operational mortgage automation

Automation refers to using software to perform tasks that typically require human intervention.

Operational automation refers to using software to automate routine and repetitive tasks in business operations.

Operational mortgage automation specifically applies automation to the operational aspects of mortgage lending.

Operational mortgage automation uses operational data to trigger and perform an action.

Event data primarily triggers actions and carries data between steps within a workflow and across multiple workflows.

Entity data, in turn, provides the information the automation needs to perform the action.

The value of operational automation is that it reduces the number of manual tasks within a workflow that humans need to complete.
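
Here's a hedged sketch of that pattern: an event triggers the workflow, and entity data supplies what the action needs. The send_email helper and all field names are hypothetical placeholders:

    def send_email(to: str, subject: str, body: str) -> None:
        print(f"-> emailing {to}: {subject}")  # stand-in for a real mail client

    def on_event(event: dict, entities: dict) -> None:
        # Event data triggers the action...
        if event["type"] == "loan_approved":
            # ...entity data provides what the action needs.
            borrower = entities[event["context"]["loan_id"]]["borrower"]
            send_email(
                to=borrower["email"],
                subject="Your loan was approved",
                body=f"Hi {borrower['name']}, your application was approved.",
            )

    entities = {"LN-1001": {"borrower": {"name": "Jane Doe", "email": "jane@example.com"}}}
    on_event({"type": "loan_approved", "context": {"loan_id": "LN-1001"}}, entities)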

If you want to learn more about operational mortgage automation, here’s an in-depth article.

Operational mortgage tools

An operational mortgage tool is software designed to facilitate, execute, or manage specific tasks within mortgage operation workflows.

Unlike automation, a tool is a facilitator of work designed to be used by humans to accomplish tasks more effectively. Think digital hammer.

Operational mortgage tools enable users to access, interact with, and manage operational data.

Individuals use tools to perform tasks more efficiently, accurately, and effectively than they could manually.

If you want to learn more about operational mortgage tools, here’s an in-depth article.

3. HOW TO COLLECT OPERATIONAL MORTGAGE DATA

What data engineering and operational data pipelines are

Data engineering is a subset of software engineering focused on building infrastructure for collecting, storing, and transforming digital data.

In operations, a primary use case for data engineering is to bridge the operational data gap.

The operational data gap is the difference between the operational data produced and the operational data that is ready for software processing.

Data engineers close operational data gaps through operational data pipelines.

An operational data pipeline is a series of operations that automates the collection and storage of raw operational data and its transformation into a format ready for software processing.

How an operational data pipeline works

An operational data pipeline systematically moves data from its sources to the destinations where software can utilize it for operational purposes, referred to as data consumers.

In operational software, data consumers include analytics platforms, automation systems, and internal tools.

A modern operational data pipeline consists of multiple:

  • Data stores: Locations where data is stored.
  • Processing layers: Systems that move and transform data.

In an operational pipeline, data moves through these data stores:

  1. Data source: The original location where operational data is stored or produced.
  2. Data lake: Storage for unprocessed data.
  3. Data warehouse: Storage for processed data.

The processing layers in the data pipeline facilitate the movement of data:

  • Extraction layer (Data source → Data lake)
  • Transformation layer (Data lake → Data warehouse)
  • Semantic layer (Data warehouse → Data consumers)

Below is a detailed overview of the data stores and processing layers.

Data sources

Data sources are the origins from which data is gathered.

Data is collected from various operational sources, such as applications, databases, flat files, APIs, or real-time data sources like IoT devices (sensors).

In the mortgage industry, common data sources include:

  • CRM (e.g., Salesforce, Jungo)
  • Point of Sale software (e.g., Floify, Blend)
  • Loan Origination System (e.g., Encompass, LendingPad)
  • Loan Servicing System (e.g., Sagent, Fintech Market)
  • Email, SMS, messengers
  • Credit reporting agencies (e.g., Xactus)
  • Open banking data providers (e.g., Plaid)
  • Property Appraisal service providers
  • Employment Verification Services

Extraction layer

An extraction layer is a node in a data pipeline that facilitates data movement from the Data source to the Data lake.

This layer performs two primary operations:

  • Extracting data: Retrieving data from the original data sources.
  • Loading data: Depositing the extracted data into the data lake.

An extraction layer usually consists of Source and Destination connectors.

A Source connector knows how to extract data from a specific Data source (e.g., Encompass).

A Destination connector knows how to load data into a specific Data lake (e.g., BigQuery).

In most cases, the extraction layer doesn’t transform the extracted data. Instead, it replicates the data as-is into the Data lake.
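
Below is a minimal sketch of that extract-and-load step in Python. The fetch_from_source function is a placeholder for a real source connector (e.g., one calling a LOS API), and a local folder stands in for the data lake:

    import json
    from datetime import datetime, timezone
    from pathlib import Path

    def fetch_from_source() -> list[dict]:
        """Placeholder source connector: returns raw records exactly as stored."""
        return [{"GUID": "abc-123", "LoanAmount": "425,000", "Status": "Approved"}]

    def load_to_lake(records: list[dict], lake_root: Path) -> Path:
        """Placeholder destination connector: writes raw JSON, untransformed."""
        batch = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
        path = lake_root / "raw" / "los" / f"{batch}.json"
        path.parent.mkdir(parents=True, exist_ok=True)  # mirror the source layout
        path.write_text(json.dumps(records))
        return path

    print(load_to_lake(fetch_from_source(), Path("./data-lake")))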

Data lake

A data lake is a data store designed to hold both structured and unstructured data.

It serves as the first destination for data, where the Extraction layer loads data extracted from Data sources.

Unlike with a data warehouse, you can load data into it as-is, without transforming it first.

Having an intermediary data store like a data lake before transforming data enables companies to:

  • Bypass limitations of data querying in data sources
  • Reduce the amount of upfront work to start recording data
  • Prevent the loss of raw data during data transformation
  • Reduce time wasted on data transformation that’s never used

A data lake stores the extracted data in its native format. This means that data lakes can handle many data types and structures—from CSV and JSON files to more complex and unstructured data like images and documents.

Transformation layer

The transformation layer in a data pipeline is the stage where data is processed to convert it from its raw form into a format ready for consumption by operational software.

The transformation layer is critical for ensuring that data is accurate, consistent, and in a usable format for operational software.

Here are the primary operations that happen in the transformation layer:

  • Conversion: Converting data between different formats. A typical example of this is the transformation of documents (unstructured data) into structured data.
  • Cleaning: Removing inaccuracies, duplicates, or irrelevant data to ensure the dataset's quality. This step might involve fixing errors, dealing with missing values, or removing outliers.
  • Normalization: Converting data to a standard format or scale to ensure consistency across the dataset. For example, dates might be standardized to a YYYY-MM-DD format, or currency values might be converted to a single currency.
  • Enrichment: Adding additional data points to the dataset. This could involve adding geographic details based on postal codes, appending customer segmentation data, or calculating new metrics.
  • Data modeling: Structuring data into a format optimized for the specific needs of analysis or application. This might involve flattening nested structures or the creation of new entities combined from multiple datasets.
  • Integration: Combining data from various sources into a single, cohesive dataset. This process often requires resolving discrepancies between similar data sources and aligning data models.
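
As a minimal sketch of a few of these operations (cleaning, normalization, and data modeling), assuming hypothetical raw field names extracted from an LOS:

    from datetime import datetime

    raw = [
        {"GUID": "abc-123", "LoanAmount": "425,000", "ClosingDate": "03/15/2024"},
        {"GUID": "abc-123", "LoanAmount": "425,000", "ClosingDate": "03/15/2024"},  # duplicate
        {"GUID": "def-456", "LoanAmount": "310,500", "ClosingDate": "04/02/2024"},
    ]

    def transform(records: list[dict]) -> list[dict]:
        seen, out = set(), []
        for r in records:
            if r["GUID"] in seen:  # cleaning: drop duplicates
                continue
            seen.add(r["GUID"])
            out.append({  # data modeling: reshape into the warehouse schema
                "loan_id": r["GUID"],
                # normalization: money as a number, dates as YYYY-MM-DD
                "loan_amount": float(r["LoanAmount"].replace(",", "")),
                "closing_date": datetime.strptime(r["ClosingDate"], "%m/%d/%Y").date().isoformat(),
            })
        return out

    print(transform(raw))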

Data warehouse

A data warehouse is a data store designed for structured data.

The data warehouse is the second data destination, where the Transformation layer loads transformed data.

Unlike a data lake, you can’t load data as-is: data warehouses require data to be structured and to follow a predefined schema.

The strict requirements for data schema in a data warehouse ensure that the data is in a format immediately usable by operational software.

Data warehouses are highly optimized for fast query performance on structured data, making them ideal for operational software.
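
Here's a hedged sketch of that schema-first behavior, using Python's built-in sqlite3 purely as a self-contained stand-in for a real warehouse such as BigQuery or ClickHouse:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    # Unlike a lake, the warehouse defines the schema before any data lands.
    conn.execute("""
        CREATE TABLE loans (
            loan_id      TEXT PRIMARY KEY,
            loan_amount  REAL NOT NULL,
            closing_date TEXT NOT NULL  -- ISO YYYY-MM-DD
        )
    """)

    rows = [("abc-123", 425000.0, "2024-03-15"), ("def-456", 310500.0, "2024-04-02")]
    conn.executemany("INSERT INTO loans VALUES (?, ?, ?)", rows)

    # Structured storage makes the data immediately queryable by software.
    for row in conn.execute("SELECT loan_id, loan_amount FROM loans ORDER BY loan_amount"):
        print(row)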

Semantic layer

A Semantic layer controls how data moves out of the data warehouse.

You can think of a Semantic layer as a "kitchen" in a "data restaurant."

It ensures that when Data consumers order a "Data burger," they receive exactly what they expect, following a consistent recipe and using the same ingredients, without needing to venture into the restaurant's kitchen.

A semantic layer usually consists of 4 components:

  • Data modeling is akin to defining your data menu and then preparing menu items based on specific recipes.
  • Caching is similar to preparing popular menu items in advance. Caching stores frequently accessed information to speed up future requests.
  • Access control ensures that data consumers are permitted to order certain menu items.
  • APIs are similar to placing an order through a server instead of directly asking a Chef. APIs function similarly by providing a standardized interface for accessing the data services without needing direct access to the underlying systems.
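
Here's a toy sketch of those four components in a few lines of Python. The metric names, roles, and hard-coded query result are all hypothetical; real semantic layers like Cube define this in configuration:

    from functools import lru_cache

    METRICS = {  # data modeling: every consumer gets the same recipe
        "avg_loan_amount": "SELECT AVG(loan_amount) FROM loans",
    }
    PERMISSIONS = {"analyst": {"avg_loan_amount"}, "viewer": set()}

    @lru_cache(maxsize=128)  # caching: popular "menu items" are made once
    def run_query(sql: str) -> float:
        return 367750.0  # placeholder for a real warehouse query

    def get_metric(name: str, role: str) -> float:  # the API consumers call
        if name not in PERMISSIONS.get(role, set()):  # access control
            raise PermissionError(f"{role} may not order {name}")
        return run_query(METRICS[name])

    print(get_metric("avg_loan_amount", role="analyst"))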

How to build an operational data pipeline

Development of an operational data pipeline should start with identifying what you’re trying to achieve with operational data. The use case is the biggest factor in deciding what data you need to collect and how.

Here’s a high-level overview of the process:

  1. Identify the use case for the data (e.g., automation, analytics, app, etc.)
  2. Define what data is required to implement this use case
  3. Identify data sources where you can get this data from
  4. Extract data from the data sources and load it into a data lake
  5. Transform data from the data lake into the format you need
  6. Set up a Semantic layer so data consumers can access the data

There are plenty of data engineering products for each stage that speed up development:

  • Extraction layer: Sequin/Fivetran/Airbyte
  • Data lake: AWS S3/Snowflake/MongoDB/BigQuery
  • Transformation layer: DBT
  • Data warehouse: ClickHouse/PostgreSQL/MongoDB
  • Semantic layer: Cube/Weld

An in-depth article on building operational mortgage apps is coming soon.

What’s next

I hope this post gave you insight into Operational Mortgage Data and how it can be used in mortgage operations.

If you’d like to stay on top of the latest mortgage tech and how it can be applied to mortgage operations, consider joining our mortgage technology newsletter.

MORTGAGE TECH NEWSLETTER

Discover how technology can assist your mortgage company in reaching its strategic goals

A weekly newsletter about leveraging data, custom software, and modern technology to drive efficiency in mortgage operations.
Written by
Vova Pylypchatin
CTO @ MortgageFlow

I’m a software consultant with a background in software engineering. Currently, I run a mortgage software consulting and development company that builds custom tools and automation solutions for mortgage lenders.