In today’s digital world, data quality has become a top priority for enterprises due to increasingly stringent regulatory requirements and the need for improved operational efficiency. It is also crucial for staying ahead of agile competitors, recalibrating quickly, and identifying transformative opportunities early. These goals can be achieved only by implementing strong data governance and data quality tools.
In a previous article, we looked at the definition, challenges, and advantages of good data quality. However, it’s important to realize that data quality is much more complex than simply ensuring data is “right”. Data must be correct, consistent, complete, and timely. In this piece, we are going to dive deeper into the consistency dimension and why it is more important than enterprises may realize.
The current enterprise landscape is extremely complex, with thousands of fragmented and siloed systems constantly trying to interact with one another. Vast amounts of data are created, managed, and stored in each of these disparate applications and databases. Challenges arise when systems must interact with one another, but the data formats, taxonomies, and business rules are not consistent. Each application or disparate system in an enterprise represents its relevant data in a way that supports its specific function. In other words, each system processes information in the way that is most optimal for its own productivity and business context, whether that is in real time, end of day, synchronously, or asynchronously.
Let’s look at a simple example. System A may store its data in JSON format. When it passes its data along, System B may take the same piece of data, make enrichments or transformations, and store it in Avro format. When System B passes the data along to System C (which may be a legacy application), System C cannot read the previous format and instead stores its data in XML or CSV format. As a result, much of the schema information and data is flattened. As you can see, each of these three applications reads and produces data in a different format. All three represent the underlying object correctly, even though the formats of the data differ. Large enterprises have struggled to get systems to correctly read different formats and therefore seek out solutions to make all data formats uniform.
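To make the flattening concrete, here is a minimal Python sketch of a nested JSON trade record being forced into a flat CSV row for a legacy consumer. The field names and values are purely illustrative, and the Avro step is omitted for brevity; the point is that the values survive the conversion while the nesting and type information do not.

```python
import csv
import io
import json

# Hypothetical trade record as "System A" might emit it in JSON.
# Field names here are illustrative, not taken from any specific system.
trade_json = json.dumps({
    "trade_id": "T-1001",
    "counterparty": {"name": "Acme Corp NA", "lei": "5493001234567890XX42"},
    "economics": {"notional": 1000000, "currency": "USD", "rate": 0.0375},
})

# A downstream legacy consumer ("System C") that only understands flat CSV
# must flatten the nested structure, losing the schema's grouping.
def flatten(prefix, obj, out):
    """Flatten nested dicts into dotted column names."""
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flatten(name, value, out)
        else:
            out[name] = value
    return out

flat = flatten("", json.loads(trade_json), {})

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=flat.keys())
writer.writeheader()
writer.writerow(flat)

# The values survive, but nesting, types, and schema metadata do not:
# every field is now just a string column in a flat file.
print(buffer.getvalue())
```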
As a result, it becomes difficult to ensure that data is consistent across the systems. Even if the data were passed from one of those systems to another perfectly, each system also tends to use only the data it needs and then create new data from a business point of view. For example, as data moves throughout a financial institution, it is enriched by various business processes. These business processes span multiple organizational units, each of which typically has its own IT systems. Consider the front, middle, and back offices at a broker-dealer. Trade capture systems at the front office may capture a transaction (“trade”) with specific attributes (“economics”) in a specific format. Middle-office systems use only the data they need from the front office to perform functions like portfolio compressions (nettings), mark-to-market (MTM) calculations, and risk exposure calculations. They either create copies of each trade or annotate/enrich it with the relevant attributes (MTM, current exposure, and so on). As the trade progresses through the post-trade processing phase, it is important that front-to-back consistency of the trade record is maintained. This is done by ensuring a consistent representation of the transaction from inception to termination. Failure to do so makes the business process prone to processing errors and settlement delays, and can expose the organization to settlement, compliance, and regulatory risk.
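As a rough sketch of what front-to-back consistency looks like in code, the snippet below (with hypothetical trade attributes) lets each office enrich its own copy of a trade while a simple check verifies that the canonical economics captured at the front office are never altered downstream.

```python
from copy import deepcopy

# Minimal illustration (hypothetical fields) of front-to-back consistency:
# downstream systems may enrich a trade, but must never alter its canonical core.
CANONICAL_FIELDS = {"trade_id", "counterparty", "notional", "currency"}

front_office_trade = {
    "trade_id": "T-1001",
    "counterparty": "Acme Corp NA",
    "notional": 1_000_000,
    "currency": "USD",
}

def enrich(trade, **attributes):
    """Middle/back-office enrichment: copy the record and annotate it."""
    enriched = deepcopy(trade)
    enriched.update(attributes)
    return enriched

middle_office_trade = enrich(front_office_trade, mtm=12_450.00, exposure=0.82)
back_office_trade = enrich(middle_office_trade, settlement_status="PENDING")

def is_consistent(original, downstream_copy):
    """True if the canonical core of the trade is unchanged downstream."""
    return all(original[f] == downstream_copy[f] for f in CANONICAL_FIELDS)

assert is_consistent(front_office_trade, middle_office_trade)
assert is_consistent(front_office_trade, back_office_trade)
```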
It is not sufficient to ensure that transactional data records have a consistent representation. The business logic and rules that create and enrich the data must be defined in a consistent manner as well. For example, at a broker-dealer, if a trade is valued using multiple valuation models in different systems, the result will be inconsistent valuations or a misunderstanding of which valuation to use for a given purpose. This may result in incorrect risk and compliance reporting; hence the need for a consistent trade valuation model that is labeled for its purpose. Other examples are risk calculation models and liquidity calculation logic, all of which must have a consistent representation within a financial institution.
Achieving Consistency Through Normalization (Canonicalization)
So how do we ensure consistency of data, business logic, and rules? Normalization and canonicalization are two ways to accomplish that.
Let’s first look at normalization. Consider three transactions involving a customer we call Acme. Suppose the first transaction refers to the customer as Acme Corp, the second refers to it as Acme, and the third has the value Acme Corp NA. If business rules prescribe that these three names represent the same business entity, i.e., Acme, then the fact that transactions use multiple values to describe the same customer can cause processing errors, such as incorrect accounting for orders and balances, and inaccurate reporting. The solution is to “normalize” these multiple values into a single value, which could be any of the three or some other unique and unifying value. In our example, assume Acme Corp NA is used to uniquely represent the customer. Acme Corp NA is then considered the “canonical” form of that customer, since it has been chosen as the unique representation for this particular customer. The customer values of all transactions would then need to be “normalized” to Acme Corp NA. The process of creating a unique representation of a data element is called “canonicalization”.
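A minimal normalization step can be as simple as an alias table that maps every known variant to the chosen canonical value. The Python sketch below hard-codes the Acme aliases for illustration; in practice the mapping would typically be driven by reference or master data rather than a literal dictionary.

```python
# Canonical value chosen for this customer, per the example above.
CANONICAL_CUSTOMER = "Acme Corp NA"

# Known variants, all mapped to the canonical form.
ALIASES = {
    "acme": CANONICAL_CUSTOMER,
    "acme corp": CANONICAL_CUSTOMER,
    "acme corp na": CANONICAL_CUSTOMER,
}

def normalize_customer(name: str) -> str:
    """Map any known alias to its canonical form; leave unknown names as-is."""
    return ALIASES.get(name.strip().lower(), name)

transactions = [
    {"id": 1, "customer": "Acme Corp"},
    {"id": 2, "customer": "Acme"},
    {"id": 3, "customer": "Acme Corp NA"},
]

for txn in transactions:
    txn["customer"] = normalize_customer(txn["customer"])

# All three transactions now carry the single canonical customer value.
assert all(t["customer"] == CANONICAL_CUSTOMER for t in transactions)
```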
Canonicalization is important not just for data, but for business logic and rules as well. If business rules are specified and applied in multiple forms, different systems may apply different versions of the same rule to the same data, leading to inconsistent results. In our broker-dealer example, this may result in inconsistent trade valuations. The solution is to capture the business logic for a process in a “canonical” form, using a standard model and grammar to capture the associated rules. The canonical form of a business rule can then be used across the enterprise, leading to consistent business logic and results.
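One way to picture a canonical business rule is as a single definition registered under a label that states its purpose, which every system resolves instead of re-implementing its own variant. The sketch below uses a made-up discounting formula and label purely for illustration; it does not represent any specific rules grammar or engine.

```python
# Registry of valuation rules, keyed by the purpose they are labeled for.
VALUATION_MODELS = {}

def register_model(purpose):
    """Register a valuation function under its labeled purpose."""
    def decorator(fn):
        VALUATION_MODELS[purpose] = fn
        return fn
    return decorator

@register_model("risk_reporting")
def discounted_value(notional, rate, years):
    """Single agreed-upon discounting rule for risk reporting (illustrative)."""
    return notional / ((1 + rate) ** years)

# Front-, middle-, and back-office code all resolve the same rule by label,
# so a valuation produced for "risk_reporting" is consistent wherever it runs.
value = VALUATION_MODELS["risk_reporting"](1_000_000, 0.0375, 2)
print(round(value, 2))
```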
Better Business Decisions
With ongoing data consistency, enterprises have cleaner, standardized, and high-quality data, resulting in more accurate analytics, clearer insights, and predictive advantages. Through canonicalized data formats and business rules, enterprises can make better business and strategic decisions.
Increased Operational Efficiency
With standardized data across the enterprise (and a strong data management strategy), fewer resources and less effort go into preventing and resolving data inconsistencies or formatting issues. Ultimately, this increases an enterprise’s overall operational efficiency and reduces risks and costs.
Meeting Regulatory Requirements
Inconsistent data that is aggregated for regulatory reporting can create problems for both enterprises and regulators. Enterprises must be able to confidently submit high-quality data that meets all requirements to regulators in order to avoid fines and penalties. Implementing a consistent set of data formats, references, and standards across the enterprise, while establishing transparent business rules, can address these concerns.
As enterprises are quickly discovering, data quality (and thus data consistency) cannot be achieved or improved without an effective data governance framework in place. Only then can firms create successful data management strategies for maintaining high-quality data. An effective data governance model implements processes and rules to ensure that inconsistent data is identified and addressed appropriately on an ongoing basis. A framework can also include canonicalization, which creates enterprise-wide, standardized, and uniform data formats and business logic. Enterprises can reconcile current data formats while implementing standards, rules, and processes for future data creation.
Remember, without cohesive and continual monitoring of data quality throughout the data’s lifecycle, the smallest error or discrepancy from disparate systems storing, enriching, and transforming inconsistent data can quickly trickle down the pipeline and lead to costly errors across the enterprise. These seemingly innocuous errors ultimately erode operational efficiency, regulatory compliance, and the overall bottom line.
PeerNova’s Cuneiform Platform is an active data governance, data management, and data quality tool that ensures the consistency of data formats and business logic across the enterprise. The platform not only enforces canonicalization of data but also ensures that compressed or netted data and raw data remain consistent across the organization. Through automation and ongoing data quality checks, the solution not only remedies existing data issues but also continuously manages and monitors data consistency throughout its entire lifecycle across siloed sources. This allows firms to continuously enhance their understanding of the data and systemically correct it.
The platform also creates end-to-end (E2E), integrated, and active lineages across disparate tools and systems, building a knowledge graph of the enterprise data. Using E2E visibility and lineage tools, the solution’s self-serve platform allows business users to gain various insights and business knowledge from their data for use in liquidity management, risk management, and more. Additionally, firms can query knowledge graphs to identify risk hot spots and trends.
Effective data quality incorporates all aspects of a piece of data, from ingestion and cleansing to integration, business rules, and beyond. Additionally, data quality initiatives should be part of daily operations. Data quality is not a one-time practice, but an ongoing process that must be continually updated and maintained. As data quality becomes an unspoken enterprise best practice, implementing the right data governance framework, data management strategy, and data quality tool remains a top priority.
To learn more about how PeerNova’s Cuneiform Platform can help your enterprise ensure perpetual data quality and consistency, be sure to get in touch with us and request a demo today.