
News Insights

Read more about Symbolic Systems projects and learn about our innovative thought leadership with our Industry Insights.

Industry Insight
 
Using Canonical Data Modeling to Enable Data Interoperability
Written by Symbolic Systems, Inc.

The Canonical Data Model provides a "single version of the truth" for the scope of the data domain. It can be used as a data architectural framework for data integration of the systems participating in data sharing as well as to illustrate data requirements for both data in motion and data at rest.

 
 
Authoritative Data Source (ADS) Framework
Written by Frank J. Ponzio, Jr.

The introduction of the ADS Framework concepts and the maturity model for ADS is transformational for improving the culture and practice of continuous data improvement. The ADS Framework is also scalable, in that the model is as relevant for an ADS product that provides code lists as it is for an ADS product that provides a complex architecture.

 
 
Improving Trust in ADS Data Content
Written by Frank J. Ponzio, Jr.

Improved methodologies and metrics are needed to raise and maintain the accuracy of the data being provided by ADSs. Data accuracy and consistency develop enterprise trust in the ADS data. The ADS Framework and Maturity Model address the improvement of methods and metrics to develop content trust in ADS products.

 
 
DATAMRI®: A User Tool for Assessing Data Quality
Written by Frank J. Ponzio, Jr.

Data quality is often overlooked as an issue in software development and software maintenance projects, when in fact poor data quality affects overall project costs on a micro level and ultimately hurts the competitiveness of an organization on a macro level. Trading partners, such as manufacturers and their suppliers, must develop methodologies and need tools to deal with data quality issues in cooperative system environments.

Symbolic Systems, Inc. provides a data quality measurement and reporting service using its patent-pending DATAMRI® tool. This non-invasive tool checks the validity of the data in a file, compares the data between files, and provides other metrics, including grading the data, via customized reports.

Once the tool is configured to meet the unique needs of the parties, incorporating both system rules (the structure and attributes of tables, fields, etc.) and business rules, users submit files and retrieve data quality reports through a simple web-based interface.

Background

The problem of poor data quality processed by information systems is widespread in commercial and government environments. Only recently, however, has widespread attention been paid to data quality issues. This growing interest is due to the negative impact that poor data quality has on the competitiveness of an organization.

Techniques and methodologies exist to improve data quality within a single autonomous system. Until now, however, there has been little research into, and few tools for, data quality in cooperative environments (for example, between trading partners), where interoperability is critical.

In cooperative scenarios the primary data quality issues are:

  • Assessment of the quality of data owned by any participant organization.
  • Methods and techniques for exchanging quality information.
  • Improvement of quality in the overall environment.
  • Data and system heterogeneity.

Data quality control is essential at all stages in the life cycle of systems that exchange large amounts of data, regardless of the file type (general, XML, etc.) being exchanged. However, data quality is often ignored as development organizations concentrate on software issues or because of schedule and resource pressures.

Ignoring data quality actually increases the overall cost of developing systems, but because this cost is borne by the data recipient, it is not always included in the development cost numbers. Two problems can occur in development mode. First, if sent files are not thoroughly tested, the problems may not be noticed until the recipient tries to use the files. Because the problems surface on the recipient’s side, the recipient will debug its own software. If the problem was in fact caused by the sender’s data, the recipient ends up debugging the sender’s system. The recipient’s development costs increase, not because of software problems, but because of poor data.

Second, poor data quality increases the time needed to make the system fully operational. Because the software has already been released, the “clean-up” is attributed to testing or implementation, and not to methodology and data testing shortcomings in the initial development phase.

In maintenance mode, data quality checks are equally important, yet they are sometimes ignored when new operating system or software versions are installed. Then, without warning, bad data is noticed, but no one is sure of the source. The consequence is a lot of backtracking to determine which changes might have introduced the problem.

Mature industries, like manufacturing, have adopted quality guidelines, standards, and dispute resolution mechanisms. This is exemplified in automobile manufacturing, where the Japanese institution of measurable quality standards set the bar for the entire industry. In other industries, quality grading is an accepted and agreed-upon guideline for determining the usefulness and the value of the product. For example, specific metrics determine Grade A vs. USDA meat, a five-star vs. a four-star restaurant or hotel, and first quality vs. seconds in apparel. The metrics are determined by the community of interest and are understood by both the product’s provider and its recipient. They are used to measure the product’s value (and ultimately its price) and to determine its disposition.

The new frontier for the development of such standards within software-based communities of interest is in data quality.

Challenge

The issue is how to address the metrics supporting quality of data. We must make sure that the data files, both sent and received, are of a certain maturity level. The challenge, however, is in determining what qualifies as quality data and how that determination can become, easily and cost-effectively, part of software development and maintenance methodology.

The software industry has set standards for edits to check software, but the industry, and even cooperative system developers, may have no uniform guidelines on the same edits or the uniform implementation of those edits. This happens, in part, because of different coders, different implementation schedules, and different base and development systems. As a simple example, all organizations may agree on a field description, but some may implement that field with leading zeros and others may not. Once developed, code and logic may be replicated many times, in many places, and in many instances. The result is that a small edit difference becomes a major data quality system problem that must be revised among all trading partners.
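The leading-zeros example can be made concrete with a short sketch. The function names and the five-character field width below are assumptions chosen for illustration; they do not come from any particular trading-partner specification.

```python
def naive_match(a: str, b: str) -> bool:
    """Compare two field values exactly as transmitted."""
    return a == b


def canonical_code(value: str, width: int = 5) -> str:
    """Normalize a numeric code to a fixed width with leading zeros.

    The default width of 5 is an assumption for this example.
    """
    stripped = value.strip()
    if not stripped.isdigit():
        raise ValueError(f"not a numeric code: {value!r}")
    return stripped.zfill(width)


def canonical_match(a: str, b: str, width: int = 5) -> bool:
    """Compare two field values after applying the same shared edit."""
    return canonical_code(a, width) == canonical_code(b, width)


sender_value = "00742"     # partner A pads with leading zeros
recipient_value = "742"    # partner B does not

print(naive_match(sender_value, recipient_value))      # False
print(canonical_match(sender_value, recipient_value))  # True
```

Both partners agreed on the field, yet an exact comparison reports a mismatch. A single shared normalization step, applied identically by everyone, removes the discrepancy.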

In large distributed systems, there is also the added issue of synchronizing changes to tables throughout a whole system. As an example, what quality checks are put in place when a new price or new code is to become effective on the same day or within a designated period or when a code is terminated and must not be used after a designated expiration date? Trading partners need a methodology to ensure that the data changes have been implemented.
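A check of this kind might look like the following sketch. The `CodeEntry` structure, the sample codes, and the choice to treat the expiration date itself as the last valid day are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CodeEntry:
    code: str
    effective: date                  # first day the code may be used
    expires: Optional[date] = None   # last day the code may be used, if any


def check_usage(table: dict[str, CodeEntry], code: str, used_on: date) -> str:
    """Return "ok" or a short description of the date problem."""
    entry = table.get(code)
    if entry is None:
        return "unknown code"
    if used_on < entry.effective:
        return "used before effective date"
    if entry.expires is not None and used_on > entry.expires:
        return "used after expiration date"
    return "ok"


# Hypothetical code table: A10 is retired on the day B20 takes over.
table = {
    "A10": CodeEntry("A10", date(2024, 1, 1), date(2024, 6, 30)),
    "B20": CodeEntry("B20", date(2024, 7, 1)),
}

print(check_usage(table, "A10", date(2024, 7, 1)))   # used after expiration date
print(check_usage(table, "B20", date(2024, 6, 30)))  # used before effective date
print(check_usage(table, "B20", date(2024, 7, 1)))   # ok
```

Running the same check against every partner's files on the changeover date shows who has, and who has not, implemented the table change.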

Solution

One solution is to centralize all the code related to data edits, plus other attributes, parameters, and business rules, into a single system, using the same technology, and to apply the code at the same time to all of the files.

Trading partners (i.e., a data quality community of interest) would agree on the rules as provided, and further agree on a set of grading metrics to determine when data, or what data, were acceptable for transmission. The metrics might, for example, allow transmission of files with certain data errors, but not allow transmission of files with other data errors; or the metrics might indicate acceptability for transmission on a row by row basis so that not all records are held up waiting for certain data to be fixed.
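As a rough illustration of row-by-row acceptability, the sketch below applies a shared rule list to each record, holds back only rows with blocking errors, and assigns a file grade. The rules, the field names, and the grade thresholds are hypothetical; a real community of interest would define its own, and this is not a description of how DATAMRI® works internally.

```python
from typing import Callable, Optional

Record = dict[str, str]
# A rule returns an error message, or None when the record passes.
Rule = Callable[[Record], Optional[str]]


def require_part_number(rec: Record) -> Optional[str]:
    return None if rec.get("part_no", "").strip() else "missing part_no"


def price_is_number(rec: Record) -> Optional[str]:
    try:
        float(rec.get("price", ""))
    except ValueError:
        return "price is not numeric"
    return None


# (rule, blocking): blocking errors hold the row back; others are only reported.
RULES: list[tuple[Rule, bool]] = [
    (require_part_number, True),
    (price_is_number, False),
]


def review(records: list[Record]):
    """Split records into sendable and held rows, counting non-blocking warnings."""
    sendable, held, warnings = [], [], 0
    for rec in records:
        blocking_errors = []
        for rule, blocking in RULES:
            message = rule(rec)
            if message is None:
                continue
            if blocking:
                blocking_errors.append(message)
            else:
                warnings += 1
        if blocking_errors:
            held.append((rec, blocking_errors))
        else:
            sendable.append(rec)
    return sendable, held, warnings


def grade(total: int, held_count: int) -> str:
    """Grade a file by its share of sendable rows (thresholds are made up)."""
    if total == 0:
        return "ungraded"
    share = (total - held_count) / total
    if share >= 0.98:
        return "A"
    return "B" if share >= 0.90 else "C"


records = [
    {"part_no": "P-1", "price": "12.50"},
    {"part_no": "", "price": "3.00"},      # blocking error: row is held
    {"part_no": "P-3", "price": "n/a"},    # warning only: row is still sent
]
sendable, held, warnings = review(records)
print(len(sendable), len(held), warnings, grade(len(records), len(held)))  # 2 1 1 C
```

Separating blocking errors from warnings is what lets clean rows move on while problem rows wait for correction.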

This type of quality checking has merit throughout the life cycle of the software. In the development stage, as trading partners complete their own unit testing, they would submit the files for a data quality check and receive a detailed condition report. When files were at an acceptable condition level, they could be exchanged with trading partners.

Before distributing a new release of a system, version comparisons between the production and the new system could confirm whether software changes had unexpectedly affected the data. If parallel runs between the current and new systems are planned, file comparisons can highlight any data differences.
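A version comparison of this sort can be sketched as a keyed diff between the output of the production system and the output of the new release. The record layout and the `id` key are assumptions for the example.

```python
def index_by_key(rows: list[dict], key: str) -> dict:
    """Index records by their identifying field."""
    return {row[key]: row for row in rows}


def compare_runs(production: list[dict], candidate: list[dict], key: str) -> dict:
    """Report records missing from, added to, or changed in the candidate run."""
    old = index_by_key(production, key)
    new = index_by_key(candidate, key)
    changed = {}
    for k in sorted(old.keys() & new.keys()):
        fields = sorted(
            f for f in old[k].keys() | new[k].keys()
            if old[k].get(f) != new[k].get(f)
        )
        if fields:
            changed[k] = fields
    return {
        "missing": sorted(old.keys() - new.keys()),
        "added": sorted(new.keys() - old.keys()),
        "changed": changed,
    }


production = [
    {"id": "1", "qty": "10", "uom": "EA"},
    {"id": "2", "qty": "5", "uom": "EA"},
]
candidate = [
    {"id": "1", "qty": "10", "uom": "ea"},   # case changed by the new release
    {"id": "3", "qty": "7", "uom": "EA"},
]

print(compare_runs(production, candidate, key="id"))
# {'missing': ['2'], 'added': ['3'], 'changed': {'1': ['uom']}}
```

An empty report on all three counts is the evidence that a software change left the data untouched.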

In production mode, files can be sampled periodically, at different levels, to confirm that no changes have been made, or files can be monitored as part of the transmission process. In either scenario, if a change occurs that affects data, the technical and business sides of both organizations will know that there is an outstanding issue to address.

Symbolic provides a service, combined with a tool, DATAMRI®, that performs these types of comparisons and provides customized reports, depending on the needs and format requirements of the community of interest.

In order to implement DATAMRI®, a number of steps must occur. These steps are incorporated into the specific DATAMRI® process through which the files will be passed:

  • The community determines the rules that will be applied against the data. Rules include not only data rules, but also applicable business rules.
  • One or more sample data files are parsed from their native format (which may be anything from an XML file to a Word or PDF document) to one that can be passed through DATAMRI®.
  • If different files are going to be compared, they are massaged into a common format. This may include determination of additional rules related to similar, but non-matching fields that must be compared.
  • Report formats are developed, including the levels of detail and report types, e.g., exceptions only, summary, etc.
  • The web-based submission form is configured to include all of the file, review, and report types as well as security requirements.
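The parsing and common-format steps above can be illustrated with a small sketch that reads a CSV file and an XML file into the same list-of-records shape. It uses only the Python standard library; the field names, the `item` row tag, and the lower-casing convention are assumptions, and the sketch does not represent DATAMRI®'s actual intake format.

```python
import csv
import io
import xml.etree.ElementTree as ET


def records_from_csv(text: str) -> list[dict]:
    """Read CSV text into records with lower-cased, trimmed field names."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {k.strip().lower(): (v or "").strip()
         for k, v in row.items() if k is not None}
        for row in reader
    ]


def records_from_xml(text: str, row_tag: str) -> list[dict]:
    """Read one record per <row_tag> element, one field per child element."""
    root = ET.fromstring(text)
    return [
        {child.tag.lower(): (child.text or "").strip() for child in elem}
        for elem in root.iter(row_tag)
    ]


csv_text = "Part_No,Price\nP-1,12.50\n"
xml_text = "<items><item><part_no>P-1</part_no><price>12.50</price></item></items>"

from_csv = records_from_csv(csv_text)
from_xml = records_from_xml(xml_text, row_tag="item")
print(from_csv == from_xml)  # True
```

Once both files share one shape, the same rules and comparisons can be applied to either without format-specific logic.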

Once the planning is completed, users can submit files and retrieve reports. File submission is not a technical function. Via the Internet, a user fills out a form, checking the types of analysis reports required, confirming the e-mail address for notification when the analysis is complete, and attaching the file(s) for analysis. When the analysis is complete the user can open, view, or download the analysis files, which can be in Excel format for easy manipulation.

Conclusion

When the data passing between cooperative systems is monitored, the overall quality of the information on which system users rely, as well as the overall integrity of the organization’s systems, is enhanced. IT’s development and support costs are reduced when inaccurate data can be identified. The costs of relying on inaccurate information (in financial departments, customer service, sales, manufacturing departments, and management) are minimized. Ultimately, the overall competitiveness of the organization is increased.
