Behind the ubiquity of data disclaimers is the realization that our data is not perfect and that pursuing perfection is akin to boiling the ocean. But are we satisfied with doing nothing now that there is legal protection?
The answer was no. ADOT MPD saw the ability to measure data quality as paramount to building a healthy data culture.
After failing to find any ready-to-use methodologies for data quality measurement, the ADOT DataViz team took matters into their own hands and developed a data quality index (DQI) system that serves the following key objectives:
- For data users – an important reference when making data-driven decisions,
- For data owners or stewards – a prioritized blueprint for data enhancements, and
- For the organization – a good data governance practice.
MPD DQI evaluates individual quality elements (QEs) of a data set or layer. QEs are grouped into sections. In the Source section, QEs measure how complete the basic metadata about the data source is, along with general data health as perceived by the owner. The Process section is also about metadata; QEs in this section measure fidelity and security. QEs in the quality control (QC) section identify data issues by checking the data against technical (database/GIS) and business rules.
A QE can be evaluated qualitatively or quantitatively. A qualitatively rated QE is referred to as a Qual, which takes a binary value of zero (0) or one (1). A Qual value of one indicates a Known or Compliant state.
Quan refers to a QE that is quantitatively assessed. A Quan takes an integer value of zero (0), one (1), or two (2), representing Poor, Fair, and Good, respectively. When a Quan is based on a calculation rather than a subjective rating, for example an error rate automatically calculated in a QC process, a predefined value mapping is needed to convert the error rate, expressed as a percentage, to a Quan value.
Once the Qual and Quan for the QEs are defined for a given data set, DQI can be calculated in three steps using simple formulas:
Equation 1: Compute Section Score
Equation 2: Compute Total Score
Equation 3: Normalize Total Score
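One reading of these three steps that is consistent with the design traits described in this section (Quals act as factors, Quans act as addends, and the result is a percentage) is sketched below. The symbols, and the choice of the maximum attainable total as the normalizing denominator, are assumptions made for illustration rather than ADOT's published formulas. Here section s contains m_s Quals q_{s,i} and k_s Quans n_{s,j}.

```latex
\begin{align}
S_s &= \Bigl(\prod_{i=1}^{m_s} q_{s,i}\Bigr)\Bigl(\sum_{j=1}^{k_s} n_{s,j}\Bigr),
  & q_{s,i} &\in \{0,1\},\quad n_{s,j} \in \{0,1,2\} \tag{1}\\
T &= \sum_{s} S_s \tag{2}\\
\mathrm{DQI} &= \frac{T}{T_{\max}} \times 100\%,
  & T_{\max} &= \sum_{s} 2\,k_s \tag{3}
\end{align}
```

Under this reading, T_max is the total obtained when every Qual equals one and every Quan equals two, so a data set with no known issues would score 100%.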
There are several important traits of ADOT’s DQI design.
First, because Quals are factors (multipliers) while Quans are addends in Equation 1, a Qual has a greater impact on the Section Score than a Quan. In the Source section, for example, one of the qualitative QEs is “Owner and Contact”; an “unknown” (zero) score reduces the entire section's score to zero, no matter how well its other QEs rate. This deliberate design enforces the importance of ownership identification.
The second important trait is the flexibility of the design. The number of data quality sections, the number of quality elements (QEs), and each QE's designation as a Qual or a Quan can be fully customized to an individual agency’s business rules and the types of data being measured.
The final DQI is normalized (Equation 3) in percentage terms to maintain the comparability between data sets within an agency’s data domain.
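The traits above can be illustrated with a minimal Python sketch. It assumes that a section score is the product of the section's Quals times the sum of its Quans, that the total is the sum of section scores, and that normalization divides by the maximum attainable total. The QE names “Owner and Contact”, “Key Integrity”, and “Temporal Consistency” come from the text; the other QE names and all scores are hypothetical.

```python
from dataclasses import dataclass
from math import prod
from typing import Dict, List

MAX_QUAN = 2  # Quan scale from the text: 0 = Poor, 1 = Fair, 2 = Good


@dataclass
class Section:
    name: str
    quals: Dict[str, int]  # QE name -> 0 or 1
    quans: Dict[str, int]  # QE name -> 0, 1, or 2


def section_score(section: Section) -> int:
    # Assumed form of Equation 1: Quals multiply, Quans add.
    # A single Qual of 0 therefore zeroes the whole section.
    return prod(section.quals.values()) * sum(section.quans.values())


def dqi_percent(sections: List[Section]) -> float:
    # Assumed form of Equations 2 and 3: sum the section scores, then
    # divide by the total reached when every Qual is 1 and every Quan is 2.
    total = sum(section_score(s) for s in sections)
    max_total = sum(MAX_QUAN * len(s.quans) for s in sections)
    return 100.0 * total / max_total if max_total else 0.0


# Hypothetical ratings for one data layer.
source = Section(
    "Source",
    quals={"Owner and Contact": 1},
    quans={"Metadata Completeness": 2, "Perceived Health": 1},  # hypothetical QEs
)
qc = Section(
    "QC",
    quals={"Key Integrity": 0},  # e.g. a primary key violation
    quans={"Temporal Consistency": 0, "Domain Validity": 2},  # second QE is hypothetical
)

source_score = section_score(source)
qc_score = section_score(qc)
layer_dqi = dqi_percent([source, qc])
print(source_score, qc_score, layer_dqi)
```

Because sections and QEs are plain data in this sketch, an agency could add or remove either without touching the scoring functions, which mirrors the flexibility described above.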
To achieve the DQI design objectives, the ADOT DataViz team devised the DQI management architecture (Figure 1) with three distinct application components.
Figure 2 is a screen capture of the Metadata Dashboards, where sections and quality elements (QEs) are listed along with the Qual and Quan scores for an LRS event layer. The low score was mainly due to the QE “Key Integrity” having a Qual of “zero” because of a primary key violation. The dashboard also shows that the QE “Temporal Consistency” scored “Poor”. Table 1 was used to convert the calculated error rate of each QE in this QC section to the corresponding Quan.
| Quan | QC Error Rate |
|---|---|
| 1 | Between 0.001% and 1% |
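A value mapping of the kind Table 1 describes could be sketched in Python as follows. Only the Fair band (between 0.001% and 1%) comes from the table; treating rates below that band as Good, rates above it as Poor, and both band edges as inclusive are assumptions, and the function name is hypothetical.

```python
FAIR_MIN_PCT = 0.001  # lower edge of the Fair band in Table 1
FAIR_MAX_PCT = 1.0    # upper edge of the Fair band in Table 1


def quan_from_error_rate(error_rate_pct: float) -> int:
    """Convert a QC error rate (in percent) to a Quan of 0, 1, or 2."""
    if error_rate_pct < 0 or error_rate_pct > 100:
        raise ValueError("error rate must be a percentage between 0 and 100")
    if error_rate_pct < FAIR_MIN_PCT:
        return 2  # Good (assumed: below the Fair band)
    if error_rate_pct <= FAIR_MAX_PCT:
        return 1  # Fair (band taken from Table 1, edges assumed inclusive)
    return 0      # Poor (assumed: above the Fair band)


for rate in (0.0, 0.5, 5.0):
    print(rate, quan_from_error_rate(rate))
```

Keeping the thresholds as named constants makes it straightforward to use a different mapping for each QE if business rules call for one.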
The DQI system as implemented at ADOT lacks two important QEs: Completeness and Positional Accuracy. The former can be added to the framework design once target metrics representing the “completeness” of the data layers are defined. The latter, however, needs more research before it can be accommodated.
The design and implementation of ADOT’s DQI system by the Multi-modal Planning Division (MPD) is an important building block in the organization’s effort to cultivate a data-driven culture.