Metadata is foundational for data understanding and should be a top priority of any data governance program.
Without a firm grasp of what data means, where it comes from and how it is classified (its metadata), it is virtually impossible for organizations to define quality levels, access rights, usage guidelines and similar policies.
Understanding both the data and the value that can be derived from it requires enterprise-wide trust in, and transparency of, metadata. Organizational demand for these insights has been driving continued innovation and growth in the metadata management solutions market.
Automating Metadata Management
Of course, metadata can be managed in a brute-force manner by extracting the technical information from the underlying data sources and organizing it into spreadsheets or databases. While the effort required to gather that information can be excessive, the effort required to keep it current is usually beyond the capabilities of most companies.
Automated metadata management tools can connect to a wide variety of data sources and collect their technical metadata in a central repository. How that metadata is collected and maintained, and how the approach differs for business, technical and operational metadata, is the primary challenge for metadata administrators.
Metadata management sits at the core of data governance technology solutions such as the enterprise-oriented data intelligence platform Collibra. Many of our clients reach out to us for help with navigating Collibra’s integration options and identifying the best way to get their metadata into the platform for their use cases.
As you would with any data initiative, consider your current enterprise structure, requirements, architecture and culture, as well as technical complexity and maturity. If you’re just beginning your Collibra journey, here is a primer for establishing a baseline.
Collibra Integrations
Metadata can be readily gathered from any Relational Database Management System (RDBMS) platform, because each one supports standard queries for selecting technical metadata over Open Database Connectivity (ODBC) or Java Database Connectivity (JDBC).
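As a minimal sketch of what selecting technical metadata looks like, the Python example below reads table and column definitions from a database catalog. It uses the standard library's sqlite3 module and an in-memory database as a stand-in for an ODBC or JDBC connection to an enterprise RDBMS. The function name collect_technical_metadata and the record layout are illustrative assumptions, not part of any product.

```python
import sqlite3


def collect_technical_metadata(conn):
    """Return one record per column: table, column, declared type, nullability, key flag."""
    records = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
        # Table names come from the catalog itself, so interpolating them is safe here.
        for _cid, column, col_type, notnull, _default, pk in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            records.append(
                {
                    "table": table,
                    "column": column,
                    "data_type": col_type,
                    "nullable": not notnull,
                    "primary_key": bool(pk),
                }
            )
    return records


# Stand-in source system: an in-memory database with one table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT)"
)

metadata = collect_technical_metadata(conn)
for record in metadata:
    print(record)
```

On other platforms the catalog query differs (many expose an INFORMATION_SCHEMA, for example), but the pattern of querying the catalog and normalizing the rows into records stays the same.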
Many popular application packages that manage their data structures internally also provide API calls for extracting metadata from their platforms. Metadata governance requires the ability to pull the metadata from the relevant platforms on a regular or as-needed basis, to ensure that changes are reflected in the metadata repository.
For special-purpose data stores or applications, this may require custom code that executes API calls against the application and then translates the results into the specific format required by the metadata repository.
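The translation step can be sketched as follows. The response shape (objects, fields, objectName, fieldName, dataType) and the target record layout are hypothetical, invented only to illustrate the mapping. A real application API and a real repository format will differ.

```python
import json

# Hypothetical response from a specialty application's metadata API.
SAMPLE_RESPONSE = json.dumps(
    {
        "objects": [
            {
                "objectName": "WorkOrder",
                "fields": [
                    {"fieldName": "orderId", "dataType": "string"},
                    {"fieldName": "openedOn", "dataType": "date"},
                ],
            }
        ]
    }
)


def translate(raw_json, system_name):
    """Flatten the application's nested metadata into one repository record per field."""
    payload = json.loads(raw_json)
    records = []
    for obj in payload["objects"]:
        for field in obj["fields"]:
            records.append(
                {
                    # Fully qualified name: system.object.field
                    "full_name": f"{system_name}.{obj['objectName']}.{field['fieldName']}",
                    "asset_type": "Column",  # assumed target asset type
                    "data_type": field["dataType"].upper(),
                }
            )
    return records


translated = translate(SAMPLE_RESPONSE, "maintenance_app")
for record in translated:
    print(record)
```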
Collibra provides its own REST API calls for managing changes to data within its metadata repository, allowing customers to develop their own framework for managing changes to the data in their specialized data stores.
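As a rough illustration of what such a framework might send, the sketch below builds (but does not send) an HTTP request that would register a new asset. Everything specific in it is an assumption to verify against the API documentation for your Collibra version: the host name, the endpoint path, the body field names and the placeholder identifiers. Authentication is deliberately left out.

```python
import json
import urllib.request

# Assumptions: placeholder host and an assumed endpoint path; confirm both
# against the REST API documentation for your Collibra environment.
BASE_URL = "https://collibra.example.com"
ASSETS_PATH = "/rest/2.0/assets"


def build_add_asset_request(name, domain_id, type_id):
    """Build a POST request describing one new asset. Nothing is sent over the network."""
    body = json.dumps(
        {"name": name, "domainId": domain_id, "typeId": type_id}  # assumed field names
    ).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + ASSETS_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical identifiers, for illustration only.
request = build_add_asset_request(
    name="customer",
    domain_id="00000000-0000-0000-0000-000000000001",
    type_id="00000000-0000-0000-0000-000000000002",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Keeping request construction separate from sending makes the payload easy to inspect and test before it is pointed at a live repository.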
Data governance experts must work with the subject matter experts for all relevant data platforms to determine how data is created, maintained and used within all application systems and to understand how stable or dynamic the underlying data structures are. From this understanding comes the determination of the appropriate metadata integration approach, which can and will vary within an organization’s data governance program.
Relatively stable data platforms can be managed with manual workflows that are triggered by infrequent changes to the data structures, with the metadata repository updated by hand.
More dynamic data platforms, with new data structures being added or existing structures being modified on a regular basis, will require more automated methods for maintaining the metadata repository. This is where the core of metadata integration architecture will need to be developed and executed.
Collibra’s upsert capabilities support an approach that pulls all metadata from the source data platform or system and applies all changes to the Collibra metadata repository. New assets are inserted into the repository and existing assets are updated with the latest changes. This simplifies the update process by removing the requirement to track incremental changes in the source data platform.
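The upsert pattern itself can be illustrated with a small, self-contained sketch. The repository here is a plain Python dictionary keyed by a fully qualified asset name, standing in for the real metadata repository. The function name upsert_assets and the record fields are hypothetical and do not represent Collibra's API.

```python
def upsert_assets(repository, source_assets):
    """Insert assets that are new, update assets that changed, and return both counts."""
    inserted, updated = 0, 0
    for asset in source_assets:
        key = asset["full_name"]
        if key not in repository:
            repository[key] = dict(asset)  # new asset: insert
            inserted += 1
        elif repository[key] != asset:
            repository[key] = dict(asset)  # existing asset with changes: update
            updated += 1
        # Unchanged assets are left untouched.
    return inserted, updated


# Current repository state (stand-in for the metadata repository).
repository = {
    "crm.customer.id": {"full_name": "crm.customer.id", "data_type": "INTEGER"},
}

# Full extract from the source: one changed column and one new column.
full_extract = [
    {"full_name": "crm.customer.id", "data_type": "BIGINT"},
    {"full_name": "crm.customer.name", "data_type": "TEXT"},
]

result = upsert_assets(repository, full_extract)
print(result)
```

Because the whole extract is applied on every run, the source system never has to report what changed. Note that this sketch does not remove assets that have disappeared from the source, so deletions would need separate handling.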
Another option is Collibra Connect, which packages the automated integration capabilities into the MuleSoft ESB platform. Developers can then build code on the Anypoint platform to access any data source and configure their updates to the Collibra metadata repository.
The capabilities in Collibra Connect correspond to the REST API capabilities and also include the operational aspects of the MuleSoft ESB platform.
Determining the “Right” Solution
As is the case in any data integration architecture, it is practical to select the least complex solution that meets the business needs of the users.
In an organization where all critical data assets are maintained in popular RDBMS platforms like Oracle, SQL Server or DB2, an out-of-the-box Collibra implementation should meet all the organization’s requirements.
In a smaller organization, it’s possible to leverage manual efforts to gather updates into spreadsheets and import those changes to Collibra. This is a manageable approach for a small number of relatively stable data sources.
If the number of assets is large and the data structures are relatively dynamic, the organization should consider implementing the Collibra Catalog, which automates the collection of metadata changes and eliminates the manual effort required to format and import those changes via spreadsheets. Where an additional investment is required, the cost of implementation can often be offset by the reduction in manual effort.
When specialized data stores or, more likely, specialized applications are in use, custom code is often required to access the underlying data structures. In these cases, the organization needs to consider either acquiring pre-developed and certified code or developing its own code against the Collibra REST API.
The cost of manually executing special code to create spreadsheets for import to Collibra can rise dramatically, especially if more data needs to be governed and the rate of change increases. Developing code to access the specialty metadata and format REST API calls to push the changes to Collibra can be very cost-effective for most organizations using these specialty data sources.
Govern Metadata Effectively
However you choose to integrate metadata, investing in metadata management will be a critical step toward transformation.
When business, technical and operational metadata are implemented together, rich contexts can interact to form a dynamic and complete view of enterprise data that enables:
- Data trust and knowledge
- Improved regulatory response
- Stronger problem resolution
- Increased speed to market
Of course, metadata for metadata’s sake isn’t sustainable, so be economical in your decision-making and focus on high-impact metadata as a starting point.
Keep in mind that tools need to be governed just like the data, and consider defining a process for metagovernance while you are implementing and integrating your solution.
To learn more about this methodology, check out our related article on Metagovernance for Collibra: Why Governing Governance is Critical.
Article contributed by Les Marsyla. With more than 30 years of experience in information technology, Les has done everything from mainframe and distributed application design and development to complex data management in both traditional and Big Data environments. Though he has a diverse background in IT, much of his career has been spent developing and implementing enterprise data management strategies and initiatives in the utilities sector.