The cybersecurity industry has turned data lineage into another buzzword, with vendors promising complete data visibility that will solve all data protection challenges. This marketing transforms a useful data governance tool into a supposed security panacea—a dangerous misdirection pulling security teams away from fundamental controls that actually prevent data breaches.
This blog explores what data lineage actually delivers versus what it promises, identifies its blind spots around identity and permissions and outlines foundational data security controls that security teams should prioritize instead.
What Data Lineage Actually Does
Data lineage comes in two forms:
- Structured data lineage maps how data transforms through databases, warehouses and analytics platforms by parsing structured query language (SQL) statements and tracking dependencies.
- Unstructured data lineage tracks how files and documents move between storage systems—from SharePoint to local drives to cloud accounts.
Both approaches excel as force multipliers for data classification and labeling. Instead of manually classifying each asset, you classify once at the source and let the lineage propagate that decision throughout your ecosystem. This transforms classification from a per-asset exercise into a propagation exercise, accelerating discovery work that would otherwise require months of manual investigation.
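To make the classify-once-and-propagate idea concrete, here is a minimal sketch, assuming lineage is available as a simple mapping from each asset to the assets derived from it. The asset names and the `propagate_label` helper are hypothetical and not taken from any particular lineage product.

```python
from collections import deque

# Hypothetical lineage edges: source asset -> assets derived from it.
LINEAGE = {
    "crm.customers": ["staging.customers_clean"],
    "staging.customers_clean": ["warehouse.customer_360", "warehouse.churn_features"],
    "warehouse.customer_360": ["bi.customer_dashboard"],
    "ref.country_codes": ["warehouse.customer_360"],
}

def propagate_label(lineage, source, label):
    """Return {asset: label} for the source and every asset downstream of it."""
    labels = {source: label}
    queue = deque([source])
    while queue:
        asset = queue.popleft()
        for child in lineage.get(asset, []):
            if child not in labels:  # skip assets that were already labeled
                labels[child] = label
                queue.append(child)
    return labels

# Classify once at the source; every downstream asset inherits the label.
labels = propagate_label(LINEAGE, "crm.customers", "PII")
for asset in sorted(labels):
    print(asset, labels[asset])
```

Note that `ref.country_codes` stays unlabeled because it is upstream of the warehouse table, not downstream of the classified source.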
The Critical Data Lineage Blind Spot
Even the best lineage tools hit a fundamental limitation, particularly for security use cases: They track data movement but ignore the identities and permissions that enable that movement. This creates critical security blind spots.
Structured lineage might show customer data flowing from Production Table A through Transformation Process B to Analytics Warehouse C, but it won’t reveal which service account executed that transformation, which other accounts have access to that data, or whether the same account could route data to unauthorized destinations.
Similarly, unstructured lineage might show Document X being copied from SharePoint to a local drive, then uploaded to Salesforce. But without identity context at each data store, you can’t determine if unauthorized users gained access through these movements.
The ‘Who Can Do What’ Gap
Consider these scenarios where lineage without identity context provides dangerously incomplete information:
- Your lineage diagram shows customer data flowing through an approved extract, transform, load (ETL) process. What it doesn’t show: The identity running this process also has write access to external application programming interfaces (APIs) and could route the same data to unauthorized third-party systems.
- Your lineage shows a confidential document shared via email and uploaded to three OneDrive folders. What it doesn’t reveal: whether any of those folders is publicly shared.
Every significant data exposure requires two permission failures: source access (reading sensitive data) and target access (writing to inappropriate destinations). Lineage tools provide beautiful visualizations of authorized flows while missing the identity-based attack paths that bypass those channels entirely.
Most lineage implementations can’t even identify which specific identity executed the transformations they’re tracking. They show data moved through Process X at Time Y, but not that Service Account Z with dangerous external write permissions executed that process.
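The two-sided permission check described above (source access plus target access) can be sketched as a simple join over permission grants. This is an illustrative sketch only: the `Grant` record, the identity names, the resource names and the set of unapproved destinations are all hypothetical, and a real environment would pull grants from its identity and access management systems.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Grant:
    identity: str
    action: str    # "read" or "write"
    resource: str

# Hypothetical inventory of sensitive sources and unapproved destinations.
SENSITIVE = {"prod.customers"}
UNAPPROVED_DESTINATIONS = {"api.thirdparty.example", "personal-cloud"}

GRANTS = [
    Grant("svc-etl", "read", "prod.customers"),
    Grant("svc-etl", "write", "warehouse.analytics"),
    Grant("svc-etl", "write", "api.thirdparty.example"),
    Grant("analyst-1", "read", "prod.customers"),
    Grant("svc-report", "write", "personal-cloud"),
]

def risky_identities(grants, sensitive, unapproved):
    """Find identities that can read sensitive data AND write to unapproved targets."""
    readers, writers = {}, {}
    for g in grants:
        if g.action == "read" and g.resource in sensitive:
            readers.setdefault(g.identity, set()).add(g.resource)
        elif g.action == "write" and g.resource in unapproved:
            writers.setdefault(g.identity, set()).add(g.resource)
    # Only identities present on both sides form a potential exposure path.
    return {i: (readers[i], writers[i]) for i in readers.keys() & writers.keys()}

risky = risky_identities(GRANTS, SENSITIVE, UNAPPROVED_DESTINATIONS)
for identity, (reads, writes) in sorted(risky.items()):
    print(f"{identity}: reads {sorted(reads)}, can write to {sorted(writes)}")
```

In this toy data, only the ETL service account holds both halves of the combination; the analyst can read but has no unapproved write target, and the reporting account can write externally but cannot read the sensitive table.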
The ‘Data Classification Changes’ Gap
Lineage tools assume sensitivity remains static as data moves through systems. This assumption breaks when data transformations or temporal factors create new sensitivity levels, such as in the following examples:
- Individual customer segments are routine business metrics. But aggregate thousands of them across multiple dimensions, and you start seeing bigger patterns, like which segments are gaining popularity across regions or how customer behavior changes before major market shifts. These patterns become valuable intelligence that competitors would pay millions to access.
- A pharmaceutical company combines research timelines (internal use), competitor patent filings (public) and resource allocation (internal use). Each dataset individually warrants routine classification, but their combination reveals strategic research priorities and drug development timelines—trade-secret-type information.
- The same customer email has different security implications in a marketing database versus a fraud investigation database, despite identical technical lineage.
Until lineage tools recognize when data combinations, context changes and temporal factors alter sensitivity levels, they’ll provide dangerous false confidence and miss the security implications of what those data flows actually create.
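One way to picture combination-aware classification is a rule table that escalates sensitivity when certain datasets are joined. The sketch below is an assumption-laden illustration based on the pharmaceutical example: the level names, dataset tags and `COMBINATION_RULES` table are hypothetical, and the "highest input level" baseline stands in for static propagation.

```python
# Ordered from least to most sensitive (hypothetical scale).
LEVELS = ["public", "internal", "confidential", "trade_secret"]

# Hypothetical rules: if all listed datasets are combined, escalate to the given level.
COMBINATION_RULES = [
    (frozenset({"research_timelines", "patent_filings", "resource_allocation"}),
     "trade_secret"),
]

def classify_derived(inputs):
    """inputs maps dataset tag -> level; returns the level of the combined dataset."""
    # Static baseline: the derived dataset inherits the highest input level.
    level = max(inputs.values(), key=LEVELS.index)
    # Combination rules can raise the level above anything the inputs carry.
    for tags, escalated in COMBINATION_RULES:
        if tags.issubset(inputs) and LEVELS.index(escalated) > LEVELS.index(level):
            level = escalated
    return level

combined = classify_derived({
    "research_timelines": "internal",
    "patent_filings": "public",
    "resource_allocation": "internal",
})
partial = classify_derived({
    "research_timelines": "internal",
    "patent_filings": "public",
})
print(combined, partial)
```

The design point is that the escalation depends on which datasets meet, not on where each one came from, which is information a purely static propagation of labels does not carry.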
The Real Security Challenge
The highest-risk scenario arises not when data moves through tracked processes or leaks from an endpoint, but when an identity can access sensitive data and put it somewhere else with a vastly different permission profile. Lineage fundamentally cannot answer: “Which of the 12 identities accessing this financial data also have write permissions to external personal cloud storage, third-party analytics platforms or other places not covered by data loss prevention?”
Organizations expecting lineage to solve all their data security problems will gain great visibility into where data came from but remain vulnerable to risks from identities they can’t track and permission combinations they can’t assess.
The Path Forward For Security Teams
Data lineage serves as an accelerator for foundational data governance, making discovery faster, classification more comprehensive and compliance documentation more complete. These are valuable capabilities, but accelerating data governance isn’t the same as reducing security risk.
Effective data security requires answering four fundamental questions:
- What sensitive data do we have?
- Which identities can read it?
- Where can those identities write data?
- Are any permission combinations creating unacceptable risk?
Focus On Information Flow Control
Data will flow where identities can take it—and lineage tools that can’t tell you which identities those are leave you fundamentally unable to assess or reduce your real security risk. To address this gap, organizations should focus on capabilities providing information flow control through data, identity and operation-type policy combinations. This three-dimensional approach governs not just what data is moved but also who can move what data for what purpose, addressing the identity blindness that makes lineage tools fundamentally inadequate for security.
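As a rough illustration of a policy keyed on data, identity and operation type, the sketch below evaluates flow requests against a default-deny allow-list. Everything here is hypothetical: the `FlowRequest` shape, the `POLICY` table and the identity, data-class and destination names are invented for the example and do not describe any specific product.

```python
from typing import NamedTuple

class FlowRequest(NamedTuple):
    identity: str
    data_class: str
    operation: str
    destination: str

# Hypothetical allow-list: (identity, data class, operation) -> permitted destinations.
POLICY = {
    ("svc-etl", "customer_pii", "copy"): {"warehouse.analytics"},
    ("analyst-1", "customer_pii", "read"): {"bi.dashboard"},
}

def is_allowed(req, policy=POLICY):
    """Default deny: a flow is allowed only if all three dimensions match a rule."""
    allowed = policy.get((req.identity, req.data_class, req.operation), set())
    return req.destination in allowed

approved = FlowRequest("svc-etl", "customer_pii", "copy", "warehouse.analytics")
exfil = FlowRequest("svc-etl", "customer_pii", "copy", "api.thirdparty.example")
unknown = FlowRequest("svc-unknown", "customer_pii", "copy", "warehouse.analytics")

for req in (approved, exfil, unknown):
    print(req.identity, req.destination, is_allowed(req))
```

The same service account is allowed on its approved route and denied on the third-party route, which is the distinction a lineage diagram of authorized flows alone would not surface.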
The future of data security lies not in tracking how data moves through authorized processes, but in understanding which identities can execute those processes and what other access those identities possess. This requires comprehensive data security and identity governance that addresses both source access and destination permissions.
This article originally appeared here: https://www.forbes.com/councils/forbestechcouncil/2025/10/20/beyond-the-buzzword-data-lineage/