Not since 2016 has there been as much excitement focused on protecting data. GDPR was adopted in April 2016, and organizations were sharply focused on protecting data, given the massive increase in the financial repercussions for failing to protect personal information.
Today, regulators, startups and organizations are refocusing their efforts on protecting data both in and with AI. The explosion of access to AI, including generative AI tools and particularly large language models (LLMs) such as ChatGPT, Bard and others, has certainly caught everyone’s attention. These tools have rapidly democratized access to AI.
The adoption of generative AI has been so rapid that “asking ChatGPT or Bard” is part of our everyday vernacular, as much as Googling something is. The benefits are enormous, including benefits to cybersecurity teams. They allow individuals and organizations to leverage models trained on vast amounts of data, achieve their own business goals and obtain results in real time that would require substantial time and effort otherwise. This technology will revolutionize numerous fields in all aspects of our lives from healthcare to finance, helping organizations make more informed decisions and improve outcomes for their customers.
But the use of AI in its many forms comes with challenges, and GDPR and other similar modern privacy laws have expanded in reach. These tools are democratizing access to insights from huge datasets, allowing individuals and organizations to find knowledge and solutions, and to generate content (whether code, art or text) in response to human input in near real time. Yet the legislative requirements to enforce data protection principles like purpose limitation, data minimization, the special treatment of 'sensitive data' and the limitation on automated decisions are unchanged. Is it surprising, then, that ChatGPT and other LLMs have already faced the wrath of regulators such as Garante, the Italian Data Protection Authority responsible for enforcing GDPR in Italy?
CISOs are concerned about the risks of allowing the use of these tools without proper controls on what their users input into them. They are also trying to securely meet business demands for new AI tools that develop, refine and train models on any and every bit of organizational data they are charged with securing.
The challenge for privacy regulators and CISOs is that it is hard to determine exactly what data an LLM has been trained on and what sensitive data has been retained after training, or to realistically exert any control over the prompts users enter as inputs. ChatGPT itself indicates that it was "trained on a massive amount of text data from various sources on the internet, such as books, articles, websites and other text-based sources." In response to queries about personal information, it quickly reminds users that it doesn't have access to any personal or confidential information, and that responses are generated solely based on training data and algorithms. On the other hand, it will quickly reveal a celebrity's date of birth, despite knowing that dates of birth are personal information.
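One partial control over user inputs is a gateway that screens prompts before they leave the organization. The sketch below is a minimal, hypothetical illustration of that idea: it redacts a couple of common PII patterns with regular expressions. The patterns, function names and token strings are all assumptions for illustration; a production gateway would rely on a dedicated classifier or DLP service rather than a handful of regexes.

```python
import re

# Illustrative patterns only; real PII detection needs far richer
# logic than these two regexes (names, addresses, context, etc.).
REDACTIONS = [
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"), "[EMAIL]"),  # email addresses
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),           # US SSN format
]

def redact_prompt(prompt: str) -> str:
    """Replace detected sensitive values before the prompt is sent to an LLM."""
    for pattern, token in REDACTIONS:
        prompt = pattern.sub(token, prompt)
    return prompt

print(redact_prompt("Summarize the case for client john@acme.com, SSN 123-45-6789"))
# Summarize the case for client [EMAIL], SSN [SSN]
```

Redaction (rather than outright blocking) lets users keep working with the tool while the sensitive values never leave the organization's boundary.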
Only where an organization has exercised thoughtful analysis and training dataset preparation to control the training data used by an LLM will it be possible for a human to understand the data on which the model is based, let alone the model. This is challenging for organizations that simply believe they can point an AI at their data—they simply don’t know what data they hold and where it came from, and struggle to maintain an auditable data lineage to allow them to trust the data fully.
It is almost impossible for organizations with their existing cloud security tooling to definitively state that a training dataset does not contain personal information, or to alert when sensitive data is accidentally included. Sensitive data may exist in or flow into training datasets, and AI may accidentally reveal it to users without verifying that they have authorized access to the underlying sensitive data.
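To make the alerting idea concrete, here is a minimal sketch of auditing a training dataset for accidentally included PII before it reaches a model. Everything in it is an assumption for illustration: the function names are hypothetical, and the regexes cover only a few obvious PII formats, whereas real tooling (e.g. a DSPM product) performs much deeper classification.

```python
import re

# A few illustrative PII categories; not an exhaustive detector.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def scan_record(text: str) -> list[str]:
    """Return the PII categories detected in one training record."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]

def audit_dataset(records: list[str]) -> dict[int, list[str]]:
    """Map record index -> detected PII categories, for flagged records only."""
    findings = {}
    for i, record in enumerate(records):
        hits = scan_record(record)
        if hits:
            findings[i] = hits
    return findings

sample = [
    "The quarterly report is attached.",
    "Contact jane.doe@example.com or call 555-123-4567.",
    "Applicant SSN: 123-45-6789",
]
print(audit_dataset(sample))  # flags records 1 and 2, with the categories found
```

An audit like this, run on every candidate training dataset, is one way to turn "I don't know what's in our training data" into a concrete, reviewable list of flagged records.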
To address these challenges, we need to take a multi-faceted approach to data security to keep up with generative AI. First and foremost, we can’t ignore or block the innovation knocking on our door. We need to embrace it and find ways to protect the data being used in a way that is consistent with data protection principles set forth by GDPR. It is up to us to elaborate further on how these principles can apply to these new technologies in a way that balances this against the beneficial uses of these models.
Without a doubt, this means organizations need to become more knowledgeable about the data they hold and how it is being used. "I don't know" simply is not good enough. Secondly, organizations need to be able to manage their data more effectively. To do all this at scale and with speed, we need to invest in new technologies and approaches to data security. Your enterprise DLP looks for data being shared outside your organization; it does not monitor and control data where it resides.
Data-centric tools like data security posture management can be more effective in understanding data, monitoring access to it and observing changes to it. Forward-thinking organizations should evaluate the risks arising from exposure to a new generation of AI tools (ChatGPT, etc.) and mitigate them without disrupting access, so their organizations can gain a competitive advantage over other market players.
Ultimately, the future of data security depends on our ability to become more data-centric in our security approaches and thereby adapt and innovate in response to these new challenges. By doing so, we can ensure that our personal information remains safe and secure, while also unlocking the full potential of data-driven innovation.
This article originally appeared here: https://www.forbes.com/sites/forbestechcouncil/2023/05/26/the-future-of-data-security-staying-ahead-of-ai/?sh=5d1ed52614e3