AI’s Opaque Box Is Actually a Supply Chain

Blockchains make data for AI trustworthy

By Karen Kilroy
June 20, 2023
Containers (source: Pixabay)

Understanding AI’s mysterious “opaque box” is paramount to creating explainable AI. The task becomes simpler once you recognize that AI, like all other technology, has a supply chain. Knowing what makes up that supply chain is critical to enforcing the security of the AI system, establishing trust with the consumer of the AI’s output, and protecting your organization from undue risk.

When you start dissecting AI’s supply chain, consider that production, shipping, delivery, and invoicing are steps in just about any supply chain, for everything you use, from toothpaste to technology. AI models are created and delivered through supply chains too.

Some steps in AI’s supply chain can be tricky to follow, with special gotchas like technology company trade secrets, closed code, and program synthesis (the process of AI writing its own code to improve itself). Add continuous machine learning cycles, deployments, reviews, and recalls, and there are plenty of opportunities to bring transparency to the opaque box.

Companies like Walmart choose blockchain technology to bring transparency to supply chains such as food production and delivery because it is tamper evident and distributed. Blockchain technology is used in an enterprise stack alongside other systems to make integrations more secure and to establish a single audit trail. Verification, including verification of the identity of every participant in the blockchain network, and compliance are woven throughout the workflow and processes.

Typically, an enterprise blockchain audit trail consists of linked blocks containing transactions that reference data stored off-chain in traditional databases. Meanwhile, the system creates a cryptographic verification of that data and stores the verification on the blockchain, comparable to the traditional practice of providing a checksum to ensure the integrity of a file download. If the off-chain data ever undergoes tampering, its cryptographic hash will no longer compute to the value recorded on the blockchain.
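
To make the checksum analogy concrete, here is a minimal sketch using only Python’s standard library. The `ledger` dictionary is a stand-in for a real blockchain client (it is not part of any particular SDK), and the data and key names are made up for illustration:

```python
import hashlib
import json

def fingerprint(payload: bytes) -> str:
    """Compute a SHA-256 digest: the same idea as a checksum on a file download."""
    return hashlib.sha256(payload).hexdigest()

# Off-chain: the raw data lives in a traditional database or object store.
training_batch = json.dumps({"rows": 10000, "source": "crm_export_2023_06"}).encode()

# On-chain: only the small, tamper-evident fingerprint is recorded.
ledger = {}  # stand-in for transactions written through your blockchain SDK of choice
ledger["training_batch_v1"] = fingerprint(training_batch)

# Later, anyone can re-hash their off-chain copy and compare it with the ledger entry.
def verify(key: str, payload: bytes) -> bool:
    return ledger[key] == fingerprint(payload)

print(verify("training_batch_v1", training_batch))         # True
print(verify("training_batch_v1", training_batch + b"x"))  # False: tampering detected
```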

When you dissect AI’s supply chain, at the root you will find algorithms: the mathematical formulas, written to simulate functions of the brain, that underlie the AI programming. The algorithms are compiled into code libraries and then distributed to AI developers, who use them to write custom AI models. Meanwhile, a data scientist acquires and prepares training data, which is then used to bring the AI model to life.

University of Baltimore law professor Michelle Gillman, who fights to help people who have been automatically denied benefits, recently spoke with NBC about the importance of understanding the origin of algorithms when managing AI risk. According to Gillman, whose clients often face life-and-death situations being decided by AI, “I’ve been in hearings where no one in the room can describe for me, how does the algorithm work? What factors does it weigh? How does it weigh those factors? And so you are really left unable to make a case for your client in those circumstances.”

Next, a workflow begins that implements an AI engineering and machine learning operations (MLOps) process, in which cycles of experiments and deployments are conducted and the AI model, its data, and the variables, or hyperparameters, of the experiment are tested, tweaked, and improved. This part of the supply chain keeps cycling even after delivery to the consumer, since training and improvement are generally continuous. The consumer’s input, in the form of reviews and ratings, becomes part of the process of improving the model. The project’s stakeholders, such as the management of the organization that built the AI model, may also add their input and follow through to make sure it is considered.
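
One way to make each cycle auditable is to summarize every experiment as a small record whose hash is anchored on the blockchain. The sketch below is a simplified illustration with hypothetical field names; it is not the schema of any particular MLOps tool:

```python
import hashlib
import json
import time

ledger = []  # in practice, one blockchain transaction per experiment

def anchor(record: dict) -> str:
    """Hash an experiment record and append the digest to the ledger stand-in."""
    digest = hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()
    ledger.append({"digest": digest, "anchored_at": time.time()})
    return digest

experiment = {
    "model_version": "fraud-detector-0.4.2",                   # hypothetical model name
    "training_data_sha256": "<digest of off-chain dataset>",   # from the pattern shown earlier
    "hyperparameters": {"learning_rate": 0.001, "epochs": 20},
    "metrics": {"auc": 0.91},
    "consumer_feedback_ids": ["review-881"],                   # reviews and ratings fold back in
}
print(anchor(experiment))
```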

If an organization is large, the AI model’s supply chain can involve extended teams and even multiple organizations. Alternatively, by using cloud services and AI marketplaces, a single developer might perform all of these functions alone. In either case, you can add an enterprise blockchain technology, such as Hyperledger Fabric, to the stack so you can track, trace, audit, and even recall your model.
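
As a rough, Fabric-agnostic sketch of what tracking and recalling a model version might involve, the record below is written as plain Python rather than real chaincode, and every field name is an assumption made for illustration:

```python
# A minimal, hypothetical "model asset" of the kind a Hyperledger Fabric chaincode
# might manage; this is illustrative Python, not actual chaincode or a Fabric API.
model_asset = {
    "model_id": "fraud-detector",
    "version": "0.4.2",
    "artifact_sha256": "<hash of the serialized model, stored off-chain>",
    "status": "deployed",
    "owner_org": "Org1",  # participant identity is established by the blockchain network
}

def recall(asset: dict, reason: str) -> dict:
    """Mark a model version as recalled; on a real network this would be a new transaction."""
    return {**asset, "status": "recalled", "recall_reason": reason}

print(recall(model_asset, "training data traced to a corrupted source"))
```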

An enterprise blockchain network is sometimes used to bring transparency to the supply chain. This helps participants trust one another, because they are members of the same blockchain network, and it is especially helpful when something goes wrong and a product needs to be traced quickly to its origin.

Walmart pioneered the use of enterprise blockchain to track and trace food that potentially carried a foodborne illness. Before blockchain, if a customer became sick from a package of sliced mangoes in any Walmart store, the mangoes had to be discarded at all of the stores, because it took more than six days to trace the affected shipment. The new blockchain network cut this time to 2.2 seconds, saving Walmart the expense of discarding good mangoes. Walmart continues its supply chain blockchain strategy today, and it has become the foundation of automated payment systems for the company’s many suppliers.

When this strategy is applied to AI’s opaque box, a supply chain blockchain network helps you track and trace important factors, such as why an AI model’s intent or domain has drifted, or what treatment was given to the data used to produce a certain outcome. As explained in the O’Reilly book I co-authored, Blockchain Tethered AI, there are four blockchain controls for AI:

  • Control 1, pre-establishing identity and workflow criteria for people and systems, can be used with AI to verify that data and models have not undergone tampering or corruption. This control includes criteria for telling humans apart from AI models.
  • Control 2, distributing tamper-evident verification, can be used with AI to make sure that only the right people, systems, or intelligent agents, with the right authorization, participate in governing or modifying the AI. This control can also create a tamper-evident audit trail of training data, even when that data is supplied by users in the form of chat history, as is the case with ChatGPT. A record can be stored on blockchain indicating whether the user has agreed to have their chat history used as training data, and if a chat is used, the prompts within it can be reviewed by a human or intelligent agent for issues such as ethics violations or data sabotage before use (see the sketch after this list).
  • Control 3, governing, instructing, and inhibiting intelligent agents, will become very important when you want to trace or reverse AI, or prove in court that the output of AI is traceable to certain people or organizations. Reviews and ratings of how a model is performing can help catch and address inappropriate or unethical output.
  • Control 4, showing authenticity through user-viewable provenance, will be especially important for branded AI whose underlying components come from distributed marketplaces. Understanding and proving provenance is a factor in legal issues involving AI.
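
Here is a hedged sketch of the kind of consent-and-review record Control 2 describes for chat history. The field names are illustrative assumptions, not the book’s schema:

```python
import hashlib
import json

def consent_record(chat_id, prompts, consented, reviewed_by):
    """Build a tamper-evident record of whether a chat may be used as training data."""
    digest = hashlib.sha256(json.dumps(prompts).encode()).hexdigest()
    return {
        "chat_id": chat_id,
        "prompts_sha256": digest,        # the prompts themselves stay off-chain
        "training_consent": consented,   # did the user opt in?
        "reviewed_by": reviewed_by,      # human or agent who screened for ethics or sabotage issues
        "approved_for_training": consented and reviewed_by is not None,
    }

record = consent_record("chat-1138", ["How do I reset my password?"], True, "trust-reviewer-07")
# The record (or its hash) is then written to the blockchain audit trail.
print(record)
```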

This ability to track and trace can also be extended to the consumer through the display of trust logos. Trust logos, a longtime hallmark of e-commerce security, can be applied to AI by connecting them to the underlying blockchain network and programming them to alert consumers should the AI model become compromised. A similar method could be used to show whether a customer service representative is an AI or a human.

Keep in mind that people in different roles may need different types of information in order to trust AI models. Depending on the perspective of the entity requesting the information, different levels of traceability may be desired. For example, a person answering their phone should be able to see an indicator of whether a caller is AI and whether the AI is from a trustworthy source. An engineer deciding whether to integrate AI components with their models would want a much deeper understanding of the supply chain, and a stakeholder might want to see whether the reviews and comments are authentic and find out what is being done to address any recalls. This also raises the question of a special handshake to enable AI models to trust one another and establish boundaries.

Even though you might not know everything about your AI model, you can commit the facts you do know to blockchain. Develop an AI factsheet as described in Chapter 1 of Blockchain Tethered AI. If you have used models that you have downloaded from marketplaces, you can typically find an AI model card and data cards that provide basic facts about the materials you are using. Also, you can always document that a part of the model is indeed “opaque,” and complete that part later once the details are known.
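
As a rough illustration only, such a factsheet can be captured as structured data and anchored in the same way as any other supply chain artifact; the fields below are assumptions for the sake of the example, not the book’s template:

```python
# Hypothetical factsheet fields; see Chapter 1 of Blockchain Tethered AI for the real template.
factsheet = {
    "model_name": "support-chat-summarizer",
    "intended_use": "summarize customer support chats for internal review",
    "training_data_sources": ["marketplace data card: dataset-42"],
    "known_limitations": ["underlying foundation model is opaque; details pending"],
    "licenses": ["Apache-2.0 (library)", "vendor EULA (base model)"],
    "last_reviewed": "2023-06-01",
}
# The factsheet's hash can be committed to blockchain now, and the "opaque" entries
# updated as details become known.
```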

You can implement your blockchain network for your AI model’s supply chain in the same way that enterprise blockchain networks are used by developers for other purposes. You only need to record cryptographic verifications on your blockchain network, while storing the actual components of the AI off-chain. The code that comes with Blockchain Tethered AI can help you to visualize and implement this architecture.

This blockchain verification, which works like a checksum you might see when downloading a file, can be checked against the model and its components at any point to see whether they have undergone any tampering. This use of a blockchain network doesn’t involve cryptocurrency or miners, doesn’t consume unusually high amounts of energy, and is better thought of as a distributed, text-based super-log automated by smart contracts.

Being able to track and trace goods in this way helps prevent sales of counterfeit goods, helps food companies to recall items quickly without having to throw everything away, and helps artists, musicians, and content creators be paid for their work. When applying these techniques and controls to make AI’s opaque box explainable, your AI models will enjoy the competitive advantage of being trackable, traceable, controllable, and even stoppable.
