Synthetic media are everywhere. Digital images and objects that appear to index something in the world but do nothing of the sort have their roots in video games and online worlds like Second Life. However, with the growing appetite for niche machine learning training sets and artificial environments for testing autonomous machines, synthetic media are increasingly central to the development of algorithmic systems that make meaningful decisions or undertake actions in physical environments. Microsoft AirSim is a prime example of the latter, an environment created in Epic’s Unreal Engine that can be used to test autonomous vehicles, drones and other devices that depend on computer vision for navigation. Artificial environments are useful testing grounds because they are so precisely manipulable: trees can be bent to a specific wind factor, light adjusted, surface resistance altered. They are also faster and cheaper places to test and refine navigation software prior to expensive material prototyping and real-world testing. In machine learning, building synthetic training sets is an established practice. Synthetic media are particularly valuable in contexts such as armed conflict, where images might be too few in number to produce a large enough corpus and too classified to be released to either digital piece workers for tagging or private sector developers to train algorithms.
But what happens when synthetic media are marshalled to do the activist work of witnessing state and corporate violence? What are we to make of the proposition that truths about the world might be produced via algorithms trained almost exclusively with synthetic data? This essay sketches answers to these questions through an engagement with Triple Chaser, an investigative and aesthetic project from the UK-based research agency Forensic Architecture. Founded in 2010 by architect and academic Eyal Weizman and located at Goldsmiths, Forensic Architecture pioneers investigative techniques using spatial, architectural, and situated methods. Using aesthetic practice to produce actionable forensic evidence, their work appears in galleries, court rooms, and communities. In recent years, they have begun to use machine learning and synthetic media to overcome limited publicly available data and to multiply by several orders of magnitude the effectiveness of images collected by activists. My contention in this essay is that these techniques show how algorithms can do the work of witnessing: registering meaningful events to produce knowledge founded on claims of truth and significance.
Presented at the 2019 Whitney Biennial in New York, Triple Chaser combines photographic images and video with synthetic media to develop a dataset for a deep learning neural network able to recognise tear gas canisters used against civilians around the world. It responds to the controversy that engulfed the Biennial following revelations that tear gas manufactured by Safariland, a company owned by Whitney trustee Warren B. Kanders, was used against protestors at the US-Mexican border. Public demonstrations and artist protests erupted, leading to significant negative press coverage across 2018 and 2019. Rather than withdraw, Forensic Architecture submitted an investigative piece that sought to demonstrate the potential for machine learning to function as an activist tool.
Produced in concert with Praxis Films, run by the artist and filmmaker Laura Poitras, Triple Chaser was presented as an 11-minute video installation. The installation was framed by a placard explaining the controversy and Forensic Architecture’s decision to remain in the exhibition; viewers entered a severe, dark room to watch the tightly focused account of Safariland, the problem of identifying tear gas manufacturers, the technical processes employed by the research agency, and their further applications. Despite initial intransigence, the withdrawal of eight artists in July 2019 pushed Kanders to resign as vice chairman of the Museum and, later, announce that Safariland would sell off its chemicals division that produced tear gas and other anti-dissent weapons. Meanwhile, Forensic Architecture began to make its code and image sets available for open source download while applying the same techniques to other cases, uploading its Mtriage tool and Model Zoo synthetic media database to the code repository GitHub. A truth-seeking tool trained on synthetic data, Triple Chaser reveals how witnessing can occur in and through nonhuman agencies, as well as and even in place of humans.
In keeping with the established ethos of Forensic Architecture, Triple Chaser demonstrates how forensics – a practice heavily associated with policing – can be turned against the very state agencies that typically deploy its gaze. As the cultural studies scholar Joseph Pugliese points out, ‘[E]mbedded in the concept of forensic is a combination of rhetorical, performative, and narratological techniques’1 that can be deployed outside courts of law. For Weizman, the forum of forensics is critical: it brings evidence into the domain of contestation in which politics happens. In his agency’s counter-forensic investigation into Safariland, tear gas deployed by police and security agencies becomes the subject of interrogation and re-presentation to the public. In this making public, distinctions and overlaps can be traced between different modes of knowledge making and address: the production of evidence, the speaking of testimony, the witnessing of the audience. But how might we understand the role of the machine learning algorithm itself? And what are we to make of this synthetic evidence?
Weizman describes the practice of forensic architecture as composing ‘evidence assemblages’ from ‘different structures, infrastructures, objects, environments, actors and incidents’.2 There is an inherent tension between testimony and evidence that forensics as a resistant and activist practice seeks to harness by making the material speak in its own terms. As a methodology, forensic architecture seeks a kind of ‘synthesis between testimony and evidence’ that takes up the lessons of the forensic turn in human rights investigation to perceive testimony itself as a material practice as well as a linguistic one. Barely detectable traces of violence can be marshalled through the forensic process to become material witnesses, evidentiary entities. But evidence cannot speak for itself: it depends on the human witness. Evidence and testimony are closely linked notions, not least because both demarcate an object: speech spoken, matter marked. Testimony can, of course, enter into evidence. But I think something more fundamental is at work in Triple Chaser. It doesn’t simply register or represent: it is operational, generative of relations between objects in the world and the parameters of its data. Its technical assemblage precedes both evidence and testimony. It engages in a witnessing that is, I think, nonhuman. Triple Chaser brings the registering of violations of human rights into an agential domain in which the work of witnessing is necessarily inseparable from the nonhuman, whether in the form of code, data, or computation.
As development commenced, Triple Chaser faced a challenge: Forensic Architecture was only able to source a small percentage of the thousands of images needed to train a machine learning algorithm to recognise the tear gas canister. They were, however, able to source detailed video footage of depleted canisters from activists, and even obtained some material fragments. Borrowing from strategies used by Microsoft, Nvidia and others, this video data could be modelled in environments built in the Unreal gaming engine, and then scripted to output thousands of canister images against backgrounds ranging from abstract patterns to simulated real-world contexts. Tagging of these natively digital objects also sidestepped the labour and error of manual tagging, allowing a training set to be swiftly built from images created with their metadata attached. Using a number of different machine learning techniques, investigators were able to train a neural network to identify Safariland tear gas canisters from a partial image, with a high degree of accuracy and with weighted probabilities. These synthetic evidence assemblages then taught the algorithm to witness.
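The principle that synthetic images arrive with their labels already attached can be illustrated with a minimal sketch in Python and NumPy. It is not Forensic Architecture's pipeline: a bright rectangle stands in for a rendered canister, random noise stands in for the Unreal-generated backgrounds, and every function name and parameter (`make_synthetic_sample`, `make_dataset`, the image and object sizes) is a hypothetical chosen for illustration.

```python
import numpy as np

def make_synthetic_sample(rng, size=64, obj_h=20, obj_w=8):
    """Draw a stand-in 'canister' rectangle onto a noisy background.

    Returns the image and its bounding box (top, left, bottom, right).
    The label needs no manual tagging: it is known because the script
    itself chose where to place the object.
    """
    # Background: an abstract random pattern with values in [0, 0.5)
    image = rng.random((size, size)) * 0.5
    # Random placement that keeps the object fully inside the frame
    top = int(rng.integers(0, size - obj_h + 1))
    left = int(rng.integers(0, size - obj_w + 1))
    # The object itself: a uniformly bright rectangle (hypothetical stand-in)
    image[top:top + obj_h, left:left + obj_w] = 1.0
    return image, (top, left, top + obj_h, left + obj_w)

def make_dataset(n, seed=0):
    """Generate n labelled images from a single seeded random generator."""
    rng = np.random.default_rng(seed)
    samples = [make_synthetic_sample(rng) for _ in range(n)]
    images = np.stack([img for img, _ in samples])
    boxes = np.array([box for _, box in samples])
    return images, boxes

# A thousand labelled training images, produced without any human tagging
images, boxes = make_dataset(1000)
```

In a real pipeline the rectangle would be replaced by renders of a 3D model under varied lighting, angle and occlusion, but the logic of the labelling is the same: the metadata is a by-product of the generation script rather than of human annotation.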
Like most image recognition systems, Triple Chaser deploys a convolutional neural network, or CNN, which learns how to spatially analyse the pixels of an image. Trained on tagged data sets, CNNs slide – convolve, rather – a series of filters across the surface of an image to produce activation maps that allow the algorithm to iteratively learn about the spatial arrangements of large sets of images. These activation maps are passed from one convolution layer to the next, with various techniques applied to increase accuracy and prevent the spatial scale of the system from growing out of control. Exactly what happens within each convolutional layer remains in the algorithmic unknown: it cannot be distilled into representational form but rather eludes cognition.
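What it means to slide a filter across an image can be shown in a few lines of Python with NumPy. This is a hedged sketch rather than the network used in Triple Chaser: the filter here is fixed by hand, whereas a CNN learns its filter values during training, and the helper names (`convolve2d_valid`, `relu`, `max_pool_2x2`) are hypothetical. As in most deep learning libraries, the 'convolution' is computed without flipping the filter. The pooling step stands in for one common technique that keeps the spatial scale in check.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide a filter over every position where it fits entirely in the image."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the pixels currently under the filter
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Keep positive responses and zero out the rest."""
    return np.maximum(x, 0.0)

def max_pool_2x2(x):
    """Halve each spatial dimension by keeping the maximum of every 2x2 block."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# A 6x6 image: dark on the left, bright on the right
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# A hand-written filter that responds to dark-to-bright vertical edges
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

activation = relu(convolve2d_valid(image, kernel))  # 4x4 activation map
pooled = max_pool_2x2(activation)                   # 2x2 map passed onward
```

The activation map is strongest where the filter straddles the edge and zero elsewhere; the filter registers only the arrangement of pixel values, with no access to what the image depicts.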
Machine learning processes thus exhibit a kind of autonomic, affective capacity to form relations between objects and build schemas for action from the modulation and mapping of those relations. Relations between elements vary in intensity, with the process of learning both producing and identifying intensities that are autonomous from the elements themselves. Intensive relations assemble elements into new aggregations; bodies affect and are affected by other bodies. Geographer of algorithmic systems Louise Amoore writes that algorithms must be understood as ‘entities whose particular form of experimental and adventurous rationality incorporates unreason in an intractable and productive knot’.3 There is an autonomic quality to such algorithmic knowledge making, more affective than cognitive. In the context of image analysis, Anna Munster and Adrian MacKenzie call this platform seeing,4 a mode of perception that is precisely not visual because it works only via the spatial arrangement of pixels in an image, with no regard for its content or meaning. This machinic registering of relations accumulates to make legible otherwise unknown connections between sensory data, and it does so with the potential (if not intention) to make political claims: to function as a kind of witnessing of what might otherwise go undetected.
Underpinning the project is the proposition that social media and other image platforms contain within them markers of violence that can and should be revealed. For the machine learning algorithm of Triple Chaser, the events to which it becomes responsible are themselves computational: machinic encounters with the imaged mediation of tear gas canisters launched at protesters, refugees, migrants. But their computational nature does not exclude them from witnessing. With so much of the world now either emergent within or subject to computational systems, the reverse holds true: the domain of computation and the events that compose it must be brought within the frame of witnessing. While the standing of such counter-forensic algorithms in the courtroom might – for now – demand an expert human witness to vouch for their accuracy and explain their processes, witnessing itself has already taken place long before testimony occurs in front of the law. Comparisons can be drawn to the analogue photograph, which gradually became a vital mode of witnessing and testimony, not least in contexts of war and violence. Yet despite its solidity, the photograph is an imperfect witness. Much that matters resides in what it obscures, or in what fails to enter the frame. With the photograph giving way to the digital image and the digital image to the computational algorithm, the ambit of witnessing must expand. As power is increasingly exercised through and even produced by algorithmic systems, modes of knowledge making and contestation predicated on an ocular era must be updated.
As Triple Chaser demonstrates, algorithmic witnessing troubles relations both between witness and evidence and between witnessing and event. This machine learning system, trained to witness via synthetic data sets, suggests that the linear temporal relation in which evidence – the photograph, the fragment of tear gas canister – is interpreted by the human witness cannot or need not hold. Through their capacities for recognition and discrimination, nonhuman agencies of the machinic system enact the witnessing that turns the trace of events into evidence. Witnessing is, in this sense, a relational diagram that makes possible the composition of relations that in turn assemble into meaningful, even aesthetic objects. If witnessing precedes both evidence and witness, then witnessing forges the witness rather than the figure of the witness granting witnessing its legitimacy and standing.
While this processual refiguring of witnessing has ramifications for nonhuman agencies and contexts beyond the algorithmic, Forensic Architecture’s movement into this space suggests the strategic potential of machine learning systems as the anchor for an alternative politics of machine learning. While I firmly believe that scepticism towards the emancipatory and resistant potential of machine learning – and algorithmic systems more generally – is deeply warranted, there is also a strategic imperative to do more to ask how such systems can work for people rather than against them. With its tool and synthetic media database both made open source, Forensic Architecture aims to democratise the production of evidence through the proliferation of algorithmic witnessing that works on behalf of NGOs, activists and oppressed peoples, and against the techno-political state.
Notes:
- Pugliese, Joseph. Biopolitics of the More-Than-Human: Forensic Ecologies of Violence. Durham, NC: Duke University Press, 2020.
- Weizman, Eyal. Forensic Architecture: Violence at the Threshold of Detectability. New York: Zone Books, 2017.
- Amoore, Louise. Cloud Ethics: Algorithms and the Attributes of Ourselves and Others. Durham, NC: Duke University Press, 2020.
- MacKenzie, Adrian, and Anna Munster. Platform Seeing: Image Ensembles and Their Invisualities. Theory, Culture & Society 36, no. 5 (2019): 3–22. doi:10.1177/0263276419847508.