Sonifying cyber-attacks on water plants: towards the definition of a design approach to sonification

It’s time to share some thoughts and results on the first “juicy” sonification project we worked on for about six months at Density Design together with the Singapore University of Technology and Design (SUTD). “We” stands for Prof. Stefano Galelli and Dr. Riccardo Taormina from SUTD; my supervisor Prof. Paolo Ciuccarelli, also founder and Scientific Director of Density Design and currently at Northeastern University in Boston; and Ginevra Terenghi, an MA student (now successfully graduated) in Design at Politecnico di Milano who volunteered to collaborate with me in order to discover, coming from a mainly visual background, the potential of using sound in data representation. Stefano and Riccardo are the developers of an algorithm whose goal is to identify (thus helping to recognize and, in the future, prevent) cyber-attacks on digital/physical networks. In particular, they work on water plants, a key infrastructure for any country in the world and more and more the object of hackers’ unwanted attention, due to the transition from a fully physical system to a mixed system controlled by digital sensors and a digital central control unit, the SCADA (Supervisory Control and Data Acquisition) system.

In my two previous posts I delineated a possible strong use case for the introduction of sonification in the real world, i.e., process monitoring in AI-led anomaly detection systems. This project seemed to have all the right characteristics for a first design activity investigating the real potential of sound to support the operators in taking action in case of an anomaly.

I regard this project as a first step towards the final goal of delineating a set of guidelines applicable to the design of real-world applications of sonification. As a follow-up to the conclusion of the study, I am therefore trying to identify the different work phases, from conceptualization to prototype development to field validation, phases I will apply and test in the new projects I am currently working on.

The work process, which lasted approximately six months, included:

  • Research for Design: the exploratory phase of collecting all material, literature, debates and overall information on context, users’ needs, objectives and the current debate, in order to define the problem at stake;

  • Concept development: the sketching of a series of hypotheses on what the sound content should be, how the mapping strategy (of data to sound) should behave, and how the user’s interaction with both data and sound in a real environment could be implemented;

  • Prototyping: we went through two series of prototypes, quite distant from each other, the first series being so unsuccessful that it required starting from scratch with a second one;

  • Field validation/Research through Design: the validation phase, where we first designed an experimental protocol and then deployed it with the help of six expert testers; it also included the analysis of experimental results, which was no minor task, as we were trying at the same time to experiment with research and analysis methodologies for data sonification projects;

  • Dissemination and Iteration, still largely ongoing, which consists of sharing the results and the process as widely as possible to gather feedback – this post is obviously part of this phase! We are also planning to iterate the prototype design process based on the results of the field validation.

So, nothing new under the sun in terms of basic design methods. On the contrary, the overarching idea is to test how well a largely accepted – consciously or not – and at least partly codified approach to the everyday practice of design can apply to sonification, a field that to date has not shined for successful practical applications of its outcomes, nor for validation of results in real-world environments with real expert users. I tend to think that the lack of a user-centered, context-aware, goal-oriented and iterative design methodology is part of the problem – if we want, of course, as I do, to consider the lack of practical outcomes of sonification projects a problem at all.

Exploration: understanding how cyber-attacks work and how algorithms are meant to help us prevent them.

As mentioned, more and more of the complex systems we rely on for the smooth functioning of our organized lives (such as, in this case, water plants) depend on digital elements that can be infiltrated, damaged, or taken control of – briefly, hacked – causing serious accidents whose consequences are still largely unknown. Little, in fact, is shared by stakeholders on real cases of cyber-attacks on national facilities. Many research institutions are working together with governments to develop artificial intelligence able to build, through deep learning and machine learning, reliable defense systems. These systems basically consist of algorithms that analyze large flows of data in order to build models able to detect anomalies occurring in real time, by comparing the actual behavior of the system with the predicted behavior based on the model. In doing so, these systems should enable human operators not only to react in time to a hacker’s attack, but also to prevent it before it happens.

As we saw, though, these types of AI-led systems are still not completely reliable, leaving it to the human in charge to take the right decision at the right time. And anyway, as one of our testers stated during the interviews, with all that’s at stake “in a water plant, we do not trust machine”.

We worked for a few weeks to dig as deeply as we could into the context of use of our algorithm, in order to imagine at which point sound could enter the picture. The algorithm is modelled on the water plant network of the imaginary C-Town. The network consists of a certain number of tanks, pumps and valves distributed over five districts. All the components of the network are monitored through digital sensors physically placed on the components, which send their data to the central SCADA system.

Map of C-Town and our Water Distribution Network. From Ginevra’s MA Thesis, April 2019.
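To fix ideas, here is a minimal sketch of how such a network could be represented in code. The Python structure, component names and sensor identifiers are my own illustrative assumptions, not the actual SUTD implementation:

```python
from dataclasses import dataclass

@dataclass
class Component:
    """One monitored element of the C-Town network (all names illustrative)."""
    kind: str       # "tank", "pump" or "valve"
    district: int   # 1 to 5, ordered West to East
    sensor_id: str  # identifier of the digital sensor reporting to SCADA

# A toy excerpt of the network; the real one has many more components.
network = [
    Component("tank", 1, "T1"),
    Component("pump", 1, "PU1"),
    Component("valve", 2, "V2"),
]
```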

The anomaly-detection algorithm collects data from the SCADA system on the real-time behaviour of the network and checks them against its model of the ideal behaviour in a non-anomalous state (i.e., in this case, with no alterations due to cyber-attacks).
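In spirit, the check works like the following sketch: a residual between observed and predicted readings, expressed in units of the sensor’s baseline variability. The threshold and the normalization here are hypothetical placeholders, not the actual detection logic:

```python
# Hypothetical residual-based check (threshold and units are assumptions).
ALERT_THRESHOLD = 3.0  # deviations beyond this count as anomalous

def anomaly_score(observed: float, predicted: float, baseline_std: float) -> float:
    """Distance between the real-time SCADA reading and the model's
    prediction of non-anomalous behaviour, in baseline-deviation units."""
    return abs(observed - predicted) / baseline_std

def is_anomalous(observed: float, predicted: float, baseline_std: float) -> bool:
    return anomaly_score(observed, predicted, baseline_std) > ALERT_THRESHOLD
```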

In a real-world implementation, as things currently stand, the algorithm would have a dedicated visualization (on a dedicated screen?) for the operator to specifically monitor cyber-attack alerts.

Considering a typical, current workspace organization where the operator is confronted with about 9 screens displaying information in real time, our use case for sound does not need any further explanation…

A view of a typical control room, from Ginevra’s MA Thesis, April 2019.

Current visualization tools, from Ginevra’s MA Thesis, April 2019.

But if we wanted, we could also notice that the current visualization tools implemented in SCADA systems seem to make no effort towards being user-friendly or integrating any principle of data visualization design. Basically, they are simple, straightforward diagrams which the operator learns to understand over the course of her career, with more fuss than should be needed.

Concept development

In the second phase we started analyzing what part of the process should be represented through sound. What should sound say? And moreover, given that we never planned to fully eliminate the visual representation, what amount of information should we communicate through the auditory channel, taking it away from the visual one?

This brings me to introduce two main concepts I am trying to develop in my doctoral research:

No sense is an island.

I.e., we should reach a point in data representation where we, as designers, are able to choose the right tool – the right sensory modality – to partially represent those elements of a data set that are better represented via that specific modality.

The main hypothesis here is that no sensory modality is the best for all aspects of all data. In this specific project, we never contested that visualization has added value when it comes to focusing on analytical aspects of incoming data, running checks on historical data and, in general, digging into the numerical details of data values. There are tons of cases, literature and daily uses to prove it, and anyway we had no reason to claim the contrary. We therefore decided to use sound for the assets it offers in this specific use case, as I already explained here.

How good is good enough?

I borrowed this second concept from my years in startups. The “good enough” status of a prototype is a widely accepted notion in the lean methodologies used in startups’ prototyping, development and marketing processes.

In other words, when launching a new product under the frame of the Lean Methodology, we do not have to wait for it to be completely finalized for all possible instances and applications, because the feedback we want to obtain from its introduction to the real world does not have to be fully rounded, but rather “good enough” to move to the next step.

I claim that in many instances of data communication in everyday life, sound should be regarded as a good enough tool, not as a tool able to perfectly represent ALL variables and nuances of a data set in ANY possible circumstance. Should a new need arise during field validation, we can easily adjust the sonification experience to cater for that as well. I lament that many potential applications of sonification are never shipped to the real world precisely because, despite being in theory able to communicate all possible information contained in a specific data set, including what is already successfully communicated with visualization, they are still, simply and paradoxically, not good enough – they fail by hypertrophy of options.

During the concept development phase, we also listed all the design constraints we were going to take into account in the prototype design, including:

  • That data from the algorithm were conveyed to the operator only once every hour;

  • That the operators were busy with other tasks, which might also include producing or receiving other sounds (phone calls, chatting) that could mask our sound;

  • That they might not want to use headphones, or at least we were not sure about it, so the sound had to be designed to be listened to via either headphones or speakers (therefore being audible by everybody in the room);

  • That our users might or might not have musical competence or training;

  • That the number of false alarms generated by the algorithm was pushing the operators to keep the alert system switched off, taking the risk of letting a real cyber-attack go unnoticed rather than having to deal with official reports of false alarms every day: whatever our solution, we had to avoid complicating this issue by adding another potential source of false alarms!

Prototyping

This is also the phase where we developed our main working hypothesis, i.e., that we were going to develop a tool where sound could be a facilitator in finding, more easily and rapidly (i.e., more efficiently), analytical information on the visual representation. So, an aid, and not a substitute, for the visualization tools currently in use – an aid justified by the specific working conditions of real-time process monitoring.

Secondly, that sonification could limit the number of false alarms by communicating to the operator not only the final, binary decision of the algorithm (cyber-attack YES/cyber-attack NO) but also continuous data on anomalous behavior, in order for the human to take autonomous decisions integrating the AI’s judgement with her own experience.
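The contrast can be sketched as follows; the threshold and the 0–5 quantization are my own placeholders (as described further down, the scale was one we defined ourselves, not something the algorithm outputs):

```python
# Illustrative contrast between the binary decision and the continuous
# signal we wanted to sonify; threshold and scale are assumptions.
def binary_decision(score: float, threshold: float = 3.0) -> bool:
    """Cyber-attack YES/NO, as the algorithm alone would report it."""
    return score > threshold

def anomaly_level(score: float) -> int:
    """The continuous score quantized to a 0-5 level, leaving the final
    judgement to the operator."""
    return max(0, min(5, round(score)))
```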

Prototype One:

We decided to work with sounds designed for their metaphorical and iconic value, along the lines of both Gaver and his auditory icons [quote] and Chion with his definition, among other listening modalities, of semantic and causal listening [quote Sound]. A reflection on how different types of listening modalities could be leveraged to obtain possible design guidelines for data sonification is something I will focus on in the coming months. An embodied approach to sonification was also, as it always is, a fundamental source of inspiration for me, but I am not sure this definition can be applied to the way we worked in this case.

We designed a soundscape composed of two layers. In the background, we had a soundscape identifying the component we are talking about: sounds designed to remind the listener of the sonic experience of a big steel tank, a functioning pump, water running through a valve and so on. These sounds were played in a pre-determined, arbitrary order (first the tanks, then the pumps, then the valves) and announced to the listener which component’s variable we were taking into account at that specific moment. The use of these iconic sounds was meant to trigger in the listener an involuntary (i.e., without the need for learning or training) semantic connection between the sound and its meaning in the real world, making it easier to form the context for understanding data-related information and helping memorability.

The second layer, in the foreground, presented the behaviour of each component as captured by the data coming from the algorithm. Variables of this behaviour included the tank level, the value of water pressure in the pumps and valves, and the status (on/off) of pumps and valves. We tried different ways of using a dedicated alert sound representing the anomaly level of each component’s variable in the network, on a scale from 0 to 5 that we defined ourselves starting from the dataset (i.e., no such scale is algorithmically defined). In a first version, an alarm sound was added to the background soundscape of each component, i.e., the “tank alarm” would play during the background sound representing tanks, and so on. The alarm sound was scaled in pitch according to the anomaly level, from lower pitch = lower level of anomaly to higher pitch = higher level of anomaly. A second option used digital audio distortion on the alarm sound to represent the anomaly level, i.e., the alarm sound was progressively distorted to represent an increasing anomaly level.
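As a sketch of the first option, the pitch mapping could look like this; the base frequency and the step per level are arbitrary choices of mine for illustration, not the values we actually used:

```python
# Map a 0-5 anomaly level to a pitch: higher level, higher pitch.
# Base frequency and step size are illustrative assumptions.
BASE_HZ = 220.0          # pitch for anomaly level 0
SEMITONES_PER_LEVEL = 2  # one whole tone per level

def alarm_pitch(level: int) -> float:
    return BASE_HZ * 2 ** (level * SEMITONES_PER_LEVEL / 12)

# e.g. alarm_pitch(0) -> 220.0 Hz, alarm_pitch(5) -> ~392.0 Hz
```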

These alarm sounds would play in the foreground of the component soundscape, and the order in which they played was mapped to an ideal clockwise, West-to-East movement through the C-Town map. This was done to help the operator locate the affected component on the visual map for further investigation. So, for example, the alarm sounds for all the tanks in District 1 (the Westernmost district of C-Town) would play first, then all the tanks in District 2, and so on. The idea was that this additional information would reduce the time needed to find analytical information on the map as a secondary step, should the gravity of the alert require it.
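Reusing the hypothetical Component structure from the earlier sketch, the playback ordering amounts to a simple sort by district:

```python
def playback_order(network, kind):
    """Alarms for one component type, district by district, West to East
    (District 1 being the Westernmost). Assumes the Component sketch above."""
    return sorted(
        (c for c in network if c.kind == kind),
        key=lambda c: c.district,
    )

# e.g. playback_order(network, "tank") yields the tank alarms in the
# order the operator would hear them.
```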

The results were disappointing. We quickly realized we were trying to convey too much information to be efficient.

Also, the geographical information, though interesting, was very confusing, and it was very easy to forget at which point on the map you were while listening (one needed to keep the count from the first sound to locate oneself in the correct district). Moreover, the alarm sound had to play once for each component in a given district, so the system was hardly scalable – imagine a network with 5 districts and between 3 and 8 tanks in each district: over the background sound representing tanks, the operator would hear, in order, 30+ sounds with different levels of anomaly. Besides the clear issues of memory load and cognitive effort, from the mere point of view of sound design, playing all these sounds would require a certain amount of time – probably more time than the operator would need to turn her attention to the visual map in search of the same, or even more detailed, information.

Also, the background sounds designed to represent pump, tank and valve, despite our efforts, just did not sound right. For one thing, in order to accommodate the alarm sounds for each component (for example, the 7 tanks), they had to last far longer than they would in a real-world experience. In fact, you do not hear a “tank” resounding unless you hit it or touch it voluntarily to hear its reverberation – and this gestural sound would last a few seconds, not enough to display all 7 alarm sounds for our 7 tanks. As for pumps and valves, a realistic sound of these elements ended up being repetitive and quite annoying, considering it had to be heard in a real work environment. Overall, the sound experience was uncomfortable and far too long for an operator to listen to while keeping her attention focused on other tasks.

Still, we submitted the prototype to a review by experts in information design, water management and sound design to double-check our feelings, and subsequently dismissed it without regrets.

How we moved on from here – or rather, went back to zero – and ended up with our second series of prototypes, which we tested in the real world, will be the subject of the next post.