Building an AI Assistant for Locked Shields


Today is the last day of the world’s largest and most complex international live-fire cyber defence exercise, Locked Shields, hosted annually by the NATO Cooperative Cyber Defence Centre of Excellence (CCDCOE). Locked Shields is a real-time cyber exercise focused on protecting the systems our societies rely on to function. As the Centre states, “It simulates a realistic, large-scale live-fire cyber conflict, testing participants’ technical, operational, and strategic capabilities alongside decision-making under pressure… fostering a wartime mentality that compels teams to think quickly, adapt to unexpected threats, and collaborate effectively.”

The exercise brings together approximately 4000 participants from over 40 countries. Over several days, the blue teams, drawn from NATO allies, partners, and like-minded countries, protect the vital services and critical infrastructure that keep societies running, while the red teams attack these systems with authentic attacks in real-life scenarios. Many other teams take part as well, making the scope of the exercise truly awe-inspiring, and the hundreds to thousands of volunteers make this event the world-leading cyber exercise it is. The focus is on preparing for and practising different kinds of scenarios, and on communicating effectively within them to enable efficient decision-making when it really matters. Helheim Labs is proud to participate in this prestigious cyber exercise for the second year in a row.

AI assistance in a rapidly changing environment

Our role has been to develop, pilot, and operate a local AI evaluation assistant system. We evaluate the blue teams’ submissions against the guidelines for compliance and effective communication. The AI is tasked with identifying where the teams have followed the guidelines as well as where they have deviated from them. We also assess readability and the report’s ability to support effective decision-making in a crisis situation. Last year we piloted the system; this year we made it faster and better integrated into the exercise, enabling it to support the desired function with very little delay. Minimizing the delay is vital, as it means the blue teams get their feedback faster.

In this year’s implementation, we use an open-source model from the Mistral family. This gives us full control over where the data flowing through our systems is located at all times, which is crucial for an event such as this. All our hosting happens within our own infrastructure, with the ability to collaborate effectively with agreed-upon providers. Additionally, we aim for an environmentally aware AI system, using repurposed hardware and the smallest model that functions effectively.
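As an illustration only, not a description of our actual implementation: a locally hosted open-source model is often exposed through an OpenAI-compatible HTTP endpoint (as servers like vLLM or Ollama do), and an evaluation assistant can then be a thin client around it. The endpoint URL, model name, and the `build_evaluation_prompt` helper below are all assumptions made for this sketch:

```python
import json
import urllib.request


def build_evaluation_prompt(guidelines: str, report: str) -> str:
    """Assemble an evaluation prompt (hypothetical helper, for illustration).

    Asks the model to note where the report follows the guidelines,
    where it deviates, and how readable it is for decision makers.
    """
    return (
        "You are assisting subject matter experts in a cyber defence exercise.\n"
        "Evaluate the report below against the reporting guidelines.\n"
        "List (1) where it follows the guidelines, (2) where it deviates,\n"
        "and (3) whether it is readable enough to support decisions in a crisis.\n\n"
        f"GUIDELINES:\n{guidelines}\n\nREPORT:\n{report}\n"
    )


def evaluate_report(
    guidelines: str,
    report: str,
    url: str = "http://localhost:8000/v1/chat/completions",  # assumed local endpoint
    model: str = "mistral-7b-instruct",  # assumed model name
) -> str:
    """Send the prompt to a locally hosted model; no data leaves the infrastructure."""
    payload = {
        "model": model,
        "messages": [
            {"role": "user", "content": build_evaluation_prompt(guidelines, report)}
        ],
        "temperature": 0.2,  # low temperature: favour consistent, grounded feedback
    }
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(
        build_evaluation_prompt(
            "Report incidents within 15 minutes.",
            "We observed a phishing wave at 09:40.",
        )
    )
```

Keeping the prompt assembly in a separate pure function makes the assistant easy to test without touching the model server, and the output still goes to a human expert rather than being acted on automatically.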

The challenge of generative AI, especially in a rapidly changing environment like this, is its propensity to hallucinate: to produce content that seems legitimate while being wrong. This behaviour is inherent to generative AI models, because they are built to produce human-like communication based on the probabilities found in their training data, and it affects all of them. As Satu, CEO and owner of Helheim Labs, states, they are not built to produce the truth but effective communication. Additionally, because language models are focused on producing realistic language, they lack the depth of true subject matter expertise and the human ability to understand changing contexts. We are privileged to work with people with inspiring amounts of knowledge.

The role of AI is always crucial

Our AI system has received good feedback, as demonstrated by our returning for a second year. However, it is not 100% accurate at all times, as no AI system is. “If anyone promises you a perfect AI system that never fails, it is time to start asking a lot of pointed questions”, Satu states. With AI, it is therefore crucial to scope it properly and define its role in a way that accounts for both the strengths and weaknesses of this technology and supports the human making the decisions.

The AI system developed for Locked Shields uses AI as an assistant to the subject matter experts, enabling them to get feedback to the blue teams faster. “We give the AI output to the subject matter experts so they can use the insightful parts while ignoring the parts that don’t serve the blue team in this rapidly changing environment.” The exercise is never the same: there are always changes in the scenarios, the countries, the challenges, and, generally, what the teams face. In a case like this, full automation would not be a responsible use of AI, Satu states.

The assistant is part of an R&D effort by Helheim Labs to evaluate AI’s ability to bring value in this kind of changing environment, where accuracy and security are vital and the need is to support the human expert. Last year we piloted the system and received good feedback. This year we took the pilot and completely reworked it with fast and effective operations in mind. We got it working so well that the AI evaluation for a phase is completed only 5-7 minutes after the submission window closes.

A delight and a privilege

As the event, and our contribution to LS26, draws to a close for this year, we want to thank the organizers and fellow volunteers once again for organizing something unique and for enabling us to research, develop, and deepen the understanding of where and how AI can and should be used responsibly. Speaking for a new and small company, I am truly proud to say we created an AI system that was successfully used in the biggest real-time cyber exercise in the world.

If you want to hear more about Helheim Labs, or are interested in collaborating with us to build safe and responsible AI systems, let’s talk. We specialize in AI targets and assistants for cyber competitions and have been to quite a few cyber conferences as well as Crossed Swords 2025.

