
Things I wish I knew about Cloud (Security) before I became a CISO | Part 1: What’s all this cloud stuff anyway?

  • Anton Horn
  • Dec 19, 2025
  • 9 min read

This article is part 1 of a 3-part series in which I break down the issues I encountered in understanding public cloud environments and how to secure them. I went through this learning journey when I started my CISO job at Allianz Direct, a German insurance company doing direct-to-consumer business worth over €1bn annually and running exclusively on public cloud infrastructure.

This part focuses on how the cloud is fundamentally different from traditional on-premise environments and which traditional IT concepts I had to re-learn.

In part 2, I will write about the differences in security concepts and security technology in the cloud.

In part 3, I will explain what I would do now if I were made responsible for securing the cloud assets of a larger organization: what I would do immediately, and which processes I would set up to secure it long-term.

I try to stick to the concepts and abstractions of the relevant technology. Obviously all of this revolves around tech, but I’m trying to stay away from going too deep into it.

I thought primarily of the following groups of people when I wrote this:

  • Security GRC Professionals

  • Auditors

  • “Non-technical” Security Executives

  • Other tech professionals who want to understand how GRC people think about the Cloud

Disclaimer: I’m assuming you already know about the shared responsibility model in the cloud, and I’m not focusing much on SaaS-specific concepts or privacy aspects of Cloud usage. Also, when I talk about the Cloud in this article, I generally mean the three hyperscalers: Amazon Web Services (AWS), Microsoft Azure and Google Cloud Platform (GCP).


The Cloud matters because it solves two business problems

To understand the Public Cloud, it’s first important to understand what business problems it solves. And while there are probably quite a few, I believe the following two are the most important ones:

  1. Companies can respond more quickly to market demands

  2. IT costs become variable costs

Companies can respond more quickly to market demands

Especially when it comes to technology products, customers (private and enterprise) are more and more used to receiving products that meet their exact demands.

A logical conclusion here is that the faster you can respond to customer and market demands, the more you outpace your competitors. I don’t believe in quotes that much, so I’ll spare you whatever interpretation Marc Andreessen or Jeff Bezos have on the topic.

I believe it’s self-evident though: faster iteration gives companies an edge to 1) increase the gap if they’re already ahead of the competition, and 2) learn from mistakes and adjust their strategy accordingly.

In the past (and in some companies, still the present), making changes in technology took a lot of time — some companies are able to apply changes to their core technology platforms only once or twice a year. Along with this very low change frequency comes incredible pain with each change.

IT costs become variable costs

In the olden days, large companies had to order IT infrastructure well in advance, based on some kind of prediction of how much they would consume in the coming months or years. This is another factor that makes fast iteration almost impossible, unless you order a lot of spare capacity. Ordering extra servers just because you might acquire more customers, launch a new product line, integrate a company you purchased etc. is of course a very inefficient use of capital. Instead, that money could be used for additional marketing, R&D or funding a super critical security initiative.

So what if I could consume my technology infrastructure only as I go? Like a rental car, I would only pay for the size (e.g. Mercedes S-Class vs. Fiat 500) and the duration (one afternoon vs. four weeks) I actually consume.

This would give companies incredible flexibility in how they allocate capital. And going back to quick iteration: I could put additional resources into a marketing initiative that works incredibly well and reduce the technology spend on an R&D initiative that clearly doesn’t deliver on its promises.

This is where the Cloud comes in

The cloud offers a solution to both problems:

  • You can iterate much faster and more reliably thanks to the Distributed, Ephemeral and Immutable nature of resources

  • You can scale your technology spend up and down through the elasticity of the cloud

I will now dive a bit deeper into both concepts, how they’re implemented in the cloud and touch a bit on how this affects security.

Resources are Distributed, Ephemeral and Immutable

This concept is probably the biggest brain-twist that I had to get through — if you understand the distributed, ephemeral and immutable nature of cloud assets, you make a huge leap to understanding how the cloud works in general and why it’s different.

One of the major use-cases in the public cloud is the deployment of applications — let’s say a web application like a retail banking system.

The banking application would consist of multiple microservices, orchestrated through Kubernetes clusters (more on what the hell that is below). The fact that it consists of multiple services running on multiple compute instances (e.g. virtual machines) already makes it distributed (i.e. no single point of failure). The underlying cloud infrastructure, including networking and data storage, is also redundant, making it much more resilient to failures.

These microservices can sometimes exist for very brief periods of time (minutes or hours), making them ephemeral (i.e. short-lived). Whenever a new version gets deployed, the old one just gets shut down and the new version gets spun up (this all happens without noticeable downtime for the banking customers).

This ephemeral nature is supported by the fact that the application does not store any state information (e.g. customer data or logs) on the running system itself. Instead the system is built from an immutable (i.e. unchanging) image. If I want to change something in my banking app (e.g. deploy a new feature like immediate wire transfers), I build a new immutable image and deploy that image to production.

(There is a whole section I could write here about Continuous Integration / Continuous Deployment “CI/CD” but that would get a bit too deep — maybe in another article…)

Kubernetes is a system that implements all three of these properties for you. It deploys applications from immutable container images, shuts them down and starts them up again, and manages them in a distributed fashion. For a more detailed explanation of what Kubernetes, containers, Docker etc. mean, check out this article from Microsoft: Demystifying containers, Docker, and Kubernetes — Microsoft Open Source Blog.

Kubernetes can melt your brain a bit but understanding the business value of these technologies helps you grasp why everyone is so obsessed with this cloud stuff.
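To make this more concrete: with the official Kubernetes Python client, deploying a new immutable image boils down to pointing a Deployment at a new image tag and letting Kubernetes replace the old pods. This is a minimal sketch; the deployment name, namespace and registry URL are made up for illustration.

```python
from kubernetes import client, config

# Load credentials from ~/.kube/config (use load_incluster_config()
# when running inside a cluster instead).
config.load_kube_config()
apps = client.AppsV1Api()

# Point the Deployment at a new immutable image tag. Kubernetes then
# performs a rolling update: pods built from the new image are started,
# traffic shifts over, and the old (ephemeral) pods are shut down.
# No one ever logs into a running system to "update" it.
patch = {
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "banking-api",
                        "image": "registry.example.com/banking-api:v42",
                    }
                ]
            }
        }
    }
}
apps.patch_namespaced_deployment(
    name="banking-api", namespace="production", body=patch
)
```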

The cool thing about an environment that is actually distributed, ephemeral and immutable is that you get a couple of security benefits that 1) just make your security program much more effective, and 2) have significant business value, which should make them very valuable for you as a security practitioner as well:

  • Patching of systems is never delayed by maintenance windows or reliability concerns. If you deploy a few hundred or a few thousand times a month, updating a system to fix a vulnerability is never pushed to a later sprint because the system is too unreliable.

    Patching will definitely still get pushed back sometimes because of conflicting business vs. security priorities, but that’s expected, and it’s what we’re being paid for as security professionals.

    As an example: from first hearing of Log4Shell to a full patch of all 100+ Java services running in our environment, it took us less than 48 hours, without a minute of downtime for our customer-facing systems.

  • Attackers making themselves comfortable on a system for prolonged periods of time becomes a bit more difficult if the system is shut down and started up all the time. Depending on the attack vector and exploits used, this doesn’t mean it’s impossible to gain persistence in the cloud, but it’s at least a bit harder.

  • Someone tripping over a cable in a data center will not cause outages to your systems. Even if someone did that in an AWS data center, there is so much built-in redundancy that you’ll probably never notice. Another great thing about distributed systems: if one part fails, it does not automatically mean that the rest fails as well.

  • You generally don’t have issues caused by admins SSH-ing into a production environment and causing some kind of system outage because they accidentally ran a command in the wrong terminal window. If every system is deployed based on an immutable image, the change is tested in a pre-production environment first and then deployed through a reliable CI/CD pipeline. No one logs directly into that critical system to update something or make a configuration change.


Elasticity is probably the single greatest business benefit

The great thing about the Cloud is that it’s elastic. That means you don’t have to order new compute, storage or networking hardware. You just use it as you need and shut it down again when you don’t (but don’t forget to actually shut it down — your finance team will thank you).
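As a sketch of what this looks like in practice (using boto3, the AWS SDK for Python; the Auto Scaling group name is made up), scaling a fleet of web servers up for a campaign and back down afterwards is a single API call each way, not a hardware order:

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Scale the (hypothetical) web frontend out for a marketing campaign...
autoscaling.set_desired_capacity(
    AutoScalingGroupName="webshop-frontend",
    DesiredCapacity=20,
)

# ...and back down again afterwards. From that moment on you simply
# stop paying for the capacity you no longer use.
autoscaling.set_desired_capacity(
    AutoScalingGroupName="webshop-frontend",
    DesiredCapacity=4,
)
```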

Not using the elasticity of resources makes using the Cloud somewhat pointless. If you know exactly how many resources you need, you can probably achieve some things significantly cheaper in an on-premise data center.

The interesting security challenge here is that this elasticity can be exploited by attackers. Two examples, one being very well known, one I heard about recently and found quite interesting:

  1. Crypto mining: Attackers will simply launch a very powerful and expensive compute instance and start mining Bitcoin or another cryptocurrency. They can monetize this attack almost directly, as opposed to data theft or ransomware. So it’s a very attractive option for them.

  2. Monetizing AI APIs: Attackers can spin up LLM APIs (like OpenAI models on Azure) that incur cost based on token consumption and sell access to this API via the dark web. Slightly more difficult to monetize, but also more difficult to detect.

I think you get the idea — stuff in the cloud can become very big and very expensive, which makes it very interesting for hackers who want to make money.
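This also hints at a cheap detection idea: watch for launches of unusually large compute instances. Here is a minimal sketch with boto3; the list of “expensive” instance families is an assumption you would tune to your own environment.

```python
import json
import boto3

# Assumption: instance families that would be unusual (and expensive)
# in your environment, e.g. GPU and very large memory instances.
EXPENSIVE_PREFIXES = ("p4", "p5", "g5", "x2")

cloudtrail = boto3.client("cloudtrail")

# Look at recent EC2 launch events recorded by CloudTrail.
response = cloudtrail.lookup_events(
    LookupAttributes=[
        {"AttributeKey": "EventName", "AttributeValue": "RunInstances"}
    ],
    MaxResults=50,
)

for event in response["Events"]:
    detail = json.loads(event["CloudTrailEvent"])
    instance_type = (detail.get("requestParameters") or {}).get("instanceType", "")
    who = (detail.get("userIdentity") or {}).get("arn", "unknown")
    if instance_type.startswith(EXPENSIVE_PREFIXES):
        print(f"Unusually large instance launched: {instance_type} by {who}")
```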

Everything is Code!

The configuration of your entire infrastructure can and should be represented as code. The term used here is “Infrastructure as Code” and the system most widely used for it is called “Terraform”.

The benefit of defining everything as Code is not that you can make it more complicated for non-technical people to understand. The benefit is that you get to apply all the governance and control processes that software engineers have implemented over the past decades. This includes:

  • Fully traceable versioning — every change in the infrastructure can be traced back to a change in the code, and you can revert to any previous version quite easily.

  • Disaster Recovery — due to this ability to revert to a previous version in a standardized, non-emergency process, recovering from an outage caused by a bad code change is incredibly easy and painless.

  • Fully auditable — every change is tied to a specific user with access to the version control system. You also know who reviewed the code change, and you finally get to find out what the hell LGTM stands for (and then you need to decide whether you want to know if the reviewer really found no issues with the code or just wanted to go to lunch 😬…)

  • Testable — you can run automated tests for every change, as well as for the full codebase as it is currently deployed. This doesn’t just include tests of the code functionality but also security tests like Static Application Security Testing. It gets really interesting when you consider that you can run security tests on infrastructure changes: if someone wants to deploy a database and decides to make it public to the open internet, you would probably want at least an alert asking them if they’re really sure they want to do that. In non-software-defined infrastructure this would be much more difficult to achieve (see the sketch after this list).
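Here is a minimal sketch of that database example. Terraform can export a pending change set as JSON (terraform plan -out=plan.out, then terraform show -json plan.out), and a few lines of Python are enough to flag an AWS database that would become publicly accessible before the change is ever applied. The file names are illustrative.

```python
import json
import sys

# Read a Terraform plan exported with:
#   terraform plan -out=plan.out
#   terraform show -json plan.out > plan.json
with open(sys.argv[1]) as f:
    plan = json.load(f)

# Walk every resource this change would create or modify and flag
# AWS RDS database instances reachable from the open internet.
for change in plan.get("resource_changes", []):
    if change.get("type") != "aws_db_instance":
        continue
    after = change.get("change", {}).get("after") or {}
    if after.get("publicly_accessible"):
        print(f"WARNING: {change['address']} would be publicly accessible!")
```

Hook a check like this into your CI/CD pipeline and the risky change gets flagged (or blocked) before it ever reaches production.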

It probably helps to understand how Git works in order to understand which governance measures are programmed into your deployment processes — both for your infrastructure and for your application code. Git is one of those systems that can mess with your head if you dive too deep into it, but on a surface level it’s pretty easy to grasp.

Cloud ≠ Cloud Native!

You can put a lot of stuff in the cloud, but that doesn’t make it cloud-native. The practice of throwing the same applications and databases you had in your on-premise environment onto the cloud without really changing anything about them is called “lift and shift”, and it’s definitely not a recommended use of Public Cloud resources.

If you use the Public Cloud like a rented data center, you’re shooting yourself in the foot. Just from a security standpoint, you have to take care of all the additional attack surface that the cloud brings, but you don’t offload a lot of the old attack surface to the Cloud Service Provider. And it’s most likely more expensive than running the legacy stuff on-premise.

“Cloud-native” means that resources make full use of the distributed, ephemeral and immutable nature of the cloud, as well as its elasticity.

This is specifically implemented through applications being deployed as containers or serverless functions. These types of resources behave vastly differently from virtual machines deployed in on-premise data centers.
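To get a feel for how different this is from a virtual machine, here is a minimal serverless function (an AWS Lambda handler in Python, assuming an API Gateway HTTP trigger). There is no server to patch, harden or SSH into: the platform runs this code on demand, scales it with traffic and bills per invocation.

```python
import json

# Minimal sketch of a serverless function. The event shape assumes an
# API Gateway HTTP trigger; "handler" is the configured entry point.
def handler(event, context):
    name = (event.get("queryStringParameters") or {}).get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```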

That’s it for the first installment of this series. In the next article, I will talk about how I had to understand security concepts and security technology differently after taking on my CISO role at a cloud-native company.

If you want to dive deeper into how the Cloud works in general, I recommend reading “Cloud Strategy: A Decision-based Approach to Successful Cloud Migration” by Gregor Hohpe (also formerly with Allianz as their Chief Architect).
