There are a few scenarios where I imagine this approach being useful:
* If you have any kind of remote dependency in your SSH auth flow (LDAP, or an online CA, or automated Ansible playbooks to push keys), any of those might fail and render the host otherwise inaccessible.
* It's becoming more common to not ever SSH into machines. So, what if emergency SSH access is the only way to access a host? Some companies even go a few steps further: When a host is SSH'd into, it is considered "tainted by humans", is quarantined and eventually shut down.
* Some hosts should never allow root access to anyone. For example, there's no reason for anyone to have root on a bastion host. So, what if the only way to get root on some hosts is with the emergency key?
While you could use the cloud VM console for emergency access in these cases, having a hardware key provides even more security and would let you turn off cloud VM access.
Of course if you broke your SSHD config, or have a network issue that prevents you from reaching the host, this won't magically fix any of that. IPMI is good for that though.
> While you could use the cloud VM console for emergency access in these cases, having a hardware key provides even more security and would let you turn off cloud VM access.
I'm not sure it's more secure, but I suppose it depends on the provider. Your control of your account's admin key (or password) is the last bastion of security for most providers.
> Of course if you broke your SSHD config, or have a network issue that prevents you from reaching the host, this won't magically fix any of that. IPMI is good for that though.
This is why I just use the providers' emergency management (or IPMI). Easier to have one method of emergency access that always works regardless of the guest. The guest's root (or emergency) account can still have a pretty darned complex password.
> It's becoming more common to not ever SSH into machines
This is a reality for me. At work we run a handful of distributed clusters, and if anyone does the equivalent of SSHing into a box and poking around (in our case, `kubectl exec`), the infrastructure team gets an alert and then follows up with whoever invoked the command. If they are debugging, we shift whatever resources they need into dev. If they are not debugging, they will probably get questioned by their boss. (Fortunately, most of the time this chat results in "oh wow, I didn't know about the APM/Metrics/Graphs/Logs/etc. setup we had, I'll check that next time.")
IPMI is painfully insecure, and therefore assumes the existence of a completely separate, protected network. Some people don't colocate more than a few machines (and therefore can't justify the extra infrastructure for an IPMI OOB network), don't want to pay extra for a colo provider to provide IPMI OOB, and/or don't trust their colo provider to have access to such a sensitive and insecure thing.
Having an emergency method to connect is an excellent idea.
I think it depends. I've worked in places that had something like the following setup.
- Hardware in datacenters with operators who were not experts on the applications running.
- All remote access was done using short-term (~1 day) SSH keys. There was an authentication service to generate these.
It was pretty easy to imagine that the authentication service would go down. In this case, a selection of people who worked on the infrastructure had longer-term keys on HSMs (with heavy logging and alerting for any use). It would actually make sense for these to be CA keys so that they could access different user accounts or similar.
TL;DR you are assuming a very basic SSH auth setup. As the regular setup gets more complicated, having something like this as a backup makes sense.
It’s really not - by limiting the life of keys, and having a service generating them, you can more effectively lock things down when someone leaves, rather than going round revoking keys from servers. Something we’re experimenting with at work is AWS Instance Connect, which uses your AWS credentials to push a key to a target instance with 1 minute validity - no more managing keys on instances, and revoking access is just a change to an IAM policy.
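Roughly what that flow looks like from the CLI, as a sketch: it assumes the instance runs the EC2 Instance Connect agent, your IAM policy allows `ec2-instance-connect:SendSSHPublicKey`, and the instance ID, OS user, and hostname below are placeholders (older CLI versions may also require `--availability-zone`).

```
# Generate a throwaway key pair just for this one session
ssh-keygen -t ed25519 -f /tmp/ic_key -N ''

# Push the public key; the instance only honors it for ~60 seconds
aws ec2-instance-connect send-ssh-public-key \
    --instance-id i-0123456789abcdef0 \
    --instance-os-user ec2-user \
    --ssh-public-key file:///tmp/ic_key.pub

# Connect within that window using the matching private key
ssh -i /tmp/ic_key ec2-user@<instance-public-dns-or-ip>
```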
As opposed to having a few bastion hosts and requiring people to log in there in order to then SSH on to their final destinations -- in that case, revoking their keys is as simple as wiping their accounts on the bastion hosts.
This is REALLY helpful on devices like Raspberry Pis, which may stay shut down/offline for years. The minute they're powered up again they'll get my fresh keys and I can log in to them without needing a console.
Neat - something I feel often gets overlooked in most SaaS systems (think internal-side tooling: customer service, ops, etc.) is break-the-glass escalation functionality. Most systems I've seen in the wild completely lack this, which results in over-provisioning of admin “god mode” accounts.
NoodlesUK points out alerting, which is a pretty important concept to incorporate.
Largely a solved concept in Electronic Medical Records & as outlined in the post.
I think another thing we might want to learn about is how to sound the alarm when the break glass is used. Is there an easy way of doing that with SSH? Running a command to page the ops/security team when a server receives a login attempt with an emergency credential?
You can put the YubiKey in a vault that will physically sound an alarm when opened. It could also have a battery-powered Arduino inside the box (with a SIM breakout) that texts the devops team when opened.
Overcomplex technical solution to a simple problem.
Besides which, if you really want to go full-on with technically clever solutions, keep in mind that whoever opens the box could ensure there's no cellular service prior to opening it. But then we're just getting into the realm of silly situations.
Hey there, I wrote this post. It's a great question.
One benefit of using certificates for emergency access is that SSHD logging can be configured to show a lot more detail about the certificate that was used. With public keys, there isn't anything to show. But with certificates you have a key ID, serial number, principals, CA fingerprint, etc. So, that log is a good hook for sounding the alarm. A more advanced version of this would allow you to record a reason for using the emergency access key when the connection is made (or when sudo is used).
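As a rough sketch of that hook (assuming the emergency certificates are all minted with a key ID containing the word "emergency", and that `PAGER_WEBHOOK` is a placeholder for whatever paging endpoint you use), something like this could follow the sshd logs and page on any match:

```
#!/bin/sh
# Sketch: follow sshd logs and page when a certificate whose key ID
# mentions "emergency" is used to authenticate. The exact log wording
# varies by OpenSSH version; adjust the pattern to match your logs.
# The unit is "ssh" on Debian/Ubuntu and "sshd" on most other distros.
journalctl -f -u ssh -u sshd --no-pager | \
while read -r line; do
  case "$line" in
    *"Accepted publickey"*"emergency"*)
      # PAGER_WEBHOOK is a placeholder for your alerting endpoint
      curl -s -X POST "$PAGER_WEBHOOK" \
        --data-urlencode "text=Emergency SSH cert used: $line"
      ;;
  esac
done
```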
I don't know what it looks like for a certificate-based system, but syslog records the fingerprint of the key used for login on a fairly vanilla Debian. If you worry about things like that and aren't looking at physical access (as suggested elsewhere), you presumably have remote syslog and audit logs which you can check.
What a coincidence: 3 days ago I ordered two YubiKey 5s, the package arrived today, and today I also read a post on how to use them in an interesting way for emergency access to my server via SSH.
I'd like to add that the way it's described really works.
But... now I don't know whether to set aside one YubiKey in case I need it for emergency SSH access. I've had a server since 2011 and have never had problems with SSH access; I've used the same keys to this day and everything works.
I think using a YubiKey this way for emergency access is overkill.
If you need the option to give someone temporary access, it seems like a good fit. I don't think it would add anything to my personal stuff, since there's no reason I can think of to give someone else access. At work, definitely.
Right, this is more about a cryptographic grant of temporary emergency access to someone who doesn't already have a user account or admin keys on a machine (and ideally nobody should have persistent admin access in a well-oiled production setting), in the event that existing access control mechanisms have failed. Backing the signing operations with a YubiKey lets you physically secure the key in ways that you wouldn't an entire laptop, and provides all the benefits of tamper-resistant, proximity-aware hardware. Probably not something most people will want or need to bother with for personal stuff, but a very reasonable expectation as soon as you're working on a team or managing many hosts, etc.
I'm very confused, given that YubiKeys have smart card functionality and can be used by gpg-agent to SSH with a regular GPG key (you can add it to authorized_keys just like any other key), and you don't have to go through this whole mess of setup to create a CA and install it.
It's a chicken and egg problem: if you can't SSH into the machine, how do you add your key to the SSH config on the target machine?
You could use a very long-lived key, but then as soon as you have multiple people who might need production SSH access, you've got access control and revocation issues. The SSH CA is a good minimal solution, because the CA can issue only short-lived certificates (a few hours at a time) that you use once and throw away. CA trust also scales better, because it moves the user management burden to the certificate issuing process and removes the need to modify the SSH config every time you onboard a new user.
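For reference, issuing one of those short-lived certificates is a single `ssh-keygen` call against the CA private key; the file names, key ID, principals, and validity window below are just placeholders:

```
# Sign the user's public key with the CA, valid for 4 hours,
# for the principal(s) the cert should be allowed to log in as.
ssh-keygen -s emergency_ca \
    -I "alice-prod-access" \
    -n root,ops \
    -V +4h \
    id_ed25519.pub
# Produces id_ed25519-cert.pub next to the public key.
```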
This can also work as a solution where the “setup” (of trusting the CA) is baked into the image. Then there is no SSH-related setup until the day you actually need to SSH into the host, and you get the guarantee that no SSH login can happen until you issue a temporary key pair.
This is actually quite useful for deploying clusters of machines that one doesn't want to allow normal SSH access to until there is a real need. I think this was also mentioned in another comment.
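The baked-in trust amounts to a line or two in the image's sshd_config; this sketch assumes the CA public key is shipped at `/etc/ssh/emergency_ca.pub` (the path is arbitrary):

```
# sshd_config snippet baked into the machine image:
# accept certificates signed by the emergency CA, so no per-user
# authorized_keys entries ever need to exist on the host.
TrustedUserCAKeys /etc/ssh/emergency_ca.pub

# Optionally restrict which certificate principals may log in as which user:
# AuthorizedPrincipalsFile /etc/ssh/auth_principals/%u
```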
That sounds like a great option too, depending on your situation.
One difference is that the CA is on the hardware key, but the cert (and its private key) is not.
Imagine you're on a team of 50, and anyone on the team might need emergency access to a host at some point. You wouldn't want to buy 50 keys and 50 safes. Just designate a couple folks to manage emergency access. They can manually mint a cert for a colleague as needed, and send it over a secure channel. No security key needed to use the cert, and it self-destructs after a few minutes.
Right, you'd have to look up the switches in the manpage if you don't remember them, but that's already the case with the generation portion, which is why the post includes the switches for that. I'm just saying it could have included the inspect switches too.
We actually plan to update the post to demonstrate doing it entirely with the `step` tool. We just want to do a pass on the UX to make sure it is as easy and foolproof as possible before bringing more attention to it.
This is a pedantic detail, but if you're trying to implement this system, it does matter: "resident key" is not a required feature here. You're not using the hardware token for its WebAuthn capability, you're using it for its smart card capability.
You just need PKCS11 token support for SSH, which the YubiKey's smart card capability can do. YubiKey 4 and YubiKey FIPS can both do it, and so can regular old smart cards even though that form factor is a lot less popular now.
The workflow is the same: generate a key pair on the hardware token, have the CA sign it, install the signed cert onto the hardware token, and then SSH with it.
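A rough version of that workflow with OpenSSH's PKCS#11 support might look like this; the module path is where the YubiKey PKCS#11 library often lives but varies by OS, and the CA key name, key ID, and host are placeholders:

```
# Export the public half of the key pair that lives on the token
# (this may list several slots; keep the one you intend to use)
ssh-keygen -D /usr/local/lib/libykcs11.so > token_key.pub

# On whatever machine holds the CA key: sign that public key as usual
ssh-keygen -s emergency_ca -I "hw-token" -n root -V +4h token_key.pub

# SSH using the token for the private-key operation, pointing ssh at
# the signed certificate explicitly. The cert isn't secret, so it can
# simply live on disk rather than on the token.
ssh -I /usr/local/lib/libykcs11.so \
    -o CertificateFile=token_key-cert.pub \
    root@host.example.com
```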
I'm not sure of the exact scenario, but I would just note that there are other types of computing environments than virtual machines. For example, there are physical machines, sometimes hosted in a colo where you have no employees on the ground.
Surely you'd have some sort of remote KVM in such cases (like IPMI, as mentioned in another comment). That's critical in the clusters I've run and, of course, the manufacturers' implementation of that critical functionality in IPMI is likely to be rubbish and you can't get it fixed...