Certificate Lifecycle Management: From Issuance to Revocation

Anil K · 8 min read
#pki #certificate-management #acme #ocsp #hsm #automation #certificate-transparency

Every org has a certificate graveyard

Walk into any organisation with more than a dozen TLS endpoints and you'll find it. Production services running on certificates issued four years ago by a CA that no longer exists. Owned by a team that got reorganised twice. Private keys sitting on a filesystem nobody's audited since the Obama administration. Renewals on someone's personal calendar — someone who left in 2022.

Certificate lifecycle management is unglamorous work. It is also not optional. A non-trivial fraction of the big public outages and breaches of the last decade trace back to this graveyard: LinkedIn 2017, Microsoft Teams 2020, Equifax 2017, the Let's Encrypt chain expiry of 2021 that broke older clients and devices. Every one of them had a root cause that sounds boring on a postmortem slide and very loud at 3 a.m.

This post is the lifecycle model I keep landing on, plus the tooling that makes it automatic instead of aspirational.


Six stages. Miss any of them and the whole thing rots.

1. Planning — decide before you request

Before you ever call a CA, answer four questions.

What names does it need to cover? Wildcard or explicit SANs? Wildcards feel operationally convenient, and they are — right up to the moment you need to revoke one, at which point you've just invalidated every service sharing it. I default to SAN-per-service for anything production-critical, and reserve wildcards for genuinely ephemeral infrastructure.
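
To make the wildcard trade-off concrete, here is a minimal Python sketch. It assumes the common matching rule that `*` stands in for exactly one left-most label, and it treats "every hostname the wildcard covers" as an upper bound on what shares the cert. The function names and the service list are illustrative, not from any library.

```python
def wildcard_covers(pattern: str, hostname: str) -> bool:
    """Return True if a certificate name covers the hostname.

    Assumes '*' matches exactly one left-most label and nothing else.
    """
    p_labels = pattern.lower().rstrip(".").split(".")
    h_labels = hostname.lower().rstrip(".").split(".")
    if len(p_labels) != len(h_labels):
        return False
    if p_labels[0] == "*":
        # The wildcard replaces a single non-empty label.
        return bool(h_labels[0]) and p_labels[1:] == h_labels[1:]
    return p_labels == h_labels


def blast_radius(revoked_name: str, services: list[str]) -> list[str]:
    """Hostnames that lose coverage if the cert for revoked_name is revoked."""
    return [s for s in services if wildcard_covers(revoked_name, s)]


# Hypothetical service list for illustration.
services = ["api.example.com", "billing.example.com", "a.b.example.com"]
affected = blast_radius("*.example.com", services)
```

Note that the wildcard covers neither the bare domain nor names two labels deep, which is a frequent surprise when people plan their SANs.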

Which algorithm? ECDSA P-256. Every time, unless a legacy client forces your hand. 128-bit security, faster handshakes than RSA-2048, smaller signatures. RSA-4096 is slower than P-256 for no practical security gain; the people asking for it usually haven't done the math.
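
The math, roughly, in a Python sketch. The strength figures are the comparable-strength estimates from NIST SP 800-57 as I understand them, and the signature sizes follow directly from the key sizes (raw values; DER encoding adds a few bytes to ECDSA). RSA-4096 is not in that NIST table, so I have left it out of the lookup rather than guess a number; its signatures are 512 bytes. The table and helper names are my own.

```python
# Comparable security strength (bits) and raw signature size (bytes).
# Strength values are assumed from NIST SP 800-57 Part 1; verify before quoting.
ALGORITHMS = {
    "RSA-2048": {"strength_bits": 112, "signature_bytes": 256},
    "RSA-3072": {"strength_bits": 128, "signature_bytes": 384},
    "ECDSA P-256": {"strength_bits": 128, "signature_bytes": 64},
}


def smallest_signature_at(min_strength_bits: int) -> str:
    """Name of the algorithm with the smallest signature meeting the strength."""
    candidates = {
        name: props
        for name, props in ALGORITHMS.items()
        if props["strength_bits"] >= min_strength_bits
    }
    return min(candidates, key=lambda name: candidates[name]["signature_bytes"])


choice = smallest_signature_at(128)
```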

Which CA? Public CA for anything externally reachable. Internal CA for internal services. Mixing those is how you end up with production endpoints that some employees trust and others don't.

What validity period? The industry is marching toward short lifetimes: Let's Encrypt ships 90 days as its default, and the CA/Browser Forum's SC-081 ballot steps the maximum for public certs down to 47 days by 2029. Short validity forces automation and shrinks the blast radius of a key compromise. Resist the temptation to argue for a one-year cert because you "don't have automation yet." The automation is the win.
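
For planning purposes, the SC-081 step-down can be written as a lookup. This is a sketch in Python that assumes the schedule as I understand the ballot (200 days from March 2026, 100 from March 2027, 47 from March 2029, with 398 days before that). Check the current Baseline Requirements before you build policy on these numbers.

```python
from datetime import date

# (effective date, maximum validity in days). Assumed from ballot SC-081v3.
SCHEDULE = [
    (date(2029, 3, 15), 47),
    (date(2027, 3, 15), 100),
    (date(2026, 3, 15), 200),
]
BASELINE_MAX_DAYS = 398  # assumed limit in force before the first step


def max_validity_days(issued_on: date) -> int:
    """Maximum public TLS cert lifetime for a given issuance date."""
    for effective, days in SCHEDULE:  # newest step first
        if issued_on >= effective:
            return days
    return BASELINE_MAX_DAYS
```

A 90-day cert already fits under every step except the last one, which is a decent argument for getting the automation in place now.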

2. Key generation — never leaves, never unencrypted

The private key must be generated in the environment that will use it. It must never leave that environment in the clear. That's it. That's the rule.

# ECDSA P-256, the sane default
openssl ecparam -name prime256v1 -genkey -noout -out server.key

# Or with a passphrase if you've got automation-friendly secret management
openssl ecparam -name prime256v1 -genkey | openssl ec -aes256 -out server.key
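
If a key does sit on disk, the least you can do is confirm that nobody but its owner can read it. A small Python sketch, assuming a POSIX filesystem; the helper names are mine.

```python
import os
import stat


def key_file_is_private(path: str) -> bool:
    """True if group and other have no permissions on the key file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & 0o077 == 0


def write_private_file(path: str, data: bytes) -> None:
    """Create a new file with mode 0600; refuse to overwrite an existing one."""
    # O_EXCL makes the call fail if the path already exists.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as fh:
        fh.write(data)
```

Something like `key_file_is_private("server.key")` slots neatly into a deploy check or a periodic audit job.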

For high-value keys — code signing, a CA private key, payment infrastructure — generate in an HSM. The key never exists as bytes in software memory. If it does, it isn't high-value key protection; it's a roleplay.

3. The CSR — proof of possession

The CSR carries the public key plus the Subject/SAN fields, signed by the matching private key. That signature is the "yes, I actually own this key" proof.

# Single-domain CSR
openssl req -new -key server.key -out server.csr \
  -subj "/CN=api.example.com/O=Example Corp/C=US"

# CSR with SANs (modern browsers ignore CN alone); -addext needs OpenSSL 1.1.1 or newer
openssl req -new -key server.key -out server.csr \
  -subj "/CN=api.example.com" \
  -addext "subjectAltName=DNS:api.example.com,DNS:api-v2.example.com"

Eyeball it before you submit:

openssl req -text -noout -in server.csr

More than one production cert has been issued for the wrong name because no one ran that command.
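
If you don't trust yourself to eyeball it, let a script do it. The Python sketch below pulls the DNS names out of the text that `openssl req -text` prints and compares them with what you meant to request. It assumes the usual `DNS:name, DNS:name` layout of the SAN line; the helper names and sample text are illustrative.

```python
import re


def sans_from_openssl_text(text: str) -> set[str]:
    """Extract DNS names from the SAN line of `openssl req -text` output."""
    return set(re.findall(r"DNS:([^,\s]+)", text))


def compare_names(text: str, expected: set[str]) -> tuple[set[str], set[str]]:
    """Return (missing, unexpected) names relative to what was intended."""
    found = sans_from_openssl_text(text)
    return expected - found, found - expected


# Sample fragment in the shape openssl prints it.
sample = """
        Requested Extensions:
            X509v3 Subject Alternative Name:
                DNS:api.example.com, DNS:api-v2.example.com
"""
missing, unexpected = compare_names(sample, {"api.example.com", "www.example.com"})
```

Fail the pipeline if either set is non-empty and the wrong-name cert never gets requested.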

4. Issuance — automate, or pay for it later

For public CAs, ACME is the only reasonable answer. Let's Encrypt, ZeroSSL, Google Trust Services — all speak it. Pick a client: certbot, acme.sh, or cert-manager if you're on Kubernetes.

# DNS-01 challenge works for wildcards and for internal hostnames
certbot certonly --dns-cloudflare \
  --dns-cloudflare-credentials ~/.secrets/cloudflare.ini \
  -d api.example.com -d "*.api.example.com"
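
Under the hood, the DNS-01 challenge is just a TXT record. As I read RFC 8555, the client publishes the base64url-encoded SHA-256 digest of `token.thumbprint` at `_acme-challenge.<domain>`, and a wildcard is validated at its base name. A minimal Python sketch follows; the token and thumbprint values in the usage line are placeholders, and the function names are mine.

```python
import base64
import hashlib


def b64url(data: bytes) -> str:
    """Base64url without padding, as ACME uses it."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def dns01_record(domain: str, token: str, account_thumbprint: str) -> tuple[str, str]:
    """Return (record name, TXT value) for a DNS-01 challenge."""
    # A wildcard name is validated at its base domain.
    base = domain[2:] if domain.startswith("*.") else domain
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return f"_acme-challenge.{base}", b64url(digest)


# Placeholder token and thumbprint, for illustration only.
name, value = dns01_record("*.api.example.com", "tok", "thumb")
```

This is also why DNS-01 works for hosts that are not reachable from the internet: the CA only ever looks at DNS.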

For internal services, run your own PKI platform (HashiCorp Vault PKI, EJBCA, Smallstep CA) and issue very short-lived certs. Hourly. Daily, at most. Internal workloads should rotate constantly; once the automation exists, the operational cost is next to nothing.

# Smallstep: a 24-hour cert
step ca certificate api.internal api.crt api.key \
  --ca-url https://ca.internal \
  --root /etc/step/certs/root_ca.crt \
  --not-after 24h

5. Deployment — the stage where security usually dies

Cert distribution is where I see the most accidental own-goals. The keys have to reach their consumers, and this is exactly where the good hygiene of the previous four stages gets thrown in the bin.

What not to do:

  • Email private keys as zip attachments. Unencrypted, logged, no delivery guarantee, and now your email provider has your key.
  • Check keys into Git. Even a private repo. Especially a private repo — you'll forget it's there.
  • Share keys between environments. A compromised staging key is a compromised prod key. They are not separate secrets if you used the same bytes.
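
The Git mistake in particular is cheap to catch before it happens. Here is a minimal Python sketch of the kind of check a pre-commit hook could run. It only looks for PEM private-key headers, so DER files and PKCS#12 bundles slip through; the pattern and function names are my own.

```python
import re

# Matches PKCS#8, traditional RSA/EC/DSA, OpenSSH and encrypted PEM key headers.
PRIVATE_KEY_HEADER = re.compile(
    r"-----BEGIN (?:(?:RSA|EC|DSA|OPENSSH|ENCRYPTED) )?PRIVATE KEY-----"
)


def contains_private_key(text: str) -> bool:
    """True if the text contains a PEM private-key header."""
    return PRIVATE_KEY_HEADER.search(text) is not None


def files_with_keys(paths: list[str]) -> list[str]:
    """Return the paths whose contents include a PEM private-key header."""
    hits = []
    for path in paths:
        try:
            with open(path, "r", errors="ignore") as fh:
                if contains_private_key(fh.read()):
                    hits.append(path)
        except OSError:
            continue  # unreadable file: skip it rather than crash the hook
    return hits
```

Feed `files_with_keys` the list of staged files and refuse the commit if it returns anything.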

What actually works:

  • A secrets manager — Vault, AWS Secrets Manager, Azure Key Vault — as the authoritative store.
  • Certificate delivery via ACME or the CA's API. The consumer fetches its own cert instead of being handed one. Pull, not push.
  • On Kubernetes: cert-manager issues certs into Secret objects and pods mount them as volumes. Once wired up, it disappears from your to-do list.

6. Renewal and revocation — the part everyone underestimates

Renewal has one rule: automatic, and early. Renew at two-thirds of validity. For a 90-day cert, you're renewing at day 60. If the renewal fails, you still have 30 days to notice.
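
The two-thirds rule is plain date arithmetic. A short Python sketch, with function names of my own choosing:

```python
from datetime import datetime, timedelta, timezone


def renewal_time(not_before: datetime, not_after: datetime) -> datetime:
    """Point at which two-thirds of the certificate lifetime has elapsed."""
    lifetime = not_after - not_before
    return not_before + lifetime * 2 / 3


def slack_after_renewal(not_before: datetime, not_after: datetime) -> timedelta:
    """Time left to notice and fix a failed renewal."""
    return not_after - renewal_time(not_before, not_after)


issued = datetime(2025, 1, 1, tzinfo=timezone.utc)
expires = issued + timedelta(days=90)
renew_at = renewal_time(issued, expires)   # 60 days after issuance
slack = slack_after_renewal(issued, expires)  # 30 days
```

The same function works unchanged for a 24-hour internal cert, where it gives you renewal at hour 16 and eight hours of slack.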

# certbot cron: runs twice daily, only acts if < 30 days remain,
# and reloads nginx only after a successful renewal
0 0,12 * * * certbot renew --quiet --deploy-hook "systemctl reload nginx"

Revocation is where PKI quietly admits that the real world defeated the elegant design.

Two mechanisms exist:

  • CRL: a signed list of revoked serial numbers published by the CA. Downloaded by clients, sometimes megabytes in size, often hours stale.
  • OCSP: a real-time query to the CA for a specific cert's status.

Both are, honestly, broken. CRLs are too big and too late. OCSP is a privacy leak — the CA learns which sites you visit — and browsers soft-fail it: if the OCSP responder isn't reachable, most browsers quietly accept the cert anyway. Which is the exact behaviour an attacker with a revoked cert would want.
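
The soft-fail problem fits in a dozen lines. This Python sketch models the client's decision, not any particular browser's code; the enum and function names are hypothetical.

```python
from enum import Enum


class OcspResult(Enum):
    GOOD = "good"
    REVOKED = "revoked"
    UNREACHABLE = "unreachable"


def accept_certificate(result: OcspResult, hard_fail: bool) -> bool:
    """Decide whether a client proceeds with the connection."""
    if result is OcspResult.GOOD:
        return True
    if result is OcspResult.REVOKED:
        return False
    # No answer from the responder: soft-fail accepts, hard-fail rejects.
    return not hard_fail
```

An attacker who can sit on the network path only has to drop the OCSP request to turn "revoked" into "unreachable", and under soft-fail that is as good as "good".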

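OCSP stapling is the pragmatic workaround. Your server fetches its own OCSP response, caches it, and staples it to the TLS handshake. The client gets revocation status with zero extra round-trips and no privacy leak. It only works while your CA still runs an OCSP responder (Let's Encrypt retired its own in 2025 in favour of CRLs), so check before relying on it. Where it applies, configure it once and stop thinking about it: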

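# Staple OCSP responses and verify them before use
ssl_stapling on;
ssl_stapling_verify on;
# Verification needs the issuer chain; this path is a placeholder
ssl_trusted_certificate /etc/nginx/ssl/chain.pem;
# Resolver used to look up the OCSP responder's hostname
resolver 1.1.1.1 8.8.8.8 valid=300s;
resolver_timeout 5s;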

You cannot manage what you can't see

You need a certificate inventory that answers four questions:

  • What certs exist?
  • Where are they deployed?
  • When do they expire?
  • Who owns them?

Three ways to build one, and you probably need all three:

  1. Active scanning. nmap's ssl-cert script or sslscan across your IP ranges. Catches shadow IT and the server someone spun up in 2019 and forgot.
  2. CT log monitoring. Certificate Transparency logs record publicly-trusted certs as they are issued. Tools like certspotter alert when a new cert for your domain appears, and crt.sh lets you search the logs by hand. That includes certs mis-issued to an attacker. Subscribe to this. Today.
  3. Secrets-manager inventory. If every cert lives in Vault or AWS Secrets Manager (see stage 5), the inventory is a single API call away.
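
For the active-scanning leg, Python's standard `ssl` module is enough for a first pass. This is a sketch, not a replacement for nmap: it uses the default validating context, so a cert that is already expired or untrusted raises an error instead of returning a date. The function names are mine.

```python
import socket
import ssl
import time
from typing import Optional


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the notAfter string of the certificate a server presents.

    Raises ssl.SSLError if the certificate fails validation.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_left(not_after: str, now: Optional[float] = None) -> int:
    """Whole days until expiry; negative once the cert has expired."""
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return int((expires - current) // 86400)
```

Loop `fetch_not_after` over your hostnames, feed the result to `days_left`, and you have the expiry column of the inventory.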

Dashboard it by days-until-expiry. Page at 30 days. Escalate at 14. Block deploys at 7. If none of those fire, you found out about expiry from a customer. Don't find out about expiry from a customer.
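
Those thresholds, together with the four inventory questions, fit in a small Python sketch. The record layout and the action names are illustrative; wire the return value into whatever paging and deploy tooling you already run.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CertRecord:
    name: str         # what cert exists
    deployed_on: str  # where it is deployed
    expires: date     # when it expires
    owner: str        # who owns it


def expiry_action(record: CertRecord, today: date) -> str:
    """Map days-until-expiry onto the alerting ladder."""
    days = (record.expires - today).days
    if days <= 7:
        return "block-deploys"
    if days <= 14:
        return "escalate"
    if days <= 30:
        return "page"
    return "ok"


# Hypothetical inventory entry.
record = CertRecord("api.example.com", "edge-lb-1", date(2025, 1, 31), "platform-team")
action = expiry_action(record, today=date(2025, 1, 1))
```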


Where you are, and where you want to be

Level  Looks like
0      Manual issuance, no inventory, expiry discovered via outage
1      Spreadsheet inventory, manual calendar reminders
2      Automated renewal via ACME, basic CT monitoring
3      Internal PKI with short-lived certs, secrets-manager integration
4      Full SPIFFE/SPIRE workload identity, zero manual cert operations

Most organisations I walk into sit at Level 1, sometimes creatively pretending to be at Level 2. Level 3 is a one-quarter project if you commit to it. Level 4 is the goal if your security posture genuinely demands it — it's real work, and it's worth it.

Certificates are infrastructure, not artefacts. Treat them like code you ship: versioned, automated, monitored, disposable. The graveyard stops growing the day you do.
