Interest in research collaboration

Hi,

I’m David Chu, a fourth-year PhD student at UC Berkeley. I’ve been looking into protections against rollback attacks in TEEs, and I’m building a generic rollback-resistant solution called Rollbaccine in collaboration with the Microsoft Confidential Computing team and Intel.

Here’s a quick summary of the research problem and our proposed solution:

  • Secure hardware like SGX or SEV-SNP does not inherently protect the confidentiality or integrity of data on disk. Confidentiality is easy to guarantee with encryption, but integrity can be lost if the application crashes and the disk is rolled back to an earlier state. This is a rollback attack.
  • Traditionally, rollback attacks are detected by replicating a hash or counter of some core part of the application state on 2f+1 TEEs. Crucially, this does not allow the application to recover from a rollback, only to detect one. It’s also a manual approach: the application developer has to specify which state should be protected, because creating this hash/counter on each state update, and checking against it on each state read, increases latency.
  • With the rise of Confidential VMs, application developers want to lift-and-shift their programs into TEEs (and ideally not worry about rollback attacks). They don’t want to have to modify their applications.
  • We want to create a rollback-resistant solution that replicates the disk automatically. By replicating the disk, we can recover from rollbacks (instead of just detecting them), and the application does not need to be modified (we can intercept reads and writes). We prototyped with FUSE, but saw pretty high latency and throughput overheads. In addition, replication became too complex with multithreaded writes.
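
To make the attack in the first bullet concrete, here is a toy Python sketch. It is not Rollbaccine code, and every name in it (`UntrustedDisk`, `tee_write`, `tee_read`, the key) is hypothetical. It assumes each block is stored with a per-block MAC, and it shows that a restored old snapshot still passes the MAC check, while a hash kept in trusted memory notices that the block is stale.

```python
import hashlib
import hmac

# Hypothetical key; stands in for a key that only the TEE knows.
KEY = b"toy-key-held-inside-the-tee"


def tag_for(block_no: int, data: bytes) -> bytes:
    """Per-block MAC, bound to the block number."""
    msg = block_no.to_bytes(8, "big") + data
    return hmac.new(KEY, msg, hashlib.sha256).digest()


class UntrustedDisk:
    """Host-controlled storage: the attacker can snapshot and restore it."""

    def __init__(self):
        self.blocks = {}  # block_no -> (data, tag)


def tee_write(disk, trusted_hashes, block_no, data):
    # The MAC goes to the untrusted disk next to the data.
    disk.blocks[block_no] = (data, tag_for(block_no, data))
    # The latest hash stays in trusted memory only.
    trusted_hashes[block_no] = hashlib.sha256(data).digest()


def tee_read(disk, trusted_hashes, block_no):
    data, tag = disk.blocks[block_no]
    mac_ok = hmac.compare_digest(tag, tag_for(block_no, data))
    fresh = hmac.compare_digest(
        hashlib.sha256(data).digest(), trusted_hashes[block_no]
    )
    return data, mac_ok, fresh


def demo():
    disk = UntrustedDisk()
    trusted_hashes = {}
    tee_write(disk, trusted_hashes, 0, b"balance=100")
    snapshot = dict(disk.blocks)      # attacker copies the old disk state
    tee_write(disk, trusted_hashes, 0, b"balance=0")
    disk.blocks = snapshot            # attacker rolls the disk back
    return tee_read(disk, trusted_hashes, 0)
```

In this sketch `demo()` returns the old data with a valid MAC but a failed freshness check. The in-memory hash only detects the rollback; recovering the latest data is what replicating the disk is meant to provide.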

We’re currently working on a small device mapper that performs disk replication, encryption, integrity checking, and leader election. We have no experience with kernel modules, so we’re basing our implementation on what we can find in other repositories, and we’re making a lot of blind decisions.

I was recently recommended by some folks from CNCF to contact projects in the Linux Foundation to ask for assistance. I know DRBD has many components that we need, and we think integrating Rollbaccine into DRBD (or forking DRBD) in single-primary mode would be a good idea.

Here are what I believe to be the main things we need to add to DRBD:

  1. A midpoint between the Asynchronous replication protocol (Protocol A) and the Memory synchronous replication protocol (Protocol B). Our pitch is that an application could crash mid-write and must already handle recovering to a state where that write did not complete, so the application semantics allow us to lose a write that has not been fsynced. Only on fsync do we wait for all nodes to have the write in their memories.
  2. An integrity map, which pages to disk. Confidentiality and per-block integrity should be guaranteed if we layer dm-crypt with AEAD. However, dm-integrity puts the hashes of blocks on disk, where they are still vulnerable to rollback attacks, so instead of using it we have to keep the hashes in memory.
  3. Reconfiguration. If a primary node fails, the user may want to start up another TEE to join the quorum. I’m not sure if this is already in DRBD.
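
The write and fsync semantics from item 1 can be sketched as a small single-threaded Python simulation. This is only an illustration under my own assumptions, not DRBD's API or our implementation: `Primary`, `Backup`, and the `inbox` queue that stands in for the network are all hypothetical names.

```python
from collections import deque


class Backup:
    def __init__(self):
        self.inbox = deque()  # messages in flight (stands in for the network)
        self.memory = {}      # writes the backup holds in memory
        self.acked_seq = 0    # highest sequence number acknowledged

    def deliver_all(self):
        """Apply every in-flight write to memory and acknowledge it."""
        while self.inbox:
            seq, block_no, data = self.inbox.popleft()
            self.memory[block_no] = data
            self.acked_seq = seq


class Primary:
    def __init__(self, backups):
        self.backups = backups
        self.local = {}
        self.seq = 0

    def write(self, block_no, data):
        """Apply locally, send to backups, return without waiting for acks."""
        self.seq += 1
        self.local[block_no] = data
        for backup in self.backups:
            backup.inbox.append((self.seq, block_no, data))
        return self.seq

    def fsync(self):
        """Return only once every backup holds all prior writes in memory."""
        for backup in self.backups:
            backup.deliver_all()  # stands in for blocking on acknowledgements
        return all(b.acked_seq == self.seq for b in self.backups)


def demo():
    primary = Primary([Backup(), Backup()])
    primary.write(0, b"a")
    primary.write(1, b"b")
    # Before fsync, backups may hold nothing: these writes could be lost.
    before = [dict(b.memory) for b in primary.backups]
    synced = primary.fsync()
    after = [dict(b.memory) for b in primary.backups]
    return before, synced, after
```

Here a crash of the primary before `fsync()` can lose both writes, which is the same outcome the application already has to tolerate for a crash mid-write. After `fsync()` returns, every backup has them in memory.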

I’m curious what the maintainers of DRBD think about this project and about potentially collaborating. We’d ask for no more than code reviews on pull requests (or not, if this is better as a fork) and answers to occasional kernel questions; we can implement the revisions on our own.

I think that, given how cloud providers are embracing TEEs, this could lead to DRBD getting more use. I understand that it’s also a lot of extra burden and that you have limited resources and existing projects, so it’s perfectly understandable if you’re not interested in collaborating. In that case, let me know if you know someone who might be able to give more guidance on implementing device mappers.

Thanks for reading!
