Authenticated Data Structures, Generically

This is a translation of my slides for a presentation on the paper, given at Cloudflare in June 2015.

Authenticated Data Structures, Generically Andrew Miller, Michael Hicks, Jonathan Katz, and Elaine Shi 2014

What is an ADS?

  • ADS: authenticated data structure. They are
    • A data structure
      • It's useful to what we're trying to do
      • The primary purpose is to do something with it
    • ... that is authenticated
      • A prover carries out some operation on the data
      • A verifier can check the prover's work

Why use an ADS?

  • Example: a PGP key server
    • Source wants to scale for reliability (or DDoS protection)
    • Client queries mirrors for key
  • Mirrors aren't necessarily trusted
    • What if they inject their own key for someone?
  • Clients:
    • Want to ensure the integrity of keys being sent
    • Want to ensure that the keys came from the source

What have people done with them?

  • Built a wide variety of data structures
    • sets
    • dictionaries
    • range queries
    • B-trees
  • Most use cryptographic hash functions
    • These are used for their collision resistance property
    • Collision resistance: it is astronomically unlikely that two inputs to the hash function hash to the same value
  • There are some optimisations that can be done
    • This paper doesn't cover most of them

λ●

  • Pronounced "lambda auth"

    "a ML-like functional programming language… with which one can program authenticated operations over any data structure defined by standard type constructors"

  • Implemented in the paper as an extension to the OCaml compiler.

  • Data structures written in λ● are compiled to code to be run by the prover and verifier.

Why implement ADSs via λ●?

  • Rely on the static typing properties of OCaml
    • A well-typed program generates correct and secure code for the prover and verifier
    • Security is the same as for cryptographic hashes
  • Authors propose two main benefits:
    1. It is flexible; the authors implemented a number of data structures this way.
    2. It is easy to use (if you're into the whole statically-typed functional programming thing)

Example ADS: Merkle Trees

  • Merkle trees are authenticated binary trees
  • Data is stored in leaves, not in nodes
Merkle tree figure from the paper.
  • The typical query looks up value at xi
  • Example: i = 1
    • Prover P returns x1 and the digests that are needed to prove the root digest
    • Verifier computes the hashes
      • h5 (from the leaf data)
      • the hash h2 (by hashing h4 and h5)
      • the hash h1 (by hashing h2 and h3)
  • The tree is balanced, so the size of the proof is log2n, where n is the number of elements.
    • Plugging in numbers: a SHA-256 hash is 32 bytes, so log 2 (4) → 2 * 32b = 64b
    • My PGP public key is 2215 bytes
  • λ● definition:
type tree = Tip of string | Bin of •tree × •tree
type bit = L | R
let rec fetch (idx:bit list) (t:•tree) : string =
  match idx, unauth t with
    | [], Tip a → a
    | L :: idx, Bin(l, ) → fetch idx l
    | R :: idx, Bin( ,r) → fetch idx r

How does λ● work?

λ● extends OCaml with authenticated types (●τ), and functions auth and unauth.

  • auth: ∀α.α → ●α
  • unauth: ∀α.●α → α

On the prover, ●τ is stored as a (τ, hash) pair; auth and unauth are used to generate proofs.

On the verifier, ●τ is stored as a hash; auth and unauth are used to check a proof 𝝿.

Final thoughts

  • ADS allow us to query data from an untrusted mirror so long as it originates from a trusted source

Haskell examples

Caveats:

  • I haven’t written Haskell in a while
  • AuthTypes is only an implementation of some core ideas; without compiler support, it’s difficult to do right
  • AuthTypes.hs: sketching out some of the ideas in the paper
  • Merkle.hs: a low-budget Merkle tree implementaiton

AuthTypes.hs

{-# LANGUAGE TypeSynonymInstances #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE FlexibleInstances #-}

{-
This is an implementation in Haskell of basic generic authenticated
types, used as a demonstration of some of the ideas in the paper
"Authenticated Data Structures, Generically". It has many shortcomings,
namely that lacking compiler support for doing this makes it hard to
actually build an authenticated type.
-}

module AuthTypes (
    Proof
  , Authenticated
  , AuthT
  , Party
  , auth, unauth
) where

import qualified Crypto.Hash.SHA256 as SHA256
import Data.ByteString.Char8 (ByteString, append, concat, pack)

-- A proof contains either a value or a bytestring.
data Proof a = ProofValue a | ProofHash ByteString
               deriving (Show, Read)
-- I'm not entirely sure that's what the authors intended,
-- but I don't quite understand the use of the OCaml channels
-- that are used for proof streams.

-- digestGeneric uses the Show form of a type to hash it.
digestGeneric :: (Show a) => a -> ByteString
digestGeneric x = SHA256.hash . pack $ show x

-- An Authenticated type can be digested and can produce a shallow
-- projection from its contents.
class (Show a) => Authenticated a where
  digest  :: a -> ByteString
  shallow :: a -> Proof a

-- Realisations of the Authenticated type for strings, integers,
-- and lists of strings.
instance Authenticated String where
  digest  x = SHA256.hash $ pack x
  shallow x = ProofValue x

instance Authenticated Int where
  digest  = digestGeneric
  shallow = ProofValue

instance Authenticated [String] where
  digest  = digestGeneric
  shallow = ProofValue

-- A Prover stores an authenticated type as <v, h>, while
-- a verifier stores only the digest.
data AuthT a = Prover a ByteString | Verifier ByteString
               deriving (Show, Read)
-- The cryptohash package provides digests as ByteStrings.

-- The party is either a P(rover) or V(erifier).
data Party = P | V

-- auth creates a new authenticated type. Note this type signature in Go
-- would be 'func auth (x interface{}) AuthT', where AuthT would be an
-- interface.
--
-- We have to provide different functions for the prover (who stores
-- the pair (V, H)) and the verifier (who stores H).
auth :: (Authenticated a) => Party -> a -> AuthT a
auth P x = Prover x (digest x)
auth V x = Verifier (digest x)

-- unauth is the interesting part. This is the part that's not
-- right, as I still need to work out the implementation of a
-- proof stream.
unauth :: (Authenticated a) => [Proof a] -> AuthT a
                            -> ([Proof a], Maybe a)
unauth p (Prover v _) = (shallow v : p, Just v)

-- If there is no proof stream, nothing can be verified.
unauth [] (Verifier h) = ([], Nothing)
unauth [(ProofValue v)] (Verifier h) = case (digest v) == h of
  True  -> ([], Just v)
  False -> ([ProofValue v], Nothing)
unauth ((ProofValue v) : p) (Verifier h) =
  case unauth [(ProofValue v)] (Verifier h) of
    ([], Just x) -> (p, Just x)
    _            -> (p, Nothing)


-- Let's try to build a hash list.

data HashBlock a = Block ByteString a deriving (Show, Read)

getHash :: HashBlock a -> ByteString
getHash (Block h _) = h

getValue :: HashBlock a -> a
getValue (Block _ v) = v

hashBlocks :: [HashBlock a] -> ByteString
hashBlocks blocks = SHA256.hash $ Data.ByteString.Char8.concat hs
  where hs = map getHash blocks

data HashList a = HashList {rootHash :: ByteString
                           ,blockList :: [HashBlock a]}
                  deriving (Show, Read)

newHashList :: (Show a) => a -> HashList a
newHashList v = HashList root block
  where block = [(Block (digestGeneric v) v)]
        root  = hashBlocks block

pushBlock :: (Show a) => HashList a -> a -> HashList a
pushBlock hlst@(HashList _ blocks) v =
  HashList (hashBlocks blocks') blocks'
  where blocks' = blocks ++ [(Block (digestGeneric v) v)]

verifyBlock :: (Show a) => HashBlock a -> Bool
verifyBlock (Block h v) = h == digestGeneric v

verifyHashList :: (Show a) => HashList a -> Bool
verifyHashList (HashList root blocks) =
  root == hashBlocks blocks && (all id $ map verifyBlock blocks)

getBlock :: HashList a -> Int -> a
getBlock (HashList _ blocks) idx = getValue $ blocks !! idx

HashList.hs

{-# LANGUAGE TypeSynonymInstances #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE FlexibleInstances #-}

{-
This is an implementation of hash lists [1] as an example of an
authenticated data structure. It does not use lambda auth.
[1] https://en.wikipedia.org/wiki/Hash_list
-}

module HashList (
  HashBlock, getHash, getValue
 ,HashList, newHashList, pushBlock, verifyHashList
 ,getBlock, getValues
) where

import qualified Crypto.Hash.SHA256 as SHA256
import Data.ByteString.Char8 (ByteString, append, concat, pack)

-- digestGeneric uses the Show form of a type to hash it.
digestGeneric :: (Show a) => a -> ByteString
digestGeneric x = SHA256.hash . pack $ show x

-- A HashBlock stores data alongside a hash.
data HashBlock a = Block ByteString a deriving (Show, Read)

-- getHash returns the hash from a block.
getHash :: HashBlock a -> ByteString
getHash (Block h _) = h

-- getValue returns the value from a block.
getValue :: HashBlock a -> a
getValue (Block _ v) = v

-- newBlock creates a block from a value.
newBlock :: (Show a) => a -> HashBlock a
newBlock v = Block (digestGeneric v) v

-- hashBlocks computes the hash of all the hashes in a list of blocks.
hashBlocks :: [HashBlock a] -> ByteString
hashBlocks blocks = SHA256.hash $ Data.ByteString.Char8.concat hs
  where hs = map getHash blocks

-- A HashList stores a root hash, which is the hash of all the hashes
-- in the hash list's blocks, and a list of hash blocks.
data HashList a = HashList {rootHash :: ByteString
                           ,blockList :: [HashBlock a]}
                  deriving (Show, Read)

-- initHashList creates a new hash list from a value.
initHashList :: (Show a) => a -> HashList a
initHashList v = HashList root block
  where block = [(Block (digestGeneric v) v)]
        root  = hashBlocks block

-- newHashList creates a new hash list from a list of values. In
-- effect, it converts an unauthenticated list to an authenticated
-- list.
newHashList :: (Show a) => [a] -> HashList a
newHashList values = HashList (hashBlocks blocks) blocks
  where blocks = map newBlock values

-- pushBlock adds a new block to the hash list.
pushBlock :: (Show a) => HashList a -> a -> HashList a
pushBlock hlst@(HashList _ blocks) v =
  HashList (hashBlocks blocks') blocks'
  where blocks' = blocks ++ [(Block (digestGeneric v) v)]

-- verifyBlock verifies the hash on a single block.
verifyBlock :: (Show a) => HashBlock a -> Bool
verifyBlock (Block h v) = h == digestGeneric v

-- verifyHashList verifies the blocks in the hash list, and verifies
-- that the root hash is valid.
verifyHashList :: (Show a) => HashList a -> Bool
verifyHashList (HashList root blocks) =
  root == hashBlocks blocks && (all id $ map verifyBlock blocks)

-- getBlock returns the value of the block at a given index.
getBlock :: HashList a -> Int -> a
getBlock (HashList _ blocks) idx = getValue $ blocks !! idx

-- getValues returns a list of the values in the hash list. In effect,
-- it converts the authenticated list to an unauthenticated list.
getValues :: HashList a -> [a]
getValues (HashList _ blocks) = map getValue blocks

-- getRoot returns the root hash of the hash list.
getRoot :: (Show a) => HashList a -> ByteString
getRoot (HashList root _) = root

instance (Show a) => Eq (HashList a) where
  a == b = ((verifyHashList a) && (verifyHashList b)) &&
getRoot a == getRoot b

Merkle.hs

{-
Low budget Merkle Tree implementation for a talk.
-}
module Merkle where

import qualified Crypto.Hash.SHA256 as SHA256
import Data.ByteString.Char8 (ByteString, append, pack)

data Tree = Leaf String | Node ByteString Tree Tree
            deriving (Show, Read)


-- Given h1, h2, produce the hash of both of them.
hashHashes :: ByteString -> ByteString -> ByteString
hashHashes h1 h2 = SHA256.hash $ append h1 h2

-- Return all the leaves in the tree.
getLeaves :: Tree -> [String]
getLeaves (Leaf s) = [s]
getLeaves (Node _ l r) = getLeaves l ++ getLeaves r

-- Produce a digest of the tree.
digest :: Tree ->  ByteString
digest (Leaf s)     = SHA256.hash $ pack s
digest (Node _ l r) =
  hashHashes (digest l) (digest r)

-- Convenience function for building a node from two strings.
joinLeaves :: String -> String -> Tree
joinLeaves a b =
  Node (digest tempNode) (Leaf a) (Leaf b)
  where tempNode = Node (pack "") (Leaf a) (Leaf b)

-- Produce a new node that contains the digest of both its trees.
joinNodes :: Tree -> Tree -> Tree
joinNodes l@(Node h1 l1 r1) r@(Node h2 l2 r2) =
  Node (digest tempNode) l r
  where tempNode = Node (pack "") l r
joinNodes (Leaf a) (Leaf b) = joinLeaves a b

-- Inefficient, ends up recomputing many of the digests.
-- Ensure that all the digests match up.
verifyTree :: Tree -> Bool
verifyTree (Leaf _) = True
verifyTree node@(Node h l r) =
  verifyTree l && verifyTree r && h == digest node