The Resurrection of LLL: Part III

In part 2 of this series I showed how to install the LLL compiler, lllc. I also discussed the dispatcher's supporting source files, constants.lll and utilities.lll. In this article I'll finish exploring utilities.lll and start examining the dispatcher itself.

Modifiers

Continuing through utilities.lll, we find a few more modifiers, all with the same purpose: deciding whether execution should continue by testing a particular state. For example, contract-enabled checks the "enabled" flag for the current contract and refuses entry if it's false. More than one modifier can be applied to a function:

(when (= function-id replace)
  (seq only-owner no-contract-address ...

This applies two modifiers to the replace(address) function. The first restricts execution to the contract's owner, and the second checks that a contract address was provided. Modifiers are a convenient way to make execution decisions without cluttering your source with repeated code sequences. Keep in mind, though, that these macros are expanded inline when your code is compiled; they are not called as functions at run time. Also remember that modifiers execute in the order they appear in your source: in the example above, only-owner runs first, then no-contract-address.
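To make the ordering rule concrete, here is a minimal Python sketch that models modifiers as checks run before a function body. It is not LLL, and it is not how the compiler works (the real modifiers are macros expanded inline). The names only_owner, no_contract_address, Abort and run_with_modifiers, and the state layout, are all hypothetical. The sketch also assumes that no-contract-address means "the call's address argument is zero".

```python
class Abort(Exception):
    """Stands in for the EVM halting when a modifier's check fails."""


def only_owner(state, caller, args):
    # Refuse entry unless the caller is the recorded owner.
    if caller != state["owner"]:
        raise Abort("only-owner")


def no_contract_address(state, caller, args):
    # Assumption: refuse entry when no (non-zero) address argument was supplied.
    if args.get("address", 0) == 0:
        raise Abort("no-contract-address")


def run_with_modifiers(modifiers, body, state, caller, args):
    # Checks run in source order; the first failure stops execution,
    # so later checks and the body never run.
    for check in modifiers:
        check(state, caller, args)
    return body(state, caller, args)


def body(state, caller, args):
    # Placeholder for the function's real work.
    return "body ran"


if __name__ == "__main__":
    state = {"owner": 0xA11CE}
    mods = [only_owner, no_contract_address]
    print(run_with_modifiers(mods, body, state, 0xA11CE, {"address": 0x1234}))
    try:
        # A non-owner who also omits the address fails only-owner first.
        run_with_modifiers(mods, body, state, 0xB0B, {})
    except Abort as err:
        print("aborted by", err)
```

Swapping the order of the list would make the second call fail on no-contract-address instead. That is the practical consequence of modifiers running in source order.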

Functions

Well, they're "functions" in a loose sense. As stated repeatedly, macros aren't called as functions and can't return values. But it's convenient to view them as functions in certain circumstances. Currently, utilities.lll contains five macros labeled as functions. You can easily add your own as it becomes necessary.

bytes4

The first function takes a 32-byte value and isolates its four leftmost bytes. It does this by shifting the value right by 28 bytes (224 bits), which is accomplished by dividing the number by 2^224. You may wonder why you would ever want to do this. It's part of how external applications call contract functions; see the Ethereum Contract ABI documentation for examples. I'll go into this further when I discuss the dispatcher itself. For now, just know that this step is necessary in order to work with web3.

So to summarize, given input that looks like this:

0xdebf9dd297d3980055770d6c047a67eed8f6b1e91397831703b5f2a73bec1320

bytes4 produces this:

0x00000000000000000000000000000000000000000000000000000000debf9dd2

I called it bytes4 because Solidity has an operator that does something similar called... wait for it... bytes4.
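The arithmetic behind bytes4 is easy to check outside the EVM. This Python sketch treats a 32-byte word as a 256-bit unsigned integer and divides by 2^224, as described above. The function name mirrors the macro, but the code is only an illustration, not the macro's actual source.

```python
WORD_MOD = 2**256  # EVM words are 256-bit unsigned integers


def bytes4(word: int) -> int:
    """Move the leftmost 4 bytes of a 32-byte word to its rightmost 4 bytes.

    Integer division by 2**224 discards the low 224 bits (28 bytes),
    which is the same as a right shift by 224.
    """
    return (word % WORD_MOD) // 2**224


if __name__ == "__main__":
    word = 0xdebf9dd297d3980055770d6c047a67eed8f6b1e91397831703b5f2a73bec1320
    # Print as a full 32-byte (64 hex digit) value, like the example above:
    # 0x, then 56 zeros, then debf9dd2.
    print(f"0x{bytes4(word):064x}")
```

In Python, word >> 224 gives the same result. The LLL version divides because the EVM of that era had no shift opcodes; SHL and SHR arrived later with EIP-145.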

pad-right

The second function pads the input so that it occupies the leftmost four bytes of the result. This is the inverse of bytes4: it shifts a given value left by 28 bytes, which is accomplished by multiplying the number by 2^224. Again, this has to do with how functions are called. This time we want to call another function, so we need to put the called function's ID into the leftmost four bytes of a 32-byte number. As the examples in the Contract ABI documentation show, the function ID is the first four bytes of the Keccak hash of the ASCII form of the function's signature. More on this later. For example, given this:

0x00000000000000000000000000000000000000000000000000000000cabfb934

(defined in constants.lll as simply 0xcabfb934 but actually stored as the number above), pad-right produces this:

0xcabfb93400000000000000000000000000000000000000000000000000000000
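pad-right can be sketched the same way. EVM multiplication wraps modulo 2^256, so the sketch reduces the product modulo 2^256 as well. As a result, anything above the low four bytes of the input is pushed off the top of the word. Again, this is an illustrative Python model, not the macro's source.

```python
WORD_MOD = 2**256  # EVM arithmetic wraps at 2**256


def pad_right(value: int) -> int:
    """Move a 4-byte value into the leftmost 4 bytes of a 32-byte word.

    Multiplying by 2**224 shifts left by 224 bits (28 bytes); the modulo
    mimics EVM overflow, discarding bits pushed past the top of the word.
    """
    return (value * 2**224) % WORD_MOD


if __name__ == "__main__":
    selector = 0xcabfb934  # replace, as defined in constants.lll
    # Prints 0xcabfb934 followed by 56 zeros.
    print(f"0x{pad_right(selector):064x}")
    # Dividing back by 2**224 (what bytes4 does) recovers the original ID.
    print(hex(pad_right(selector) // 2**224))
```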

function-id

The third function exists entirely to promote code clarity. Instead of seeing (bytes4 (calldataload 0)) strewn through your code, you see function-id, and the compiler inserts (bytes4 (calldataload 0)) wherever function-id is used. The code itself loads the 32 bytes of call data starting at position zero and keeps the first four, which hold the requested function's ID.
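To see what function-id reads, here is a Python model of calldataload and the macro's expansion. The call data layout (a 4-byte function ID followed by 32-byte arguments) follows the Contract ABI. The example argument reuses the contract address shown later in this article. The zero-padding past the end of the call data matches the EVM's calldataload behavior. The Python function names are illustrative.

```python
def calldataload(calldata: bytes, offset: int) -> int:
    """Read 32 bytes of call data starting at offset as a big-endian integer.

    Bytes past the end of the call data read as zero, as in the EVM.
    """
    chunk = calldata[offset:offset + 32].ljust(32, b"\x00")
    return int.from_bytes(chunk, "big")


def bytes4(word: int) -> int:
    # Keep the leftmost 4 bytes of the word (shift right by 28 bytes).
    return word // 2**224


def function_id(calldata: bytes) -> int:
    # What the function-id macro expands to: (bytes4 (calldataload 0))
    return bytes4(calldataload(calldata, 0))


if __name__ == "__main__":
    # A call to replace(address): 4-byte ID, then the address padded to 32 bytes.
    target = 0x754a5bec6ddd2b6a568203c3c2382137f8faa41c
    calldata = bytes.fromhex("cabfb934") + target.to_bytes(32, "big")
    print(hex(function_id(calldata)))  # 0xcabfb934
```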

return-size

The fourth function is again intended to clarify code, but it's a bit more involved than function-id. return-size retrieves the return data size for a given function ID. Each function in a contract can return data of varying length, including zero. So that the correct length of return data can be reported to the calling contract, each function's return data length is stored when the contract is initialized. The storage location is derived by adding the contract's address to the function's ID; the sum is the key under which the return length is stored. Here's an example from an initialization function:

(sstore (+ @@contract-address replace) 32)

This adds the contract's address, e.g. 0x754a5bec6ddd2b6a568203c3c2382137f8faa41c, to the function's ID. In this case the ID is replace, which resolves to 0xcabfb934. The sum forms a unique key at which to store the return data size for the replace function, in this case 32. To retrieve the return data length for a particular function, we do the same addition, but this time we load the data instead of storing it:

(sload (+ @@contract-address @short-hash))

This is very similar to the previous code, except that we've used @short-hash, because at this point the code doesn't know in advance which function is being called. @short-hash is assumed to have been set to the correct function ID.
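The storage scheme behind return-size can be modeled with a dictionary standing in for contract storage. The slot number for contract-address (0x01) and the value of replace (0xcabfb934) come from constants.lll, as described in this article. The helper names and the unregistered ID 0x12345678 are hypothetical. In LLL, @@x is shorthand for (sload x) and @x for (mload x), so @short-hash is a memory read; here it is simply passed in as an argument.

```python
WORD_MOD = 2**256

CONTRACT_ADDRESS = 0x01  # storage slot named contract-address in constants.lll
REPLACE = 0xcabfb934     # function ID named replace in constants.lll

storage = {}  # toy contract storage; unset slots read as 0


def sstore(key: int, value: int) -> None:
    storage[key % WORD_MOD] = value


def sload(key: int) -> int:
    return storage.get(key % WORD_MOD, 0)


def return_size(short_hash: int) -> int:
    # Mirrors (sload (+ @@contract-address @short-hash))
    return sload(sload(CONTRACT_ADDRESS) + short_hash)


if __name__ == "__main__":
    # Initialization, as in (sstore (+ @@contract-address replace) 32)
    sstore(CONTRACT_ADDRESS, 0x754a5bec6ddd2b6a568203c3c2382137f8faa41c)
    sstore(sload(CONTRACT_ADDRESS) + REPLACE, 32)

    print(return_size(REPLACE))     # 32
    print(return_size(0x12345678))  # 0: nothing stored for this ID
```

Because unset storage reads as zero, a function whose size was never recorded simply reports a return size of 0.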

keccak

The last function is supposed to behave as described below, but doesn't. If anyone reading this can explain why it fails, please let me know. This is the source of the keccak macro:

(def 'keccak (input)
  (sha3 0x00 (lit 0x00 input)))

And here's the error produced when using the keccak macro:

CodeFragment.h(53): Throw in function void dev::eth::CodeFragment::error() const [with T = dev::eth::InvalidLiteral]
Dynamic exception type: boost::exception_detail::clone_impl<dev::eth::InvalidLiteral>

Ok, on with the show! In a perfect world, keccak produces the Keccak-256 hash (what Ethereum's sha3 computes) of a given string. This is needed when emitting events from an LLL function. The Contract ABI section on events defines the format of an event's signature string, and the hash of that string identifies the event. Given an event signature of Initialized(bool), keccak produces:

0xdebf9dd297d3980055770d6c047a67eed8f6b1e91397831703b5f2a73bec1320

This is equivalent to web3.sha3("Initialized(bool)"). LLL's sha3 operator takes two arguments: the memory location of the data you want to hash and that data's length. So we have to store the input at a specific memory location, determine its length, and pass both to sha3. Fortunately, lit does just that: it takes a binary string, places it in memory at the specified location, and returns its length.
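The work that keccak hands to lit and sha3 can also be modeled in Python. This sketch only covers the memory side: lit writes the string at a position and returns its length, and sha3 then hashes exactly that slice of memory. It deliberately stops short of computing the hash. Python's hashlib.sha3_256 implements the standardized SHA3-256, whose padding differs from the original Keccak-256 used by Ethereum's sha3, so it would not reproduce the value above. The function names are illustrative.

```python
def lit(memory: bytearray, position: int, data: bytes) -> int:
    """Model of (lit position data): copy data into memory, return its length.

    Memory grows as needed; this toy grows byte by byte, whereas the EVM
    expands memory in 32-byte words.
    """
    end = position + len(data)
    if len(memory) < end:
        memory.extend(bytes(end - len(memory)))
    memory[position:end] = data
    return len(data)


def sha3_input(memory: bytearray, position: int, length: int) -> bytes:
    # The exact bytes that (sha3 position length) would hash.
    return bytes(memory[position:position + length])


if __name__ == "__main__":
    memory = bytearray()
    # (sha3 0x00 (lit 0x00 "Initialized(bool)")) hashes these bytes:
    length = lit(memory, 0x00, b"Initialized(bool)")
    print(length)                            # 17
    print(sha3_input(memory, 0x00, length))  # b'Initialized(bool)'
```

To compute the actual digest in Python, you'd need a Keccak-256 implementation. One option is the keccak module in the third-party pycryptodome package (Crypto.Hash.keccak with digest_bits=256).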

Because this doesn't work as a macro, I've had to use, for example,

(sha3 0x00 (lit 0x00 "Initialized(bool)"))

where I would have simply done

(keccak "Initialized(bool)")

It doesn't seem like a huge difference, but for me clarity is very important, and the second version is clearer.

dispatcher.lll

We're finally in a position to start talking about the dispatcher itself. To do that, we need to discuss the structure of an LLL-based contract. Solidity hides this structure from you, since it takes a more object-oriented approach.

An Ethereum smart contract actually has two distinct sections: initialization and code. The initialization section is where any contract setup is done, such as recording the contract's owner; it's where Solidity's constructors actually run. The code section is where your contract and its functions live. When you deploy a contract to the blockchain, the initialization section is executed. It does the setup described above, then returns the code section to the caller so that it's "installed" on the blockchain. From then on, whenever your contract is called, it's this code section that runs.

With that in mind we can talk about a contract as written in LLL.

If you look at the dispatcher's source, you can see the two sections, marked INIT and CODE in comments. Here the INIT section has two parts. The first part includes the two source files we talked about previously, constants.lll and utilities.lll, both residing in the lib directory. These files need to be included as early as possible so that their definitions are available to the rest of the code.

The second part of the INIT section sets up two storage locations: contract-owner and contract-address. Both symbols are defined in constants.lll and resolve to 0x00 and 0x01 respectively. contract-owner records the address of the caller, i.e. the external or contract account that deployed the dispatcher, so that later checks can restrict execution in certain cases. contract-address records the address of the contract associated with this dispatcher: the contract to which the dispatcher will pass function calls. The contract-address value is set once during the dispatcher's initialization but can be changed later by a contract if it becomes necessary.
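The two-section structure and this INIT setup can be pictured with a toy Python model. Deployment runs the INIT logic once. That logic writes the owner and target address into slots 0x00 and 0x01 and returns the CODE to be installed. Every later call runs only that CODE. Everything here is hypothetical: ToyContract, deploy, dispatcher_init, and passing the target address in as a parameter. The CODE is a placeholder, not the dispatcher's actual logic.

```python
CONTRACT_OWNER = 0x00    # contract-owner in constants.lll
CONTRACT_ADDRESS = 0x01  # contract-address in constants.lll


class ToyContract:
    """Hypothetical stand-in for a deployed account: storage plus installed code."""

    def __init__(self):
        self.storage = {}
        self.code = None  # filled in by whatever INIT returns

    def call(self, calldata=b""):
        # Every call after deployment runs only the installed CODE.
        return self.code(self.storage, calldata)


def deploy(init, caller, target):
    # Deployment runs INIT exactly once and installs what it returns.
    contract = ToyContract()
    contract.code = init(contract.storage, caller, target)
    return contract


def dispatcher_init(storage, caller, target):
    # INIT: record who deployed us and which contract we forward to.
    storage[CONTRACT_OWNER] = caller
    storage[CONTRACT_ADDRESS] = target

    def code(storage, calldata):
        # Placeholder CODE section: just report the current forwarding target.
        return storage[CONTRACT_ADDRESS]

    return code


if __name__ == "__main__":
    d = deploy(dispatcher_init, caller=0xA11CE,
               target=0x754a5bec6ddd2b6a568203c3c2382137f8faa41c)
    print(hex(d.storage[CONTRACT_OWNER]))  # 0xa11ce
    print(hex(d.call()))                   # 0x754a5bec6ddd2b6a568203c3c2382137f8faa41c
```

In this model, the placeholder CODE reads slot 0x01 on every call instead of baking the address in. Changing that slot after deployment therefore changes where calls go, without running INIT again.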

Conclusion

This article has become quite lengthy so I think I'll stop here for now. In part 4 I'll continue with the breakdown of dispatcher.lll, getting deeper into the CODE section.