Today, several articles (e.g. Calling Conventions and X64 Primer) helped me understand what is going on under the hood.
In Win32 (32-bit x86), the compiler generates special instructions for try/catch statements. Every function that needs attention when an exception occurs must add an element to a per-thread linked list upon entry and remove it upon exit. Each element in the linked list contains a pointer to a function to call in the event of an exception, plus some data that the function will consume. When an exception is thrown, the OS walks the linked list to find a function that can process the exception.
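To make the list mechanics concrete, here is a minimal, portable sketch of the idea in C. It is a model, not the real Win32 layout: the names `Registration`, `push_registration`, `pop_registration`, `dispatch_exception`, and `match_code_handler` are all hypothetical, and a plain static variable stands in for the per-thread list head.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one list element: a link to the previous
   element, the function to call on an exception, and its data. */
typedef struct Registration {
    struct Registration *prev;
    int (*handler)(int code, void *data); /* returns 1 if it handled the exception */
    void *data;                           /* data the handler consumes */
} Registration;

/* Head of the list. The real mechanism keeps one head per thread;
   a single static variable is an assumption made for brevity. */
static Registration *chain_head = NULL;

/* Function entry: link a new element at the head of the list. */
static void push_registration(Registration *rec,
                              int (*handler)(int, void *), void *data) {
    rec->prev = chain_head;
    rec->handler = handler;
    rec->data = data;
    chain_head = rec;
}

/* Function exit: unlink the element. Elements are removed in the
   reverse order of insertion, so it must be the current head. */
static void pop_registration(Registration *rec) {
    assert(chain_head == rec);
    chain_head = rec->prev;
}

/* "Throw": walk from the newest element to the oldest until some
   handler accepts the exception. Returns 1 if handled, else 0. */
static int dispatch_exception(int code) {
    for (Registration *r = chain_head; r != NULL; r = r->prev) {
        if (r->handler(code, r->data)) {
            return 1;
        }
    }
    return 0;
}

/* Example handler: accepts only one exception code and counts hits. */
typedef struct {
    int wanted_code;
    int hits;
} HandlerState;

static int match_code_handler(int code, void *data) {
    HandlerState *st = data;
    if (code != st->wanted_code) {
        return 0; /* not ours, let the walk continue */
    }
    st->hits++;
    return 1;
}
```

In this model, a function would declare a `Registration` as a local variable, call `push_registration` on entry and `pop_registration` on exit. Because the elements are locals, they sit on the stack next to everything else the function stores there.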
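The Win32 linked-list structure is not efficient: every such function pays for linking and unlinking its element, even when no exception is ever thrown. In addition, the linked list actually resides on the stack, so there is a function pointer (the exception handler itself, not something for removing the element upon exit) sitting close to the return address on your stack, where a buffer overrun can overwrite it.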
In contrast to Win32 exception handling, a Win64 executable contains a runtime function table. Each table entry records the starting and ending address of a function, as well as the location of a rich set of data about the function's exception-handling code and stack frame layout.
When an exception occurs, the OS walks the regular thread stack. For each frame, it searches the runtime function table of the module that contains the frame's code address, locates the matching entry, and makes the appropriate exception-processing decisions from that data.
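The table lookup described above can be sketched in a few lines of C. This is only an illustration under assumptions: `FunctionEntry` and `lookup_function_entry` are hypothetical names, the struct is a simplified stand-in rather than the actual Windows layout, and the sketch assumes the table is sorted by starting address with non-overlapping ranges so that a binary search applies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified table entry: the address range of one
   function plus the location of its exception/unwind data. */
typedef struct {
    uint32_t begin;       /* first address of the function (module-relative) */
    uint32_t end;         /* one past the last address of the function */
    uint32_t unwind_info; /* where the handler and stack-frame data live */
} FunctionEntry;

/* Find the entry whose [begin, end) range contains pc.
   Assumes the table is sorted by begin and ranges do not overlap.
   Returns NULL when pc is not inside any listed function. */
static const FunctionEntry *lookup_function_entry(const FunctionEntry *table,
                                                  size_t count, uint32_t pc) {
    assert(table != NULL || count == 0);
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (pc < table[mid].begin) {
            hi = mid;          /* pc lies before this function */
        } else if (pc >= table[mid].end) {
            lo = mid + 1;      /* pc lies after this function */
        } else {
            return &table[mid];
        }
    }
    return NULL;
}
```

The point of the design is that nothing has to be written to the stack while the program runs normally: the table is static data in the executable, and the lookup cost is paid only when an exception actually occurs.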