>How the fuck do people write programs that run across multiple machines at once?
With a lot of effort for anything that isn't trivial. It gets even worse when the machines are heterogeneous: different hardware, different accelerators (GPUs, Xeon Phi, etc.).
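Most of it boils down to message passing: every machine runs its own copy of the program, and anything shared has to be sent explicitly. A minimal sketch with mpi4py (assuming MPI and mpi4py are installed; you'd launch it with something like `mpirun -n 4 -hostfile hosts python sum.py`, where the hostfile lists the machines):

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # which process am I
size = comm.Get_size()   # how many processes in total, across all machines

if rank == 0:
    data = list(range(1_000_000))
    chunks = [data[i::size] for i in range(size)]  # split the work
else:
    chunks = None

chunk = comm.scatter(chunks, root=0)              # ship each chunk to its process
partial = sum(x * x for x in chunk)               # compute locally
total = comm.reduce(partial, op=MPI.SUM, root=0)  # combine the partial results

if rank == 0:
    print("sum of squares:", total)
```

Nothing is shared implicitly; if a process needs a value that lives on another machine, someone has to send it.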
>Do they just show up to the program as moar cores?
You can, but you shouldn't be blind to it. For instance, the other machine may not have the data, so if you compute there you first have to ship the data over. There are runtimes that automate this, though.
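For example, with Dask's distributed scheduler the placement is handled for you: it prefers to run a task on a worker that already holds its input rather than dragging the data around. Rough sketch (the scheduler address is made up; point it at wherever your scheduler actually runs):

```python
from dask.distributed import Client

# hypothetical address; the workers live on the other machines
client = Client("tcp://scheduler.example:8786")

def summarize(block):
    return sum(block) / len(block)

blocks = [list(range(i, i + 100_000)) for i in range(0, 400_000, 100_000)]

# push the data out once; each future is a handle to a block living on some worker
remote_blocks = client.scatter(blocks)

# tasks get scheduled preferentially on the worker that already has the block
futures = [client.submit(summarize, b) for b in remote_blocks]
print(client.gather(futures))
```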
>What if there are multiple servers across different regions?
That mostly adds latency; other than that it makes no difference. So depending on the problem it'll be slightly slower, or much slower if you need a lot of synchronization.
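Back-of-envelope for why the synchronization part hurts (round-trip numbers are rough assumptions, not measurements):

```python
# cost of a program that does a global sync (barrier / all-reduce) every step
rtt_same_dc = 0.0002       # ~0.2 ms round trip inside one datacenter (assumed)
rtt_cross_region = 0.080   # ~80 ms round trip between distant regions (assumed)
steps = 10_000             # iterations, each ending in one synchronization

for name, rtt in [("same datacenter", rtt_same_dc), ("cross-region", rtt_cross_region)]:
    print(f"{name}: ~{steps * rtt:,.0f} s spent just waiting on the network")

# same datacenter: ~2 s
# cross-region:    ~800 s -- the same program goes from "slightly slower" to "much slower"
```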