rtcvb32: The bigger issues are coming to how to do function calling or the stack, which may be too shallow for most uses.
The idea is to use a software stack, which is slower than a hardware stack.
An indirect call can be done as follows:
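1. Push the address of END, minus one, onto the stack, high byte first. The minus one is needed because RTS adds 1 to the address it pulls. This takes 4 instructions, 2 for each byte of the address (a load and a PHA).
2. Push the function pointer, also minus one, onto the stack, high byte first. (2 instructions, maybe 3, per byte, so 4-6 instructions.)
3. Execute the RTS instruction, which will "return" to the function whose address was just pushed onto the stack. When that function executes its own RTS, it returns to END.
END. This is the next part of the code, which the function will return to.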
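The steps above can be sketched as a small Python model of the 6502 hardware stack. This is only an illustration, not real 6502 code: the class name `Stack6502`, the helper `push_return_target`, and the addresses `END` and `FUNC` are all made up for the example. The model only covers the stack page at $0100-$01FF and the way RTS pulls the low byte, then the high byte, and adds 1.

```python
class Stack6502:
    """Toy model of the 6502 stack page ($0100-$01FF) and the RTS instruction."""

    def __init__(self):
        self.mem = bytearray(0x10000)
        self.sp = 0xFF  # stack pointer, grows downward

    def push(self, value):
        # PHA: store at $0100+SP, then decrement SP
        self.mem[0x0100 + self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFF

    def pull(self):
        # Increment SP, then read from $0100+SP
        self.sp = (self.sp + 1) & 0xFF
        return self.mem[0x0100 + self.sp]

    def push_return_target(self, addr):
        # Hypothetical helper: push addr-1, high byte first,
        # which is the layout RTS expects to find.
        target = (addr - 1) & 0xFFFF
        self.push(target >> 8)
        self.push(target & 0xFF)

    def rts(self):
        # RTS: pull low byte, pull high byte, add 1, continue there
        lo = self.pull()
        hi = self.pull()
        return (((hi << 8) | lo) + 1) & 0xFFFF


END = 0x0810   # assumed address of the code after the call
FUNC = 0x0900  # assumed value of the function pointer

cpu = Stack6502()
cpu.push_return_target(END)    # step 1
cpu.push_return_target(FUNC)   # step 2
first_jump = cpu.rts()         # step 3: lands in the function
second_jump = cpu.rts()        # the function's own RTS: lands at END
```

Here `first_jump` comes out as `FUNC` and `second_jump` as `END`, with the stack pointer back where it started.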
rtcvb32: You only need a handful of instructions to do just about everything, it just depends on how complex the instructions are.
Strictly speaking, any Turing-complete CPU, which the 6502 is (as are the Z80 and 8086, and most CPUs you're likely to encounter, except maybe some (mostly older) GPUs), can emulate any other Turing-complete (or even incomplete) CPU out there, given a way of accessing enough extra memory.
You could, for example, emulate a system with an Intel® Core™ i9-11900KB Processor (24M Cache, up to 4.90 GHz) and 128 GB RAM with just a 6502 (or Z80) and enough storage space; it will just be really slow (unusably slow, in fact). Add a GPU to the emulated machine, and you can still emulate it; it will just still be really slow, perhaps even more so.
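To make the emulation point a bit more concrete: an emulator is basically a fetch-decode-execute loop over the guest's memory. The sketch below goes the easy direction (Python emulating a handful of real 6502 opcodes), but a 6502 emulating something bigger would have the same shape, just with far more work per guest instruction. The function name `run`, the load address `0x0600`, and the `max_steps` limit are assumptions for the example, and only LDX #imm, DEX, BNE and BRK are handled.

```python
def run(program, origin=0x0600, max_steps=1000):
    """Interpret a tiny subset of 6502 machine code; return (X, steps)."""
    mem = bytearray(0x10000)
    mem[origin:origin + len(program)] = program
    pc, x, zero, steps = origin, 0, False, 0

    while steps < max_steps:
        opcode = mem[pc]          # fetch
        pc += 1
        steps += 1
        if opcode == 0xA2:        # LDX #imm
            x = mem[pc]
            pc += 1
            zero = (x == 0)
        elif opcode == 0xCA:      # DEX
            x = (x - 1) & 0xFF
            zero = (x == 0)
        elif opcode == 0xD0:      # BNE rel (branch if zero flag clear)
            offset = mem[pc]
            pc += 1
            if offset >= 0x80:    # signed 8-bit offset
                offset -= 0x100
            if not zero:
                pc = (pc + offset) & 0xFFFF
        elif opcode == 0x00:      # BRK: treated here as "stop"
            return x, steps
        else:
            raise ValueError(f"unhandled opcode ${opcode:02X}")
    raise RuntimeError("step limit reached")


# LDX #3 / loop: DEX / BNE loop / BRK
countdown = bytes([0xA2, 0x03, 0xCA, 0xD0, 0xFD, 0x00])
final_x, guest_steps = run(countdown)
```

Every guest instruction costs the host a fetch, a chain of comparisons, and some bookkeeping, which is where the slowdown comes from; on a 6502 host each of those would itself be several instructions.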