I think you underestimate how advanced bots are. A bot can look at your screen and know exactly what is going on (image recognition). There are only a finite number of images that can be displayed while harvesting, fishing or crafting, so it's child's play for a bot to work around any visual (or text-and-visual) based system. All these systems do is add to the frustration of non-botters, who end up getting penalised by them.

For example, in FFXI the original fishing was /fish /wait 14 /mined for fish, so people got to the fishing cap just by pushing Alt+1. Now you have to watch the rod and push it in the opposite direction to the way it is currently moving until you get the fish down to 0% stamina. Oh, and it's frame-capped at 30 fps, so it's as sluggish as possible. Fishing isn't enjoyable unless you enjoy stress.