I've cut out the parts I don't directly disagree with. That wasn't meant to make your post seem less coherent.
You are, however, missing the point here. I'm not talking about videos that are recorded at 24 (or 30) fps and then "upscaled" to 60 fps. I'm talking about videos that are actually recorded at 60 fps and stay at 60 fps all the way from the camera to your display. That is what looks really good.
The number of "frames" you can process is not limited by your eyes. It depends on the type of data you need to process and on how good you are at processing visual data, which is a skill that can, to a certain extent, be trained. The prime example is a Formula 1 driver, who can typically process many times more visual information per second than most people. If you fed an F1 driver 15 frames per second, I can assure you that the driver would be missing a *ton* of information that his (or her) brain is used to having available.
Because eyes do not work in clean, separate frames, even the faint visual changes produced by an extremely quick movement can be processed as visual data by your brain. These faint registrations, which would not qualify as a "whole frame", can give your brain enough information to decide whether the object is worth a closer look. The eye can then "lock on" to the object for just long enough to properly identify it.
If this rapidly moving object were recorded with a 24 fps camera, its entire movement would be just a blurred mess, and no amount of tracking by your eyes could stabilize it into a clear image of what you're looking at. If it were captured at something really crazy, say 200 fps, and displayed to you at that framerate, the object would still be a blurry mess as it moved around, but your eyes could lock on to it and give you a clear image, just as they do in real life, while everything that wasn't moving would become the blurry mess instead.
Your eyes do this all the time in real life: they stabilize any moving object you want to see more clearly. They can't do this when you're watching a low-fps recording of the real world, because the motion blur is permanently baked into the image data.
Playing action games at high framerates lets you do the same thing. Of course, it matters less for the actual gameplay of FF14, which is relatively slow-paced and limited by latency, but you can still notice a much smoother experience when you move the camera around quickly, or when another character runs past very close to your camera. At a low framerate, an object passing close to the camera is harder for the eyes to track if the entire run-by is only 10 frames (1/3 of a second at 30 fps) rather than 40 (1/3 of a second at 120 fps). With 40 frames, your eyes (or rather, your brain) have far more information to judge speed and direction with, and can therefore more easily lock on to the moving object and stabilize it.
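For anyone who wants to check the numbers in the run-by example, here is a tiny Python sketch. The function names are just illustrative; it only counts how many frames fit into a 1/3-second run-by at each framerate and how long the gap between consecutive frames is.

```python
from fractions import Fraction


def frames_in_window(fps: int, seconds: Fraction) -> Fraction:
    """Number of frames shown during a time window (exact, no rounding)."""
    return fps * seconds


def frame_interval_ms(fps: int) -> float:
    """Time between two consecutive frames, in milliseconds."""
    return 1000 / fps


# Duration of the run-by from the example above, in seconds.
run_by = Fraction(1, 3)

for fps in (30, 120):
    print(f"{fps} fps: {frames_in_window(fps, run_by)} frames, "
          f"{frame_interval_ms(fps):.1f} ms between frames")
# 30 fps: 10 frames, 33.3 ms between frames
# 120 fps: 40 frames, 8.3 ms between frames
```

So at 120 fps your brain gets four times as many samples of the same movement, roughly 8 ms apart instead of roughly 33 ms.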
-edit-
Changed the values in the examples to reflect more practical situations.
Also, don't worry: I know exactly which specifications to look for when I buy my visual equipment. As you might have noticed, I have a more-than-average interest in how these things work.