You are correct... I actually found this out yesterday while looking at the source code of a 1.0 server someone made. Face info is sent to the client as a single 32-bit integer. In 1.0 the characteristics, characteristic color, mouth, nose, features, ears, eyes, and iris were all packed into 32 bits using a bitfield.
Characteristics were 5 bits. (32 values) <- These are probably stored as flags rather than a number though, so actually 6 options (one per bit plus none, presumably)
Characteristic Color was 3 bits. (8 values)
Face type was 6 bits. (64 values) <- These are probably stored as flags rather than a number though, so actually 7 options (one per bit plus none, presumably)
Ears was 2 bits. (4 values)
Mouth was 2 bits. (4 values)
Features was 2 bits. (4 values)
Nose was 3 bits. (8 values)
Eye shape was 3 bits. (8 values)
Iris size was 1 bit. (2 values)
Eyebrows was 3 bits. (8 values)
With 2 bits remaining that the author didn't know about or that were unused. (4 values)
(Add up the bits and you get 32, or 4 bytes)
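For anyone who wants to see what that packing looks like in practice, here's a rough Python sketch. The field widths come from the list above, but the bit order (first field in the lowest bits), the field names, and the `pack_face` / `unpack_face` helpers are my own assumptions for illustration, not what the server source or the game actually does.

```python
# Field widths in the order listed above. The bit positions (first field in
# the lowest bits) and the names are assumptions, not taken from the source.
FIELDS = [
    ("characteristics", 5),
    ("characteristic_color", 3),
    ("face_type", 6),
    ("ears", 2),
    ("mouth", 2),
    ("features", 2),
    ("nose", 3),
    ("eye_shape", 3),
    ("iris_size", 1),
    ("eyebrows", 3),
    ("unknown", 2),
]

# Sanity check: the widths must fill exactly one 32-bit integer.
assert sum(width for _, width in FIELDS) == 32


def pack_face(values):
    """Pack a dict of field values into one 32-bit integer (missing fields = 0)."""
    packed = 0
    shift = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits: {value}")
        packed |= value << shift
        shift += width
    return packed


def unpack_face(packed):
    """Split a 32-bit integer back into its individual fields."""
    result = {}
    shift = 0
    for name, width in FIELDS:
        mask = (1 << width) - 1
        result[name] = (packed >> shift) & mask
        shift += width
    return result
```

Calling `unpack_face(pack_face({"nose": 5, "eyebrows": 2}))` gives back a dict with `nose` set to 5, `eyebrows` set to 2, and everything else 0. Trying to pack a nose value of 8 raises a `ValueError`, which is the hard cap in action: 3 bits can't hold it.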
These look very similar to the caps we have now. Assuming SE still uses this layout in 2.0, those are the hard caps until they change the data size. I'd have assumed SE would restructure how the info is stored, but I guess that due to time constraints they worked with what they had, since this setup is probably referenced all over the game.