Yes, without listening to the audio, I would consider the example provided to be one C-unit and would have segmented it exactly as shown.
I'm curious: was the second phrase, "how they want it," a maze (a revision of "as they want it")? And was the third phrase, "exactly how they want it," an expansion of getting the food to them "as they want it"? If so, it would be transcribed as follows:
So we have to get their food to them as they want it (how they want it), exactly how they want it, because if there/'s any mistake then it/'s not only the manager/s that are gonna yell at you and not only the owner/s but the kitchen/'s gonna be really mad at you.
Thank you for your inquiry.