Parameters will be updated with *something* even with a single batch of data, so they are always filled. The issue is that such a model would heavily underfit the domain: the data would be far too scarce to 'describe' the complex relations within it, so the model would not be able to capture them, i.e. it would just output garbage because the parameters would still be close to their initial random distribution. This is the most extreme case, but underfitting varies in severity - in the case of PanGu-Sigma, the model clearly works, but it is most likely not reaching its full potential because of the limited data.
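To make that first point concrete, here is a minimal sketch (PyTorch, with made-up random data; the model, shapes and learning rate are arbitrary choices for illustration, not anything from PanGu-Sigma): a single optimization step on one batch does touch every parameter tensor, yet on unseen inputs the model still behaves essentially like its random initialization.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny toy classifier; all sizes are arbitrary.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# Snapshot the freshly initialized parameters.
before = [p.detach().clone() for p in model.parameters()]

# One batch of synthetic data, one optimization step.
x = torch.randn(8, 32)
y = torch.randint(0, 10, (8,))
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()

# Every parameter tensor has moved, so the parameters are all "filled"...
changed = all(
    not torch.equal(b, p.detach())
    for b, p in zip(before, model.parameters())
)
print("all parameter tensors updated:", changed)

# ...but on unseen data the model is still close to its random init:
# accuracy hovers around the ~10% you would get by guessing one of 10 classes.
x_test = torch.randn(1000, 32)
y_test = torch.randint(0, 10, (1000,))
with torch.no_grad():
    acc = (model(x_test).argmax(dim=1) == y_test).float().mean()
print(f"accuracy on unseen data: {acc.item():.2%}")
```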
I should also note that underfitting can happen when the model itself is too 'simple' - lacking in capacity - to capture more complex interactions, which is why we usually see improvements with larger models; the general trend has been 'the larger, the better' (note: that does not mean an architecturally better but smaller model cannot outperform a larger one). That can also backfire: training for a long time on a dataset that is small relative to the model size can lead to overfitting, where the model simply memorizes the training tokens and achieves near-perfect metrics on the training set but fails to generalize to any new data. This issue, again, varies in severity - it is particularly interesting with LLMs because I think overfitting is hard to detect there: the training data is so humongous that assembling a genuinely out-of-domain test set is difficult.
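For the classic small-scale case, the usual symptom is the training loss heading towards zero while the loss on held-out data stalls or climbs. Here is a hedged sketch of that check (again PyTorch with synthetic data; the oversized model, the tiny training set and the epoch count are deliberately chosen so that memorization is easy, not taken from any real setup):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Small dataset relative to model capacity -> easy to memorize.
x_train, y_train = torch.randn(64, 32), torch.randint(0, 10, (64,))
x_val, y_val = torch.randn(1000, 32), torch.randint(0, 10, (1000,))

model = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 10))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(1, 501):
    model.train()
    optimizer.zero_grad()
    train_loss = loss_fn(model(x_train), y_train)
    train_loss.backward()
    optimizer.step()

    if epoch % 100 == 0:
        model.eval()
        with torch.no_grad():
            val_loss = loss_fn(model(x_val), y_val)
        # Training loss keeps falling (memorization) while the held-out loss
        # stalls or climbs - that divergence is the overfitting signal.
        print(f"epoch {epoch:3d}  train {train_loss.item():.3f}  "
              f"val {val_loss.item():.3f}")
```

With LLMs the principle is the same, but the hard part is the held-out set itself: with web-scale training data it is difficult to be sure the evaluation data is not already (near-)duplicated somewhere in the training corpus.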