Every ETL pipeline is only ever as reliable as the data the upstream system provides. Sooner or later, the assumptions you make about the data you receive will be shattered, and there is absolutely nothing you can do about it.
So what can we do? Do we just accept that our pipelines will break, and fix them only when the CEO shouts that the figures are out? Or, even worse, do we let no one notice while the data stays wrong for months or years?