Because this was a problem that Google was zero help on and I did find a couple other people afflicted with it, I figured I’d write a post about it.
The issue was this. I add a table to replication (SQL Server 2005). The option to replicate DDL changes is set to true, vertical_partition on the article is set to false. From everything you’ll read about replication (in 2005+, 2000 didn’t work this way, you needed to use the column add/delete procs), it should now just work. Sure, you’ll find some posts floating around out there about people with random problems that resembled this and the fixes usually just involve turning on the option to replicate schema changes or some other such basic fix. Not the case here. I tried everything I could think of. Rebuilt the publications from scratch. Dropped replication all together on the server and added it back (to be fair this might have temporarily worked, but I don’t feel like dropping it and re-creating it all again to verify).
The issue turned out to be… the replication DDL triggers (in this case, specifically sp_MSTran_altertable). These are database level triggers (introduced in 2005, which is why 2000 doesn’t work the same way) that fire when you do things like alter the schema of a table or push a stored procedure change. They had managed to become disabled (most likely by the careless hacking together of a partial script generated by SQL Compare, my fault, not the tool’s fault, this was a dev environment). The publication level option to replicate schema changes is what tells replication you want them propagated to your subscribers, but the triggers are what enable this to happen.
One of the things that really threw me off here is that when I scripted out the publication after I had initially created it (and before I had tried to add any columns), I saw a correct replication script, with the vertical_partition option set to ‘false’. Add a column and re-script out the publication and you see that the vertical_partition option changes to ‘true’ and you get all the columns that were part of the original definition of the table added to the script via sp_articlecolumn. This led me to believe that somehow something in replication was switching this on me behind the scenes… but really I believe this was just a side effect of the script process trying to generate scripts based on the current state of the publication (and I don’t know that the vertical_partition option is actually stored anywhere).
So, one more thing to check if this happens to you, and hopefully it saves you the many hours of replication rebuilds and fruitless Googling it took to figure this out. (And to be fair, it was actually a co-worker who thought to check the triggers, I probably would have been several more hours at it!)
And now a few more search terms to hopefully help Google find this:
SQL 2005 Replication not replicating Schema properly.
schema replication stopped working
vertical_partition option getting changed to true