I'm not aware of any.
If I had to do this myself, I would do it as I suggested: break the text into blocks, time them, put each block's text file and audio source into its own scene, and then use the Advanced Scene Switcher to advance scenes based on those timings. If precise timing is not necessary, I would just use longer scenes.
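For the splitting and timing step, here is a rough Python sketch of how I might prepare the blocks. It is only an illustration: the function names, the 40-word block size, and the 150 words-per-minute rate are all my own assumptions, and the durations are estimates from word count. If you have the actual audio, measure each clip's length instead.

```python
def split_into_blocks(text, max_words=40):
    """Split text into blocks of at most max_words words (assumed block size)."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def estimate_schedule(blocks, words_per_minute=150.0):
    """Return (block_number, start_seconds, duration_seconds) for each block.

    Durations are estimated from word count at an assumed speaking rate;
    they are not measured from real audio.
    """
    schedule = []
    start = 0.0
    for number, block in enumerate(blocks, start=1):
        duration = len(block.split()) * 60.0 / words_per_minute
        schedule.append((number, start, duration))
        start += duration
    return schedule


if __name__ == "__main__":
    sample = "This is a short piece of sample text used only to show the idea."
    blocks = split_into_blocks(sample, max_words=5)
    for number, start, duration in estimate_schedule(blocks):
        print(f"block {number}: starts at {start:.1f}s, lasts {duration:.1f}s")
```

Each block would then go into its own text file and scene, and the durations would be entered by hand as the scene timings in the switcher.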