Thoughts on preserving timestamps & speaker metadata in LangChain audio transcripts?

Hey all :waving_hand:

While playing with LangChain’s local Whisper audio parser, I noticed that timestamps (and any speaker info) don’t really survive once audio becomes Documents — everything ends up flattened into plain text.

That makes things like time-based search, jump-to-audio playback, or per-speaker summaries a bit awkward.

I’m thinking about exploring a small, backward-compatible way to preserve start/end timestamps (and optional speaker metadata) directly in Document.metadata. I’d genuinely love to work on this if it’s useful to others.
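To make the idea concrete, here’s a rough sketch of the metadata shape I have in mind. Plain dicts stand in for LangChain `Document` objects here, and the field names (`start_seconds`, `end_seconds`, `speaker`, and the Whisper-style segment keys) are my own assumptions, not an existing API:

```python
def segments_to_documents(segments):
    """Turn Whisper-style segments into per-segment 'documents',
    keeping start/end timestamps and optional speaker in metadata.
    (Dicts stand in for langchain Document; names are hypothetical.)"""
    docs = []
    for seg in segments:
        metadata = {
            "start_seconds": seg["start"],
            "end_seconds": seg["end"],
        }
        # Speaker info is optional -- only attach it when present.
        if seg.get("speaker") is not None:
            metadata["speaker"] = seg["speaker"]
        docs.append({"page_content": seg["text"], "metadata": metadata})
    return docs


segments = [
    {"start": 0.0, "end": 4.2, "text": "Hello and welcome.", "speaker": "SPEAKER_00"},
    {"start": 4.2, "end": 9.7, "text": "Today we look at parsers."},
]
docs = segments_to_documents(segments)
```

Because everything lives in `metadata`, existing code that only reads `page_content` would keep working unchanged, while time-based filtering or per-speaker grouping becomes a simple metadata query.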

Curious if anyone else has hit this, or has thoughts on whether this belongs in LangChain vs downstream.

@Cosmos-Atom Are you referring to the following parser: OpenAIWhisperParserLocal — 🦜🔗 LangChain documentation?