PDFs to AI-ready structured data with spaCy, table extraction & more

In this blog post, we present a new modular workflow for converting PDFs and similar documents to structured data, and show how to build end-to-end document understanding and information extraction pipelines for industry use cases. We also share tips for table extraction, data collection and annotation, custom training and working with layout features.