unstructured · PyPI: The unstructured library provides open-source components for ingesting and pre-processing images and text documents, such as PDFs, HTML, Word docs, and many more.
Welcome to Unstructured! This quickstart shows how, in a few minutes, you can use Unstructured Pipelines to see Unstructured’s transformation results for a single file stored on your local computer.
Unstructured 0.12.6 documentation: The unstructured library is designed to help preprocess and structure unstructured text documents for use in downstream machine learning tasks. Examples of documents that can be processed using the unstructured library include PDFs, XML, and HTML documents.
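To make "preprocess and structure" concrete, here is a minimal standard-library sketch of turning an HTML string into a list of typed text elements. It is a toy illustration of the general idea only, not the unstructured library's API or implementation: the names `TAG_TO_CATEGORY`, `ElementExtractor`, and `partition_html_sketch`, and the category labels themselves, are all hypothetical and chosen for this example.

```python
from html.parser import HTMLParser

# Hypothetical tag-to-category mapping, chosen for illustration only.
TAG_TO_CATEGORY = {
    "h1": "Title",
    "h2": "Title",
    "h3": "Title",
    "p": "NarrativeText",
    "li": "ListItem",
}


class ElementExtractor(HTMLParser):
    """Collects (category, text) pairs for a few block-level HTML tags."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self._category = None  # category of the block currently being read
        self._chunks = []      # text fragments seen inside that block

    def handle_starttag(self, tag, attrs):
        if tag in TAG_TO_CATEGORY:
            self._category = TAG_TO_CATEGORY[tag]
            self._chunks = []

    def handle_data(self, data):
        if self._category is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag in TAG_TO_CATEGORY and self._category is not None:
            # Join the fragments and collapse runs of whitespace.
            text = " ".join("".join(self._chunks).split())
            if text:
                self.elements.append((self._category, text))
            self._category = None


def partition_html_sketch(html):
    """Return a list of (category, text) tuples found in the HTML string."""
    parser = ElementExtractor()
    parser.feed(html)
    parser.close()
    return parser.elements


sample = (
    "<h1>Quarterly Report</h1>"
    "<p>Revenue grew   this quarter.</p>"
    "<ul><li>Item one</li></ul>"
)
elements = partition_html_sketch(sample)
for category, text in elements:
    print(f"{category}: {text}")
```

The resulting list of small, labeled text units is the kind of structured output that downstream machine learning steps can consume more easily than raw markup. A real document-processing library handles many more formats and edge cases than this sketch does.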