SmolDocling

SmolDocling is a small model that turns documents into a structured format computers can work with. It can also read the text in images, which is called Optical Character Recognition, or OCR for short. Researchers at IBM Research and Hugging Face created it, and it is part of the Docling project, which is hosted by the Linux Foundation AI & Data Foundation.
Benefits
SmolDocling has several strengths. It uses a markup format called DocTags to turn complex documents into structured data. This format records the text, the structure of the document, and where each element sits on the page. Although the model is small, it runs fast and performs about as well as much larger models. SmolDocling handles many types of document content, including tables, computer code, math equations, and charts. When served with vLLM, an inference engine, it takes about 0.35 seconds to process one page on an A100 GPU.
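To put the 0.35 seconds per page figure in perspective, the short sketch below turns it into pages per hour and into the time needed for a batch. It assumes one page is processed at a time on a single A100 GPU, with no batching or loading overhead. The function names are made up for this illustration and are not part of SmolDocling or vLLM.

```python
# Back-of-the-envelope throughput estimate from the per-page figure quoted above.
# Assumption: pages are processed one after another on one GPU, with no overhead.

SECONDS_PER_PAGE = 0.35  # figure quoted in the text for an A100 GPU with vLLM


def pages_per_hour(seconds_per_page: float) -> float:
    """Return how many pages fit into one hour at the given per-page time."""
    return 3600.0 / seconds_per_page


def hours_for(pages: int, seconds_per_page: float) -> float:
    """Return the hours needed to process the given number of pages."""
    return pages * seconds_per_page / 3600.0


if __name__ == "__main__":
    print(f"Pages per hour: {pages_per_hour(SECONDS_PER_PAGE):.0f}")
    print(f"Hours for 10,000 pages: {hours_for(10_000, SECONDS_PER_PAGE):.2f}")
```

Under these assumptions, a 10,000-page archive would take a little under one hour. Real throughput depends on page complexity, batching, and hardware.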
Use Cases
SmolDocling is useful anywhere complex documents need to be turned into structured data. Typical jobs include reading text in images (OCR), working out the layout of a page, and recovering the structure of tables and charts. It can also read computer code and math equations. SmolDocling was trained on several datasets, including DocLayNet, FinTabNet, and MathWriting. Together these contain many kinds of document elements, such as tables, charts, math equations, and computer code.
Additional Information
SmolDocling introduces DocTags, a markup format for describing documents. DocTags keep the text separate from how it is laid out on the page, which makes the output easier for programs to process. SmolDocling has some limitations, but it is robust: it can reproduce tables correctly even when they are poorly formatted in the source. It is a big step forward in turning documents into structured data. Its small size, speed, and ability to handle many kinds of document elements make it a helpful tool.
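The idea of keeping text separate from layout can be shown with a small parser. The sketch below reads a simplified, hypothetical DocTags-like string in which each element is a tag, four location tokens, and then the text. The sample string, the `Element` class, and `parse_elements` are assumptions made for this illustration. Real SmolDocling output may use different tags and nesting, and the Docling project provides its own tools for reading it.

```python
import re
from dataclasses import dataclass

# Hypothetical, simplified DocTags-like sample. Real model output may differ.
SAMPLE = (
    "<section_header_level_1><loc_10><loc_20><loc_200><loc_40>Results"
    "</section_header_level_1>"
    "<text><loc_10><loc_50><loc_200><loc_90>Accuracy improved.</text>"
)


@dataclass
class Element:
    kind: str    # element type, taken from the tag name
    bbox: tuple  # the four location values, in the order they appear
    text: str    # the text content, with no layout information mixed in


# One element: an opening tag, four <loc_N> tokens, the text, a matching closing tag.
PATTERN = re.compile(
    r"<(?P<kind>\w+)>"
    r"<loc_(\d+)><loc_(\d+)><loc_(\d+)><loc_(\d+)>"
    r"(?P<text>.*?)"
    r"</(?P=kind)>",
    re.DOTALL,
)


def parse_elements(doctags: str) -> list[Element]:
    """Split a flat DocTags-like string into elements with separate text and location."""
    elements = []
    for match in PATTERN.finditer(doctags):
        # Groups 2 to 5 hold the four location numbers.
        bbox = tuple(int(match.group(i)) for i in range(2, 6))
        elements.append(Element(match.group("kind"), bbox, match.group("text")))
    return elements


if __name__ == "__main__":
    for element in parse_elements(SAMPLE):
        print(element.kind, element.bbox, repr(element.text))
```

Because each element carries its type, position, and text as separate fields, a program can pull out only the headings, or only the text inside one region of the page, without guessing from visual formatting.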