AI research for public benefit
Independent research · Open source · From Madrid

Current Initiative
Opening Historical Archives
Libraries and museums hold vast collections of scanned documents that can be searched by metadata but not by content. We are working to change that by applying state-of-the-art OCR models to extract the full text of these archives, opening them up for research, exploration, and AI training.
Our first release covers over 830,000 pages from the Biblioteca Nacional de España, drawn from 19th-century publications in science, medicine, literature, and more. Next, we plan to expand to other archives across Spain.