Show HN: Multi-modal RAG with ColQwen in a single line of code
Hi HN, we're Arnav and Adi, and we're building DataBridge - a multi-modal database built from the ground up with AI use cases in mind.
We recently launched support for ColPali-style image embeddings and late-interaction retrieval. We've also implemented a Hamming-distance variant of the scoring, which lets this approach scale significantly better than regular late-interaction similarity scoring over float embeddings.
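To make the Hamming-distance idea concrete, here's a minimal sketch (not our exact implementation): binarize the multi-vector embeddings, then compute the usual MaxSim late-interaction score, but with similarity defined as the embedding dimension minus the Hamming distance between packed bit vectors. Popcounts over packed bits are cheap, and the index is roughly 32x smaller than float32.

```python
import numpy as np

def binarize(vecs: np.ndarray) -> np.ndarray:
    """Pack the sign bits of float multi-vectors into uint8 words (~32x smaller)."""
    return np.packbits(vecs > 0, axis=-1)

# Lookup table: popcount of each possible byte value.
POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint16)

def hamming_late_interaction(query_bits: np.ndarray, doc_bits: np.ndarray, dim: int) -> float:
    """
    Late-interaction (MaxSim) score using Hamming similarity.
    query_bits: (num_query_tokens, dim // 8) packed bits
    doc_bits:   (num_doc_patches, dim // 8) packed bits
    """
    # XOR every query token against every document patch, count differing bits.
    xor = np.bitwise_xor(query_bits[:, None, :], doc_bits[None, :, :])
    hamming = POPCOUNT[xor].sum(axis=-1)   # (Q, D) Hamming distances
    sim = dim - hamming                    # higher = more similar
    return float(sim.max(axis=1).sum())    # MaxSim over patches, summed over query tokens

# Toy usage with random embeddings (dim 128, as in ColPali-style models).
rng = np.random.default_rng(0)
q = binarize(rng.standard_normal((20, 128)))    # 20 query-token embeddings
d = binarize(rng.standard_normal((700, 128)))   # 700 page-patch embeddings
print(hamming_late_interaction(q, d, dim=128))
```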
These embeddings provide significantly better retrieval accuracy: ColQwen achieves around an 89% average score on the ViDoRe benchmark, compared to around 67% for traditional parsing- and captioning-based methods.
We're completely open source, and getting started takes less than 10 minutes (get started here: https://databridge.mintlify.app/getting-started). In fact, using this style of embeddings requires just setting `use_colpali=True` in our Python SDK when ingesting or retrieving documents.
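For reference, the happy path looks roughly like this. Treat the client and method names as illustrative rather than exact (the getting-started guide above has the authoritative snippets); `use_colpali=True` is the only part that matters here.

```python
from databridge import DataBridge  # Python SDK client; exact import per the getting-started docs

db = DataBridge()  # point this at your local or hosted DataBridge instance

# Ingest a diagram-heavy PDF with ColPali/ColQwen-style image embeddings
# instead of the usual parse-and-caption pipeline - one flag.
db.ingest_file("quarterly-report.pdf", use_colpali=True)

# Retrieve with late-interaction scoring over those embeddings.
chunks = db.retrieve_chunks("What does the revenue-by-region chart show?", use_colpali=True)
for chunk in chunks:
    print(chunk.score, chunk.content)
```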
Our long-term goal is to make state-of-the-art retrieval research as accessible for production use cases as possible, and integrating ColPali is an initial step towards that goal. If there's research you find compelling but haven't been able to bring into production, let us know: we'd be happy to help.
We really appreciate the honest feedback the HN community provides, and so we'd love to hear from you!
I tried to learn more, but this link from the readme gives me a 404: https://databridge.gitbook.io/databridge-docs/architecture/o...
Can you tell me what it is and how it works?
Yes, of course! DataBridge is a multi-modal database built on the belief that retrieval over unstructured data - videos, PDFs, and other documents - should work with the same reliability, speed, and consistency as queries over regular structured data (and at a similar level of abstraction). Right now, we're focused on simplifying RAG pipelines for users.
You can find more about getting started and installation instructions here: https://databridge.mintlify.app/getting-started
And you can find our API reference here: https://databridge.mintlify.app/api-reference/
Is there anything specific you're looking for?
What does "multi-modal database" mean? Do you have any details somewhere?
"Multi-modal database" means that we treat multi-modal data - images, audio, and video - as first-class citizens and provide natural-language search over it. That is, you can ingest multi-modal content into DataBridge the same way you'd ingest structured data into a database. You can perform updates over this information, extract metadata, or define custom parsing/processing rules (e.g. redact any PII) - there's a small sketch of that below.
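As a rough sketch of the rules idea (the exact rule classes may differ from what's in the SDK today - this is just to show the shape of it):

```python
from databridge import DataBridge
from databridge.rules import NaturalLanguageRule  # rule class name assumed for illustration

db = DataBridge()

# Apply a processing rule at ingest time: redact PII before anything is indexed.
db.ingest_file(
    "customer-contract.pdf",
    rules=[NaturalLanguageRule(prompt="Redact any personally identifiable information")],
)
```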
Your search queries go through a planner which - depending on the kind of data being retrieved - calls the right tools to extract information from the data and answer your query.
For instance, this could be function calling over object-tracking data if your query concerns object movements in a video. It could be a call to ColQwen if we're looking for particular features in a diagram-heavy PDF. Or it could be a simple semantic search, if that's what the planner deems most useful.
The idea is that traditional databases work the same way: a query planner figures out the best path to execute the user's query and hands it to the execution engine. We think a lot of this complexity can be abstracted away from the user, as long as we can give them strong retrieval guarantees (the same way databases have SLAs).
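As a toy illustration of that routing (purely illustrative - the real planner is model-driven, and these tool names aren't our internal API):

```python
from dataclasses import dataclass
from typing import Callable, List

# Stub retrieval tools; in DataBridge these would be real retrievers.
def object_tracking_search(query: str) -> List[str]:
    return [f"[object-tracking result for: {query}]"]

def colqwen_search(query: str) -> List[str]:
    return [f"[ColQwen late-interaction result for: {query}]"]

def semantic_search(query: str) -> List[str]:
    return [f"[semantic-search result for: {query}]"]

@dataclass
class Plan:
    tool: str
    run: Callable[[str], List[str]]

def plan_query(query: str, modality: str) -> Plan:
    """Toy planner: route the query to the retrieval tool that fits the data.
    A production planner makes this decision with a model, not an if-chain."""
    if modality == "video":
        return Plan("object_tracking", object_tracking_search)
    if modality == "diagram_pdf":
        return Plan("colqwen_late_interaction", colqwen_search)
    return Plan("semantic_search", semantic_search)

plan = plan_query("Where does the forklift go after the loading dock?", modality="video")
print(plan.tool, plan.run("forklift path")[0])
```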
Let me know if something is unclear here!