Show HN: Multi-modal RAG with ColQwen in a single line of code
Hi HN, we're Arnav and Adi, and we're building DataBridge - a multi-modal database built from the ground up with AI use cases in mind.
We recently launched support for ColPali-style image embeddings and late-interaction retrieval. We've also implemented a Hamming-distance variant of the scoring, which lets this approach scale significantly better than regular late-interaction similarity scoring over float embeddings.
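To make the Hamming-distance idea concrete, here's a minimal sketch (not our exact implementation): binarize the multi-vector embeddings, then compute the usual MaxSim late-interaction score, but with similarity defined as the embedding dimension minus the Hamming distance between packed bit vectors. Popcounts over packed bits are cheap, and the index is roughly 32x smaller than float32.

```python
import numpy as np

def binarize(vecs: np.ndarray) -> np.ndarray:
    """Pack the sign bits of float multi-vectors into uint8 words (~32x smaller)."""
    return np.packbits(vecs > 0, axis=-1)

# Lookup table: popcount of each possible byte value.
POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint16)

def hamming_late_interaction(query_bits: np.ndarray, doc_bits: np.ndarray, dim: int) -> float:
    """
    Late-interaction (MaxSim) score using Hamming similarity.
    query_bits: (num_query_tokens, dim // 8) packed bits
    doc_bits:   (num_doc_patches, dim // 8) packed bits
    """
    # XOR every query token against every document patch, count differing bits.
    xor = np.bitwise_xor(query_bits[:, None, :], doc_bits[None, :, :])
    hamming = POPCOUNT[xor].sum(axis=-1)   # (Q, D) Hamming distances
    sim = dim - hamming                    # higher = more similar
    return float(sim.max(axis=1).sum())    # MaxSim over patches, summed over query tokens

# Toy usage with random embeddings (dim 128, as in ColPali-style models).
rng = np.random.default_rng(0)
q = binarize(rng.standard_normal((20, 128)))    # 20 query-token embeddings
d = binarize(rng.standard_normal((700, 128)))   # 700 page-patch embeddings
print(hamming_late_interaction(q, d, dim=128))
```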
These embeddings provide significantly better retrieval accuracy: ColQwen achieves around an 89% average score on the ViDoRe benchmark, compared to around 67% for traditional parsing- and captioning-based methods.
We're completely open source, and getting started takes less than 10 minutes (get started here: https://databridge.mintlify.app/getting-started). In fact, using this style of embeddings requires just setting `use_colpali=True` in our Python SDK when ingesting or retrieving documents.
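For reference, the happy path looks roughly like this. Treat the client and method names as illustrative rather than exact (the getting-started guide above has the authoritative snippets); `use_colpali=True` is the only part that matters here.

```python
from databridge import DataBridge  # Python SDK client; exact import per the getting-started docs

db = DataBridge()  # point this at your local or hosted DataBridge instance

# Ingest a diagram-heavy PDF with ColPali/ColQwen-style image embeddings
# instead of the usual parse-and-caption pipeline - one flag.
db.ingest_file("quarterly-report.pdf", use_colpali=True)

# Retrieve with late-interaction scoring over those embeddings.
chunks = db.retrieve_chunks("What does the revenue-by-region chart show?", use_colpali=True)
for chunk in chunks:
    print(chunk.score, chunk.content)
```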
Our long-term goal is to make state-of-the-art retrieval research as accessible for production use cases as possible, and integrating ColPali is an initial step towards that goal. If there's research you find compelling but haven't been able to bring into production, let us know: we'd be happy to help.
We really appreciate the honest feedback the HN community provides, and so we'd love to hear from you!
I tried to learn more, but this link from the readme gives me a 404: https://databridge.gitbook.io/databridge-docs/architecture/o...
Can you tell me what it is and how it works?
Yes, of course! DataBridge is a multi-modal database built on the belief that retrieval over unstructured data - videos, PDFs, and other documents - should work with the same reliability, speed, and consistency as queries over regular structured data (and at a similar level of abstraction). Right now, we're focused on simplifying RAG pipelines for users.
You can find more about getting started and installation instructions here: https://databridge.mintlify.app/getting-started
And you can find our API reference here: https://databridge.mintlify.app/api-reference/
Is there anything specific you're looking for?
What does "multi-modal database" mean? Do you have any details somewhere?
"Multi-modal database" means that we treat multi-modal data - images, audio, and video - as first-class citizens and provide natural-language search over it. That is, you can ingest multi-modal content into DataBridge the same way you'd ingest structured data into a database. You can perform updates over this information, extract metadata, or define custom parsing/processing rules (e.g. redact any PII) - there's a small sketch of that below.
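As a rough sketch of the rules idea (the exact rule classes may differ from what's in the SDK today - this is just to show the shape of it):

```python
from databridge import DataBridge
from databridge.rules import NaturalLanguageRule  # rule class name assumed for illustration

db = DataBridge()

# Apply a processing rule at ingest time: redact PII before anything is indexed.
db.ingest_file(
    "customer-contract.pdf",
    rules=[NaturalLanguageRule(prompt="Redact any personally identifiable information")],
)
```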
Your search queries go through a planner which - depending on the kind of data being retrieved - calls the right tools to extract information from the data and answer your query.
For instance, this could be function calling over object-tracking data if your query concerns object movements in a video. It could be a call to ColQwen if we're looking for particular features in a diagram-heavy PDF. Or it could be a simple semantic search, if that's what the planner deems most useful.
The idea is that traditional databases work the same way: a query planner figures out the best path to execute the user's query and hands it to the execution engine. We think a lot of this complexity can be abstracted away from the user, as long as we can give them strong retrieval guarantees (the same way databases have SLAs).
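As a toy illustration of that routing (purely illustrative - the real planner is model-driven, and these tool names aren't our internal API):

```python
from dataclasses import dataclass
from typing import Callable, List

# Stub retrieval tools; in DataBridge these would be real retrievers.
def object_tracking_search(query: str) -> List[str]:
    return [f"[object-tracking result for: {query}]"]

def colqwen_search(query: str) -> List[str]:
    return [f"[ColQwen late-interaction result for: {query}]"]

def semantic_search(query: str) -> List[str]:
    return [f"[semantic-search result for: {query}]"]

@dataclass
class Plan:
    tool: str
    run: Callable[[str], List[str]]

def plan_query(query: str, modality: str) -> Plan:
    """Toy planner: route the query to the retrieval tool that fits the data.
    A production planner makes this decision with a model, not an if-chain."""
    if modality == "video":
        return Plan("object_tracking", object_tracking_search)
    if modality == "diagram_pdf":
        return Plan("colqwen_late_interaction", colqwen_search)
    return Plan("semantic_search", semantic_search)

plan = plan_query("Where does the forklift go after the loading dock?", modality="video")
print(plan.tool, plan.run("forklift path")[0])
```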
Let me know if something is unclear here!