Show HN: Musoq – Query Anything with SQL Syntax (Git, C#, CSV, CAN DBC)

github.com

56 points by Puchaczov 7 days ago

Hey! For those of you who don't know it, Musoq is my little tool that lets you query things with SQL-like syntax without any database.

It can query all sorts of things: niche ones like CAN DBC files, weird ones like C# code, interesting ones like Git repositories, and regular stuff like CSV, TSV and various others.

I experiment quite a bit, so I'm hybridizing the engine with LLMs and doing other weird stuff that is more or less practical :-)

I also wanted to share some recent developments in this little project, as I hope they might be interesting to some of you.

New Experimental Plugins:

* Git Plugin (Beta): I've been working on Git repository querying. I managed to test it on the EF Core repo (16k commits) and it seems to work okay

* Roslyn Plugin (Beta): Added basic C# code analysis capabilities

For the very first time: I've extended CROSS APPLY to use computed results as arguments! Now the operator can use values from the current row as inputs. Here's an example:

  SELECT
    f.DirectoryName,
    f.FileName
  FROM #os.directories('/some/path', false) d
  CROSS APPLY #os.files(d.FullName, true) f
  WHERE d.Name IN ('Folder1', 'Folder2')
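
For readers less familiar with CROSS APPLY, the semantics described above can be sketched roughly in Python: the right-hand side is evaluated once per row of the left-hand side, with that row's values as arguments. This is only an illustration, not how Musoq implements it; `cross_apply`, the toy `directories` list and the `files` function are made-up stand-ins for `#os.directories` and `#os.files`.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, str]


def cross_apply(
    outer: Iterable[Row], table_fn: Callable[[Row], Iterable[Row]]
) -> Iterator[Tuple[Row, Row]]:
    """Evaluate table_fn once per outer row and pair that row with each result."""
    for outer_row in outer:
        for inner_row in table_fn(outer_row):
            yield outer_row, inner_row


# Toy stand-in for #os.directories('/some/path', false)
directories: List[Row] = [
    {"Name": "Folder1", "FullName": "/some/path/Folder1"},
    {"Name": "Folder2", "FullName": "/some/path/Folder2"},
    {"Name": "Other", "FullName": "/some/path/Other"},
]

# Toy stand-in for the file system behind #os.files(d.FullName, true)
FILES_BY_DIRECTORY: Dict[str, List[str]] = {
    "/some/path/Folder1": ["a.txt", "b.txt"],
    "/some/path/Folder2": [],
    "/some/path/Other": ["c.txt"],
}


def files(directory: Row) -> List[Row]:
    """Return one row per file in the given directory row."""
    full_name = directory["FullName"]
    return [
        {"DirectoryName": full_name, "FileName": name}
        for name in FILES_BY_DIRECTORY[full_name]
    ]


# SELECT f.DirectoryName, f.FileName ... WHERE d.Name IN ('Folder1', 'Folder2')
result = [
    (f["DirectoryName"], f["FileName"])
    for d, f in cross_apply(directories, files)
    if d["Name"] in ("Folder1", "Folder2")
]
print(result)
```

In this toy data Folder2 has no files, so it contributes no rows. Under the usual SQL semantics, OUTER APPLY would keep such a row with empty right-hand columns.
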

After another batch of fixes, I'm finally able to query multiple Git repositories AT ONCE!

  with ProjectsToAnalyze as (
    select
        dir2.FullName as FullName
    from #os.directories('D:\repos', false) dir1
    cross apply #os.directories(dir1.FullName, false) dir2
    where
        dir2.Name = '.git'
  )
  select
    c.Message,
    c.Author,
    c.CommittedWhen
  from ProjectsToAnalyze p cross apply #git.repository(p.FullName) r 
  cross apply r.Commits c
  where c.AuthorEmail = 'my-email@email.ok'
  order by c.CommittedWhen desc

Under the Hood:

- Added a Buckets feature for memory management (currently just testing it with the Roslyn plugin)

- Moved to .NET 8

- Added CROSS/OUTER APPLY operators

- Made some improvements to error messages and runtime behavior

New piping features: I've been experimenting with piping capabilities:

* Image Analysis with LLMs:

  ./Musoq.exe image encode "image.jpg" | ./Musoq.exe run query "select s.Shop, s.ProductName, s.Price from ..."

* Text Data Extraction:

  Get-Content "ticket.txt" | ./Musoq.exe run query "select t.TicketNumber, t.CustomerName ... from #stdin.text('Ollama', 'llama3.1') t"

* Data Source Combination:

  { docker image ls; ./Musoq.exe separator; docker container ls } | ./Musoq.exe run query "..."

I'm working on comprehensive documentation. I especially encourage you to look at the sections "Practical Examples and Applications" and "Data Sources", where you can see all the tables the tool currently provides: <https://puchaczov.github.io/Musoq/>

Other Changes:

- Made some improvements to OS and Archive data sources (OS can now query metadata like EXIF)

- Added a few fields to CAN DBC plugin

- Command outputs can now be used as inputs for queries

I'm hoping to:

- Improve stability and add more tests

- Flesh out the documentation

- Work on package distribution (Scoop, Ubuntu packages)

- Share some examples of source code querying with Roslyn

Ideas for later:

- Robust WHERE analysis and optimizations

- DISTINCT operator implementation

- PROTOBUF schema support

- Performance improvements

- Query parallelization

- Recursive CTEs

- Subqueries

I'd really appreciate any thoughts or feedback!

The documentation section where I wrote a short analysis of EF Core with the Git plugin: <https://puchaczov.github.io/Musoq/practical-examples-and-app...>

lathiat 6 days ago

This looks awesome. I work in support for a wide array of Linux apps, often with data dumps from customers whose systems I have no access to, and I also write or backport bug fixes for all sorts of random software, so I often want to do this kind of crazy stuff with exactly these kinds of artefacts.

  • Puchaczov 3 days ago

    Thanks! Would love to hear about your specific support scenarios :)

snthpy 6 days ago

Very cool!

How does this interface with the different tools and how would one add another tool for it to operate on?

I started on something similar last year which was just a simple bash script to interact with things like osquery. Alas it was too buggy for what I wanted to do and it's paused indefinitely for now.

  • Puchaczov 6 days ago

    You can query with --format [raw|csv|json|interpreted_json]

    This will output pure JSON or CSV. This way you can use other tools like jq, grep, csvtoolkit or whatever you need to further process your data.

    To dig deeper, just look at: https://github.com/Puchaczov/Musoq.CLI
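As an illustration of that kind of post-processing, here is a small Python filter that could sit at the end of such a pipe. It assumes the json format is an array of objects keyed by column name with ISO 8601 timestamps, which the thread does not confirm; the column names are borrowed from the git example in the post, and `latest_per_author` and the sample data are made up.

```python
import json
from typing import Dict, List

# Assumed shape of the json output: one object per result row.
SAMPLE_OUTPUT = """
[
  {"Message": "Fix parser", "Author": "alice", "CommittedWhen": "2024-05-02T10:00:00"},
  {"Message": "Add tests", "Author": "bob", "CommittedWhen": "2024-05-01T09:30:00"},
  {"Message": "Initial commit", "Author": "alice", "CommittedWhen": "2024-04-30T08:00:00"}
]
"""


def latest_per_author(rows: List[Dict[str, str]]) -> Dict[str, str]:
    """Map each Author to the Message of their most recent commit."""
    latest: Dict[str, Dict[str, str]] = {}
    for row in rows:
        seen = latest.get(row["Author"])
        # ISO 8601 timestamps in the same format compare correctly as strings.
        if seen is None or row["CommittedWhen"] > seen["CommittedWhen"]:
            latest[row["Author"]] = row
    return {author: row["Message"] for author, row in latest.items()}


summary = latest_per_author(json.loads(SAMPLE_OUTPUT))
for author, message in summary.items():
    print(f"{author}: {message}")
```

In a real pipe, `SAMPLE_OUTPUT` would be replaced with the text read from standard input.
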

    After some thought: you could have also asked whether it's possible to add new data sources that you need to query, and the answer is of course yes! It's actually quite simple and there are many examples. Each data source tool is just a plugin implementing the appropriate interfaces. You can look at some example projects and see how they implement their data source here: https://github.com/Puchaczov/Musoq.DataSources
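To give a feel for what "a plugin implementing the appropriate interfaces" can mean, here is a deliberately simplified Python sketch. The real plugins are written in C# against the interfaces in the Musoq.DataSources repository, and those look different; `DataSource`, `IssuesSource` and `run_query` are hypothetical names, and the issue rows are hard-coded rather than fetched from any API.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Iterator, List, Tuple

Row = Dict[str, Any]


class DataSource(ABC):
    """Hypothetical plugin contract: declare columns, then yield rows."""

    @property
    @abstractmethod
    def columns(self) -> List[str]:
        ...

    @abstractmethod
    def rows(self) -> Iterator[Row]:
        ...


class IssuesSource(DataSource):
    """Toy source; a real one would call an issue tracker's API instead."""

    def __init__(self, issues: List[Row]) -> None:
        self._issues = issues

    @property
    def columns(self) -> List[str]:
        return ["Number", "Title", "State"]

    def rows(self) -> Iterator[Row]:
        for issue in self._issues:
            # Expose only the declared columns to the engine.
            yield {name: issue[name] for name in self.columns}


def run_query(
    source: DataSource, select: List[str], where: Callable[[Row], bool]
) -> List[Tuple[Any, ...]]:
    """Stand-in for the engine: filter rows, then project the selected columns."""
    unknown = set(select) - set(source.columns)
    if unknown:
        raise KeyError(f"unknown columns: {sorted(unknown)}")
    return [tuple(row[c] for c in select) for row in source.rows() if where(row)]


issues = IssuesSource([
    {"Number": 1, "Title": "Crash on empty file", "State": "open", "Internal": "x"},
    {"Number": 2, "Title": "Add TSV support", "State": "closed", "Internal": "y"},
])
open_issues = run_query(issues, ["Number", "Title"], lambda r: r["State"] == "open")
print(open_issues)
```

The point of the split is that the source only knows how to produce rows, while filtering and projection stay with the engine.
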

    • snthpy 6 days ago

      Thank you for your reply.

      Yes, I was asking about new data sources. For example, if I wanted to add GitHub so I could query my GH issues with Musoq, how would I do that?

      Great, I'll check out the links!

cryptoalex 6 days ago

Hey, I like your project, earned a star from me! When time allows, I will take it for a test drive to see how exactly it works with Roslyn/C# data. My C# solution has grown to about 80 projects, so it would be a good test.

  • Puchaczov 6 days ago

    Thanks! What kind of usage do you have in mind? So far I've been using the Roslyn plugin mainly on the engine repository itself, because it helps me extract queries from tests. I use it in combination with GPT, which helps me create documentation. In the longer term, I would like this plugin to help me refactor some problematic code fragments, but for that I will need to further develop the Roslyn integration itself.

re_spond 6 days ago

Where would you place this between osquery and steampipe? It seems to borrow concepts from both sides, but I'm not sure why it couldn't be a plugin for either.

  • Puchaczov 6 days ago

    I see it this way: while osquery focuses on OS-level queries and steampipe on cloud services, Musoq is more developer-centric. I use it like a Swiss Army knife for various things, something like sed or grep. You can of course create plugins that cover what the mentioned tools do, or even use those tools as plugins for Musoq, but in general I'm not going to compete with any of them. I'm filling my own niche: a flexible developer tool.

    • re_spond 6 days ago

      That helps, thanks! It is all about the framing. I love these kinds of tools and have been using a combination of them together with nushell, but it is a road less travelled it seems. All the more reason to evangelise your tool :) keep up the good work.

johnthescott 5 days ago

In PostgreSQL, the syntax for data types is stored in the database. How would this tool parse GIS expressions, for example?

  • Puchaczov 5 days ago

    You actually don't need any special type system or complex infrastructure for this. Each data source can handle its own data representation and operations. For GIS data, you could create a plugin that naturally handles spatial operations. Here's how it could work:

    select Id, s.DistanceBetween(s.FromPoint(-73.935242, 40.730610)) as Distance, s.IsBetween(s.FromPoint(-73.935242, 40.730610)) as IsInArea from #gis.shapefile('map.shp') s

    You could even combine it with other data sources. For example, if you have geometry data in a CSV:

    select sfg.DistanceBetween(sfg.FromPoint(..., ...)) from #csv.file(...) c cross apply #gis.ShapeFromGeometry(c.Geometry) sfg
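As a concrete example of what a function like DistanceBetween in such a plugin might compute, here is a great-circle (haversine) distance in Python. Both the choice of haversine and the (longitude, latitude) argument order are assumptions, since the hypothetical queries above specify neither, and `distance_between_km` is a made-up name.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def distance_between_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance between two (longitude, latitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    # Clamp to 1.0 to guard against rounding error for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# A quarter of the way around the equator
quarter_equator = distance_between_km(0.0, 0.0, 90.0, 0.0)
print(round(quarter_equator, 1))
```

A real GIS plugin would more likely delegate to a geometry library and respect the shapefile's coordinate reference system; this only shows the kind of per-row computation involved.
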

cachvico 6 days ago

Ironically I think I'd rather query a database with anything other than SQL ¯\_(ツ)_/¯

  • Puchaczov 6 days ago

    Do you have something specific in mind? I would love to hear it! Unexpected approaches are what usually drive me.

  • condwanaland 4 days ago

    Totally agree. PySpark, dplyr, or polars any day