Open-source tool puts machine learning dataset analysis at data scientists’ fingertips

Comet released Kangas, an open-source smart data exploration, analysis and model debugging tool for machine learning.


Kangas, available on GitHub, helps users understand and debug their data in a new and highly intuitive way. With Kangas, visualizations are generated in real time, enabling ML practitioners to group, sort, filter, query and interpret their structured and unstructured data to derive meaningful information and accelerate model development.

Data scientists often need to analyze datasets during both data preparation and model training, which can be overwhelming and time-consuming, especially with large-scale datasets. Kangas makes it possible to explore, debug and analyze data intuitively and in real time, so users can gain insights quickly and make better, faster decisions.

“Machine learning will have a profound impact on every sector, and our goal is to provide powerful tools that deliver true value to the user. Kangas specifically applies Comet’s deep experience in model debugging to analyze your data in new ways. Now the entire ML community can participate in pushing boundaries forward, particularly in terms of debugging, analysis and exploration of training data. There is a lot of interesting work being done, and we believe that Kangas will become an indispensable part in driving the future of ML,” Gideon Mendels, CEO of Comet, told Help Net Security.

Kangas benefits

  • Scalability: The tool was developed to handle large datasets with high performance.
  • Purpose-built: Computer vision and ML concepts like scoring, bounding boxes and more are supported out of the box, and statistics and charts are generated automatically.
  • Support for different forms of media: Kangas is not limited to traditional text queries. It also supports images, videos and more.
  • Interoperability: Kangas can run in a notebook, as a standalone local app or even deployed as a web app. It ingests data in a simple format that makes it easy to work with whatever tooling data scientists already use.
  • Open source: Kangas is 100% open source and is built by and for the ML community.


Kangas was designed for the entire community, to be embraced by students, researchers and enterprises alike. As individuals and teams advance their ML initiatives, they will be able to take full advantage of Kangas. Because it is open source, anyone can contribute to it and enhance it further.

“Interoperability and flexibility are inherent in Comet’s value proposition, and Comet aims to expand on that value through open source contributions,” added Mendels. “Kangas is a continuation of all of our efforts, and we couldn’t wait to get its capabilities into the hands of as many data scientists, data engineers and ML engineers as possible. We believe by open sourcing it, Comet can help teams get the most out of their ML projects in ways that have not been possible previously.”
