Welcome To Datagrunt

Streamline your CSV workflows with intelligent delimiter inference, multiple processing engines, and AI-powered analysis.

Datagrunt is a Python library designed to simplify the way you work with CSV files. It provides a streamlined approach to reading, processing, and transforming your data into various formats, making data manipulation efficient and intuitive.


Why Datagrunt?

💡
Born out of real-world frustration, Datagrunt eliminates the need for repetitive coding when handling CSV files. Whether you’re a data analyst, data engineer, or data scientist, Datagrunt empowers you to focus on insights, not tedious data wrangling.

What Datagrunt Is Not

Datagrunt is not an extension of or a replacement for DuckDB or Polars, nor is it a comprehensive data processing solution or a one-stop shop for all of your CSV processing needs. Instead, it’s designed to simplify the way you work with CSV files and to solve the pain point of inferring delimiters when a file’s structure is unknown.

Key Features


Powertools Under The Hood


Engine Comparison

Feature        | Polars                    | DuckDB                       | PyArrow
---------------+---------------------------+------------------------------+-----------------------------
Best for       | DataFrame operations      | SQL queries & analytics      | Arrow ecosystem integration
Performance    | Fast in-memory processing | Excellent for large datasets | Optimized columnar operations
Default for    | CSVReader                 | CSVWriter                    | -
Export Quality | Good                      | Excellent (especially JSON)  | Native Parquet support

Datagrunt’s Role

📝

Datagrunt’s Primary Functions:

  1. Accurately inferring CSV delimiters
  2. Providing helper methods for common data tasks
  3. Facilitating CSV file loading into Polars dataframes
  4. Enabling conversion to various output formats
  5. Generating AI-powered schema reports
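
To make the first and fourth functions concrete, here is a minimal sketch of what delimiter inference followed by a format conversion can look like. It does not use Datagrunt's API and is not how Datagrunt infers delimiters: it relies only on Python's standard library (`csv.Sniffer`, `csv.DictReader`, `json`) as a stand-in. The helper names `infer_delimiter` and `csv_text_to_records`, the candidate delimiter set, and the sample data are all hypothetical and chosen for illustration.

```python
import csv
import io
import json


def infer_delimiter(sample: str, candidates: str = ",;|\t") -> str:
    """Guess the delimiter of a CSV text sample.

    Stand-in technique: the standard library's csv.Sniffer, restricted
    to a small set of candidate delimiters (an assumption made here).
    Raises csv.Error if no delimiter can be determined.
    """
    dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    return dialect.delimiter


def csv_text_to_records(text: str) -> list[dict[str, str]]:
    """Parse CSV text into a list of row dictionaries.

    The delimiter is inferred from the first 4096 characters, then the
    whole text is parsed with csv.DictReader using that delimiter.
    """
    delimiter = infer_delimiter(text[:4096])
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)


# Hypothetical pipe-delimited input whose structure is "unknown" up front.
sample = "id|name|city\n1|Ada|London\n2|Grace|New York\n3|Linus|Helsinki\n"

records = csv_text_to_records(sample)

# Convert the parsed rows to JSON, one possible output format.
print(json.dumps(records, indent=2))
```

The sniffing step is a heuristic and can fail or guess wrong on short or irregular samples, which is the pain point a dedicated inference step is meant to address.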

Flexibility and Integration


License

This project is licensed under the MIT License.

Acknowledgements

A HUGE thank you to the open source community and the creators of DuckDB, Polars, and PyArrow for their fantastic libraries that power Datagrunt.