One Dataset, Many Formats: DeepFabric’s Approach to Training Format Flexibility

The Format Problem in ML Training

Synthetic dataset generation for fine-tuning language models often leads to a format lock-in problem. After investing significant time generating high-quality synthetic data, teams discover their dataset is bound to a single training format. Experimenting with TRL’s SFTTrainer requires one specific structure. Switching to GRPO for mathematical reasoning demands another. Testing OpenAI’s Harmony format for reasoning-focused models needs yet another transformation.

Teams typically respond by regenerating datasets for each framework, writing custom conversion scripts that are brittle and hard to maintain, limiting themselves to a single framework, or maintaining multiple versions of the same dataset with associated storage and versioning complexity.

The DeepFabric Approach

DeepFabric addresses this through a format-agnostic workflow built on three principles. First, all datasets are generated in a universal storage format using the OpenAI messages standard in JSONL. Second, HuggingFace Hub integration enables sharing datasets once in their generic format. Third, on-demand formatting allows pulling and reformatting datasets for any training framework at runtime.

This architecture means datasets are generated once and uploaded to HuggingFace in a generic format where they can be shared with teams or the broader community. When needed, the dataset can be formatted for any training pipeline without regeneration, enabling experimentation with different frameworks using the same source data.

The Complete Workflow

Step 1: Generate Your Dataset with Chain-of-Thought Reasoning and Custom Tools

Chain-of-thought (CoT) reasoning in tool-calling datasets teaches models to think through problems step-by-step before invoking functions. This approach significantly improves structured output quality in tool and MCP (Model Context Protocol) calling scenarios. When models explicitly reason about which tool to use, what parameters to provide, and why that tool is appropriate, they produce more accurate and contextually appropriate function calls.

DeepFabric supports custom tool definitions, allowing you to specify domain-specific functions that your model should learn to use. Let’s create a configuration file that defines custom financial analysis tools and generates a dataset with chain-of-thought reasoning:

# financial_cot_config.yaml
dataset_system_prompt: "You are an expert financial analyst. When responding, first explain your reasoning step-by-step, then call the appropriate financial analysis tools with correct parameters."

topic_tree:
  topic_prompt: "Financial analysis and portfolio management with quantitative tools"
  topic_system_prompt: "You are an expert financial analyst creating comprehensive financial analysis scenarios."
  provider: "openai"
  model: "gpt-4o"
  degree: 4
  depth: 3
  temperature: 0.7
  save_as: "financial_topics.jsonl"

data_engine:
  instructions: "Generate realistic financial analysis scenarios with step-by-step reasoning before tool calls"
  generation_system_prompt: "You are an expert financial analyst. When responding, first explain your reasoning step-by-step, then call the appropriate financial analysis tools with correct parameters."
  provider: "openai"
  model: "gpt-4o"
  temperature: 0.8
  max_retries: 3
  conversation_template: "cot_structured"

  # Define custom financial analysis tools
  custom_tools:
    - name: "get_portfolio_performance"
      description: "Retrieves historical performance data for a portfolio including returns, holdings, and time series data"
      parameters:
        type: "object"
        properties:
          portfolio_id:
            type: "string"
            description: "Unique identifier for the portfolio"
          start_date:
            type: "string"
            description: "Start date in YYYY-MM-DD format"
          end_date:
            type: "string"
            description: "End date in YYYY-MM-DD format"
        required: ["portfolio_id", "start_date", "end_date"]

    - name: "calculate_sharpe_ratio"
      description: "Calculates the Sharpe ratio for risk-adjusted return analysis"
      parameters:
        type: "object"
        properties:
          returns:
            type: "array"
            items:
              type: "number"
            description: "Array of periodic returns"
          risk_free_rate:
            type: "number"
            description: "Annual risk-free rate as a decimal"
        required: ["returns", "risk_free_rate"]

    - name: "analyze_sector_exposure"
      description: "Analyzes portfolio exposure across different market sectors"
      parameters:
        type: "object"
        properties:
          portfolio_id:
            type: "string"
            description: "Unique identifier for the portfolio"
          date:
            type: "string"
            description: "Analysis date in YYYY-MM-DD format"
        required: ["portfolio_id", "date"]

dataset:
  creation:
    num_steps: 200
    batch_size: 4
    sys_msg: true
  save_as: "financial_cot_dataset.jsonl"

# Automatically upload to HuggingFace Hub
huggingface:
  repository: "your-org/financial-reasoning-cot"
  tags:
    - "finance"
    - "reasoning"
    - "chain-of-thought"
    - "tool-calling"
    - "synthetic"

Generate the dataset and upload it to HuggingFace in one step:

export HF_TOKEN="your-hf-token"
deepfabric generate financial_cot_config.yaml

The generated dataset contains examples where the model reasons through financial problems before making tool calls. This generic format includes the reasoning chain, tool definitions, and the conversation structure:

{
  "messages": [
    {
      "role": "user",
      "content": "I need to analyze the risk-adjusted return of my portfolio over the last year"
    },
    {
      "role": "assistant",
      "content": "Let me work through this systematically. To calculate risk-adjusted returns, I need three key metrics: the portfolio's actual return, the risk-free rate for comparison, and the portfolio's volatility (standard deviation). The Sharpe ratio is the standard metric for this analysis, calculated as (portfolio return - risk-free rate) / portfolio standard deviation.nnFirst, I should retrieve your portfolio's historical performance data for the past year. Then I'll need to fetch the current risk-free rate, typically using the 1-year Treasury yield. Finally, I'll calculate the standard deviation of returns and compute the Sharpe ratio.nnLet me start by getting your portfolio performance data for the specified period."
    }
  ],
  "available_tools": [
    {
      "name": "get_portfolio_performance",
      "description": "Retrieves historical performance data for a portfolio including returns, holdings, and time series data",
      "parameters": {
        "type": "object",
        "properties": {
          "portfolio_id": {"type": "string", "description": "Unique identifier for the portfolio"},
          "start_date": {"type": "string", "description": "Start date in YYYY-MM-DD format"},
          "end_date": {"type": "string", "description": "End date in YYYY-MM-DD format"}
        },
        "required": ["portfolio_id", "start_date", "end_date"]
      }
    },
    {
      "name": "calculate_sharpe_ratio",
      "description": "Calculates the Sharpe ratio for risk-adjusted return analysis",
      "parameters": {
        "type": "object",
        "properties": {
          "returns": {"type": "array", "items": {"type": "number"}, "description": "Array of periodic returns"},
          "risk_free_rate": {"type": "number", "description": "Annual risk-free rate as a decimal"}
        },
        "required": ["returns", "risk_free_rate"]
      }
    }
  ]
}

This format is framework-agnostic. It contains all necessary information including the chain-of-thought reasoning, tool schemas, and conversation flow, but isn’t locked to any specific training library. The reasoning component teaches the model to think through domain-specific problems, improving both the quality of tool selection and parameter accuracy in production use.

DeepFabric automatically handles the HuggingFace upload, repository creation, dataset card generation with appropriate metadata, and tag application for discoverability. The key insight is that you’re uploading the generic format, not a training-specific format. This single upload serves all downstream use cases across different training frameworks.

Step 2: Pull and Format for Your Training Pipeline

Anyone with access to the dataset can now pull it and format it for their specific training framework. The same source data can be reformatted multiple times for different purposes without regeneration.
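
Before committing to a formatter, the generic dataset can be pulled and inspected directly with the datasets library. A minimal sketch, using the placeholder repository name from this post:

from datasets import load_dataset

# Pull the generic-format dataset straight from the Hub for inspection.
# The repository name is the placeholder used throughout this post.
dataset = load_dataset("your-org/financial-reasoning-cot", split="train")

# Each record follows the OpenAI messages standard, plus the tool schemas
# attached under available_tools.
example = dataset[0]
print(example["messages"][0]["role"])    # "user"
print(len(example["available_tools"]))   # number of tool schemas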

Training with TRL SFTTrainer for Tool Calling

TRL’s SFTTrainer expects tools in an OpenAI-compatible schema with explicit function definitions. The chain-of-thought reasoning is preserved in the assistant’s response, teaching the model to think through the problem before tool invocation:

import subprocess
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Format the dataset for TRL
subprocess.run([
    "deepfabric", "format",
    "--repo", "your-org/financial-reasoning-cot",
    "--formatter", "trl",
    "-o", "training_trl.jsonl"
])

# Load the formatted dataset
dataset = load_dataset("json", data_files="training_trl.jsonl")

# Train with TRL - the reasoning helps the model learn better tool selection
config = SFTConfig(output_dir="./output")
trainer = SFTTrainer(
    model=model,
    args=config,
    train_dataset=dataset["train"],
)
trainer.train()

This converts the generic dataset to TRL’s specific structure:

{
  "messages": [
    {
      "role": "user",
      "content": "I need to analyze the risk-adjusted return of my portfolio over the last year"
    },
    {
      "role": "assistant",
      "content": "Let me work through this systematically. To calculate risk-adjusted returns, I need three key metrics: the portfolio's actual return..."
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_portfolio_performance",
        "description": "Retrieves historical performance data for a portfolio including returns, holdings, and time series data",
        "parameters": {
          "type": "object",
          "properties": {
            "portfolio_id": {"type": "string", "description": "Unique identifier for the portfolio"},
            "start_date": {"type": "string", "description": "Start date in YYYY-MM-DD format"},
            "end_date": {"type": "string", "description": "End date in YYYY-MM-DD format"}
          },
          "required": ["portfolio_id", "start_date", "end_date"]
        }
      }
    }
  ]
}
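
To confirm the formatted records render cleanly, you can push one through a tokenizer chat template. This is a hedged sketch: the tools argument of apply_chat_template requires a chat template with tool support, and the model shown here (reused from the training example below) is assumed to provide one:

import json
from transformers import AutoTokenizer

# Render one formatted record through the model's chat template as a sanity
# check. The tools= argument needs a template with tool support; swap in
# your own model if its template differs.
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")

with open("training_trl.jsonl") as f:
    record = json.loads(f.readline())

rendered = tokenizer.apply_chat_template(
    record["messages"],
    tools=record["tools"],
    tokenize=False,
)
print(rendered)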

Training with GRPO for Mathematical Reasoning

GRPO (Group Relative Policy Optimization) requires explicit reasoning tags to separate the working-out from the final solution. This format works particularly well with the chain-of-thought data:

deepfabric format \
  --repo your-org/financial-reasoning-cot \
  --formatter grpo \
  -o training_grpo.jsonl

The formatter wraps reasoning in tags that GRPO uses for reward modeling:

{
  "messages": [
    {"role": "system", "content": "You are a financial analysis assistant that shows your reasoning before providing solutions."},
    {"role": "user", "content": "I need to analyze the risk-adjusted return of my portfolio over the last year"},
    {
      "role": "assistant",
      "content": "<start_working_out>To calculate risk-adjusted returns, I need three key metrics: the portfolio's actual return, the risk-free rate for comparison, and the portfolio's volatility. The Sharpe ratio is calculated as (portfolio return - risk-free rate) / portfolio standard deviation. I should retrieve the portfolio's historical performance data, fetch the current risk-free rate using the 1-year Treasury yield, calculate the standard deviation of returns, then compute the Sharpe ratio.<end_working_out><SOLUTION>Retrieve portfolio performance data using get_portfolio_performance, then calculate Sharpe ratio with the returns and current risk-free rate.</SOLUTION>"
    }
  ]
}

The explicit separation of reasoning from solution helps GRPO training optimize for both correct thinking processes and accurate final answers.
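
For example, a reward function in the style TRL's GRPOTrainer accepts (one score per completion) can check that both tags are present and extractable. The scoring rule below is purely illustrative and not part of DeepFabric:

import re

# Illustrative reward function: reward completions that keep their reasoning
# and final answer inside the tags emitted by the GRPO formatter above.
WORKING_RE = re.compile(r"<start_working_out>(.*?)<end_working_out>", re.DOTALL)
SOLUTION_RE = re.compile(r"<SOLUTION>(.*?)</SOLUTION>", re.DOTALL)

def format_reward(completions, **kwargs):
    scores = []
    for completion in completions:
        # Completions may be plain strings or message lists depending on setup.
        text = completion if isinstance(completion, str) else completion[0]["content"]
        score = 0.0
        if WORKING_RE.search(text):
            score += 0.5  # reasoning present and well-formed
        if SOLUTION_RE.search(text):
            score += 0.5  # final answer present and extractable
        scores.append(score)
    return scores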

Training with Harmony for OpenAI Reasoning Models

OpenAI’s Harmony format (used in gpt-oss and reasoning-focused models) uses specific XML-style tags for reasoning chains:

deepfabric format \
  --repo your-org/financial-reasoning-cot \
  --formatter harmony \
  -o training_harmony.jsonl

Output structure:

{
  "messages": [
    {"role": "user", "content": "I need to analyze the risk-adjusted return of my portfolio over the last year"},
    {
      "role": "assistant",
      "content": "<reasoning>nTo calculate risk-adjusted returns, I need three key metrics: portfolio return, risk-free rate, and volatility. The Sharpe ratio is the standard metric: (return - risk_free_rate) / std_dev.nnI need to:n1. Retrieve portfolio historical datan2. Get current risk-free raten3. Calculate standard deviationn4. Compute Sharpe ration</reasoning>n<output>I'll retrieve your portfolio performance data for the past year and calculate the Sharpe ratio for risk-adjusted return analysis.</output>"
    }
  ]
}
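
At evaluation or inference time it is often useful to split the reasoning chain from the user-facing answer. A small illustrative helper for the tag structure shown above (not part of DeepFabric):

import re

def split_harmony(text: str) -> tuple[str, str]:
    """Separate the <reasoning> block from the <output> block in a completion."""
    reasoning = re.search(r"<reasoning>(.*?)</reasoning>", text, re.DOTALL)
    output = re.search(r"<output>(.*?)</output>", text, re.DOTALL)
    return (
        reasoning.group(1).strip() if reasoning else "",
        output.group(1).strip() if output else text.strip(),
    )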

Training with ChatML for Reasoning-Capable Chat Models

For models that expect ChatML delimiters with preserved reasoning:

deepfabric format \
  --repo your-org/financial-reasoning-cot \
  --formatter im_format \
  -o training_chatml.jsonl

The formatter wraps everything in ChatML tags while maintaining the reasoning flow:

{
  "text": "<|im_start|>usernI need to analyze the risk-adjusted return of my portfolio over the last year<|im_end|>n<|im_start|>assistantnLet me work through this systematically. To calculate risk-adjusted returns, I need three key metrics: the portfolio's actual return, the risk-free rate for comparison, and the portfolio's volatility...<|im_end|>"
}
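
Because each im_format record is a single text field, it can be tokenized directly for standard causal-LM fine-tuning. A minimal sketch; the tokenizer choice mirrors the training example below and the sequence length is an assumption:

from datasets import load_dataset
from transformers import AutoTokenizer

# The im_format output is one ChatML-wrapped string per record, so plain
# tokenization is enough for causal-LM fine-tuning.
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")
dataset = load_dataset("json", data_files="training_chatml.jsonl")

tokenized = dataset["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=2048),
    batched=True,
    remove_columns=["text"],
)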

Why Chain-of-Thought Improves Tool Calling with Custom Tools

The chain-of-thought approach combined with custom tool definitions provides several concrete benefits for model training. Models trained with explicit reasoning learn to validate their tool selection before making calls, reducing errors where inappropriate functions are invoked. The reasoning chain provides context for parameter selection, leading to more accurate argument values. When models explain their approach, they’re more likely to catch edge cases and error conditions before execution.

Custom tools allow you to define domain-specific functions that match your production environment. For example, the financial analysis tools defined in the configuration above (get_portfolio_performance, calculate_sharpe_ratio, analyze_sector_exposure) teach the model to work with your specific API schema. When fine-tuning models like SmolLM2-1.7B-Instruct with PEFT/LoRA on this custom tool dataset, the model learns both the reasoning patterns and the exact tool signatures it will encounter in production.

Here’s a practical training example using the formatted dataset:

from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
import subprocess

# Format the dataset for TRL
subprocess.run([
    "deepfabric", "format",
    "--repo", "your-org/financial-reasoning-cot",
    "--formatter", "trl",
    "-o", "trl_sft_tools.jsonl"
])

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")

# Configure LoRA for efficient fine-tuning
peft_config = LoraConfig(
    r=8,
    lora_alpha=8,
    lora_dropout=0.1,
    target_modules=['down_proj', 'o_proj', 'k_proj', 'q_proj', 'gate_proj', 'up_proj', 'v_proj'],
    use_dora=True,
    init_lora_weights="gaussian"
)

# Apply PEFT model adaptation
peft_model = get_peft_model(model, peft_config)

# Load dataset
dataset = load_dataset("json", data_files="./trl_sft_tools.jsonl")

# Configure training
training_args = SFTConfig(
    output_dir="financial-reasoning-model",
    optim="adamw_torch_fused",
    bf16=True,
    push_to_hub=True,
    report_to="none"
)

# Train
trainer = SFTTrainer(
    model=peft_model,
    args=training_args,
    train_dataset=dataset["train"],
    processing_class=tokenizer,
)

trainer.train()

The resulting model learns to reason through financial problems using the exact custom tools you defined, making it production-ready for your specific use case.
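
A quick smoke test of the fine-tuned adapter might look like the following. This is a hedged sketch: it assumes the adapter was saved to the output directory from the training example (for instance via trainer.save_model()), and the prompt and generation settings are illustrative:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and attach the LoRA adapter saved during training.
base = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")
model = PeftModel.from_pretrained(base, "financial-reasoning-model")
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")

messages = [{"role": "user", "content": "How risky has my portfolio been this quarter?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

with torch.no_grad():
    output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))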

Multi-Format Conversion Script

Here’s a practical script demonstrating conversion of one chain-of-thought dataset to multiple formats:

#!/bin/bash
# multi-format-reasoning.sh

REPO="your-org/financial-reasoning-cot"
BASE_NAME="financial_reasoning"

echo "Converting reasoning dataset to multiple formats..."

# Format for TRL SFTTrainer with tool calling
echo "Formatting for TRL SFTTrainer..."
deepfabric format --repo $REPO --formatter trl -o "${BASE_NAME}_trl.jsonl"

# Format for GRPO reasoning training
echo "Formatting for GRPO..."
deepfabric format --repo $REPO --formatter grpo -o "${BASE_NAME}_grpo.jsonl"

# Format for Harmony (OpenAI reasoning models)
echo "Formatting for Harmony..."
deepfabric format --repo $REPO --formatter harmony -o "${BASE_NAME}_harmony.jsonl"

# Format for ChatML reasoning chat models
echo "Formatting for ChatML..."
deepfabric format --repo $REPO --formatter im_format -o "${BASE_NAME}_chatml.jsonl"

# Format for XLAM v2 multi-turn reasoning
echo "Formatting for XLAM v2..."
deepfabric format --repo $REPO --formatter xlam_v2 -o "${BASE_NAME}_xlam.jsonl"

echo "Conversion complete. Created 5 training-ready formats from a single dataset."

Running this script produces TRL-ready format with OpenAI-compatible tool schemas, GRPO format with explicit reasoning/solution separation, Harmony format with reasoning tags for OpenAI models, ChatML format with delimiter-wrapped reasoning, and XLAM v2 format for Salesforce’s multi-turn tool calling framework. All formats preserve the chain-of-thought reasoning that improves tool calling accuracy.
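
A quick way to verify the conversion is to confirm that each output file parses as JSONL and report its record count. A small illustrative check matching the file names used in the script:

import json

# Confirm each converted file parses as JSONL and report its record count.
for suffix in ["trl", "grpo", "harmony", "chatml", "xlam"]:
    path = f"financial_reasoning_{suffix}.jsonl"
    with open(path) as f:
        records = [json.loads(line) for line in f if line.strip()]
    print(f"{path}: {len(records)} records")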

Supported Formatters

DeepFabric currently supports formatters for major training frameworks:

Formatter      | Command Flag      | Use Case                        | Framework
TRL SFT Tools  | trl               | Tool/function calling           | HuggingFace TRL
GRPO           | grpo              | Mathematical/logical reasoning  | GRPO training
Harmony        | harmony           | Reasoning with XML tags         | OpenAI gpt-oss
ChatML         | chatml            | Chat models with structure      | ChatML-compatible
Im Format      | im_format         | Chat with delimiters            | ChatML variants
Alpaca         | alpaca            | Instruction following           | Stanford Alpaca
XLAM v2        | xlam_v2           | Multi-turn tool calling         | Salesforce xLAM
Tool Calling   | tool_calling      | Generic tool calling            | Agent training
Single Tool    | single_tool_call  | Individual function calls       | Function execution

Custom formatters can be created for specific needs using DeepFabric’s formatter API.

Advanced Configuration

Fine-grained control over formatting is available through YAML configuration files:

# formatter_config.yaml
dataset:
  formatters:
    - name: "trl_custom"
      template: "builtin://trl_sft_tools"
      output: "custom_trl.jsonl"
      config:
        include_system_prompt: true
        system_prompt_override: |
          You are a function calling AI model. You are provided with function
          signatures within <tools></tools> XML tags. Think through your
          reasoning before invoking tools.
        validate_tool_schemas: true
        remove_available_tools_field: true

Apply with:

deepfabric format --repo your-org/dataset -c formatter_config.yaml

Benefits of This Approach

Storage efficiency improves because teams store one canonical dataset and generate formats on-demand rather than maintaining multiple versions. Collaboration becomes simpler when datasets are shared once in a universal format but consumed in framework-specific formats, eliminating coordination overhead. Experimentation costs drop dramatically since trying different frameworks no longer requires expensive regeneration. Reproducibility improves with clear lineage from source to formatted data and version control at the dataset level. Cost savings accumulate because expensive synthetic data generation happens once while cheap reformatting happens as needed.

The approach is future-proof. When new training frameworks emerge, adding a formatter is straightforward. Datasets remain relevant as frameworks evolve without requiring regeneration as tools improve.

Community Impact

This architecture enables new collaboration patterns. Dataset creators can focus on data quality rather than format compatibility, reach wider audiences with a single upload, and receive feedback more easily. Dataset users find more usable datasets, can experiment with different frameworks without risk, and can contribute formatters back to the community. Research teams can compare approaches fairly using identical source data, reproduce results more reliably, and build on each other’s work more effectively.

Getting Started

Install DeepFabric:

pip install deepfabric

Create a configuration file with your custom tools and chain-of-thought settings:

# my_cot_config.yaml
dataset_system_prompt: "You are an expert software engineer. Explain your reasoning step-by-step before using debugging tools."

topic_tree:
  topic_prompt: "Software debugging with developer tools"
  provider: "openai"
  model: "gpt-4o"
  degree: 3
  depth: 3
  temperature: 0.7
  save_as: "debugging_topics.jsonl"

data_engine:
  instructions: "Generate realistic debugging scenarios with step-by-step reasoning"
  generation_system_prompt: "You are an expert software engineer. Explain your reasoning step-by-step before using debugging tools."
  provider: "openai"
  model: "gpt-4o"
  temperature: 0.8
  conversation_template: "cot_structured"

  custom_tools:
    - name: "run_debugger"
      description: "Executes a debugger command and returns the output"
      parameters:
        type: "object"
        properties:
          command:
            type: "string"
            description: "The debugger command to execute"
          breakpoint:
            type: "integer"
            description: "Line number for breakpoint"
        required: ["command"]

dataset:
  creation:
    num_steps: 50
    batch_size: 2
  save_as: "my_dataset.jsonl"

huggingface:
  repository: "your-username/debugging-reasoning"
  tags:
    - "software-engineering"
    - "reasoning"
    - "chain-of-thought"
    - "tools"

Generate and upload in one step:

export HF_TOKEN="your-token"
deepfabric generate my_cot_config.yaml

Format for your training pipeline:

deepfabric format \
  --repo your-username/debugging-reasoning \
  --formatter trl \
  -o training_data.jsonl

Then train using your preferred framework as shown in the training example above.

Working with Existing Datasets

You can format existing community datasets without generating new ones:

# Format a community dataset for TRL
deepfabric format \
  --repo lukehinds/smol-test-sample \
  --formatter trl \
  -o trl_training.jsonl

# Or for GRPO reasoning training
deepfabric format \
  --repo lukehinds/smol-test-sample \
  --formatter grpo \
  -o grpo_training.jsonl

Conclusion

DeepFabric’s format-agnostic architecture decouples data generation from training format requirements. Datasets are generated once in a universal format, shared through HuggingFace Hub, and reformatted on-demand for specific training frameworks. This approach reduces storage requirements, eliminates coordination overhead, enables cost-effective experimentation, and ensures datasets remain useful as the ecosystem evolves.

The integration of chain-of-thought reasoning with tool calling further improves this workflow by teaching models to reason through problems before invoking functions, resulting in more accurate tool selection and parameter generation in production environments.

Resources

Documentation: https://lukehinds.github.io/deepfabric/

GitHub: https://github.com/lukehinds/deepfabric

Discord Community: https://discord.gg/pPcjYzGvbS

Examples: https://github.com/lukehinds/deepfabric/tree/main/examples

DeepFabric is open source and welcomes contributions. To add support for a new training format, see the custom formatter guide.
