What is Data Capture?
Data Capture is a feature for retrieving the logged data of your interactions with le Chat and our various APIs, as stored on our servers.
📌 This logged data can serve various purposes, such as analyzing usage patterns, debugging issues, or creating datasets for fine-tuning machine learning models.
Creating an Extract Job
Creating an extract job is quite straightforward:
1. Navigate to Data Capture
Access the Data Capture section from the main navigation menu on the left side of the platform interface.
[Screenshot: Clicking Data Capture in the left-hand navigation menu]
2. Initiate a New Job
On the main Data Capture view, locate and click the New Extract Job button.
[Screenshot: Clicking the New Extract Job button on the Data Capture page]
3. Configure the Extract Job
A configuration modal window will appear. You need to specify the parameters for your data extraction:
Choose a Data Source: Select either API or le Chat, depending on the interactions you want to extract (1).
Select a Date Range: Specify the Start Date and End Date for the data extraction (2).
(Optional) Choose a Model: Select a specific model from the dropdown list (e.g., a base Mistral model or a fine-tuned model you own) to extract only the interactions with that model (3). If you leave this empty, all models are selected by default.
Click Create Job to submit your configuration.
🔑 The maximum duration for a single extract job is 31 days.
[Screenshot: Configuring the extract job: (1) Data Source, (2) Date Range, (3) optional Model Selection, before clicking the Create Job button]
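The 31-day maximum means that extracting a longer period requires several jobs. The sketch below plans those date ranges. It assumes the limit applies to the job's date range and counts both the start and end dates, a conservative reading whose windows also satisfy a limit on end date minus start date. The function name `split_date_range` is illustrative and not part of the platform.

```python
from __future__ import annotations

from datetime import date, timedelta

# Assumption: the 31-day limit applies to the job's date range, counted with
# both the start and end dates included. This conservative reading also keeps
# (end - start) within 31 days.
MAX_DAYS_PER_JOB = 31


def split_date_range(
    start: date, end: date, max_days: int = MAX_DAYS_PER_JOB
) -> list[tuple[date, date]]:
    """Split the inclusive range [start, end] into consecutive windows of at most max_days days."""
    if end < start:
        raise ValueError("end date must not be before start date")
    if max_days < 1:
        raise ValueError("max_days must be at least 1")
    windows: list[tuple[date, date]] = []
    window_start = start
    while window_start <= end:
        # Last day of this window, capped at the overall end date.
        window_end = min(window_start + timedelta(days=max_days - 1), end)
        windows.append((window_start, window_end))
        # The next window starts the following day, so no day is skipped or repeated.
        window_start = window_end + timedelta(days=1)
    return windows


if __name__ == "__main__":
    # One extract job per printed window.
    for job_start, job_end in split_date_range(date(2024, 1, 1), date(2024, 3, 31)):
        print(f"Start Date {job_start.isoformat()}  End Date {job_end.isoformat()}")
```

For 1 January to 31 March 2024, this yields three windows: 1 to 31 January, 1 February to 2 March (2024 is a leap year), and 3 to 31 March. You would then create one extract job per window.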
4. Monitor Job Completion and Access Details
After you create the job, it appears in the list on the Data Capture page with a status such as Pending, Running, Completed, or Failed.
Wait for the job's lifecycle status to become Completed. Once it has completed, click the job's ID in the list.
[Screenshot: Clicking on a completed job's ID in the extract job list]
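This step is described only through the web interface. If you track extract jobs in your own tooling, the same wait-then-proceed logic can be sketched as a polling loop. This is a hypothetical illustration: `get_status` stands in for however you obtain a job's status, the poll interval and timeout are arbitrary defaults, and treating Completed and Failed as the only final states is an assumption based on the example statuses listed above.

```python
import time
from enum import Enum
from typing import Callable


class JobStatus(Enum):
    """Example statuses shown in the Data Capture job list."""
    PENDING = "Pending"
    RUNNING = "Running"
    COMPLETED = "Completed"
    FAILED = "Failed"


# Assumption: Completed and Failed are final; the job no longer changes after them.
TERMINAL_STATUSES = {JobStatus.COMPLETED, JobStatus.FAILED}


def wait_for_job(
    get_status: Callable[[], JobStatus],
    poll_interval_s: float = 30.0,
    timeout_s: float = 3600.0,
) -> JobStatus:
    """Call get_status() repeatedly until the job reaches a final status.

    get_status is a hypothetical, caller-supplied function; this article
    does not describe a programmatic way to read the job status.
    Raises TimeoutError if no final status is seen within timeout_s seconds.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status.value} after {timeout_s} s")
        time.sleep(poll_interval_s)


if __name__ == "__main__":
    # Simulated status sequence standing in for a real status lookup.
    simulated = iter([JobStatus.PENDING, JobStatus.RUNNING, JobStatus.COMPLETED])
    final = wait_for_job(lambda: next(simulated), poll_interval_s=0)
    print(f"Job finished with status {final.value}")
```

Only a Completed job has output files to download, so a caller would check for `JobStatus.COMPLETED` before moving on to the details page.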
5. Review Job Details and Download Output
Clicking the job ID opens the specific Data Capture Extract Job details page. This page displays:
A summary of the job configuration (status, data source, date range, model used).
Lifecycle timestamps (creation time, completion time).
One or more links to download the output file(s) in .jsonl format. These files contain the extracted log data based on your configuration.
[Screenshot: Capture overview and output files to download]
You can now download the generated files for your analysis or fine-tuning workflows.
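Because each output file uses the JSON Lines format (one JSON value per line), a few lines of Python are enough to load it before further processing. The sketch below is a minimal, generic reader that makes no assumption about the fields inside each record. The helper names `iter_jsonl` and `count_records` and the script name in the usage comment are illustrative, not part of any Mistral tool.

```python
from __future__ import annotations

import json
import sys
from pathlib import Path
from typing import Any, Iterator


def iter_jsonl(path: str | Path) -> Iterator[Any]:
    """Yield one parsed JSON value per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines, such as a trailing empty line
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Report the file and line number to make broken records easy to find.
                raise ValueError(f"{path}:{line_no}: invalid JSON ({exc.msg})") from exc


def count_records(paths: list[str | Path]) -> dict[str, int]:
    """Return the number of records in each downloaded output file."""
    return {str(p): sum(1 for _ in iter_jsonl(p)) for p in paths}


if __name__ == "__main__":
    # Usage: python count_records.py output_1.jsonl output_2.jsonl ...
    for name, n in count_records(sys.argv[1:]).items():
        print(f"{name}: {n} records")
```

Reading lazily with a generator keeps memory use flat even for large extracts; wrap the call in `list(...)` only when you need all records at once.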
🔎 For details on the structure of the downloaded JSON Lines files, please refer to the article: What is the format of the log files you provide?