When you download the output from a completed Data Capture extract job, the data is provided as an archive containing your data in the JSON Lines format, plus an optional additional file
directory, containing the assets including in the different requests.
Understanding JSON Lines (.jsonl)
JSON Lines is a convenient format for storing structured data that may be processed record by record. It consists of sequence of valid JSON values, where each value is written on a separate line, delimited by a newline character (\n
).
Advantages of JSON Lines include:
Easy Parsing: Each line is an independent JSON object, making it simple to read and parse incrementally.
Streaming Friendly: It's well-suited for streaming data processing, as you can handle each line as soon as it's received.
Robustness: If a file is truncated or partially written, you can often still parse the complete lines successfully.
π You can learn more about the JSON Lines format specification and its use cases at jsonlines.org.
Log Entry Structure
Within the downloaded .jsonl
file, each line contains a single JSON object representing a logged interaction. The structure of this object typically includes the following key fields:
model
(string
): The identifier of the model that was used for the API call or the le Chat interaction (e.g.,mistral-large-latest
,open-mistral-7b
).request
(object
orstring
): The content of the request sent to the model. The structure depends on the endpoint or service used. For example, in a chat completion request, this object would typically contain themessages
array provided as input.response
(object
orstring
): The response generated by the model. For a chat completion, this would usually contain thechoices
array, including the model's generated message.request_date
(timestamp
): The date and time when the request was processed.file_mapping
(object
, optional): Included when files are associated with the request (e.g., in retrieval-augmented generation, files / images uploads or specific API calls). This is a key-value object where:Each
key
is the URL used to reference the file within therequest
orresponse
.Each
value
is the correspondingfile_id
.
π‘ You can use this file_id
to retrieve the file's content using the Mistral AI Files API.
By parsing these JSON objects line by line, you can easily access and utilize the captured interaction data for your specific needs, such as analysis or preparing fine-tuning datasets.