Skip to main content

What is the format of the log files you provide?

Updated over a week ago

When you download the output from a completed Data Capture extract job, the data is provided as an archive containing your data in the JSON Lines format, plus an optional additional file directory, containing the assets including in the different requests.

Understanding JSON Lines (.jsonl)

JSON Lines is a convenient format for storing structured data that may be processed record by record. It consists of sequence of valid JSON values, where each value is written on a separate line, delimited by a newline character (\n).

Advantages of JSON Lines include:

  • Easy Parsing: Each line is an independent JSON object, making it simple to read and parse incrementally.

  • Streaming Friendly: It's well-suited for streaming data processing, as you can handle each line as soon as it's received.

  • Robustness: If a file is truncated or partially written, you can often still parse the complete lines successfully.

📌 You can learn more about the JSON Lines format specification and its use cases at jsonlines.org.

Log Entry Structure

Within the downloaded .jsonl file, each line contains a single JSON object representing a logged interaction. The structure of this object typically includes the following key fields:

  • model (string): The identifier of the model that was used for the API call or the le Chat interaction (e.g., mistral-large-latest, open-mistral-7b).

  • request (object or string): The content of the request sent to the model. The structure depends on the endpoint or service used. For example, in a chat completion request, this object would typically contain the messages array provided as input.

  • response (object or string): The response generated by the model. For a chat completion, this would usually contain the choices array, including the model's generated message.

  • request_date (timestamp): The date and time when the request was processed.

  • file_mapping (object, optional): Included when files are associated with the request (e.g., in retrieval-augmented generation, files / images uploads or specific API calls). This is a key-value object where:

    • Each key is the URL used to reference the file within the request or response.

    • Each value is the corresponding file_id.

💡 You can use this file_id to retrieve the file's content using the Mistral AI Files API.

By parsing these JSON objects line by line, you can easily access and utilize the captured interaction data for your specific needs, such as analysis or preparing fine-tuning datasets.

Did this answer your question?