JSONStructuredExtraction
Extract structured JSON with LLMs
Convert unstructured text into a JSON object with predefined fields. Provide a schema name and the list of fields to extract. Compatible with OpenAI, Gemini, and Ollama.
type: "io.kestra.plugin.ai.completion.JSONStructuredExtraction"
Examples
Extract person fields (Gemini)
id: json_structured_extraction
namespace: company.ai
tasks:
  - id: extract_person
    type: io.kestra.plugin.ai.completion.JSONStructuredExtraction
    schemaName: Person
    jsonFields:
      - name
      - city
      - country
      - email
    prompt: |
      From the text below, extract the person's name, city, country, and email.
      If a field is missing, leave it blank.
      Text:
      "Hi! I'm John Smith from Paris, France. You can reach me at john.smith@example.com."
    systemMessage: You extract structured data in JSON format.
    provider:
      type: io.kestra.plugin.ai.provider.GoogleGemini
      apiKey: "{{ kv('GEMINI_API_KEY') }}"
      modelName: gemini-2.5-flash
Extract order details (OpenAI)
id: json_structured_extraction_order
namespace: company.ai
tasks:
  - id: extract_order
    type: io.kestra.plugin.ai.completion.JSONStructuredExtraction
    schemaName: Order
    jsonFields:
      - order_id
      - customer_name
      - city
      - total_amount
    prompt: |
      Extract the order_id, customer_name, city, and total_amount from the message.
      For the total amount, keep only the number without the currency symbol.
      Return only JSON with the requested keys.
      Message:
      "Order #A-1043 for Jane Doe, shipped to Berlin. Total: 249.99 EUR."
    systemMessage: You are a precise JSON data extraction assistant.
    provider:
      type: io.kestra.plugin.ai.provider.OpenAI
      apiKey: "{{ kv('OPENAI_API_KEY') }}"
      modelName: gpt-5-mini
Properties
jsonFields *Required array
JSON Fields
List of fields to extract from the text
provider *Required Non-dynamic
Possible types: AmazonBedrock, Anthropic, AzureOpenAI, DashScope, DeepSeek, GoogleGemini, GoogleVertexAI, HuggingFace, LocalAI, MistralAI, OciGenAI, Ollama, OpenAI, OpenRouter, WorkersAI, ZhiPuAI
Language Model Provider
schemaName *Required string
Schema Name
The name of the JSON schema for structured extraction
configuration Non-dynamic ChatConfiguration
Default: {}
Chat configuration
prompt string
Text prompt
The input text for structured JSON extraction.
systemMessage string
Default: You are a structured JSON extraction assistant. Always respond with valid JSON.
System message
Optional system instruction for the model.
Outputs
extractedJson string
Extracted JSON
The structured JSON output
finishReason string
Possible values: STOP, LENGTH, TOOL_EXECUTION, CONTENT_FILTER, OTHER
Finish reason
schemaName string
Schema Name
The schema name used for the structured JSON extraction
tokenUsage TokenUsage
Token usage
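The outputs above can be read by a downstream task through Kestra's `outputs` expression. The flow below is a minimal sketch, not an official example: the flow id and the `log_result` task are hypothetical, and it assumes the core `io.kestra.plugin.core.log.Log` task is available in your Kestra version.

```yaml
id: json_structured_extraction_outputs   # hypothetical flow id
namespace: company.ai
tasks:
  - id: extract_person
    type: io.kestra.plugin.ai.completion.JSONStructuredExtraction
    schemaName: Person
    jsonFields:
      - name
      - city
    prompt: |
      Extract the person's name and city.
      Text: "Hi! I'm John Smith from Paris, France."
    provider:
      type: io.kestra.plugin.ai.provider.GoogleGemini
      apiKey: "{{ kv('GEMINI_API_KEY') }}"
      modelName: gemini-2.5-flash

  # Hypothetical follow-up task: logs the outputs documented above.
  - id: log_result
    type: io.kestra.plugin.core.log.Log
    message: |
      JSON: {{ outputs.extract_person.extractedJson }}
      Finish reason: {{ outputs.extract_person.finishReason }}
      Total tokens: {{ outputs.extract_person.tokenUsage.totalTokenCount }}
```

Since extractedJson is a string, a consumer that needs individual fields would still have to parse it.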
Metrics
input.token.count counter
Unit: token
Large Language Model (LLM) input token count
output.token.count counter
Unit: token
Large Language Model (LLM) output token count
total.token.count counter
Unit: token
Large Language Model (LLM) total token count
Definitions
Azure OpenAI Model Provider
endpoint *Required string
API endpoint
The Azure OpenAI endpoint in the format: https://{resource}.openai.azure.com/
modelName *Required string
Model name
type *Required object
apiKey string
API Key
baseUrl string
Base URL
Custom base URL to override the default endpoint (useful for local tests, WireMock, or enterprise gateways).
caPem string
CA PEM certificate content
CA certificate as text, used to verify SSL/TLS connections when using custom endpoints.
clientId string
Client ID
clientPem string
Client PEM certificate content
PEM client certificate as text, used to authenticate the connection to enterprise AI endpoints.
clientSecret string
Client secret
serviceVersion string
API version
tenantId string
Tenant ID
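As an illustration of the Azure OpenAI properties above, the flow below is a sketch under stated assumptions: the provider type `io.kestra.plugin.ai.provider.AzureOpenAI` is inferred from the naming pattern of the Gemini and OpenAI examples rather than confirmed by this page, and the resource name, deployment name, and KV key are placeholders.

```yaml
id: json_structured_extraction_azure      # hypothetical flow id
namespace: company.ai
tasks:
  - id: extract_order
    type: io.kestra.plugin.ai.completion.JSONStructuredExtraction
    schemaName: Order
    jsonFields:
      - order_id
      - total_amount
    prompt: |
      Extract the order_id and total_amount from the message.
      Message: "Order #A-1043. Total: 249.99 EUR."
    provider:
      # Assumed type name, following the io.kestra.plugin.ai.provider.* pattern.
      type: io.kestra.plugin.ai.provider.AzureOpenAI
      # Placeholder resource; must match https://{resource}.openai.azure.com/
      endpoint: "https://my-resource.openai.azure.com/"
      apiKey: "{{ kv('AZURE_OPENAI_API_KEY') }}"   # placeholder KV key
      modelName: my-deployment-name                # placeholder model name
```

The sketch authenticates with apiKey; the tenantId, clientId, and clientSecret properties are also listed above, but how they combine is not described on this page.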
Google VertexAI Model Provider
endpoint *Required string
Endpoint URL
location *Required string
Project location
modelName *Required string
Model name
project *Required string
Project ID
type *Required object
baseUrl string
Base URL
Custom base URL to override the default endpoint (useful for local tests, WireMock, or enterprise gateways).
caPem string
CA PEM certificate content
CA certificate as text, used to verify SSL/TLS connections when using custom endpoints.
clientPem string
Client PEM certificate content
PEM client certificate as text, used to authenticate the connection to enterprise AI endpoints.
Google Gemini Model Provider
apiKey *Required string
API Key
modelName *Required string
Model name
type *Required object
baseUrl string
Base URL
Custom base URL to override the default endpoint (useful for local tests, WireMock, or enterprise gateways).
caPem string
CA PEM certificate content
CA certificate as text, used to verify SSL/TLS connections when using custom endpoints.
clientPem string
Client PEM certificate content
PEM client certificate as text, used to authenticate the connection to enterprise AI endpoints.
Mistral AI Model Provider
apiKey *Required string
API Key
modelName *Required string
Model name
type *Required object
baseUrl string
Base URL
Custom base URL to override the default endpoint (useful for local tests, WireMock, or enterprise gateways).
caPem string
CA PEM certificate content
CA certificate as text, used to verify SSL/TLS connections when using custom endpoints.
clientPem string
Client PEM certificate content
PEM client certificate as text, used to authenticate the connection to enterprise AI endpoints.
ZhiPu AI Model Provider
apiKey *Required string
API Key
modelName *Required string
Model name
type *Required object
baseUrl string
Default: https://open.bigmodel.cn/
API base URL
The base URL for ZhiPu API (defaults to https://open.bigmodel.cn/)
caPem string
CA PEM certificate content
CA certificate as text, used to verify SSL/TLS connections when using custom endpoints.
clientPem string
Client PEM certificate content
PEM client certificate as text, used to authenticate the connection to enterprise AI endpoints.
maxRetries integer | string
The maximum number of retries for a request
maxToken integer | string
The maximum number of tokens returned by this request
stops array
Stop sequences. The model stops generating text when the output is about to contain one of the specified strings or token IDs.
OciGenAI Model Provider
compartmentId *Required string
OCID of the OCI compartment that contains the model
modelName *Required string
Model name
region *Required string
OCI region to connect the client to
type *Required object
authProvider string
OCI SDK Authentication provider
baseUrl string
Base URL
Custom base URL to override the default endpoint (useful for local tests, WireMock, or enterprise gateways).
caPem string
CA PEM certificate content
CA certificate as text, used to verify SSL/TLS connections when using custom endpoints.
clientPem string
Client PEM certificate content
PEM client certificate as text, used to authenticate the connection to enterprise AI endpoints.
DeepSeek Model Provider
apiKey *Required string
API Key
modelName *Required string
Model name
type *Required object
baseUrl string
Default: https://api.deepseek.com/v1
API base URL
caPem string
CA PEM certificate content
CA certificate as text, used to verify SSL/TLS connections when using custom endpoints.
clientPem string
Client PEM certificate content
PEM client certificate as text, used to authenticate the connection to enterprise AI endpoints.
io.kestra.plugin.ai.domain.ChatConfiguration-ResponseFormat
jsonSchema object
JSON Schema (used when type = JSON)
Provide a JSON Schema describing the expected structure of the response. In Kestra flows, define the schema in YAML (it is still a JSON Schema object). Example (YAML):
responseFormat:
type: JSON
jsonSchema:
type: object
required: ["category", "priority"]
properties:
category:
type: string
enum: ["ACCOUNT", "BILLING", "TECHNICAL", "GENERAL"]
priority:
type: string
enum: ["LOW", "MEDIUM", "HIGH"]
Note: Provider support for strict schema enforcement varies. If unsupported, guide the model about the expected output structure via the prompt and validate downstream.
jsonSchemaDescription string
Schema description (optional)
Natural-language description of the schema to help the model produce the right fields. Example: "Classify a customer ticket into category and priority."
type string
Default: TEXT
Possible values: TEXT, JSON
Response format type
Specifies how the LLM should return output. Allowed values:
- TEXT (default): free-form natural language.
- JSON: structured output validated against a JSON Schema.
Anthropic AI Model Provider
apiKey *Required string
API Key
modelName *Required string
Model name
type *Required object
baseUrl string
Base URL
Custom base URL to override the default endpoint (useful for local tests, WireMock, or enterprise gateways).
caPem string
CA PEM certificate content
CA certificate as text, used to verify SSL/TLS connections when using custom endpoints.
clientPem string
Client PEM certificate content
PEM client certificate as text, used to authenticate the connection to enterprise AI endpoints.
maxTokens integer | string
Maximum Tokens
Specifies the maximum number of tokens that the model is allowed to generate in its response.
OpenRouter Model Provider
apiKey *Required string
API Key
modelName *Required string
Model name
type *Required object
baseUrl string
Base URL
Custom base URL to override the default endpoint (useful for local tests, WireMock, or enterprise gateways).
caPem string
CA PEM certificate content
CA certificate as text, used to verify SSL/TLS connections when using custom endpoints.
clientPem string
Client PEM certificate content
PEM client certificate as text, used to authenticate the connection to enterprise AI endpoints.
Ollama Model Provider
endpoint *Required string
Model endpoint
modelName *Required string
Model name
type *Required object
baseUrl string
Base URL
Custom base URL to override the default endpoint (useful for local tests, WireMock, or enterprise gateways).
caPem string
CA PEM certificate content
CA certificate as text, used to verify SSL/TLS connections when using custom endpoints.
clientPem string
Client PEM certificate content
PEM client certificate as text, used to authenticate the connection to enterprise AI endpoints.
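For a locally hosted model, the Ollama properties above might be combined as in the sketch below. The provider type `io.kestra.plugin.ai.provider.Ollama` is an assumption based on the naming pattern of the other examples, the endpoint assumes Ollama's default local port (11434), and the model name is a placeholder for whichever model you have pulled.

```yaml
id: json_structured_extraction_ollama     # hypothetical flow id
namespace: company.ai
tasks:
  - id: extract_person
    type: io.kestra.plugin.ai.completion.JSONStructuredExtraction
    schemaName: Person
    jsonFields:
      - name
      - email
    prompt: |
      Extract the person's name and email.
      Text: "I'm John Smith, reach me at john.smith@example.com."
    provider:
      # Assumed type name, following the io.kestra.plugin.ai.provider.* pattern.
      type: io.kestra.plugin.ai.provider.Ollama
      # Assumes Ollama is reachable from the Kestra worker on its default port.
      endpoint: "http://localhost:11434"
      modelName: llama3                    # placeholder: any locally pulled model
```

No apiKey is set because this provider does not list one among its properties.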
OpenAI Model Provider
apiKey *Required string
API Key
modelName *Required string
Model name
type *Required object
baseUrl string
Default: https://api.openai.com/v1
API base URL
caPem string
CA PEM certificate content
CA certificate as text, used to verify SSL/TLS connections when using custom endpoints.
clientPem string
Client PEM certificate content
PEM client certificate as text, used to authenticate the connection to enterprise AI endpoints.
io.kestra.plugin.ai.domain.ChatConfiguration
logRequests boolean | string
Log LLM requests
If true, prompts and configuration sent to the LLM will be logged at INFO level.
logResponses boolean | string
Log LLM responses
If true, raw responses from the LLM will be logged at INFO level.
maxToken integer | string
Maximum number of tokens the model can generate in the completion (response). This limits the length of the output.
responseFormat ChatConfiguration-ResponseFormat
Response format
Defines the expected output format. Default is plain text.
Some providers allow requesting JSON or schema-constrained outputs, but support varies and may be incompatible with tool use.
When using a JSON schema, the output will be returned under the key jsonOutput.
returnThinking boolean | string
Return Thinking
Controls whether to return the model's internal reasoning or 'thinking' text, if available. When enabled, the reasoning content is extracted from the response and made available in the AiMessage object. It does not trigger the thinking process itself; it only affects whether the output is parsed and returned.
seed integer | string
Seed
Optional random seed for reproducibility. Provide a positive integer (e.g., 42, 1234). Using the same seed with identical settings produces repeatable outputs.
temperature number | string
Temperature
Controls randomness in generation. Typical range is 0.0–1.0. Lower values (e.g., 0.2) make outputs more focused and deterministic, while higher values (e.g., 0.7–1.0) increase creativity and variability.
thinkingBudgetTokens integer | string
Thinking Token Budget
Specifies the maximum number of tokens allocated as a budget for internal reasoning processes, such as generating intermediate thoughts or chain-of-thought sequences, allowing the model to perform multi-step reasoning before producing the final output.
thinkingEnabled boolean | string
Enable Thinking
Enables internal reasoning ('thinking') in supported language models, allowing the model to perform intermediate reasoning steps before producing a final output; this is useful for complex tasks like multi-step problem solving or decision making, but may increase token usage and response time, and is only applicable to compatible models.
topK integer | string
Top-K
Limits sampling to the top K most likely tokens at each step. Typical values are between 20 and 100. Smaller values reduce randomness; larger values allow more diverse outputs.
topP number | string
Top-P (nucleus sampling)
Samples from the smallest set of tokens whose cumulative probability reaches topP. Typical values are 0.8–0.95. Lower values make the output more focused; higher values increase diversity.
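The ChatConfiguration properties above are set under the task's `configuration` property. The flow below is a minimal sketch with illustrative values (a low temperature and a fixed seed for more repeatable extraction); the specific numbers are assumptions, not recommendations from this page, and not every provider honors every setting.

```yaml
id: json_structured_extraction_config     # hypothetical flow id
namespace: company.ai
tasks:
  - id: extract_order
    type: io.kestra.plugin.ai.completion.JSONStructuredExtraction
    schemaName: Order
    jsonFields:
      - order_id
      - customer_name
    prompt: |
      Extract the order_id and customer_name from the message.
      Message: "Order #A-1043 for Jane Doe, shipped to Berlin."
    configuration:
      temperature: 0.2     # lower values make output more focused
      seed: 42             # same seed and settings aim for repeatable output
      maxToken: 512        # caps the length of the generated response
      logRequests: true    # logs prompts and configuration at INFO level
      logResponses: true   # logs raw responses at INFO level
    provider:
      type: io.kestra.plugin.ai.provider.OpenAI
      apiKey: "{{ kv('OPENAI_API_KEY') }}"
      modelName: gpt-5-mini
```

Request and response logging can expose prompt contents in execution logs, so it is usually best reserved for debugging.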
DashScope (Qwen) Model Provider from Alibaba Cloud
apiKey *Required string
API Key
modelName *Required string
Model name
type *Required object
baseUrl string
Default: https://dashscope-intl.aliyuncs.com/api/v1
API base URL
For models in the China (Beijing) region, set the URL to https://dashscope.aliyuncs.com/api/v1;
otherwise use the Singapore region URL: https://dashscope-intl.aliyuncs.com/api/v1.
The default value is computed based on the system timezone.
caPem string
CA PEM certificate content
CA certificate as text, used to verify SSL/TLS connections when using custom endpoints.
clientPem string
Client PEM certificate content
PEM client certificate as text, used to authenticate the connection to enterprise AI endpoints.
enableSearch boolean | string
Whether the model uses internet search results as a reference when generating text
maxTokens integer | string
The maximum number of tokens returned by this request
repetitionPenalty number | string
Penalty applied to repetition in continuous sequences during generation
Increasing repetition_penalty reduces repetition in the generated text.
A value of 1.0 means no penalty. Value range: (0, +inf)
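To make the region note above concrete, the sketch below pins the DashScope base URL to the China (Beijing) endpoint instead of relying on the timezone-derived default. The provider type `io.kestra.plugin.ai.provider.DashScope` is assumed from the naming pattern of the other examples, and the model name and KV key are placeholders.

```yaml
id: json_structured_extraction_dashscope  # hypothetical flow id
namespace: company.ai
tasks:
  - id: extract_person
    type: io.kestra.plugin.ai.completion.JSONStructuredExtraction
    schemaName: Person
    jsonFields:
      - name
      - city
    prompt: |
      Extract the person's name and city.
      Text: "Hi! I'm John Smith from Paris, France."
    provider:
      # Assumed type name, following the io.kestra.plugin.ai.provider.* pattern.
      type: io.kestra.plugin.ai.provider.DashScope
      apiKey: "{{ kv('DASHSCOPE_API_KEY') }}"        # placeholder KV key
      modelName: qwen-plus                           # placeholder model name
      # Explicit Beijing-region URL, taken from the baseUrl description above.
      baseUrl: "https://dashscope.aliyuncs.com/api/v1"
      repetitionPenalty: 1.0                         # 1.0 means no penalty
```

Setting baseUrl explicitly avoids surprises when the Kestra worker's system timezone differs from the region where the model is hosted.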
io.kestra.plugin.ai.domain.TokenUsage
inputTokenCount integer
outputTokenCount integer
totalTokenCount integer
LocalAI Model Provider
baseUrl *Required string
API base URL
modelName *Required string
Model name
type *Required object
caPem string
CA PEM certificate content
CA certificate as text, used to verify SSL/TLS connections when using custom endpoints.
clientPem string
Client PEM certificate content
PEM client certificate as text, used to authenticate the connection to enterprise AI endpoints.
Amazon Bedrock Model Provider
accessKeyId *Required string
AWS Access Key ID
modelName *Required string
Model name
secretAccessKey *Required string
AWS Secret Access Key
type *Required object
baseUrl string
Base URL
Custom base URL to override the default endpoint (useful for local tests, WireMock, or enterprise gateways).
caPem string
CA PEM certificate content
CA certificate as text, used to verify SSL/TLS connections when using custom endpoints.
clientPem string
Client PEM certificate content
PEM client certificate as text, used to authenticate the connection to enterprise AI endpoints.
modelType string
Default: COHERE
Possible values: COHERE, TITAN
Amazon Bedrock Embedding Model Type
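The required Amazon Bedrock properties above could be supplied as in the following sketch. The provider type `io.kestra.plugin.ai.provider.AmazonBedrock` is an assumption inferred from the naming pattern of the other examples, and the KV keys and model name are placeholders. Since modelType is described as an embedding model type, the sketch leaves it unset for this completion task.

```yaml
id: json_structured_extraction_bedrock    # hypothetical flow id
namespace: company.ai
tasks:
  - id: extract_order
    type: io.kestra.plugin.ai.completion.JSONStructuredExtraction
    schemaName: Order
    jsonFields:
      - order_id
      - city
    prompt: |
      Extract the order_id and city from the message.
      Message: "Order #A-1043 for Jane Doe, shipped to Berlin."
    provider:
      # Assumed type name, following the io.kestra.plugin.ai.provider.* pattern.
      type: io.kestra.plugin.ai.provider.AmazonBedrock
      accessKeyId: "{{ kv('AWS_ACCESS_KEY_ID') }}"          # placeholder KV key
      secretAccessKey: "{{ kv('AWS_SECRET_ACCESS_KEY') }}"  # placeholder KV key
      modelName: your-bedrock-model-id                      # placeholder model name
```

Credentials are read from the KV store here to mirror the other examples on this page; keeping them out of the flow source is the main point.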
HuggingFace Model Provider
apiKey *Required string
API Key
modelName *Required string
Model name
type *Required object
baseUrl string
Default: https://router.huggingface.co/v1
API base URL
caPem string
CA PEM certificate content
CA certificate as text, used to verify SSL/TLS connections when using custom endpoints.
clientPem string
Client PEM certificate content
PEM client certificate as text, used to authenticate the connection to enterprise AI endpoints.
WorkersAI Model Provider
accountId *Required string
Account Identifier
Unique identifier assigned to an account
apiKey *Required string
API Key
modelName *Required string
Model name
type *Required object
baseUrl string
Base URL
Custom base URL to override the default endpoint (useful for local tests, WireMock, or enterprise gateways).
caPem string
CA PEM certificate content
CA certificate as text, used to verify SSL/TLS connections when using custom endpoints.
clientPem string
Client PEM certificate content
PEM client certificate as text, used to authenticate the connection to enterprise AI endpoints.