
GoogleCloudAiplatformV1SchemaModelevaluationMetricsPairwiseTextGenerationEvaluationMetrics

import type { GoogleCloudAiplatformV1SchemaModelevaluationMetricsPairwiseTextGenerationEvaluationMetrics } from "https://googleapis.deno.dev/v1/aiplatform:v1.ts";

Metrics for general pairwise text generation evaluation results.

interface GoogleCloudAiplatformV1SchemaModelevaluationMetricsPairwiseTextGenerationEvaluationMetrics {
  accuracy?: number;
  baselineModelWinRate?: number;
  cohensKappa?: number;
  f1Score?: number;
  falseNegativeCount?: bigint;
  falsePositiveCount?: bigint;
  humanPreferenceBaselineModelWinRate?: number;
  humanPreferenceModelWinRate?: number;
  modelWinRate?: number;
  precision?: number;
  recall?: number;
  trueNegativeCount?: bigint;
  truePositiveCount?: bigint;
}
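As an illustration of how the derived rates relate to the four count fields (this is not part of the API; the counts below are made up, and the field names simply mirror the interface):

```typescript
// Hypothetical confusion-matrix counts, using bigint as in the interface.
// "Positive" means the autorater preferred the model over the baseline.
const truePositiveCount = 40n;
const trueNegativeCount = 30n;
const falsePositiveCount = 10n;
const falseNegativeCount = 20n;

const total = Number(
  truePositiveCount + trueNegativeCount + falsePositiveCount + falseNegativeCount,
);

// accuracy: fraction of cases where autorater and humans agreed.
const accuracy = Number(truePositiveCount + trueNegativeCount) / total;

// precision: TP / (TP + FP) — of the autorater's "model wins", how many humans confirmed.
const precision =
  Number(truePositiveCount) / Number(truePositiveCount + falsePositiveCount);

// recall: TP / (TP + FN) — of the humans' "model wins", how many the autorater caught.
const recall =
  Number(truePositiveCount) / Number(truePositiveCount + falseNegativeCount);

// f1Score: harmonic mean of precision and recall.
const f1Score = (2 * precision * recall) / (precision + recall);
```

With these counts, accuracy is 0.7, precision 0.8, recall 2/3, and f1Score 8/11 (~0.727).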

§ Properties

accuracy?: number

Fraction of cases where the autorater agreed with the human raters.

baselineModelWinRate?: number

Percentage of time the autorater decided the baseline model had the better response.

cohensKappa?: number

A measurement of agreement between the autorater and human raters that takes the likelihood of random agreement into account.
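A sketch of the standard Cohen's kappa computation over a two-by-two agreement table, using made-up counts (illustrative only; the API returns the value precomputed):

```typescript
// Hypothetical counts: positive = "model preferred", per autorater vs. humans.
const tp = 40; // both preferred the model
const fp = 10; // autorater preferred the model, humans the baseline
const fn = 20; // autorater preferred the baseline, humans the model
const tn = 30; // both preferred the baseline
const n = tp + fp + fn + tn;

// Observed agreement: same as accuracy.
const observed = (tp + tn) / n;

// Expected agreement by chance, from the marginal totals of each rater.
const expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n);

// kappa: 1 = perfect agreement, 0 = chance-level, negative = worse than chance.
const cohensKappa = (observed - expected) / (1 - expected);
```

Here observed agreement is 0.7, chance agreement 0.5, so kappa is 0.4.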

f1Score?: number

Harmonic mean of precision and recall.

falseNegativeCount?: bigint

Number of examples where the autorater chose the baseline model, but humans preferred the model.

falsePositiveCount?: bigint

Number of examples where the autorater chose the model, but humans preferred the baseline model.

humanPreferenceBaselineModelWinRate?: number

Percentage of time humans decided the baseline model had the better response.

humanPreferenceModelWinRate?: number

Percentage of time humans decided the model had the better response.

modelWinRate?: number

Percentage of time the autorater decided the model had the better response.

precision?: number

Fraction of cases where the autorater and humans agreed that the model had the better response, out of all cases where the autorater thought the model had the better response: true positives divided by all positives (true positives plus false positives).

recall?: number

Fraction of cases where the autorater and humans thought the model had a better response out of all cases where the humans thought the model had a better response.

trueNegativeCount?: bigint

Number of examples where both the autorater and humans decided that the model had the worse response.

truePositiveCount?: bigint

Number of examples where both the autorater and humans decided that the model had the better response.