
A language for transformation

Posted: Wed May 28, 2025 4:56 am
by MasudIbne756
DataWeave allows you to easily read, manipulate, and write data in any format. Discover why DataWeave is industry-proven by trillions of transactions on mission-critical apps.


Evaluation methodology for the DataWeave Codegen project
The pass@k metric is commonly used to evaluate code generation models. To calculate pass@k, n code samples are generated per task, where n is at least k. If c of these n samples are correct (i.e. they pass all the unit tests), then an unbiased estimate of pass@k is computed as follows:

$$\text{pass@}k = \mathbb{E}_{\text{tasks}}\left[1 - \frac{\binom{n-c}{k}}{\binom{n}{k}}\right]$$
Since our main goal is to provide DataWeave Codegen as a tool inside MuleSoft IDEs such as Anypoint Code Builder, we only suggest one or two generated scripts for the user to choose from. As such, we evaluate the models for k = 1 and k = 2.
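The pass@k estimate described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual evaluation code: the function names `pass_at_k` and `mean_pass_at_k` and the sample counts in the usage lines are made up for the example.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task.

    n: samples generated for the task
    c: samples that passed all unit tests
    k: number of suggestions shown to the user
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Probability that a random size-k subset contains no correct sample
    # is C(n - c, k) / C(n, k); pass@k is the complement.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average the per-task estimate over (n, c) pairs, one pair per task."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical task: 10 samples generated, 3 of them correct.
print(pass_at_k(10, 3, 1))
print(pass_at_k(10, 3, 2))
```

For k = 1 the estimate reduces to c / n, the plain fraction of correct samples; for k = 2 it rewards tasks where at least one of two suggestions would be correct.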

Compilation percentage
One of the downsides of current AI-based code generation tools is that the suggested (generated) code may not even compile, let alone be correct, which hurts developers' productivity.

In the DataWeave Codegen feature inside the IDE, we plan to check whether the generated code compiles and then whether the output it produces on the given input sample matches the given output; only then do we suggest it to the user. In addition to pass@k on unfiltered generations (ignoring whether the generated code compiles), we report two further numbers: the compilation percentage, which is the percentage of generated code samples that compile and produce an output, and pass@k computed only on the generations that compile.
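The three reported numbers can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the project's implementation: the `Generation` record, the `evaluate` function, and the choice to skip tasks with fewer than k compiling samples in the filtered average are all assumptions made for the example.

```python
from dataclasses import dataclass
from math import comb


@dataclass
class Generation:
    compiles: bool  # the script compiled and produced an output
    correct: bool   # the output matched the expected sample output


def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased per-task estimate: 1 - C(n - c, k) / C(n, k).
    return 1.0 - comb(n - c, k) / comb(n, k)


def evaluate(tasks: list[list[Generation]], k: int) -> dict:
    """Compute compilation percentage and pass@k, unfiltered and filtered.

    Assumes every task has at least k generations.
    """
    total = sum(len(gens) for gens in tasks)
    compiled = sum(g.compiles for gens in tasks for g in gens)
    unfiltered, filtered = [], []
    for gens in tasks:
        # Unfiltered: every generation counts toward n.
        n_correct = sum(g.compiles and g.correct for g in gens)
        unfiltered.append(pass_at_k(len(gens), n_correct, k))
        # Filtered: only generations that compile count toward n.
        kept = [g for g in gens if g.compiles]
        if len(kept) >= k:  # assumption: skip tasks with too few compiling samples
            filtered.append(pass_at_k(len(kept), sum(g.correct for g in kept), k))
    return {
        "compilation_pct": 100.0 * compiled / total,
        "pass_at_k": sum(unfiltered) / len(unfiltered),
        "pass_at_k_compiled": sum(filtered) / len(filtered) if filtered else None,
    }
```

Filtering out non-compiling generations shrinks n without changing c, so pass@k on the compiled subset is never lower than the unfiltered value for the same task.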