Skip to main content
Version: v2

Writing User-Defined Functions (UDF)

Canner Enterprise provides a UDF framework, allowing users to write their business logic in Java and use it in SQL. The following introduces the Scalar Function and Aggregation Function.

Scalar Function

When UDF is written, annotations represent the function's relevant information, including name, description, return type, and parameter types. The following provides a simple example of varchar as lowercase lowercaser(VARCHAR).

public class ExampleStringFunction
{
@ScalarFunction("lowercaser")
@Description("Converts the string to alternating case")
@SqlType(StandardTypes. VARCHAR)
public static Slice lowercaser(@SqlType(StandardTypes.VARCHAR) Slice slice)
{
String argument = slice.toStringUtf8();
return Slices.utf8Slice(argument.toLowerCase());
}
}

Published to Canner Enterprise, ready to use in SQL

-- users data
-- | firstname |
-- | David |

select lowercaser(firstname) from users;

--result
-- | firstname |
-- | david |

Aggregation function

Aggregation functions use a framework similar to the scalar function, but with more state management; Canner Enterprise's UDF framework defines AccumulatorState as the state accumulation required.

The following example, avg_double implements DOUBLE type average.

@AggregationFunction("avg_double")
public class AverageAggregation
{
@InputFunction
public static void input(
LongAndDoubleState state,
@SqlType(StandardTypes. DOUBLE) double value)
{
state.setLong(state.getLong() + 1);
state.setDouble(state.getDouble() + value);
}

@CombineFunction
public static void combine(
LongAndDoubleState state,
LongAndDoubleState otherState)
{
state.setLong(state.getLong() + otherState.getLong());
state.setDouble(state.getDouble() + otherState.getDouble());
}

@OutputFunction(StandardTypes. DOUBLE)
public static void output(LongAndDoubleState state, BlockBuilder out)
{
long count = state. getLong();
if (count == 0) {
out.appendNull();
}
else {
double value = state. getDouble();
DOUBLE.writeDouble(out, value / count);
}
}
}

LongAndDoubleState in this example inherits AccumulatorState, and implements the getter and setter very simply so that the state can be accessed.

public interface LongAndDoubleState
extends AccumulatorState
{
long getLong();

void setLong(long value);

double getDouble();

void setDouble(double value);
}

In Canner Enterprise's SQL Engine, Aggregate function will execute the defined business logic under the MPP distributed computing architecture by calling the defined method.

We can look deeper at the three methods we need to write.

  • @InputFunction: @InputFunction annotation defines how input rows will be stored in AccumulatorState, accumulating its state.
  • @CombineFunction: The @CombineFunction annotation defines how to combine states under the distributed computing architecture (Combine)
  • @OutputFunction: @OutputFunction is the method that the output will execute after the final operation is completed.

In terms of operation, @InputFunction will be distributed and executed on different workers, and the results will be transmitted to different workers to execute @CombineFunction to combine the states, and finally output the operation results through @OutputFunction

After the Aggregation Function is uploaded to Canner Enterprise, it can be used in SQL, it can be used in SELECT statement, or it can be used with GROUP BY.

-- users data
-- | age | country |
-- | 10 | TW |
-- | 15 | TW |
-- | 20 | US |
-- | 30 | US |

select avg_double(age) from users;
--result
-- |avg_double|
-- | 37.5 |

select country, avg_double(age) from users group by country;
--result
-- | age | country |
-- | 12.5 | TW |
-- | 25 | US |