Writing User-Defined Functions (UDF)
Canner Enterprise provides a UDF framework, allowing users to write their business logic in Java and use it in SQL. The following introduces the Scalar Function
and Aggregation Function
.
Scalar Function
When UDF is written, annotations represent the function's relevant information, including name, description, return type, and parameter types. The following provides a simple example of varchar as lowercase lowercaser(VARCHAR)
.
public class ExampleStringFunction
{
@ScalarFunction("lowercaser")
@Description("Converts the string to alternating case")
@SqlType(StandardTypes. VARCHAR)
public static Slice lowercaser(@SqlType(StandardTypes.VARCHAR) Slice slice)
{
String argument = slice.toStringUtf8();
return Slices.utf8Slice(argument.toLowerCase());
}
}
Published to Canner Enterprise, ready to use in SQL
-- users data
-- | firstname |
-- | David |
select lowercaser(firstname) from users;
--result
-- | firstname |
-- | david |
Aggregation function
Aggregation functions use a framework similar to the scalar function, but with more state management; Canner Enterprise's UDF framework defines AccumulatorState
as the state accumulation required.
The following example, avg_double
implements DOUBLE
type average.
@AggregationFunction("avg_double")
public class AverageAggregation
{
@InputFunction
public static void input(
LongAndDoubleState state,
@SqlType(StandardTypes. DOUBLE) double value)
{
state.setLong(state.getLong() + 1);
state.setDouble(state.getDouble() + value);
}
@CombineFunction
public static void combine(
LongAndDoubleState state,
LongAndDoubleState otherState)
{
state.setLong(state.getLong() + otherState.getLong());
state.setDouble(state.getDouble() + otherState.getDouble());
}
@OutputFunction(StandardTypes. DOUBLE)
public static void output(LongAndDoubleState state, BlockBuilder out)
{
long count = state. getLong();
if (count == 0) {
out.appendNull();
}
else {
double value = state. getDouble();
DOUBLE.writeDouble(out, value / count);
}
}
}
LongAndDoubleState
in this example inherits AccumulatorState
, and implements the getter and setter very simply so that the state can be accessed.
public interface LongAndDoubleState
extends AccumulatorState
{
long getLong();
void setLong(long value);
double getDouble();
void setDouble(double value);
}
In Canner Enterprise's SQL Engine, Aggregate function will execute the defined business logic under the MPP distributed computing architecture by calling the defined method.
We can look deeper at the three methods we need to write.
@InputFunction
:@InputFunction
annotation defines how input rows will be stored inAccumulatorState
, accumulating its state.@CombineFunction
: The@CombineFunction
annotation defines how to combine states under the distributed computing architecture (Combine)@OutputFunction
:@OutputFunction
is the method that the output will execute after the final operation is completed.
In terms of operation, @InputFunction
will be distributed and executed on different workers, and the results will be transmitted to different workers to execute @CombineFunction
to combine the states, and finally output the operation results through @OutputFunction
After the Aggregation Function is uploaded to Canner Enterprise, it can be used in SQL, it can be used in SELECT statement, or it can be used with GROUP BY.
-- users data
-- | age | country |
-- | 10 | TW |
-- | 15 | TW |
-- | 20 | US |
-- | 30 | US |
select avg_double(age) from users;
--result
-- |avg_double|
-- | 37.5 |
select country, avg_double(age) from users group by country;
--result
-- | age | country |
-- | 12.5 | TW |
-- | 25 | US |