Extending BigQuery Functions beyond SQL with Remote Functions, now in preview

May 11, 2022
Christopher Crosbie

Group Product Manager, Google

Wei Hsia

Developer Advocate

Today we are announcing the Preview of BigQuery Remote Functions. Remote Functions are user-defined functions (UDFs) that let you extend BigQuery SQL with your own custom code, written and hosted in Cloud Functions, Google Cloud’s scalable, pay-as-you-go functions-as-a-service offering. A remote UDF accepts columns from BigQuery as input, performs actions on that input using a Cloud Function, and returns the result of those actions as a value in the query result. With Remote Functions, you can now write custom SQL functions in Node.js, Python, Go, Java, .NET, Ruby, or PHP. This means you can tailor BigQuery to your company’s needs and keep using the same management and permission models, all without having to manage a server.

https://storage.googleapis.com/gweb-cloudblog-publish/images/1_Extending_BigQuery_Functions.max-800x800.jpg

In what type of situations could you use remote functions?

Before today, BigQuery customers could create user-defined functions (UDFs) in either SQL or JavaScript that ran entirely within BigQuery. While these functions are performant and fully managed from within BigQuery, customers expressed a desire to extend BigQuery UDFs with their own external code. Here are some examples of what they have asked for:

  • Security and Compliance: Use data encryption and tokenization services from the Google Cloud security ecosystem for external encryption and de-identification. We’ve already started working with key partners like Protegrity and CyberRes Voltage on using remote functions as a mechanism to integrate BigQuery into their security platforms, which will help our mutual customers address strict compliance controls.
  • Real Time APIs: Enrich BigQuery data using external APIs to obtain the latest stock price data, weather updates, or geocoding information.
  • Code Migration: Migrate legacy UDFs or other procedural functions written in Node.js, Python, Go, Java, .NET, Ruby or PHP. 
  • Data Science: Encapsulate complex business logic and score BigQuery datasets by calling models hosted in Vertex AI or other Machine Learning platforms.

Getting Started

Let’s go through the steps to use a BigQuery remote UDF. 

Set up the BigQuery Connection:
   1. Create a BigQuery Connection 
     a. You may need to enable the BigQuery Connection API

Deploy a Cloud Function with your code:
   1. Deploy your Cloud Function
     a. You may need to enable the Cloud Functions API
     b. You may need to enable the Cloud Build API

   2. Grant the BigQuery Connection service account access to the Cloud Function
     a. One way to find the service account is to use the bq CLI show command

Define the BigQuery remote UDF: 
   1. Create the remote UDF definition within BigQuery
     a. One way to find the endpoint name is to use the gcloud CLI functions describe command

Use the BigQuery remote UDF in SQL:
   1. Write a SQL statement that calls the remote UDF as you would any other UDF
   2. Get your results! 

How remote functions can help you with common data tasks

Let’s take a look at some examples of how using BigQuery with remote UDFs can help accelerate development and enhance data processing and analysis.

Encryption and Decryption

As an example, let’s create a simple custom encryption and decryption Cloud Function in Python. 

The encryption function receives the data and returns an encrypted, base64-encoded string.

In the same Cloud Function, the decryption function receives an encrypted, base64-encoded string and returns the decrypted string. A data engineer would be able to enable this functionality in BigQuery.

The Cloud Function receives the data and determines which function you want to invoke. The data is received as an HTTP request. The additional userDefinedContext fields allow you to send additional pieces of data to the Cloud Function.
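To make the request handling concrete, here is a minimal sketch of the dispatch logic in plain Python. It assumes BigQuery's documented request body, where each row's arguments arrive as one entry in calls and the extra key/value pairs arrive in userDefinedContext. The "mode" key, the process_request name, and the XOR "cipher" are illustrative assumptions; XOR stands in for a real encryption library and is not secure. In a deployed Cloud Function you would read the body from the incoming HTTP request instead of passing a string.

```python
import base64
import json

# Illustrative key only. XOR is a stand-in for a real cipher and is NOT secure.
_DEMO_KEY = b"demo-key"


def _xor(data: bytes) -> bytes:
    """Apply a repeating-key XOR; the same operation encrypts and decrypts."""
    return bytes(b ^ _DEMO_KEY[i % len(_DEMO_KEY)] for i, b in enumerate(data))


def encrypt(text: str) -> str:
    """Return the 'encrypted' text as a base64-encoded string."""
    return base64.b64encode(_xor(text.encode("utf-8"))).decode("ascii")


def decrypt(token: str) -> str:
    """Reverse encrypt(): decode base64, then undo the XOR."""
    return _xor(base64.b64decode(token)).decode("utf-8")


def process_request(body: str) -> list:
    """Parse the JSON request body and apply the selected function to each row."""
    request = json.loads(body)
    # "mode" is a hypothetical context key chosen for this sketch.
    mode = request.get("userDefinedContext", {}).get("mode")
    handlers = {"encryption": encrypt, "decryption": decrypt}
    if mode not in handlers:
        raise ValueError(f"unsupported mode: {mode!r}")
    handler = handlers[mode]
    # Each entry in "calls" is the argument list for one row; pass NULLs through.
    return [None if call[0] is None else handler(call[0]) for call in request["calls"]]
```

Because the mode comes from userDefinedContext rather than from the data, one deployed function can serve both the encrypt and decrypt UDFs.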

The result is returned to BigQuery as a JSON response in a specific format, which BigQuery then parses.
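As a rough sketch of that response format: BigQuery expects a JSON object with a replies array containing one element per input row, in the same order as the calls it sent. On failure, the function can return a non-200 status with an errorMessage field. The build_response helper and its compute parameter are hypothetical names used only for this illustration.

```python
import json


def build_response(compute, calls):
    """Apply compute to each row's arguments and return (json_body, http_status)."""
    try:
        # One reply per call, in the same order BigQuery sent them.
        replies = [compute(*args) for args in calls]
    except Exception as exc:
        # A non-200 status plus "errorMessage" tells BigQuery the batch failed.
        return json.dumps({"errorMessage": str(exc)}), 400
    return json.dumps({"replies": replies}), 200
```

For example, build_response(str.upper, [["a"], ["b"]]) produces a body with the replies "A" and "B" and a 200 status.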

This Python code is deployed to Cloud Functions, where it waits to be invoked.

Let’s add the user-defined function to BigQuery so we can invoke it from a SQL statement. The additional user_defined_context is sent to Cloud Functions as extra context in the request payload, so you can map multiple remote functions to one endpoint.
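The definition might look something like the statement rendered by the small Python helper below, which keeps the examples in one language. The project, dataset, connection, and endpoint names are placeholders, and the "mode" context key is an assumption carried over for illustration. You could paste the printed SQL into the BigQuery console or submit it through a client library.

```python
def remote_function_ddl(function_name, connection, endpoint, mode):
    """Render a CREATE FUNCTION statement for a one-argument STRING remote UDF."""
    return (
        f"CREATE OR REPLACE FUNCTION `{function_name}`(x STRING) RETURNS STRING\n"
        f"REMOTE WITH CONNECTION `{connection}`\n"
        f"OPTIONS (\n"
        f"  endpoint = '{endpoint}',\n"
        f"  user_defined_context = [(\"mode\", \"{mode}\")]\n"
        f")"
    )


# Placeholder names; substitute your own project, dataset, connection, and URL.
ENDPOINT = "https://us-central1-my-project.cloudfunctions.net/remote_crypto"
CONNECTION = "my-project.us.my-connection"

# Two UDFs share one endpoint and differ only in the context they send.
for name, mode in [("encrypt", "encryption"), ("decrypt", "decryption")]:
    print(remote_function_ddl(f"my-project.my_dataset.{name}", CONNECTION, ENDPOINT, mode))
    print()
```

Once both functions exist, a query such as SELECT my_dataset.encrypt(name) FROM my_dataset.customers (table and column names are again hypothetical) calls the Cloud Function for each batch of rows.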

Once we’ve created our functions, users with the right IAM permissions can use them in SQL on BigQuery.

https://storage.googleapis.com/gweb-cloudblog-publish/images/3_Extending_BigQuery_Functions.max-1400x1400.jpg

If you’re new to Cloud Functions, be aware that the first invocation after a period of inactivity can incur a brief delay, known as a “cold start”.

The neat thing is you can call APIs as well, which is how our partners at Protegrity and Voltage enable their platforms to perform encryption and decryption of BigQuery data.

Calling APIs to enrich your data

Users such as data analysts can easily use these user-defined functions without needing other tools or moving the data out of BigQuery.

You can enrich your dataset with many more APIs, for example, the Google Cloud Natural Language API to analyze sentiment on your text without having to use another tool.
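A sentiment-scoring function could be sketched roughly as follows. It assumes the google-cloud-language client library is installed in the Cloud Function, and it imports that library lazily so the rest of the sketch can be read and exercised without it. The sentiment_replies name and the score_fn parameter are hypothetical; score_fn exists so a stub can stand in for the API call during local testing.

```python
import json


def nl_api_sentiment(text: str) -> float:
    """Score one string with the Natural Language API (needs google-cloud-language)."""
    from google.cloud import language_v1  # lazy import: only needed for real calls

    # A real deployment would likely create the client once and reuse it.
    client = language_v1.LanguageServiceClient()
    document = language_v1.Document(
        content=text, type_=language_v1.Document.Type.PLAIN_TEXT
    )
    response = client.analyze_sentiment(request={"document": document})
    return response.document_sentiment.score


def sentiment_replies(body: str, score_fn=nl_api_sentiment) -> str:
    """Turn BigQuery's request body into a JSON response of sentiment scores."""
    calls = json.loads(body)["calls"]
    # One score per row; NULL inputs are passed through as NULL outputs.
    replies = [None if call[0] is None else score_fn(call[0]) for call in calls]
    return json.dumps({"replies": replies})
```

Each row triggers its own API request in this sketch, so quotas and latency are worth keeping in mind for large tables.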

Once the Cloud Function is deployed and the remote UDF definition is created in BigQuery, you can invoke the Natural Language API from SQL and use the data it returns in your queries.

https://storage.googleapis.com/gweb-cloudblog-publish/images/2_Extending_BigQuery_Functions.max-700x700.jpg

Custom Vertex AI endpoint

Data scientists can call custom models behind Vertex AI endpoints, as well as other APIs, all from the SQL console.

Remember that remote UDFs are scalar functions: each call returns a single value for each input row.

You can deploy a model to a Vertex AI endpoint, which is just another API, and then call that endpoint from Cloud Functions.
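One possible shape for such a function is sketched below. It assumes the google-cloud-aiplatform library is available in the Cloud Function, that the UDF's arguments for a row form one feature vector, and that the endpoint's resource name is passed in a hypothetical "endpoint_name" context key. The predict_fn parameter is an illustrative hook so the logic can be tried locally with a stub, and the model's predictions are assumed to be JSON-serializable values that match the UDF's declared return type.

```python
import json


def vertex_predict(instances, endpoint_name):
    """Send a batch of instances to a Vertex AI endpoint (needs google-cloud-aiplatform)."""
    from google.cloud import aiplatform  # lazy import: only needed for real calls

    endpoint = aiplatform.Endpoint(endpoint_name)
    return list(endpoint.predict(instances=instances).predictions)


def prediction_replies(body: str, predict_fn=None) -> str:
    """Score every row in the request and return one prediction per row."""
    request = json.loads(body)
    if predict_fn is None:
        # "endpoint_name" is a hypothetical key set via user_defined_context.
        endpoint_name = request.get("userDefinedContext", {}).get("endpoint_name", "")
        predict_fn = lambda instances: vertex_predict(instances, endpoint_name)
    # Each row's argument list is treated as one instance (feature vector).
    instances = list(request["calls"])
    predictions = predict_fn(instances)
    if len(predictions) != len(instances):
        raise ValueError("expected exactly one prediction per input row")
    return json.dumps({"replies": predictions})
```

Sending the whole batch in one predict call keeps the function scalar from BigQuery's point of view, since it still gets back one value per row, while avoiding a separate request to the endpoint for every row.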

https://storage.googleapis.com/gweb-cloudblog-publish/images/4_Extending_BigQuery_Functions.max-1500x1500.jpg

Try it out today

Try out the BigQuery remote UDFs today!
