One of the advantages of Oracle Machine Learning for Python (OML4Py) is embedded Python execution, which allows users to invoke user-defined Python functions from a REST interface using Python engines spawned and controlled by the Oracle Autonomous Database environment. In addition, those functions can be invoked in a data-parallel and task-parallel manner with multiple Python engines.

In this blog, we’ll focus on some tips for passing parameters to your user-defined Python functions via the REST interface.

Passing parameters to an OML4Py user-defined function in a REST request differs somewhat from the corresponding Python interface. Understanding how this works requires some knowledge of cURL syntax and of the JSON formatting required in a REST request:

cURL

We use the command line utility cURL because REST API documentation is typically written with cURL. If you understand how to use cURL, you’ll understand API documentation and it will be easy to perform requests with your preferred language or REST client.

When passing data with cURL, you need to put a backslash in front of some special characters, such as nested quotes. The backslash escapes the character so that it is interpreted literally instead of, for example, ending the enclosing string.

JSON

REST APIs commonly use JSON (JavaScript Object Notation) for sending and requesting data. The response is also often in JSON format. As such, passing string parameters also requires JSON formatting.

A JSON object looks like a JavaScript object. JSON data is written as key/value pairs. In JSON, keys must be strings surrounded by double quotes. String values are also surrounded by double quotes, but numbers are not:

{
  "parameter1": "value1",
  "parameter2": 100
}
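
As a quick illustration of these quoting rules, here is a minimal Python sketch that uses the standard json module. The names parameter1 and parameter2 are just the placeholders from the snippet above, not OML4Py parameters.

```python
import json

# Placeholder key/value pairs: one string value and one numeric value.
params = {"parameter1": "value1", "parameter2": 100}

# json.dumps puts double quotes around keys and string values,
# and leaves numbers unquoted.
encoded = json.dumps(params)
print(encoded)  # {"parameter1": "value1", "parameter2": 100}

# Round-tripping shows the number is still a number, not a string.
decoded = json.loads(encoded)
print(type(decoded["parameter2"]).__name__)  # int
```

Generating JSON this way, rather than typing it by hand, also avoids small mistakes such as a trailing comma after the last pair, which strict JSON parsers reject.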

Example: Putting it Together

To run a user-defined Python function from the OML4Py embedded Python execution REST API, it must be saved to the OML4Py script repository. The example function algo_select uses OML4Py’s AutoML algorithm selection to predict which algorithm is most applicable to a given data set based on the distribution of values in that data set.

First, define the function from the OML4Py Python interface, and save it to the OML4Py Script Repository.

algo_select = """def algo_select(input, mining_function, score_metric, parallel):
  import oml
  from oml import automl
  # Push the input pandas DataFrame to the database as an OML proxy object
  df = oml.push(input)
  train, test = df.split(ratio=(0.9, 0.1), seed=1234)
  # The last column is the target; all other columns are predictors
  X, y = train[:,0:-1], train[:,-1]
  X_test, y_test = test[:,0:-1], test[:,-1]
  # Rank the AutoML algorithms for the requested mining function and score metric
  asel = automl.AlgorithmSelection(mining_function=mining_function, score_metric=score_metric, parallel=parallel)
  scores = [ "{0}:{1:.3f}".format(m, s) for m, s in asel.select(X, y, k=None) ]
  return("Algorithm selection prediction scores: {}".format(scores))"""

As a best practice, run the script in Embedded Python Execution from the Python interface to ensure it’s working properly. Here we use oml.table_apply, passing the string parameters mining_function and score_metric within quotes in Python, and the numeric parameter parallel without quotes. Notice that in this example, AutoML selects neural network as the best algorithm to use with the input data, using the score metric accuracy, with a value of 0.78.

oml.table_apply(data=mydf, 
  func=algo_select, 
  mining_function="classification",        
  score_metric="accuracy", 
  parallel=2)
Algorithm selection prediction scores: ['nn:0.780', 'svm_linear:0.724', 'rf:0.714', 'glm_ridge:0.697', 'svm_gaussian:0.678', 'dt:0.667', 'nb:0.584', 'glm:0.383']
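
If you want to work with these scores programmatically rather than read them off the string, a minimal sketch like the following may help. It assumes each entry has the name:score form shown in the output above; the variable names are illustrative only.

```python
# Score entries copied from the output above, each in "algorithm:score" form.
scores = ['nn:0.780', 'svm_linear:0.724', 'rf:0.714', 'glm_ridge:0.697',
          'svm_gaussian:0.678', 'dt:0.667', 'nb:0.584', 'glm:0.383']

# Split each entry on the last colon and convert the score to a float.
parsed = {}
for entry in scores:
    name, value = entry.rsplit(":", 1)
    parsed[name] = float(value)

# The best algorithm is the one with the highest score.
best = max(parsed, key=parsed.get)
print(best, parsed[best])  # nn 0.78
```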

Now save the function to the OML4Py Script Repository as a global script with the name algo_select, the same name as the Python function. We replace the script if it already exists and include a brief description of the function.

oml.script.create("algo_select", 
  func=algo_select,  
  is_global=True, 
  overwrite=True, 
  description='AutoML algorithm selection with string inputs for data, mining_function, score_metric, and a numeric input for parallel')

Now you’re ready to run the algo_select script from the REST API. As a prerequisite, you’ll need to assign variables that identify your OML tenancy, database, username and password, obtain an authorization token, and include these variables in the request.

Here’s the cURL command and result. The function parameters mining_function, score_metric, and parallel are passed in cURL’s -d argument:

$ curl -i -X POST --header "Authorization: Bearer ${token}" \
  --header 'Content-Type: application/json' \
  --header 'Accept: application/json' \
  -d '{"input": "select * from NARROW", "parameters":"{\"oml_input_type\":\"pandas.DataFrame\",\"mining_function\":\"classification\",\"score_metric\":\"accuracy\",\"parallel\":2}", "asyncFlag":true}' \
  "${omlserver}/oml/tenants/${tenant}/databases/${database}/api/py-scripts/v1/table-apply/algo_select"
{"result":"Algorithm selection prediction scores: ['nn:0.780', 'svm_linear:0.724', 'rf:0.714', 'glm_ridge:0.697', 'svm_gaussian:0.678', 'dt:0.667', 'nb:0.584', 'glm:0.383']"}

Key points:

  • The Content-Type header tells the server the format of the data sent in the request body, and the Accept header tells the server the format the client expects in the response. The OML4Py REST API uses JSON for both.
  • The names and values of the string parameters mining_function (classification) and score_metric (accuracy) are enclosed in double quotes because the REST endpoint expects JSON data.
  • The backslash symbol is used as an escape character for the nested quotes around the string parameters in the cURL query.
  • The numeric input for the parameter parallel does not require special formatting.
  • The cURL request also contains parameters for the REST call itself, such as asyncFlag for asynchronous mode and input for the query that supplies the database table to the function.
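
To make the nesting in the key points concrete, here is a minimal Python sketch that builds the same request body with the standard json module. It only constructs and prints the body and does not contact any server. The variable names function_args, body and payload are illustrative, not part of the OML4Py API.

```python
import json

# Arguments for the user-defined function, as in the cURL example above.
function_args = {
    "oml_input_type": "pandas.DataFrame",
    "mining_function": "classification",
    "score_metric": "accuracy",
    "parallel": 2,
}

# The value of "parameters" is itself a JSON string, so the function
# arguments are serialized first and then embedded in the outer object.
body = {
    "input": "select * from NARROW",
    "parameters": json.dumps(function_args, separators=(",", ":")),
    "asyncFlag": True,
}

# Serializing the outer object escapes the nested double quotes with
# backslashes, which is what the hand-written cURL command does manually.
payload = json.dumps(body)
print(payload)

# Decoding twice recovers the original argument values and types.
recovered = json.loads(json.loads(payload)["parameters"])
print(recovered["parallel"])  # 2
```

Letting the serializer handle the second level of quoting is usually less error-prone than writing the backslashes by hand, and the printed payload can be passed to the -d argument inside single quotes.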

Refer to the OML4Py REST API for Embedded Python Execution Guide for details on the available REST endpoints and examples. You can also check out the OML4Py REST API Template notebook example in your Autonomous Database OML user account for more workflow examples.

So there you have it, a complete example for invoking a user-defined Python function from the REST API. You can try this yourself using an Always Free Autonomous Database instance. 

Special thanks to Yu Xiang for her input on this blog.
