Monday, August 31, 2015

How Swagger helped me with AWS API Gateway


Motive and background

Recently, I stumbled onto a problem where I had to add authentication on top of a considerably large body of legacy code. Time was limited, so I had to look for a quick solution. Then I was introduced to API Gateway from Amazon, which from the outset looked like a perfect fit for my problem. Why? Firstly, my API was running on EC2. Secondly, I did not want a convoluted authentication mechanism; I just wanted each of my API endpoints to demand an access token before it is invoked. But the API interface was quite large, and I would have had to write an invocation point for each method by hand. And then Swagger came along.
What is AWS API Gateway? It is a managed service that enables you to write and publish your new APIs, or integrate with third-party APIs, and still scale them; it also offers monitoring services such as logs. It can act as a façade for your existing API and add some additional functionality with a few Lambda functions. The getting started guide helped me to a good extent.
How did Swagger come into the picture? Say you have a system with some REST APIs, and you may have no control over which language they were written in. At some point you want to talk to them. If your system could talk to any of these APIs in a language-agnostic way, and you could still understand what is going on, wouldn’t that be great? That is exactly what Swagger does. Swagger works on a Swagger definition: from a definition you can generate an implementation (using the Swagger code generation tool), or you can generate a definition file out of an existing API, which in turn is used by clients. So, using Swagger, I generated the definition and then used that definition file to import the API resource points into AWS API Gateway (these are simply the methods of my existing API). Don’t get confused by “resource”; it is just the AWS way of naming the methods of an API.

Overview of Implementation

Essentially my steps are based on this well-written guide. Typically this is a 3-step process:
  1. Adding Swagger's dependencies to your project.
  2. Hook Swagger into your JAX-RS application configuration.
  3. Configure and Initialize Swagger.
My API was written using RESTEasy. This is how I started: add the Maven dependency,
<dependency>
  <groupId>io.swagger</groupId>
  <artifactId>swagger-jaxrs</artifactId>
  <version>1.5.0</version>
</dependency>

Then you can either let Swagger scan your root context or add some provider classes to your RESTEasy providers. I chose to add some providers:
<servlet>
  <servlet-name>Jersey2Config</servlet-name>
  <servlet-class>io.swagger.jaxrs.config.DefaultJaxrsConfig</servlet-class>
  <init-param>
    <param-name>api.version</param-name>
    <param-value>1.0.0</param-value>
  </init-param>
  <init-param>
    <param-name>swagger.api.basepath</param-name>
    <param-value>http://localhost:8080/api</param-value>
  </init-param>
  <load-on-startup>2</load-on-startup>
</servlet>
This is just a servlet with no path mapping, which will be started during application startup.

Are we done? Yes, but only with the setup. You still need to decorate your API with annotations so that Swagger can catch them. Let's do it! It would likely be done as follows:

@Path("/ping")
@Api(value = "/ping", description = "All is well")
public class PingService {

    private static final Logger LOGGER = LoggerFactory
            .getLogger(PingService.class);

    @GET
    @ApiOperation(value = "Just ping", notes = "just ping.", position = 1)
    @Path("/hello")
    @Produces(MediaType.APPLICATION_JSON)
    @ApiResponses(value = {
            @ApiResponse(code = 400, message = "system down"),
            @ApiResponse(code = 404, message = "No available") })
    public Response ping() {
        LOGGER.info("Ping service active");
        return new Response(ResponseStatus.SUCCESS,
                "Welcome to TeletecAO2RWService");
    }

}

And you have to annotate all the model objects as well.

So what do we have now? We can get the Swagger definition file from the context root of the API, such as http://localhost:8080/api/swagger.json above, which in its most basic form will look something like this:


{
  "apiVersion": "",
  "apis": [
  ],
  "basePath": "http://192.168.1.1:8000",
  "models": {
  },
  "resourcePath": "/api",
  "swaggerVersion": "1.2"
}




A full-fledged definition will be much larger. Well, Swagger does the work for you: it crawls from your root context and finds all the API paths from there. At this point you don’t have to worry about the daunting size of the definition file it generates.

So how do we get this imported into our API Gateway? We have the Swagger importer. All you do is get the aws-apigateway-swagger-importer-{version}-dependencies.jar into a local folder and run the following:

./aws-api-import.sh --create path/to/swagger.json --profile optimusprime

Remember, "optimusprime" is the profile that contains the valid access key and secret key for your AWS API Gateway. One thing to note: you may have to run it on a Linux machine with the AWS CLI installed. And if you are building the importer from scratch with Maven, build it on the Linux machine itself. I wasted an hour debugging my build after moving it to my EC2 RHEL instance; the aws-api-import.sh script will struggle to execute in your Linux environment if you build it elsewhere and move it over.

Once everything has worked and the importer has run properly, you will see the resource points listed in your AWS console. If you are feeling relaxed at this point, all that is left is to finish off the rest of your AWS API work.

Click on each of these methods and map it to your API. In my case the API was on EC2, so when selecting the integration type I chose AWS service proxy. Before this, I created an ARN for my existing service using a role. How exactly do you do that? You can find out here.



Add authentication


Go to your resource, select the method (GET, POST or whatever), and select “Method Request”.


You can select AWS-IAM and select the security token; at least this is all I had to do. In your case, if you insist that the client pass some header values, you can add them and then wire up a Lambda function to validate them (Lambda functions are interesting too; I suggest you take a look at them). Then just navigate through the rest of the flow to do the mapping and so on. Once all is done, just deploy your API via the deploy tool.

So what will you be invoking? The method's ARN, which you can find under “Method Request”; it will look like this:

arn:aws:execute-api:us-west-2:900272013338:02mubm2sui/*/GET/department
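
To make the pieces of that ARN easier to read, here is a minimal sketch that splits it apart. It assumes the usual execute-api layout of region, account ID, then apiId/stage/METHOD/resource; the class ExecuteApiArn is a hypothetical helper of my own, not something from the AWS SDK.

```java
// Hypothetical helper (not part of any AWS SDK): splits an execute-api ARN
// of the assumed form arn:aws:execute-api:region:account:apiId/stage/METHOD/resource.
final class ExecuteApiArn {
    final String region;
    final String accountId;
    final String apiId;
    final String stage;
    final String httpMethod;
    final String resourcePath;

    private ExecuteApiArn(String region, String accountId, String apiId,
                          String stage, String httpMethod, String resourcePath) {
        this.region = region;
        this.accountId = accountId;
        this.apiId = apiId;
        this.stage = stage;
        this.httpMethod = httpMethod;
        this.resourcePath = resourcePath;
    }

    static ExecuteApiArn parse(String arn) {
        // A limit of 6 keeps everything after the fifth colon in one field.
        String[] fields = arn.split(":", 6);
        if (fields.length != 6 || !"arn".equals(fields[0])
                || !"execute-api".equals(fields[2])) {
            throw new IllegalArgumentException("Not an execute-api ARN: " + arn);
        }
        // The last field is apiId/stage/METHOD/resource-path.
        String[] parts = fields[5].split("/", 4);
        if (parts.length < 3) {
            throw new IllegalArgumentException("Missing stage or method: " + arn);
        }
        String resource = parts.length == 4 ? "/" + parts[3] : "/";
        return new ExecuteApiArn(fields[3], fields[4], parts[0], parts[1],
                parts[2], resource);
    }

    public static void main(String[] args) {
        ExecuteApiArn a = parse(
                "arn:aws:execute-api:us-west-2:900272013338:02mubm2sui/*/GET/department");
        System.out.println(a.httpMethod + " " + a.resourcePath + " in " + a.region);
    }
}
```

The `*` in the stage position of the sample ARN means the method is addressed across all stages.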

There is pretty well-written documentation from AWS for this (if I can understand it, your grandma will definitely understand it); try this.

So that’s how the Rover landed on Mars. Kinda cool IMHO: AWS API Gateway seems to be an amazing, feature-rich tool. For a while I was wondering if it is an orchestration tool, such as Mule, but I hardly think that was the intention of API Gateway. Probably not.

Getting the definition file for APIs written with other frameworks, such as Jersey, should be straightforward.

 Common Errors


1. ERROR - Could not load AWS configuration. Please run 'aws configure' - This is because the importer could not find the config file, which should be located at {user.home}/.aws/config, so you may have to create one. To do this you need to install the AWS CLI if you have not already done so; instructions are provided here. Once done, you need to configure a profile via
aws configure --profile optimusprime

You will be prompted for your access and secret keys.
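
If you want to double-check that the profile really ended up in the config file, the following is a rough sketch that lists the profile names found in the text of such a file. It assumes the AWS CLI convention of a [default] section plus [profile name] sections for named profiles; AwsProfiles and profileNames are hypothetical names of my own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the profiles declared in the text of an
// AWS CLI config file, assuming [default] and [profile name] headers.
final class AwsProfiles {

    static List<String> profileNames(String configText) {
        List<String> names = new ArrayList<>();
        for (String raw : configText.split("\\R")) {
            String line = raw.trim();
            if (!line.startsWith("[") || !line.endsWith("]")) {
                continue; // not a section header
            }
            String header = line.substring(1, line.length() - 1).trim();
            if (header.equals("default")) {
                names.add("default");
            } else if (header.startsWith("profile ")) {
                names.add(header.substring("profile ".length()).trim());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String sample = "[default]\nregion = us-east-1\n"
                + "[profile optimusprime]\nregion = us-west-2\n";
        System.out.println(profileNames(sample));
    }
}
```

In practice you would read the file at {user.home}/.aws/config into a string first; the sketch takes the text directly so that it stays self-contained.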

2. ERROR -  Cross-account pass role is not allowed. (Service: null; Status Code: 403; Error Code: null; Request ID: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)

This is because you have forgotten to put your account ID in the "x-amazon-apigateway-integration" section of the Swagger definition file.

Once you have done so, it should look something like the following (just replace ACCOUNT_ID with your proper account ID):

.....
"x-amazon-apigateway-integration" : {
                    "type" : "aws",
                    "uri" : "arn:aws:apigateway:us-east-1:lambda:path/2015-03-31/functions/arn:aws:lambda:us-east-1:865555555:function:myFunction/invocations",
                    "httpMethod" : "POST",
                    "credentials" : "arn:aws:iam::865555555:role/lambda_exec_role",
                    "requestTemplates" : {
                        "application/json" : "json request template 2",
                        "application/xml" : "xml request template 2"
                    }
.....

3. ERROR- Invalid ARN specified in the request
This again usually arises from a wrong account ID or format. Remember, your credentials should be something like "arn:aws:iam::865555555:role/lambda_exec_role", with the exact number of colons.
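
Counting colons by eye is error-prone, so here is a small sketch of a structural check for such a role ARN. It only looks at the shape (six colon-separated fields, an empty region field, a numeric account ID, a role/ resource) and deliberately does not enforce the length of the account ID, since the example above uses a shortened placeholder. RoleArnCheck and explain are hypothetical names, not an AWS API.

```java
// Hypothetical helper: explains what is structurally wrong with an IAM role ARN.
final class RoleArnCheck {

    static String explain(String arn) {
        // The -1 limit keeps empty fields, such as the empty region in "iam::".
        String[] f = arn.split(":", -1);
        if (f.length != 6) {
            return "expected 6 colon-separated fields, found " + f.length;
        }
        if (!f[0].equals("arn") || !f[2].equals("iam")) {
            return "must look like arn:<partition>:iam:...";
        }
        if (!f[3].isEmpty()) {
            return "region field must be empty for IAM (note the double colon)";
        }
        if (!f[4].matches("\\d+")) {
            return "account id must contain digits only";
        }
        if (!f[5].startsWith("role/") || f[5].length() == "role/".length()) {
            return "resource must be role/<name>";
        }
        return "ok";
    }

    public static void main(String[] args) {
        System.out.println(explain("arn:aws:iam::865555555:role/lambda_exec_role"));
        System.out.println(explain("arn:aws:iam:865555555:role/lambda_exec_role"));
    }
}
```

The second call in main drops one colon, which is exactly the kind of slip that produces this error.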

4. Everything went fine, but I still can't see the API listed in my console.
Nothing to worry about; it might have been listed under another AWS region. Check the uri: if you are under us-west-2, the above uri should be

.....
"uri" : "arn:aws:apigateway:us-west-2:lambda:path/2015-03-31/functions/arn:aws:lambda:us-east-1:865555555:function:myFunction/invocations"
.....

 

 

Sunday, August 23, 2015

Writing Authority Connector for Apache ManifoldCF



My interest in Apache ManifoldCF has been growing, so this time I decided to spend some time writing about an authority connector for ManifoldCF. Writing an authority connector is pretty much the same as writing a repository connector, but the aim of such a connector is to retrieve token values for a user against the repository. One thing to remember is a default assumption made by the ManifoldCF framework: if you don’t specify any authority connector for your repository connector, ManifoldCF assumes that Active Directory is in charge, and in these cases the access tokens are Active Directory SIDs. A more complete description is available here.

Security Model





How does ManifoldCF use the authority connector? The framework will invoke all the authority connectors that are configured in ManifoldCF and retrieve the tokens against each of those repositories. When you invoke the authority service, which is available at http://<host>:8345/mcf-authority-service/UserACLs, it will collect all the tokens against these repos. Let's say you have the following authority connectors configured: JIRAAuthorityConnector, ActiveDirectoryAuthorityConnector and LDAPAuthorityConnector. If you pass a username to retrieve the relevant tokens, the authority connectors that understand this username will return the access tokens for it. Finally, all these tokens are amalgamated and returned together in a single response. Moreover, there is something called authority groups: when you create an authority connector you have to create it under an authority group, and an authority connector will belong to only one authority group. This allows some separation, meaning that tokens are valid only within the group. For a complete understanding of how ManifoldCF works, see the description at the following location, which explains it pretty well.



OK, how can you get the access tokens for a user/username? The service can be invoked at http://localhost:8345/mcf-authority-service/UserACLs?username=leagueofshadows, and it will return tokens in the form of either access tokens or deny tokens; if both are present, the deny token wins over any access tokens.
Sample:

AUTHORIZED:amazons3
TOKEN:myauthoritygroup:kuhajeyan
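
A client consuming this output could separate the token lines from the status lines roughly as follows. This is only a sketch based on the two-line sample above: it assumes one entry per line and a TOKEN: prefix for tokens, and it does no decoding of the token values. UserAclResponse is a hypothetical class of my own, not part of ManifoldCF.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side holder for the lines returned by the authority service.
final class UserAclResponse {
    final List<String> tokens = new ArrayList<>();
    final List<String> statusLines = new ArrayList<>();

    static UserAclResponse parse(String body) {
        UserAclResponse response = new UserAclResponse();
        for (String raw : body.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith("TOKEN:")) {
                // Keep everything after the prefix, including the group name.
                response.tokens.add(line.substring("TOKEN:".length()));
            } else {
                // Anything else (for example AUTHORIZED:...) is kept as a status line.
                response.statusLines.add(line);
            }
        }
        return response;
    }

    public static void main(String[] args) {
        UserAclResponse r = parse("AUTHORIZED:amazons3\nTOKEN:myauthoritygroup:kuhajeyan\n");
        System.out.println(r.statusLines + " " + r.tokens);
    }
}
```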

Overview of writing an authority connector

Typically you would start by extending the base connector org.apache.manifoldcf.authorities.authorities.BaseAuthorityConnector. Just as with a repository connector, it is about implementing/overriding some methods; there are a few methods which you may have to implement, as follows:

Method – What it should do
getAuthorizationResponse() – Obtain the authorization response, given a user name
outputConfigurationHeader() – Output the head-section part of an authority connection ConfigParams editing page
outputConfigurationBody() – Output the body-section part of an authority connection ConfigParams editing page
processConfigurationPost() – Receive and process form data from an authority connection ConfigParams editing page
viewConfiguration() – Output the viewing HTML for an authority connection ConfigParams object

And,
connect – Some connection key values are initialized here.
check – Periodically checks the connection status. A meaningful, readable string is returned to inform the user/admin about the connection status at that instant.
isConnected – Returns a boolean telling whether the connection is alive or not.
viewConfiguration – Called when the configuration is displayed in view mode.
outputConfigurationHeader – Called for the header section of the configuration page.
outputConfigurationBody – Called for the body section of the configuration page; unlike viewConfiguration, it renders the form whose values are saved and posted.
processConfigurationPost – Called to process the configuration when it is posted.
getAuthorizationResponse – Gets the access tokens for a username against the repository.
getDefaultAuthorizationResponse – Gets the default access token for the repository.

Mainly we need to look into the implementation of getAuthorizationResponse here. How you return the access tokens and how you format them (in the end they must be a string array) depends solely on your preference. A typical, very simple implementation would look like this:
@Override
public AuthorizationResponse getAuthorizationResponse(String userName)
        throws ManifoldCFException {
    if (checkUserExists(userName))
        return new AuthorizationResponse(new String[] { userName },
                AuthorizationResponse.RESPONSE_OK);
    return RESPONSE_USERNOTFOUND;
}

A fully implemented version of code is available at this location

Monday, August 3, 2015

Repository Connector - Apache ManifoldCF

Writing Repository connector for Apache ManifoldCF

Apache ManifoldCF is a framework that lets you connect to source repositories and index their documents. It has a built-in security model that allows your target repositories to represent the source security model. The target repository is where your indexes will reside. More information about the technical structure of ManifoldCF can be found here. My aim is to walk through writing a repository connector. I have chosen an Atlassian Confluence repository for the example, and we will be using the Confluence REST API to retrieve the Confluence contents.

ManifoldCF provides a framework that allows you to write a repository connector, which is a class that will be invoked by jobs that run on a schedule. When you write this class, the framework also allows you to wire in UI elements such as forms. For example, if you want to write a repository connector for Confluence, you need some way of telling ManifoldCF the Confluence API URL of the server and the credentials you will need to connect; those values come from the relevant UI forms. To write a repository connector, you start by inheriting from the base connector class BaseRepositoryConnector provided by ManifoldCF itself. There are a few methods for which you need to provide an implementation.

You can get the source code, which is built against ManifoldCF 1.8, here.

Methods to be overridden and implemented

connect() - public void connect(ConfigParams configParams). This method lets you make the connection to the source repository. The configParams are sent from the UI form of the repository connector, and you can use these values to make a connection.


check() - public String check() throws ManifoldCFException. This method allows you to check whether the connection is valid with respect to the values that you collected via the connect method. It returns a string that describes the validity of the current connection. For example, if you cannot make the connection, you can simply let it return the string “Connection Failed”.


isConnected - public boolean isConnected(). Returns true if the current connection status is successful; it is used by the framework when running the job.

addSeedDocuments - public void addSeedDocuments(ISeedingActivity activities,
        DocumentSpecification spec, long startTime, long endTime,
        int jobMode) throws ManifoldCFException, ServiceInterruption

This does the actual job of retrieving the contents from the source repository. The retrieved contents will be in the form of something called seeds; processDocuments then uses these seeds to extract the metadata and index the documents.

getDocumentVersions - public String[] getDocumentVersions(String[] documentIdentifiers,
        DocumentSpecification spec) throws ManifoldCFException,
        ServiceInterruption

The framework will use these version strings to check whether a piece of content needs to be re-crawled or not. Usually the version is the last modified date of the document.
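
The idea behind the version check can be pictured with a few lines of plain Java. This is only an illustration of the comparison, assuming the version string is built from a last-modified timestamp; VersionCheck, versionOf and needsRecrawl are hypothetical names, and the real bookkeeping is done by the framework, not by code like this.

```java
// Hypothetical illustration of how a version string decides on a re-crawl.
final class VersionCheck {

    // Builds a version string from a last-modified timestamp (epoch millis).
    static String versionOf(long lastModifiedMillis) {
        return Long.toString(lastModifiedMillis);
    }

    // True when the document was never seen before or its version string changed.
    static boolean needsRecrawl(String storedVersion, String currentVersion) {
        return storedVersion == null || !storedVersion.equals(currentVersion);
    }

    public static void main(String[] args) {
        String stored = versionOf(1438560000000L);
        System.out.println(needsRecrawl(null, stored));
        System.out.println(needsRecrawl(stored, versionOf(1438560000000L)));
        System.out.println(needsRecrawl(stored, versionOf(1440979200000L)));
    }
}
```

Any string that changes whenever the content changes would do as a version; the last modified date is simply the cheapest one to obtain from most repositories.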


processDocuments - public void processDocuments(String[] documentIdentifiers,
        String[] versions, IProcessActivity activities,
        DocumentSpecification spec, boolean[] scanOnly)
        throws ManifoldCFException, ServiceInterruption

This method uses the seeds, extracts the metadata and indexes each piece of content. The results will typically be transferred to your target repository, such as Solr.

viewConfiguration - public void viewConfiguration(IThreadContext threadContext,
        IHTTPOutput out, Locale locale, ConfigParams parameters)
        throws ManifoldCFException, IOException

UI utility method. Typically you will fill the page with the parameter values that were saved on an earlier occasion, such as the URL and API credentials that were persisted in the context (usually you would have stored those values initially, when you first set up the connection, using the processConfigurationPost method). The method is called when the UI displays the values in “view” mode.

outputConfigurationHeader - public void outputConfigurationHeader(IThreadContext threadContext,
        IHTTPOutput out, Locale locale, ConfigParams parameters,
        List<String> tabsArray) throws ManifoldCFException, IOException

UI method, which will be invoked by the framework to populate the header details in the UI. The implementation typically includes tab information along with any default tabs.

processConfigurationPost - public String processConfigurationPost(IThreadContext threadContext,
        IPostParameters variableContext, ConfigParams parameters)
        throws ManifoldCFException

Here you save the values posted from the UI, such as the API URL, API credentials, etc. You retrieve the values from variableContext and save them back to parameters.

viewSpecification - public void viewSpecification(IHTTPOutput out, Locale locale,
        DocumentSpecification ds) throws ManifoldCFException, IOException

This method will be invoked when you want to view the job specification details of the repository connector.

processSpecificationPost - public String processSpecificationPost(IPostParameters variableContext,
        DocumentSpecification ds) throws ManifoldCFException

Identical to processConfigurationPost, but the values posted are relevant to the job rather than the repository. You may process values such as any custom parameters for your API queries.

outputSpecificationBody - public void outputSpecificationBody(IHTTPOutput out, Locale locale,
        DocumentSpecification ds, String tabName)
        throws ManifoldCFException, IOException

This method is invoked when you view the specification details of the job.

outputSpecificationHeader - public void outputSpecificationHeader(IHTTPOutput out, Locale locale,
        DocumentSpecification ds, List<String> tabsArray)
        throws ManifoldCFException, IOException

Identical to outputConfigurationHeader, but this is for the job.

Structure of a Repository connector.

How does ManifoldCF recognize a new connector? It all works on OSGi: you need to create a jar file containing your repository connector and a security connector (which will be looked at later) and drop it into the connector libraries folder. Once ManifoldCF starts, it will automatically pick up your new connector. You definitely need to watch the log file manifoldcf.log, which can be found in the logs folder.

1.       Create the project. Just extend your POM from the parent ManifoldCF POM and you will have most of the necessary dependencies imported. A sample pom file may look like this

2.       Resource files. These will contain the typical HTML and JavaScript files that you need to make available on the classpath to be picked up by the framework; for example, the file editConfiguration_conf_server.html will be loaded to contain your repository connector details. You will explicitly locate these files in the relevant UI methods described above.

connect - method
super.connect(configParams);

confprotocol = params.getParameter(ConfluenceConfig.CONF_PROTOCOL_PARAM);
confhost = params.getParameter(ConfluenceConfig.CONF_HOST_PARAM);
confport = params.getParameter(ConfluenceConfig.CONF_PORT_PARAM);
confpath = params.getParameter(ConfluenceConfig.CONF_PATH_PARAM);
confsoapapipath = params.getParameter(ConfluenceConfig.CONF_SOAP_API_PARAM);
clientid = params.getParameter(ConfluenceConfig.CLIENT_ID_PARAM);
clientsecret = params
        .getObfuscatedParameter(ConfluenceConfig.CLIENT_SECRET_PARAM);

confproxyhost = params.getParameter(ConfluenceConfig.CONF_PROXYHOST_PARAM);
confproxyport = params.getParameter(ConfluenceConfig.CONF_PROXYPORT_PARAM);
confproxydomain = params
        .getParameter(ConfluenceConfig.CONF_PROXYDOMAIN_PARAM);
confproxyusername = params
        .getParameter(ConfluenceConfig.CONF_PROXYUSERNAME_PARAM);
confproxypassword = params
        .getObfuscatedParameter(ConfluenceConfig.CONF_PROXYPASSWORD_PARAM);

try {
    getConfluenceService();
} catch (ManifoldCFException e) {
    Logging.connectors.error(e);
}


Take the values available in configParams and use them to connect to the Confluence server.

check –
try {
    return checkConnection();
} catch (ServiceInterruption e) {
    Logging.connectors.error("Error ", e);
    return "Connection temporarily failed: ";
} catch (ManifoldCFException e) {
    Logging.connectors.error("Error ", e);
    return "Connection failed: ";
}

This instantiates a separate thread that will check if the connection is valid, but it is not necessary to do this via a thread.

protected String checkConnection() throws ManifoldCFException,
        ServiceInterruption {
    String result = "Unknown";
    getConfluenceService();
    CheckConnectionThread t = new CheckConnectionThread(getSession(),
            service);
    try {
        t.start();
        t.finishUp();
        result = t.result;
    } catch (InterruptedException e) {
        t.interrupt();
        throw new ManifoldCFException("Interrupted: " + e.getMessage(), e,
                ManifoldCFException.INTERRUPTED);
    } catch (java.net.SocketTimeoutException e) {
        handleIOException(e);
    } catch (InterruptedIOException e) {
        t.interrupt();
        handleIOException(e);
    } catch (IOException e) {
        handleIOException(e);
    } catch (ResponseException e) {
        handleResponseException(e);
    }

    return result;
}
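
The start/finishUp pattern used with CheckConnectionThread can be sketched in isolation as follows. This is a simplified stand-in written for illustration, not the connector's actual thread class: CheckThread and runCheck are hypothetical names, and the check itself is passed in as a Callable so that the sketch runs without a Confluence server.

```java
import java.util.concurrent.Callable;

// Hypothetical stand-in for a worker like CheckConnectionThread; not ManifoldCF code.
final class CheckThread extends Thread {
    private final Callable<String> check;
    private volatile String result = "Unknown";
    private volatile Exception failure;

    CheckThread(Callable<String> check) {
        this.check = check;
        setDaemon(true); // do not keep the JVM alive for a stuck check
    }

    @Override
    public void run() {
        try {
            result = check.call();
        } catch (Exception e) {
            failure = e; // remember the failure for the calling thread
        }
    }

    // Waits for the worker to end, then returns its result or rethrows its failure.
    String finishUp() throws Exception {
        join();
        if (failure != null) {
            throw failure;
        }
        return result;
    }

    // Runs a check on a worker thread and turns any failure into a status string.
    static String runCheck(Callable<String> check) {
        CheckThread t = new CheckThread(check);
        t.start();
        try {
            return t.finishUp();
        } catch (InterruptedException e) {
            t.interrupt();
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return "Interrupted";
        } catch (Exception e) {
            return "Connection failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(runCheck(() -> "Connection working"));
    }
}
```

The point of the pattern is that the calling thread stays interruptible while the remote call is in flight, and exceptions raised on the worker are surfaced on the caller when it collects the result.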

addSeedDocuments –
GetSeedsThread t = new GetSeedsThread(getSession(), confDriveQuery);
try {
    t.start();

    boolean wasInterrupted = false;
    try {
        XThreadStringBuffer seedBuffer = t.getBuffer();

        while (true) {
            String contentKey = seedBuffer.fetch();
            if (contentKey == null)
                break;
            // Add the pageID to the queue
            activities.addSeedDocument(contentKey);
        }
    } catch (InterruptedException e) {
        wasInterrupted = true;
        throw e;
    } catch (ManifoldCFException e) {
        if (e.getErrorCode() == ManifoldCFException.INTERRUPTED)
            wasInterrupted = true;
        throw e;
    } finally {
        if (!wasInterrupted)
            t.finishUp();
    }
} catch (InterruptedException e) {
    t.interrupt();
    throw new ManifoldCFException("Interrupted: " + e.getMessage(), e,
            ManifoldCFException.INTERRUPTED);
} catch (java.net.SocketTimeoutException e) {
    handleIOException(e);
} catch (InterruptedIOException e) {
    t.interrupt();
    handleIOException(e);
} catch (IOException e) {
    handleIOException(e);
} catch (ResponseException e) {
    handleResponseException(e);
}

Here again a new thread is created to add the seeds, but the framework does not necessarily require you to do so.

processDocuments –
for (int i = 0; i < documentIdentifiers.length; i++) {
    String nodeId = documentIdentifiers[i];
    String version = versions[i];

    long startTime = System.currentTimeMillis();
    String errorCode = "FAILED";
    String errorDesc = StringUtils.EMPTY;
    Long fileSize = null;
    boolean doLog = false;

    try {
        if (Logging.connectors != null) {
            Logging.connectors.debug("Confluence "
                    + ": Processing document identifier '" + nodeId
                    + "'");
        }

        if (!scanOnly[i]) {
            if (version != null) {
                doLog = true;

                try {
                    errorCode = processConfluenceDocuments(nodeId,
                            activities, version, fileSize);
                } catch (Exception e) {
                    if (Logging.connectors != null) {
                        Logging.connectors.error(e);
                    }
                }

            } else {
                activities.deleteDocument(nodeId);
            }
        }
    } finally {
        if (doLog)
            activities.recordActivity(Long.valueOf(startTime),
                    ACTIVITY_READ, fileSize, nodeId, errorCode,
                    errorDesc, null);
    }
}

You can simply loop through the available seeds and do anything relevant, such as extracting the metadata.
For the sake of brevity I have omitted the other methods, but you can have a look at the source code to follow the rest.