
Sunday, August 23, 2015

Writing Authority Connector for Apache ManifoldCF



My interest in Apache ManifoldCF has been growing, so this time I decided to write about authority connectors. Writing an authority connector is much the same as writing a repository connector, but its aim is to retrieve access tokens for a user from the repository. Keep in mind one default assumption made by the ManifoldCF framework: if you don't specify an authority connector for your repository connector, ManifoldCF assumes Active Directory is in charge, and in that case the access tokens are Active Directory SIDs. A more complete description is available here.

Security Model





How does ManifoldCF use authority connectors? The framework invokes every authority connector configured in ManifoldCF and retrieves the user's tokens from each of those repositories. When you call the authority service at http://<host>:8345/mcf-authority-service/UserACLs, it collects the tokens from all of them. Say you have the following authority connectors configured: JIRAAuthorityConnector, ActiveDirectoryAuthorityConnector and LDAPAuthorityConnector. If you pass a user name, the authority connectors that recognize it return the access tokens for that user. Finally, all these tokens are combined and returned in a single response. There is also something called an authority group: every authority connector has to be created under an authority group, and it belongs to exactly one group. This gives some separation, because tokens are valid only within their group. How ManifoldCF works as a whole is described at the following location, which explains it well.
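To make the merging step concrete, here is a minimal sketch in plain Java. It is only a model of the idea: the Authority class, its fields and the group names are hypothetical stand-ins, not ManifoldCF classes, and the "group:token" format is an assumption based on the sample output shown further down.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative model only; none of these types are ManifoldCF classes. */
final class AuthorityAggregationSketch {

    /** Hypothetical stand-in for one configured authority connector. */
    static final class Authority {
        final String group;                          // authority group it belongs to
        final Map<String, List<String>> knownUsers;  // user name -> raw tokens

        Authority(String group, Map<String, List<String>> knownUsers) {
            this.group = group;
            this.knownUsers = knownUsers;
        }
    }

    /** Asks every authority about the user and merges the group-qualified tokens. */
    static List<String> collectTokens(List<Authority> authorities, String userName) {
        List<String> merged = new ArrayList<>();
        for (Authority authority : authorities) {
            List<String> tokens = authority.knownUsers.get(userName);
            if (tokens == null) {
                continue; // this authority does not recognize the user
            }
            for (String token : tokens) {
                // Qualify each token with its group so groups stay separated.
                merged.add(authority.group + ":" + token);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Authority> authorities = List.of(
                new Authority("tracker", Map.of("alice", List.of("jira-users"))),
                new Authority("directory", Map.of("alice", List.of("staff"),
                        "bob", List.of("guests"))));
        System.out.println(collectTokens(authorities, "alice"));
    }
}
```

Authorities that do not know the user simply contribute nothing, which mirrors the behaviour described above where only the connectors that understand the user name return tokens.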



OK, how do you get the access tokens for a user name? Call http://localhost:8345/mcf-authority-service/UserACLs?username=leagueofshadows. The tokens it returns take the form of either access tokens or deny tokens; if both are present, the deny token wins over any access tokens.
Sample output:

AUTHORIZED:amazons3
TOKEN:myauthoritygroup:kuhajeyan
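A client of the authority service mostly needs to build the request URL, pick the TOKEN lines out of a response like the sample above, and apply the "deny wins" rule. The following is a rough sketch in plain Java, not ManifoldCF code. The class and method names are made up for illustration, and the idea that a document carries separate allow and deny token lists is an assumption used here to show the rule.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Illustrative client-side helpers; the names are hypothetical. */
final class UserAclsSketch {

    /** Builds the request URL; the user name is URL-encoded. */
    static String buildUrl(String serviceBase, String userName) {
        return serviceBase + "/UserACLs?username="
                + URLEncoder.encode(userName, StandardCharsets.UTF_8);
    }

    /** Returns the value of every response line that starts with "TOKEN:". */
    static List<String> parseTokens(String responseBody) {
        List<String> tokens = new ArrayList<>();
        for (String line : responseBody.split("\\R")) {
            if (line.startsWith("TOKEN:")) {
                tokens.add(line.substring("TOKEN:".length()));
            }
        }
        return tokens;
    }

    /** Deny wins: visible only if an allow token matches and no deny token matches. */
    static boolean isVisible(List<String> userTokens,
                             List<String> allowTokens,
                             List<String> denyTokens) {
        boolean allowed = userTokens.stream().anyMatch(allowTokens::contains);
        boolean denied = userTokens.stream().anyMatch(denyTokens::contains);
        return allowed && !denied;
    }

    public static void main(String[] args) {
        String body = "AUTHORIZED:amazons3\nTOKEN:myauthoritygroup:kuhajeyan\n";
        List<String> tokens = parseTokens(body);
        System.out.println(buildUrl("http://localhost:8345/mcf-authority-service", "leagueofshadows"));
        System.out.println(tokens);
        System.out.println(isVisible(tokens, tokens, List.of()));
    }
}
```

In a real deployment the allow/deny check is normally done by the search engine at query time; the helper here only illustrates the precedence.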

Overview of writing an authority connector

Typically you start by extending the base class org.apache.manifoldcf.authorities.authorities.BaseAuthorityConnector. As with a repository connector, the work is mostly a matter of implementing or overriding a few methods, listed below.

Method – What it should do
getAuthorizationResponse() – Obtain the authorization response, given a user name
outputConfigurationHeader() – Output the head-section part of an authority connection ConfigParams editing page
outputConfigurationBody() – Output the body-section part of an authority connection ConfigParams editing page
processConfigurationPost() – Receive and process form data from an authority connection ConfigParams editing page
viewConfiguration() – Output the viewing HTML for an authority connection ConfigParams object

And, in a little more detail:
connect – initializes the connection values from the configuration.
check – checks the connection status and returns a readable string that tells the user/admin the state of the connection at that moment.
isConnected – returns a boolean telling whether the connection is alive.
viewConfiguration – called when the configuration is displayed in view mode.
outputConfigurationHeader – called to output the header section of the configuration editing page.
outputConfigurationBody – called to output the body section of the configuration editing page, including when the page is redisplayed after the form has been posted.
processConfigurationPost – called to process the form data when the configuration is posted.
getAuthorizationResponse – gets the access tokens for a user name from the repository.
getDefaultAuthorizationResponse – gets the default authorization response for the repository.

The method that matters most here is getAuthorizationResponse, which returns the access tokens. How you format the tokens is entirely up to you, as long as the end result is a string array. A very simple implementation would look like this:
@Override
public AuthorizationResponse getAuthorizationResponse(String userName)
        throws ManifoldCFException {
    // The user exists: hand back the user name itself as the only token.
    if (checkUserExists(userName))
        return new AuthorizationResponse(new String[] { userName },
                AuthorizationResponse.RESPONSE_OK);
    // Unknown user: fall back to the predefined "user not found" response.
    return RESPONSE_USERNOTFOUND;
}
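Since the token format is your own choice, one option is to return the user name together with the user's groups. The sketch below models that decision in plain Java so it can run without ManifoldCF on the classpath. Status and Response are simplified, hypothetical stand-ins for the framework's response class, the directory map stands in for a real user lookup, and the "user:" and "group:" prefixes are just one possible convention.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Stand-alone model of the three typical outcomes; not ManifoldCF classes. */
final class AuthorizationSketch {

    enum Status { OK, USER_NOT_FOUND, UNREACHABLE }

    /** Simplified stand-in for an authorization response. */
    static final class Response {
        final Status status;
        final List<String> tokens;

        Response(Status status, List<String> tokens) {
            this.status = status;
            this.tokens = tokens;
        }
    }

    /**
     * directory maps user name -> group names; a null directory models a
     * repository that cannot be reached.
     */
    static Response authorize(Map<String, List<String>> directory, String userName) {
        if (directory == null) {
            return new Response(Status.UNREACHABLE, List.of());
        }
        List<String> groups = directory.get(userName);
        if (groups == null) {
            return new Response(Status.USER_NOT_FOUND, List.of());
        }
        List<String> tokens = new ArrayList<>();
        tokens.add("user:" + userName);       // one token for the user
        for (String group : groups) {
            tokens.add("group:" + group);     // one token per group membership
        }
        return new Response(Status.OK, tokens);
    }

    public static void main(String[] args) {
        Map<String, List<String>> directory = Map.of("alice", List.of("admins"));
        Response response = authorize(directory, "alice");
        System.out.println(response.status + " " + response.tokens);
    }
}
```

Whatever format you pick, the repository connector has to attach tokens in the same format to the documents it indexes, otherwise nothing will match.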

A fully implemented version of the code is available at this location.

Monday, August 3, 2015

Repository Connector - Apache ManifoldCF

Writing Repository connector for Apache ManifoldCF

Apache ManifoldCF is a framework that lets you connect to source repositories and index their documents. It has a built-in security model that allows your target repository to represent the source security model. The target repository is where your indexes reside. More information about the technical structure of ManifoldCF can be found here. My aim is to walk through writing a repository connector. I have chosen Atlassian Confluence as the example repository, and we will use the Confluence REST API to retrieve the Confluence content.

ManifoldCF provides a framework for writing a repository connector, which is a class invoked by jobs that run on a schedule. By writing this class you can also wire in UI elements such as forms. For example, a repository connector for Confluence needs some way of telling ManifoldCF the Confluence API URL of the server and the credentials needed to connect; those values come from the relevant UI forms. To write a repository connector, start by inheriting from the base connector class BaseRepositoryConnector provided by ManifoldCF. There are a few methods for which you need to provide an implementation.

You can get the source code that is built against ManifoldCF 1.8 here.

Methods to be overridden and implemented

connect() - public void connect(ConfigParams configParams). This method lets you make the connection to the source repository. The configParams are sent from the UI form of the repository connector, and you can use these values to make the connection.


check() - public String check() throws ManifoldCFException. This method lets you check whether the connection is valid for the values collected via the connect method. It returns a string describing the validity of the current connection. For example, if you cannot make the connection, you can simply return the string “Connection Failed”.


isConnected - public boolean isConnected(). Returns true if the connection is currently established; the framework uses it when running the job.

addSeedDocuments - public void addSeedDocuments(ISeedingActivity activities,
        DocumentSpecification spec, long startTime, long endTime,
        int jobMode) throws ManifoldCFException, ServiceInterruption

This does the actual job of finding the content in the source repository. What it retrieves is handed to the framework in the form of seeds (document identifiers); processDocuments then uses these seeds to extract the metadata and index the documents.

getDocumentVersions - public String[] getDocumentVersions(String[] documentIdentifiers,
        DocumentSpecification spec) throws ManifoldCFException,
        ServiceInterruption

The framework uses these version strings to check whether content needs to be re-crawled. Usually the version string is the last-modified date of the document.
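The version check can be pictured with a small plain-Java sketch. It assumes the version string is simply the last-modified time in epoch milliseconds; the class and method names are hypothetical, and the comparison shown here is only a model of what the framework does with the strings you return.

```java
import java.time.Instant;

/** Illustrative helper; the names are hypothetical. */
final class VersionStringSketch {

    /** Derives a version string from the document's last-modified time. */
    static String versionOf(Instant lastModified) {
        return Long.toString(lastModified.toEpochMilli());
    }

    /** A document is re-crawled when it is new or its version string changed. */
    static boolean needsRecrawl(String storedVersion, String currentVersion) {
        return storedVersion == null || !storedVersion.equals(currentVersion);
    }

    public static void main(String[] args) {
        String before = versionOf(Instant.parse("2015-08-03T10:00:00Z"));
        String after = versionOf(Instant.parse("2015-08-23T10:00:00Z"));
        System.out.println(needsRecrawl(before, before)); // unchanged document
        System.out.println(needsRecrawl(before, after));  // edited document
    }
}
```

If other things should trigger re-indexing, such as a change in permissions, they can be appended to the version string as well.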


processDocuments - public void processDocuments(String[] documentIdentifiers,
        String[] versions, IProcessActivity activities,
        DocumentSpecification spec, boolean[] scanOnly)
        throws ManifoldCFException, ServiceInterruption

This method takes the seeds, extracts the metadata and indexes each piece of content. The results are typically transferred to your target repository, such as Solr.

viewConfiguration - public void viewConfiguration(IThreadContext threadContext,
        IHTTPOutput out, Locale locale, ConfigParams parameters)
        throws ManifoldCFException, IOException

UI utility method. Typically you fill the page with the values that were saved earlier, such as the URL and API credentials persisted in parameters (usually you stored those values when the configuration form was posted, in the processConfigurationPost method). The method is called when the UI displays the values in “view” mode.

outputConfigurationHeader - public void outputConfigurationHeader(IThreadContext threadContext,
        IHTTPOutput out, Locale locale, ConfigParams parameters,
        List<String> tabsArray) throws ManifoldCFException, IOException

UI method that the framework invokes to populate the header section of the UI. The implementation typically adds your tab names to tabsArray alongside any default ones.

processConfigurationPost - public String processConfigurationPost(IThreadContext threadContext,
        IPostParameters variableContext, ConfigParams parameters)
        throws ManifoldCFException

Here you save the values posted from the UI, such as the API URL and API credentials. You retrieve the values from variableContext and save them into parameters.
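The copy-from-post-to-parameters pattern can be sketched without the framework. In the model below two plain maps stand in for variableContext and parameters, and the field names are hypothetical. It follows the usual convention of this method that a null return means success and a non-null string is an error message for the user.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative model; the maps stand in for the posted form and saved config. */
final class ConfigurationPostSketch {

    // Hypothetical form field names.
    static final List<String> FIELDS = List.of("confhost", "confport", "clientid");

    /** Returns null on success, or an error message to show to the user. */
    static String processPost(Map<String, String> posted, Map<String, String> saved) {
        String port = posted.get("confport");
        if (port != null && !port.trim().matches("\\d{1,5}")) {
            return "Port must be a number"; // reject before saving anything
        }
        for (String field : FIELDS) {
            String value = posted.get(field);
            // A missing field usually means it was not on the submitted tab,
            // so the previously saved value is left alone.
            if (value != null) {
                saved.put(field, value.trim());
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> saved = new HashMap<>();
        saved.put("confhost", "wiki.example.com");
        String error = processPost(Map.of("confport", " 8090 "), saved);
        System.out.println(error + " " + saved);
    }
}
```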

viewSpecification - public void viewSpecification(IHTTPOutput out, Locale locale,
        DocumentSpecification ds) throws ManifoldCFException, IOException

This method is invoked when you view the job specification details of the repository connector.

processSpecificationPost - public String processSpecificationPost(IPostParameters variableContext,
        DocumentSpecification ds) throws ManifoldCFException

Similar to processConfigurationPost, but the posted values belong to the job rather than to the repository connection. Here you may process values such as custom parameters for your API queries.

outputSpecificationBody - public void outputSpecificationBody(IHTTPOutput out, Locale locale,
        DocumentSpecification ds, String tabName)
        throws ManifoldCFException, IOException

This method is invoked to output the body of the job specification page for the tab given by tabName.

outputSpecificationHeader - public void outputSpecificationHeader(IHTTPOutput out, Locale locale,
        DocumentSpecification ds, List<String> tabsArray)
        throws ManifoldCFException, IOException

Same as outputConfigurationHeader, but for the job.

Structure of a Repository connector.

How does ManifoldCF recognize a new connector? You create a jar file containing your repository connector and a security (authority) connector (covered later) and drop it into the connector libraries folder (connector-lib). The connector class also has to be registered with the framework, which the example setup does through connectors.xml. When ManifoldCF starts it picks up your new connector. Do watch the log file manifoldcf.log, found in the logs folder, for any problems.

1.       Create the project. Just inherit from the parent ManifoldCF POM and most of the necessary dependencies are imported for you. A sample pom file may look like this

2.       Resource files. These are the HTML and JavaScript files that must be on the classpath so the framework can pick them up; for example, the file editConfiguration_conf_server.html is loaded to hold your repository connector details. You locate these files explicitly in the relevant UI methods described above.

connect - method
super.connect(configParams);

confprotocol = params.getParameter(ConfluenceConfig.CONF_PROTOCOL_PARAM);
confhost = params.getParameter(ConfluenceConfig.CONF_HOST_PARAM);
confport = params.getParameter(ConfluenceConfig.CONF_PORT_PARAM);
confpath = params.getParameter(ConfluenceConfig.CONF_PATH_PARAM);
confsoapapipath = params.getParameter(ConfluenceConfig.CONF_SOAP_API_PARAM);
clientid = params.getParameter(ConfluenceConfig.CLIENT_ID_PARAM);
clientsecret = params.getObfuscatedParameter(ConfluenceConfig.CLIENT_SECRET_PARAM);

confproxyhost = params.getParameter(ConfluenceConfig.CONF_PROXYHOST_PARAM);
confproxyport = params.getParameter(ConfluenceConfig.CONF_PROXYPORT_PARAM);
confproxydomain = params.getParameter(ConfluenceConfig.CONF_PROXYDOMAIN_PARAM);
confproxyusername = params.getParameter(ConfluenceConfig.CONF_PROXYUSERNAME_PARAM);
confproxypassword = params.getObfuscatedParameter(ConfluenceConfig.CONF_PROXYPASSWORD_PARAM);

try {
    getConfluenceService();
} catch (ManifoldCFException e) {
    Logging.connectors.error(e);
}


Take the values available in configParams and use them to connect to the Confluence server.
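As an illustration of turning those values into something you can connect to, here is a minimal plain-Java sketch that assembles a base URL from the protocol, host, port and path. The map keys and the class name are hypothetical; the real connector reads the values through the ConfluenceConfig constants shown above.

```java
import java.util.Map;

/** Illustrative helper; the parameter keys are hypothetical. */
final class ConfluenceUrlSketch {

    /** Assembles protocol://host[:port][/path] from the configured values. */
    static String baseUrl(Map<String, String> params) {
        String host = params.get("host");
        if (host == null || host.isEmpty()) {
            throw new IllegalArgumentException("host is required");
        }
        String protocol = params.getOrDefault("protocol", "https");
        String port = params.getOrDefault("port", "");
        String path = params.getOrDefault("path", "");

        StringBuilder url = new StringBuilder(protocol).append("://").append(host);
        if (!port.isEmpty()) {
            url.append(':').append(port);
        }
        if (!path.isEmpty()) {
            // Accept the path with or without a leading slash.
            url.append(path.startsWith("/") ? path : "/" + path);
        }
        return url.toString();
    }

    public static void main(String[] args) {
        System.out.println(baseUrl(Map.of(
                "protocol", "http",
                "host", "wiki.example.com",
                "port", "8090",
                "path", "confluence")));
    }
}
```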

check –
try {
    return checkConnection();
} catch (ServiceInterruption e) {
    Logging.connectors.error("Error ", e);
    // Include the cause so the status line in the UI is informative.
    return "Connection temporarily failed: " + e.getMessage();
} catch (ManifoldCFException e) {
    Logging.connectors.error("Error ", e);
    return "Connection failed: " + e.getMessage();
}

The checkConnection method, shown next, starts a separate thread that checks whether the connection is valid, but you are not required to do this in a thread.

protected String checkConnection() throws ManifoldCFException,
        ServiceInterruption {
    String result = "Unknown";
    getConfluenceService();
    CheckConnectionThread t = new CheckConnectionThread(getSession(),
            service);
    try {
        t.start();
        t.finishUp();
        result = t.result;
    } catch (InterruptedException e) {
        t.interrupt();
        throw new ManifoldCFException("Interrupted: " + e.getMessage(), e,
                ManifoldCFException.INTERRUPTED);
    } catch (java.net.SocketTimeoutException e) {
        handleIOException(e);
    } catch (InterruptedIOException e) {
        t.interrupt();
        handleIOException(e);
    } catch (IOException e) {
        handleIOException(e);
    } catch (ResponseException e) {
        handleResponseException(e);
    }

    return result;
}
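The start / finishUp pattern used above can be shown in isolation. The sketch below is plain Java and does not reproduce the connector's CheckConnectionThread; CheckThread, the probe callable and the status strings are hypothetical. The idea is that the worker thread records either a result or a failure, and finishUp joins the thread and rethrows the failure in the caller's thread.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Illustrative model of running a connection check on a worker thread. */
final class CheckThreadSketch {

    static final class CheckThread extends Thread {
        private final Callable<String> probe;
        private volatile String result = "Unknown";
        private volatile Throwable failure;

        CheckThread(Callable<String> probe) {
            this.probe = probe;
            setDaemon(true);
        }

        @Override
        public void run() {
            try {
                result = probe.call();
            } catch (Throwable t) {
                failure = t; // remember it for the calling thread
            }
        }

        /** Waits for the worker and rethrows whatever it failed with. */
        String finishUp() throws InterruptedException, IOException {
            join();
            if (failure instanceof IOException) throw (IOException) failure;
            if (failure instanceof RuntimeException) throw (RuntimeException) failure;
            if (failure != null) throw new IllegalStateException(failure);
            return result;
        }
    }

    /** Turns the outcome into a status string, much like check() does. */
    static String check(Callable<String> probe) {
        CheckThread t = new CheckThread(probe);
        t.start();
        try {
            return t.finishUp();
        } catch (InterruptedException e) {
            t.interrupt();
            Thread.currentThread().interrupt();
            return "Connection check interrupted";
        } catch (IOException e) {
            return "Connection failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(check(() -> "Connection working"));
        System.out.println(check(() -> { throw new IOException("timeout"); }));
    }
}
```

The benefit of the worker thread is that the calling thread stays interruptible even while the network call is blocked.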

addSeedDocuments –
GetSeedsThread t = new GetSeedsThread(getSession(), confDriveQuery);
try {
    t.start();

    boolean wasInterrupted = false;
    try {
        XThreadStringBuffer seedBuffer = t.getBuffer();

        while (true) {
            String contentKey = seedBuffer.fetch();
            if (contentKey == null)
                break;
            // Add the pageID to the queue
            activities.addSeedDocument(contentKey);
        }
    } catch (InterruptedException e) {
        wasInterrupted = true;
        throw e;
    } catch (ManifoldCFException e) {
        if (e.getErrorCode() == ManifoldCFException.INTERRUPTED)
            wasInterrupted = true;
        throw e;
    } finally {
        if (!wasInterrupted)
            t.finishUp();
    }
} catch (InterruptedException e) {
    t.interrupt();
    throw new ManifoldCFException("Interrupted: " + e.getMessage(), e,
            ManifoldCFException.INTERRUPTED);
} catch (java.net.SocketTimeoutException e) {
    handleIOException(e);
} catch (InterruptedIOException e) {
    t.interrupt();
    handleIOException(e);
} catch (IOException e) {
    handleIOException(e);
} catch (ResponseException e) {
    handleResponseException(e);
}

Here again a new thread is created to add the seeds, but the framework does not require you to do so.
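The seeding code above is a producer/consumer hand-off: one thread fetches content keys while the connector thread drains a buffer until it sees an end marker. Here is a minimal plain-Java sketch of the same idea. It uses a BlockingQueue instead of ManifoldCF's XThreadStringBuffer, an empty Optional as the end marker instead of null, and a fixed list in place of the remote query; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Illustrative producer/consumer hand-off of seed identifiers. */
final class SeedBufferSketch {

    static List<String> collectSeeds(List<String> contentKeys) {
        // Small bounded buffer so the producer cannot run far ahead.
        BlockingQueue<Optional<String>> buffer = new LinkedBlockingQueue<>(4);

        Thread producer = new Thread(() -> {
            try {
                for (String key : contentKeys) {
                    buffer.put(Optional.of(key));
                }
                buffer.put(Optional.empty()); // end-of-stream marker
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        List<String> seeds = new ArrayList<>();
        try {
            while (true) {
                Optional<String> next = buffer.take();
                if (!next.isPresent()) {
                    break; // producer has finished
                }
                seeds.add(next.get()); // the real code calls addSeedDocument here
            }
            producer.join();
            return seeds;
        } catch (InterruptedException e) {
            producer.interrupt();
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while seeding", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(collectSeeds(List.of("page-1", "page-2", "page-3")));
    }
}
```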

processDocuments –
for (int i = 0; i < documentIdentifiers.length; i++) {
    String nodeId = documentIdentifiers[i];
    String version = versions[i];

    long startTime = System.currentTimeMillis();
    String errorCode = "FAILED";
    String errorDesc = StringUtils.EMPTY;
    Long fileSize = null;
    boolean doLog = false;

    try {
        if (Logging.connectors != null) {
            Logging.connectors.debug("Confluence "
                    + ": Processing document identifier '" + nodeId
                    + "'");
        }

        if (!scanOnly[i]) {
            if (version != null) {
                doLog = true;

                try {
                    errorCode = processConfluenceDocuments(nodeId,
                            activities, version, fileSize);
                } catch (Exception e) {
                    if (Logging.connectors != null) {
                        Logging.connectors.error(e);
                    }
                }

            } else {
                // No version means the document no longer exists.
                activities.deleteDocument(nodeId);
            }
        }
    } finally {
        // Record the outcome only for documents we actually tried to process.
        if (doLog)
            activities.recordActivity(Long.valueOf(startTime),
                    ACTIVITY_READ, fileSize, nodeId, errorCode,
                    errorDesc, null);
    }
}

You can simply loop through the available seeds and do whatever is relevant, such as extracting the metadata.
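Stripped of logging and error handling, the loop above makes one of three decisions per document. The plain-Java sketch below captures only that decision; the Action enum and the plan method are hypothetical names, and the real code calls the activities object instead of returning a list.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative model of the per-document decision in processDocuments. */
final class ProcessLoopSketch {

    enum Action { SKIP, DELETE, INGEST }

    static List<Action> plan(String[] versions, boolean[] scanOnly) {
        List<Action> actions = new ArrayList<>();
        for (int i = 0; i < versions.length; i++) {
            if (scanOnly[i]) {
                actions.add(Action.SKIP);      // nothing to fetch for this one
            } else if (versions[i] == null) {
                actions.add(Action.DELETE);    // no version: document is gone
            } else {
                actions.add(Action.INGEST);    // fetch, extract metadata, index
            }
        }
        return actions;
    }

    public static void main(String[] args) {
        String[] versions = { "v1", null, "v2" };
        boolean[] scanOnly = { false, false, true };
        System.out.println(plan(versions, scanOnly));
    }
}
```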
To keep this brief I have omitted the other methods, but you can look at the source code to follow the rest.