A day in life of BI Engineer part 2

Read Part1:
Part 2:
First few days should understand business otherwise cannot create effective reports.
9:00 -10am Meet customer to understands key facts which affect business.
10-12 prepare HLD High level Document containing 10,000 feet view of requirement.
version 1. it may refined later subsequent days.
12-1:30 attend scrum meeting to update status to rest of team. co-ordinate with Team Lead, Architect and project Manager for new activity assignment for new reports.
Usually person handling one domain area of business would be given that domain specific reports as during last report development resource already acquired domain knowledge.
And does not need to learn new domain..otherwise if becoming monotonous and want to move to new area. (like sales domain report for Chip manufactuers may contain demand planning etc…)
1:30-2:00 document the new reports to be worked on today.
2:00-2:30 Lunch
2:30-3:30 Look at LLD and HLD of new reports. find sources if they exist otherwise Semantic layer needs to modified.
3:30-4:00 co-ordinate with other resource reports requirement with Architect to modify semantic layer, and other reporting requirements.
4:00-5:00 Develop\code reports, conditional formatting,set scheduling option, verify data set.
5:00-5:30 Look at old defects rectify issues.(if there is separate team for defect handling then devote time on report development).
5:30-6:00 attend defect management call and present defect resolved pending issue with Testing team.
6:00-6:30 document the work done. And status of work assigned.
6:30-7:30 Look at report pending issues. Code or research work around.
7:30-8:00 report optimisation/research.
8:00=8:30 Dinner return back home.
Ofcourse has to look at bigger picture hence need to see what reports other worked on.
Then Also needed to understand ETL design , design rules/transformations used for the project. try to develop frameworks and generic report/code which can be reused.
Look at integration of these reports to ERP (SAP,peopesoft,oracle apps etc ), CMS (joomla, sharepoint), scheduling options, Cloud enablement, Ajax-fying reports web interfaces using third party library or report SDK, integration to web portals, portal creation for reports.
So these task do take time as and when they arrive.

Architecture Difference between SAP Business Objects and IBM Cognos part1

Lets understand how Cognos product works internally

Most of BI product Architecture are almost similar internally.
BI Bus: Enterprise service Bus which surrounds all the services/servers which tool provide.
Typical ESB from Oracle BEA Aqualogic Stack engulfing many Web services looks like:
ESB_archNow you can compare this popular ESB with BI internal Architecture.
you can read more about ESB at : http://docs.oracle.com/cd/E13171_01/alsb/docs20/concepts/overview.html
Under 4 tier system: A client connects the Web server  (which is protected by firewall) using dispatcher. Dispatcher connects to Enterprise Service Bus (ESB) which surrounds all the application server services (Web services). ESB in case of cognos is Cognos BI Bus surrounds Web services Servers (like Report Server, Job server, Content Management server etc ). Mediation Layer Cognos BI Bus interacts with Non Java , C++ code which could not to converted or purposefully kept in C++ for may be more flexibility and speed
Cognos BI Bushttp://pic.dhe.ibm.com/infocenter/cbi/v10r1m1/index.jsp?topic=%2Fcom.ibm.swg.ba.cognos.crn_arch.10.1.1.doc%2Fc_arch_themulti-tierarchitecture.html

In case of SAP Business Objects (BO) ESB was not properly developed so an intermediate layer was created which works for interfacing between multiple servers like Job server, report server, page server etc. BO XI R2 came in pervius version was more in C++ to C++ to java bridge was created in ESB layer. Since Java was preferred language for coarse grain interoperability  provided by web services. Each server was developed using web services.
interaction between web server was routed through BI Bus.
BO-xi-r3.1-infrastructureIn latest version here u find a pipe connecting all components call Business Objects XI 3.1 Enterprise Infrastructure. Earlier version had different names. here you can see its connecting all server like Crystal report server, IFRS input file repository server( storing template of reports), OFRS Output file repository services, Program Job server(storing all programs which can be published on Portal (Infoview) ). This ESB does mediation between different server and achieves interoperability yet control of different components of products. This is in competitor product Cognos is called Cognos BI Bus.
For latest BO uses in memory product SAP HANA more about its competitors follow:

In Micro-strategy there are two important server Intelligent server which creates cubes

More I will cover in later issues:
Oracle BI Architecture:

Implementation OF BI system is not related to these product Architecture :
A  typical BI system under implementation haveing componets of ETL, BI, databases, Web server, app server, production server, test/development server looks like:
typical BI ArchtectureMore details: http://www.ibm.com/developerworks/patterns/bi/product-s390-web.html
Big Data Architecture:
From components perspective of ETL to BI implementation Aspect is little different

Hadoop Architecture layers:

Just like UDDI registry is repository of Web

Approach to Best collaboration Management system

Collaboration tools integrated offering (course grain integration using ) integration tools like TIBCO, Oracle BPEL, : Components to be integrated:
1. Content management system CMS  (SharePoint, Joomla, drupal) and
2. Document Management system like (liferay, Document-um, IBM file-net) can be integrated using flexible integration tools.

3. Communication platform like Windows Communication Foundation ,IBM lotus notes integrated with mail client and Social network like Facebook using Facebook API, LinkedIn API, twitter API ,skype API to direct plugin as well as data Analysis of Social networking platform unstructured data captured of the collaboration for the project discussion.
soft-phone using Skype offering recording conversation facility for later use.


Oracle Web centre:
4. Integrated Project specific Wikki/Sharepoint/other CMS pages integrated with PMO site Artefacts, Enterprise Architecture Artefacts.
5. seamless integration to Enterprise Search using Endeca or Microsoft FAST for discovery of document, information, answers from indexed,tagged repository of data.
6. Structured and Unstructured data : hosted on Hadoop clusters using Map-reduce algorithm to Analyse data, consolidate data using Hadoop Hive, HBase and mining to discover hidden information using data mining library in Mahout for unstructured data.
Structured data kept in RDBMS clusters like RAC rapid application clusters.

7. Integrated with Domain specific Enterprise resource planning ERP packages the communication, collaboration,Discovery, Search layer.
8. All integrated with mesh up architecture providing real-time information maps of resource located and information of nearest help.
9. messaging and communication layer integrated with all on-line company software.
10.Process Orchestration and integration Using Business Process Management tool BPM tool, PEGA BPM, Jboss BPM , windows workflow foundation depending landscape used.
11. Private cloud integration using Oracle cloud , Microsoft Azure, Eucalyptus, open Nebula integrated with web API other web platform landscape.
12. Integrated BI system with real time information access by tools like TIBCO spotfire which can analyse real time data flowing between integrated systems.
Data centre API and virtualisation plaform can also throw in data for analysis to hadoop cluster.
External links for reference: http://www.sap.com/index.epx
http://www.oracle.com, http://www.tibco.com/,http://spotfire.tibco.com/,
AP XI: http://help.sap.com/saphelp_nw04/helpdata/en/9b/821140d72dc442e10000000a1550b0/content.htm

Oracle Web centre: http://www.oracle.com/technetwork/middleware/webcenter/suite/overview/index.html

CMS: http://www.joomla.org/,http://www.liferay.com/http://www-03.ibm.com/software/products/us/en/filecontmana/
Hadoop: http://hadoop.apache.org/

Map reduce: http://hadoop.apache.org/docs/stable/mapred_tutorial.html
acebook API: https://developers.facebook.com/docs/reference/apis/
inkedin API: http://developer.linkedin.com/apis
witter API: https://dev.twitter.com/

Cloud Computing, 3V ,Data warehousing and Business Intelligence

The 3V volume, variety, velocity Story:

Datawarehouses maintain data loaded from operational databases using Extract Transform Load ETL tools like informatica, datastage, Teradata ETL utilities etc…
Data is extracted from operational store (contains daily operational tactical information) in regular intervals defined by load cycles. Delta or Incremental load or full load is taken to datwarehouse containing Fact and dimension tables which are modeled on STAR (around 3NF )or SNOWFLAKE schema.
During business Analysis we come to know what is granularity at which we need to maintain data. Like (Country,product, month) may be one granularity and (State,product group,day) may be requirement for different client. It depends on key drivers what level do we need to analyse business.

There many databases which are specially made for datawarehouse requirement of low level indexing, bit map indexes, high parallel load using multiple partition clause for Select(during Analysis), insert( during load). data warehouses are optimized for those requirements.
For Analytic we require data should be at lowest level of granularity.But for normal DataWarehouses its maintained at a level of granularity as desired by business requirements as discussed above.
for Data characterized by 3V volume, velocity and variety of cloud traditional datawarehouses are not able to accommodate high volume of suppose video traffic, social networking data. RDBMS engine can load limited data to do analysis.. even if it does with large not of programs like triggers, constraints, relations etc many background processes running in background makes it slow also sometime formalizing in strict table format may be difficult that’s when data is dumped as blog in column of table. But all this slows up data read and writes. even is data is partitioned.
Since advent of Hadoop distributed data file system. data can be inserted into files and maintained using unlimited Hadoop clusters which are working parallel and execution is controlled by Map Reduce algorithm . Hence cloud file based distributed cluster databases proprietary to social networking needs like Cassandra used by facebook etc have mushroomed.Apache hadoop ecosystem have created Hive (datawarehouse)

With Apache Hadoop Mahout Analytic Engine for real time data with high 3V data Analysis is made possible.  Ecosystem has evolved to full circle Pig: data flow language,Zookeeper coordination services, Hama for massive scientific computation,

HIPI: Hadoop Image processing Interface library made large scale image processing using hadoop clusters possible.

Realtime data is where all data of future is moving towards is getting traction with large server data logs to be analysed which made Cisco Acquired Truviso Rela time data Analytics http://www.cisco.com/web/about/ac49/ac0/ac1/ac259/truviso.html

Analytic being this of action: see Example:

with innovation in hadoop ecosystem spanning every direction.. Even changes started happening in other side of cloud stack of vmware acquiring nicira. With huge peta byte of data being generated there is no way but to exponentially parallelism data processing using map reduce algorithms.
There is huge data out yet to generated with IPV6 making possible array of devices to unique IP addresses. Machine to Machine (M2M) interactions log and huge growth in video . image data from vast array of camera lying every nuke and corner of world. Data with a such epic proportions cannot be loaded and kept in RDBMS engine even for structured data and for unstructured data. Only Analytic can be used to predict behavior or agents oriented computing directing you towards your target search. Bigdata which technology like Apache Hadoop,Hive,HBase,Mahout, Pig, Cassandra, etc…as discussed above will make huge difference.

kindly answer this poll:

Some of the technology to some extent remain Vendor Locked, proprietory but Hadoop is actually completely open leading the the utilization across multiple projects. Every product have data Analysis have support to Hadoop. New libraries are added almost everyday. Map and reduce cycles are turning product architecture upside down. 3V (variety, volume,velocity) of data is increasing each day. Each day a new variety comes up, and new speed or velocity of data level broken, records of volume is broken.
The intuitive interfaces to analyse the data for business Intelligence system is changing to adjust such dynamism  since we cannot look at every bit of data not even every changing data we need to our attention directed to more critical bit of data out of heap of peta-byte data generated by huge array of devices , sensors and social media. What directs us to critical bit ? As given example
or Hedge funds use hedgehog language provided by :
such processing can be achieved using Hadoop or map-reduce algorithm. There are plethora of tools and technology which are make development process fast. New companies are coming  from ecosystem which are developing tools and IDE to make transition to this new development  easy and fast.

When market gets commodatizatied as it hits plateu of marginal gains of first mover advantage the ability to execute becomes critical. What Big data changes is cross Analysis kind of first mover validation before actually moving. Here speed of execution will become more critical. As production function Innovation gives returns in multiple. so the differentiate or die or Analyse and Execute feedback as quick and move faster is market…

This will make cloud computing development tools faster to develop with crowd sourcing, big data and social Analytic feedback.

Future of cloud 2020 will convergence BI,SOA,App dev and security

We know we need to create data warehouse in order to analyse we need to find granularity  but can we really do when there is huge explosion of data from cloud think like data from youtube,twitter, facbook and devices, geolocations etc….no..Skill set required for future cloud BI , webservices ,SOA and app developement converge and with that also security.

Can we really afford Extract , transform , load cycles in cloud.With huge data needs to migrated and put in cloud for analysis not exactly.

ETL will remain valid for Enterprise all but kind of applications like social Applications ,cloud computing huge data we have alternative sets of technology which start emerging Hadoop , Hive ,HBase are one such but we cannot even afford these when data is really huge.We can relie on analytics to predict and data mine to find trends.But these assumptions are also based on models. The mathematical model we think based on evaluation we start working on implementing and hence predicting trends but what id model we thought was not the right model may be right 40% time and ignoring 60% or rigt always but set it beleived changed over period of time..I cloud we have 3V, volume : huge volume of data,

Variety: huge variety of data from desperate sources like social sites, geo feeds,video,audio, sensor networks.

Velocity: The data comes up really fast always we need to analyse the lastest voulme of data. Some may get tera byte data in day may need to analyse only that data..like weather applications, some many need months data, some week etc..

So we cannot model such variety and velocity and vloume in traditional datawarehouses.So one size fits all is not solution. So can we maintain multiple sets of tools for each data analytics, ETL, CDI, BI, database model, data mining etc…etc.. Are we not going to miss many aspect of problem when intersection between these was required.My guess is yes currently
Also we need to integrate everything to web and web to everything and integrate each other so web services come in handy.And when we need to present this over cloud so that’s where all cloud technologies set in.So convergence of BI,SOA,Application development and cloud technology is inevitable. As all cloud apps will require input and output and presentations from BI,SOA,data mining, analytic etc. Already we see hadoop as system which is mixture of Java and data warehouse BI,web services, cloud computing,parallel programming.

What about security it will be most important characteristic o which cloud will be based.? Already we have lots of cloud analytics security products based on analysis from cloud.We have IAM is most important in cloud. Identity and Access management. Increasingly applications and apps require data from IAM and from network stack keep increasing driving them closer…SAAS, and PAAS this going to be most important characteristics.