ab-initio ETL Tool training part 1 and part 2

Ab initio beginner's course topic 1 from Sandeep Sharma IIMK Smart City,IoT,Bigdata,Cloud,BI,DW

Ab initio beginner's course topic 2 from Sandeep Sharma IIMK Smart City,IoT,Bigdata,Cloud,BI,DW

Data Integration , map Reduce algorithm , virtualisation relation and trends

In year 2011 This reply i did to a discussion. would later structure it into proper article.

As of 2010 data virtualization had begun to advance ETL processing. The application of data virtualization to ETL allowed solving the most common ETL tasks of data migration and application integration for multiple dispersed data sources. So-called Virtual ETL operates with the abstracted representation of the objects or entities gathered from the variety of relational, semi-structured and unstructured data sources. ETL tools can leverage object-oriented modeling and work with entities’ representations persistently stored in a centrally located hub-and-spoke architecture. Such a collection that contains representations of the entities or objects gathered from the data sources for ETL processing is called a metadata repository and it can reside in memory^[1] or be made persistent. By using a persistent metadata repository, ETL tools can transition from one-time projects to persistent middleware, performing data harmonization and data profiling consistently and in near-real time.

———————————————————————————————————————————————-

– More then colmunar databases i see probalistic databases : link:http://en.wikipedia.org/wiki/Probabilistic_database

A probabilistic database is an uncertain database in which the possible worlds have associated probabilities. Probabilistic database management systems are currently an active area of research. “While there are currently no commercial probabilistic database systems, several research prototypes exist…”^[1]

Probabilistic databases distinguish between the logical data model and the physical representation of the data much like relational databases do in the ANSI-SPARC Architecture. In probabilistic databases this is even more crucial since such databases have to represent very large numbers of possible worlds, often exponential in the size of one world (a classical database), succinctly.

————————————————————————————————————————————————

For Bigdata analysis the software which is getting popular today is IBM big data analytics

: http://www-01.ibm.com/software/data/infosphere/bigdata-analytics.html

I am writing about this too..already written some possible case study where and how to implement.

Understanding Big data PDF attached.

———————————————————————————————————————————————–

There are lot of other vendors which are also moving in products for cloud computing..in next release on SSIS hadoop feed will be available as source.

— Microstraegy and informatica already have it.

— this whole concept is based on mapreduce algorithm from google..There are online tutorials on mapreduce.(ppt attached)

—————————————————————————————————————————————–

Without a doubt, data analytics have a powerful new tool with the “map/reduce” development model, which has recently surged in popularity as open source solutions such as Hadoop have helped raise awareness.

Tool: You may be surprised to learn that the map/reduce pattern dates back to pioneering work in the 1980s which originally demonstrated the power of data parallel computing. Having proven its value to accelerate “time to insight,” map/reduce takes many forms and is now being offered in several competing frameworks.

If you are interested in adopting map/reduce within your organization, why not choose the easiest and best performing solution? ScaleOut StateServer’s in-memory data grid offers important advantages, such as industry-leading map/reduce performance and an extremely easy to use programming model that minimizes development time.

Here’s how ScaleOut map/reduce can give your data analysis the ideal map/reduce framework:

Industry-Leading Performance

ScaleOut StateServer’s in-memory data grids provide extremely fast data access for map/reduce. This avoids the overhead of staging data from disk and keeps the network from becoming a bottleneck.
ScaleOut StateServer eliminates unnecessary data motion by load-balancing the distributed data grid and accessing data in place. This gives your map/reduce consistently fast data access.
Automatic parallel speed-up takes full advantage of all servers, processors, and cores.
Integrated, easy-to-use APIs enable on-demand analytics; there’s no need to wait for batch jobs.

http://www.scaleoutsoftware.com/solutions/mapreduce/

0 Comments Leave a comment

A Day in Life of Datawarehouse Architect part 1

A data warehouse Architect generally help to design datawarehouse , requirement gathering in ETL Low level design LLD, and HLD high level design, setting up database infrastructure design for datawarehouse like Storage Area Network requirements, Rapid application Clusters for database of datawarehouse more details read
Datawarehousing consists of three main area :
1. ETL(data migration, data cleansing, data scrubbing, data loading )
2. Datawarehouse design
3. Business Intelligence (BI) Reporting infrastructure.
BI
Read These Two part article for BI
– https://sandyclassic.wordpress.com/2014/01/26/a-day-in-life-of-bi-engineer-part-2/
– https://sandyclassic.wordpress.com/2014/01/26/a-day-in-life-of-business-intelligence-engineer/
And Architect
https://sandyclassic.wordpress.com/2014/02/02/a-day-in-life-of-business-intelligence-bi-architect-part-1/

Design : Now Coming to part 2 (is generally work of Data warehouse architect)
Read Some details More would be covered in future articles
https://sandyclassic.wordpress.com/2013/07/02/data-warehousing-business-intelligence-and-cloud-computing/
9:00-9:30 Read and reply mails.
9:30-10:30 Scrum Meeting
10:30-11:30 update documents According to Scrum meeting like burn down chart etc..update all stake holders.
11:30-12:00 Meeting with Client to understand new requirements. create/update design specification from requirement gathered.
12:00-13:30 create HLD/LLD from the required user stories according to customer Landscape of technology used.
13:30-14:00 Lunch Break.
14:00-14:30 Update the Estimations ,coding standards , best practises for project.
14:30-15-30 Code walk through update team on coding standards.
15-30-16:30–Defect call with Testing and development Team to understand defects, reasons of defects, scope creep, defect issuse with defect manager, look at issue/defect register
16:30-17:30 Work on specification of Design of datawarehouse modelling Star or Snow flake schema design according to business requirements granularity requirements.
17:30-18:30 Look at Technical Challenges requiring Out of Box thinking, thought leadership issue, Proof of concept of leading Edge and Breeding Edge technologies fitment from project prospective.
18:30-19:30 onwards Code for POC and Look a ways of tweaking , achieving technology POC code.
19:30- 20:30 onwards Forward thinking issue might be faced ahead by using a particular technology is continuous never ending process as there can be multiple combination possible to achieve as well as using particular component or technology should not create vendor lock in, cost issues, make/buy cost decisions, usability, scalability, security issues (like PL/SQL injection, SQL injection using AJAX or web services may be affected by (XSS attack or web services Schema poisoning), Environmental network scalability issues. Affect due to new upcoming technology on Existing code.
20:30– Dinner
Available on Call.. for any deployment, production emergency problems.

8 Comments Leave a comment

A day in Life of datawarehousing Engineer Part-2

Read Previous part

A day in Life of datawarehousing Engineer Part-1

Normal Schedule for development role :
9:00-9:30 Check all mail communications of late night loads Etc.
9:30-10:30 Attend Scrum meeting to discuss update status of completed task mappings and mapping for New user stories requirements, understand big picture of work completed by other staff status.
10:30 am -1:30 pm Look at LLD, HLD to create source to target transformations after understanding business logic and coding that in transformations available with tool.
1:30-2:00 Lunch break
2:00-3:00 Unit test data set to validate as required between source and target.
3:00-3:30 Documentation requirements of completed work.
3:30-4:30 Attend defect Call To look into new defects in code and convey back if defects not acceptable as out of scope or not according to specifications.
4:30-5:00 Status update daily work to Team Lead.
5:00-5:30 sit with Team lead, architect code walk through and update with team.
5:30-6:30 Take up any defects raised in Defect meting and Code walk through.

0 Comments Leave a comment

Coke Vs Pepsi of :Datawarehousing ETL Vs ELT

The Coke and Pepsi are always fighting to have bigger pie in international drinks market.
Both are present in 180+ countries agressively pursuing the pie of market share.

The Datwarehouses are different animals on block. They are databases But they are not normalized. They do not follow all 12 Codd Rules. But yet source and Target are RDBMS.
The Structure Where its saved Whether in Star Schema or Snow-flake is denormalized as possible like flat file structures. More Constraints slows down the join process.
Read more: https://sandyclassic.wordpress.com/2014/01/26/a-day-in-life-of-business-intelligence-engineer/
So there are less restrained much Faster file based alternatives for databases which Emerged for need to store unstructured data and achieve 5V massive volume, variety, velocity etc.. Read below links:

Cloud Computing, 3V ,Data warehousing and Business Intelligence

Which are have also found favour in ETL world with Hadoop. Now Every ETL allows hadoop connector or adapter to Extract data from hadoop HDFS so service in HDFS and similar.
https://sandyclassic.wordpress.com/2013/06/18/bigdatacloud-business-intelligence-and-analytics/
(Adapters use-case for product offering Read:https://sandyclassic.wordpress.com/2014/02/05/design-pattern-in-real-world/)
ETL process
ETL Extract-Transform-Load
ETL where transformation happens in staging area.
Extract data from sources , put in staging area cleanse it, transform data and Then Load in Target Datawarehouse. So popular Tools like informatica, datastage or ab-initio use this approach. Like in Informatica for fetching data or Extract Phase we can use fast source-qualifier transformation OR use Joiner transformation when we have multiple different databases like both SQL Server and Oracle although may be slow but can take both input but Source qualifier may require single vendor but is fast.
After Extracting We can use Filter transformation to filter out unwanted rows in staging area. Then load into target Databases.

ELT Extract Load and then Transform.
Extract data from disparate sources , Load the data into RDBMS engine first after . Then use RDBMS facility to Cleanse and Transform Data. This Approach was popularised By Oracle because Oracle Already had Database Intellectual property and was motivated to increase its usage.So Why does cleansing and Transformation outside the RDBMS into staging area rather within RDBMS engine. Oracle ODI Oracle Data integrator uses this concept of ELT not ETL bit reversal from routine.

So like Pepsi Vs Cola wave of Advertisement and gorilla marketing or To showcase Each other products strengths and hide weakness Games continue here Also in ETL world of data warehousing. Each one has its own merits and demerits.

Cloud Computing relation to Business Intelligence and Datawarehousing
Read :
1. https://sandyclassic.wordpress.com/2013/07/02/data-warehousing-business-intelligence-and-cloud-computing/
2. https://sandyclassic.wordpress.com/2013/06/18/bigdatacloud-business-intelligence-and-analytics/

Cloud Computing and Unstructured Data Analysis Using
Apache Hadoop Hive
Read: https://sandyclassic.wordpress.com/2013/10/02/architecture-difference-between-sap-business-objects-and-ibm-cognos/
Also it compares Architecture of 2 Popular BI Tools.

Cloud Data warehouse Architecture:
https://sandyclassic.wordpress.com/2011/10/19/hadoop-its-relation-to-new-architecture-enterprise-datawarehouse/

Future of BI
No one can predict future but these are directions where it moving in BI.
https://sandyclassic.wordpress.com/2012/10/23/future-cloud-will-convergence-bisoaapp-dev-and-security/

8 Comments Leave a comment

A Day in Life of Business Intelligence (BI) Architect- part 1

BI Architect most important responsibility is maintaining semantic Layer between Datawarehouse and BI Reports.
There are basically Two Roles of Architect: BI Architect or ETL Architect in data warehousing and BI. (ETL Architect in Future posts).
Semantic Layer Creation
Once data-warehouse is built and BI reports Needs to created. Then requirement gathering phase HLD High level design and LLD Low Level design are made.
Using HLD and LLD BI semantic layer is built in SAP BO its called Universe, in IBM Cognos using framework manager create Framework old version called catalogue, In Micro strategy its called project.
Once this semantic layer is built according to report data SQL requirements.
Note: Using semantic layer saves lot of time in adjustment of changed Business Logic in future change requests.
Real issues Example: Problems in semantic Layer creation like in SAP BO: Read

How to solve Fan trap and Chasm trap?

Report Development:
Reports are created using objects created by semantic layer.Complex reporting requirement for
1. UI require decision on flavour of reporting Tool like within
There are sets of reporting tool to choose from Like in IBM Cognos choose from Query Studio, Report Studio, Event Studio, Analysis Studio, Metric Studio.
2. Tool modification using SDK features are not enough then need to modify using Java/.net of VC++ API. At html level using AJAX javascript API or integrating with 3rd party API.
3. Report level macros/API for better UI.
4. Most important is data requirement my require Coding procedure at database or consolidations of various databases. Join Excel data with RDBMS and unstructured data using report level features. Data features may be more complex than UI.
5. user/data level security,LDAP integration.
6. Complex Scheduling of reports or bursting of reports may require modification using rarely Shell script or mostly Scheduling tool.
List is endless
Read More:
details of

A day in life of BI Engineer part 2

Integration with Third party and Security
After This BI’s UI has to fixed to reflect customer requirement. There might be integration with other products and seamless integration of users By LDAP. And hence Objects level security, User level security of report data according to User roles.
Like a Manager see report with data The same data may not be visible to clerk when he sees same report. Due filtering of data by user roles using User Level security.

BI over Cloud
setting BI over cloud Read blog.
Cloud Computing relation to Business Intelligence and Datawarehousing

Read :
1. https://sandyclassic.wordpress.com/2013/07/02/data-warehousing-business-intelligence-and-cloud-computing/

2. https://sandyclassic.wordpress.com/2013/06/18/bigdatacloud-business-intelligence-and-analytics/

Cloud Data warehouse Architecture:
https://sandyclassic.wordpress.com/2011/10/19/hadoop-its-relation-to-new-architecture-enterprise-datawarehouse/

Future of BI
No one can predict future but these are directions where it moving in BI.
https://sandyclassic.wordpress.com/2012/10/23/future-cloud-will-convergence-bisoaapp-dev-and-security/

7 Comments Leave a comment

A day in life of BI Engineer part 2

Read Part1:

A day in Life of Business Intelligence Engineer

Part 2:
First few days should understand business otherwise cannot create effective reports.
9:00 -10am Meet customer to understands key facts which affect business.
10-12 prepare HLD High level Document containing 10,000 feet view of requirement.
version 1. it may refined later subsequent days.
12-1:30 attend scrum meeting to update status to rest of team. co-ordinate with Team Lead, Architect and project Manager for new activity assignment for new reports.
Usually person handling one domain area of business would be given that domain specific reports as during last report development resource already acquired domain knowledge.
And does not need to learn new domain..otherwise if becoming monotonous and want to move to new area. (like sales domain report for Chip manufactuers may contain demand planning etc…)
1:30-2:00 document the new reports to be worked on today.
2:00-2:30 Lunch
2:30-3:30 Look at LLD and HLD of new reports. find sources if they exist otherwise Semantic layer needs to modified.
3:30-4:00 co-ordinate with other resource reports requirement with Architect to modify semantic layer, and other reporting requirements.
4:00-5:00 Develop\code reports, conditional formatting,set scheduling option, verify data set.
5:00-5:30 Look at old defects rectify issues.(if there is separate team for defect handling then devote time on report development).
5:30-6:00 attend defect management call and present defect resolved pending issue with Testing team.
6:00-6:30 document the work done. And status of work assigned.
6:30-7:30 Look at report pending issues. Code or research work around.
7:30-8:00 report optimisation/research.
8:00=8:30 Dinner return back home.
Ofcourse has to look at bigger picture hence need to see what reports other worked on.
Then Also needed to understand ETL design , design rules/transformations used for the project. try to develop frameworks and generic report/code which can be reused.
Look at integration of these reports to ERP (SAP,peopesoft,oracle apps etc ), CMS (joomla, sharepoint), scheduling options, Cloud enablement, Ajax-fying reports web interfaces using third party library or report SDK, integration to web portals, portal creation for reports.
So these task do take time as and when they arrive.

10 Comments Leave a comment

Enterprise service Bus ESB, BPM orchestration and Process Re-engineering

This is next topic going to write
I covered ESB and BI architecture similarity in last post.

https://sandyclassic.wordpress.com/2013/10/02/architecture-difference-between-sap-business-objects-and-ibm-cognos/
Every Java web server has its own ESB in market like BEA weblogic’s Aqualogic as shown above.
BPM business Process Re-engineering BPR efforts are on everywhere due to troubled times this where Business Process Managment BPM products comes in where you can dynamically orchestrate business Processes to reflect change quite quickly using BPM products. Each Business Process is basically a web service or bundle of web service.
some like Pega BPM provide Extra feature:
http://www.pega.com/products/business-process-management
then there is XML structured language for BPM called BPEL Like Oracle BPEL which is declarative language for orchestrating Business Processes.

There vendor which have All in one product like Oracle Weblogic BPM, BPEL, ESB.
Aqualogic ESB is more popular

2 Comments Leave a comment

Approach to Best collaboration Management system

Collaboration tools integrated offering (course grain integration using ) integration tools like TIBCO, Oracle BPEL, : Components to be integrated:
1. Content management system CMS (SharePoint, Joomla, drupal) and
2. Document Management system like (liferay, Document-um, IBM file-net) can be integrated using flexible integration tools.

3. Communication platform like Windows Communication Foundation ,IBM lotus notes integrated with mail client and Social network like Facebook using Facebook API, LinkedIn API, twitter API ,skype API to direct plugin as well as data Analysis of Social networking platform unstructured data captured of the collaboration for the project discussion.
soft-phone using Skype offering recording conversation facility for later use.

https://sandyclassic.wordpress.com/2013/06/19/how-to-do-social-media-analysis/

Oracle Web centre:
https://sandyclassic.wordpress.com/2011/11/04/new-social-computing-war-oracle-web-centre/
4. Integrated Project specific Wikki/Sharepoint/other CMS pages integrated with PMO site Artefacts, Enterprise Architecture Artefacts.
5. seamless integration to Enterprise Search using Endeca or Microsoft FAST for discovery of document, information, answers from indexed,tagged repository of data.
6. Structured and Unstructured data : hosted on Hadoop clusters using Map-reduce algorithm to Analyse data, consolidate data using Hadoop Hive, HBase and mining to discover hidden information using data mining library in Mahout for unstructured data.
Structured data kept in RDBMS clusters like RAC rapid application clusters.
https://sandyclassic.wordpress.com/2011/10/19/hadoop-its-relation-to-new-architecture-enterprise-datawarehouse/

https://sandyclassic.wordpress.com/2013/07/02/data-warehousing-business-intelligence-and-cloud-computing/
7. Integrated with Domain specific Enterprise resource planning ERP packages the communication, collaboration,Discovery, Search layer.
8. All integrated with mesh up architecture providing real-time information maps of resource located and information of nearest help.
9. messaging and communication layer integrated with all on-line company software.
10.Process Orchestration and integration Using Business Process Management tool BPM tool, PEGA BPM, Jboss BPM , windows workflow foundation depending landscape used.
11. Private cloud integration using Oracle cloud , Microsoft Azure, Eucalyptus, open Nebula integrated with web API other web platform landscape.
https://sandyclassic.wordpress.com/2011/10/20/infrastructure-as-service-iaas-offerings-and-tools-in-market-trends/
12. Integrated BI system with real time information access by tools like TIBCO spotfire which can analyse real time data flowing between integrated systems.
Data centre API and virtualisation plaform can also throw in data for analysis to hadoop cluster.
External links for reference: http://www.sap.com/index.epx
http://www.oracle.com, http://www.tibco.com/,http://spotfire.tibco.com/,
http://scn.sap.com/thread/1228659
SAP XI: http://help.sap.com/saphelp_nw04/helpdata/en/9b/821140d72dc442e10000000a1550b0/content.htm

Oracle Web centre: http://www.oracle.com/technetwork/middleware/webcenter/suite/overview/index.html

CMS: http://www.joomla.org/,http://www.liferay.com/, http://www-03.ibm.com/software/products/us/en/filecontmana/
Hadoop: http://hadoop.apache.org/

Map reduce: http://hadoop.apache.org/docs/stable/mapred_tutorial.html
facebook API: https://developers.facebook.com/docs/reference/apis/
Linkedin API: http://developer.linkedin.com/apis
Twitter API: https://dev.twitter.com/

9 Comments Leave a comment

Master Data Management Tools in market.

MDM:-> What does it do?

MDM seeks to ensure that an organization does not use multiple version/terms (potentially inconsistent) versions of the same master data in different parts of its operations, which can occur in large organizations.Thus CRM, DW/BI, Sales,Production ,finance each has its own way of representing things

There are lot of Products in MDM space One that have good presence in market are:

Tibco Information collaboration tool leader

Collaborative Information Manager.

– work on to standardize across ERP,CRM,DW,PLM

– cleanising and aggregation.

– distribute onwers to natural business users of data(sales,Logistics,Finance,HR,Publishing)

– automated Business Processes to clollaborate to maintain info asset and data governace poilcy

– built in data models can extended (industry template,validation rule)

– built in process to manage change elliminate confusion manageing change ,estb clear audit and governace trail for reporting.

– sync relevant subset of info downstream application trading partner and exchanges.SOA to pass data to as web service to composite applications.

IBM MDM Inforsphere MDM Server

Still its incomplete i will continue to add on this.

Product detail( informatica.com)

source: (http://www.biia.com/wp-content/uploads/2012/01/White-Paper-1601_big_data_wp.pdf)

Short Notes below taken from source:+ My comments on them.

Informatica MDM capabilities:

Informatica 9.1 supplies master data management (MDM) and data quality technologies to

enable your organization to achieve better business outcomes by delivering authoritative, trusted data to business processes, applications, and analytics, regardless of the diversity or scope of Big

Data.

Single platform for all MDM architectural styles and data domains Universal MDM capabilities

in Informatica 9.1 enable your organization to manage, consolidate, and reconcile all master

data, no matter its type or location, in a single, unified solution. Universal MDM is defined by four

characteristics:

• Multi-domain: Master data on customers, suppliers, products, assets, locations, can be managed, consolidated, and accessed.

• Multi-style: A flexible solution may be used in any style: registry, analytical, transactional, or

co-existence.

• Multi-deployment: The solution may be used as a single-instance hub, or in federated, cloud, or service architectures.

• Multi-use: The MDM solution interoperates seamlessly with data integration and data quality technologies as part of a single platform.

Universal MDM eliminates the risk of standalone, single MDM instances—in effect, a set of data silos meant to solve problems with other data silos.

• Flexibly adapt to different data architectures and changing business needs

• Start small in a single domain and extend the solution to other enterprise domains, using any style

• Cost-effectively reuse skill sets and data logic by repurposing the MDM solution

“No data is discarded anymore!

U.S. xPress leverages a large scale of transaction data and a diversity of interaction data, now extended

to perform big data processing like Hadoop with Informatica 9.1. We assess driver performance with image files and pick up

customer behaviors from texts by customer service reps. U.S. xPress saved millions of dollars per year by reducing fuels and optimizing

routes augmenting our enterprise data with sensor, meter, RFID tags, and geospatial data.” Tim Leonard Chief Technology Officer

Source: U.S. xPress Big Data Unleashed: Turning Big Data into Big Opportunities with the Informatica 9.1 Platform.

Reusable data quality policies across all project types Interoperability among the MDM, data quality, and data integration capabilities in Informatica 9.1 ensures that data quality rules can

be reused and applied to all data throughout an implementation lifecycle, across both MDM and data integration projects (see Figure 3).

• Seamlessly and efficiently apply data quality rules regardless of project type, improving data accuracy

• Maximize reuse of skills and resources while increasing ROI on existing investments

• Centrally author, implement, and maintain data quality rules within source applications and propagate downstream

Proactive data quality assurance Informatica 9.1 delivers technology that enables both business and IT users to proactively monitor and profile data as it becomes available, from

internal applications or external Big Data sources. You can continuously check for completeness, conformity, and anomalies and receive alerts via multiple channels when data quality issues are

found.

• Receive “early warnings” and proactively identify and correct data quality problems before they happen

• Prevent data quality problems from affecting downstream applications and business processes

• Shorten testing cycles by as much as 80 percent

Putting Authoritative and Trustworthy Data to Work

The diversity and complexity of Big Data can worsen the data quality problems that exist in

many organizations. Standalone, ad hoc data quality tools are ill equipped to handle large-scale

streams from multiple sources and cannot generate the reliable, accurate data that enterprises

need. Bad data inevitably means bad business. In fact, according to a CIO Insight report, 46

percent of survey respondents say they’ve made an inaccurate business decision based on bad or

outdated data.9

MDM and data quality are prerequisites for making the most of the Big Data opportunity. Here are

two examples:

•

Using social media data to attract and retain customers For some organizations, tapping

social media data to enrich customer profiles can be putting the cart before the horse. Many

companies lack a single, complete view of their customers, ranging from reliable and consistent

names and contact information to the products and services in place. Customer data is

often fragmented across CRM, ERP, marketing automation, service, and other applications.

Informatica 9.1 MDM and data quality enable you to build a complete customer profile from

multiple sources. With that authoritative view in place, you’re poised to augment it with the

intelligence you glean from social media.

•

Data-driven response to business issues Let’s say you’re a Fortune 500 manufacturer and

a supplier informs you that a part it sold you is faulty and needs to be replaced. You need

answers fast to critical questions: In which products did we use the faulty part? Which

customers bought those products and where are they? Do we have substitute parts in stock?

Do we have an alternate supplier?

But the answers are sprawled across multiple domains of your enterprise—your procurement

system, CRM, inventory, ERP, maybe others in multiple countries. How can you respond swiftly

and precisely to a problem that could escalate into a business crisis? Business issues often

span multiple domains, exerting a domino effect across the enterprise and confounding

an easy solution. Addressing them depends on seamlessly orchestrating interdependent

processes—and the data that drives them.

With the universal MDM capabilities in Informatica 9.1, our manufacturer could quickly locate

reliable, authoritative master data to answer its pressing business questions, regardless of

where the data resided or whether multiple MDM styles and deployments were in place.

Self-Service

Big Data’s value is limited if the business depends on IT to deliver it. Informatica 9.1 enables your

organization to go beyond business/IT collaboration to empower business analysts, data stewards,

and project owners to do more themselves without IT involvement with the following capabilities

Analysts and data stewards can assume a greater role in

defining specifications, promoting a better understanding of the data, and improving productivity

for business and IT.

• Empower business users to access data based on business terms and semantic metadata

• Accelerate data integration projects through reuse, automation, and collaboration

• Minimize errors and ensure consistency by accurately translating business requirements into

data integration mappings and quality rules

Application-aware accelerators for project owners:

empowers project owners to rapidly understand and access data for data

warehousing, data migration, test data management, and other projects. Project owners can

source business entities within applications instead of specifying individual tables that require

deep knowledge of the data models and relational schemas.

•Reduce data integration project delivery time

•Ensure data is complete and maintains referential integrity

• Adapt to meet business-specific and compliance requirements

Informatica 9.1 introduces complex event processing (CEP) technology into data quality and

integration monitoring to alert business users and IT of issues in real time. For instance, it will notify an analyst if a data quality key performance indicator exceeds a threshold, or if integration processes differ from the norm by a predefined percentage.

• Enable business users to define monitoring criteria by using prebuilt templates

• Alert business users on data quality and integration issues as they arise

• Identify and correct problems before they impact performance and operational systems

• Speeding and strengthening business effectiveness Informatica 9.1 makes “MDM-aware”

everyday business applications such as Salesforce.com, Oracle, Siebel, SAP for CRM, ERP, and

others by presenting reconciled master data directly within those applications. For example,

Informatica’s MDM solution will advise a salesperson creating a new account for “John Jones”

that a customer named Jonathan Jones, with the same address, already exists. Through

the Salesforce interface, the user can access complete, reliable customer information that

Informatica MDM has consolidated from disparate applications.

She can see the products and services that John has in place and that he follows her

company’s Twitter tweets and is a Facebook fan. She has visibility into his household and

business relationships and can make relevant cross-sell offers. In both B2B and B2C scenarios,

MDM-aware applications spare the sales force from hunting for data or engaging IT while

substantially increasing productivity.

• Giving business users a hands-on role in data integration and quality Long delays and

high costs are typical when the business attempts to communicate data specifications to

IT in spreadsheets. Part of the problem has been the lack of tools that promote business/IT

collaboration and make data integration and quality accessible to the business user.

As Big Data unfolds, Informatica 9.1 gives analysts and data stewards a hands-on role. Let’s

say your company has acquired a competitor and needs to migrate and merge new Big Data

into your operational systems. A data steward can browse a data quality scorecard and identify

anomalies in how certain customers were identified and share a sample specification with IT.

Once validated, the steward can propagate the specification across affected applications. A

role-based interface also enables the steward to view data integration logic in semantic terms

and create data integration mappings that can be readily understood and reused by other

business users or IT. Big Data Unleashed: Turning Big Data into Big Opportunities with the Informatica 9.1 Platform

4 Comments Leave a comment

Enterprise Architecture TOGAF,ITIL,Zachman,eTom,NGOSS

A list of ALL ENTERPRISE FRAMEWORKS…

– Business Frameworks (e.g. BMM, Six Markets, Porters 5 Forces, McKinsey 7S, etc)
– Architecture Frameworks (e.g. PEAF, MODAF, TOGAF, TEAF, IAF, etc)
– Programming Frameworks (e.g. .NET, J2EE, etc)
– Project Management Frameworks (e.g. PRINCE2, MSP, etc)
– Service Management Frameworks (e.g. ITIL, etc)
– Industry Operations Frameworks (e.g. eTOM, Pragmatic Marketing, etc)
– Any other type of framework in any other domain so long as it is related to an ENTERPRISE.

There are various field of Architecture floating around in IT industry.Specially in indian IT industry where things are still taking shape.HR department is india in most forgotten department it still need to learn more about Labour laws and nothing more..Top institution in india for HR which was essentially focus on labour laws and nothing more.Other subjects are taught not with great emphassis..Problem that is okay for manufacturing industry where management is dealing with unskilled uneducated manpower..but same is not true for Industry.Institutes still need to inculcate basic principle of psychological analysis , service industry and basic computer knowledge so people can visualize trends in skills needed to shape future of employee in company.

So For Industry still under trap of HR to define there own terminology for different job skills.In country outside india for which indian IT companies are executing project Position are well defined.Take a case Architect in software industry.

Architects can be classied into two category:

1)Enterprise Architects.

Architect working at enterprise level with various certification level like ITIL( process based for Service Management),TOGAF for Thne

Zackman for domain model, there are customised framework like NGOSS and e-TOM for Telecom industry

2) Domain specific Architects.

Domain Architect can be classied into four major types:

Application or Language Architect (Java Architect,.NET Architect,PHP Architect, open source Architect, Integration Architect, Platform As Service Architect) . These architects are primarily concerned with design patterns and application integration and language specific issues, options available like in integration web service is course grain integration but there are other option like CORBA,JNI web services is last option. So Java architect is SME for Java related design issues. same way .NET Architect, PHP Architect.Integration Architect are looking at how integrated different layer(ESB,Web services, COM,CORBA etc..).
Database/Data Architect (Data Architect, ETL Architect, BI Architect, Performance Architect)

Data Architect: concerned with logical data modelling (LDM) and Physical database modelling (PDM), design of OLTP system and design normalized scehmas and table structure.
ETL Architect : Takes care of Data Integration layer (Extract Transform Load data from SQL server to Oracle from SAP populated tables into target datawarehouse. Designing or Star schema or extended star schema for Data warehouse.
BI Architect: Looks to create semantic Layer which helps user to convert dynamically dragged objects to there respective SQL equivalent.and reporting layouts ,data structure which support them,layout to of report on web on mobile or on portal.reports from tactical to strategic reporting to Exception reporting and business events.
Performance Architects are concerned performance to Web servers , data bases performance ,query response time, latency etc.

3. Network Architect (Storage Architect, Network Architect, System Architect,Virtualisation or Infrastructure As Service IAAS.

Storage Architect: Concerned with SAN storage and design of storage network its interface with network components. whether to go for SAN or NAS or DAS ,RAID levels, SCSI or Fiber channel , servers, vendors to go with (EMC Clarion,netapp,hitachi) etc.analysts predict that enterprise storage will account for 75% of all computer hardware expenditures so storage is critical.
Network Architect: Network architecture have exploded in recent decade.Network architect needs to look into network Design, specification,Network security. Like Cisco defines 5 fields (1. Voice like VOIP, 2. Security, 3. Design, 4.Routing and Switching, 5.Wireless).Each one can have its own architects.person may not be specific to Cisco but can cut across to like juniper network junos firewall, or sonic firewall is in network security.
System Architect: concerned Types of servers used and Os Admin related activities.e,g Red hat admin subject matter expert with server datacenter know how, blade vs RACK, virtualisation layers,hypervisor (vmware ESXi, Microsoft hyperV, IBM AIX LPAR) etc.

4. Architect not defined above,

ERP Architects:

SAP Architect(since SAP has its own solution framework like SAP Solution Manager which help in maintaining and upkeep of solution and framework at solution or practice level, Also like SAP BI is domain specific architect who know out for 16 reporting tool flavor from WebI,Xcelsius,DesKI, live office, Quaas which one to use when).
Peoplesoft Architect: Looks into PS specific techno-functional issues customization , enhancement, or performance issue, installation and data migration, its integration with different web server app server.
, Oracle Apps Architect.: Oracle Application domain specific solution to use of Oracle BPEL, ESB, SOA suite, fusion middle ware vs old form report customization Vs used of OAF or ADF.