
Introduction to Data Warehouse Implementation Methodology

1. Different Approaches

There are several ways to build a data warehouse (DW); the two best known are the top-down approach and the bottom-up approach.

Besides these established approaches, inexperienced DW builders most often take one of two shortcuts: building data marts first from reporting needs and the DW afterwards, or building the DW by simply copying the transactional databases. They may also fall back on other methods when the project schedule is tight.

Building from reporting needs means the data marts are designed only for the reports that are already known, and only the data for those reports is guaranteed to be prepared in the data warehouse. This results in isolated data marts, a redundant DW, inconsistent data, and limited capability.

Building by copying the transactional databases is too simple to support anything more than a backup. With this approach, the data in the DW remains opaque to business users. Only the data needed by the currently specified requirements might be modeled by the service provider; the rest is left in its original state.

Besides the four approaches mentioned above, there are others, such as the hybrid approach and the federated approach.

2. Top-down approach

Centralized data warehouse, distributed data marts. A normalized (3NF), enterprise-wide data model, the so-called enterprise DW, is used to integrate data from the various source systems. Departmental data marts are built on top of this DW.
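The relationship between the enterprise DW and a departmental data mart can be sketched as follows. This is only a minimal illustration using Python's built-in sqlite3 module and an in-memory database; the table names, columns, and sample rows are all hypothetical and are not taken from any particular methodology or product.

```python
import sqlite3

def build_top_down_demo():
    """Build a tiny normalized enterprise DW, then derive a data mart from it."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        -- Enterprise DW: normalized (3NF) tables that integrate source data.
        -- All names and rows below are hypothetical sample data.
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
        CREATE TABLE sales_order (
            order_id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customer(customer_id),
            order_date TEXT);
        CREATE TABLE order_line (
            order_id INTEGER REFERENCES sales_order(order_id),
            product_name TEXT, quantity INTEGER, unit_price REAL);

        INSERT INTO customer VALUES (1, 'Acme', 'North'), (2, 'Globex', 'South');
        INSERT INTO sales_order VALUES
            (100, 1, '2010-01-05'), (101, 2, '2010-01-06');
        INSERT INTO order_line VALUES
            (100, 'Widget', 2, 5.0), (100, 'Manual', 1, 20.0),
            (101, 'Widget', 4, 5.0);

        -- Departmental data mart: a summary derived from the enterprise DW.
        CREATE TABLE mart_sales_by_region AS
        SELECT c.region AS region,
               SUM(l.quantity * l.unit_price) AS revenue
        FROM order_line l
        JOIN sales_order o ON o.order_id = l.order_id
        JOIN customer c ON c.customer_id = o.customer_id
        GROUP BY c.region;
    """)
    return con

if __name__ == "__main__":
    con = build_top_down_demo()
    for row in con.execute(
            "SELECT region, revenue FROM mart_sales_by_region ORDER BY region"):
        print(row)
```

Because the detail rows stay in normalized form, a second mart with a different shape (for example, revenue by product) could be derived later from the same tables without going back to the source systems.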

Pros: 

1) An enterprise-oriented DW can act as an agent when integrating numerous, complex legacy systems, which may be distributed across branches (located in different cities or countries).

2) Normalized detail data can be rearranged and re-purposed to meet unexpected needs.

Cons: 

1) Initial setup takes longer and costs more, because modeling is expensive and it is difficult to judge whether the model is reasonable.

2) There is no unified way to access summary data in the data marts and detail data in the DW. Although we can drill through from the data marts to the DW, this is unnatural for business users and must follow a fixed path set beforehand.

3) There is a gap between the data and business needs, and a heavy dependence on IT staff.

Comments:

This method is frequently used in so-called ‘large-scale data centralization’ projects. It integrates the underlying data first, which suits large, distributed customers who want to gather their data to support decision purposes that are not yet known.

It is preferred in large-scale organizations, or in companies that frequently expand through mergers and acquisitions. How effectively the DW is used at the application level is determined by the data marts.

3. Bottom-up approach

The emphasis is on dimensional models of business processes, built on conformed dimensions. Conformed dimensions are regarded as the backbone of the data warehouse, much like a data bus or master data. Throughout the DW, data is organized in a dimensional structure designed for analysis. The fact table’s grain is at the lowest, most atomic level so that users can roll up or drill down as needed.
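As a rough illustration of what a conformed dimension buys, the sketch below shares one product dimension between two fact tables and then combines the two business processes by a common attribute. It uses Python's built-in sqlite3 module; every table name, column, and row is a hypothetical example, and the function names are made up for this sketch.

```python
import sqlite3

def build_bus_demo():
    """Two fact tables (two business processes) sharing one conformed dimension."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        -- Conformed dimension: a single shared definition of "product".
        CREATE TABLE dim_product (
            product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
        -- One fact table per business process, each at atomic grain.
        CREATE TABLE fact_sales (product_key INTEGER, date_key TEXT, amount REAL);
        CREATE TABLE fact_shipment (product_key INTEGER, date_key TEXT, units INTEGER);

        -- Hypothetical sample rows.
        INSERT INTO dim_product VALUES
            (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware'),
            (3, 'Manual', 'Books');
        INSERT INTO fact_sales VALUES
            (1, '2010-01-05', 10.0), (2, '2010-01-05', 15.0),
            (3, '2010-01-06', 20.0);
        INSERT INTO fact_shipment VALUES
            (1, '2010-01-07', 2), (3, '2010-01-08', 1), (3, '2010-01-09', 4);
    """)
    return con

def drill_across_by_category(con):
    """Aggregate each fact table by category, then merge on the shared attribute."""
    sales = dict(con.execute("""
        SELECT d.category, SUM(f.amount)
        FROM fact_sales f JOIN dim_product d USING (product_key)
        GROUP BY d.category"""))
    shipped = dict(con.execute("""
        SELECT d.category, SUM(f.units)
        FROM fact_shipment f JOIN dim_product d USING (product_key)
        GROUP BY d.category"""))
    categories = sorted(set(sales) | set(shipped))
    return {c: (sales.get(c, 0.0), shipped.get(c, 0)) for c in categories}

if __name__ == "__main__":
    print(drill_across_by_category(build_bus_demo()))
```

The merge step only works because both fact tables resolve `category` through the same dimension table. If each process kept its own product list, the category values would first have to be reconciled.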

Pros: 

1) It starts from business processes and is organized in dimensional style, so it is business oriented and user friendly.

2) It can be deployed rapidly. Conformed (unified and shared) dimensions eliminate redundant data extracts, heavy infrastructure data structures, and data inconsistency.

3) The data warehouse contains detailed data in dimensional style from the beginning. All facts can be drilled down and up, rotated, sliced, etc. Summarized facts or OLAP cubes can be built later to improve performance while keeping the same user experience (detailed data and summarized data have the same dimensional structure).

4) With conformed dimensions, new requirements do not disrupt what is already built; they are added to the DW iteratively.
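The third point, that summaries share the dimensional structure of the detail, can be shown with a small roll-up sketch in plain Python. The sample rows are hypothetical, and for brevity the dimension attributes are stored inline on each fact row rather than in separate dimension tables, which is a simplification made for this example.

```python
from collections import defaultdict

# Atomic-grain fact rows: one row per product per day (hypothetical data).
ATOMIC_SALES = [
    {"date": "2010-01-05", "month": "2010-01",
     "product": "Widget", "category": "Hardware", "amount": 10.0},
    {"date": "2010-01-06", "month": "2010-01",
     "product": "Widget", "category": "Hardware", "amount": 5.0},
    {"date": "2010-01-06", "month": "2010-01",
     "product": "Manual", "category": "Books", "amount": 20.0},
    {"date": "2010-02-01", "month": "2010-02",
     "product": "Widget", "category": "Hardware", "amount": 7.0},
]

def roll_up(rows, by):
    """Sum 'amount' over the rows, grouped by the given dimension attributes."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[attr] for attr in by)
        totals[key] += row["amount"]
    return dict(totals)

if __name__ == "__main__":
    # Detail level and summary levels are produced by the same operation;
    # only the list of grouping attributes changes.
    print(roll_up(ATOMIC_SALES, ("date", "product")))
    print(roll_up(ATOMIC_SALES, ("month", "category")))
    print(roll_up(ATOMIC_SALES, ("category",)))
```

A precomputed summary table would simply store the result of one of these roll-ups, so users querying by month and category see the same dimensions whether the answer comes from the summary or from the atomic rows.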

Cons: 

1) It demands consistent use of dimensions and facts throughout the enterprise.

2) The business must be understood in order to identify processes and activities, so more participation from the business divisions is needed.

Comments:

It is applied in projects where the business scope is well defined, the organizational structure is not complex, the business is supported by centralized systems, and there are clear processes and activities throughout the enterprise.

4. Reference

http://www.kimballuniversity.com/

http://www.inmoncif.com/
