
Monday 20 December 2010

New Pentaho Kettle book

 

Hi all,

Last year, at the same time, I wrote an article about the first Pentaho book published. Simply called Pentaho Solutions, this book covers the basics of data warehousing and the Pentaho tools. You can find the original article here and order this valuable book here.

Recently, I received my own review copy of a long-awaited Pentaho book: Pentaho Kettle Solutions – Building Open Source ETL Solutions with Pentaho Data Integration.

Thanks to Roland, Matt and Jos for sending me this new book.


ISBN: 978-0-470-63517-9

Matt Casters is Pentaho's Chief of Data Integration and the founder of Kettle (yes, he is Kettle's dad). Have a look at his prolific blog here.

Roland is an IT expert whose experience ranges from web application development and business process analysis to business intelligence. He co-authored the MySQL Cluster 5.1 Certification Study Guide. Have a look at his blog.

Jos is a BI expert with more than 15 years of experience. He founded Tholis Consulting and also covers BI developments for the Dutch Database Magazine.

 

What is this book about?

The first book, Pentaho Solutions, was aimed at discovering the basics of BI and of using Pentaho. This new book goes deeper into hardcore data processing and data warehousing with Kettle. But it is not exclusively focused on Kettle: a strong emphasis is placed on data processing basics, techniques and theory (Codd vs. Kimball, …). Reading this book will take you to the next level on these two topics:

  • Data processing and how to build and feed a data warehouse,
  • Kettle development, customization and advanced usage.

 

Book summary

  • Introduction
    • What ETL is and what Kettle's key concepts are
    • How to install and configure Kettle
    • Real-life example: the Sakila data warehouse
  • ETL and ETL subsystems
    • The famous ETL subsystems (Kimball). A very detailed and inspiring chapter.
    • Extraction
    • Cleansing and conforming
    • Handling dimension tables
    • Loading fact tables
    • Working with OLAP data.
  • Management and deployment
    • Typical ETL development lifecycle. A must-read!
    • Scheduling and monitoring
    • Versioning
    • Lineage and auditing.
  • Performance and scalability
    • Performance tuning. Here again, a must-read that gives you precious information on how to bring your Kettle setup to peak performance and stability.
    • Parallelization, clustering and partitioning: my favorite. Do you have big data and/or strong constraints? Think parallel and start building your own Kettle cluster or parallel setup. In my view, this is the best chapter ever written on this topic, all ETL tools included.
    • Dynamic clustering in the cloud. Once again, my favorite. You all know my passion for cloud computing! A very technical chapter: you need real experience with AWS tools and APIs.
  • Advanced topics
    • Data Vault management: an interesting concept. You will learn about Data Vault and discover this hybrid approach (Codd with 3NF / Kimball with star schemas) in detail.
    • Handling complex data formats.
    • Web services. I love that one too! More and more data warehouses are now fed through web services. Learn how to feed yours using Kettle.
    • Kettle integration
    • Extending Kettle. Yummy! If, like me, you have created your own Kettle plugins or want to, this chapter is a must-read. Java programming experience is required.
  • The Kettle Ecosystem
    • Kettle Enterprise Edition features: a comparison matrix.
    • Built-in variables and properties reference: a must-read to understand Kettle's internals and to create fully automated, self-sufficient jobs.

My opinion

This book is a fantastic concentration of knowledge. It covers ETL basics, advanced topics, performance management, Kettle development and cloud data processing. Matt, Roland and Jos took on a risky challenge: writing a book that spans everything from basic knowledge to high-level techniques while staying focused on how to use Kettle to solve real, concrete data problems.

They succeeded.

This book now sits on my BI reference shelf; it has entered my personal BI Book Hall of Fame.

10 comments:

Sylvain (osbi.fr) said...

Hi Vincent

I totally agree with you: "Pentaho Kettle Solutions" is a 3-star book on ETL/data warehousing with Kettle.
More info on other books about Pentaho here (Pentaho User Group France)

See you again for sharing some things about OSBI ;-)

Sylvain


Loestrin said...

hi,
I want to learn about data warehousing and data mining.
Is this book beneficial for me?
Can I also learn database management systems from this book?
Please give me a suggestion.
