This article describes how to connect Tableau to a Cloudera Hadoop database and set up the data source.
Note: For new connections to Impala databases, use the Impala connector rather than this one. (You can continue using this connector for existing connections.)
Before you begin
Before you begin, gather this connection information:
Driver required
This connector requires a driver to talk to the database. You might already have the required driver installed on your computer. If the driver is not installed on your computer, Tableau displays a message in the connection dialog box with a link to the Driver Download page, where you can find driver links and installation instructions.
Note: Make sure you use the latest available drivers. To get the latest drivers, see Cloudera Hadoop on the Tableau Driver Download page.
Make the connection and set up the data source
Sign in on a Mac
If you use Tableau Desktop on a Mac, when you enter the server name to connect, use a fully qualified domain name, such as mydb.test.ourdomain.lan, instead of a relative domain name, such as mydb or mydb.test.
Alternatively, you can add the domain to the list of Search Domains for the Mac computer so that when you connect, you need to provide only the server name. To update the list of Search Domains, go to System Preferences > Network > Advanced, and then open the DNS tab.
Work with Hadoop Hive data
Work with date/time data
Tableau supports TIMESTAMP and DATE types natively. However, if you store date/time data as a string in Hive, be sure to store it in ISO format (YYYY-MM-DD). You can create a calculated field that uses the DATEPARSE or DATE function to convert a string to a date/time format. Use DATEPARSE() when working with an extract, otherwise use DATE(). For more information, see Date Functions.
For more information about Hive data types, see Dates on the Apache Hive website.
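As an illustration of the ISO-format advice above, the following is a minimal Python sketch that checks which string values are valid YYYY-MM-DD dates before they are stored in Hive. It is not part of Tableau or Hive; the function names `is_iso_date` and `non_iso_values` are hypothetical and exist only for this example.

```python
import re
from datetime import datetime

# Strict shape check: four-digit year, two-digit month, two-digit day.
ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_iso_date(value: str) -> bool:
    """Return True if value is a real calendar date written as YYYY-MM-DD."""
    if not ISO_DATE.fullmatch(value):
        return False
    try:
        # strptime rejects impossible dates such as 2017-02-30.
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def non_iso_values(values):
    """Return the values that are not valid YYYY-MM-DD dates (hypothetical helper)."""
    return [v for v in values if not is_iso_date(v)]
```

Values reported by `non_iso_values` are the ones that would likely need reformatting at the source, or a DATEPARSE() calculated field with an explicit format on the Tableau side.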
NULL value returned
A NULL value is returned when you open a workbook in Tableau 9.0.1 and later (or in 8.3.5 and later 8.3.x releases) if the workbook was created in an earlier version and has date/time data stored as a string in a format that Hive doesn't support. To resolve this issue, change the field type back to String and create a calculated field using DATEPARSE() or DATE() to convert the date. Use DATEPARSE() when working with an extract; otherwise, use the DATE() function.
High latency limitation
Hive is a batch-oriented system and is not yet capable of answering simple queries with very quick turnaround. This limitation can make it difficult to explore a new data set or experiment with calculated fields. Some of the newer SQL-on-Hadoop technologies (for example, Cloudera's Impala and Hortonworks' Stinger project) are designed to address this limitation.
See also
Cloudera Data Science Workbench
Cloudera Data Science Workbench enables fast, easy, and secure self-service data science for the enterprise.
Hortonworks Sandbox
Hortonworks Sandbox can help you get started learning, developing, testing and trying out new features on HDP and HDF.
Cloudera Manager
A unified interface to manage your enterprise data hub. Express and Enterprise editions available.
Hortonworks Data Platform (HDP)
Hortonworks Data Platform (HDP) helps enterprises gain insights from structured and unstructured data. It is an open source framework for distributed storage and processing of large, multi-source data sets.
Cloudera CDH
Cloudera's open source software distribution, including Apache Hadoop and additional key open source projects.
Cloudera DataFlow (Ambari)
Cloudera DataFlow (Ambari), formerly Hortonworks DataFlow (HDF), is a scalable, real-time streaming analytics platform that ingests, curates, and analyzes data for key insights and immediate actionable intelligence.
Cloudera Workload XM
Workload XM proactively assists, de-risks, and advises Cloudera Platform users at every phase of the data-intensive application lifecycle.
DataPlane
A unified platform for a hybrid data environment.