    No More Tableau Downtime: Metadata API for Proactive Data Health

    By Editor Times Featured | March 21, 2025


    In today’s world, the reliability of data solutions is everything. When we build dashboards and reports, we expect the numbers shown there to be correct and up to date. Based on these numbers, insights are drawn and actions are taken. If, for any unforeseen reason, the dashboards are broken or the numbers are wrong, it becomes a fire-fight to fix everything. If the issues are not fixed in time, it damages the trust placed in the data team and their solutions.

    But why would dashboards break or show wrong numbers? If the dashboard was built correctly the first time, then 99% of the time the issue comes from the data that feeds the dashboards: the data warehouse. Some possible scenarios are:

    • A few ETL pipelines failed, so the new data is not in yet
    • A table is replaced with a different new one
    • Some columns in the table are dropped or renamed
    • Schemas in the data warehouse have changed
    • And many more.

    There is still a chance that the issue is on the Tableau site, but in my experience, most of the time it is because of some change in the data warehouse. Even though we know the root cause, it is not always straightforward to start working on a fix. There is no central place where you can check which Tableau data sources rely on specific tables. If you have the Tableau Data Management add-on, it could help, but from what I know, it is hard to find dependencies of custom SQL queries used in data sources.

    However, the add-on is just too expensive and most companies do not have it. The real pain begins when you have to go through all the data sources manually to start fixing the issue. On top of that, you have a string of users impatiently waiting for a quick fix. The fix itself might not be difficult; it would just be time-consuming.

    What if we could anticipate these issues and identify impacted data sources before anyone notices a problem? Wouldn’t that just be great? Well, there is a way now, with the Tableau Metadata API. The Metadata API uses GraphQL, a query language for APIs that returns only the data that you are interested in. For more info on what is possible with GraphQL, do check out GraphQL.org.

    In this blog post, I will show you how to connect to the Tableau Metadata API using Python’s Tableau Server Client (TSC) library to proactively identify data sources that use specific tables, so that you can act fast before any issues arise. Once you know which Tableau data sources are affected by a specific table, you can make some updates yourself or alert the owners of those data sources about the upcoming changes so they can be prepared for them.

    Connecting to the Tableau Metadata API

    Let’s connect to the Tableau Server using TSC. We need to import all the libraries we will need for this exercise!

    ### Import all required libraries
    import tableauserverclient as t
    import pandas as pd
    import json
    import ast
    import re

    In order to connect to the Metadata API, you will first have to create a personal access token in your Tableau Account settings. Then update the <token-name> and <token-secret> placeholders with the token you just created. Also update <site-name> with your Tableau site name. If the connection is established successfully, then “Connected” will be printed in the output window.

    ### Connect to Tableau server using personal access token
    tableau_auth = t.PersonalAccessTokenAuth("<token-name>", "<token-secret>", 
                                               site_id="<site-name>")
    server = t.Server("https://dub01.online.tableau.com/", use_server_version=True)
    
    with server.auth.sign_in(tableau_auth):
        print("Connected")

    Let’s now get a list of all the data sources that are published on your site. There are many attributes you can fetch, but for the current use case, let’s keep it simple and only get the id, name, and owner contact information for every data source. This will be our master list to which we will add all the other information.

    ############### Get all the list of data sources on your Site
    
    all_datasources_query = """ {
      publishedDatasources {
        name
        id
        owner {
          name
          email
        }
      }
    }"""
    with server.auth.sign_in(tableau_auth):
        result = server.metadata.query(
            all_datasources_query
        )

    Since I want this blog to be focused on how to proactively identify which data sources are affected by a specific table, I will not be going into the nuances of the Metadata API. To better understand how the query works, you can refer to Tableau’s own very detailed Metadata API documentation.

    One thing to note is that the Metadata API returns data in JSON format. Depending on what you are querying, you will end up with multiple nested JSON lists, and it can get very tricky to convert this into a pandas dataframe. For the above metadata query, the result will look like the below (this is mock data just to give you an idea of what the output looks like):

    {
      "data": {
        "publishedDatasources": [
          {
            "name": "Sales Performance DataSource",
            "id": "f3b1a2c4-1234-5678-9abc-1234567890ab",
            "owner": {
              "name": "Alice Johnson",
              "email": "[email protected]"
            }
          },
          {
            "name": "Customer Orders DataSource",
            "id": "a4d2b3c5-2345-6789-abcd-2345678901bc",
            "owner": {
              "name": "Bob Smith",
              "email": "[email protected]"
            }
          },
          {
            "name": "Product Returns and Profitability",
            "id": "c5e3d4f6-3456-789a-bcde-3456789012cd",
            "owner": {
              "name": "Alice Johnson",
              "email": "[email protected]"
            }
          },
          {
            "name": "Customer Segmentation Analysis",
            "id": "d6f4e5a7-4567-89ab-cdef-4567890123de",
            "owner": {
              "name": "Charlie Lee",
              "email": "[email protected]"
            }
          },
          {
            "name": "Regional Sales Trends (Custom SQL)",
            "id": "e7a5f6b8-5678-9abc-def0-5678901234ef",
            "owner": {
              "name": "Bob Smith",
              "email": "[email protected]"
            }
          }
        ]
      }
    }

    We need to convert this JSON response into a dataframe so that it is easy to work with. Notice that we need to extract the name and email of the owner from inside the owner object.

    ### We need to convert the response into dataframe for easy data manipulation
    
    col_names = result['data']['publishedDatasources'][0].keys()
    master_df = pd.DataFrame(columns=col_names)
    
    for i in result['data']['publishedDatasources']:
        tmp_dt = {k: v for k, v in i.items()}
        master_df = pd.concat([master_df, pd.DataFrame.from_dict(tmp_dt, orient='index').T])
    
    # Extract the owner name and email from the owner object
    master_df['owner_name'] = master_df['owner'].apply(lambda x: x.get('name') if isinstance(x, dict) else None)
    master_df['owner_email'] = master_df['owner'].apply(lambda x: x.get('email') if isinstance(x, dict) else None)
    
    master_df.reset_index(inplace=True)
    master_df.drop(['index', 'owner'], axis=1, inplace=True)
    print('There are ', master_df.shape[0], ' datasources in your site')

    This is how the structure of master_df would look:

    Sample output of code
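
    As a side note, if you would rather not build the dataframe row by row, pandas’ json_normalize can flatten the same nested response in a couple of lines. This is just an optional shortcut that should produce an equivalent dataframe for this particular payload; the columns it generates are noted in the comment.
    
    # Optional alternative: flatten the nested owner object with json_normalize
    master_df = pd.json_normalize(
        result['data']['publishedDatasources'], sep='_'
    )
    # Resulting columns: name, id, owner_name, owner_email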

    Once we have the master list ready, we can go ahead and start getting the names of the tables embedded in the data sources. If you are an avid Tableau user, you know that there are two ways of selecting tables in a Tableau data source: one is to directly choose the tables and establish a relationship between them, and the other is to use a custom SQL query with one or more tables to achieve a new resultant table. Therefore, we need to handle both cases.

    Processing of Custom SQL query tables

    Below is the query to get the list of all custom SQLs used on the site, along with their data sources. Notice that I have filtered the list to get only the first 500 custom SQL queries. In case there are more in your org, you will have to use an offset to get the next set of custom SQL queries. There is also the option of using the cursor method for pagination if you want to fetch a large list of results (refer here); a rough sketch of that approach is included right after the query below. For the sake of simplicity, I just use the offset method, as I know there are fewer than 500 custom SQL queries used on the site.

    # Get the data sources and the table names from all the custom sql queries used on your Site
    
    custom_table_query = """  {
      customSQLTablesConnection(first: 500){
        nodes {
            id
            name
            downstreamDatasources {
              name
            }
            query
        }
      }
    }
    """
    
    with server.auth.sign_in(tableau_auth):
        custom_table_query_result = server.metadata.query(
            custom_table_query
        )

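    If your site has more than 500 custom SQL queries, the cursor approach mentioned above can be used instead of manually bumping an offset. Below is a minimal sketch of what that loop could look like; it assumes the connection exposes the standard GraphQL pageInfo / endCursor fields and that your version of TSC supports passing variables to metadata.query, so treat it as a starting point rather than a drop-in replacement.
    
    # Sketch: cursor-based pagination over customSQLTablesConnection (assumes pageInfo is exposed)
    paginated_query = """
    query customSQLTables($first: Int, $afterToken: String) {
      customSQLTablesConnection(first: $first, after: $afterToken) {
        nodes {
          id
          name
          downstreamDatasources {
            name
          }
          query
        }
        pageInfo {
          hasNextPage
          endCursor
        }
      }
    }"""
    
    all_nodes = []
    with server.auth.sign_in(tableau_auth):
        after_token = None
        while True:
            page = server.metadata.query(
                paginated_query,
                variables={"first": 100, "afterToken": after_token}
            )
            connection = page['data']['customSQLTablesConnection']
            all_nodes.extend(connection['nodes'])
            if not connection['pageInfo']['hasNextPage']:
                break
            after_token = connection['pageInfo']['endCursor']
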
    Based on our mock data, this is how the output of the custom SQL query above would look:

    {
      "data": {
        "customSQLTablesConnection": {
          "nodes": [
            {
              "id": "csql-1234",
              "name": "RegionalSales_CustomSQL",
              "downstreamDatasources": [
                {
                  "name": "Regional Sales Trends (Custom SQL)"
                }
              ],
              "query": "SELECT r.region_name, SUM(s.sales_amount) AS total_sales FROM ecommerce.sales_data.Sales s JOIN ecommerce.sales_data.Regions r ON s.region_id = r.region_id GROUP BY r.region_name"
            },
            {
              "id": "csql-5678",
              "name": "ProfitabilityAnalysis_CustomSQL",
              "downstreamDatasources": [
                {
                  "name": "Product Returns and Profitability"
                }
              ],
              "query": "SELECT p.product_category, SUM(s.revenue) AS total_profit FROM ecommerce.sales_data.Sales s JOIN ecommerce.sales_data.Products p ON s.product_id = p.product_id GROUP BY p.product_category"
            },
            {
              "id": "csql-9101",
              "name": "CustomerSegmentation_CustomSQL",
              "downstreamDatasources": [
                {
                  "name": "Customer Segmentation Analysis"
                }
              ],
              "query": "SELECT c.customer_id, c.location, COUNT(o.order_id) AS total_orders FROM ecommerce.sales_data.Customers c JOIN ecommerce.sales_data.Orders o ON c.customer_id = o.customer_id GROUP BY c.customer_id, c.location"
            },
            {
              "id": "csql-3141",
              "name": "CustomerOrders_CustomSQL",
              "downstreamDatasources": [
                {
                  "name": "Customer Orders DataSource"
                }
              ],
              "query": "SELECT o.order_id, o.customer_id, o.order_date, o.sales_amount FROM ecommerce.sales_data.Orders o WHERE o.order_status = 'Completed'"
            },
            {
              "id": "csql-3142",
              "name": "CustomerProfiles_CustomSQL",
              "downstreamDatasources": [
                {
                  "name": "Customer Orders DataSource"
                }
              ],
              "query": "SELECT c.customer_id, c.customer_name, c.segment, c.location FROM ecommerce.sales_data.Customers c WHERE c.active_flag = 1"
            },
            {
              "id": "csql-3143",
              "name": "CustomerReturns_CustomSQL",
              "downstreamDatasources": [
                {
                  "name": "Customer Orders DataSource"
                }
              ],
              "query": "SELECT r.return_id, r.order_id, r.return_reason FROM ecommerce.sales_data.Returns r"
            }
          ]
        }
      }
    }

    Just like before, when we were creating the master list of data sources, here too we have nested JSON for the downstream data sources, from which we need to extract only the “name” part. In the “query” column, the entire custom SQL is dumped. If we use a regex pattern, we can easily search for the names of the tables used in the query.

    We know that the table names always come after a FROM or a JOIN clause, and they generally follow the format <database>.<schema>.<table>. The <database> part is optional and most of the time not used. There were some queries I found which used that format, and I ended up getting only the database and schema names and not the complete table name. Once we have extracted the names of the data sources and the names of the tables, we need to merge the rows per data source, as there can be multiple custom SQL queries used in a single data source.

    ### Convert the custom sql response into dataframe
    col_names = custom_table_query_result['data']['customSQLTablesConnection']['nodes'][0].keys()
    cs_df = pd.DataFrame(columns=col_names)
    
    for i in custom_table_query_result['data']['customSQLTablesConnection']['nodes']:
        tmp_dt = {k: v for k, v in i.items()}
    
        cs_df = pd.concat([cs_df, pd.DataFrame.from_dict(tmp_dt, orient='index').T])
    
    # Extract the data source name where the custom sql query was used
    cs_df['data_source'] = cs_df.downstreamDatasources.apply(lambda x: x[0]['name'] if x and 'name' in x[0] else None)
    cs_df.reset_index(inplace=True)
    cs_df.drop(['index', 'downstreamDatasources'], axis=1, inplace=True)
    
    ### We need to extract the table names from the sql query. We know the table name comes after a FROM or JOIN clause
    # Note that the table name can be of the format <database>.<schema>.<table>
    # Depending on how the tables are referenced, you will have to modify the regex expression
    
    def extract_tables(sql):
        # Regex to match database.schema.table or schema.table, avoiding aliases
        pattern = r'(?:FROM|JOIN)\s+((?:\[\w+\]|\w+)\.(?:\[\w+\]|\w+)(?:\.(?:\[\w+\]|\w+))?)\b'
        matches = re.findall(pattern, sql, re.IGNORECASE)
        return list(set(matches))  # Unique table names
    
    cs_df['customSQLTables'] = cs_df['query'].apply(extract_tables)
    cs_df = cs_df[['data_source', 'customSQLTables']]
    
    # We need to merge datasources as there can be multiple custom sqls used in the same data source
    cs_df = cs_df.groupby('data_source', as_index=False).agg({
        'customSQLTables': lambda x: list(set(item for sublist in x for item in sublist))  # Flatten & keep unique
    })
    
    print('There are ', cs_df.shape[0], ' datasources with custom sqls used in it')

    Once we perform all of the above operations, this is how the structure of cs_df would look:

    Sample output of code
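
    To sanity-check the regex before relying on it, you can call extract_tables directly on one of the mock queries from the earlier output. Based on the pattern above, the first RegionalSales_CustomSQL query should yield the two fully qualified table names (the order may vary since a set is used internally).
    
    # Quick check of the table-name extraction on a single mock query
    sample_sql = (
        "SELECT r.region_name, SUM(s.sales_amount) AS total_sales "
        "FROM ecommerce.sales_data.Sales s "
        "JOIN ecommerce.sales_data.Regions r ON s.region_id = r.region_id "
        "GROUP BY r.region_name"
    )
    print(extract_tables(sample_sql))
    # Expected (in any order): ['ecommerce.sales_data.Sales', 'ecommerce.sales_data.Regions']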

    Processing of regular Tables in Data Sources

    Now we need to get the list of all the regular tables used in a datasource that are not part of custom SQL. There are two ways to go about it: either use the publishedDatasources object and check for upstreamTables, or use DatabaseTable and check for upstreamDatasources. I will go by the first method because I want the results at a data source level (basically, I want some code ready to reuse when I want to check a specific data source in further detail). Here again, for the sake of simplicity, instead of going for pagination, I am looping through each datasource to ensure I have everything. We get the upstreamTables inside the field object, so that will have to be cleaned out.

    ############### Get the data sources with the regular table names used on your site
    
    ### It is best to extract the tables information for every data source and then merge the results.
    # Since we only get the table information nested under fields, in case there are hundreds of fields
    # used in a single data source, we will hit the response limits and will not be able to retrieve all the data.
    
    data_source_list = master_df.name.tolist()
    
    col_names = ['name', 'id', 'extractLastUpdateTime', 'fields']
    ds_df = pd.DataFrame(columns=col_names)
    
    with server.auth.sign_in(tableau_auth):
        for ds_name in data_source_list:
            query = """ {
                publishedDatasources (filter: { name: \"""" + ds_name + """\" }) {
                  name
                  id
                  extractLastUpdateTime
                  fields {
                    name
                    upstreamTables {
                      name
                    }
                  }
                }
            } """
            ds_name_result = server.metadata.query(
                query
            )
            for i in ds_name_result['data']['publishedDatasources']:
                tmp_dt = {k: v for k, v in i.items() if k != 'fields'}
                tmp_dt['fields'] = json.dumps(i['fields'])
            ds_df = pd.concat([ds_df, pd.DataFrame.from_dict(tmp_dt, orient='index').T])
    
    ds_df.reset_index(inplace=True)

    This is how the structure of ds_df would look:

    Sample output of code

    We now need to flatten out the fields object and extract the field names as well as the table names. Since the table names will be repeating multiple times, we will have to deduplicate to keep only the unique ones.

    # Function to extract the values of fields and upstream tables in json lists
    def extract_values(json_list, key):
        values = []
        for item in json_list:
            values.append(item[key])
        return values
    
    ds_df["fields"] = ds_df["fields"].apply(ast.literal_eval)
    ds_df['field_names'] = ds_df.apply(lambda x: extract_values(x['fields'], 'name'), axis=1)
    ds_df['upstreamTables'] = ds_df.apply(lambda x: extract_values(x['fields'], 'upstreamTables'), axis=1)
    
    # Function to extract the unique table names
    def extract_upstreamTable_values(table_list):
        values = set()
        for inner_list in table_list:
            for item in inner_list:
                if 'name' in item:
                    values.add(item['name'])
        return list(values)
    
    ds_df['upstreamTables'] = ds_df.apply(lambda x: extract_upstreamTable_values(x['upstreamTables']), axis=1)
    ds_df.drop(["index", "fields"], axis=1, inplace=True)

    Once we do the above operations, the final structure of ds_df would look something like this:

    Sample output of code

    We have all the pieces, and now we just need to merge them together:

    ###### Join all the data together
    master_data = pd.merge(master_df, ds_df, how="left", on=["name", "id"])
    master_data = pd.merge(master_data, cs_df, how="left", left_on="name", right_on="data_source")
    
    # Save the results to analyse further
    master_data.to_excel("Tableau Data Sources with Tables.xlsx", index=False)

    This is our final master_data:

    Sample output of code

    Table-level Impact Analysis

    Let’s say there were some schema changes on the “Sales” table and you want to know which data sources will be impacted. Then you can simply write a small function that checks if a table is present in either of the two columns, upstreamTables or customSQLTables, like below.

    def filter_rows_with_table(df, col1, col2, target_table):
        """
        Filters rows in df where target_table is part of any value in either col1 or col2 (supports partial match).
        Returns full rows (all columns retained).
        """
        return df[
            df.apply(
                lambda row:
                    (isinstance(row[col1], list) and any(target_table in item for item in row[col1])) or
                    (isinstance(row[col2], list) and any(target_table in item for item in row[col2])),
                axis=1
            )
        ]
    
    # For example
    filter_rows_with_table(master_data, 'upstreamTables', 'customSQLTables', 'Sales')

    Below is the output. You can see that 3 data sources will be impacted by this change. You can also alert the data source owners, Alice and Bob, about this in advance so they can start working on a fix before something breaks on the Tableau dashboards.

    Sample output of code
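
    If you want to go one step further, the impacted rows already carry the owner name and email from the master list, so you can group them per owner and draft one message per person. The sketch below only prints the messages; wiring it up to email or Slack is left to whatever notification setup your team already has, and the exact wording is just an illustration.
    
    # Sketch: build one alert message per data source owner from the impacted rows
    impacted_df = filter_rows_with_table(master_data, 'upstreamTables', 'customSQLTables', 'Sales')
    
    alerts = (
        impacted_df
        .groupby(['owner_name', 'owner_email'])['name']
        .apply(list)
        .reset_index()
    )
    
    for _, row in alerts.iterrows():
        impacted_list = "\n  - ".join(row['name'])
        message = (
            f"Hi {row['owner_name']},\n"
            f"The upcoming change to the Sales table impacts these data sources:\n"
            f"  - {impacted_list}"
        )
        print(f"To: {row['owner_email']}\n{message}\n")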

    You can check out the complete version of the code in my GitHub repository here.

    This is just one of the possible use cases of the Tableau Metadata API. You can also extract the field names used in custom SQL queries and add them to the dataset to get a field-level impact analysis. One can also monitor stale data sources with the extractLastUpdateTime to see if those have any issues or need to be archived if they are not used any more. We can also use the dashboards object to fetch information at a dashboard level.
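
    As a small illustration of that last point about stale data sources, below is a minimal sketch that flags data sources whose extract has not refreshed within the last 30 days, working off the ds_df built earlier. The 30-day threshold is an arbitrary assumption, and data sources without an extract (where extractLastUpdateTime is empty) are simply skipped here, so adapt both to your own setup.
    
    # Sketch: flag data sources whose extract has not refreshed recently (30 days is an assumed threshold)
    stale_threshold_days = 30
    
    ds_df['extractLastUpdateTime'] = pd.to_datetime(ds_df['extractLastUpdateTime'], errors='coerce', utc=True)
    cutoff = pd.Timestamp.now(tz='UTC') - pd.Timedelta(days=stale_threshold_days)
    
    # Rows with missing timestamps evaluate to False in the comparison and are excluded
    stale_ds = ds_df[ds_df['extractLastUpdateTime'] < cutoff]
    print('Potentially stale data sources:')
    print(stale_ds[['name', 'extractLastUpdateTime']])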

    Final Thoughts

    If you have come this far, kudos. This is just one use case of automating Tableau data management. It’s time to reflect on your own work and think about which of those other tasks you could automate to make your life easier. I hope this mini-project served as an enjoyable learning experience in understanding the power of the Tableau Metadata API. If you liked reading this, you might also like another one of my blog posts about Tableau, on some of the challenges I faced when dealing with big .

    Also do check out my earlier blog, where I explored building an interactive, database-powered app with Python, Streamlit, and SQLite.


    Before you go…

    Follow me so that you don’t miss any new posts I write in the future; you will find more of my articles on my . You can also connect with me on LinkedIn or Twitter!





    Source link
