MBI - Ask an Expert

NOTE
Adobe Commerce Intelligence was previously known as Magento Business Intelligence (MBI). Recordings for past events reflect the previous name.

Session 1:

  • What customers say about Commerce Intelligence
  • Business transformation through intelligent data

Session 2:

  • SQL optimization using Common Table Expressions
  • Best practices in SQL-based reports
  • SQL-based reporting demo
  • Questions and answers

Recorded webinar content

Ask an Expert consists of two sessions:

Session 1

Transcript

Welcome everybody and thank you for joining us today for the MBI Ask an Expert webinar session one. This is the first of two sessions. Today we’ll cover some topics around MBI and business operations and we’ll continue this on Thursday with some more technical topics. So please join us again later this week at this same link. I’m Webster Love and I’ll be your host today. Joining me are my co-hosts, Sonika Viraneni, and your presenter for today’s session, Deepak Kumar. We’re all from the product team for MBI at Adobe. We’ll also be featuring a very exciting guest speaker, Shane Newton from Pixels, and Deepak will introduce him in a minute. Today we’re playing a recording of Shane’s content and then we’ll have Shannon Dupock doing a live Q&A at the end of our session. Before we dive in, we have a couple of technical notes. Audio is through Adobe Connect, so your speakers or headphones must be unmuted, and ensure that the speaker icon is on at the top of your Connect room. We don’t currently have an option for calling in via phone. Adobe Connect doesn’t always work well with Bluetooth headphones, so if you’re using those and having audio difficulties, you can try switching to wired headphones. You can also try logging out of the room and back in, or switching to the desktop app instead of the browser version. Throughout the webinar, if you face any technical issues with your audio or the screen share, or if you have questions related to the functionality of MBI or to the content that’s being presented, please feel free to put those questions in the Q&A pod at the left side of your screen, as we’ll be monitoring that throughout the recording of today’s content. We’ll try to answer those questions either right away through the chat or during our Q&A segment at the end of the webinar. We will be sending a link to a recording of the webinar out afterward. Alright, now I’ll hand it over to Deepak.

Hi everyone, thank you for joining this webinar. We have with us our guest speaker, Shan Newton. I’m going to introduce him shortly. This webinar is a great platform to understand five key aspects. What do our customers say about MBI? Business transformation through MBI. How MBI is contributing to our client’s growth. In this session, we have a success story of one of our prestigious clients, Pixels Inc. SQL optimization using common table expressions. Our MBI users playing a data analyst role can benefit from the CTE style of writing SQL queries. Best practices in MBI SQL-based reports. A quick demo on creating SQL-based reports using common table expressions. We will wrap up with a Q&A session.

A little bit about Shan’s background. Shan has two master’s degrees to his credit. He has four years of experience working with American Airlines, the world’s largest airline. He worked at American Airlines headquarters in Dallas, Texas. He was associated with several departments in and around digital marketing as an information architect and senior developer. He helped the company transition from a legacy monolith codebase to cloud-based microservices. He also spent six months as a full-time A/B test developer using Adobe Target’s advanced features, namely JavaScript injection. Shan left American Airlines to join Pixels Incorporated, the world’s largest image editing service for e-commerce, where he transitioned into data science and business intelligence reporting. Shan has been with Pixels for three years now. He helped the startup become profitable through an organizational transformation in which data science and Magento BI played a key role. He also manages Google Analytics and other data systems at Pixels, MBI in particular, and holds a professional certification in search engine optimization. He has written tens of thousands of lines of SQL statements and authored over a thousand custom reports. He manages dozens of dashboards for various departments. He’s a super user of MBI and has pushed the limits of what MBI can do. And he has even found workarounds to push those limits further. Thank you Shan for being with us. Hi Deepak, thank you. Very nice introduction. Could you tell us more about Pixels? Well, Pixels is, as you said before, the largest image editing service in the world for e-commerce. And it’s for e-commerce only. So we’re not retouching photos of weddings or portraits. This is purely for e-commerce. And if you are a retailer or a producer of products and you want to sell your products online, you need to have great photographs of your products. Probably multiple photographs per product from different angles.
And many companies have tens of thousands of images per month of new products. And so this is a really interesting industry. And every image is data. Every image that comes to our company is a row in a database. And then what happens to that image? There are people editing these images and it goes through different steps of editing. So there’s different workers touching these images. Each one of those touch points is a piece of data in our database. So we have a really interesting ecosystem of data at our company. And we have this really great online platform for studios. You can see here, this is an idea of the type of editing that we do. So a photo studio takes a photo of a product on a mannequin. They send it to us along with thousands of other images. And we remove the background, we remove the mannequin, and we return back the final product. And there are a lot of players in this industry, but not many of them have online software that complements the service. For example, maybe 10 years ago it was very common to send all your images over email or Dropbox. But we have a software package in the browser. So you log in, you can upload images through that, you can see the before and after of what’s done. You can mark rejections and send them back to us to rework. There’s color matching, there’s all sorts of interesting things going on. We have AI technology that assists our editors. And we even have an API. And so there are customers who have built applications that plug into our API. And that whole system is automated. So for example, this company has an app where they can take a photo of their furniture. It can be any background, it could be in your home. And it automatically sends that to us, we remove the background, send it back to the app. So it can be beautiful.

That’s awesome. You have been working with MBI tools for three years now. What was your experience like working with MBI? Well, when I first joined Pixels, our data team was not focused on Magento BI. And I don’t know the reason for that. But we had a data team of four data scientists. And most of our queries would go directly onto our relational database. Queries would run slow. People weren’t sharing code or queries. It was a different kind of environment. As we learned how powerful Magento BI is, we began using it more and more. So currently, we have 109 users, we have over 500 dashboards, and we have over 9000 automated reports currently in Magento BI. We actually pulled those statistics this week. So we have over three databases actually pouring data into the MBI data warehouse and over four terabytes of data. So I think four terabytes is pretty large. For a data scientist or BI analyst, that’s a great place to be. We like data. So here’s an example of one of the dashboards that I’ve made in MBI. And I like showing this. Of course, I blurred out a few things. I can’t give all our secrets away. This is a niche product, a niche service that we offer. And so what I wanted to show about this dashboard, one is having small statistics at the top. You know, when you build these dashboards, it can be however you want. If you want a giant chart at the top, you can have that. It could be just a plain old data table. It looks like a big spreadsheet. But this is how I like to design them. Something high value and interesting at the top. So at a quick glance, you see exactly where you were last month and where you are this month. This makes dashboards kind of fun. Like you want to check these dashboards regularly so you can see how things are moving. I like to have a row of a few charts, high value KPIs. And so at a quick glance, you’re getting everything above the fold. And then a bit lower, I like to have larger charts, data tables, and everything else.
So let’s talk about some of my opinions about business intelligence and tools. If I were on a leadership team and I’m thinking about a new organization and we’re looking for a solution for business intelligence, maybe we have an old vendor, maybe we don’t, but we’re just looking around for the best tool in the industry. I would not feel attracted to a purely codeless point and click kind of BI tool. I see a lot of this marketing. And I know that a lot of people feel the same way that I do. That codeless is a bit overhyped. And there’s a lot of reasons I say that. But I really believe that code-first data analysis and business intelligence is the best way for a company to make progress. So in MBI, luckily there’s both kinds of Report Builder. So there is the point and click visual Report Builder. And it is great. It’s fast. It’s useful. It has group by features. But SQL Report Builder is where I spend 99% of my time. Out of those thousand reports that I’ve made, I would say maybe less than five have been built in Visual Report Builder. And I would challenge anyone. I think I could use SQL Report Builder and be faster or just as fast as the Visual Report Builder. Just to drill down on some of those problems encountered by codeless point and click tools. For one, knowing all the filters and the nuances of the data can be really challenging. It’s also possible to make charts and reports that give you answers. But those answers aren’t necessarily correct. And they might not line up with what other reports other people are making. And so that comes to this problem of having a single source of truth. And that’s something that a lot of companies are struggling with. Lastly, it’s easy to forget why you did certain things. It’s a little bit more difficult, in my opinion, than having code. With code, I can copy and paste work from someone else that I trust or from a report that I trust.
Whereas with a point and click interface, sometimes it can be difficult to duplicate some of those filters and nuances that should be applied to various reports. And there’s a lot of people writing about this online. But today I want to focus on the SQL Report Builder. OK, so let’s look at a typical MBI setup. So this is how our company is set up with Magento BI. We have a live database. And this is normal. It would be for the application only. You would have a separate database that’s for employee facing tools and for your BI reports, data science queries. But that’s still a traditional database and it’s syncing every night. Maybe you have a different sync pattern, but I think this is a very common thing that we see in enterprise organizations. So live DB for the application only, read only DB for BI and employee tools. And then our Magento BI Data Warehouse syncs data from the read-only DB.

So this does mean that our MBI data is 48 hours behind the live DB. But that’s something that we can live with.

So talking about some of the benefits of MBI for us, I can give a really good example. There is a query. It’s extremely large. It’s about 500 lines. And it’s extremely important for our organization. It really helped us go through an organizational transformation using this one query. It gave us a new type of insight and a new KPI. So we look at this, the results of this query every day.

It’s a really complex query and it takes about 50 minutes to run. And this is after having optimized it as much as possible. It still is taking 40 to 80 percent of our CPU. You know, it spikes up to 80 percent sometimes. And this is unacceptable. Actually, if I run this on our database after about 10 minutes, I’ll probably get a message from our DevOps team. Someone will notice that I’ve blown up our read-only database. Conversely, in Magento BI, this query runs in 57 seconds. So, really complex query, 50 times faster in Magento BI. And because Magento BI caches the results, the second time that I run this query, it’s virtually instant. Less than a second. Thanks for the example you gave on SQL query performance in MBI. This serves as a testimonial as to how MBI’s SaaS infrastructure is utilized to its best potential. With MBI coming into the picture, the data analyst can now avoid the wait time involved in connecting to the databases, save and retrieve SQL queries and of course, the query response time as you mentioned. Yeah, and this really, it goes further than just having a fast query or not having to run this on a slow database.

As an SQL writer, when I’m discovering new data or trying to solve a new problem, having instant feedback is really important. I would never write a 500-line query and then run it to see what kind of errors I’m going to get back. You know, it’s always small chunks. Small little bit of a query, get the results, take a look at those results, figure out that you actually need to filter a few more things. So, having that instant feedback is really important and if you can imagine, 50 times faster. So, if it takes me on a traditional database, let’s say two or three days to solve a complex problem, theoretically, most of that time is spent waiting for results to get back. If I can move 50 times faster, I can get that done in maybe two hours.

There are a lot of clients who might be looking forward to using a full stack SaaS tool like Magento BI. How do you think MBI can help them transform their business to the next level? So, now I’ll talk about organizational transformation and how Magento BI can play a role in that. And at our own company, we went through a large organizational transformation two or three years ago. And I’ll share some of that story as well. This is a non-optimal process where we had a database, the traditional database, and if someone wants data, we actually had a ticketing system. So, we would do 600 tickets a year from various departments requesting data. Maybe it’s just three weeks of data for this one particular question. Another team wants to check on some of their workers and pull performance data on that. But in any case, we would go to the database and then there needs to be a data worker who creates a query, gets that raw data, puts it into a spreadsheet. We like to use Google Sheets, easy to share. At some point, there’s some sort of filtering and cleansing process with the data. Then we’re going to do some pivots, some analysis, and eventually we get to this final insight.

From a stakeholder point of view, there’s all this waiting. So, you share your data requirements. Maybe you send a ticket. You wait because they have a backlog. So, you might be waiting one or two days before they even begin your questions. Then they send you back the raw data. You see that raw data and most of the time, there’s new questions. Maybe the data is not quite what they expected or now they have new questions. They need to add something to that data. So, now they’re waiting more. By the time you get to this final analysis, it’s outdated. I mean, it is a historical snapshot of what happened at that time, but it’s not an automated report that you could check every month. So, this whole process can span days, maybe weeks. If it’s urgent, maybe you can get it done in a day or two. But this hassle of getting data and this slow process, it can deter some people from using data in their decision making. Conversely, some people can be a little bit paralyzed by this and not want to make a decision until they have the data. And so, that also slows decision making productivity.

So, the optimal flow is what we found in Magento BI. Our company transitioned from a ticketing system to only using Magento BI. And any time a new ticket came in, we would figure out what kind of question that is, build it into an automated report, and give it to that department’s dashboards. We started building dashboards for each department and we would just put it on there. So, then if they ever have that question again, they can self-service that data themselves. Another thing that this allowed us to do was really scale up the amount of problems that we could solve without hiring extra data experts. And just to show about the self-service aspect, so this is a glimpse of what it would look like in MBI. Sometimes it’s a simple spreadsheet type of data that a department wants. So, you can see here in the background, sort of spreadsheet-like. And they can just click here and export to CSV, then they can import it into a Google Sheet, and they can work on this data themselves. So, there were so many times we would get a request for data and we’d realize, oh, we actually built that three months ago for someone else. So, you can, you know, here’s the link to that report. You can go pull the data yourself.

Yeah, good to know how MBI helped your organization automate its data. I’m sure a lot of people in this audience can relate to practical challenges in data warehousing and business intelligence implementations. One, of course, is the replication of data periodically from source systems, analysis of data using database tools to create datasets, and consumption of these datasets to make meaningful dashboards. Even after doing the visualization, sometimes there are iterations that are needed. As you rightly pointed out, that could prove costly. At an enterprise level, there is constant pressure to make the right data available at the right time. Senior management would be very unhappy as their decision-making gets delayed. Not to mention the business suffering due to this. Yeah, absolutely.

So, a few other tips that I would have in an organizational transformation. So, our company, three years ago, I think we would still classify it as a startup. And with startups, there’s investment money, and that money keeps the company afloat and allows it to scale, but the company is not necessarily profitable at that time. And there comes a certain point where you have to get profitable as a startup. And we were going through that transition during the pandemic. Maybe for some other enterprise level companies, it’s not that, but maybe you have challenging KPIs. And I think the best way to achieve those KPIs is to make data transparent, not just to the workers in your department, but across departments. So, what we did is we made certain key dashboards, performance indicators of the whole company, but also individual departments. We started sharing those with other departmental leaders and their teams. And it gave people this feeling that not only are they responsible for their KPIs, but they get a sense of recognition when they move the data in the right direction. And there was this sense of excitement. The first thing everyone does on Monday is they log into Magento BI, and they check on their data and their dashboards. And they get excited to see the full week of last week’s data. And if the dashboards are built in a really nice way, that’s a fun process. And it’s exciting to look at different charts and explore that data. From the data team perspective, we moved away from this code repository. We were using Git. That was something I introduced to our team. Before that, there was not even code sharing going on. So, we had a code repository, but it’s a little bit cumbersome. There’s a learning curve to using Git if someone’s not coming from a development background.
And also just remembering to save your code to the code repository after you’ve delivered your results to the end customer. In other ways, maybe you would paste your query into the spreadsheet whenever you deliver that data to a customer. In any case, Magento BI is far superior because every chart and every query that we write in the SQL Report Builder is saved. And so, anyone with edit access can click here, click edit, and then open up and see the code underneath. So, for us, this was tremendously valuable. So, here’s an example of a high-value dashboard that people like to look at. We have a lot of cost savings from a project that we did. But you can see here, I’ve given view access to a lot of people. Janus here has edit access. That’s our CTO, and he’s also an excellent writer of SQL. So, I always give him edit access, and I don’t think he minds showing his name here. But you can see, I can scroll down a lot. A lot of people have view access, and this gets people excited about these projects. So, I highly recommend making dashboards visible instead of siloing your teams into only seeing the need-to-know kind of information. Could you describe your team headcount and composition, so our audience can relate the amount of work to the team strength? Yeah, in the past, we had four active team members. Since then, it’s actually grown smaller for various reasons. And so, at the moment, my protégé, who I trained using MBI, has moved into a sister company, so the same founders, but a new startup. And so, he’s managing it entirely over there. And I’m managing our enterprise entirely over here. Because we had four people building all these dashboards at one point in time and automating everything, there came a point where the workload decreased a lot because everything is automated. So, teams are now self-servicing their data, and there’s this single source of truth.
So, a lot of the complexities and problems that we were facing three years ago sort of vanished as we relied on MBI more frequently. And I’m able to maintain all of that because I rarely encounter problems or questions anymore. And so, now I’m more focused on helping us with new data roles, like in our marketing aspects. Great to hear that, Shan. Being a data evangelist, I’m sure you must have acquired a unique way of developing reports in MBI. Any specific technique or operating procedure you follow in your team? Okay. So, I want to introduce this idea about common table expressions. What is a common table expression? It’s a style of writing an SQL query that creates a temporary table and it saves that table in memory only while the query is running, and then you can write another query that queries your previous query, if that makes sense, and I’ll show you what I mean by that. So, here’s a very simple query, and I’m going to use this concept throughout the walkthroughs. So, we’ve got a table called item, and we’re getting a list of items when they were created and some other information. To make this a CTE, it’s very simple. I wrap it in parentheses. So, now I’m giving a name, CTE1, to this. I could change that to any other name. In fact, I would recommend a better name, like, I don’t know, items. And then I can write another query below that, and it’s querying off of CTE1. So, CTE1 is not a database table. It is the results of this output. Maybe if you’re familiar with databases, this sounds like a view. You could create a view in the database, but a view remains permanently, and it uses processing power from your SQL server. Whereas this, as soon as this query runs and the query is finished, the CTE1 disappears. It’s just a temporary spot in memory. So, we would get some results from this. And why would we do this style?
I mean, this is actually maybe a bad example because there’s only one, but you can actually chain five, six, seven, eight of these CTEs in a row. Well, for one, I think it’s far more readable, and we’ll show some examples of that. Number two, you can spot check it. So, if you have five or six of these CTEs in a row, it’s very easy to comment out CTE three, four, and five, and then select star from CTE2 and see the results of that output. So, you can spot check through the CTEs. It makes it much easier to debug. You’ve got more reusability. It’s easier to copy and paste snippets. And I think it leads to faster problem solving. Maybe some people would say that, you know, if you were writing this for an application, in an application, you need speed and technical excellence, and that is number one. You want the most efficient query possible, and you’re talking about shaving off milliseconds, and you would consider that a big win. But in Magento BI, queries are so fast, it doesn’t matter if a query runs in one second or three seconds, or it’s really complex, so it takes ten seconds, then it’s cached, and after that, it runs in less than one second. So, it’s really, I think, more important, especially for teams, to have readable code that other people can come and see what you intended, and to easily debug that, or to copy and paste part of your query, so I can quickly solve another problem. So, that’s what MBI really empowers you to do, and I think MBI is the perfect tool for common table expressions.
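To make the chained-CTE pattern concrete outside of MBI, here is a minimal runnable sketch using Python’s sqlite3. The item table, its data, and the “type 3 = furniture” mapping are all invented stand-ins for illustration, not Pixels’ real schema.

```python
import sqlite3

# Hypothetical "item" table; schema and rows are invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER, item_type_id INTEGER, price_usd REAL)")
conn.executemany(
    "INSERT INTO item VALUES (?, ?, ?)",
    [(1, 3, 19.99), (2, 3, 0.0), (3, 5, 42.50), (4, 3, None)],
)

# Two chained CTEs: the second queries the first. Both exist only in memory
# while this one statement runs -- unlike a view, nothing persists afterward.
query = """
WITH items AS (
    SELECT id, item_type_id, price_usd
    FROM item
    WHERE price_usd IS NOT NULL
),
furniture AS (
    SELECT * FROM items
    WHERE item_type_id = 3  -- 3 = furniture (hypothetical mapping)
)
SELECT COUNT(*) FROM furniture
"""
count = conn.execute(query).fetchone()[0]
print(count)  # 2: the NULL-price row is filtered out, but the zero-price row is not
```

Spot-checking an intermediate CTE is just a matter of replacing the final SELECT with something like `SELECT * FROM items` and inspecting the output.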

So, now we can do a more practical walkthrough of an example. I want to start this by showing us the final result. What we’d really like to have is a line chart, going through time, so let’s say weekly data, and let’s make it simple. We’ll say a count of a particular type of item, we’ll say furniture items, and just to make it interesting, we’ll say you can’t include returns, so maybe there’s some returned items for whatever reason.

So, we would start by writing the CTE we just looked at, so we’re getting a list of items, the data would come out looking something like this, so you’ve got a list of item IDs, you’ve got your weeks, they’re not in order, you’ve got item type, we don’t know what these numbers mean yet, but they’re here, and we see some prices. I think it’s interesting, I wrote priceUSD is not null, and maybe this is an example of how someone could screw up in a visual report builder, you think you’ve got the right filter, but it just turns out your database also has some zero values. So, now if you take an average of price, you’re going to have zeros in there skewing your average, and if you’re not aware of those zeros, maybe you want to filter these out, maybe it’s a nuance of your database, I think all databases have nuances like this, so we would probably even want to filter out zero prices.

And that’s what this spot check is here, so you can see the dash dash, that means comment, so it’s not going to run, but I can easily delete that dash dash, I can spot check this, and that’s where I would catch those zeros, and then maybe I would update this query. So the CTE2, I’m saying from return item, so in this hypothetical example, this means we have a database table of items that were returned. I would say that’s the best way to engineer this; maybe a database engineer would instead put a column on the item table with a zero or one if it’s returned, but I don’t think that would be the right way to do it, a separate table is nicer engineering. So we’ve got a table of return items, we get a list of all those return items in the same timeframe, and then we run it, and the output might look something like this, so you’ve got the same IDs, and then you’ve got a date. So our third CTE, now we’re filtering on return item ID is null.

So in other words, since we did a left join, it means there’s going to be a null value here if the item was not returned, and since we have this where clause, we have now removed any return items. So, you know, basic SQL concept there. But it’s more readable in the CTE format. You can see what we’re doing, we can name these CTEs, personally, I wouldn’t write CTE3, I would just write returns, something like that. So now we move into our analysis section. And it’s pretty easy to see what we’re doing here. We want data that has the week, the item type, and then we’ll do our functions, so count and sum. The results might look like this. So we’ve got different item types, we’ve got the weeks now, we’ve got the counts, and our question asked for furniture items, so we need to go figure out what are these item type IDs, what do those map to. So in our final, I call this the naked query, so the CTEs are wrapped in parentheses and the final naked query is not, I think naked query is a technical term, I’ve seen it somewhere. So here is where I would add item type ID equals three, so I’ve, you know, in this example, I’ve done the research to figure out what those item types map to instead of inner joining some item type table and getting the name, I would just say equal three and leave a comment. So I like this style because now if someone comes to me and says, oh, I’d really like to see that same chart but for a different item type, I could copy and paste this entire thing and just change the ID here.
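Assembled end to end, the walkthrough above can be sketched as one runnable statement. SQLite stands in for the MBI warehouse here, and the item/return_item schema, the week values, and the “type 3 = furniture” ID are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical schema mirroring the walkthrough: an "item" table and a
# separate "return_item" table, as described in the talk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER, week TEXT, item_type_id INTEGER, price_usd REAL)")
conn.execute("CREATE TABLE return_item (item_id INTEGER, returned_at TEXT)")
conn.executemany("INSERT INTO item VALUES (?, ?, ?, ?)", [
    (1, "2024-W01", 3, 10.0),
    (2, "2024-W01", 3, 12.0),
    (3, "2024-W02", 3, 11.0),
    (4, "2024-W02", 5, 99.0),
])
conn.execute("INSERT INTO return_item VALUES (2, '2024-01-05')")

query = """
WITH items AS (
    -- filter out NULL and zero prices, the nuance caught by spot checking
    SELECT * FROM item WHERE price_usd IS NOT NULL AND price_usd > 0
),
returned AS (
    SELECT item_id FROM return_item
),
kept AS (
    -- LEFT JOIN + IS NULL removes every item that appears in return_item
    SELECT items.* FROM items
    LEFT JOIN returned ON items.id = returned.item_id
    WHERE returned.item_id IS NULL
)
SELECT week, COUNT(*) AS furniture_count
FROM kept
WHERE item_type_id = 3  -- 3 = furniture (hypothetical mapping)
GROUP BY week
ORDER BY week
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('2024-W01', 1), ('2024-W02', 1)]
```

Item 2 is excluded because it was returned; changing the one `item_type_id` filter in the final naked query adapts the whole thing to a different item type, exactly as described above.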

Oh, and so the output might look like this. So why not write this as one dense terse query? This is probably what most people are familiar with when seeing SQL queries, and they can get much larger, they can have nested subqueries, and it’s really common to see it written this way, especially in application development.
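For comparison, a single dense statement answering the same question might look like the sketch below (same invented schema as before). It returns the same numbers, but the purpose of each clause is harder to recover at a glance.

```python
import sqlite3

# Same hypothetical schema as the CTE walkthrough, collapsed into one statement.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER, week TEXT, item_type_id INTEGER, price_usd REAL)")
conn.execute("CREATE TABLE return_item (item_id INTEGER)")
conn.executemany("INSERT INTO item VALUES (?, ?, ?, ?)",
                 [(1, "2024-W01", 3, 10.0), (2, "2024-W01", 3, 12.0)])
conn.execute("INSERT INTO return_item VALUES (2)")

# Everything the CTE chain did, in one dense query: price filters, the
# returns-removing LEFT JOIN, and the type filter all live in one WHERE clause.
query = """
SELECT item.week, COUNT(*) AS furniture_count
FROM item
LEFT JOIN return_item ON item.id = return_item.item_id
WHERE item.price_usd IS NOT NULL
  AND item.price_usd > 0
  AND return_item.item_id IS NULL
  AND item.item_type_id = 3
GROUP BY item.week
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('2024-W01', 1)]
```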

For one, I think this is less readable, it’s more difficult to understand the author’s intention, why is price USD not null here? You know, it’s disconnected, unlike the CTEs, where there’s a very logical problem-solving flow. I could probably figure out what this left join is doing, but this return item ID is null, it’s all the way down here, and now I’ve got these two nulls, so I feel a little bit confused about that. Item type ID, that’s pretty understandable, but I’ve got a group by on this, so the whole thing is complex, I guess technically this probably wouldn’t even run, it should be grouped by one and two. So, in any case, this is difficult to understand. If I want to debug it and figure out what’s wrong with it, it’s also going to be more challenging, I’m going to have to copy this left join and open a new window, run that over there, figure out those results, it’s just much more complicated. I don’t think I could reuse anything from this, because it’s so customized to solving that one particular problem, and it’s really not a logical flow. I don’t think anyone writes queries like this the first time. Usually they’re writing smaller pieces of this query, and then they stitch together this beautifully dense query, and it feels nice to do that sometimes, and it’s impressive, and someone else looks at it and they don’t understand any of it, and they say, wow, you’re really smart, because you wrote this difficult to understand query, but I don’t see it that way, I’d rather it be readable. So, let’s do another example, and this will be a little bit more in depth, I think more practical. So, the end result, what we’d like to have is here, a stacked bar chart.
So, it’s going to be a count of some items per week, and we’re going to not include returns, just like last time, but we want to stack it, and let’s say there’s locations, so maybe there’s warehouses, or three offices, or if we adapted this example to, let’s say, a marketing team, maybe it’s three different websites, or three different web pages for customer conversions, it could be any example, but in this case, we can copy and paste everything we just did, the items, and we filtered out returns, and it’s really clear and easy to read that, especially if we’ve labeled our CTEs descriptively. So, copy and paste that, and jump right into the analysis, but there’s one more thing we need, we said we want facility location, so we’re just going to open up our CTE1, and add facility ID here, and we don’t need to change anything else, because you can see down here in CTE3, it says CTE1.star, meaning all of the columns from here, so this is really a nice, well-written query, sort of a, I call it a master query, so a master query is something that’s really easy to adapt to solve many problems. So, we write our fourth CTE, and this is the analysis part, and at first glance, I think an application developer would frown at this, it’s kind of cringey, when you see a wall of CASE WHEN statements embedded in a function, but Magento BI is so fast, you can do this, it’s okay to not be technically excellent in Magento BI, because it’s going to run so fast, every column is indexed, you don’t need to worry about that. So, this is more of a master query style, because we can now answer a lot of questions with one query. So, the output of this, it might look like this, we’ve got our weeks, facility IDs, three different facilities, one, two, and three, and we’ve got our counts of various product types, or item types. So, our final naked query, to finish the result, the final one is always naked and never wrapped in parentheses.
So, we’d say select all from the analysis, here I’m saying, don’t show me this week, I only want to see full weeks of data, so get today’s date, find out what week that is, and in the chart, only show weeks that are less than this current week. And so, then we get a nice chart like this. But, in this example, we’re using facility ID, and maybe not everyone in the organization knows what facility one, two, and three is. So, we would do a little extra modification here, inner join, we probably have a table called facility, or maybe it’s office, or warehouse, something like that, a reference table. We modify our query slightly, and then we get a chart like this, and in Magento BI, when you hover on one of these bars, you’ll get a pop-up showing your data. So, you can see in this case, I’ve made it say accessories count, we didn’t specify in the question, but accessories count, it would be so easy to adapt this to show something else, it could be furniture count, or handbags count. So, you’re doing this in the SQL report builder, it has a way of making charts super easy, and here we have the facility name instead of the ID, we have the count, and the week. Yeah, I like the way you organize the use case, Shan. Starting from an end objective, and systematically breaking down the ask in terms of separate CTE chunks. In addition to making the query run faster, CTEs would also help make it more readable and reusable within the team. Often there is a need to create the dataset based on which further analysis could be done, or visual reports could be built. Being temporary in nature, CTE output does not occupy physical space in the data warehouse, thereby saving effort in cleaning them up periodically. Yeah, I agree, and actually I want to expand on that in the next slide.
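The whole chain described in this example, a raw-data CTE, a returns filter, a case-when analysis grouped by week and facility, an inner join to a facility reference table, and a final naked query with a limit, can be sketched end to end. This is a minimal, hypothetical reconstruction using Python’s sqlite3: the table names, columns, and the strftime week format are all stand-ins, not the actual schema from the webinar (Magento BI’s Redshift engine would use to_char for the week label).

```python
import sqlite3

# Hypothetical schema standing in for the item and facility tables in the example
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table item (id integer, created_at text, type_name text,
                   facility_id integer, returned integer);
create table facility (id integer, name text);
insert into facility values (1, 'East'), (2, 'West');
insert into item values
  (1, '2021-03-01', 'accessories', 1, 0),
  (2, '2021-03-02', 'accessories', 2, 0),
  (3, '2021-03-03', 'furniture',   1, 1),
  (4, '2021-03-08', 'furniture',   2, 0);
""")

sql = """
with cte1 as (                      -- raw items, including facility_id for stacking
  select * from item
),
cte2 as (                           -- filter out returns
  select * from cte1 where returned = 0
),
analysis as (                       -- one conditional count per item type
  select strftime('%Y-%W', created_at) as week_,  -- stand-in for to_char(..., 'IYYY-IW')
         facility_id,
         count(case when type_name = 'accessories' then 1 end) as accessories_count,
         count(case when type_name = 'furniture'   then 1 end) as furniture_count
  from cte2
  group by 1, 2
)
-- final naked query: join the facility reference table for readable names
select f.name, a.week_, a.accessories_count, a.furniture_count
from analysis a
inner join facility f on f.id = a.facility_id
order by a.week_, f.name
limit 10000
"""
rows = list(conn.execute(sql))
for row in rows:
    print(row)
```

Note how the analysis CTE only ever references the one before it, which is what makes this "master query" style easy to extend: adding a handbags count is one more case-when line, nothing else changes.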

So, I want to show you how great this chart builder is for the SQL report builder. So, the visual report builder is a little bit separate and probably more intuitive to use, but for someone who is writing their SQL code, the results tab is always, you know, you could think of it, it looks like a spreadsheet, it’s the raw data. So, it’s so easy, you click on the chart tab, and then you have three drop downs. And I remember when I first started making these charts, I was a little bit confused. Do I click label, what should be the series, what do I put under category? And usually I would just click around with trial and error until I got it right. But I’m going to give away my secret for how to understand this three-step process. Step one, start with the label. And the label is always the time series. So, it is the x-axis. And I think probably 95% of the reports that I build in RJMetrics are always weekly or monthly or daily data. And if someone asks for, oh, I just want this three week chunk of time and that’s all I need, well, I’m going to build that into a weekly report. I’m not going to waste my time just making a one-time analysis of those three weeks because I feel sure that person is going to come back with more questions. So, I’ll just go ahead and automate it for them. So, step one, label. You click that drop down, you’ll see the options here. This is an example from a dashboard I actually have. So, I created a measurement for waste time. And so, here I’m going to click on month. And right now, the chart is still, it doesn’t make sense. It still needs more. It has to have a series. So, that’s step two. Step two is what we’re measuring. It’s the y-axis in most cases. Maybe if you change to one of these other report styles, it might be a different axis, but in general, it’s always the y-axis of what we’re measuring. So, I would open that drop down, I would select what I want, in this case, waste time. And then step three, this only applies to stack charts. 
So, there’s a few different types of stack charts in Magento BI. I never use category unless it’s a stack chart. So, in this case, I would choose offices. And now we have a beautiful visual. So, you were mentioning before about how this is powerful for enterprises. Well, if you’re an enterprise, and you’d like to have this data report, and you don’t have Magento BI, you need to ask a developer team to build a custom dashboard just for you, and add that to your employee-facing tools. Well, that’s really expensive. Not only is it expensive, but if your requirements change, and you want a slight modification to that, that’s going to be another expense. And then maybe after six months, this data report’s not really important anymore. This is actually a really good example. We fixed our waste time problem, and now no one’s looking at this report anymore, because it’s already been successfully handled. So, now that would be a sunk cost that we spent on a report that we don’t need anymore. I think that is why executives like to have a business intelligence team or a few data scientists on hand, so you’re not creating a developer swim lane to build endless reports and dashboards for employee-facing tools that are constantly changing and becoming outdated. Instead, you have Magento BI, and you can build a limited amount of reports using just one or two data experts. You know, in my experience at American Airlines, I would say that’s under the umbrella of big corp and, you know, like giant enterprise level, and dev swim lanes are slow. There’s so much bureaucracy, and there’s a lot of approvals that these things need to go through, and there’s huge backlogs. So, if you wanted this data report, and you didn’t have Magento BI, and you didn’t have a team of data experts or business intelligence, and you’re trying to go through a developer path, you might not get this report for six months, eight months. 
That would be, I think, typical at American Airlines enterprise level type of organization, but if you have Magento BI, you could have this in, I think, an hour, depending if there’s a backlog or your urgency. There’s many times I’ve been pulled aside on something urgent. Can we have this before 3 p.m., and it’s 11 a.m. my time? So, I’d say, absolutely. I can do that in about 30 minutes. Actually, do you want to stay on the call with me, and I’ll share my screen, and we’ll just do it together? Magento BI is that fast, and that kind of collaboration actually helps people explore the data even further than their original request. Personally, I’m not shy about writing SQL with other people watching, so I like to share my screen and have the stakeholder stay on a call with me, and so the hour meeting that they booked is not an hour spent describing the problem to me and discussing it. Instead, it’s an hour spent actually actively solving the problem. All right. Thank you, Deepak and Shan, for that great presentation. There’s lots of good info there. Remember that we have the second session of Shan’s presentation this coming Thursday at the same time of day, and the same link. Everyone registered for today’s session will get a reminder email for Thursday’s session in a couple of days. All right. Now, let’s move on to the Q&A portion on the topics that Shan covered today. Please submit any additional questions in the Q&A pod on the left side of your screen, and also please submit your feedback in the poll questions now showing on your screen. These are really helpful for us to improve our content in the future. If you have questions after the webinar, or if we don’t get to your question during today’s Q&A, please feel free to send an email to mbi-webinar at adobe.com. We’ll receive questions there for about the next week. 
You will get a follow-up email in the next day or two with a link to a recording of today’s webinar and to the resources library where other webinar recordings are already posted. It will also contain the email address that I just mentioned for follow-up questions. All right. So, let’s see. We’ve got some questions that have come in. Okay. I think this first one is – looks like most of them are for Shan. Got just a little bit of time here for questions. As a novice with SQL, what is the fastest way to build good SQL skills? The fastest way is definitely pair programming. There’s no faster way to learn SQL than to work with an experienced SQL writer and learn from them. So, when I say pair programming, I don’t mean they need to be sitting side by side, but it means the novice and the expert should have an internet call. The novice should screen share their computer. So, the novice is actually writing the query, exploring the database, and the expert is backseat driving and helping the novice work through some problems. This can feel a little bit painful in the beginning, you know, telling the novice, oh, you missed a comma, not there, but over there, yeah, up two lines, line 27, okay, now break a line, you know, things like that. But this initial pain, it dissolves quickly after the first two or three days of pair programming. Everything starts to fall in place and you start solving problems together very efficiently. And also, maybe 10 to 15 minutes per day, the novice can watch the expert write some queries, but it’s really more important that the novice do most of the typing. So, it’s a lot like learning algebra or calculus. You can watch the teacher solve some problems and you might think you understand it, but you don’t really integrate those skills if you don’t personally write and solve those problems yourself. 
So, just two or three hours per day in this pair programming style, writing meaningful queries, this will quickly convert a novice into an intermediate SQL writer who’s totally self-sufficient after a week or two, sort of like a boot camp for SQL training. And from that point, the person can teach themselves more and they just need practice to solidify those skills and work their way towards mastery. So, the novice, they need to feel comfortable with joins and group bys in the CTE style, and after that, they can self-learn a lot of the rest with some practice. Okay, great. Some good tips. All right, next question says, writing simple queries is easy, but sometimes the query seems correct and even provides data, but the data is wrong because of something in the query. How can we know the data is correct? This might be one of the most important issues with writing SQL, getting data results that appear correct but are actually flawed, and this can easily happen with visual report builders too. So, this issue is a good reason to always have at least one SQL expert in the organization or team and have code reviews or report reviews with that expert before using that data in decision making. So, when looking for those possible errors, the most common error is probably using an inner join when a left join should have been used instead, and the second most common error is probably in the where clauses, like forgetting to add a filter on something like a status ID or a type ID or removing null values or having duplicate rows. So, always look out for those, and if the query is written in a CTE style, it’s easy to spot check those CTEs and look for those errors, and it’s also easier to avoid those errors when writing the CTE style. Okay, great. I think we’ve got time for just one more question here. This is also for Shan, so it’s a little longer. 
The data team receives, I guess their data team receives many requests for data, but the requests are often in different formats, sometimes a long complex paragraph, sometimes a short ambiguous description, which leads to a lot of back and forth communication before creating a data report, or it sometimes leads to a data report that does not fit the expectation of the stakeholder. Do you have any recommendations for improving this process? Yeah, we’ve definitely experienced those issues also. It’s important to give stakeholders some expectations about what you expect in a data request. Personally, I like to have two things in my data requests. I’ve communicated this preference to all departments. The first part is that I want to have some context about why the stakeholder wants this data. Give me some context about the problem they are trying to solve or how the data will be used, and this helps me have a holistic understanding, and I can often include new insights the stakeholder didn’t know existed, and this usually makes stakeholders happy and feel like the data team is truly supportive. The second part is not required, but if the stakeholder is wanting a spreadsheet of raw data, maybe to do their own analysis separately, I like to see a spreadsheet example of the columns they expect to receive, mostly because I think this helps stakeholders think about the problem and clarify their own thinking before submitting the request. Okay, so if you feel you can answer this question that just came in sort of quickly, we have just a couple minutes left. I want to make sure we respect folks’ time and don’t run over. It says, how did you manage 9,000-plus reports and 500-plus dashboards? How many users did you have overall? Did you use a separate tool or manage directly from your MBI account user by user? So maybe you can cover at least part of that. Yeah, good question. 
We have over 100 users, and we only use the MBI administrative tools that come out of the box with Magento. It’s as simple as adding a company email address. They’ll receive the email, and then they can log in. I think it could be any email address, so it doesn’t necessarily have to be within your company. Maybe if you have some contractors working outside of the company, they can also join, just add their email address. They get an email, they log in, create a password. So one trick we do use for managing those users is we’ll put their department name, like marketing or customer support, we’ll put that department name in parentheses beside their username. This helps when sharing dashboards, and you have over 100 different people you could possibly share to. This helps make sure you share to the right person. Some people have the same or similar names in different departments. Most people create their own personal dashboards, like a playground or a sandbox, and it’s private. They’re not sharing that with many people. And it’s great for exploring data and creating reports and sitting on those for a while while you vet the data. We generally don’t consider a report verified for decision making unless it has been created or reviewed by the business intelligence team. And at one point, we decided we have too many unused or outdated dashboards and reports. It’s really easy to have a lot of fun in Magento and create all sorts of things that maybe seem interesting at the time or are relevant for a few years ago but are not relevant anymore. So we decided we are going to have a formal cleanup process. We sent a support ticket to Magento. They have a great support team, very responsive. And we asked about our unused dashboards or if they could give us any insights to help us in this cleanup process. So they gave us a spreadsheet list of dashboards that did not have views in the last 30 days and in the last 90 days. 
They were able to give us the emails of the users who created those dashboards. So then we could reach out and we could ask those people if those dashboards are still necessary or could they be deleted or could some reports be removed from those. And we removed a few dozen dashboards that way. And then we worked with department leaders to review the dashboards that we want to keep but delete some unnecessary reports from those. And that just helps people find data that they’re looking for more easily when they’re not distracted with outdated reports. All right, great. Thank you so much, Shan. And thanks to everybody who attended. We are out of time for this webinar. Hope you got some really great information today and we hope to see you again on Thursday. Have a great rest of your day.

Session 2

Transcript
Welcome everyone and thank you for joining us today for the MBI Ask an Expert webinar session two. This is the second of our two sessions. On Monday we covered some topics around MBI and business operations and we’ll continue today with some more technical topics. Thanks to those of you joining for both sessions, but if you couldn’t make it to session one, we’ll do a quick summary today and you can watch the full recording later once it’s posted on our resources library. I’m Webster Love and I’ll be your host for today. Joining me are my co-hosts Sonika Viraneni and your presenter for today’s session Deepak Kumar. We’re all from the product team for MBI at Adobe. In today’s session we’ll continue featuring a guest speaker, Shane Newton from the data team at Pixels. Today we’re playing a recording of Shane’s content and then we’ll have Shane and Deepak doing a live Q&A at the end. Before we dive in, a couple of technical notes. For audio issues, audio is through Adobe Connect, so your speakers or headphones must be unmuted and then ensure the speaker icon is on at the top of the Connect room. We don’t currently have an option for calling in via phone. Adobe Connect doesn’t always work well with Bluetooth headphones, so if you’re using those and having audio difficulties, you can try switching to wired headphones. You can also try logging out of the room and back in or switching to the desktop app instead of the browser version. Throughout the webinar, if you face any technical issues with your audio or the screen share, or if you have questions related to the functionality of MBI or to the content that’s being presented, please feel free to put those questions in the Q&A pod at the left of your screen, as we’ll be monitoring that while the recording of today’s content plays. We will try to answer them either right away through the chat or during our Q&A segment at the end of the webinar. 
We will be sending a link to a recording of the webinar out afterward. Thank you all for joining the second part of our webinar. So far, we covered a product review of MBI, a case study from our guest speaker Shan about organizational transformation using MBI. We looked at common table expressions and how they can help data analysts using MBI make their SQL code more readable and reusable, and run queries faster. Today, we will cover a few examples of common table expressions, some tips and tricks for writing SQL-based reports, and a short demo on SQL-based reports using common table expressions. We will wrap up the session with a Q&A. Over to you, Shan. Okay, so let’s do one more practical example. And this one is a complex example that I actually did about two weeks ago at our own company. So I’ve abstracted this example out to be a hypothetical situation. So let’s say that you have a process flow. Since we’ve been talking about items, furniture items, handbags, or something, let’s say they go through some sort of production process. And there could be multiple processes that they go through, different items going through different processes, sometimes the same steps, sometimes different steps, sometimes different lengths of processes. And there’s hundreds of different variations in this example. We want to know what’s the most common. You could adapt this to a marketing example. You could say what customers are going to different pages of your website before they convert. And if you have that data in a database, you could do this exact same example. So what’s the most common process flows? It has the most amount of items going through or most amount of web visitors before converting. And how do you find the answer? It’s actually really challenging if you’re trying to do this in pure SQL. Even if you wanted to pull out the raw data and do it in Python, you might get a little stuck trying to figure this out. So let’s say that we have this step table. 
This is our reference table of all the different steps of the process that it could go through. And then we’ve been talking about items. So let’s say that we have an item step table. So this is a many-to-one table. And you can see here in green, this is the same item, and it’s going through different steps. But there’s a new row entry for each step, if that makes sense. So how would you solve this? You want to find which of these is the most common. Maybe the worst idea you could have is you’re going to manually code every permutation and try to count it that way. Probably not a good idea. Maybe another way. You could probably do this in Python. I don’t know if you could do it in SQL. But you could. I guess you could. You could make a sort of like a case when statement for each possible step. And you could say zero or one if it went through that step. But that would also be pretty cumbersome and doing it manually like that. And you still have to string concatenate all those zeros and ones together and then start comparing them. It would get tricky. So let’s look at how I solved this problem two weeks ago. This is the final output. This is what we want to see. So we want to see a string comma separated of each step that it went through and how many. And I figure why not do the ID also. So we’ll do the ID and then the corresponding name of that step. And we’ll get the count of items that went through that step. So this is what we want to achieve. And where do we begin? Well, with CTEs, I think it’s really simple. You always start with just like the most basic raw data. So in this case, we would query that item step table. We would just get out all the items and the steps they went through. And let’s go ahead and get that step name involved. In this case, I said a step ID and I left a little comment just to show different variations of writing queries. Next, we actually move directly into solving this problem. There is a function called listagg. 
And it’s been around for a few years. If you’re working in a legacy database, like a traditional database, and it’s an older version, you might not have this function. But this function is available in MBI, Magento BI. So what I’m doing here is I’m selecting a row. I want a row for each ID, a group by one, so group by the ID. And I want to count how many steps did it go through. And then this listagg function, I’ll just walk us through this. So here I’m saying get these IDs, so these over here, and comma separate them. And then "within group", it’s just the syntax of this function, so order by item ID, and then order by step. So basically, the item’s gone through, let’s say, nine different steps, so I want to see that item grouped together, all its rows together, and then put it in order, like 0, 1, 6, 9, 13, 15. So put it in order, and then collapse that into a string. It’s a pretty powerful SQL function. And I guarantee if you run that on a traditional database with, you know, more than a month of data, it’s going to blow up your server. It’s going to use a lot of CPU, depending on how powerful it is. So we do the same thing with the name. And we’re almost done. Like, actually, almost simple when you know what functions are available. So the last part is select the top 50. So I just want the most interesting process flows, give me the ID string, name string, and give me the count. Here I said where count is more than three, just to make it interesting. So that’s it. And just to elaborate on that, so how did I know that I have this listagg function available? Well, Magento BI is built on top of Redshift. So this is the SQL function reference that I use to know what kind of functions do I have available to me in Magento BI. And it’s all the modern functions that I could possibly want. I think this aggregate function became available in SQL Server 14. 
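The grouping, ordering, and collapsing that listagg does can be sketched in plain Python. This is a hypothetical reconstruction with made-up item and step IDs, not the actual data from the talk; in Redshift the same idea would be listagg(step_id, ',') within group (order by step_id), grouped by item.

```python
from collections import Counter, defaultdict

# Hypothetical (item_id, step_id) rows: one row per step each item went through,
# like the many-to-one item-step table in the example
item_steps = [
    (1, 0), (1, 6), (1, 9),
    (2, 0), (2, 6), (2, 9),
    (3, 0), (3, 13),
]

# Group each item's steps together, ordered by step, which is the role of
# listagg(step_id, ',') within group (order by step_id) in Redshift
flows = defaultdict(list)
for item_id, step_id in sorted(item_steps):
    flows[item_id].append(step_id)

# Collapse each ordered step list into a comma-separated string,
# then count how many items followed each flow
flow_counts = Counter(",".join(map(str, steps)) for steps in flows.values())
for flow, n in flow_counts.most_common(50):  # the "top 50" most common process flows
    print(flow, n)
```

With this toy data, items 1 and 2 share the flow "0,6,9" and item 3 has its own "0,13", so the most common flow comes out on top, exactly the ranking the final query in the example produces.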
So if someone was working in a version prior to 14, they would not be able to solve this problem without using Magento BI. Okay, so let’s do a quick recap. What did we learn? For one, I think a major takeaway is that CTEs encourage a logical problem solving flow. And actually in the past, I’ve done some performance comparisons and CTEs are not necessarily slower than a condensed terse query. I’ve performance tested this with set statistics time and set statistics IO. If you know about that in SQL, you can test the performance of your own query. And I’ve spent time to just compare CTE styles with dense styles and they’re almost exactly the same. The SQL engine underneath the hood is really smart at interpreting CTEs and finding the most efficient way to do that. And sometimes it’s faster than a dense query. These CTEs, they also make long queries more readable and more reusable, easy to copy and paste and reuse in other contexts. Most importantly, I think in Magento BI, you can automate these results. And by using that CTE style, you won’t be getting raw data and then using spreadsheets to pivot it and create your charts. It’s all automated. There’s nothing you can do in Excel or a spreadsheet that you can’t do using CTEs with Magento BI. Also, Magento BI is exceptionally fast. So even if you write your query in a really wonky style and it’s not efficient and you’re using all the worst practices, it’s still going to be lightning fast. I mean, it’s really hard to make a slow query in Magento BI. I actually did one for this presentation. I’m going to show you in a few slides. It took me maybe 30 minutes to get this error message. So also advanced functions. So if you do have any experts on the team wondering if you’re going to have special functions available, they’re all available and it’s regularly updated. I’m sure our audience will benefit a lot from the CTE style of writing SQL queries. 
Any additional tips from you so our end users can make use of MBI SQL based reports in a better way? So now I’ll just share a few more best practices and this is technical about SQL and this is specifically best practices for Magento BI when using this CTE style. For one, date ranges. When we first started using Magento BI, the first year, we made hundreds of reports and we were used to using datepart because so many of our stakeholders who would send data requests, it was always two weeks of data from this one particular time or this one customer and the images they sent us for a certain three week period. So we would frequently use datepart. It was normal and easy. And it breaks. When you move into a new year and you have a datepart, the query won’t give you an error. It’ll just start counting two years. So month 10 being October. If this said where created datetime is greater than 2019, and now it’s 2021, you would have three years of counts here, not one year. So it won’t tell you that it’s wrong, but it is wrong. So in order to solve that challenge, don’t use datepart. Use to_char, char meaning character. So to_char is better than datepart and it looks nicer too. But there is one gotcha. You’ve got to be careful with weekly data. So when you use to_char, there are two types of format and you want to use the ISO standard for weekly data. And here’s an example of that. So this year in 2021, we had three days as week one and almost every company in the world does not consider January 1st, 2nd and 3rd week one of 2021. It’s the first full week that we consider to be the first week of 2021. And that would be the ISO standard. So if you did not say IW here, you would get 2021-01 and it would only count these three days. But if you use the IW, then it will correctly show this week, these three days will be part of week 53 in 2020. Yeah, I don’t know how many users may have missed this convention and ended up missing one week of data. Yeah. 
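You can see this ISO week convention directly with Python’s datetime module, which follows the same ISO 8601 rule that to_char with IW/IYYY applies in Redshift. This is a small illustration, not taken from the webinar slides:

```python
from datetime import date

# ISO convention: January 1-3, 2021 fall in week 53 of 2020,
# matching what to_char(..., 'IYYY-IW') reports in Redshift
iso_jan1 = date(2021, 1, 1).isocalendar()
print(iso_jan1[0], iso_jan1[1])   # 2020 53

# January 4, 2021 (a Monday) starts the first full week, ISO week 1 of 2021
iso_jan4 = date(2021, 1, 4).isocalendar()
print(iso_jan4[0], iso_jan4[1])   # 2021 1
```

So a non-ISO week label would bucket those three January days as a tiny "2021-01" week, exactly the mistake described above, while the ISO label keeps them with the rest of 2020’s week 53.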
It’s an easy mistake to make and it’s embarrassing if you make that mistake. So don’t do that. Don’t use datepart. And if you use to_char, make sure on weekly data you use the ISO standard, but on monthly data, you don’t need to worry about that. Also for daily data in Magento BI, I would recommend using cast, and cast your datetime as date. So cast as date, and then that will be your daily data. And the reason I’ve added this underscore here, it’s kind of a common practice, but maybe not every team is familiar with that. So it’s letting the reader know that I created this column name month because month is also an SQL keyword. So if you were writing in an IDE and it’s giving the color to your code as you’re writing, so month might show up as a color because it thinks it’s part of the SQL language, but actually it’s the name of a column that you created. So if you add the underscore, it’s very clear what you’re doing. Maybe a little bit like private variables in application development. Okay, so let’s look at a common error message. I think everyone will get this error message when they first start writing queries in Magento BI. This error message, it’s saying that there is a row limit of 10,000 rows. And so if you’re first starting out, you’re writing SQL in Magento BI and maybe you’re using the CTE style. And even if you’re not going to get an output of 10,000 rows, the engine is trying to estimate how many rows you might return. And so it won’t show that unless you use top 100 or top 1000 or top 10,000, or you can write it as limit. Either way, it’s fine and this will circumvent this error and you’ll have no problems. Maybe that could be an enhancement for Magento BI in the future, to automatically assume that the final naked query should have a top 10,000. And then if it needs to use that limit, then display a little notice like note, we’re only showing you the top 10,000 rows. 
But sometimes, for example, if I write a really large CTE chain, maybe I’ve got eight or nine different chains of CTEs before I get to this final naked query, the engine will think it is going to calculate some of those CTEs and it’s going to say, whoa, you’re using a million rows and we can’t just show you a million rows. So it’s going to need this top 1000. But actually, maybe my naked query only expects three rows. So it’s not a perfect estimation and it’s just something that’s, I don’t know, I’ve learned to live with it. It doesn’t bother me at all. Doesn’t hinder my workflow, but a new user might get stuck on that. So another tip, this is an error that you’ll probably more rarely encounter, but it’s a fun error. If you do hit it, it means you’re doing something awesome, something interesting. This is an explain cost error. And so if you have a really large complex query and it’s burning up so much memory on the engine and it’s got a lot of different functions and calculations, then you might get this error. It’s very rare. I’ve only gotten it a few times in my three years of working in MBI. Just to get this screenshot, it took me 20 or 30 minutes to write a really nasty query to get this. So there’s a way to get around this. This typically happens with a CTE chain where here, like CTE two will query on CTE one and then CTE three queries on the previous and you kind of work your way down through your problem solving flow. But if for some reason down in a deeper CTE, in this case CTE four, you’re querying on the previous, but you decide to pull in some data from CTE one. So the SQL engine, when it’s evaluating this, it knows that it needs to hold on to this data until it gets to CTE four because CTE four is going to use it. And therefore the garbage collector won’t remove this from memory. But in the example, let’s say CTE three queries CTE two. So after that, the SQL engine knows it doesn’t need CTE two anymore. 
So it can garbage collect that and remove it from temporary memory. So it’s only when you do this, and the workaround is basically just don’t do this: don’t pull in earlier CTEs later in your query. It would be better in this case to write CTE five and then rewrite some of this CTE one query if you have to. And you could probably get around it. So you basically, you always want to be able to garbage collect the previous CTEs. So a few more tips. I’ll just point out some of my personal writing style and some of my personal suggestions. For one, I would say always leave helpful comments, especially on like IDs, status IDs, type IDs, anything pointing to a reference table. If you’re not going to inner join it and show the name, which you can, because Magento BI is so fast. If you wanted to say item status name like pending, it’s so fast. You don’t have to worry about the performance there. But if you’re going to keep it sort of old school and use the numbers, then leave a comment. It’ll help you later. You’ll forget what that meant after a few weeks. Another helpful thing, leave your spot checking for CTEs. So in other words, when you first wrote this query, you probably need to double check the output of it. It’s probably going to be a little more complex than this too. So I would say go ahead and write it, wrap it in parentheses, and then write a select statement to get the top 100 * from that CTE. And then leave that there. Comment it out and leave it because, I don’t know, six months down the road, some other person on your team is looking at your query and they want to double check your work. So make that easy for them. Next, I would say this is something you kind of figure out over time. It’s nice to put these commas here. And there’s a reason for that. Because if you want to do this spot check, so you’re going to backspace these two hyphens, and then you need to comment out everything below. So maybe you do a multi-line comment, like slash star. 
If this comma is right there after the closing parenthesis, which is maybe the way you would see it in a textbook, then you need to comment that out also, because the naked query can’t have a trailing comma. So over time I’ve just learned to put the commas at the front, and it saves you a few keystrokes. Maybe that seems petty, but keystrokes get annoying; we want to minimize those in the development world. Another tip: use descriptive CTE names. That’s the same principle as in any code writing, where you want descriptive variable names, so the same goes for your CTEs. Don’t say ‘with x as’ and then make your next CTE y and then z. I’ve seen it before; don’t do it. And remember, even if it’s a long descriptive CTE name, you can always use aliases like I’ve done here: pending_items, and then just write ‘i’ after it. Now I can refer to it as ‘i’ later in the query and save all those keystrokes. So use aliases, that’s fine. Notice that I’ve written everything in lowercase. Maybe 10 years ago you would always see SQL with all the keywords in uppercase, and maybe that came from, I don’t know, 30 years ago when IDEs didn’t have nice colors. But MBI, and this is a screenshot of the IDE in Magento BI, provides the colors for you, so you don’t really need all that uppercase to distinguish what is an SQL keyword and what is the domain-specific language. So write lowercase. It’s sort of the new school; all the cool kids are doing it. It saves your pinky from hitting the shift key so frequently, and after a while it feels really natural. SQL examples out there are filled with a lot of uppercase, but you don’t need it. In Magento BI, uppercase, lowercase, case doesn’t matter. It’s not case sensitive.
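Several of these style tips together, leading commas, a commented-out spot check left in place, a descriptive CTE name with a short alias, and all-lowercase keywords, might look like this sketch (the column comments are illustrative):

```sql
with pending_items as (
  select
    entity_id          -- sales order id
    , created_at
    , status           -- leave a comment if you keep raw reference ids here
  from sales_order
)
-- spot check, commented out and left for the next reader:
-- select * from pending_items limit 100;
select i.entity_id, i.created_at, i.status
from pending_items i
limit 1000;
```

With the commas at the front, commenting out a trailing column never leaves a stray comma behind.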
So another tip: if you have a team of developers, data scientists, or business intelligence SQL writers, come up with a comment convention that records the author, when they created the query, whether it’s been updated, maybe who the stakeholder was who requested that data, and some notes about the nuances of the data, maybe a status or filter it did or didn’t include. This also gives people a sense of recognition. And this is my last tip. Although I’m showing you a nice dashboard that I’ve made in Magento BI here, I’m not able to show you what I would like to. I can’t give away all of the reference IDs in our own database, but you can get the idea. Create a Magento BI dashboard that provides all of your reference table IDs and names. You could have a few tables here of your raw data, and it would output something like this. Then you can easily reference that and not have to write a query to figure out what a given step ID is for. It’ll save a lot of time. And that is it for my tips on advanced business intelligence using the SQL Report Builder in Magento BI. Thanks a lot, Shane. Can we have a short demo of the SQL-based reports in MBI? Maybe we can start with writing a CTE-based SQL statement, create a visual report from it in the same interface, and finally attach it to a given dashboard. Okay, here we are in MBI. We’re looking at a dashboard. We have several dashboards here, but we’d like to make a new report. We can do it two ways: we can say Add a report and go this way, or, more simply, we can click Report Builder. Here we choose either the Visual Report Builder, the Cohort Report Builder, or the SQL Report Builder. So let’s go here. All right. What we’re looking at on the left is all of our database tables. Here is a database, and you could have multiple databases synced. User push is for CSV files that we’ve uploaded to MBI; they’re just like a database table, and we can query them with SQL.
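A comment convention along the lines Shane describes might look like the following sketch; the name, dates, and stakeholder here are hypothetical:

```sql
-- author:      j. smith
-- created:     2021-03-02
-- updated:     2021-04-15, adjusted for the new order status
-- stakeholder: operations team
-- notes:       excludes orders still in 'processing'
with completed_orders as (
  select entity_id, created_at, base_grand_total, status
  from sales_order
  where status not like 'processing'
)
select * from completed_orders limit 1000;
```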
So for today, let’s go ahead and get started writing a CTE. We’ll start with the word ‘with’. And maybe we want to take a look at some sales orders. I could say orders, or maybe I want to make it more defined; let’s say completed_orders, since maybe there are some in there that are not completed yet, still pending. So ‘completed_orders as’, and I’ll just set this up, and maybe I’ll go ahead and set up ‘select star from completed_orders’. Let’s put a limit on that. Okay. Now I know I want to select something, and I know it’s going to come from the sales_order table, but I’m not sure what I want to select yet. So let’s click on this table, and I can see the columns I have available. It’s really helpful in data discovery to write your query this way. At this point I’m deciding I probably always want my entity ID. Maybe if I were designing this table I would call it the sales order ID, but it’s entity_id. And let’s get a date. Do we have a date column? Here we have created_at. So let’s go ahead and add this, and let’s put it in a to_char format instead of a date_part format. So to_char(created_at, and our standard format; let’s do it monthly. Okay, as month_. I put this little underscore here to denote that I wrote this month column myself; it’s not the month keyword in SQL. That would go with date_part: if you were splitting by month, you would write it that way. But we’re not going to do that. And let’s find something interesting to analyze. I think money is always interesting, so let’s find a money column. Order currency code... base_grand_total. That looks like a money column; it’s a decimal. I’m a little lazy to type that out, so I can just click Insert and it’ll put it there for me in quotes. The quotes are not necessary. However, if you have a column that has a space, then you’ll need the double quotes. But with a nicely designed database with these underscores, you don’t even need them; they’re optional. And let’s add status.
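Assembled, the first CTE as built to this point might look like the sketch below; the to_char format string is one plausible monthly format, not necessarily the exact one used in the demo:

```sql
with completed_orders as (
  select
    entity_id
    , to_char(created_at, 'YYYY-MM') as month_   -- trailing underscore: not the sql keyword
    , base_grand_total
    , status
  from sales_order
)
select * from completed_orders
limit 1000;
```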
Status is important on almost every table; there’s usually some kind of nuance like that. So let’s say where the status is not equal to, or I could say not like, ‘processing’. Okay. Now I’ve got it all set up here, I can go ahead and run this, and let’s just spot check it. Okay. We got a thousand rows, and I put a limit 1000, so that means I definitely have more than a thousand rows available. And we can see ‘complete’, we’ve got a status picked, and we don’t have any statuses of ‘processing’. So that looks like what we want, and I think we can carry on to our next step. Let’s go ahead and comment out this line and begin our next CTE. Thinking about something interesting, what could we combine with sales_order? Maybe the address. The address table has a region column here; that looks interesting, so let’s try that. Comma, regions as, and set it up. Select... I’m coming from the table. No, not regions, it is this table. And that’s kind of a long table name, so I’m going to give it an alias, soa. And I know I want to inner join my previous CTE, so: inner join completed_orders. That’s also a little bit long, so I’ll give it an alias too. Say on the entity ID, matching it to this entity ID. Okay. And I think I’d like to carry over all the data from the previous CTE into my next one, sort of a common chaining pattern, so let’s say co.star. And I think just the region, and we could maybe add in the country ID. And region. Okay, let’s see what this looks like. Select star from regions, and run it. Okay, so we got an error message, and this is the most common one; everyone will get this message at some point: we need a limit. So let’s just add our limit here, 10,000, and run again. And 0.27 seconds. We got 8,400 rows, and here is our region column. The country ID is interesting: it’s a string, not a number. Okay. So we’re looking at the data, and it looks like all of this data is from the USA, and the regions are states.
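With the status filter and the second CTE added, the query at this stage might look roughly like the sketch below; the address table name and the join column follow the demo, and the exact names may differ in your schema:

```sql
with completed_orders as (
  select
    entity_id
    , to_char(created_at, 'YYYY-MM') as month_
    , base_grand_total
    , status
  from sales_order
  where status not like 'processing'
), regions as (
  select
    co.*                  -- carry the previous cte's columns forward
    , soa.country_id
    , soa.region
  from sales_order_address soa
  inner join completed_orders co on co.entity_id = soa.entity_id
)
-- spot check:
select * from regions
limit 10000;
```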
So let’s think about something interesting we can do with this data. Let’s add a where clause and filter it somehow. Where region like... let’s see, New Mexico, North Carolina. Let’s try the letter N; I just want all the states that start with an N. So I’m going to write ‘N’ and then a percent sign, meaning it begins with an N and anything else can follow. As long as it matches that, include it. Let’s run again. Okay, now we’ve got New Mexico, North Carolina, New Hampshire, more New Mexico, Nebraska. So that’s pretty interesting; let’s see how all the N states are doing with their sales orders. Let’s comment out our spot check here and move to the next part of the query. Maybe we’re ready to wrap this up and do an analysis. So let’s select from regions. We know we always want this date to make a good chart; we need a time series. We definitely need the region. And maybe we want to analyze the revenue, this base_grand_total. Let’s do a sum of base_grand_total, just typing it out here, and give it a name; let’s call it revenue. And since we’re using an aggregate function, that means we must use a group by. Let’s group by one and two. You could also type the column names out, but this is shorthand. And we need to order it, so let’s order by the month, ascending. That’s actually the default, so we don’t even need to write it. Okay, so now it’s in order, we have our regions, we have our sums, so let’s make a chart. I’m going to increase the size of this. Now, using our three-step process, we go to the labels first. Step one, always labels, and we choose the time, month. It’s not meaningful yet; we also have to have a series. The series is what we’re measuring, so we’ll choose the sum, revenue. And since we arranged our data to have many different states, we need to split by those, so pick a category: region. And right now we’re on the line chart setting.
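Putting the whole chain together with the filter, aggregation, and ordering just described, the finished query might look roughly like this sketch (the schema names follow the demo and may differ in your store):

```sql
with completed_orders as (
  select
    entity_id
    , to_char(created_at, 'YYYY-MM') as month_
    , base_grand_total
    , status
  from sales_order
  where status not like 'processing'
), regions as (
  select co.*, soa.country_id, soa.region
  from sales_order_address soa
  inner join completed_orders co on co.entity_id = soa.entity_id
  where soa.region like 'N%'      -- all the states beginning with n
)
select
  month_
  , region
  , sum(base_grand_total) as revenue
from regions
group by 1, 2        -- shorthand for month_, region
order by 1;          -- ascending is the default
```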
That’s not so beautiful for a lot of stacked data, so let’s choose a stacked bar chart. Much more beautiful. We can hover over this, and we have this nice feature that allows us to see the data more clearly. And let’s try the area chart; sometimes this can also be interesting, especially if you have percentages where the top would be 100%. This is interesting, but maybe I’d like to drill down on something more meaningful, so let’s go back to our query and see what we can do. Maybe we want to change this to ‘North’. Is there more than one state that starts with North? North Dakota and North Carolina, yeah. Or maybe we’d like to compare just the Carolinas, North and South Carolina. Let’s try that; let’s just rearrange this to ‘Carolina’. Okay. So I think you can see how powerful this speed is for data discovery. When I get an instant response, I can change my query frequently and see what other kinds of exploration and data discovery I can find. If I were on a traditional database, every time I ran that query I might be waiting 30 seconds, a minute, five minutes, 40 minutes. So this can be a very powerful way to explore your data and to build beautiful charts that are automated and last for years. Now that we have a chart that we like and want to keep, let’s give it a name. Let’s call it Revenue Performance in the Carolinas. Okay, and now let’s save. We can save it as a table, which would give us the results down here, or we can save it as the chart. Let’s save it as the chart, and maybe choose a different dashboard; let’s choose this DK Generic. Save to dashboard, and we can go check that out now. Then we can rearrange our dashboard and make our chart larger. And anyone else who might be viewing our dashboard, if we share it with others, can come to this chart, and they can still export the data table that’s underneath it. They can export that to a CSV.
It’s a really powerful feature and great for sharing. All right. Thank you, Deepak and Shane, for that really great presentation; it was a lot of helpful tips and a fun demo at the end. So now we’re going to move over to the Q&A portion on the topics that Shane covered today. Please submit any additional questions that you have in the Q&A pod on the left side of your screen. Also, please submit your feedback in the poll questions that are now showing on your screen; these help us improve our content in the future. If you attended both sessions, please submit feedback for today’s session in the polls rather than for both of the sessions combined. If you have additional feedback or questions after the webinar, or if we don’t get to your question during the Q&A, please send us an email at mbi-webinar at adobe.com. We will get questions that are sent there for about the next week. You’ll get a follow-up email in the next day or two with a link to a recording of this webinar and to the resources library where our other webinar recordings are already posted, as well as the email address that I just mentioned. The recordings of both of this week’s sessions should be posted in the resources library in the next couple of weeks. All right, so we’ve got a few questions here. Let’s see, the first one is for Shane. It says: watching someone write a query or walk through a simple example seems quite easy, but when a non-expert needs to solve a problem, it’s difficult to know where to begin and how to use various SQL features like group by, case statements, joins, and other functions. So how should a non-expert overcome this learning curve or hurdle of getting started with SQL? Yeah, I think the first step is to use the common table expression pattern that we discussed. It gives you a starting point. When you have a large, complex problem and you don’t know where to begin, you can really start almost anywhere.
Start with the most low-level data, pull it out into a CTE, and take a look at it. Then in your second CTE you can start to play around with that data, just like we did in the example. Another good, simple rule of thumb, this one for a novice who’s just starting out: the group bys can be a little bit confusing. That’s probably one of the first learning curves for a new person writing SQL. If you ever use one of the aggregate functions, sum or average or count, you’ll know that you have to have a group by; you’ll simply get an error message if you don’t include it, but eventually that’ll become very natural. The same way that if you’ve never made a pivot table in a spreadsheet, you’re going to have to do that a few times, and after two or three times it feels very natural. Great. All right, another question for Shane. Let’s see. Do you also track query history, like updates or changes, via the comment blocks that you mentioned your team uses? Yeah, we only use comments to track the updates. And sometimes it’s not necessary; if the query is very readable and not very complex, there’s no need to write that someone made an update. For example, maybe your database has an update, there’s a new status type or something like that, and you need to adjust for it. That note might actually be better in the body of the query, not at the top of the query trying to explain why you did it. So really it comes down to the diligence of the author and whether or not it’s necessary. We don’t use any other way to track updates. Thanks, good suggestion there. All right, so this next one I think is more for the MBI team. Are all MBI accounts hosted on Redshift? So most MBI accounts actually have their data warehouse hosted on PostgreSQL, but those that have a particularly large amount of data, like Pixels, can be hosted on Redshift. In many cases the query syntax is the same between these.
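The group-by rule of thumb mentioned above can be seen with a tiny query; this sketch uses the sales_order table from the demo, and the syntax runs the same on both PostgreSQL and Redshift:

```sql
-- select status, count(*) from sales_order;   -- error: status is not grouped
select
  status
  , count(*) as order_count
from sales_order
group by status;    -- every non-aggregated column must appear in the group by
```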
So the tips and examples that we saw from Shane over these couple of sessions would generally apply. Sometimes there are different names for some specific functions that do the same thing. Both Postgres and Redshift have pretty thorough documentation available online, so if you are trying to discover how to do something in one of them that you know how to do in the other, it should be fairly straightforward to find. I can also confirm that when I do research, if I’m looking at Stack Overflow or some other resource, I’m always looking for either PostgreSQL information or Redshift information, and both always work for me. Nice, yeah, I myself have also found that there’s a lot of overlap. There are a few functions that, like I said, have different names, but by and large they’re very similar. All right, so the next question is again for you, Shane. Since Shane mentioned query speed, I’m curious to know more about this. Does the performance of MBI compared to traditional databases really impact your workflow? Or is it rare to have expensive queries that are not practical for a traditional database? Yeah, you know, actually I have a good story about that. Just two days ago I wrote a really large query. We had a simpler query that gave some monthly data for the last six months, and a department came to me and said they’d like to see that data drilled down to hourly for each day of the week. They wanted to do some shift planning and understand fluctuations in certain volumes in our company so they can have more staff on certain shifts, or justify a night shift. And we have a lot of data, so for six months of data in our database, if I were to write that on our traditional SQL database, I would have to start writing that query with just two or three days of data, and it still might take 30 or 60 seconds to run, maybe five minutes, just because of the sheer volume of data we have each day.
So six months of data: totally impractical. I would never be able to solve this problem in pure SQL on our traditional database. But when I wrote this query in MBI, I used the CTE method that I always do, and it took about 300 seconds to run, so just enough time for me to have a coffee break and come back to my results. And it was exactly what the team needed; they were able to handle their problem immediately. So I think that is really powerful. Speed does matter, and the reason it matters is that you might think you don’t have a lot of SQL questions that would need that kind of performance, but you might not realize how many questions you could be answering if you had that ability. If you’re a fluent SQL writer, you feel comfortable with the CTE style, and you can write these really long, big, complex queries that answer interesting questions like that, you open more doors by having that kind of speed. And maybe there’s a critic out there thinking: you could write Python and extract the raw data in chunks. For example, you could get 10,000 rows at a time, and that way you’re not blowing up your traditional database trying to get that six months of data; maybe you’re pulling out two million rows. So you pull that out in Python, and we’ve done that before. But then you have to write a Python analysis. You’re probably using the pandas library, so you can put it into a data frame and use some of the nice built-in functions in that library. That can be really useful, but it’s time consuming, and it’ll take a few hours to solve the problem that way. Whereas the way I solved it, five minutes later I had the answer, and that department was able to move on and not create some big project out of this simple question. Well, not a simple question, but yeah. So the answer to that question, a bit long-winded, is yes: speed matters, and it makes a huge impact on our organization. Great, thanks.
That was a really interesting example and a nice general discussion of your workflow. Okay, so let’s see this next question, again more for the MBI side of things. I believe Deepak is connected again. Can we use MBI to do analysis using data from multiple data sources, such as data from e-commerce combined with web traffic data? Absolutely, yes. Using integrations in MBI, you can bring in data from multiple sources. If the source you want to bring data in from is not available as a pre-built integration in MBI, you can always use the Import API or CSV upload to get the data into MBI. All right, great. And Deepak, another one for you here. Are there any video tutorials explaining the entire process, from managing metadata to publishing reports to dashboards? We covered the reporting process in our Getting Started with MBI and Managing Data Sets with MBI webinars. Recordings of both of these are posted in the resources library. Yeah, and I’ll just jump onto that as well: as a reminder, you will get a follow-up email to this webinar with a link to that resources library, so you’ll be able to go check out those recordings. That email should arrive in the next day or so. All right. Mike, is that a call? Go ahead. Yeah. We have about three databases synced to Magento BI, but we don’t sync all the tables. And if I need a new table that’s in our traditional database and I’ve not yet synced it into Magento BI, it’s not available for me to query there. But it’s very easy for me to fix that: any person with administrative privileges in MBI can use the point-and-click Data Warehouse Manager. It’s a super simple process to pull in a new database table. And if it’s a huge table, and we have a few huge tables with billions of rows, we will write a support ticket to the Magento BI team and they will help us with that sync process in a way that doesn’t disrupt our nightly syncs with Magento BI.
So I’ve found that whole process, like we were talking about, the entire process of syncing, is really a breeze in Magento BI. And when it’s not a breeze, there’s an excellent support team. Great, thank you. All right. I think we have time for maybe one more question today, before we want to let people go with enough time to get to their next meeting, or maybe take a little break in between. There are many reports I could make, but nobody is requesting them, in MBI, I assume. Many other teams and departments don’t know what data is available, or they’re unfamiliar with making requests, but as a data worker I see data and opportunities in the database that really ought to be investigated and/or used in decision-making. How do we improve this situation? I recommend making a personal dashboard. Anyone in Magento BI can make their own dashboard, and they decide whether or not they want to make it visible to others. So make your personal dashboard a playground or sandbox, and then create those reports, if you think they seem valuable and not too time consuming, because you are the person working in the data; no one else is going to know to ask that question or to look at those insights. This is where you really have an opportunity to shine as a valuable contributor in your organization. Then, once that report is already living on your personal dashboard, you can reach out to stakeholders who might be interested in those insights and ask a few exploratory questions about that topic. See if they might already have some information about it. But if they don’t, I think it’s a good suggestion to ask them if they’d like to own those data reports, meaning you can move those to their departmental dashboards, or whichever dashboards they have. And there might be some occasions when you don’t want to use your time that way.
Maybe instead you should reach out to the stakeholder before using your time to investigate and create those reports and insights, because they might already have their eyes on that type of data through another reporting system that you’re not aware of, or you might learn some other reason that the data insight isn’t needed. So it’s that interdepartmental communication, to and from the data team, that I think is essential. That’s one of the most valuable aspects of a business intelligence or data science team, that communication. And a lot of times that communication leads to new questions that are valuable, and those discussions lead to action items. And because data can often be too complex for an email, for writing, I think sometimes it’s also better to schedule a 15- or 30-minute meeting with the stakeholder instead of writing one of those large emails trying to describe something ambiguous or complex. All right, great. Well, thank you again, Shane, for all those great insights. We are about out of time for today’s webinar. Once again, everybody, thank you so much for joining us today. We hope you got some really great information out of the session, and have a great rest of your day.

Useful resources

TIP
If you are new to Commerce Intelligence, the Getting Started video series provides a full overview for driving better business insights and results through data aggregation, analysis, and visualization.