Martech Conversations: Episode 7
Google Search Techniques for Marketers
Dorai Thodla: Okay. So what do you use today anyway, to collect, at least gather data?
Pravin Shekar: It continues to be Google, it continues to be search strings, put that together. A few databases that we have signed up for in terms of competitor information or client prospect information, and newsletters that we subscribed to, and tracking whatever is happening in the landscape with specific reference to the new tools or new services that are being offered. And we put them together. We don't have a database per se, it comes down to a Google XLS or an Evernote where everything comes in.
Dorai Thodla: So basically a text tool. Before we actually start talking about tools, let's explore this a little bit. Let's say I want to find a directory of companies in, you know, medical devices. So what I'm going to do is go to Google and say, “companies in medical devices”, or “list of companies in medical devices”, or “directories of companies in medical devices.” And then if I'm lucky, and I think in most cases you do get lucky, you hit upon a directory, which is actually a table. And this is exactly what you're looking for. So the first thing you can do is go to the page and extract the table. Instead of copying and pasting it as text, there are tools that go to a web page, extract a table, and give you a CSV file that you can load directly in Excel or Google Sheets or any of those things; Google Sheets if you're sharing it with somebody else and processing it. And one of the things you will start noticing is that many of these directories will have links for each of these companies, but the link will not go to the company; it will go to the directory's own internal page, which gives you a small summary of the company. Sometimes, if you're lucky, it will give you a link to the company's website. But in many of them, they just give you their own internal entry or page about the company. Of course, once you know the company name, you take the company name, go back to Google, and try to get the actual URL, the website address of the company. Have you ever tried to type the name of the company in a Google search and see what happens?
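The table-to-CSV step described above can be sketched in a few lines of Python. This is only an illustration using the standard library, not the tool Dorai refers to; the names TableExtractor and table_to_csv and the sample directory page are made up for the example, and real directory pages usually need more cleanup.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the rows of every <table> on a page as lists of cell text."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # row being built, or None when outside <tr>
        self._cell = None  # text pieces of the cell being built

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Hypothetical directory page; company names are placeholders.
SAMPLE_PAGE = """
<table>
  <tr><th>Company</th><th>Profile</th></tr>
  <tr><td>Example Devices Ltd</td><td><a href="/profile/1">View</a></td></tr>
  <tr><td>Sample Medical, Inc.</td><td><a href="/profile/2">View</a></td></tr>
</table>
"""

print(table_to_csv(SAMPLE_PAGE))
```

The resulting text can be saved with a .csv extension and opened in Excel or Google Sheets. Note that the profile links here point to the directory's internal pages, which is exactly the situation described above.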
Pravin Shekar: Yeah, most of the competitors and some clients and prospects, we do that.
Dorai Thodla: Yeah. So what do you expect when you type the name of company in Google?
Pravin Shekar: It's generally about the company, the website, the press releases, etc, that comes through.
Dorai Thodla: But that’s not what comes up in the first results.
Pravin Shekar: No, no, it will be a bunch of ads and other leakages.
Dorai Thodla: Besides the ads, even if you don't have the ads, there's a lot of, like, for example, you type the name of a college, and there are lots of others who went and did the SEO. So there'll be other people who are actually using that name. Like if you type the name of a restaurant, go and type, for example, Sangeeta or whatever, the first entry you see...
Pravin Shekar: Other links. Yes, I get it.
Dorai Thodla: So before you actually find the name of the company or the actual company,
Pravin Shekar: Which may not be the first page.
Dorai Thodla: Yeah, it may not even be on the first page in some cases. So that is one thing. These are normal tools, right? You have a search engine that gives you this kind of stuff, and then you gather the results. Then you go to the company website. So there's a bunch of pages. What are you going to do? They're all text. Okay, so what do you do then? Let's say your idea is to find 10-15 competitors and alternatives to some product. The product alternative, or product competitor, is not necessarily the company alternative or company competitor, even though there is an overlap. So what you start doing is, you find another website, you go to the website, then what do you do? You start writing down, you know, the product name, you copy and paste the URL, bookmark it and all that sort of stuff. But you finally end up everywhere with text and images, text and images. At that point most people stop. They look at the text and say, “Okay, fine, we'll copy and paste the text.” Like you said, the very fact that you said Evernote sent me a signal that that's exactly what you do. You put a Web Clipper or one of those Evernote extensions on the browser and say, “Hey, copy this, dude.” Or “Give me a summary of this page.” And Evernote has all these different [tools]. For those people who have not used Evernote, it has, I think, three or four formats: it can bookmark a page, it can give you an abstract of that page, or it can just copy all the text in that page.
Pravin Shekar: Right, right.
Dorai Thodla: And then it does something else. So now, instead of being on a webpage, you're stuck inside Evernote. The web was easy to read and write. Evernote is not that easy to read and write. So what happens is, you keep getting locked up in these little products. And that's good because you have the content and you can read it, but it's only human readable. Evernote fortunately has a capability for you to write your own plugin, which can go and do something. But now you are getting into programming. At that level, the person who writes...
Pravin Shekar: I don't want to get into it.
Dorai Thodla: Which is proper, you know, which is fine. So there are a few tools we follow, and we run through the same stuff ourselves. We wanted to track all the companies in a particular space. I will go and search for “association of competitive intelligence professionals”. Then I will get a list, and I want to track all these web pages to see whether any of them change, using InfoMinder. InfoMinder is a product where we track a page every day and see whether there is any difference. And there are other products like that. There is something called change detect, and a lot of website and webpage monitoring tools. So we needed to extract all the links from those pages. So do you write a browser extension? I'm not asking you to write one, Pravin, I know what your answer will be, but there are a lot of products that you can put in as a plugin in Chrome that will just extract all the links and give them to you. And then you can save the links. That is one. The second thing is, you remember, in the beginning when you were talking you said, “I want to go to a certain space, I want to know the vocabulary of the space,”
Pravin Shekar: Right! Yes, of course.
Dorai Thodla: And how do you get the vocabulary? You get the keywords that are on the homepage and some of the key pages, like the product page and things like that. There are keyword tools available that, you know, come as a plugin, so I can press a button and it will run through the whole page, extract all the keywords, and give them to you. Then you can keep accumulating these keywords. Now you have a bunch of keywords. And of course, you can sort them, you know, count frequency, remove duplicates. That can all be done because these keywords can be exported to a spreadsheet, and a lot of people are very comfortable doing this with a spreadsheet. But the story is that you end up with links. You end up with text. You end up with images, which you can't do much with, so sometimes you throw them away and sometimes you keep them. You end up with tables, which can be converted into spreadsheets, which is again text and links. Right. And the process of getting this is sometimes very simple, because you say “directory” and you get companies. But in the case of Google, if you set your page size to 25 results, you will get the first 25 companies, then take the next 25 companies. I always set it to 100, because it's a very simple setting in Google. Give me the first 100 results. I'll scroll through once and say, “That's it.” If I want, I go to the next 100, but most of the time I won't. And you can also search through that list. The thing that would be really nice, and that we don't have, is: can I take these results and search inside those results? Not even the same level of sophistication, just a simple find kind of tool, like, for example, in a conventional browser you press Ctrl+F, which means find in this page, and I can type some letters, and it will just do that.
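As a rough illustration of the "links and keywords" idea above, the sketch below pulls the links out of a page and counts word frequencies in its visible text. It is a deliberately naive stand-in for the browser plugins mentioned, not a description of any of them: the stopword list, the names LinkAndTextCollector and links_and_keywords, and the sample page are all assumptions made for the example, and plain word counting is much cruder than a real keyword extractor.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tiny illustrative stopword list; a real tool would use a much longer one.
STOPWORDS = {"a", "an", "and", "the", "for", "of", "to", "in", "on",
             "with", "our", "is", "are", "your"}


class LinkAndTextCollector(HTMLParser):
    """Gather absolute link URLs and visible text from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text = []
        self._skip = 0  # depth inside <script>/<style>, whose text is ignored

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.text.append(data)


def links_and_keywords(html, base_url, top=5):
    """Return (unique links in page order, most frequent non-stopwords)."""
    collector = LinkAndTextCollector(base_url)
    collector.feed(html)
    collector.close()
    links = list(dict.fromkeys(collector.links))  # remove duplicates, keep order
    words = re.findall(r"[a-z][a-z'-]+", " ".join(collector.text).lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return links, counts.most_common(top)


# Hypothetical page and base URL, for illustration only.
SAMPLE = """
<html><body>
<h1>Marketing analytics for SaaS teams</h1>
<p>Our analytics platform tracks marketing campaigns.</p>
<a href="/products">Products</a> <a href="/products">Products again</a>
<a href="https://blog.example.com/">Blog</a>
</body></html>
"""

links, keywords = links_and_keywords(SAMPLE, "https://www.example.com/", top=3)
print(links)
print(keywords)
```

Running this over the homepage and a few product pages of each company, and accumulating the counts, gives a first cut of the vocabulary of a space that can then be exported to a spreadsheet.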
But the problem with the results page is that it is just snippets from Google and links, and I want to go to the next level: get all the links, go to the text behind them, and then search. You don't have tools that do that kind of stuff. So these are the kinds of tools that can actually be built. In fact, I have one that I can show you. Google fortunately provides what is called an API, and what we work with is the Custom Search Engine kind of stuff. It is limited to 100 results, and I normally don't go beyond 100. I would rather take those, then modify the search and get 100 more, which get closer and closer to what I'm looking for. So for example, let's say I'm doing some research on cloud, you know, cloud providers, sorry, let's say SaaS companies. So I'll type “Saas companies in marketing”, somebody who delivers, you know, software as a service for marketing, marketing tools and all that stuff. Most of them are like that: HubSpot and, you know, many of these companies like SEOmoz, and all these kinds of things.
Pravin Shekar: Right.
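A minimal sketch of how such a search might be paged through the Google Custom Search JSON API. It only builds the request URLs and sends nothing over the network. It assumes the API's documented limits (at most 10 results per request, and no more than 100 results per query, fetched by stepping the `start` parameter); the API key and engine ID are placeholders you would have to supply, and the function names are made up for this example. Whether the `site:` operator behaves as it does on google.com depends on how the search engine is configured, so treat that part as an assumption.

```python
from urllib.parse import urlencode, urlparse, parse_qs

API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"
PAGE_SIZE = 10     # the API returns at most 10 results per request
MAX_RESULTS = 100  # and at most 100 results per query


def build_query(terms, site=None):
    """Join search terms, optionally restricting them to one site."""
    query = " ".join(terms)
    if site:
        query = f"site:{site} {query}"
    return query


def request_urls(query, api_key, engine_id, wanted=MAX_RESULTS):
    """Return the request URLs needed to fetch up to `wanted` results."""
    wanted = min(wanted, MAX_RESULTS)
    urls = []
    for start in range(1, wanted + 1, PAGE_SIZE):
        params = {
            "key": api_key,     # placeholder credential
            "cx": engine_id,    # placeholder search engine ID
            "q": query,
            "num": min(PAGE_SIZE, wanted - start + 1),
            "start": start,     # 1-based index of the first result
        }
        urls.append(f"{API_ENDPOINT}?{urlencode(params)}")
    return urls


urls = request_urls(build_query(["SaaS", "marketing", "companies"]),
                    api_key="YOUR_API_KEY", engine_id="YOUR_ENGINE_ID")
print(len(urls), "requests needed for the first 100 results")
print(parse_qs(urlparse(urls[-1]).query)["start"])
```

Refining the search then means calling build_query again with changed terms, for example adding "SEO", and collecting the next batch of up to 100 results into the same spreadsheet.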
Dorai Thodla: So I can go and type “Saas+marketing+companies” and I'll get the Google results. Then I suddenly find out that, oh, the marketing tools themselves include tools for SEO, tools for backlinks. So I now expand the search. I grab these 100, then I go and say, “SEO+Saas+...”, you know, SEO is unique in itself, but you may want to expand it a little bit further. So what you do is modify the search, refine the search, and you keep getting 100 more each time. For example, searching for “top 100 companies” will give you the top lists, like a list of the top 100 companies in your space. So it's a lot more of the Google Search magic that you start doing. One of the skills that your team has to have is knowing all the different ways in which you can search. Yeah, sometimes I click on a link and go to the page. But suppose you use an automated tool, saying, okay, I want to go and grab the top-level pages of a website and the company blogs. Often you can't do that, because they look at you and say, “Hey, who's this guy? I don't want him coming in.” Most people come and steal content; there are a lot of smart tools that grab it. But of course the sites want Google to index the pages so that they come up in the search rankings. So they don't block Google. So use that technique, because they don't block Google but they block all the unknown crawlers and things like that. Google has this thing called Site Search, which is “site:” followed by the URL, and then you give a search string, and it searches within that site and pulls out all the matching pages within that site. So Google has a lot of levels of sophistication. It's a question of learning that, and then you can take that string and, you know, save the Google search as a stored search. The other thing you can do is, let's say you want to not only get those pages, but you also want to track them when they change.
So, monitor them when they change. What you can do is take this and set an alert, saying, whenever this page changes, whenever any of these links change, let me know. There are a lot of page monitoring services that do that. So at the first level, the data gathering level, I have links, I have text. After that, people just take off. For example, I read an item in the New York Times or The Hindu or the Financial Express about various companies in a particular industry that are going to benefit from this COVID vaccine distribution or production or research or whatever. And now I want to go and research each one of these companies. What do people do? They take a piece of paper, or the modern equivalent of paper, which is a spreadsheet, and type in the names of all those companies. Fortunately, now we have something called natural language processing. I mean, there is nothing natural about any of these languages; it just means human language. And I'll speak only about English, because I haven't tried these tools in other languages and don't know their level of sophistication there. So, taking English as the common denominator, what natural language processing does is take the text and break it into paragraphs, which is fairly easy. Then it breaks each paragraph into sentences, which is also easy, because each sentence ends with a period, an exclamation mark, or sometimes a question mark. These are the end-of-sentence punctuation marks. So it pulls out the sentences, then it parses each sentence, and there is something called POS tagging, which is part-of-speech tagging.
Pravin Shekar: Right?
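The first two steps described above, breaking text into paragraphs and then into sentences on end-of-sentence punctuation, can be sketched with regular expressions. This is a naive illustration, assuming paragraphs are separated by blank lines; abbreviations such as "Inc." or "Dr." will be split wrongly, which is one reason real NLP toolkits use more careful sentence splitters. The function names and the sample text are made up for the example.

```python
import re


def split_paragraphs(text):
    """Split text into paragraphs on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(paragraph):
    """Split a paragraph after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [s for s in parts if s]


# Made-up text, for illustration only.
SAMPLE_TEXT = (
    "Vaccine makers are scaling up. Who benefits first? Cold-chain firms do!\n"
    "\n"
    "Logistics companies are hiring."
)

for paragraph in split_paragraphs(SAMPLE_TEXT):
    for sentence in split_sentences(paragraph):
        print(sentence)
```

Part-of-speech tagging and entity extraction would then run on each sentence; those steps need a trained model and are not attempted here.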
Dorai Thodla: So they give you verbs and nouns, basically. Of course, they also give you the adjectives, the determiners, and a whole bunch of other stuff. But basically, for me, verbs are things that you do, and nouns are the things I'm interested in pulling out. When you start looking at nouns, you start getting the names of companies, names of cities, names of places. Entity extractors go one step further. An entity extractor underneath uses something very similar to parts of speech tagging, but instead of just pulling out the names, it also tags and categorises them. This is where machine learning kicks in. This technology used to be different before; they used linguistic models and that kind of stuff, and the entity extractors were not that great. But today, they are extremely good. So I run this thing through an entity extractor, and boom, I can get the names of companies and events. If there is a conference, I get the conference, all that kind of stuff. So now I've got all these entities, which is great. I get the names of companies, which is basically what I'm looking for. And it's kind of a recursive thing. You do a Google search, get a directory, get a list of companies. If you have a ready-made list of companies, your job is pretty much done. But say you have an article that talks about 10 different companies, a top list, say, where the company names are not sitting in a table; they're hidden in subsections. So you take the entire text and feed it to the NLP engine in our tool, which does entity extraction and gives you the names of the companies.
And then, of course, you take the name of the company and feed it back to your company-to-URL mapper, you go to the company's website, and you extract the pages of the company and the references for the company. So now, you start doing this entity extraction.
Pravin Shekar: Now, again, we've covered the gathering, and we've covered the analysis part of it. But how do I draw the inferences really fast? Because as a typical listener, or viewer to this conversation, myself included Dorai, I want to eliminate the manual tasks, because all these that you have mentioned, we are still doing traditionally, manually, one by one. Now, from information, we've got some sort of intelligence, how do I glean something actionable out of it?
Dorai Thodla: Yeah. So you don't have to do it manually today. There are these tools called robotic process automation (RPA) tools. And they are process automation, right? You do steps 1, 2, 3, 4 manually: I'm going to go to this page, then find the company, then go to its page, then do another Google search across sites, that kind of thing. Robotic process automation does those steps for you, and there are a whole bunch of companies with reasonably good products. The top one, I think, is UiPath. We'll probably come up with a list tomorrow. But you can also go and search for “RPA companies”. Whatever method we talked about for finding companies, you can use to find the RPA companies. They started with very routine, boring things. For example, one of the most common uses of RPA I've noticed: I went to one of these KPOs, knowledge process outsourcing companies. Their customer is, let's say, Walmart, and they want to find all similar products and pricing on Amazon. So they take a Walmart product, take the name of the product, and if it's a branded product the name will be the same, and they keep running Amazon searches and keep finding the pricing. They keep extracting all the details, including maybe the product images and that kind of stuff. Then you start building a database of this and deliver it to customers. That's basically what they do. And there are a lot of nuances. For example, take AAA batteries: Walmart will figure out that Amazon sells a pack of four, a pack of twelve, you know, whatever, and these guys will go and say, “We'll sell a pack of six,” which is a SKU in between those SKUs. So these tools were very dumb. I mean, they're not dumb, they're smarter than, you know, what we do manually, but what they do is follow a routine, and then they go and handle it.
The next generation of them added some scripting. So what happens if one of these steps fails? What is a graceful recovery? So they added scripting. Languages like Python are used for these kinds of things. In fact, there is an entire book, Automate the Boring Stuff with Python. I'll show you the cover of the book. It's pretty neat; it shows how to do all these things. You can do many of these with PHP, Ruby, a whole bunch of programming languages, but what they essentially do is use the web, you know, APIs, to go and get this information; those are the crawlers. Then there are scrapers, which scrape the information. Scraping can get very complex because there is no one single way of scraping information, because no two people maintain information in the same way. So slowly they're getting smarter; semantic scraping is coming up. People are saying, okay, if I want to find an address, the way I'm going to detect it is: if it's India, I look for a 6-digit PIN code; [in the] US, it's a 5-digit ZIP code, or 9 digits with the extension, even though I've never seen people use 9 digits. And normally it is preceded by the state, which is two characters, one of 50 combinations. Which means that what comes before that is the city. So there are smarter ones that do address extraction, for the locations of a company. They use rules-based logic. So there is a whole bunch of interesting technologies doing this.
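The address rules just described can be written down as a small rules-based sketch: a 6-digit Indian PIN code, or a US ZIP code (5 digits, optionally followed by a hyphen and 4 more) preceded by a two-letter state code, with the city taken as the words just before the state. The patterns, the function name find_postal_codes and the sample addresses are all assumptions made for illustration; real address extractors use many more rules, and a bare 6-digit number will be reported as a PIN even when it is something else.

```python
import re

# Two-letter codes for the 50 US states.
US_STATES = set("""
AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD
MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC
SD TN TX UT VT VA WA WV WI WY
""".split())

# City (capitalised words), comma, state code, then ZIP or ZIP+4.
US_PATTERN = re.compile(
    r"(?P<city>[A-Z][a-z]+(?: [A-Z][a-z]+)*),\s*"
    r"(?P<state>[A-Z]{2})\s+"
    r"(?P<zip>\d{5}(?:-\d{4})?)\b"
)

# Indian PIN codes are six digits and do not start with zero.
INDIA_PIN = re.compile(r"\b(?P<pin>[1-9]\d{5})\b")


def find_postal_codes(text):
    """Return a list of dicts describing US and Indian postal codes found."""
    found = []
    for m in US_PATTERN.finditer(text):
        if m.group("state") in US_STATES:  # rule: state must be a real code
            found.append({"country": "US", "city": m.group("city"),
                          "state": m.group("state"), "code": m.group("zip")})
    for m in INDIA_PIN.finditer(text):
        found.append({"country": "IN", "code": m.group("pin")})
    return found


# Made-up contact text, for illustration only.
SAMPLE = ("Visit us at 500 Example Road, Springfield, IL 62704 "
          "or write to 12 Sample Street, Chennai 600001.")

for hit in find_postal_codes(SAMPLE):
    print(hit)
```

Checking the state against the list of 50 codes is what keeps two random capital letters followed by five digits from being treated as an address.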
Pravin Shekar: Beautiful, Dorai. I've got a whole lot of things listed down as my learning for today. Starting with: I need to get one of my folks trained as a Google search guru, to get that part going. Text extractors; don't get locked up in various products; how to find the vocabulary of a space; searching inside search results, which is something we would need; POS tagging; entity extractors; RPA tools, for example UiPath. There's enough and more that you have given me to go ahead, read and prepare. And my request to you is, when we connect for the next episode, it would be great, Dorai, if you could walk us through a few products, including InfoMinder, that your company iMorph has developed. We would love to have a look at that tool as well.
Dorai Thodla: Okay. We also built some small tools like these. We built an automatic search using the Google Custom Search Engine API, so that, given a search, it will run it and save the results into a spreadsheet, in CSV format, which you can use. We have built a keyword extractor tool that, given a page, will automatically extract all the keywords in it, which is much harder than you think. Actually, we use a Microsoft Azure service for doing this. There is an entity extractor. So I'll demonstrate a few tools. These are not products per se, but simple tools we built so that, you know, we can do all these kinds of things fairly easily.
Pravin Shekar: I’d love to know more and I'd love to see how I can use more.
Dorai Thodla: And I have not answered your inference question, so we'll talk about it. I think it comes down to the same keyword that I used. I thought I would get to it today, but I don't think we can: that is the knowledge graph.
Pravin Shekar: Yes. So we will cover the tools, inferencing and the Knowledge Graph. Lovely Dorai. Thank you very much.
Dorai Thodla: Thanks, Pravin. Take care.