Sariel Moshe 25 min

A Knowledge Copilot for your Entire Company using Precision RAG


Your knowledge base is more than a repository; it should be a co-pilot for your entire organization as well as your customers. This session will disrupt the way you think about knowledge management, introducing technology and guardrails that ensure the right information is available when and where it’s needed most. We’ll prove that empowering your team with an implicit knowledge engine can lead to transformative business outcomes.



0:00

>> Hi everyone. I'm up next after Corinne's great presentation. I'm a co-founder of XFind, recently acquired by SupportLogic, and now our engine is the basis for SupportLogic's latest addition, which is ResolveX. What I'd like to do in the coming presentation is share, first of all, the SupportLogic vision for what an enterprise knowledge co-pilot can look like, or will look like; how that plays into our technology; how it compares to other solutions in the market; and how it can really improve on existing knowledge management processes and turn many overlooked resources you already have in the enterprise into valuable real-time knowledge. And there we go.

1:09

As Corinne noted, ResolveX is targeting the answer seeker in the enterprise. And this could really be multiple different people in the workforce: it can range from a customer wanting validated answers to their issues through to many different people on the team. And when we go specifically into support, really everyone, customers and agents and engineers alike, wants a great knowledge co-pilot. This is very obvious in a lot of research that's been done. Engineers, what they want is a secure, precise knowledge co-pilot embedded into their workflow. We'll get into what that means exactly, but what Corinne and Karthika just showed gives you an idea. And customers also want a similar ability to help themselves, to serve themselves. This, again, has been a recurring theme in a lot of research done in the past few years, and AI really brings it to a new level of expectations for what such a co-pilot can look like.

But when we look very specifically at B2B support, what defines a good co-pilot is mainly three things. So A, it has to be in the user's flow. Whether it's customers, when they're coming in with an issue, or agents, when they're dealing with that issue, it has to be right there in front of them and help them as they're working on the issue at hand. B, it has to be adapted to really different types of issues and queries. It's not just keyword searches, as we've been used to for so many years; you want to be able to deal with complex types of issues, with complex types of cases, as queries as well. And C, and probably most importantly, you want to be able to make use of those internal sources you have, which are often very messy. Past cases, Slack, Jira, all those data sources, you already have them in place. Many times they're overlooked; they're not really being used to help your customers or your engineers solve the issues at hand. And that's what we really want to have and enable in such a co-pilot.

3:36

So if everyone wants this, and we already have this data in place, the big question is: why isn't everyone building it for themselves? AI is out there. AI has been available for two years now, we've all seen ChatGPT and that big revolution. Why isn't everyone going ahead and building it for themselves? The answer really breaks down into two sides: the data side of things and the tech side.

On the data side, and again, when we're talking specifically about B2B support, this is really true: it's complex data sources. As I noted, a lot of the time most of the data you want to have in place for such a co-pilot does not reside in a nicely curated knowledge base. Usually a nicely curated knowledge base covers maybe 20, maybe 50% of the issues you're actually trying to solve with a co-pilot, while the rest is really sitting in those past cases, in your Slack or Teams, in Jira, and you want to be able to deal with that kind of data. And that data doesn't play nicely with generic search tools, with generic AI tools, and it requires quite a lot of massaging. We'll talk about what that looks like as well. The second issue is the queries. As I noted, we're not talking about keyword searches here; we're talking about actual issues coming in with logs and descriptions and a lot of different technical information, and you want to be able to deal with those types of queries, not just the simple ones.

So that's the data side of things. On the tech side of things: you want to be able to integrate with these dynamic sources, and that's not a fire-and-forget type of integration. You really need to continuously work on that integration and make sure you're pulling the data correctly and doing the work correctly. Another element is workflow adaptation. You want this to be part of your engineers' or customers' workflow, so you want to be building the correct user experience and UI, and that requires quite a lot of work. It's not just something that you can do offhand. Another two important points are cost and upkeep. Upkeep, again, because these are complex types of workflows and complex types of data sources, that's there. And cost: with large language models, the costs there are going down over time, but still, when we're talking about tens of thousands or hundreds of thousands of queries yearly, the costs are there, and you want to be able to deal with that cost efficiently. That requires quite a lot of work and quite a lot of technology in place to be able to do that.
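Just to make that cost point concrete, here is a back-of-envelope sketch of how yearly spend scales with context size; the per-token prices, token counts, and query volumes below are illustrative assumptions, not figures from the talk.

```python
# Back-of-envelope only: how yearly LLM spend scales with context size.
# Prices, token counts, and query volumes are hypothetical, not the talk's figures.
def yearly_llm_cost(queries_per_year: int,
                    context_tokens: int,
                    answer_tokens: int = 300,
                    price_in_per_1k: float = 0.01,
                    price_out_per_1k: float = 0.03) -> float:
    """Rough yearly spend (in dollars) for a RAG answering workload."""
    per_query = (context_tokens / 1000) * price_in_per_1k \
              + (answer_tokens / 1000) * price_out_per_1k
    return queries_per_year * per_query

# Whole articles (~4,000 context tokens) vs. selected passages (~800 tokens)
# at 100,000 queries per year:
print(yearly_llm_cost(100_000, 4_000))   # ~4900 under these made-up prices
print(yearly_llm_cost(100_000, 800))     # ~1700
```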

6:36

So really, the point is that many legacy search solutions out there, or do-it-yourself RAG solutions where the embeddings are taken off the internet, just as-is internet-trained embeddings, really aren't going to be able to deal with the use cases we're talking about in B2B support. If we break it down into the actual use cases: what common RAG, or vanilla RAG, as I like to call it, can enable you to do is work with how-to use cases, how-to chatbots, right? When your customer comes in with very simple issues, and you have a nicely curated knowledge base with maybe a few hundred articles explaining how to solve different issues, FAQs and that kind of stuff, yeah, that can do the work. For that, you don't really need us; you don't really need ResolveX. But what we're talking about is break-fix issues. When we're talking about the more complex types of issues that you have in the more complex types of companies and products, when we're talking about thousands of barely curated articles, or when you want to use your past cases to help solve those issues, that's when you really need precision RAG. The generic solutions won't really be helping you in those types of cases.

8:03

So let's get a bit into the technology itself and explain what differentiates us and what we're doing from many of those generic solutions. It starts really with the knowledge sources. Again, there are many different types of knowledge sources, from those knowledge bases all the way over to Slack channels, and after pulling them in, the first part of making this work is data cleaning and parsing. You want to be able to take that data and have the models in place to massage it into something that can really be worked with by the models downstream: cleaning it up, removing irrelevant information, and making sure, obviously, that no PII, no personal information, exists there. Only then, when it's really ready, can you take it over to the modeling and language processing phase.
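The talk doesn't show that cleaning step itself; purely as a minimal sketch of the kind of cleanup and PII scrubbing being described, it might look like the following. Real pipelines would use trained models; the regexes and the redaction tokens here are only illustrative placeholders.

```python
import re

# Minimal sketch of a cleaning/PII-scrubbing step applied to raw records
# (cases, Slack messages, etc.) before indexing. Illustrative only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
SIGNATURE = re.compile(r"(?im)^(thanks|best regards|sent from my).*$")

def clean_record(text: str) -> str:
    """Strip signature boilerplate and redact obvious PII before indexing."""
    text = SIGNATURE.sub("", text)             # drop e-mail signature lines
    text = EMAIL.sub("[EMAIL]", text)          # redact e-mail addresses
    text = PHONE.sub("[PHONE]", text)          # redact phone-like numbers
    return re.sub(r"\s+", " ", text).strip()   # collapse leftover whitespace

print(clean_record("Contact me at jane.doe@example.com or +1 (555) 123-4567.\nThanks, Jane"))
```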

9:06

Now, another big differentiator is how we use language models in our engine. Instead of using just one approach, which is often a lexical approach from older types of search engines, the keyword-based approach, or just a semantic approach that uses internet-trained embeddings like we see in large language models, what we do is combine a lot of different approaches. Lexical, semantic, local semantic, a lot of these different models go into a stack, and what that gives you is a lot of different angles for understanding what is going on in the data, both on the query side and on the result side. That enables us to achieve very nice accuracy even when our query, as shown earlier, is an entire case or a summary of a case. A summary of a case, for our engine, is a valid query, just as a keyword query is. And we want to be able to deal with that quickly. We don't want to require months of training to achieve good results; we want to be able to pull the data and go. So that's another important differentiator.
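The exact model stack is proprietary and isn't spelled out in the talk; purely as a generic illustration of blending a lexical signal with a semantic one, a minimal hybrid-retrieval sketch could look like this. The library choice, embedding model, and weighting are assumptions, not the production engine.

```python
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

# Generic hybrid (lexical + semantic) retrieval sketch over a toy corpus.
docs = [
    "Reset the admin password from the console with the auth reset command.",
    "The Salesforce widget requires API access and a connected app.",
    "Elevate surfaces escalation signals inside the agent workspace.",
]
bm25 = BM25Okapi([d.lower().split() for d in docs])       # lexical index
model = SentenceTransformer("all-MiniLM-L6-v2")           # semantic encoder
doc_vecs = model.encode(docs, convert_to_tensor=True)

def hybrid_search(query: str, alpha: float = 0.5):
    lex = bm25.get_scores(query.lower().split())           # keyword signal
    sem = util.cos_sim(model.encode(query, convert_to_tensor=True), doc_vecs)[0]
    lex_max = max(lex.max(), 1e-9)                          # normalize lexical scores
    scores = [alpha * (l / lex_max) + (1 - alpha) * float(s)
              for l, s in zip(lex, sem)]                    # blend the two angles
    return sorted(zip(scores, docs), reverse=True)

for score, doc in hybrid_search("how do I add the widget to Salesforce"):
    print(f"{score:.2f}  {doc}")
```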

10:26

And then come the guardrails. Guardrails are really a critical part. Again, we all know hallucinations are a big issue with large language models and with AI in general, and we want to avoid that at all costs. How do we do that? We have two types of guardrails in place. The first is in-domain: we want to understand whether the query we're getting, whether it's from a customer or from an engineer, is actually in the domain that's relevant to the data we have in place. Say, for example, a user comes in and asks the question, "How do I walk my dog?" We don't want to be answering that question with data from a cybersecurity company, because it's obviously just going to hallucinate and provide irrelevant information. So we want to detect that first, and that's one proprietary technology we've been developing for quite a while now: how to detect that. The second, after the in-domain element, is: even if the query is in the correct domain, do we have information in place to actually answer the question? And again, if we do not, we want to avoid answering it. We want to be able to say we don't have any information on this topic, go ahead and maybe create an article or ask experts, but we don't want to just make up stuff that doesn't exist.
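How those detectors work isn't disclosed in the talk; as a loose sketch of the two checks just described, one could gate generation on simple similarity thresholds along these lines. The embedding model, thresholds, domain description, and the generate_answer stub are all made-up placeholders.

```python
from sentence_transformers import SentenceTransformer, util

# Loose sketch of the two guardrails described: (1) is the query in-domain,
# (2) do we hold evidence that can answer it. Thresholds are illustrative.
model = SentenceTransformer("all-MiniLM-L6-v2")
domain_centroid = model.encode(
    "B2B support issues for a cybersecurity product", convert_to_tensor=True)

def generate_answer(query: str, passages: list[str]) -> str:
    return "LLM answer grounded in: " + " | ".join(passages)   # placeholder LLM call

def guarded_answer(query: str, retrieved: list[tuple[float, str]]) -> str:
    # Guardrail 1: out-of-domain queries are refused, not answered.
    q_vec = model.encode(query, convert_to_tensor=True)
    if util.cos_sim(q_vec, domain_centroid).item() < 0.25:
        return "This question looks out of scope for this knowledge base."
    # Guardrail 2: in-domain, but do we actually have supporting content?
    if not retrieved or retrieved[0][0] < 0.5:
        return ("We don't have information on this topic yet; consider "
                "creating an article or asking an expert.")
    # Only now is the question handed to the LLM with the top passages.
    return generate_answer(query, [p for _, p in retrieved[:3]])
```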

11:53

Last, and it doesn't appear here, but another very important element, and this has to do with the cost I noted earlier, is the whole process of passage retrieval. Right before we provide the prompt and the context, if you're all familiar with the RAG setup: we get the context, which is the relevant items, we have the query, and then we provide that as a prompt to a large language model, please answer this question with this context. Now, if we're just handing it a three- or five-page-long article as the context, that's not precise enough to really find the relevant information. That's A. But B, it also has a lot to do with cost: the more tokens, the more information you're feeding to the large language model, the higher the cost. And that's really an avoidable part. If we're able to find the specific passages that will power the answer, it's much better both in precision and in cost. So that's the whole process we've built. For each data source, for each customer that we work with, this is the entire process we run on their data, and it's built in a way that really integrates into many different use cases and workflows.
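As a minimal sketch of that passage-level step, selecting only the passages that match the query before assembling the prompt could look roughly like this. The chunking by blank lines, the embedding model, and the prompt wording are assumptions, not the production pipeline.

```python
from sentence_transformers import SentenceTransformer, util

# Minimal sketch of passage retrieval before prompting: keep only the passages
# of an article that match the query, instead of pasting the whole article.
model = SentenceTransformer("all-MiniLM-L6-v2")

def top_passages(query: str, article: str, k: int = 3) -> list[str]:
    passages = [p.strip() for p in article.split("\n\n") if p.strip()]
    q_vec = model.encode(query, convert_to_tensor=True)
    p_vecs = model.encode(passages, convert_to_tensor=True)
    scores = util.cos_sim(q_vec, p_vecs)[0]                  # one score per passage
    ranked = sorted(zip(scores.tolist(), passages), reverse=True)
    return [p for _, p in ranked[:k]]

def build_prompt(query: str, article: str) -> str:
    context = "\n---\n".join(top_passages(query, article))
    return ("Answer the question using only the context below.\n"
            "If the context is insufficient, say you don't know.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")
```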

13:15

Just to maybe drive home another point on the technology side, this is a benchmark we ran a few months ago, taking XFind, the engine (it was still XFind back then), and comparing it to, again, the vanilla or generic RAG approach, which is taking embeddings, say OpenAI's Ada embeddings, off the internet, and comparing the precision between them given a few different sets of data. What we did here is use four different types of data: one is Ember, they sell coffee mugs; Alab, they sell some office hardware; 8x8; and Waters. What we see very nicely is that the more complex and domain-specific the data we're dealing with, the better XFind compares to internet embeddings, to Ada in this case. And when we get to Waters, who sell lab hardware, there the difference is already almost a quarter. So one in four questions that XFind, or ResolveX today, will answer correctly versus a generic embedding off the internet. And one in four is a big number when we're talking about getting answers correct in a customer service, customer support type of scenario, right? We want to make sure we're getting those answers correct. 25%, that's a big difference.
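The benchmark methodology isn't detailed beyond this; for reference, the kind of top-answer precision gap being described is usually computed along these lines. This is a generic sketch of the metric, not the actual benchmark code or data.

```python
# Generic precision@1 over a labeled set of (query, correct_item_id) pairs.
# Only illustrates the metric; the talk's engines and datasets are not reproduced.
def precision_at_1(engine, labeled_queries) -> float:
    """engine(query) returns a ranked list of item ids, best first."""
    hits = 0
    for query, correct_id in labeled_queries:
        ranked = engine(query)                       # ranked item ids, best first
        hits += bool(ranked) and ranked[0] == correct_id
    return hits / len(labeled_queries)

# A gap of 0.25 between two engines means one extra correctly answered
# question out of every four, e.g.:
#   precision_at_1(precision_rag, data) - precision_at_1(vanilla_rag, data)
```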

14:54

So what I'd like to do now, since I've talked quite a bit, is show a few live examples. What I'm going to be demoing is two scenarios, two use cases. The first is a support portal scenario, much like you're all familiar with: you want your customer to come in, ask questions, and get answers. The second is more of a Slack use case; we've connected the same engine in a Slack type of scenario, where we want internal workers, engineers, to ask questions and get relevant answers. The portal is actually demoing on SupportLogic data: in the portal it's our Freshdesk knowledge base, and in Slack it's using our internal Guru and Confluence knowledge bases as well. And what I want to show is a few edge cases that will, I think, really bring home the difference between a common RAG approach and what precision RAG can really get you.

16:15

There we go. I don't know if you're seeing it yet... there we go. So as I noted, starting with the portal side of things. On purpose, I wrote a query here that is, A, a bit long, and B, has quite a few spelling mistakes. A lot of the solutions that exist today are really built, as I noted, for keyword searches; they're not built for a query of this length. And the ability here is, A, to retrieve the relevant items, as you can see in the center: those are the knowledge items relevant to the given query. And B, the other element, on the large language model side, which is on the right-hand side where the answer is being generated, is to take the specific passages from the top items and mesh them into a very nice answer to the given question. Just to read through it: the customer comes in with the question, which features does Core include? Does it include voice? And there are a few spelling mistakes, as you can see. But by providing the relevant items, even given the spelling mistakes, and being able to retrieve the right passages, we can get a very nice answer to the customer's question.

Now, another example. Elevate, right, is both a verb in English and a SupportLogic solution, and you want to be very precise in understanding when it's serving as a verb versus when it's serving as the name of a product. So when I ask it how to elevate the solution, which is a very general question, it goes ahead and answers to the point, with elevate as a verb: how do I improve the solution? But when I ask it how to add Elevate to the solution, just by adding that word "add", it understands that we're talking about Elevate the product and speaks to that, to how to utilize the Elevate product in this context, and not just elevate as a verb. Those very specific, very small details are what's really important when you're trying to utilize RAG, trying to utilize these types of knowledge solutions in your context, especially, again, given complex types of data.

19:16

Now, moving over to the Slack use case, what I did, again, is compare two seemingly similar queries that differ in what they're trying to achieve. I'm starting with a question that's more process-oriented, from a customer's perspective: what's required to install the iFrame in Salesforce? Right, the customer, or the engineer asking on behalf of a customer, wants to know what they need in place in order to install the SupportLogic widget. And the answer here is coming mainly from Freshdesk, which makes sense, right? This is an externally relevant question, and we want the answer to cover what the customer needs to do, what steps they need to take, in order to install in Salesforce.

But then I move over to the next query. And by the way, running these queries in Slack is very simple: it's a slash command, /resolve, and you run the question there. It looks very similar, but again, the difference is in the details. The question is: what are the system requirements for installing the Salesforce iFrame? That's a more technical question. And here it knows to retrieve the relevant information, and this is where it's really critical to be able to deal with not-so-nicely-curated internal knowledge. Here it's taking the main information from an internal Guru article and from Confluence, from Confluence and Guru articles, and it's providing the more technical, more internally relevant information about what's required from our perspective as SupportLogic to make sure we can install the widget: the requirements, rather than the customer-facing steps. So again, those little details are really important when you're trying to enable such an advanced type of knowledge retrieval solution on complex types of data. And of course, the more complex the data, the more precise it has to be.
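The demo runs through a /resolve slash command; purely as an illustration of how such a command could be wired up, a minimal Slack Bolt handler might look like the following. The answer_question stub stands in for the retrieval engine; this is not the actual ResolveX integration, which isn't shown in the talk.

```python
import os
from slack_bolt import App

# Illustrative wiring for a "/resolve"-style slash command using Slack Bolt.
app = App(token=os.environ["SLACK_BOT_TOKEN"],
          signing_secret=os.environ["SLACK_SIGNING_SECRET"])

def answer_question(question: str) -> str:        # placeholder for the engine
    return f"(answer for: {question})"

@app.command("/resolve")
def handle_resolve(ack, respond, command):
    ack()                                          # acknowledge within Slack's timeout
    question = command.get("text", "").strip()
    respond(answer_question(question) if question
            else "Usage: /resolve <your question>")

if __name__ == "__main__":
    app.start(port=3000)
```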

21:33

So I hope these cases, these demos, gave you an idea, and you're all invited to ask me later about more specific examples or more specific questions on the technology itself. But moving back to the presentation: I showed you a demo on the portal and in Slack. The exact same engine can power the CRM assist, as Karan and Krithika showed earlier, as well as the chatbot assist. So really, all the different use cases, all the different workflows and experiences you want to have in place, can be powered by the exact same engine.

Now, this is a comparison that Krishna thought of that I really like. What we see here is a Tesla and a Hummer. Both of them are cars, both of them get you from A to B. But obviously, when you think about it more in depth, they're engineered very differently, and they're built, in a sense, for different aims even. So when you're doing it yourself, a build-it-yourself approach to RAG, you're probably going to end up with a Hummer, which is a car, you know, a pretty nice car, but it's not engineered for precision, it's not engineered for performance and efficiency. What we're offering is a Tesla. We're offering something that's really built for that precision, for the performance and the efficiency, and to work with very complex types of data and scenarios in order to provide that.

I think, and we'll end with this, a good example of that is what you'll see in the next slide, coming from Vanit, who's a VP of support at Seavent, and his experience with XFind. Seavent was one of our first customers, and one of our first large customers as well. They compared our engine to other solutions out there, and they were mainly taken by the level of precision we were able to provide on complex types of queries, with their type of data, in the use cases they needed, which is mainly portal and chatbot. So we'll end with this: his feedback, his thoughts on XFind, now ResolveX, and the results on their side of the system.

24:19

>> We evaluated multiple providers, to be very honest, and when we were looking at XFind, they were still a startup, but we said we're going to try that search engine. And the results were really astonishing. We were getting what we needed; we never thought that was going to be the case. So we started with that. I think what XFind really did for us was three things I can think of. One, unified search: we're pulling in information from our training content, from our KB articles, from our community forums, so the knowledge could be designed in any way, and it's actually helping us pull out the right content there. What it also really helped us do was give a unified experience for our customers. It doesn't matter where you are, you get the same experience.

25:17

>> So as I said earlier, I invite you all to ask more questions and hear some more information on this engine. Thank you very much.

[APPLAUSE]
