Sunday, November 22, 2009

Salesforce Call Center Adapter

Salesforce Call Center Adapter
Author: Bijumon Janardhanan
Contents
1. Introduction
2. Architecture
3. How it looks in the Application
4. DemoAdapter Class
5. ISalesforceCTIAdapter
6. CDemoAdapterBase
7. UIAction
8. SendUIRefreshEvent
9. FinalConstruct: the place to initialize the custom API
10. CDemoEventSink
11. CDemoUserInterface
12. Handling Call Center Hardware API
13. How to Add a Button in GUI and handling Button Message
14. How to Hide and Display Buttons
15. CWinsetContainer Class
16. Call Ringing Event
17. Search for Callers Accounts and Displaying Caller details in the CTIWindow





Introduction
This is an attempt to document Salesforce adapter development for future reference and for the maintenance of the newly developed adapter. It supplements the documentation available from Salesforce; it took me a lot of trial and error to explore some features. My task was to integrate the Aspect Winset control with the Salesforce adapter.
A Salesforce call center adapter is a bridge that integrates Salesforce CRM with call center hardware. The adapter allows agents to receive calls from customers through the call center hardware and manage them from the CRM interface. The Salesforce Apex API helps to pop up the account information: for an existing customer, the Apex API brings up the account using the information provided by the call center hardware, which may be just the caller ID or a user-entered account number. Popping up account details helps the agent handle the customer call effectively.
The Salesforce browser application instantiates the Salesforce adapter, which is implemented as a local COM server. The browser application expects the call center adapter COM server to notify it of call-related events through a COM event. This COM event sends a piece of XML to the browser, which renders it into a neat user interface; a set of XML tags is predefined for this communication, and I assume the browser uses a script to do the rendering. The rendered GUI appears on the left side of the Salesforce application. Communication between the Salesforce application and the call center adapter (the local COM server) is bidirectional: the browser application calls methods on the COM server to pass events back to it.

Architecture

Architecture: Call Center Hardware <-> CTI API (call center API provided by the vendor) <-> CTI Connector (call center adapter) <-> Salesforce Application.



Salesforce provides a sample implementation of the COM server (called the CTI connector) which can be used as a template for building new adapters. This workspace contains two projects: one named CTIAdapterLib and the other DemoAdapter. CTIAdapterLib contains the classes implementing the basic functionality; the second project contains the COM server implementation. I will start with the DemoAdapter project. As stated above, each element in the rendered call center GUI in the Salesforce browser application corresponds to a predefined XML tag. Fig 1 shows how the adapter appears in the browser.






How it looks in the Application





Fig 1. The adapter as it appears in the Salesforce browser application.
Class Descriptions
DemoAdapter Class
This project consists mainly of classes derived from the base classes provided in CTIAdapterLib.
ISalesforceCTIAdapter
This is the COM interface implemented by the CTIConnector. It mainly contains one method and two events.
Events
HRESULT UIRefresh(BSTR pXMLUI); // Sent to the browser to refresh the UI
HRESULT UpdateTrayMenu(BSTR xmlMenu); // Sent to update the tray menu; rarely used


Method
HRESULT UIAction([in] BSTR message); // Called by the browser script to report a user action in the UI
This method is called by the browser script to indicate an action by the user, whether in the adapter UI or in the browser application. Handling these messages originating from the browser is the core of CTI connector development; the message-handling function is described later in this document.



CDemoAdapterBase
This class inherits from the CCTIAdapter base class and implements the actual COM interface ISalesforceCTIAdapter.
UIAction
This method is called by the browser to indicate events in the call center adapter UI. These include both adapter user-interface events and Salesforce-generated events such as connect and logoff.
An XML fragment containing the event type and other parameters relevant to the action is passed to the function.




The function handling these messages, UIHandleMessage(std::wstring& message, PARAM_MAP& parameters), is somewhat similar to a Windows message handler. It is implemented in the CCTIUserInterface class and can be overridden in CDemoUserInterface. If you want to handle some message specially, it is recommended to implement that in the CDemoUserInterface class.
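As a minimal sketch of that override pattern (KEY_MYEVENT and HandleMyEvent are hypothetical names, not toolkit constants; only the UIHandleMessage signature above is taken from the framework), the derived class handles its own messages and forwards everything else to the base class:

// Sketch only: KEY_MYEVENT and HandleMyEvent are hypothetical examples.
void CDemoUserInterface::UIHandleMessage(std::wstring& message, PARAM_MAP& parameters)
{
    if (message == KEY_MYEVENT)            // a custom message added by the developer
    {
        HandleMyEvent(parameters);         // adapter-specific processing
    }
    else
    {
        // Let the framework's standard if/else chain handle everything else
        CCTIUserInterface::UIHandleMessage(message, parameters);
    }
}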

SendUIRefreshEvent
This function fires the COM event that tells the browser to change the UI. It is called by the main class (CDemoUserInterface) whenever a UI change is needed based on events received from the underlying call center hardware or its driver. It sends an XML fragment describing the elements to be displayed in the UI. (The sample XML that originally appeared here did not survive formatting; it simply listed the UI elements, such as lines, buttons and info fields, to be rendered.)


FinalConstruct: the place to initialize the custom API
This function is called by ATL when the COM object is instantiated, so it is the best place to do startup-related work. By default it is where the main worker class CDemoUserInterface is created. In my work I also used this function to create the hidden window that hosts the ActiveX control for communicating with the Aspect hardware.

m_pWinSetCon = new CWinsetContainer();          // wrapper that hosts the Aspect Winset control
m_pWinSetCon->Create(NULL);                     // create the hidden host window
//m_pWinSetCon->ShowWindow(SW_SHOWNORMAL);      // left hidden in production; useful while debugging
m_pWinSetCon->m_pDemoAdapterBase = this;        // back-pointers so the wrapper can forward hardware events
m_pWinSetCon->m_pUserInterface = (void *) m_pUI;
m_pUI->m_pWinsetContainer = m_pWinSetCon;       // and the UI class can reach the hardware
FinalRelease
This function is called when the COM object is destroyed, and is where any resources created in FinalConstruct should be released.
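A minimal sketch of that cleanup, assuming CWinsetContainer is an ATL CWindowImpl-style class (as the Create()/ShowWindow() calls above suggest):

// Sketch only: assumes CWinsetContainer owns a window that must be destroyed
// before the wrapper object is deleted.
void FinalRelease()
{
    if (m_pWinSetCon)
    {
        if (m_pWinSetCon->IsWindow())
            m_pWinSetCon->DestroyWindow();   // tear down the hidden host window
        delete m_pWinSetCon;                 // free the wrapper object
        m_pWinSetCon = NULL;
    }
}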


CDemoEventSink
Its usage is described in the Salesforce documentation. Other than the events already defined, this class is of little use to a developer.
CDemoUserInterface
This is the main worker class, derived from CCTIUserInterface, where we can embed our logic; adapter-specific functionality should be implemented in this class. I broke that rule and sometimes made my changes in the base class itself, mainly due to time pressure and because in the beginning I did not have a clear picture of the class structure.
Handling Call Center Hardware API
The purpose of the CTI adapter is to interface with the call center hardware and provide a common platform for interfacing with the Salesforce web application. It is the adapter developer's responsibility to modify the demo implementation to encapsulate the API provided by the call center hardware vendor. The custom API provided by the hardware vendor normally exposes a set of methods and events. Vendors may implement the API as an OCX or as DLLs; an OCX usually raises events for call notifications, while with a DLL we may have to depend on callback functions to receive events.
It is recommended to implement a wrapper class to handle the vendor's events and methods. In my implementation I encapsulated the OCX provided by Aspect in a class where I receive its events and expose methods such as initialization. A wrapper class also lets you keep the state variables needed to manage the API state.
In my implementation I call the CDemoAdapter functions directly to notify it of events received from the call center; to make voice calls and to log on to the call center hardware I use the wrapper class functions.
The wrapper class can also reshape the data received from the call center hardware into what the CDemoUserInterface class expects. This way we can insulate the rest of the adapter from the changes required when switching between call center providers. The sections below explain some details of this wrapper class implementation.
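A minimal sketch of such a wrapper is shown below. VendorLogin, VendorDial and OnVendorCallRinging are placeholders for whatever the hardware vendor's SDK actually exposes; only PARAM_MAP, KEY_ANI and CDemoUserInterface are toolkit names used elsewhere in this document.

// Sketch only: the Vendor* calls are hypothetical stand-ins for the vendor SDK.
bool VendorLogin(const wchar_t* user, const wchar_t* password);  // vendor SDK (placeholder)
void VendorDial(const wchar_t* number);                          // vendor SDK (placeholder)

class CVendorApiWrapper
{
public:
    explicit CVendorApiWrapper(CDemoUserInterface* pUI)
        : m_pUI(pUI), m_bLoggedIn(false) {}

    // Called from the adapter when the agent logs in through the softphone UI
    bool Login(const std::wstring& user, const std::wstring& password)
    {
        m_bLoggedIn = VendorLogin(user.c_str(), password.c_str());
        return m_bLoggedIn;
    }

    // Called from the adapter to place an outbound call
    void Dial(const std::wstring& number)
    {
        if (m_bLoggedIn)
            VendorDial(number.c_str());
    }

    // Called by the vendor OCX event (or DLL callback) when a call arrives;
    // reshapes vendor data into the keys CDemoUserInterface expects
    void OnVendorCallRinging(const std::wstring& callerId)
    {
        PARAM_MAP mapInfoFields;
        mapInfoFields[KEY_ANI] = callerId;
        // ...then forward to the toolkit, e.g. m_pUI->OnCallRinging(...),
        // exactly as ProcessCallRingEventFromWinset does later in this document.
    }

private:
    CDemoUserInterface* m_pUI;       // needed to route hardware events back to the UI class
    bool                m_bLoggedIn; // simple API state kept in the wrapper
};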

UIAction Handler in CDemoUserInterface



The call path into UIHandleMessage in CCTIUserInterface is straightforward: UIParseIncomingMessage parses the XML parameters, places them in the parameter map, and then passes them to UIHandleMessage.

UIHandleMessage contains a big if/else chain that calls a function to process each message.
This is the function where you can start debugging. To see the list of supported messages, open the CTIConstants.h file; to see the messages actually sent to the CTI adapter, place a breakpoint in this function. That is the best way to determine which messages you need to handle. The table below shows a partial list of messages.

KEY_ACCEPT_CONFERENCE: Sent when the user presses the accept-conference button in the UI. If the conference button is shown by the developer, this key must be assigned to it.
KEY_ACCEPT_TRANSFER: Sent when the user presses accept transfer.
KEY_CONNECT: Sent to the COM server just after the browser establishes the connection with it. At this point the CTI adapter can create the call center API instance.
KEY_LOGIN: Sent to the COM server when the login button is pressed. The parameter map contains the user-entered login name and password, which the CTI adapter can pass on to the call center hardware through the vendor API.
KEY_RELEASE: Sent to the adapter when the user presses end call.
KEY_ANSWER: Sent when the user presses the answer button; the CTI adapter can then call the appropriate API function to accept the call.
KEY_HOLD: Sent when the user presses the hold button.
KEY_RETRIEVE: Sent when the user presses the retrieve button to take a call off hold.
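For example, the KEY_LOGIN branch of UIHandleMessage can pull the credentials out of the parameter map and hand them to the hardware wrapper. A minimal sketch (HandleLogin, the parameter key names, and the Login helper on the container are my assumptions, not toolkit names; inspect the PARAM_MAP in the debugger to see the real keys):

// Sketch only: key names and the Login helper are assumptions.
void CDemoUserInterface::HandleLogin(PARAM_MAP& parameters)
{
    std::wstring sUser = parameters[L"username"];   // assumed key name
    std::wstring sPass = parameters[L"password"];   // assumed key name

    if (m_pWinsetContainer)                          // wrapper created in FinalConstruct
        m_pWinsetContainer->Login(sUser, sPass);     // hand the credentials to the hardware API
}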


How to Add a Button in the GUI
It is possible to add custom messages to handle custom UI elements.
In this implementation we added a new button to mark the caller's account as verified. Below is how to add the button and handle the button press.
void CCTILine::AddDefaultButtons()
{
    ...
    // Add the custom "Account Verified" button together with the message key it sends
    pButton = AddLongButton(BUTTON_ACCOUNTVERIFIED, KEY_ACCOUNTVERIFIED, COLOR_BEIGE, L"Account Verified");
    if (pButton)
        pButton->SetIconURL(L"/img/btn_conference.gif");   // icon borrowed from the conference button
    ...
}
void CCTIUserInterface::UIHandleMessage(std::wstring& message, PARAM_MAP& parameters)
{
    ...
    else if (message == KEY_ACCOUNTVERIFIED)
    {
        AccountVerified(parameters);        // handler for the custom "Account Verified" button
    }
    else if (message == KEY_COMPLETE_TRANSFER)
    {
        // BMJO: also tried calling this from the call-state event;
        // the function in the derived class handles this message.
        CallCompleteTransfer(parameters);
    }
    ...
}

How to Hide and Display Buttons
To show buttons, add their IDs to listEnabledButtons; buttons whose IDs are not in this list will not be shown.
// Offer each outside-line button unless that line is the one in use
if (Line != 1)
    listEnabledButtons.push_back(BUTTON_OUTSIDELINE1);
if (Line != 2)
    listEnabledButtons.push_back(BUTTON_OUTSIDELINE2);
listEnabledButtons.push_back(BUTTON_INSIDELINE);
listEnabledButtons.push_back(BUTTON_SUPERVISOR);
OnButtonEnablementChange(Line, listEnabledButtons, false);  // apply the new button set
UIRefresh();                                                // push the change to the browser


CWinsetContainer Class
This class is specific to this implementation; I used it to wrap the ActiveX control provided by Aspect for interfacing with the call center hardware. It implements a hidden window to receive the events from the Aspect Winset control.
The Winset control is an ActiveX component provided by Aspect to communicate with the call center hardware. It allows calls landing in the Aspect system to be distributed to the agents' PCs: once you log on to Aspect through the Winset control, the Aspect system can redirect calls to the PC based on the rules configured in Aspect. I used a hidden window to host this control and to receive its events.

This class keeps a pointer to the CDemoUserInterface class. Whenever it receives an event from the Aspect hardware, it calls a function in CDemoUserInterface to take the appropriate action. For example, when there is an incoming call it receives an event from the Aspect hardware with cut-through data (information passed by the call center hardware along with the call). After formatting the cut-through data it fills a parameter map and calls the CDemoUserInterface functions to handle the event further. The function used to handle this event is given below.


Call Ringing Event










Call tree to handle the ring event (diagram not reproduced here).



LRESULT ProcessCallRingEventFromWinset(BSTR CallInfo)
{
std::wstring strCallInfo = CCTIUtils::BSTRToString(CallInfo);


CDemoUserInterface *pUI = (CDemoUserInterface *)m_pUserInterface;

pUI->m_sCallTrack = GenerateCallTrackID(strCallInfo);
pUI->m_mapCustomAspectData[KEY_CALLTRK]=pUI->m_sCallTrack;
pUI->m_sAccNo = GenerateAccountNum(strCallInfo);
pUI->m_mapCustomAspectData[KEY_ACCNO]=pUI->m_sAccNo;
pUI->m_mapCustomAspectData[KEY_ZIP]=GenerateZip(strCallInfo);
pUI->m_mapCustomAspectData[KEY_CCT]=GenerateCCT(strCallInfo);
pUI->m_mapCustomAspectData[KEY_LANG]=GenerateLang(strCallInfo);
pUI->m_mapCustomAspectData[KEY_SEG]=GenerateCustSeg(strCallInfo);
pUI->m_mapCustomAspectData[KEY_DNIS]=GenerateDNIS(strCallInfo);
pUI->m_mapCustomAspectData[KEY_CALLTYPE]=GenerateCallType(strCallInfo);
pUI->m_mapCustomAspectData[L"ANINUM"]=m_sANI;
pUI->m_mapCustomAspectData[KEY_SITEID]=GenerateSiteID(strCallInfo);



PARAM_MAP mapInfoFields;



PARAM_MAP mapAttachedData;
//This field is used by the app-exchange class to prepare query
mapAttachedData[L"account.ICOMS_Account_Number__c"]=pUI->m_sAccNo ;

std::wstring sCallObjectId = pUI->CreateCallObjectId();

if(m_SearchType==0)
{
pUI->SetInfoFieldLabel(KEY_ACCNO,L"Account Number");
pUI->m_mapCustomAspectData[KEY_SEARCHTYPE]=L"AC";
mapInfoFields[KEY_ANI]=pUI->m_sCallTrack ;
}
else
{
pUI->SetInfoFieldLabel(KEY_ACCNO,L"ANI");
pUI->m_mapCustomAspectData[KEY_SEARCHTYPE]=L"AN";
mapInfoFields[KEY_ANI]=m_sANI;
}


mapInfoFields[KEY_CALLTRK]=pUI->m_mapCustomAspectData[KEY_CALLTRK];
mapInfoFields[KEY_ACCNO]=pUI->m_mapCustomAspectData[KEY_ACCNO];
mapInfoFields[KEY_ZIP]=pUI->m_mapCustomAspectData[KEY_ZIP];
mapInfoFields[KEY_CCT]=pUI->m_mapCustomAspectData[KEY_CCT];

mapInfoFields[KEY_LANG] = pUI->m_mapCustomAspectData[KEY_LANG];
mapInfoFields[KEY_DNIS]=pUI->m_mapCustomAspectData[KEY_DNIS];
mapInfoFields[KEY_SEG]=pUI->m_mapCustomAspectData[KEY_SEG];
mapInfoFields[KEY_CALLTYPE]=pUI->m_mapCustomAspectData[KEY_CALLTYPE];



int nLine = pUI->OnCallRinging(sCallObjectId,CALLTYPE_INBOUND,true,true,mapInfoFields,mapAttachedData);
//RCN requested to remove answer button. So calling
//answer directly. According to design I am expecting incoming call
//in line 1 only. It is not defined what will happen if a call comes in
//line 2.
pUI->OnAgentStateChange(std::wstring(AGENTSTATE_BUSY));

// The template argument was lost in the original paste; the BUTTON_* ID type
// (std::wstring) is assumed here.
std::list<std::wstring> listButtonsEnabled;
listButtonsEnabled.push_back(BUTTON_RELEASE);   // only the end-call button is offered

pUI->OnButtonEnablementChange(nLine, listButtonsEnabled, false);
pUI->m_nWinsetLineToUse = 0;  // resetting it just in case
m_nDialOut = 0;               // resetting
return 0;
}


Similarly, the container calls the end-call and other related functions as appropriate to notify the framework of call status changes.

Search for Callers Accounts and Displaying Caller details in the CTIWindow
This is an important area where developers will want to make changes, since showing the caller's details to the agent is essential. Depending on the call center hardware and IVR prompts, different pieces of information may be collected from the caller; these can be used to search the Salesforce database for the caller's accounts and to display the needed information in the CTI window. The default implementation contains a search based on ANI (caller ID).
In the framework there is a dedicated class and function to handle this search: CCTIAppExchangeSearchThread::ThreadSearch().
After completing the search, this function posts a message to the hidden window created by that class. In my implementation I used the end of the search to send the answer event: it was a client requirement to auto-answer without the user pressing the Answer button, and I had no other event to hook the answer call to.

LRESULT CALLBACK CCTIAppExchangeSearchThread::HiddenWindowProc(HWND hwnd, UINT msg, WPARAM wp, LPARAM lp)
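In my adapter the auto-answer is triggered from the search-complete handling in this window procedure. A rough sketch of the kind of helper it can call (AutoAnswerAfterSearch is hypothetical; KEY_ANSWER and the UIHandleMessage signature are the toolkit names already used in this document):

// Sketch only: a hypothetical helper called from the search-complete branch
// of HiddenWindowProc; not the actual shipped code.
static void AutoAnswerAfterSearch(CDemoUserInterface* pUI)
{
    if (!pUI)
        return;

    // The client wanted calls answered without the agent pressing the Answer
    // button, so the completion of the account search is used as the trigger.
    PARAM_MAP parameters;                 // no extra parameters are needed
    std::wstring message(KEY_ANSWER);
    pUI->UIHandleMessage(message, parameters);
}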

The CCTIUserInterface class starts the search during call-ringing handling, through the AppExchange class. The AppExchange class is where the Apex API COM interfaces are instantiated and used.

The Apex API is the interface provided by Salesforce to access Salesforce's internal data, and Salesforce provides several ways to access it. The CTI demo adapter implementation uses the COM version of the Apex API (query results come back through interfaces such as IQueryResultSet4Ptr).

Conclusion
Though I wanted to write a comprehensive guide on CTI adapter development, my time does not permit it. The technology is also nearing the end of its life, and I expect there will not be much more work based on this adapter. The contents are organized the way I remember them; my intention is simply to record whatever I can so that it helps those who struggle with it, and so that it helps me in the future if my own work needs maintenance. I am happy to help anybody who struggles with CTI adapter development. Feel free to contact me at bmjo@ebirdonline.com

Friday, April 10, 2009

Voice Recognition Study Report

1 System Definition

1.1 Problem Definition

Voice recognition systems have been developed by different vendors for the last 10 years. Even though the systems currently available in the market have not reached maturity, they are quite usable. This is an attempt to tap the potential of such systems to make the transcriber's job much easier and thereby improve productivity in a mutually beneficial way.

1.3 Goals of the Initial Study and the proposed system

1.3.1 To use VR systems effectively in medical transcription.

1.3.2 Most VR systems available in the market depend on initial enrollment. This enrollment process is lengthy and we cannot insist that the doctors go through it, so we want to find a way to eliminate this initial enrollment step.

1.3.3 The main advantage of using VR in medical transcription is the availability of archives: we have a lot of old transcripts and dictations available to train the system. We need to tap this advantage for system training and improvement.

1.3.4 In most cases the vocabulary used in medical transcription is limited compared to general day-to-day conversation. This vocabulary can be extracted from the transcript archives.

1.3.5 Integrate the system with VoiceSys. VoiceSys identifies the files from doctors who have a speech profile stored in it and then transcribes those files automatically.

1.3.6 After automatic transcription, the documents will be sent to transcribers for proofing. The corrections made during proofing will then be used for further training; we want to find a way to collect this data without affecting the productivity of the proofing process.

1.4 Constraints on Proposed System

1.4.1 The usual direct enrollment is not possible in most cases.

1.4.2 Medical dictations are usually recorded in normal, noisy environments, which causes a lot of background noise.

1.4.3 Most doctors are not willing to follow best practices in recording.

1.4.4 Recording volume can vary in some instances.

1.4.5 Doctors can have difficult accents depending on their origin and speaking habits.

1.5 Suggested Environment for the system

1.5.1 Hardware

1.5.2 Software

Windows 2000 or XP/VoiceSys

IBM ViaVoice Engine

IBM ViaVoice SDK

1.5.3 Manpower and Skill Report

It requires a VC++ programmer with deep knowledge of the IBM ViaVoice SDK.

1.6 User requirements

1.6.1 Automatic Doctor enrollment with use of archived transcripts and audio files.

1.6.2 A large number of transcripts can be used to teach the speaker's dictation context.

1.6.3 Facility to build a specialized vocabulary (a "Topic" in IBM SDK terms).

1.6.4 Automatic transcription as files arrive in the system.

1.6.5 Automatic Template(work type) selection for formatting the transcribed documents.

1.6.6 Automatic formatting of the document to populate fields from the dictation to corresponding fields in the template.

1.6.7 Automatic collection of correction information while the transcriber proofreads the ASR-generated transcripts.

1.6.8 Automatic continuous training to improve performance based on the correction data collected during proofreading.

1.7 Solution Strategy

Each requirement stated above is a main point to be addressed by this project. Based on that, the project can be modularized as discussed below.

1.7.1 Enrollment: This will be a semi-automatic process. It may not be possible to apply VR to all doctors; after selecting the doctors to whom we can apply VR, some processing still has to be done manually.

1.7.2 Transcription and Formatting: This is a very important and tedious phase. Here we need a generic design that matches the dictation habits of different doctors.

1.7.3 Correction: The design goal of this module is to collect correction information from the proofreader without adding the extra burden of marking the changes. Once the proofreader sends back the corrected document, this module collects that information and saves it for future training of the engine.

1.7.4 Easing Proofreading: The goal of this module is to improve the proofreader's productivity by providing features like moving the cursor as playback advances.

1.7.5 System architecture: VR is processor-intensive, so it is recommended to run it on a separate system.

1.8 Priorities for System Features

Since all the features highlighted above are mandatory, they have equal priority. Developers can nevertheless concentrate more on the OSS and foot-switch side, since it needs some expertise.

1.9 Sources of Information

IBM Via Voice Documentation and Wizzards Software

2 Standards and Procedures

We will follow standards published in ‘Synergy’

3 Risk Management

IBM ViaVoice is poorly documented, and IBM has stopped development and support for this product. Wizzards Software now supports ViaVoice.

4 Acceptance Criteria

Accuracy above 60% for selected doctors. Full integration with VoiceSys.

5 Requirement Changes during Development

Please refer to section 3.

6 Project Deliverables

VoiceSys VR Server

Wordscript

7 Project Estimation

2 programmers: experienced in C++, with knowledge of GUI development, the IBM ViaVoice API, and the VoiceSys architecture.

8 Glossary

Appendix A: Initial Experiment results

Appendix B: Architectural Design Guide lines

9 Appendix A: Vocabulary Experiments

As part of our initial study of voice recognition, I did some experiments to assess the possibility of using VR effectively.

Vocabulary Size:

The first experiment measured the size of a typical doctor's dictation vocabulary. Unfortunately, all the doctors selected were from the same specialty.

The table below shows the results of the experiment. It shows that a normal dictation vocabulary is somewhere near 20,000 words, only about one third of the general vocabulary (64,000 words). This reduced vocabulary size can improve the accuracy of voice recognition.

Doctor (specialty)      Vocabulary size     Number of dictations used
Doc 1 (ortho)           18822               10039
Doc 2 (ortho)           17008               13331
Doc 3 (ortho)           19010               10103

Another important finding is that the number of new words per dictation drops drastically after processing the first 100 dictations. In the first file the program collected 85 new words out of 266 words (words can repeat within the same file, so this is not an error). As the program iterates through the files its vocabulary grows, so the number of new words found gradually falls. The table below shows the first 10 rows from the experiment; the full data is attached in the appendix.

File number     New word count     Total words in the file     New words percentage
1               85                 266                         31.95489
2               43                 226                         19.02655
3               68                 297                         22.89562
4               19                 173                         10.98266
5               19                 196                         9.693877
6               16                 209                         7.655502
7               80                 519                         15.41426
8               80                 538                         14.86989
9               20                 259                         7.722008
10              11                 225                         4.888889

The table below shows the size of the vocabulary acquired by the program and the average percentage of new words found in that region. It shows that after acquiring a vocabulary of about 3,000 words the chance of finding a new word drops to about 1.5%; in other words, roughly 98.5% of the words in a new file are already covered by a 3,000-word vocabulary. It is also noticeable that with a vocabulary of around 5,000 words the new-word rate falls to about 1% (see the full table in the appendix).

Vocabulary size     New words percentage
1697                2.363119
2498                1.572973
3016                1.575954
3512                1.403467
4017                1.227529
4486                1.083378
4861                1.024937
5237                0.848871
5520                0.950028
5834                0.716892
6077                0.717065
6331                0.752041
6604                0.628675
6805                0.671466

** The above results were obtained with the help of an untested program; any wrong assumption or logical error could have skewed them.
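For reference, the counting logic behind these tables is simple enough to sketch. The following is an illustrative rewrite, not the original program: it walks a list of transcript files, tokenizes them, and records how many previously unseen words each file contributes.

// Illustrative sketch of the vocabulary-growth measurement, not the original program.
#include <cctype>
#include <fstream>
#include <iostream>
#include <set>
#include <string>

int main(int argc, char* argv[])
{
    std::set<std::string> vocabulary;               // words seen so far across all files

    for (int i = 1; i < argc; ++i)                  // each argument is one transcript file
    {
        std::ifstream file(argv[i]);
        std::string word;
        long total = 0, fresh = 0;

        while (file >> word)
        {
            // crude normalization: lower-case and strip non-letters
            std::string clean;
            for (size_t j = 0; j < word.size(); ++j)
                if (std::isalpha(static_cast<unsigned char>(word[j])))
                    clean += static_cast<char>(std::tolower(static_cast<unsigned char>(word[j])));
            if (clean.empty())
                continue;

            ++total;
            if (vocabulary.insert(clean).second)    // true when the word is new
                ++fresh;
        }

        if (total > 0)
            std::cout << i << "\t" << fresh << "\t" << total << "\t"
                      << (100.0 * fresh / total) << "%\n";
    }

    std::cout << "Final vocabulary size: " << vocabulary.size() << "\n";
    return 0;
}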

Dictation style

When we studied different dictations from the same doctor, we found that a doctor normally follows the same style in all dictations. By tuning the program to the dictation behavior of the doctor, it is possible to improve accuracy. For example, some doctors always start by dictating their name, then the medical record number and other details in a specific order; we could probably switch the engine to command-recognition mode to recognize these items and fill them into the templates. These points should be taken into consideration while designing the back-end recognition engine. The following things were noticed while examining the dictations:

1. Specific patterns to start a dictation.

2. Some doctors say "new paragraph" and dictate punctuation, while others ignore that.

3. Doctors divide dictation sections with different utterances ("objective analysis", "objective", etc.).


Appendix B: Training the Engine from Archives

Unattended Enrollment Experiments:

One of our main targets in this VR attempt is to eliminate the enrollment process required of the dictator. Transcription is done as a service, and it is not possible to force the user to undergo the time-consuming and tedious enrollment process required by most of the available engines; this has to be done at the transcription service without disturbing the client. We did a lot of research in this area with the existing engines.

We did our experiments mostly with IBM VoiceCenter. A user is created for the intended dictator, and the following steps are executed to prove that we can bypass user enrollment:

  1. A loopback wire is used to connect the headphone jack to the microphone input. A potentiometer is used to adjust the volume level.
  2. Use old archived transcripts of the same doctor to teach the context. This is done in two steps:
    1. Set the speech-engine user created for the doctor as the default user. This can be done by editing the C:\Program Files\ViaVoice\users\client.dfl file; there is only one entry in the file, which can be set to the needed user.
    2. Use vocabexp.exe to teach the context: start this program and select the text form of the transcript files for training.
  3. Create a topic for the user. The topic can be created with the same set of files; after creating it, use vati.exe to activate the topic.
    1. Select the newly created topic as the default topic for the user. This can be done with the control panel applet provided by IBM; the last combo box in this applet shows the topic.
  4. Now transcribe the file with DictationPad. The wave file can be played into DictationPad through the loopback.
  5. After transcription completes, correct it in DictationPad itself. Continue this step for 50 files or so; this gives a good level of accuracy.

The above experiment proved that it is possible to avoid enrollment, since after 20-50 files DictationPad starts giving a good level of accuracy. Encouraged by this result, we started looking into the IBM API for features to automate the manual correction.

Automating the correction process:

The above experiment proved that the manual correction process can be used to avoid pre-enrollment, but it is a time-consuming and indirect method. Our next attempt was to use the old archived audio files and their corresponding transcripts for automated training. We examined the IBM API for correction functions and also used the IBM API log to analyze how DictationPad handled the correction process (API logging can be enabled by setting api_log_level = 2 in the engine config file).

One of the main tasks in automated correction is to determine exactly which words were misrecognized and what the correct words are, by comparing the original and recognized texts. This is tedious, and in some conditions only a limited number of words are recognized correctly. We used a word-sequence matching method to locate the correctly recognized words, corrected only those words we were sure could be corrected safely, and discarded the rest. A sample program developed with this algorithm was used for training and gave us good results. See the appendix for details of the algorithm and the training results.
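The word-sequence matching we used can be approximated with a standard longest-common-subsequence alignment. The sketch below is an illustration of the idea, not the code we shipped: it aligns the recognized text against the reference transcript, reports only those substitutions that sit between matched anchor words, and discards everything it is not sure about.

// Illustrative sketch of LCS-based alignment between recognized and reference
// transcripts; the production code applied extra safeguards before a pair was
// accepted for training.
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

static void FindCorrections(const std::vector<std::wstring>& recognized,
                            const std::vector<std::wstring>& reference)
{
    const size_t n = recognized.size(), m = reference.size();
    // lcs[i][j] = length of the longest common subsequence of the suffixes
    std::vector<std::vector<int> > lcs(n + 1, std::vector<int>(m + 1, 0));
    for (size_t i = n; i-- > 0; )
        for (size_t j = m; j-- > 0; )
            lcs[i][j] = (recognized[i] == reference[j])
                            ? lcs[i + 1][j + 1] + 1
                            : std::max(lcs[i + 1][j], lcs[i][j + 1]);

    size_t i = 0, j = 0;
    while (i < n && j < m)
    {
        if (recognized[i] == reference[j])          // anchor word, recognized correctly
        {
            ++i; ++j;
        }
        else if (i + 1 < n && j + 1 < m &&
                 recognized[i + 1] == reference[j + 1])
        {
            // a single substituted word immediately followed by a match: safe to correct
            std::wcout << recognized[i] << L" -> " << reference[j] << L"\n";
            ++i; ++j;
        }
        else if (lcs[i + 1][j] >= lcs[i][j + 1])    // skip an inserted (extra) word
            ++i;
        else                                         // skip a deleted (missed) word
            ++j;
    }
}

int main()
{
    // tiny usage example: "pane" should be reported as a correction to "pain"
    const wchar_t* rec[] = { L"patient", L"has", L"acute", L"pane", L"in", L"the", L"knee" };
    const wchar_t* ref[] = { L"patient", L"has", L"acute", L"pain", L"in", L"the", L"knee" };
    FindCorrections(std::vector<std::wstring>(rec, rec + 7),
                    std::vector<std::wstring>(ref, ref + 7));
    return 0;
}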

Conclusion:

In light of the experiment results, we suggest that it will be worth developing a comprehensive transcription application on the IBM engine, which is comparatively low cost compared with the other engines considered. The next step is the detailed design of the system to integrate it with VoiceSys.


New words found in the first 100 files (X axis: file number, Y axis: new word count)

New words found in the first 1,000 files (X axis: file number, Y axis: new word count)

New words found in the first 10,000 files (X axis: file number, Y axis: new word count)

New words found in the last 100 files (X axis: file number, Y axis: new word count)


Appendix D: Unattended training Progress in Accuracy results

This graph is plotted from the results obtained when training with batches of 100 files and their corresponding wave files. Before starting the experiment, a topic was created for the doctor using 1,000+ archived dictation transcripts; context creation with VocabExpert was also done with the same files.

The X axis shows the training count. Training is run each time 1,000 phrases have been collected from the training files.

The Y axis shows the average percentage accuracy for the batch of files used in that iteration.


The graphs below show how the accuracy of the same file changes over the training iterations.

X Axis : Iterations

Y Axis : Accuracy



Appendix D: Observations on the IBM engine. These tips will be useful to developers working with the IBM engine; they are the points where we got stuck during our experiments.

  1. Training fails at step 2 due to the unavailability of bsf files in the corresponding folder (uns\[documetid]\bsf).

The training log shows the failure at step 2. This problem was solved when we called SmSet with SM_SAVE_AUDIO using explicit flags instead of SM_SAVE_AUDIO_DEFAULT:

SmSet( SM_SAVE_AUDIO, SM_SAVE_AUDIO_ADAPTATION|SM_SAVE_AUDIO_ALTERNATES|SM_SAVE_AUDIO_TRAINWORD|SM_SAVE_AUDIO_PLAYBACK, &reply );

  2. Training fails if SmDiscardData is called.

This was solved by using the save-all-speech-data flags with SmSaveSpeechDataEx. It saves all the audio data, but the discarded data is still reflected in the tags file:

nRc = SmSaveSpeechDataEx(NULL ,iUniqueDocumentID,SM_SAVE_FOR_ADAPTATION|SM_SAVE_ALL_TAGS,0,(unsigned long *)m_Tags,&msg);