978-1-4244-7903-0/10/$26.00 ©2010 IEEE
SLT 2010
WHAT IS LEFT TO BE UNDERSTOOD IN ATIS?

Gokhan Tur    Dilek Hakkani-Tür    Larry Heck

Speech at Microsoft | Microsoft Research
Mountain View, CA, 94041
gokhan.tur@ieee.org    dilek@ieee.org    larry.heck@microsoft.com

ABSTRACT

One of the main data resources used in many studies over the past two decades for spoken language understanding (SLU) research in spoken dialog systems is the airline travel information system (ATIS) corpus. Two primary tasks in SLU are intent determination (ID) and slot filling (SF). Recent studies reported error rates below 5% for both of these tasks employing discriminative machine learning techniques with the ATIS test set. While these low error rates may suggest that this task is close to being solved, further analysis reveals the continued utility of ATIS as a research corpus. In this paper, our goal is not experimenting with domain specific techniques or features which can help with the remaining SLU errors, but instead exploring methods to realize this utility via extensive error analysis. We conclude that even with such low error rates, the ATIS test set still includes many unseen example categories and sequences, and hence requires more data. Better yet, larger annotated data sets from more complex tasks with realistic utterances can avoid over-tuning in terms of modeling and feature design. We believe that advancements in SLU can be achieved by having more naturally spoken data sets and employing more linguistically motivated features, while preserving robustness to speech recognition noise and the variance of natural language.

Index Terms: spoken language understanding, ATIS, discriminative training

1. INTRODUCTION

Spoken language understanding (SLU) aims to extract the meaning of speech utterances. While understanding language is still considered an unsolved problem, in the last decade a variety of practical goal-oriented conversational understanding systems have been built for limited domains. These systems aim to automatically identify the intent of the user as expressed in natural language, extract associated arguments or slots, and take actions accordingly to satisfy the user's requests. In such systems, the speaker's utterance is typically recognized using an automatic speech recognizer (ASR). Then the intent of the speaker is identified from the recognized word sequence using an SLU component. Finally, a dialog or task manager (DM) interacts with the user (not necessarily in natural language) and helps the user achieve the task that the system is designed to support.

In the early 90s, DARPA (Defense Advanced Research Projects Agency) initiated the Airline Travel Information System (ATIS) project. The ATIS task consisted of spoken queries on flight-related information. An example utterance is "I want to fly to Boston from New York next week." Understanding was reduced to the problem of extracting task-specific arguments, such as Destination and Departure Date. Participating systems employed either a data-driven statistical approach [1, 2] or a knowledge-based approach [3, 4, 5].

Almost simultaneously with the semantic frame filling-based SLU approaches, a new task emerged, motivated by the success of the early commercial interactive voice response (IVR) applications used in call centers. SLU was framed as classifying users' utterances into predefined categories (called intents or call-types) [6]. The biggest difference between the call classification systems and semantic frame filling systems is that the former does not explicitly seek to determine the arguments provided by the user. The main goal is routing the call to an appropriate call center department; the arguments provided by the user are important only in the sense that they help make the right classification. While this has been a totally different perspective on the task of SLU, it was actually complementary to template filling in that each call-type can be viewed as a template to be filled. For example, in the case of the DARPA ATIS project, while the primary intent (or goal) was Flight, users also asked about many other things such as Ground transportation or Airplane specifications. The program also defined specialized templates for these less frequent intents. This led to a seamless integration of intent determination (ID) and slot filling (SF) based SLU approaches. This integrated approach actually yielded improved end-to-end automation rates as compared to the previous decoupled and sequential approaches. For example, Jeong et al. [7] proposed to model these two systems jointly using a triangular chain conditional random field (CRF).

In this paper, rather than focus on specific techniques or features to improve ID and SF accuracy, our goal is to assess the continued utility of the ATIS corpus given the two decades of research it has supported. In the next section, we briefly describe the ATIS corpus and then discuss the evaluation metrics for ID and SF. In Section 4, we present the state-of-the-art discriminative training efforts for both ID and SF for the task of ATIS. Finally, in Sections 5 and 6 we present our detailed analyses of the errors we have seen using ID and SF models, respectively, with performance comparable to that reported in the literature. We will show that, by categorizing the erroneous cases that remain after N-fold cross validation experiments, ATIS is still useful and suggests future research directions in SLU.

2. AIRLINE TRAVEL INFORMATION (ATIS) CORPUS

An important by-product of the DARPA ATIS project was the ATIS corpus. This corpus is the most commonly used data set for SLU research [8]. The corpus has seventeen different intents, such as Flight or Aircraft capacity. The prior distribution is, however, heavily skewed, and the most frequent intent, Flight, represents about 70% of the traffic. Table 1 shows the frequency of the intents in this corpus for the training and test sets.

Intent           Training Set   Test Set
Abbreviation     2.4%           3.6%
Aircraft         1.6%           0.9%
Airfare          9.0%           5.8%
Airline          3.4%           4.3%
Airport          0.5%           2.0%
Capacity         0.4%           2.4%
City             0.3%           0.6%
Day Name         0.1%           0.1%
Distance         0.4%           1.1%
Flight           73.1%          71.6%
Flight No        0.3%           1.0%
Flight Time      1.2%           0.1%
Ground Fare      0.4%           0.8%
Ground Service   5.5%           4.0%
Meal             0.1%           0.6%
Quantity         1.1%           0.9%
Restriction      0.3%           0.1%

Table 1. The frequency of intents for the training and test sets.

Utterance: How much is the cheapest flight from Boston to New York tomorrow morning?
Goal: Airfare
  Cost Relative: cheapest
  Depart City: Boston
  Arrival City: New York
  Depart Date.Relative: tomorrow
  Depart Time.Period: morning

Table 2. An example utterance from the ATIS data set.

In this paper, we use the ATIS corpus as used in He and Young [9] and Raymond and Riccardi [10]. The training set contains 4,978 utterances selected from the Class A (context independent) training data in the ATIS-2 and ATIS-3 corpora, while the test set contains 893 utterances from the ATIS-3 Nov93 and Dec94 data sets. Each utterance has its named entities marked via table lookup, including domain specific entities such as city, airline, and airport names, and dates. The ATIS utterances are represented using semantic frames, where each sentence has a goal or goals (a.k.a. intent) and slots filled with phrases. The values of the slots are not normalized or interpreted. An example utterance with annotations is shown in Table 2.

3. EVALUATION METRICS

The most commonly used metrics for ID and SF are class (or slot) error rate (ER) and F-Measure. The simpler metric, ER for ID, can be computed as:

    ER_ID = (# misclassified utterances) / (# utterances)

Note that one utterance can have more than one intent. A typical example is "Can you tell me my balance? I need to make a transfer." In most cases, where the second intent is generic (a greeting, small talk with the human agent) or vague, it is ignored. If none of the true classes is selected, the utterance is counted as a misclassification.

For SF, the error rate can be computed in two ways. The more common metric is the F-measure using the slots as units. This metric is similar to what is used for other sequence classification tasks in the natural language processing community, such as parsing and named entity extraction. In this technique, usually the IOB schema is adopted, where each word is tagged with its position in the slot: beginning (B), in (I), or other (O). Then, recall and precision values are computed over the slots. A slot is considered to be correct if both its range and type are correct. The F-Measure is defined as the harmonic mean of recall and precision:

    F-Measure = (2 x Recall x Precision) / (Recall + Precision)

where

    Recall = (# correct slots found) / (# true slots)
    Precision = (# correct slots found) / (# found slots)

4. BACKGROUND ON USING DISCRIMINATIVE CLASSIFIERS FOR SLU

With advances in machine learning over the last decade, especially in discriminative classification techniques, researchers have framed the ID problem as a sample classification task and SF as a sequence classification task. Typically, word n-grams are used as features after preprocessing with generic entities, such as dates, locations, or phone numbers. Because of the very large dimension of the input space, large margin classifiers such as SVMs [11] or AdaBoost [12] were found to be very good candidates for ID, and CRFs [13] for SF. To take context into account, the recent trend is to match n-grams (substrings of n words) rather than single words.

Data-driven approaches have proven very well suited for processing spontaneous spoken utterances: they are typically more robust to sentences that are not well-formed grammatically, which occur frequently in spontaneous speech. Even in broadcast conversations, where participants are very well trained and prepared, a large percentage of the utterances have disfluencies: repetitions, false starts, and filler words (e.g., uh) [14]. Furthermore, speech recognition introduces significant "noise" to the SLU component, caused by background noise, mismatched domains, incorrect recognition of proper names (such as city or person names), and reduced accuracy due to sub-realtime processing requirements. A typical call routing system operates at around 20%-30% word error rate; one out of every three to five words is wrong [15]. Given that the researchers in that study also determined that one third of the ID errors are due to speech recognition noise, robust methods for spontaneous speech recognition are critically important for successful ID and SF in SLU systems. To this end, researchers have proposed many methods, ranging from N-best rescoring to exploiting word confusion networks and leveraging dialog context as prior knowledge (e.g., [15]).

4.1. Intent Determination

For ID, early work with discriminative classification algorithms was completed on the AT&T HMIHY system [6] using the Boostexter tool, an implementation of the AdaBoost.MH multiclass, multilabel classification algorithm [12]. Hakkani-Tür et al. extended this work by using a lattice of syntactic and semantic features [16]. Discriminative call classification systems employing large margin classifiers (e.g., support vector machines) include work by Haffner et al. [17], who proposed a global optimization process based on an optimal channel communication model that allowed a combination of heterogeneous binary classifiers. This approach significantly decreased the call-type classification error rate for AT&T's HMIHY natural dialog system, especially the false rejection rates.

Table 3. The confusion matrix for intent determination (rows: correct intents, Abbreviation through Restriction; columns: estimated intents).

Other work by Kuo and Lee [18] at Bell Labs proposed the use of discriminative training on the routing matrix, significantly improving their vector-based call routing system [19] for low rejection rates. Their approach is based on using the minimum classification error (MCE) criterion. Later they extended this approach to include Boosting and automatic relevance feedback (ARF) [20]. Cox [21] proposed the use of generalized probabilistic descent (GPD), corrective training (CT), and linear discriminant analysis (LDA). Finally, Chelba et al. proposed using Maximum Entropy models for ID, and compared the performance with a Naive Bayes approach on the ATIS corpus. The discriminative method resulted in half the classification error rate compared to Naive Bayes on this highly skewed data set. They reported about 4.8% top class error rate, using slightly different training and test corpora than the ones used in this paper.

4.2. Slot Filling

For SF, the ATIS corpus has been extensively studied from the early days of the DARPA ATIS project. However, the use of discriminative classification algorithms is more recent. Some notable studies include the following:

Wang and Acero [22] compared the use of CRF, perceptron, large margin, and MCE using stochastic gradient descent (SGD) for SF in the ATIS domain. They obtained significantly reduced slot error rates, with the best performance achieved by CRF (though it was the slowest to train).

Almost simultaneously, Jeong and Lee [7] proposed the use of CRF extended by non-local features, which are important to disambiguate the type of a slot. For example, a day can be the arrival day, the departure day, or the return day. If the contextual cues disambiguating them are beyond the immediate context, it is not easy for the classifier to choose the correct class. Using non-local trigger features automatically extracted from the training data is shown to improve the performance significantly.

Finally, Raymond and Riccardi [10] compared SVM and CRF with generative models for the ATIS task. They concluded that discriminative methods perform significantly better, and furthermore, that it is possible to incorporate a-priori information or long distance features easily. For example, they added features such as "Does this utterance have the verb arrive?" This resulted in about 10% relative reduction in slot error rate. The design of such features usually requires domain knowledge.

5. ANALYSIS OF INTENT DETERMINATION IN ATIS

In this section, our goal is to analyze the errors of a state-of-the-art ID system for the ATIS domain, cluster the errors, and then categorize the error types. These categories of error types will suggest potential areas of research that could yield improved accuracy. All experiments and analyses are performed using manual transcriptions of the training and test sets to isolate the study from noise introduced by the speech recognizer.

5.1. Discriminative Training and Experiments

For the following experiments, we used the ATIS corpus as described previously in Section 2. Since the superior performance of discriminative training algorithms has been shown by earlier work, we employed the AdaBoost.MH algorithm in this study. We used only word n-grams as features. We have not optimized Boosting parameters on a tuning set, nor learned weak classifiers. The data is normalized to lowercase, but no stemming or stopword removal has been performed.

The ATIS test set was classified according to the classes defined in Table 1. The ID error rate we obtained was 4.5%, which is comparable to (and actually lower than) what has been reported in the literature.

5.2. Analysis of Intent Determination Errors

Next, we checked the ID errors with three training and test setups:

1. AllTrain: uses all ATIS training data to train the model, and errors are computed on the ATIS test set. In total, this model erroneously classified only 40 utterances (an error rate of 4.5%). The intent confusion matrix for these errors is provided in Table 3.
2. 25%Train: uses 25% of the training examples in the ATIS training set, and errors are computed on the ATIS test set. In total, this model erroneously classified 65 utterances (an error rate of 7.3%).

3. N-fold: uses all examples for both testing and training in 10-fold cross validation experiments. In total, this model erroneously classified 162 utterances (an error rate of 3.0%).

As seen in Table 3, the problem is mostly non-Flight utterances erroneously classified as Flight. While one cause of these errors is the unbalanced intent distribution, we have manually checked each error and clustered them into 6 categories:

1. Prepositional phrases embedded in noun phrases: These errors involve phrases such as "Capacity of the flight from Boston to Orlando", where the prepositional phrase suggests flight information, whereas the destination category is mainly determined by the head word of the noun phrase (capacity in this case). Since the classifier has no syntactic features, such sentences are usually classified erroneously. Using features from a syntactic parser can alleviate this problem.

2. Wrong functional arguments of utterances: This category is similar to the first, but instead of a prepositional phrase, the confused phrase is a semantic argument of the utterance. Consider the example utterance "What day of the week does the flight from Boston to Orlando fly?" These errors can be solved by using either a syntactic parser that identifies functions of phrases or a semantic role labeler.

3. Annotation errors: These are utterances that were assigned the wrong category during manual annotation.

4. Utterances with multiple sentences: These are utterances with more than one sentence. In such cases, the intent is usually in the last sentence, whereas the classification output is biased by the other sentence.

5. Other: These include several infrequent error types such as ambiguous utterances, ill-formulated queries, and preprocessing/tokenization issues:

   - Ambiguous utterances: These errors involve utterances where the destination category is not clear from the utterance. An example from the ATIS test set is "list Los Angeles". In this utterance, the speaker intent could either be to find cities that have flights from Los Angeles, or flights to Los Angeles.

   - Ill-formulated queries: These are utterances which include a phrase that may mislead the classification or understanding. An example from the ATIS test set is "What's the airfare for a taxi to the Denver airport?" In this case, the word airfare implies a destination category of Airfare, whereas what is meant is Ground transportation fare. These types of errors are easier for humans to handle, but it is not presently clear how they can be resolved in automatic processing.

   - Preprocessing/tokenization issues: These are errors that could be resolved by using a domain ontology or special pre-processing or tokenization related to the domain. Some domain specific abbreviations and restriction codes are examples of this category.

6. Difficult cases: These are utterances that include words or phrases that were previously unseen in the training data. For the example utterance "Are snacks served on Tower Air?", none of the content words and phrases appear with the Meal category in the training data.

Error Type   AllTrain   25%Train   10-Fold
1            42.5%      33.8%      24.5%
2            22.5%      13.8%      30.0%
3            2.5%       6.1%       18.4%
4            0%         0%         8.0%
5            17.5%      12.5%      7.2%
6            15.0%      33.8%      11.7%

Table 4. The distribution of error categories for ID using all and 25% of the training data, and using all the training and test data with 10-fold cross validation.

Table 4 presents the frequency of each of these errors for the three experiments. As seen, categories 1 and 2 constitute a majority of the errors. Both of these categories can be resolved using a syntactic parser with function tags. However, note that the ATIS corpus is highly artificial, and utterances are mostly grammatical and without disfluencies. Furthermore, when working with ASR output, utterances may include recognition errors. In a more realistic scenario, one might consider shallow parsing or syntactic and semantic graphs [16] for extracting richer and linguistically-motivated features that could resolve such cases.

Fig. 1. Learning curve for intent determination using the training data with the original order and the average of 8 shuffled orders.

Figure 1 shows the error rate on the ATIS test set when varying training set sizes are used. When manually examining the test set, we found clusters of similar utterances occurring one after the other (probably uttered by the same user). To eliminate the bias from the data collection order, we also estimated the error with a random ordering of the training set, and averaged the error rates over 8 such experiments. As can be seen from this plot, the error rate keeps shrinking as more data is added, suggesting that more training data would be beneficial.
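The shuffled learning-curve procedure described above can be sketched as follows. This is a minimal stdlib-only sketch, not the original experimental code: `train_fn` and `eval_fn` are hypothetical stand-ins for the actual AdaBoost.MH training and test-set scoring, and the toy majority-class classifier below only illustrates the plumbing.

```python
import random

def learning_curve(train, test, train_fn, eval_fn, sizes, n_shuffles=8, seed=0):
    """For each training-set size, average the test error rate over several
    random orderings of the training data, as in the ID experiments above."""
    rng = random.Random(seed)
    curve = {}
    for size in sizes:
        errors = []
        for _ in range(n_shuffles):
            shuffled = train[:]
            rng.shuffle(shuffled)
            model = train_fn(shuffled[:size])     # train on the first `size` utterances
            errors.append(eval_fn(model, test))   # error rate on the fixed test set
        curve[size] = sum(errors) / n_shuffles
    return curve

# Toy stand-ins over (utterance, intent) pairs: a majority-class "classifier".
def train_majority(data):
    intents = [intent for _, intent in data]
    return max(set(intents), key=intents.count)

def eval_error(model, test):
    return sum(1 for _, intent in test if intent != model) / len(test)
```

On a distribution as skewed as ATIS's (roughly 70% Flight in the test set), even the majority-class baseline above already reaches an error rate of just under 30%, which puts the reported 4.5% in perspective.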
6. ANALYSIS OF SLOT FILLING IN ATIS

In this section, our goal is similar to the ID analysis: analyze the results of a state-of-the-art SF system for the ATIS domain and cluster the errors into categories.

6.1. Discriminative Training and Experiments

Following methods described in the literature, we employed linear chain CRFs to model the slots in the ATIS domain. We used only word n-gram features and did not use a development set to tune parameters. The ATIS test set was then classified using the trained model. We converted the data sets into the IOB format so that we have only one word per sample to classify. Using the CoNLL evaluation script (http://www.cnts.ua.ac.be/conll2000/chunking/output.html), the SF F-Measure we obtained was 93.2% with the IOB representation (94.7% using the representation used by [10], who reported 95.0%), which is comparable to what has been reported in the literature.

6.2. Analysis of Slot Filling Errors

Analyzing the SF decisions, the model found 2,614 of the 2,837 slots in the 9,164-word input with the correct type and span. We manually checked each of the 223 erroneous cases and clustered them into 8 categories:

1. Long distance dependencies: These are slots where the disambiguating tokens are outside the current n-gram context. For example, in the utterance "Find flights to New York arriving in no later than next Saturday", a 6-gram context is required to resolve that Saturday is the arrival date. This category was previously addressed in the literature: for example, Raymond and Riccardi [10] extracted features using manually-designed patterns, and Jeong and Lee [7] used trigger patterns to cover these cases.

2. Partially correct slot value annotations: These are slots assigned a category that is partially correct; either the category or the sub-category matches the manual annotation. For example, the word tomorrow can be either a Depart Date.Relative or an Arrive Date.Relative for the utterance "flights arriving in Boston tomorrow". Note that these can overlap with other error types.

3. Previously unseen sequences: While this category requires further analysis, the most common reason is the mismatch between the training and test sets. For example, meal related slots are missed by the model (8.0% of all errors) because there are no similar cases in the training set. This is also the case for aircraft models (10.0%), traveling to states instead of cities (3.3%), etc.

4. Annotation errors: These are slots that were assigned the wrong category during manual annotation.

5. Other: These include several infrequent error types such as ambiguous utterances, ill-formulated queries, and preprocessing/tokenization issues:

   - Ill-formulated queries: These errors usually involve an ungrammatical phrase that may mislead the interpretation of the slot value, or there is insufficient context to disambiguate the value of the slot. For example, in the utterance "Find a flight from Memphis to Tacoma dinner", it is not clear whether the word dinner refers to the description of the flight meal.

   - Ambiguous utterances: These are utterances where the slot category is not explicit given the utterance. For example, in the utterance "I would like to have the airline that flies Toronto, Detroit and Orlando", it is not clear whether the speaker is searching for airlines that have flights from Toronto to Detroit and Orlando, or from some other location to Toronto, Detroit, and Orlando.

   - Preprocessing/tokenization issues: These are errors that could be resolved using a domain ontology or special pre-processing or tokenization related to the domain. For example, in the utterance "What airline is AS", it would be helpful to know that AS is a domain specific abbreviation.

   - Ambiguous part-of-speech tag-related errors: These are errors that could be resolved if the part-of-speech tags were resolved. For example, the word arriving can be a verb or an adjective, as in the utterance "I want to find the earliest arriving flight to Boston". In this case, the slot category for the words earliest arriving is Flight-Mod, but since the word arriving is very frequently seen as a verb in this corpus, it is assigned no slot category.

Error Type   Percentage
1            26.9%
2            42.4%
3            57.6%
4            8.4%
5            6.7%

Table 5. The distribution of the types of errors in the ATIS test set. Note that these do not sum to 100%, as some errors include multiple types.

Table 5 lists the frequency of each of these errors. Categories 1, 2, and 3 constitute the vast majority of the errors, and each can be attacked with a different strategy. Category 1 utterances are the easiest to resolve, using richer feature sets during discriminative training; using a-priori information may also help when available. Also, discovering linguistically motivated long distance patterns is a promising research direction. Category 2 errors happen mainly due to the nuance between the arrive and depart concepts (23.1% of all errors), which are very hard to distinguish in some cases, as in the example above. Category 3 utterances simply require a better training set, or human intervention in the form of manual patterns, as they are underrepresented or missing in the training data.

7. DISCUSSION AND CONCLUSIONS

Leveraging recent improvements in machine learning and spoken language processing, the performance of SLU systems for the ATIS domain has improved dramatically, and an error rate of around 5% for the SLU task might suggest a solved problem. It is clear, however, that the problem of SLU is far from being solved, especially for more realistic, naturally-spoken utterances from a variety of speakers performing tasks more complex than simple flight information requests. New data sets from such tasks can avoid over-tuning to one particular data set in terms of modeling and feature design.
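The slot F-measure underlying the results above (Section 3) counts a slot as correct only when both its span and its type match the reference. A minimal pure-Python scorer over IOB tag sequences makes the computation concrete; this is a simplified sketch of the metric, not the official CoNLL evaluation script (in particular, it does not handle every IOB edge case), and the tag names in the example are hypothetical:

```python
def extract_slots(tags):
    """Return the set of (start, end, type) slot spans in an IOB tag sequence."""
    slots, start, slot_type = set(), None, None
    for i, tag in enumerate(tags + ["O"]):         # sentinel flushes the last slot
        if tag.startswith("B-") or tag == "O":
            if start is not None:                  # close the open slot, if any
                slots.add((start, i - 1, slot_type))
                start, slot_type = None, None
            if tag.startswith("B-"):               # open a new slot
                start, slot_type = i, tag[2:]
        elif tag.startswith("I-") and start is None:
            start, slot_type = i, tag[2:]          # tolerate I- without a B-
    return slots

def slot_f_measure(reference, hypothesis):
    """Harmonic mean of recall and precision over slots; a slot is correct
    only if both its range and its type match the reference annotation."""
    true_slots = extract_slots(reference)
    found_slots = extract_slots(hypothesis)
    correct = len(true_slots & found_slots)
    recall = correct / len(true_slots) if true_slots else 0.0
    precision = correct / len(found_slots) if found_slots else 0.0
    if recall + precision == 0.0:
        return 0.0
    return 2 * recall * precision / (recall + precision)
```

With the counts reported in Section 6.2 (2,614 correct of 2,837 true slots), recall alone is about 92.1%; combined with precision over the found slots, this is consistent with the 93.2% F-measure reported above.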
The recent French Media corpus [23] offers a step towards this goal: it has three times more data and a greater than 10% concept error rate for SF. However, the data was not collected from an operational system; instead, it was collected using a wizard of Oz setup with selected volunteers. Another effort is the Let's Go dialog system used by real users of the Pittsburgh bus transportation system [24]; however, SLU annotations for it are not yet available.

Even with such low error rates, the ATIS test set includes many example categories and sequences unseen in the training data, and the error rates have not converged yet. In that respect, more data from just the ATIS domain may be useful for SLU research.

The error analysis on the ATIS domain shows the primary weaknesses of the current n-gram-based modeling approaches: the local context overrides the global, the model has no domain knowledge with which to make inferences, and it tries to fit any utterance to some known sample, hence it is not really robust to out-of-domain utterances. This was also observed by Raymond and Riccardi [10], where the CRF model fits 100% of the training data. One possible research direction consists of employing longer distance, syntactically or semantically motivated features, while preserving the robustness of the system to the noise introduced by the speech recognizer and the variance due to natural language.

A lesser studied part of the ATIS corpus, the Class D utterances (contextual queries), is another significant portion of this corpus waiting to be understood. While most systems have treated understanding in context with handcrafted rules (e.g., [4]), to the best of our knowledge the only study towards building a statistical discourse model is that of Miller et al. [25].

8. ACKNOWLEDGMENTS

We would like to thank Christian Raymond and Giuseppe Riccardi for sharing the ATIS data, revised for annotation inconsistencies and mistakes.

9. REFERENCES

[1] R. Pieraccini, E. Tzoukermann, Z. Gorelov, J.-L. Gauvain, E. Levin, C.-H. Lee, and J. G. Wilpon, "A speech understanding system based on statistical representation of semantics," in Proceedings of the ICASSP, San Francisco, CA, March 1992.
[2] S. Miller, R. Bobrow, R. Ingria, and R. Schwartz, "Hidden understanding models of natural language," in Proceedings of the ACL, Las Cruces, NM, June 1994.
[3] W. Ward and S. Issar, "Recent improvements in the CMU spoken language understanding system," in Proceedings of the ARPA HLT Workshop, March 1994, pp. 213-216.
[4] S. Seneff, "TINA: A natural language system for spoken language applications," Computational Linguistics, vol. 18, no. 1, pp. 61-86, 1992.
[5] J. Dowding, J. M. Gawron, D. Appelt, J. Bear, L. Cherny, R. Moore, and D. Moran, "Gemini: A natural language system for spoken language understanding," in Proceedings of the ARPA Workshop on Human Language Technology, Princeton, NJ, March 1993.
[6] A. L. Gorin, G. Riccardi, and J. H. Wright, "How May I Help You?," Speech Communication, vol. 23, pp. 113-127, 1997.
[7] M. Jeong and G. G. Lee, "Exploiting non-local features for spoken language understanding," in Proceedings of the ACL/COLING, Sydney, Australia, July 2006.
[8] P. J. Price, "Evaluation of spoken language systems: The ATIS domain," in Proceedings of the DARPA Workshop on Speech and Natural Language, Hidden Valley, PA, June 1990.
[9] Y. He and S. Young, "A data-driven spoken language understanding system," in Proceedings of the IEEE ASRU Workshop, U.S. Virgin Islands, December 2003, pp. 583-588.
[10] C. Raymond and G. Riccardi, "Generative and discriminative algorithms for spoken language understanding," in Proceedings of the Interspeech, Antwerp, Belgium, 2007.
[11] V. N. Vapnik, Statistical Learning Theory, John Wiley and Sons, New York, NY, 1998.
[12] R. E. Schapire and Y. Singer, "Boostexter: A boosting-based system for text categorization," Machine Learning, vol. 39, no. 2/3, pp. 135-168, 2000.
[13] J. Lafferty, A. McCallum, and F. Pereira, "Conditional random fields: Probabilistic models for segmenting and labeling sequence data," in Proceedings of the ICML, Williamstown, MA, 2001.
[14] A. Stolcke and E. Shriberg, "Statistical language modeling for speech disfluencies," in Proceedings of the ICASSP, Atlanta, GA, May 1996.
[15] N. Gupta, G. Tur, D. Hakkani-Tür, S. Bangalore, G. Riccardi, and M. Rahim, "The AT&T spoken language understanding system," IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 1, pp. 213-222, 2006.
[16] D. Hakkani-Tür, G. Tur, and A. Chotimongkol, "Using syntactic and semantic graphs for call classification," in Proceedings of the ACL Workshop on Feature Engineering for Machine Learning in Natural Language Processing, Ann Arbor, MI, June 2005.
[17] P. Haffner, G. Tur, and J. Wright, "Optimizing SVMs for complex call classification," in Proceedings of the ICASSP, Hong Kong, April 2003.
[18] H.-K. J. Kuo and C.-H. Lee, "Discriminative training in natural language call-routing," in Proceedings of the ICSLP, Beijing, China, 2000.
[19] J. Chu-Carroll and B. Carpenter, "Vector-based natural language call routing," Computational Linguistics, vol. 25, no. 3, pp. 361-388, 1999.
[20] I. Zitouni, H.-K. J. Kuo, and C.-H. Lee, "Boosting and combination of classifiers for natural language call routing systems," Speech Communication, vol. 41, no. 4, pp. 647-661, 2003.
[21] S. Cox, "Discriminative techniques in call routing," in Proceedings of the ICASSP, Hong Kong, April 2003.
[22] Y.-Y. Wang and A. Acero, "Discriminative models for spoken language understanding," in Proceedings of the ICSLP, Pittsburgh, PA, September 2006.
[23] H. Bonneau-Maynard, S. Rosset, C. Ayache, A. Kuhn, and D. Mostefa, "Semantic annotation of the French MEDIA dialog corpus," in Proceedings of the Interspeech, Lisbon, Portugal, September 2005.
[24] A. Raux, B. Langner, D. Bohus, A. Black, and M. Eskenazi, "Let's go public! Taking a spoken dialog system to the real world," in Proceedings of the Interspeech, Lisbon, Portugal, September 2005.
[25] S. Miller, D. Stallard, R. Bobrow, and R. Schwartz, "A fully statistical approach to natural language interfaces," in Proceedings of the ACL, Morristown, NJ, 1996.