The performance of genetic algorithms on Walsh polynomials: some anomalous results and their explanation.

of 8

Please download to get full document.

View again

All materials on our website are shared by users. If you have any questions about copyright issues, please report us to resolve them. We are always happy to assist you.
PDF
8 pages
0 downs
133 views
Share
Description
Abstract In this paper we discuss a number of seemingly anomalous results reported by Tanese concerning the performance of the genetic al-gorithm (GA) on a snbclass of Walsh polynomials. Tanese found that the GA optimized these functions poorly, that
Tags
Transcript
  The Performance of GeneticAlgorithms on WalshPolynomials: SomeAnomalousResults and Their Explanation Stephanie Forrest Dept.ofComputerScienceUniversity of NewMexicoAlbuquerque,N.M.87131-1386 Abstract Inthispaperwediscussanumber of seeminglyanomalousresultsreported by Taneseconcerningtheperformance of thegenetical gorithm(GA)onasnbclassofWalshpolynomials.Tanesefound that theGAoptimizedthesefunctionspoorly, that apartitioning of asinglelargepopulationintoanumberofsmallerindependentpopulationsseemed to improveperformance, and that hillclimbingoutperformed both theoriginal and partitionedforms ofthe GAonoptimizingthese functions.Wereexaminetheseresultsexper imentallyandtheoretically,andpropose and evaluatesomeexplanations.Inaddition)weexaminethequestion of whatarereasonable andappropriatewaystomeasuretheperformance of geneticalgorithms. 1 Introduction Tanese's1989doctoraldissertationreportsexperiments that apparentlycontradictsomecommonlyheldbeliefs about geneticalgorithms(GAs)asfunctionoptimizers [13, 12J. Specifically,Tanese'sresultsshow that foracertainclassofWalshpolynomials,GAsarepooroptimizers,performworse than hillclimbing,andoftenperformbestwhenthe total populationissplitupintoverysmallsubpopulations.Theseresultscall intoquestionseveralexpectationsaboutGAs-that theyaregood at optimizingWalshpolynomials [1], that theywillroutinelyoutperformhillclimbing and othergradientdescentmethodson hard problemssuchasthosewithnonlinearinteractions [9], and that pop ulationsmust be of asufficientsizetosupporteffective schemaprocessing [4]. Tanese'sresultsareprovocativeandhavecausedsomeresearchers to question the effectiveness of GAsin functionoptimization.Thus,theydeserveacarefulexplanation.This paper reviewstheseapparentlyneg- MelanieMitchell ArtificialIntelligenceLaboratoryUniversityofMichigan Ann Arbor,MI48109-2110 ativeresults,explainssome of the anomalies,anddiscusses the generalquestion of whatcriteriaareappro priatetouseinassessingtheperformance of GAs. 2 ExperimentalSetup The experiments we reportinthispaperwereperformedwithasimilarGAandidenticalparametervaluestothoseusedbyTanese [2, 13]. AllofTanese'sexperimentsusedstrings of length32 and populationsof 256 individuals. The populationwassometimessubdividedintoanumber of smallersubpopulations.Tanese'salgorithmuseda sigmascaling method,inwhich the numberofexpectedoffspringallocated to eachindividualisafunction of theindividual'sfitness, the meanfitness of thepopulation,andthestandarddeviationfromthemean.Anindividualwithfitnessonestandarddeviationabove the populationmeanwasallocatedoneandahalfexpectedoffspring. Multipointcrossoverwasused J with acrossoverrate of 0.022per bit (e.g.,for32-bitstrings,therewere on average0.7crossoversperpair of parentspergener ation). The crossoverrate(perpair)wasinterpretedasthemeanofaPoissondistributionfromwhichthe actualnumber of crosseswasdeterminedforeachin dividual. The mutationprobabilitywas0.005perbit. Withthe exceptionsofsigmascaling and multipoint crossover,Tanese's GA wasconventional.Taneseran eachofherexperimentsfor500generations,whereas we ,raneachofoursfor 200 generations,although,aswillbediscussedlater,thiswasnotanimportant dif ference.Forsomeexperimentswealteredcertainpa rametervalues;theseexceptionsarenotedexplicitly. 3 Walsh Polynomials Inherdissertation,Tanesedescribesexperimentsin volvingseveralfitnessfunctions.Inthispaper we consideronly the resultswithrespect to Walshpolynomials.Walshpolynomials and theirrelevanceto  AWalshpolynomialhasthefollowingform: 21 -1 f(x) = L: Wj,pj(x) j=O GAshavebeendiscussed by anumber of researchers [1, 5,9, 13].AWalshpolynomial is definedasasumof Walshfunctions. Thereisacorrespondencebetweencertainpartionings of the GA'ssearchspace and Walshfunctions.Such partitioningsareinduced by schemas.Forexample,thepartitioningd**... * dividesthesearchspaceinto twohalves,corresponding to theschemas 1** ...*and 0**... *. Likewise,thepartitioningdd**... * represents adivision of thespaceinto fOUf quarters,each of which corresponds to aschemawiththeleftmosttwobitsdefined.Eachdifferentpartitioning of thesearchspacecan be indexedbyauniquebinaryinteger(bitstring) in which1'scorrespond to the partition'sdefinedbitsand O's correspondtothenon-definedbits.Forexample,underthisenumeration,thepartitiond**...*hasindex j=100 ... 0. The Walshfunctioncorrespondingtothe jth partitionisdefinedasfollows(where both j and x arebitstrings):where I is the lengthofthe bit string x, andeach Wj isareal-valuedcoefficient.Walshpolynomialspro videabasis for defininganyreal-valuedfunctionon bit strings.Fromthisgeneralclassoffunctions,specificsubsetswereselectedasfitnessfunctions, both for Tanese'sworkandfor the experimentsreportedhere.Tanesegeneratedeachfitnessfunction,hereaftercalled Tanesefunctions, byrandomlychoosing 32 partitionindices j (called the definedpartitions), all of thesameorder- that is,allcontaining the samenumber of 1 'So The coefficient Wj foreachofthe 32 chosenpartition indiceswasalsochosenatrandomfromtheinterval (0.0,5.0]. The fitnessfunctionconsistedof thesumof these 32 terms(allothercoefficientswereeffectively set to0). Oncethe 32 partitionindiceswerechosen,apoint x' waschosenrandomlytobetheglobaloptimum, and thesignofeachofthe 32 Wj'S wasadjustedso that thefitnessof x' would be L [Wj [. This method of constructingfunctions had severaladvantagesforcomparingtheperformanceofdifferentGAs:(1)itwaseasytoconstructrandomfunctions that onaveragewere of similardifficulty(althoughGoldberg [6] hasshown that itispossible to constructtwofunctionsofagivenorder that havedifferentlevels of difficultyfortheGA);(2)functionsofdifferentdegreesofdifficultycould be constructedbyvarying• The poorperformance of allforms of the GA onoptimizing both low-order and high-orderTanese functions.• The improvementinoptimizationperformance obtained bythe partitionedGAoverthetraditionalGA. • The superioroptimizationperformance of hillclimbingon the Tanesefunctionsascompared with both thetraditional and partitionedGA. 4.1 Why does the GA optimizeWalshpolynomialspoorly? 4 Discussion of Anomalies One of Tanese's most strikingresultswasthepoor performance of theGA(in both itstraditionalandpartitioned-populationform)whensearchingformax imum valuesofthefunctionsdescribedabove.In Tanese'sexperiments, the GA wasrun5timesoneach of the 64 randomlygeneratedfunctions;eachsetof5constitutesa trial. Oneofherprimarycriteriawasthe successrate: the numberoftrialsonwhichtheglobaloptimumwasfound at leastone outof fivetimes.On This sectiondiscussesandproposesexplanationsforthefollowinganomaliesinTanese's results: the order,sincelow-orderfunctions of thissortshouldonaveragebeeasierfortheGA to optimize than highorderfunctions [13]; and(3)theglobaloptimumwasknown,whichmade it possible to determinehowclose the GAcame to the maximum valueofthefunction. Taneseconductedexperiments on fitnessfunctions withdefinedpartitions at orders 4, 8,16, and 20 (eachfunction had allofitsdefinedpartitions at thesameorder).Foreachexperimentsherandomlygenerated 64functions of agivenorder I andcomparedtheperfor mance ofthe traditional (singlepopulation)GAwithanumberof partitioned GA's,inwhichthepopulation of 256individualswassubdividedintovariousnumbers of smallerpopulations.Inthispaper, we discussresultsonlyforfunctions at order 8, sincethesewere the functionsTaneseanalyzedinthegreatestdepth.Allofourexperimentsinvolvedmanipulationsofparameterson the traditionalGA; we didnotexperiment withpartitionedGA's.Foreach of ourexperiments, we ranthe GAonceoneachof 20 differentrandomly generatedfunctions,for200generationseach.Tanesecarriedeachrunoutto500generations,butineach of ourruns that usedstringsoflength 32, thepopulation had reachedamoreorlesssteady state byaboutgeneration100, and theresultswould not havechangedsignificantly ifthe run had beenextended. The shorter runsweresufficientfordeterminingthecomparative effectsofthevariousmanipulations we performedontheparameters.if x II j haseven parity otherwise. pj(X) ={ セ  the 64 trials(i.e.,320runstotal)onrandomlygeneratedorder4Walshpolynomials,thesuccessrateofthetraditionalGAwasonly3(themostsuccessfulpartitionedalgorithm'ssuccessratewasonly15).On320runsonrandomlygeneratedorder8Walshpolynomi als,neitherthetraditionalnorthevariouspartitioned GAseverfoundtheoptimum. The followingpossibleexplanationsfortheGA'spoorperformancewillbediscussedandevaluatedinthissection:(1) the averagedefining-lengthsofschemasareverylongand thus goodschemastendtobebrokenupbycrossover;(2)randomlygenerating 32 partitionindicesoverstrings of length 32 resultsinalargenumberofoverlaps,whichmeans that most of thesignificantlociarecorrelated,effectivelymakingthefunctionsverydifficult;(3)crossover is ineffectiveonthesefunctionsbecauseofthelackoflower-orderbuildingblocks;and(4)thefitnessfunctionsaredeceptive. 4.1.1 Is the GA'spoor performancedue to longdefining lengths of schemas? GivenaTanesefunction F, the fitnessofastring x under F dependson the parityof x /\ jforeachjinthetermsof F. Becauseofthisparitycalculation,achangeofasingle bit in x inany of thepositionsinwhichjhasa1willproduceanoppositevaluefor x /\j and thusreversethecontribution of that termtothe totalfitness.Thisimplies that ingeneral,foraTanesefunctionoforder n, noschema s of orderless than n willgivetheGAanyusefulinformation,sinceforanygiven j, half the instancesof s willhaveevenparitywithrespectto j, and half willhaveoddparity.Thispropertyisdue tothe paritypropertiesoftheWalshfunctions and tothefact that inaTanesefunction,allthetermsare ofthe sameorder.Thisproperty and its implicationfortheeffectiveness of crossoveronthese functionswillbediscussedinSection4.1.3. The expecteddefininglengthofaschemaoforder8ina32-bitstringis26,asubstantialproportionoftheentirestring [7]. SuchlongdefininglengthscouldmaketheTanesefunctions hard fortheGAbecauseofthehighprobabilityofcrossoverdisruption,Towhatdegreewasthisproblemresponsibleforthepoorperformanceof the GAonthesefunctions?Toanswerthisquestion, we ranthe traditionalGAwithTanese'sparameterson 20 randomlygenerated functions of order8whosepartitionindiceswerere stricted to haveamaximumdefininglengthof 10. That is,foreach ofthe 32 partitionindices,arandomlypositionedwindow of 10 contiguouslociwaschosen,andalleight l's wereplacedrandomlyinsidethiswindow. Usingthesuccess-ratecriteriondescribedabove,the performancewasidentical to Tanese'ssrcinalresults: thetraditionalGAneverfoundtheoptimum.Otherperformancemeasuresmade it clear that limitingthedefininglengthimproved the GA'sperformance to someextent:theGAwasable to findstringswithslightlyhigherfitnesses(intermsofpercentofoptimum)andslightlyhighermean-populationfitnesses.Theseresults,togetherwithresultsfromallexperi mentsdescribedinthispaper,aresummarizedinTa ble1undertheheading De/-Len: 10. Theyare to becomparedwiththevaluesunder Original, giving the resultsfromourreplication of Tanese'straditionalGArunsonorder-8functions.Weconclude that thecontributionoflongdefininglengths to the GA'soverall poorperformanceisnotsignificant.As will bediscussed,onereasonforthisisthatcrossover is notvery effectiveonthesefunctionsinthefirstplace. 4.1.2 Is the GA'spoor performance due to overlapamong significant loci in the partitionindices? Next, we consideredthepossibility that overlapsamongsignificantlociin the partition-indicesj(i.e., lociwith1 '8) werecausingcorrelationsamongtermsin thefitnessfunctions,making the optimizationproblemeffectivelyhigher-order.Asasimpleexample,suppose that 0011 and 0110aretwoorder-2 j's that havenon zeropositivecoefficients.Thereareeightstringsthat willcause "'OOH(X) tobepositive,correspondingtotheschemas ##00 and ##11. Likewise,thereareeightstrings that willcause "'OHO(X) tobepositive,corresponding to theschemas #00# and #11#. So,inordertofindapoint that getsapositivevaluefrom both "'OllO(X) and "'OOH(X), the GAmustdiscoverei therthe schema #000 ortheschema #111.The net effectofthisoverlapis that threebitsarecorrelatedinsteadoftwo,making the problemeffectivelyorder-3insteadoforder-2. In thecase ofthe Tanesefunctions,this is alikelysourceofdifficulty;forexample,withorder-8functions,where 32 order-8termswererandomlygenerated,each1inanygivenjwillonaveragebeamember of 7otherdifferent j's, andthustheeffectivelinkagewillbeexceedinglyhigh.Toassesstheeffectofoverlaps, we ran the GAonstringsoflength 128 rather than 32,inordertoreducethenumberofoverlaps. With astringlength of 128, eachdefinedlocuswithallele1wouldparticipateonaverageinonly2 ofthe 32 partitionindices. As before, we generated 20 randomfunctionsandrantheGAfor 200 generationsoneach.AsshowninTable1(under Str-Len: 128),theGA'sperformancewasremarkablyimproved. The GAfoundtheoptimum19/20times,comparedwith0/20for the 32-bitcase,andcameveryclose to finding the optimumon the otherrun. Of all theexperimentswetried,thiscausedthemost dra- maticimprovement,leadingustoconcludethatthe principlereasontheTanesefunctionsaredifficultisbecausetheshortstrings(32bits) and relativelyhighnumber of terms(32)causesnearlyall32bits to becor-  related,thuseffectivelycreatinganorder32-problem.Inthenon-overlappingcase, it ispossiblefortheGAtooptimizeeach term ofthefunctionalmostindependently. The fact that overlapismuchhigherwith32-bitfunctions than with128-bitfunctionsexplainsthestrikinglydifferentdynamicsbetweenrunsof the GAontheformerand latter functions.Spaceconstraintspreventusfromreportingthese data indetail(see [3]), but we describe them qualitatively.AWalshpolynomialcanbethought of asa"constraint satisfaction"probleminwhicheachtermproducesan additionalconstraint. The goalis to findanindivid ualthatreceivespositivevaluesfromasmanyterms aspossible, and preferablyfromtermswithhighcoefficients.Inatypical run on32-bitstrings,theGAveryquicklyfindsitshighest-fitindividual,byaround generation 30. Thisindividualreceivespositivevalues fromonlyasubsetofthetermsinthefitnessfunc tion,leavingtherest of the terms"unsatisfied,"that is,yieldingnegativevalueson that individual.However,because of thehighdegreeofoverlap, the con straintsatisfactionproblemhereisverysevere,and todiscoveranindividualthatreceivesadditionalpos itivevaluesfromotherterms-withoutlosingtoo many of thecurrentlyheldpositivevalues- is verydifficult(especiallysincecrossoverislargelyineffective onthesefunctions).Thisisparticularlytruesince, onceindividuals of relativelyhighfitnessarediscovered,thediversityof the populationfallsveryquickly.Forexample,ononetypical run thenumberofdifferentchromosomesinthepopulation started out at 256,fellto 39 bygeneration 20, and stayedrelativelycon stant thereafter'. Veryquickly,thepopulationsubdi videsitselfintoasmallnumber of mutuallyexclusive sets:onelargeset of individualsreceivingpositiveval uesfromonelargesubsetofthetermsinthefitnessfunction, and other,muchsmallersetsofindividuals receivingpositivevaluesfromtheother 1 smallersubsets of terms in thefitnessfunction. Tosummarize,theGAstaysstuck at alocalopti mum that is discoveredearly. It ispossible that raising crossoverandmutationratesmightimprovetheGA'sperformance on suchfunctions,sinceitwouldincrease theamount of diversityinthepopulation.Onatypical run involvingstrings of length128,the situationisverydifferent.Insucharun, the GA tendsnottodiscoveritshighest-fitindividualuntillatein the run (onaverage,aroundgeneration150),and the diversityof the populationremainsrelatively lWe countedtwoindividualsasbeingdifferentonly if theyweredifferentatsomesignificantlocus-thatis,atalocusinwhichone of the 32 is hada 1. Asonewouldexpect,withstringsoflength32,everylocuswassignificant,which is notgenerallythecaseforstringsof length 128. highthroughout the run. The continuinghighdiversityof the populationseemstobearesultofseveralfactors,including the following:sincetheproblemis lessconstrained,therearemanywaysinwhichanindi vidualcanachievearelativelyhighfitness; and since thecrossoverratewasdefined on aper-allelebasis,thereare on averageagreaternumber of croSSoversperchromosomeherethanin the 32-bitcase.However, therelativelack of constraintswasthemajorfactor, sincewhencrossoverwasturnedoffinthe128-bitcase, thepopulationdiversityremainedconsiderablyhigher than for the 32-bit run describedabove,although it wasless than in the 128-bitcasewithcrossoverturnedon.Inthe128-bitcase, the populationdoesnotsegregateintomutuallyexclusivesets-throughoutatypicalrun,each of thetermsinthefitnessfunction providepositivevaluestoasignificantfraction of thepopulation.Because of the relativelack of constraints inthe128-bitcase,thepopulationdoesnotquicklybecomelockedintoalocaloptimum, and isable both tocontinueexploringlonger than in the 32-bitcase and tooptimizeeach term ofthefitnessfunctionmoreindependently. 4.1.3 Iscrossover effective onthese functions? As we mentionedearlier,thefacts that (1)aTanese function F evaluatespointsaccording to theirparity with the j's intheterms of F, and (2) that agivenFiscreatedusingpartitionindices of onlyasingleorder,imply that schemas of orderlower than theorderofF do notprovide the GAwithusefulinformation.Thuscrossover,one of themajorstrengths of the GA, isnotausefultoolforrecombiningbuildingblocksuntilschemas of order at leastashighas that ofthe functionhavebeendiscovered.Toverifythisempirically, we rantheGAwithoutcrossoveron 20 randomlygenerated32-bitorder8functions. The resultsaresummarizedinTable1under No Xover 32. The GA'sperformancewasnotimpaired;themaximumfitnessdiscoveredandthemeanpopulationfitnessarenotsignificantlydifferent fromtherunsinwhichcrossoverwasused (Original inthetable).Tofurtherverifytheineffectivenessof crossover,weran theGA withoutcrossover on 20ran domlygenerated128-bitorder8functions. The resultsaresummarizedunder No Xover 128inTable 1: theperformanceoftheGAwasnotsignificantlydifferentfromthe128-bitrunsinwhichcrossoverwasused (StrLen 128). With both theshorter and longerstringlengths(and thus with both largeandsmallamountsofoverlap),whetheror not crossover is useddoesnotseem to makeanydifferenceintheGA'sperformanceonthesefunctions.  Times AverageAverageAverageMax.Optimum セ。クN Fit. Gen. of MeanFit.Found % opt.) Max.Fit. (% opt.)Ungmal 088.12.9) OJ ,"" , 00.1 1"·0 Def-Len: 10092.32.9)41( 48 89.23.0 セエイMl・ョZ 128 1999.97 O.la) 15U (aU 9a.61.3 .No Xover32 088.42.7)22(2886.22.6 .N 0 Xover128 99.85.0.45) n( 419a.9 IU.6 liill 32095.41.5) - - iIlI 128 セu o I.U (U.U) - Table 1: Summary of results of allexperiments. The experimentswere all performedrunningthetraditional GA (or,inonecase,hillclimbing)onrandomlygeneratedorder8Walshpolynomials.Eachresultsummarizes20runs of 200generationseach. The experimentswere:(1) Original (replicatingTanese'sexperiments);(2) Dei-Len:10 (limitingthedefininglength of partitionindicesto 10); (3) Sir-Len: 128 (increasingthestringlengthto 128 bits); (4) NoXaver 32 (same as Original but with no crossover);(5) NaXaver 128 (same as Sir-Len: 128 but withnocrossover); (6) Hill 32 (hillclimbing on 32 bits);and (7) Hill 128 (hillclimbing on 128 bits)Allrunsexceptthe128-bitrunsusedstrings of length32. The valuesgivenare(1)thenumber of timestheoptimumwasfoundj(2)themaximumfitness (% ofoptimum)found(averagedover 20 runs);(3)theaveragegenerationatwhichthemaximumfitnesswasfoundjand(4)themaximumpopulationmean (% of optimum)duringarun(averagedover 20 runs).Thenumbersinparenthesesarethestandarddeviations. 4.1.4 Is the GA'spoor performance due to deceptive functions? Goldberg has recentlyshown that it ispossible to con struct deceptivefunctions by exploitinginteractionsamonglow-orderterms [6], and heproposes the possibility that the Tanesefunctionsaredeceptive.Webelieve that the Tanesefunctionsare not fullydeceptive and that even ifthey arepartiallydeceptive,de ceptionisnot the majorreasonthatthefunctionsare difficult. The Tanesefunctionsarenotfullydeceptive becausetherearesomeschemasthatdonotleadthe GAawayfrom the globaloptimum.Forexample,as wediscussedinSection4.1.3,schemas of lowerorder than the definedorder ofthe functionprovidenoin formationtodirecttheGA'ssearch. It should bepointedout that anyeven-orderTanesefunctionwillhavethe property that f(x) = f(X,ompl,m,n'). Forexample,consider an order-2function with onedefinedpartition-index, j = 0011. ,p;(x) willbepositivefor the schemas ##00 and ##11, and negativefor ##01 and ##10. It is possible to show that this property willholdforanyeven-orderposi tivetermandforanylinearcombination of even-orderterms. Thisproperty implies that forall the Tanesefunctionstherewill be at leasttwoglobal optima that arecomplements of oneanother.Additionally, there will be at leasttwosecondmostfitpoints,againcomplements of oneanother,and so on.Thisfactmayhaveimplicationsforwhetherornotthefunctionsaredecep tive.Oneconsequence of this property is that there will bemutually exclusivefamilies of solutions(trees of schemas) and that the GA may havetroubleselectingwhich peakto climb(seeSection4.1.2).As wehaveshowed,theTanesefunctionsareeffectively higher-order than the level at which theyare defined, andthisfactissufficienttoaccountforthefunctions' difficulty. The fact that the functionsareeffectivelyhigher-orderdoes not imply that they aredeceptive. 4.2 Why does partitioning the population seemto improve performance? Although both thetraditional and partitionedGAsperformedpoorly on the Tanesefunctionswithrespect to the highestfitnessfound, the resultsreported by Taneseshowed that the performance of mostinstances of the partitionedGA was betterthanthat ofthetraditional GA. This was the caseevenwhenthe population wassubdivided into extremelysmallpartitions, such as64 partitionsof 4individualseach,and, in somecases,evenfor128 partitions of 2individualseach. For functions of orderhigher than 4,neitherthe traditional nor partitionedGA's everfoundthefunctionoptimum,so the differencebetween the twowasmeasuredonly in terms of proximity to the optimumof the best fitnessdiscovered. AsTanesepointsout,theseresultsrunagainstconven tionalwisdom about GAs: it has been thought that ondifficultproblemsalarge population is neededforpracessingasufficient number of schemas [4]. Taneseproposesthreemainreasonsforthissurprising result.1.Tanesefunctionshavealarge numberof localop timaand the GA tends to convergeonone.Each ofthe smallersubpopulations of the partitioned GA willconvergeearlier than alargersinglepopulation, but the subpopulations tend toconverge on different optima, and soareable to explorea greaternumber.
Related Search
Advertisements
Similar documents
View more...
Advertisements
We Need Your Support
Thank you for visiting our website and your interest in our free products and services. We are nonprofit website to share and download documents. To the running of this website, we need your help to support us.

Thanks to everyone for your continued support.

No, Thanks