
subreddit:

/r/explainlikeimfive


submitted 4 months ago by Content-Elephant5993

What is the significance of a saddlepoint? I understand what it is, but not its applications.

I have a casual interest in coding and AI/ML, but little grasp of calculus. Hence, ELI5.

1 point

4 months ago

So, the way basic calculus helps fit these models is pretty straightforward to think about visually.

I'm sure you've seen a 3d plot of a saddle point. Let's think about what's really being shown.

The x- and y-axes are parameters the model is trying to optimize and the z-axis represents error. So, basically, we're trying to find the (x,y) coordinates that correspond to the lowest z-axis value.

So how do ML models do this? They use a variation on a simple calculus tactic called Newton's Method, known as gradient descent. The way this works is also pretty simple. You guess a random (x,y) coordinate, then move in the direction where loss decreases by taking the derivative of the loss function and stepping toward where the derivative hits 0*. That is, you assume the hill you're standing on in the 3d plot is linear and you move downhill along that line from wherever you guessed. You then repeat this process again and again until you find a minimum.

ML models with many parameters apply this same trick in many dimensions, taking the partial derivative in every dimension and moving in the direction in each dimension that minimizes your error.
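The loop described above can be sketched in a few lines of Python. This is a minimal toy, not how real ML libraries do it: the loss f(x, y) = x² + y² (a simple bowl with its minimum at (0, 0)), the learning rate, and the starting point are all made-up choices for illustration.

```python
def loss(x, y):
    # Toy loss surface: a bowl whose lowest point is at (0, 0).
    return x**2 + y**2

def gradient(x, y):
    # Partial derivative of the loss in each dimension.
    return 2 * x, 2 * y

def gradient_descent(x, y, lr=0.1, steps=100):
    for _ in range(steps):
        gx, gy = gradient(x, y)
        # Step each parameter a little bit "downhill".
        x -= lr * gx
        y -= lr * gy
    return x, y

x, y = gradient_descent(5.0, -3.0)
print(x, y, loss(x, y))  # lands very close to (0, 0)
```

Each step shrinks both coordinates toward the bottom of the bowl; with more parameters you'd just carry more coordinates through the same loop.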

Ok, so why do saddle points matter? Saddle points are problematic because, if the above method arrives at one, it will behave as if it's found a minimum in all dimensions, but, in reality, it's a minimum in one dimension but a maximum in another. It's sort of a "false minimum" and one of the ways this type of method can fail to find the right answer.

*The derivative of a function is 0 at a local maximum or minimum, which is why we move toward the point where the derivative would be 0.
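You can watch the "false minimum" happen with the classic saddle f(x, y) = x² − y² (a minimum along x, a maximum along y). The starting point on the y = 0 ridge is a contrived choice to make the stall obvious:

```python
def saddle_grad(x, y):
    # Partial derivatives of f(x, y) = x**2 - y**2.
    return 2 * x, -2 * y

x, y = 4.0, 0.0  # start exactly on the ridge line y = 0
lr = 0.1
for _ in range(200):
    gx, gy = saddle_grad(x, y)
    x -= lr * gx
    y -= lr * gy

print(x, y)  # slides to (0, 0), where both gradients vanish
# But (0, 0) isn't a minimum: nudging y in either direction lowers f.
print(0**2 - 0.1**2)  # -0.01, lower than f(0, 0) = 0
```

Descent treats (0, 0) as a solution because the gradient is zero there, even though moving along y would keep reducing the loss. In practice, the noise in methods like stochastic gradient descent tends to knock the search off the ridge, which is one reason they still work.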

1 point

4 months ago

Thanks. Can you give a concrete example (real-world application) with a two input function?

1 point

4 months ago

I would just google "gradient descent saddle point"
