I listened to a very interesting episode of the Lex Fridman Podcast on statistical learning, featuring Vladimir Vapnik. It covered a variety of topics including cognitive science, philosophy and machine learning. This is an attempt to summarise its contents in easy-to-understand language.
Instrumentalism and Realism
The phrase “does god play dice?” is associated with the debate over whether the world is deterministic or probabilistic. That is, can we discover all the laws and predict how things will turn out, or are things inherently probabilistic, such that nothing can ever be fully understood?
The phrase “god does not play dice” was popularised by Albert Einstein, who believed in a deterministic world and rejected the probabilistic roots of quantum physics. Fridman throws this question at his guest, and the answer is refreshingly honest.
We don't know some factors. And because we don't know some factors, which could be important, it looks like God plays dice.
Vladimir Vapnik’s answer on Lex Fridman Podcast
It’s really fascinating to think about the process of learning. How does one learn? What are we learning in reality? Let’s set aside our agnostic or atheistic tendencies for a moment and imagine that the whole universe is a creation of god.
And we humans are living in this simulation, trying to make sense of what god did. The act of making sense of god’s actions/creation is “learning”. In that sense, the veteran researcher Vladimir Vapnik says that we can look at god’s act in two ways:
A position of instrumentalism: the seeker creates a theory for predicting the outcome in any given scenario. (Example: Newton’s laws of motion)
A position of realism: the seeker tries to understand what god did (i.e. the phenomenon itself). Most machine learning models are built to predict; however, in real terms they are learning a conditional probability (see the sketch just after this list).
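That distinction is easy to see in code. Below is a minimal sketch of my own (not from the episode) using scikit-learn: the logistic regression is used as a predictor, but the object it actually estimates is the conditional probability P(y | x).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: one feature, binary label, where the true
# P(y = 1 | x) follows a sigmoid in x.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 1))
y = (rng.random(1000) < 1 / (1 + np.exp(-2 * X[:, 0]))).astype(int)

model = LogisticRegression().fit(X, y)

# predict() returns a hard label (the instrumentalist view), but
# under the hood it just thresholds the learned conditional
# probability at 0.5.
print(model.predict(np.array([[0.5]])))        # a prediction
print(model.predict_proba(np.array([[0.5]])))  # the learned P(y | x = 0.5)
```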
The speakers then go on to discuss how god speaks to humans in the language of mathematics.
Leaping ahead of math
Are there moments of brilliance in human intuition that can leap ahead of math, and then the math will catch up?
Lex Fridman
Apparently not. At best, humans can come up with axioms and later prove them via mathematics. What we imagine or cook up in our heads is in no way relevant to the real solution of the problem, though it might inspire us to find the solution in some way.
Process of Learning
Do you think we will ever have the tools to try to describe that process of learning?
Lex Fridman
According to the speaker, this is not possible. Whatever we might find is not a description of what’s going on; it is an interpretation. He goes on to explain how the inventor of the microscope observed blood and came up with the wrong interpretation of what he saw. Similarly, we might observe something and come up with an interpretation, but we might never reach the truth.
A great teacher
People say that one day with a great teacher is better than a thousand days of diligent studies. But if I ask you what a teacher does, nobody knows. And that is intelligence. But we know from history and now from math and machine learning that a great teacher can do a lot.
We can say what a great teacher can do. He can introduce some invariants, some predicates for creating invariants. How does he do it? I don’t know, because the teacher knows reality and can describe, from this reality, the predicates and invariants. But we know that when you use an invariant, you can decrease the number of observations a hundred times.
Vladimir Vapnik (paraphrased with grammar corrections)
This is a fascinating part of the podcast because it tells you what a great teacher can bring to the table. But at this point I need to dive into the terms “invariant” and “predicate”.
Predicate: A logical condition that returns true or false.
Invariant: A condition that stays true throughout the execution of a specific section of code (usually a loop). In the context of cognition, one can replace “code” with “the learning process”.
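To make both terms concrete, here is a small Python sketch of my own (not from the podcast): is_sorted is a predicate, and “result stays sorted” is the invariant that the predicate checks on every pass of the loop.

```python
def is_sorted(xs):
    """Predicate: returns True or False for a given list."""
    return all(xs[i] <= xs[i + 1] for i in range(len(xs) - 1))

def insertion_sort(xs):
    result = []
    for x in xs:
        # Invariant: `result` is sorted at the start of every
        # iteration; the predicate lets us assert it.
        assert is_sorted(result)
        # Insert x at the position that keeps `result` sorted.
        i = 0
        while i < len(result) and result[i] <= x:
            i += 1
        result.insert(i, x)
    assert is_sorted(result)  # the invariant still holds at the end
    return result

print(insertion_sort([3, 1, 2]))  # [1, 2, 3]
```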
Strong Convergence Mechanism and Weak Convergence Mechanism
Apparently there are two modes of learning.
Strong Convergence: Guarantees almost sure convergence to the exact solution for each instance of the algorithm.
Weak Convergence: Guarantees that the distribution of the algorithm’s output converges to the distribution of the true solution, allowing for more noise or variance across instances. (more details on this later)
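In standard probability notation (my own formalization of the two definitions above), for a sequence of estimates of a true value θ:

```latex
% Strong convergence: the iterates themselves converge almost surely.
P\!\left( \lim_{n \to \infty} \hat{\theta}_n = \theta \right) = 1

% Weak convergence: only the distributions converge, i.e. the CDFs
% approach F_\theta at every continuity point x of F_\theta.
\lim_{n \to \infty} F_{\hat{\theta}_n}(x) = F_{\theta}(x)
```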
From what I can infer, humans learn via the weak convergence mechanism. An English proverb is used to explain this thought.
If it looks like a duck, swims like a duck, and quacks like a duck, then it is probably a duck.
In this scenario we are using predicates, and specifically ones relevant to the situation. You need to answer three questions with ‘yes’ to confirm that the creature is a duck:
Does it look like a duck?
Does it swim like a duck?
Does it quack like a duck?
Note that we are also not using useless predicates to confirm whether a creature is a duck. Nobody wants to check if it plays chess like a duck, because ducks don’t play chess.
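The duck test maps directly onto the predicate idea from earlier. Here is a toy sketch of my own (the creature representation and predicate names are invented for illustration):

```python
# Three relevant predicates; each returns True or False.
def looks_like_duck(c):
    return c.get("appearance") == "duck-like"

def swims_like_duck(c):
    return c.get("swimming_style") == "duck-like"

def quacks_like_duck(c):
    return c.get("sound") == "quack"

def probably_a_duck(creature):
    # Only predicates relevant to the question are checked;
    # "plays chess like a duck" would add no information.
    return (looks_like_duck(creature)
            and swims_like_duck(creature)
            and quacks_like_duck(creature))

creature = {"appearance": "duck-like",
            "swimming_style": "duck-like",
            "sound": "quack"}
print(probably_a_duck(creature))  # True
```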
Some more research on this topic via ChatGPT led me to the following information.
In machine learning, strong convergence and weak convergence mechanisms typically refer to the mathematical concepts of how an algorithm converges to a solution over time. These concepts are derived from probability theory and statistics, and they apply to the behavior of learning algorithms. Here's an explanation of each:
1. Strong Convergence Mechanism
Definition: Strong convergence refers to the almost sure convergence of an algorithm's solution to the true value or optimal point. This means that as the number of iterations increases, the solution will almost certainly reach the optimal value, provided that the model is sufficiently complex and the data is large enough.
Key Characteristics:
The probability that the solution converges to the optimal value approaches one.
It’s a pointwise guarantee—meaning that each specific instance of the algorithm’s sequence will get arbitrarily close to the true solution.
It typically requires stricter conditions on the learning algorithm (e.g., stability, smoothness of the objective function, or convexity).
Strong convergence is often difficult to achieve in complex models because it requires the solution to converge almost exactly in each trial.
Example in Machine Learning: A gradient descent algorithm might strongly converge if, for every learning rate and initialization, it reaches the true minimum of a loss function with probability 1 as the number of iterations tends to infinity.
2. Weak Convergence Mechanism
Definition: Weak convergence refers to the convergence in distribution of an algorithm’s output to the true solution. It guarantees that the distribution of the algorithm’s output gets closer to the distribution of the true solution, but not necessarily on a point-by-point basis.
Key Characteristics:
It’s a distributional guarantee—meaning that the overall distribution of solutions from different iterations converges to the distribution of the true solution.
Weak convergence is often easier to achieve than strong convergence, as it allows for some variability or noise in individual iterations, but the overall performance improves over time.
Many stochastic optimization algorithms are designed to achieve weak convergence.
Example in Machine Learning: Stochastic gradient descent (SGD) may exhibit weak convergence, as the average behavior of the iterates converges to the optimal solution, but individual iterates may oscillate due to noise.
Source: ChatGPT
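To make the contrast concrete, here is a toy simulation of my own (not from the podcast or the ChatGPT excerpt above): plain gradient descent lands on the same minimum every run, while SGD’s individual iterates keep oscillating even though their distribution settles around the optimum.

```python
import numpy as np

rng = np.random.default_rng(42)

# Convex objective f(x) = (x - 3)^2, with its minimum at x = 3.
def grad(x):
    return 2 * (x - 3)

# Plain gradient descent: deterministic, per-run convergence --
# the flavour of behaviour strong convergence describes.
x = 10.0
for _ in range(200):
    x -= 0.1 * grad(x)
print(f"GD final iterate:  {x:.4f}")  # ~3.0000 on every run

# SGD with noisy gradients: individual iterates keep bouncing
# around, but their distribution centres on the true minimum --
# the flavour of weak convergence.
x, iterates = 10.0, []
for _ in range(5000):
    x -= 0.05 * (grad(x) + rng.normal(scale=2.0))
    iterates.append(x)
print(f"SGD final iterate: {x:.4f}")                         # still noisy
print(f"SGD iterate mean:  {np.mean(iterates[1000:]):.4f}")  # ~3.0
```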
Deep Learning and Neural Networks
Mathematics does not know deep learning. Mathematics does not know neurons. It operates only on the basis of functions.
Vladimir Vapnik
The discussion goes on to how a set of admissible functions is required to solve a problem. It also highlights that mathematics as a field does not require deep learning or neural networks; at the end of the day, it is driven by functions.
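This is essentially the classical statistical learning setup that Vapnik pioneered: learning as the selection of a function from an admissible set. In standard notation (my summary of that framework, not a quote from the episode):

```latex
% Learning as function selection: choose f from an admissible set
% \mathcal{F} so as to minimize the expected risk
R(f) = \int L\bigl(y, f(x)\bigr)\, dP(x, y), \qquad f \in \mathcal{F},

% and, since P(x, y) is unknown, minimize the empirical risk
% over the training sample instead:
R_{\mathrm{emp}}(f) = \frac{1}{n} \sum_{i=1}^{n} L\bigl(y_i, f(x_i)\bigr).
```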
What’s fascinating about humans is that we can learn more from less training data (in comparison to a machine). The speaker is not impressed by AlphaGo defeating a top-ranked player; he suggests that this success only indicates that the problem was not so complicated.
The conversation from this point veers off into philosophical abstraction from which I could not gather meaningful takeaways. But documenting these things has helped me retain the knowledge that could make sense to me in the long run.
I hope this summary/reflection on the podcast episode piqued your interest and perhaps inspired you to listen to the entire 54-minute episode.