*“Biffures” is French for crossing-outs; these short articles are light explorations, less involved discussions of topics that don’t form a particular sequence. If you are interested in more structured content, check out the Bitwise or the ML series.*

What is the probability of winning 12 games in a Hearthstone arena or duels run? Is the limit of 12 wins anchored in some interesting math?

There are several ways to think about this (more in Biffure #2), and one of them involves making assumptions about the distribution of deck and player strength. Naturally, using the normal distribution came to mind…
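As a first back-of-the-envelope sketch, assume each game is an independent coin flip with win probability p = 0.5, and that a run ends at 12 wins or 3 losses (the arena format); both assumptions are illustrative, not real matchmaking data. The chance of reaching 12 wins is then a negative-binomial tail:

```python
from math import comb

# P(12 wins before a 3rd loss), assuming i.i.d. games with win probability p.
# p = 0.5 is an assumption, not real matchmaking data.
def p_twelve_wins(p=0.5, wins=12, max_losses=3):
    # The run ends 12-l when the 12th win arrives after exactly l losses.
    return sum(comb(wins - 1 + l, l) * p**wins * (1 - p)**l
               for l in range(max_losses))

print(round(p_twelve_wins(), 4))  # ≈ 0.0065, roughly 1 run in 155
```

Under these coin-flip assumptions a 12-win run is already a sub-1% event, which hints at why 12 makes a satisfying cap.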

In a 1640 letter to his friend Frénicle de Bessy, Pierre de Fermat wrote that:

Tout nombre premier mesure infailliblement une des puissances — 1 de quelque progression que ce soit, et l’exposant de la dite puissance est sous-multiple du nombre premier donné — 1 ; et, après qu’on a trouvé la première puissance qui satisfait à la question, toutes celles dont les exposants sont multiples de l’exposant de la première satisfont tout de même à la question.

which, keeping all the archaisms of the original French version, translates to English as:

Every prime number measures infallibly one of the powers minus one of any progression whatever, and the exponent of the said power is a submultiple of the given prime number minus one; and, after one has found the first power that satisfies the question, all those whose exponents are multiples of the exponent of the first satisfy the question just the same.
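In modern notation ("measures" meaning "divides"), this is Fermat's little theorem: for a prime p and any a not divisible by p, a^(p−1) ≡ 1 (mod p), and the multiplicative order of a divides p − 1. A quick numerical check:

```python
# Fermat's little theorem: for prime p and a not divisible by p,
# a**(p - 1) % p == 1; pow(a, e, p) is modular exponentiation.
for p in (3, 5, 7, 13):
    for a in range(2, p):
        assert pow(a, p - 1, p) == 1
```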

- A model parameterised by η is (at least) a tuple ⟨Pη, Rη⟩ of state-transition probabilities and rewards:

- We learn the model by counting the occurrences of transitions and rewards over all visits to each state–action pair. For example, in a two-state problem with states A and B, we can infer a model from experience.
- Once we have a model, we can use a planning algorithm (value iteration, policy iteration, tree search…); however, the recommended approach is to use the model *only to generate samples*.
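For a tabular problem, "learning the model by counting" can be sketched in a few lines; the two-state A/B experience below is illustrative, not taken from the lecture:

```python
from collections import defaultdict

# Tabular model learned by counting, from (state, action, reward, next_state)
# transitions. States "A"/"B" and the experience below are illustrative.
counts = defaultdict(lambda: defaultdict(int))   # (s, a) -> {s': count}
reward_sum = defaultdict(float)                  # (s, a) -> total reward
visits = defaultdict(int)                        # (s, a) -> visit count

experience = [
    ("A", "go", 0.0, "B"),
    ("B", "go", 1.0, "B"),
    ("B", "go", 0.0, "B"),
    ("B", "go", 1.0, "B"),
]
for s, a, r, s2 in experience:
    counts[(s, a)][s2] += 1
    reward_sum[(s, a)] += r
    visits[(s, a)] += 1

# Maximum-likelihood model: P(s' | s, a) and expected reward R(s, a).
P = {sa: {s2: c / visits[sa] for s2, c in d.items()} for sa, d in counts.items()}
R = {sa: reward_sum[sa] / visits[sa] for sa in visits}
```

Sampling from `P` and `R` then gives simulated experience for a model-free learner, which is the "use the model only to generate samples" approach.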

The lectures draw from and complement Sutton and Barto’s book, *Reinforcement Learning: An Introduction*, which is available for free on Rich Sutton’s site, http://incompleteideas.net/.

- We still operate in the context of an unknown Markov Decision Process, and we apply General Policy Improvement to find optimal policies.
- ε-greedy policies follow the greedy action with probability 1−ε and otherwise pick uniformly at random among all actions, so every action keeps a non-zero probability. They allow for continual exploration of all states and actions. We can show that the ε-greedy policy π’ with respect to q_π is an improvement over the initial policy π.
- Greedy in the Limit with Infinite Exploration (GLIE): ε-greedy with…
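The ε-greedy rule above can be sketched in a few lines; the Q-values here are illustrative:

```python
import random

# ε-greedy action selection over a dict of estimated action values.
# With probability ε, explore uniformly over all actions; otherwise exploit.
def epsilon_greedy(Q, epsilon, rng=random):
    if rng.random() < epsilon:
        return rng.choice(list(Q))   # explore: uniform over all actions
    return max(Q, key=Q.get)         # exploit: current greedy action

Q = {"left": 0.1, "right": 0.7}
action = epsilon_greedy(Q, epsilon=0.1)  # "right" about 95% of the time
```

A GLIE schedule is then obtained by decaying ε over time, e.g. ε_k = 1/k, so exploration never stops but the policy becomes greedy in the limit.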

I continue my quest to learn something about reinforcement learning in 60 days (this is day 20), with a 15-hour investment in David Silver’s DeepMind course on Reinforcement Learning, which itself draws from Sutton and Barto’s 1998 book, *Reinforcement Learning: An Introduction*.

- A state S is a *Markov state* if it contains all useful information from the history (see the Markov property, also referred to as “memorylessness”).
- *Value* is the prediction of a series of *rewards*, and is calculated as the expected sum of discounted rewards — similar to the concept of present value in finance.
- Model: *P* is…
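The discounted-sum definition of value can be made concrete for a single reward sequence; the rewards and the discount γ = 0.9 below are illustrative:

```python
# Return G = r_1 + γ·r_2 + γ²·r_3 + …, computed right-to-left.
# The reward sequence and γ are illustrative.
def discounted_return(rewards, gamma):
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

print(discounted_return([1.0, 1.0, 1.0], gamma=0.9))  # 1 + 0.9 + 0.81 ≈ 2.71
```

As in present-value calculations in finance, γ < 1 makes a reward received sooner worth more than the same reward received later.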

We left the last article reflecting on how powerful single neurons are — particularly neurons with Identity or Sigmoid activations, which we now know can solve linear and logistic regressions. We also wondered why, despite these qualities, such neurons are not the obvious choice for neural networks.

We turn to *Deep Sparse Rectifier Neural Networks* (Glorot et al., 2011) for answers, with key findings summarized in the two tables below:
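As context for those findings, the rectifier activation the paper studies is simply max(0, x); a minimal sketch comparing it with the sigmoid:

```python
import math

# Rectifier (ReLU) vs sigmoid activations, as compared in Glorot et al. (2011).
def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# ReLU is exactly zero for negative inputs (sparse activations),
# while the sigmoid saturates near 0 and 1 but never reaches either.
print([relu(x) for x in (-2.0, 0.0, 2.0)])                # [0.0, 0.0, 2.0]
print([round(sigmoid(x), 3) for x in (-2.0, 0.0, 2.0)])   # [0.119, 0.5, 0.881]
```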

A neuron in machine learning looks like this if you Google it:

Here is my own representation of a single neuron — a bit more wavy, impractical when it comes to drawing networks, but hopefully no worse for educational purposes.

In this series, I will attempt to learn something about reinforcement learning in a limited period of time, after work hours, with the intent to have a somewhat performant system built by the end of that period. A system that plays Atari games maybe, or a trading system — we will see.

There is no intended audience for this series, though if you wish to follow it, you should know that I know some fundamentals of machine learning but am no expert. I will try to learn as much as possible, with 50% planning and 50% improvisation, as I will run…

*This continues a series on bitwise operations and their applications, written by a non-expert, for non-experts. Follow **Biffures** for future updates.*

Hash functions transform arbitrarily large bit strings, called *messages*, into small, fixed-length bit strings called *message digests*, such that digests identify the messages that produced them with very high probability. Digests are in that sense fingerprints: a function of the message, simple, yet complex enough to allow identification of their message, with a very low probability that different messages will share the same digest.

In SHA-256, messages up to 2⁶⁴ bits (2.3 exabytes, or 2.3 billion gigabytes)…
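The fingerprint behaviour is easy to see with Python's standard `hashlib`; the messages below are arbitrary:

```python
import hashlib

# Two messages differing by a single character produce unrelated
# 256-bit digests (printed as 64 hexadecimal characters).
d1 = hashlib.sha256(b"biffures").hexdigest()
d2 = hashlib.sha256(b"Biffures").hexdigest()
print(d1)
print(d2)
assert d1 != d2 and len(d1) == 64  # 64 hex chars = 256 bits
```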

*We explained in **Part 1** what bit strings are, and in **Part 2** and **3**, that interesting patterns can be found in bitwise operations such as AND and OR. In Part 4, we accelerate these findings as we present key bitwise operations and their patterns with minimal commentary. Usual caveats: I use the notations found in **FIPS 180–4**; assume bit strings are representations of positive-only integers, most significant bit to the left.*

In the following examples, we work with bit words of fixed length *n*, and write *a, b* for two *n*-bit words. …
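For instance, with n = 8 (the two words below are arbitrary), Python's bitwise operators show the basic operations directly; note that NOT must be masked back down to n bits, since Python integers are unbounded:

```python
# Basic bitwise operations on two 8-bit words; the values are illustrative.
n = 8
mask = (1 << n) - 1            # 0b11111111 keeps results within n bits
a, b = 0b11001010, 0b10110110

print(format(a & b, f"0{n}b"))       # AND -> 10000010
print(format(a | b, f"0{n}b"))       # OR  -> 11111110
print(format(a ^ b, f"0{n}b"))       # XOR -> 01111100
print(format(~a & mask, f"0{n}b"))   # NOT, truncated to n bits -> 00110101
```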

I care about history, education and science.