CPU Time Jitter Based Non-Physical True Random Number Generator

Stephan Müller <smueller@chronox.de>

Abstract

Today’s operating systems provide non-physical true random number generators which are based on hardware events. With the advent of virtualization and the ever-growing need for high-quality random numbers, these random number generators reach their limits. Additional sources of entropy must be opened up. This document introduces an entropy source based on CPU execution time jitter. The design and implementation of a non-physical true random number generator, the CPU Jitter random number generator, its statistical properties, and the maintenance and behavior of entropy are discussed in this document.

1 Introduction

Each modern general purpose operating system offers a non-physical true random number generator. In Unix derivatives, the device file /dev/random allows user space applications to access such a random number generator. Most of these random number generators obtain their entropy from time variances of hardware events, such as block device accesses, interrupts triggered by devices, operations on human interface devices (HID) like keyboards and mice, and other devices.
Limitations of these entropy sources are clearly visible.
How can these challenges be met? A new source of entropy must be developed that is not affected by the mentioned problems.
This document introduces a non-physical true random number generator, called CPU Jitter random number generator, which is developed to meet the following goals:
  1. The random number generator shall only operate on demand. Other random number generators operate constantly during their lifetime, regardless of whether their operation is needed or not, binding computing resources.
  2. The random number generator shall always return random numbers with a speed that satisfies today’s requirement for random numbers. The random number generator shall be able to be used synchronously with the random number consuming application, such as the seeding of a deterministic random number generator.
  3. The random number generator shall not block requests for user-noticeable time spans.
  4. The random number generator shall deliver high-quality random numbers when used in virtualized environments.
  5. The random number generator shall not require a seeding with data from previous instances of the random number generator.
  6. The random number generator shall work equally well in kernel space and user space.
  7. The random number generator implementation shall be small, and easily understood.
  8. The random number generator shall provide a decentralized source of entropy. Every user that needs random numbers executes its own instance of the CPU Jitter random number generator. Any denial of service attack or other attack against a central entropy source with the goal of decreasing the level of entropy maintained by the central entropy source is eliminated. The goal is that there is no need for a central /dev/random or /dev/urandom device.
  9. The random number generator shall provide perfect forward and backward secrecy, even when the internal state becomes known.
Apart from these implementation goals, the random number generator must comply with the general quality requirements placed on any (non-)physical true random number generator:
Entropy The random numbers delivered by the generator must contain true information-theoretical entropy. The information-theoretical entropy is based on the definition given by Shannon.
Statistical Properties The random number bit stream generated by the generator must not follow any statistically significant patterns. The output of the proposed random number generator must pass all standard statistical tools analyzing the quality of a random data stream.
These two basic principles will be the guiding central theme in assessing the quality of the presented CPU Jitter random number generator.
The document contains the following parts:
But now away with the theoretical blabber: show me the facts! What is the central source of entropy that is the basis for the presented random number generator?

1.1 Related Work

Another implementation of a random number generator based on CPU jitter is provided with HAVEGEd. A similar approach is proposed in The maxwell random number generator by Sandy Harris.
An analysis of the system-inherent entropy in the Linux kernel is given with the Analysis of inherent randomness of the Linux kernel by N. Mc Guire, P. Okech, and G. Schiesser.

2 CPU Execution Time Jitter

We do have deterministically operating CPUs, right? Our operating systems behave fully deterministically, right? If that were not the case, how could we ever have operating systems using CPUs that deliver deterministic functionality?
Current hardware supports the efficient execution of the operating system by providing hardware facilities, including:
In addition to the hardware nondeterminism, the following operating system caused system usage adds to the non-deterministic execution time of sets of instructions:

2.1 Assumptions

The CPU Jitter random number generator is based on a number of assumptions. Only when these assumptions are upheld, the data generated can be believed to contain the requested random numbers. The following assumptions apply:

2.2 Jitter Depicted

With the high complexity of modern operating systems and their big monolithic kernels, all the mentioned hardware components are extensively used. However, due to this complexity, nobody is able to determine the fill level of the caches or branch prediction units, or the precise location of data in memory, at any given time.
This implies that the execution of instructions has minuscule variations in execution time. In addition, modern CPUs have a high-resolution timer or instruction counter that is so precise that it is impacted by these tiny variations. For example, modern x86 CPUs have a TSC clock whose resolution is in the nanosecond range.
These variations in the execution time of an identical set of CPU instructions can be visualized. Figure 1↓ illustrates the variation of the following code sequence:
static inline void jent_get_nstime(uint64_t *out)
{
...
        if (clock_gettime(CLOCK_REALTIME, &time) == 0)
...
}
​
int main(void)
{
...
        for (i = 0; i < (SAMPLE_COUNT + CACHE_KILL); i++)
        {
...
                jent_get_nstime(&time);
                jent_get_nstime(&time2);
                delta = time2 - time;
...
}
Collection of Time Variances in Userspace
The content of the variable delta is not identical between the individual loop iterations. When running the code with a loop count of 1,000,000 on an otherwise quiet system to avoid additional time variance from the noise of other processes, we get data as illustrated in figure 1↓.
figure pics/userspace-time-dist-delta-64-hist.png
Figure 1 Distribution of time variances in user space over 1,000,000 loops
Please note that the actual results of the aforementioned code contain a few exceptionally large deltas, as an operating system can never be fully quiesced. Thus, the test results were processed to cut off all time deltas above 64. The limitation of the graph to all variations up to 64 can be considered a ‘‘magnification’’ of the data set to the interesting values.
Figure 1↑ contains the following information of interest to us:
The graph together with the code now illustrates the variation in execution time of the very same set of operations — it illustrates the CPU execution time jitter for a very tight loop. As these variations are based on the aforementioned complexity of the operating system and its use of hardware mechanisms, no observer can deduce the next variation with full certainty even though the observer is able to fully monitor the operation of the system. And these non-deterministic variations are the foundation of the proposed CPU Jitter random number generator.
As the CPU Jitter random number generator is intended to work in kernel space as well, the same analysis is performed for the kernel. The following code illustrates the heart of the data collection disregarding the details on copying the data to user space [A]  [A] For details on how to perform the test, see the tests_kernel/getstat.sh script and the functionality discussed in appendix B.2↓..
static int jent_timer(char *data, size_t len)
{
        __u64 time, time2;
        time = time2 = 0;
...
        time = random_get_entropy();
        time2 = random_get_entropy();
        snprintf(data, len, "%llu\n", (time2 - time));
...
}
Collection of Time Variances in Kernel Space
Although the code sequence is slightly different from the user space code due to the use of the architecture-dependent processor cycle function call random_get_entropy, the approach is identical: obtaining two time stamps and returning the delta between both. The time stamp variance collection is invoked 30,000,000 times to obtain the graph presented in figure 2↓ [B]  [B] The generation of the given number of time deltas is very fast, typically less than 10 seconds. Thus, the shown graph is not fully representative. When re-performing the test, the distribution varies greatly, including the Shannon Entropy. The lowest observed value was in the 1.3 range and the highest was about 3. The reason for not obtaining a longer sample is simply resources: calculating the graph would take more than 8 GB of RAM.
figure pics/kernel-time-dist-delta-128-hist.png
Figure 2 Distribution of time variances in kernel space over 10,000,000 loops
Striking differences between the timer variances in kernel and user space can be detected:
Even with the kernel time stamp incremented in steps of three, both the user space and the kernel space time stamps show a distribution and fluctuation. When looking at the sequence of time deltas gathered during testing [D]  [D] The test result logs can be found in tests_kernel/timer-dist-kernel.data and tests_userspace/timer-dist-userspace.data., no pattern can be detected. Therefore, the fluctuation and the resulting distribution are not based on a repeating pattern and must be considered random.
The tests were executed on an Intel Core i7 with 2.7GHz. As the tests always consume much CPU power, the frequency of the CPU was always at 2.7 GHz during the tests. Surprisingly, when setting the maximum speed of the CPU to 800MHz, which is the lowest setting on the test machine, the distribution of the kernel timer variations hardly changes. For user space, the timer variations are larger compared to a fast CPU on an otherwise quiet system, as depicted in figure 3↓. As the variations are even larger on a slower system, all subsequent discussions will cover the worst case of the fast CPU speed illustrated above, as its variations inherently contain less entropy.
figure pics/userspace-800MHz-time-dist-delta-150-hist.png
Figure 3 Distribution of time variances in user space over 1,000,000 loops at 800 MHz
Now that we have established the basic source of entropy, the subsequent design description of the random number generator must explain the following two aspects which are the basic quality requirements discussed in chapter 1↑ applied to our entropy phenomenon:
  1. The random number generator design must be capable of preserving and collecting the entropy from the discussed phenomenon. Thus, the random number generator must be able to ‘‘magnify’’ the entropy phenomenon.
  2. The random number generator must use the observed CPU execution time jitter to generate an output bit string that delivers the random numbers to a caller. That output string must not show any statistical anomalies that allow an observer to deduce random numbers or to increase the probability of guessing random numbers and thus reducing their entropy.
With the following chapter, the design of the random number generator is presented. Both requirements will be discussed.

3 Random Number Generator Design

The CPU Jitter random number generator uses the above illustrated operation to read the high-resolution timer for obtaining time stamps. At the same time it performs operations that are subject to the CPU execution time jitter which also impact the time stamp readings.

3.1 Maintenance of Entropy

The heart of the random number generator is illustrated in figure 4↓.
figure pics/big-picture.png
Figure 4 Entropy Collection Operation
The random number generator maintains a 64 bit unsigned integer variable, the entropy pool, that is indicated with the dark gray shaded boxes in figure 4↑, which identify the entropy pool at two different times in the processing. The light gray shaded boxes indicate the noise sources the random number generator is based on.
In a big picture, the random number generator implements an entropy collection loop that
  1. invokes memory accesses to induce timing variations,
  2. fetches a time stamp to calculate a delta to the time stamp of the previous loop iteration,
  3. folds the time delta value into one bit,
  4. processes this value with a Von-Neumann unbias operation,
  5. adds this value to the entropy pool using XOR,
  6. rotates the pool to fill the next bit value of the pool,
  7. verifies that basic properties of the time stamp are met.
The loop is executed exactly 64 times, as each loop iteration generates one bit to fill all 64 bits of the entropy pool. After the loop finishes, the contents of the entropy pool are stirred with a constant and given to the caller as a 64 bit random number [E]  [E] If the caller provides an oversampling rate of greater than 1 during the allocation of the entropy collector, the loop iteration count of 64 is multiplied by this oversampling rate value. For example, an oversampling rate of 3 implies that the 64 loop iterations are executed three times — i.e. 192 times.
The following subsections discuss every step in detail.
When considering that the time delta is always calculated as the delta to the previous loop iteration, and the fact that the majority of the execution time is spent in the folding loop, the central idea of the CPU Jitter Random Number Generator is to measure the execution time jitter over the execution of the folding loop as well as the memory access.
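The following minimal C sketch illustrates the structure of this loop under the design described above. The helpers jent_memaccess and jent_fold_time are hypothetical stand-ins with simplified signatures for the noise sources detailed in the following subsections; the sketch does not claim to match the reference implementation:
#include <stdint.h>
​
/* hypothetical stand-ins (signatures simplified) for the noise sources */
void jent_memaccess(void);         /* noise source: memory access */
uint64_t jent_fold_time(void);     /* folded, Von-Neumann unbiased bit */
​
/* rotate a 64 bit value left by k bits */
static inline uint64_t rotl64(uint64_t x, unsigned int k)
{
        return (x << k) | (x >> (64 - k));
}
​
/* one entropy collection loop producing one 64 bit random number */
static uint64_t entropy_collection_loop(uint64_t pool)
{
        unsigned int i;
​
        /* 64 iterations, multiplied by the oversampling rate if set */
        for (i = 0; i < 64; i++) {
                jent_memaccess();            /* induce timing variations */
                pool ^= jent_fold_time();    /* delta, fold, unbias, XOR */
                pool = rotl64(pool, 1);      /* rotate to fill the next bit */
        }
        return pool;    /* stirred with a constant before being returned */
}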

3.1.1 Noise Source: Memory Access

For implementing memory access, memory is allocated during the allocation time of the entropy collector. The memory access operation is defined by the following values:
The size of the memory can be obtained by multiplying the size of a memory block with the number of blocks. By default, one block is 64 bytes in size and 32 blocks are defined.
To perform a read and write access, one byte is simply incremented by one and wrapped at 255. The code ensures that all bytes of the memory are accessed evenly by maintaining an index pointing to the last byte in the memory that was accessed.
This index is incremented by the size of one memory slot minus 1. In addition, the index is wrapped if it would point beyond the end of the allocated memory. The following code snippet shows the handling of the index:
wrap = ec->memblocksize * ec->memblocks - 1;
...
        ec->memlocation = ec->memlocation + ec->memblocksize - 1;
        if (ec->memlocation > wrap)
                ec->memlocation -= wrap;
This code ensures that every byte in the memory is accessed once before one byte is accessed a second time.
The memory is accessed in a loop whose length is defined by the number of access operations. The number of access operations is defined by a time stamp taken immediately before the memory access noise source is triggered: the lowest 7 bits of the high-resolution time stamp plus a static value of 64 are used. This implies that the number of memory accesses varies between 64 and 191. Measurements have shown that even with the smallest tested CPUs an even distribution of the memory access operations is achieved.
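A minimal sketch of this noise source, using the default sizes from above and hypothetical parameter names that may differ from the reference implementation, could read:
#include <stdint.h>
​
#define MEMBLOCKSIZE 64         /* default size of one memory block in bytes */
#define MEMBLOCKS    32         /* default number of memory blocks */
​
/* sketch of the memory access noise source over the allocated memory */
static void jent_memaccess(unsigned char *mem, unsigned int *location,
                           uint64_t time)
{
        unsigned int wrap = MEMBLOCKSIZE * MEMBLOCKS - 1;
        /* lowest 7 bits of the time stamp plus 64: 64 to 191 accesses */
        uint64_t acc_loop_cnt = (time & 0x7f) + 64;
        uint64_t i;
​
        for (i = 0; i < acc_loop_cnt; i++) {
                /* read-write access: increment one byte, wrapping at 255 */
                mem[*location] = (mem[*location] + 1) & 0xff;
                /* advance by one memory slot size minus one and wrap */
                *location = *location + MEMBLOCKSIZE - 1;
                if (*location > wrap)
                        *location -= wrap;
        }
}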

3.1.2 Obtaining Time Delta

The time delta is obtained by:
  1. Reading a time stamp,
  2. Subtracting that time stamp from the time stamp calculated in the previous loop iteration,
  3. Storing the current time stamp for use in the next loop iteration to calculate the next delta.
For every new request to generate a new random number, the first iteration of the loop is used to ‘‘prime’’ the delta calculation. In essence, all steps of the entropy collection loop are performed, except for mixing the delta into the pool and rotating the pool. This first iteration does not count toward the number of iterations used for entropy collection, which is implemented by executing one more loop iteration than specified for the generation of the current random number.
When a new random number is to be calculated, i.e. the entropy collection loop is triggered anew, the previous contents of the entropy pool, which were used as the random number in the previous round, are reused. The reuse shall just mix the data in the entropy pool even more. The implementation, however, does not rely on any properties of that data. The mixing of new time stamps into the entropy pool using XOR ensures that any entropy which may have been left over from the previous entropy collection loop run is still preserved. If no entropy is left, which is the base case in the entropy assessment, the already arbitrary bit pattern in the entropy pool does not negatively affect the addition of new entropy in the current round.
When the time delta is obtained, the following properties of the time are also measured: the first, second and third derivatives of the time. Only when all three values are non-zero is the resulting bit of the time delta after the folding operation described in the following considered valid — section 3.1.5↓ explains how this validity is enforced.

3.1.3 Noise Source: Folding Operation of Time Delta

The folding operation is depicted by the left side of figure 4↑. That folding operation is implemented by a loop where the loop counter is not fixed.
The folding operation is considered a noise source, as the execution of the folding operation contains variations that are measured with the time delta measurement from the first step.
To calculate the new fold loop counter, a new time stamp is obtained. All bits above the value MAX_FOLD_LOOP_BITS — which is set to 4 — are zeroed. The idea is that the fast-moving bits of the time stamp value determine the size of the collection loop counter. Why is it set to 4? The 4 low bits define a value between 0 and 2⁴ − 1, i.e. a value between 0 and 15. This uncertainty is used to quickly stabilize the distribution of the output of the folding operation to an equidistribution of 0 and 1, i.e. about 50% of all output is 0 and about 50% is 1. See section 5.2.2↓ for a quantitative analysis of that distribution. To ensure that the collection loop counter has a minimum value, the value 2⁰ is added — that value is controlled with MIN_FOLD_LOOP_BIT. Thus, the range of the folding counter value is from 2⁰ to (2⁴ + 2⁰ - 1), i.e. from 1 to 16. Now, this newly determined collection loop counter is used to perform a new fold loop as discussed in the following.
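A sketch of this loop count derivation, using the constants named above, might look as follows; it is an illustration, not the reference code:
#include <stdint.h>
​
#define MAX_FOLD_LOOP_BITS 4    /* upper boundary: 2^4 */
#define MIN_FOLD_LOOP_BIT  0    /* lower boundary: 2^0 */
​
/* derive the folding loop count from the fast-moving time stamp bits */
static uint64_t jent_fold_loop_cnt(uint64_t time)
{
        /* zero all bits above MAX_FOLD_LOOP_BITS: a value of 0 to 15 */
        uint64_t cnt = time & ((1ULL << MAX_FOLD_LOOP_BITS) - 1);
​
        /* enforce the minimum of 2^0, yielding a range of 1 to 16 */
        return cnt + (1ULL << MIN_FOLD_LOOP_BIT);
}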
Figure 5↓ shows the concept of the folding operation of one time delta value.
figure pics/time_folding.png
Figure 5 Folding of the time delta and mixing it into the entropy pool
The upper 64 bit value illustrated in figure 5↑ is the time delta obtained at the beginning of the current entropy collection loop iteration. Now, the time delta is partitioned into chunks of 1 bit starting at the lowest bit. The different shades of gray indicate the different 1 bit chunks. The 64 1 bit chunks of the time value are XORed with each other to form a 1 bit value. With the XORing of all 1 bit chunks with each other, any information-theoretical entropy that is present in the time stamp will be preserved when folding the value into the 1 bit. But as we fold it into 1 bit, the maximum entropy the time stamp can ever add to the entropy pool is, well, 1 bit. The folding operation is repeated as often as specified by the loop count.
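A sketch of the folding operation may look as follows; note that repeating the fold does not change the resulting bit — the repetition only varies the execution time that the next time delta measurement picks up:
#include <stdint.h>
​
/* fold a 64 bit time delta into one bit by XORing all 1 bit chunks */
static uint64_t jent_fold_delta(uint64_t delta, uint64_t fold_loop_cnt)
{
        uint64_t folded = 0;
        uint64_t i;
​
        for (i = 0; i < fold_loop_cnt; i++) {
                uint64_t bit = 0;
                unsigned int j;
​
                /* XOR the 64 one-bit chunks of the delta with each other */
                for (j = 0; j < 64; j++)
                        bit ^= (delta >> j) & 1;
                folded = bit;
        }
        return folded;  /* holds at most one bit of entropy */
}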

3.1.4 Von-Neumann Unbias Operation

According to RFC 4086 section 4.2, a Von-Neumann unbias operation can be used to remove any potential skew that may be present in the bit stream of the noise source. The operation is applied to ensure that, in case skews are present, they are eliminated. The unbias operation is only applicable if the individual consecutive bits are considered independent. Chapter 5↓ indicates the independence of these individual bits.
To perform the Von-Neumann unbias operation, two independently generated folded bits are processed.
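The classic algorithm operates as sketched below: if the two bits differ, the first bit is emitted; if they are equal, both are discarded and a new pair is generated:
/* Von-Neumann unbias: returns 1 if an output bit was produced */
static int jent_von_neumann(unsigned int bit1, unsigned int bit2,
                            unsigned int *out)
{
        if (bit1 == bit2)
                return 0;       /* equal bits are discarded */
        *out = bit1;            /* 01 yields 0, 10 yields 1 */
        return 1;
}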

3.1.5 Adding Unbiased Folded Time Delta To Entropy Pool

After obtaining the 1 bit folded and unbiased time stamp, how is it mixed into the entropy pool? The lower 64 bit value in figure 5↑ indicates the entropy pool. The 1 bit folded value is XORed with 1 bit from the entropy pool.
But which bit is used? The rotation to the left by 1 bit that concludes the entropy collection loop provides the answer. When the entropy collection loop performs the very first iteration, the 1 bit is XORed into bit 0 of the entropy pool. Then the pool is rotated left by 1 bit. That means that bit 63 before the rotation becomes bit 0 after the rotation. Thus, the next round of the entropy collection loop XORs the 1 bit folded time stamp again into bit 0, which used to be bit 63 in the last entropy collection loop iteration [F]  [F] Note, figure 5↑ illustrates that the folded bit of the time delta is moved over the 64 bit entropy pool as indicated with the bold black box (a.k.a. the ‘‘slider’’). Technically, the slider stays at bit 0 and the entropy pool value rotates left. The end result of the mixing of the folded bit into the entropy pool, however, is identical, regardless of whether you rotate the entropy pool left or move the slider to the right. To keep the figure illustrative, it indicates the movement of the slider..
The reason why the rotation is done with the value 1 is that we have 1 bit we want to add to the pool. The way the folded bit values are added to the entropy pool can be viewed differently from a mathematical standpoint when considering 64 1 bit values: instead of saying that each of the 64 1 bit values is XORed independently into the entropy pool and the pool value is then rotated, it is equivalent to state that 64 1 bit values are concatenated and the concatenated value is then XORed into the entropy pool. The reader shall keep that analogy in mind as we will need it again in chapter 5↓.
During the time delta measurement, various derivatives of the time are calculated as already discussed. The folded bit is added to the entropy pool regardless of whether the time measurement is considered valid, i.e. whether the derivatives are all non-zero. However, the mixed-in bit only counts toward the 64 bits to be produced if all derivatives are non-zero. Otherwise, the entropy pool is merely mixed and rotated.

3.1.6 Stirring The Entropy Pool

After the raw 64 bits from the entropy collection loop are obtained, the bits are stirred by XORing them with a constant.
The constant is calculated by using the first two 32 bit SHA-1 initialization vectors as a starting point. A loop is performed 64 times where these initialization vectors are XORed into the constant variable if the i-th bit of the entropy pool is set. In each loop iteration, the constant variable is rotated left by one bit before the next iteration is started.
The goal is to generate a constant variable that is dependent on the bits in the entropy pool. That variable shall only mix the pool further without losing any entropy collected in the entropy pool.
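A sketch of this stirring step follows; the concatenation of the two SHA-1 initialization vectors into one 64 bit starting value is an assumption made for illustration:
#include <stdint.h>
​
/* stir the entropy pool with a pool-dependent constant */
static uint64_t jent_stir_pool(uint64_t pool)
{
        /* first two 32 bit SHA-1 initialization vectors as starting point
         * (assumed concatenation) */
        const uint64_t ivs = 0x67452301EFCDAB89ULL;
        uint64_t constant = ivs;
        unsigned int i;
​
        for (i = 0; i < 64; i++) {
                /* XOR the IVs into the constant if pool bit i is set */
                if (pool & (1ULL << i))
                        constant ^= ivs;
                /* rotate the constant left by one bit per iteration */
                constant = (constant << 1) | (constant >> 63);
        }
        /* XORing the constant mixes the pool without losing entropy */
        return pool ^ constant;
}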

3.2 Generation of Random Number Bit Stream

We now know how one 64 bit random number value is generated. The interface to the CPU Jitter random number generator allows the caller to provide a pointer to memory and a size variable of arbitrary length. The random number generator is herewith requested to generate a bit stream of random numbers of the requested size that is to be stored in the memory pointed to by the caller.
The random number generator performs the following sequence of steps to fulfill the request:
  1. Check whether the requested size is smaller than 64 bits. If yes, generate one 64 bit random number, copy the requested number of bits to the target memory and stop processing the request. The unused bits of the random number are discarded. If a new request arrives, a fresh 64 bit random number is generated.
  2. If the requested size is larger than 64 bits, generate one random number, copy it to the target. Reduce the requested size by 64 bits and decide now whether the remaining requested bits are larger or smaller than 64 bits and based on the determination, follow either step 1 or step 2.
Mathematically step 2 implements a concatenation of multiple random numbers generated by the random number generator.
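A minimal sketch of this request handling, assuming a helper entropy_collection_loop that yields one 64 bit random number, could look like this:
#include <stdint.h>
#include <string.h>
​
uint64_t entropy_collection_loop(void); /* yields one 64 bit random number */
​
/* fill a caller-provided buffer of arbitrary length with random bytes */
static void jent_gen_stream(unsigned char *data, size_t len)
{
        while (len > 0) {
                uint64_t rnd = entropy_collection_loop();
                /* step 1: partial block, copy only the requested bytes;
                 * step 2: full block, copy all 64 bits and continue */
                size_t chunk = (len < sizeof(rnd)) ? len : sizeof(rnd);
​
                memcpy(data, &rnd, chunk);
                data += chunk;
                len -= chunk;
        }
        /* unused bits of a partial block are discarded */
}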

3.3 Initialization

The CPU Jitter random number generator is initialized in two main parts. At first, a consuming application must call the jent_entropy_init(3) function which validates some basic properties of the time stamp. Only if this validation succeeds, the CPU Jitter random number generator can be used [G]  [G] The importance of this call is illustrated in appendix F.31↓ as well as other sections in appendix F↓ where some CPUs are not usable as an entropy source..
The second part can be invoked multiple times. Each invocation results in the instantiation of an independent copy of the CPU Jitter random number generator. This allows a consumer to maintain multiple instances for different purposes. That second part is triggered with the invocation of jent_entropy_collector_alloc(3) and implements the following steps:
  1. Allocation and zeroization of memory used for the entropy pool and helper variables — struct rand_data defines the entropy collector which holds the entropy pool and its auxiliary values.
  2. Invoking the entropy collection loop once — this fills the entropy pool with the first random value which is not returned to any caller. The idea is that the entropy pool is initialized with some values other than zero. In addition, this invocation of the entropy collection loop implies that the entropy collection loop counter value is set to a random value in the allowed range.
  3. If FIPS 140-2 is enabled by the calling application, the FIPS 140-2 continuous test is primed by copying the random number generated in step 2 into the comparison value and triggering the entropy collection loop again for a fresh random number.
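A typical user space consumer would therefore perform both parts as sketched below. The header name and the signature of jent_read_entropy follow the reference implementation’s man pages, but should be treated as assumptions here:
#include <stddef.h>
#include <jitterentropy.h>      /* assumed header of the reference library */
​
int get_random_bytes(char *buf, size_t buflen)
{
        struct rand_data *ec;
​
        /* part one: validate the basic properties of the time stamp */
        if (jent_entropy_init())
                return -1;      /* CPU is not usable as an entropy source */
​
        /* part two: instantiate one independent collector instance
         * (oversampling rate 1, no flags) */
        ec = jent_entropy_collector_alloc(1, 0);
        if (!ec)
                return -1;
​
        /* request a random bit stream of the desired length */
        if (jent_read_entropy(ec, buf, buflen) < 0) {
                jent_entropy_collector_free(ec);
                return -1;
        }
​
        jent_entropy_collector_free(ec);
        return 0;
}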

3.4 Memory Protection

The CPU Jitter random number generator is intended for any consuming application without placing any requirements on it. As a standard behavior, after completing the caller’s request for a random number, i.e. generating the bit stream of arbitrary length, another round of the entropy collection loop is triggered. That invocation shall ensure that the entropy pool is overwritten with a new random value. This prevents a random value returned to the caller, and potentially used for sensitive purposes, from lingering in memory for a long time. In case paging starts, the consuming application crashes and dumps core, or simply a hacker cracks the application, no traces of even parts of a generated random number will be found in the memory the CPU Jitter random number generator is in charge of.
In case a consumer implements its own type of memory protection, the flag CRYPTO_CPU_JITTERENTROPY_SECURE_MEMORY can be set at compile time. This flag disables the above mentioned functionality.
Example consumers with memory protection are the kernel, and libgcrypt with its secure memory.

3.5 Locking

The core of the CPU Jitter random number generator implementation does not use any locking. If a user intends to employ the random number generator in an environment with potentially concurrent accesses to the same instance, locking must be implemented. A lock should be taken before any request to the CPU Jitter random number generator is made via its API functions.
Examples for the use of the CPU Jitter random number generator with locks are given in the reference implementations outlined in the appendices.
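As an illustration, a hypothetical user space wrapper using POSIX threads could serialize access as follows; this is a sketch, not the locking used by the reference implementations:
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <jitterentropy.h>      /* assumed header, see section 3.3 */
​
static pthread_mutex_t jent_lock = PTHREAD_MUTEX_INITIALIZER;
static struct rand_data *shared_ec;     /* instance shared by all threads */
​
/* serialize concurrent requests against the shared instance */
int locked_read(char *buf, size_t buflen)
{
        ssize_t ret;
​
        pthread_mutex_lock(&jent_lock);
        ret = jent_read_entropy(shared_ec, buf, buflen);
        pthread_mutex_unlock(&jent_lock);
        return ret < 0 ? -1 : 0;
}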

3.6 Intended Method of Use

The CPU Jitter random number generator must be compiled without optimizations. The discussion in section 5.1↓ supported by appendix F↓ explains the reason.
The interface discussed in section 3.2↑ is implemented such that a caller requesting an arbitrary number of bytes is satisfied. The output can be fed through a whitening function, such as a deterministic random number generator or a hash based cryptographically secure whitening function. The appendix provides various implementations of linking the CPU Jitter random number generator with deterministic random number generators.
However, the output can also be used directly, considering the statistical properties and the entropy behavior assessed in the following chapters. The question, however, is whether this is a wise course of action. Whitening shall help protect the entropy that is in the pool against observers. This is especially a concern if you have a central entropy source that is accessed by multiple users — where a user does not necessarily mean a human user or application, since a user or an application may serve multiple purposes and each purpose is one ‘‘user’’. The CPU Jitter random number generator is designed to be instantiated multiple times without degrading the different instances. If a user employs its own private instance of the CPU Jitter random number generator, it may be questionable whether a whitening function is necessary.
But bottom line: it is a decision that the reader or developer employing the random number generator finally has to make. The implementations offered in the appendices offer the connections to whitening functions. Still, a direct use of the CPU Jitter random number generator is offered as well.

3.7 Programming Dependencies on Operating System

The implementation of the CPU Jitter random number generator only uses the following interfaces from the underlying operating system. All of them are implemented with wrappers in jitterentropy-base-*.h. When the used operating system offers these interfaces, or a developer replaces them accordingly, the CPU Jitter random number generator can be compiled on a different operating system or for user and kernel space:
The following additional functions provided by an operating system are used without a wrapper as they are assumed to be present in every operating environment:

4 Random Generator Statistical Assessment

After the discussion of the design of the entropy collection, we need to perform assessments of the quality of the random number generator. As indicated in chapter 1↑, the assessment is split into two parts.
This chapter contains the assessment of the statistical properties of the data in the entropy pool and the output data stream.
When compiling the code of the CPU Jitter random number generator with the flag CRYPTO_CPU_JITTERENTROPY_STAT, instrumentations are added to the code that obtain the data for the following graphs and distributions. The tests can be automatically re-performed by invoking the tests_[userspace|kernel]/getstat.sh shell script which also generates the graphs using the R-Project language toolkit.

4.1 Statistical Properties of Entropy Pool

During a testing phase that generated 1,000,000 random numbers, the entropy pool is observed. The observation generated statistical analyses for different aspects illustrated in table 1↓. Each line in the table is one observation of the entropy pool value of one round of the entropy collection loop. To read the table, assume that the entropy pool is only 10 bits in size. Further, assume that our entropy collection loop count is 3 to generate a random number.
The left column contains the entropy collection loop count and the indication for the result rows. The middle columns are the 10 bits of the entropy pool. The Bit sum column sums the set bits in the respective row. The Figure column references the figures that illustrate the obtained test data results.
Loop count   0  1  2  3  4  5  6  7  8  9   Bit sum   Figure
1            0  1  1  0  0  0  1  0  1  1   N/A       N/A
2            0  0  0  1  0  1  1  1  0  0   N/A       N/A
3            1  1  0  0  1  0  1  0  0  0   4         6↓
Result 1     1  2  1  1  1  1  3  1  1  1   13        8↓
Result 2     1  2  1  2  1  2  0  2  1  1   13        10↓
Table 1 Example description of tests
The ‘‘Result 1’’ row holds the number of bits set for each loop count per bit position. In the example above, bit 0 has a bit set only once in all three loops. Bit 1 is set twice. And so on.
The ‘‘Result 2’’ row holds the number of changes of the bits for each loop count compared to the previous loop count per bit position. For example, for bit 0, there is only one change from 0 to 1 between loop count 2 and 3. For bit 7, we have two changes: from 0 to 1 and from 1 to 0.
The graphs contain the same information as explained for figure 1↑.
The bit sum of loop count 3 simply holds the number of set bits at the last iteration of the entropy collection loop that generates one random number. It is expected that this distribution closely follows a normal distribution, because only such a normal distribution implies a rectangular distribution of the probability that each bit is equally likely to be picked when generating a random number output bit stream. Figure 6↓ contains the distribution of the bit sum for the generated random numbers in user space.
figure pics/userspace_distribution-bitsumsObservation-hist.png
Figure 6 Bit sum of last round of entropy collection loop user space
In addition, the kernel space distribution is given in figure 7↓ — they are almost identical and thus show the same behavior of the CPU Jitter random number generator.
figure pics/kernel_distribution-bitsumsObservation-hist.png
Figure 7 Bit sum of last round of entropy collection loop kernel space
Please note that the black line in the graphs above is an approximation of the density of the measurements using the histogram. If more histogram bars were used, the approximation would better fit the theoretical normal distribution curve given with the red dotted line. Thus, the difference between both lines is due to the way the graph is drawn and is not seen in the actual numbers. This also applies to the bars of the histogram, since they are left-aligned, which means that on the left side of the diagram they overstep the black line and on the right side they are within the black line.
The distribution for ‘‘Result 1’’ of the sum of these set bits is given in figure 8↓.
figure pics/userspace_distribution-bitsumsBitposition-hist.png
Figure 8 Bit sum of set bits per bit position in user space
Again, for the kernel we have an almost identical distribution shown in figure 9↓. And again, we conclude that the behavior of the CPU Jitter random number generator in both worlds is identical.
figure pics/kernel_distribution-bitsumsBitposition-hist.png
Figure 9 Bit sum of set bits per bit position in kernel space
A question about the shape of the distribution should be raised. One can have no clear expectations about the distribution other than it must show the following properties:
The distribution for ‘‘Result 2’’ of the sum of these bit variations in user space is given in figure 10↓.
figure pics/userspace_distribution-deltas-hist.png
Figure 10 Bit sum of bit variations per bit position in user space
Just like above, the plot for the kernel space is given in figure 11↓.
figure pics/kernel_distribution-deltas-hist.png
Figure 11 Bit sum of bit variations per bit position in kernel space
Just like for the preceding diagrams, no material difference is obvious between kernel and user space. The shape of the distributions is similar to the one for the distribution of set bits. An expected distribution can also not be given apart from the aforementioned properties.

4.2 Statistical Properties of Random Number Bit Stream

The discussion of the entropy in chapter 5↓ tries to show that one bit of random number contains one bit of entropy. That is only possible if we have a rectangular distribution of the bits per bit position, i.e. each bit in the output bit stream has an equal probability to be set. The CPU Jitter random number block size is 64 bit. Thus, when generating a random number, each of the 64 bits must have an equal chance to be selected by the random number generator. Therefore, when generating large amounts of random numbers and summing the bits per bit position, the resulting distribution must be rectangular. Figure 12↓ shows the distribution of the bit sums per bit position for a bit stream of 10,000,000 random numbers, i.e. 640,000,000 bits.
figure pics/random-bitcount-hist.png
Figure 12 Distribution of bit count per bit position of RNG output
Figure 12↑ looks pretty rectangular. But can the picture be right with all its 64 vertical lines? We support the picture by printing the box plot in figure 13↓ that shows the variance when focusing on the upper end of the columns.
figure pics/random-bitcount-box.png
Figure 13 Box plot of variations in bit count per bit position of RNG output
The box plot shows the very narrow fluctuation around the expected mean value of half of the count of random numbers produced, i.e. 5,000,000 in our case. Each bit of a random number has a 50% chance to be set in one random number. When looking at multiple random numbers, a bit still has the chance of being set in 50% of all random numbers. The fluctuation is very narrow considering the sample size visible on the scale of the ordinate of figure 12↑.
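This narrow fluctuation matches what a simple binomial model predicts. Treating each bit position as n = 10,000,000 independent trials with probability p = 0.5 — an idealization used here only for illustration — the expected standard deviation per bit position is
\sigma = \sqrt{n \cdot p(1-p)} = \sqrt{10^7 \cdot 0.5 \cdot 0.5} \approx 1581
Even a fluctuation of 3σ ≈ 4743 is below 0.1% of the mean of 5,000,000 and therefore consistent with the narrow boxes of figure 13↑.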
Thus, we conclude that the bit distribution of the random number generator allows the possibility to retain one bit of entropy per bit of random number.
This conclusion is supported by more thorough statistical properties of the random number bit stream, which are assessed with the following tools:
The ent tool is given a bit stream consisting of 10,000,000 random numbers (i.e. 80,000,000 Bytes) with the following result where ent calculates the statistics when treating the random data as bit stream as well as byte stream:
$ dd if=/sys/kernel/debug/jitterentropy/seed of=random.out bs=8 count=10000000
​
# Byte stream
$ ent random.out 
Entropy = 7.999998 bits per byte.
​
Optimum compression would reduce the size
of this 80000000 byte file by 0 percent.
​
Chi square distribution for 80000000 samples is 272.04, and randomly
would exceed this value 25.00 percent of the times.
​
Arithmetic mean value of data bytes is 127.4907 (127.5 = random).
Monte Carlo value for Pi is 3.141600679 (error 0.00 percent).
Serial correlation coefficient is 0.000174 (totally uncorrelated = 0.0).
​
# Bit stream
$ ent -b random.out 
Entropy = 1.000000 bits per bit.
​
Optimum compression would reduce the size
of this 640000000 bit file by 0 percent.
​
Chi square distribution for 640000000 samples is 1.48, and randomly
would exceed this value 25.00 percent of the times.
​
Arithmetic mean value of data bits is 0.5000 (0.5 = random).
Monte Carlo value for Pi is 3.141600679 (error 0.00 percent).
Serial correlation coefficient is -0.000010 (totally uncorrelated = 0.0).
ent statistical test
During many re-runs of the ent test, the Chi-Square test showed a result of 50% most of the time, i.e. a perfect result — but even the shown 25% is absolutely in line with a random bit pattern. Very similar results were obtained when executing the same test on:
In addition, an unlimited bit stream is generated and fed into dieharder. The test results are given with the files tests_userspace/dieharder-res.*. The result files demonstrate that all statistical properties tested by dieharder are covered appropriately.
The BSI Test Suite A shows no statistical weaknesses.
The test tools indicate that the bit stream complies with the properties of random numbers.

4.3 Anti-Tests

The statistical analysis given above indicates a good quality of the random number generator. To support that argument, an ‘‘anti’’ test is pursued to show that the quality is not provided by the post-processing of the time stamp data, but solely by the randomness of the time deltas. The post-processing therefore is only intended to transform the time deltas into a bit string with a random pattern and magnifying the timer entropy.
The following subsections outline different ‘‘anti’’ tests.

4.3.1 Static Increment of Time Stamp

The test is implemented by changing the function jent_get_nstime to maintain a simple value that is incremented by 23 every time a time stamp is requested. The value 23 is chosen as it is a prime. Yet, the increment is fully predictable and does not add any entropy.
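A sketch of such a modification, analogous to the pattern-based variant shown in section 4.3.2 below, could look as follows:
static __u64 tmp = 0;
static inline void jent_get_nstime(__u64 *out)
{
        /* deterministic counter instead of a real time stamp */
        tmp += 23;
        *out = tmp;
}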
Analyzing the output bit stream shows that the Chi-Square test of ent in both byte-wise and bit-wise output will result in the value of 0.01 / 100.00 which indicates a bit stream that is not random. This is readily clear, because the time delta calculation always returns the same value: 23.
Important remark: The mentioned test can only be conducted when the CPU Jitter random number generator initialization function jent_entropy_init(3) is not called. This function implements a number of statistical tests of the time source. If the time source operated in static increments, the initialization function would detect this behavior and return an error.
If the CPU Jitter random number generator were used with a cryptographically secure whitening function, the outlined ‘‘anti’’ test would not show any problems in the output stream. That means that a cryptographic whitening function would hide potential entropy source problems!

4.3.2 Pattern-based Increment of Time Stamp

Contrary to the static increment of the time stamp, this ‘‘anti’’ test describes a pattern-based increment of the time stamp. The time stamp is advanced by the sum of 23 and an additional increment between 1 and 4 using the following code:
static unsigned int pad = 0;
static __u64 tmp = 0;
static inline void jent_get_nstime(__u64 *out)
{
        /* advance by 24, 25, 26, 27, 24, ... per invocation */
        tmp += 24 + (pad & 0x3);
        pad++;
        *out = tmp;
}
Pattern-based Increment of Time Stamp
The code adds 24 in the first loop, 25 in the second, 26 in the third, 27 in the fourth, again 24 in the fifth, and so forth.
Using such a pattern would again fail the ent test, as the Chi-Square test is at 100 or 0.01 and the data stream can be compressed. Thus, such a time stamp increment would again be visible in the statistical analysis of this chapter.
In addition to the Chi-Square test, the measurements of the second derivative of the time stamp, the variations of time deltas, would present very strange patterns like zero or spikes, but no continuously falling graph as measured.

4.3.3 Disabling of System Features

The CPU jitter is based on properties of the system, such as caches. Some of these properties can be disabled in either user space or kernel space. The effect of such changes is measured in appendix 6.1↓.

5 Entropy Behavior

As the previous chapter covered the statistical properties of the CPU Jitter random number generator, this chapter provides the assessment of the entropy behavior. With this chapter, the second vital aspect of random number generators mentioned in chapter 1↑ is addressed.
The CPU Jitter random number generator does not maintain any entropy estimator. Nor does the random number generator try to determine the entropy of the individual recorded time deltas that are fed into the entropy pool. There is only one basic rule that the CPU Jitter random number generator follows: upon completion of the entropy collection loop, the entropy pool contains 64 bits of entropy, which are returned to the caller. That results in the basic conclusion that the random number bit stream returned from the CPU Jitter random number generator holds one bit of entropy per bit of random number.
Now you may say, that is a nice statement, but show me the numbers. The following sections will demonstrate the appropriateness of this statement.
Section 5.1↓ explains the base source of entropy for the CPU Jitter random number generator. This section explains how the root cause of entropy is visible in the CPU Jitter random number generator. Section 5.2↓ then explains how the entropy that is present in the root cause, the CPU execution time jitter, is harvested, maintained through the processing of the random number generator and accumulated in the entropy pool. This section provides the information-theoretical background to back up the statistical analyses given in chapter 4↑.
Before we start with the entropy discussion, please let us make one issue perfectly clear: the nature of entropy, which is an indication of the level of uncertainty present in a set of information, can by definition not be calculated. All we can do is try to find arguments as to whether the entropy estimation the CPU Jitter random number generator applies is valid. Measurements are used to support that assessment. Moreover, the discussion must contain a worst case analysis which gives a lower boundary of the entropy assumed to be present in the random number bit stream extracted from the CPU Jitter random number generator.

5.1 Base Entropy Source

As outlined in chapter 3↑, the variations of the time delta are the source of entropy.
The design specification already indicates that multiple noise sources support the operation of the RNG. The following subsections discuss the individual noise sources.

5.1.1 Noise Sources Depicted

Unlike the graphs outlined in chapter 2↑, where two time stamps are taken immediately after each other, the CPU Jitter random number generator places the folding loop between each time stamp gathering. That implies that the CPU jitter over the folding loop is measured and used as a basis for entropy.
Considering the fact that the CPU execution time jitter over the folding loop is the source of entropy, we can determine the following:
When viewing both findings together, we can conclude that the CPU jitter of the time deltas that each folding loop shows must exceed 1 bit of entropy. Only this way can we ensure that the folded time delta value has one bit of entropy — see section 5.2.2↓ for an explanation of why the folding operation retains the entropy present in the time delta up to one bit.
Tests are implemented that measure the variations of the time delta over an invocation of the folding loop. The tests are provided with the tests_userspace/timing/jitterentropy-foldtime.c test case for user space, and the stat-fold DebugFS file for testing the kernel space.
The design of the folding loop in section 3.1.3↑ explains that the number of folding loop iterations varies between 2⁰ and 2⁴ iterations. The testing of the entropy of the folding loop must identify the lower boundary and the upper boundary. The lower boundary is the minimum entropy the folding loop will have in any case: this minimum entropy is the entropy observable over a fixed folding loop count. The test uses 2⁰ as the fixed folding loop count. On the other hand, the upper boundary of the entropy is set by allowing the folding loop count to float freely within the above mentioned range.
It is expected that the time stamps used to calculate the folding loop count are independent of each other. Therefore, the entropy observable with the testing of the upper boundary is expected to identify the entropy of the CPU execution time jitter. Nonetheless, if the reader questions the independence, the reader must conclude that the real entropy falls within the measured range between the lower and upper boundary.
Figure 14↓ presents the lower boundary of the folding loop executing in user space of the test system. The graph shows two peaks, where the higher peak is centered around the execution time when the code is in the CPU cache. For the time when the code is not in the CPU cache — such as during context switches or during the initial invocations — the average execution time is larger, with the center at the second peak. In addition, figure 16↓ provides the upper boundary of the folding loop. With the graph of the upper boundary, we see 16 spikes, which are the spikes of the lower boundary scattered by the folding loop counter. If the folding loop counter is 2⁰, the variation of the time delta is centered around a lower value than the variations of a folding loop counter of 2¹, and so on. As the variations of the delta are smaller than the differences between the means of the different distributions, we observe the spikes.
The following graphs use the time deltas of 10,000,000 invocations of the folding loop. To eliminate outliers, time delta values above the number outlined in the graphs are simply cut off. That means, when using all values of the time delta variations, the calculated Shannon Entropy would be higher than listed in the legend of the graphs. This cutting off therefore is yet again driven by the consideration of determining the worst case.
The CPU Jitter RNG is based on two noise sources. The following graphs depict the folding noise source separately from the memory access noise source, as the memory access noise source may be disabled during allocation. The following graphs only show the memory access noise source with the lowest number of memory accesses, to show the additional impact of just the memory accesses. Graphs for varying memory accesses are not shown any more, as they just show a more or less perfect rectangular distribution (i.e. the varying memory accesses make the noise sources even better).
figure pics/userspace-foldtime-i7.O0-time-dist-delta-650-hist.png
Figure 14 Lower boundary of entropy over folding loop in user space
figure pics/userspace-foldtime-i7-memacc.O0-time-dist-delta-1300-hist.png
Figure 15 Lower boundary of entropy over folding loop and memory access in user space
figure pics/userspace-foldtime-i7.O0-time-dist-delta-8500-hist.png
Figure 16 Upper boundary of entropy over folding loop in user space
figure pics/userspace-foldtime-i7-memacc.O0-time-dist-delta-4500-hist.png
Figure 17 Upper boundary of entropy over folding loop and memory access in user space
In addition to the user space measurements, figures 18↓ and 19↓ present the lower and upper boundary of the folding loop execution time variations in kernel space on the same system. Again, the lower boundary is above 2 bits and the upper above 6 bits of Shannon Entropy.
figure pics/kernel-foldtime-time-dist-delta-330-hist.png
Figure 18 Lower boundary of entropy over folding loop in kernel space
figure pics/kernel-foldtime-time-dist-delta-5000-hist.png
Figure 19 Upper boundary of entropy over folding loop in kernel space
As this measurement is the basis of all entropy discussion, appendix F↓ shows the measurements for many different CPUs. These measurements show that the lower and upper boundaries are, with few exceptions, always much higher than the required one bit of entropy. All tests are executed with optimized code as a worst case assessment, and sometimes with the non-optimized compilation to show the difference. For one CPU, section F.26↓ shows that the lower boundary is below 1 bit of Shannon Entropy. When re-performing the test with non-optimized code, as required in section 3.6↑, the lower boundary is well above 2 bits of entropy and therefore sufficient for the entropy collection loop. This test shows that disabling optimizations is vital for the CPU Jitter random number generator. In addition, when enabling the memory access, the timing variations are much greater still, sufficient for the RNG operation.
For the other CPUs whose lower boundary is below 1 bit but which are permitted by the jent_entropy_init function, statistical tests are performed to verify that no cycles are present. This implies that the entropy is closer to the upper boundary and therefore well above 1 bit. But again, when enabling memory accesses, the entropy rises way above 1 bit.
The reader should also consider that the measured Shannon Entropy is a conservative measurement, as the test invokes the folding loop millions of times successively. This implies that for the entire duration of the test, caches, branch prediction units and similar are mostly filled with the test code and thus have hardly any impact on the variations of the time deltas. In addition, the test systems are kept idle as much as possible to limit the number of context switches which would have an impact on the cache hits. In real-life scenarios, the caches are typically filled with information that has a big impact on the jitter measurements and thus increases the entropy.
With these measurements, we can conclude that the CPU execution time jitter over the folding loop contains, even in the worst case, more than double the required entropy. Thus, the measured entropy of the CPU execution time jitter that is the basis of the CPU Jitter random number generator is much higher than required.
The reader may now object and say that the measured values for the Shannon Entropy are not appropriate for the real entropy of the execution time jitter, because the observed values may present some patterns. Such patterns would imply that the real entropy is significantly lower than the calculated Shannon Entropy. This argument can easily be refuted by the statistical tests performed in chapter 4↑. If patterns occurred, some of the statistical tests would indicate problems. Specifically, the Chi-Square test is very sensitive to any patterns. Moreover, the ‘‘anti’’ tests presented in section 4.3↑ explain that patterns are easily identifiable.
Fast Fourier Transformation
When applying a Fast Fourier Transformation (FFT) to the raw time delta input data before folding, an interesting observation can be made: there is no noticeable pattern in the raw time delta. Only one spike is visible: the expected spike at the zero point. Even when applying the FFT to the oldest or smallest CPUs, no big spikes are visible.
FFTs are calculated on the time deltas when setting the memory access loop numbers and the folding loop numbers to the minimum — i.e. without the variations added by these different loop sizes. Regardless of the type of CPU the FFT is calculated for, a more or less perfect rectangular distribution with some bubbling is visible. The second FFT is applied to the time deltas of the normal operation. Again, an almost perfect rectangular distribution is seen.
In the following, example graphs for a small CPU, a MIPS 4Kec V6.8, are shown. Note, the bigger the CPU, the more perfect the FFTs appear. To make the graphs readable, the spike at the zero point is eliminated.
figure pics/userspace-foldtime-mips4Kec-V6.8-memaccess-memaccvar.O0.data-single-fft50-cutoff_1.png
Figure 20 FFT of minimum fold and memory access loop counts
figure pics/userspace-foldtime-mips4Kec-V6.8-memaccess-memaccvar.O0.data-varying-fft50-cutoff_1.png
Figure 21 FFT of normal operation of fold and memory access loop counts
Impact of Frequency Scaling and Power Management on Execution Jitter
When measuring the execution time jitter on a system with a number of processes active, such as a system with the X11 environment and KDE active, one can identify that the absolute numbers of the execution time of a folding loop are higher at the beginning than throughout the measurement. The behavior of the jitter over time is therefore an interesting topic. The following graph plots the first 100,000 measurements [H]  [H] The measurements of the folding loop execution time were re-performed on the same system that is used for section 5.1↑. As the measurements were re-performed, the absolute numbers vary slightly from the ones in the previous section. where all measurements of time deltas above 600 were removed to make the graph more readable (i.e. the outliers are removed). It is interesting to see that the execution time has a downward trend that stabilizes after some 60,000 folding loops. The downward trend, however, is not continuous but occurs in steps. The cause for this behavior is the frequency scaling (Intel SpeedStep) and power management of the system. Over time, the CPU scales up to the maximum processing power. Regardless of the CPU processing power level, the most important aspect is that the oscillation within each step has a similar ‘‘width’’ of about 5 to 10 cycles. Therefore, regardless of the stepping of the execution time, the jitter is present with an equal amount! Thus, frequency scaling and power management do not alter the jitter.
figure pics/i7-variation_foldtime-100000.png
Figure 22 Variations of the execution time jitter over time when performing folding loop jitter measurements with Frequency Scaling / Power Management
When ‘‘zooming’’ in into the graph at different locations, as done below, the case is clear that the oscillation within each step remains at a similar level.
figure pics/i7-variation_foldtime-1000.png
Figure 23 Variations of the execution time jitter over time when performing folding loop jitter measurements with Frequency Scaling / Power Management — ‘‘zoomed in at measurements 1,000 - 3,000’’
figure pics/i7-variation_foldtime-42000.png
Figure 24 Variations of the execution time jitter over time when performing folding loop jitter measurements with Frequency Scaling / Power Management — ‘‘zoomed in at measurements 42,000 - 44,000’’
The constant variations support the conclusion that the CPU execution time jitter is agnostic of frequency scaling and power management levels.
To compare, the same tests were re-run on the same system with frequency scaling and power management disabled. The following graphs show the results.
figure pics/i7-nopower-variation_foldtime-100000.png
Figure 25 Variations of the execution time jitter over time when performing folding loop jitter measurements with Frequency Scaling / Power Management disabled
figure pics/i7-nopower-variation_foldtime-1000.png
Figure 26 Variations of the execution time jitter over time when performing folding loop jitter measurements with Frequency Scaling / Power Management disabled — ‘‘zoomed in at measurements 1,000 - 3,000’’
figure pics/i7-nopower-variation_foldtime-42000.png
Figure 27 Variations of the execution time jitter over time when performing folding loop jitter measurements with Frequency Scaling / Power Management disabled — ‘‘zoomed in at measurements 42,000 - 44,000’’

5.2 Flow of Entropy

Entropy is a phenomenon that is typically characterized with the formula for the Shannon Entropy H
H = -\sum_{i=1}^{N} p_i \log_2(p_i)
where N is the number of samples and p_i is the probability of sample i. As the Shannon Entropy formula uses the logarithm to base 2, the formula results in the number of bits of entropy present in an observed sample.
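As an illustration only, the following sketch computes the Shannon Entropy over a histogram of observed values, the way the entropy figures in this document are derived from measured time deltas; the helper name and interface are assumptions for this example (link with -lm):
#include <math.h>
#include <stddef.h>

/* Illustrative sketch: Shannon Entropy in bits over a histogram of
 * n distinct observed values, where total is the sum of all counts. */
static double shannon_entropy(const unsigned long *hist, size_t n,
                              unsigned long total)
{
        double h = 0.0;
        size_t i;

        for (i = 0; i < n; i++) {
                if (!hist[i])
                        continue;
                double p = (double)hist[i] / (double)total;
                h -= p * log2(p);
        }
        return h;
}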
Considering the logarithm in the Shannon Entropy formula, one has to be careful about which operations can be applied to data believed to contain entropy so as not to lose it. The following operations are allowed with the following properties:
  1. Concatenation of independent bit strings: the entropy of the resulting string is the sum of the entropies of the concatenated strings.
  2. XOR of independent bit strings: the entropy of the resulting string is at least as large as the larger entropy of the two input strings, i.e. XORing a string holding less entropy into a string holding more entropy does not diminish the entropy of the latter.
Any other operation, including the partially overlapping concatenation of strings, will diminish the entropy in the resulting string in ways that are not easily determined. These properties set the limits within which the CPU Jitter random number generator can process the time stamps into a random bit stream.
The graphs about the distribution of time deltas and their variations in section 5.1↑ include an indication of the Shannon Entropy, which is calculated from the observed samples using the mentioned formula. In each case, the Shannon Entropy is well above 1 bit — a value which is fundamental to the following discussion.

5.2.1 First Operation: Memory Access

The memory access operation does not operate on the timing value or any of its derivatives. Although the memory access is one of the causes of the timing variations and thus of entropy, it does not have any impact on the maintenance of entropy throughout the RNG operation.

5.2.2 Second Operation: Folding of Time Delta

According to the implementation illustrated with figure 4↑, the operation after the CPU Jitter random number generator obtains a time delta is the folding operation. The list of allowed operations includes the XOR operation. The folding is an XOR operation over the 64 1 bit slices of the 64 bit time delta. The XOR operation does not diminish the entropy of the overall time delta when considered as slices. The overall time delta is expected to have more than 1 bit of entropy according to the figures in section 5.1↑. The string size after the folding is 1 bit and can thus not hold more than 1 bit of entropy.
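The following minimal sketch illustrates the idea of the folding operation; it is an illustration under the description above, not the reference implementation's exact code:
#include <stdint.h>

/* Illustrative sketch: fold a 64 bit time delta into 1 bit by XORing
 * all 64 one-bit slices. The result is the parity of the delta and
 * can hold at most 1 bit of entropy. */
static uint64_t fold_time_delta(uint64_t delta)
{
        uint64_t folded = 0;
        int i;

        for (i = 0; i < 64; i++)
                folded ^= (delta >> i) & 1;
        return folded;
}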
To measure that entropy, the folding operation is closely analyzed with the test tests_userspace/timing/jitterentropy-folding.c. This test performs the folding operation as illustrated in the left hand side of figure 4↑, i.e. a time delta is created and folded. The folded value is recorded and a new folding operation is performed. The distribution of the bit value — an integer ranging from 0 to 1 — resulting from the folding operation is recorded. Figure 28↓ shows the distribution of this test when measuring 10,000,000 invocations of the time stamp gathering with the folding operation applied.
figure pics/userspace-folding-val-hist.png
Figure 28 Measurement of time folding operation
The distribution shows that both values have an equal chance of being selected. That implies that the Shannon Entropy is 1.0, as recorded in the legend of the diagram. We conclude that the folding operation retains 1 bit of entropy provided that the input, i.e. the timing value, holds 1 or more bits of entropy.
Note, the repetition of the folding loop is of no harm to the entropy as the same value is calculated during each folding loop execution.

5.2.3 Third Operation: Von-Neumann Unbias

The Von-Neumann unbias operation does not have an effect on the entropy of the source. The mathematical proof is given in section 5.4.1 of the document ‘‘A proposal for: Functionality classes for random number generators’’, Version 2.0, by Werner Schindler, issued by the German BSI.
The requirement on using the Von-Neumann unbias operation rests on the fact that the inputs to the unbias operation are two independent bits; a minimal sketch of the unbias operation follows the list below. The independence is established by the following facts:
  1. The bit value is determined by the time delta, which is affected by the CPU execution jitter. That jitter is considered independent of the CPU operation before the time delta measurement.
  2. The delta value is calculated relative to the previous execution loop iteration. That means that the two loop iterations generate deltas based on each individual loop. The delta of the first loop iteration is not part of the delta of the second loop (as it would be if the second delta measured the time of both loop iterations), nor is the delta of the second loop iteration affected by the first operation, based on the finding in bullet 1.
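The classic Von-Neumann unbias step on a pair of independent input bits can be sketched as follows; this is an illustration, not the reference implementation's exact code:
/* Illustrative sketch of the Von-Neumann unbiaser: a 01 pair emits 0,
 * a 10 pair emits 1, and 00/11 pairs are discarded. Returns 1 when an
 * unbiased output bit was produced in *out. */
static int von_neumann_unbias(int bit1, int bit2, int *out)
{
        if (bit1 == bit2)
                return 0;       /* equal pair: discard, no output */
        *out = bit1;            /* 01 yields 0, 10 yields 1 */
        return 1;
}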

5.2.4 Fourth Operation: Entropy Pool Update

What is the next operation? Let us look again at figure 4↑. The next step after folding and unbiasing is the mixing of the folded value into the entropy pool by XORing it into the pool and rotating the pool.
The reader may now say that these are two distinct operations. However, in section 3.1.3↑ we already concluded that the XOR operation using 64 1 bit folded values, together with the rotation of the entropy pool by 1 bit, can mathematically be interpreted as a concatenation of 64 1 bit folded values into a 64 bit string. Thus, both operations are assessed as a concatenation of the individual folded bits into a 64 bit string, followed by an XOR of that string into the entropy pool.
Going back to the above mentioned allowed operations with bit strings holding entropy, the concatenation operation adds the entropy of the individual bit strings that are concatenated. Thus, we conclude that the concatenation of 64 strings holding 1 bit of entropy will result in a bit string holding 64 bit of entropy.
Concatenating additional 1 bit strings into the 64 bit entropy pool does not increase the entropy any further, as the rotation operation rolls around the 64 bit value and starts at the beginning of that value again. When the entropy collection loop counter has a value that is not divisible by 64, the last bit string XORed into the entropy pool is less than 64 bits; for example, when the counter has the value 260, the 4 last folded bits generated by the loop form a 4 bit string that is XORed into the entropy pool. This last bit string naturally contains less than 64 bits of entropy; the maximum entropy it contains is equal to the number of bits in that string. Considering the calculation rules for entropy mentioned above, XORing a string holding less entropy into a string holding more entropy will not diminish the entropy of the latter. Thus, the XORing of the last bits into the entropy pool has no effect on the entropy of the entropy pool.
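Under the interpretation above, the pool update per folded bit can be sketched as follows (an illustration, not the reference code):
#include <stdint.h>

/* Illustrative sketch: rotate the 64 bit pool left by one bit and XOR
 * in the newly folded bit. Executed 64 times, this is equivalent to
 * concatenating 64 folded bits into a 64 bit string that is XORed
 * into the pool. */
static uint64_t pool_update(uint64_t pool, uint64_t folded_bit)
{
        pool = (pool << 1) | (pool >> 63);
        return pool ^ folded_bit;
}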
There is a catch to the calculation: the math only applies when the individual observations, i.e. the individual 1 bit folded time delta values, are independent of each other. The argument supporting the independence of the individual time deltas comes back to the fundamental property of the CPU execution time jitter, which has an unpredictable variation. Supporting this is the finding that one entropy collection loop iteration, which generates a 1 bit folded value, has a much wider distribution compared to figure 1↑ — the reader may particularly consider the standard deviation. This variation in the execution time of the loop iteration therefore breaks any potentially present dependencies between adjacent loop counts and their time deltas. Note again, the time deltas we collect only need 1 bit of entropy. Looking at figure 29↓, which depicts the distribution of the execution time of one entropy collection loop iteration, we see that the variation and its included Shannon Entropy are high enough to support the conclusion of independence between the time deltas of adjacent loop iterations.
figure pics/userspace-loop-time-dist-delta-1200-hist.png
Figure 29 Distribution of execution time of one entropy collection loop iteration
Thus, we conclude that our entropy pool holds 64 bit of entropy after the conclusion of the mixing operation.

5.2.5 Fifth Operation: Stirring The Entropy Pool

The entropy pool is stirred with a constant value by XORing that value into the pool. The constant is believed to not contain any entropy. With the XOR operation, the entropy present in the entropy pool is therefore unchanged.

5.2.6 Sixth Operation: Generation of Output String

The last operation on the bit string holding entropy is the generation of the string of arbitrary length.
The generation of the output string is performed by concatenating random numbers of the size of 64 bit with each other until the resulting bit string matches the requested size. The individual random numbers are generated by independent invocations of the entropy collection loop.
Using concatenation and the conclusion from the preceding sections [I]  [I] The entropy pool contains 64 bit of entropy after the completion of the random number generation., the entropy in the resulting bit string is equal to the number of bits in that string.
The CPU Jitter random number generator operates on 64 bit blocks — the length of the entropy pool. When the requested bit string length is not divisible by 64 bits, the last chunk concatenated with the output bit stream is therefore less than 64 bits, with the remaining bits not given to the caller — note, the caller is only able to specify the output size in bytes and thus in 8-bit chunks. Why is this operation not considered to diminish the entropy of the last chunk below its number of bits? To find the answer, let us go back to how the entropy pool is constructed: one bit of folded timer value known to have one bit of entropy is added to the pool. When considering the entropy pool as 64 segments of individual bits, every individual bit still contains 1 bit of entropy, because the only operation each single bit is modified with is XOR. Thus, every bit in the bit string of the entropy pool holds one bit of entropy. This ultimately implies that when taking a subset of the entropy pool, that subset still has as much entropy as the size of the subset in bits.
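The output generation described above can be sketched as follows, with gen_entropy() as a hypothetical stand-in for one complete entropy collection loop returning a 64 bit value:
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: one entropy collection loop producing a 64 bit
 * random value. */
extern uint64_t gen_entropy(void);

/* Illustrative sketch: fill the caller's buffer in 8 byte chunks and
 * truncate the last chunk. Per the argument above, truncation keeps
 * one bit of entropy per delivered bit. */
static void get_bytes(unsigned char *out, size_t len)
{
        while (len) {
                uint64_t r = gen_entropy();
                size_t chunk = len < sizeof(r) ? len : sizeof(r);

                memcpy(out, &r, chunk);
                out += chunk;
                len -= chunk;
        }
}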

5.3 Reasons for Chosen Values

The reader may now ask why the time delta is folded into one bit and not into 2 or even 4 bits. Using larger bit strings would reduce the number of foldings and thus speed up the entropy collection. Measurements have shown that halving the folded string size, from 4 bits to 2 bits or from 2 bits to 1 bit, cuts the speed of the CPU Jitter random number generator by about 40%. However, the entire argumentation for entropy is based on the entropy observed in the execution time jitter illustrated in section 5.1↑. The figures in that section support the conclusion that the Shannon Entropy measured in section 5.1↑ is the absolute worst case. To be on the safe side, the lower boundary of the measured entropy shall always be significantly higher than the entropy required for the value returned by the folding operation.
Another consideration for the size of the folded time stamp is important: the implications of the last paragraph in section 5.2.6↑. The arguments and conclusions in that paragraph only apply when using a folded time stamp size that is less than or equal to 8 bits, i.e. one byte.

6 Assessment of Noise Sources

The quality of the CPU Jitter RNG rests on the assumption that the execution timing variations cannot be predicted. In addition, any influence that would diminish the entropy below one bit per timing measurement would adversely affect the CPU Jitter RNG. Finally, any CPU manipulation that has the capability to introduce patterns will also affect the quality of the CPU Jitter RNG.
The following sections discuss the root cause of the noise sources of the CPU Jitter RNG. These tests shall help understand where the noise originates.

6.1 CPU Execution Timing Jitter

This analysis tries to answer the question:
Why is jitter visible in execution timing? How can it be quantified?
The bare metal testing mechanism discussed in section 6.3↓ offers an analysis tool to gain more insights into the behavior of the CPU without interference by an operating system.
The following tests deviate slightly from all preceding tests by adding various cache flush strategies to observe any potential changes in the jitter measurements and to gain an understanding of the CPU behavior. All tests are executed on the same hardware, an Intel Core i7 2nd generation system, with the same operating system of Fedora 19 and the Linux kernel 3.11. The test system is configured to execute without power management and with SpeedStep disabled to prevent additional interference. The baseline is the execution of the jitter measurement without any special operations. That baseline shows the following entropy numbers:
All of the following tests are performed independently of each other, which means that one test does not contain the code changes of another test unless specifically noted. Therefore, the measurements can always be compared to the baseline measurements.
The testing enumerated below can all be re-performed on the bare metal tester outlined in section 6.3↓ by selecting test case 0 and enabling or disabling the discussed CPU mechanisms.

6.1.1 Serialization Instruction

Using serialization instructions, the execution jitter can be completely eliminated.
Additional similar tests were performed. The interpretation of the results, however, is not possible at this point. This means that a theory of the noise source cannot be formulated for the CPU execution timing jitter. Thus, it can only be concluded that noise is visible in normal operation, and that even when attacking the RNG by using serialization instructions to make the variations vanish, the noise source remains operational.
The next sections show other attempts to eliminate the CPU execution timing variations which, however, are not successful.

6.1.2 Prevention of System Call And Branch Prediction Interference

The measurements generated in the following are performed by measuring the time duration of one folding loop and immediately using printf to write it to standard out. This write-out involves system calls and thus modifies the caches, branch prediction, pipelines and TLB beyond the folding operation. The difference now is that instead of simply taking one measurement and writing it out, the test takes the measurement 1,000 times in a row and prints out only the last value. That way, the first 999 loop iterations shall cancel out the impact of the preceding printf on the current measurement:
int i = 0;
/* Repeat each measurement 1,000 times; only the last value is kept,
 * so the earlier iterations absorb the interference of the preceding
 * printf. */
for(i=0; i<1000; i++)
        duration = jent_fold_var_stat(NULL, 0);
for(i=0; i<1000; i++)
        duration_min = jent_fold_var_stat(NULL, 1);
printf("%llu %llu\n", duration, duration_min);

6.1.3 Flush of CPU Instruction Pipeline

The CPU instruction pipeline can be flushed with the MFENCE CPU instruction. The flush of the pipeline is performed right before the invocation of one measurement. The following code illustrates that:
#define mb() asm volatile("mfence":::"memory")
mb();
duration = jent_fold_var_stat(NULL, 0);
mb();
duration_min = jent_fold_var_stat(NULL, 1);
mb();

6.1.4 Flush of CPU Caches

The different CPU caches can be flushed with the WBINVD CPU instruction. The flush of the caches is performed right before the invocation of one measurement. The following code illustrates that:
#define wbinvd() asm volatile("wbinvd": : :"memory");
wbinvd();
duration = jent_fold_var_stat(NULL, 0);
wbinvd();
duration_min = jent_fold_var_stat(NULL, 1);
wbinvd();

6.1.5 Disabling of Preemption

The preemption of the execution of the folding loop may imply that scheduling happens while the loop executes. Inside the kernel, preemption can be disabled as follows:
preempt_disable();
duration = jent_fold_var_stat(NULL, 0);
duration_min = jent_fold_var_stat(NULL, 1);
preempt_enable();

6.1.6 TLB Flush

The flush of all (non-global) TLB entries is achieved by modifying the CR3 register. Inside the kernel, modification of the CR3 register is performed by reading and writing the register as follows:
native_write_cr3(native_read_cr3());
duration = jent_fold_var_stat(NULL, 0);
duration_min = jent_fold_var_stat(NULL, 1);

6.1.7 Pinning of Entropy Collection to one CPU

The pinning of the process that performs the measurements to one CPU can be performed by creating a CPUSET:
mkdir /sys/fs/cgroup/cpuset/foldtime
cd /sys/fs/cgroup/cpuset/foldtime
/bin/echo 2 > cpuset.cpus
/bin/echo 0 > cpuset.mems
/bin/echo <PID_of_proc> > tasks
/bin/echo 1 > cpuset.mem_hardwall

6.1.8 Disabling of Frequency Scaling and Power Management

Modern CPUs support frequency scaling, including the Intel SpeedStep and TurboBoost technologies, as well as power management of the CPU and peripherals. These techniques are used to conserve power. As these mechanisms may add variations, all of them are deactivated via the BIOS on the test machine.
The lower boundary shows a significant drop in variations by around 0.5 bits of entropy. Yet, the drop does not affect the quality of the RNG. The cause for the drop in variations is the different patterns of variations as outlined in section 2↑.

6.1.9 Disabling of L1 and L2 Caches

The next test disables the L1 and L2 caches of the CPU and re-performs the measurement of the jitter. As the disabling of the caches can only be completed in kernel space, the test was executed using the kernel module and reading the exported interface /sys/kernel/debug/jitterentropy/stat-fold.
To disable the caches, the following code was added to the initialization function of the kernel module:
__asm__("push   %rax\n\t"
        "mov    %cr0,%rax;\n\t"
        "or     $(1 << 30),%rax;\n\t"
        "mov    %rax,%cr0;\n\t"
        "wbinvd\n\t"
        "pop    %rax"
       );
In addition, the MTRR was disabled with the following command before the mentioned file was read:
echo "disable=00" >| /proc/mtrr
The disabling of the caches is really noticeable, as the system becomes slower by orders of magnitude! So, if you redo the testing, ensure that nothing else executes and perform the tests on a console (not within X11) to get a system that is somewhat responsive to your commands.
The measurements of the variation contain a large number of outliers which are removed to calculate the entropy.
As the lower boundary is already that high, and due to the problem of removing the outliers from the measurements of the upper boundary, it is questionable whether the value for the upper boundary is helpful: it surely overstates the worst case by a large degree.

6.1.10 Disabling of L1 and L2 Caches And Interrupts

The test documented in appendix 6.1.9↑ is re-performed with the following code modification in the kernel module function jent_debugfs_statfold_read:
local_irq_save(flags);
local_irq_disable();
duration = jent_fold_var_stat(NULL, 0);
duration_min = jent_fold_var_stat(NULL, 1);
local_irq_restore(flags);
local_irq_enable();
The code change disables all interrupts on the current CPU while executing the folding loop and the time measurement. After the measurement is completed for one round, interrupt handling is re-enabled.
Similarly to appendix 6.1.9↑, the variation measurement contains a large number of outliers. When removing them and limiting the set of values to the worst case, the following lower entropy value is calculated. For the upper boundary value, the removal of the outliers is not really possible. Therefore, the given value may overstate the worst case significantly.

6.1.11 Disabling of All CPU Mechanisms

In the preceding subsections various CPU mechanisms were selectively disabled. This section now combines the disabling of all these mechanisms to analyze whether any combination of disabling CPU mechanisms changes the entropy statement.
The following table contains the test results. It starts with the test that combines the disabling and changing of all the CPU mechanisms outlined in the preceding sections [J]  [J] Note, the tests must be executed in kernel space where the CPU pinning capability using cgroups is not available.. Each following row re-enables some CPU mechanisms as indicated. For each test, the upper and lower boundary of the Shannon Entropy is calculated and listed. The tests were executed on an absolutely quiet system without X11 or any graphical user interface.
The following abbreviations denote the disabled or altered CPU mechanisms: SYS = prevention of system call interference, PIPE = flush of CPU instruction pipeline, CACHE = flush of CPU caches, PREEMPT = disabling of preemption, TLB = TLB flush, FREQ = disabling of frequency scaling / power management, MTRR = disabling of MTRR, L1L2 = disabling of L1 and L2 caches, IRQ = disabling of interrupts.
Disabled / Altered CPU mechanisms                                  Upper Lower
SYS PIPE CACHE PREEMPT TLB FREQ MTRR L1L2 IRQ                       8.36  6.39
SYS PIPE CACHE PREEMPT TLB FREQ MTRR L1L2 IRQ (outliers removed)    N/A   4.28
SYS PIPE CACHE PREEMPT TLB FREQ L1L2 IRQ                            5.30  1.61
SYS PIPE CACHE PREEMPT TLB FREQ IRQ                                 5.51  1.59
SYS PIPE CACHE PREEMPT TLB FREQ                                     6.88  3.10
SYS PIPE CACHE TLB FREQ                                             6.91  3.06
SYS PIPE CACHE FREQ                                                 6.90  2.65
SYS PIPE CACHE FREQ IRQ                                             5.19  1.46
SYS PIPE FREQ IRQ                                                   5.94  2.28
SYS FREQ IRQ                                                        5.94  1.69
SYS CACHE FREQ IRQ                                                  5.87  1.89
Any CPU mechanism that is not disabled as per the table above only enlarges the CPU execution jitter, based on the analyses of the jitter measurements outlined in the previous sections. Therefore, further combinations are disregarded.
The combination of disabled CPU mechanisms that diminishes the CPU execution jitter the most can be illustrated with the following code, supplemented by disabling the power management and frequency scaling in the system BIOS:
local_irq_save(flags);
local_irq_disable();
wbinvd();
mb();
for(i=0; i<1000; i++)
        duration = jent_fold_var_stat(NULL, 0);
wbinvd();
mb();
for(i=0; i<1000; i++)
        duration_min = jent_fold_var_stat(NULL, 1);
wbinvd();
mb();
local_irq_restore(flags);
local_irq_enable();
It is interesting that a particular combination of disabled CPU mechanisms causes the jitter to drop more than disabling all CPU mechanisms. Moreover, measuring the effect of disabling each CPU mechanism in isolation — as done in the preceding subsections — shows no significant drop in jitter. A rationale for this behavior cannot be given at this point.
Nonetheless, the measurements of the lower boundary still show much more entropy than needed for the operation of the CPU Jitter RNG, let alone the upper boundary.
To support this conclusion, the above-listed code was added to the function jent_debugfs_read_func to apply the modifications to the regular random number generation. In addition, the locking found in jent_drng_get_bytes_raw must be removed to prevent complaints by the kernel for this test. Also, the Von-Neumann unbiaser must be disabled. After compilation and insertion of the kernel module, the file /sys/kernel/debug/jitterentropy/seed is read. After the generation of 3 MB of data, a smoke test using the ent tool is performed to check the statistical behavior. The result is, as expected, appropriate:
$ ent /tmp/out && ent -b /tmp/out
Entropy = 7.999936 bits per byte.

Optimum compression would reduce the size
of this 2911816 byte file by 0 percent.

Chi square distribution for 2911816 samples is 257.43, and randomly
would exceed this value 44.55 percent of the times.

Arithmetic mean value of data bytes is 127.5091 (127.5 = random).
Monte Carlo value for Pi is 3.146313017 (error 0.15 percent).
Serial correlation coefficient is -0.000537 (totally uncorrelated = 0.0).
Entropy = 1.000000 bits per bit.

Optimum compression would reduce the size
of this 23314880 bit file by 0 percent.

Chi square distribution for 23314880 samples is 0.06, and randomly
would exceed this value 81.14 percent of the times.

Arithmetic mean value of data bits is 0.5000 (0.5 = random).
Monte Carlo value for Pi is 3.146391175 (error 0.15 percent).
Serial correlation coefficient is -0.000004 (totally uncorrelated = 0.0).

6.2 Memory Access Testing

The previous section covered the exclusive analysis of the noise source of the folding operation. This section, in addition, covers the exclusive assessment of the memory accesses and their impact on the timing variations.
The tests are all conducted with the test tool discussed in section 6.3↓.
For conducting the memory access testing, the test tool is used to review the impact of the following settings, with all other settings left unchanged at their defaults: the size of one memory block, the number of memory blocks, and the number of memory access loops.

6.2.1 Noise Source Discussion

Before measurements are presented, a discussion of the noise source needs to be conducted. The following question must be answered:
Where does the noise come from?
In more technical terms, this question can be converted to: Why do memory accesses exhibit variations when measuring the execution time of those memory accesses?
The CPU of a system executes with the speed of the processor clock. That means the processing of one CPU instruction directly depends on the processor clock when disregarding auxiliary processing, such as fetching the instruction from memory. Now, if the CPU instruction happens to require additional data, memory move instructions must be used to move the data into CPU registers in order to operate on it. However, when fetching data from memory, the CPU must synchronize itself with the access speed of the memory in order for the memory fetch/store to succeed. The CPU must introduce wait states, because a memory fetch or store instruction can only execute when the CPU clock is aligned with the clock the memory bus executes with.
Real life, however, is a bit more complicated with the addition of caches. The caches execute at a much higher speed than the real memory. The following rule applies: the L1 cache is the fastest, L2 is slower than L1, and L3 is again slower than L2. That means the number of wait states needed to synchronize the CPU with L1 access windows is smaller than the number of wait states needed to synchronize with L2. Similarly, the number of wait states for L3 is higher than the one for L2, and the number of wait states for memory is higher than the one for L3.
Now, the noise source rests on the basis that the duration of these wait states can neither be predicted nor observed in advance.
To ensure that sufficient uncertainty is delivered to the random number generator, a sufficient number of wait states must be covered. Memory accesses are cached by the CPU with the typical caching strategy: L1 is filled first and serves requests if possible, followed by L2, followed by L3, and finally followed by real memory accesses. As established, accesses to L1 exhibit the smallest number of wait states.
After performing some initial analyses, it was concluded that the accesses must be large enough to overflow the L1 cache and ‘‘spill over’’ into L2 accesses. This is the reason for the chosen size of the memory blocks, the number of memory blocks, as well as the number of access loops used in the testing. When considering all three variables, the total number of memory accesses must definitely fill L1 and use at least parts of L2. This ensures that the wait states the CPU incurs for accessing L2 deliver the main noise.
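As an illustration of this sizing, the following sketch uses the values given in section 6.2.2 below (64 blocks of 32 bytes, 128 access loops); the walk pattern itself is an assumption for this example, not the reference implementation:
#include <stddef.h>

#define NUM_BLOCKS   64   /* number of memory blocks */
#define BLOCK_SIZE   32   /* bytes per memory block */
#define ACCESS_LOOPS 128  /* number of memory access loops */

static unsigned char mem[NUM_BLOCKS * BLOCK_SIZE];

/* Illustrative sketch: walk the memory with a stride larger than one
 * block so that, over all access loops, the whole working set is
 * touched repeatedly and the accesses incur cache wait states. The
 * read-modify-write forces a real memory access per iteration. */
static void memaccess(void)
{
        size_t i, idx = 0;

        for (i = 0; i < (size_t)ACCESS_LOOPS * NUM_BLOCKS; i++) {
                mem[idx] += 1;
                idx = (idx + BLOCK_SIZE + 1) % sizeof(mem);
        }
}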

6.2.2 Noise Source Measurements

After understanding the root of the noise source, this subsection shows measurements of the behavior of the noise source to answer the question:
Can the memory access timing variations be quantified?
The testing sets the number of memory blocks to 64 and the size of one memory block to 32. The measurements are taken by varying the number of memory access loops between 1 and 256 iterations. The following graphs list the number of memory access loops on the abscissa. The random number generator hard codes the number of memory access loops to 128, which is marked with a green line in all the graphs.
Each of the graphs contains three lines:
The first measurement shows the execution time of memory accesses depending on the number of accesses. Figure 30↓ shows the execution time (i.e. the difference between an RDTSC invocation before the first memory access and an RDTSC invocation after the last memory access).
figure pics/memaccess-i7-2640M-delta.png
Figure 30 Average time duration for memory accesses
As expected, the graph shows a linear increase of the time duration when memory is accessed. That means each new memory access adds, on average, an equal amount of execution time.
The next measurement, provided with figure 31↓, shows the standard deviation of the memory access time when increasing the number of memory accesses. The standard deviation depicts the size of the timing variation, i.e. how ‘‘widely’’ the memory access times fluctuate.
figure pics/memaccess-i7-2640M-sigma.png
Figure 31 Standard deviation of time duration for memory accesses
The graph shows an almost linear increase of the standard deviation. This implies that the execution time variations increase linearly with the number of memory accesses. The conclusion that can be drawn from this result is that each additional memory access increases the timing variations and thus the uncertainty of the memory access times.
To allow comparing the standard deviation values for the different memory access times, the variation coefficient can be used. The variation coefficient ‘‘normalizes’’ the standard deviation by dividing it by the mean value of the time measurement. Figure 32↓ shows the variation coefficient for the different memory access loop iterations.
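Expressed as a formula, with \sigma denoting the standard deviation and \mu the mean of the measured access times:
c_v = \frac{\sigma}{\mu}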
figure pics/memaccess-i7-2640M-var-coeff.png
Figure 32 Variation coefficient of time duration for memory accesses
The graph nicely shows the stabilization of the variation coefficient as the number of memory access loops grows. That stabilization, i.e. the flattening of the curve, demonstrates that the standard deviation increases almost perfectly linearly with the number of memory accesses. The spikes at lower numbers of memory access loops are due to the higher impact of slight measuring errors, which are more visible for smaller measurements. Where do the measuring errors come from? The code that invokes the RDTSC reading is itself affected by the memory access variations. When invoking that code twice (for the beginning and ending time stamps) to measure only one or two memory accesses, the error added by the time reading is relatively higher than when measuring several tens or hundreds of memory accesses.
Another enlightening statistic is the count of how many different timing values can be detected. The graph presented in figure 33↓ counts how many different memory access times are detected throughout the measurement.
figure pics/memaccess-i7-2640M-slots.png
Figure 33 Number of different time durations for memory accesses
Before interpreting the graph, please note that the test is set up to only measure up to 200 different values. When considering the increase of the standard deviation as outlined above, the result of the graph with the number of different time measurements is fully expected and understandable: the more memory access loops are performed, the more different memory access times are measured, as the increase in the memory access time variation is the factor that increases the standard deviation. The conclusion can be drawn that the more memory accesses are performed, the stronger the timing variations of these accesses are.
As a final graph, figure 34↓ shows the calculation of the Shannon Entropy value for the timing measurements. As outlined in section 6.3↓, the Shannon Entropy values are subject to a calculation error of up to one bit. As it cannot be identified how large the error is at a given number of memory access loops, the fluctuations in that graph must be interpreted accordingly — i.e. fluctuations within one bit must be considered to show about equal measurements.
figure pics/memaccess-i7-2640M-shannon-entropy.png
Figure 34 Shannon Entropy of time durations for memory accesses
The graph showing the Shannon Entropy values supports all previous graphs and conclusions by showing that the variations increase with the number of memory access loops.
The following final conclusion can be drawn from the measurements: after having sufficient memory accesses to completely fill L1 and draw from L2, the timing variations, and therefore the uncertainty regarding the total time for memory accesses, increase linearly with the number of measured memory accesses. This implies that memory accesses can be considered a good source of entropy.
Note, the tests were partially redone in light of the results in section 6.1↑. In that section, various CPU mechanisms were disabled, some of which had a severe impact on the measured timing variations. When altering the CPU for the memory access timing measurement, the following findings based on figure 35↓ apply:
figure pics/memaccess-i7-2640M-cpumechanisms.png
Figure 35 Impact of Selectively Disabling of CPU Mechanisms on Memory Access Timing Variations
These additional results can be summarized as follows: Changing the CPU-internal mechanisms for code execution has no impact on the memory access timing variations. Changing how the CPU accesses memory by disabling the caches significantly increases variations and thus entropy.

6.2.3 Memory Accesses and Folding Loop

After independently discussing the memory access noise source, the combined noise of memory accesses and the folding loop should be considered.
The impact of the memory accesses on the folding loop can easily be shown by depicting the execution time variations of just the folding loop and then of the folding loop together with the memory accesses. The following graphs show the execution timing variations on a test system where the folding loop alone is challenged to produce sufficient variations. When adding the memory access variations, more than enough timing variations are recorded. First, the graphs for the lower boundary are shown.
figure pics/userspace-foldtime-xeon-e5504-time-dist-delta-250-hist.png
Figure 36 Folding loop without memory accesses on Intel Xeon E5504 — lower boundary
figure pics/userspace-foldtime-xeon-e5504-memacc.O0-time-dist-delta-2500-hist.png
Figure 37 Folding loop with memory accesses on Intel Xeon E5504 — lower boundary
And now the graphs for the upper boundary.
figure pics/userspace-foldtime-xeon-e5504-time-dist-delta-2500-hist.png
Figure 38 Folding loop without memory accesses on Intel Xeon E5504 — upper boundary
figure pics/userspace-foldtime-xeon-e5504-memacc.O0-time-dist-delta-8300-hist.png
Figure 39 Folding loop with memory accesses on Intel Xeon E5504 — upper boundary
Similar results are obtained for other systems. These results speak for themselves: memory accesses provide a significant source of variations in addition to just the folding loop.

6.3 Noise Source Testing Without Operating System

The execution timing tests discussed in section 5.1↑ do not need any specific support from the operating system they run on. Nonetheless, an operating system is needed to allow the code to be executed on the CPU, i.e. to boot an environment that can execute some code and somehow convey the results. Or not?
The Memtest86 tool is intended to be started directly from the boot loader without any operating system running. In essence, that tool is its own operating system with the sole purpose of executing some (memory) tests.
This tool is now used to run the CPU execution timing tests on bare metal (err, on bare silicon), where no operating system with any parallel threads or tasks can interfere. The Memtest86 tool is modified by removing all memory tests and adding a number of CPU execution timing variation tests. The code for the tool is provided in the directory test_baremetal/.
The goal with this testing is to eliminate the impact of the operating system by only and exclusively executing the test cases on the CPU. No scheduling or context switching will occur during the test execution. Even interrupts are not processed while the tests execute. The test is implemented by only printing the results after the completion of each test (i.e. not during the execution of a test). This approach further reduces the impact of the test framework on the measurements.
This measurement is the same measurement used for determining the lower boundary of entropy throughout this document. This means that here only the worst case is analyzed.
The following tests are implemented:
Test No Test Case Description
0 This is the baseline test, simply executing two time stamp reads immediately after each other. This test shall help find the right CPU clearing and flushing operations that eliminate all jitter.
1 This test covers the memory access operation by measuring each memory access individually.
2 The entire entropy gathering operation is tested with this test. The entropy gathering covers the folding loop operation and the memory access operation. When setting a configuration flag, the memory access operation can be disabled to ensure that this test only measures the folding loop operation. The execution time of each loop is measured.
3 This test tries to measure some CPU characteristics by placing well-crafted CPU instructions between the time measurements. However, this test is considered irrelevant for the CPU Jitter RNG measurement.
4 This test implements an automated invocation of test 1. For test 1, the memory block size, the number of blocks and the number of access loop iterations can be defined. This test repeats test 1 after incrementing the counters. Test results are displayed and can be transported to a different machine using Morse code.
For each test, the following options can be toggled:
For each test, the following numeric options can be set:
For each test case, up to 200 different execution timing values are recorded. These records form a histogram of the execution times. For each seen timing value, the number of occurrences is recorded. Figure 40↓ illustrates the results for the execution of the tests within a KVM instance [L]  [L] This figure shall only serve as illustration for the discussion and the explanation on how to interpret the results. The test results on a KVM cannot be interpreted as bare metal testing and are therefore not used for any conclusions..
figure pics/Screenshot_Jittertest.png
Figure 40 Execution of Bare-Metal CPU Execution Jitter Test on KVM
The test framework presents the following data:

7 Conclusion

For the conclusion, we need to get back to chapter 1↑ and consider the initial goals we have set out.
First, let us have a look at the general statistical and entropy requirements. Chapter 4↑ concludes that the statistical properties of the random number bit stream generated by the CPU Jitter random number generator meet all expectations. Chapter 5↑ explains the entropy behavior and concludes that the entropy collected from the CPU execution time jitter is much larger than the entropy pool can hold. In addition, that chapter determines that the way data is mixed into the entropy pool does not diminish the gathered entropy. Therefore, we conclude that one bit of output of the CPU Jitter random number generator holds one bit of information theoretical entropy.
It is noteworthy that the distribution of the sole time deltas without any processing in kernel space was measured to be spaced in increments of three in figure 2↑. This property, however, did not show any impact on the distribution of time deltas resulting from the processing of the CPU Jitter random number generator, particularly the root entropy discussed in section 5.1↑.
In addition to these general goals, chapter 1↑ lists a number of special goals. These goals are covered in the following list, where the list number equals the list number in chapter 1↑.
  1. On demand operation is ensured by the fact that the entropy collection loop is triggered when the caller requests data. Multiple loops are initiated sequentially if the requested bit string is larger than 64 bits.
  2. When compiled as a user space application, the CPU Jitter random number generator returns roughly 10 kBytes per second on an Intel Core i7 2nd generation with 2.7 GHz. In kernel space, the speed is roughly the same on the same test system. An Intel Atom Z530 system with 1.6 GHz produces output at about 2 kBytes per second.
  3. The design description explains that the CPU Jitter random number generator always returns entropy. The entropy is generated when the request for it is received.
  4. In virtualized environments, the fundamental property of CPU execution time jitter is still present. Moreover, reading high-resolution timing information from the CPU is typically allowed by the virtualization environment and is not subject to virtualization itself [M]  [M] Although the discussion of virtualizing the CPU time stamp — e.g. the RDTSC x86 processor instruction — is conducted once in a while, no attempts to implement such virtualization have been made. The goal of such virtualization would only be to hide the existence of a hypervisor from a guest. But current virtualization environments do not pursue that goal.. Thus, a virtual environment can execute the CPU Jitter random number generator and deliver equal entropy. Due to the additional processing that is typically present to support reading the CPU time stamp, the CPU execution time jitter is expected to be higher and thus supportive of the CPU Jitter random number generator.
  5. The design description explains that the CPU Jitter random number generator always generates fresh entropy.
  6. The reference implementation works in both kernel and user space.
  7. The heart of the CPU execution time jitter collection is about 50 lines of C code using XOR, bit shifting, and a loop operation — functions jent_gen_entropy, jent_unbiased_bit, jent_memaccess, jent_fold_time and jent_loop_shuffle implement the core. That is a fairly small and easy to understand implementation. The rest of the code just reads the data out to the caller and ensures the entropy collection loop is invoked as often as the caller requests its operation.
  8. The CPU Jitter random number generator requires access to the CPU high resolution timer. In Linux, that is granted to unprivileged processes. Therefore, every process that is in need of entropy can instantiate its own copy of the CPU Jitter random number generator.
  9. The CPU Jitter random number generator generates a fresh 64 bit random number for each request. Even though it reuses the contents of the entropy pool that was used for previous random numbers, it makes no assumption whether entropy is present or not. Moreover, the data fed into the entropy pool is not deterministic. Thus, perfect forward and backward secrecy is considered to be maintained. The question is: how do we interpret the case when an observer gets access to the entropy pool while the first few iterations of one entropy collection loop operation are performed? Let us assume that an observer accesses the entropy pool after the first iteration of the loop completes. That iteration added some, but very little, entropy. The observer can ‘‘brute-force’’ the previous random number [N]  [N] The same applies for the new random number when the observer accesses the entropy pool in the last few iterations before completion of the entropy collection loop. by guessing the few added bits. Therefore, how do we interpret forward and backward secrecy here? The definition we apply is the following: perfect forward and backward secrecy is ensured when an observer gains access to one random number generated by the CPU Jitter random number generator. When an observer gains access to the entropy pool while a random number is generated, the perfect forward and backward secrecy applies to all random numbers before the last generated random number and after the currently generated random number, excluding the last and currently generated random numbers. To make sure that the last generated random number does not linger in memory for too long, the code invokes the entropy collection loop one more time after completing the calling application’s request. This changes the entropy pool such that, in the event an observer gets access to the entropy pool, he gains nothing — this code is executed if the environment where the CPU Jitter random number generator is embedded does not implement secure memory handling.

7.1 Threat Scenario

After explaining the functionality of the CPU Jitter random number generator, the statistical properties and the assessment of the entropy, let us try to attack the mechanism. When attacking, we first have to determine our goals. The following list enumerates the goals:

7.1.1 Interleaving of Time Stamp Collection

The interleaving attack is illustrated with figure 41↓.
figure pics/attacker-interleave-worst-case.png
Figure 41 Attack process interleaving with victim process
The left process marked as ‘‘Jitter Collector Process’’ would be a process implementing the CPU Jitter random number generator and would run unobserved — i.e. the case when no attack is staged.
Now we can conceive a scenario where the victim process executes the entropy collection loop to gather entropy. An attacker process tries to somehow gain knowledge about the time stamps obtained by the victim process during the entropy collection loop. To this end, the attacker process may try to read the time stamps of the system as well.
The worst case would be if the attacker were able to stage his time stamp readings such that they interleave one-to-one with the victim process' time stamp collection on the same CPU core; when executing the attacker on another CPU core, the interleaving mechanism would not work. That means the very next time stamp gathering operation that is technically possible after the victim gathered one time stamp is performed by the attacker. Then, the next time stamp is again read by the victim, and so forth. Figure 41↑ illustrates the time stamp collection of the victim and attacker in the middle part.
Now, the attacker tries to deduce the victim’s time stamps from his own time stamp readings. To a large degree, he is able to determine them. But the minuscule variations between adjacent time stamp readings are precisely the source of entropy for the CPU Jitter random number generator — marked as the time span between the time stamp read operations in figure 41↑.
Comparing the attack process readings with a fully unobserved process indicates that the attacking process can never determine the victim’s time stamps more accurately than the CPU execution time jitter on which our random number generator is based. An attacking process is never able to reduce the variations of the CPU execution time jitter.

A Availability of Source Code

The source code of the CPU Jitter entropy random number generator including the documentation is available at http://www.chronox.de/jent/jitterentropy-current.tar.bz2.
The source code for the test cases and R-project files to generate the graphs is available at the same web site.

B Linux Kernel Implementation

The document describes in chapter 1↑ the goals of the CPU Jitter random number generator. One of the goals is to provide individual instances to each consumer of entropy. Among these consumers are users inside the Linux kernel.
As described above, the output of the CPU Jitter random number generator is not intended to be used directly. Instead, the output shall be used as a seed for either a whitening function or a deterministic random number generator. The Linux kernel support provided with the CPU Jitter random number generator chooses the latter approach by using the ANSI X9.31 DRNG that is provided by the Linux kernel crypto API.
Figure 42↓ illustrates the connection between the entropy collection and the deterministic random number generators offered by the Linux kernel support. The interfaces at the lower part of the illustration indicate the Linux kernel crypto API names of the respective deterministic random number generators and the file names within /sys/kernel/debug, respectively.
Every deterministic random number generator instance is seeded with its own instance of the CPU Jitter random number generator. This implementation thus realizes one of the design goals outlined in chapter 1↑, namely multiple, unrelated instantiations of the CPU Jitter random number generator.
figure pics/Kernel-DRNG.png
Figure 42 Using CPU Jitter RNG to seed ANSI X9.31 DRNGs
The offered deterministic random number generators have the following characteristics:
Currently, the kernel crypto API only implements a full reset of the deterministic random number generators. Therefore, the description given above is the plan once the kernel crypto API has been extended. Currently, when hitting the re-seed threshold, the deterministic random number generator is reset with 48 bytes of entropy from the CPU Jitter random number generator. The re-key value is currently not enforced.

B.1 Kernel Crypto API Interface

When compiling the source code with the configuration option CRYPTO_CPU_JITTERENTROPY_KCAPI, the kernel crypto API bonding code is compiled. That code registers the mentioned deterministic random number generators with the kernel crypto API. The bonding code provides a very thin wrapper around the management code for the provided random number generators.
The deterministic random number generators, as well as direct access to the CPU Jitter random number generator, are accessible using the following kernel crypto API names:
reg(jent_rng) Regular deterministic random number generator
strong(jent_rng) Strong deterministic random number generator
raw(jent_rng) Direct access to the CPU Jitter random number generator which returns unmodified data from the entropy collection loop.
When invoking a reset operation on one of the deterministic random number generators, the implementation performs the re-seed and re-key operations mentioned above on this deterministic random number generator, irrespective of whether the thresholds are hit.
A reset on the raw(jent_rng) instance is a noop.
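As an illustration, a kernel-space user might pull bytes from the registered instance via the kernel crypto API as sketched below; this is an example of using the API, not part of the provided code:
#include <linux/err.h>
#include <crypto/rng.h>

/* Illustrative sketch: allocate the reg(jent_rng) instance, pull
 * random bytes from it and release the handle again. */
static int get_jent_bytes(u8 *buf, unsigned int len)
{
        struct crypto_rng *rng;
        int ret;

        rng = crypto_alloc_rng("reg(jent_rng)", 0, 0);
        if (IS_ERR(rng))
                return PTR_ERR(rng);

        ret = crypto_rng_get_bytes(rng, buf, len);
        crypto_free_rng(rng);
        return ret;
}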

B.2 Kernel DebugFS Interface

The kernel DebugFS interface offered with the code is only intended for debugging and testing purposes. During regular operation, that code shall not be compiled as it allows access to the internals of the random number generation process.
The DebugFS interface is compiled when enabling the CRYPTO_CPU_JITTERENTROPY_DBG configuration option. The interface registers the following files within the directory of /sys/kernel/debug/jitterentropy:
stat The stat file offers statistical data about the regular and strong random number generators, in particular the total number of generated bytes and the number of re-seeds and re-keys.
stat-timer This file contains the statistical timer data for one entropy collection loop count: time delta, delta of time deltas and the entropy collection loop counter value. This data forms the basis of the discussion in chapter 4↑. Reading the file will return an error if the code is not compiled with CONFIG_CRYPTO_CPU_JITTERENTROPY_STAT.
stat-bits This file contains the three tests of the bit distribution for the graphs in chapter 4↑. Reading the file will return an error if the code is not compiled with CONFIG_CRYPTO_CPU_JITTERENTROPY_STAT.
stat-fold This file provides the information for the entropy tests of the folding loop as outlined in section 5.1↑. Reading the file will return an error if the code is not compiled with CONFIG_CRYPTO_CPU_JITTERENTROPY_STAT.
drng The drng file offers access to the regular deterministic random number generator to pull random number bit streams of arbitrary length. Multiple applications calling at the same time are supported due to locking.
strong-drng The strong-drng file offers access to the strong deterministic random number generator to pull random number bit streams of arbitrary length. Multiple applications calling at the same time are supported due to locking.
seed The seed file allows direct access to the CPU Jitter random number generator to pull random number bit streams of arbitrary lengths. Multiple applications calling at the same time are supported due to locking.
timer The timer file provides access to the time stamp kernel code discussed in section 2↑. Be careful when obtaining data for analysis out of this file: redirecting the output immediately into a file (even a file on a TmpFS) significantly enlarges the measurement and thus makes it look like it contains more entropy than it does.
collection_loop_count This file allows access to the entropy collection loop counter. As this counter value is considered to be a sensitive parameter, this file will return -1 unless the entire code is compiled with the CRYPTO_CPU_JITTERENTROPY_STAT flag. This flag is considered to be dangerous for normal operations as it allows access to sensitive data of the entropy pool that shall not be accessible in regular operation — if an observer can access that data, the CPU Jitter random number generator must be considered to deliver much diminished entropy. Nonetheless, this flag is needed to obtain the data that forms the basis of some graphs given above.

B.3 Integration with random.c

The CPU Jitter random number generator can also be integrated with the Linux /dev/random and /dev/urandom code base to serve as a new entropy source. The provided patch instantiates an independent copy of an entropy collector for each entropy pool. Entropy from the CPU Jitter random number generator is only obtained if the entropy estimator indicates that there is no entropy left in the entropy pool.
This implies that the currently available entropy sources have precedence. But in an environment with limited entropy from the default entropy sources, the CPU Jitter random number generator provides entropy that may prevent /dev/random from blocking.
The CPU Jitter random number generator is only activated, if jent_entropy_init passes.

B.4 Test Cases

The directory tests_kernel/kcapi-testmod/ contains a kernel module that tests whether the Linux Kernel crypto API integration works. It logs its information at the kernel log.
The testing of the interfaces exported by DebugFS can be performed manually on the command line by using the tool dd with the files seed, drng, strong-drng, and timer as dd allows you to set the block size precisely (unlike cat). The other files can be read using cat.
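For readers who want to pull such a block programmatically instead of via dd, the following minimal C sketch reads one 64-byte block from the seed file, mirroring a dd invocation with a fixed block size. The DebugFS mount point /sys/kernel/debug and the directory name jitterentropy are assumptions made for illustration and must be adjusted to the actual location on the test system.
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        unsigned char buf[64];
        ssize_t ret;
        /* Path is an assumption; adjust to the actual DebugFS layout. */
        int fd = open("/sys/kernel/debug/jitterentropy/seed", O_RDONLY);

        if (fd < 0) {
                perror("open");
                return 1;
        }
        /* One read of a fixed size corresponds to one dd block. */
        ret = read(fd, buf, sizeof(buf));
        if (ret > 0)
                fprintf(stderr, "pulled %zd bytes of seed data\n", ret);
        close(fd);
        return 0;
}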

C Libgcrypt Implementation

Support to plug the CPU Jitter random number generator into libgcrypt is provided. The approach is to add the callback to the CPU Jitter random number generator into _gcry_rndlinux_gather_random. Thus, the CPU Jitter random number generator has the ability to run every time entropy is requested. Figure 43↓ illustrates how the CPU Jitter random number generator hooks into the libgcrypt seeding framework.
figure pics/libgcrypt.png
Figure 43 Use of CPU Jitter RNG by libgcrypt
The wrapper code around the CPU Jitter random number generator provided for libgcrypt holds the following instances of the random number generator. Note, the operation of the CPU Jitter random number generator is unchanged for each type. The goal of this approach is to ensure that each type of seed request is handled by a separate and independent instance of the CPU Jitter random number generator.
weak_entropy_collector Used when GCRY_WEAK_RANDOM random data is requested.
strong_entropy_collector Used when GCRY_STRONG_RANDOM random data is requested.
very_strong_entropy_collector Used when GCRY_VERY_STRONG_RANDOM random data is requested.
The CPU Jitter random number generator with its above mentioned instances is initialized when the caller uses GCRYCTL_SET_CPU_JITTER_ENTROPY with the flag 1. At this point, memory is allocated.
The wrapper code uses the above mentioned instances only if they are allocated. That means the callback from _gcry_rndlinux_gather_random to the CPU Jitter random number generator only returns random bytes when these instances are allocated. In turn, if they are not allocated, the normal processing of _gcry_rndlinux_gather_random continues.
If the user wants to disable the use of the CPU Jitter random number generator, a call to GCRYCTL_SET_CPU_JITTER_ENTROPY with the flag 0 must be made. That call deallocates the random number generator instances.
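The following minimal C sketch illustrates the life cycle just described, assuming a libgcrypt with the provided patch applied; GCRYCTL_SET_CPU_JITTER_ENTROPY does not exist in an unpatched libgcrypt.
#include <gcrypt.h>

int main(void)
{
        unsigned char buf[16];

        if (!gcry_check_version(NULL))
                return 1;
        gcry_control(GCRYCTL_DISABLE_SECMEM, 0);
        gcry_control(GCRYCTL_INITIALIZATION_FINISHED, 0);

        /* Allocate the three entropy collector instances. */
        gcry_control(GCRYCTL_SET_CPU_JITTER_ENTROPY, 1);

        /* This request is served by strong_entropy_collector. */
        gcry_randomize(buf, sizeof(buf), GCRY_STRONG_RANDOM);

        /* Deallocate the instances; seeding falls back to /dev/random. */
        gcry_control(GCRYCTL_SET_CPU_JITTER_ENTROPY, 0);
        return 0;
}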
The code is tested with the test application tests_userspace/libgcrypt/jent_test.c. When using strace on this application, one can see that after disabling the CPU Jitter random number generator, /dev/random is opened and data is read. That implies that the standard code for seeding is invoked.
See patches/README for details on how to apply the code to libgcrypt.

D OpenSSL Implementation

Code to link the CPU Jitter random number generator with OpenSSL is provided.
An implementation of the CPU Jitter random number generator encapsulated into different OpenSSL Engines is provided. The relationship of the different engines to the OpenSSL default random number generator is depicted in figure 44↓.
figure pics/openssl-DRNG.png
Figure 44 CPU Jitter random number generator seeding OpenSSL default DRNG
The following OpenSSL Engines are implemented:
jitterentropy-raw The jitterentropy-raw engine provides direct access to the CPU Jitter random number generator.
jitterentropy-drng The jitterentropy-drng engine generates random numbers out of the OpenSSL default deterministic random number generator. This DRNG is seeded with 16 bytes out of the CPU Jitter random number generator every 2¹⁰ bytes. After 2²⁰ bytes, the DRNG is fully reset and then seeded and, if applicable, re-keyed with 48 bytes. Note, the intention of this engine implementation is that it is registered as the default OpenSSL random number generator using ENGINE_set_default_RAND(3).
jitterentropy-strong The jitterentropy-strong engine is very similar to jitterentropy-drng except that the reseeding values are 16 bytes and 2¹⁰ bytes, respectively. The goal of the reseeding is that information theoretical entropy is always present in the DRNG [O]  [O] For the FIPS 140-2 ANSI X9.31 DRNG, this equals one AES block. For the default SHA-1 based DRNG with a block size of 160 bits, the reseeding occurs a bit more frequently than necessary, though.
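The following minimal C sketch illustrates the intended registration of one of these engines as the default OpenSSL random number generator. The engine id string "jitterentropy-drng" is an assumption derived from the engine name above, and the engine shared library is assumed to be on the OpenSSL engine search path.
#include <openssl/engine.h>
#include <openssl/rand.h>

int main(void)
{
        unsigned char buf[16];
        ENGINE *e;

        ENGINE_load_builtin_engines();
        /* Engine id is an assumption based on the engine name above. */
        e = ENGINE_by_id("jitterentropy-drng");
        if (!e || !ENGINE_init(e))
                return 1;
        /* All subsequent RAND_bytes calls are served by the engine. */
        ENGINE_set_default_RAND(e);
        RAND_bytes(buf, sizeof(buf));
        ENGINE_finish(e);
        ENGINE_free(e);
        return 0;
}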
The different makefiles compile the different engine shared libraries. The test case tests_userspace/openssl/jitterentropy-eng-test.c shows the proper working of the respective CPU Jitter random number generator OpenSSL Engines.
In addition, a patch independent of the OpenSSL Engine support is provided that modifies the RAND_poll API call to seed the OpenSSL deterministic random number generator. RAND_poll first tries to obtain entropy from the CPU Jitter random number generator. If that fails, e.g. because the initialization call fails due to missing high-resolution timer support, the standard call procedure to open /dev/urandom or /dev/random or the EGD is performed.
Figure 45↓ illustrates the operation.
figure pics/openssl.png
Figure 45 Linking OpenSSL with CPU Jitter RNG
The code is tested with the test application tests_userspace/openssl/jent_test.c. When using strace on this application, one can see that after patching OpenSSL, /dev/urandom is not opened and thus not used. That implies that the CPU Jitter random number generator code for seeding is invoked.
See patches/README for details on how to apply the code to OpenSSL.

E Shared Library And Stand-Alone Daemon

The CPU Jitter random number generator can be compiled as a stand-alone shared library using the Makefile.shared makefile. The shared library exports the interfaces outlined in jitterentropy(3). After compilation, link with the shared library using the linker option -ljitterentropy.
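The following minimal C sketch shows the intended usage of the shared library, assuming the interfaces as outlined in jitterentropy(3): verify the environment with jent_entropy_init, allocate an entropy collector, pull random bytes, and release the collector again.
#include <stdio.h>
#include <jitterentropy.h>

int main(void)
{
        struct rand_data *ec;
        char buf[32];

        /* Verify timer resolution and jitter before first use. */
        if (jent_entropy_init())
                return 1;
        ec = jent_entropy_collector_alloc(1, 0); /* oversampling rate 1, no flags */
        if (!ec)
                return 1;
        if (jent_read_entropy(ec, buf, sizeof(buf)) < 0) {
                jent_entropy_collector_free(ec);
                return 1;
        }
        printf("read %zu bytes of seed material\n", sizeof(buf));
        jent_entropy_collector_free(ec);
        return 0;
}
Compile and link the sketch, for example, with gcc jent-demo.c -o jent-demo -ljitterentropy.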
To update the entropy in the input_pool behind the Linux /dev/random and /dev/urandom devices, the daemon jitterentropy-rngd is implemented. It polls on /dev/random; the kernel wakes up polling processes when the entropy counter falls below a threshold. In this case, jitterentropy-rngd gathers 256 bytes of entropy and injects it into the input_pool. In addition, /proc/sys/kernel/random/entropy_avail is read every 5 seconds. If the value falls below 1024, jitterentropy-rngd gathers 256 bytes of entropy and injects it into the input_pool. The reason for polling entropy_avail is that when random numbers are extracted from /dev/urandom, the entropy estimator falls without triggering the poll on /dev/random.
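The following condensed C sketch illustrates the core of the described daemon loop; it omits error handling and the 5-second entropy_avail check for brevity, and gather_jitter stands in for a jent_read_entropy call on a pre-allocated entropy collector. RNDADDENTROPY and struct rand_pool_info are the standard Linux interfaces for injecting entropy into the input_pool.
#include <fcntl.h>
#include <poll.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/random.h>

#define ENTBYTES 256

/* Stand-in for pulling data from a pre-allocated CPU Jitter RNG
 * entropy collector, e.g. via jent_read_entropy(3). */
extern void gather_jitter(void *buf, size_t len);

int main(void)
{
        struct rand_pool_info *info = malloc(sizeof(*info) + ENTBYTES);
        int fd = open("/dev/random", O_RDWR);
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };

        if (!info || fd < 0)
                return 1;
        for (;;) {
                /* The kernel marks /dev/random writable when the
                 * entropy estimator falls below its threshold. */
                poll(&pfd, 1, -1);
                gather_jitter(info->buf, ENTBYTES);
                info->entropy_count = ENTBYTES * 8;
                info->buf_size = ENTBYTES;
                ioctl(fd, RNDADDENTROPY, info);
        }
}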

F Folding Loop Entropy Measurements

The following sections show the measurements explained in section 5.1↑ for different CPUs. These measurements all support the conclusion in section 5.1↑.
Note, all measurements in this section only cover the CPU execution time jitter, as memory accesses are disabled. Since CPU execution time jitter alone is shown to be sufficient, the additional measurement of memory accesses cannot change the sufficiency of the timing variations.
Note, all these tests were executed in user space. Although the compilation of the CPU Jitter random number generator is always performed without optimizations, the tests are executed both with and without optimizations. Testing with optimizations again considers the worst case. If testing with optimizations shows too little entropy, the test is repeated without optimizations.
A large number of tests on different CPUs with different operating systems were executed. The following table summarizes the tests by enumerating the upper and lower boundary of the Shannon Entropy for all test systems. The table lists the CPU, the word size of the operating system in bits (WS), whether the test was compiled with optimizations (Opt), the upper and lower boundary of the Shannon Entropy in bits, the operating system (OS), and whether the system is accepted as an entropy source, i.e. whether jent_entropy_init(3) passes (Acc).
The table demonstrates that the CPU Jitter random number generator delivers high-quality entropy on a wide range of CPU architectures, word sizes and operating systems.
CPU WS Opt Upper Lower OS Acc
Intel(R) Xeon(R) CPU X5660 @ 2.80GHz 64 y 6.01 2.78 Linux y
Intel(R) Xeon(R) CPU E5620 @ 2.40GHz 64 y 5.99 2.67 Linux y
Intel(R) Xeon(R) CPU E5-2470 0 @ 2.30GHz 64 y 7.09 2.58 Linux y
AMD Generic S 64 y 3.75 1.98 Linux y
Intel(R) Xeon(R) CPU E5-4650 0 @ 2.70GHz 64 y 6.91 2.42 Linux y
Intel(R) Atom(TM) CPU S1240 @ 1.60GHz 64 y 6.59 2.10 Linux y
Intel(R) Core(TM)2 Quad CPU @ 2.66GHz 64 y 6.00 2.80 Linux y
Intel(R) Xeon(R) CPU X3470 @ 2.93GHz 64 y 5.36 1.74 Linux y
Intel(R) Xeon(R) CPU X5650 @ 2.67GHz 64 y 6.78 3.33 Linux y
Intel(R) Xeon(R) CPU X5472 @ 3.00GHz 64 y 5.91 2.90 Linux y
Intel(R) Xeon(R) CPU 5150 @ 2.66GHz 64 y 7.93 4.41 Linux y
Dual-Core AMD Opteron(tm) Processor 2218 64 y 5.19 1.95 Linux y
Quad-Core AMD Opteron(tm) Processor 2356 64 y 6.62 2.65 Linux y
Intel(R) Pentium(R) CPU 1403 @ 2.60GHz 64 y 7.21 1.89 Linux y
Genuine Intel(R) CPU @ 2.83GHz 64 y 6.86 2.66 Linux y
Intel(R) Xeon(R) CPU X5550 @ 2.67GHz 64 y 6.79 3.13 Linux y
Intel(R) Xeon(R) CPU X5660 @ 2.80GHz 64 y 9.25 5.46 Linux y
Intel(R) Xeon(R) CPU E7520 @ 1.87GHz 64 y 6.87 2.57 Linux y
Genuine Intel(R) CPU @ 2.93GHz 64 y 5.94 3.09 Linux y
Dual Core AMD Opteron(tm) Processor 890 64 y 5.04 1.58 Linux y
Dual-Core AMD Opteron(tm) Processor 8218 64 y 5.54 1.94 Linux y
Intel(R) Xeon(R) CPU E5540 @ 2.53GHz 64 y 6.97 3.42 Linux y
Dual Core AMD Opteron(tm) Processor 870 64 y 5.50 2.05 Linux y
Intel(R) Xeon(R) CPU L5520 @ 2.27GHz 64 y 7.51 3.49 Linux y
QEMU Virtual CPU version 0.15.1 64 y 10.26 6.97 Linux y
Dual Core AMD Opteron(tm) Processor 280 64 y 5.11 2.25 Linux y
Genuine Intel(R) CPU 000 @ 2.40GHz 64 y 6.40 3.08 Linux y
Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz 64 y 6.66 2.09 Linux y
Intel(R) Xeon(R) CPU E5504 @ 2.00GHz 64 y 5.12 1.36 Linux y
Intel(R) Xeon(R) CPU E5-4650 0 @ 2.70GHz 64 y 7.09 2.88 Linux y
Intel(R) Xeon(R) CPU X5670 @ 2.93GHz 64 y 6.80 3.06 Linux y
AMD Opteron(tm) Processor 6172 64 y 5.68 2.30 Linux y
Dual Core AMD Opteron(tm) Processor 890 64 y 5.25 2.42 Linux y
Intel(R) Xeon(R) CPU X5376 @ 2.80GHz 64 y 6.14 3.16 Linux y
Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz 64 y 6.01 2.25 Linux y
Intel(R) Xeon(R) CPU E5-4650 0 @ 2.70GHz 64 y 7.37 2.89 Linux y
Intel(R) Xeon(R) CPU X5376 @ 2.80GHz 64 y 6.62 3.59 Linux y
Intel(R) Core(TM) i7 CPU X 980 @ 3.33GHz 64 y 6.55 2.31 Linux y
Intel(R) Xeon(R) CPU L5520 @ 2.27GHz 64 y 6.59 2.74 Linux y
Intel(R) Xeon(R) CPU X5680 @ 3.33GHz 64 y 6.29 2.60 Linux y
Intel(R) Xeon(TM) CPU 3.40GHz 64 y 6.15 2.20 Linux y
Intel(R) Xeon(R) CPU E5-2695 v2 @ 2.40GHz 64 y 7.30 2.73 Linux y
AMD Opteron(tm) Processor 6386 SE 64 y 6.35 2.23 Linux y
Genuine Intel(R) CPU 000 @ 2.27GHz 64 y 5.92 3.20 Linux y
AMD Opteron(tm) Processor 6172 64 y 6.60 2.89 Linux y
Intel(R) Xeon(R) CPU X5472 @ 3.00GHz 64 y 6.09 2.90 Linux y
AMD Generic S 64 y 7.05 2.87 Linux y
Intel(R) Xeon(R) CPU E5-4650 0 @ 2.70GHz 64 y 6.88 2.52 Linux y
Genuine Intel(R) CPU @ 2.40GHz 64 y 4.98 1.54 Linux y
AMD Generic S 64 y 7.50 2.48 Linux y
Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz 64 y 6.92 3.02 Linux y
Genuine Intel(R) CPU @ 2.66GHz 64 y 5.38 2.96 Linux y
Intel(R) Xeon(TM) CPU 3.73GHz 64 y 5.24 1.96 Linux y
Genuine Intel(R) CPU @ 2.80GHz 64 y 7.31 2.65 Linux y
Intel(R) Xeon(R) CPU X7560 @ 2.27GHz 64 y 6.78 3.24 Linux y
AMD Generic S 64 y 3.65 0.60 Linux y
AMD Generic S 64 n 3.74 1.62 Linux y
Intel(R) Xeon(R) CPU W3530 @ 2.80GHz 64 y 6.40 2.28 Linux y
Intel(R) Xeon(R) CPU X5660 @ 2.80GHz 32 y 9.35 4.16 Linux y
Intel(R) Xeon(R) CPU E5620 @ 2.40GHz 32 y 9.19 3.91 Linux y
Intel(R) Xeon(R) CPU E5-2470 0 @ 2.30GHz 32 y 9.70 4.41 Linux y
Intel(R) Xeon(R) CPU E5-4650 0 @ 2.70GHz 32 y 9.62 4.05 Linux y
Intel(R) Atom(TM) CPU S1240 @ 1.60GHz 32 y 8.27 3.93 Linux y
Intel(R) Core(TM)2 Quad CPU @ 2.66GHz 32 y 7.11 3.18 Linux y
Intel(R) Xeon(R) CPU X3470 @ 2.93GHz 32 y 8.64 3.81 Linux y
Intel(R) Xeon(R) CPU X5650 @ 2.67GHz 32 y 9.14 4.68 Linux y
Intel(R) Xeon(R) CPU X5472 @ 3.00GHz 32 y 8.13 3.46 Linux y
Genuine Intel(R) CPU @ 2.83GHz 32 y 8.97 3.34 Linux y
Intel(R) Xeon(R) CPU X5550 @ 2.67GHz 32 y 9.54 4.10 Linux y
Intel(R) Xeon(R) CPU X5660 @ 2.80GHz 32 y 11.65 7.66 Linux y
Intel(R) Xeon(R) CPU E7520 @ 1.87GHz 32 y 8.34 3.28 Linux y
Genuine Intel(R) CPU @ 2.93GHz 32 y 7.93 3.29 Linux y
Intel(R) Xeon(R) CPU E5345 @ 2.33GHz 32 y 7.33 3.14 Linux y
Intel(R) Xeon(R) CPU E5540 @ 2.53GHz 32 y 9.19 4.14 Linux y
QEMU Virtual CPU version 0.15.1 32 y 11.32 7.40 Linux y
Genuine Intel(R) CPU 000 @ 2.40GHz 32 y 10.21 5.19 Linux y
Intel(R) Xeon(R) CPU E5504 @ 2.00GHz 32 y 8.85 3.34 Linux y
Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz 32 y 8.34 2.64 Linux y
Intel(R) Xeon(R) CPU E5504 @ 2.00GHz 32 y 8.23 3.42 Linux y
Intel(R) Xeon(R) CPU X5670 @ 2.93GHz 32 y 9.78 5.03 Linux y
Intel(R) Xeon(R) CPU X5472 @ 3.00GHz 32 y 8.55 4.30 Linux y
Intel(R) Xeon(R) CPU X5376 @ 2.80GHz 32 y 7.73 3.42 Linux y
Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz 32 y 9.14 3.76 Linux y
Intel(R) Xeon(R) CPU E5-4650 0 @ 2.70GHz 32 y 9.92 4.51 Linux y
Intel(R) Xeon(R) CPU X5376 @ 2.80GHz 32 y 8.04 3.54 Linux y
Intel(R) Core(TM) i7 CPU X 980 @ 3.33GHz 32 y 10.18 5.53 Linux y
Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz 32 y 9.76 3.84 Linux y
Intel(R) Xeon(R) CPU L5520 @ 2.27GHz 32 y 9.51 4.18 Linux y
Intel(R) Xeon(TM) CPU 3.40GHz 32 y 9.05 3.99 Linux y
Intel(R) Xeon(R) CPU E5-2695 v2 @ 2.40GHz 32 y 10.09 4.97 Linux y
Genuine Intel(R) CPU 000 @ 2.27GHz 32 y 9.21 4.69 Linux y
Intel(R) Xeon(R) CPU X5472 @ 3.00GHz 32 y 9.22 4.35 Linux y
Intel(R) Xeon(R) CPU E5-4650 0 @ 2.70GHz 32 y 9.54 4.07 Linux y
Genuine Intel(R) CPU @ 2.40GHz 32 y 5.96 1.82 Linux y
Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz 32 y 9.61 3.84 Linux y
Genuine Intel(R) CPU @ 2.66GHz 32 y 5.92 1.64 Linux y
Intel(R) Xeon(TM) CPU 3.73GHz 32 y 5.10 1.48 Linux y
Genuine Intel(R) CPU @ 2.80GHz 32 y 9.62 4.14 Linux y
Intel(R) Xeon(TM) CPU 3.20GHz 32 y 7.80 2.74 Linux y
Intel(R) Xeon(R) CPU X7560 @ 2.27GHz 32 y 9.24 3.98 Linux y
Intel(R) Xeon(R) CPU W3530 @ 2.80GHz 32 y 9.97 4.82 Linux y
POWER7 (architected), altivec supported - CHRP IBM,8202-E4B 64 y 11.21 7.49 Linux y
POWER7 (architected), altivec supported - CHRP IBM,8202-E4B 64 y 10.76 6.23 Linux y
POWER7 (architected), altivec supported - CHRP IBM,8231-E2B 64 y 8.26 3.87 Linux y
POWER5+ (gs) - CHRP IBM,9113-550 64 y 7.81 3.10 Linux y
POWER5+ (gs) - CHRP IBM,9113-550 64 y 7.80 3.10 Linux y
POWER5+ (gs) - CHRP IBM,9117-570 64 y 6.63 2.54 Linux y
POWER7 (architected), altivec supported - CHRP IBM,8231-E2B 64 y 8.31 4.05 Linux y
POWER7 (architected), altivec supported - CHRP IBM,8231-E2D 64 y 8.95 5.79 Linux y
POWER7 (architected), altivec supported - CHRP IBM,8231-E2D 64 y 10.66 6.38 Linux y
POWER7 (architected), altivec supported - CHRP IBM,8231-E2D 64 y 8.24 4.94 Linux y
POWER5+ (gs) - CHRP IBM,9131-52A 64 y 10.99 6.68 Linux y
POWER5 (gr) - CHRP IBM,9123-705 64 y 5.97 1.60 Linux y
POWER7 (architected), altivec supported - CHRP IBM,8202-E4B 32 y 11.59 7.51 Linux y
POWER7 (architected), altivec supported - CHRP IBM,8202-E4B 32 y 10.82 5.66 Linux y
POWER7 (architected), altivec supported - CHRP IBM,8231-E2B 32 y 9.32 4.12 Linux y
POWER5+ (gs) - CHRP IBM,9113-550 32 y 9.02 4.13 Linux y
POWER5+ (gs) - CHRP IBM,9113-550 32 y 9.54 5.06 Linux y
POWER5+ (gs) - CHRP IBM,9117-570 32 y 10.84 6.10 Linux y
POWER7 (architected), altivec supported - CHRP IBM,8231-E2B 32 y 9.73 4.71 Linux y
POWER7 (architected), altivec supported - CHRP IBM,8231-E2D 32 y 11.30 6.75 Linux y
POWER7 (architected), altivec supported - CHRP IBM,8231-E2D 32 y 9.51 4.77 Linux y
POWER7 (architected), altivec supported - CHRP IBM,8231-E2D 32 y 9.19 4.21 Linux y
POWER5+ (gs) - CHRP IBM,9131-52A 32 y 11.91 6.89 Linux y
POWER5 (gr) - CHRP IBM,9123-705 32 y 7.27 2.60 Linux y
IBM/S390 - version = 00, identification = 098942, machine = 2097 31 y 5.88 1.25 Linux y
IBM/S390 - version = 00, identification = 0A8942, machine = 2097 31 y 5.48 0.85 Linux y
IBM/S390 - version = 00, identification = 0B8942, machine = 2097 31 y 5.53 0.87 Linux y
IBM/S390 - version = 00, identification = 0D8942, machine = 2097 31 y 5.34 0.83 Linux y
IBM/S390 - version = 00, identification = 038942, machine = 2097 31 y 5.73 0.91 Linux y
IBM/S390 - version = 00, identification = 158942, machine = 2097 31 y 5.40 0.83 Linux y
IBM/S390 - version = 00, identification = 168942, machine = 2097 31 y 5.76 1.15 Linux y
IBM/S390 - version = 00, identification = 178942, machine = 2097 31 y 5.29 0.81 Linux y
IBM/S390 - version = FF, identification = 058942, machine = 2097 31 n 6.40 1.77 Linux n
IBM/S390 - version = 00, identification = 098942, machine = 2097 64 y 5.94 2.04 Linux y
IBM/S390 - version = 00, identification = 0A8942, machine = 2097 64 y 5.74 1.98 Linux y
IBM/S390 - version = 00, identification = 0B8942, machine = 2097 64 y 5.73 1.97 Linux y
IBM/S390 - version = 00, identification = 0D8942, machine = 2097 64 y 5.48 1.95 Linux y
IBM/S390 - version = 00, identification = 038942, machine = 2097 64 y 5.81 1.99 Linux y
IBM/S390 - version = 00, identification = 158942, machine = 2097 64 y 5.82 1.97 Linux y
IBM/S390 - version = 00, identification = 168942, machine = 2097 64 y 5.85 2.02 Linux y
IBM/S390 - version = 00, identification = 178942, machine = 2097 64 y 5.64 1.96 Linux y
Intel(r) Itanium(r) Processor Family 64 y 10.30 6.47 Linux y
Intel(R) Itanium(R) Processor 9350 64 y 6.01 1.52 Linux y
Intel(R) Itanium(R) Processor 9350 64 y 8.90 4.82 Linux y
Intel(r) Itanium(r) 2 Processor 1.4GHz with 12M L3 Cache for 667MHz Platforms 64 y 5.23 1.36 Linux y
Intel(R) Itanium(R) Processor 9350 64 y 10.41 6.95 Linux y
Intel(R) Itanium(R) Processor 9350 64 y 6.40 1.92 Linux y
Intel(r) Itanium(r) 2 Processor 1.6GHz with 18M L3 Cache for 533MHz Platforms 64 y 6.48 2.44 Linux y
Intel(R) Itanium(R) Processor 9350 64 y 11.57 8.77 Linux y
Intel(r) Itanium(r) 2 Processor 1.6GHz with 18M L3 Cache for 533MHz Platforms 64 y 8.64 4.37 Linux y
Dual-Core Intel(R) Itanium(R) Processor 9050 64 y 11.65 5.91 Linux y
Intel(r) Itanium(r) 2 Processor 1.6GHz with 18M L3 Cache for 533MHz Platforms 64 y 9.12 4.85 Linux y
Intel(r) Itanium(r) 2 Processor 1.6GHz with 18M L3 Cache for 533MHz Platforms 64 y 10.53 7.82 Linux y
Dual-Core Intel(R) Itanium(R) 2 Processor 9140M 64 y 6.07 1.90 Linux y
Madison up to 9M cache 64 y 4.41 1.22 Linux y
Intel(r) Itanium(r) 2 Processor 1.6GHz with 18M L3 Cache for 533MHz Platforms 64 y 10.44 7.55 Linux y
Dual-Core Intel(R) Itanium(R) Processor 9040 64 y 11.08 6.44 Linux y
Intel(r) Itanium(r) 2 Processor 1.6GHz with 24M L3 Cache for 533MHz Platforms 64 y 5.86 2.20 Linux y
Intel(r) Itanium(r) Processor Family 64 y 12.04 8.74 Linux y
Dual-Core Intel(R) Itanium(R) 2 Processor 9140M 64 y 6.19 2.31 Linux y
Intel(r) Itanium(r) Processor Family 64 y 4.78 2.00 Linux y
Madison up to 9M cache 64 y 9.11 4.35 Linux y
Intel(r) Itanium(r) 2 Processor 1.6GHz with 24M L3 Cache for 533MHz Platforms 64 y 6.29 2.62 Linux y
Itanium 2 64 y 4.59 1.11 Linux y
Intel(r) Itanium(r) Processor Family 64 y 5.42 1.46 Linux y
Intel(r) Itanium(r) 2 Processor 1.6GHz with 18M L3 Cache for 533MHz Platforms 64 y 6.01 1.69 Linux y
Dual-Core Intel(R) Itanium(R) Processor 9040 64 y 6.03 1.96 Linux y
AMD E350 32 y 5.95 1.56 Linux y
AMD E350 32 n 6.30 2.75 Linux y
AMD Phenom X6-1035T 64 y 6.67 2.90 Linux y
AMD Phenom X6-1035T 64 n 6.55 2.93 Linux y
AMD Sempron 3GHz 32 y 5.00 0.91 Linux y
AMD Sempron 3GHz 32 n 5.99 2.42 Linux y
ARMv6-rev7 32 y 5.08 1.13 Android n
ARMv6-rev7 32 n 5.81 1.58 Android n
ARMv7-rev1 — Samsung Galaxy SII 32 y 7.52 2.86 Android y
ARMv7-rev1 — Samsung Galaxy SII 32 n 6.56 2.71 Android y
ARMv7 rev 2 — LG Nexus 4.2 32 n N/A N/A Android n
ARMv7 rev 1 — HTC Desire Z 32 n N/A N/A Android n
AMD Athlon 4850e 64 y 3.98 2.30 Linux y
AMD Athlon 4850e 64 n 5.06 2.75 Linux y
AMD Athlon 7550 64 y 6.65 2.75 Linux y
AMD Athlon 7550 64 n 6.74 3.40 Linux y
Intel Atom Z530 32 y 5.38 2.43 Linux y
Intel Celeron 32 y 7.97 3.33 Linux y
Intel Core2Duo T5870 32 y 6.19 2.38 Linux y
Intel Core2 Q6600 32 y 6.28 2.71 Linux y
Intel Core2 Q6600 32 n 5.81 2.51 Linux y
Intel CoreDuo L2400 32 y 6.52 2.23 Linux y
Intel Core i5-2410M 64 y 7.25 3.75 Linux y
Intel Core i5-2410M 64 n 9.52 3.72 Linux y
Intel Core i5-2430M 64 y 8.10 3.55 Linux y
Intel(R) Core(TM) i7-2620M CPU @ 2.70GHz 64 y 5.68 2.35 FreeBSD 9.1 y
Intel(R) Core(TM) i7-2620M CPU @ 2.70GHz 64 y 6.32 1.47 NetBSD 6.0 y
Intel(R) Core(TM) i7-2620M CPU @ 2.70GHz 64 y 9.40 3.86 OpenIndiana 151 y
Intel(R) Core(TM) i7-2620M CPU @ 2.70GHz 64 y 6.93 2.85 Linux no X11 y
Intel(R) Core(TM) i7-2620M CPU @ 2.70GHz 64 y 6.79 2.97 Linux with X11 y
Intel(R) Core(TM) i7-2620M CPU @ 2.70GHz 64 n 8.48 3.27 Linux with X11 y
Intel(R) Core(TM) i7-2620M CPU @ 2.70GHz 64 y 2.11 1.72 Linux with X11 compiled with Clang y
Intel(R) Core(TM) i7-2620M CPU @ 2.70GHz 64 n 9.13 3.52 Linux with X11 compiled with Clang y
Intel Core i7-Q720 64 y 7.84 2.98 Linux y
MIPS 24Kc v7.4 32 y 7.01 3.31 Linux y
MIPS 24Kc v4.12 32 y 7.54 3.95 Linux y
MIPS 24Kc v4.12 32 n 7.78 3.47 Linux y
MIPS 4Kec V4.8 32 n 1.83 0.46 Linux n
MIPS 4Kec V6.8 32 y 7.21 2.28 Linux y
MIPS 4Kec V6.8 32 n 10.60 5.02 Linux y
Pentium Celeron Mobile 733MHz 32 y 4.57 0.70 Linux y
Pentium Celeron Mobile 733MHz 32 n 4.38 0.49 Linux y
AMD Opteron 6128 32 y 8.03 3.90 Linux y
Pentium 4 3GHz 32 y 8.37 3.88 Linux y
Pentium 4 Mobile 32 y 5.40 1.56 Linux y
IBM System Z z10 (S390) on z/VM 64 y 5.55 1.99 Linux y
IBM System Z z10 (S390) on z/VM 64 n 6.37 2.11 Linux y
UltraSparc II ? y 12.45 7.74 OpenBSD y
UltraSparc II ? n 12.05 7.32 OpenBSD y
UltraSparc IIi 440MHz ? y 11.58 6.62 Linux y
UltraSparc IIi 440MHz ? n 11.82 6.86 Linux y
UltraSparc IIIi ? y 11.08 7.20 FreeBSD y
UltraSparc IIIi ? n 10.90 6.82 FreeBSD y
VIA Nano L2200 32 y 6.35 3.12 Linux y
Intel Xeon E5504 64 y 5.96 1.23 Linux y
Intel Xeon E5504 64 n 5.59 0.83 Linux y
IBM System P POWER7 using clock_gettime 64 y N/A N/A AIX 6.1 n
IBM System P POWER5 using read_real_time 64 y 7.59 2.85 AIX 6.1 y
IBM System P POWER5 using read_real_time 64 n 11.83 6.29 AIX 6.1 y
IBM System P POWER6 using read_real_time 64 y 6.71 2.86 AIX 6.1 y
IBM System P POWER6 using read_real_time 64 n 10.71 6.83 AIX 6.1 y
IBM System P POWER7 using read_real_time 64 y 7.29 3.75 AIX 6.1 y
IBM System P POWER7 using read_real_time 64 n 11.61 7.92 AIX 6.1 y
Apple MacBook Pro Intel Core 2 Duo 32 y 6.38 2.77 Apple MacOS 10.6 y
Apple MacBook Pro Intel Core 2 Duo 32 n 5.57 1.82 Apple MacOS 10.6 y
IBM System Z z10 using STCKE 64 n 9.38 5.28 z/OS 1R13 y
Intel Core Duo Solo T1300 32 n 5.13 1.32 Genode with NOVA Microkernel y
Intel Core Duo Solo T1300 32 n 5.04 2.04 Genode with Fiasco.OC Microkernel y
Intel Core Duo Solo T1300 32 n 5.45 2.09 Genode with Pistachio Microkernel y
ARM Exynos 5250 32 n 6.88 2.00 Genode with Fiasco.OC y
ARM Exynos 5250 32 n 3.21 0.00 Genode with BaseHW n
AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ 64 y 2.00 2.00 Linux y
AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ 64 n 3.88 1.47 Linux y
AMD Phenom(tm) II X4 925 Processor 64 y 1.08 1.00 Linux y
AMD Phenom(tm) II X4 925 Processor 64 n 5.63 2.58 Linux y
Intel Core2 Duo 32 n 8.22 3.53 Windows 7 with Visual Studio y
Apple PPC G5 Quad Core PPC970MP 64 y 5.27 0.94 Apple MacOS X 10.5.8 y
Apple PPC G5 Quad Core PPC970MP 64 n 9.41 2.44 Apple MacOS X 10.5.8 y
Apple PPC G5 Quad Core PPC970MP 32 y 6.22 0.96 Apple MacOS X 10.5.8 y
Apple PPC G5 Quad Core PPC970MP 32 n 6.87 2.34 Apple MacOS X 10.5.8 y
Intel(R) Core(TM) i7-3537U CPU 64 n 8.76 3.81 Linux y
The table shows that nearly all test systems, at least when compiled without optimizations, have a lower boundary of more than 1 bit. This supports the quality assessment of the CPU Jitter random number generator.
In addition, the table shows an interesting yet common trend: the newer the CPU is, the more CPU execution time jitter is present.
Nonetheless, a few exceptions are visible in the table; the noteworthy ones are discussed in the subsections below.
To illustrate the tests further, a subset of the tests listed in the aforementioned table is assessed with graphs in the following subsections. The reader will find a number of tests from the table above again in the graphs below. Note, to make sure the illustrations show the worst case, the graphs truncate the outliers in the test results with a cutoff value. Thus, the values for the Shannon Entropy in the graphs below are all slightly lower than those outlined in the table above. The cutoff value is chosen to focus on the values that occur most often. This usually discards higher values caused by cache or TLB misses, illustrating a system without any load and supporting the analysis of a worst case scenario.
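For reference, the Shannon Entropy values in the table and graphs follow the standard formula H = -Σ p(i) · log2 p(i) over the relative frequencies of the observed time deltas. The following C sketch illustrates that calculation over a histogram of time deltas; it is an illustration of the formula rather than the exact test code, and it must be linked with -lm.
#include <math.h>
#include <stddef.h>

/* Shannon Entropy over a histogram of observed time deltas:
 * hist[i] holds the count of bin i, total is the sum of all counts. */
double shannon_entropy(const unsigned long *hist, size_t bins,
                       unsigned long total)
{
        double h = 0.0;
        size_t i;

        for (i = 0; i < bins; i++) {
                double p;

                if (!hist[i])
                        continue;
                p = (double)hist[i] / (double)total;
                h -= p * log2(p);
        }
        return h;
}
Applying a cutoff as done for the graphs simply means discarding all deltas above the cutoff before building the histogram.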

F.1 Intel Core i5 4200U

figure pics/userspace-foldtime-i5-4200U.O0-time-dist-delta-310-hist.png
Figure 46 Lower boundary of entropy over folding loop in user space on Intel Core i5 4200U
figure pics/userspace-foldtime-i5-4200U.O0-time-dist-delta-3400-hist.png
Figure 47 Upper boundary of entropy over folding loop in user space on Intel Core i5 4200U

F.2 Intel Core i7 3537U

figure pics/userspace-foldtime-i7-3537U.O0-time-dist-delta-300-hist.png
Figure 48 Lower boundary of entropy over folding loop in user space on Intel Core i7 3537U
figure pics/userspace-foldtime-i7-3537U.O0-time-dist-delta-3500-hist.png
Figure 49 Upper boundary of entropy over folding loop in user space on Intel Core i7 3537U

F.3 Intel Core i7 2620M compiled with Clang

The graphs shown in section 5.1↑ are based on the test cases compiled with GCC and executed on an Intel Core i7 2620M. The same tests executed in user space in the same environment but compiled with the Clang compiler show interesting results. In particular, the optimizations achieved with Clang are astounding. Yet, these optimizations are not of interest, as the CPU Jitter random number generator shall be compiled without optimizations as specified in the various Makefiles and in section 3.6↑. The non-optimized compilation is in line with the expected results.
figure pics/userspace-foldtime-i7-2620M-ubuntu-clang-time-dist-delta-175-hist.png
Figure 50 Lower boundary of entropy over folding loop in user space on Intel Core i7 2620M — with optimizations
figure pics/userspace-foldtime-i7-2620M-ubuntu-clang-time-dist-delta-200-hist.png
Figure 51 Upper boundary of entropy over folding loop in user space on Intel Core i7 2620M — with optimizations
figure pics/userspace-foldtime-i7-2620M-ubuntu-clang.O0-time-dist-delta-300-hist.png
Figure 52 Lower boundary of entropy over folding loop in user space on Intel Core i7 2620M — without optimizations
figure pics/userspace-foldtime-i7-2620M-ubuntu-clang.O0-time-dist-delta-3500-hist.png
Figure 53 Upper boundary of entropy over folding loop in user space on Intel Core i7 2620M — without optimizations

F.4 Intel Core i5 2430M

figure pics/userspace-foldtime-i5-2430M-time-dist-delta-410-hist.png
Figure 54 Lower boundary of entropy over folding loop in user space on Intel Core i5 2430M
figure pics/userspace-foldtime-i5-2430M-time-dist-delta-3000-hist.png
Figure 55 Upper boundary of entropy over folding loop in user space on Intel Core i5 2430M

F.5 Intel Core i5 2410M

The test system executes a Linux system with 32 bit word size even though the CPU is capable of executing 64 bit. This shall show that the word size has no impact on the observed CPU execution time jitter.
figure pics/userspace-foldtime-i5-2410M-time-dist-delta-500-hist.png
Figure 56 Lower boundary of entropy over folding loop in user space on Intel Core i5 2410M
figure pics/userspace-foldtime-i5-2410M-time-dist-delta-3000-hist.png
Figure 57 Upper boundary of entropy over folding loop in user space on Intel Core i5 2410M

F.6 Intel Core i7 Q720

figure pics/userspace-foldtime-i7q720-time-dist-delta-200-hist.png
Figure 58 Lower boundary of entropy over folding loop in user space on Intel Core i7 Q720
figure pics/userspace-foldtime-i7q720-time-dist-delta-3000-hist.png
Figure 59 Upper boundary of entropy over folding loop in user space on Intel Core i7 Q720

F.7 Intel Xeon E5504

figure pics/userspace-foldtime-xeon-e5504-time-dist-delta-250-hist.png
Figure 60 Lower boundary of entropy over folding loop in user space on Intel Xeon E5504 — with optimizations
figure pics/userspace-foldtime-xeon-e5504-time-dist-delta-2500-hist.png
Figure 61 Upper boundary of entropy over folding loop in user space on Intel Xeon E5504 — with optimizations
As the lower boundary is already close to the one bit limit, the same tests are performed without optimizations. The following graphs, however, even show a deterioration of the entropy measurement. The reader should bear in mind that the gathering of the data took less than 20 seconds; therefore, a short-lived skew may have been observed.
figure pics/userspace-foldtime-xeon-e5504.O0-time-dist-delta-540-hist.png
Figure 62 Lower boundary of entropy over folding loop in user space on Intel Xeon E5504 — without optimizations
figure pics/userspace-foldtime-xeon-e5504.O0-time-dist-delta-7000-hist.png
Figure 63 Upper boundary of entropy over folding loop in user space on Intel Xeon E5504 — without optimizations
After re-performing the tests, the lower boundary Shannon entropy fluctuates around 1 bit. Therefore, an additional statistical test is performed on an otherwise quiet system to see whether the entropy is still above one bit, i.e. closer to the upper boundary of the Shannon entropy:
# byte wise
$ ./ent random.out 
Entropy = 7.999992 bits per byte.
​
Optimum compression would reduce the size
of this 22188032 byte file by 0 percent.
​
Chi square distribution for 22188032 samples is 260.80, and randomly
would exceed this value 50.00 percent of the times.
​
Arithmetic mean value of data bytes is 127.5139 (127.5 = random).
Monte Carlo value for Pi is 3.141290507 (error 0.01 percent).
Serial correlation coefficient is 0.000043 (totally uncorrelated = 0.0).
​
# bit wise
$ ./ent -b random.out 
Entropy = 1.000000 bits per bit.
​
Optimum compression would reduce the size
of this 178159616 bit file by 0 percent.
​
Chi square distribution for 178159616 samples is 1.39, and randomly
would exceed this value 50.00 percent of the times.
​
Arithmetic mean value of data bits is 0.5000 (0.5 = random).
Monte Carlo value for Pi is 3.141272175 (error 0.01 percent).
Serial correlation coefficient is -0.000187 (totally uncorrelated = 0.0).
The statistical tests show that no patterns are visible. Hence, the CPU is considered appropriate for entropy harvesting.

F.8 Intel Core 2 Quad Q6600

The tests were executed on OpenSUSE 12.3 without a graphical interface.
figure pics/userspace-foldtime-core2-q6600-time-dist-delta-260-hist.png
Figure 64 Lower boundary of entropy over folding loop in user space on Intel Core 2 Quad Q6600 — with optimizations
figure pics/userspace-foldtime-core2-q6600-time-dist-delta-2100-hist.png
Figure 65 Upper boundary of entropy over folding loop in user space on Intel Core 2 Quad Q6600 — with optimizations
The same test compiled without optimizations is shown in the following graphs.
figure pics/userspace-foldtime-core2-q6600.O0-time-dist-delta-730-hist.png
Figure 66 Lower boundary of entropy over folding loop in user space on Intel Core 2 Quad Q6600 — without optimizations
figure pics/userspace-foldtime-core2-q6600.O0-time-dist-delta-6000-hist.png
Figure 67 Upper boundary of entropy over folding loop in user space on Intel Core 2 Quad Q6600 — without optimizations

F.9 Intel Core 2 Duo T5870

figure pics/userspace-foldtime-core2duot5870-time-dist-delta-3000-hist.png
Figure 68 Lower boundary of entropy over folding loop in user space on Intel Core 2 Duo T5870
figure pics/userspace-foldtime-core2duot5870-time-dist-delta-6000-hist.png
Figure 69 Upper boundary of entropy over folding loop in user space on Intel Core 2 Duo T5870

F.10 Intel Core 2 Duo With Windows 7

figure pics/userspace-foldtime-core2duo-win7-time-dist-delta-1450-hist.png
Figure 70 Lower boundary of entropy over folding loop in user space on Intel Core 2 Duo with Windows — without optimizations
figure pics/userspace-foldtime-core2duo-win7-time-dist-delta-20000-hist.png
Figure 71 Upper boundary of entropy over folding loop in user space on Intel Core 2 Duo with Windows — without optimizations

F.11 Intel Core Duo L2400

figure pics/userspace-foldtime-coreduol2400-time-dist-delta-3500-hist.png
Figure 72 Lower boundary of entropy over folding loop in user space on Intel Core Duo L2400
figure pics/userspace-foldtime-coreduol2400-time-dist-delta-9000-hist.png
Figure 73 Upper boundary of entropy over folding loop in user space on Intel Core Duo L2400

F.12 Intel Core Duo Solo T1300 With NOVA Microkernel

figure pics/userspace-foldtime-nova-T60-T1300-time-dist-delta-2550-hist.png
Figure 74 Lower boundary of entropy over folding loop in user space on Intel Core Duo Solo T1300 with Nova Microkernel
figure pics/userspace-foldtime-nova-T60-T1300-time-dist-delta-30000-hist.png
Figure 75 Upper boundary of entropy over folding loop in user space on Intel Core Duo Solo T1300 with Nova Microkernel

F.13 Intel Core Duo Solo T1300 With Fiasco.OC Microkernel

figure pics/userspace-foldtime-foc_x86_32_T60_T1300-time-dist-delta-1550-hist.png
Figure 76 Lower boundary of entropy over folding loop in user space on Intel Core Duo Solo T1300 with Fiasco.OC Microkernel
figure pics/userspace-foldtime-foc_x86_32_T60_T1300-time-dist-delta-18000-hist.png
Figure 77 Upper boundary of entropy over folding loop in user space on Intel Core Duo Solo T1300 with Fiasco.OC Microkernel

F.14 Intel Core Duo Solo T1300 With Pistachio Microkernel

figure pics/userspace-foldtime-pistachio_x86_32_T60_T1300-time-dist-delta-2550-hist.png
Figure 78 Lower boundary of entropy over folding loop in user space on Intel Core Duo Solo T1300 with Pistachio Microkernel
figure pics/userspace-foldtime-pistachio_x86_32_T60_T1300-time-dist-delta-30000-hist.png
Figure 79 Upper boundary of entropy over folding loop in user space on Intel Core Duo Solo T1300 with Pistachio Microkernel

F.15 Intel Atom Z530

figure pics/userspace-foldtime-atomz530-time-dist-delta-4700-hist.png
Figure 80 Lower boundary of entropy over folding loop in user space on Intel Atom Z530
figure pics/userspace-foldtime-atomz530-time-dist-delta-21000-hist.png
Figure 81 Upper boundary of entropy over folding loop in user space on Intel Atom Z530

F.16 Intel Core 2 Duo on Apple MacBook Pro

The following test was executed on an Apple MacBook Pro executing MacOS X 10.8. The compilation was done using GCC.
figure pics/userspace-foldtime-osx10.8-time-dist-delta-300-hist.png
Figure 82 Lower boundary of entropy over folding loop in user space on Apple MacBook Pro — with optimizations
figure pics/userspace-foldtime-osx10.8-time-dist-delta-3000-hist.png
Figure 83 Upper boundary of entropy over folding loop in user space on Apple MacBook Pro — with optimizations

F.17 Intel Celeron

figure pics/userspace-foldtime-celeron-time-dist-delta-1370-hist.png
Figure 84 Lower boundary of entropy over folding loop in user space on Intel Celeron
figure pics/userspace-foldtime-celeron-time-dist-delta-6000-hist.png
Figure 85 Upper boundary of entropy over folding loop in user space on Intel Celeron

F.18 Intel Mobile Celeron 733 MHz

The test was compiled without optimizations.
figure pics/userspace-foldtime-mobileceleron.O0-time-dist-delta-4100-hist.png
Figure 86 Lower boundary of entropy over folding loop in user space on Intel Mobile Celeron
figure pics/userspace-foldtime-mobileceleron.O0-time-dist-delta-36000-hist.png
Figure 87 Upper boundary of entropy over folding loop in user space on Intel Mobile Celeron
The test results indicate that the CPU execution time jitter is insufficient for entropy collection. However, as this CPU is very old, the code has not been changed to catch this CPU behavior.

F.19 Intel Pentium P4 3GHz

figure pics/userspace-foldtime-p4-3GHz-time-dist-delta-950-hist.png
Figure 88 Lower boundary of entropy over folding loop in user space on Intel Pentium P4 3GHz
figure pics/userspace-foldtime-p4-3GHz-time-dist-delta-5000-hist.png
Figure 89 Upper boundary of entropy over folding loop in user space on Intel Pentium P4 3GHz

F.20 Intel Pentium P4 Mobile

figure pics/userspace-foldtime-p4mobile-time-dist-delta-7300-hist.png
Figure 90 Lower boundary of entropy over folding loop in user space on Intel Pentium P4 Mobile
figure pics/userspace-foldtime-p4mobile-time-dist-delta-25000-hist.png
Figure 91 Upper boundary of entropy over folding loop in user space on Intel Pentium P4 Mobile
As the Shannon entropy values and the distribution may suggest that patterns are present, the following statistical test is executed, showing that no patterns are visible. Therefore, the CPU execution time jitter is considered to be appropriate for harvesting entropy.
# byte-wise
$ ent random-p4mobile.data 
Entropy = 7.999998 bits per byte.
​
Optimum compression would reduce the size
of this 108351488 byte file by 0 percent.
​
Chi square distribution for 108351488 samples is 230.51, and randomly
would exceed this value 75.00 percent of the times.
​
Arithmetic mean value of data bytes is 127.5061 (127.5 = random).
Monte Carlo value for Pi is 3.141212037 (error 0.01 percent).
Serial correlation coefficient is 0.000075 (totally uncorrelated = 0.0).
​
# bit-wise
$ ent -b random-p4mobile.data 
Entropy = 1.000000 bits per bit.
​
Optimum compression would reduce the size
of this 866811904 bit file by 0 percent.
​
Chi square distribution for 866811904 samples is 0.12, and randomly
would exceed this value 50.00 percent of the times.
​
Arithmetic mean value of data bits is 0.5000 (0.5 = random).
Monte Carlo value for Pi is 3.141212037 (error 0.01 percent).
Serial correlation coefficient is 0.000023 (totally uncorrelated = 0.0).

F.21 AMD Opteron 6128

figure pics/userspace-foldtime-opteron6128-time-dist-delta-1050-hist.png
Figure 92 Lower boundary of entropy over folding loop in user space on AMD Opteron 6128
figure pics/userspace-foldtime-opteron6128-time-dist-delta-6000-hist.png
Figure 93 Upper boundary of entropy over folding loop in user space on AMD Opteron 6128

F.22 AMD Phenom II X6 1035T

figure pics/userspace-foldtime-amdphenom-X6-1035T-time-dist-delta-240-hist.png
Figure 94 Lower boundary of entropy over folding loop in user space on AMD Phenom II X6 1035T
figure pics/userspace-foldtime-amdphenom-X6-1035T-time-dist-delta-1900-hist.png
Figure 95 Upper boundary of entropy over folding loop in user space on AMD Phenom II X6 1035T

F.23 AMD Athlon 7550

figure pics/userspace-foldtime-athlon-7550-time-dist-delta-670-hist.png
Figure 96 Lower boundary of entropy over folding loop in user space on AMD Athlon 7550 — with optimizations
figure pics/userspace-foldtime-athlon-7550-time-dist-delta-4500-hist.png
Figure 97 Upper boundary of entropy over folding loop in user space on AMD Athlon 7550 — with optimizations
The same tests without optimizations show the following results:
figure pics/userspace-foldtime-athlon-7550.O0-time-dist-delta-940-hist.png
Figure 98 Lower boundary of entropy over folding loop in user space on AMD Athlon 7550 — without optimizations
figure pics/userspace-foldtime-athlon-7550.O0-time-dist-delta-8500-hist.png
Figure 99 Upper boundary of entropy over folding loop in user space on AMD Athlon 7550 — without optimizations

F.24 AMD Athlon 4850e

figure pics/userspace-foldtime-athlon-4850e-time-dist-delta-3650-hist.png
Figure 100 Lower boundary of entropy over folding loop in user space on AMD Athlon 4850e — with optimizations
figure pics/userspace-foldtime-athlon-4850e-time-dist-delta-6500-hist.png
Figure 101 Upper boundary of entropy over folding loop in user space on AMD Athlon 4850e — with optimizations
The optimized tests show very low variations, albeit the graphs are slightly misleading as one histogram bar contains up to three consecutive values. The same tests without optimizations show the following results:
figure pics/userspace-foldtime-athlon-4850e.O0-time-dist-delta-6500-hist.png
Figure 102 Lower boundary of entropy over folding loop in user space on AMD Athlon 4850e — without optimizations
figure pics/userspace-foldtime-athlon-4850e.O0-time-dist-delta-15000-hist.png
Figure 103 Upper boundary of entropy over folding loop in user space on AMD Athlon 4850e — without optimizations

F.25 AMD E350

figure pics/userspace-foldtime-amde350-time-dist-delta-520-hist.png
Figure 104 Lower boundary of entropy over folding loop in user space on AMD E350 — with optimizations
figure pics/userspace-foldtime-amde350-time-dist-delta-4500-hist.png
Figure 105 Upper boundary of entropy over folding loop in user space on AMD E350 — with optimizations
The same tests without optimizations show the following results:
figure pics/userspace-foldtime-amde350.O0-time-dist-delta-1150-hist.png
Figure 106 Lower boundary of entropy over folding loop in user space on AMD E350 — without optimizations
figure pics/userspace-foldtime-amde350.O0-time-dist-delta-13000-hist.png
Figure 107 Upper boundary of entropy over folding loop in user space on AMD E350 — without optimizations

F.26 AMD Sempron 3GHz

figure pics/userspace-foldtime-amdsemperon-3GHz-time-dist-delta-770-hist.png
Figure 108 Lower boundary of entropy over folding loop in user space on AMD Sempron 3GHz — with optimizations
figure pics/userspace-foldtime-amdsemperon-3GHz-time-dist-delta-6000-hist.png
Figure 109 Upper boundary of entropy over folding loop in user space on AMD Sempron 3GHz — with optimizations
The graphs show that the lower boundary of the CPU timing jitter over the folding operation contains less than 1 bit of entropy. This finding has the potential to significantly weaken the quality of this random number generator. However, as outlined at the beginning, these tests are performed with optimized code. Optimizations streamline the code such that the resulting binary does not strictly follow the C code sequences, while the compiler ensures that the result is always the same. As the quality of the CPU Jitter random number generator depends on the timing behavior and not so much on the result of computations, optimizations are not important for the random number generator.
To verify that optimizations are the cause of the insufficient execution jitter, as they make the code execute too fast on fast but less complex CPUs, the same test is invoked again without optimizations. To compile code without optimizations, either use no special flags or -O0.
figure pics/userspace-foldtime-amdsemperon-3GHz.O0-time-dist-delta-1140-hist.png
Figure 110 Lower boundary of entropy over folding loop in user space on AMD Sempron 3GHz — without optimizations
figure pics/userspace-foldtime-amdsemperon-3GHz.O0-time-dist-delta-11500-hist.png
Figure 111 Upper boundary of entropy over folding loop in user space on AMD Sempron 3GHz — without optimizations
Looking at the Shannon Entropy values, the conclusion can be drawn that without optimizations, the required CPU execution time jitter is present at a sufficient rate.
To support the conclusion that the compilation of non-optimized code on an AMD Sempron still produces high-quality random numbers, the statistical testing with ent is performed:
# byte-wise
$ ent entropy.amdsemperon.O0 
Entropy = 7.999968 bits per byte.
​
Optimum compression would reduce the size
of this 5525504 byte file by 0 percent.
​
Chi square distribution for 5525504 samples is 247.18, and randomly
would exceed this value 50.00 percent of the times.
​
Arithmetic mean value of data bytes is 127.5252 (127.5 = random).
Monte Carlo value for Pi is 3.142346161 (error 0.02 percent).
Serial correlation coefficient is -0.000274 (totally uncorrelated = 0.0).
​
# bit-wise
$ ent -b entropy.amdsemperon.O0 
Entropy = 1.000000 bits per bit.
​
Optimum compression would reduce the size
of this 44204032 bit file by 0 percent.
​
Chi square distribution for 44204032 samples is 2.54, and randomly
would exceed this value 25.00 percent of the times.
​
Arithmetic mean value of data bits is 0.5001 (0.5 = random).
Monte Carlo value for Pi is 3.142346161 (error 0.02 percent).
Serial correlation coefficient is 0.000115 (totally uncorrelated = 0.0).
Statistical Properties of Non-Optimized Code on AMD Sempron

F.27 VIA Nano L2200

figure pics/userspace-foldtime-viananol2200-time-dist-delta-1910-hist.png
Figure 112 Lower boundary of entropy over folding loop in user space on VIA Nano L2200
figure pics/userspace-foldtime-viananol2200-time-dist-delta-11000-hist.png
Figure 113 Upper boundary of entropy over folding loop in user space on VIA Nano L2200

F.28 MIPS 24KC v7.4

This test was executed on a Ubiquiti NanoStation M5 serving as a Freifunk router. The OS on that system is modified with OpenWRT.
figure pics/userspace-foldtime-mips24Kc-v7.4-time-dist-delta-4600-hist.png
Figure 114 Lower boundary of entropy over folding loop in user space on MIPS 24KC v7.4
figure pics/userspace-foldtime-mips24Kc-v7.4-time-dist-delta-38000-hist.png
Figure 115 Upper boundary of entropy over folding loop in user space on MIPS 24KC v7.4

F.29 MIPS 24KC v4.12 Ikanos Fusiv Core

This test without optimization was executed on a Fritz Box 7390 home router. The OS on that Fritz Box is modified with Freetz.
figure pics/userspace-foldtime-MIPS24KcV4.12.O0-time-dist-delta-7500-hist.png
Figure 116 Lower boundary of entropy over folding loop in user space without optimization on MIPS 24KC v4.12
figure pics/userspace-foldtime-MIPS24KcV4.12.O0-time-dist-delta-85000-hist.png
Figure 117 Upper boundary of entropy over folding loop in user space without optimization on MIPS 24KC v4.12

F.30 MIPS 4KEc V6.8

This test was executed on a Fritz Box 7270 home router. The OS on that Fritz Box is modified with Freetz.
figure pics/userspace-foldtime-mips4Kec-V6.8-time-dist-delta-5100-hist.png
Figure 118 Lower boundary of entropy over folding loop in user space on MIPS 4KEc V6.8 — with optimizations
figure pics/userspace-foldtime-mips4Kec-V6.8-time-dist-delta-40100-hist.png
Figure 119 Upper boundary of entropy over folding loop in user space on MIPS 4KEc V6.8 — with optimizations
Just to give the reader an impression of how the optimization changes the measurement, here is the same CPU measurement without optimization.
figure pics/userspace-foldtime-mips4Kec-V6.8.O0-time-dist-delta-12000-hist.png
Figure 120 Lower boundary of entropy over folding loop in user space on MIPS 4KEc V6.8 — without optimizations
figure pics/userspace-foldtime-mips4Kec-V6.8.O0-time-dist-delta-140000-hist.png
Figure 121 Upper boundary of entropy over folding loop in user space on MIPS 4KEc V6.8 — without optimizations
To support the conclusion that this CPU is an appropriate source for entropy, the following statistical analysis was performed. This analysis shows the suitability of the gathered data:
# byte-wise
$ ent random.fritz7270.out
Entropy = 7.999997 bits per byte.
​
Optimum compression would reduce the size
of this 58580496 byte file by 0 percent.
​
Chi square distribution for 58580496 samples is 241.25, and randomly
would exceed this value 50.00 percent of the times.
​
Arithmetic mean value of data bytes is 127.4989 (127.5 = random).
Monte Carlo value for Pi is 3.141300135 (error 0.01 percent).
Serial correlation coefficient is 0.000129 (totally uncorrelated = 0.0).
​
# bit-wise
$ ent -b random.fritz7270.out
Entropy = 1.000000 bits per bit.
​
Optimum compression would reduce the size
of this 468643968 bit file by 0 percent.
​
Chi square distribution for 468643968 samples is 0.01, and randomly
would exceed this value 75.00 percent of the times.
​
Arithmetic mean value of data bits is 0.5000 (0.5 = random).
Monte Carlo value for Pi is 3.141300135 (error 0.01 percent).
Serial correlation coefficient is 0.000042 (totally uncorrelated = 0.0).

F.31 MIPS 4KEc V4.8

This test was executed on a T-Com Speedport W701V home router. The OS on that Speedport router is modified with Freetz.
The following measurements were conducted without optimizations.
figure pics/userspace-foldtime-mips4Kec-V4.8.O0-time-dist-delta-30000-hist.png
Figure 122 Lower boundary of entropy over folding loop in user space on MIPS 4KEc V4.8 — without optimizations
figure pics/userspace-foldtime-mips4Kec-V4.8.O0-time-dist-delta-120000-hist.png
Figure 123 Upper boundary of entropy over folding loop in user space on MIPS 4KEc V4.8 — without optimizations
The graph indicates and the measurement of the Shannon Entropy concludes that the CPU execution time jitter on this CPU is too small. The reason for that is the coarse counter, which increments in multiples of 1,000. However, the good news is that on this CPU, the jent_entropy_init(3) call would fail, informing the caller not to use the CPU Jitter random number generator.

F.32 ARM Exynos 5250 with Fiasco.OC Microkernel

The following measurements were conducted without optimizations.
figure pics/userspace-foldtime-foc-Exynos5250-time-dist-delta-1600-hist.png
Figure 124 Lower boundary of entropy over folding loop in user space on Exynos 5250 with Fiasco.OC — without optimizations
figure pics/userspace-foldtime-foc-Exynos5250-time-dist-delta-22100-hist.png
Figure 125 Upper boundary of entropy over folding loop in user space on Exynos 5250 with Fiasco.OC — without optimizations

F.33 ARMv7 rev 1 — Samsung Galaxy S2

This test was executed on a Samsung Galaxy S2 with CyanogenMod 9 (Android 4.1). The clocksource (which is the backend to the clock_gettime(CLOCK_REALTIME) system call) is:
$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
mct-frc
$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
mct-frc
figure pics/userspace-foldtime-armv7-rev1-time-dist-delta-3500-hist.png
Figure 126 Lower boundary of entropy over folding loop in user space on ARMv7 rev 1 — with optimizations
figure pics/userspace-foldtime-armv7-rev1-time-dist-delta-23000-hist.png
Figure 127 Upper boundary of entropy over folding loop in user space on ARMv7 rev 1 — with optimizations
Although the tests with optimizations already indicate sufficient entropy, the same test without optimizations is conducted with the following results just to illustrate the appropriateness of the entropy source.
figure pics/userspace-foldtime-armv7-rev1.O0-time-dist-delta-5200-hist.png
Figure 128 Lower boundary of entropy over folding loop in user space on ARMv7 rev 1 — without optimizations
figure pics/userspace-foldtime-armv7-rev1.O0-time-dist-delta-45000-hist.png
Figure 129 Upper boundary of entropy over folding loop in user space on ARMv7 rev 1 — without optimizations

F.34 ARMv7 rev 2 — LG Nexus 4.2

The tests on a LG Nexus 4 with an ARMv7 rev 2 CPU and a Linux kernel 3.4 yielded the following result: most of the time delta values were zero. This implies that the jent_entropy_init(3) call rejects this system.

F.35 ARMv7 rev 0 — Samsung Galaxy S4

The tests on a Samsung Galaxy S4 with an ARMv7 rev 0 CPU and a Linux kernel 3.4 yielded the following result: most of the time delta values were zero. This implies that the jent_entropy_init(3) call rejects this system.
The clocksource tested is gp_timer. When enabling the dg_timer clocksource, the system reboots after 10 seconds; that clocksource can therefore not be used.

F.36 ARMv7 rev 1 — HTC Desire Z

The tests on a HTC Desire Z with an ARMv7 rev 1 CPU and a Linux kernel 2.6.32 show the following result: most of the time delta values were zero. This implies that the jent_entropy_init(3) call rejects this system.
It is unclear whether the coarse timing values are due to an old hardware timer or whether the Linux system does not support the readout of a high-resolution timer. The Linux kernel up to version 3.2 did not implement a callback for random_get_entropy on the ARM platform. Therefore it is possible that the old Android version on the smartphone did not implement access to a potentially available high-resolution timer.

F.37 ARMv6 rev 7

This test was executed on a Raspberry Pi with Linux kernel 3.6.
figure pics/userspace-foldtime-ARMv6-rev7-time-dist-delta-5000-hist.png
Figure 130 Lower boundary of entropy over folding loop in user space on ARMv6 rev 7 — with optimizations
figure pics/userspace-foldtime-ARMv6-rev7-time-dist-delta-26000-hist.png
Figure 131 Upper boundary of entropy over folding loop in user space on ARMv6 rev 7 — with optimizations
Just to give the reader an impression of how the optimization changes the measurement and to demonstrate that without optimizations the entropy is higher, here is the same CPU measurement without optimization.
figure pics/userspace-foldtime-ARMv6-rev7.O0-time-dist-delta-9000-hist.png
Figure 132 Lower boundary of entropy over folding loop in user space on ARMv6 rev 7 — without optimizations
figure pics/userspace-foldtime-ARMv6-rev7.O0-time-dist-delta-90000-hist.png
Figure 133 Upper boundary of entropy over folding loop in user space on ARMv6 rev 7 — without optimizations
Even though the Shannon Entropy would allow the CPU execution time jitter to be used, jent_entropy_init does not pass this CPU because the timer is too coarse: it increments in steps of 1,000. As visible in the graphs of the lower boundaries, the majority of entropy comes from only two values for the time delta.

F.38 IBM POWER7 with AIX 6.1

The tests with AIX 6.1 executing within an IBM LPAR on an IBM POWER7 CPU, with the code obtaining the timer via the POSIX function call clock_gettime, yielded the following result: the time delta values were all divisible by 1,000. This implies that the jent_entropy_init(3) call rejects this system.
However, AIX provides a second function to read a high-resolution timer: read_real_time. When using this function, which is the case as defined in jitterentropy-base-user.h, a fine-grained time stamp is returned, as the following graphs show.
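The following C sketch illustrates the use of this AIX interface. read_real_time and time_base_to_time are documented AIX library calls; the exact conversion performed in jitterentropy-base-user.h may differ in detail.
#include <sys/time.h>

/* Read the AIX time base and convert it to a nanosecond time stamp. */
unsigned long long aix_nstime(void)
{
        timebasestruct_t t;

        read_real_time(&t, TIMEBASE_SZ);
        /* Convert the raw time base into seconds and nanoseconds. */
        time_base_to_time(&t, TIMEBASE_SZ);
        return (unsigned long long)t.tb_high * 1000000000ULL + t.tb_low;
}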
figure pics/userspace-foldtime-p7-aix61-time-dist-delta-115-hist.png
Figure 134 Lower boundary of entropy over folding loop in user space on AIX 6.1 and POWER7 — with optimizations
figure pics/userspace-foldtime-p7-aix61-time-dist-delta-1300-hist.png
Figure 135 Upper boundary of entropy over folding loop in user space on AIX 6.1 and POWER7 — with optimizations
Just to give the reader an impression of how the optimization changes the measurement and to demonstrate that without optimizations the entropy is higher, here is the same CPU measurement without optimization.
figure pics/userspace-foldtime-p7-aix61.O0-time-dist-delta-1160-hist.png
Figure 136 Lower boundary of entropy over folding loop in user space on AIX 6.1 and POWER7 — without optimizations
figure pics/userspace-foldtime-p7-aix61.O0-time-dist-delta-15000-hist.png
Figure 137 Upper boundary of entropy over folding loop in user space on AIX 6.1 and POWER7 — without optimizations

F.39 IBM POWER7 with Linux

figure pics/userspace-foldtime-power7-time-dist-delta-180-hist.png
Figure 138 Lower boundary of entropy over folding loop in user space on IBM POWER7
figure pics/userspace-foldtime-power7-time-dist-delta-1900-hist.png
Figure 139 Upper boundary of entropy over folding loop in user space on IBM POWER7

F.40 IBM POWER5 with Linux

figure pics/userspace-foldtime-power5-time-dist-delta-370-hist.png
Figure 140 Lower boundary of entropy over folding loop in user space on IBM POWER5
figure pics/userspace-foldtime-power5-time-dist-delta-3000-hist.png
Figure 141 Upper boundary of entropy over folding loop in user space on IBM POWER5

F.41 Apple G5 QuadCore PPC 970MP

The following tests were executed on an Apple G5 Quad-Core with PowerPC 970MP CPU and Apple MacOS X 10.5.8. The word size of the tests is 32 bit. However, the 64 bit word size shows similar results, as indicated in the table at the beginning of this appendix.
The tests show that the optimized compilation contains insufficient jitter, as the Shannon Entropy of the lower boundary is less than 1 bit, whereas the non-optimized compilation shows sufficient jitter.
figure pics/userspace-foldtime-ppc-G5-macos-10.5.8-time-dist-delta-20-hist.png
Figure 142 Lower boundary of entropy over folding loop in user space on Apple G5 with optimizations
figure pics/userspace-foldtime-ppc-G5-macos-10.5.8-time-dist-delta-300-hist.png
Figure 143 Upper boundary of entropy over folding loop in user space on Apple G5 with optimizations
figure pics/userspace-foldtime-ppc-G5-macos-10.5.8.O0-time-dist-delta-165-hist.png
Figure 144 Lower boundary of entropy over folding loop in user space on Apple G5 without optimizations
figure pics/userspace-foldtime-ppc-G5-macos-10.5.8.O0-time-dist-delta-2500-hist.png
Figure 145 Upper boundary of entropy over folding loop in user space on Apple G5 without optimizations

F.42 SUN UltraSparc IIIi

This test was executed on a SUN UltraSparc-IIIi with FreeBSD 9.1.
The graph for the lower boundary is impressive: it looks like a normal distribution!
figure pics/userspace-foldtime-ultrasparcIIIi-time-dist-delta-2800-hist.png
Figure 146 Lower boundary of entropy over folding loop in user space on SUN UltraSparc IIIi — with optimizations
figure pics/userspace-foldtime-ultrasparcIIIi-time-dist-delta-9000-hist.png
Figure 147 Upper boundary of entropy over folding loop in user space on SUN UltraSparc IIIi — with optimizations
Just to give the reader an impression of how the optimization changes the measurement and to demonstrate that without optimizations the entropy is higher, here is the same CPU measurement without optimization.
figure pics/userspace-foldtime-ultrasparcIIIi.O0-time-dist-delta-4500-hist.png
Figure 148 Lower boundary of entropy over folding loop in user space on SUN UltraSparc IIIi — without optimizations
figure pics/userspace-foldtime-ultrasparcIIIi.O0-time-dist-delta-30000-hist.png
Figure 149 Upper boundary of entropy over folding loop in user space on SUN UltraSparc IIIi — without optimizations

F.43 SUN UltraSparc II

This test was executed on a SUN UltraSparc-II with OpenBSD 5.3.
figure pics/userspace-foldtime-ultrasparcII-time-dist-delta-8200-hist.png
Figure 150 Lower boundary of entropy over folding loop in user space on SUN UltraSparc II — with optimizations
figure pics/userspace-foldtime-ultrasparcII-time-dist-delta-21000-hist.png
Figure 151 Upper boundary of entropy over folding loop in user space on SUN UltraSparc II — with optimizations
The same tests were executed without optimizations. No material differences in the distribution are present.

F.44 SUN UltraSparc IIi (Sabre)

This test was executed on a SUN UltraSparc-IIi at 440 MHz running Gentoo 2.1. The operating system is configured without a graphical interface.
figure pics/userspace-foldtime-ultrasparc-IIi-440MHz-gentoo-time-dist-delta-6700-hist.png
Figure 152 Lower boundary of entropy over folding loop in user space on SUN UltraSparc IIi — with optimizations
figure pics/userspace-foldtime-ultrasparc-IIi-440MHz-gentoo-time-dist-delta-25000-hist.png
Figure 153 Upper boundary of entropy over folding loop in user space on SUN UltraSparc IIi — with optimizations
To give the reader an impression of how optimization changes the measurement, and to demonstrate that the entropy is higher without optimizations, here is the same CPU measurement without optimizations.
figure pics/userspace-foldtime-ultrasparc-IIi-440MHz-gentoo.O0-time-dist-delta-13700-hist.png
Figure 154 Lower boundary of entropy over folding loop in user space on SUN UltraSparc IIi — without optimizations
figure pics/userspace-foldtime-ultrasparc-IIi-440MHz-gentoo.O0-time-dist-delta-130000-hist.png
Figure 155 Upper boundary of entropy over folding loop in user space on SUN UltraSparc IIi — without optimizations

F.45 IBM System Z EC12

This test was executed on an IBM System Z EC12 offering a z/VM virtual machine with one CPU to a z/OS 1R13.
figure pics/userspace-foldtime-zos.O0-time-dist-delta-1100000000000000-hist.png
Figure 156 Lower boundary of entropy over folding loop in user space on z/OS — without optimizations
Due to the values being so large, the value for the Shannon Entropy is truncated in the graph above. That value is the same as printed in the table at the beginning of this appendix: 5.28 bits.
figure pics/userspace-foldtime-zos.O0-time-dist-delta-15000000000000000-hist.png
Figure 157 Upper boundary of entropy over folding loop in user space on z/OS — without optimizations
Due to the values being so large, the value for the Shannon Entropy is truncated in the graph above. That value is the same as printed in the table at the beginning of this appendix: 9.38 bits.
The graphs show a similar pattern to the other systems in the other sections. However, the timer values are much larger than in any other test. The reason lies in the way the timer is read and in the fact that System Z mainframes have a timer with a much higher resolution. The timer is read with the STCKE processor instruction such that a moving window of 64 bits is extracted from the 128 bit value returned by STCKE. The lowest 7 bits, which do not contain timer information as specified in the processor manual, are cut off.
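As an illustration, such a timer read could look as follows. This is a minimal sketch assuming GCC-style inline assembly as available on Linux on System Z; the z/OS test code may obtain the value differently, and the function name is made up.
#include <stdint.h>
#include <string.h>

/*
 * Sketch only: store the 128 bit extended TOD clock with STCKE and
 * extract a 64 bit window that drops the low bits carrying no timer
 * information.
 */
static inline void get_stcke_time(uint64_t *out)
{
        uint8_t clk[16];
        uint64_t val;

        asm volatile("stcke %0" : "=Q" (clk) : : "cc");

        /* bytes 1 through 8 form a moving 64 bit slice of the clock */
        memcpy(&val, clk + 1, sizeof(val));
        *out = val;
}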

F.46 IBM System Z z10

This test was executed on an IBM System Z z10 offering a z/VM virtual machine with one CPU to a SLES11 SP2.

F.46.1 64 bit Word Size

The processor is identified as ‘‘version = FF, identification = 058942, machine = 2097’’.
figure pics/userspace-foldtime-s390-zvm-time-dist-delta-230-hist.png
Figure 158 Lower boundary of entropy over folding loop in user space on IBM System Z z10 — 64 bit with optimizations
figure pics/userspace-foldtime-s390-zvm-time-dist-delta-2200-hist.png
Figure 159 Upper boundary of entropy over folding loop in user space on IBM System Z z10 — 64 bit with optimizations
To give the reader an impression of how optimization changes the measurement, and to demonstrate that the entropy is higher without optimizations, here is the same CPU measurement without optimizations.
figure pics/userspace-foldtime-s390-zvm.O0-time-dist-delta-730-hist.png
Figure 160 Lower boundary of entropy over folding loop in user space on IBM System Z z10 — 64 bit without optimizations
figure pics/userspace-foldtime-s390-zvm.O0-time-dist-delta-9300-hist.png
Figure 161 Upper boundary of entropy over folding loop in user space on IBM System Z z10 — 64 bit without optimizations

F.46.2 31 bit Word Size

The same hardware system is tested, with the test compiled for a 31 bit word size.
figure pics/userspace-foldtime-s390-zvm-31bit-time-dist-delta-260-hist.png
Figure 162 Lower boundary of entropy over folding loop in user space on IBM System Z z10 — 31 bit with optimizations
figure pics/userspace-foldtime-s390-zvm-31bit-time-dist-delta-2600-hist.png
Figure 163 Upper boundary of entropy over folding loop in user space on IBM System Z z10 — 31 bit with optimizations
As already indicated in the table at the beginning of Appendix F↑, the measurements of 31 bit compilations on an IBM System Z with optimizations show way too little entropy at the lower boundary, which is visible in the graphs above. Therefore, the tests are re-performed without optimizations, i.e. with the compilation of a regular CPU Jitter random number generator. These new measurements, given in the graphs below, show entropy that is significantly higher than for the optimized code. Therefore, when using the non-optimized code, which is the case for the regular runtime of the RNG, the 31 bit word size on IBM System Z is considered appropriate.
figure pics/userspace-foldtime-s390-zvm-31bit.O0-time-dist-delta-1100-hist.png
Figure 164 Lower boundary of entropy over folding loop in user space on IBM System Z z10 — 31 bit without optimizations
figure pics/userspace-foldtime-s390-zvm-31bit.O0-time-dist-delta-13600-hist.png
Figure 165 Upper boundary of entropy over folding loop in user space on IBM System Z z10 — 31 bit without optimizations

F.47 Intel Core i7 2620M With RDTSC Instruction

The following tests were executed on the same hardware, but with different operating systems to allow analyzing the impact of the operating system on the CPU execution time jitter.
To avoid any interference from context switches and similar effects, the time stamp is gathered by fetching it directly from the CPU with the following assembler code:
/* taken from Linux kernel; outside the kernel, __u64 maps to unsigned long long */
#define DECLARE_ARGS(val, low, high)    unsigned low, high
#define EAX_EDX_VAL(val, low, high)     ((low) | ((__u64)(high) << 32))
#define EAX_EDX_RET(val, low, high)     "=a" (low), "=d" (high)

static inline void jent_get_nstime(__u64 *out)
{
        DECLARE_ARGS(val, low, high);

        /* RDTSC loads the 64 bit time stamp counter into EDX:EAX */
        asm volatile("rdtsc" : EAX_EDX_RET(val, low, high));
        *out = EAX_EDX_VAL(val, low, high);
}
With this change, the CPU Jitter random number generator only uses the malloc and free functions of the operating system during startup and termination, and no other operating system mechanism! The header file providing this code is found in arch/jitterentropy-base-x86.h and is a drop-in replacement for jitterentropy-base-user.h.
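For illustration, one time delta over the folding loop is then gathered along the following lines. This is a hypothetical sketch; the function and variable names are not taken from the test code.
static __u64 measure_one_delta(void)
{
        __u64 t1, t2;

        jent_get_nstime(&t1);
        /* the folding loop under measurement executes here */
        jent_get_nstime(&t2);

        /* one histogram sample for the lower/upper boundary graphs */
        return t2 - t1;
}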
As different compilers are used to generate the binaries for the different operating systems, all tests were compiled without optimization.
When comparing the different graphs, the following findings can be drawn:
Still, the final conclusion is that regardless of the used operating system, the CPU execution time jitter measurements indicate that the random number generator can be used on all systems.

F.47.1 Ubuntu Linux 13.04 with KDE

The following test was executed on an Ubuntu 13.04 system while the graphical environment of KDE was running.
figure pics/userspace-foldtime-i7.O0-time-dist-delta-650-hist.png
Figure 166 Lower boundary of entropy over folding loop in user space on Ubuntu Linux 13.04 with KDE and Intel Core i7 2620M
figure pics/userspace-foldtime-i7.O0-time-dist-delta-8500-hist.png
Figure 167 Upper boundary of entropy over folding loop in user space on Ubuntu Linux 13.04 with KDE and Intel Core i7 2620M

F.47.2 Ubuntu Linux 13.04 without X11

The following test was executed on an Ubuntu 13.04 that was booted with the kernel command line option init=/bin/bash. This option implies that no user space processes besides init and bash were running. In particular, no X11 windowing system was executing.
figure pics/userspace-foldtime-i7-2620M-ubuntu-noX-time-dist-delta-770-hist.png
Figure 168 Lower boundary of entropy over folding loop in user space on Ubuntu Linux 13.04 without X11 and Intel Core i7 2620M
figure pics/userspace-foldtime-i7-2620M-ubuntu-noX-time-dist-delta-10000-hist.png
Figure 169 Upper boundary of entropy over folding loop in user space on Ubuntu Linux 13.04 without X11 and Intel Core i7 2620M

F.47.3 OpenIndiana 151a7

The desktop version of OpenIndiana was installed, which implies that X11 with Gnome was up and running.
figure pics/userspace-foldtime-i7-2620M-openindiana151-time-dist-delta-870-hist.png
Figure 170 Lower boundary of entropy over folding loop in user space on OpenIndiana 151a7 and Intel Core i7 2620M
figure pics/userspace-foldtime-i7-2620M-openindiana151-time-dist-delta-12000-hist.png
Figure 171 Upper boundary of entropy over folding loop in user space on OpenIndiana 151a7 and Intel Core i7 2620M

F.47.4 NetBSD 6.0

The LiveCD image for NetBSD was used, which did not execute X11 and hardly any other user space applications.
figure pics/userspace-foldtime-i7-2620M-netbsd6.0-time-dist-delta-310-hist.png
Figure 172 Lower boundary of entropy over folding loop in user space on NetBSD 6.0 and Intel Core i7 2620M
figure pics/userspace-foldtime-i7-2620M-netbsd6.0-time-dist-delta-4500-hist.png
Figure 173 Upper boundary of entropy over folding loop in user space on NetBSD 6.0 and Intel Core i7 2620M

F.47.5 FreeBSD 9.1

The LiveCD image for FreeBSD was used, which did not execute X11 and hardly any other user space applications.
figure pics/userspace-foldtime-i7-2620M-freebsd9.1-time-dist-delta-350-hist.png
Figure 174 Lower boundary of entropy over folding loop in user space on FreeBSD 9.1 and Intel Core i7 2620M
figure pics/userspace-foldtime-i7-2620M-freebsd9.1-time-dist-delta-4000-hist.png
Figure 175 Upper boundary of entropy over folding loop in user space on FreeBSD 9.1 and Intel Core i7 2620M

G BSI AIS 20 / 31 NTG.1 Properties

The CPU Jitter random number generator is believed to comply with the BSI AIS 20 / 31 NTG.1 properties as follows:
NTG.1.1 See the comments in jent_entropy_init, which show that statistical defects in the noise source, i.e. the timing jitter, are identified. Covered
NTG.1.2 The jent_read_entropy function will only read from the entropy pool once the entire entropy collection loop is completed. As this loop fills the entropy pool with full entropy, as described in chapter 5↑ and supported by section 5.1↑, even in the worst case the CPU Jitter random number generator ensures that the requested entropy is provided to the caller. Covered
NTG.1.3 The timing jitter is an unpredictable noise source as shown in chapter 2↑. The entropy of the noise source is preserved by mixing it into an entropy pool in a way that retains the entropy and ensures that it is conveyed to the caller in the delivered random bit string. Section 7↑ bullet 9 outlines the perfect forward and backward secrecy. Covered
NTG.1.4 To generate 128 bits from the RNG, we pull twice from the entropy pool and concatenate the two random values, as it is done in jent_read_entropy. Let us consider the birthday paradox, where a collision among 2⁶⁴ random values of size 128 bits occurs with a probability of
P(collision after 2⁶⁴ random values) ≈ 0.3935
The number of sequences of n pairwise different bit strings with a length of 128 bits is
A = 2¹²⁸⋅(2¹²⁸ − 1)⋅…⋅(2¹²⁸ − n + 1)
The probability that n such bit strings are pairwise different therefore is:
P(n) = A/(2¹²⁸)ⁿ
Using the estimation
A > (2¹²⁸ − n + 1)ⁿ
we have a lower boundary for P(n). Thus, we can calculate, for example, that 2⁵⁵ bit strings of length 128 bits following each other are pairwise different with a probability of P > 0.999996. In other words, when generating up to 2⁵⁵ bit strings of length 128 bits, these strings are pairwise different with a probability of P > 1 − ε, with ε ≈ 3.8⋅10⁻⁶. This result satisfies even AVA_VAN.5. The analysis rests on the assumption that the bit stream follows a rectangular distribution, i.e. the bit stream is white noise. The discussion of the statistical properties of the bit stream in chapter 4↑ demonstrates the property of white noise. Covered
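As a plausibility check of the numbers in NTG.1.4 above (an added verification, not part of the original argument), the lower boundary can be approximated as
\[ P(n) > \left(1 - \frac{n-1}{2^{128}}\right)^{n} \approx e^{-n(n-1)/2^{128}} \]
For n = 2⁵⁵, the exponent evaluates to roughly n²/2¹²⁸ = 2¹¹⁰⁻¹²⁸ = 2⁻¹⁸ ≈ 3.8⋅10⁻⁶, so P(2⁵⁵) > 1 − 2⁻¹⁸ ≈ 0.999996, matching the stated values.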
NTG.1.5 All of the following statistical tests pass even though the seed source is not processed with a cryptographic whitening function!
NTG.1.6 Using ent we get 7.9999... bits of entropy per byte. That value is the Shannon entropy of the input data. This value implies that we have more than 99.7% (Shannon) entropy in the output. In addition, the discussion in chapter 5↑ supports the statement that more than 99.7% entropy is present in the output. Covered
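For illustration, the per-byte Shannon entropy that ent reports can be computed along the following lines. This is a minimal sketch, not the ent implementation; the function name is made up.
#include <math.h>
#include <stddef.h>

/* Shannon entropy in bits per byte over a buffer (sketch only) */
static double shannon_entropy(const unsigned char *buf, size_t len)
{
        unsigned long count[256] = { 0 };
        double h = 0.0;
        size_t i;

        /* histogram of byte values */
        for (i = 0; i < len; i++)
                count[buf[i]]++;

        /* H = -sum over all byte values of p * log2(p) */
        for (i = 0; i < 256; i++) {
                double p;

                if (!count[i])
                        continue;
                p = (double)count[i] / (double)len;
                h -= p * log2(p);
        }
        return h;
}
A perfectly uniform byte stream yields 8 bits per byte; the reported 7.9999... bits per byte is this measure applied to the generator output.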

H Thanks

Special thanks for providing input as well as mathematical support goes to:
Also, special thanks go to all people who executed the test applications to gather the results in appendix F↑.

I License

The implementation of the CPU Jitter random number generator, all support mechanisms, the test cases and the documentation are subject to the following license.
Copyright Stephan Müller <smueller@chronox.de>, 2013.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
  1. Redistributions of source code must retain the above copyright notice, and the entire permission notice in its entirety, including the disclaimer of warranties.
  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  3. The name of the author may not be used to endorse or promote products derived from this software without specific prior written permission.
ALTERNATIVELY, this product may be distributed under the terms of the GNU General Public License, in which case the provisions of the GPL are required INSTEAD OF the above restrictions. (This clause is necessary due to a potential bad interaction between the GPL and the restrictions contained in a BSD-style copyright.)
THIS SOFTWARE IS PROVIDED ‘‘AS IS’’ AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, ALL OF WHICH ARE HEREBY DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF NOT ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.