echo: os2prog
to: Eddy Thilleman
from: Denis Tonn
date: 1998-12-30 16:11:08
subject: more power

Original from  Eddy Thilleman  to Denis Tonn on 12-27-1998
Original Subject: more power

                         ---------------------------------------

 DT> Memory is virtualized in OS/2. 
 
ET> Memory = virtual memory?

 Yes, but most programmers can ignore the distinction. 

 DT> This means that when you allocate memory, it does not always
 DT> exist in RAM. Portions of it will, as the program is actively
 DT> using that memory area, but unused (or infrequently used)
 DT> portions will not be backed by RAM pages until they are needed.
 
ET> As I understand now:
ET> ---------------------------------------------------------------------------
ET> allocation = reservation of memory addresses
ET> memory which has never been used is reserved but not really allocated
ET> (= allocation from the program's point of view means virtual allocation)
ET> memory which is in use, or has been used before, is allocated
ET> ---------------------------------------------------------------------------
ET> is that correct?

 I think you have it.. To be honest I can't tell for sure if you mean 
something subtly different in the above.. 
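
 To make the reserve/commit distinction concrete, here is a minimal C 
sketch (untested, assuming the standard toolkit Dos* memory APIs; the 
sizes are just for illustration): 

   #define INCL_DOSMEMMGR
   #include <os2.h>
   #include <stdio.h>

   int main(void)
   {
       PVOID  pv;
       APIRET rc;

       /* Reserve 1MB of address space. No PAG_COMMIT, so nothing
          is backing these addresses yet. */
       rc = DosAllocMem(&pv, 1024*1024, PAG_READ | PAG_WRITE);
       if (rc) { printf("DosAllocMem rc=%lu\n", rc); return 1; }

       /* Commit the first 4K page. Only committed pages may be
          touched, and even they get a RAM page only when used. */
       rc = DosSetMem(pv, 4096, PAG_DEFAULT | PAG_COMMIT);
       if (rc) { printf("DosSetMem rc=%lu\n", rc); return 1; }

       *(char *)pv = 42;   /* first touch: now a RAM page backs it */

       DosFreeMem(pv);
       return 0;
   }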
 
 DT> RAM holds the portions of "memory" that are being actively used.
 
ET> RAM = physical memory? RAM pages are portions of physical memory?

 Yes. 

 DT> The concepts of "address space" and "memory" are interlinked.
 
ET> Because "address space" can be anywhere in virtual memory?

 Yes and no. The Intel processor has 4GB of virtual (and physical, but 
that's not relevant here) address space at any instant in time. Not all 
addresses will be "valid" in a process (there are holes in the valid 
address range). Each process will have a different set of "holes" and a 
different set of RAM pages backing these virtual addresses (different 
data/code). 
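
 If you want to "see" the holes, the DosQueryMem API will report the 
attributes of an address range; a sketch (untested, and the start 
address and limit are assumptions based on the layout described here): 

   #define INCL_DOSMEMMGR
   #include <os2.h>
   #include <stdio.h>

   int main(void)
   {
       ULONG addr = 0x10000;        /* skip the reserved first 64K */
       ULONG cb, flags;

       while (addr < 0x20000000) {  /* walk up to the 512MB line */
           cb = 0x20000000 - addr;
           if (DosQueryMem((PVOID)addr, &cb, &flags))
               break;
           printf("%08lX len %08lX %s\n", addr, cb,
                  (flags & PAG_FREE) ? "hole" : "in use");
           addr += cb;
       }
       return 0;
   }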

 DT> This will be different depending on the code that is running,
 DT> the system address space, 
 
ET> Could you please define the term "system address space"?

 The kernel has 4GB selectors. It can "see" the whole system, 
including the shared and private address ranges of a process. The 
private and shared address ranges will have a different "valid set" 
and different data depending on the context (process switch). 
 The "system arena" is mapped across all processes and these addresses
will be the same in all process contexts. It is only reachable with a 
selector that has a "large" 4GB limit (kernel). 
 The "system address space" is the system arena plus the context of 
the current (active) process. 

 DT> a DLL with protected data (Warp 3 GA and OS/2 2.x only), 
 
ET> What's "protected data"? Is protected data not possible in 
ET> Warp 4? Isn't that used? If a DLL uses protected data, that 
ET> DLL won't run in Warp 4? 
 
 Not used in Warp 4. In Warp 3 GA (Fixpacks have changed this to be 
the same as Warp 4 GA), 32 bit applications are given a CS/DS/ES which
has a limit of 448MB. The area between 448MB and 512MB was reserved 
for "protected data" accessable by special DLLs. On init of these
DLLs, they are given another selector (which they had to save) that
has a limit of 512MB. 
 Since nobody except IBM was really using this area, it was decided to
eliminate it. All apps and DLL selectors now have a limit of 512MB. 
The "protected" data selector is still available to these DLLs, even 
though it is no longer "protected" from apps and other DLLs. The DLLs 
won't "break".. 

 DT> or an application (process name).
 
ET> application code within an OS/2 .EXE file, not in a DLL?

 Yes. 

 DT> Each process in OS/2 effectively has a separate LDT and page
 DT> tables. 
 
 DT> There is a guaranteed minimum of 64MB of "private" address space
 
ET> Why guaranteed? Is that 64MB an arbitrary amount of private 
ET> address space? Why "private" between quotes?

 Arbitrary number. It is faster to do context switching by direct copy
of page directory entries from the PTDA control block if the app uses
less than 64MB of private address space (each page directory entry maps
4MB, so 16 entries map 64MB of RAM pages). It is a tradeoff. They could
have stored the whole page directory in the PTDA, but that would make
the PTDA that much larger (64 bytes vs the 4K bytes of the full page
directory). 

 See below as to why I place "private" in quotes. Different processes 
that start with the same executable will share read only pages. 

 DT> Keeping the shared code/data at the same location in all
 DT> processes means the same pointers (and loader fixups) can be
 DT> used in all processes. 
 
ET> So pointers to shared code/data don't have to be changed? 

 Yes. 

ET> So pointers to a DLL don't have to be swapped when process 
ET> switching occurs, keeping speed as high as possible?

 More than that. The fixups are usually scattered all through an 
executable (DLL or EXE). If the DLLs (and shared data) were NOT at 
the same location in all processes, either the loader would have to be
invoked every time a task switch occurred, or all the DLL code would 
have to be swapped (swapping is 4K pages at a time). This can have a 
domino effect, since a DLL can reference another DLL. The performance 
impact would be considerable.. 

 DT> Now, there is a concept of "instance data" allocated in the
 DT> "shared address arena". 
 
ET> That's clever! While the code in a DLL and its data structures 
ET> are the same for all processes, the actual data this DLL code 
ET> produces may be different for each process, and thus this data 
ET> is kept apart for each process in its own piece of memory but 
ET> occurs at the same address for each process.

 Yep.. But don't overuse/misuse it. As I recommended, it is better to 
store only pointers to privately allocated memory in an instance data 
area if the "per process instance data" is larger than 64K. This 
threshold drops to 4K when dealing with memory above the 512MB limit
(Warp Server SMP and the Aurora beta). 
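
 A hedged sketch of that recommendation (the .DEF file is assumed to 
mark the DLL data segment per-process with DATA MULTIPLE NONSHARED, 
and the C runtime startup details are left out): 

   #define INCL_DOSMEMMGR
   #include <os2.h>

   #define STATE_SIZE (256*1024)       /* hypothetical >64K state  */

   static PVOID pPrivateState = NULL;  /* instance data: a pointer */

   /* Called once per process loading the DLL (flag 0 = init). */
   unsigned long _System _DLL_InitTerm(unsigned long hmod,
                                       unsigned long flag)
   {
       if (flag == 0) {
           /* The big per-process state goes in ordinary private
              memory; only the pointer lives in instance data. */
           if (DosAllocMem(&pPrivateState, STATE_SIZE,
                           PAG_READ | PAG_WRITE | PAG_COMMIT))
               return 0;               /* init failed, DLL unloads */
       } else if (pPrivateState) {
           DosFreeMem(pPrivateState);
       }
       return 1;
   }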

 DT> There is also an area of the "shared arena" that is not
 DT> "tiled", thus allowing multiple small (16 bit only) DLL's to
 DT> occupy the same RAM page. 
 
ET> This is only useful for different 16-bit DLLs; it seems to have 
ET> the same function as instance data in the shared arena: letting 
ET> process-specific (different) DLL code occupy the same 
ET> address. If it were the same DLL code, it would be shared 
ET> across processes anyway without any of this.

 Its only real "use" is to decrease exhaustion of addresses in the 
shared arena and excessive use of RAM (even a 16 byte DLL takes up a 
whole 4K page otherwise). Multiple small 16-bit DLLs are "packed" onto
a single page (and 64K allocation). The base address in the LDT is NOT 
on a 64K boundary for these DLLs (requiring a different technique to 
convert an address). 
 The MEMMAN=NOPACK option will stop this "packing" of small 16 bit 
DLLs in the shared arena (sometimes needed when tracing in these 
DLLs).
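
 For reference, the CONFIG.SYS line would look something like this 
(the other MEMMAN parameters shown here are just the common defaults): 

   MEMMAN=SWAP,PROTECT,NOPACK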

 DT> The aaaa is an example of a common read only piece of code in
 DT> the executable (same process name). 
 
ET> I thought that one process name is linked to one PID, not to an executable file? 

 The process "name" is derived from the first executable to load into 
the process. The PID is the unique identifier for a process. Two 
copies of PMSHELL.EXE will have the same "name", but different PIDs. 
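
 A program can read both for itself; a small sketch (untested) using 
the standard info-block APIs: 

   #define INCL_DOSPROCESS
   #define INCL_DOSMODULEMGR
   #include <os2.h>
   #include <stdio.h>

   int main(void)
   {
       PTIB ptib;
       PPIB ppib;
       char szExe[260];

       DosGetInfoBlocks(&ptib, &ppib);
       /* pib_hmte is the module handle of the EXE that started
          the process - the source of the process "name". */
       DosQueryModuleName(ppib->pib_hmte, sizeof(szExe), szExe);
       printf("PID %lu, started from %s\n", ppib->pib_ulpid, szExe);
       return 0;
   }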

ET> What you say above seems to imply that one process name is 
ET> linked to one executable file? I thought that one 
ET> process name can only be exclusively linked to one PID or 
ET> exclusively linked to one executable file, and not to one of 
ET> each at the same time; one of them excludes the other?

 The EXE loads into the first 64MB address range in a process. This 
includes both code and static data (defined in the source). The code 
(or data) at the same address in different processes maps to totally 
different RAM pages (thus different instructions) when running
different EXEs. 
 When the EXEs that started the processes are the same, the addresses
will be the same in both processes, and the non-writable parts (code is
not modifiable) will always have the exact same "information" backing
them, so the system can "optimize" the RAM usage by mapping the same RAM
page into both processes. In effect this is "shared memory" in the
private address range (which is why I placed private in quotes above).

 It all relates back to the RAM page backing the memory. If the page 
is the same across all processes that use it, then the memory is 
"shared". If not, then it is "private". All shared memory (under OS/2) 
will have the same virtual address in all processes that use it. Keep 
in mind that not all processes will have access to all shared memory, 
only the ones that obtain (via API calls) access to the particular 
"shared" module or data. 
 Addresses in the "shared arena" will map the same memory in all
processes. Instance data is a special case of this: the address is the
same in all contexts, and the "meaning" of the data (structures) is
the same, but the actual "state" can be different from process to
process (private memory). 
 Addresses in the "private arena" will map to unique memory in each
process. Same address, different code/data. Shared code is a special 
case of this, since two (or more) processes started from the same EXE
can share the code (read only) pages across processes (shared memory).

 DT> This is not controllable by the programmer, although the user
 DT> can affect this. 
 
ET> Why? Does this have to occur at runtime? How can a user affect 
ET> this and why not the programmer?

 This shared mapping of code pages in the private arena occurs at load 
time. The loader is in sole control of this. The way it determines if 
it can "share" these pages is via the full drive/path/filename 
(filespec) of the EXE being loaded into the process. If they are 
exactly the same, it will share the read only pages. 
 If the user has 2 copies of the same EXE in different directories,
and starts 2 processes from these 2 executables, the loader will not
"share" the RAM pages between the processes. 
 The only way the programmer can affect this is when loading the 
program under a debugger. The debugger may need to set breakpoints in
the code (thus breaking the read only requirement), or the programmer
can play "user" and load 2 copies of the EXE into processes from
different directories. 

 DT> Warp server SMP and the Aurora beta allow applications to
 DT> "allocate" in the area above the 512MB limit 
 DT> (up to a config.sys controlled limit of 3.0GB total). 
 
ET> Why a config.sys controlled limit? Is there a system wide 
ET> price to be paid for allocating above the 512MB?

 Yes, there are control blocks required for each thread, each process 
and each memory allocation (and a lot more things). These control
blocks are allocated in resident or swappable memory in the kernel.
 An app address range of 3GB leaves only 1GB for the VDD/PDD/IFS code,
the kernel code, and all the control blocks. This can/will reduce the 
total number of processes that can be running at the same time. There
are some architectural changes that could be implemented to reduce this
effect, but they have not been made (yet). 
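
 If memory serves, the CONFIG.SYS control in question is 
VIRTUALADDRESSLIMIT (value in MB); for example, to allow the full 3GB: 

   VIRTUALADDRESSLIMIT=3072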

 DT> specify that memory usable by 16:16 code be allocated in the
 DT> region below the 512MB address line (the "compatibility
 DT> region") when it allocates it.
 
ET> So pointers to memory below the 512MB line can be converted from 
ET> 32-bit to 16-bit format in exactly the same way as in OS/2 
ET> v2, Warp 3 and 4?

 Yep. 
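
 The conversion is pure arithmetic because of tiling: every 64K of 
flat address space below 512MB has a matching LDT selector. A sketch 
of the usual macros (equivalent to the toolkit thunking helpers): 

   #include <stdio.h>

   /* selector index = flat >> 16, selector = (index << 3) | 7,
      which collapses to (flat >> 13) | 7 */
   #define FLAT_TO_SEL(p) ((unsigned short)((((unsigned long)(p)) >> 13) | 7))
   #define FLAT_TO_OFF(p) ((unsigned short)(((unsigned long)(p)) & 0xFFFF))

   int main(void)
   {
       unsigned long flat = 0x00123456UL;  /* hypothetical address */
       printf("flat %08lX -> 16:16 %04X:%04X\n",
              flat, FLAT_TO_SEL(flat), FLAT_TO_OFF(flat));
       return 0;
   }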

 DT> The memory above the 512MB line has a similar organization
 DT> into "private" and "shared" regions as the memory below the
 DT> 512MB line.
 
ET> A similar organization? That implies that the memory above the 
ET> 512MB line has a slightly different organization than the 
ET> memory below the 512MB line?

 For the purposes of the discussion so far, they are the same. There are
some differences in the control structures when analyzing a system
dump (or debug kernel session). There isn't a fixed value for the
"guaranteed" high private and high shared arenas (1/8 of the himem 
area each).
 The major difference as far as programmers are concerned is that
memory allocations above the 512MB limit are NOT tiled, always
start on a 4K boundary, and single allocations (memory objects)
cannot span the 512MB boundary. Of course, the API calls relating to 
address and memory allocation (and queries) have been updated.. 
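
 One of those updates is a new allocation flag; a hedged sketch 
(assuming the Aurora/WSeB toolkit that defines OBJ_ANY): 

   #define INCL_DOSMEMMGR
   #include <os2.h>
   #include <stdio.h>

   int main(void)
   {
       PVOID  pv;
       APIRET rc;

       /* OBJ_ANY lets the system place the object above the 512MB
          line if it can; older kernels reject the flag. */
       rc = DosAllocMem(&pv, 1024*1024,
                        PAG_READ | PAG_WRITE | PAG_COMMIT | OBJ_ANY);
       if (rc == 0)
           printf("allocated at %p%s\n", pv,
                  ((ULONG)pv >= 0x20000000UL) ? " (above 512MB)" : "");
       else
           printf("DosAllocMem rc=%lu\n", rc);

       if (rc == 0) DosFreeMem(pv);
       return 0;
   }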



 There are a few more "details", mostly interesting only to deep level
system debuggers. Too much to go into detail here. There is an INF of
a debugging "handbook" that covers this (and a lot more topics)
available at the OS/2 developers site. Look for a file name of
SG244640.ZIP. It does not cover any of the "himem" information though.. 

 


   Denis       

 All opinions are my very own, IBM has no claim upon them.

--- Maximus/2 3.01
* Origin: T-Board - (604) 277-4574 (1:153/908)