C Coding Basics

middings · Post by **middings** » Sat Feb 11, 2017 1:22 pm

IIRC, when the increment operator is applied to a pointer, the size of the data type pointed to is taken into account. This behavior is useful when using a pointer with a list of data. After being incremented, the pointer points to the next element in the list.

PurpleGurl · Post by **PurpleGurl** » Sat Feb 11, 2017 2:07 pm

middings wrote: IIRC, when the increment operator is applied to a pointer, the size of the data type pointed to is taken into account. This behavior is useful when using a pointer with a list of data. After being incremented, the pointer points to the next element in the list.

Yes, so in my assembly example, I should have used extended registers and added 4 to EBX. I am not sure, but I think INC only adds one in assembly, but in C, you are probably right.

I could use clarification on this fragment:

Code: Select all

*pulTransferLen = (ret == CR_SUCCESS) ? *pulLength : 0;

I know about this containing a ternary operator sequence. The basic format is:
a?b:c

That means if condition A is true, then B is executed, otherwise do C. That is a shortcut for an IF...ELSE block.

Where I can use the explanation is how the order of operations plays into the above code snippet. Does it mean the following? If ret equals CR_SUCCESS, then the value pointed to by pulTransferLen is assigned the value pointed to by pulLength, and otherwise it gets set to zero.

Also, can someone share the difference in the nuances of the ternary operators between C and C++? The Wikipedia article says there are some differences in how the languages handle this, and that these operators can even be used in situations where IF...ELSE cannot be used in C++.

Post by **hbelusca** » Sat Feb 11, 2017 4:35 pm

Hi!

PurpleGurl wrote: Yes, so in my assembly example, I should have used extended registers and added 4 to EBX. I am not sure, but I think INC only adds one in assembly, but in C, you are probably right.

Yes. The processor (the guy who'll run the assembly code) knows nothing about the size of the variables that pointers point to. So if at some point you have an INC ebx (and ebx could hold a pointer to a char, to an unsigned long, or to something else), the INC will just increment ebx by 1. You therefore need to be careful and know exactly what you are manipulating (and its size) in ASM. In higher-level languages such as C, however, the compiler lets you perform pointer arithmetic and takes into account the size of the pointed-to variables (by using the type of the pointer). Therefore, if you have:

Code: Select all

char* pstr = &whatever;

doing:

Code: Select all

pstr = pstr + 1;

will increment the pointer value by 1, while if you have:

Code: Select all

long* pstr = &whatever;

doing:

Code: Select all

pstr = pstr + 1;

will increment the pointer value by sizeof(long), the size of 'long' on your platform (4 bytes on Windows, including x64; 8 bytes on most 64-bit Unix-like systems). This would be equivalent to doing:

Code: Select all

pstr = (long*)((ULONG_PTR)pstr + sizeof(long));

(this is basically what the compiler turns the previous example into under the hood), where I use the type 'ULONG_PTR' as a portable way to say an unsigned integer as wide as a pointer: 4 bytes on x86 platforms, or 8 bytes on x64 platforms.
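To make the scaling visible, here is a minimal sketch that measures how many bytes a pointer moves when 1 is added to it. The function names (byte_distance, char_step, long_step, next_element) are made up for illustration and are not from any ReactOS source.

```c
#include <stddef.h>

/* Hypothetical helper: number of bytes between two addresses that
   lie inside the same array. */
static size_t byte_distance(const void *from, const void *to)
{
    return (size_t)((const char *)to - (const char *)from);
}

/* Bytes moved by "p + 1" for a char pointer: always 1. */
size_t char_step(void)
{
    char data[2] = { 'a', 'b' };
    char *p = data;
    return byte_distance(p, p + 1);
}

/* Bytes moved by "p + 1" for a long pointer: sizeof(long). */
size_t long_step(void)
{
    long data[2] = { 10, 20 };
    long *p = data;
    return byte_distance(p, p + 1);
}

/* After the increment, the pointer designates the next element. */
long next_element(void)
{
    long data[2] = { 10, 20 };
    long *p = data;
    p++;            /* moves by sizeof(long) bytes, not by 1 byte */
    return *p;      /* reads data[1] */
}
```

Calling char_step() gives 1 and long_step() gives sizeof(long), whatever that happens to be on the platform, while next_element() returns the second array element.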

PurpleGurl wrote: I could use clarification on this fragment:
Code: Select all
*pulTransferLen = (ret == CR_SUCCESS) ? *pulLength : 0;
I know about this containing a ternary operator sequence. The basic format is:
a?b:c

That means if condition A is true, then B is executed, otherwise do C. That is a shortcut for an IF...ELSE block.

Where I can use the explanation is how the order of operations plays into the above code snippet. Does it mean the following? If ret equals CR_SUCCESS, then the value pointed to by pulTransferLen is assigned the value pointed to by pulLength, and otherwise it gets set to zero.

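Exactly: you set the value of *pulTransferLen (the value of the variable pointed to by pulTransferLen) to either *pulLength (if ret == CR_SUCCESS), or to zero otherwise. The conditional operator binds more tightly than the assignment, so the whole "condition ? a : b" part is evaluated first and its result is then assigned.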
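As a rough illustration, the ternary form and the longer if/else form can be put side by side. The two function names are hypothetical, and the CR_SUCCESS definition is only a stand-in for the value that the real Windows headers provide.

```c
/* Stand-in so the sketch compiles on its own; the real definition
   comes from the Windows headers and is only assumed here. */
#ifndef CR_SUCCESS
#define CR_SUCCESS 0
#endif

/* Ternary form, as in the quoted fragment. Only one of the two
   branches is evaluated, so *pulLength is not read on failure. */
void set_len_ternary(unsigned long *pulTransferLen,
                     const unsigned long *pulLength, int ret)
{
    *pulTransferLen = (ret == CR_SUCCESS) ? *pulLength : 0;
}

/* The same logic spelled out with if/else. */
void set_len_if_else(unsigned long *pulTransferLen,
                     const unsigned long *pulLength, int ret)
{
    if (ret == CR_SUCCESS)
        *pulTransferLen = *pulLength;
    else
        *pulTransferLen = 0;
}
```

Both functions leave the same value in *pulTransferLen for any input. The ternary version is an expression, so it fits in a single assignment.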

PurpleGurl · Post by **PurpleGurl** » Sat Feb 11, 2017 5:50 pm

I've seen several checks for null pointers added lately. Now, I assume the significance of not dereferencing null pointers is that you don't want to read the wrong data (garbage in the current context), or worse, corrupt a memory location (such as the interrupt vector table, if it exists) during a write.

I don't know about protected mode, but I do know that in real mode, the first kilobyte is the interrupt vector table. It contains 256 pointers in seg:offset format, and both segment and offset are unsigned words.

Here is how I once tested for a mouse driver. I first loaded the segment and offset of vector entry 0x33 and checked whether they were null. If they were null, there would be no runnable code nor any way to find it, so there was no use in proceeding. If there was a seemingly valid segment and offset, I then tested the first opcode at the address the vector pointed to, since from what I could find, valid opcodes don't start with null.

I didn't test against 0x90, which would be valid, though I'm not sure why it would exist there. I mean, 0x90 is NOP (no operation), and its main use is for code alignment, though someone could use it for patching code when you want to use fewer or smaller opcodes, or as part of a crude delay loop. So if it were needed for alignment, wouldn't it make more sense for the driver to load the code starting at an EVEN address boundary and assign that to the vector table? In any case, I only tested for null there and did no other opcode checking.

If none of it was null, I'd presume there was a driver and took the risk of calling INT 33h with AX set to whatever the mouse driver presence-detect command is. AX, or at least AL, should then contain the status (if nothing hangs), and one can test whether the return value is the explicit result that means the driver exists. If 0 or anything unexpected is returned, assume the mouse driver is unusable. Only if it said it was present and active would my code make use of the mouse commands.
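The two null checks described above can be sketched in portable C against a byte image of real-mode memory. This is only a simulation under stated assumptions: "mem" is a hypothetical buffer standing in for physical memory, the function names are invented, and a real DOS program would read the table through far pointers instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Real-mode address translation: linear = segment * 16 + offset. */
static uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/*
 * Hypothetical check on a memory image "mem" of "mem_size" bytes.
 * Returns 1 if the vector looks populated, 0 otherwise.
 * Each table entry is 4 bytes: the offset word first, then the
 * segment word, both little-endian.
 */
int vector_looks_installed(const uint8_t *mem, size_t mem_size,
                           uint8_t vector)
{
    size_t entry = (size_t)vector * 4;
    uint16_t offset, segment;
    uint32_t target;

    if (mem_size < 1024)
        return 0;               /* image does not hold the whole table */

    offset  = (uint16_t)(mem[entry]     | (mem[entry + 1] << 8));
    segment = (uint16_t)(mem[entry + 2] | (mem[entry + 3] << 8));

    if (segment == 0 && offset == 0)
        return 0;               /* null vector: nothing installed */

    target = linear_address(segment, offset);
    if (target >= mem_size)
        return 0;               /* handler lies outside the image */

    return mem[target] != 0x00; /* first byte of the handler is non-null */
}
```

For the mouse driver case the vector argument would be 0x33. A result of 1 only says the entry and the first handler byte are non-null; it does not prove that a working driver is present, which is why the INT 33h query is still needed afterwards.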

OT humor: The CPU is a guy? I thought in modern ones, it is a collective team of girls.

Post by **Z98** » Sat Feb 11, 2017 6:08 pm

Dereferencing NULL in user mode code will crash the program.

PurpleGurl · Post by **PurpleGurl** » Sun Feb 12, 2017 6:32 am

Z98 wrote: Dereferencing NULL in user mode code will crash the program.

You mean like an illegal operation or a protection error?

I'm not aware of how the memory is laid out in protected or virtual mode, nor am I aware of which accesses require Ring 0 or what can be done in Ring 3. I understand that the first kilobyte of segment 0 in real mode is the interrupt vector table, and above that is the BIOS Data Area, so I can understand why that would be off-limits as that is so intimate to the system.

Code: Select all

if (lRet != ERROR_SUCCESS && (!wcscmp(valueName, L"") || valueName == NULL))   /* before */
if (lRet != ERROR_SUCCESS && (valueName == NULL || !valueName[0]))             /* after */

How might changing to the 2nd line prevent null dereferencing? I imagine order of operation (involving the logical OR) has something to do with it. Is wcscmp a function? Is that where the null dereference might take place?

Post by **hto** » Sun Feb 12, 2017 10:54 pm

PurpleGurl wrote: You mean like an illegal operation or a protection error?

Yes, and it can be caught by SEH…

PurpleGurl wrote: I'm not aware of how the memory is laid out in protected or virtual mode, nor am I aware of which accesses require Ring 0 or what can be done in Ring 3.

Both kernel and user-mode code run with paging enabled, where any page of the virtual address space can be mapped to any page of physical memory (or to nothing, as in the case of the null address).

PurpleGurl wrote: I understand that the first kilobyte of segment 0 in real mode is the interrupt vector table, and above that is the BIOS Data Area, so I can understand why that would be off-limits as that is so intimate to the system.

Right, user-mode programs have no access to this area. Even the kernel has no direct access to it at its usual address.

PurpleGurl wrote: I imagine order of operation (involving the logical OR) has something to do with it. Is wcscmp a function? Is that where the null dereference might take place?

Yes, yes, yes.
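To spell that out a little: the || operator evaluates its left operand first and skips the right one when the left is already true. In the old line, wcscmp(valueName, L"") ran first and read through valueName before the NULL test was ever reached. The sketch below uses invented names (right_side_calls, first_char_is_nul, is_null_or_empty) and a call counter purely to show that the right-hand side is skipped for a NULL pointer.

```c
#include <stddef.h>
#include <wchar.h>

/* Counts how often the right-hand operand was actually evaluated. */
int right_side_calls = 0;

/* Right-hand operand: reads through the pointer, so s must not be NULL. */
static int first_char_is_nul(const wchar_t *s)
{
    right_side_calls++;
    return s[0] == L'\0';
}

/* Safe order, as in the second line: the NULL test comes first, so
   the string is only inspected when the pointer is known to be valid. */
int is_null_or_empty(const wchar_t *valueName)
{
    return valueName == NULL || first_char_is_nul(valueName);
}
```

Passing NULL returns 1 without touching the counter, which means the dereference never happened. Swapping the two operands would bring the crash back.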

PurpleGurl · Post by **PurpleGurl** » Sat Mar 18, 2017 7:54 pm

Can someone explain structure dereferencing to me? I had to look up what -> meant, so I know what it is, but not quite what it does.

Wikipedia says: Structure dereference ("member b of object pointed to by a") a->b

A recent snippet is:

Code: Select all

if (MasterQueryContext->FileObject->FsContext2 != (PVOID)DFS_DOWNLEVEL_OPEN_CONTEXT)

I've worked with arrays and matrices in BASIC and QuickBasic, but not anything similar in C.

Post by **hbelusca** » Sat Mar 18, 2017 9:11 pm

With 'a' a pointer to some structure containing a member 'b', the syntax:

Code: Select all

a->b

is equivalent to:

Code: Select all

(*a).b

which means that first the pointer 'a' is dereferenced, then we take the member 'b' of the object "*a" pointed to by 'a'.
So the code snippet you have pasted:

Code: Select all

if (MasterQueryContext->FileObject->FsContext2 != (PVOID)DFS_DOWNLEVEL_OPEN_CONTEXT)

can be rewritten as:

Code: Select all

if ((*((*MasterQueryContext).FileObject)).FsContext2 != (PVOID)DFS_DOWNLEVEL_OPEN_CONTEXT)

I guess you now see why the '->' notation was introduced, and why everybody uses it in these conditions...

(readability purposes).
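A small compilable sketch of the same idea follows. The two struct types are hypothetical stand-ins that only mimic the shape of the snippet; they are not the real kernel definitions. Each '->' follows one pointer: the first goes from the context to its file object, the second from the file object to its member.

```c
#include <stddef.h>

/* Hypothetical stand-ins, not the real kernel structures. */
struct file_object {
    void *FsContext2;
};

struct query_context {
    struct file_object *FileObject;   /* pointer to another struct */
};

/* Arrow notation: two pointers are followed, one per "->". */
void *via_arrow(struct query_context *ctx)
{
    return ctx->FileObject->FsContext2;
}

/* The same access written with explicit dereferences. */
void *via_star_dot(struct query_context *ctx)
{
    return (*(*ctx).FileObject).FsContext2;
}
```

Both functions return the same pointer for the same argument, since the two notations compile to the same access.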

PurpleGurl · Post by **PurpleGurl** » Sat Mar 18, 2017 10:54 pm

So is the struct as used in the example like a matrix in Basic where there are two dimensions rather than one? I ask because -> is used twice on the same structure.

An array in BASIC is like a variable, but with numbered elements. So if you do a DIM A$(10) if I remember right, then it is like having 10 string variables, and you can access each with a number after the string or variable name. Then a matrix is similar, but has width as well as depth. I mean, you could initialize a 10x10 matrix with 100 total elements using DIM A$(10,10) -- assuming I didn't forget the format and commands. So the difference is a list vs. a table.

middings · Post by **middings** » Sat Mar 18, 2017 11:19 pm

A matrix can be thought of as a type of structure, one limited to elements of all the same primitive data type.
Structures are more general than matrices. Structures are something like records in a database. Each element of a structure can be defined to hold a different data type and the data types are not limited to primitive data types.
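In C terms, the contrast might look like the sketch below. The employee record and both function names are invented for illustration. An array holds elements of one type, a struct holds named members of different types, and the two can be combined into an array of records.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record: each member has its own name and type. */
struct employee {
    char   name[32];
    int    age;
    double salary;
};

/* Fills in one record; the name is truncated if it is too long. */
struct employee make_employee(const char *name, int age, double salary)
{
    struct employee e;
    strncpy(e.name, name, sizeof e.name - 1);
    e.name[sizeof e.name - 1] = '\0';
    e.age = age;
    e.salary = salary;
    return e;
}

/* Structs can themselves be array elements: a table of records. */
double total_salary(const struct employee *staff, size_t count)
{
    double total = 0.0;
    size_t i;
    for (i = 0; i < count; i++)
        total += staff[i].salary;   /* index the array, then pick a member */
    return total;
}
```

Here staff[i] selects a record by number, like an array element in BASIC, and .salary selects a field of that record by name.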

PurpleGurl · Post by **PurpleGurl** » Sun Mar 19, 2017 8:11 pm

I see what you mean about primitive types. In BASIC/QuickBasic, you could only use arrays and matrices for strings, signed integers, signed long, signed single float, and signed double float. You couldn't use unsigned anything nor mix and match. I guess if you wanted to use an array of doubles and put string descriptors in there, you could if you messed around in assembler too. In QuickBasic, arrays and matrices would only hold string descriptors if the data type was string. That way the strings could vary in length while the array/matrix stayed a constant size. The descriptors held the lengths and addresses of the strings.

Post by **hbelusca** » Mon Mar 20, 2017 12:19 am

QuickBasic seems to have exactly what corresponds to C structures: this is called a "record type" : http://wjesus.org/EQbasic_9.htm

Reactionist · Post by **Reactionist** » Mon Mar 20, 2017 3:20 pm

hbelusca wrote: QuickBasic seems to have exactly what corresponds to C structures: this is called a "record type" : http://wjesus.org/EQbasic_9.htm

Almost all modern BASICs support compound data types such as variants, structures, unions, etc. A C-language "structure" (a.k.a. "struct") is what BASIC would normally refer to as "User Defined Type" (a.k.a. "UDT"). You can read up on those in simple terms in almost any BASIC's online manual. Here are some, to name but a few:

Visual Basic 6: http://www.vb6.us/tutorials/user-defined-types-udt-vb

PowerBASIC for Windows: http://www.powerbasic.com/help/pbwin/ht ... (UDTs).htm

thinBasic: http://www.thinbasic.com/public/product ... l/type.htm

and many, many more...

PurpleGurl · Post by **PurpleGurl** » Mon Mar 20, 2017 10:18 pm

hbelusca wrote: QuickBasic seems to have exactly what corresponds to C structures: this is called a "record type" : http://wjesus.org/EQbasic_9.htm

Not quite. Yes, the concept exists and I forgot all about that one. However, you can only use the QuickBasic types, i.e., the records can only contain string, signed integer, signed long, signed single float, and signed double float.

The lack of unsigned numbers was one of the reasons I supplemented my QuickBasic programming with assembly. For instance, DOS calls such as to the mouse driver used unsigned. Sure, you could use long integers to hold the number which might be up to 64k then use a formula to convert to signed integer, but that was messy and pulled in the long math (32-bit emulator library for 16-bit CPUs) and perhaps even the floating point emulator library. So it was easier to write my mouse routines in true Assembly. Sure, you could use QuickBasic's Interrupt and InterruptX commands, but that was messy with setting up the record structure to simulate the CPU registers. It was better just to write entire subroutines in Assembler and directly deal with unsigned numbers and interrupts.
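For readers who have not met that workaround, the arithmetic behind it can be sketched in C (not QuickBasic). The function names are invented, and the sketch only assumes the usual two's-complement layout, where an unsigned 16-bit value above 32767 shares its bit pattern with the signed value 65536 lower.

```c
#include <stdint.h>

/* Unsigned value 0..65535 (held in a wider type) -> the signed
   16-bit number that has the same bit pattern. */
int16_t to_signed16(long value)
{
    return (int16_t)(value > 32767L ? value - 65536L : value);
}

/* And back: signed 16-bit -> unsigned 0..65535 held in a long. */
long to_unsigned16(int16_t value)
{
    return value < 0 ? value + 65536L : value;
}
```

So 65535 becomes -1 and 32768 becomes -32768, while values up to 32767 pass through unchanged. In C the detour is rarely needed because unsigned types exist.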

I refrain from using the name QBasic as it is not the compiler version (QuickBasic), but the cut-down one that Microsoft included complimentary with MS-DOS. They are the same language, however.

I only brought up QuickBasic since that is what I'm familiar with and wanted a frame of reference. This record thing seems to explain what a struct in C is the best in my mind. Thank you.
---
I noticed this:

Code: Select all

static ULONG Warn;
if (!Warn++) UNIMPLEMENTED;

I noticed that Warn was changed from BOOLEAN to ULONG. My question is why it is done this way and not with a ternary operator sequence or an If...Else clause? I mean, I know about the operator magic here. Now the above code does not prevent the memory manager spam on a permanent basis either. It changes from once every 256 calls to once every 2^32 (about 4.29 billion) calls.
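The wrap-around can be tried out with fixed-width types. This is a sketch under two assumptions: that BOOLEAN is an 8-bit unsigned type and ULONG a 32-bit one, as in the Windows headers, and that returning 1 stands in for the UNIMPLEMENTED macro firing. The function names are made up.

```c
#include <stdint.h>

/* 8-bit counter, like the old BOOLEAN version. Returns 1 when the
   warning would be printed, 0 otherwise. */
int warn_with_8bit_counter(void)
{
    static uint8_t Warn;    /* static: starts at 0, survives between calls */
    return !Warn++;         /* test the old value, then increment */
}

/* 32-bit counter, like the new ULONG version. */
int warn_with_32bit_counter(void)
{
    static uint32_t Warn;
    return !Warn++;
}
```

Over 1000 calls the 8-bit version fires four times (on calls 1, 257, 513 and 769) because the counter wraps back to zero every 256 increments, while the 32-bit version fires only on the first call. The expression !Warn++ is still a plain if condition; it just folds the test and the increment into one step.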
