If you're capturing VHS, then the capture card is probably giving you 640x480.
For (somewhat more) completeness: what's actually on the wire might naively be called 640x240 @ 59.94fps, but that's only because half of the lines are sent in each pass, called a "field". Odd-numbered lines in one field, even-numbered in the next, odd, even, odd, even, etc. You only get 29.97 full frames per second, but even that isn't really accurate, because each of those fields, and in fact each line and even each "pixel", represents a different point in time.
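To put rough numbers on that "different point in time" idea, here's a back-of-envelope sketch in Python. It uses the nominal NTSC line and field rates, assumes a 640-sample line like a typical capture card hands you, and ignores blanking intervals, so treat it as an approximation rather than exact signal timing:

```python
# Rough timing for NTSC-ish capture: every line, and every sample
# within a line, lands at a slightly different instant.
# LINE_RATE and FIELD_RATE are the nominal NTSC values; the 640-sample
# line is an assumption matching a typical capture card.

LINE_RATE = 15734.26          # lines per second (525 * 29.97)
FIELD_RATE = 59.94            # fields per second
SAMPLES_PER_LINE = 640        # what the capture card hands you

line_duration = 1.0 / LINE_RATE                   # ~63.6 microseconds
sample_spacing = line_duration / SAMPLES_PER_LINE # ignores blanking

def sample_time(field_index, line_in_field, sample):
    """Approximate wall-clock time of one captured sample, in seconds."""
    field_start = field_index / FIELD_RATE
    return field_start + line_in_field * line_duration + sample * sample_spacing

# First and last visible sample of one "full frame" (two woven fields):
t_first = sample_time(0, 0, 0)
t_last = sample_time(1, 239, 639)
print(f"one deinterlaced frame spans ~{(t_last - t_first) * 1000:.1f} ms of real time")
```

So a single "frame" in the capture actually smears across roughly 32 ms of real time, which is why motion never quite lines up between the two halves.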
(They're not really pixels. It's a continuous smear of brightness all the way across each line. It just so happens that sampling that brightness 640 times per line gives pretty close to square pixels in the computer.)
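If you want to see why 640 comes out square, it's just the display geometry: a 4:3 picture split into 480 visible lines wants 640 samples across. The 720-samples-per-line case is included for comparison because a lot of capture gear samples at that rate instead:

```python
# Pixel aspect ratio = (display aspect) / (samples per line / visible lines).
# 640 samples across a 4:3, 480-line picture gives exactly 1.0 (square);
# 720 samples gives slightly-skinny pixels that a player has to stretch.

DISPLAY_ASPECT = 4 / 3
VISIBLE_LINES = 480

for samples_per_line in (640, 720):
    pixel_aspect = DISPLAY_ASPECT / (samples_per_line / VISIBLE_LINES)
    print(f"{samples_per_line} samples/line -> pixel aspect ratio {pixel_aspect:.3f}")
```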
That signal is based on transmitting only a single value at any given point in time, while both the camera and the TV stay in sync with each other to draw that value at the same time - and therefore the same place - that the camera saw it. Interlacing, as described above, was used to send only half of the information - which made it much easier to do with the technology of the time - while still producing what appeared to the viewers' eyes on a phosphorescent CRT TV to be a complete picture.
Some de-interlacing algorithms are better than others at converting that "smeared-time" signal into a series of still images like we expect now, but all of them should at least give you a correctly-framed image, even if it "combs" badly when there's horizontal motion. That is, as long as the sync signals work.
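For a sense of what those algorithms are actually doing, here's a toy sketch of the two simplest approaches, "weave" and "bob", using made-up field data. Nothing here is tied to any particular capture software; real deinterlacers (motion-adaptive ones especially) are considerably smarter than this:

```python
import numpy as np

def weave(top_field: np.ndarray, bottom_field: np.ndarray) -> np.ndarray:
    """Interleave two (240, 640) fields into one (480, 640) frame.
    Sharp on static content, but horizontal motion shows as combing,
    because the two fields were captured ~16.7 ms apart."""
    frame = np.empty((top_field.shape[0] * 2, top_field.shape[1]), top_field.dtype)
    frame[0::2] = top_field      # odd field fills lines 0, 2, 4, ...
    frame[1::2] = bottom_field   # even field fills lines 1, 3, 5, ...
    return frame

def bob(field: np.ndarray) -> np.ndarray:
    """Stretch one (240, 640) field to (480, 640) by repeating lines.
    No combing, at the cost of halving the vertical detail.
    (Better bobs interpolate instead of duplicating.)"""
    return np.repeat(field, 2, axis=0)

# Fake fields: a bright bar that moved 20 samples between the two fields,
# which is exactly the situation where weave combs and bob doesn't.
top = np.zeros((240, 640), np.uint8)
bottom = np.zeros((240, 640), np.uint8)
top[:, 100:200] = 255
bottom[:, 120:220] = 255

print(weave(top, bottom).shape, bob(top).shape)   # (480, 640) (480, 640)
```

Either way, both methods assume the fields arrive correctly framed; that's where the sync signal comes in.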
If the sync signal doesn't get all the way across, then the picture lines up wherever it wants to, often wandering, and "wraps" around the edges of the screen.