CS019 from Brown University
modified 20/04/2024 22:53Intro
I’ve started doing this like 8 months ago, but then got a job and didn’t have the time/motivation to work on this. Today ([2023-10-19 Thu]), I’ve decided to continue working on this, since a lot of the things it teaches are things that I’ve used and have benefited from, as well as prerequisites to more interesting courses that I plan to continue doing (simple data structures, big-oh analysis, well-known cs conepts & so on). I’ll pick up from where I left off – big-oh analysis, specifically the big-oh lab, and the data scripting assignment. I’ll also be writing here as I work through the assignments and readings. Let’s go!
Big-oh notation
- essentially allows us to predict stuff about the performance of a program looking only at its source code; not all sciences have something like This
- big-oh is a function: input is the size of the initial input, output is how long the function took; big-oh is really just the upper bound (what’s the most time it can take)
- O(3k) is really the same as O(k), because the constant (3) doens’t matter, you can just have a bigger c constant. then there’s a question of “what is big enough”? is something like 100000*O(k) big enough to be bigger than O(k^2)? well not really, you just take O(100000*k) <= 100000*O(k^2); but why can’t you then take k as the constant? well, it’s not really a constant anymore: O(k * k) <= k * O(k^2), but since k is a variable number – not a constant, you can’t do that.
TODO Big-O lab
Sortacle
not sure if what i did passes all tests, but it passes all tests i could’ve came up with. however, i think i’m cheating by making is-valid
just sort and check?
oh, so reading OP’s solution, it seems that he just assumes that the functions will be sorted, and he just checks instead that everything matches..
ok, after reading the paper, it seems that my solution is actually incorect, as it will only work for testing one kind of sorting – ascending, based only on age. instead, the point was to test that ANY sorter function works. i should probably go back and change it…
- problem we’re trying to solve (something i’ve also have had to deal with at work) is that we write concrete tests for code that isn’t concrete (in this case, when ages of 2 people are equal, the ordering does not matter, but a test-case where it does would fail, because it would check the wrog thing);
- assignment is writing a function that check the relationships, instead of concrete results
- basically, we
- write the random generator
- then a function that checks pariwise that the elements from the first is <= the element from the other, for every element in the lists,
- cases here:
- [3, 1, 2] and [1, 2, 3]: is 3 > 1: yes -> pop 3 and 1, set 3 as curent max; is 1 > 2: no
- cases here:
- then a function that takes a function and calls
is-valid
on a list of 50 randomly-generated results, and the result of applying said function to that list (not sure if 50 is the best number)
Tables
Virtual Art Store
First assignment requires making a program that will find and calculate the price of a artwork using two tables. The code itself doesn’t differ a lot accross them, though, since the logic is essentially the same for all of the tasks.
All of the assignments require a “program plan” (they don’t really mention what that means, so I just did some thinking and came up with a pseudo-code implementation of what I’m supposed to do an went from there, researching the table functions that will allow me to filter data, how to access the values of the rows etc.)
https://dcic-world.org/2023-02-21/processing-tables.html#%28part._task-plans%29 Seems like good advice.
1
For this, we don’t need to handle exceptions in any way. This means we’re not supossed to specifically treat cases where the artwork or the currency isn’t in the table isn’t in the table.
- Find artwork by id in artwork table
- If artwork has same currency as the one we need, return
artwork["cost"]
- Otherwise, find the currency in the currency table where
currency["from-c"] \== artwork["currency"] && currency["to-c"] \== required-currency
. Returnartwork["cost"] * currency
.
Here’s my implementation:
fun get-first(predicate :: (Row -> Boolean), table :: Table) -> Row:
table.filter(predicate).row-n(0)
where:
get-first((lam(x): x["foo"] == 1 end), table: foo row: 1 end) is [T.raw-row: {"foo"; 1}]
# duplicate
get-first((lam(x): x["foo"] == 1 end), table: foo row: 1 row: 1 end) is [T.raw-row: {"foo"; 1}]
# not found
get-first((lam(x): x["foo"] == 2 end), table: foo row: 1 end) raises "row-n-too-large"
end
fun get-art-in-1(art-table :: Table, currency-table :: Table, art_id :: Number, curr :: String) -> Number:
artwork = get-first((lam(a): a["id"] == art_id end), art-table)
art-curr = artwork["currency"]
art-cost = artwork["cost"]
if art-curr == curr:
art-cost
else:
conv-curr = get-first((lam(c): (c["from-c"] == art-curr) and (c["to-c"] == curr) end), currency-table)
conv-curr["conv-rate"] * art-cost
end
where:
# we don't have to worry about these 2 cases:
# 1. the article isn't in the table
# get-art-in-1(artt(1, 10, "eur"), curt("usd", "mdl", 18), 2, "eur") is ???
# 2. the currency isn't in the table
# get-art-in-1(artt(1, 10, "eur"), curt("usd", "mdl", 18), 1, "usd") is ???
get-art-in-1(one-art-table, no-curr-table, 1, "usd") is 10
get-art-in-1(one-art-table, one-curr-table, 1, "mdl") is 10 * 18
end
I defined a generic get-first
that will not handle duplicates/missing rows in any way.
2
This is esentially the same as the previous task but we’re supposed to raise an exception if some rows are duplicated or no row is found. The only thing that changes here is the function we use to do the actual lookup – get-first-or-raise
.
- Find artwork by id in artwork table
- If not found or duplicated, raise an error
- If artwork has same currency as the one we need, return
artwork["cost"]
- Otherwise, find the currency in the currency table where
currency["from-c"] \== artwork["currency"] && currency["to-c"] \== required-currency
.
Return artwork["cost"] * currency
- If not found or duplicated, raise an error
fun get-first-or-raise(predicate :: (Row -> Boolean), table :: Table) -> Row:
matching-rows = table.filter(predicate)
if matching-rows.length() == 0:
raise("Row matching predicate not found")
else if matching-rows.length() > 1:
raise("Multiple rows matched the predicate")
else:
matching-rows.row-n(0)
end
where:
get-first-or-raise((lam(x): x["foo"] == 1 end), table: foo row: 1 end) is get-first((lam(x): x["foo"] == 1 end), table: foo row: 1 end)
get-first-or-raise((lam(x): x["foo"] == 1 end), table: foo end) raises "Row matching predicate not found"
get-first-or-raise((lam(x): x["foo"] == 1 end), table: foo row: 1 row: 1 end) raises "Multiple rows matched the predicate"
end
fun get-art-in-2(art-table :: Table, currency-table :: Table, art_id :: Number, curr :: String) -> Number:
artwork = get-first-or-raise((lam(a): a["id"] == art_id end), art-table)
art-curr = artwork["currency"]
art-cost = artwork["cost"]
if art-curr == curr:
art-cost
else:
conv-curr = get-first-or-raise((lam(c): (c["from-c"] == art-curr) and (c["to-c"] == curr) end), currency-table)
conv-curr["conv-rate"] * art-cost
end
where:
get-art-in-2(one-art-table, no-curr-table, 1, "usd") is get-art-in-1(one-art-table, no-curr-table, 1, "usd")
get-art-in-2(one-art-table, one-curr-table, 1, "mdl") is get-art-in-1(one-art-table, one-curr-table, 1, "mdl")
get-art-in-2(one-art-table, no-curr-table, 1, "mdl") raises "Row matching predicate not found"
end
I’m yet to find a way to generify this for all 3 cases.
3
This is a bit tricky because we can’t use our previously-defined functions, and can’t generify the structure of the function, because there’s additional logic in the case that the currency is not found the first time.
- Find artwork by id in artwork table
- If not found or duplicated, raise an error
- If artwork has same currency as the one we need, return
artwork["cost"]
- Otherwise, find the currency in the currency table where
currency["from-c"] \== artwork["currency"] && currency["to-c"] \== required-currency
. - If found: Return
artwork["cost"] * currency
. If duplicated, raise an error. - If not found: find a currency where
(c["to-c"] \== art-curr) and (c["from-c"] \== curr)
(reverse of the currency). If not found or duplicated, raise an error. Return(1 / rev-curr["conv-rate"]) * art-cost
fun get-art-in-3(art-table :: Table, currency-table :: Table, art_id :: Number, curr :: String) -> Number:
artwork = get-first-or-raise((lam(a): a["id"] == art_id end), art-table)
art-curr = artwork["currency"]
art-cost = artwork["cost"]
if art-curr == curr:
art-cost
else:
maybe-curr = currency-table.filter((lam(c): (c["from-c"] == art-curr) and (c["to-c"] == curr) end))
if maybe-curr.length() > 1:
raise("Multiple rows matched the predicate")
else if maybe-curr.length() == 1:
maybe-curr.row-n(0)["conv-rate"] * art-cost
else:
(1 / (get-first-or-raise((lam(c): (c["to-c"] == art-curr) and (c["from-c"] == curr) end), currency-table)["conv-rate"])) * art-cost
end
end
where:
get-art-in-3(one-art-table, no-curr-table, 1, "usd") is get-art-in-2(one-art-table, no-curr-table, 1, "usd")
get-art-in-3(one-art-table, one-curr-table, 1, "mdl") is get-art-in-2(one-art-table, one-curr-table, 1, "mdl")
get-art-in-3(one-art-table, no-curr-table, 1, "mdl") raises "Row matching predicate not found"
get-art-in-3(duplicate-art-table, no-curr-table, 1, "mdl") raises "Multiple rows matched the predicate"
get-art-in-3(one-art-table, duplicate-curr-table, 1, "mdl") raises "Multiple rows matched the predicate"
get-art-in-3(one-art-table, inverse-of-one-curr-table, 1, "mdl") is (1 / (1 / 18)) * 10
get-art-in-3(one-art-table, duplicate-inverse-of-one-curr-table, 1, "mdl") raises "Multiple rows matched the predicate"
end
4
We’re not supposed to write any code for this one, only a Testing Plan. This eans we need to think of a list of cases to test, which will guarantee that our solution is correct. Or, as they put in the Sortacle assignment: suppose you were asked to grade a solution while giving partial credit; (…) decompose the problem into a set of sub-problems, so that the combination of these solves the problem as a whole
- No row in either column should appear more than once. If they do, an exception should be raised
- If a direct conversion is available, it should be used (if the artwork’s currency is usd and we require usd, no conversion is needed)
- If there is a conversion from the currency of the artwork to the required currency, it should be used (if the artwork’s currency is usd and we require eur, but a usd -> eur conversion is found, it should be used)
- If there is a conversion from the required currency to the currency of the artwork, it should be used in reverse (if the artwork’s currency is usd and we require eur but a eur -> usd conversion is found, 1/conversion-rate shuold be used)
- If there is one or more intermediary conversions that would end with our required currency, it should be used (if the artwork’s currency is usd and we require eur, but a usd -> gbp and gbp -> eur conversion exists, it should be used). It can also include reverses (if a usd->gbp and eur->gbp conversionts are found, it still works as we can do eur->gbp and 1/gbp->eur)
- If no such conversions are found, an exception should get raised
5
Again, no code required. We need to compare the strengths and weaknesses of representing the currency as a datatype.
- Strengths:
- clearer purpose: we’re specific about what represents a currency and what currencies are allowed
- validation: if we use a currency that isn’t in the datatype, we’ll get an error about it, making sure all input of that column is valid
- Weaknesses:
- every time we want to use a new currency, we have to add it to the datatype
Titanic
We’re also required to provide a Program Plan for the assignments.
Program plan
- We’ll need a function that extracts the name and a function that extracts the title out of
raw-name
- We’ll need a function that counts the frequencies of all words. It should produce a list of
WordFreq
objects. - We’ll need to then go through the freqeuencies and sort them. I’d use a dict if I were in Python, but here we can use a custom datetype, something like
type NameFreq : name :: String, occurences :: Number end
. Then sorting and displaying it is easy.
We may not even need to know the fact that we’re dealing with a name, and rename NameFreq
to something generic, like WordFreq
. This way, we can also use it for titles and their frequencies, not only names.
- Joining everything together, we’ll have 3 functions: most-popular-male-names which will do something like
map freq.name for every freq in rsort-freq(find-freq(set(filter(get-name if sex is male))))
We’ll have the same thing for females.
map freq.name for every freq in rsort-freq(find-freq(set(filter(get-name if sex is female))))
And for titles we’ll just do find-freq(set(get-title for every person)))
Sets
How to represent sets as lists:
-
Problems
- sets != lists: set.size() != list.size() if list contains duplicates
- order doesn’t matter for sets; it does for lists
-
Main question: what do we do about duplicates?
-
Choice 1: don’t store duplicates
insert
will check if element is already in set (but only once, so it’s not O(k^2), but only O(k)) before inserting- this means
insert
is more expensive, but we getsize
andis-in
(member
) for free & they’re faster tooinsert
does ais-in
(O(k)) for each element, and then callslink
(O(c), c is constant); still O(k)
- for k elements,
length
andis-in
are O(k) (linear, meaning it grows directly proportional to the number of elements)
- this means
- better if we expect a lot of duplicates and we be calling
insert
andis-in
orsize
often
-
Choice 2: store duplicates
insert
islink
size
is constant time (justlink
)size
andis-in
will have to check for duplicatessize
andis-in
are going to be linear, but since we’re saving duplicates, k is going to be bigger, so it’ll take more timesize
will have to check if each element is in the list (O(k)) for each element (O(k)) so we have O(k^2)/quadratic (and since the set contains duplicates, k is going to be way bigger)
- better if we know we won’t have a lot of duplicates and we’ll be calling
insert
often andsize
rarely
-
When the performance depends on a sequence of operatinos big-Oh is too lose to allow measuring each operation, and we should instead measure the entire sequence.
-
I won’t be doing the excersises, but let’s think about them
- if we use the representation without duplicates
remove
will be linear (at worst), since we only need to traverse the list once and remove the element (a constant time operation), but we only traverse the entire list if the element we need to remove is the last element of the list. - if we use the representation with duplicates
remove
will also be linear, but the constant operation will need to be performed for all of the elements, so it’s linear at best (we need to traverse the entire list no matter what, to make sure there isn’t any other duplicate left) - it’s impossible for
one(s)
to produce the same thing asone(others(s))
becauseone
returns a random value, andothers
may return the same value, but we have only one occurence of each element, thus it doesn’t make sense forone
andothers
to return the same element. - Set<Any>) -> set<Any>=
- if we use the representation without duplicates